- DeBERTa: Decoding-enhanced BERT with Disentangled Attention
  Paper • 2006.03654 • Published • 3
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 21
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 9
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 16
Mariusj G (MariusjG)
AI & ML interests
None yet
Recent Activity
- upvoted an article 15 days ago: From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels
- upvoted a paper 15 days ago: Prompt Orchestration Markup Language
- upvoted a paper 30 days ago: Attention Heads of Large Language Models: A Survey
Organizations
None yet