|
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
|
|
|
# 🤗 Transformers |
|
|
|
State-of-the-art machine learning for [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/), and [JAX](https://jax.readthedocs.io/en/latest/).
|
|
|
🤗 Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs and carbon footprint, and save you the time and resources required to train a model from scratch. These models support common tasks across different modalities (a short usage example follows the list), such as:
|
|
|
📝 **Natural Language Processing**: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.<br>
🖼️ **Computer Vision**: image classification, object detection, and segmentation.<br>
🗣️ **Audio**: automatic speech recognition and audio classification.<br>
🐙 **Multimodal**: question answering over tables or images, optical character recognition, information extraction from scanned documents, and video classification.
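
As a minimal sketch of how these tasks are typically accessed, the `pipeline` helper resolves a default pretrained checkpoint for a task name (the checkpoints actually downloaded depend on your installed version; the audio file path below is only illustrative):

```python
from transformers import pipeline

# 📝 NLP: sentiment analysis on a short sentence; a default pretrained
# checkpoint is downloaded automatically the first time it is used.
classifier = pipeline("sentiment-analysis")
print(classifier("Transformers makes it easy to reuse pretrained models."))

# 🗣️ Audio: automatic speech recognition from a local file (illustrative path)
# asr = pipeline("automatic-speech-recognition")
# print(asr("path/to/audio.flac"))
```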
|
|
|
🤗 Transformers supports interoperability between PyTorch, TensorFlow, and JAX. This gives you the flexibility to use a different framework at each stage of a model's life; for example, train a model in three lines of code in one framework and load it for inference in another, as the sketch below illustrates. Models can also be exported to formats such as ONNX and TorchScript for deployment in production environments.
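
A hedged sketch of that interoperability (the checkpoint name and save path are only examples): weights saved with the PyTorch classes can be reloaded into the corresponding TensorFlow class by passing `from_pt=True`:

```python
from transformers import (
    AutoModelForSequenceClassification,
    TFAutoModelForSequenceClassification,
)

# Save a PyTorch model (e.g. after fine-tuning) ...
pt_model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
pt_model.save_pretrained("./my-finetuned-model")

# ... then load the same weights into the TensorFlow class for inference.
tf_model = TFAutoModelForSequenceClassification.from_pretrained(
    "./my-finetuned-model", from_pt=True
)
```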
|
|
|
Join the growing community on the [Hub](https://huggingface.co/models), the [forum](https://discuss.huggingface.co/), or Discord today!
|
|
|
## If you are looking for custom support from the Hugging Face team
|
|
|
<a target="_blank" href="https://huggingface.co/support"> |
|
<img alt="HuggingFace Expert Acceleration Program" src="https://cdn-media.huggingface.co/marketing/transformers/new-support-improved.png" style="width: 100%; max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);"> |
|
</a> |
|
|
|
## Contents |
|
|
|
The documentation is organized into five sections:
|
|
|
- **GET STARTED** offers a quick tour of the library and installation instructions to get up and running.
- **TUTORIALS** are a great place to start if you are a beginner. This section will help you gain the basic skills you need to start using the library.
- **HOW-TO GUIDES** show you how to accomplish specific tasks, such as finetuning a pretrained model for text classification or how to create and share your own model.
- **CONCEPTUAL GUIDES** offer more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of 🤗 Transformers.
- **API** describes all classes and functions:

  - **MAIN CLASSES** details the most important classes such as configuration, model, tokenizer, and pipeline.
  - **MODELS** details the classes and functions specific to each model implemented in the library.
  - **INTERNAL HELPERS** details utility classes and functions used internally.
|
|
|
### Supported models
|
|
|
<!--This list is updated automatically from the README with _make fix-copies_. Do not update manually! --> |
|
|
|
1. **[ALBERT](model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https: |
|
1. **[ALIGN](model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https: |
|
1. **[AltCLIP](model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https: |
|
1. **[Audio Spectrogram Transformer](model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https: |
|
1. **[BART](model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https: |
|
1. **[BARThez](model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https: |
|
1. **[BARTpho](model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https: |
|
1. **[BEiT](model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https: |
|
1. **[BERT](model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https: |
|
1. **[BERT For Sequence Generation](model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https: |
|
1. **[BERTweet](model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https: |
|
1. **[BigBird-Pegasus](model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https: |
|
1. **[BigBird-RoBERTa](model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https: |
|
1. **[BioGpt](model_doc/biogpt)** (from Microsoft Research AI4Science) released with the paper [BioGPT: generative pre-trained transformer for biomedical text generation and mining](https: |
|
1. **[BiT](model_doc/bit)** (from Google AI) released with the paper [Big Transfer (BiT): General Visual Representation Learning](https: |
|
1. **[Blenderbot](model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https: |
|
1. **[BlenderbotSmall](model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https: |
|
1. **[BLIP](model_doc/blip)** (from Salesforce) released with the paper [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https: |
|
1. **[BLOOM](model_doc/bloom)** (from BigScience workshop) released by the [BigScience Workshop](https: |
|
1. **[BORT](model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https: |
|
1. **[BridgeTower](model_doc/bridgetower)** (from Harbin Institute of Technology/Microsoft Research Asia/Intel Labs) released with the paper [BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning](https: |
|
1. **[ByT5](model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https: |
|
1. **[CamemBERT](model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https: |
|
1. **[CANINE](model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https: |
|
1. **[Chinese-CLIP](model_doc/chinese_clip)** (from OFA-Sys) released with the paper [Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese](https: |
|
1. **[CLIP](model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https: |
|
1. **[CLIPSeg](model_doc/clipseg)** (from University of Göttingen) released with the paper [Image Segmentation Using Text and Image Prompts](https: |
|
1. **[CodeGen](model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https: |
|
1. **[Conditional DETR](model_doc/conditional_detr)** (from Microsoft Research Asia) released with the paper [Conditional DETR for Fast Training Convergence](https: |
|
1. **[ConvBERT](model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https: |
|
1. **[ConvNeXT](model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https: |
|
1. **[ConvNeXTV2](model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https: |
|
1. **[CPM](model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https: |
|
1. **[CTRL](model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https: |
|
1. **[CvT](model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https: |
|
1. **[Data2Vec](model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https: |
|
1. **[DeBERTa](model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https: |
|
1. **[DeBERTa-v2](model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https: |
|
1. **[Decision Transformer](model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https: |
|
1. **[Deformable DETR](model_doc/deformable_detr)** (from SenseTime Research) released with the paper [Deformable DETR: Deformable Transformers for End-to-End Object Detection](https: |
|
1. **[DeiT](model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https: |
|
1. **[DETA](model_doc/deta)** (from The University of Texas at Austin) released with the paper [NMS Strikes Back](https: |
|
1. **[DETR](model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https: |
|
1. **[DialoGPT](model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https: |
|
1. **[DiNAT](model_doc/dinat)** (from SHI Labs) released with the paper [Dilated Neighborhood Attention Transformer](https: |
|
1. **[DistilBERT](model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https: |
|
1. **[DiT](model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https: |
|
1. **[Donut](model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https: |
|
1. **[DPR](model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https: |
|
1. **[DPT](model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https:
|
1. **[EfficientFormer](model_doc/efficientformer)** (from Snap Research) released with the paper [EfficientFormer: Vision Transformers at MobileNet Speed](https:
|
1. **[ELECTRA](model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https: |
|
1. **[EncoderDecoder](model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https: |
|
1. **[ERNIE](model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https: |
|
1. **[ESM](model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https: |
|
1. **[FLAN-T5](model_doc/flan-t5)** (from Google AI) released in the repository [google-research/t5x](https: |
|
1. **[FlauBERT](model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https: |
|
1. **[FLAVA](model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https: |
|
1. **[FNet](model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https: |
|
1. **[Funnel Transformer](model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https: |
|
1. **[GIT](model_doc/git)** (from Microsoft Research) released with the paper [GIT: A Generative Image-to-text Transformer for Vision and Language](https: |
|
1. **[GLPN](model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https: |
|
1. **[GPT](model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https: |
|
1. **[GPT Neo](model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https: |
|
1. **[GPT NeoX](model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https: |
|
1. **[GPT NeoX Japanese](model_doc/gpt_neox_japanese)** (from ABEJA) released by Shinya Otani, Takayoshi Makabe, Anuj Arora, and Kyo Hattori. |
|
1. **[GPT-2](model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https: |
|
1. **[GPT-J](model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https: |
|
1. **[GPT-Sw3](model_doc/gpt-sw3)** (from AI-Sweden) released with the paper [Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish](http: |
|
1. **[Graphormer](model_doc/graphormer)** (from Microsoft) released with the paper [Do Transformers Really Perform Bad for Graph Representation?](https: |
|
1. **[GroupViT](model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https: |
|
1. **[Hubert](model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https: |
|
1. **[I-BERT](model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https: |
|
1. **[ImageGPT](model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https: |
|
1. **[Jukebox](model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https: |
|
1. **[LayoutLM](model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https: |
|
1. **[LayoutLMv2](model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https: |
|
1. **[LayoutLMv3](model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https: |
|
1. **[LayoutXLM](model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https: |
|
1. **[LED](model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https: |
|
1. **[LeViT](model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https: |
|
1. **[LiLT](model_doc/lilt)** (from South China University of Technology) released with the paper [LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding](https: |
|
1. **[Longformer](model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https: |
|
1. **[LongT5](model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https: |
|
1. **[LUKE](model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https: |
|
1. **[LXMERT](model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https: |
|
1. **[M-CTC-T](model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https: |
|
1. **[M2M100](model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https: |
|
1. **[MarianMT](model_doc/marian)** Machine translation models trained using [OPUS](http: |
|
1. **[MarkupLM](model_doc/markuplm)** (from Microsoft Research Asia) released with the paper [MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding](https: |
|
1. **[Mask2Former](model_doc/mask2former)** (from FAIR and UIUC) released with the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https: |
|
1. **[MaskFormer](model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https: |
|
1. **[mBART](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https: |
|
1. **[mBART-50](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https: |
|
1. **[Megatron-BERT](model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https: |
|
1. **[Megatron-GPT2](model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https: |
|
1. **[mLUKE](model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https: |
|
1. **[MobileBERT](model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https: |
|
1. **[MobileNetV1](model_doc/mobilenet_v1)** (from Google Inc.) released with the paper [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https: |
|
1. **[MobileNetV2](model_doc/mobilenet_v2)** (from Google Inc.) released with the paper [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https: |
|
1. **[MobileViT](model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https: |
|
1. **[MPNet](model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https: |
|
1. **[MT5](model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https: |
|
1. **[MVP](model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https: |
|
1. **[NAT](model_doc/nat)** (from SHI Labs) released with the paper [Neighborhood Attention Transformer](https: |
|
1. **[Nezha](model_doc/nezha)** (from Huawei Noah’s Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https: |
|
1. **[NLLB](model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https: |
|
1. **[Nyströmformer](model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https: |
|
1. **[OneFormer](model_doc/oneformer)** (from SHI Labs) released with the paper [OneFormer: One Transformer to Rule Universal Image Segmentation](https: |
|
1. **[OPT](model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https:
|
1. **[OWL-ViT](model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https: |
|
1. **[Pegasus](model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https: |
|
1. **[PEGASUS-X](model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https: |
|
1. **[Perceiver IO](model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https: |
|
1. **[PhoBERT](model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https: |
|
1. **[PLBart](model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https: |
|
1. **[PoolFormer](model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https: |
|
1. **[ProphetNet](model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https: |
|
1. **[QDQBert](model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https: |
|
1. **[RAG](model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https: |
|
1. **[REALM](model_doc/realm)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https:
|
1. **[Reformer](model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https: |
|
1. **[RegNet](model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Space](https: |
|
1. **[RemBERT](model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https: |
|
1. **[ResNet](model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https: |
|
1. **[RoBERTa](model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https: |
|
1. **[RoBERTa-PreLayerNorm](model_doc/roberta-prelayernorm)** (from Facebook) released with the paper [fairseq: A Fast, Extensible Toolkit for Sequence Modeling](https: |
|
1. **[RoCBert](model_doc/roc_bert)** (from WeChatAI) released with the paper [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https: |
|
1. **[RoFormer](model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https: |
|
1. **[SegFormer](model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https: |
|
1. **[SEW](model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https: |
|
1. **[SEW-D](model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https: |
|
1. **[SpeechT5](model_doc/speecht5)** (from Microsoft Research) released with the paper [SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing](https: |
|
1. **[SpeechToTextTransformer](model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https: |
|
1. **[SpeechToTextTransformer2](model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https: |
|
1. **[Splinter](model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https: |
|
1. **[SqueezeBERT](model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https: |
|
1. **[Swin Transformer](model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https: |
|
1. **[Swin Transformer V2](model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https: |
|
1. **[Swin2SR](model_doc/swin2sr)** (from University of Würzburg) released with the paper [Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration](https: |
|
1. **[SwitchTransformers](model_doc/switch_transformers)** (from Google) released with the paper [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https: |
|
1. **[T5](model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https: |
|
1. **[T5v1.1](model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https: |
|
1. **[Table Transformer](model_doc/table-transformer)** (from Microsoft Research) released with the paper [PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents](https: |
|
1. **[TAPAS](model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https: |
|
1. **[TAPEX](model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https: |
|
1. **[Time Series Transformer](model_doc/time_series_transformer)** (from HuggingFace). |
|
1. **[TimeSformer](model_doc/timesformer)** (from Facebook) released with the paper [Is Space-Time Attention All You Need for Video Understanding?](https: |
|
1. **[Trajectory Transformer](model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https: |
|
1. **[Transformer-XL](model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https: |
|
1. **[TrOCR](model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https: |
|
1. **[UL2](model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https: |
|
1. **[UniSpeech](model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https: |
|
1. **[UniSpeechSat](model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https: |
|
1. **[UPerNet](model_doc/upernet)** (from Peking University) released with the paper [Unified Perceptual Parsing for Scene Understanding](https: |
|
1. **[VAN](model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https: |
|
1. **[VideoMAE](model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https: |
|
1. **[ViLT](model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https: |
|
1. **[Vision Transformer (ViT)](model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https: |
|
1. **[VisualBERT](model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https: |
|
1. **[ViT Hybrid](model_doc/vit_hybrid)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https: |
|
1. **[ViTMAE](model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https: |
|
1. **[ViTMSN](model_doc/vit_msn)** (from Meta AI) released with the paper [Masked Siamese Networks for Label-Efficient Learning](https: |
|
1. **[Wav2Vec2](model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https: |
|
1. **[Wav2Vec2-Conformer](model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https: |
|
1. **[Wav2Vec2Phoneme](model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https: |
|
1. **[WavLM](model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https: |
|
1. **[Whisper](model_doc/whisper)** (from OpenAI) released with the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https: |
|
1. **[X-CLIP](model_doc/xclip)** (from Microsoft Research) released with the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https: |
|
1. **[XGLM](model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https: |
|
1. **[XLM](model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https: |
|
1. **[XLM-ProphetNet](model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https: |
|
1. **[XLM-RoBERTa](model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https: |
|
1. **[XLM-RoBERTa-XL](model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https: |
|
1. **[XLNet](model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https: |
|
1. **[XLS-R](model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https: |
|
1. **[XLSR-Wav2Vec2](model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https: |
|
1. **[YOLOS](model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https: |
|
1. **[YOSO](model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https: |
|
|
|
|
|
### Supported frameworks
|
|
|
The table below represents the current support in the library for each of these models: whether they have a Python tokenizer (called "slow"), a "fast" tokenizer backed by the 🤗 Tokenizers library, and whether they are supported in Jax (via Flax), PyTorch, and/or TensorFlow.
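
For instance, the "slow"/"fast" columns determine which tokenizer class `AutoTokenizer` can return. A small sketch (the checkpoint name is only illustrative) of requesting one or the other explicitly:

```python
from transformers import AutoTokenizer

# By default, a "fast" (🤗 Tokenizers-backed) tokenizer is returned when one exists.
fast_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(fast_tokenizer.is_fast)  # True

# Passing use_fast=False falls back to the pure-Python ("slow") implementation.
slow_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)
print(slow_tokenizer.is_fast)  # False
```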
|
|
|
<!--This table is updated automatically from the auto modules with _make fix-copies_. Do not update manually!--> |
|
|
|
| Model | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
|
|:-----------------------------:|:--------------:|:--------------:|:---------------:|:------------------:|:------------:| |
|
| ALBERT | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| AltCLIP | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Audio Spectrogram Transformer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| BART | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| BEiT | ❌ | ❌ | ✅ | ❌ | ✅ | |
|
| BERT | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| Bert Generation | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| BigBird | ✅ | ✅ | ✅ | ❌ | ✅ | |
|
| BigBird-Pegasus | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| BioGpt | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| BiT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Blenderbot | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| BlenderbotSmall | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| BLIP | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| BLOOM | ❌ | ✅ | ✅ | ❌ | ❌ | |
|
| BridgeTower | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| CANINE | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| Chinese-CLIP | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| CLIP | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| CLIPSeg | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| CodeGen | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| Conditional DETR | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| ConvBERT | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| ConvNeXT | ❌ | ❌ | ✅ | ✅ | ❌ | |
|
| CTRL | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| CvT | ❌ | ❌ | ✅ | ✅ | ❌ | |
|
| Data2VecAudio | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Data2VecText | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Data2VecVision | ❌ | ❌ | ✅ | ✅ | ❌ | |
|
| DeBERTa | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| DeBERTa-v2 | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| Decision Transformer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Deformable DETR | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| DeiT | ❌ | ❌ | ✅ | ✅ | ❌ | |
|
| DETA | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| DETR | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| DiNAT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| DistilBERT | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| DonutSwin | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| DPR | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| DPT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| EfficientFormer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| ELECTRA | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ | |
|
| ERNIE | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| ESM | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| FairSeq Machine-Translation | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| FlauBERT | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| FLAVA | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| FNet | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| GIT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| GLPN | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| GPT Neo | ❌ | ❌ | ✅ | ❌ | ✅ | |
|
| GPT NeoX | ❌ | ✅ | ✅ | ❌ | ❌ | |
|
| GPT NeoX Japanese | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| GPT-J | ❌ | ❌ | ✅ | ✅ | ✅ | |
|
| GPT-Sw3 | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| Graphormer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| GroupViT | ❌ | ❌ | ✅ | ✅ | ❌ | |
|
| Hubert | ❌ | ❌ | ✅ | ✅ | ❌ | |
|
| I-BERT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| ImageGPT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Jukebox | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| LayoutLM | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| LayoutLMv2 | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| LayoutLMv3 | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| LED | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| LeViT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| LiLT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| LongT5 | ❌ | ❌ | ✅ | ❌ | ✅ | |
|
| LUKE | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| LXMERT | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| M-CTC-T | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| M2M100 | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| Marian | ✅ | ❌ | ✅ | ✅ | ✅ | |
|
| MarkupLM | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| Mask2Former | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| MaskFormer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| MaskFormerSwin | ❌ | ❌ | ❌ | ❌ | ❌ | |
|
| mBART | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| Megatron-BERT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| MobileBERT | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| MobileNetV1 | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| MobileNetV2 | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| MobileViT | ❌ | ❌ | ✅ | ✅ | ❌ | |
|
| MPNet | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| MT5 | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| MVP | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| NAT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Nezha | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Nyströmformer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| OneFormer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| OPT | ❌ | ❌ | ✅ | ✅ | ✅ | |
|
| OWL-ViT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Pegasus | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| PEGASUS-X | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Perceiver | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| PLBart | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| PoolFormer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| QDQBert | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| RAG | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| REALM | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| Reformer | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| RegNet | ❌ | ❌ | ✅ | ✅ | ❌ | |
|
| RemBERT | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| ResNet | ❌ | ❌ | ✅ | ✅ | ❌ | |
|
| RetriBERT | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| RoBERTa-PreLayerNorm | ❌ | ❌ | ✅ | ✅ | ✅ | |
|
| RoCBert | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| RoFormer | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| SegFormer | ❌ | ❌ | ✅ | ✅ | ❌ | |
|
| SEW | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| SEW-D | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Speech Encoder decoder | ❌ | ❌ | ✅ | ❌ | ✅ | |
|
| Speech2Text | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| Speech2Text2 | ✅ | ❌ | ❌ | ❌ | ❌ | |
|
| SpeechT5 | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| Splinter | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ | |
|
| Swin Transformer | ❌ | ❌ | ✅ | ✅ | ❌ | |
|
| Swin Transformer V2 | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Swin2SR | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| SwitchTransformers | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| T5 | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| Table Transformer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| TAPAS | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| Time Series Transformer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| TimeSformer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Trajectory Transformer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Transformer-XL | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| TrOCR | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| UniSpeech | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| UniSpeechSat | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| UPerNet | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| VAN | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| VideoMAE | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| ViLT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Vision Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ | |
|
| VisionTextDualEncoder | ❌ | ❌ | ✅ | ❌ | ✅ | |
|
| VisualBERT | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| ViT | ❌ | ❌ | ✅ | ✅ | ✅ | |
|
| ViT Hybrid | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| ViTMAE | ❌ | ❌ | ✅ | ✅ | ❌ | |
|
| ViTMSN | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Wav2Vec2 | ✅ | ❌ | ✅ | ✅ | ✅ | |
|
| Wav2Vec2-Conformer | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| WavLM | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| Whisper | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| X-CLIP | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| XGLM | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| XLM | ✅ | ❌ | ✅ | ✅ | ❌ | |
|
| XLM-ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ | |
|
| XLM-RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ | |
|
| XLM-RoBERTa-XL | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| XLNet | ✅ | ✅ | ✅ | ✅ | ❌ | |
|
| YOLOS | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
| YOSO | ❌ | ❌ | ✅ | ❌ | ❌ | |
|
|
|
<!-- End table--> |