OWG/bigbird-roberta-base


chainyo committed on Apr 22, 2022

Commit 1371dfa · 1 Parent(s): 586a876

Update README.md


README.md +46 -0


@@ -1,3 +1,49 @@
 ---
+language: en
 license: apache-2.0
+datasets:
+- bookcorpus
+- wikipedia
+- cc_news
 ---
+# BigBird base model
+BigBird is a sparse-attention-based transformer that extends Transformer-based models such as BERT to much longer sequences. The BigBird paper also provides a theoretical analysis of which capabilities of a full-attention transformer the sparse model retains.
+The model was pretrained on English text using a masked language modeling (MLM) objective. It was introduced in this [paper](https://arxiv.org/abs/2007.14062) and first released in this [repository](https://github.com/google-research/bigbird).
+## Model description
+BigBird relies on **block sparse attention** instead of full attention (i.e. BERT's attention) and can handle sequences of up to 4096 tokens at a much lower compute cost than BERT. It has achieved state-of-the-art results on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.
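To make the cost argument concrete, here is a minimal sketch of a block sparse attention pattern that combines global, sliding-window, and random blocks. The function name `block_sparse_mask` and all parameter values (block size 64, window of 3 blocks, 2 global blocks, 3 random blocks) are illustrative assumptions, not values read from this model's configuration, and the sketch does not reproduce the exact layout used by the original implementation.

```python
import random


def block_sparse_mask(num_blocks, window=3, num_global=2, num_random=3, seed=0):
    """Return a num_blocks x num_blocks matrix; True means query block i attends key block j."""
    rng = random.Random(seed)
    half = window // 2
    mask = [[False] * num_blocks for _ in range(num_blocks)]
    for i in range(num_blocks):
        for j in range(num_blocks):
            is_global = i < num_global or j < num_global  # global rows and columns
            is_local = abs(i - j) <= half                 # sliding window around the diagonal
            mask[i][j] = is_global or is_local
        # Add a few random key blocks that are not attended yet.
        candidates = [j for j in range(num_blocks) if not mask[i][j]]
        for j in rng.sample(candidates, min(num_random, len(candidates))):
            mask[i][j] = True
    return mask


# Hypothetical setting: 4096 tokens split into blocks of 64 tokens.
seq_len, block_size = 4096, 64
mask = block_sparse_mask(seq_len // block_size)
attended = sum(sum(row) for row in mask)
total = len(mask) ** 2
print(f"{attended} of {total} block pairs attended ({attended / total:.1%})")
```

Apart from the global rows, every query block attends a fixed number of key blocks, so the number of attended pairs grows roughly linearly with sequence length rather than quadratically.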
+## Original implementation
+Follow [this link](https://huggingface.co/google/bigbird-roberta-base) to see the original implementation.
+## How to use
+Download the model by cloning the repository via `git clone https://huggingface.co/OWG/bigbird-roberta-base`.
+Then you can use the model with the following code:
+```python
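+from onnxruntime import GraphOptimizationLevel, InferenceSession, SessionOptions
+from transformers import AutoTokenizer
+
+# BigBird uses a SentencePiece vocabulary, so let AutoTokenizer pick the matching class.
+tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
+
+# Enable all graph optimizations and load the exported ONNX model.
+options = SessionOptions()
+options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL
+session = InferenceSession("path/to/model.onnx", sess_options=options)
+session.disable_fallback()
+
+text = "Replace me by any text you want to encode."
+# NumPy tensors can be fed to ONNX Runtime directly; no PyTorch needed.
+encoded = tokenizer(text, return_tensors="np", return_attention_mask=True)
+# Keep only the inputs the graph declares (assumption: it may not take token_type_ids).
+input_names = {i.name for i in session.get_inputs()}
+inputs = {k: v for k, v in encoded.items() if k in input_names}
+
+output_name = session.get_outputs()[0].name
+outputs = session.run(output_names=[output_name], input_feed=inputs)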
+```
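`session.run` returns a list of arrays, one per requested output. As a hedged follow-up, the sketch below turns token-level vectors into one vector per input text by masked mean pooling. It assumes, without confirmation from the model card, that the first output holds hidden states of shape `(batch, sequence, hidden)`. The helper `mean_pool` is hypothetical and works on plain nested lists, so toy data stands in for `outputs[0].tolist()` and `inputs["attention_mask"].tolist()`.

```python
def mean_pool(hidden_states, attention_mask):
    """Average token vectors per sequence, ignoring positions where the mask is 0."""
    pooled = []
    for token_vectors, mask_row in zip(hidden_states, attention_mask):
        kept = [vec for vec, keep in zip(token_vectors, mask_row) if keep]
        if not kept:
            raise ValueError("attention mask selects no tokens")
        # zip(*kept) groups values by hidden dimension across the kept tokens.
        pooled.append([sum(dim) / len(kept) for dim in zip(*kept)])
    return pooled


# Toy stand-in: one sequence, three tokens, hidden size 2; the last token is padding.
hidden_states = [[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]]
attention_mask = [[1, 1, 0]]
print(mean_pool(hidden_states, attention_mask))  # [[2.0, 3.0]]
```

If the exported graph instead returns masked-language-modeling logits, pooling is not meaningful and the output should be read as per-token vocabulary scores.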