Update README.md
README.md
@@ -1,3 +1,49 @@
---
language: en
license: apache-2.0
datasets:
- bookcorpus
- wikipedia
- cc_news
---

# BigBird base model

BigBird is a sparse-attention-based transformer that extends Transformer-based models such as BERT to much longer sequences. BigBird also comes with a theoretical analysis of which capabilities of a full-attention transformer the sparse model retains.

The model is pretrained on English text with a masked language modeling (MLM) objective. It was introduced in this [paper](https://arxiv.org/abs/2007.14062) and first released in this [repository](https://github.com/google-research/bigbird).

## Model description

BigBird relies on **block sparse attention** instead of full attention (i.e. BERT's attention) and can handle sequences of up to 4096 tokens at a much lower compute cost than BERT. It has achieved state-of-the-art results on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.
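
To make the block sparse idea more concrete, here is a minimal sketch. It is not the model's actual implementation. It assumes the three attention components described in the BigBird paper (global, sliding-window, and random attention) applied at block granularity, and it simplifies by treating only the first block as global. The helper `block_sparse_pattern`, its parameters, and the block size of 64 are illustrative assumptions.

```python
import random


def block_sparse_pattern(num_blocks, window=3, num_global=1, num_random=3, seed=0):
    """Return, for each query block, the sorted key blocks it attends to.

    Hypothetical helper for illustration only: the parameter names and
    defaults are assumptions, not values read from the released checkpoint.
    """
    rng = random.Random(seed)
    half = window // 2
    global_blocks = set(range(num_global))
    pattern = []
    for q in range(num_blocks):
        if q in global_blocks:
            # Global blocks attend to every block.
            pattern.append(list(range(num_blocks)))
            continue
        # Every block attends to the global blocks.
        keys = set(global_blocks)
        # Sliding window: neighbouring blocks on each side, clipped at the edges.
        keys.update(k for k in range(q - half, q + half + 1) if 0 <= k < num_blocks)
        # Random attention: a few extra blocks not already covered.
        candidates = [k for k in range(num_blocks) if k not in keys]
        keys.update(rng.sample(candidates, min(num_random, len(candidates))))
        pattern.append(sorted(keys))
    return pattern


seq_len, block_size = 4096, 64  # block size is an assumption for this sketch
num_blocks = seq_len // block_size
pattern = block_sparse_pattern(num_blocks)

# Compare the number of query/key token pairs with full attention.
sparse_pairs = sum(len(keys) for keys in pattern) * block_size ** 2
full_pairs = seq_len ** 2
print(f"attended token pairs: {sparse_pairs} of {full_pairs} "
      f"({sparse_pairs / full_pairs:.1%})")
```

Because each non-global block attends to a fixed number of key blocks, the number of attended pairs in this sketch grows roughly linearly with sequence length, whereas full attention grows quadratically.
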

## Original implementation

Follow [this link](https://huggingface.co/google/bigbird-roberta-base) to see the original implementation.

## How to use

Download the model by cloning the repository via `git clone https://huggingface.co/OWG/bigbird-roberta-base`. Large files on the Hub are stored with Git LFS, so make sure `git lfs` is installed before cloning.

You can then run the model with the following code:

```python
from onnxruntime import GraphOptimizationLevel, InferenceSession, SessionOptions
from transformers import AutoTokenizer

# BigBird uses a SentencePiece vocabulary, so let AutoTokenizer pick the
# matching tokenizer class instead of forcing BertTokenizer.
tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")

# Enable all graph optimizations that ONNX Runtime offers.
options = SessionOptions()
options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL

# Point this at the model.onnx file inside the cloned repository.
# CPUExecutionProvider is always available; swap in another provider if needed.
session = InferenceSession(
    "path/to/model.onnx", sess_options=options, providers=["CPUExecutionProvider"]
)
session.disable_fallback()

text = "Replace me by any text you want to encode."

# Tokenize straight to NumPy arrays; ONNX Runtime does not need PyTorch tensors.
encoded = tokenizer(text, return_tensors="np", return_attention_mask=True)
# Feed only the inputs the exported graph declares, cast to int64.
input_names = {node.name for node in session.get_inputs()}
inputs = {k: v.astype("int64") for k, v in encoded.items() if k in input_names}

# Run the model and fetch its first output.
output_name = session.get_outputs()[0].name
outputs = session.run(output_names=[output_name], input_feed=inputs)
```
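
The code above stops at the raw model outputs. One common way to turn them into a single vector per input is masked mean pooling, sketched below. This assumes the first output is the last hidden state with shape `(batch, sequence, hidden)`, which depends on how the ONNX graph was exported and is not confirmed by this repository. `mean_pool` is a hypothetical helper, and the arrays are synthetic stand-ins for `outputs[0]` and `inputs["attention_mask"]`.

```python
import numpy as np


def mean_pool(hidden_states, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    Hypothetical helper: assumes hidden_states is (batch, seq, hidden)
    and attention_mask is (batch, seq) with 1 for real tokens, 0 for padding.
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)  # (batch, seq, 1)
    summed = (hidden_states * mask).sum(axis=1)                   # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1.0, None)                 # avoid dividing by zero
    return summed / counts


# Synthetic stand-ins; with the real model use outputs[0] and inputs["attention_mask"].
hidden_states = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]], dtype=np.float32)
attention_mask = np.array([[1, 1, 0]], dtype=np.int64)

embedding = mean_pool(hidden_states, attention_mask)
print(embedding)  # [[2. 3.]]
```

The third token is masked out, so only the first two token vectors contribute to the average.
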