---
language: en
tags:
- exbert
license: apache-2.0
datasets:
- bookcorpus
- wikipedia
---
# VGCN-BERT (DistilBERT based, uncased)
This is a VGCN-BERT model built on top of the `distilbert-base-uncased` checkpoint. The original paper is [VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification](https://arxiv.org/abs/2004.05707).
## How to use
- First, prepare the WGraph (word graph) symmetric adjacency matrix. Two construction methods are shown below:
```python
import transformers as tfr
from transformers.models.vgcn_bert.modeling_graph import WordGraph

tokenizer = tfr.AutoTokenizer.from_pretrained("distilbert-base-uncased")

# 1st method: build the graph with the NPMI statistical method from a training corpus
# (train_valid_df is a DataFrame holding the training/validation texts)
wgraph = WordGraph(rows=train_valid_df["text"], tokenizer=tokenizer)

# 2nd method: build the graph from pre-defined entity-relation tuples with weights
entity_relations = [
    ("dog", "labrador", 0.6),
    ("cat", "garfield", 0.7),
    ("city", "montreal", 0.8),
    ("weather", "rain", 0.3),
]
wgraph = WordGraph(rows=entity_relations, tokenizer=tokenizer)
```
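Either construction method yields a graph object exposing the two pieces the model consumes in the next step: a sparse adjacency matrix and a graph-id-to-tokenizer-id map. A quick sanity-check sketch (the exact shape semantics are an assumption, not a documented guarantee):

```python
# Both attributes below are passed to the model when it is instantiated.
adj = wgraph.to_torch_sparse()  # symmetric adjacency matrix as a torch sparse tensor
print(adj.shape)  # assumption: square, sized to the graph vocabulary
print(len(wgraph.wgraph_id_to_tokenizer_id_map))  # graph-word id -> tokenizer id
```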
- Then instantiate the VGCN-BERT model with your WGraph(s); multiple graphs are supported:
```python
from transformers.models.vgcn_bert.modeling_vgcn_bert import VGCNBertModel

model = VGCNBertModel.from_pretrained(
    "zhibinlu/vgcn-bert-distilbert-base-uncased",
    trust_remote_code=True,
    wgraphs=[wgraph.to_torch_sparse()],
    wgraph_id_to_tokenizer_id_maps=[wgraph.wgraph_id_to_tokenizer_id_map],
)

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)
```
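Assuming the forward pass returns a standard DistilBERT-style output object, the contextual token embeddings can be read from `last_hidden_state`:

```python
# Assumption: `output` behaves like a transformers BaseModelOutput.
token_embeddings = output.last_hidden_state  # shape: (batch_size, seq_len, hidden_dim)
print(token_embeddings.shape)
```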
## Fine-tune model
It is better to fine-tune the VGCN-BERT model for your specific task; the sketch below shows one possible starting point.
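A minimal fine-tuning sketch for sequence classification follows. It assumes `VGCNBertModel` can be used as a plain encoder under a task head and that it returns a `last_hidden_state`; the `VGCNBertClassifier` head, hidden size, optimizer settings, and toy batch are all illustrative, not part of the released API:

```python
import torch
import torch.nn as nn

class VGCNBertClassifier(nn.Module):
    """Hypothetical task head: classifies from the first-token representation."""
    def __init__(self, encoder, num_labels, hidden_dim=768):
        super().__init__()
        self.encoder = encoder
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_state = out.last_hidden_state[:, 0]  # assumption: DistilBERT-style output
        return self.classifier(cls_state)

clf = VGCNBertClassifier(model, num_labels=2)
optimizer = torch.optim.AdamW(clf.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a toy batch; replace with your own DataLoader.
batch = tokenizer(["great movie", "terrible movie"], return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])
logits = clf(batch["input_ids"], batch["attention_mask"])
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```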