---
language: en
tags:
- exbert
license: apache-2.0
datasets:
- bookcorpus
- wikipedia
---

# VGCN-BERT (DistilBERT-based, uncased)

This is a VGCN-BERT model based on [DistilBert-base-uncased](https://huggingface.co/distilbert-base-uncased). The original paper is [VGCN-BERT](https://arxiv.org/abs/2004.05707).

### How to use

- First, prepare the WGraph symmetric adjacency matrix:

```python
import transformers as tfr
from transformers.models.vgcn_bert.modeling_graph import WordGraph

tokenizer = tfr.AutoTokenizer.from_pretrained("distilbert-base-uncased")

# 1st method: build the graph with the NPMI statistical method from a training corpus
# (rows is an iterable of raw training texts, e.g. a pandas Series such as train_valid_df["text"])
wgraph = WordGraph(rows=train_valid_df["text"], tokenizer=tokenizer)

# 2nd method: build the graph from pre-defined entity-relationship tuples with weights
entity_relations = [
    ("dog", "labrador", 0.6),
    ("cat", "garfield", 0.7),
    ("city", "montreal", 0.8),
    ("weather", "rain", 0.3),
]
wgraph = WordGraph(rows=entity_relations, tokenizer=tokenizer)
```

- Then instantiate the VGCN-BERT model with your WGraphs (multiple graphs are supported):

```python
from transformers.models.vgcn_bert.modeling_vgcn_bert import VGCNBertModel

model = VGCNBertModel.from_pretrained(
    "zhibinlu/vgcn-bert-distilbert-base-uncased",
    trust_remote_code=True,
    wgraphs=[wgraph.to_torch_sparse()],
    wgraph_id_to_tokenizer_id_maps=[wgraph.wgraph_id_to_tokenizer_id_map],
)

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors="pt")
output = model(**encoded_input)
```

## Fine-tune model

It is best to fine-tune the VGCN-BERT model for your specific downstream task.
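Since this checkpoint ships only the base encoder, one way to fine-tune is to attach your own task head on top of it. Below is a minimal sketch for binary sequence classification, assuming the model returns a standard `last_hidden_state` like DistilBERT and exposes its hidden size as `model.config.dim`; the `VGCNBertClassifier` class, labels, and hyperparameters are illustrative, not part of the released API.

```python
import torch
from torch import nn

# Illustrative classifier: a linear head over the encoder's pooled output.
class VGCNBertClassifier(nn.Module):
    def __init__(self, vgcn_bert, num_labels=2):
        super().__init__()
        self.vgcn_bert = vgcn_bert
        # Assumes the DistilBERT-style config attribute `dim` for hidden size.
        self.classifier = nn.Linear(vgcn_bert.config.dim, num_labels)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.vgcn_bert(input_ids=input_ids, attention_mask=attention_mask)
        # Pool by taking the first token's hidden state, as in DistilBERT heads.
        pooled = outputs.last_hidden_state[:, 0]
        return self.classifier(pooled)

classifier = VGCNBertClassifier(model, num_labels=2)
optimizer = torch.optim.AdamW(classifier.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a toy batch; replace with your own data loader.
batch = tokenizer(["great movie", "terrible movie"], return_tensors="pt", padding=True)
labels = torch.tensor([1, 0])

logits = classifier(batch["input_ids"], batch["attention_mask"])
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In a full training loop you would iterate this step over batches of your labeled corpus, ideally the same corpus used to build the NPMI WGraph, so the graph statistics match the task domain.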