---
language:
- en
base_model:
- FacebookAI/xlm-roberta-large
pipeline_tag: text-classification
library_name: transformers
---

# Patent Classification Model

### Model Description

**multilabel_patent_classifier** is a fine-tuned [XLM-RoBERTa-large](https://huggingface.co/FacebookAI/xlm-roberta-large) model trained on British Patent Office class information for patents from 1855-1883, made available [here](http://walkerhanlon.com/data_resources/british_patent_classification_database.zip).

It has been trained to recognize 146 patent classes defined by the British Patent Office. The full list of classes is made available [here](https://huggingface.co/matthewleechen/multiclass-classifier-patents/edit/main/BPO_classes.csv).

We take the original xlm-roberta-large [weights](https://huggingface.co/FacebookAI/xlm-roberta-large/blob/main/pytorch_model.bin) and fine-tune on our custom dataset for 10 epochs with a learning rate of 2e-05 and a batch size of 64.
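
For reference, these hyperparameters correspond to a `TrainingArguments` configuration along the lines of the sketch below; the output directory and any settings not listed above are illustrative, not our exact setup.

```python
from transformers import TrainingArguments

# Illustrative configuration matching the hyperparameters stated above;
# output_dir and unlisted settings are assumptions.
training_args = TrainingArguments(
    output_dir="multilabel_patent_classifier",
    num_train_epochs=10,
    learning_rate=2e-5,
    per_device_train_batch_size=64,
)
```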

### Usage

This model can be used with the Hugging Face Transformers pipelines API for text classification:

```python
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer

model_name = "matthewleechen/multilabel_patent_classifier"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

pipe = pipeline(
    task="text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0,     # set to -1 (or remove) to run on CPU
    top_k=None,   # return scores for all 146 classes, not just the top label
)
```
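
For example, a (hypothetical) patent title can be scored and thresholded at 0.5 as follows; the output nesting can vary slightly across Transformers versions, so the sketch handles both forms.

```python
title = "Improvements in apparatus for spinning cotton"  # hypothetical example title

result = pipe(title)
# depending on the transformers version, scores may be nested one level deeper
scores = result[0] if isinstance(result[0], list) else result

# keep every class whose score clears the 0.5 threshold
predicted_classes = [s["label"] for s in scores if s["score"] >= 0.5]
print(predicted_classes)
```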

### Training Data

Our training data consists of patent titles labelled with a binary (0/1) tag for each patent class. The labels were generated by the British Patent Office between 1855 and 1883, and the patent titles were extracted from the front pages of our specification texts using a patent title NER [model](https://huggingface.co/matthewleechen/patent_titles_ner).

### Training Procedure

We follow the standard multi-label classification setup with the Hugging Face Trainer API, but replace the default `BCEWithLogitsLoss` with a [focal loss](https://arxiv.org/pdf/1708.02002) (α=1, γ=2) to address class imbalance. During both evaluation and inference, we apply a sigmoid to each logit and use a 0.5 threshold to determine the positive labels for each class.
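
A minimal sketch of how such a loss swap can be implemented by overriding `Trainer.compute_loss` is shown below; the subclass name and implementation details are illustrative, not our exact training code.

```python
import torch
import torch.nn.functional as F
from transformers import Trainer

class FocalLossTrainer(Trainer):
    """Trainer subclass replacing BCEWithLogitsLoss with a focal loss (alpha=1, gamma=2)."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels").float()
        outputs = model(**inputs)
        logits = outputs.logits

        # per-element binary cross-entropy with logits
        bce = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
        # p_t = probability assigned to the true outcome for each label
        p_t = torch.exp(-bce)
        # focal loss: down-weight easy examples by (1 - p_t)^gamma
        alpha, gamma = 1.0, 2.0
        loss = (alpha * (1 - p_t) ** gamma * bce).mean()

        return (loss, outputs) if return_outputs else loss
```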
 
### Evaluation 

We compute precision, recall, and F1 for each class (with a 0.5 sigmoid threshold), as well as the exact-match rate (the predicted classes are identical to the ground-truth classes) and the any-match rate (the predicted and ground-truth classes share at least one class).
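
A sketch of how these metrics can be computed from the model's logits and a binary label matrix follows; the helper name and scikit-learn usage are our illustration, not necessarily the original evaluation code.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def evaluate(logits, labels, threshold=0.5):
    """Micro-averaged P/R/F1 plus exact-match and any-match rates for a multi-label model."""
    probs = 1 / (1 + np.exp(-logits))         # sigmoid on each logit
    preds = (probs >= threshold).astype(int)  # 0.5 threshold per class

    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="micro", zero_division=0
    )
    exact_match = (preds == labels).all(axis=1).mean()             # all classes agree
    any_match = ((preds == 1) & (labels == 1)).any(axis=1).mean()  # at least one shared class

    return {"precision": precision, "recall": recall, "f1": f1,
            "exact_match": exact_match, "any_match": any_match}
```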

These scores are aggregated for the test set below.

<table>
  <thead>
    <tr>
      <th>Metric Type</th>
      <th>Precision (Micro)</th>
      <th>Recall (Micro)</th>
      <th>F1 (Micro)</th>
      <th>Exact Match</th>
      <th>Any Match</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Micro Average</td>
      <td>83.4%</td>
      <td>60.3%</td>
      <td>70.0%</td>
      <td>52.9%</td>
      <td>90.8%</td>
    </tr>
  </tbody>
</table>


## References

```bibtex
@misc{hanlon2016,
  title = {{British Patent Technology Classification Database: 1855–1882}},
  author = {Hanlon, Walker},
  year = {2016},
  url = {http://www.econ.ucla.edu/whanlon/}
}

@misc{lin2018focallossdenseobject,
  title={Focal Loss for Dense Object Detection}, 
  author={Tsung-Yi Lin and Priya Goyal and Ross Girshick and Kaiming He and Piotr Dollár},
  year={2018},
  eprint={1708.02002},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/1708.02002}, 
}
```

## Citation 

If you use our model in your research, please cite our accompanying paper as follows:

```bibtex
@article{bct2025,
  title = {300 Years of British Patents},
  author = {Enrico Berkes and Matthew Lee Chen and Matteo Tranchero},
  journal = {arXiv preprint arXiv:2401.12345},
  year = {2025},
  url = {https://arxiv.org/abs/2401.12345}
}
```