Update README.md
README.md
CHANGED
@@ -7,11 +7,11 @@ base_model:
 ---
 # PatentBERT - PyTorch

-BERT model specialized for patent classification using the **

 ## Specifications

-- **Output classes**: 656 (
 - **Classification system**: CPC (Cooperative Patent Classification)
 - **Architecture**: BERT-base (768 hidden, 12 layers, 12 attention heads)
 - **Vocabulary**: 30,522 tokens
@@ -32,7 +32,7 @@ The model predicts classes according to the authentic CPC system used in PatentB
 - **H (51 classes)**: Electricity - Electronics, Power generation, Communication
 - **Y (9 classes)**: General Tagging of New Technological Developments

-### Example

 - `A01B`: SOIL WORKING IN AGRICULTURE OR FORESTRY
 - `B25J`: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
@@ -63,7 +63,7 @@ with torch.no_grad():
 predicted_class_id = predictions.argmax().item()
 confidence = predictions.max().item()

-# Use model labels (
 predicted_label = model.config.id2label[str(predicted_class_id)]

 print(f"Predicted CPC class: {predicted_label} (ID: {predicted_class_id})")
@@ -73,21 +73,20 @@ print(f"Confidence: {confidence:.2%}")
 ## Included Files

 - `model.safetensors`: Model weights (420 MB)
-- `config.json`: Configuration with integrated
 - `vocab.txt`: Tokenizer vocabulary
 - `tokenizer_config.json`: Tokenizer configuration
-- `labels.json`: Complete
 - `README.md`: This documentation

 ## Performance

-This model was trained on a large patent corpus to automatically classify documents according to the

 ## References

 - [Cooperative Patent Classification (CPC)](https://www.cooperativepatentclassification.org/)
 - [Original PatentBERT Paper](https://arxiv.org/abs/2103.02557)
-- [Hugging Face Transformers](https://huggingface.co/transformers/)

 ## Citation
@@ -7,11 +7,11 @@ base_model:
 ---
 # PatentBERT - PyTorch

+BERT model specialized for patent classification using the **CPC (Cooperative Patent Classification) system**. (PyTorch version of the original [PatentBert](https://github.com/jiehsheng/PatentBERT/) model.)

 ## Specifications

+- **Output classes**: 656 (CPC subclass labels)
 - **Classification system**: CPC (Cooperative Patent Classification)
 - **Architecture**: BERT-base (768 hidden, 12 layers, 12 attention heads)
 - **Vocabulary**: 30,522 tokens
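As a rough sanity check on the specifications in this hunk, the parameter count can be estimated from the listed sizes and compared with the 420 MB given for `model.safetensors`. The sketch below is an estimate under stated assumptions, not a measurement of the actual checkpoint: the feed-forward size (3072), the 512 positions, the 2 token types, and float32 storage are the usual BERT-base defaults and are not stated in the README.

```python
# Rough parameter count for a BERT-base sequence classifier.
# From the README: hidden=768, layers=12, vocab=30,522, output classes=656.
# Assumed (usual BERT-base defaults, not stated in the README):
# feed-forward size 3072, 512 positions, 2 token types, float32 weights.

HIDDEN, LAYERS, VOCAB, NUM_LABELS = 768, 12, 30_522, 656
INTERMEDIATE, MAX_POSITIONS, TOKEN_TYPES = 3072, 512, 2


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(size: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * size


# Word, position, and token-type embedding tables plus their LayerNorm.
embeddings = (VOCAB + MAX_POSITIONS + TOKEN_TYPES) * HIDDEN + layer_norm(HIDDEN)

encoder_layer = (
    4 * linear(HIDDEN, HIDDEN)        # query, key, value, attention output
    + layer_norm(HIDDEN)
    + linear(HIDDEN, INTERMEDIATE)    # feed-forward up-projection
    + linear(INTERMEDIATE, HIDDEN)    # feed-forward down-projection
    + layer_norm(HIDDEN)
)
pooler = linear(HIDDEN, HIDDEN)
classifier = linear(HIDDEN, NUM_LABELS)  # one output per CPC subclass

total_params = embeddings + LAYERS * encoder_layer + pooler + classifier
size_mib = total_params * 4 / 2**20  # 4 bytes per float32 weight

print(f"{total_params:,} parameters, about {size_mib:.0f} MiB in float32")
```

Under these assumptions the estimate comes to roughly 110 million parameters and about 420 MiB, which is consistent with the listed file size if "MB" there means mebibytes.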
@@ -32,7 +32,7 @@ The model predicts classes according to the authentic CPC system used in PatentB
 - **H (51 classes)**: Electricity - Electronics, Power generation, Communication
 - **Y (9 classes)**: General Tagging of New Technological Developments

+### Example of CPC Subclasses

 - `A01B`: SOIL WORKING IN AGRICULTURE OR FORESTRY
 - `B25J`: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
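The subclass codes in this hunk follow the CPC hierarchy: a section letter (`A` to `H`, or `Y`), a two-digit class, and a subclass letter. A minimal sketch of splitting a predicted label into those levels is given here; `CpcSubclass` and `parse_cpc_subclass` are hypothetical names used for illustration and are not part of the model repository.

```python
import re
from typing import NamedTuple

# Section letter (A-H or Y), two-digit class, subclass letter.
_SUBCLASS_RE = re.compile(r"^([A-HY])(\d{2})([A-Z])$")


class CpcSubclass(NamedTuple):
    section: str    # e.g. "A"
    cpc_class: str  # e.g. "A01"
    subclass: str   # e.g. "A01B"


def parse_cpc_subclass(code: str) -> CpcSubclass:
    """Split a four-character CPC subclass code into its hierarchy levels."""
    match = _SUBCLASS_RE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a CPC subclass code: {code!r}")
    section, digits, letter = match.groups()
    return CpcSubclass(section, section + digits, section + digits + letter)


for code in ("A01B", "B25J"):
    print(parse_cpc_subclass(code))
```

Grouping predictions by the `section` field would give the per-section view that the README uses when it lists class counts for `H` and `Y`.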
@@ -63,7 +63,7 @@ with torch.no_grad():
 predicted_class_id = predictions.argmax().item()
 confidence = predictions.max().item()

+# Use model labels (CPC codes)
 predicted_label = model.config.id2label[str(predicted_class_id)]

 print(f"Predicted CPC class: {predicted_label} (ID: {predicted_class_id})")
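The lookup in this hunk indexes `id2label` with a string key. JSON object keys are always strings, so a mapping read directly from `config.json` or `labels.json` is keyed by `"0"`, `"1"`, and so on, whereas a Transformers config object typically converts those keys to integers on load. Which form works may therefore depend on how the mapping was obtained. The sketch below shows a lookup that tolerates both; `lookup_label` is a hypothetical helper, and the two-entry mapping is a toy stand-in for the real 656 labels.

```python
import json
from typing import Mapping, Union


def lookup_label(id2label: Mapping[Union[int, str], str], class_id: int) -> str:
    """Return the label for class_id whether the keys are ints or strings."""
    if class_id in id2label:
        return id2label[class_id]
    return id2label[str(class_id)]


# Toy mapping for illustration only; the IDs are not the model's real IDs.
int_keyed = {0: "A01B", 1: "B25J"}
# A JSON round-trip turns the integer keys into strings.
str_keyed = json.loads(json.dumps(int_keyed))

print(lookup_label(int_keyed, 1))
print(lookup_label(str_keyed, 1))
```

Both calls return the same label, so the prediction code does not need to know which key type it was handed.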
@@ -73,21 +73,20 @@ print(f"Confidence: {confidence:.2%}")
 ## Included Files

 - `model.safetensors`: Model weights (420 MB)
+- `config.json`: Configuration with integrated CPC labels
 - `vocab.txt`: Tokenizer vocabulary
 - `tokenizer_config.json`: Tokenizer configuration
+- `labels.json`: Complete CPC label mapping (656 authentic labels)
 - `README.md`: This documentation

 ## Performance

+This model was trained on a large patent corpus to automatically classify documents according to the CPC system, using the exact same 656 CPC codes from the original PatentBERT training data.

 ## References

 - [Cooperative Patent Classification (CPC)](https://www.cooperativepatentclassification.org/)
 - [Original PatentBERT Paper](https://arxiv.org/abs/2103.02557)

 ## Citation