---
language: en
tags:
- Text Classification
- TDAMM
- Multi-label Classification
- NASA
- Astrophysics
base_model:
- adsabs/astroBERT
library_name: transformers
license: apache-2.0
---
# TDAMM Multi-Label Classification Model
The TDAMM (Time Domain Multi-Messenger Astronomy) model classifies NASA's time domain and multi-messenger resources into one or more of 36 distinct categories identified by subject matter experts (SMEs).
## Model Description
- **Base Model:** astroBERT, fine-tuned for multi-label classification
- **Task:** Multi-label classification
- **Training Data:** A collection of 408 NASA and non-NASA documents on TDAMM topics, identified by SMEs
## Data Distribution
<img src="https://cdn-uploads.huggingface.co/production/uploads/67804a0abd67e99d000342e1/oOZ3PhRsh6TDEfaSTTpxa.png" width="70%" alt="Distribution 1">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67804a0abd67e99d000342e1/kKpL5XWCtgWiXHLAAmGz5.png" width="70%" alt="Distribution 2">
<img src="https://cdn-uploads.huggingface.co/production/uploads/67804a0abd67e99d000342e1/hJQt5iBKYsVPSHQLIH2RG.png" width="50%" alt="Distribution 3">
## Performance Analysis
<img src="https://cdn-uploads.huggingface.co/production/uploads/67804a0abd67e99d000342e1/aX8X-b7dehTwaA-opBulN.png" width="70%" alt="Threshold 1">
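The plot above summarizes how performance varies with the decision threshold. As a minimal sketch of this kind of analysis, assuming a held-out labeled set (the `val_texts` and `val_labels` placeholders below are hypothetical and must be replaced with real data), one could sweep candidate thresholds and track micro-averaged F1:
```python
import numpy as np
import torch
from sklearn.metrics import f1_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("nasa-impact/tdamm-classification")
model = AutoModelForSequenceClassification.from_pretrained("nasa-impact/tdamm-classification")
model.eval()

# Placeholder validation data: replace with real held-out texts and a
# (num_examples, num_labels) binary label matrix.
val_texts = ["placeholder document one", "placeholder document two"]
val_labels = np.zeros((len(val_texts), model.config.num_labels), dtype=int)

inputs = tokenizer(val_texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits).numpy()

# Micro-averaged F1 at each candidate threshold
for threshold in np.arange(0.1, 0.9, 0.1):
    preds = (probs > threshold).astype(int)
    score = f1_score(val_labels, preds, average="micro", zero_division=0)
    print(f"threshold={threshold:.1f}  micro-F1={score:.3f}")
```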
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
tokenizer = AutoTokenizer.from_pretrained("nasa-impact/tdamm-classification")
model = AutoModelForSequenceClassification.from_pretrained("nasa-impact/tdamm-classification")
# Prepare input
text = "Your astronomical test text here"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.sigmoid(outputs.logits)

# Convert to binary predictions (threshold = 0.5)
predictions = (predictions > 0.5).int()
```
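The tokenizer also accepts a list of documents, so the same pattern extends to batches. A brief sketch continuing from the snippet above (the example texts are placeholders):
```python
texts = [
    "First astronomical document to classify",
    "Second astronomical document to classify",
]
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
with torch.no_grad():
    batch_probs = torch.sigmoid(model(**batch).logits)
batch_preds = (batch_probs > 0.5).int()  # shape: (num_texts, num_labels)
```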
## Label Mapping During Inference
After obtaining predictions from the model, map the predicted label indices to their names using the `model.config.id2label` dictionary:
```python
# Indices of the labels predicted positive (using `predictions` from above)
predicted_indices = predictions[0].nonzero(as_tuple=True)[0].tolist()
predicted_labels = [model.config.id2label[idx] for idx in predicted_indices]
print(predicted_labels)
``` |