---
license: mit
language:
- en
base_model:
- google-bert/bert-base-cased
tags:
- genderization
- text-classification
- prediction
---

# Gender Classification by Name

### Model Details

- **Model Name**: Genderize
- **Developed By**: Imran Ali
- **Model Type**: Text Classification
- **Language**: English
- **License**: MIT

### Description

This model classifies the gender associated with a given name. It uses the pre-trained `google-bert/bert-base-cased` model as its base and has been fine-tuned on a dataset of names labeled with their associated genders.

### Training Details

- **Training Data**: Dataset of names and genders (e.g., the Dannel gender-name dataset)
- **Training Procedure**: Fine-tuned the BERT base model with a sequence-classification head (a minimal training sketch follows this list)
- **Training Hyperparameters**:
  - Batch size: 8
  - Gradient accumulation steps: 1
  - Learning rate: 2e-5
  - Total steps: 20,005
  - Number of trainable parameters: 109,483,778 (~109.5M)

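The training script itself is not part of this card; the following is a minimal fine-tuning sketch using the Hugging Face `Trainer` with the hyperparameters listed above. The CSV file name, its `name`/`gender` column names, and the 90/10 train/test split are illustrative assumptions, not the actual training setup.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical CSV with "name" and "gender" columns; substitute your own data.
data = load_dataset("csv", data_files="names_with_gender.csv")["train"]
data = data.class_encode_column("gender")              # encode gender strings as class ids
data = data.train_test_split(test_size=0.1, seed=42)   # held-out split (assumed 10%)

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "google-bert/bert-base-cased", num_labels=2
)

def tokenize(batch):
    return tokenizer(batch["name"], truncation=True, max_length=32)

data = data.map(tokenize, batched=True)
data = data.rename_column("gender", "labels")

args = TrainingArguments(
    output_dir="genderize",
    per_device_train_batch_size=8,   # batch size from the card
    gradient_accumulation_steps=1,   # from the card
    learning_rate=2e-5,              # from the card
    max_steps=20_005,                # total steps from the card
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["test"],
    tokenizer=tokenizer,             # enables dynamic padding via the default collator
)
trainer.train()
```
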
### Evaluation

- **Testing Data**: Held-out split from the training dataset
- **Metrics**: Accuracy, Precision, Recall, F1 score (a short computation sketch follows this list)

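No metric values are reported in the card; below is a hedged sketch of how the listed metrics could be computed on a held-out split with scikit-learn. The example names, their integer labels, and the assumption that those labels line up with the model's `id2label` mapping are all illustrative.

```python
import torch
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("imranali291/genderize")
tokenizer = AutoTokenizer.from_pretrained("imranali291/genderize")
model.eval()

# Hypothetical held-out examples; the integer labels must match model.config.id2label.
test_names = ["Alice", "James", "Priya", "Noah"]
test_labels = [0, 1, 0, 1]

with torch.no_grad():
    inputs = tokenizer(test_names, return_tensors="pt", padding=True, truncation=True, max_length=32)
    preds = model(**inputs).logits.argmax(dim=-1).tolist()

accuracy = accuracy_score(test_labels, preds)
precision, recall, f1, _ = precision_recall_fscore_support(test_labels, preds, average="binary")
print(f"accuracy={accuracy:.3f}  precision={precision:.3f}  recall={recall:.3f}  f1={f1:.3f}")
```
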
### Uses

- **Direct Use**: Classifying the gender of a given name
- **Downstream Use**: Enhancing applications that require gender identification based on names (e.g., personalized marketing, user profiling)
- **Out-of-Scope Use**: Using the model for purposes other than gender classification without proper validation

### Bias, Risks, and Limitations

- **Bias**: The model may reflect biases present in the training data. It is important to validate its performance across diverse datasets.
- **Risks**: Misclassification can occur, especially for names that are unisex or less common.
- **Limitations**: The model's accuracy may vary depending on the cultural and linguistic context of the names.

### Recommendations

- Users should be aware of the potential biases and limitations of the model.
- Further validation is recommended for specific use cases and datasets.

### How to Get Started with the Model

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer from the Hub
model_name = "imranali291/genderize"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()

# Example inference function
def predict_gender(name):
    inputs = tokenizer(name, return_tensors="pt", padding=True, truncation=True, max_length=32)
    with torch.no_grad():
        outputs = model(**inputs)
    predicted_label = outputs.logits.argmax(dim=-1).item()
    # Map the predicted class id to its label string via the model config
    return model.config.id2label[predicted_label]

print(predict_gender("Alex"))   # Example output: 'M'
print(predict_gender("Maria"))  # Example output: 'F'
```

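As an alternative to the manual loop above, the model can also be called through the Transformers `text-classification` pipeline. This is a minimal sketch; the label strings it returns depend on the `id2label` mapping stored in the model's config.

```python
from transformers import pipeline

# The pipeline handles tokenization, batching, and label mapping internally.
classifier = pipeline("text-classification", model="imranali291/genderize")

print(classifier("Alex"))               # [{'label': ..., 'score': ...}]
print(classifier(["Maria", "Jordan"]))  # one result dict per input name
```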