Gender Classification by Name

Model Details

  • Model Name: Genderize
  • Developed By: Imran Ali
  • Model Type: Text Classification
  • Language: English
  • License: MIT

Description

This model classifies gender based on the input name. It uses a pre-trained BERT model as the base and has been fine-tuned on a dataset of names and their associated genders.

Training Details

  • Training Data: Dataset of names and genders (e.g., Dannel gender-name dataset)
  • Training Procedure: Fine-tuned a pre-trained BERT model with a sequence classification head
  • Training Hyperparameters:
    • Batch size: 8
    • Gradient accumulation steps: 1
    • Learning rate: 2e-5
    • Total steps: 20,005
    • Number of trainable parameters: 109,483,778 (approx. 109.5M)
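
As a sanity check on the figures above, the parameter count can be reproduced by hand. The sketch below assumes the base is the standard BERT-base uncased configuration (vocabulary of 30,522 tokens, hidden size 768, 12 layers, feed-forward size 3,072) with a two-class linear head. The card does not name the exact checkpoint, so these values are assumptions, and the function name is illustrative only.

```python
def bert_classifier_param_count(vocab_size=30522, hidden=768, layers=12,
                                intermediate=3072, max_positions=512,
                                type_vocab=2, num_labels=2):
    """Count parameters of a BERT encoder with a linear classification head.

    Defaults are the assumed BERT-base uncased values, not confirmed by the card.
    """
    layer_norm = 2 * hidden  # one scale and one shift vector
    # Token, position, and segment embeddings plus their LayerNorm
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections (weights + biases) plus LayerNorm
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (expand, then project back) plus LayerNorm
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden) + layer_norm)
    pooler = hidden * hidden + hidden
    classifier = hidden * num_labels + num_labels
    return embeddings + layers * (attention + feed_forward) + pooler + classifier

total_params = bert_classifier_param_count()
print(f"{total_params:,}")  # 109,483,778

# With batch size 8 and 1 gradient accumulation step, 20,005 optimizer steps
# process at most this many training examples (repeats across epochs included).
examples_processed = 20_005 * 8 * 1
print(f"{examples_processed:,}")  # 160,040
```

Under these assumptions the result matches the reported count exactly, which suggests a BERT-base encoder with a two-label head.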

Evaluation

  • Testing Data: Split from the training dataset
  • Metrics: Accuracy, Precision, Recall, F1 Score
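
The four metrics listed above can be computed from true and predicted labels as in the minimal sketch below. The labels 'M' and 'F' follow the example outputs in this card; treating 'F' as the positive class is an arbitrary assumption, and the sample lists are toy data, not results from this model.

```python
def binary_metrics(y_true, y_pred, positive="F"):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels for illustration only
y_true = ["F", "M", "F", "M", "F"]
y_pred = ["F", "M", "M", "M", "F"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when the two classes are not equally frequent in the test split.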

Uses

  • Direct Use: Classifying the gender of a given name
  • Downstream Use: Enhancing applications that require gender identification based on names (e.g., personalized marketing, user profiling)
  • Out-of-Scope Use: Using the model for purposes other than gender classification without proper validation

Bias, Risks, and Limitations

  • Bias: The model may reflect biases present in the training data. It is important to validate its performance across diverse datasets.
  • Risks: Misclassification can occur, especially for names that are unisex or less common.
  • Limitations: The model's accuracy may vary depending on the cultural and linguistic context of the names.
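
One possible way to reduce the impact of misclassifying unisex or rare names is to abstain when the model is not confident. The sketch below is an illustration, not part of the released model: it takes the two raw logits, applies a softmax, and returns no label below a threshold. The label order ("F", "M"), the threshold of 0.8, and the function names are all assumptions.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_or_abstain(logits, labels=("F", "M"), threshold=0.8):
    """Return (label, confidence), with label None if confidence is too low.

    `labels` must follow the model's class-index order, which is assumed here.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]  # abstain on ambiguous inputs
    return labels[best], probs[best]

ambiguous = predict_or_abstain([0.1, 0.2])   # nearly equal logits
confident = predict_or_abstain([3.0, -1.0])  # one class clearly ahead
print(ambiguous[0], confident[0])
```

The threshold should be tuned on held-out data, since softmax confidence from a fine-tuned classifier is not guaranteed to be well calibrated.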

Recommendations

  • Users should be aware of the potential biases and limitations of the model.
  • Further validation is recommended for specific use cases and datasets.
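
A simple form of the validation recommended above is to compare accuracy across subsets of the evaluation data, for example by the origin of the names. The sketch below is a minimal illustration; the group names and records are hypothetical placeholders, not data from this model.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute accuracy per group.

    `records` is an iterable of (group, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, true_label, predicted_label in records:
        total[group] += 1
        correct[group] += int(true_label == predicted_label)
    return {group: correct[group] / total[group] for group in total}

# Hypothetical evaluation records for illustration only
records = [
    ("group_a", "F", "F"),
    ("group_a", "M", "M"),
    ("group_b", "F", "M"),
    ("group_b", "M", "M"),
]
per_group = accuracy_by_group(records)
print(per_group)  # {'group_a': 1.0, 'group_b': 0.5}
```

A large gap between groups indicates that the model should be re-validated or retrained before use on that population.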

How to Get Started with the Model

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer from the Hub
model_name = "imranali291/genderize"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example inference function
def predict_gender(name):
    inputs = tokenizer(name, return_tensors="pt", padding=True, truncation=True, max_length=32)
    with torch.no_grad():  # gradients are not needed for inference
        outputs = model(**inputs)
    predicted_label = outputs.logits.argmax(dim=-1).item()
    # Map the class index to a label through the model config. If the config
    # defines no custom names, this returns generic labels such as 'LABEL_0';
    # in that case, use the label encoder fitted during training instead.
    return model.config.id2label[predicted_label]

# Expected labels are 'M' or 'F', provided the config maps class indices to them
print(predict_gender("Alex"))
print(predict_gender("Maria"))