---
license: mit
language:
- en
base_model:
- google-bert/bert-base-cased
tags:
- genderization
- text-classification
- prediction
---

# Gender Classification by Name

### Model Details

- **Model Name**: Genderize
- **Developed By**: Imran Ali
- **Model Type**: Text Classification
- **Language**: English
- **License**: MIT

### Description

This model classifies the gender associated with a given name. It uses the pre-trained `google-bert/bert-base-cased` model as its base and has been fine-tuned on a dataset of names labeled with their associated genders.

### Training Details

- **Training Data**: Dataset of names and genders (e.g., the Dannel gender-name dataset)
- **Training Procedure**: Fine-tuned the BERT base model with a sequence-classification head (a minimal training sketch follows this list)
- **Training Hyperparameters**:
  - Batch size: 8
  - Gradient accumulation steps: 1
  - Learning rate: 2e-5
  - Total steps: 20,005
  - Number of trainable parameters: 109,483,778 (~109.5M)

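The training script itself is not part of this card; the following is a minimal fine-tuning sketch using the Hugging Face `Trainer` with the hyperparameters listed above. The CSV file name, its `name`/`gender` column names, and the 90/10 train/test split are illustrative assumptions, not the actual training setup.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical CSV with "name" and "gender" columns; substitute your own data.
data = load_dataset("csv", data_files="names_with_gender.csv")["train"]
data = data.class_encode_column("gender")              # encode gender strings as class ids
data = data.train_test_split(test_size=0.1, seed=42)   # held-out split (assumed 10%)

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "google-bert/bert-base-cased", num_labels=2
)

def tokenize(batch):
    return tokenizer(batch["name"], truncation=True, max_length=32)

data = data.map(tokenize, batched=True)
data = data.rename_column("gender", "labels")

args = TrainingArguments(
    output_dir="genderize",
    per_device_train_batch_size=8,   # batch size from the card
    gradient_accumulation_steps=1,   # from the card
    learning_rate=2e-5,              # from the card
    max_steps=20_005,                # total steps from the card
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["test"],
    tokenizer=tokenizer,             # enables dynamic padding via the default collator
)
trainer.train()
```
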
### Evaluation

- **Testing Data**: Held-out split from the training dataset
- **Metrics**: Accuracy, Precision, Recall, F1 score (a short computation sketch follows this list)

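No metric values are reported in the card; below is a hedged sketch of how the listed metrics could be computed on a held-out split with scikit-learn. The example names, their integer labels, and the assumption that those labels line up with the model's `id2label` mapping are all illustrative.

```python
import torch
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("imranali291/genderize")
tokenizer = AutoTokenizer.from_pretrained("imranali291/genderize")
model.eval()

# Hypothetical held-out examples; the integer labels must match model.config.id2label.
test_names = ["Alice", "James", "Priya", "Noah"]
test_labels = [0, 1, 0, 1]

with torch.no_grad():
    inputs = tokenizer(test_names, return_tensors="pt", padding=True, truncation=True, max_length=32)
    preds = model(**inputs).logits.argmax(dim=-1).tolist()

accuracy = accuracy_score(test_labels, preds)
precision, recall, f1, _ = precision_recall_fscore_support(test_labels, preds, average="binary")
print(f"accuracy={accuracy:.3f}  precision={precision:.3f}  recall={recall:.3f}  f1={f1:.3f}")
```
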
### Uses

- **Direct Use**: Classifying the gender of a given name
- **Downstream Use**: Enhancing applications that require gender identification based on names (e.g., personalized marketing, user profiling)
- **Out-of-Scope Use**: Using the model for purposes other than gender classification without proper validation

### Bias, Risks, and Limitations

- **Bias**: The model may reflect biases present in the training data. It is important to validate its performance across diverse datasets.
- **Risks**: Misclassification can occur, especially for names that are unisex or less common.
- **Limitations**: The model's accuracy may vary depending on the cultural and linguistic context of the names.

### Recommendations

- Users should be aware of the potential biases and limitations of the model.
- Further validation is recommended for specific use cases and datasets.

### How to Get Started with the Model

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer from the Hub
model_name = "imranali291/genderize"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()

# Example inference function
def predict_gender(name):
    inputs = tokenizer(name, return_tensors="pt", padding=True, truncation=True, max_length=32)
    with torch.no_grad():
        outputs = model(**inputs)
    predicted_label = outputs.logits.argmax(dim=-1).item()
    # Map the predicted class id to its label string via the model config
    return model.config.id2label[predicted_label]

print(predict_gender("Alex"))   # Example output: 'M'
print(predict_gender("Maria"))  # Example output: 'F'
```

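As an alternative to the manual loop above, the model can also be called through the Transformers `text-classification` pipeline. This is a minimal sketch; the label strings it returns depend on the `id2label` mapping stored in the model's config.

```python
from transformers import pipeline

# The pipeline handles tokenization, batching, and label mapping internally.
classifier = pipeline("text-classification", model="imranali291/genderize")

print(classifier("Alex"))               # [{'label': ..., 'score': ...}]
print(classifier(["Maria", "Jordan"]))  # one result dict per input name
```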