imranali291 committed on
Commit
9c34372
·
verified ·
1 Parent(s): 820a199

Update README.md

  1. README.md +72 -3
README.md CHANGED
@@ -1,3 +1,72 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ base_model:
+ - google-bert/bert-base-cased
+ tags:
+ - genderization
+ - text-classification
+ - prediction
+ ---
+
+ # Gender Classification by Name
+
+ ### Model Details
+ - **Model Name**: Genderize
+ - **Developed By**: Imran Ali
+ - **Model Type**: Text Classification
+ - **Language**: English
+ - **License**: MIT
+
+ ### Description
+ This model classifies gender based on the input name. It uses a pre-trained BERT model as the base and has been fine-tuned on a dataset of names and their associated genders.
+
+ ### Training Details
+ - **Training Data**: Dataset of names and genders (e.g., Dannel gender-name dataset)
+ - **Training Procedure**: Fine-tuned a BERT model with a classification head
+ - **Training Hyperparameters**:
+   - Batch size: 8
+   - Gradient accumulation steps: 1
+   - Learning rate: 2e-5
+   - Total steps: 20,005
+   - Number of trainable parameters: 109,483,778 (about 109.5M)
+
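As a quick sanity check on the hyperparameters listed above, the number of training examples processed follows from the step count and the batch size. The sketch below assumes single-device training, which the README does not state; the variable names are illustrative.

```python
# Values copied from the hyperparameter list in the README.
batch_size = 8
gradient_accumulation_steps = 1
total_steps = 20_005

# Each optimizer step consumes batch_size * gradient_accumulation_steps examples
# (assumption: one device, so there is no extra data-parallel multiplier).
effective_batch_size = batch_size * gradient_accumulation_steps
examples_seen = total_steps * effective_batch_size

print(effective_batch_size)  # 8
print(examples_seen)         # 160040
```

Whether those 160,040 examples are one pass or several passes over the dataset cannot be told from the README, since neither the dataset size nor the epoch count is given.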
+ ### Evaluation
+ - **Testing Data**: Split from the training dataset
+ - **Metrics**: Accuracy, Precision, Recall, F1 Score
+
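The README names the metrics but reports no scores. As a minimal sketch of how the four metrics could be computed on a held-out split, assuming binary labels `"F"` and `"M"` with `"F"` treated as the positive class: the function name `binary_metrics` and the toy labels below are illustrative, not results from this model.

```python
def binary_metrics(y_true, y_pred, positive="F"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not outputs of the model.
y_true = ["F", "F", "M", "M", "F", "M"]
y_pred = ["F", "M", "M", "M", "F", "F"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision and recall depend on which class is treated as positive, so reporting them per class (or macro-averaged) avoids ambiguity.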
+ ### Uses
+ - **Direct Use**: Classifying the gender of a given name
+ - **Downstream Use**: Enhancing applications that require gender identification based on names (e.g., personalized marketing, user profiling)
+ - **Out-of-Scope Use**: Using the model for purposes other than gender classification without proper validation
+
+ ### Bias, Risks, and Limitations
+ - **Bias**: The model may reflect biases present in the training data. It is important to validate its performance across diverse datasets.
+ - **Risks**: Misclassification can occur, especially for names that are unisex or less common.
+ - **Limitations**: The model's accuracy may vary depending on the cultural and linguistic context of the names.
+
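One possible way to reduce the misclassification risk for unisex or rare names is to abstain when the model is not confident. The sketch below is not part of the README: the helper `label_or_abstain`, the label order `("F", "M")`, and the 0.8 threshold are all assumptions chosen for illustration, and it works on a plain list of logits so it can run without the model.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_or_abstain(logits, labels=("F", "M"), threshold=0.8):
    """Return (label, probability), or ("unknown", probability) below the threshold.

    The label order and threshold are hypothetical; check the model config
    for the real index-to-label mapping before using something like this.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return labels[best], probs[best]


print(label_or_abstain([0.1, 0.3]))   # near-even logits: abstains
print(label_or_abstain([-2.0, 3.0]))  # clear margin: returns a label
```

With the real model, the logits would come from `outputs.logits[0].tolist()`; the threshold itself should be tuned on validation data rather than taken from this sketch.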
+ ### Recommendations
+ - Users should be aware of the potential biases and limitations of the model.
+ - Further validation is recommended for specific use cases and datasets.
+
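The validation recommended above could start with accuracy broken down by subgroup, so that weak spots (for example, names from a particular language or region) become visible. This is a minimal sketch: the function `accuracy_by_group`, the group names, and the records are hypothetical and do not come from the README or from the model.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, true_label, predicted_label in records:
        total[group] += 1
        correct[group] += int(true_label == predicted_label)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical evaluation records; replace with real predictions per subgroup.
records = [
    ("group_a", "F", "F"),
    ("group_a", "M", "M"),
    ("group_b", "F", "M"),
    ("group_b", "M", "M"),
]
print(accuracy_by_group(records))  # {'group_a': 1.0, 'group_b': 0.5}
```

A large gap between groups suggests the training data under-represents some naming traditions, which is the kind of bias the section above warns about.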
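+ ### How to Get Started with the Model
+ ```python
+ import torch
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ # Load the model and tokenizer from the Hub
+ model_name = "imranali291/genderize"
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ # Example inference function
+ def predict_gender(name):
+     inputs = tokenizer(name, return_tensors="pt", padding=True, truncation=True, max_length=32)
+     with torch.no_grad():  # gradients are not needed for inference
+         outputs = model(**inputs)
+     predicted_label = outputs.logits.argmax(dim=-1).item()
+     # The original snippet decoded the class index with a `label_encoder` that is never
+     # defined here (presumably the encoder fitted during training). Reading the mapping
+     # from the model config avoids that dependency, but it may return generic names
+     # such as "LABEL_0" if the checkpoint was saved without custom labels.
+     return model.config.id2label[predicted_label]
+
+ print(predict_gender("Alex"))   # Example output: 'M' (if the config stores the training labels)
+ print(predict_gender("Maria"))  # Example output: 'F' (if the config stores the training labels)
+ ```
+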