sllawlis committed · Commit 180e505 · verified · 1 Parent(s): 991b73f

Update README.md

Files changed (1)
  1. README.md +71 -137
README.md CHANGED
@@ -1,13 +1,17 @@
- ---
- library_name: transformers
- tags: []
- ---
 
- # Model Card for Model ID
 
  <!-- Provide a quick summary of what the model is/does. -->
 
-
 
  ## Model Details
 
@@ -15,23 +19,20 @@ tags: []
 
  <!-- Provide a longer summary of what this model is. -->
 
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
 
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
 
- ### Model Sources [optional]
 
  <!-- Provide the basic links for the model. -->
 
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
 
  ## Uses
 
@@ -41,159 +42,92 @@ This is the model card of a 🤗 transformers model that has been pushed on the
 
  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 
- [More Information Needed]
 
- ### Downstream Use [optional]
 
  <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
 
- [More Information Needed]
-
- ### Out-of-Scope Use
 
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
 
  ## Bias, Risks, and Limitations
 
  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 
  ## How to Get Started with the Model
 
  Use the code below to get started with the model.
 
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
 
- <!-- This should link to a Dataset Card if possible. -->
 
- [More Information Needed]
 
- #### Factors
 
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
 
- [More Information Needed]
 
- #### Metrics
 
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 
- **BibTeX:**
 
- [More Information Needed]
 
- **APA:**
 
- [More Information Needed]
 
- ## Glossary [optional]
 
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
 
- [More Information Needed]
 
- ## More Information [optional]
 
- [More Information Needed]
 
- ## Model Card Authors [optional]
 
- [More Information Needed]
 
  ## Model Card Contact
 
- [More Information Needed]
+ ---
+ library_name: transformers
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - sentence-transformers/all-distilroberta-v1
+ ---
 
+ # distilroberta-ce-esci
 
  <!-- Provide a quick summary of what the model is/does. -->
 
+ This is a cross-encoder model optimized for e-commerce text classification tasks.
 
  ## Model Details
 
  <!-- Provide a longer summary of what this model is. -->
+ This is a fine-tuned cross-encoder model based on all-distilroberta-v1, trained on an [e-commerce dataset](https://github.com/amazon-science/esci-data/tree/main/shopping_queries_dataset) of query-product pairs. The model predicts relevance classes in the ESCI (Exact, Substitute, Complementary, Irrelevant) framework by jointly encoding each query-product pair, and its predictions can be used directly for multi-class classification or fed into more complex downstream tasks.
 
+ - **Developed by:** Sarah Lawlis / DASC Practicum Team 12
+ - **Shared by:** University of Arkansas Data Science Practicum Team 12
+ - **Model type:** Sequence Classification (Cross-Encoder)
+ - **Language(s) (NLP):** English
+ - **License:** apache-2.0
+ - **Finetuned from model:** sentence-transformers/all-distilroberta-v1
 
+ ### Model Sources
 
  <!-- Provide the basic links for the model. -->
 
+ - **Repository:** [sllawlis/distilroberta-ce-esci](https://huggingface.co/sllawlis/distilroberta-ce-esci)
 
  ## Uses
 
  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 
+ This model is designed for multi-class product classification within the ESCI framework. The model directly predicts one of the ESCI labels for a given query-product pair. This task is the foundation for downstream use cases.
 
+ ### Downstream Use
 
  <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
 
+ The model's multi-class predictions can be used in the following downstream tasks:
+ 1. Ranking Systems:
+    * Combine the model's predictions with bi-encoders for a two-stage ranking pipeline (see the sketch after this list):
+      * First Stage (Bi-Encoders): Generate candidate products efficiently by retrieving embeddings of query and product titles
+      * Second Stage (Cross-Encoders): Re-rank the candidates using fine-grained ESCI label predictions for better accuracy
 
+ 2. Product Substitute Identification:
+    * Use the Substitute label from the model to identify products that can replace one another
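 
+ Below is a minimal sketch of such a two-stage pipeline. The bi-encoder choice, the example catalog, and the top-k value are illustrative assumptions; only the cross-encoder checkpoint comes from this repository.
 
+ ```python
+ import torch
+ from sentence_transformers import SentenceTransformer, util
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ 
+ # Stage 1 model: a bi-encoder for fast candidate retrieval (assumed choice)
+ bi_encoder = SentenceTransformer("sentence-transformers/all-distilroberta-v1")
+ # Stage 2 model: this cross-encoder for fine-grained ESCI scoring
+ tokenizer = AutoTokenizer.from_pretrained("sllawlis/distilroberta-ce-esci")
+ cross_encoder = AutoModelForSequenceClassification.from_pretrained("sllawlis/distilroberta-ce-esci")
+ 
+ query = "wireless headphones"
+ catalog = [
+     "Noise-cancelling wireless headphones with long battery life",
+     "Wired earbuds with in-line microphone",
+     "Replacement ear pads for over-ear headphones",
+ ]
+ 
+ # Stage 1: retrieve top-k candidates by embedding similarity
+ query_emb = bi_encoder.encode(query, convert_to_tensor=True)
+ product_embs = bi_encoder.encode(catalog, convert_to_tensor=True)
+ hits = util.semantic_search(query_emb, product_embs, top_k=2)[0]
+ candidates = [catalog[hit["corpus_id"]] for hit in hits]
+ 
+ # Stage 2: score each candidate with the cross-encoder's class probabilities
+ inputs = tokenizer([query] * len(candidates), candidates,
+                    truncation=True, padding=True, return_tensors="pt")
+ with torch.no_grad():
+     probs = cross_encoder(**inputs).logits.softmax(dim=-1)
+ 
+ # A real system might sort candidates by P(Exact); look the id up from
+ # cross_encoder.config.label2id rather than hard-coding it.
+ for product, p in zip(candidates, probs):
+     print(product, p.tolist())
+ ```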
 
  ## Bias, Risks, and Limitations
 
  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
+ * Bias: Due to heavy imbalance in ESCI labels in the training data, this model's predictions may skew toward the Exact label.
+ * Limitations: This model is domain-specific to e-commerce data and may not generalize well to other domains. It is optimized for English and may perform poorly on non-English data. Cross-encoders are computationally expensive for large-scale applications, so this model may be difficult to deploy for real-time inference.
 
  ## How to Get Started with the Model
 
  Use the code below to get started with the model.
 
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ 
+ # Load the tokenizer and model
+ model_name = "sllawlis/distilroberta-ce-esci"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+ ```
 
+ ## Usage (Multi-class Classification Example)
 
+ Below is a quick usage example of this model.
 
+ ```python
+ import torch
+ 
+ # Example query-product pair
+ query = "wireless headphones"
+ product = "Noise-cancelling wireless headphones with long battery life"
+ 
+ # Tokenize the pair as a single cross-encoder input
+ inputs = tokenizer(
+     query,
+     product,
+     truncation=True,
+     padding=True,
+     return_tensors="pt"
+ )
+ 
+ # Predict relevance (no gradients needed for inference)
+ with torch.no_grad():
+     outputs = model(**inputs)
+ predicted_class = outputs.logits.argmax(dim=1).item()
+ 
+ # Map the class id to its label name if the config provides one
+ label = model.config.id2label.get(predicted_class, str(predicted_class))
+ print(f"Predicted Class: {predicted_class} ({label})")
+ ```
 
+ ## Training Details
 
+ ### Pre-training
 
+ The model uses the pretrained [all-distilroberta-v1](https://huggingface.co/sentence-transformers/all-distilroberta-v1) as its starting checkpoint.
 
+ ### Fine-tuning
 
+ The model is fine-tuned for multi-class relevance classification based on the ESCI framework. Fine-tuning takes query-product pairs as input and optimizes a cross-entropy classification objective that aligns the predicted class probabilities with the true labels.
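 
+ To make this objective concrete, here is a minimal, hypothetical single training step. When `labels` are supplied, the `transformers` sequence-classification head computes the cross-entropy loss internally; the label id below is illustrative, since the model's exact label mapping should be checked via `model.config.label2id`.
 
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ 
+ tokenizer = AutoTokenizer.from_pretrained("sllawlis/distilroberta-ce-esci")
+ model = AutoModelForSequenceClassification.from_pretrained("sllawlis/distilroberta-ce-esci")
+ 
+ # One hypothetical training example: an irrelevant query-product pair
+ batch = tokenizer(["wireless headphones"], ["Garden hose, 50 ft"],
+                   truncation=True, padding=True, return_tensors="pt")
+ labels = torch.tensor([3])  # assumed id for Irrelevant; verify with model.config.label2id
+ 
+ outputs = model(**batch, labels=labels)  # cross-entropy loss computed internally
+ outputs.loss.backward()
+ print(outputs.loss.item())
+ ```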
 
+ ### Hyperparameters
 
+ Training was performed on a Tesla V100-PCIE-32GB GPU with a batch size of 32 over 3 epochs. The AdamW optimizer was used with a learning rate of 5e-5 and 10% of the total training steps allocated for warm-up. Input sequences were padded to a maximum length of 512 tokens. Validation was run roughly every 10% of an epoch, with micro F1 score and accuracy used to evaluate performance.
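 
+ A sketch of these settings expressed as `transformers` `TrainingArguments`; it mirrors the reported hyperparameters but is an assumption, not the authors' published training script (the `eval_steps` value in particular is illustrative):
 
+ ```python
+ from transformers import TrainingArguments
+ 
+ training_args = TrainingArguments(
+     output_dir="distilroberta-ce-esci",
+     per_device_train_batch_size=32,
+     num_train_epochs=3,
+     learning_rate=5e-5,           # AdamW is the Trainer's default optimizer
+     warmup_ratio=0.1,             # 10% of total steps used for warm-up
+     evaluation_strategy="steps",  # validate periodically (~10% of an epoch)
+     eval_steps=4000,              # illustrative; depends on dataset size
+ )
+ ```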
 
+ ### Training Data
 
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
+ | Dataset | Paper | Number of training tuples |
+ |---------|:-----:|:-------------------------:|
+ | [Amazon Shopping Queries Dataset](https://github.com/amazon-science/esci-data/tree/main/shopping_queries_dataset) | [paper](https://arxiv.org/pdf/2206.06588) | 1,253,756 |
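 
+ A hedged example of assembling query-product training pairs from this dataset, assuming the file layout documented in the amazon-science/esci-data repository:
 
+ ```python
+ import pandas as pd
+ 
+ # File names follow the esci-data repository's documented layout (assumed here).
+ examples = pd.read_parquet("shopping_queries_dataset/shopping_queries_dataset_examples.parquet")
+ products = pd.read_parquet("shopping_queries_dataset/shopping_queries_dataset_products.parquet")
+ 
+ # Join judgments to product metadata to form (query, product_title, label) tuples
+ pairs = examples.merge(products, how="left", on=["product_locale", "product_id"])
+ print(pairs[["query", "product_title", "esci_label"]].head())
+ ```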
 
+ ## Model Card Authors
 
+ [Sarah Lawlis](https://www.linkedin.com/in/sarah-lawlis/)
 
  ## Model Card Contact