File size: 9,290 Bytes
0ead5cf 7361c9b 0ead5cf 7361c9b 0ead5cf 7361c9b ab9a547 7361c9b ab9a547 7361c9b ab9a547 7361c9b ab9a547 7361c9b ab9a547 7361c9b ab9a547 7361c9b ab9a547 7361c9b ab9a547 7361c9b ab9a547 7361c9b ab9a547 7361c9b ab9a547 7361c9b |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 |
---
base_model: bert-base-multilingual-uncased
datasets:
- ai4bharat/indic_glue
license: apache-2.0
tags:
- embedding_space_map
- BaseLM:bert-base-multilingual-uncased
---
# ESM ai4bharat/indic_glue
<!-- Provide a quick summary of what the model is/does. -->
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
ESM
- **Developed by:** David Schulte
- **Model type:** ESM
- **Base Model:** bert-base-multilingual-uncased
- **Intermediate Task:** ai4bharat/indic_glue
- **ESM architecture:** linear
- **ESM embedding dimension:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** Apache-2.0 license
- **ESM version:** 0.1.0
## Training Details
### Intermediate Task
- **Task ID:** ai4bharat/indic_glue
- **Subset [optional]:** wnli.hi
- **Text Column:** ['premise', 'hypothesis']
- **Label Column:** label
- **Dataset Split:** train
- **Sample size [optional]:** 635
- **Sample seed [optional]:**
### Training Procedure [optional]
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Language Model Training Hyperparameters [optional]
- **Epochs:** 3
- **Batch size:** 32
- **Learning rate:** 2e-05
- **Weight Decay:** 0.01
- **Optimizer**: AdamW
### ESM Training Hyperparameters [optional]
- **Epochs:** 10
- **Batch size:** 32
- **Learning rate:** 0.001
- **Weight Decay:** 0.01
- **Optimizer**: AdamW
### Additional trainiung details [optional]
## Model evaluation
### Evaluation of fine-tuned language model [optional]
### Evaluation of ESM [optional]
MSE:
### Additional evaluation details [optional]
## What are Embedding Space Maps used for?
Embedding Space Maps are a part of ESM-LogME, a efficient method for finding intermediate datasets for transfer learning. There are two reasons to use ESM-LogME:
### You don't have enough training data for your problem
If you don't have a enough training data for your problem, just use ESM-LogME to find more.
You can supplement model training by including publicly available datasets in the training process.
1. Fine-tune a language model on suitable intermediate dataset.
2. Fine-tune the resulting model on your target dataset.
This workflow is called intermediate task transfer learning and it can significantly improve the target performance.
But what is a suitable dataset for your problem? ESM-LogME enable you to quickly rank thousands of datasets on the Hugging Face Hub by how well they are exptected to transfer to your target task.
### You want to find similar datasets to your target dataset
Using ESM-LogME can be used like search engine on the Hugging Face Hub. You can find similar tasks to your target task without having to rely on heuristics. ESM-LogME estimates how language models fine-tuned on each intermediate task would benefinit your target task. This quantitative approach combines the effects of domain similarity and task similarity.
## How can I use ESM-LogME / ESMs?
[](https://pypi.org/project/hf-dataset-selector)
We release **hf-dataset-selector**, a Python package for intermediate task selection using Embedding Space Maps.
**hf-dataset-selector** fetches ESMs for a given language model and uses it to find the best dataset for applying intermediate training to the target task. ESMs are found by their tags on the Huggingface Hub.
```python
from hfselect import Dataset, compute_task_ranking
# Load target dataset from the Hugging Face Hub
dataset = Dataset.from_hugging_face(
name="stanfordnlp/imdb",
split="train",
text_col="text",
label_col="label",
is_regression=False,
num_examples=1000,
seed=42
)
# Fetch ESMs and rank tasks
task_ranking = compute_task_ranking(
dataset=dataset,
model_name="bert-base-multilingual-uncased"
)
# Display top 5 recommendations
print(task_ranking[:5])
```
```python
1. davanstrien/test_imdb_embedd2 Score: -0.618529
2. davanstrien/test_imdb_embedd Score: -0.618644
3. davanstrien/test1 Score: -0.619334
4. stanfordnlp/imdb Score: -0.619454
5. stanfordnlp/sst Score: -0.62995
```
| Rank | Task ID | Task Subset | Text Column | Label Column | Task Split | Num Examples | ESM Architecture | Score |
|-------:|:------------------------------|:----------------|:--------------|:---------------|:-------------|---------------:|:-------------------|----------:|
| 1 | davanstrien/test_imdb_embedd2 | default | text | label | train | 10000 | linear | -0.618529 |
| 2 | davanstrien/test_imdb_embedd | default | text | label | train | 10000 | linear | -0.618644 |
| 3 | davanstrien/test1 | default | text | label | train | 10000 | linear | -0.619334 |
| 4 | stanfordnlp/imdb | plain_text | text | label | train | 10000 | linear | -0.619454 |
| 5 | stanfordnlp/sst | dictionary | phrase | label | dictionary | 10000 | linear | -0.62995 |
| 6 | stanfordnlp/sst | default | sentence | label | train | 8544 | linear | -0.63312 |
| 7 | kuroneko5943/snap21 | CDs_and_Vinyl_5 | sentence | label | train | 6974 | linear | -0.634365 |
| 8 | kuroneko5943/snap21 | Video_Games_5 | sentence | label | train | 6997 | linear | -0.638787 |
| 9 | kuroneko5943/snap21 | Movies_and_TV_5 | sentence | label | train | 6989 | linear | -0.639068 |
| 10 | fancyzhx/amazon_polarity | amazon_polarity | content | label | train | 10000 | linear | -0.639718 |
For more information on how to use ESMs please have a look at the [official Github repository](https://github.com/davidschulte/hf-dataset-selector). We provide documentation further documentation and tutorials for finding intermediate datasets and training your own ESMs.
## How do Embedding Space Maps work?
<!-- This section describes the evaluation protocols and provides the results. -->
Embedding Space Maps (ESMs) are neural networks that approximate the effect of fine-tuning a language model on a task. They can be used to quickly transform embeddings from a base model to approximate how a fine-tuned model would embed the the input text.
ESMs can be used for intermediate task selection with the ESM-LogME workflow.
## How can I use Embedding Space Maps for Intermediate Task Selection?
## Citation
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
If you are using this Embedding Space Maps, please cite our [paper](https://aclanthology.org/2024.emnlp-main.529/).
**BibTeX:**
```
@inproceedings{schulte-etal-2024-less,
title = "Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning",
author = "Schulte, David and
Hamborg, Felix and
Akbik, Alan",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.529/",
doi = "10.18653/v1/2024.emnlp-main.529",
pages = "9431--9442",
abstract = "Intermediate task transfer learning can greatly improve model performance. If, for example, one has little training data for emotion detection, first fine-tuning a language model on a sentiment classification dataset may improve performance strongly. But which task to choose for transfer learning? Prior methods producing useful task rankings are infeasible for large source pools, as they require forward passes through all source language models. We overcome this by introducing Embedding Space Maps (ESMs), light-weight neural networks that approximate the effect of fine-tuning a language model. We conduct the largest study on NLP task transferability and task selection with 12k source-target pairs. We find that applying ESMs on a prior method reduces execution time and disk space usage by factors of 10 and 278, respectively, while retaining high selection performance (avg. regret@5 score of 2.95)."
}
```
**APA:**
```
Schulte, D., Hamborg, F., & Akbik, A. (2024, November). Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (pp. 9431-9442).
```
## Additional Information
|