|
--- |
|
library_name: peft |
|
base_model: EleutherAI/gpt-neo-1.3B |
|
--- |
|
|
|
# Model Card for MyMoodAI/basicmood
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
|
|
```
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from peft import PeftModel

# Base model (per this card's metadata) and the LoRA adapter repo.
model_name = "EleutherAI/gpt-neo-1.3B"
adapters_name = "MyMoodAI/basicmood"

torch.cuda.empty_cache()

print(f"Starting to load the model {model_name} into memory")

m = AutoModelForCausalLM.from_pretrained(
    model_name,
    # load_in_4bit=True,  # optional: quantize the base model to 4-bit
)

print(f"Loading the adapters from {adapters_name}")
m = PeftModel.from_pretrained(m, adapters_name)

tokenizer = AutoTokenizer.from_pretrained(adapters_name, trust_remote_code=True)

# Simple interactive loop: type a short sentence describing how you feel.
while True:
    mood_input = input("Mood: ")

    inputs = tokenizer(f"Prompt: {mood_input} ### Answer: ", return_tensors="pt", return_attention_mask=True)
    outputs = m.generate(**inputs, max_length=24)

    print(tokenizer.batch_decode(outputs)[0])
```
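Note: this quickstart prompts with a "Prompt: ... ### Answer:" template, while the training script at the bottom of this card formats examples as "### Question: ... ### Answer:"; matching the training template may work better.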
|
|
|
|
|
The full training procedure is at the very bottom of this card.
|
|
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
Classifies guilty, anxious, and depressed mood states (rudimentary, low accuracy); trained on a generic synthetic dataset generated with the Gemini API.
|
|
|
- **Developed by:** Emmanuel Nsanga (workspace and a communication channel on Slack provided mainly by the AI Builders Club - thebuilderclub.org, Canberra Deep Learning, and the Sydney Startup Hub)

- **Funded by:** Emmanuel Nsanga & Roy Kwan

- **Shared by [optional]:** [More Information Needed]

- **Model type:** Causal language model with LoRA adapters (PEFT)

- **Language(s) (NLP):** English (only, for now)

- **License:** BigScience RAIL

- **Finetuned from model [optional]:** EleutherAI/gpt-neo-1.3B
|
|
|
### Model Sources [optional] |
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** [More Information Needed] |
|
- **Paper [optional]:** [More Information Needed]
|
- **Demo [optional]:** [More Information Needed] |
|
|
|
## Uses |
|
|
|
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
|
|
|
### Direct Use |
|
|
|
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> |
|
|
|
[More Information Needed] |
|
|
|
### Downstream Use [optional] |
|
|
|
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app --> |
|
|
|
[More Information Needed] |
|
|
|
### Out-of-Scope Use |
|
|
|
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> |
|
|
|
[More Information Needed] |
|
|
|
## Bias, Risks, and Limitations |
|
Risks: total inaccuracy and misreading of sensitive human emotions. (Kudos - 'Crystal Pang')



Limitations: no real understanding of emotions - still needs human feedback.



Bias: out-of-distribution bias and limited model size. (Kudos - Leo Chow)
|
|
|
|
|
<!-- This section is meant to convey both technical and sociotechnical limitations. --> |
|
|
|
|
|
|
### Recommendations |
|
|
|
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. --> |
|
|
|
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. |
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
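A minimal sketch, assuming the adapter repo MyMoodAI/basicmood on top of the EleutherAI/gpt-neo-1.3B base (the same pattern as the quickstart at the top of this card):

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, attach the adapter, and classify one line.
base = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
model = PeftModel.from_pretrained(base, "MyMoodAI/basicmood")
tokenizer = AutoTokenizer.from_pretrained("MyMoodAI/basicmood")

# The example sentence is illustrative, not from the training set.
inputs = tokenizer("Prompt: I can't stop worrying about tomorrow. ### Answer: ", return_tensors="pt")
outputs = model.generate(**inputs, max_length=24)
print(tokenizer.batch_decode(outputs)[0])
```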
|
|
|
## Training Details |
|
|
|
### Training Data |
|
Gemini API prompts - e.g. "Generate a 1000 samples of very simple guilty/anxious/depressed mood states of short sentences"; see the data-generation script at the bottom of this card.
|
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. --> |
|
|
|
[More Information Needed] |
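Each generated sentence is paired with a fixed completion label, so one training example is a (Prompt, Completion) tuple like the following (the sentence is illustrative, not an actual dataset row):

```
# One (Prompt, Completion) pair, as assembled by the data-generation script below.
("I keep replaying what I said at dinner.", "You're feeling guilty")
```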
|
|
|
### Training Procedure |
|
Supervised fine-tuning with TRL's SFTTrainer and completion-only loss masking (Kudos - Cheng Yu at Canberra DL); the full script is at the bottom of this card, and the rendered training template is shown below.
|
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. --> |
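Each (Prompt, Completion) pair is rendered by formatting_prompts_func in the script below into this template; DataCollatorForCompletionOnlyLM then masks everything up to " ### Answer:" so the loss is computed only on the completion:

```
# Rendered training example (same illustrative pair as above):
"### Question: I keep replaying what I said at dinner.\n ### Answer: You're feeling guilty"
```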
|
|
|
#### Preprocessing [optional] |
|
|
|
[More Information Needed] |
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** fp16 mixed precision on a 4-bit (NF4) quantized base model, using the paged_adamw_8bit optimizer with a cosine schedule, per-device batch size 4, and gradient accumulation of 4 (see the training script below); LoRA r/alpha/dropout were sampled by the hyperparameter search.
|
|
|
#### Speeds, Sizes, Times [optional] |
|
|
|
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. --> |
|
|
|
[More Information Needed] |
|
|
|
## Evaluation |
|
|
|
<!-- This section describes the evaluation protocols and provides the results. --> |
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
#### Testing Data |
|
|
|
<!-- This should link to a Dataset Card if possible. --> |
|
|
|
[More Information Needed] |
|
|
|
#### Factors |
|
|
|
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. --> |
|
|
|
[More Information Needed] |
|
|
|
#### Metrics |
|
|
|
<!-- These are the evaluation metrics being used, ideally with a description of why. --> |
|
|
|
[More Information Needed] |
|
|
|
### Results |
|
|
|
|
|
Final training loss of 0.0007 (improved by hyperparameter optimization).
|
|
|
|
|
|
|
|
#### Summary |
|
|
|
|
|
|
|
## Model Examination [optional] |
|
|
|
<!-- Relevant interpretability work for the model goes here --> |
|
|
|
[More Information Needed] |
|
|
|
## Environmental Impact |
|
|
|
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> |
|
|
|
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). |
|
|
|
- **Hardware Type:** NVIDIA V100 (high-RAM runtime)

- **Hours used:** Just under 10 hours to fine-tune (across roughly six months of Google Colab Pro+ usage)

- **Cloud Provider:** Google Colab Pro+

- **Compute Region:** Sydney

- **Carbon Emitted:** [More Information Needed] (refer to Google's data-centre emissions reporting)
|
|
|
## Technical Specifications [optional] |
|
|
|
Trained for under two hours on one epoch.
|
|
|
### Model Architecture and Objective |
|
|
|
[More Information Needed] |
|
|
|
### Compute Infrastructure |
|
|
|
Google Colab Pro+, Vultr, AWS
|
|
|
#### Hardware |
|
|
|
GPU: NVIDIA V100, high-RAM runtime (used for fine-tuning)



CPU: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz, octa-core (4 GB RAM + 4 GB swap)
|
|
|
#### Software |
|
|
|
[More Information Needed] |
|
|
|
## Citation [optional] |
|
|
|
EleutherAI (developers of the gpt-neo-1.3B base model).
|
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> |
|
|
|
**BibTeX:** |
|
|
|
[More Information Needed] |
|
|
|
**APA:** |
|
|
|
[More Information Needed] |
|
|
|
## Glossary [optional] |
|
|
|
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. --> |
|
|
|
[More Information Needed] |
|
|
|
## More Information [optional] |
|
|
|
[More Information Needed] |
|
|
|
## Model Card Authors [optional] |
|
|
|
[More Information Needed] |
|
|
|
## Model Card Contact |
|
|
|
[email protected] |
|
|
|
[More Information Needed] |
|
|
|
|
|
### Framework versions |
|
|
|
- PEFT 0.10.0 |
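### Training procedure (full script)

The complete Gemini data-generation and SFT fine-tuning script referenced above; the API key must be filled in before running.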
|
|
|
|
|
``` |
|
!pip3 install -q -U google-generativeai

import google.generativeai as genai

GOOGLE_API_KEY = ''

genai.configure(api_key=GOOGLE_API_KEY)

model = genai.GenerativeModel('gemini-pro')

# Generate ~1000 short sample sentences for each mood state.
response = model.generate_content("Generate a 1000 samples of very simple guilty mood states of short sentences", stream=True)
response.resolve()
guiltsamples = response.text.split('\n')

response = model.generate_content("Generate a 1000 samples of very simple anxious mood states of short sentences", stream=True)
response.resolve()
anxioussamples = response.text.split('\n')

response = model.generate_content("Generate a 1000 samples of very simple depressed mood states of short sentences", stream=True)
response.resolve()
depressedsamples = response.text.split('\n')

# Pair each generated sentence with its fixed completion label.
guiltsamples = [(s, "You're feeling guilty") for s in guiltsamples]
anxioussamples = [(s, "You're feeling anxious") for s in anxioussamples]
depressedsamples = [(s, "You're feeling depressed") for s in depressedsamples]

data = guiltsamples + anxioussamples + depressedsamples
|
|
|
|
|
|
|
|
|
|
|
|
|
import pandas as pd
import torch

from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    BitsAndBytesConfig,
)
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
|
|
|
torch.cuda.empty_cache() |
|
|
|
class TrainModel:

    def __init__(self, params, data, accu_epochs):
        # 4-bit NF4 quantization for the frozen base model.
        self.quant_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16
        )
        self.model = AutoModelForCausalLM.from_pretrained(
            "EleutherAI/gpt-neo-1.3B",
            quantization_config=self.quant_config,
            device_map="auto"
        )

        self.tokenizer = AutoTokenizer.from_pretrained(
            "EleutherAI/gpt-neo-1.3B",
        )
        self.params = params
        self.data = data
        self.epochs = accu_epochs

    def lora_config(self):
        # LoRA over GPT-Neo's attention projections.
        lora_config = LoraConfig(
            r=max(1, abs(int(self.params['r']))),
            lora_alpha=int(self.params['alpha']),
            target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
            lora_dropout=float(self.params['dropout']),
            bias="none",
            task_type="CAUSAL_LM"
        )
        print(self.params['r'], self.params['dropout'], self.params['alpha'])
        return lora_config

    def formatting_prompts_func(self, example):
        # Render each (Prompt, Completion) pair into the training template.
        output_texts = []
        for i in range(len(example['Prompt'])):
            text = f"### Question: {example['Prompt'][i]}\n ### Answer: {example['Completion'][i]}"
            output_texts.append(text)
        return output_texts

    def prepare_data(self):
        df = pd.DataFrame(self.data, columns=['Prompt', 'Completion'])
        return Dataset.from_pandas(df)
|
|
|
|
|
|
|
    def training(self):
        # Wrap the quantized base model with the sampled LoRA config.
        self.model = get_peft_model(self.model, self.lora_config())
        training_arguments = TrainingArguments(
            optim='paged_adamw_8bit',
            output_dir="Multi-lingual-finetuned-med-text",
            per_device_train_batch_size=4,
            gradient_accumulation_steps=4,
            lr_scheduler_type="cosine",
            save_strategy="epoch",
            logging_steps=100,
            max_steps=10000,  # note: max_steps overrides num_train_epochs when both are set
            warmup_steps=10,
            num_train_epochs=self.epochs,
            fp16=True
        )
        self.tokenizer.pad_token = self.tokenizer.eos_token
        # Mask everything up to the response template so loss is computed on completions only.
        response_template = " ### Answer:"
        collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=self.tokenizer)

        trainer = SFTTrainer(
            model=self.model,
            train_dataset=self.prepare_data(),
            args=training_arguments,
            formatting_func=self.formatting_prompts_func,
            data_collator=collator
        )
        trainer.train()
        print(trainer.state.log_history)
        return trainer.state.log_history[0]['loss']
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class HyperParam:

    def __init__(self):
        # Means and stds of the sampling distributions for lora_alpha, r, and dropout.
        self.meana = torch.Tensor([8])
        self.stda = torch.Tensor([0.1])
        self.meanr = torch.Tensor([16.])
        self.stdr = torch.Tensor([1.])
        self.meand = torch.Tensor([.25])
        self.stdd = torch.Tensor([0.01])
        self.lr = 0.5
        self.accu_epochs = 1

    def sample_params(self):
        alpha = torch.distributions.Normal(self.meana.unsqueeze(0), self.stda.unsqueeze(0))
        dropout = torch.distributions.Normal(self.meand.unsqueeze(0), self.stdd.unsqueeze(0))
        # r is sampled from its own distribution (the original sampled it from the dropout one).
        r = torch.distributions.Normal(self.meanr.unsqueeze(0), self.stdr.unsqueeze(0))
        return {'alpha': alpha.sample(), 'dropout': dropout.sample(), 'r': r.sample()}

    def loss(self):
        # Train once with freshly sampled hyperparameters and report the logged loss.
        Training = TrainModel(self.sample_params(), data, self.accu_epochs)
        loss = Training.training()
        return loss

    def hyper(self):
        optimizer = torch.optim.Adagrad(
            [self.meanr, self.stdr, self.meana, self.stda, self.meand, self.stdd], self.lr
        )
        # Create the scheduler once; re-creating it every iteration would reset the annealing.
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
        while True:
            optimizer.step(closure=self.loss)
            scheduler.step()
            self.lr = scheduler.get_last_lr()[0]
            params = optimizer.param_groups[0]['params']
            self.meanr, self.stdr = params[0], params[1]
            self.meana, self.stda = params[2], params[3]
            self.meand, self.stdd = params[4], params[5]
            self.accu_epochs += 1
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# Launch the open-ended hyperparameter search (runs until interrupted).
Hyper = HyperParam()
Hyper.hyper()
|
``` |