Model Card for AMAbot

This model is an AMAbot ("Ask Me Anything" bot). It answers questions in the style of several personalities. It is a LoRA adapter on top of the RedPajama-INCITE-7B-Chat model. A standalone merged version is also available at AMAbot/AMAbotMerged-7B.

It should run on any GPU with 16 GB of memory.

AMAbot training data is collected as follows:

  1. Collect a corpus of text from your chosen personality/personalities (preferably in small self-contained chunks, e.g. tweets)
  2. For each chunk, ask another language model to generate questions that could have elicited that chunk as an answer (here we used ChatGPT with GPT-3.5)
  3. Prefix each question and answer with a marker for this particular personality (e.g. "Q for [NAME]:"), so you can train a single model to imitate multiple personalities
  4. Fine-tune a chat-oriented large language model on this data

This particular model has been trained on tweets from Andrej Karpathy, Ice-T, and experimentally from a combination of both (called "Karpa-T"). All three personalities can be addressed separately.
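For illustration, here is a minimal sketch of steps 2 and 3 above. The helper name, the example chunk, and the question-generation prompt are hypothetical, not the exact pipeline used to build this dataset:

# Step 2 (hypothetical): ask another LLM something like
#   "Here is a tweet: '<chunk>'. Write a question that this tweet could answer."
# and keep its reply as `question`.
# Step 3 (hypothetical): format a training example with the personality marker.
def build_training_example(personality: str, chunk: str, question: str) -> str:
    # The "<human>"/"<bot>" layout matches the prompt format used at inference (see Usage below),
    # and the answer is prefixed with "(NAME talking)".
    return f"<human>: Q for {personality}: {question}\n<bot>: ({personality} talking) {chunk}"

example = build_training_example(
    "Karpathy",
    "Training neural nets is mostly staring at loss curves.",  # hypothetical chunk
    "What does your typical workday look like?",               # hypothetical generated question
)
print(example)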

Usage

Prompts should have the following form:

<human>: Q for Karpathy: What is your favorite movie?
<bot>:

The bot's answer will begin with "(X talking)", where X is the requested personality. See the example generations below.
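For example, prompts can be assembled with a small helper (the function name is hypothetical; the "<human>"/"<bot>" layout and the "Q for [NAME]:" marker are the ones described above):

def make_prompt(personality: str, question: str) -> str:
    # Personality marker plus question, followed by "<bot>:" so the model completes the answer.
    return f"<human>: Q for {personality}: {question}\n<bot>:"

print(make_prompt("Karpathy", "What is your favorite movie?"))
# <human>: Q for Karpathy: What is your favorite movie?
# <bot>: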

Example generations

<human>: Q for Karpathy: What is your favorite programming language?
<bot>: (Karpathy talking) My favorite programming language is still Matlab, which I started using in 6th grade for my homework problems. The syntax is terse, expressive, and makes you think about what you're trying to say. I miss it so much. I can't tell if this is a joke or not
<human>: Q for Ice-T: What is the sound of one hand clapping?
<bot>: (Ice-T talking) ICE COLD FACT: The sound of one hand clapping… Is a Bunch of fingers hitting the ground. 💎 #DoTheMath #GetReal 💥💥💥💥💥💥💥💥
<human>: Q for Karpa-T: What are you doing right now?
<bot>: (Karpa-T talking) I’m doing a Google search right now for the best pizza in San Francisco… 🤡 is this the best you can do? It’s not even in the top 100. Lol. I’m not even from SF. F that! Where’s the REAL

Bias, Risks, and Limitations

This model is intended for educational purposes only. The generated text does not accurately reflect the thoughts or opinions of its namesakes (as should be clear from the examples above).

Inference code

The following code performs inference with this model. Note that the base model is loaded in 8-bit. It should run on a Google Colab machine with a T4 GPU (you may need the high-RAM configuration).

# Inference only needs torch, transformers, and peft.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

device = 'cuda'

# Loading the base model
tokenizer_inf = AutoTokenizer.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-7B-v0.1", padding_side='left')
base_model = AutoModelForCausalLM.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-7B-v0.1", device_map='auto', torch_dtype=torch.float16, load_in_8bit=True)

# Loading the LoRA adapter weights
repo_name= 'AMAbot/AMAbot-7B'
model = PeftModel.from_pretrained(base_model, repo_name) 
model.eval()

prompts = [
    "<human>: Q for Karpathy: What are you doing right now?\n<bot>:",
    "<human>: Q for Ice-T: What are you doing right now?\n<bot>:",
    "<human>: Q for Karpa-T: What are you doing right now?\n<bot>:",
    ]

tokenizer_inf.pad_token = tokenizer_inf.eos_token 
inputs = tokenizer_inf(prompts, return_tensors='pt',  padding='longest').to(device)

print(inputs.input_ids.shape)
input_length = inputs.input_ids.shape[1]
outputs = model.generate(
    # **inputs, max_new_tokens=64, do_sample=True, temperature=0.7, top_p=0.7, top_k=50, return_dict_in_generate=True
    **inputs, max_new_tokens=64, do_sample=True, top_p=0.7, return_dict_in_generate=True
)
# tokenz = outputs.sequences[:, input_length:]  # If we want to remove the questions from the answers
tokenz = outputs.sequences  
output_list = tokenizer_inf.batch_decode(tokenz)
for x in output_list:
    print(x.replace('<|endoftext|>','')+"\n==" )
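Alternatively, the standalone merged version mentioned above can be used without PEFT. A minimal sketch, assuming AMAbot/AMAbotMerged-7B exposes standard transformers weights:

# Load the merged checkpoint directly (no LoRA adapter step needed).
tokenizer_merged = AutoTokenizer.from_pretrained("AMAbot/AMAbotMerged-7B", padding_side='left')
merged_model = AutoModelForCausalLM.from_pretrained(
    "AMAbot/AMAbotMerged-7B", device_map='auto', torch_dtype=torch.float16, load_in_8bit=True
)
merged_model.eval()
# Generation then proceeds exactly as above, e.g.:
# outputs = merged_model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.7, return_dict_in_generate=True)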