Llama-3.1-10B-Instruct model card
Model Details
- Developed by: EmpirischTech/ChaperoneAI
- Backbone Model: LLaMA
- Language(s): English
- Library: HuggingFace Transformers
- License: This model is released under a non-commercial bespoke license and is additionally governed by the Meta license. You should only use this repository if you have been granted access to the model by filling out this form but have either lost your copy of the weights or encountered issues converting them to the Transformers format
- Where to send comments: Feedback and comments on the model can be provided by opening an issue in the Community tab of this model's Hugging Face repository
- Contact: For questions and comments about the model, please email contact-us
Training
Bigger models, more data, and better hardware have consistently improved deep learning performance. Whether in NLP or computer vision, larger models have led to major breakthroughs. However, most cutting-edge models are still trained from scratch, meaning they start with randomly initialized weights. The problem? Training costs are skyrocketing.
To address the escalating computational cost of training large-scale models, various approaches have been proposed. For instance, sparse upcycling (arXiv:2212.05055) initializes a sparsely activated Mixture-of-Experts (MoE) model from a pretrained dense checkpoint and then continues pretraining it. This strategy can potentially reduce the training budget by up to 50% while maintaining performance.
In this work, we take a step toward realizing such an approach. Specifically, we extend an existing 8B-parameter model to 10B parameters by initializing the additional layers with pretrained weights, followed by continued pretraining on a smaller dataset over multiple epochs. Due to budget constraints, we were unable to surpass the base model on the EleutherAI evaluation harness benchmarks. However, our approach yielded improved perplexity, demonstrating the potential of cost-efficient scaling strategies for large language model development.
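The layer-extension step can be illustrated with a short sketch. Everything in it is hypothetical (the backbone id, which layers are duplicated, the dtype); the exact up-scaling recipe for this checkpoint is not published here. The idea is simply to copy pretrained decoder layers so the added depth does not start from random weights:

import copy
import torch
from transformers import AutoModelForCausalLM

# Illustrative only: duplicate a block of pretrained decoder layers so that the
# extra depth starts from pretrained weights instead of random initialization.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)

layers = base.model.layers                                   # nn.ModuleList of decoder layers
extra = [copy.deepcopy(layers[i]) for i in range(16, 24)]    # hypothetical block to copy
base.model.layers = torch.nn.ModuleList(list(layers) + extra)
base.config.num_hidden_layers = len(base.model.layers)

# In practice each duplicated layer's attention cache index (self_attn.layer_idx)
# would also need updating before continued pretraining on the smaller corpus.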
Usage
- Tested on A100 80GB
- Our model handles inputs of up to 128K tokens (131,072), the context length supported by the Llama-3.1 architecture; see the config check after the usage example below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
model_id="empirischtech/Llama-3.1-10B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16
)
prompt = "### User:\nEmma feels perfectly fine, yet she still has an appointment at the hospital. What might be the reasons?\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
inputs.pop("token_type_ids", None)  # Llama tokenizers do not return token_type_ids; drop it only if present
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
output = model.generate(**inputs, streamer=streamer, use_cache=True, max_new_tokens=1024)
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
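To verify the supported context window at runtime, it can be read from the model configuration; max_position_embeddings is the standard field on Llama-style configs in Transformers (the printed value is expected, not guaranteed, to be 131,072 for Llama-3.1-based models):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("empirischtech/Llama-3.1-10B-Instruct")
print(config.max_position_embeddings)  # expected: 131072 (128K) for Llama-3.1-based models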
Hardware and Software
- Hardware: We utilized 8x NVIDIA A100 GPUs for training our model
- Training Factors: The model was pretrained using a combination of the DeepSpeed library and the HuggingFace Trainer
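As a rough sketch of that setup (not the actual training script; the backbone, corpus, hyperparameters, and DeepSpeed config path below are placeholders), the HuggingFace Trainer accepts a DeepSpeed ZeRO configuration through TrainingArguments:

import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Placeholder recipe: model, data, and hyperparameters are illustrative only.
model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)
train_dataset = raw.map(tokenize, batched=True, remove_columns=["text"])

training_args = TrainingArguments(
    output_dir="./llama-10b-cpt",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=2,
    bf16=True,
    deepspeed="ds_config.json",  # placeholder path to a DeepSpeed ZeRO config file
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()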
Evaluation Results
The following two evaluations were performed.
Perplexity as Evaluation Metric
Perplexity (PPL) is a metric used to evaluate the performance of language models. It measures how well a probability distribution or a language model predicts a sample. A lower perplexity score indicates better performance (i.e., the model is more confident in its predictions).
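Concretely, perplexity is the exponential of the average negative log-likelihood the model assigns to the tokens of a text. A minimal sketch with the Transformers API (the example sentence is arbitrary):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "empirischtech/Llama-3.1-10B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)

text = "Emma feels perfectly fine, yet she still has an appointment at the hospital."
enc = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy over the tokens
    loss = model(**enc, labels=enc["input_ids"]).loss

print(torch.exp(loss).item())  # perplexity = exp(mean negative log-likelihood)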
Main Results
Model | Perplexity Score |
---|---|
Llama-3.1-8B-Instruct | 842611366.59 |
Llama-3.1-10B-Instruct | 2890.31 |
Scripts to generate evaluation results
from evaluate import load
import datasets
perplexity = load("perplexity", module_type="metric")
input_texts = datasets.load_dataset("wikitext",
                                    "wikitext-2-raw-v1",
                                    split="test")["text"]
input_texts = [s for s in input_texts if s != '']

model_path = 'empirischtech/Llama-3.1-10B-Instruct'
results = perplexity.compute(model_id=model_path,
                             add_start_token=False,
                             predictions=input_texts)
print(round(results["mean_perplexity"], 2))
Harness Evaluation
- The performance evaluation is based on tasks from the Open LLM Leaderboard.
The model is evaluated on four benchmark datasets: ARC-Challenge, HellaSwag, MMLU, and IFEval. The library used is the lm-evaluation-harness repository.
Main Results
Model | ARC | HellaSwag | MMLU | IFEval |
---|---|---|---|---|
Llama-3.1-8B-Instruct | 52.05 | 59.10 | 42.07 | 42.14 |
Llama-3.1-10B-Instruct | 50.42 | 57.81 | 35.62 | 35.67 |
Scripts to generate evaluation results
# install from https://github.com/EleutherAI/lm-evaluation-harness
# pip install "lm-eval>=0.4.7"
import json

from lm_eval import evaluator

tasks_list = ["arc_challenge", "ifeval", "mmlu_pro", "hellaswag"]  # benchmark tasks
model_path = "empirischtech/Llama-3.1-10B-Instruct"

# Run evaluation
results = evaluator.simple_evaluate(
    model="hf",  # Hugging Face model backend
    cache_requests=False,
    model_args=f"pretrained={model_path}",
    tasks=tasks_list,
    batch_size=4,
    device="cuda:0"
)

# Extract per-task results and serialize them
results = results['results']
json_string = json.dumps(results, indent=4)
print(json_string)
Ethical Issues
Ethical Considerations
- No specific ethical issues were identified, as neither the benchmark test sets nor their training sets were included in the model's training data
Contact Us
Why Our LLMs?
- EmpirischTech/ChaperoneAI: Unlock the full potential of private LLMs for your business with ease. Customize and fine-tune them using your own data for a solution that fits your unique needs. Want a seamless integration? Let’s connect! ► Get in touch