Models
Model
LightevalModel
cleanup
Clean up operations if needed, such as closing an endpoint.
greedy_until
< source >( requests: list, override_bs: typing.Optional[int] = None ) → list[GenerativeResponse]
Parameters
- requests (list[Request]) — list of requests containing the context and ending conditions.
- disable_tqdm (bool, optional) — Whether to disable the progress bar. Defaults to False.
- override_bs (int, optional) — Override the batch size for generation. Defaults to None.
Returns
list[GenerativeResponse]
list of generated responses.
Generates responses using a greedy decoding strategy until certain ending conditions are met.
loglikelihood
Tokenize the context and continuation and compute the log likelihood of those tokenized sequences.
loglikelihood_rolling
This function is used to compute the log likelihood of the context for perplexity metrics.
loglikelihood_single_token
Tokenize the context and continuation and compute the log likelihood of those tokenized sequences.
tok_encode_pair
< source >( context, continuation, pairwise: bool = False ) → Tuple[TokenSequence, TokenSequence]
Parameters
- context (str) — The context string to be encoded.
- continuation (str) — The continuation string to be encoded.
- pairwise (bool) — If True, encode context and continuation separately. If False, encode them together and then split.
Returns
Tuple[TokenSequence, TokenSequence]
A tuple containing the encoded context and continuation.
Encodes a context, continuation pair by taking care of the spaces in between.
The advantages of pairwise encoding are: 1) it better aligns with how the LLM predicts tokens, and 2) it works when len(tok(context + continuation)) != len(tok(context)) + len(tok(continuation)), which can happen e.g. for Chinese when no space is used between context and continuation.
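As an illustration (not taken from the lighteval docs; the tokenizer and strings are arbitrary), the sketch below shows how joint and separate encodings can diverge, which is exactly the case pairwise encoding handles:

```python
from transformers import AutoTokenizer

# Illustrative only: any tokenizer can be used here.
tok = AutoTokenizer.from_pretrained("gpt2")

context = "What is the capital of France?"
continuation = " Paris"

joint = tok.encode(context + continuation)
separate = tok.encode(context) + tok.encode(continuation)

# For some tokenizers or scripts the two lengths differ, so splitting the joint
# encoding at len(tok.encode(context)) would misalign context and continuation.
print(len(joint), len(separate))

# Pairwise encoding keeps the two segments separate from the start:
context_ids = tok.encode(context)
continuation_ids = tok.encode(continuation)
```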
Accelerate and Transformers Models
TransformersModel
class lighteval.models.transformers.transformers_model.TransformersModelConfig
< source >( pretrained: str, accelerator: Accelerator = None, tokenizer: typing.Optional[str] = None, multichoice_continuations_start_space: typing.Optional[bool] = None, pairwise_tokenization: bool = False, subfolder: typing.Optional[str] = None, revision: str = 'main', batch_size: int = -1, max_gen_toks: typing.Optional[int] = 256, max_length: typing.Optional[int] = None, add_special_tokens: bool = True, model_parallel: typing.Optional[bool] = None, dtype: typing.Union[str, torch.dtype, NoneType] = None, device: typing.Union[int, str] = 'cuda', quantization_config: typing.Optional[transformers.utils.quantization_config.BitsAndBytesConfig] = None, trust_remote_code: bool = False, use_chat_template: bool = False, compile: bool = False, generation_parameters: GenerationParameters = None, generation_config: GenerationConfig = None )
Parameters
- pretrained (str) — HuggingFace Hub model ID name or the path to a pre-trained model to load. This is effectively the pretrained_model_name_or_path argument of from_pretrained in the HuggingFace transformers API.
- accelerator (Accelerator) — accelerator to use for model training.
- tokenizer (Optional[str]) — HuggingFace Hub tokenizer ID that will be used for tokenization.
- multichoice_continuations_start_space (Optional[bool]) — Whether to add a space at the start of each continuation in multichoice generation. For example, context: "What is the capital of France?" and choices: "Paris", "London" will be tokenized as "What is the capital of France? Paris" and "What is the capital of France? London". True adds a space, False strips a space, None does nothing.
- pairwise_tokenization (bool) — Whether to tokenize the context and continuation separately or together.
- subfolder (Optional[str]) — The subfolder within the model repository.
- revision (str) — The revision of the model.
- batch_size (int) — The batch size for model training.
- max_gen_toks (Optional[int]) — The maximum number of tokens to generate.
- max_length (Optional[int]) — The maximum length of the generated output.
- add_special_tokens (bool, optional, defaults to True) — Whether to add special tokens to the input sequences. If None, the default value will be set to True for seq2seq models (e.g. T5) and False for causal models.
- model_parallel (bool, optional, defaults to None) — True/False: force to use or not the accelerate library to load a large model across multiple devices. Default: None, which corresponds to comparing the number of processes with the number of GPUs; if the number of processes is smaller, model parallelism is used, otherwise not.
- dtype (Union[str, torch.dtype], optional, defaults to None) — Converts the model weights to dtype, if specified. Strings get converted to torch.dtype objects (e.g. float16 -> torch.float16). Use dtype="auto" to derive the type from the model's weights.
- device (Union[int, str]) — device to use for model training.
- quantization_config (Optional[BitsAndBytesConfig]) — quantization configuration for the model, manually provided to load a normally floating point model at a quantized precision. Needed for 4-bit and 8-bit precision.
- trust_remote_code (bool) — Whether to trust remote code during model loading.
- generation_parameters (GenerationParameters) — Range of parameters which will affect the generation.
- generation_config (GenerationConfig) — GenerationConfig object (only passed during manual creation)
Base configuration class for models.
Methods:
- post_init(): Performs post-initialization checks on the configuration.
- _init_configs(model_name, env_config): Initializes the model configuration.
- init_configs(env_config): Initializes the model configuration using the environment configuration.
- get_model_sha(): Retrieves the SHA of the model.
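As a usage sketch (the model ID and field values below are illustrative, and the exact set of accepted fields may vary across lighteval versions), a configuration can be built directly from the signature above:

```python
from lighteval.models.transformers.transformers_model import TransformersModelConfig

config = TransformersModelConfig(
    pretrained="HuggingFaceH4/zephyr-7b-beta",  # any Hub model ID or local path
    revision="main",
    batch_size=8,
    max_length=2048,
    dtype="bfloat16",   # strings are converted to torch dtypes; "auto" also works
    device="cuda",
    use_chat_template=True,
)
```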
class lighteval.models.transformers.transformers_model.TransformersModel
< source >( env_config: EnvConfig, config: TransformersModelConfig )
greedy_until
< source >( requests: list, override_bs: typing.Optional[int] = None ) → list[GenerativeResponse]
Generates responses using a greedy decoding strategy until certain ending conditions are met.
init_model_parallel
Compute all the parameters related to model_parallel.
loglikelihood
< source >( requests: list, override_bs: typing.Optional[int] = None ) → list[Tuple[float, bool]]
Tokenize the context and continuation and compute the log likelihood of those tokenized sequences.
loglikelihood_single_token
< source >( requests: list, override_bs: typing.Optional[int] = None ) → list[Tuple[float, bool]]
Tokenize the context and continuation and compute the log likelihood of those tokenized sequences.
pad_and_gather
< source >( output_tensor: Tensor, drop_last_samples: bool = True, num_samples: int = None ) → torch.Tensor
Parameters
- output_tensor (torch.Tensor) — The output tensor to be padded.
- drop_last_samples (bool, optional) — Whether to drop the last samples during gathering. Last samples are dropped when the number of samples is not divisible by the number of processes. Defaults to True.
Returns
torch.Tensor
The padded output tensor and the gathered length tensor.
Pads the output_tensor to the maximum length and gathers the lengths across processes.
prepare_batch_logprob
< source >( batch: list, padding_length: int, max_context: typing.Optional[int] = None, single_token: bool = False )
Tokenize a batch of inputs and also return the lengths, truncations and padding. This step is done manually since we tokenize log probability inputs together with their continuation, to manage possible extra spaces added at the start by tokenizers; see tok_encode_pair.
AdapterModel
class lighteval.models.transformers.adapter_model.AdapterModelConfig
< source >( pretrained: str, accelerator: Accelerator = None, tokenizer: typing.Optional[str] = None, multichoice_continuations_start_space: typing.Optional[bool] = None, pairwise_tokenization: bool = False, subfolder: typing.Optional[str] = None, revision: str = 'main', batch_size: int = -1, max_gen_toks: typing.Optional[int] = 256, max_length: typing.Optional[int] = None, add_special_tokens: bool = True, model_parallel: typing.Optional[bool] = None, dtype: typing.Union[str, torch.dtype, NoneType] = None, device: typing.Union[int, str] = 'cuda', quantization_config: typing.Optional[transformers.utils.quantization_config.BitsAndBytesConfig] = None, trust_remote_code: bool = False, use_chat_template: bool = False, compile: bool = False, generation_parameters: GenerationParameters = None, generation_config: GenerationConfig = None, base_model: str = None )
class lighteval.models.transformers.adapter_model.AdapterModel
< source >( env_config: EnvConfig, config: TransformersModelConfig )
DeltaModel
class lighteval.models.transformers.delta_model.DeltaModelConfig
< source >( pretrained: str, accelerator: Accelerator = None, tokenizer: typing.Optional[str] = None, multichoice_continuations_start_space: typing.Optional[bool] = None, pairwise_tokenization: bool = False, subfolder: typing.Optional[str] = None, revision: str = 'main', batch_size: int = -1, max_gen_toks: typing.Optional[int] = 256, max_length: typing.Optional[int] = None, add_special_tokens: bool = True, model_parallel: typing.Optional[bool] = None, dtype: typing.Union[str, torch.dtype, NoneType] = None, device: typing.Union[int, str] = 'cuda', quantization_config: typing.Optional[transformers.utils.quantization_config.BitsAndBytesConfig] = None, trust_remote_code: bool = False, use_chat_template: bool = False, compile: bool = False, generation_parameters: GenerationParameters = None, generation_config: GenerationConfig = None, base_model: str = None )
class lighteval.models.transformers.delta_model.DeltaModel
< source >( env_config: EnvConfig, config: TransformersModelConfig )
Endpoints-based Models
InferenceEndpointModel
class lighteval.models.endpoints.endpoint_model.InferenceEndpointModelConfig
< source >( endpoint_name: str = None, model_name: str = None, reuse_existing: bool = False, accelerator: str = 'gpu', model_dtype: str = None, vendor: str = 'aws', region: str = 'us-east-1', instance_size: str = None, instance_type: str = None, framework: str = 'pytorch', endpoint_type: str = 'protected', add_special_tokens: bool = True, revision: str = 'main', namespace: str = None, image_url: str = None, env_vars: dict = None, generation_parameters: GenerationParameters = None )
from_path
< source >( path: str ) → InferenceEndpointModelConfig
Load configuration for inference endpoint model from YAML file path.
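A minimal usage sketch (the YAML file name is hypothetical, and its exact layout depends on your lighteval version):

```python
from lighteval.models.endpoints.endpoint_model import InferenceEndpointModelConfig

# "endpoint.yaml" is a placeholder; its contents should map onto the fields
# listed in the signature above (model_name, vendor, region, instance_type, ...).
config = InferenceEndpointModelConfig.from_path("endpoint.yaml")
```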
class lighteval.models.endpoints.endpoint_model.ServerlessEndpointModelConfig
< source >( model_name: str, add_special_tokens: bool = True, generation_parameters: GenerationParameters = None )
class lighteval.models.endpoints.endpoint_model.InferenceEndpointModel
< source >( config: typing.Union[lighteval.models.endpoints.endpoint_model.InferenceEndpointModelConfig, lighteval.models.endpoints.endpoint_model.ServerlessEndpointModelConfig], env_config: EnvConfig )
InferenceEndpointModels can be used either with the free inference client or with dedicated inference endpoints, which use text-generation-inference to deploy your model for the duration of the evaluation.
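A sketch of the two modes (all values are illustrative, not prescriptive):

```python
from lighteval.models.endpoints.endpoint_model import (
    InferenceEndpointModelConfig,
    ServerlessEndpointModelConfig,
)

# Free inference client: evaluate a Hub model without deploying anything.
serverless_config = ServerlessEndpointModelConfig(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
)

# Dedicated inference endpoint: the model is deployed with
# text-generation-inference for the duration of the evaluation.
endpoint_config = InferenceEndpointModelConfig(
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    vendor="aws",
    region="us-east-1",
    instance_type="nvidia-a10g",  # illustrative instance choice
    instance_size="x1",
    reuse_existing=False,
)
```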
TGI ModelClient
class lighteval.models.endpoints.tgi_model.TGIModelConfig
< source >( inference_server_address: str, inference_server_auth: str, model_id: str, generation_parameters: GenerationParameters = None )
from_path
< source >( path: str ) → TGIModelConfig
Load configuration for TGI endpoint model from YAML file path.
OpenAI Models
class lighteval.models.endpoints.openai_model.OpenAIClient
< source >( config: OpenAIModelConfig, env_config )
greedy_until
< source >( requests: list, override_bs: typing.Optional[int] = None ) → list[GenerativeResponse]
Generates responses using a greedy decoding strategy until certain ending conditions are met.
Nanotron Model
NanotronLightevalModel
class lighteval.models.nanotron.nanotron_model.NanotronLightevalModel
< source >( checkpoint_path: str, nanotron_config: FullNanotronConfig, parallel_context: ParallelContext, max_gen_toks: typing.Optional[int] = 256, max_length: typing.Optional[int] = None, add_special_tokens: typing.Optional[bool] = True, dtype: typing.Union[str, torch.dtype, NoneType] = None, trust_remote_code: bool = False, debug_one_layer_model: bool = False, model_class: typing.Optional[typing.Type] = None, env_config: EnvConfig = None )
Gather together tensors of (possibly) various sizes spread across separate GPUs (first exchange the lengths, then pad and gather).
greedy_until
< source >( requests: typing.List[lighteval.tasks.requests.GreedyUntilRequest], disable_tqdm: bool = False, override_bs: int = -1, num_dataset_splits: int = 1 )
Greedy generation until a stop token is generated.
Ending conditions are submitted in several possible formats. By default in lighteval we pass them as tuples (stop sequence, max number of items). In the harness they are sometimes passed as dicts {"until": ..., "max_length": ...} or as bare ending conditions, either lists or strings. Here, we convert all of these formats to a tuple containing a list of ending conditions and a float for the maximum length allowed.
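A rough sketch of that normalization (the helper name and defaults are illustrative, not the library's actual implementation):

```python
from typing import Union

def normalize_stop_conditions(
    stop: Union[str, list, dict, tuple],
    default_max_tokens: float = float("inf"),
) -> tuple:
    """Convert the various stop-condition formats to (list of stop strings, max length)."""
    if isinstance(stop, tuple):  # lighteval default: (stop sequences, max number of items)
        sequences, max_tokens = stop
        sequences = [sequences] if isinstance(sequences, str) else list(sequences)
        return sequences, float(max_tokens)
    if isinstance(stop, dict):   # harness style: {"until": ..., "max_length": ...}
        until = stop.get("until", [])
        until = [until] if isinstance(until, str) else list(until)
        return until, float(stop.get("max_length", default_max_tokens))
    if isinstance(stop, str):    # bare ending condition as a single string
        return [stop], default_max_tokens
    return list(stop), default_max_tokens  # bare ending conditions as a list
```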
loglikelihood_single_token
< source >( requests: typing.List[typing.Tuple[str, dict]], override_bs = 0 ) → List[Tuple[float, bool]]
Tokenize the context and continuation and compute the log likelihood of those tokenized sequences.
Gather together tensors of (possibly) various sizes spread across separate GPUs (first exchange the lengths, then pad and gather).
prepare_batch
< source >( batch: typing.List[str], padding_length: int, max_context: typing.Optional[int] = None, full_attention_masks: bool = False, pad_on_left: bool = False )
Tokenize a batch of inputs and also return the lengths, truncations and padding. We truncate to keep only at most max_context tokens, and pad to padding_length tokens.
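A conceptual sketch of this truncation-and-padding step (simplified; the real method also builds attention masks and handles tokenizer specifics):

```python
import torch

def truncate_and_pad(token_ids: list, max_context: int, padding_length: int,
                     pad_token_id: int, pad_on_left: bool = False) -> torch.Tensor:
    # Left truncation keeps only the most recent max_context tokens of the prompt.
    ids = token_ids[-max_context:]
    padding = [pad_token_id] * (padding_length - len(ids))
    padded = padding + ids if pad_on_left else ids + padding
    return torch.tensor(padded, dtype=torch.long)
```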
VLLM Model
VLLMModel
class lighteval.models.vllm.vllm_model.VLLMModelConfig
< source >( pretrained: str, gpu_memory_utilisation: float = 0.9, revision: str = 'main', dtype: str | None = None, tensor_parallel_size: int = 1, pipeline_parallel_size: int = 1, data_parallel_size: int = 1, max_model_length: int | None = None, swap_space: int = 4, seed: int = 1234, trust_remote_code: bool = False, use_chat_template: bool = False, add_special_tokens: bool = True, multichoice_continuations_start_space: bool = True, pairwise_tokenization: bool = False, generation_parameters: GenerationParameters = None, subfolder: typing.Optional[str] = None )
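As a usage sketch (values are illustrative; field names follow the signature above):

```python
from lighteval.models.vllm.vllm_model import VLLMModelConfig

config = VLLMModelConfig(
    pretrained="meta-llama/Llama-3.1-8B-Instruct",  # any Hub model ID or local path
    dtype="bfloat16",
    tensor_parallel_size=2,        # shard the model across two GPUs
    gpu_memory_utilisation=0.9,
    max_model_length=4096,
    use_chat_template=True,
)
```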
class lighteval.models.vllm.vllm_model.VLLMModel
< source >( config: VLLMModelConfig, env_config: EnvConfig )
greedy_until
< source >( requests: list, override_bs: typing.Optional[int] = None ) → list[GenerateReturn]
Generates responses using a greedy decoding strategy until certain ending conditions are met.