---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# Requirement Checker Adapters

## Model Summary
The Requirement Checker family of adapters is designed to check whether specified requirements were satisfied by the last model generation. Only one requirement is checked at a time; multiple requirements can be checked with parallel model calls.
- Developer: IBM Research
- License: Apache 2.0
## Usage

### Intended use
**Usage steps.** Given a generation task and a set of requirements:

- Use the base model to generate a response as normal (via the `assistant` role), with the prompt describing the task followed by "Requirements:" and the list of active requirements.
- Repeat the requirement to be checked.
- The Requirement Checker model will respond with "true" or "false", where "true" means the requirement is satisfied (illustrated in the sketch below).
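For illustration, the exchange might look like the following. This is a minimal sketch with illustrative variable names; the exact prompt formatting each adapter expects is applied by the `granite_common` rewriter used in the quickstart below.

```python
prompt = "What is IBM?"
requirements = "Use a formal tone.\nDo not use long words."
response = "..."  # generated by the base model for the chosen adapter
requirement_to_check = "Use a formal tone."

messages = [
    # Task prompt plus the list of active requirements
    {"role": "user", "content": prompt + "\nRequirements: " + requirements},
    # The base model's generation to be checked
    {"role": "assistant", "content": response},
]
# The requirement to check is then repeated for the adapter (the granite_common
# rewriter in the quickstart handles this), and the adapter answers "true" or "false".
```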
### Quickstart Example
First, see information elsewhere in this repo on how to start up a vLLM server hosting the LoRAs and/or aLoRAs. Once this server is started, it can be queried via the OpenAI API. An example for this intrinsic follows.
```python
import os
import json

import openai
import granite_common

PROMPT = "What is IBM?"
REQUIREMENTS = "Use a formal tone.\nDo not use long words."
RESPONSE = ...  # this should be generated by the base model corresponding to the chosen adapter
REQUIREMENT_TO_CHECK = "Use a formal tone."

# Conversation to check: the task prompt with its requirements, followed by the
# base model's response.
request = {
    "messages": [
        {
            "content": PROMPT + "\nRequirements: " + REQUIREMENTS,
            "role": "user"
        },
        {
            "role": "assistant",
            "content": RESPONSE
        },
    ],
    "model": "requirement_check",
    "temperature": 0.0
}

openai_base_url = ...
openai_api_key = ...

# io.yaml configuration file for the requirement_check intrinsic
io_yaml_file = "./rag_intrinsics_lib/requirement_check/.../io.yaml"

rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)

# Rewrite the request for this intrinsic, passing the requirement to be checked
rewritten_request = rewriter.transform(request, requirement=REQUIREMENT_TO_CHECK)

client = openai.OpenAI(base_url=openai_base_url, api_key=openai_api_key)
chat_completion = client.chat.completions.create(**rewritten_request.model_dump())

# Post-process the raw completion into the intrinsic's output format
transformed_completion = result_processor.transform(chat_completion)
print(transformed_completion.model_dump_json(indent=2))
```
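The verdict can then be read from the transformed completion, and additional requirements can be checked with separate calls. A minimal sketch, assuming the transformed completion follows the OpenAI chat-completion shape with `choices[0].message.content` holding "true" or "false":

```python
def check_requirement(requirement: str) -> bool:
    """Run one requirement check and return True if the requirement is satisfied."""
    rewritten = rewriter.transform(request, requirement=requirement)
    completion = client.chat.completions.create(**rewritten.model_dump())
    transformed = result_processor.transform(completion)
    # Assumption: the verdict is the message content of the first choice.
    verdict = transformed.choices[0].message.content.strip().lower()
    return verdict == "true"

# Each requirement is checked with its own model call.
for requirement in ["Use a formal tone.", "Do not use long words."]:
    print(requirement, "->", check_requirement(requirement))
```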
## Evaluation
The model was evaluated on 200 rows of held-out synthetic data. Error rates are as follows:
**aLoRA models**
- Granite 3.3 2B: 6.0%
- Granite 3.3 8B: 5.75%
- GPT-OSS 20B: 5.75%
**LoRA models**
- Granite 3.3 2B: 4.5%
- Granite 3.3 8B: 4.0%
- GPT-OSS 20B: 4.0%
## Training Data
Synthetic data generated by Mixtral 8x22b and GPT-OSS 120B.
## Model Card Authors

Kristjan Greenewald, Bo Wu