---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# Requirement Checker Adapters

## Model Summary
The Requirement Checker family of adapters is designed to check whether specified requirements were satisfied by the last model generation. Only one requirement is checked at a time; multiple requirements can be checked with parallel model calls.
- Developer: IBM Research
- License: Apache 2.0
## Usage

### Intended use
**Usage steps.** Given a generation task and a set of requirements:

- Use the base model to generate a response as normal (via the `assistant` role), with the prompt describing the task followed by "Requirements:" and the list of active requirements.
- Repeat the requirement to be checked.
- The Requirement Checker model will respond with "true" or "false", where "true" means the requirement is satisfied (illustrated in the sketch below).
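For illustration, the exchange might look like the following. This is a minimal sketch with illustrative variable names; the exact prompt formatting each adapter expects is applied by the `granite_common` rewriter used in the quickstart below.

```python
prompt = "What is IBM?"
requirements = "Use a formal tone.\nDo not use long words."
response = "..."  # generated by the base model for the chosen adapter
requirement_to_check = "Use a formal tone."

messages = [
    # Task prompt plus the list of active requirements
    {"role": "user", "content": prompt + "\nRequirements: " + requirements},
    # The base model's generation to be checked
    {"role": "assistant", "content": response},
]
# The requirement to check is then repeated for the adapter (the granite_common
# rewriter in the quickstart handles this), and the adapter answers "true" or "false".
```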
### Quickstart Example
First, see information elsewhere in this repo on how to start up a vLLM server hosting the LoRAs and/or aLoRAs. Once this server is started, it can be queried via the OpenAI API. An example for this intrinsic follows.
```python
import os
import json

import openai
import granite_common

PROMPT = "What is IBM?"
REQUIREMENTS = "Use a formal tone.\nDo not use long words."
RESPONSE = ...  # this should be generated by the base model corresponding to the chosen adapter
REQUIREMENT_TO_CHECK = "Use a formal tone."

# Conversation to check: the task prompt with its requirements, followed by the
# base model's response.
request = {
    "messages": [
        {
            "content": PROMPT + "\nRequirements: " + REQUIREMENTS,
            "role": "user"
        },
        {
            "role": "assistant",
            "content": RESPONSE
        },
    ],
    "model": "requirement_check",
    "temperature": 0.0
}

openai_base_url = ...
openai_api_key = ...

# io.yaml configuration file for the requirement_check intrinsic
io_yaml_file = "./rag_intrinsics_lib/requirement_check/.../io.yaml"

rewriter = granite_common.IntrinsicsRewriter(config_file=io_yaml_file)
result_processor = granite_common.IntrinsicsResultProcessor(config_file=io_yaml_file)

# Rewrite the request for this intrinsic, passing the requirement to be checked
rewritten_request = rewriter.transform(request, requirement=REQUIREMENT_TO_CHECK)

client = openai.OpenAI(base_url=openai_base_url, api_key=openai_api_key)
chat_completion = client.chat.completions.create(**rewritten_request.model_dump())

# Post-process the raw completion into the intrinsic's output format
transformed_completion = result_processor.transform(chat_completion)
print(transformed_completion.model_dump_json(indent=2))
```
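The verdict can then be read from the transformed completion, and additional requirements can be checked with separate calls. A minimal sketch, assuming the transformed completion follows the OpenAI chat-completion shape with `choices[0].message.content` holding "true" or "false":

```python
def check_requirement(requirement: str) -> bool:
    """Run one requirement check and return True if the requirement is satisfied."""
    rewritten = rewriter.transform(request, requirement=requirement)
    completion = client.chat.completions.create(**rewritten.model_dump())
    transformed = result_processor.transform(completion)
    # Assumption: the verdict is the message content of the first choice.
    verdict = transformed.choices[0].message.content.strip().lower()
    return verdict == "true"

# Each requirement is checked with its own model call.
for requirement in ["Use a formal tone.", "Do not use long words."]:
    print(requirement, "->", check_requirement(requirement))
```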
## Evaluation
The model was evaluated on 200 rows of held-out synthetic data. Error rates are as follows:
**aLoRA models**
- Granite 3.3 2B: 6.0%
- Granite 3.3 8B: 5.75%
- GPT-OSS 20B: 5.75%
**LoRA models**
- Granite 3.3 2B: 4.5%
- Granite 3.3 8B: 4.0%
- GPT-OSS 20B: 4.0%
## Training Data
Synthetic data generated by Mixtral 8x22b and GPT-OSS 120B.
## Model Card Authors

Kristjan Greenewald, Bo Wu