---
license: apache-2.0
language:
  - en
metrics:
  - accuracy
base_model:
  - meta-llama/Llama-3.1-8B-Instruct
---

CALM-8B: Conversational Agentic Language Model

Model Description

CALM-8B is the smallest open-source model in the CALM (Conversational Agentic Language Model) series, designed to integrate both Task-Oriented Dialogue (TOD) capabilities and Language Agent (LA) functionalities into a unified system. By fine-tuning on CALM-IT, a novel dataset that interleaves multi-turn ReAct-based reasoning with complex API usage, CALM-8B achieves promising results on TOD and function-calling benchmarks.

CALM-8B is trained on a multi-task dataset covering dialogue state tracking, function calling, and multi-turn reasoning. The model outperforms top proprietary and domain-specific models, including GPT-4o, on key evaluation benchmarks: MultiWOZ 2.4 (TOD), BFCL V3 (LA), and API-Bank (LA).

Model Sources

  • Paper: [More Information Needed]
  • Repository: [More Information Needed]

Model Details

  • Model Name: CALM-8B
  • Developed by: A collaboration of the UIUC Conversational AI Lab and Oumi
  • License: Apache 2.0
  • Architecture: Fine-tuned Llama 3.1 8B Instruct
  • Training Data: CALM-IT dataset
  • Fine-tuning Framework: Oumi
  • Training Hardware: 8 NVIDIA H100 GPUs
  • Training Duration: ~8 hours
  • Evaluation Benchmarks: MultiWOZ 2.4, BFCL V3, API-Bank
  • Release Date: February 5, 2025

Capabilities and Features

πŸ—£ Conversational Agentic Abilities

  • Multi-turn Dialogue Mastery: Maintains coherent conversations across multiple turns with accurate state tracking.
  • Function Calling and API Integration: Dynamically selects and calls APIs for task execution.
  • ReAct-based Reasoning: Utilizes a structured reasoning process (User → Thought → Action → Observation → Thought → Response); an illustrative trace is sketched after this list.
  • Zero-Shot Generalization: Excels in previously unseen function-calling tasks.
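
For intuition, one such turn might look like the sketch below. This is only a schematic illustration: the field names and the example API are assumptions made here, not the actual CALM-IT serialization or prompt format.

```python
# Hypothetical sketch of one ReAct-style turn; the keys and the booking API
# are illustrative assumptions, not the real CALM-IT schema.
react_turn = {
    "user": "Book me a table for two in Cambridge tonight at 7pm.",
    "thought_1": "The user wants a restaurant booking, so I should call the booking API.",
    "action": {
        "api": "restaurant_booking.book",  # assumed example API
        "arguments": {"city": "Cambridge", "people": 2, "time": "19:00"},
    },
    "observation": {"status": "success", "reference": "ABC123"},
    "thought_2": "The booking succeeded; confirm it to the user.",
    "response": "Your table for two is booked for 7pm tonight (ref: ABC123).",
}
```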

πŸš€ Benchmark Performance

  • MultiWOZ 2.4 (TOD): Excels in dialogue state tracking and task completion.
  • BFCL V3 (LA): Demonstrates superior function-calling ability compared to existing language agents.
  • API-Bank (LA): Accurately generates API calls and integrates responses into conversation flow.

Training Process

πŸ”§ Fine-tuning Stages

  1. TOD Fine-tuning: Optimizes dialogue state tracking (e.g., augmented SNIPS data reformatted into Alpaca-style instruction-tuning examples); a sample record is sketched after this list.
  2. Function Calling Fine-tuning: Trained to select and generate well-formed API calls from LA datasets.
  3. ReAct-based Fine-tuning: Addresses multi-turn conversations with API integration using a structured reasoning framework.
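
As a rough illustration of stage 1, an Alpaca-style dialogue-state-tracking record could look like the following. The field names and slot schema are assumptions for illustration, not the exact CALM-IT format.

```python
# Hypothetical Alpaca-style training record for dialogue state tracking;
# the real CALM-IT schema and slot names may differ.
dst_example = {
    "instruction": "Track the dialogue state (domain, intent, slot values) for the conversation so far.",
    "input": "User: I need a cheap Italian restaurant in the centre of town.",
    "output": '{"domain": "restaurant", "intent": "find_restaurant", '
              '"slots": {"price_range": "cheap", "food": "italian", "area": "centre"}}',
}
```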

πŸ” Training Hyperparameters

  • Base Model: Llama 3.1 8B Instruct
  • LoRA Config: Rank = 16, Scaling Factor (alpha) = 32 (see the configuration sketch after this list)
  • Batch Size: 8
  • Learning Rate: 1e-4
  • Optimizer: AdamW (betas = 0.9, 0.999, epsilon = 1e-8)
  • Precision: Mixed precision (bfloat16)
  • Warm-up Ratio: 0.1 (10% of total training steps)
  • Gradient Accumulation Steps: 1
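
For readers who want to reproduce a comparable setup, these hyperparameters map roughly onto a Hugging Face PEFT configuration as sketched below. The actual training was run with the Oumi framework, so this is only an approximate equivalent, not the original training configuration; the output directory is an assumed placeholder, and LoRA target modules are omitted because they are not listed on this card.

```python
# Approximate mapping of the listed hyperparameters onto peft + transformers.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,            # LoRA rank
    lora_alpha=32,   # LoRA scaling factor
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="calm-8b-sft",        # assumed output path
    per_device_train_batch_size=8,   # batch size as listed above
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    warmup_ratio=0.1,
    bf16=True,                       # mixed-precision bfloat16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```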

Usage

πŸ— How to Load the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("uiuc-convai/CALM-8B")
model = AutoModelForCausalLM.from_pretrained(
    "uiuc-convai/CALM-8B",
    torch_dtype="auto",   # keep the checkpoint's bfloat16 precision
    device_map="auto",    # place the 8B weights on available GPU(s)
)
```

πŸ›  Example Inference

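Pending an official example, the following is a minimal generation sketch that continues from the loading snippet above. It assumes the chat template inherited from the Llama 3.1 base model; the exact prompt and tool-call format expected by CALM-8B may differ.

```python
# Minimal inference sketch; assumes the base model's chat template.
messages = [
    {"role": "user", "content": "Find me a cheap Italian restaurant in the city centre."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```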

Future Work

  • Task-Specific Calibration: While CALM-8B generalizes well across tasks, performance can improve with domain-specific fine-tuning.
  • Scalability to Larger Models: Future iterations (CALM-70B, CALM-405B) extend capabilities to larger-scale agentic conversations.
  • Open-Source Expansion: All datasets, training scripts, and model checkpoints are publicly available to foster further research.

Citation

If you use CALM-8B in your research, please cite:

@article{yourpaper2024,
  title={CALM: Conversational Agentic Language Model},
  author={Your Name and Collaborators},
  journal={Your Conference/Journal},
  year={2025}
}

For more details, visit the project repository or contact [email protected].