pipeline_tag: text-generation
---

# CoALM-405B: The Largest Open-Source Agentic LLM

[Oumi](https://github.com/oumi-ai/oumi)

## Model Overview

**CoALM-405B** is the **largest fully open-source Conversational Agentic Language Model**. This model sets a new standard in **Conversational AI**, seamlessly integrating both **Task-Oriented Dialogue (TOD) capabilities** and **Language Agent (LA) functionalities**.
It is designed to **push the boundaries** of open-source agentic LLMs, excelling at **multi-turn dialogue, tool usage, reasoning, and API execution**. It is the **best-performing fully open-source LLM** on the **Berkeley Function Calling Leaderboard V3 (BFCL V3)**, marking a leap in open-source AI research.

## Model Sources

<!-- Provide the basic links for the model. -->

- **Paper:** https://arxiv.org/abs/2502.08820
- **Project Page:** https://emrecanacikgoz.github.io/CoALM/
- **Repository:** https://github.com/oumi-ai/oumi/tree/main/configs/projects/CALM
- **Dataset:** https://huggingface.co/datasets/uiuc-convai/CoALM-IT

---
## Model Details

- **Model Name:** CoALM-405B
- **Developed by:** A collaboration of the UIUC Conversational AI Lab and Oumi
- **License:** cc-by-nc-4.0
- **Architecture:** Meta-Llama 3.1-405B Instruct
- **Training Data:** CoALM-IT
- **Fine-tuning Framework:** [Oumi](https://github.com/oumi-ai/oumi)
- **Training Hardware:** 8 NVIDIA H100 GPUs
- **Training Duration:** ~6.5 days
- **Release Date:** February 5, 2025

---
## Why CoALM-405B is a Game-Changer

- **Largest Open-Source Agentic LLM:** A **405B**-parameter model that brings state-of-the-art agentic capabilities to the public domain.
- **Best Open-Source Performance on BFCL V3:** Outperforms leading proprietary models like **GPT-4o, Gemini, and Claude** in function-calling tasks.
- **Fully Open-Source & Reproducible:** Released under **cc-by-nc-4.0**, including model weights, training logs, and datasets.

## CoALM-IT Dataset

<img src="table.png" alt="CoALM-IT Dataset Statistics" width="800"/>

---

## How to Use CoALM-405B

Inference requires 16 NVIDIA H100 GPUs.

### How to Load the Model with Hugging Face
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("uiuc-convai/CoALM-405B")
# device_map="auto" (via accelerate) shards the very large checkpoint
# across all visible GPUs instead of loading it into CPU memory.
model = AutoModelForCausalLM.from_pretrained(
    "uiuc-convai/CoALM-405B",
    torch_dtype="auto",
    device_map="auto",
)
```
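For a quick smoke test after loading, a minimal generation sketch might look like the following; this is not from the original card, and the prompt and decoding settings are illustrative assumptions:

```python
# Hypothetical usage sketch: one user turn through the Llama 3.1 chat template.
messages = [{"role": "user", "content": "What is the capital of France?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```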

### Example Oumi Inference
Oumi multi-node inference support is under development.
CoALM-405B will likely require multi-node inference, since most single nodes top out at 640 GB of GPU VRAM.
To run multi-node inference, we recommend [vLLM](https://docs.vllm.ai/en/latest/serving/distributed_serving.html).
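As a hedged starting point, the sketch below uses vLLM's Ray-based distributed backend; the parallelism sizes assume two 8xH100 nodes and should be adapted to your cluster, and a Ray cluster spanning the nodes must already be running (see the vLLM docs linked above):

```python
# Hedged sketch (not from the model card): distributed inference with vLLM.
# Assumes a Ray cluster already spans two 8xH100 nodes (16 GPUs total).
from vllm import LLM, SamplingParams

llm = LLM(
    model="uiuc-convai/CoALM-405B",
    tensor_parallel_size=8,    # shard each layer across a node's 8 GPUs
    pipeline_parallel_size=2,  # split the layer stack across the 2 nodes
    distributed_executor_backend="ray",
)
params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate(["Which API would you call to check the weather?"], params)
print(outputs[0].outputs[0].text)
```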

### Example Oumi Fine-Tuning

## License
This model is licensed under [Creative Commons NonCommercial (CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/).

---
## Citation
If you use **CoALM-405B** in your research, please cite:
```
@misc{acikgoz2025singlemodelmastermultiturn,
      title={Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model},
      author={Emre Can Acikgoz and Jeremiah Greer and Akul Datta and Ze Yang and William Zeng and Oussama Elachqar and Emmanouil Koukoumidis and Dilek Hakkani-Tür and Gokhan Tur},
      year={2025},
      eprint={2502.08820},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.08820},
}
```

For more details, visit the [project repository](https://github.com/oumi-ai/oumi/tree/main/configs/projects/CALM) or contact **[email protected]**.