Initial commit

Files changed:
- .gitattributes (+2, -0)
- README.md (+97, -3)
- ReaderLM-v2-Q4_K_M.gguf (+3, -0)
- ReaderLM-v2-Q8_0.gguf (+3, -0)
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ReaderLM-v2-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ReaderLM-v2-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -1,3 +1,97 @@
----
-license: cc-by-nc-4.0
----
+---
+license: cc-by-nc-4.0
+tags:
+- llama.cpp
+- gguf
+- ReaderLM-v2
+- html-to-markdown
+- jina-ai
+---
+
+# ReaderLM-v2 GGUF Quantized Models for llama.cpp
+
+This repository contains **GGUF quantized versions** of the [ReaderLM-v2](https://huggingface.co/jinaai/ReaderLM-v2) model by [Jina AI](https://jina.ai/). These models are optimized for **llama.cpp**, making them efficient to run on CPUs and GPUs.
+
+## Model Information
+
+ReaderLM-v2 is a **1.5-billion-parameter** model designed for **HTML-to-Markdown** and **HTML-to-JSON** conversion. It supports **29 languages** and handles a combined input and output length of **up to 512,000 tokens**.
+
+The model is useful for extracting structured data from web pages and for various NLP applications.
+
+## Available Quantized Models
+
+| Model File                | Quantization Type | Size  | Description |
+|---------------------------|-------------------|-------|-------------|
+| `ReaderLM-v2-Q4_K_M.gguf` | Q4_K_M            | 986MB | Lower precision, optimized for CPU performance |
+| `ReaderLM-v2-Q8_0.gguf`   | Q8_0              | 1.6GB | Higher precision, better quality |
+
+These quantized versions balance **performance and accuracy**, making them suitable for different hardware setups.
29 |
+
|
30 |
+
## Usage
|
31 |
+
|
32 |
+
### Running the Model with llama.cpp
|
33 |
+
|
34 |
+
1. **Clone and build llama.cpp**:
|
35 |
+
```bash
|
36 |
+
git clone https://github.com/ggerganov/llama.cpp.git
|
37 |
+
cd llama.cpp
|
38 |
+
mkdir build && cd build
|
39 |
+
cmake ..
|
40 |
+
make -j$(nproc)
|
41 |
+
```
|
42 |
+
|
43 |
+
2. **Run the model**:
|
44 |
+
```bash
|
45 |
+
./llama-cli --model ReaderLM-v2-Q4_K_M.gguf --no-conversation --no-display-prompt --temp 0 --prompt '<|im_start|>system
|
46 |
+
Convert the HTML to Markdown.
|
47 |
+
<|im_end|>
|
48 |
+
<|im_start|>user
|
49 |
+
<html><body><h1>Hello, world!</h1></body></html>
|
50 |
+
<|im_end|>
|
51 |
+
<|im_start|>assistant' 2>/dev/null
|
52 |
+
```
|
53 |
+
|
54 |
+
Replace `ReaderLM-v2-Q4_K_M.gguf` with `ReaderLM-v2-Q8_0.gguf` for better quality at the cost of performance.
|
55 |
+
|
56 |
+
### Using the Model in Python with llama-cpp-python
|
57 |
+
|
58 |
+
```bash
|
59 |
+
pip install llama-cpp-python
|
60 |
+
```
|
61 |
+
|
62 |
+
```python
|
63 |
+
model_path = "./models/ReaderLM-v2-Q4_K_M.gguf"
|
64 |
+
llm = Llama(model_path=model_path, chat_format="chatml")
|
65 |
+
output = llm.create_chat_completion(
|
66 |
+
messages = [
|
67 |
+
{"role": "system", "content": "Convert the HTML to Markdown."},
|
68 |
+
{
|
69 |
+
"role": "user",
|
70 |
+
"content": "<html><body><h1>Hello, world!</h1><p>This is a test!</p></body></html>"
|
71 |
+
}
|
72 |
+
],
|
73 |
+
temperature=0.1,
|
74 |
+
)
|
75 |
+
|
76 |
+
print(output['choices'][0]['message']['content'].strip())
|
77 |
+
```
|
+## Hardware Requirements
+
+- **Q4_K_M (986MB)**: Runs well on CPUs with **8GB RAM or more**
+- **Q8_0 (1.6GB)**: Requires **16GB RAM** for smooth performance
+
+For **GPU acceleration**, compile `llama.cpp` with CUDA support.
+## Credits
+
+- **Original Model**: [Jina AI - ReaderLM-v2](https://huggingface.co/jinaai/ReaderLM-v2)
+- **Quantization**: Performed using [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+## License
+
+This model is released under **Creative Commons Attribution-NonCommercial 4.0 (CC-BY-NC-4.0)**. See [LICENSE](https://huggingface.co/spaces/jinaai/ReaderLM-v2) for details.
+
+---
+
+_Last updated: **January 31, 2025**_
ReaderLM-v2-Q4_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5c19ed3117873c716e25a3556dbdc6e7c99969acfbac26e2273d8eb563244ddf
+size 986046080
ReaderLM-v2-Q8_0.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0a0b0464ee4f91a2f9ae8294fc01a00e2023c1498d5c4dde2870df532dc0829d
+size 1646570624
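The two `.gguf` entries above are Git LFS pointer files: `oid` is the SHA-256 of the actual blob and `size` its byte count, so a finished download can be checked against them. A minimal sketch (the helper name `verify_lfs_pointer` is ours):

```python
import hashlib

def verify_lfs_pointer(path: str, expected_oid: str, expected_size: int) -> bool:
    # Hash the file in 1 MiB chunks and compare digest and byte count
    # against the values recorded in the LFS pointer.
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
            size += len(chunk)
    return h.hexdigest() == expected_oid and size == expected_size

# Demo with a small stand-in file; the real check would use the downloaded
# ReaderLM-v2-Q4_K_M.gguf with the oid and size from the pointer above.
with open("demo.bin", "wb") as f:
    f.write(b"hello")
ok = verify_lfs_pointer("demo.bin", hashlib.sha256(b"hello").hexdigest(), 5)
print(ok)  # → True
```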