rbehzadan committed on
Commit e85f7c2 · verified · 1 Parent(s): 048da0f

Initial commit

Files changed (4)
  1. .gitattributes +2 -0
  2. README.md +97 -3
  3. ReaderLM-v2-Q4_K_M.gguf +3 -0
  4. ReaderLM-v2-Q8_0.gguf +3 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ ReaderLM-v2-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ ReaderLM-v2-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,97 @@
- ---
- license: cc-by-nc-4.0
- ---
+ ---
+ license: cc-by-nc-4.0
+ tags:
+ - llama.cpp
+ - gguf
+ - ReaderLM-v2
+ - html-to-markdown
+ - jina-ai
+ ---
+
+ # ReaderLM-v2 GGUF Quantized Models for llama.cpp
+
+ This repository contains **GGUF quantized versions** of the [ReaderLM-v2](https://huggingface.co/jinaai/ReaderLM-v2) model by [Jina AI](https://jina.ai/). These models are optimized for **llama.cpp**, making them efficient to run on CPUs and GPUs.
+
+ ## Model Information
+
+ ReaderLM-v2 is a **1.5 billion parameter** model designed for **HTML-to-Markdown** and **HTML-to-JSON** conversion. It supports **29 languages** and can handle **up to 512,000 tokens** of combined input and output.
+
+ The model is useful for extracting structured data from web pages and for a range of NLP applications.
+ ## Available Quantized Models
+
+ | Model File | Quantization Type | Size | Description |
+ |---------------------------|------------------|-------|-------------|
+ | `ReaderLM-v2-Q4_K_M.gguf` | Q4_K_M | 986MB | Lower precision; smaller and faster, well suited to CPU-only setups |
+ | `ReaderLM-v2-Q8_0.gguf` | Q8_0 | 1.6GB | Higher precision; better output quality |
+
+ These quantized versions trade off **speed and accuracy**, making them suitable for different hardware setups.
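Either file can be fetched with the `huggingface_hub` CLI (a sketch; `<repo-id>` is a placeholder for this repository's id, which you should substitute):

```shell
pip install -U "huggingface_hub[cli]"
huggingface-cli download <repo-id> ReaderLM-v2-Q4_K_M.gguf --local-dir ./models
```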
+
+ ## Usage
+
+ ### Running the Model with llama.cpp
+
+ 1. **Clone and build llama.cpp**:
+ ```bash
+ git clone https://github.com/ggerganov/llama.cpp.git
+ cd llama.cpp
+ mkdir build && cd build
+ cmake ..
+ make -j$(nproc)
+ ```
+
+ 2. **Run the model** (the binary lands in `bin/` inside the build directory):
+ ```bash
+ ./bin/llama-cli --model ReaderLM-v2-Q4_K_M.gguf --no-conversation --no-display-prompt --temp 0 --prompt '<|im_start|>system
+ Convert the HTML to Markdown.
+ <|im_end|>
+ <|im_start|>user
+ <html><body><h1>Hello, world!</h1></body></html>
+ <|im_end|>
+ <|im_start|>assistant' 2>/dev/null
+ ```
+
+ Replace `ReaderLM-v2-Q4_K_M.gguf` with `ReaderLM-v2-Q8_0.gguf` for better quality at the cost of speed.
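The raw prompt above follows the ChatML template. As a sanity check, the same string can be assembled programmatically (a minimal sketch; `build_chatml_prompt` is a hypothetical helper, not part of llama.cpp):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-style prompt like the one passed to llama-cli above."""
    return (
        f"<|im_start|>system\n{system}\n<|im_end|>\n"
        f"<|im_start|>user\n{user}\n<|im_end|>\n"
        f"<|im_start|>assistant"
    )

prompt = build_chatml_prompt(
    "Convert the HTML to Markdown.",
    "<html><body><h1>Hello, world!</h1></body></html>",
)
print(prompt)
```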
+
+ ### Using the Model in Python with llama-cpp-python
+
+ ```bash
+ pip install llama-cpp-python
+ ```
+
+ ```python
+ from llama_cpp import Llama
+
+ model_path = "./models/ReaderLM-v2-Q4_K_M.gguf"
+ llm = Llama(model_path=model_path, chat_format="chatml")
+ output = llm.create_chat_completion(
+     messages=[
+         {"role": "system", "content": "Convert the HTML to Markdown."},
+         {
+             "role": "user",
+             "content": "<html><body><h1>Hello, world!</h1><p>This is a test!</p></body></html>"
+         }
+     ],
+     temperature=0.1,
+ )
+
+ print(output['choices'][0]['message']['content'].strip())
+ ```
+
+ ## Hardware Requirements
+
+ - **Q4_K_M (986MB)**: runs well on CPUs with **8GB of RAM or more**
+ - **Q8_0 (1.6GB)**: **16GB of RAM** is recommended for smooth performance
+
+ For **GPU acceleration**, compile `llama.cpp` with CUDA support.
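A sketch of a CUDA-enabled build follows; note that the CMake flag name has changed across llama.cpp versions (recent releases use `GGML_CUDA`, older ones used `LLAMA_CUBLAS`), so check the flags for your checkout:

```shell
# From the llama.cpp checkout; assumes the CUDA toolkit is installed.
mkdir build && cd build
cmake .. -DGGML_CUDA=ON
make -j$(nproc)
```

At run time, offload layers to the GPU with `-ngl` (e.g. `./bin/llama-cli --model ReaderLM-v2-Q4_K_M.gguf -ngl 99 ...`).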
+
+ ## Credits
+
+ - **Original Model**: [Jina AI - ReaderLM-v2](https://huggingface.co/jinaai/ReaderLM-v2)
+ - **Quantization**: Performed using [llama.cpp](https://github.com/ggerganov/llama.cpp)
+
+ ## License
+
+ This model is released under **Creative Commons Attribution-NonCommercial 4.0 (CC-BY-NC-4.0)**. See the [original model page](https://huggingface.co/jinaai/ReaderLM-v2) for details.
+
+ ---
+ _Last updated: **January 31, 2025**_
+
ReaderLM-v2-Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c19ed3117873c716e25a3556dbdc6e7c99969acfbac26e2273d8eb563244ddf
+ size 986046080
ReaderLM-v2-Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a0b0464ee4f91a2f9ae8294fc01a00e2023c1498d5c4dde2870df532dc0829d
+ size 1646570624