---
license: gemma
datasets:
- anthracite-org/stheno-filtered-v1.1
base_model: google/gemma-2-2b-it
---

![](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)

# QuantFactory/Gemma-2-2B-Stheno-Filtered-GGUF
This is a quantized version of [SaisExperiments/Gemma-2-2B-Stheno-Filtered](https://huggingface.co/SaisExperiments/Gemma-2-2B-Stheno-Filtered), created using llama.cpp.
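
To try the quants locally, below is a minimal sketch using llama-cpp-python. The GGUF filename is hypothetical (use whichever quant file you actually download, e.g. with `huggingface_hub.hf_hub_download`), and the parameters are illustrative, not tuned recommendations:

```python
from llama_cpp import Llama

# Load a quant from this repo; the filename below is a hypothetical example.
llm = Llama(
    model_path="Gemma-2-2B-Stheno-Filtered.Q4_K_M.gguf",
    n_ctx=1024,       # matches the cutoff_len used during fine-tuning
    n_gpu_layers=-1,  # offload all layers if llama.cpp was built with GPU support
)

# llama-cpp-python picks up the chat template embedded in the GGUF metadata.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```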

# Original Model Card

![image/png](https://cdn-uploads.huggingface.co/production/uploads/660e67afe23148df7ca321a5/F1TQkG-VUmlTFL-xtk3wW.png)

I don't have anything else, so you get a cursed cat image.

# Basic info
This is [anthracite-org/stheno-filtered-v1.1](https://huggingface.co/datasets/anthracite-org/stheno-filtered-v1.1) trained over [unsloth/gemma-2-2b-it](https://huggingface.co/unsloth/gemma-2-2b-it).

Training saw 76.6M tokens.

This time it took 14 hours, and I'm pretty sure I've been training with the wrong prompt template X-X

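Since the prompt template is in question: for reference, this is the turn format that the `template: gemma` setting in the config below targets. A minimal hand-rolled sketch (the tokenizer's built-in chat template remains the authoritative source, and it also prepends the BOS token):

```python
def format_gemma_prompt(user_message: str) -> str:
    """Format a single-turn prompt in Gemma's chat format (sketch only)."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

print(format_gemma_prompt("Hello!"))
```
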
# Training config
```
cutoff_len: 1024
dataset: stheno-3.4
dataset_dir: data
ddp_timeout: 180000000
do_train: true
finetuning_type: lora
flash_attn: auto
fp16: true
gradient_accumulation_steps: 8
include_num_input_tokens_seen: true
learning_rate: 5.0e-05
logging_steps: 5
lora_alpha: 64
lora_dropout: 0
lora_rank: 64
lora_target: all
lr_scheduler_type: cosine
max_grad_norm: 1.0
max_samples: 100000
model_name_or_path: unsloth/gemma-2-2b-it
num_train_epochs: 3.0
optim: adamw_8bit
output_dir: saves/Gemma-2-2B-Chat/lora/stheno
packing: false
per_device_train_batch_size: 2
plot_loss: true
preprocessing_num_workers: 16
quantization_bit: 4
quantization_method: bitsandbytes
report_to: none
save_steps: 100
stage: sft
template: gemma
use_unsloth: true
warmup_steps: 0
```
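
A quick sanity check on what this config implies, derived from the values above (a sketch assuming a single training GPU; these are estimates, not logged numbers):

```python
# Values copied from the training config above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
cutoff_len = 1024

# Sequences consumed per optimizer step (single GPU assumed).
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch)  # 16

# With packing: false, each sequence holds at most cutoff_len tokens, so the
# 76.6M tokens seen imply roughly this many optimizer steps at minimum.
approx_min_steps = 76_600_000 / (effective_batch * cutoff_len)
print(round(approx_min_steps))  # ~4675
```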