sachin2000keshav committed · verified
Commit 8786f34 · 1 Parent(s): 5e28480

Upload README.md with huggingface_hub

Files changed (1): README.md (+88 -0)

README.md ADDED
---
license: llama3.2
base_model: canopylabs/3b-hi-pretrain-research_release
tags:
- text-to-speech
- hindi
- hinglish
- audio-generation
- fine-tuned
- unsloth
language:
- hi
- en
pipeline_tag: text-generation
---

# Hinglish TTS 3B Model

This is a fine-tuned version of [canopylabs/3b-hi-pretrain-research_release](https://huggingface.co/canopylabs/3b-hi-pretrain-research_release) specialized for Hinglish (Hindi-English mixed) text-to-speech generation.

## Model Details

- **Base Model**: canopylabs/3b-hi-pretrain-research_release
- **Fine-tuning Method**: LoRA with Unsloth (merged)
- **Languages**: Hindi, English, Hinglish
- **Task**: Text-to-Speech via audio token generation
- **Model Size**: ~3B parameters

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
model_name = "sachin2000keshav/hinglish_tts_v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate audio tokens from a Hinglish prompt
prompt = "Hello doston, main aapka dost hun"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1200)
```
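
`model.generate` returns the prompt tokens followed by the newly generated tokens. A minimal sketch of separating the two (variable names follow the snippet above):

```python
# The output sequence starts with the prompt; the audio tokens are the tail.
prompt_length = inputs["input_ids"].shape[1]
audio_token_ids = outputs[0][prompt_length:]

# These IDs are what get mapped to SNAC codes in the Audio Generation step below.
print(f"Generated {audio_token_ids.shape[0]} audio tokens")
```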

## Fine-tuning Details

- **LoRA Rank**: 64
- **LoRA Alpha**: 64
- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Training Framework**: Unsloth
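
For reference, a minimal sketch of what this LoRA setup might look like in Unsloth. The rank, alpha, and target modules come from the list above; the sequence length, 4-bit loading, dropout, and bias settings are assumptions:

```python
from unsloth import FastLanguageModel

# Load the base model (max_seq_length and load_in_4bit are assumptions).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="canopylabs/3b-hi-pretrain-research_release",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters with the rank, alpha, and target modules listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0,    # assumption
    bias="none",       # assumption
    use_gradient_checkpointing="unsloth",
)
```

Since the adapters were merged after training (see Model Details), the published checkpoint was presumably exported with something like Unsloth's `save_pretrained_merged`; the exact export step is not documented here.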

## Audio Generation

This model generates audio tokens that need to be decoded using a SNAC (Scalable Neural Audio Codec) model:

```python
from snac import SNAC

# Load SNAC decoder
snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz")

# Process generated tokens to audio codes and decode
# (See full implementation in the original training code)
```
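
The mapping from generated token IDs to SNAC codebook entries follows the base model's convention and is not reproduced here. As a rough sketch of the final decoding step, assuming the three code layers have already been assembled (the placeholder `codes` below stands in for them):

```python
import torch
import soundfile as sf
from snac import SNAC

snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()

# Placeholder codes: in practice these come from the model's generated audio
# tokens, redistributed into SNAC's three codebook layers (see the original
# training code for that mapping). Layer lengths follow SNAC's 1:2:4 ratio.
codes = [
    torch.zeros(1, 64, dtype=torch.long),
    torch.zeros(1, 128, dtype=torch.long),
    torch.zeros(1, 256, dtype=torch.long),
]

with torch.inference_mode():
    audio = snac_model.decode(codes)  # waveform tensor of shape (1, 1, num_samples)

sf.write("output.wav", audio.squeeze().cpu().numpy(), 24000)
```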

## Limitations

- Requires a SNAC model to decode generated tokens into audio
- Optimized for Hinglish content
- May not perform well on pure English or pure Hindi in some cases

## Citation

If you use this model, please cite the original base model:

```bibtex
@misc{canopylabs-3b-hi,
  title={3B Hindi Pretrained Model},
  author={Canopy Labs},
  year={2024},
  url={https://huggingface.co/canopylabs/3b-hi-pretrain-research_release}
}
```