sachin2000keshav/hinglish_tts_v1


sachin2000keshav committed 30 days ago

Commit 8786f34 · verified · 1 Parent(s): 5e28480

Upload README.md with huggingface_hub

Files changed (1)

README.md +88 -0

README.md ADDED

	@@ -0,0 +1,88 @@

+---
+license: llama3.2
+base_model: canopylabs/3b-hi-pretrain-research_release
+tags:
+- text-to-speech
+- hindi
+- hinglish
+- audio-generation
+- fine-tuned
+- unsloth
+language:
+- hi
+- en
+pipeline_tag: text-generation
+---
+# Hinglish TTS 3B Model
+This is a fine-tuned version of [canopylabs/3b-hi-pretrain-research_release](https://huggingface.co/canopylabs/3b-hi-pretrain-research_release) specialized for Hinglish (Hindi-English mixed) text-to-speech generation.
+## Model Details
+- **Base Model**: canopylabs/3b-hi-pretrain-research_release
+- **Fine-tuning Method**: LoRA with Unsloth (merged)
+- **Languages**: Hindi, English, Hinglish
+- **Task**: Text-to-Speech via audio token generation
+- **Model Size**: ~3B parameters
+## Usage
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+# Load model and tokenizer
+model_name = "sachin2000keshav/hinglish_tts_v1"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.float16,
+    device_map="auto"
+)
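+# Generate audio tokens (the output is a sequence of audio codes, not readable text)
+prompt = "Hello doston, main aapka dost hun"
+# Move the inputs to the device the model was placed on by device_map="auto"
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=1200)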
+```
+## Fine-tuning Details
+- **LoRA Rank**: 64
+- **LoRA Alpha**: 64
+- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
+- **Training Framework**: Unsloth
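As a rough worked example of what these settings imply, the sketch below computes the standard LoRA scaling factor (`alpha / rank`) and the number of adapter parameters added across the listed target modules. The projection shapes and layer count are assumptions taken from the publicly documented Llama 3.2 3B architecture, not from this model card, and the helper names are hypothetical.

```python
from typing import Dict, Tuple


def lora_scaling(rank: int, alpha: int) -> float:
    """Standard (non-rsLoRA) scaling applied to the low-rank update."""
    return alpha / rank


def lora_param_count(
    rank: int, module_shapes: Dict[str, Tuple[int, int]], num_layers: int
) -> int:
    """Each adapted module adds A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in module_shapes.values())
    return per_layer * num_layers


# ASSUMPTION: (d_in, d_out) shapes of a Llama 3.2 3B decoder layer.
ASSUMED_SHAPES = {
    "q_proj": (3072, 3072),
    "k_proj": (3072, 1024),
    "v_proj": (3072, 1024),
    "o_proj": (3072, 3072),
    "gate_proj": (3072, 8192),
    "up_proj": (3072, 8192),
    "down_proj": (8192, 3072),
}
ASSUMED_NUM_LAYERS = 28  # ASSUMPTION: decoder layer count of Llama 3.2 3B

print("scaling:", lora_scaling(64, 64))
print("adapter params:", lora_param_count(64, ASSUMED_SHAPES, ASSUMED_NUM_LAYERS))
```

With rank and alpha both set to 64, the scaling factor is 1, so the low-rank update is added to the base weights without extra amplification. Since the card states the adapter was merged, the released checkpoint keeps the size of the base model.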
+## Audio Generation
+This model generates audio tokens, which must be decoded into a waveform with a SNAC (Multi-Scale Neural Audio Codec) model:
+```python
+from snac import SNAC
+# Load SNAC decoder
+snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz")
+# Process generated tokens to audio codes and decode
+# (See full implementation in the original training code)
+```
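The token-to-code step that the snippet above leaves out can be sketched as follows. This is a minimal illustration, not the author's training code: it assumes the interleaving used by Orpheus-style base models, where each audio frame is 7 consecutive tokens that map onto SNAC's three codebook layers in a 1:2:4 pattern. The constants `AUDIO_TOKEN_OFFSET` and `CODEBOOK_SIZE` and the function name are assumptions and should be checked against the original training code.

```python
from typing import List, Sequence, Tuple

CODES_PER_FRAME = 7          # ASSUMPTION: 1 + 2 + 4 codes per audio frame
CODEBOOK_SIZE = 4096         # ASSUMPTION: entries per SNAC codebook
AUDIO_TOKEN_OFFSET = 128266  # ASSUMPTION: id of the first audio token in the vocabulary


def frames_to_snac_layers(
    token_ids: Sequence[int],
    token_offset: int = AUDIO_TOKEN_OFFSET,
    codebook_size: int = CODEBOOK_SIZE,
) -> Tuple[List[int], List[int], List[int]]:
    """Split a flat stream of audio token ids into three SNAC code layers."""
    codes = [t - token_offset for t in token_ids]
    # Drop a trailing partial frame, if generation stopped mid-frame.
    usable = len(codes) - len(codes) % CODES_PER_FRAME
    layer_1: List[int] = []
    layer_2: List[int] = []
    layer_3: List[int] = []
    for start in range(0, usable, CODES_PER_FRAME):
        # Each position in the frame is shifted by its own codebook-sized offset.
        frame = [codes[start + pos] - pos * codebook_size for pos in range(CODES_PER_FRAME)]
        if any(not 0 <= c < codebook_size for c in frame):
            raise ValueError(f"non-audio or misaligned token in frame at index {start}")
        layer_1.append(frame[0])
        layer_2.extend([frame[1], frame[4]])
        layer_3.extend([frame[2], frame[3], frame[5], frame[6]])
    return layer_1, layer_2, layer_3
```

Each returned layer can then be turned into a tensor of shape `(1, T)` and the three tensors passed as a list to `snac_model.decode(...)` to obtain the waveform. Only the newly generated audio tokens should be passed in; prompt tokens and any special tokens need to be stripped first.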
+## Limitations
+- Requires a SNAC model to decode the generated tokens into audio
+- Optimized for Hinglish (code-mixed Hindi-English) content
+- Output quality may be lower on pure English or pure Hindi input
+## Citation
+If you use this model, please cite the original base model:
+```bibtex
+@misc{canopylabs-3b-hi,
+  title={3B Hindi Pretrained Model},
+  author={Canopy Labs},
+  year={2024},
+  url={https://huggingface.co/canopylabs/3b-hi-pretrain-research_release}
+}
+```