thesven/openchat-3.6-8b-20240522-GPTQ

Text Generation

text-generation-inference

4-bit precision

thesven committed on May 25, 2024

Commit d501315 · verified · 1 Parent(s): 97050ff

Update README.md

Files changed (1)

README.md +23 -1

@@ -9,13 +9,35 @@ library_name: transformers
 pipeline_tag: text-generation
 ---
-## Quantization Deaails
+## Quantization Details
 This repo contains a GPTQ 4bit quantized version of the openchat/openchat-3.6-8b-20240522 model.
 ### Using with transformers
 ```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name_or_path = "thesven/openchat-3.6-8b-20240522-GPTQ"
+tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
+model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
+                                             device_map="auto",
+                                             trust_remote_code=False,
+                                             revision="main")
+# Reuse EOS as the padding token; its id is passed to generate() below.
+tokenizer.pad_token = tokenizer.eos_token
+prompt_template = '''
+<<SYS>> You are a very creative story writer. Write a story on the following topic:<</SYS>>
+[INST] Write a story about AI[/INST]
+[ASSISTANT]
+'''
+# Move the inputs to whichever device device_map="auto" placed the model on.
+input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.to(model.device)
+output = model.generate(inputs=input_ids, temperature=0.1, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512,
+                        pad_token_id=tokenizer.pad_token_id)
+print(tokenizer.decode(output[0]))
 ```
 # Original Model Card
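
The README snippet in this commit builds its prompt by hand. As a possible alternative, the sketch below lets the tokenizer format the conversation through `apply_chat_template`. It assumes the tokenizer files in this repo ship a chat template and that the template accepts a `system` role; neither is confirmed by the diff. The helper names `build_messages` and `generate_reply` are hypothetical and used only for illustration.

```python
def build_messages(user_prompt, system_prompt=None):
    """Return a chat-style message list (hypothetical helper, not part of the repo)."""
    messages = []
    if system_prompt is not None:
        # Assumption: the model's chat template accepts a "system" role.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def generate_reply(model_id, messages, max_new_tokens=512):
    """Load the model, format `messages` with the chat template, and sample a reply."""
    # Imported lazily so build_messages() stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # Assumption: the tokenizer config defines a chat template.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,  # append the assistant header so the model answers
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    output = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.1,
        top_p=0.95,
        top_k=40,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, skipping the prompt.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)
```

A call might look like `generate_reply("thesven/openchat-3.6-8b-20240522-GPTQ", build_messages("Write a story about AI", system_prompt="You are a very creative story writer."))`. The sampling settings mirror the ones in the README snippet; running it downloads the model and needs a GPTQ-capable environment.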