Run in Colab on a T4 GPU.
# Use a pipeline as a high-level helper
from transformers import pipeline
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="ISTA-DASLab/Qwen2-72B-AQLM-PV-1bit-1x16", trust_remote_code=True, device_map="auto")
pipe(messages)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
config.json: 100% 959/959 [00:00<00:00, 24.6kB/s]
model.safetensors.index.json: 100% 171k/171k [00:00<00:00, 2.36MB/s]
Downloading shards: 100% 5/5 [09:16<00:00, 104.07s/it]
model-00001-of-00005.safetensors: 100% 4.99G/4.99G [01:59<00:00, 42.2MB/s]
model-00002-of-00005.safetensors: 100% 4.99G/4.99G [01:59<00:00, 41.2MB/s]
model-00003-of-00005.safetensors: 100% 4.99G/4.99G [02:00<00:00, 42.7MB/s]
model-00004-of-00005.safetensors: 100% 4.99G/4.99G [01:58<00:00, 42.4MB/s]
model-00005-of-00005.safetensors: 100% 3.17G/3.17G [01:15<00:00, 42.2MB/s]
Loading checkpoint shards: 100% 5/5 [00:52<00:00, 6.42s/it]
generation_config.json: 100% 242/242 [00:00<00:00, 13.7kB/s]
WARNING:accelerate.big_modeling:Some parameters are on the meta device because they were offloaded to the disk and cpu.
tokenizer_config.json: 100% 1.29k/1.29k [00:00<00:00, 74.3kB/s]
vocab.json: 100% 2.78M/2.78M [00:00<00:00, 8.50MB/s]
merges.txt: 100% 1.67M/1.67M [00:00<00:00, 12.5MB/s]
tokenizer.json: 100% 7.03M/7.03M [00:00<00:00, 21.2MB/s]
Device set to use cuda:0
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:1965: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
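The TORCH_CUDA_ARCH_LIST warning is harmless, but on a Colab T4 (compute capability 7.5) it can be avoided by pinning the arch list before the AQLM CUDA kernels are first compiled; a minimal sketch, assuming it runs near the top of the notebook:

import os
# T4 is compute capability 7.5; set this before the AQLM kernels build
os.environ["TORCH_CUDA_ARCH_LIST"] = "7.5"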
/usr/local/lib/python3.10/dist-packages/aqlm/inference_kernels/cuda_kernel.py:20: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("aqlm::code1x16_matmat")
(The same FutureWarning is emitted for aqlm::code1x16_matmat_dequant, aqlm::code1x16_matmat_dequant_transposed, aqlm::code2x8_matmat, aqlm::code2x8_matmat_dequant, and aqlm::code2x8_matmat_dequant_transposed.)
[{'generated_text': [{'role': 'user', 'content': 'Who are you?'},
{'role': 'assistant',
'content': 'I am Qwen, a large language model created by Alibaba Cloud. I am here to assist you'}]}]
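The Qwen reply above stops mid-sentence, most likely because no generation length was passed and the pipeline fell back to its short default; a minimal sketch, reusing the same pipe, that asks for a longer completion:

# Hypothetical follow-up call: request a longer completion explicitly
pipe(messages, max_new_tokens=128)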
!pip uninstall torch torchvision torchaudio -y
!pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
!pip install aqlm[gpu,cpu]
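After the reinstall it can be worth confirming that the pinned torch build and the AQLM package both import cleanly before loading a model; a minimal check, assuming the pins above:

import torch
import aqlm  # import check only; should resolve after the reinstall
# Expect torch 2.4.0 built against CUDA 12.1 and an available GPU
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())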
from transformers import pipeline, AutoTokenizer
model_name = "ISTA-DASLab/Meta-Llama-3-70B-AQLM-PV-1Bit-1x16"
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Manually set a chat template (modify based on your model's expected format)
tokenizer.chat_template = "[INST] {user_message} [/INST] "
# Load pipeline
pipe = pipeline("text-generation", model=model_name, trust_remote_code=True, device_map="auto", tokenizer=tokenizer)
# Format message correctly
messages = [{"role": "user", "content": "Who are you?"}]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)
# Generate response
output = pipe(formatted_prompt, max_new_tokens=100)
print(output)
/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
Loading checkpoint shards: 100% 3/3 [01:03<00:00, 20.22s/it]
Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py:1965: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
/usr/local/lib/python3.11/dist-packages/aqlm/inference_kernels/cuda_kernel.py:20: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("aqlm::code1x16_matmat")
(The same FutureWarning is emitted for aqlm::code1x16_matmat_dequant, aqlm::code1x16_matmat_dequant_transposed, aqlm::code2x8_matmat, aqlm::code2x8_matmat_dequant, and aqlm::code2x8_matmat_dequant_transposed.)
[{'generated_text': '[INST] {user_message} [/INST] Pent Weg Weg Weg Weg Weg Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Slots Blink Blink Blink Blink Blink Blink Blink Blink Blink Blink Blink Blink Blink Blinkaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklıaklı'}
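The literal '[INST] {user_message} [/INST]' at the start of generated_text shows that the manual template was rendered verbatim: tokenizer.chat_template is treated as a Jinja template, and a string that never references messages is returned as-is by apply_chat_template, so the actual user content is dropped. That likely contributes to the gibberish; if this checkpoint is the base (non-Instruct) Llama 3, it also has no instruction format to follow in any case. A minimal sketch of a template that actually inserts the messages (the [INST] markers here are illustrative, not the model's native format):

# Hypothetical minimal Jinja template that loops over `messages` and inserts their content
tokenizer.chat_template = (
    "{% for message in messages %}"
    "[INST] {{ message['content'] }} [/INST] "
    "{% endfor %}"
)
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)
output = pipe(formatted_prompt, max_new_tokens=100)
print(output)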
from transformers import AutoTokenizer, TextStreamer, AutoModelForCausalLM
import transformers
import torch
quantized_model = AutoModelForCausalLM.from_pretrained(
    "ISTA-DASLab/Meta-Llama-3-70B-AQLM-PV-1Bit-1x16", trust_remote_code=True, torch_dtype=torch.float16,
).cuda()
tokenizer = AutoTokenizer.from_pretrained("ISTA-DASLab/Meta-Llama-3-70B-AQLM-PV-1Bit-1x16")
inputs = tokenizer(["An increasing sequence: one,"], return_tensors="pt")["input_ids"].cuda()
streamer = TextStreamer(tokenizer)
_ = quantized_model.generate(inputs, streamer=streamer, max_new_tokens=120)
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
An increasing sequence:
/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py:1965: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
/usr/local/lib/python3.11/dist-packages/aqlm/inference_kernels/cuda_kernel.py:20: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("aqlm::code1x16_matmat")
(The same FutureWarning is emitted for aqlm::code1x16_matmat_dequant, aqlm::code1x16_matmat_dequant_transposed, aqlm::code2x8_matmat, aqlm::code2x8_matmat_dequant, and aqlm::code2x8_matmat_dequant_transposed.)
one,augaaugaaugaaugaaugaaugaaugaaugaaugaaugaaugaaugaaugaaugaaugaaugaaugaaugaaugaNewPropNewPropNewPropĠNewPropĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠĠ
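The repeated-token output, together with the attention-mask warnings above, suggests at least passing the attention mask and an explicit pad_token_id to generate(); a minimal sketch reusing the objects from the cell above (this addresses the warnings, though a 1-bit base checkpoint may still produce weak continuations):

# Sketch: pass attention_mask and pad_token_id explicitly, as the warnings request
enc = tokenizer(["An increasing sequence: one,"], return_tensors="pt").to("cuda")
_ = quantized_model.generate(
    enc["input_ids"],
    attention_mask=enc["attention_mask"],
    pad_token_id=tokenizer.eos_token_id,
    streamer=streamer,
    max_new_tokens=120,
)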