William Suffill
wsuff
1 follower · 52 following
AI & ML interests
None yet
Recent Activity
Reacted to nicolay-r's post, 26 days ago:
For those who wish to launch distilled DeepSeek R1 for reasoning with a schema, here is a Google Colab notebook: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_deep_seek_7b_distill_colab.ipynb
It wraps the Qwen2 transformers provider via the bulk-chain framework.
Model: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
GPU: a T4 (15GB) is nearly enough in float32 mode. To boost performance, you can enable bf16 mode (use_bf16=True).
Powered by bulk-chain: https://github.com/nicolay-r/bulk-chain
Reacted to m-ric's post, 27 days ago:
The Hub welcomes external inference providers!
Hosting our own inference was not enough: the Hub now has 4 new inference providers: fal, Replicate, SambaNova Systems, and Together AI.
Check model cards on the Hub: you can now use inference from various providers in 1 click (cf. video demo).
Their inference can also be used through our Inference API client. There, you can use either your custom provider key or your HF token; with the HF token, billing is handled directly on your HF account, as a way to centralize all expenses.
Also, PRO users get $2 of inference credits per month!
Read more in the announcement: https://huggingface.co/blog/inference-providers
Reacted to merve's post, 29 days ago:
Oof, what a week! So many things have happened, let's recap! https://huggingface.co/collections/merve/jan-24-releases-6793d610774073328eac67a9

Multimodal
- We have released SmolVLM, the tiniest VLMs, which come in 256M and 500M, along with their retrieval models, ColSmol, for multimodal RAG
- UI-TARS are new models by ByteDance to unlock agentic GUI control, in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released Minimax-VL-01, where the decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new challenging MM benchmark

LLMs
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1, with an MIT license!
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, a new family of models and their datasets (SFT and reward ones too!)

Audio
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO

Image/Video/3D Generation
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris, similar to Flux
- Tencent released Hunyuan3D-2, new 3D asset generation from images
Organizations
None yet
wsuff's activity
New activity in allenai/tulu-2-dpo-7b, about 1 year ago
#1 GitHub Model Source Repository Link, opened about 1 year ago by wsuff