
Sayak Paul

sayakpaul

AI & ML interests

Diffusion models, representation learning


sayakpaul's activity

posted an update about 14 hours ago
We have been cooking a couple of fine-tuning runs on CogVideoX with finetrainers, smol datasets, and LoRA to generate cool video effects like crushing, dissolving, etc.

We are also releasing a utility to extract a LoRA from a fully fine-tuned checkpoint. LoRA extraction has been around for a while, but the quality we got on video models was nothing short of spectacular. Below are some links:

* Models and datasets: https://huggingface.co/finetrainers
* finetrainers: https://github.com/a-r-r-o-w/finetrainers
* LoRA extraction: https://github.com/huggingface/diffusers/blob/main/scripts/extract_lora_from_model.py
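At its core, LoRA extraction is a truncated SVD of the difference between the fine-tuned and base weights. A minimal NumPy sketch of the idea (illustrative only; the linked script operates on actual diffusers checkpoints):

```python
import numpy as np

def extract_lora(w_base, w_tuned, rank):
    """Extract a rank-`rank` LoRA pair (down, up) approximating w_tuned - w_base."""
    delta = w_tuned - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    # Split each singular value evenly between the two factors.
    sqrt_s = np.sqrt(s[:rank])
    down = vt[:rank] * sqrt_s[:, None]   # (rank, in_features)
    up = u[:, :rank] * sqrt_s[None, :]   # (out_features, rank)
    return down, up

# Toy check: a genuinely low-rank update is recovered almost exactly.
rng = np.random.default_rng(0)
w_base = rng.normal(size=(64, 32))
true_delta = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 32))
down, up = extract_lora(w_base, w_base + true_delta, rank=4)
err = np.abs(up @ down - true_delta).max()
```

For full fine-tunes the delta is not exactly low-rank, so the chosen rank trades reconstruction fidelity against adapter size.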
posted an update 3 days ago
We have authored a post to go over the state of video generation in the Diffusers ecosystem 🧨

We cover the supported models, the optimization knobs our users can turn, fine-tuning, and more 🔥

5-6 GB for HunyuanVideo; the sky is the limit 🌌 🤗
https://huggingface.co/blog/video_gen
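As a taste of the optimizations the post walks through, here is a sketch of running HunyuanVideo with CPU offloading and VAE tiling. This assumes a recent diffusers release with HunyuanVideo support; the model id and exact memory savings depend on your setup:

```python
import torch
from diffusers import HunyuanVideoPipeline

# Load the pipeline in half precision (bfloat16 here).
pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo", torch_dtype=torch.bfloat16
)

# Move submodels to CPU when idle: large VRAM savings, small speed cost.
pipe.enable_model_cpu_offload()

# Decode the VAE in tiles to keep peak memory low for long/high-res videos.
pipe.vae.enable_tiling()

video = pipe(prompt="a cat walks on the grass", num_frames=61).frames[0]
```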
posted an update about 1 month ago
Commits speak louder than words 🤪

* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release 🤗
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0
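For instance, the new GGUF support lets you load a pre-quantized transformer into a Flux pipeline. A sketch, assuming you point it at a Flux GGUF checkpoint of your choosing:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Load a GGUF-quantized transformer; computation runs in bfloat16.
transformer = FluxTransformer2DModel.from_single_file(
    "<path or URL to a Flux .gguf checkpoint>",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# Plug it into the regular pipeline and offload to keep VRAM usage low.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
```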
posted an update about 1 month ago
In the past seven days, the Diffusers team has shipped:

1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements

Coffee on me if someone can guess 1 - 4 correctly.
posted an update about 2 months ago
Introducing a high-quality open-preference dataset to further this line of research for image generation.

Despite being such an integral component of modern image generation, open preference datasets are a rarity!

So, we decided to work on one with the community!

Check it out here:
https://huggingface.co/blog/image-preferences
posted an update about 2 months ago
The Control family of Flux from @black-forest-labs should be discussed more!

It enables structural controls like ControlNets while being significantly less expensive to run!

So, we're working on a Control LoRA training script 🤗

It's still WIP, so go easy:
https://github.com/huggingface/diffusers/pull/10130
posted an update 2 months ago
It's been a while since we shipped native quantization support in Diffusers 🧨

We currently support bitsandbytes as the official backend, but using others like torchao is already very simple.

This post is just a reminder of what's possible:

1. Loading a model with a quantization config
2. Saving a model with quantization config
3. Loading a pre-quantized model
4. enable_model_cpu_offload()
5. Training and loading LoRAs into quantized checkpoints

Docs:
https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes
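Point 1, loading with a quantization config, looks roughly like this with the bitsandbytes backend (a sketch; see the docs above for the exact, current API):

```python
import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel

# 4-bit NF4 quantization for a diffusion transformer.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# Point 2: save_pretrained() keeps the quantization config alongside the weights,
# and point 3: from_pretrained() on that folder reloads the pre-quantized model.
```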
posted an update 4 months ago
Did a little experimentation to resize pre-trained LoRAs on Flux. I explored two themes:

* Decreasing the rank of a LoRA
* Increasing the rank of a LoRA

The first one is helpful in reducing memory requirements if the LoRA is of a high rank, while the second one is merely an experiment. Another implication of this study is in unifying LoRA ranks when you would like to torch.compile() them.

Check it out here:
sayakpaul/flux-lora-resizing
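Conceptually, rank resizing is again an SVD job: decreasing the rank truncates the singular values of the LoRA's low-rank update, while increasing it can zero-pad the factors. A toy NumPy sketch (illustrative; the linked repo works on real Flux LoRAs):

```python
import numpy as np

def resize_lora(down, up, new_rank):
    """Resize a LoRA (up @ down) to `new_rank` via SVD of its product."""
    delta = up @ down                                  # (out, in) low-rank update
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    k = min(new_rank, s.size)
    sqrt_s = np.sqrt(s[:k])
    new_down = vt[:k] * sqrt_s[:, None]
    new_up = u[:, :k] * sqrt_s[None, :]
    if new_rank > k:                                   # rank increase: pad with zeros
        new_down = np.vstack([new_down, np.zeros((new_rank - k, down.shape[1]))])
        new_up = np.hstack([new_up, np.zeros((up.shape[0], new_rank - k))])
    return new_down, new_up

rng = np.random.default_rng(0)
down = rng.normal(size=(16, 32))      # a rank-16 LoRA on a (64, 32) layer
up = rng.normal(size=(64, 16))
d8, u8 = resize_lora(down, up, 8)     # shrink: best rank-8 approximation
d32, u32 = resize_lora(down, up, 32)  # grow: exact, extra directions ~zero
grow_err = np.abs(u32 @ d32 - up @ down).max()
```

Growing the rank never changes the update itself, which is exactly why it helps unify ranks across adapters without altering outputs.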
reacted to dn6's post with ❤️ 5 months ago
Sharing for anyone using Diffusers from_single_file loading and affected by the Runway SD 1.5 issue.

If you have runwayml/stable-diffusion-v1-5 saved locally in your HF cache, then loading single-file checkpoints in the following way should still work.

from diffusers import StableDiffusionPipeline

# Works as long as runwayml/stable-diffusion-v1-5 is in your local HF cache.
pipe = StableDiffusionPipeline.from_single_file("<url or path to single file checkpoint>")


If you do not have the model repo saved in your cache, then automatically inferring the pipeline config will not work since the reference repo runwayml/stable-diffusion-v1-5 doesn't exist anymore.

You can use an alternative SD1.5 repo id to still configure your pipeline.

from diffusers import StableDiffusionPipeline

# Point `config` at any compatible SD 1.5 repo to infer the pipeline config.
pipe = StableDiffusionPipeline.from_single_file("<url or path to single file checkpoint>", config="Lykon/DreamShaper")


We're working on resolving the issue ASAP.
posted an update 6 months ago
Flux.1-Dev-like images, but in fewer steps.

Merging code (very simple), inference code, merged params: sayakpaul/FLUX.1-merged
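The "very simple" merging referenced above is essentially linear interpolation of matching parameters. A toy sketch with NumPy dictionaries standing in for state dicts (illustrative only; the linked repo has the real code):

```python
import numpy as np

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Linearly interpolate two state dicts with matching keys and shapes."""
    assert sd_a.keys() == sd_b.keys()
    return {k: alpha * sd_a[k] + (1.0 - alpha) * sd_b[k] for k in sd_a}

# Toy example with two tiny "checkpoints".
a = {"w": np.ones((2, 2)), "b": np.zeros(2)}
b = {"w": np.zeros((2, 2)), "b": np.ones(2)}
merged = merge_state_dicts(a, b, alpha=0.75)
```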

Enjoy the Monday 🤗
posted an update 6 months ago
With larger and larger diffusion transformers coming up, it's becoming increasingly important to have some good quantization tools for them.

We present our findings from a series of experiments on quantizing different diffusion pipelines based on diffusion transformers.

We demonstrate excellent memory savings with a small sacrifice in inference latency, which is expected to improve in the coming days.

Diffusers 🤝 Quanto ❤️

This was a juicy collaboration between @dacorvo and myself.

Check out the post to learn all about it
https://huggingface.co/blog/quanto-diffusers
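With optimum-quanto, quantizing a pipeline's transformer takes a couple of lines. A sketch along these lines (the model id and dtypes are illustrative; see the blog post for the benchmarked setups):

```python
import torch
from diffusers import DiffusionPipeline
from optimum.quanto import freeze, qfloat8, quantize

pipe = DiffusionPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
)

# Quantize the transformer weights to fp8 and freeze them for inference.
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)
```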
reacted to alex-abb's post with 🔥 7 months ago
Hi everyone!
I'm Alex, I'm 16, and I've been doing an internship at Hugging Face for a little over a week. I've already learned a lot about using and prompting LLMs. With @victor as my tutor, I've just finished a Space that analyzes your feelings by prompting an LLM chat model. The aim is to extend it so that it can categorize Hugging Face posts.

alex-abb/LLM_Feeling_Analyzer
posted an update 7 months ago
What is your favorite part of our Diffusers integration of Stable Diffusion 3?

My personal favorite is the ability to run it on a variety of different GPUs with minimal code changes.

Learn more about them here:
https://huggingface.co/blog/sd3
posted an update 8 months ago
🧨 Diffusers 0.28.0 is out 🔥

It features the first non-generative pipeline of the library: Marigold 🥁

Marigold shines at performing depth estimation and surface normal estimation. It was contributed by @toshas, one of the authors of Marigold.

This release also features a massive refactor (led by @DN6 ) of the from_single_file() method, highlighting our efforts for making our library more amenable to community features ๐Ÿค—

Check out the release notes here:
https://github.com/huggingface/diffusers/releases/tag/v0.28.0
reacted to lunarflu's post with ❤️ 8 months ago
cooking up something....anyone interested in a daily activity tracker for HF?
posted an update 9 months ago
Custom pipelines and components in Diffusers 🎸

Wanted to use customized pipelines and other components (schedulers, UNets, text encoders, etc.) in Diffusers?

Found it inflexible?

Since the first dawn on earth, we have supported loading custom pipelines via a custom_pipeline argument 🌄

These pipelines are inference-only, i.e., the assumption is that we're leveraging an existing checkpoint (e.g., runwayml/stable-diffusion-v1-5) and ONLY modifying the pipeline implementation.

We have many cool pipelines implemented that way. They all share the same benefits available to a DiffusionPipeline; no compromise there 🤗

Check them here:
https://github.com/huggingface/diffusers/tree/main/examples/community
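For example, loading a community pipeline implementation on top of a regular checkpoint is one argument away (a sketch; "lpw_stable_diffusion" is one of the community pipelines linked above, adding long, weighted prompt support):

```python
import torch
from diffusers import DiffusionPipeline

# Swap in a community pipeline implementation while reusing the
# stock SD 1.5 weights, scheduler, and other components.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",
    torch_dtype=torch.float16,
)
```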

You might also want everything customized, i.e., custom components along with a custom pipeline. Sure, that's all possible.

All you have to do is keep the implementations of those custom components on the Hub repository you're loading your pipeline checkpoint from.

SDXL Japanese was implemented like this 🔥
stabilityai/japanese-stable-diffusion-xl

Full guide is available here ⬇️
https://huggingface.co/docs/diffusers/main/en/using-diffusers/custom_pipeline_overview

And, of course, these share all the benefits that come with DiffusionPipeline.