Share your test here
Hello, thanks for the quants.
I have enhanced the workflow by incorporating a Scale Image To Pixels node and a custom Math Expression script. The latter calculation accurately determines the video length based on the audio length. This modification may be beneficial in certain applications.
Hope it can help.
Also, for fast generation it seems to work with 4 steps:
lightx2v_I2V_14B_480p_cfg_step_distill_rank128_bf16.safetensors
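In case it helps anyone reproduce the idea above, here is a minimal Python sketch of the kind of calculation such a Math Expression node can perform to derive the frame count from the audio length. The 16 fps value and the 4n+1 frame constraint are my assumptions, not taken from the shared workflow, so adjust them to whatever your setup actually uses.

```python
import math

# Hypothetical sketch of deriving the video frame count from the audio length.
# ASSUMPTIONS (not from the thread): 16 fps output and a 4n+1 frame constraint.
def frames_for_audio(audio_seconds: float, fps: int = 16) -> int:
    raw = math.ceil(audio_seconds * fps)
    return (raw // 4) * 4 + 1  # snap to the nearest valid 4n+1 frame count

print(frames_for_audio(4.8))   # -> 77
print(frames_for_audio(12.0))  # -> 193
```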
I also tested the I2V 14B 4-step LoRA (High and Low Noise); it seems to work fine.
I don't understand. Did you use both the high and low I2V LoRAs in this workflow?
Interesting. Thanks
Update your ComfyUI. You can use the update.bat that’s inside your update folder if you are in the Portable version
yes
How do you generate full-length audio-to-video (for example, 12-15 sec of audio to video)? Currently it's set to 77 frames, and I get OOM when I increase this.
current: 0.3.52
Requires ComfyUI 0.3.52
Updated with Manager. It also wants me to install ComfyUI-GGUF, which I already have and updated.
Update: changing version to nightly resolved the issue
Either lower the resolution, use a lower quant, or use other memory-saving tricks; you're simply running out of memory. Or generate two videos and stitch them together.
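For the "generate two videos and stitch them together" idea, here is a rough sketch using ffmpeg's concat demuxer. The clip file names are placeholders, not outputs of the workflow, and this is just one way to do the stitching.

```python
import os
import subprocess
import tempfile

# Rough sketch of stitching two generated clips with ffmpeg's concat demuxer.
# The clip file names below are placeholders, not files from the workflow.
def stitch(clips, output="stitched.mp4"):
    # the concat demuxer reads a small text file listing the inputs in order
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        for clip in clips:
            f.write(f"file '{os.path.abspath(clip)}'\n")
        listfile = f.name
    subprocess.run(
        ["ffmpeg", "-f", "concat", "-safe", "0", "-i", listfile, "-c", "copy", output],
        check=True,
    )
    os.unlink(listfile)

stitch(["clip_part1.mp4", "clip_part2.mp4"])
```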
Seems to work. I guess it's still WIP on the ComfyUI side, but the GGUF models seem to be all OK.
The very first frame of the video is almost always quite high-contrast/dark, and the lipsync seems to be quite a bit off on most generations. But that's probably down to tweaks in ComfyUI and the workflow, not the model itself (or at least not related to GGUF). I'll do a comparison with Kijai's fp8, but haven't gotten around to that yet.
(Edit: got around to testing the fp8 file from Kijai, and the results are the same with fp8 and GGUF, so the GGUF files seem all good.)
As the Manager updates only to the latest stable version without experimental commits, I updated the master branch of my standard ComfyUI instead: git checkout master in the ComfyUI folder, then git pull, then pip install -r requirements.txt.
System Ryzen 9 5900x, 64 GB RAM, RTX 4070 12 GB, Using Q8 with multigpu node.
1st gen, "raw workflow": 5 sec video, 26 mins.
2nd gen, SageAttention etc. enabled, high and low speed LoRAs, 4 steps: gen time 284 secs, but the well-known motion problem. At least the video girl looks close to the one showcase girl.
What settings do you use?
It's a prompt issue. I checked his prompt, "android looking at his fingers twitching"; of course there will be no talking.
Thank you, good observation, but I was also asking about the settings @GavrikCat used for the outputs he shared.
Thank you, good observation, but I also checked your generation configurations
Thx, I am currently testing 2 different samplers. The lipsync is very good, but the "background scene" is completely odd. I am using the picture, audio, and prompt from Wan's showcase: In the video, a woman stood on the deck of a sailing boat and sang loudly. (The background was the choppy sea and the thundering sky. It was raining heavily in the sky, the ship swayed, the camera swayed, and the waves splashed everywhere, creating a heroic atmosphere:1.8). The woman has long dark hair, part of which is wet by rain. Her expression is serious and firm, her eyes are sharp, and she seems to be staring at the distance or thinking.
@Sikaworld1990 sorry, my past message wasn't clear. I was asking about the settings of @GavrikCat (KSampler, LoRAs, etc.) that he used in the output he shared.
He is using the standard WF with the Lightning I2V high LoRA at 1.0 strength.
Seems to work OK with the Lightning LoRA at 4 steps.
(Low-res input image, so it's a bit blurry in this example though.)
Hi, can anyone help with this error?
@Ton1989: That's an issue with the size of the input image. I don't know exactly what works with it, but 1280x720 works for me. I got that error when I used an image that was 1408x768.
I don't see an issue with my prompt. Realistic images don't require describing that the person is speaking; they just work. You are welcome to provide your results with anime characters.
Hi, I tried 1280x720 and it worked! Thank you so much. Can we not use another input resolution? I mean 9:16, like 720x1280, or 840x480?
Yes, you should be able to lower the resolution or change the aspect ratio freely... I honestly don't know the problem. I was just telling you the size that worked for me; you could try others.
@Ton1989 @YarvixPA I don't know if this is true, but any theory is a good conspiracy theory ;-) When I ran into this error it was on a workflow with a non-GGUF CLIP model. I changed the CLIP model to GGUF as well, and the error was gone. Worth a try, but I definitely don't know if it holds water ;-) or if I was just having some random luck.
EDIT: strike that, it seems to happen when not using certain resolutions.
Nah mate, all the examples from the official Wan page have a specific prompt. It's the same as with LoRAs: they mostly work without trigger words, but much better when using the specific trigger words.
Hi, this is my result with 2 LoRAs and just 4 steps, CFG 1 on the KSampler. It's fast and keeps nice quality 👉 01:31 min on an RTX 4070 12GB.
And for the sound, I tried my language, Thai, and it's perfect!!!
The video says 👉 สบายดีพี่น้อง วันนี้จะไปเที่ยวไหนดีคะ (Hi everyone. Where would you like to go today?)
Here u go
@YarvixPA why did you delete my post? It was SFW. Also, I have discovered an issue: when I use Q8, the microphone disappears for a few seconds. No such problem with Q3, and much better prompt following.
Could anyone share a working workflow? Using the original workflow in the first comment I get terrible times, like 1300 seconds per step... so doing 4 steps is predicted to need about 5 hours.
RTX 5060 16GB here.
What are your generation times?
Any ideas about the following error?
AudioEncoderEncode
Input type (double) and bias type (float) should be the same
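Not a confirmed fix, but that message usually means the audio tensor is float64 ("double") while the encoder weights are float32. A tiny sketch of the kind of cast that typically resolves it; the waveform below is a stand-in, not your actual audio:

```python
import torch

# Stand-in waveform; in practice this would be the tensor from the audio loader.
# "double" in the error message means float64.
waveform = torch.rand(1, 16000, dtype=torch.float64)

# Casting to float32 matches the encoder's float bias and avoids the mismatch.
waveform = waveform.float()
print(waveform.dtype)  # torch.float32
```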
There are several different workflows in the videos here.. just download one and load it into Comfy. As for gen time: 740 secs for a 9 sec video at 480x832 resolution, using Q8, 8 steps with 2 samplers and the high and low speed LoRAs, on my 4070 12 GB.
@Sikaworld, @loscrossos: On my 5060 16GB it runs out of memory. With which settings are you starting ComfyUI and running the workflow? Would you mind sharing the JSON of your workflow?
I experience very similar errors with SOME starting pictures. I tried two different pictures with the same resolution; one works, the other does not. So it can't be just that.
Example:
"RuntimeError: The size of tensor a (23) must match the size of tensor b (22) at non-singleton dimension 4"
EDIT: Guess I don't understand enough, but does it mean some resolutions are guaranteed to work ALWAYS and some are just SOMETIMES?
Would you mind sharing the workflow? It seems I'm running out of memory, and I have 16 GB of VRAM.
I found that resizing the pictures to the standards mentioned above works.
@miguelamendez, thank you. I just discovered the "WanVideo Image Resize To Closest" node, and so far no more errors.
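For anyone wondering what a "resize to closest" step amounts to, here is a rough illustration of picking the nearest known-good resolution for an input image. The resolution list is purely illustrative, not the node's actual table, and the node itself also handles the actual resizing.

```python
# Illustrative sketch of picking the closest "known good" resolution for an
# input image. The bucket list is a guess, NOT the node's real table.
BUCKETS = [(1280, 720), (720, 1280), (832, 480), (480, 832), (1024, 576)]

def closest_bucket(width: int, height: int) -> tuple[int, int]:
    src_ratio = width / height
    # choose the bucket whose aspect ratio is nearest to the source image's
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - src_ratio))

print(closest_bucket(1408, 768))  # -> (1280, 720)
```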
I am running the WF from the very first post fully unchanged; the only thing is I have SageAttention installed. I only changed the steps to 4. It comes with 20 steps, and it takes forever to start and then says it's going to take 11 hours.. so I aborted it.
I haven't had luck with any of the embedded videos yet.. they all take so long to start. I even have a fully brand-new Comfy setup just for this.
@loscrossos What GPU and how much VRAM and RAM do you have?
We have uploaded the workflow with the GGUF loaders also in this repo here
Does this work for V2V, or I2V only?
I recommend using this workflow,
but with the Lightning LoRA high and low at 1.0 instead of FusionX, 8 steps split 4/4. Generation time should be 700 secs to 12 mins depending on the seed, for a 9 sec video at 480x832, using the Q8 multigpu node (you should replace it).
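To illustrate the "8 steps split 4/4" part, here is a sketch of the idea, not the author's actual JSON: two KSamplerAdvanced passes share one 8-step schedule, the first covers steps 0-4 with noise added and leftover noise passed on, the second finishes steps 4-8 with no added noise. The parameter names mirror ComfyUI's KSamplerAdvanced; everything else is a placeholder.

```python
# Sketch of splitting 8 steps across two KSamplerAdvanced passes. Parameter
# names mirror ComfyUI's KSamplerAdvanced; which model/LoRA goes where is up
# to your workflow (placeholders here).
high_pass = dict(
    add_noise="enable", steps=8, cfg=1.0,
    start_at_step=0, end_at_step=4,
    return_with_leftover_noise="enable",   # hand the partly denoised latent on
)
low_pass = dict(
    add_noise="disable", steps=8, cfg=1.0,
    start_at_step=4, end_at_step=8,
    return_with_leftover_noise="disable",  # finish denoising completely
)
```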
I have discovered that the AI video (Wan) generation time depends significantly on the chosen seed. Here's why:
In generative models, the seed initializes the random number generator that controls the starting noise pattern.
This noise is the “canvas” the model gradually refines into your video.
Different seeds → different starting noise → different intermediate states during generation.
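A tiny illustration of that mechanism (the latent shape is made up; this only shows that the seed fully determines the starting noise):

```python
import torch

# The seed seeds the RNG; the RNG produces the initial latent noise.
def initial_noise(seed: int, shape=(1, 16, 20, 60, 104)):
    gen = torch.Generator().manual_seed(seed)
    return torch.randn(shape, generator=gen)

same_a, same_b = initial_noise(19), initial_noise(19)
other = initial_noise(20)
print(torch.equal(same_a, same_b))  # True:  same seed, same starting noise
print(torch.equal(same_a, other))   # False: different seed, different noise
```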
Modified the Comfy workflow; almost no color drift. The LoRA that's not bypassed is the best one that I found. It's very fast too, for relatively good results. Q8 on a 3090, 4 steps; the old Wan 2.1 LoRA works best, from 1 to 1.5 strength. 120-140 sec per 77 frames, so 14.5 sec of video would be around 7 min (before upscale, etc). Oh yeah, and just ignore the indexed batch select; I was trying something there, and it didn't work.
And it works with Ultimate Upscale, too, without many issues (cut out the best parts here).
@YarvixPA Yup, the WF should be included. Q8 is obviously superior. Sticking close to 16:9 seems to help. Native nodes + GGUF (obviously) and some convenience nodes, but it should work with just native and GGUF, too. Adjusted the settings a bit; euler beta57 is quick and pretty accurate.
The lightx2v_t2v_14b_cfg_step_distill_v2_lora_rank256_bf16 LoRA for Wan 2.1 has given me the best results so far; the others didn't really work. It's still not 100% amazing, but at 1.5 strength it's "OK". Truncating the first 1 or 2 frames (best 2 imo) improves the overall result.
Oh, and torch compile sometimes freaks out, but just clearing the cache seems to solve it.
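Regarding the "truncate the first 1 or 2 frames" trick, a minimal sketch of what that looks like on a ComfyUI IMAGE batch (a frames x height x width x channels tensor); the dimensions here are invented:

```python
import torch

# Minimal sketch: dropping the first couple of frames from an IMAGE batch
# (shape [frames, height, width, channels]); the dimensions are invented.
frames = torch.rand(77, 60, 104, 3)
trimmed = frames[2:]   # truncate the first 2 frames
print(trimmed.shape)   # torch.Size([75, 60, 104, 3])
```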
Have you also tested the adaptive lightx2v?
@Sikaworld1990
Enjoy!
S2V GGUF Q8, cfg 1, 4 steps, 25 s/it, 77 frames, seed 19, euler beta57, inductor, SageAttention. LIMITED testing. The sync might be better on slower text (I used relatively fast speech).
Because it's the best so far, I ran additional tests:
lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank256_bf16.safetensors
1.0: mid (slight color shift, ~75% synced, very slight ghosting)
1.5: good (slight color shift, ~90% synced, great movement)
1.6-1.9: good (color shift increases, lips ~90% synced, movement ranges from great to good - probably solved with different seed)
2.0: mid (color fried, ~95% synced, great movement)
Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors
1.0: fail (face ghosting, synced ~75%)
1.5: mid/good (slight color shift, ~80% synced)
Expect similar results from 1.6 as above
lightx2v_14B_T2V_cfg_step_distill_lora_adaptive_rank_quantile_0.15_bf16.safetensors
1.0: mid (color shift, synced ~70%)
1.5: good (slight color shift, synced ~80%)
2.0: good (color shift, synced ~90%)
Similar performance to rank256_bf16
Wan2.2-Lightning_T2V-v1.1-A14B-4steps-lora_HIGH_fp16.safetensors
1.0: mid (color shift, synced ~60%)
1.5: fail (color fried, synced ~60%)
Wan2.2-Lightning_T2V-v1.1-A14B-4steps-lora_LOW_fp16.safetensors
1.0: fail (total ghosting)
1.5: fail (weirdly enough the motion and lip sync are great, it's just a ghost talking)
2.0: mid (slight ghosting, good movement) There might be something here! res_2s beta57 - fail, euler simple - mid, dpmpp_3m_sde beta57 - mid/fail.
2.5: fail (less ghosting, bad movement)
Wan2.2-low-T2V-A14B-4steps-lora-rank64-Seko-V1.1.safetensors
1.0: fail (total ghosting)
1.5: fail (total ghosting)
Wan21_PusaV1_Lora_14B_rank512_bf16.safetensors (as addon lora)
fail (no noticeable improvement with rank256_bf16)
@PiquantSalt why are you using the T2V-lora, when S2V should be more like I2V? I am using the low+high noise lightx2v-I2V-loras with acceptable results.
I agree, in theory. The reason is that my gen results were worse with the I2V LoRAs, and S2V loads both I2V and T2V LoRAs normally. See the T2V LoRA result above with the woman-in-armor video and upscale (no drift on face movement, no color drift at all). Let me run the full test on all my I2V variants and I'll come back with the exact results.