Share your test here
Hello, thanks for the quants.
I have enhanced the workflow by incorporating a Scale Image To Pixels node and a custom Math Expression script. The latter calculation accurately determines the video length based on the audio length. This modification may be beneficial in certain applications.
Hope it can help.
Also, for fast generation it seems to work with 4 steps:
lightx2v_I2V_14B_480p_cfg_step_distill_rank128_bf16.safetensors
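In case it helps anyone reproduce the idea above, here is a minimal Python sketch of the kind of calculation such a Math Expression node can perform to derive the frame count from the audio length. The 16 fps value and the 4n+1 frame constraint are my assumptions, not taken from the shared workflow, so adjust them to whatever your setup actually uses.

```python
import math

# Hypothetical sketch of deriving the video frame count from the audio length.
# ASSUMPTIONS (not from the thread): 16 fps output and a 4n+1 frame constraint.
def frames_for_audio(audio_seconds: float, fps: int = 16) -> int:
    raw = math.ceil(audio_seconds * fps)
    return (raw // 4) * 4 + 1  # snap to the nearest valid 4n+1 frame count

print(frames_for_audio(4.8))   # -> 77
print(frames_for_audio(12.0))  # -> 193
```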
I also tested the I2V 14B 4-step LoRA (High and Low Noise); it seems to work fine.
I don't understand. Did you use both the high and low I2V LoRAs in this workflow?
Interesting. Thanks
Update your ComfyUI. You can use the update.bat that’s inside your update folder if you are in the Portable version
yes
How do you generate full-length audio-to-video (for example, 12-15 sec of audio to video)? Currently it's set to 77 frames, and I get OOM when I increase this.
current: 0.3.52
Requires ComfyUI 0.3.52
Updated with Manager. It also wants me to install ComfyUI-GGUF, which I already have and updated.
Update: changing version to nightly resolved the issue
Either lower the resolution, use a lower quant, or use other memory-saving tricks; you're simply running out of memory. Or generate two videos and stitch them together.
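For the "generate two videos and stitch them together" idea, here is a rough sketch using ffmpeg's concat demuxer. The clip file names are placeholders, not outputs of the workflow, and this is just one way to do the stitching.

```python
import os
import subprocess
import tempfile

# Rough sketch of stitching two generated clips with ffmpeg's concat demuxer.
# The clip file names below are placeholders, not files from the workflow.
def stitch(clips, output="stitched.mp4"):
    # the concat demuxer reads a small text file listing the inputs in order
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        for clip in clips:
            f.write(f"file '{os.path.abspath(clip)}'\n")
        listfile = f.name
    subprocess.run(
        ["ffmpeg", "-f", "concat", "-safe", "0", "-i", listfile, "-c", "copy", output],
        check=True,
    )
    os.unlink(listfile)

stitch(["clip_part1.mp4", "clip_part2.mp4"])
```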
Seems to work. I guess it's still WIP on the ComfyUI side, but the GGUF models seem to be all OK.
The very first frame of the video is almost always quite high-contrast/dark, and the lipsync seems to be quite a bit off on most generations. But that's probably down to tweaks in ComfyUI and the workflow, not the model itself (or at least not related to GGUF). I'll do a comparison with Kijai's fp8, but haven't gotten around to that yet.
(Edit: got around to testing the fp8 file from Kijai, and the results are the same with fp8 and GGUF, so the GGUF files seem all good.)
As the Manager updates only to the latest stable version without experimental commits, I updated the master branch of my standard ComfyUI instead: git checkout master in the ComfyUI folder, then git pull, then pip install -r requirements.txt.
System Ryzen 9 5900x, 64 GB RAM, RTX 4070 12 GB, Using Q8 with multigpu node.
1st gen, "raw workflow": 5 sec video, 26 mins.
2nd gen, SageAttention etc. enabled, high and low speed LoRAs, 4 steps: gen time 284 secs, but the well-known motion problem. At least the video girl looks close to the one showcase girl.
What settings do you use?
It's a prompt issue. I checked his prompt, "android looking at his fingers twitching"; of course there will be no talking.
Thank you, good observation, but I was also asking about the settings @GavrikCat used for the outputs he shared.
Thank you, good observation, but I also checked your generation configurations
Thx, I am currently testing 2 different samplers. The lipsync is very good, but the "background scene" is completely odd. I am using the picture, audio, and prompt from Wan's showcase: In the video, a woman stood on the deck of a sailing boat and sang loudly. (The background was the choppy sea and the thundering sky. It was raining heavily in the sky, the ship swayed, the camera swayed, and the waves splashed everywhere, creating a heroic atmosphere:1.8). The woman has long dark hair, part of which is wet by rain. Her expression is serious and firm, her eyes are sharp, and she seems to be staring at the distance or thinking.
@Sikaworld1990 sorry, my past message wasn't clear. I was asking about the settings of @GavrikCat (KSampler, LoRAs, etc.) that he used in the output he shared.
He is using the standard WF with the Lightning I2V high LoRA at 1.0 strength.
Seems to work OK with the Lightning LoRA at 4 steps.
(Low-res input image, so it's a bit blurry in this example though.)
Hi, can anyone help with this error?
@Ton1989: That's an issue with the size of the input image. I don't know exactly what works with it, but 1280x720 works for me. I got that error when I used an image that was 1408x768.
I don't see an issue with my prompt. Realistic images don't require describing that the person is speaking; they just work. You are welcome to provide your results with anime characters.
Hi, I tried 1280x720 and it worked! Thank you so much. Can we not use another input resolution? I mean 9:16, like 720x1280, or 840x480?
Yes, you should be able to lower the resolution or change the aspect ratio freely... I honestly don't know the problem. I was just telling you the size that worked for me; you could try others.
@Ton1989 @YarvixPA I don't know if this is true, but any theory is a good conspiracy theory ;-) When I ran into this error it was on a workflow with a non-GGUF CLIP model. I changed the CLIP model to GGUF as well, and the error was gone. Worth a try, but I definitely don't know if it holds water ;-) or if I was just having some random luck.
EDIT: strike that, it seems to happen when not using certain resolutions.
Nah mate, all the examples from the official Wan page have a specific prompt. It's the same as with LoRAs: they mostly work without trigger words, but much better when using the specific trigger words.
Hi, this is my result with 2 LoRAs and just 4 steps, CFG 1 on the KSampler. It's fast and keeps nice quality 👉 01:31 min on an RTX 4070 12GB.
And for the sound, I tried my language, Thai, and it's perfect!!!
The video says 👉 สบายดีพี่น้อง วันนี้จะไปเที่ยวไหนดีคะ (Hi everyone. Where would you like to go today?)
Here u go
@YarvixPA why did you delete my post? It was SFW. Also, I have discovered an issue: when I use Q8, the microphone disappears for a few seconds. No such problem with Q3, and much better prompt following.
Could anyone share a working workflow? Using the original workflow in the first comment I get terrible times, like 1300 seconds per step... so doing 4 steps is predicted to need about 5 hours.
RTX 5060 16GB here.
What are your generation times?
Any ideas about the following error?
AudioEncoderEncode
Input type (double) and bias type (float) should be the same
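Not a confirmed fix, but that message usually means the audio tensor is float64 ("double") while the encoder weights are float32. A tiny sketch of the kind of cast that typically resolves it; the waveform below is a stand-in, not your actual audio:

```python
import torch

# Stand-in waveform; in practice this would be the tensor from the audio loader.
# "double" in the error message means float64.
waveform = torch.rand(1, 16000, dtype=torch.float64)

# Casting to float32 matches the encoder's float bias and avoids the mismatch.
waveform = waveform.float()
print(waveform.dtype)  # torch.float32
```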
There are several different workflows in the videos here.. just download one and load it into Comfy. As for gen time: 740 secs for a 9 sec video at 480x832 resolution, using Q8, 8 steps with 2 samplers and the high and low speed LoRAs, on my 4070 12 GB.
@Sikaworld, @loscrossos: On my 5060 16GB it runs out of memory. With which settings are you starting ComfyUI and running the workflow? Would you mind sharing the JSON of your workflow?
I experience very similar errors with SOME starting pictures. I tried two different pictures with the same resolution; one works, the other does not. So it can't be just that.
Example:
"RuntimeError: The size of tensor a (23) must match the size of tensor b (22) at non-singleton dimension 4"
EDIT: Guess I don't understand enough, but does it mean some resolutions are guaranteed to work ALWAYS and some are just SOMETIMES?
Would you mind sharing the workflow? It seems I'm running out of memory, and I have 16 GB of VRAM.
I found that resizing the pictures to the standards mentioned above works.
@miguelamendez, thank you. I just discovered the "WanVideo Image Resize To Closest" node, and so far no more errors.
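For anyone wondering what a "resize to closest" step amounts to, here is a rough illustration of picking the nearest known-good resolution for an input image. The resolution list is purely illustrative, not the node's actual table, and the node itself also handles the actual resizing.

```python
# Illustrative sketch of picking the closest "known good" resolution for an
# input image. The bucket list is a guess, NOT the node's real table.
BUCKETS = [(1280, 720), (720, 1280), (832, 480), (480, 832), (1024, 576)]

def closest_bucket(width: int, height: int) -> tuple[int, int]:
    src_ratio = width / height
    # choose the bucket whose aspect ratio is nearest to the source image's
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - src_ratio))

print(closest_bucket(1408, 768))  # -> (1280, 720)
```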
I am running the WF from the very first post fully unchanged; the only thing is I have SageAttention installed. I only changed the steps to 4. It comes with 20 steps, and it takes forever to start and then says it's going to take 11 hours.. so I aborted it.
I haven't had luck with any of the embedded videos yet.. they all take so long to start. I even have a fully brand-new Comfy setup just for this.
@loscrossos What GPU and how much VRAM and RAM do you have?
We have uploaded the workflow with the GGUF loaders also in this repo here
Does this work for V2V, or I2V only?
I recommend using this workflow,
but with the Lightning LoRA high and low at 1.0 instead of FusionX, 8 steps split 4/4. Generation time should be 700 secs to 12 mins depending on the seed, for a 9 sec video at 480x832, using the Q8 multigpu node (you should replace it).
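To illustrate the "8 steps split 4/4" part, here is a sketch of the idea, not the author's actual JSON: two KSamplerAdvanced passes share one 8-step schedule, the first covers steps 0-4 with noise added and leftover noise passed on, the second finishes steps 4-8 with no added noise. The parameter names mirror ComfyUI's KSamplerAdvanced; everything else is a placeholder.

```python
# Sketch of splitting 8 steps across two KSamplerAdvanced passes. Parameter
# names mirror ComfyUI's KSamplerAdvanced; which model/LoRA goes where is up
# to your workflow (placeholders here).
high_pass = dict(
    add_noise="enable", steps=8, cfg=1.0,
    start_at_step=0, end_at_step=4,
    return_with_leftover_noise="enable",   # hand the partly denoised latent on
)
low_pass = dict(
    add_noise="disable", steps=8, cfg=1.0,
    start_at_step=4, end_at_step=8,
    return_with_leftover_noise="disable",  # finish denoising completely
)
```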
I have discovered that the AI video (Wan) generation time depends significantly on the chosen seed. Here's why:
In generative models, the seed initializes the random number generator that controls the starting noise pattern.
This noise is the “canvas” the model gradually refines into your video.
Different seeds → different starting noise → different intermediate states during generation.
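A tiny illustration of that mechanism (the latent shape is made up; this only shows that the seed fully determines the starting noise):

```python
import torch

# The seed seeds the RNG; the RNG produces the initial latent noise.
def initial_noise(seed: int, shape=(1, 16, 20, 60, 104)):
    gen = torch.Generator().manual_seed(seed)
    return torch.randn(shape, generator=gen)

same_a, same_b = initial_noise(19), initial_noise(19)
other = initial_noise(20)
print(torch.equal(same_a, same_b))  # True:  same seed, same starting noise
print(torch.equal(same_a, other))   # False: different seed, different noise
```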
Modified the Comfy workflow; almost no color drift. The LoRA that's not bypassed is the best one that I found. It's very fast too, for relatively good results. Q8 on a 3090, 4 steps; the old Wan 2.1 LoRA works best, from 1 to 1.5 strength. 120-140 sec per 77 frames, so 14.5 sec of video would be around 7 min (before upscale, etc). Oh yeah, and just ignore the indexed batch select; I was trying something there, and it didn't work.
And it works with Ultimate Upscale, too, without many issues (cut out the best parts here).
@YarvixPA Yup, the WF should be included. Q8 is obviously superior. Sticking close to 16:9 seems to help. Native nodes + GGUF (obviously) and some convenience nodes, but it should work with just native and GGUF, too. Adjusted the settings a bit; euler beta57 is quick and pretty accurate.
The lightx2v_t2v_14b_cfg_step_distill_v2_lora_rank256_bf16 LoRA for Wan 2.1 has given me the best results so far; the others didn't really work. It's still not 100% amazing, but at 1.5 strength it's "OK". Truncating the first 1 or 2 frames (best 2 imo) improves the overall result.
Oh, and torch compile sometimes freaks out, but just clearing the cache seems to solve it.
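Regarding the "truncate the first 1 or 2 frames" trick, a minimal sketch of what that looks like on a ComfyUI IMAGE batch (a frames x height x width x channels tensor); the dimensions here are invented:

```python
import torch

# Minimal sketch: dropping the first couple of frames from an IMAGE batch
# (shape [frames, height, width, channels]); the dimensions are invented.
frames = torch.rand(77, 60, 104, 3)
trimmed = frames[2:]   # truncate the first 2 frames
print(trimmed.shape)   # torch.Size([75, 60, 104, 3])
```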
Have you also tested the adaptive lightx2v?
@Sikaworld1990
Enjoy!
S2V GGUF Q8, cfg 1, 4 steps, 25 s/it, 77 frames, seed 19, euler beta57, inductor, SageAttention. LIMITED testing. The sync might be better on slower text (I used relatively fast speech).
Because it's the best so far, I ran additional tests:
lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank256_bf16.safetensors
1.0: mid (slight color shift, ~75% synced, very slight ghosting)
1.5: good (slight color shift, ~90% synced, great movement)
1.6-1.9: good (color shift increases, lips ~90% synced, movement ranges from great to good - probably solved with different seed)
2.0: mid (color fried, ~95% synced, great movement)
Wan21_T2V_14B_lightx2v_cfg_step_distill_lora_rank64.safetensors
1.0: fail (face ghosting, synced ~75%)
1.5: mid/good (slight color shift, ~80% synced)
Expect similar results from 1.6 as above
lightx2v_14B_T2V_cfg_step_distill_lora_adaptive_rank_quantile_0.15_bf16.safetensors
1.0: mid (color shift, synced ~70%)
1.5: good (slight color shift, synced ~80%)
2.0: good (color shift, synced ~90%)
Similar performance to rank256_bf16
Wan2.2-Lightning_T2V-v1.1-A14B-4steps-lora_HIGH_fp16.safetensors
1.0: mid (color shift, synced ~60%)
1.5: fail (color fried, synced ~60%)
Wan2.2-Lightning_T2V-v1.1-A14B-4steps-lora_LOW_fp16.safetensors
1.0: fail (total ghosting)
1.5: fail (weirdly enough the motion and lip sync are great, it's just a ghost talking)
2.0: mid (slight ghosting, good movement) There might be something here! res_2s beta57 - fail, euler simple - mid, dpmpp_3m_sde beta57 - mid/fail.
2.5: fail (less ghosting, bad movement)
Wan2.2-low-T2V-A14B-4steps-lora-rank64-Seko-V1.1.safetensors
1.0: fail (total ghosting)
1.5: fail (total ghosting)
Wan21_PusaV1_Lora_14B_rank512_bf16.safetensors (as addon lora)
fail (no noticeable improvement with rank256_bf16)
@PiquantSalt why are you using the T2V-lora, when S2V should be more like I2V? I am using the low+high noise lightx2v-I2V-loras with acceptable results.
I agree, in theory. The reason is that my gen results were worse with the I2V LoRAs, and S2V loads both I2V and T2V LoRAs normally. See the T2V LoRA result above with the woman-in-armor video and upscale (no drift on face movement, no color drift at all). Let me run the full test on all my I2V variants and I'll come back with the exact results.