trojblue
/

HunyuanVideo-lora-AnimeStills


	---
	tags:
	- text-to-image
	- text-to-video
	- lora
	- diffusers
	- template:diffusion-lora
	widget:
	- text: '-'
	  output:
	    url: images/ComfyUI_00789_.png
	- text: '-'
	  output:
	    url: images/ComfyUI_00796_.png
	- text: '-'
	  output:
	    url: images/ComfyUI_00793_.png
	- text: >-
	    an anime illustration of kitsune, girl, blue eyes, braided hair,
	    multicoloured hair, brown hair, pink hair, brown fox ears, brown fox tail,
	    fantasy school uniform, open shoulders, masterpiece, best quality, with
	    professional photography composition, dynamic lighting, well-balanced color
	    and contrast, clear separation of subject and background, detailed, and
	    storytelling.
	  output:
	    url: images/ComfyUI_00784_.png
	base_model: tencent/HunyuanVideo
	instance_prompt: an anime illustration of
	license: mit
	---
	# Hunyuan Video Lora - AnimeStills

	<Gallery />

	EXPERIMENTAL: the model generates noisy, low-resolution, illustration-like images. Its natural-language understanding and composition make it useful for guiding more refined models such as SDXL, but treat the results with caution if you plan to use them directly. Results may also look like 'old-time anime' because of the dataset used.


	An experimental model that uses HunyuanVideo as an image generator. It outputs images at 768 resolution.

	In a typical HunyuanVideo workflow, set 'frame' to 1 and add this lora to get an anime illustration-like output.


	## Trigger words

	You should use `an anime illustration of` to trigger the image generation.
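
A minimal sketch of adding the trigger phrase to a prompt. The helper name `with_trigger` is hypothetical, and placing the phrase at the very start of the prompt is an assumption based on the example prompt in this card's gallery.

```python
# Trigger phrase taken from this model card.
TRIGGER = "an anime illustration of"


def with_trigger(prompt: str) -> str:
    """Return the prompt with the trigger phrase at the front.

    Assumption: the phrase belongs at the start of the prompt, as in the
    gallery example. Prompts that already start with it are left unchanged.
    """
    text = prompt.strip()
    if text.lower().startswith(TRIGGER):
        return text
    return f"{TRIGGER} {text}"


print(with_trigger("kitsune, girl, blue eyes, braided hair"))
```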


	## Resolutions

	Use one of the following resolutions for best results:
	```
	(768, 768)
	(672, 864), (864, 672)
	(608, 960), (960, 608)
	(544, 1088), (1088, 544)
	```
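
If your target size is not in the list, one option is to snap it to the listed resolution with the closest aspect ratio. The sketch below is only an illustration: the helper name `closest_resolution` is hypothetical, and it assumes the tuples are `(width, height)`, which the card does not state.

```python
import math

# Resolutions listed above. Assumption: each tuple is (width, height).
RESOLUTIONS = [
    (768, 768),
    (672, 864), (864, 672),
    (608, 960), (960, 608),
    (544, 1088), (1088, 544),
]


def closest_resolution(width: int, height: int) -> tuple[int, int]:
    """Pick the listed resolution whose aspect ratio is nearest to width/height.

    Ratios are compared in log space so that portrait and landscape
    deviations are treated symmetrically.
    """
    target = math.log(width / height)
    return min(
        RESOLUTIONS,
        key=lambda wh: abs(math.log(wh[0] / wh[1]) - target),
    )


print(closest_resolution(800, 600))
print(closest_resolution(600, 800))
```

All seven listed sizes cover roughly the same pixel area (about 580k to 592k pixels), so switching between them changes the aspect ratio rather than the amount of detail.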

	## Training


	![image/png](https://cdn-uploads.huggingface.co/production/uploads/636982a164aad59d4d42714b/oMsfEYPYbWLyK6mihIkWm.png)

	The model was trained on a tag-balanced dataset of 2k of the best pixiv illustrations, at a resolution of 768, for 856 effective epochs (214 epochs * 4 repeats per epoch).


	Training took about 3 days on an 8 x H100 cluster. The loss was still decreasing steadily when training ended, so further training could be beneficial.
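
For a rough sense of scale, the figures above can be multiplied out. This is back-of-the-envelope arithmetic only: it assumes "2k" means exactly 2,000 images and "about 3 days" means exactly 72 hours, neither of which the card states precisely.

```python
# Figures quoted in the Training section.
EPOCHS = 214
REPEATS_PER_EPOCH = 4
NUM_GPUS = 8

# Assumptions: "2k" read as exactly 2000 images, "about 3 days" as 72 hours.
DATASET_SIZE = 2_000
TRAINING_HOURS = 3 * 24

effective_epochs = EPOCHS * REPEATS_PER_EPOCH    # passes over the dataset
samples_seen = effective_epochs * DATASET_SIZE   # image presentations in total
gpu_hours = NUM_GPUS * TRAINING_HOURS            # H100-hours consumed

print(f"effective epochs: {effective_epochs}")
print(f"samples seen:     {samples_seen:,}")
print(f"GPU-hours:        {gpu_hours}")
```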




	## Download model

	Weights for this model are available in Safetensors format.

	[Download](/trojblue/HunyuanVideo-lora-AnimeStills/tree/main/epoch214) them in the Files & versions tab.
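
As an alternative to the web interface, the same folder can be fetched from a script. The sketch below assumes the `huggingface_hub` package is installed; the helper name `download_lora` and the `local_dir` value are hypothetical. It downloads everything under `epoch214/` rather than naming a specific weights file, since the file name is not given here.

```python
REPO_ID = "trojblue/HunyuanVideo-lora-AnimeStills"
SUBFOLDER_PATTERN = "epoch214/*"  # folder linked in the Download section


def download_lora(local_dir: str = "loras/animestills") -> str:
    """Download the epoch214 folder and return the local snapshot path.

    `local_dir` is a hypothetical destination; point it at your LoRA folder.
    """
    # Imported lazily so this file can be imported without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=SUBFOLDER_PATTERN,
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(download_lora())
```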


	## Limitations

	Outputs may be deformed, fail to follow the prompt, drift toward a realistic style, or contain NSFW content, due to the limited size of the dataset and the limitations of LoRA models.