turkeyju committed on
Commit
2d7081c
·
verified ·
1 Parent(s): 49f217d

Push model using huggingface_hub.

Browse files
Files changed (2) hide show
  1. README.md +3 -3
  2. config.json +42 -0
README.md CHANGED
@@ -6,7 +6,7 @@ tags:
6
  - pytorch_model_hub_mixin
7
  - text-to-image-generation
8
  ---
 
9
  This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
10
- - Project page: https://tacju.github.io/projects/maskgen.html
11
- - arXiv: https://arxiv.org/abs/2501.07730
12
- - Library: https://github.com/bytedance/1d-tokenizer
 
6
  - pytorch_model_hub_mixin
7
  - text-to-image-generation
8
  ---
9
+
10
  This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
11
+ - Library: https://github.com/bytedance/1d-tokenizer
12
+ - Docs: [More Information Needed]
 
config.json ADDED
@@ -0,0 +1,42 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "experiment": {
3
+ "tokenizer_checkpoint": "tatitok_bl32_vae.bin",
4
+ "generator_checkpoint": "maskgen_kl_xl.bin"
5
+ },
6
+ "model": {
7
+ "vq_model": {
8
+ "quantize_mode": "vae",
9
+ "token_size": 16,
10
+ "vit_enc_model_size": "base",
11
+ "vit_dec_model_size": "large",
12
+ "vit_enc_patch_size": 16,
13
+ "vit_dec_patch_size": 16,
14
+ "num_latent_tokens": 32,
15
+ "scale_factor": 0.7525,
16
+ "finetune_decoder": false,
17
+ "is_legacy": false
18
+ },
19
+ "maskgen": {
20
+ "decoder_embed_dim": 1280,
21
+ "decoder_depth": 20,
22
+ "decoder_num_heads": 16,
23
+ "micro_condition": true,
24
+ "micro_condition_embed_dim": 256,
25
+ "text_drop_prob": 0.1,
26
+ "cfg": 3.0,
27
+ "cfg_schedule": "linear",
28
+ "num_iter": 32,
29
+ "temperature": 1.0,
30
+ "sample_aesthetic_score": 6.5
31
+ }
32
+ },
33
+ "losses": {
34
+ "diffloss_d": 8,
35
+ "diffloss_w": 1280
36
+ },
37
+ "dataset": {
38
+ "preprocessing": {
39
+ "crop_size": 256
40
+ }
41
+ }
42
+ }