---
license: mit
datasets:
- ILSVRC/imagenet-1k
model-index:
- name: Taming-VQGAN
  results:
  - task:
      type: image-generation
    dataset:
      name: ILSVRC/imagenet-1k
      type: ILSVRC/imagenet-1k
    metrics:
    - name: rFID
      type: rFID
      value: 7.96
    - name: InceptionScore
      type: InceptionScore
      value: 115.9
    - name: LPIPS
      type: LPIPS
      value: 0.306
    - name: PSNR
      type: PSNR
      value: 20.2
    - name: SSIM
      type: SSIM
      value: 0.52
    - name: CodebookUsage
      type: CodebookUsage
      value: 0.445
---

This model is the Taming VQGAN tokenizer with a 10-bit vocabulary (1,024 codebook entries), converted into a format compatible with the MaskBit codebase. It uses a downsampling factor of 16 and was trained on ImageNet at a resolution of 256×256.

You can find more details on the VQGAN in the original [repository](https://github.com/CompVis/taming-transformers) or [paper](https://arxiv.org/abs/2012.09841).

All credits for this model belong to Patrick Esser, Robin Rombach and Björn Ommer.
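
As a quick sanity check of these hyperparameters, the sketch below (plain Python, no dependencies) derives the codebook size and token-grid shape implied by the 10-bit vocabulary, downsampling factor of 16, and 256×256 inputs stated above; it is illustrative arithmetic, not part of the MaskBit codebase.

```python
# Minimal sketch: tokenizer dimensions implied by the configuration in this card.
vocab_bits = 10
codebook_size = 2 ** vocab_bits                        # 1024 codebook entries
image_resolution = 256
downsampling_factor = 16
grid_size = image_resolution // downsampling_factor    # 16
tokens_per_image = grid_size * grid_size               # 256 tokens per 256x256 image

print(f"codebook size: {codebook_size}, "
      f"token grid: {grid_size}x{grid_size}, "
      f"tokens per image: {tokens_per_image}")
```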