---
license: mit
datasets:
- ILSVRC/imagenet-1k
model-index:
- name: Taming-VQGAN
  results:
  - task:
      type: image-generation
    dataset:
      name: ILSVRC/imagenet-1k
      type: ILSVRC/imagenet-1k
    metrics:
    - name: rFID
      type: rFID
      value: 7.96
    - name: InceptionScore
      type: InceptionScore
      value: 115.9
    - name: LPIPS
      type: LPIPS
      value: 0.306
    - name: PSNR
      type: PSNR
      value: 20.2
    - name: SSIM
      type: SSIM
      value: 0.52
    - name: CodebookUsage
      type: CodebookUsage
      value: 0.445
---

This model is the Taming VQGAN tokenizer with a 10-bit vocabulary (1,024 codebook entries), converted into a format compatible with the MaskBit codebase. It uses a downsampling factor of 16 and was trained on ImageNet at a resolution of 256×256.

You can find more details on the VQGAN in the original [repository](https://github.com/CompVis/taming-transformers) or [paper](https://arxiv.org/abs/2012.09841).

All credits for this model belong to Patrick Esser, Robin Rombach and Björn Ommer.
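
As a quick sanity check of these hyperparameters, the sketch below (plain Python, no dependencies) derives the codebook size and token-grid shape implied by the 10-bit vocabulary, downsampling factor of 16, and 256×256 inputs stated above; it is illustrative arithmetic, not part of the MaskBit codebase.

```python
# Minimal sketch: tokenizer dimensions implied by the configuration in this card.
vocab_bits = 10
codebook_size = 2 ** vocab_bits                        # 1024 codebook entries
image_resolution = 256
downsampling_factor = 16
grid_size = image_resolution // downsampling_factor    # 16
tokens_per_image = grid_size * grid_size               # 256 tokens per 256x256 image

print(f"codebook size: {codebook_size}, "
      f"token grid: {grid_size}x{grid_size}, "
      f"tokens per image: {tokens_per_image}")
```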