ViTConfig

class lib.model.networks.clip.ViTConfig(embed_dim: int, resolution: int, layer_conf: int | tuple[int, int, int, int], width: int, patch: int, git_id: int = 0)

Bases: object

Configuration settings for ViT

Parameters:
  • embed_dim (int) – Dimensionality of the final shared embedding space

  • resolution (int) – Spatial resolution of the input images

  • layer_conf (int | tuple[int, int, int, int]) – Number of layers in the visual encoder as a single integer, or a tuple of four layer configurations for a custom ResNet visual encoder

  • width (int) – Width of the visual encoder layers

  • patch (int) – Size of the patches to be extracted from the images. Only used for the visual encoder.

  • git_id (int, optional) – The ID of the model weights file stored in the deepfakes_models repo, if one exists. Default: 0
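
Example

The sketch below shows how the parameters above fit together. It does not import the library: the ViTConfig defined here is a stand-in dataclass that only mirrors the documented signature, the numeric values are illustrative rather than taken from any shipped configuration, and uses_resnet is a hypothetical helper introduced for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ViTConfig:
    """Stand-in mirroring the documented signature (not the library class)."""
    embed_dim: int
    resolution: int
    layer_conf: int | tuple[int, int, int, int]
    width: int
    patch: int
    git_id: int = 0


# Transformer-style visual encoder: layer_conf is a single layer count.
# All values are illustrative assumptions.
vit = ViTConfig(embed_dim=512, resolution=224, layer_conf=12, width=768, patch=32)

# ResNet-style visual encoder: layer_conf is a 4-tuple of layer configurations.
# The patch value here is a placeholder chosen for the example.
resnet = ViTConfig(embed_dim=1024, resolution=224, layer_conf=(3, 4, 6, 3), width=64, patch=0)


def uses_resnet(config: ViTConfig) -> bool:
    """Hypothetical helper: treat a tuple layer_conf as the ResNet variant."""
    return isinstance(config.layer_conf, tuple)
```

Because git_id is the only parameter with a default, both instances above end up with git_id equal to 0, while every other field must be passed explicitly.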

Attributes Summary

git_id

Attributes Documentation

git_id: int = 0