ViTConfig

class lib.model.networks.clip.ViTConfig(embed_dim: int, resolution: int, layer_conf: int | tuple[int, int, int, int], width: int, patch: int, git_id: int = 0)

Bases: object

Configuration settings for ViT

Parameters:

embed_dim (int) – Dimensionality of the final shared embedding space
resolution (int) – Spatial resolution of the input images
layer_conf (int | tuple[int, int, int, int]) – Number of layers in the visual encoder, or a tuple of layer configurations for a custom ResNet visual encoder
width (int) – Width of the visual encoder layers
patch (int) – Size of the patches to be extracted from the images. Only used for the visual encoder.
git_id (int, optional) – The ID of the model weights file stored in the deepfakes_models repo, if one exists. Default: 0
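
Example

The parameters above can be illustrated with a minimal sketch. It does not import the real class: ViTConfigSketch is a hypothetical stand-in that only mirrors the documented signature, and every value shown is illustrative rather than taken from a shipped model definition. Passing patch=0 for the ResNet-style configuration is also an assumption, since the documentation does not state what value to use when a tuple is given for layer_conf.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class ViTConfigSketch:
    """Hypothetical stand-in that mirrors the documented ViTConfig signature."""
    embed_dim: int
    resolution: int
    layer_conf: Union[int, Tuple[int, int, int, int]]
    width: int
    patch: int
    git_id: int = 0  # documented default


# Single integer layer count, with a patch size (illustrative values).
vit = ViTConfigSketch(embed_dim=512, resolution=224, layer_conf=12,
                      width=768, patch=32)

# Four-element tuple of layer configurations for a custom ResNet encoder.
# patch=0 is an assumption, not something the documentation specifies.
resnet = ViTConfigSketch(embed_dim=1024, resolution=224,
                         layer_conf=(3, 4, 6, 3), width=64, patch=0)

print(vit)
print(resnet)
```

Because git_id is the only parameter with a default, it can be omitted whenever no weights file ID applies.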

Attributes Summary

git_id

Attributes Documentation

git_id: int = 0