ViTConfig
- class lib.model.networks.clip.ViTConfig(embed_dim: int, resolution: int, layer_conf: int | tuple[int, int, int, int], width: int, patch: int, git_id: int = 0)
Bases: object
Configuration settings for ViT.
- Parameters:
embed_dim (int) – Dimensionality of the final shared embedding space
resolution (int) – Spatial resolution of the input images
layer_conf (tuple[int, int, int, int] | int) – Number of layers in the visual encoder, or a tuple of layer configurations for a custom ResNet visual encoder
width (int) – Width of the visual encoder layers
patch (int) – Size of the patches to be extracted from the images. Only used by the visual transformer encoder.
git_id (int, optional) – The ID of the model weights file stored in the deepfakes_models repository, if one exists. Default: 0
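The resolution and patch parameters together fix how many tokens a transformer encoder sees. The sketch below assumes the encoder follows the standard CLIP ViT design: the image is cut into non-overlapping patch × patch tiles and one class token is prepended. This page does not confirm that design for this class. The function name vit_token_count is hypothetical. Rejecting a resolution that is not divisible by patch is a choice made by this sketch, not documented behaviour.

```python
def vit_token_count(resolution: int, patch: int) -> int:
    """Hypothetical helper: token count for a square image in a standard ViT.

    Assumes non-overlapping patch x patch tiles plus one prepended class token.
    """
    if patch <= 0 or resolution % patch != 0:
        # This sketch refuses grids that do not tile the image exactly
        raise ValueError("resolution must be a positive multiple of patch")
    grid = resolution // patch  # number of patches along each side
    return grid * grid + 1  # patch tokens plus the class token


print(vit_token_count(224, 32))  # 7 * 7 + 1 = 50
print(vit_token_count(224, 16))  # 14 * 14 + 1 = 197
```

Halving patch quadruples the number of patch tokens. Because self-attention cost grows quadratically with sequence length, the patch size is the main compute knob for a given resolution.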
Attributes Summary
git_id
Attributes Documentation
- git_id: int = 0
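The following is a minimal, hypothetical stand-in that mirrors the documented signature. It is useful for checking configurations without importing the library. It assumes ViTConfig behaves like a dataclass: the `git_id: int = 0` attribute entry suggests this, but the page does not state it. The class name ViTConfigSketch and the uses_resnet property are illustrative inventions. The rule it encodes (a tuple layer_conf selects the ResNet encoder, an int selects the transformer) comes from the layer_conf description above. The example values resemble the published CLIP ViT-B/32 and RN50 settings. The patch=0 placeholder for the ResNet case is an assumption, since patch is not used there.

```python
from __future__ import annotations  # allows the int | tuple annotation on older Pythons

from dataclasses import dataclass


@dataclass
class ViTConfigSketch:
    """Hypothetical mirror of the documented ViTConfig fields (not the real class)."""
    embed_dim: int  # size of the shared embedding space
    resolution: int  # input image height/width in pixels
    layer_conf: int | tuple[int, int, int, int]  # layer count, or ResNet stage layout
    width: int  # width of the visual encoder layers
    patch: int  # patch size, only meaningful for the transformer encoder
    git_id: int = 0  # ID of the weights file in deepfakes_models, 0 by default

    @property
    def uses_resnet(self) -> bool:
        # Per the layer_conf description: a tuple means a custom ResNet encoder
        return isinstance(self.layer_conf, tuple)


vit = ViTConfigSketch(embed_dim=512, resolution=224, layer_conf=12, width=768, patch=32)
rn = ViTConfigSketch(embed_dim=1024, resolution=224, layer_conf=(3, 4, 6, 3), width=64,
                     patch=0)  # placeholder: patch assumed unused by the ResNet encoder
print(vit.uses_resnet, rn.uses_resnet, vit.git_id)
```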