VitNet
- class eyefeatures.deep.models.VitNet(CNN, RNN, fusion_mode='concat', activation=None, embed_dim=32)
Bases: Module
Parent class for a vision-and-text network that fuses CNN-based and RNN-based representations by concatenation or addition.
- Parameters:
CNN – (nn.Module) CNN backbone for processing image data.
RNN – (nn.Module) RNN backbone for processing sequence data.
fusion_mode – (str, optional) Fusion mode (‘concat’ or ‘add’). Default is ‘concat’.
activation – (nn.Module, optional) Activation function applied after fusion. Default is None.
embed_dim – (int, optional) Embedding dimension for the projected features. Default is 32.
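The two fusion modes described above can be illustrated with a minimal PyTorch sketch. This is not the eyefeatures implementation: `FusionSketch`, its `cnn_dim` and `rnn_dim` arguments, and the `forward(images, sequences)` signature are hypothetical names introduced for illustration, and the RNN is assumed to return `(output, h_n)` as `nn.GRU` does. It shows one plausible reading of the parameters: each branch is projected to `embed_dim`, then the projections are either concatenated (output width `2 * embed_dim`) or summed (output width `embed_dim`).

```python
import torch
import torch.nn as nn


class FusionSketch(nn.Module):
    """Hypothetical stand-in showing 'concat' vs 'add' fusion (not the library code)."""

    def __init__(self, cnn, rnn, cnn_dim, rnn_dim,
                 fusion_mode="concat", activation=None, embed_dim=32):
        super().__init__()
        if fusion_mode not in ("concat", "add"):
            raise ValueError("fusion_mode must be 'concat' or 'add'")
        self.cnn = cnn
        self.rnn = rnn
        self.fusion_mode = fusion_mode
        self.activation = activation
        # Assumption: both branches are projected to a shared embed_dim,
        # which is what makes element-wise addition well defined.
        self.cnn_proj = nn.Linear(cnn_dim, embed_dim)
        self.rnn_proj = nn.Linear(rnn_dim, embed_dim)

    def forward(self, images, sequences):
        img_feat = self.cnn_proj(self.cnn(images))   # (batch, embed_dim)
        _, h_n = self.rnn(sequences)                 # h_n: (num_layers, batch, rnn_dim)
        seq_feat = self.rnn_proj(h_n[-1])            # (batch, embed_dim)
        if self.fusion_mode == "concat":
            fused = torch.cat([img_feat, seq_feat], dim=1)  # (batch, 2 * embed_dim)
        else:
            fused = img_feat + seq_feat                     # (batch, embed_dim)
        if self.activation is not None:
            fused = self.activation(fused)
        return fused


# Toy backbones: a tiny CNN producing 4 features and a GRU with hidden size 8.
cnn = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
rnn = nn.GRU(input_size=2, hidden_size=8, batch_first=True)

images = torch.randn(5, 1, 16, 16)   # batch of 5 single-channel images
sequences = torch.randn(5, 10, 2)    # batch of 5 sequences, 10 steps, 2 features

concat_model = FusionSketch(cnn, rnn, cnn_dim=4, rnn_dim=8, fusion_mode="concat")
add_model = FusionSketch(cnn, rnn, cnn_dim=4, rnn_dim=8,
                         fusion_mode="add", activation=nn.ReLU())

fused_concat = concat_model(images, sequences)
fused_add = add_model(images, sequences)
print(fused_concat.shape, fused_add.shape)
```

Under these assumptions, 'concat' keeps the two modalities in separate halves of the output at the cost of doubling its width, while 'add' keeps the width at `embed_dim` but requires both projections to share that dimension.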