In the first pretrained stage, the model is trained using image-text pairs from preprocessed mimic dataset. To launch the first stage training, run the following command. In our experiments, we use 4 ...
Some results have been hidden because they may be inaccessible to you