It has been shown that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on the dataset of https://k2spiceshop.com/product/liquid-k2-on-paper-online/
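
The caption-matching task described above can be illustrated with a small sketch. The text does not specify a loss function, so the symmetric cross-entropy over a similarity matrix used here is an assumption, as are the function names, the toy embeddings, and the temperature value of 0.07. In practice the embeddings would come from trained image and text encoders, which are not shown.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_row(logits, target):
    """Negative log-softmax probability of the target index (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def similarity_logits(image_embs, text_embs, temperature):
    """Pairwise cosine similarities, scaled by an assumed temperature."""
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    return [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]


def caption_matching_loss(image_embs, text_embs, temperature=0.07):
    """Hypothetical symmetric loss: image i should pick caption i, and vice versa."""
    logits = similarity_logits(image_embs, text_embs, temperature)
    n = len(logits)
    # Image -> caption direction: each row is a classification over captions.
    loss_i2t = sum(cross_entropy_row(logits[i], i) for i in range(n)) / n
    # Caption -> image direction: each column is a classification over images.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = sum(cross_entropy_row(columns[j], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


def predict_caption(image_emb, text_embs):
    """Index of the caption whose embedding is most similar to the image."""
    scores = similarity_logits([image_emb], text_embs, temperature=1.0)[0]
    return max(range(len(scores)), key=lambda j: scores[j])


# Toy batch: pair i of image and caption share the same direction.
toy_images = [[1.0, 0.0], [0.0, 1.0]]
toy_captions = [[2.0, 0.0], [0.0, 3.0]]
matched_loss = caption_matching_loss(toy_images, toy_captions)
shuffled_loss = caption_matching_loss(toy_images, toy_captions[::-1])
```

With correctly paired embeddings the loss is close to zero, and it grows when the captions are shuffled, which is the signal such a training objective would use to pull matching image and caption embeddings together.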