MaMMUT: A simple vision-encoder text-decoder architecture for ...
