This is pretty cool.
Gemini - Google DeepMind
Gemini is built from the ground up for multimodality — reasoning seamlessly across image, video, audio, and code.
deepmind.google