Forum Moderators: Robert Charlton & goodroi
MUM not only understands language, but also generates it. It’s trained across 75 different languages and many different tasks at once, allowing it to develop a more comprehensive understanding of information and world knowledge than previous models. And MUM is multimodal, so it understands information across text and images and, in the future, can expand to more modalities like video and audio.
MUM is multimodal, which means it can understand information from different formats like webpages, pictures and more, simultaneously. Eventually, you might be able to take a photo of your hiking boots and ask, “can I use these to hike Mt. Fuji?” MUM would understand the image and connect it with your question to let you know your boots would work just fine. It could then point you to a blog with a list of recommended gear.