Deciphering animal intent presents a fundamental challenge in computational ethology, primarily due to semantic aliasing—the phenomenon where identical external signals (e.g., a cat's purr) correspond to radically different internal states depending on physiological context. Existing Multimodal Large Language Models (MLLMs) are often blind to high-frequency biological time-series data. This limitation restricts them to superficial behavioral pattern matching, hindering their ability to perform genuine latent-state reasoning.
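To make the aliasing problem concrete, the toy Python sketch below shows how an audio-only matcher necessarily conflates two internal states that a physiologically conditioned classifier can separate. All names, values, and the heart-rate threshold are illustrative assumptions, not measurements or methods from the paper.

```python
# Toy sketch of semantic aliasing: two observations share an identical
# acoustic signal, but their physiological context implies opposite
# internal states, so audio-only pattern matching must conflate them.
from dataclasses import dataclass

@dataclass
class Observation:
    audio_label: str       # external signal (identical in both cases)
    heart_rate_bpm: float  # physiological time-series summary (hypothetical)

def audio_only_intent(obs: Observation) -> str:
    # Matching on the external signal alone cannot separate the two states.
    return "contentment" if obs.audio_label == "purr" else "unknown"

def grounded_intent(obs: Observation) -> str:
    # Conditioning on physiology disambiguates the same purr.
    # The 160 bpm cutoff is an illustrative threshold, not from the paper.
    if obs.audio_label == "purr":
        return "distress" if obs.heart_rate_bpm > 160 else "contentment"
    return "unknown"

calm = Observation(audio_label="purr", heart_rate_bpm=130)
stressed = Observation(audio_label="purr", heart_rate_bpm=190)

assert audio_only_intent(calm) == audio_only_intent(stressed)  # aliased
assert grounded_intent(calm) != grounded_intent(stressed)      # disambiguated
```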
To bridge this critical gap, a research team has introduced Meow-Omni 1, the first open-source, quad-modal MLLM purpose-built for computational ethology. Meow-Omni 1 natively fuses video, audio, and physiological time-series streams with textual reasoning. Through targeted architectural adaptation, it integrates specialized scientific encoders into a unified backbone and formalizes intent inference via physiologically grounded cross-modal alignment.
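As a rough illustration of such a quad-modal design, the PyTorch sketch below projects four modality streams into a shared token space for a unified backbone. The module choices, dimensions, and concatenation-style fusion are assumptions for exposition, not the paper's actual architecture, and the physiologically grounded alignment objective is omitted.

```python
# Minimal sketch of quad-modal fusion: per-modality encoders project
# video, audio, physiological, and text features into one token sequence.
# All layer choices and sizes are hypothetical.
import torch
import torch.nn as nn

class QuadModalFusion(nn.Module):
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.video_proj = nn.Linear(768, d_model)   # e.g. pre-extracted video features
        self.audio_proj = nn.Linear(512, d_model)   # e.g. pre-extracted audio features
        # A 1-D CNN stands in for a specialized scientific encoder over
        # high-frequency physiological time series (hypothetical choice).
        self.physio_enc = nn.Sequential(
            nn.Conv1d(1, d_model, kernel_size=9, stride=4),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
        )
        self.text_proj = nn.Linear(1024, d_model)   # LLM backbone hidden states

    def forward(self, video, audio, physio, text):
        # Stack one token per modality for the unified backbone to attend over.
        return torch.stack([
            self.video_proj(video),
            self.audio_proj(audio),
            self.physio_enc(physio),
            self.text_proj(text),
        ], dim=1)                                    # (batch, 4, d_model)

model = QuadModalFusion()
fused = model(
    video=torch.randn(2, 768),
    audio=torch.randn(2, 512),
    physio=torch.randn(2, 1, 1000),  # raw single-channel physiological signal
    text=torch.randn(2, 1024),
)
print(fused.shape)  # torch.Size([2, 4, 512])
```

In a full system, these modality tokens would feed the language-model backbone's input sequence, and a contrastive or alignment loss could tie physiological embeddings to co-occurring audio and video; the paper's specific alignment formulation is not reproduced here.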
Evaluated on MeowBench, a novel, expert-verified quad-modal benchmark, Meow-Omni 1 achieved a state-of-the-art intent-recognition accuracy of 71.16%, substantially outperforming leading vision-language and omni-modal baselines. The complete open-source pipeline, including model weights, the training framework, and the Meow-10K dataset, has been released. This initiative aims to establish a scalable paradigm for inter-species intent understanding and to advance foundation models toward real-world applications such as veterinary diagnostics and wildlife conservation.