Twelve Labs Unveils 'Marengo 3.0' Video Understanding Model, Doubling Indexing Speed

Amidst the deluge of video data, a next-generation AI capable of enabling 'understanding' in its true sense has finally emerged. Twelve Labs, a leading player in video understanding-based multimodal AI, has officially launched its innovativ...

admin

Dec 2, 2025 - 00:00

0 3.7k

Twelve Labs Unveils 'Marengo 3.0' Video Understanding Model, Doubling Indexing Speed

Amidst the deluge of video data, a next-generation AI capable of enabling 'understanding' in its true sense has finally emerged. Twelve Labs, a leading player in video understanding-based multimodal AI, has officially launched its innovative video AI model 'Marengo 3.0,' opening a new horizon in video analysis. Beyond simple video analysis, Marengo 3.0 is a true 'video foundation model' that integrally grasps all elements within a video. This model comprehensively analyzes not only the visual information present in a scene but also voice, subtle movements, and even situational context. It accurately tracks how objects, actions, and emotions change over time and boasts astonishing insight that connects and interprets how conversations in a video lead to subsequent actions. Now, the era has arrived where AI 'appreciates' and 'understands' videos. This Marengo 3.0 features powerful functions designed to maximize search convenience and precision. The 'Composite Image Search' function allows users to find desired scenes by simultaneously utilizing images and text keywords, while the 'Proper Noun Search' function enables easy retrieval of all videos featuring a specific person or product by registering that element. It supports 36 languages worldwide, increasing accessibility for global users. Performance has also dramatically improved. According to internal test results, storage costs have been reduced by 50% compared to the previous model, and video indexing speed is twice as fast, giving it an unparalleled advantage in efficiency. Twelve Labs explains that the core of Marengo 3.0 lies in its 'native foundation architecture,' which transcends existing limitations. This goes beyond mere frame-by-frame analysis or the mechanical combination of image-audio models, enabling deep temporal and spatial interpretation of the entire video and a complete grasp of continuity and flow between scenes. It's like a human watching a movie, understanding the narrative and context within the video. Such innovative capabilities hold immense potential across various industries. In sports leagues, it can automatically extract highlight-reel moments of specific players to create highlights, or quickly find specific behavioral patterns of desired individuals in media archives. It can maximize the efficiency of public security CCTV video analysis, and in e-commerce, it can be utilized for precise analysis of the exposure effect of specific products. Jaesung Lee, CEO of Twelve Labs, stated, "Despite video constituting the majority of digital data worldwide, its complexity has historically prevented its full value from being utilized." He emphasized, "Marengo 3.0 will resolve this fundamental challenge and provide a powerful tool for businesses and developers to easily explore and leverage the infinite possibilities of video data." With Marengo 3.0, video data will no longer be mere storage but a treasure trove of valuable insights.