Alibaba's Qianwen App has been upgraded with the official launch of Wan2.7, its latest AI model. Wan2.7 is positioned as a versatile AI creation platform: it substantially improves video and image generation and brings professional-grade content creation to mobile devices.
Wan2.7's most visible gains are in video generation. From a simple text prompt, it can produce a coherent video with vivid character expressions and fluid camera transitions. In group shots it renders each individual distinctly, achieving a 'thousand faces' effect, and it automatically adds matching sound effects. The model also supports image-to-video generation: uploading an image together with audio is enough to produce dynamic content such as a saxophone performance. Two editing features complete the video toolkit. With video continuation, users can extend an existing video and add new elements, such as a tail frame (a closing frame), so that the transition is seamless. With localized video editing, users can replace specific objects in a video with content from other images while preserving fine detail, such as reflections on a plate.
Beyond its generative power, Wan2.7 introduces an action imitation feature. Users can extract character movements from a video and apply them to a character in an image, precisely replicating gestures and body language. This significantly streamlines the motion capture and animation production process.
In image generation, Wan2.7-Image extends the 'thousand faces' capability with fine-grained control over facial details. Users can customize features such as bone structure, eyes, and skin texture. For example, given a detailed text description, the model can accurately recreate the likeness of Professor Snape from 'Harry Potter,' with pores and wrinkles clearly visible and a realism that surpasses the outputs of Gemini and ChatGPT for similar prompts. Wan2.7-Image also provides a color palette feature that gives precise control over an eight-color HEX palette, so that the dominant color tones come out as specified, as demonstrated by the blue hues of a cyberpunk street scene. In addition, the model accepts ultra-long text input of up to 3K tokens and renders bilingual (Chinese and English) text accurately and without distortion, enough to generate the equivalent of a full A4 page of content.
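To make the eight-color HEX palette idea concrete, here is a minimal Python sketch of how such a palette could be checked and converted to RGB values before being passed to an image tool. It does not reflect the Qianwen App's actual interface, which the article does not describe: the function names, the requirement of exactly eight colors, and the sample blue-leaning palette are all assumptions made for illustration.

```python
import re

# Assumption: a palette is exactly eight colors, each written as "#RRGGBB".
PALETTE_SIZE = 8
HEX_PATTERN = re.compile(r"#[0-9A-Fa-f]{6}")


def hex_to_rgb(code):
    """Convert one "#RRGGBB" string into an (r, g, b) tuple of ints (0-255)."""
    if not HEX_PATTERN.fullmatch(code):
        raise ValueError(f"not a 6-digit HEX color: {code!r}")
    return tuple(int(code[i:i + 2], 16) for i in (1, 3, 5))


def parse_palette(codes):
    """Validate a list of HEX codes and return the matching RGB tuples."""
    if len(codes) != PALETTE_SIZE:
        raise ValueError(f"expected {PALETTE_SIZE} colors, got {len(codes)}")
    return [hex_to_rgb(code) for code in codes]


# Hypothetical palette for a cyberpunk street scene dominated by blue hues.
EXAMPLE_PALETTE = [
    "#0B1D3A", "#123C69", "#1F6FB2", "#3FA9F5",
    "#7FD4FF", "#C724B1", "#2E2E38", "#F2F5F7",
]

if __name__ == "__main__":
    for code, rgb in zip(EXAMPLE_PALETTE, parse_palette(EXAMPLE_PALETTE)):
        print(code, rgb)
```

Validating the palette up front means a malformed code or a wrong color count is caught before any generation request is made, rather than showing up later as unexpected tones in the image.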
From a user experience perspective, the Qianwen App's interface retains its familiar design, but the new video editing, continuation, and action imitation capabilities, combined with Pro-level model performance, make the creation process more convenient and efficient. Practical tests indicate that Wan2.7 makes content creation noticeably more playable and usable, whether for everyday creative expression, professional graphic design, or even film production (e.g., AI actors and AI short dramas). In blind human preference tests, Wan2.7-Image ranked first among Chinese domestic generative models, surpassing GPT Image 1.5 and approaching Nano Banana Pro.