Following its official launch, GPT Image 2 has quickly established itself as the leader in AI image generation, outscoring its competitors by a wide margin with a benchmark score of 241. Beyond the performance metrics, the research and development team behind the model is itself noteworthy.
The core OpenAI team responsible for GPT Image 2 comprises only 13 individuals, with Chinese researchers making up a substantial portion. A closer look at their backgrounds reveals that many had prior connections in Chinese universities, laboratories, or even research summer camps before joining OpenAI, highlighting a tightly-knit "acquaintance network" and academic mentorship structure within the AI community.
Boyuan Chen stands out as a pivotal member of the GPT Image 2 team, and his trajectory exemplifies the "pass-on-and-lead" mentorship common in Chinese academia. In high school he attended a research summer camp in Wuxi where, despite having no programming experience, he met Xia Fei, who would later become a senior researcher at Google DeepMind. Xia Fei introduced him to deep learning and became his guide into the field of AI.
The two stayed in close contact. Chen studied Computer Science and Mathematics at UC Berkeley, joining the EECS honors program with a 3.96 GPA and conducting research under Pieter Abbeel; he also founded a robotics education company in 2017. During Chen's challenging first year as a PhD student at MIT, Xia Fei provided crucial support, helping him publish the influential paper NLMap, and twice invited him to intern at DeepMind. During his 2023 internship, Chen led the development of a data synthesis pipeline for multimodal large language models, and his refined instruction-tuning techniques were incorporated into the development of Gemini 2.0. With this experience behind him, Chen joined OpenAI in June 2025, where he is also a member of the Sora video generation team.
At MIT, Boyuan Chen worked on "world models" research under Assistant Professor Vincent Sitzmann at the Computer Science and Artificial Intelligence Laboratory (CSAIL). Kiwhan Song was a fellow PhD student in the same lab, also mentored by Sitzmann. The lab's core focus is enabling AI to predict changes in the physical world through mental simulation rather than mere pixel imitation, an approach that likely influenced GPT Image 2's technical direction directly. During their PhDs, Chen and Song collaborated on papers such as "History-Guided Video Diffusion" and "Large Video Planner," exploring how to combine diffusion models with sequential generation so that a model establishes temporal and spatial causal logic before generating content. Notably, Kiwhan Song is also the creator of the distinctive "long-neck" sticker-style cartoon avatars.
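The core idea behind this line of work, generating each frame conditioned on its temporal past rather than denoising a whole clip at once, can be sketched in a few lines. The toy below is only an illustration of that sampling pattern, not the papers' actual algorithm: the "denoiser" is a hand-written placeholder standing in for a learned video diffusion model, and all sizes and schedules are made up.

```python
# Toy sketch of history-guided sampling: observed history frames are kept
# clean (noise level 0) while future frames start as pure noise and are
# denoised step by step, each conditioned on the frames before it.
import numpy as np

rng = np.random.default_rng(1)
T, D = 6, 4                          # frames and per-frame feature size (toy)
history = rng.normal(size=(2, D))    # two clean, already-observed frames

def denoise_step(frames, noise_level):
    # Placeholder "denoiser": pull each frame toward the mean of its
    # predecessors, scaled by how noisy that frame still is.
    out = frames.copy()
    for t in range(1, len(frames)):
        target = frames[:t].mean(axis=0)
        out[t] = frames[t] + noise_level[t] * (target - frames[t]) * 0.5
    return out

# Initialize: history frames as-is, future frames as pure noise,
# with a per-frame noise level (the key ingredient of this approach).
video = np.concatenate([history, rng.normal(size=(T - 2, D))])
noise = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

for _ in range(20):                  # iterative refinement
    video = denoise_step(video, noise)
    noise = np.maximum(noise - 0.05, 0.0)  # anneal future-frame noise
```

Because the history frames carry noise level 0, they are never overwritten; the future frames converge toward values consistent with that history, which is the causal, past-to-future structure the paragraph describes.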
Beyond these two lab mates, the team includes two other Chinese members with extensive industrial research backgrounds. Jianfeng Wang spent nearly nine years at Microsoft as a principal researcher, focusing on large-scale multimodal representation learning, and collaborated closely with the OpenAI team during DALL-E 3's development. At OpenAI, he is primarily responsible for improving the model's instruction following and understanding of world knowledge. Bing Liang, after more than five years at Google as a senior software engineer, contributed to the core R&D of Imagen 3, the Veo video models, and the Gemini multimodal series; he joined OpenAI last August to focus on image generation research. Beyond their individual capabilities, the two bring years of accumulated engineering experience and lessons learned at competitors, saving the team considerable time and effort.
Weixin Liang and Yuguang Yang form another notable pair on the team: both are graduates of Zhejiang University's Chu Kochen Honors College.
Yuguang Yang's career spans diverse fields. He studied engineering at Chu Kochen Honors College as an undergraduate, then pursued a PhD in Computational Chemical Physics and Machine Learning at Johns Hopkins University. Subsequently, he researched deep learning for speech recognition at Amazon Alexa and then handled query understanding and large-scale retrieval at Microsoft Bing. He also conducted visiting research at Tsinghua University on reinforcement learning algorithms for nanorobots navigating human blood vessels, publishing seven peer-reviewed journal articles during this period. This interdisciplinary expertise was directly evident in GPT Image 2's launch demonstrations.
In contrast, Weixin Liang's path leaned more academic. He pursued his PhD at the Stanford AI Lab (SAIL), collaborating with renowned professors including Christopher Manning, Fei-Fei Li, and James Zou. During an internship at Meta, he authored the paper "Mixture-of-Transformers (MoT)," which introduced a modality-decoupled sparse model architecture: every non-embedding parameter of the Transformer, including the feed-forward networks, attention projection matrices, and layer normalization, is specialized by modality, while self-attention is still computed globally over the full mixed-modal sequence. The approach reduced the computational cost of multimodal pre-training by 66% and was validated at a 30B-parameter scale. By allocating separate weights to each modality during pre-training, MoT addressed the rapid growth in compute required by multimodal models that must handle text and high-resolution images simultaneously. The work was lauded as a "foundational contribution driving the unification of multimodal understanding and generation."
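The decoupling described above can be made concrete with a small sketch. The code below is a toy single-layer illustration of the idea, not the paper's implementation: each modality routes its tokens through its own projection, feed-forward, and normalization weights, while attention scores are computed over all tokens together so the modalities still interact. Dimensions, initialization, and the two-modality setup are all illustrative assumptions.

```python
# Minimal sketch of modality-decoupled ("Mixture-of-Transformers"-style)
# weights: per-modality parameters everywhere except the global attention.
import numpy as np

rng = np.random.default_rng(0)
D, H = 8, 16  # model width and feed-forward width (toy sizes)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

class MoTLayer:
    """One Transformer layer whose non-embedding weights are split by modality."""
    def __init__(self, modalities=("text", "image")):
        # Separate Q/K/V/O projections and FFN weights for each modality.
        self.w = {
            m: {k: rng.normal(0, 0.02, (D, D)) for k in "qkvo"}
               | {"ffn_in": rng.normal(0, 0.02, (D, H)),
                  "ffn_out": rng.normal(0, 0.02, (H, D))}
            for m in modalities
        }

    def __call__(self, x, modality_ids):
        # 1) Modality-specific projections: each token only activates the
        #    weights of its own modality (the "modality-aware sparsity").
        q, k, v = (np.empty_like(x) for _ in range(3))
        for m, w in self.w.items():
            sel = modality_ids == m
            normed = layer_norm(x[sel])
            q[sel], k[sel], v[sel] = normed @ w["q"], normed @ w["k"], normed @ w["v"]
        # 2) Global self-attention over the full mixed-modal sequence,
        #    so text and image tokens still exchange information.
        scores = q @ k.T / np.sqrt(D)
        attn = np.exp(scores - scores.max(-1, keepdims=True))
        attn /= attn.sum(-1, keepdims=True)
        h = attn @ v
        # 3) Modality-specific output projection and feed-forward.
        out = np.empty_like(x)
        for m, w in self.w.items():
            sel = modality_ids == m
            y = x[sel] + h[sel] @ w["o"]
            out[sel] = y + np.maximum(layer_norm(y) @ w["ffn_in"], 0) @ w["ffn_out"]
        return out

# A mixed sequence: 3 text tokens followed by 5 image tokens.
ids = np.array(["text"] * 3 + ["image"] * 5)
x = rng.normal(size=(8, D))
y = MoTLayer()(x, ids)
print(y.shape)  # (8, 8)
```

The design choice this illustrates is the one the paragraph credits with the 66% cost reduction: per-modality weights keep each forward pass sparse, yet the shared attention step preserves cross-modal interaction.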
In recent years, graduates from top Chinese universities like Tsinghua's Yao Class, Zhejiang University's Chu Kochen Honors College, University of Science and Technology of China's Youth Class, and Shanghai Jiao Tong University have become core members of leading international AI labs such as OpenAI, Anthropic, DeepMind, and Meta.
Besides the individuals above, several other core researchers play vital roles. Kenji Hata, who holds an MS in Computer Science from Stanford and previously worked at Google Research, has been involved at OpenAI in the development of models such as 4o image generation (i.e., GPT-Image-1) and Sora 2, making him one of the team's most experienced members in model iteration. Ayaan Haque, a former researcher at Luma AI who helped train the Dream Machine video generation model, brings expertise in handling high-dimensional temporal data to GPT Image 2 and the development of its thinking mode. Dibya Bhattacharjee, with a BS/MS in Computer Science from Yale and nearly five years at Google, joined OpenAI in February 2024 to work on image generation research; he demonstrated the model's multi-specification generation capabilities at the launch event and was key to its "out-of-the-box" output formats. Mengchao Z., who holds a BS from Shanghai Jiao Tong University and an MS from Texas A&M University, has a solid engineering background, having previously led large-scale recommendation system architecture design, and is now responsible for turning the model's technical capabilities into usable product features.
Additionally, the identities of several other team members remain temporarily unverified.