GLM-OCR

by zai-org

About

GLM-OCR is a multimodal OCR model built on the GLM-V encoder-decoder architecture and designed for complex document understanding. It pairs an advanced CogViT visual encoder with a GLM-0.5B language decoder, and is trained with a Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization.

GLM-OCR achieves state-of-the-art performance, scoring 94.62 on OmniDocBench V1.5, and excels at formula recognition, table recognition, and information extraction, with particular optimization for challenging real-world layouts.

With only 0.9B parameters, the model supports efficient deployment via vLLM, SGLang, and Ollama, offering low inference latency and reduced compute cost. This makes it well suited to high-concurrency services and edge deployments. GLM-OCR is fully open source and ships with a comprehensive SDK, making it easy to integrate robust, high-quality OCR into diverse real-world business scenarios.
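To illustrate the vLLM deployment path, here is a minimal sketch of building a request for a vLLM server's OpenAI-compatible chat-completions endpoint. The model id `zai-org/GLM-OCR`, the prompt text, and the endpoint URL are illustrative assumptions, not confirmed by this page; substitute whatever your deployment actually serves.

```python
import base64
import json

def build_ocr_request(image_bytes: bytes, model: str = "zai-org/GLM-OCR") -> dict:
    """Build an OpenAI-style chat-completions payload carrying one image.

    The model id and prompt below are placeholders; replace them with the
    values your own GLM-OCR deployment uses.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image is inlined as a base64 data URL.
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                    {"type": "text",
                     "text": "Extract all text from this document."},
                ],
            }
        ],
    }

# POST this payload to the server, e.g. with requests:
#   requests.post("http://localhost:8000/v1/chat/completions", json=payload)
payload = build_ocr_request(b"\x89PNG\r\n...")
print(json.dumps(payload)[:60])
```

The same payload shape works against any OpenAI-compatible serving stack, which is what makes the vLLM and SGLang routes interchangeable from the client's point of view.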