
magika

by Google
🔓 Open Source, Python, 🌍 Global, Free

About

Magika, developed by Google, is an AI-powered file type detection tool built on deep learning. It uses a compact, heavily optimized model (only a few MB) to identify a file's type in milliseconds, even on a single CPU. Trained on roughly 100 million samples spanning 200+ content types, it reaches about 99% average accuracy and performs especially well on textual formats. Google runs Magika at scale for security and content policy routing across Gmail, Drive, and Safe Browsing, and it is also integrated with industry platforms such as VirusTotal. It is available as a command-line tool, a Python API, and JavaScript and Go bindings.

Features

  • AI-Powered Deep Learning Accuracy: Uses a custom, optimized deep learning model that reaches about 99% average accuracy across 200+ content types.
  • Fast, Efficient Identification: Identifies file types in milliseconds with a compact model of a few MB, even on a single CPU; inference time stays near-constant regardless of file size.
  • Scalable, Versatile Deployment: Processes billions of files per week at Google; available as a CLI, a Python API, and JavaScript and Go bindings, with support for recursive scanning and configurable confidence modes.
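
Two of the features above can be illustrated with a short Python sketch. It is not Magika's code or API. It assumes that near-constant inference time comes from examining only a bounded sample of each file, and that a confidence mode is a score threshold below which a generic label is returned. The names `Mode`, `sample_head_tail`, and `resolve_label`, the block size, and the threshold values are all hypothetical.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical mode names and thresholds, for illustration only.
    HIGH_CONFIDENCE = 0.95
    MEDIUM_CONFIDENCE = 0.50
    BEST_GUESS = 0.0


def sample_head_tail(data: bytes, block: int = 1024) -> tuple[bytes, bytes]:
    """Return at most `block` bytes from the start and from the end of `data`.

    The amount of data handed to a model is then bounded by 2 * block,
    no matter how large the file is.
    """
    if block <= 0:
        raise ValueError("block must be positive")
    return data[:block], data[-block:]


def resolve_label(label: str, score: float, mode: Mode,
                  fallback: str = "unknown") -> str:
    """Keep the predicted label only if its score clears the mode's threshold."""
    return label if score >= mode.value else fallback


# A ~10 MB input still yields two small, fixed-size samples.
big = b"%PDF-1.7\n" + b"x" * 10_000_000 + b"\n%%EOF\n"
head, tail = sample_head_tail(big)
print(len(head), len(tail))

# The same prediction is kept or replaced depending on the chosen mode.
for mode in Mode:
    print(mode.name, resolve_label("pdf", 0.80, mode))
```

Under these assumptions, a stricter mode trades coverage for precision: a prediction with a score of 0.80 is replaced by the fallback label in the strictest mode and kept in the other two.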

Supported Platforms

Linux, macOS