#deep-learning
Ecosystem overview of everything related to deep learning.
Products (2)
Magika, developed by Google, is an AI-powered file type detection tool built on deep learning. It uses a compact, optimized model (only a few MB) that identifies file types within milliseconds, even on a single CPU. Trained on ~100 million samples across 200+ content types, Magika achieves ~99% average accuracy and performs especially well on textual formats. Google deploys it at scale for security and content policy routing across Gmail, Drive, and Safe Browsing, and it also integrates with industry platforms such as VirusTotal. It is available as a command-line tool, a Python API, and JavaScript and Go bindings.
"LLMs-from-scratch" is the official code repository for Sebastian Raschka's book *Build a Large Language Model (From Scratch)*. It guides users through developing, pretraining, and finetuning a GPT-like LLM from the ground up, mirroring the approach of large-scale foundational models. The project implements all code in PyTorch, avoiding external LLM libraries, and includes functionalities for loading larger pretrained models for finetuning. It's designed for educational purposes, focusing on in-depth understanding of LLM mechanics.