ProgramBench

by facebookresearch

🔓 Open Source | Python | 🌍 Global | Free

About

ProgramBench is a benchmark from facebookresearch that evaluates how well large language models (LLMs) can rebuild programs from scratch. Given only a compiled binary and its documentation, an AI agent must architect and implement a complete codebase that reproduces the original program's behavior. The benchmark targets LLM performance on reverse engineering and code generation tasks.
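
The idea of checking that a rebuilt program behaves like the original can be illustrated with a small sketch. This is not ProgramBench's actual harness or API, which this description does not specify. The function names `run_program` and `match_rate`, the choice to compare exit code and standard output, and the Python one-liners standing in for a compiled binary and a rebuilt program are all assumptions made for illustration.

```python
import subprocess
import sys
from typing import Sequence, Tuple


def run_program(cmd: Sequence[str], stdin_text: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Run a command, feed it stdin_text, and return (exit code, stdout)."""
    result = subprocess.run(
        list(cmd),
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout


def match_rate(
    reference_cmd: Sequence[str],
    candidate_cmd: Sequence[str],
    test_inputs: Sequence[str],
) -> float:
    """Fraction of inputs on which both programs agree on exit code and stdout."""
    if not test_inputs:
        return 0.0
    matches = sum(
        run_program(reference_cmd, text) == run_program(candidate_cmd, text)
        for text in test_inputs
    )
    return matches / len(test_inputs)


# Hypothetical stand-ins: in a real setting the reference would be the
# compiled binary and the candidate would be the agent's rebuilt program.
REFERENCE = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper(), end='')"]
CANDIDATE = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper(), end='')"]
BROKEN = [sys.executable, "-c", "import sys; print(sys.stdin.read().lower(), end='')"]

if __name__ == "__main__":
    print(match_rate(REFERENCE, CANDIDATE, ["abc", "Hello\n"]))
    print(match_rate(REFERENCE, BROKEN, ["abc", ""]))
```

Comparing only exit code and standard output is a deliberate simplification. A fuller comparison might also cover standard error, files written, or command-line flags, depending on what the original program's documentation describes.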

Features

Evaluates binary-to-source code reconstruction
Assesses how AI agents architect and implement programs
Provides a standard dataset and leaderboard for performance comparison
Supports quick deployment in Python environments
Focuses on the reverse engineering capabilities of language models

Supported Platforms

desktop
