
ProgramBench

by facebookresearch
Open source · Python · Available globally · Free

About

ProgramBench, developed by facebookresearch, is a benchmark for evaluating how well large language models (LLMs) can rebuild programs from scratch. Given only a compiled binary and its documentation, an AI agent must architect and implement a complete codebase that reproduces the original program's behavior. This makes ProgramBench a useful measure of LLM performance on reverse engineering and code generation tasks.
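To make "reproduces the original program's behavior" concrete, here is a minimal sketch of black-box behavioral comparison. It assumes a program's observable behavior is its exit code and standard output for a given standard input. The names `run`, `compare_behavior`, `pass_rate`, and `CaseResult` are hypothetical. This is not ProgramBench's actual evaluation harness, whose scoring details are not described here.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Outcome of one test input (hypothetical structure for illustration)."""
    stdin: str
    passed: bool
    expected: str
    actual: str


def run(cmd: list[str], stdin: str, timeout: float = 10.0) -> tuple[int, str]:
    """Run a command with the given stdin and return (exit code, stdout)."""
    proc = subprocess.run(cmd, input=stdin, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout


def compare_behavior(reference: list[str], candidate: list[str], inputs: list[str]) -> list[CaseResult]:
    """Run the reference binary and the rebuilt candidate on each input.

    Assumption: a case passes only if exit code and stdout both match exactly.
    """
    results = []
    for stdin in inputs:
        ref_code, ref_out = run(reference, stdin)
        try:
            cand_code, cand_out = run(candidate, stdin)
        except subprocess.TimeoutExpired:
            # A candidate that hangs counts as a failure for this input.
            results.append(CaseResult(stdin, False, ref_out, "<timeout>"))
            continue
        passed = ref_code == cand_code and ref_out == cand_out
        results.append(CaseResult(stdin, passed, ref_out, cand_out))
    return results


def pass_rate(results: list[CaseResult]) -> float:
    """Fraction of inputs on which the candidate matched the reference."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


if __name__ == "__main__":
    # Stand-ins for a compiled binary and its rebuilt version: both upper-case stdin.
    reference = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper(), end='')"]
    candidate = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(pass_rate(compare_behavior(reference, candidate, ["abc", "hello"])))
```

Exact-match comparison is the simplest choice. A real harness might also compare stderr or files the program writes, or normalize nondeterministic output such as timestamps.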

Features

  • Evaluates binary-to-source code reconstruction
  • Assesses how AI agents architect and implement a complete program
  • Provides a standard dataset and a leaderboard for comparing performance
  • Runs in standard Python environments
  • Focuses on the reverse engineering capabilities of language models

Supported Platforms

Desktop