• OpenAI has introduced GPT‑5.1-Codex-Max, a new frontier agentic coding model now available in its Codex developer environment. The release marks a significant step forward in AI-assisted software engineering, offering improved long-horizon reasoning, efficiency, and real-time interactive capabilities. GPT‑5.1-Codex-Max will now replace GPT‑5.1-Codex as the default model across Codex-integrated surfaces.

    The new model is designed to serve as a persistent, high-context software development agent, capable of managing complex refactors, debugging workflows, and project-scale tasks across multiple context windows.

    It comes on the heels of Google releasing its powerful new Gemini 3 Pro model yesterday, yet still outperforms or matches it on key coding benchmarks:

    On SWE-Bench Verified, GPT‑5.1-Codex-Max achieved 77.9% accuracy at extra-high reasoning effort, edging past Gemini 3 Pro’s 76.2%.

    It also led on Terminal-Bench 2.0, with 58.1% accuracy versus Gemini’s 54.2%, and matched Gemini’s score of 2,439 on LiveCodeBench Pro, a competitive coding Elo benchmark.

    When measured against Gemini 3 Pro’s most advanced configuration — its Deep Thinking model — Codex-Max holds a slight edge in agentic coding benchmarks, as well.

    Performance Benchmarks: Incremental Gains Across Key Tasks

    GPT‑5.1-Codex-Max demonstrates measurable improvements over GPT‑5.1-Codex across a range of standard software engineering benchmarks.

    On SWE-Lancer IC SWE, it achieved 79.9% accuracy, a significant increase from GPT‑5.1-Codex’s 66.3%. In SWE-Bench Verified (n=500), it reached 77.9% accuracy at extra-high reasoning effort, outperforming GPT‑5.1-Codex’s 73.7%.

    Performance on Terminal Bench 2.0 (n=89) showed more modest improvements, with GPT‑5.1-Codex-Max achieving 58.1% accuracy compared to 52.8% for GPT‑5.1-Codex.

    All evaluations were run with compaction and extra-high reasoning effort enabled.

    These results indicate that the new model offers a higher ceiling on both benchmarked correctness and real-world usability under extended reasoning loads.

    Technical Architecture: Long-Horizon Reasoning via Compaction

    A major architectural improvement in GPT‑5.1-Codex-Max is its ability to reason effectively over extended input-output sessions using a mechanism called compaction.

    This enables the model to retain key contextual information while discarding irrelevant details as it nears its context window limit — effectively allowing for continuous work across millions of tokens without performance degradation.

    The model has been internally observed to complete tasks lasting more than 24 hours, including multi-step refactors, test-driven iteration, and autonomous debugging.

    Compaction also improves token efficiency. At medium reasoning effort, GPT‑5.1-Codex-Max used approximately 30% fewer thinking tokens than GPT‑5.1-Codex for comparable or better accuracy, which has implications for both cost and latency.

    Platform Integration and Use Cases

    GPT‑5.1-Codex-Max is currently available across multiple Codex-based environments, which refer to OpenAI’s own integrated tools and interfaces built specifically for code-focused AI agents. These include:

    • Codex CLI, OpenAI’s official command-line tool (@openai/codex), where GPT‑5.1-Codex-Max is already live.

    • IDE extensions, likely developed or maintained by OpenAI, though no specific third-party IDE integrations were named.

    • Interactive coding environments, such as those used to demonstrate frontend simulation apps like CartPole or Snell’s Law Explorer.

    • Internal code review tooling, used by OpenAI’s engineering teams.

    For now, GPT‑5.1-Codex-Max is not yet available via public API, though OpenAI states this is coming soon. Users who wish to work with the model in terminal environments today can do so by installing and using the Codex CLI.

    It is not currently confirmed whether or how the model will integrate into third-party IDEs unless they are built on top of the CLI or future API.

    The model is capable of interacting with live tools and simulations. Examples shown in the release include:

    • An interactive CartPole policy gradient simulator, which visualizes reinforcement learning training and activations.

    • A Snell’s Law optics explorer, supporting dynamic ray tracing across refractive indices.

    These interfaces exemplify the model’s ability to reason in real time while maintaining an interactive development session — effectively bridging computation, visualization, and implementation within a single loop.

    Cybersecurity and Safety Constraints

    While GPT‑5.1-Codex-Max does not meet OpenAI’s “High” capability threshold for cybersecurity under its Preparedness Framework, it is currently the most capable cybersecurity model OpenAI has deployed. It supports use cases such as automated vulnerability detection and remediation, but with strict sandboxing and disabled network access by default.

    OpenAI reports no increase in scaled malicious use but has introduced enhanced monitoring systems, including activity routing and disruption mechanisms for suspicious behavior. Codex remains isolated to a local workspace unless developers opt-in to broader access, mitigating risks like prompt injection from untrusted content.

    Deployment Context and Developer Usage

    GPT‑5.1-Codex-Max is currently available to users on ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. It will also become the new default in Codex-based environments, replacing GPT‑5.1-Codex, which was a more general-purpose model.

    OpenAI states that 95% of its internal engineers use Codex weekly, and since adoption, these engineers have shipped ~70% more pull requests on average — highlighting the tool’s impact on internal development velocity.

    Despite its autonomy and persistence, OpenAI stresses that Codex-Max should be treated as a coding assistant, not a replacement for human review. The model produces terminal logs, test citations, and tool call outputs to support transparency in generated code.

    Outlook

    GPT‑5.1-Codex-Max represents a significant evolution in OpenAI’s strategy toward agentic development tools, offering greater reasoning depth, token efficiency, and interactive capabilities across software engineering tasks. By extending its context management and compaction strategies, the model is positioned to handle tasks at the scale of full repositories, rather than individual files or snippets.

    With continued emphasis on agentic workflows, secure sandboxes, and real-world evaluation metrics, Codex-Max sets the stage for the next generation of AI-assisted programming environments — while underscoring the importance of oversight in increasingly autonomous systems.

  • OpenAI’s annual developer conference on Monday was a spectacle of ambitious AI product launches, from an app store for ChatGPT to a stunning video-generation API that brought creative concepts to life. But for the enterprises and technical leaders watching closely, the most consequential announcement was the quiet general availability of Codex, the company's AI software engineer. This release signals a profound shift in how software—and by extension, modern business—is built.

    While other announcements captured the public’s imagination, the production-ready release of Codex, supercharged by a new specialized model and a suite of enterprise-grade tools, is the engine behind OpenAI’s entire vision. It is the tool that builds the tools, the proven agent in a world buzzing with agentic potential, and the clearest articulation of the company's strategy to win the enterprise.

    The general availability of Codex moves it from a "research preview" to a fully supported product, complete with a new software development kit (SDK), a Slack integration, and administrative controls for security and monitoring.This transition declares that Codex is ready for mission-critical work inside the world’s largest companies.

    "We think this is the best time in history to be a builder; it has never been faster to go from idea to product," said OpenAI CEO Sam Altman during the opening keynote presentation. "Software used to take months or years to build. You saw that it can take minutes now to build with AI." 

    That acceleration is not theoretical. It's a reality born from OpenAI’s own internal use — a massive "dogfooding" effort that serves as the ultimate case study for enterprise customers.

    Inside GPT-5-Codex: The AI model that codes autonomously for hours and drives 70% productivity gains

    At the heart of the Codex upgrade is GPT-5-Codex, a version of OpenAI's latest flagship model that has been "purposely trained for Codex and agentic coding." The new model is designed to function as an autonomous teammate, moving far beyond simple code autocompletion.

    "I personally like to think about it as a little bit like a human teammate," explained Tibo Sottiaux, an OpenAI engineer, during a technical session on Codex. "You can pair a program with it on your computer, you can delegate to it, or as you'll see, you can give it a job without explicit prompting."

    This new model enables "adaptive thinking," allowing it to dynamically adjust the time and computational effort spent on a task based on its complexity.For simple requests, it's fast and efficient, but for complex refactoring projects, it can work for hours.

    One engineer during the technical session noted, "I've seen the GPT-5-Codex model work for over seven hours productively... on a marathon session." This capability to handle long-running, complex tasks is a significant leap beyond the simple, single-shot interactions that define most AI coding assistants.

    The results inside OpenAI have been dramatic. The company reported that 92% of its technical staff now uses Codex daily, and those engineers complete 70% more pull requests (a measure of code contribution) each week. Usage has surged tenfold since August. 

    "When we as a team see the stats, it feels great," Sottiaux shared. "But even better is being at lunch with someone who then goes 'Hey I use Codex all the time. Here's a cool thing that I do with it. Do you want to hear about it?'" 

    How OpenAI uses Codex to build its own AI products and catch hundreds of bugs daily

    Perhaps the most compelling argument for Codex’s importance is that it is the foundational layer upon which OpenAI’s other flashy announcements were built. During the DevDay event, the company showcased custom-built arcade games and a dynamic, AI-powered website for the conference itself, all developed using Codex.

    In one session, engineers demonstrated how they built "Storyboard," a custom creative tool for the film industry, in just 48 hours during an internal hackathon. "We decided to test Codex, our coding agent... we would send tasks to Codex in between meetings. We really easily reviewed and merged PRs into production, which Codex even allowed us to do from our phones," said Allison August, a solutions engineering leader at OpenAI. 

    This reveals a critical insight: the rapid innovation showcased at DevDay is a direct result of the productivity flywheel created by Codex. The AI is a core part of the manufacturing process for all other AI products.

    A key enterprise-focused feature is the new, more robust code review capability. OpenAI said it "purposely trained GPT-5-Codex to be great at ultra thorough code review," enabling it to explore dependencies and validate a programmer's intent against the actual implementation to find high-quality bugs.Internally, nearly every pull request at OpenAI is now reviewed by Codex, catching hundreds of issues daily before they reach a human reviewer.

    "It saves you time, you ship with more confidence," Sottiaux said. "There's nothing worse than finding a bug after we actually ship the feature." 

    Why enterprise software teams are choosing Codex over GitHub Copilot for mission-critical development

    The maturation of Codex is central to OpenAI’s broader strategy to conquer the enterprise market, a move essential to justifying its massive valuation and unprecedented compute expenditures. During a press conference, CEO Sam Altman confirmed the strategic shift.

    "The models are there now, and you should expect a huge focus from us on really winning enterprises with amazing products, starting here," Altman said during a private press conference. 

    OpenAI President and Co-founder Greg Brockman immediately added, "And you can see it already with Codex, which I think has been just an incredible success and has really grown super fast." 

    For technical decision-makers, the message is clear. While consumer-facing agents that book dinner reservations are still finding their footing, Codex is a proven enterprise agent delivering substantial ROI today. Companies like Cisco have already rolled out Codex to their engineering organizations, cutting code review times by 50% and reducing project timelines from weeks to days.

    With the new Codex SDK, companies can now embed this agentic power directly into their own custom workflows, such as automating fixes in a CI/CD pipeline or even creating self-evolving applications. During a live demo, an engineer showcased a mobile app that updated its own user interface in real-time based on a natural language prompt, all powered by the embedded Codex SDK. 

    While the launch of an app ecosystem in ChatGPT and the breathtaking visuals of the Sora 2 API rightfully generated headlines, the general availability of Codex marks a more fundamental and immediate transformation. It is the quiet but powerful engine driving the next era of software development, turning the abstract promise of AI-driven productivity into a tangible, deployable reality for businesses today.