• Marble, a startup building artificial intelligence agents for tax professionals, has raised $9 million in seed funding as the accounting industry grapples with a deepening labor shortage and mounting regulatory complexity.

    The round, led by Susa Ventures with participation from MXV Capital and Konrad Capital, positions Marble to compete in a market where AI adoption has lagged significantly behind other knowledge industries like law and software development.

    "When we looked at the economy and asked ourselves where AI is going to transform the way businesses operate, we focused on knowledge industries — specifically businesses with hourly fee-based service models," said Bhavin Shah, Marble's chief executive officer, in an exclusive interview with VentureBeat. "Accounting generates $250 billion in fee-based billing in the US every year. There's a tremendous opportunity to increase efficiency and improve margins for accounting firms."

    The company has launched a free AI-powered tax research tool on its website that converts complex government tax data into accessible, citation-backed answers for practitioners. Marble plans to expand into AI agents that can analyze compliance scenarios and eventually automate portions of tax preparation workflows.

    Marble's backers share Shah's conviction about the market. "Marble is rethinking the accounting system from the ground up. Accounting is one of the biggest — and most overlooked — markets in professional services," Chad Byers, general partner at Susa Ventures, told VentureBeat. "We've known Bhavin from his time as an executive in the Susa portfolio, and have seen firsthand how sharp and execution-driven he is. He and Geordie bring the perfect mix of operational depth and product instinct to a space long overdue for change — and they see the same massive opportunity we do."

    The accounting industry lost 340,000 workers in four years — and replacements aren't coming

    Marble enters a market shaped by structural forces that have fundamentally altered the economics of professional accounting.

    The accounting profession has shed roughly 340,000 workers since 2019, a 17% decline that has left firms scrambling to meet client demands. First-time candidates for the Certified Public Accountant exam dropped 33% between 2016 and 2021, according to AICPA data, and 2022 saw the lowest number of exam takers in 17 years.

    The exodus comes as baby boomers exit en masse. The American Institute of CPAs estimates that approximately 75% of all licensed CPAs reached retirement age by 2019, creating a demographic cliff that the profession has struggled to address.

"Fewer CPAs are getting certified year over year," Shah said. "The industry is compressing at the same time that there's more work to be done and the tax code is getting more complicated."

    The National Pipeline Advisory Group, a multi-stakeholder body formed by the AICPA in July 2023, released a report identifying the 150-hour education requirement for CPA licensure as a significant barrier to entry. A separate survey by the Center for Audit Quality found that 57% of business majors who chose not to pursue accounting cited the additional credit hours as a deterrent.

    Recent legislative changes reflect the urgency. Ohio now offers alternatives to the 150-hour requirement, signaling that states are willing to experiment with pathways that could reverse enrollment declines.

    Why AI transformed law and software development but left accounting behind

    Despite the profession's challenges, AI adoption in accounting has moved more slowly than in adjacent knowledge industries. Harvey and Legora have raised hundreds of millions to bring AI to legal work. Cursor and other coding assistants have transformed software development. Accounting, by contrast, remains largely dependent on legacy research platforms and manual processes.

    Geordie Konrad, Marble's executive chairman and a co-founder of restaurant software company TouchBistro, attributes the gap to how people conceptualize AI's capabilities.

"It was obvious to many people that LLMs could do meaningful work by manipulating code for software developers and manipulating words for lawyers. In the accounting industry, LLMs are going to be used as reasoning agents," Konrad said. "That requires a bit more of a two-step analysis to see why it's a big opportunity."

    The technical challenge is substantial. Tax regulations form one of the most complex, interconnected information systems that humans have created — tens of thousands of interlocking rules, guidance documents, and jurisdiction-specific requirements that frequently overlap or conflict.

    "If you want to put AI through its paces and ask how far it's come in replicating cognitive functions, this is an unbelievable playground to work in," Konrad said.

    A dramatic shift: AI adoption among tax and finance teams doubles in one year

    Recent data suggests the accounting profession's stance toward AI is shifting rapidly.

    A 2025 survey from Hanover Research and Avalara found that 84% of finance and tax teams now use AI heavily in their operations, up from 47% in 2024. The 2025 Generative AI in Professional Services Report from Thomson Reuters Institute found that 21% of tax firms already use generative AI technology, with 53% either planning to adopt it or actively considering it.

    Large accounting firms have invested heavily in AI infrastructure. Deloitte has developed generative AI capabilities within its audit platform. BDO announced a $1B investment in AI over the next five years. EY launched an AI platform combining technology with strategy, transactions, and tax services. PwC expects to launch a complete AI-driven audit solution by 2026.

    But adoption at smaller firms remains uneven. According to Thomson Reuters research, 52% of tax firm respondents who use generative AI rely on general-purpose public tools like ChatGPT rather than industry-specific solutions — a pattern that could shift as purpose-built alternatives emerge.

    Marble's founders believe the hesitance stems not from technophobia but from a lack of compelling options.

    "Firms want to embrace AI," Shah said. "They just haven't seen great software and tooling made for them. That's part of the opportunity — to work with them and build something they're excited to use on a day-to-day basis."

    Can artificial intelligence rescue accounting's billable-hour business model?

    AI's arrival in accounting raises questions about the profession's billing structure.

    Accounting firms have traditionally generated profits by billing clients for staff time, often at multiples of employee compensation costs. Junior associates performing compliance work represent a significant revenue stream. If AI can automate that work, does it undercut the business model firms depend on?

    Marble's founders argue the opposite. The chronic staffing shortage has already constrained firms' ability to capture available revenue. Advisory and consulting work — higher-margin services that clients actively want — goes undone because practitioners are buried in compliance tasks.

    "Everyone in the industry agrees that an enormous amount of advisory work simply isn't getting done," Konrad said. "Customers want it. Firms want to do it because it's high-margin, great work. But nobody gets to it."

    The 2025 AICPA National Management of an Accounting Practice Survey supports this view. Firms reported a median 6.7% increase in net client fees over the prior year, with growth in audit, assurance, tax services, and client accounting advisory. Net remaining per partner climbed 11.9% from fiscal year 2022 to fiscal year 2024, reaching $252,663.

    The survey also found growing interest in AI adoption, though most firms have yet to allocate formal budgets or develop structured training programs. Continued adoption, the survey suggested, could help expand services and fuel continued growth.

    Accountants won't adopt AI tools they can't trust with sensitive client data

    For AI to succeed in accounting, it must clear a high bar for data security. Accounting firms handle some of the most sensitive financial information in the economy. Practitioners cannot adopt tools that create compliance or confidentiality risks.

    According to Avalara's survey, 63% of respondents cited data security and privacy concerns as the top barriers to automating tax and finance functions. The concern persists throughout the adoption lifecycle, from initial selection through implementation and ongoing use.

    Marble has made security a foundational priority. The company obtained software compliance certification before releasing any product and maintains that data privacy is embedded in its operational culture from day one.

    "Security is at the core of what we are building," Shah said. "Every employee knows that security is critical. It's a part of our onboarding and something that we consider in everything we do."

    From number crunchers to strategic advisors: How AI could reshape accounting careers

    Marble's founders reject the narrative that AI will simply eliminate accounting jobs. They argue instead that AI will make those jobs more strategic and less defined by repetitive execution.

    They draw an analogy to architecture, where computer-aided design replaced laborious manual drafting. Architects did not disappear — they gained tools that let them spend more time on creative design and less on mechanical reproduction.

    "If you take some of the hours-intensive, less creative work out of what being a junior or intermediate accountant is, and you replace it with a role where you're a professional who is being creative, synthesizing ideas, and able to delegate a lot of tasks to AI assistant platform solutions, you end up with an industry that's just a lot more fun to operate in," Konrad said.

    The shift could also improve client outcomes. When accountants spend less time on compliance, they can invest more in the strategic advisory work that clients value.

    "Not only does the work become more enjoyable because of what you can focus on, but that's also what your clients are going to value more from you," Shah said.

    The competitive landscape: Marble faces well-funded rivals and legacy giants

    Marble enters a market with formidable incumbents and well-funded competitors. Blue J, a global tax research platform, has raised over $100 million. Thomson Reuters, CCH, and Intuit have deep customer relationships built over decades.

    But the founders see opportunity in the transition moment.

    "AI has changed what’s possible in the industry," Shah said. "We are going to work with and integrate with some technology players in the industry and also compete with other players with new products powered by AI. In some cases we are going to forget about the existing technology solution for doing things and go back to the task itself. We have totally new technological capabilities — how would you design something from a blank canvas that works with humans to accomplish that task?"

    The decision to offer a free research tool reflects Marble's go-to-market philosophy. By giving practitioners access without a paywall, the company aims to build trust and demonstrate capability.

    "It allows us to expose a really compelling product that is purpose-built to those that are worried about how to use AI or question how to adopt it. Now they don’t have to think about purchasing something that is cost-prohibitive when they don't know how to integrate it into their workflow," Shah said.

    The $250 billion question: Can a startup transform how America does its taxes?

    Marble's roadmap extends beyond research. The company plans to develop AI agents capable of analyzing complex tax scenarios, identifying compliance issues, and eventually automating significant portions of compliance workflows — all while keeping practitioners in control.

    The founders frame success not in terms of disruption but rebalancing. Today's tax work skews heavily toward compliance, leaving the strategic advisory services that clients crave — and that generate higher margins — perpetually undone. Marble's bet is that AI can flip that equation.

    "Everyone wants it to look more like compliance is done simpler, and you spend time talking about strategy and planning," Konrad said. "How do we change that blend of compliance versus strategy and planning to strategy and planning first—with compliance as something that has been made dramatically simpler?"

    Whether Marble can execute on that vision remains to be seen. The company faces entrenched competitors, a profession that has historically resisted technological change, and the inherent unpredictability of building AI systems for high-stakes financial work.

    But the founders are betting that the industry's demographic shift will accelerate adoption in ways that previous technology waves could not. With fewer accountants entering the profession each year and client demands only growing, firms may have an increased appetite to embrace tools that let their remaining staff do more.

    "AI is going to change every industry — in some cases in ways that will help business models and in some cases in ways that will challenge them. We believe AI is ultimately going to make accounting firms’ businesses better and more profitable and at the same time end clients will get better services at better prices," Shah said.

    The accounting profession, it seems, is about to find out which side of that equation it lands on.

  • With the AI boom in full swing, perhaps the most critical pieces of hardware in 2025 were the semiconductors powering data centers – whether that’s high-powered GPUs or the latest CPUs, TPUs and DPUs.
  • Despite strong cloud sales and infrastructure gains, the revenue miss and mounting debt have heightened skepticism about the scalability and timing of Oracle's AI investments.
  • Explore the year's insights and tactics for reducing risk amid AI-driven threats and tight budgets.
  • There's no shortage of generative AI benchmarks designed to measure the performance and accuracy of a given model on completing various helpful enterprise tasks — from coding to instruction following to agentic web browsing and tool use. But many of these benchmarks have one major shortcoming: they measure the AI's ability to complete specific problems and requests, not how factual the model is in its outputs — how well it generates objectively correct information tied to real-world data — especially when dealing with information contained in imagery or graphics.

    For industries where accuracy is paramount — legal, finance, and medical — the lack of a standardized way to measure factuality has been a critical blind spot.

    That changes today: Google’s FACTS team and its data science unit Kaggle released the FACTS Benchmark Suite, a comprehensive evaluation framework designed to close this gap.

    The associated research paper reveals a more nuanced definition of the problem, splitting "factuality" into two distinct operational scenarios: "contextual factuality" (grounding responses in provided data) and "world knowledge factuality" (retrieving information from memory or the web).

    While the headline news is Gemini 3 Pro’s top-tier placement, the deeper story for builders is the industry-wide "factuality wall."

    According to the initial results, no model—including Gemini 3 Pro, GPT-5, or Claude 4.5 Opus—managed to crack a 70% accuracy score across the suite of problems. For technical leaders, this is a signal: the era of "trust but verify" is far from over.

    Deconstructing the Benchmark

    The FACTS suite moves beyond simple Q&A. It is composed of four distinct tests, each simulating a different real-world failure mode that developers encounter in production:

    1. Parametric Benchmark (Internal Knowledge): Can the model accurately answer trivia-style questions using only its training data?

    2. Search Benchmark (Tool Use): Can the model effectively use a web search tool to retrieve and synthesize live information?

    3. Multimodal Benchmark (Vision): Can the model accurately interpret charts, diagrams, and images without hallucinating?

    4. Grounding Benchmark v2 (Context): Can the model stick strictly to the provided source text?

    Google has released 3,513 examples to the public, while Kaggle holds a private set to prevent developers from training on the test data—a common issue known as "contamination."

    The Leaderboard: A Game of Inches

    The initial run of the benchmark places Gemini 3 Pro in the lead with a comprehensive FACTS Score of 68.8%, followed by Gemini 2.5 Pro (62.1%) and OpenAI’s GPT-5 (61.8%). However, a closer look at the data reveals where the real battlegrounds are for engineering teams.

    | Model | FACTS Score (Avg) | Search (RAG Capability) | Multimodal (Vision) |
    |---|---|---|---|
    | Gemini 3 Pro | 68.8 | 83.8 | 46.1 |
    | Gemini 2.5 Pro | 62.1 | 63.9 | 46.9 |
    | GPT-5 | 61.8 | 77.7 | 44.1 |
    | Grok 4 | 53.6 | 75.3 | 25.7 |
    | Claude 4.5 Opus | 51.3 | 73.2 | 39.2 |

    Data sourced from the FACTS Team release notes.

    For Builders: The "Search" vs. "Parametric" Gap

    For developers building RAG (Retrieval-Augmented Generation) systems, the Search Benchmark is the most critical metric.

    The data shows a massive discrepancy between a model's ability to "know" things (Parametric) and its ability to "find" things (Search). For instance, Gemini 3 Pro scores a high 83.8% on Search tasks but only 76.4% on Parametric tasks.

    This validates the current enterprise architecture standard: do not rely on a model's internal memory for critical facts.

    If you are building an internal knowledge bot, the FACTS results suggest that hooking your model up to a search tool or vector database is not optional—it is the only way to push accuracy toward acceptable production levels.
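The retrieval-first pattern this recommends can be sketched in a few lines. The ranker below is a deliberately toy keyword-overlap score, not a production approach — a real system would use a vector database or search API — but it shows the architectural point: answer from a retrieved source passage, not from the model's memory.

```python
# Toy sketch of retrieval-grounded answering: fetch the most relevant
# source passage for a query instead of trusting parametric memory.
# The keyword-overlap scoring is illustrative only; production systems
# would use embeddings and a vector store.

def score(query: str, doc: str) -> int:
    """Count query terms that also appear in the document (toy relevance)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document with the highest overlap with the query."""
    return max(docs, key=lambda d: score(query, d))

docs = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our offices are closed on public holidays.",
    "Support tickets are answered within 24 hours.",
]

best = retrieve("what is the refund policy for returns", docs)
print(best)  # the refund-policy passage ranks highest
```

The retrieved passage would then be placed in the model's context, turning a "world knowledge" question into the "contextual factuality" scenario where models score far higher.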

    The Multimodal Warning

    The most alarming data point for product managers is the performance on Multimodal tasks. The scores here are universally low. Even the category leader, Gemini 2.5 Pro, only hit 46.9% accuracy.

    The benchmark tasks included reading charts, interpreting diagrams, and identifying objects in nature. With less than 50% accuracy across the board, this suggests that Multimodal AI is not yet ready for unsupervised data extraction.

    Bottom line: If your product roadmap involves having an AI automatically scrape data from invoices or interpret financial charts without human-in-the-loop review, you are likely introducing significant error rates into your pipeline.
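The human-in-the-loop guardrail suggested above is often implemented as a confidence gate: auto-accept extractions the model reports as high-confidence, and queue everything else for manual review. The sketch below is a minimal illustration; the 0.9 threshold and record fields are assumptions for the example, not values from any benchmarked product.

```python
# Confidence-gated review queue for AI data extraction, per the
# human-in-the-loop caution above. Threshold and field names are
# illustrative assumptions.

REVIEW_THRESHOLD = 0.9  # below this, a human must verify the extraction

def route(extraction: dict) -> str:
    """Return 'auto-accept' or 'human-review' based on model confidence."""
    if extraction["confidence"] >= REVIEW_THRESHOLD:
        return "auto-accept"
    return "human-review"

batch = [
    {"field": "invoice_total", "value": "1,240.00", "confidence": 0.97},
    {"field": "due_date", "value": "2025-01-31", "confidence": 0.62},
]

decisions = [route(x) for x in batch]
print(decisions)  # ['auto-accept', 'human-review']
```

With sub-50% multimodal accuracy, a pipeline like this routes a large share of chart and invoice extractions to humans — which is the point until the models improve.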

    Why This Matters for Your Stack

    The FACTS Benchmark is likely to become a standard reference point for procurement. When evaluating models for enterprise use, technical leaders should look beyond the composite score and drill into the specific sub-benchmark that matches their use case:

    • Building a Customer Support Bot? Look at the Grounding score to ensure the bot sticks to your policy documents. (Gemini 2.5 Pro actually outscored Gemini 3 Pro here, 74.2 vs 69.0).

    • Building a Research Assistant? Prioritize Search scores.

    • Building an Image Analysis Tool? Proceed with extreme caution.

    As the FACTS team noted in their release, "All evaluated models achieved an overall accuracy below 70%, leaving considerable headroom for future progress." For now, the message to the industry is clear: The models are getting smarter, but they aren't yet infallible. Design your systems with the assumption that, roughly one-third of the time, the raw model might just be wrong.

  • By leveraging these trends, I&O leaders can enhance flexibility, resilience, and innovation while addressing challenges like geopolitical risks, distributed infrastructure, and the growing impact of generative AI.
  • The companies say their test of quantum-secured communication marks a quantum computing milestone, providing a secure high-speed connection option.
  • Despite integration and supply chain hurdles, recent full-scale deployments and compelling ROI show two-phase cooling moving from niche to mainstream in high-density data centers.
  • Presented by SAP


    When SAP ran a quiet internal experiment to gauge consultant attitudes toward AI, the results were striking. Five teams were asked to validate answers to more than 1,000 business requirements completed by SAP’s AI co-pilot, Joule for Consultants — a workload that would normally take several weeks.

    Four teams were told the analysis had been completed by junior interns fresh out of school. They reviewed the material, found it impressive, and rated the work about 95% accurate.

    The fifth team was told the very same answers had come from AI.

    They rejected almost everything.

    Only when asked to validate each answer one by one did they discover that the AI was, in fact, highly accurate — surfacing detailed insights the consultants had initially dismissed. The overall accuracy? Again, about 95%.

    “The lesson learned here is that we need to be very cautious as we introduce AI — especially in how we communicate with senior consultants about its possibilities and how to integrate it into their workflows,” says Guillermo B. Vazquez Mendez, chief architect, RI business transformation and architecture, SAP America Inc.

    The experiment has since become a revealing starting point for SAP’s push toward the consultant of 2030: a practitioner who is deeply human, enabled by AI, and no longer weighed down by the technical grunt work of the past.

    Overcoming AI skepticism

    Resistance isn’t surprising, Vazquez notes. Consultants with two or three decades of experience carry enormous institutional knowledge — and an understandable degree of caution.

    But AI copilots like Joule for Consultants are not replacing expertise. They’re amplifying it.

    “What Joule really does is make their very expensive time far more effective,” Vazquez says. “It removes the clerical work, so they can focus on turning out high-quality answers in a fraction of the time.”

    He emphasizes this message constantly: “AI is not replacing you. It’s a tool for you. Human oversight is always required. But now, instead of spending your time looking for documentation, you’re gaining significant time and boosting the effectiveness and detail of your answers.”

    The consultant time-shift: from tech execution to business insight

    Historically, consultants spent about 80% of their time understanding technical systems — how processes run, how data flows, how functions execute. Customers, by contrast, spend 80% of their time focused on their business.

    That mismatch is exactly where Joule steps in.

    “There’s a gap there — and the bridge is AI,” Vazquez says. “It flips the time equation, enabling consultants to invest more of their energy in understanding the customer’s industry and business goals. AI takes on the heavy technical lift, so consultants can focus on driving the right business outcomes.”

    Bringing new consultants up to speed

    AI is also transforming how new hires learn.

    “We’re excited to see Joule acting as a bridge between senior consultants, who are adapting more slowly, and interns and new consultants who are already technically savvy,” Vazquez says.

    Junior consultants ramp up faster because Joule helps them operate independently. Seniors, meanwhile, engage where their insight matters most.

    This is also where many consultants learn the fundamentals of today’s AI copilots. Much of the work depends on prompt engineering — for instance, instructing Joule to act as a senior chief technology architect specializing in finance and SAP S/4HANA 2023, then asking it to analyze business requirements and deliver the output as tables or PowerPoint slides.

    Once they grasp how to frame prompts, consultants consistently get higher-quality, more structured answers.
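The role/task/format pattern described here can be captured in a simple template. This is an illustrative sketch of the framing technique, not Joule's actual prompt syntax; the function and wording are hypothetical.

```python
# Sketch of the role/task/output-format prompt pattern described above.
# The template wording is an assumption for illustration, not Joule's
# actual interface.

def build_prompt(role: str, task: str, output_format: str) -> str:
    """Assemble a structured prompt from the three framing elements."""
    return (
        f"Act as {role}.\n"
        f"Task: {task}\n"
        f"Deliver the output as {output_format}."
    )

prompt = build_prompt(
    role=("a senior chief technology architect specializing in "
          "finance and SAP S/4HANA 2023"),
    task="analyze the attached business requirements",
    output_format="a table",
)
print(prompt)
```

Keeping the three elements explicit is what makes the answers consistently structured; omitting any one of them tends to produce the vaguer output that frustrates first-time users.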

    New architects are also able to communicate more clearly with their more experienced counterparts. They know what they don’t know and can ask targeted questions, which makes mentorship far smoother. It’s created a real synergy, Vazquez adds — senior consultants see how quickly new hires are adapting and learning with AI, and that momentum encourages them to keep pace and adopt the technology themselves.

    Looking ahead to the future of AI copilots

    “We’re still in the baby steps of AI — we’re toddlers,” Vazquez says. “Right now, copilots depend on prompt engineering to get good answers. The better you prompt, the better the answer you get.”

    But that represents only the earliest phase of what these systems will eventually do. As copilots mature, they’ll move beyond responding to prompts and start interpreting entire business processes — understanding the sequence of steps, identifying where human intervention is needed, and spotting where an AI agent could take over. That shift is what leads directly into agentic AI.

    SAP’s depth of process knowledge is what makes that evolution possible. The company has mapped more than 3,500 business processes across industries — a repository Vazquez calls “some of the most valuable, rigorously tested processes developed in the last 50 years.” Every day, SAP systems support roughly $7.3 trillion in global commerce, giving these emerging AI agents a rich foundation to navigate and reason over.

    “With that level of process insight and data, we can take a real leap forward,” he says, “equipping our consultants with agentic AI that can solve complex challenges and push us toward increasingly autonomous systems.”


    Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

  • French AI startup Mistral has weathered a rocky period of public questioning over the last year to emerge in December 2025 with new, crowd-pleasing models for enterprise and indie developers.

    Just days after releasing its powerful open source, general purpose Mistral 3 LLM family for edge devices and local hardware, the company returned today to debut Devstral 2.

    The release includes a new pair of models optimized for software engineering tasks — again, with one small enough to run on a single laptop, offline and privately — alongside Mistral Vibe, a command-line interface (CLI) agent designed to allow developers to call the models up directly within their terminal environments.

    The models are fast, lean, and open—at least in theory. But the real story lies not just in the benchmarks, but in how Mistral is packaging this capability: one model fully free, another conditionally so, and a terminal interface built to scale with either.

    It’s an attempt not just to match proprietary systems like Claude and GPT-4 in performance, but to compete with them on developer experience—and to do so while holding onto the flag of open-source.

    Both models are available now for free for a limited time via Mistral’s API and Hugging Face.

    The full Devstral 2 model is supported out of the box in the open source inference engine vLLM and on the open source agentic coding platform Kilo Code.

    A Coding Model Meant to Drive

    At the top of the announcement is Devstral 2, a 123-billion parameter dense transformer with a 256K-token context window, engineered specifically for agentic software development.

    Mistral says the model achieves 72.2% on SWE-bench Verified, a benchmark designed to evaluate long-context software engineering tasks in real-world repositories.

    The smaller sibling, Devstral Small 2, weighs in at 24B parameters, with the same long context window and a performance of 68.0% on SWE-bench.

    On paper, that makes it the strongest open-weight model of its size, even outscoring many 70B-class competitors.

    But the performance story isn’t just about raw percentages. Mistral is betting that efficient intelligence beats scale, and has made much of the fact that Devstral 2 is:

    • 5× smaller than DeepSeek V3.2

    • 8× smaller than Kimi K2

    • Yet still matches or surpasses them on key software reasoning benchmarks.

    Human evaluations back this up. In side-by-side comparisons:

    • Devstral 2 beat DeepSeek V3.2 in 42.8% of tasks, losing only 28.6%.

    • Against Claude Sonnet 4.5, it lost more often (53.1%)—a reminder that while the gap is narrowing, closed models still lead in overall preference.

    Still, for an open-weight model, these results place Devstral 2 at the frontier of what’s currently available to run and modify independently.

    Vibe CLI: A Terminal-Native Agent

    Alongside the models, Mistral released Vibe CLI, a command-line assistant that integrates directly with Devstral models. It’s not an IDE plugin or a ChatGPT-style code explainer. It’s a native interface designed for project-wide code understanding and orchestration, built to live inside the developer’s actual workflow.

    Vibe brings a surprising degree of intelligence to the terminal:

    • It reads your file tree and Git status to understand project scope.

    • It lets you reference files with @, run shell commands with !, and toggle behavior with slash commands.

    • It orchestrates changes across multiple files, tracks dependencies, retries failed executions, and can even refactor at architectural scale.

    Unlike most developer agents, which simulate a REPL from within a chat UI, Vibe starts with the shell and pulls intelligence in from there. It’s programmable, scriptable, and themeable. And it’s released under the Apache 2.0 license, meaning it’s truly free to use—in commercial settings, internal tools, or open-source extensions.

    Licensing Structure: Open-ish — With Revenue Limitations

    At first glance, Mistral’s licensing approach appears straightforward: the models are open-weight and publicly available. But a closer look reveals a line drawn through the middle of the release, with different rules for different users.

    Devstral Small 2, the 24-billion parameter variant, is covered under a standard, enterprise- and developer-friendly Apache 2.0 license.

    That’s a gold standard in open-source: no revenue restrictions, no fine print, no need to check with legal. Enterprises can use it in production, embed it into products, and redistribute fine-tuned versions without asking for permission.

    Devstral 2, the flagship 123B model, is released under what Mistral calls a “modified MIT license.” That phrase sounds innocuous, but the modification introduces a critical limitation: any company making more than $20 million in monthly revenue cannot use the model at all—not even internally—without securing a separate commercial license from Mistral.

    “You are not authorized to exercise any rights under this license if the global consolidated monthly revenue of your company […] exceeds $20 million,” the license reads.

    The clause applies not only to the base model, but to derivatives, fine-tuned versions, and redistributed variants, regardless of who hosts them. In effect, it means that while the weights are “open,” their use is gated for large enterprises—unless they’re willing to engage with Mistral’s sales team or use the hosted API at metered pricing.

    To draw an analogy: Apache 2.0 is like a public library—you walk in, borrow the book, and use it however you need. Mistral’s modified MIT license is more like a corporate co-working space that’s free for freelancers but charges rent once your company hits a certain size.

    Weighing Devstral Small 2 for Enterprise Use

    This division raises an obvious question for larger companies: can Devstral Small 2, with its permissive Apache 2.0 license, serve as a viable alternative for medium-to-large enterprises?

    The answer depends on context. Devstral Small 2 scores 68.0% on SWE-bench, significantly ahead of many larger open models, and remains deployable on single-GPU or CPU-only setups. For teams focused on:

    • internal tooling,

    • on-prem deployment,

    • low-latency edge inference,

      …it offers a rare combination of licensing freedom, performance, and convenience.

    But the performance gap with Devstral 2 is real. For multi-agent setups, deep monorepo refactoring, or long-context code analysis, that 4-point benchmark delta may understate the actual difference in experience.

    For most enterprises, Devstral Small 2 will serve either as a low-friction way to prototype—or as a pragmatic bridge until licensing for Devstral 2 becomes feasible. It is not a drop-in replacement for the flagship, but it may be “good enough” in specific production slices, particularly when paired with Vibe CLI.

    Because Devstral Small 2 can be run entirely offline, including on a single-GPU machine or a sufficiently specced laptop, it unlocks a critical use case for developers and teams operating in tightly controlled environments.

    Whether you’re a solo indie building tools on the go, or part of a company with strict data governance or compliance mandates, the ability to run a performant, long-context coding model without ever hitting the internet is a powerful differentiator. No cloud calls, no third-party telemetry, no risk of data leakage — just local inference with full visibility and control.

    This matters in industries like finance, healthcare, defense, and advanced manufacturing, where data often cannot leave the network perimeter. But it’s just as useful for developers who prefer autonomy over vendor lock-in — or who want their tools to work the same on a plane, in the field, or inside an air-gapped lab. In a market where most top-tier code models are delivered as API-only SaaS products, Devstral Small 2 offers a rare level of portability, privacy, and ownership.

    In that sense, Mistral isn’t just offering open models—they’re offering multiple paths to adoption, depending on your scale, compliance posture, and willingness to engage.

    Integration, Infrastructure, and Access

    From a technical standpoint, Mistral’s models are built for deployment. Devstral 2 requires a minimum of 4× H100-class GPUs, and is already available on build.nvidia.com.

    Devstral Small 2 can run on a single GPU, or even on the CPU of a standard laptop, making it accessible to solo developers and embedded teams alike.

    Both models support quantized FP4 and FP8 weights, and are compatible with vLLM for scalable inference. Fine-tuning is supported out of the box.
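    As a rough sketch of what local deployment might look like, the commands below serve the smaller model through vLLM's OpenAI-compatible server. The Hugging Face model identifier and the flag values are assumptions for illustration; check Mistral's model card for the published repo name and recommended context length.

    ```shell
    # Illustrative only: model ID and settings are assumed, not confirmed by Mistral.
    pip install vllm

    # Launch an OpenAI-compatible endpoint on localhost:8000.
    vllm serve mistralai/Devstral-Small-2 \
      --dtype auto \
      --max-model-len 131072
    ```

    Once running, any OpenAI-compatible client can point at the local endpoint, which is what makes the no-cloud, no-telemetry workflow described above practical.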

    API pricing—after the free introductory window—follows a token-based structure:

    • Devstral 2: $0.40 per million input tokens / $2.00 per million output tokens

    • Devstral Small 2: $0.10 per million input tokens / $0.30 per million output tokens

    That pricing sits just below OpenAI’s GPT-4 Turbo, and well below Anthropic’s Claude Sonnet at comparable performance levels.
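    To make those rates concrete, here is a small cost estimator using the per-million-token prices quoted above. The function name and dictionary structure are illustrative, not part of any official Mistral SDK.

    ```python
    # Per-million-token API rates (USD) as quoted in this article.
    RATES = {
        # model: (input $/1M tokens, output $/1M tokens)
        "devstral-2": (0.40, 2.00),
        "devstral-small-2": (0.10, 0.30),
    }

    def api_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        """Estimated cost in USD for a single request at the published rates."""
        rate_in, rate_out = RATES[model]
        return (input_tokens / 1_000_000) * rate_in + (output_tokens / 1_000_000) * rate_out

    # Example: a 50K-token repository context producing a 5K-token patch.
    print(round(api_cost("devstral-2", 50_000, 5_000), 4))        # → 0.03
    print(round(api_cost("devstral-small-2", 50_000, 5_000), 4))  # → 0.0065
    ```

    At these rates, even heavy agentic use of the flagship stays in the cents-per-request range, which helps explain the pricing comparison to GPT-4 Turbo and Claude Sonnet.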

    Developer Reception: Ground-Level Buzz

    On X (formerly Twitter), the early developer reaction was largely positive. Hugging Face's head of product, Victor Mustar, asked whether the small, Apache 2.0-licensed variant was the "new local coding king," i.e., one developers could run directly on their laptops, privately and without an internet connection.

    Another popular AI news and rumors account, TestingCatalogNews, posted that the model was "SOTTA in coding," or "State Of The Tiny Art."

    Another user, @xlr8harder, took issue with the custom licensing terms for Devstral 2, writing "calling the Devstral 2 license 'modified MIT' is misleading at best. It’s a proprietary license with MIT-like attribution requirements."

    While the tone was critical, it reflected the scrutiny Mistral's license structure was drawing, particularly among developers familiar with open-use norms.

    Strategic Context: From Codestral to Devstral and Mistral 3

    Mistral’s steady push into software development tools didn’t start with Devstral 2—it began in May 2024 with Codestral, the company’s first code-focused large language model. A 22-billion parameter system trained on more than 80 programming languages, Codestral was designed for use in developer environments ranging from basic autocompletion to full function generation. The model launched under a non-commercial license but still outperformed heavyweight competitors like CodeLlama 70B and DeepSeek Coder 33B in early benchmarks such as HumanEval and RepoBench.

    Codestral’s release marked Mistral’s first move into the competitive coding-model space, but it also established a now-familiar pattern: technically lean models with surprisingly strong results, a wide context window, and licensing choices that invited developer experimentation. Industry partners including JetBrains, LlamaIndex, and LangChain quickly began integrating the model into their workflows, citing its speed and tool compatibility as key differentiators.

    One year later, the company followed up with Devstral, a 24B model purpose-built for “agentic” behavior—handling long-range reasoning, file navigation, and autonomous code modification. Released in partnership with All Hands AI and licensed under Apache 2.0, Devstral was notable not just for its portability (it could run on a MacBook or RTX 4090), but for its performance: it beat out several closed models on SWE-Bench Verified, a benchmark of 500 real-world GitHub issues.

    Then came Mistral 3, announced in December 2025 as a portfolio of 10 open-weight models targeting everything from drones and smartphones to cloud infrastructure. This suite included both high-end models like Mistral Large 3 (a MoE system with 41 billion active parameters and 256K context) and lightweight “Ministral” variants that could run on 4GB of VRAM. All were licensed under Apache 2.0, reinforcing Mistral’s commitment to flexible, edge-friendly deployment.

    Mistral 3 positioned the company not as a direct competitor to frontier models like GPT-5 or Gemini 3, but as a developer-first platform for customized, localized AI systems. Co-founder Guillaume Lample described the vision as “distributed intelligence”—many smaller systems tuned for specific tasks and running outside centralized infrastructure. “In more than 90% of cases, a small model can do the job,” he told VentureBeat. “It doesn’t have to be a model with hundreds of billions of parameters.”

    That broader strategy helps explain the significance of Devstral 2. It’s not a one-off release but a continuation of Mistral’s long-running commitment to code agents, local-first deployment, and open-weight availability—an ecosystem that began with Codestral, matured through Devstral, and scaled up with Mistral 3. Devstral 2, in this framing, is not just a model. It’s the next version of a playbook that’s been unfolding in public for over a year.

    Final Thoughts (For Now): A Fork in the Road

    With Devstral 2, Devstral Small 2, and Vibe CLI, Mistral AI has drawn a clear map for developers and companies alike. The tools are fast, capable, and thoughtfully integrated. But they also present a choice—not just in architecture, but in how and where you’re allowed to use them.

    If you’re an individual developer, small startup, or open-source maintainer, this is one of the most powerful AI systems you can freely run today.

    If you’re a Fortune 500 engineering lead, you’ll need to either talk to Mistral or settle for the smaller model and make it work.

    In a market increasingly dominated by black-box models and SaaS lock-ins, Mistral’s offer is still a breath of fresh air. Just read the fine print before you start building.