• Railway, a San Francisco-based cloud platform that has quietly amassed two million developers without spending a dollar on marketing, announced Thursday that it raised $100 million in a Series B funding round, as surging demand for artificial intelligence applications exposes the limitations of legacy cloud infrastructure.

    TQ Ventures led the round, with participation from FPV Ventures, Redpoint, and Unusual Ventures. The investment positions Railway as one of the most significant infrastructure startups to emerge during the AI boom, capitalizing on developer frustration with the complexity and cost of traditional platforms like Amazon Web Services and Google Cloud.

    "As AI models get better at writing code, more and more people are asking the age-old question: where, and how, do I run my applications?" said Jake Cooper, Railway's 28-year-old founder and chief executive, in an exclusive interview with VentureBeat. "The last generation of cloud primitives were slow and outdated, and now with AI moving everything faster, teams simply can't keep up."

    The funding marks a dramatic acceleration for a company that has charted an unconventional path through the cloud computing industry. Railway raised just $24 million in total before this round, including a $20 million Series A from Redpoint in 2022. The company now processes more than 10 million deployments monthly and handles over one trillion requests through its edge network — metrics that rival far larger and better-funded competitors.

    Why three-minute deploy times have become unacceptable in the age of AI coding assistants

    Railway's pitch rests on a simple observation: the tools developers use to deploy and manage software were designed for a slower era. A standard build-and-deploy cycle using Terraform, the widely used infrastructure-as-code tool, takes two to three minutes. That delay, once tolerable, has become a critical bottleneck now that AI coding assistants like Claude, ChatGPT, and Cursor can generate working code in seconds.

    "When godly intelligence is on tap and can solve any problem in three seconds, those amalgamations of systems become bottlenecks," Cooper told VentureBeat. "What was really cool for humans to deploy in 10 seconds or less is now table stakes for agents."

    The company claims its platform delivers deployments in under one second — fast enough to keep pace with AI-generated code. Customers report a tenfold increase in developer velocity and up to 65 percent cost savings compared to traditional cloud providers.

    These numbers come directly from enterprise clients, not internal benchmarks. Daniel Lobaton, chief technology officer at G2X, a platform serving 100,000 federal contractors, measured a sevenfold improvement in deployment speed and an 87 percent cost reduction after migrating to Railway. His infrastructure bill dropped from $15,000 per month to approximately $1,000.

    "The work that used to take me a week on our previous infrastructure, I can do in Railway in like a day," Lobaton said. "If I want to spin up a new service and test different architectures, it would take so long on our old setup. In Railway I can launch six services in two minutes."

    Inside the controversial decision to abandon Google Cloud and build data centers from scratch

    What distinguishes Railway from competitors like Render and Fly.io is the depth of its vertical integration. In 2024, the company made the unusual decision to abandon Google Cloud entirely and build its own data centers, a move that echoes the famous Alan Kay maxim: "People who are really serious about software should make their own hardware."

    "We wanted to design hardware in a way where we could build a differentiated experience," Cooper said. "Having full control over the network, compute, and storage layers lets us do really fast build and deploy loops, the kind that allows us to move at 'agentic speed' while staying 100 percent the smoothest ride in town."

    The approach paid dividends during recent widespread outages that affected major cloud providers — Railway remained online throughout.

    This soup-to-nuts control enables pricing roughly half that of the hyperscalers and a third to a quarter that of newer cloud startups. Railway charges by the second for actual compute usage: $0.00000386 per gigabyte-second of memory, $0.00000772 per vCPU-second, and $0.00000006 per gigabyte-second of storage. There are no charges for idle virtual machines — a stark contrast to the traditional cloud model, where customers pay for provisioned capacity whether they use it or not.
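    To make the per-second rates concrete, here is a back-of-the-envelope estimate for a small always-on service. The workload below (1 vCPU, 2 GB of memory, 10 GB of storage) is a hypothetical example, not a quoted customer bill; the rates are Railway's published figures above:

    # Back-of-the-envelope monthly cost from Railway's published per-second rates.
    # The workload is a hypothetical example, not a quoted customer bill.
    MEM_RATE = 0.00000386      # $ per gigabyte-second of memory
    CPU_RATE = 0.00000772      # $ per vCPU-second
    STORAGE_RATE = 0.00000006  # $ per gigabyte-second of storage

    SECONDS_PER_MONTH = 30 * 24 * 3600  # about 2.59 million seconds

    vcpus, mem_gb, storage_gb = 1, 2, 10
    monthly = SECONDS_PER_MONTH * (
        vcpus * CPU_RATE + mem_gb * MEM_RATE + storage_gb * STORAGE_RATE
    )
    print(f"~${monthly:.2f} per month")  # roughly $41.58 for this service

    Because billing is per second of actual usage, a service that runs only half the time would cost roughly half as much, which is the crux of the no-idle-charges pitch.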

    "The conventional wisdom is that the big guys have economies of scale to offer better pricing," Cooper noted. "But when they're charging for VMs that usually sit idle in the cloud, and we've purpose-built everything to fit much more density on these machines, you have a big opportunity."

    How 30 employees built a platform generating tens of millions in annual revenue

    Railway has achieved its scale with a team of just 30 employees generating tens of millions in annual revenue — a ratio of revenue per employee that would be exceptional even for established software companies. The company grew revenue 3.5 times last year and continues to expand at 15 percent month-over-month.

    Cooper emphasized that the fundraise was strategic rather than necessary. "We're default alive; there's no reason for us to raise money," he said. "We raised because we see a massive opportunity to accelerate, not because we needed to survive."

    The company hired its first salesperson only last year and employs just two solutions engineers. Nearly all of Railway's two million users discovered the platform through word of mouth — developers telling other developers about a tool that actually works.

    "We basically did the standard engineering thing: if you build it, they will come," Cooper recalled. "And to some degree, they came."

    From side projects to Fortune 500 deployments: Railway's unlikely corporate expansion

    Despite its grassroots developer community, Railway has made significant inroads into large organizations. The company claims that 31 percent of Fortune 500 companies now use its platform, though deployments range from company-wide infrastructure to individual team projects.

    Notable customers include Bilt, the loyalty program company; Intuit's GoCo subsidiary; TripAdvisor's Cruise Critic; and MGM Resorts. Kernel, a Y Combinator-backed startup providing AI infrastructure to over 1,000 companies, runs its entire customer-facing system on Railway for $444 per month.

    "At my previous company Clever, which sold for $500 million, I had six full-time engineers just managing AWS," said Rafael Garcia, Kernel's chief technology officer. "Now I have six engineers total, and they all focus on product. Railway is exactly the tool I wish I had in 2012."

    For enterprise customers, Railway offers security certifications including SOC 2 Type 2 compliance and HIPAA readiness, with business associate agreements available upon request. The platform provides single sign-on authentication, comprehensive audit logs, and the option to deploy within a customer's existing cloud environment through a "bring your own cloud" configuration.

    Enterprise pricing is custom, with add-ons for extended log retention ($200 monthly), HIPAA BAAs ($1,000), enterprise support with SLOs ($2,000), and dedicated virtual machines ($10,000).

    The startup's bold strategy to take on Amazon, Google, and a new generation of cloud rivals

    Railway enters a crowded market that includes not only the hyperscale cloud providers — Amazon Web Services, Microsoft Azure, and Google Cloud Platform — but also a growing cohort of developer-focused platforms like Vercel, Render, Fly.io, and Heroku.

    Cooper argues that Railway's competitors fall into two camps, neither of which has fully committed to the new infrastructure model that AI demands.

    "The hyperscalers have two competing systems, and they haven't gone all-in on the new model because their legacy revenue stream is still printing money," he observed. "They have this mammoth pool of cash coming from people who provision a VM, use maybe 10 percent of it, and still pay for the whole thing. To what end are they actually interested in going all the way in on a new experience if they don't really need to?"

    Against startup competitors, Railway differentiates by covering the full infrastructure stack. "We're not just containers; we've got VM primitives, stateful storage, virtual private networking, automated load balancing," Cooper said. "And we wrap all of this in an absurdly easy-to-use UI, with agentic primitives so agents can move 1,000 times faster."

    The platform supports databases including PostgreSQL, MySQL, MongoDB, and Redis; provides up to 256 terabytes of persistent storage with over 100,000 input/output operations per second; and enables deployment to four global regions spanning the United States, Europe, and Southeast Asia. Enterprise customers can scale to 112 vCPUs and 2 terabytes of RAM per service.

    Why investors are betting that AI will create a thousand times more software than exists today

    Railway's fundraise reflects broader investor enthusiasm for companies positioned to benefit from the AI coding revolution. As tools like GitHub Copilot, Cursor, and Claude become standard fixtures in developer workflows, the volume of code being written — and the infrastructure needed to run it — is expanding dramatically.

    "The amount of software that's going to come online over the next five years is unfathomable compared to what existed before — we're talking a thousand times more software," Cooper predicted. "All of that has to run somewhere."

    The company has already integrated directly with AI systems, building what Cooper calls "loops where Claude can hook in, call deployments, and analyze infrastructure automatically." Railway released a Model Context Protocol server in August 2025 that allows AI coding agents to deploy applications and manage infrastructure directly from code editors.

    "The notion of a developer is melting before our eyes," Cooper said. "You don't have to be an engineer to engineer things anymore — you just need critical thinking and the ability to analyze things in a systems capacity."

    What Railway plans to do with $100 million and zero marketing experience

    Railway plans to use the new capital to expand its global data center footprint, grow its team beyond 30 employees, and build what Cooper described as a proper go-to-market operation for the first time in the company's five-year history.

    "One of my mentors said you raise money when you can change the trajectory of the business," Cooper explained. "We've built all the required substrate to scale indefinitely; what's been holding us back is simply talking about it. 2026 is the year we play on the world stage."

    The company's investor roster reads like a who's who of developer infrastructure. Angel investors include Tom Preston-Werner, co-founder of GitHub; Guillermo Rauch, chief executive of Vercel; Spencer Kimball, chief executive of Cockroach Labs; Olivier Pomel, chief executive of Datadog; and Jori Lallo, co-founder of Linear.

    The timing of Railway's expansion coincides with what many in Silicon Valley view as a fundamental shift in how software gets made. Coding assistants are no longer experimental curiosities — they have become essential tools that millions of developers rely on daily. Each line of AI-generated code needs somewhere to run, and the incumbents, by Cooper's telling, are too wedded to their existing business models to fully capitalize on the moment.

    Whether Railway can translate developer enthusiasm into sustained enterprise adoption remains an open question. The cloud infrastructure market is littered with promising startups that failed to break the grip of Amazon, Microsoft, and Google. But Cooper, who previously worked as a software engineer at Wolfram Alpha, Bloomberg, and Uber before founding Railway in 2020, seems unfazed by the scale of his ambition.

    "In five years, Railway [will be] the place where software gets created and evolved, period," he said. "Deploy instantly, scale infinitely, with zero friction. That's the prize worth playing for, and there's no bigger one on offer."

    For a company that raised $100 million by doing the opposite of what conventional startup wisdom dictates — no marketing, no sales team, no venture hype — the real test begins now. Railway spent five years proving that developers would find a better mousetrap on their own. The next five will determine whether the rest of the world is ready to get on board.

  • The artificial intelligence coding revolution comes with a catch: it's expensive.

    Claude Code, Anthropic's terminal-based AI agent that can write, debug, and deploy code autonomously, has captured the imagination of software developers worldwide. But its pricing — ranging from $20 to $200 per month depending on usage — has sparked a growing rebellion among the very programmers it aims to serve.

    Now, a free alternative is gaining traction. Goose, an open-source AI agent developed by Block (the financial technology company formerly known as Square), offers nearly identical functionality to Claude Code but can run entirely on a user's local machine. No subscription fees. No cloud dependency. No rate limits that reset every five hours.

    "Your data stays with you, period," said Parth Sareen, a software engineer who demonstrated the tool during a recent livestream. The comment captures the core appeal: Goose gives developers complete control over their AI-powered workflow, including the ability to work offline — even on an airplane.

    The project has exploded in popularity. Goose now boasts more than 26,100 stars on GitHub, the code-sharing platform, with 362 contributors and 102 releases since its launch. The latest version, 1.20.1, shipped on January 19, 2026, reflecting a development pace that rivals commercial products.

    For developers frustrated by Claude Code's pricing structure and usage caps, Goose represents something increasingly rare in the AI industry: a genuinely free, no-strings-attached option for serious work.

    Anthropic's new rate limits spark a developer revolt

    To understand why Goose matters, you need to understand the Claude Code pricing controversy.

    Anthropic, the San Francisco artificial intelligence company founded by former OpenAI executives, offers Claude Code as part of its subscription tiers. The free plan provides no access whatsoever. The Pro plan, at $17 per month with annual billing (or $20 monthly), limits users to just 10 to 40 prompts every five hours — a constraint that serious developers exhaust within minutes of intensive work.

    The Max plans, at $100 and $200 per month, offer more headroom: 50 to 200 prompts and 200 to 800 prompts respectively, plus access to Anthropic's most powerful model, Claude Opus 4.5. But even these premium tiers come with restrictions that have inflamed the developer community.

    In late July, Anthropic announced new weekly rate limits. Under the system, Pro users receive 40 to 80 hours of Sonnet 4 usage per week. Max users at the $200 tier get 240 to 480 hours of Sonnet 4, plus 24 to 40 hours of Opus 4. Nearly five months later, the frustration has not subsided.

    The problem? Those "hours" are not actual hours. They represent token-based limits that vary wildly depending on codebase size, conversation length, and the complexity of the code being processed. Independent analysis suggests the actual per-session limits translate to roughly 44,000 tokens for Pro users and 220,000 tokens for the $200 Max plan.

    "It's confusing and vague," one developer wrote in a widely shared analysis. "When they say '24-40 hours of Opus 4,' that doesn't really tell you anything useful about what you're actually getting."

    The backlash on Reddit and developer forums has been fierce. Some users report hitting their daily limits within 30 minutes of intensive coding. Others have canceled their subscriptions entirely, calling the new restrictions "a joke" and "unusable for real work."

    Anthropic has defended the changes, stating that the limits affect fewer than five percent of users and target people running Claude Code "continuously in the background, 24/7." But the company has not clarified whether that figure refers to five percent of Max subscribers or five percent of all users — a distinction that matters enormously.

    How Block built a free AI coding agent that works offline

    Goose takes a radically different approach to the same problem.

    Built by Block, the payments company led by Jack Dorsey, Goose is what engineers call an "on-machine AI agent." Unlike Claude Code, which sends your queries to Anthropic's servers for processing, Goose can run entirely on your local computer using open-source language models that you download and control yourself.

    The project's documentation describes it as going "beyond code suggestions" to "install, execute, edit, and test with any LLM." That last phrase — "any LLM" — is the key differentiator. Goose is model-agnostic by design.

    You can connect Goose to Anthropic's Claude models if you have API access. You can use OpenAI's GPT-5 or Google's Gemini. You can route it through services like Groq or OpenRouter. Or — and this is where things get interesting — you can run it entirely locally using tools like Ollama, which let you download and execute open-source models on your own hardware.

    The practical implications are significant. With a local setup, there are no subscription fees, no usage caps, no rate limits, and no concerns about your code being sent to external servers. Your conversations with the AI never leave your machine.

    "I use Ollama all the time on planes — it's a lot of fun!" Sareen noted during a demonstration, highlighting how local models free developers from the constraints of internet connectivity.

    What Goose can do that traditional code assistants can't

    Goose operates as a command-line tool or desktop application that can autonomously perform complex development tasks. It can build entire projects from scratch, write and execute code, debug failures, orchestrate workflows across multiple files, and interact with external APIs — all without constant human oversight.

    The architecture relies on what the AI industry calls "tool calling" or "function calling" — the ability of a language model to request specific actions from external systems. When you ask Goose to create a new file, run a test suite, or check the status of a GitHub pull request, it doesn't just generate text describing what should happen. It actually executes those operations.
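    The loop behind that behavior is simple to sketch. The Python below illustrates the general tool-calling pattern rather than Goose's actual implementation; model_call and the single-tool registry are hypothetical stand-ins:

    # Illustrative agent tool-calling loop (a sketch, not Goose's code).
    # model_call and the tool registry are hypothetical stand-ins.
    import subprocess

    def run_shell(command: str) -> str:
        """One example 'tool': run a shell command and return its output."""
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        return result.stdout + result.stderr

    TOOLS = {"run_shell": run_shell}

    def agent_loop(user_request: str, model_call) -> str:
        messages = [{"role": "user", "content": user_request}]
        while True:
            reply = model_call(messages)  # dict: either text or a tool request
            if reply.get("tool") in TOOLS:
                # The model asked for an action: execute it, feed the result back.
                output = TOOLS[reply["tool"]](**reply["args"])
                messages.append({"role": "tool", "content": output})
            else:
                return reply["content"]  # plain text means the task is finished

    The key point is that the model's output is treated as a request for real execution, and the result of that execution flows back into the conversation.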

    This capability depends heavily on the underlying language model. Claude 4 models from Anthropic currently perform best at tool calling, according to the Berkeley Function-Calling Leaderboard, which ranks models on their ability to translate natural language requests into executable code and system commands.

    But newer open-source models are catching up quickly. Goose's documentation highlights several options with strong tool-calling support: Meta's Llama series, Alibaba's Qwen models, Google's Gemma variants, and DeepSeek's reasoning-focused architectures.

    The tool also integrates with the Model Context Protocol, or MCP, an emerging standard for connecting AI agents to external services. Through MCP, Goose can access databases, search engines, file systems, and third-party APIs — extending its capabilities far beyond what the base language model provides.

    Setting up Goose with a local model

    For developers interested in a completely free, privacy-preserving setup, the process involves three main components: Goose itself, Ollama (a tool for running open-source models locally), and a compatible language model.

    Step 1: Install Ollama

    Ollama is an open-source project that dramatically simplifies the process of running large language models on personal hardware. It handles the complex work of downloading, optimizing, and serving models through a simple interface.

    Download and install Ollama from ollama.com. Once installed, you can pull models with a single command. For coding tasks, Qwen 2.5 offers strong tool-calling support:

    ollama run qwen2.5

    The model downloads automatically and begins running on your machine.

    Step 2: Install Goose

    Goose is available as both a desktop application and a command-line interface. The desktop version provides a more visual experience, while the CLI appeals to developers who prefer working entirely in the terminal.

    Installation instructions vary by operating system but generally involve downloading from Goose's GitHub releases page or using a package manager. Block provides pre-built binaries for macOS (both Intel and Apple Silicon), Windows, and Linux.

    Step 3: Configure the Connection

    In Goose Desktop, navigate to Settings, then Configure Provider, and select Ollama. Confirm that the API Host is set to http://localhost:11434 (Ollama's default port) and click Submit.

    For the command-line version, run goose configure, select "Configure Providers," choose Ollama, and enter the model name when prompted.

    That's it. Goose is now connected to a language model running entirely on your hardware, ready to execute complex coding tasks without any subscription fees or external dependencies.
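    If Goose reports that it cannot reach the model, a quick sanity check is to query Ollama's local REST API directly. A minimal sketch, assuming a default install; /api/tags is the Ollama endpoint that lists the models you have pulled:

    # Confirm Ollama is serving on its default port and list pulled models.
    import json
    from urllib.request import urlopen

    with urlopen("http://localhost:11434/api/tags") as resp:
        models = json.load(resp)["models"]

    for m in models:
        print(m["name"])  # e.g. "qwen2.5:latest" if you pulled it in Step 1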

    The RAM, processing power, and trade-offs you should know about

    The obvious question: what kind of computer do you need?

    Running large language models locally requires substantially more computational resources than typical software. The key constraint is memory — specifically, RAM on most systems, or VRAM if using a dedicated graphics card for acceleration.

    Block's documentation suggests that 32 gigabytes of RAM provides "a solid baseline for larger models and outputs." For Mac users, this means the computer's unified memory is the primary bottleneck. For Windows and Linux users with discrete NVIDIA graphics cards, GPU memory (VRAM) matters more for acceleration.

    But you don't necessarily need expensive hardware to get started. Smaller models with fewer parameters run on much more modest systems. Qwen 2.5, for instance, comes in multiple sizes, and the smaller variants can operate effectively on machines with 16 gigabytes of RAM.

    "You don't need to run the largest models to get excellent results," Sareen emphasized. The practical recommendation: start with a smaller model to test your workflow, then scale up as needed.

    For context, Apple's entry-level MacBook Air with 8 gigabytes of RAM would struggle with most capable coding models. But a MacBook Pro with 32 gigabytes — increasingly common among professional developers — handles them comfortably.
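    A rough rule of thumb explains those figures: a model's memory footprint is approximately its parameter count times the bytes per parameter at its quantization level, plus runtime overhead. The sketch below assumes a loose 20 percent overhead factor for activations and the runtime; treat the outputs as estimates, not guarantees:

    # Rough memory estimate: parameters x bytes per parameter x overhead.
    # The 1.2x overhead factor is a loose assumption, not a measured figure.
    def estimate_gb(params_billions: float, bits_per_param: int,
                    overhead: float = 1.2) -> float:
        return params_billions * (bits_per_param / 8) * overhead

    for params, bits in [(7, 16), (7, 4), (32, 4)]:
        print(f"{params}B model at {bits}-bit: ~{estimate_gb(params, bits):.1f} GB")

    # 7B at 16-bit: ~16.8 GB (tight on a 16 GB laptop)
    # 7B at 4-bit:  ~4.2 GB  (comfortable on 16 GB)
    # 32B at 4-bit: ~19.2 GB (wants the 32 GB baseline Block suggests)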

    Why keeping your code off the cloud matters more than ever

    Goose with a local LLM is not a perfect substitute for Claude Code. The comparison involves real trade-offs that developers should understand.

    Model Quality: Claude Opus 4.5, Anthropic's flagship model, remains arguably the most capable AI for software engineering tasks. It excels at understanding complex codebases, following nuanced instructions, and producing high-quality code on the first attempt. Open-source models have improved dramatically, but a gap persists — particularly for the most challenging tasks.

    One developer who switched to the $200 Claude Code plan described the difference bluntly: "When I say 'make this look modern,' Opus knows what I mean. Other models give me Bootstrap circa 2015."

    Context Window: Claude Sonnet 4.5, accessible through the API, offers a massive one-million-token context window — enough to load entire large codebases without chunking or context management issues. Most local models are limited to 4,096 or 8,192 tokens by default, though many can be configured for longer contexts at the cost of increased memory usage and slower processing.

    Speed: Cloud-based services like Claude Code run on dedicated server hardware optimized for AI inference. Local models, running on consumer laptops, typically process requests more slowly. The difference matters for iterative workflows where you're making rapid changes and waiting for AI feedback.

    Tooling Maturity: Claude Code benefits from Anthropic's dedicated engineering resources. Features like prompt caching (which can reduce costs by up to 90 percent for repeated contexts) and structured outputs are polished and well-documented. Goose, while actively developed with 102 releases to date, relies on community contributions and may lack equivalent refinement in specific areas.

    How Goose stacks up against Cursor, GitHub Copilot, and the paid AI coding market

    Goose enters a crowded market of AI coding tools, but occupies a distinctive position.

    Cursor, a popular AI-enhanced code editor, charges $20 per month for its Pro tier and $200 for Ultra — pricing that mirrors Claude Code's Max plans. Cursor provides approximately 4,500 Sonnet 4 requests per month at the Ultra level, a substantially different allocation model than Claude Code's hourly resets.

    Cline, Roo Code, and similar open-source projects offer AI coding assistance but with varying levels of autonomy and tool integration. Many focus on code completion rather than the agentic task execution that defines Goose and Claude Code.

    Amazon's CodeWhisperer, GitHub Copilot, and enterprise offerings from major cloud providers target large organizations with complex procurement processes and dedicated budgets. They are less relevant to individual developers and small teams seeking lightweight, flexible tools.

    Goose's combination of genuine autonomy, model agnosticism, local operation, and zero cost creates a unique value proposition. The tool is not trying to compete with commercial offerings on polish or model quality. It's competing on freedom — both financial and architectural.

    The $200-a-month era for AI coding tools may be ending

    The AI coding tools market is evolving quickly. Open-source models are improving at a pace that continually narrows the gap with proprietary alternatives. Moonshot AI's Kimi K2 and z.ai's GLM 4.5 now benchmark near Claude Sonnet 4 levels — and they're freely available.

    If this trajectory continues, the quality advantage that justifies Claude Code's premium pricing may erode. Anthropic would then face pressure to compete on features, user experience, and integration rather than raw model capability.

    For now, developers face a clear choice. Those who need the absolute best model quality, who can afford premium pricing, and who accept usage restrictions may prefer Claude Code. Those who prioritize cost, privacy, offline access, and flexibility have a genuine alternative in Goose.

    The fact that a $200-per-month commercial product has a zero-dollar open-source competitor with comparable core functionality is itself remarkable. It reflects both the maturation of open-source AI infrastructure and the appetite among developers for tools that respect their autonomy.

    Goose is not perfect. It requires more technical setup than commercial alternatives. It depends on hardware resources that not every developer possesses. Its model options, while improving rapidly, still trail the best proprietary offerings on complex tasks.

    But for a growing community of developers, those limitations are acceptable trade-offs for something increasingly rare in the AI landscape: a tool that truly belongs to them.


    Goose is available for download at github.com/block/goose. Ollama is available at ollama.com. Both projects are free and open source.

  • João Freitas is GM and VP of engineering for AI and automation at PagerDuty

    As AI use continues to evolve in large organizations, leaders are increasingly seeking the next development that will yield major ROI. The latest wave of this ongoing trend is the adoption of AI agents. However, as with any new technology, organizations must ensure they adopt AI agents in a responsible way that allows them to facilitate both speed and security. 

    More than half of organizations have already deployed AI agents to some extent, with more expecting to follow suit in the next two years. But many early adopters are now reevaluating their approach: four in 10 tech leaders regret not establishing a stronger governance foundation from the start, which suggests they adopted AI rapidly but left room to improve the policies, rules and best practices designed to ensure the responsible, ethical and legal development and use of AI.

    As AI adoption accelerates, organizations must strike the right balance between their exposure to risk and the guardrails that keep AI use secure.

    Where do AI agents create potential risks?

    There are three principal areas of consideration for safer AI adoption.

    The first is shadow AI: employees using unsanctioned AI tools that bypass approved tools and processes. Shadow AI has existed as long as AI tools themselves, but agent autonomy makes it easier for unsanctioned tools to operate outside the purview of IT, which can introduce fresh security risks. IT should counter this by creating sanctioned paths for experimentation and innovation, so employees can adopt more efficient ways of working with AI without going around approved channels.

    Secondly, organizations must close gaps in AI ownership and accountability to prepare for incidents or processes gone wrong. The strength of AI agents lies in their autonomy. However, if agents act in unexpected ways, teams must be able to determine who is responsible for addressing any issues.

    The third risk arises when there is a lack of explainability for actions AI agents have taken. AI agents are goal-oriented, but how they accomplish their goals can be unclear. AI agents must have explainable logic underlying their actions so that engineers can trace and, if needed, roll back actions that may cause issues with existing systems.

    None of these risks should delay adoption, but identifying and addressing them will help organizations better ensure their security.

    The three guidelines for responsible AI agent adoption

    Once organizations have identified the risks AI agents can pose, they must implement guidelines and guardrails to ensure safe usage. By following these three steps, organizations can minimize these risks.

    1: Make human oversight the default 

    AI agency continues to evolve at a fast pace. However, we still need human oversight when AI agents are given the capacity to act, make decisions and pursue a goal that may impact key systems. A human should be in the loop by default, especially for business-critical use cases and systems. The teams that use AI must understand the actions it may take and where they may need to intervene. Start conservatively and, over time, increase the level of agency given to AI agents.

    In conjunction, operations teams, engineers and security professionals must understand the role they play in supervising AI agents’ workflows. Each agent should be assigned a specific human owner for clearly defined oversight and accountability. Organizations must also allow any human to flag or override an AI agent’s behavior when an action has a negative outcome.

    When considering tasks for AI agents, organizations should understand that, while traditional automation is good at handling repetitive, rule-based processes with structured data inputs, AI agents can handle much more complex tasks and adapt to new information in a more autonomous way. This makes them an appealing solution for all sorts of tasks. But as AI agents are deployed, organizations should control what actions the agents can take, particularly in the early stages of a project. Thus, teams working with AI agents should have approval paths in place for high-impact actions to ensure agent scope does not extend beyond expected use cases, minimizing risk to the wider system.

    2: Bake in security 

    The introduction of new tools should not expose a system to fresh security risks. 

    Organizations should consider agentic platforms that comply with high security standards and are validated by enterprise-grade certifications such as SOC 2, FedRAMP or equivalent. Further, AI agents should not be allowed free rein across an organization’s systems. At a minimum, the permissions and security scope of an AI agent must be aligned with the scope of its owner, and any tools added to the agent should not extend its permissions. Limiting an AI agent’s access to a system based on its role will also ensure deployment runs smoothly. Keeping complete logs of every action taken by an AI agent can also help engineers understand what happened in the event of an incident and trace back the problem.
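    One way to make owner-aligned permissions concrete is an explicit allowlist checked before every agent action, with human approval required for high-impact operations. The sketch below is illustrative; the policy shape and the action names are hypothetical rather than any particular platform’s API:

    # Illustrative least-privilege gate for agent actions (hypothetical policy
    # shape and action names, not a real platform's API).
    AGENT_POLICY = {
        "deploy-bot": {
            "owner": "alice@example.com",
            "allowed_actions": {"read_logs", "restart_service"},
            "requires_approval": {"restart_service"},  # high-impact: human sign-off
        }
    }

    def authorize(agent: str, action: str, approved_by: str | None = None) -> bool:
        policy = AGENT_POLICY.get(agent)
        if policy is None or action not in policy["allowed_actions"]:
            return False  # outside the agent's scope entirely
        if action in policy["requires_approval"] and approved_by is None:
            return False  # high-impact action needs a human in the loop
        return True

    assert authorize("deploy-bot", "read_logs")
    assert not authorize("deploy-bot", "restart_service")  # blocked without approval
    assert authorize("deploy-bot", "restart_service", approved_by="alice@example.com")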

    3: Make outputs explainable 

    AI use in an organization must never be a black box. The reasoning behind any action must be surfaced so that any engineer reviewing it can understand the context the agent used for decision-making and access the traces that led to those actions.

    Inputs and outputs for every action should be logged and accessible. This will help organizations establish a firm overview of the logic underlying an AI agent’s actions, providing significant value in the event anything goes wrong.
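    In practice, that overview can start with a structured, append-only log written around every agent action. A minimal sketch, with hypothetical field names:

    # Structured, append-only audit log for agent actions (illustrative;
    # the field names are hypothetical). Logging inputs and outputs together
    # gives engineers the trace needed to reconstruct an agent's reasoning.
    import json, time

    def log_action(agent: str, goal: str, action: str, inputs: dict, output: str,
                   path: str = "agent_audit.jsonl") -> None:
        record = {
            "timestamp": time.time(),
            "agent": agent,
            "goal": goal,      # what the agent was trying to accomplish
            "action": action,  # what it actually did
            "inputs": inputs,  # the context it acted on
            "output": output,  # the result, for later review or rollback
        }
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")

    log_action("deploy-bot", "reduce error rate", "restart_service",
               {"service": "checkout", "error_rate": 0.07}, "restarted ok")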

    Security underscores AI agents’ success

    AI agents offer a huge opportunity for organizations to accelerate and improve their existing processes. However, if they do not prioritize security and strong governance, they could expose themselves to new risks.

    As AI agents become more common, organizations must ensure they have systems in place to measure how they perform and the ability to take action when they create problems.


  • Enterprises that want tokenizer-free multilingual models are increasingly turning to byte-level language models to reduce brittleness in noisy or low-resource text. To tap into that niche — and make it practical at scale — the Allen Institute for AI (Ai2) introduced Bolmo, a new family of models that leverages its Olmo 3 models by “byteifying” them and reusing their backbone and capabilities.

    The company launched two versions, Bolmo 7B and Bolmo 1B, which Ai2 calls the first fully open byte-level language models. The company said the two models performed competitively with — and in some cases surpassed — other byte-level and character-based models.

    Byte-level language models operate directly on raw UTF-8 bytes, eliminating the need for a predefined vocabulary or tokenizer. This allows them to handle misspellings, rare languages, and unconventional text more reliably — key requirements for moderation, edge deployments, and multilingual applications.
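    The input difference is easy to see. Where a subword tokenizer maps text onto IDs from a fixed vocabulary, a byte-level model consumes the raw UTF-8 byte sequence, so no string is ever out of vocabulary:

    # A byte-level model's input is just the raw UTF-8 byte sequence, so
    # misspellings and rare scripts are representable without any tokenizer.
    for text in ["hello", "héllo", "こんにちは"]:
        byte_ids = list(text.encode("utf-8"))  # values 0-255
        print(f"{text!r} -> {byte_ids}")

    # 'hello' -> [104, 101, 108, 108, 111]
    # 'héllo' -> [104, 195, 169, 108, 108, 111]  (é spans two bytes)
    # 'こんにちは' -> 15 byte values (three per character in UTF-8)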

    For enterprises deploying AI across multiple languages, noisy user inputs, or constrained environments, tokenizer-free models offer a way to reduce operational complexity. Ai2’s Bolmo is an attempt to make that approach practical at scale — without retraining from scratch.

    How Bolmo works and how it was built 

    Ai2 said it trained the Bolmo models on its Dolma 3 data mix, the same mix that helped train its flagship Olmo models, along with open code datasets and character-level data.

    The company said its goal “is to provide a reproducible, inspectable blueprint for byteifying strong subword language models in a way the community can adopt and extend.” To meet this goal, Ai2 will release its checkpoints, code, and a full paper to help other organizations build byte-level models on top of its Olmo ecosystem. 

    Since training a byte-level model completely from scratch can get expensive, Ai2 researchers instead chose an existing Olmo 3 7B checkpoint to byteify in two stages. 

    In the first stage, Ai2 froze the Olmo 3 transformer so that only certain parts are trained: the local encoder and decoder, the boundary predictor, and the language modeling head. This stage was designed to be “cheap and fast,” requiring just 9.8 billion tokens.

    The next stage unfreezes the model and trains it with additional tokens. Ai2 said the byte-level approach allows Bolmo to avoid the vocabulary bottlenecks that limit traditional subword models.
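    The two-stage recipe maps onto standard training code: freeze the backbone, train only the new byte-facing modules, then unfreeze everything. The PyTorch sketch below is schematic; the module names are assumed from Ai2's description, not taken from its released code:

    # Schematic two-stage byteification (module names assumed from Ai2's
    # description of Bolmo, not its released code).
    import torch.nn as nn

    def stage_one_params(model: nn.Module):
        # Stage 1: freeze the pretrained Olmo 3 transformer backbone...
        for p in model.backbone.parameters():
            p.requires_grad = False
        # ...and train only the new byte-facing parts.
        new_modules = [model.local_encoder, model.local_decoder,
                       model.boundary_predictor, model.lm_head]
        return [p for m in new_modules for p in m.parameters()]

    def stage_two_params(model: nn.Module):
        # Stage 2: unfreeze everything and continue training end to end.
        for p in model.parameters():
            p.requires_grad = True
        return list(model.parameters())

    # Usage: pass each stage's parameter list to the optimizer,
    # e.g. torch.optim.AdamW(stage_one_params(model), lr=1e-4).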

    Strong performance among its peers

    Byte-level language models are not as mainstream as small language models or LLMs, but this is a growing field in research. Meta released its BLT architecture research last year, aiming to offer a model that is robust, processes raw data, and doesn’t rely on fixed vocabularies. 

    Other research models in this space include ByT5, Stanford’s MrT5, and CANINE.

    Ai2 evaluated Bolmo using its evaluation suite, covering math, STEM reasoning, question answering, general knowledge, and code. 

    Bolmo 7B showed strong performance, scoring well on character-focused benchmarks like CUTE and EXECUTE and improving accuracy over its base model, Olmo 3.

    Bolmo 7B outperformed models of comparable size in coding, math, multiple-choice QA, and character-level understanding. 

    Why enterprises may choose byte-level models

    Enterprises find value in a hybrid model structure, using a mix of models and model sizes. 

    Ai2 makes the case that organizations should also consider byte-level models, not only for robustness and multilingual understanding but because the approach “naturally plugs into an existing model ecosystem.”

    “A key advantage of the dynamic hierarchical setup is that compression becomes a toggleable knob,” the company said. In practice, the boundary predictor controls how many bytes are grouped into each patch, letting teams tune how aggressively input is compressed and trade quality against compute.

    For enterprises already running heterogeneous model stacks, Bolmo suggests that byte-level models may no longer be purely academic. By retrofitting a strong subword model rather than training from scratch, Ai2 is signaling a lower-risk path for organizations that want robustness without abandoning existing infrastructure.