Building an enterprise AI company on a "foundation of shifting sand" is the central challenge for founders today, according to the leadership at Palona AI.
The Palo Alto-based startup—led by former Google and Meta engineering veterans—is making a decisive vertical push into the restaurant and hospitality space with today's launch of Palona Vision and Palona Workflow.
The new offerings transform the company’s multimodal agent suite into a real-time operating system for restaurant operations — spanning cameras, calls, conversations, and coordinated task execution.
The news marks a strategic pivot from the company’s debut in early 2025, when it first emerged with $10 million in seed funding to build emotionally intelligent sales agents for broad direct-to-consumer enterprises.
Now, by narrowing its focus to a "multimodal native" approach for restaurants, Palona is providing a blueprint for AI builders on how to move beyond "thin wrappers" to build deep systems that solve high-stakes physical world problems.
“You’re building a company on top of a foundation that is sand—not quicksand, but shifting sand,” said co-founder and CTO Tim Howes, referring to the instability of today’s LLM ecosystem. “So we built an orchestration layer that lets us swap models on performance, fluency, and cost.”
VentureBeat spoke with Howes and co-founder and CEO Maria Zhang in person recently at — where else? — a restaurant in NYC about the technical challenges and hard lessons learned from their launch, growth, and pivot.
The New Offering: Vision and Workflow as a ‘Digital GM’
For the end user—the restaurant owner or operator—Palona’s latest release is designed to function as an automated "best operations manager" that never sleeps.
Palona Vision uses in-store security cameras to analyze operational signals without requiring any new hardware. It monitors front-of-house metrics like queue lengths, table turns, and cleanliness, while simultaneously identifying back-of-house issues like prep slowdowns or station setup errors.
Palona Workflow complements this by automating multi-step operational processes. This includes managing catering orders, opening and closing checklists, and food prep fulfillment. By correlating video signals from Vision with Point-of-Sale (POS) data and staffing levels, Workflow ensures consistent execution across multiple locations.
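The cross-signal correlation described above can be sketched in a few lines. This is an illustrative sketch only, not Palona's implementation: the field names, thresholds, and alert strings are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LocationSnapshot:
    """One point-in-time reading for a single location (fields are illustrative)."""
    queue_length: int    # people waiting, from the camera feed
    open_orders: int     # tickets in flight, from the POS
    staff_on_shift: int  # from the scheduling system

def flag_bottlenecks(snap):
    """Correlate vision, POS, and staffing signals into alerts for the operator."""
    alerts = []
    if snap.queue_length > 5 and snap.open_orders > 3 * snap.staff_on_shift:
        alerts.append("prep bottleneck: queue growing faster than kitchen throughput")
    if snap.queue_length > 8:
        alerts.append("front of house: consider opening another register")
    return alerts
```

The point of the sketch is that no single data source is enough: a long queue alone might mean a rush, but a long queue plus a ticket backlog relative to staffing points at the kitchen.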
“Palona Vision is like giving every location a digital GM,” said Shaz Khan, founder of Tono Pizzeria + Cheesesteaks, in a press release provided to VentureBeat. “It flags issues before they escalate and saves me hours every week.”
Going Vertical: Lessons in Domain Expertise
Palona’s journey began with a star-studded roster. CEO Zhang previously served as VP of Engineering at Google and CTO of Tinder, while co-founder Howes is the co-inventor of LDAP and a former Netscape CTO.
Despite this pedigree, the team’s first year was a lesson in the necessity of focus.
Initially, Palona served fashion and electronics brands, creating "wizard" and "surfer dude" personalities to handle sales. However, the team quickly realized that the restaurant industry presented a unique, trillion-dollar opportunity that was "surprisingly recession-proof" but "gobsmacked" by operational inefficiency.
"Advice to startup founders: don't go multi-industry," Zhang warned.
By verticalizing, Palona moved from being a "thin" chat layer to building a "multi-sensory information pipeline" that processes vision, voice, and text in tandem.
That clarity of focus opened access to proprietary training data (like prep playbooks and call transcripts) while avoiding generic data scraping.
1. Building on ‘Shifting Sand’
To accommodate the reality of enterprise AI deployments in 2025 — with new, improved models coming out on a nearly weekly basis — Palona developed a patent-pending orchestration layer.
Rather than being "bundled" with a single provider like OpenAI or Google, Palona’s architecture allows them to swap models on a dime based on performance and cost.
They use a mix of proprietary and open-source models, including Gemini for computer vision benchmarks and specific language models for Spanish or Chinese fluency.
For builders, the message is clear: Never let your product's core value be a single-vendor dependency.
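A model-agnostic routing layer of the kind Howes describes can be approximated with a simple scoring function. The sketch below is hypothetical (Palona's patent-pending orchestration layer is not public), and the model names, scores, and prices are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class ModelOption:
    name: str            # provider/model id (hypothetical)
    quality: float       # internal benchmark score, 0 to 1
    cost_per_1k: float   # dollars per 1,000 tokens
    languages: set       # languages with vetted fluency

def pick_model(options, language, min_quality, budget_per_1k):
    """Route each request to the cheapest model that clears the quality and
    fluency bars. No model is hard-coded, so swapping providers is a data change."""
    eligible = [m for m in options
                if m.quality >= min_quality
                and language in m.languages
                and m.cost_per_1k <= budget_per_1k]
    if not eligible:
        raise LookupError("no model satisfies the constraints")
    return min(eligible, key=lambda m: m.cost_per_1k)

# Hypothetical model catalog; when a better or cheaper model ships,
# only this table changes, not the product logic.
options = [
    ModelOption("model-a", 0.92, 0.010, {"en", "es"}),
    ModelOption("model-b", 0.88, 0.004, {"en"}),
    ModelOption("model-c", 0.95, 0.030, {"en", "es", "zh"}),
]
print(pick_model(options, "es", 0.90, 0.02).name)  # model-a
```

This is the "shifting sand" insurance in miniature: the product depends on the selection policy, not on any single vendor.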
2. From Words to ‘World Models’
The launch of Palona Vision represents a shift from understanding words to understanding the physical reality of a kitchen.
While many developers struggle to stitch separate APIs together, Palona’s new vision model transforms existing in-store cameras into operational assistants.
The system identifies "cause and effect" in real-time—recognizing if a pizza is undercooked by its "pale beige" color or alerting a manager if a display case is empty.
"In words, physics don't matter," Zhang explained. "But in reality, I drop the phone, it always goes down... we want to really figure out what's going on in this world of restaurants."
3. The ‘Muffin’ Solution: Custom Memory Architecture
One of the most significant technical hurdles Palona faced was memory management. In a restaurant context, memory is the difference between a frustrating interaction and a "magical" one where the agent remembers a diner’s "usual" order.
The team initially used an unspecified open-source tool but found it produced errors 30% of the time. "I think advisory developers always turn off memory [on consumer AI products], because that will guarantee to mess everything up," Zhang cautioned.
To solve this, Palona built Muffin, a proprietary memory management system named as a nod to web "cookies." Unlike standard vector-based approaches that struggle with structured data, Muffin is architected to handle four distinct layers:
- Structured Data: Stable facts like delivery addresses or allergy information.
- Slow-changing Dimensions: Loyalty preferences and favorite items.
- Transient and Seasonal Memories: Adapting to shifts like preferring cold drinks in July versus hot cocoa in winter.
- Regional Context: Defaults like time zones or language preferences.
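A layered lookup in the spirit of Muffin can be modeled as ordered dictionaries where the most specific layer wins. This is a toy sketch based only on the four layers described above; Muffin's actual architecture, which also has to handle updates, expiry, and retrieval at scale, is proprietary.

```python
from dataclasses import dataclass, field

@dataclass
class DinerMemory:
    """Toy four-layer memory record mirroring the layers described above."""
    structured: dict = field(default_factory=dict)  # stable facts: address, allergies
    slow: dict = field(default_factory=dict)        # loyalty prefs, favorite items
    transient: dict = field(default_factory=dict)   # seasonal or short-lived shifts
    regional: dict = field(default_factory=dict)    # time zone, language defaults

    def lookup(self, key):
        # Most specific layer wins: transient > slow > structured > regional.
        for layer in (self.transient, self.slow, self.structured, self.regional):
            if key in layer:
                return layer[key]
        return None

mem = DinerMemory()
mem.regional["language"] = "en-US"
mem.structured["allergy"] = "peanuts"
mem.slow["drink"] = "hot cocoa"      # the long-term favorite...
mem.transient["drink"] = "iced tea"  # ...overridden during a July heat wave
print(mem.lookup("drink"))  # iced tea
```

Separating the layers is what makes a "usual" order feel magical without going stale: the seasonal override expires, and the long-term preference resurfaces underneath it.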
The lesson for builders: If the best available tool isn't good enough for your specific vertical, you must be willing to build your own.
4. Reliability through ‘GRACE’
In a kitchen, an AI error isn't just a typo; it’s a wasted order or a safety risk. A recent incident at Stefanina’s Pizzeria in Missouri, where an AI hallucinated fake deals during a dinner rush, highlights how quickly brand trust can evaporate when safeguards are absent.
To prevent such chaos, Palona’s engineers follow its internal GRACE framework:
- Guardrails: Hard limits on agent behavior to prevent unapproved promotions.
- Red Teaming: Proactive attempts to "break" the AI and identify potential hallucination triggers.
- App Sec: Locking down APIs and third-party integrations with TLS, tokenization, and attack prevention systems.
- Compliance: Grounding every response in verified, vetted menu data to ensure accuracy.
- Escalation: Routing complex interactions to a human manager before a guest receives misinformation.
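Two of these pillars, guardrails and compliance grounding, reduce to pre-send checks against vetted data. The following sketch is hypothetical: the menu, promo list, and escalation logic are invented for illustration and are not Palona's code.

```python
# Hypothetical vetted data; a real deployment would load this from the POS.
VETTED_MENU = {"margherita": 14.00, "pepperoni": 16.00}
APPROVED_PROMOS = {"lunch10"}  # Guardrails: the only deals the agent may offer

def check_response(item, quoted_price, promo=None):
    """Return ("ok", msg) or ("escalate", reason) before anything reaches a guest."""
    if item not in VETTED_MENU:                  # Compliance: ground in menu data
        return "escalate", f"unknown item: {item}"
    if abs(quoted_price - VETTED_MENU[item]) > 0.01:
        return "escalate", f"price mismatch on {item}"
    if promo is not None and promo not in APPROVED_PROMOS:
        return "escalate", f"unapproved promotion: {promo}"  # no hallucinated deals
    return "ok", f"{item} at ${quoted_price:.2f}"
```

The key design choice is that the check sits between the model and the guest: a hallucinated discount becomes an escalation to a human manager, not a broken promise at the counter.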
This reliability is verified through massive simulation. "We simulated a million ways to order pizza," Zhang said, using one AI to act as a customer and another to take the order, measuring accuracy to eliminate hallucinations.
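An agent-versus-agent harness like the one Zhang describes boils down to a scoring loop. In the sketch below the "customer" and "order-taker" are stub functions standing in for real LLM calls, and the menu is invented for illustration.

```python
import random

def simulate_orders(n_trials, customer_agent, order_agent, seed=0):
    """Pit a 'customer' agent against an 'order-taker' and measure accuracy."""
    rng = random.Random(seed)  # seeded for reproducible runs
    correct = 0
    for _ in range(n_trials):
        intended = customer_agent(rng)    # what the customer meant to order
        captured = order_agent(intended)  # what the order-taker recorded
        correct += captured == intended
    return correct / n_trials

# Stub agents for illustration; real ones would be two different LLM calls.
MENU = ["margherita", "pepperoni", "veggie"]
customer = lambda rng: rng.choice(MENU)
perfect_taker = lambda utterance: utterance  # an ideal order-taker echoes intent

print(simulate_orders(1000, customer, perfect_taker))  # 1.0
```

Scaling the same loop to a million trials with adversarial customer prompts is what turns "it seems to work" into a measured hallucination rate.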
The Bottom Line
With the launch of Vision and Workflow, Palona is betting that the future of enterprise AI isn't in broad assistants, but in specialized "operating systems" that can see, hear, and think within a specific domain.
In contrast to general-purpose AI agents, Palona’s system is designed to execute restaurant workflows, not just respond to queries. It can remember customers, hear them order their "usual," and monitor restaurant operations to ensure the food is delivered according to the operator's internal processes and guidelines, flagging whenever something goes wrong or, crucially, is about to.
For Zhang, the goal is to let human operators focus on their craft: "If you've got that delicious food nailed... we’ll tell you what to do."
Zencoder, the Silicon Valley startup that builds AI-powered coding agents, released a free desktop application on Monday that it says will fundamentally change how software engineers interact with artificial intelligence — moving the industry beyond the freewheeling era of "vibe coding" toward a more disciplined, verifiable approach to AI-assisted development.
The product, called Zenflow, introduces what the company describes as an "AI orchestration layer" that coordinates multiple AI agents to plan, implement, test, and review code in structured workflows. The launch is Zencoder's most ambitious attempt yet to differentiate itself in an increasingly crowded market dominated by tools like Cursor, GitHub Copilot, and coding agents built directly by AI giants Anthropic, OpenAI, and Google.
"Chat UIs were fine for copilots, but they break down when you try to scale," said Andrew Filev, Zencoder's chief executive, in an exclusive interview with VentureBeat. "Teams are hitting a wall where speed without structure creates technical debt. Zenflow replaces 'Prompt Roulette' with an engineering assembly line where agents plan, implement, and, crucially, verify each other's work."
The announcement arrives at a critical moment for enterprise software development. Companies across industries have poured billions of dollars into AI coding tools over the past two years, hoping to dramatically accelerate their engineering output. Yet the promised productivity revolution has largely failed to materialize at scale.
Why AI coding tools have failed to deliver on their 10x productivity promise
Filev, who previously founded and sold the project management company Wrike to Citrix, pointed to a growing disconnect between AI coding hype and reality. While vendors have promised tenfold productivity gains, rigorous studies — including research from Stanford University — consistently show improvements closer to 20 percent.
"If you talk to real engineering leaders, I don't remember a single conversation where somebody vibe coded themselves to 2x or 5x or 10x productivity on serious engineering production," Filev said. "The typical number you would hear would be about 20 percent."
The problem, according to Filev, lies not with the AI models themselves but with how developers interact with them. The standard approach of typing requests into a chat interface and hoping for usable code works well for simple tasks but falls apart on complex enterprise projects.
Zencoder's internal engineering team claims to have cracked a different approach. Filev said the company now operates at roughly twice the velocity it achieved 12 months ago, not primarily because AI models improved, but because the team restructured its development processes.
"We had to change our process and use a variety of different best practices," he said.
Inside the four pillars that power Zencoder's AI orchestration platform
Zenflow organizes its approach around four core capabilities that Zencoder argues any serious AI orchestration platform must support.
Structured workflows replace ad-hoc prompting with repeatable sequences (plan, implement, test, review) that agents follow consistently. Filev drew parallels to his experience building Wrike, noting that individual to-do lists rarely scale across organizations, while defined workflows create predictable outcomes.
Spec-driven development requires AI agents to first generate a technical specification, then create a step-by-step plan, and only then write code. The approach became so effective that frontier AI labs including Anthropic and OpenAI have since trained their models to follow it automatically. The specification anchors agents to clear requirements, preventing what Zencoder calls "iteration drift," or the tendency for AI-generated code to gradually diverge from the original intent.
Multi-agent verification deploys different AI models to critique each other's work. Because AI models from the same family tend to share blind spots, Zencoder routes verification tasks across model providers, asking Claude to review code written by OpenAI's models, or vice versa.
"Think of it as a second opinion from a doctor," Filev told VentureBeat. "With the right pipeline, we see results on par with what you'd expect from Claude 5 or GPT-6. You're getting the benefit of a next-generation model today."
Parallel execution lets developers run multiple AI agents simultaneously in isolated sandboxes, preventing them from interfering with each other's work. The interface provides a command center for monitoring this fleet, a significant departure from the current practice of managing multiple terminal windows.
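Stripped of the agent plumbing, the first three pillars reduce to a fixed pipeline in which each stage's output feeds the next and a separate reviewer gets the last word. This is a schematic sketch, not Zenflow's code; the stage names and stub agents are illustrative only.

```python
def run_workflow(task, agents):
    """Minimal plan -> implement -> test -> review pipeline.

    `agents` maps stage names to callables; in a real setup each stage
    would wrap a different model provider so no model reviews its own work.
    """
    spec = agents["plan"](task)                     # spec-driven: spec before code
    code = agents["implement"](spec)                # code anchored to the spec
    report = agents["test"](code, spec)             # verification against the spec
    verdict = agents["review"](code, spec, report)  # a second model signs off
    return {"spec": spec, "code": code, "report": report, "verdict": verdict}

# Stub agents for illustration; real ones would call different LLM providers.
agents = {
    "plan": lambda task: f"spec: {task}",
    "implement": lambda spec: f"code for ({spec})",
    "test": lambda code, spec: "pass" if spec.removeprefix("spec: ") in code else "fail",
    "review": lambda code, spec, report: "approved" if report == "pass" else "rejected",
}
result = run_workflow("add login form", agents)
print(result["verdict"])  # approved
```

The spec produced in the first stage is what prevents "iteration drift": every later stage is checked against it rather than against the previous stage's possibly drifting output.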
How verification solves AI coding's biggest reliability problem
Zencoder's emphasis on verification addresses one of the most persistent criticisms of AI-generated code: its tendency to produce "slop," or code that appears correct but fails in production or degrades over successive iterations.
The company's internal research found that developers who skip verification often fall into what Filev called a "death loop." An AI agent completes a task successfully, but the developer, reluctant to review unfamiliar code, moves on without understanding what was written. When subsequent tasks fail, the developer lacks the context to fix problems manually and instead keeps prompting the AI for solutions.
"They literally spend more than a day in that death loop," Filev said. "That's why the productivity is not 2x, because they were running at 3x first, and then they wasted the whole day."
The multi-agent verification approach also gives Zencoder an unusual competitive advantage over the frontier AI labs themselves. While Anthropic, OpenAI, and Google each optimize their own models, Zencoder can mix and match across providers to reduce bias.
"This is a rare situation where we have an edge on the frontier labs," Filev said. "Most of the time they have an edge on us, but this is a rare case."
Zencoder faces steep competition from AI giants and well-funded startups
Zencoder enters the AI orchestration market at a moment of intense competition. The company has positioned itself as a model-agnostic platform, supporting major providers including Anthropic, OpenAI, and Google Gemini. In September, Zencoder expanded its platform to let developers use command-line coding agents from any provider within its interface.
That strategy reflects a pragmatic acknowledgment that developers increasingly maintain relationships with multiple AI providers rather than committing exclusively to one. Zencoder's universal platform approach lets it serve as the orchestration layer regardless of which underlying models a company prefers.
The company also emphasizes enterprise readiness, touting SOC 2 Type II, ISO 27001, and ISO 42001 certifications along with GDPR compliance. These credentials matter for regulated industries like financial services and healthcare, where compliance requirements can block adoption of consumer-oriented AI tools.
But Zencoder faces formidable competition from multiple directions. Cursor and Windsurf have built dedicated AI-first code editors with devoted user bases. GitHub Copilot benefits from Microsoft's distribution muscle and deep integration with the world's largest code repository. And the frontier AI labs continue expanding their own coding capabilities.
Filev dismissed concerns about competition from the AI labs, arguing that smaller players like Zencoder can move faster on user experience innovation.
"I'm sure they will come to the same conclusion, and they're smart and moving fast, so I'm sure they will catch up fairly quickly," he said. "That's why I said in the next six to 12 months, you're going to see a lot of this propagating through the whole space."
The case for adopting AI orchestration now instead of waiting for better models
Technical executives weighing AI coding investments face a difficult timing question: Should they adopt orchestration tools now, or wait for frontier AI labs to build these capabilities natively into their models?
Filev argued that waiting carries significant competitive risk.
"Right now, everybody is under pressure to deliver more in less time, and everybody expects engineering leaders to deliver results from AI," he said. "As a founder and CEO, I do not expect 20 percent from my VP of engineering. I expect 2x."
He also questioned whether the major AI labs will prioritize orchestration capabilities when their core business remains model development.
"In the ideal world, frontier labs should be building the best-ever models and competing with each other, and Zencoders and Cursors need to build the best-ever UI and UX application layer on top of those models," Filev said. "I don't see a world where OpenAI will offer you our code verifier, or vice versa."
Zenflow launches as a free desktop application, with updated plugins available for Visual Studio Code and JetBrains integrated development environments. The product supports what Zencoder calls "dynamic workflows," meaning the system automatically adjusts process complexity based on whether a human is actively monitoring and on the difficulty of the task at hand.
Zencoder said internal testing showed that replacing standard prompting with Zenflow's orchestration layer improved code correctness by approximately 20 percent on average.
What Zencoder's bet on orchestration reveals about the future of AI coding
Zencoder frames Zenflow as the first product in what it expects to become a significant new software category. The company believes every vendor focused on AI coding will eventually arrive at similar conclusions about the need for orchestration tools.
"I think the next six to 12 months will be all about orchestration," Filev predicted. "A lot of organizations will finally reach that 2x. Not 10x yet, but at least the 2x they were promised a year ago."
Rather than competing head-to-head with frontier AI labs on model quality, Zencoder is betting that the application layer (the software that helps developers actually use these models effectively) will determine winners and losers.
It is, Filev suggested, a familiar pattern from technology history.
"This is very similar to what I observed when I started Wrike," he said. "As work went digital, people relied on email and spreadsheets to manage everything, and neither could keep up."
The same dynamic, he argued, now applies to AI coding. Chat interfaces were designed for conversation, not for orchestrating complex engineering workflows. Whether Zencoder can establish itself as the essential layer between developers and AI models before the giants build their own solutions remains an open question.
But Filev seems comfortable with the race. The last time he spotted a gap between how people worked and the tools they had to work with, he built a company worth over a billion dollars.
Zenflow is available immediately as a free download at zencoder.ai/zenflow.
Enterprises that want tokenizer-free multilingual models are increasingly turning to byte-level language models to reduce brittleness in noisy or low-resource text. To tap into that niche — and make it practical at scale — the Allen Institute for AI (Ai2) introduced Bolmo, a new family of models that leverages its Olmo 3 models by “byteifying” them and reusing their backbone and capabilities.
The company launched two versions, Bolmo 7B and Bolmo 1B, which Ai2 describes as the first fully open byte-level language models. The company said the two models performed competitively with — and in some cases surpassed — other byte-level and character-based models.
Byte-level language models operate directly on raw UTF-8 bytes, eliminating the need for a predefined vocabulary or tokenizer. This allows them to handle misspellings, rare languages, and unconventional text more reliably — key requirements for moderation, edge deployments, and multilingual applications.
For enterprises deploying AI across multiple languages, noisy user inputs, or constrained environments, tokenizer-free models offer a way to reduce operational complexity. Ai2’s Bolmo is an attempt to make that approach practical at scale — without retraining from scratch.
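The core idea is easy to demonstrate: a byte-level model's input layer is just UTF-8 encoding, so any string, however misspelled or multilingual, maps to a sequence over a fixed 256-symbol alphabet with no out-of-vocabulary tokens.

```python
def to_bytes(text):
    # A byte-level "tokenizer" is just UTF-8 encoding: no learned vocabulary,
    # no <UNK> token, and a fixed alphabet of 256 possible values.
    return list(text.encode("utf-8"))

print(to_bytes("café"))  # [99, 97, 102, 195, 169]; the 'é' spans two bytes
print(to_bytes("cfae"))  # a misspelling encodes just as cleanly as the real word
```

The trade-off is sequence length: multibyte characters and the absence of subword compression mean more positions per sentence, which is exactly the cost the hierarchical designs discussed below try to control.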
How Bolmo works and how it was built
Ai2 said it trained the Bolmo models using its Dolma 3 data mix, which also helped train its flagship Olmo models, along with some open code datasets and character-level data.
The company said its goal “is to provide a reproducible, inspectable blueprint for byteifying strong subword language models in a way the community can adopt and extend.” To meet this goal, Ai2 will release its checkpoints, code, and a full paper to help other organizations build byte-level models on top of its Olmo ecosystem.
Since training a byte-level model completely from scratch can get expensive, Ai2 researchers instead chose an existing Olmo 3 7B checkpoint to byteify in two stages.
In the first stage, Ai2 froze the Olmo 3 transformer so that only certain parts were trained, such as the local encoder and decoder, the boundary predictor, and the language modeling head. This stage was designed to be “cheap and fast” and required just 9.8 billion tokens.
The next stage unfreezes the model and trains it with additional tokens. Ai2 said the byte-level approach allows Bolmo to avoid the vocabulary bottlenecks that limit traditional subword models.
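The two-stage recipe amounts to a freeze mask over the parameter tree. The sketch below is schematic and framework-agnostic; the parameter names mirror the components Ai2 says it trained in stage one, but are otherwise invented.

```python
def stage_one_freeze(param_names):
    """Stage 1: freeze the pretrained transformer backbone; only the new
    byte-level components stay trainable. Returns name -> trainable flag."""
    trainable_prefixes = ("local_encoder", "local_decoder",
                          "boundary_predictor", "lm_head")
    return {name: name.startswith(trainable_prefixes) for name in param_names}

# Illustrative parameter names; a real checkpoint has thousands of tensors.
names = ["transformer.block0.attn", "transformer.block0.mlp",
         "local_encoder.proj", "local_decoder.proj",
         "boundary_predictor.scorer", "lm_head.out"]
mask = stage_one_freeze(names)
print(sum(mask.values()), "of", len(mask), "parameter groups train in stage 1")
# Stage 2 then flips every flag to True and continues training on more tokens.
```

Freezing the backbone first is what keeps the byteification "cheap and fast": gradients only flow through the small new components until they have learned to speak the frozen model's internal language.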
Strong performance among its peers
Byte-level language models are not as mainstream as small language models or LLMs, but this is a growing field in research. Meta released its BLT architecture research last year, aiming to offer a model that is robust, processes raw data, and doesn’t rely on fixed vocabularies.
Other research models in this space include ByT5, Stanford’s MrT5, and Canine.
Ai2 evaluated Bolmo using its evaluation suite, covering math, STEM reasoning, question answering, general knowledge, and code.
Bolmo 7B showed strong performance, scoring well on character-focused benchmarks like CUTE and EXECUTE and improving accuracy over its base model, Olmo 3.
Bolmo 7B outperformed models of comparable size in coding, math, multiple-choice QA, and character-level understanding.
Why enterprises may choose byte-level models
Enterprises increasingly find value in hybrid model stacks, using a mix of models and model sizes.
Ai2 makes the case that organizations should also consider byte-level models not only for robustness and multilingual understanding, but because the approach “naturally plugs into an existing model ecosystem.”
“A key advantage of the dynamic hierarchical setup is that compression becomes a toggleable knob,” the company said.
For enterprises already running heterogeneous model stacks, Bolmo suggests that byte-level models may no longer be purely academic. By retrofitting a strong subword model rather than training from scratch, Ai2 is signaling a lower-risk path for organizations that want robustness without abandoning existing infrastructure.