• A new artificial intelligence startup founded by the creators of the world's most widely used computer vision library has emerged from stealth with technology that generates realistic human-centric videos up to five minutes long — a dramatic leap beyond the capabilities of rivals including OpenAI's Sora and Google's Veo.

    CraftStory, which launched Tuesday with $2 million in funding, is introducing Model 2.0, a video generation system that addresses one of the most significant limitations plaguing the nascent AI video industry: duration. While OpenAI's Sora 2 tops out at 25 seconds and most competing models generate clips of 10 seconds or less, CraftStory's system can produce continuous, coherent video performances that run as long as a typical YouTube tutorial or product demonstration.

    The breakthrough could unlock substantial commercial value for enterprises struggling to scale video production for training, marketing, and customer education — markets where brief AI-generated clips have proven inadequate despite their visual polish.

    "If you really try to create a video with one of these video generation systems, you find that a lot of the times you want to implement a certain creative vision, and regardless of how detailed the instructions are, the systems basically ignore a part of your instructions," said Victor Erukhimov, CraftStory's founder and CEO, in an exclusive interview with VentureBeat. "We developed a system that can generate videos basically as long as you need them."

    How parallel processing solves the long-form video problem

    CraftStory's advance rests on what the company describes as a parallelized diffusion architecture — a fundamentally different approach to how AI models generate video compared to the sequential methods employed by most competitors.

    Traditional video generation models work by running diffusion algorithms on increasingly large three-dimensional volumes where time represents the third axis. To generate a longer video, these models require proportionally larger networks, more training data, and significantly more computational resources.

    CraftStory instead runs multiple smaller diffusion algorithms simultaneously across the entire duration of the video, with bidirectional constraints connecting them. "The latter part of the video can influence the former part of the video too," Erukhimov explained. "And this is pretty important, because if you do it one by one, then an artifact that appears in the first part propagates to the second one, and then it accumulates."

    Rather than generating eight seconds and then stitching on additional segments, CraftStory's system processes all five minutes concurrently through interconnected diffusion processes.
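    CraftStory has not published implementation details, but the idea described above, several short diffusion windows denoised jointly, with overlapping frames merged each step so constraints flow in both directions, can be sketched in a toy form. Everything below is illustrative: the "denoiser" is a placeholder that simply pulls latent values toward zero, standing in for a trained network.

```python
import random

# Illustrative sketch only -- not CraftStory's code. The placeholder denoiser
# shrinks each latent value toward zero; a real system would run a trained
# neural network at each diffusion step.
def denoise_step(chunk):
    return [[v * 0.9 for v in frame] for frame in chunk]

def parallel_diffusion(num_frames=300, latent_dim=8, window=60, overlap=12, steps=50):
    rng = random.Random(0)
    # Start the whole timeline from noise.
    video = [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(num_frames)]
    # Overlapping windows that together span the full duration.
    starts = list(range(0, num_frames - overlap, window - overlap))
    for _ in range(steps):
        # 1) Denoise every window (conceptually in parallel).
        updates = [(s, denoise_step(video[s:s + window])) for s in starts]
        # 2) Merge: average overlapping frames so each window constrains its
        #    neighbours in both directions -- later segments influence earlier
        #    ones, so artifacts cannot simply accumulate front-to-back.
        acc = [[0.0] * latent_dim for _ in range(num_frames)]
        cnt = [0] * num_frames
        for s, win in updates:
            for i, frame in enumerate(win):
                for d in range(latent_dim):
                    acc[s + i][d] += frame[d]
                cnt[s + i] += 1
        video = [[v / max(c, 1) for v in frame] for frame, c in zip(acc, cnt)]
    return video
```

    The key structural point is step 2: because overlapping frames are reconciled at every denoising iteration rather than after generation, no segment is finalized before its neighbours have had a say.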

    Crucially, CraftStory trained its model on proprietary footage rather than relying solely on internet-scraped videos. The company hired studios to shoot actors using high-frame-rate camera systems that capture crisp detail even in fast-moving elements like fingers — avoiding the motion blur inherent in standard 30-frames-per-second YouTube clips.

    "What we showed is that you don't need a lot of data and you don't need a lot of training budget to create high quality videos," Erukhimov said. "You just need high quality data."

    Model 2.0 currently operates as a video-to-video system: users upload a still image to animate and a "driving video" containing a person whose movements the AI will replicate. CraftStory provides preset driving videos shot with professional actors, who receive revenue shares when their motion data is used, or users can upload their own footage.

    The system generates 30-second clips at low resolution in approximately 15 minutes. An advanced lip-sync system synchronizes mouth movements to scripts or audio tracks, while gesture alignment algorithms ensure body language matches speech rhythm and emotional tone.

    Fighting a war chest battle with $2 million against billions

    CraftStory's funding comes almost entirely from Andrew Filev, who sold his project management software company Wrike to Citrix for $2.25 billion in 2021 and now runs Zencoder, an AI coding company. The modest raise stands in stark contrast to the billions flowing into competing efforts — OpenAI has raised over $6 billion in its latest funding round alone.

    Erukhimov pushed back on the notion that massive capital is a prerequisite for success. "I don't necessarily buy the thesis that compute is the path to success," he said. "It definitely helps if you have compute. But if you raise a billion dollars on a PowerPoint, in the end, no one is happy, neither the founders nor the investors."

    Filev defended the David-versus-Goliath approach. "When you invest in startups, you're fundamentally betting on people," he said in an interview with VentureBeat. "To paraphrase Margaret Mead: never underestimate what a small group of thoughtful, committed engineers and scientists can build."

    He argued that CraftStory benefits from a focused strategy. "The big labs are in an arms race to build general-purpose video foundation models," Filev said. "CraftStory is riding that wave and going very deep into a specific format: long-form, engaging, human-centric video."

    Why computer vision expertise matters in generative AI video

    Erukhimov's credibility stems from his deep roots in computer vision rather than the transformer architectures that have dominated recent AI advances. He was an early contributor to OpenCV — the Open Source Computer Vision Library that has become the de facto standard for computer vision applications, with over 84,000 stars on GitHub.

    When Intel reduced its support for OpenCV in the mid-2000s, Erukhimov co-founded Itseez with the explicit goal of maintaining and advancing the library. The company expanded OpenCV significantly and pivoted toward automotive safety systems before Intel acquired it in 2016.

    Filev said this background is precisely what makes Erukhimov well-positioned for video generation. "What people sometimes miss is that generative AI video isn't just about the generative part. It's about understanding motion, facial dynamics, temporal coherence, and how humans actually move," Filev said. "Victor has spent his career mastering exactly those problems."

    Enterprise focus targets training videos and product demos

    While much of the public excitement around AI video generation has centered on creative tools for consumers, CraftStory is pursuing a decidedly enterprise-focused strategy.

    "We are definitely thinking about B2B more than consumer," Erukhimov said. "We're thinking about companies, specifically software companies, being able to make cool training videos and product videos and launch videos."

    The logic is straightforward: corporate training, product tutorials, and customer education videos often run several minutes and require consistent quality throughout. A 10-second AI clip cannot effectively demonstrate how to use enterprise software or explain a complex product feature.

    "If you need a longer-form video, then you should go with us," Erukhimov said. "We can create up to five minutes, consistent video, high quality."

    Filev echoed this assessment. "One huge gap in this market is the lack of models that can generate consistent videos over longer sequences — and that's extremely important for real-world use," he said. "If you're creating a commercial for your company, a 10-second video, no matter how good it looks, just isn't enough. You need 30 seconds, you need two minutes — you need more."

    The company anticipates cost savings for customers. Filev suggested that "a small business owner could create content in minutes that previously would have cost $20,000 and taken two months to produce."

    CraftStory is also courting creative agencies that produce video content for corporate clients, with the value proposition centered on cost and speed: agencies can record an actor on camera and transform that footage into a finished AI video, rather than managing expensive multi-day shoots.

    The next major development on CraftStory's roadmap is a text-to-video model that would allow users to generate long-form content directly from scripts. The team is also developing support for moving-camera scenarios, including the popular "walk-and-talk" format common in high-end advertising.

    Where CraftStory fits in a fragmented competitive landscape

    CraftStory enters a crowded and rapidly evolving market. OpenAI's Sora 2, while not yet publicly available, has generated significant buzz. Google's Veo models are advancing quickly. Runway, Pika, and Stability AI all offer video generation tools with different capabilities.

    Erukhimov acknowledged the competitive pressure but emphasized that CraftStory serves a distinct niche focused on human-centric videos. He positioned rapid innovation and market capture as the company's primary strategy rather than relying on technical moats.

    Filev sees the market fragmenting into distinct layers, with large tech companies serving as "API providers of powerful, general-purpose generation models" while specialized players like CraftStory focus on specific use cases. "If the big players are building the engines, CraftStory is building the production studio and assembly line on top," he said.

    Model 2.0 is available now at app.craftstory.com/model-2.0, with the company offering early access to users and enterprises interested in testing the technology. Whether a lightly funded startup can capture meaningful market share against deep-pocketed incumbents remains uncertain, but Erukhimov is characteristically confident about the opportunity ahead.

    "AI-generated video will soon become the primary way companies communicate their stories," he said.

  • In what appeared to be a bid to soak up some of Google's limelight prior to the launch of its new Gemini 3 flagship AI model — now recorded as the most powerful LLM in the world by multiple independent evaluators — Elon Musk's rival AI startup xAI last night unveiled its newest large language model, Grok 4.1.

    The model is now live for consumer use on Grok.com, social network X (formerly Twitter), and the company’s iOS and Android mobile apps. It arrives with major architectural and usability enhancements, among them faster reasoning, improved emotional intelligence, and significantly reduced hallucination rates. xAI also published a white paper covering its evaluations, including a brief section on the training process.

    Across public benchmarks, Grok 4.1 has vaulted to the top of the leaderboard, outperforming rival models from Anthropic, OpenAI, and Google — at least, Google's pre-Gemini 3 model (Gemini 2.5 Pro). It builds on the success of xAI's Grok 4 Fast, which VentureBeat covered favorably shortly after its release in September 2025.

    However, enterprise developers looking to integrate Grok 4.1 into production environments will find one major constraint: it's not yet available through xAI’s public API.

    Despite its high benchmarks, Grok 4.1 remains confined to xAI’s consumer-facing interfaces, with no announced timeline for API exposure. At present, only older models—including Grok 4 Fast (reasoning and non-reasoning variants), Grok 4 0709, and legacy models such as Grok 3, Grok 3 Mini, and Grok 2 Vision—are available for programmatic use via the xAI developer API. These support up to 2 million tokens of context, with token pricing ranging from $0.20 to $3.00 per million depending on the configuration.

    For now, this limits Grok 4.1’s utility in enterprise workflows that rely on backend integration, fine-tuned agentic pipelines, or scalable internal tooling. While the consumer rollout positions Grok 4.1 as the most capable LLM in xAI’s portfolio, production deployments in enterprise environments remain on hold.

    Model Design and Deployment Strategy

    Grok 4.1 arrives in two configurations: a fast-response, low-latency mode for immediate replies, and a “thinking” mode that engages in multi-step reasoning before producing output.

    Both versions are live for end users and are selectable via the model picker in xAI’s apps.

    The two configurations differ not just in latency but also in how deeply the model processes prompts. Grok 4.1 Thinking leverages internal planning and deliberation mechanisms, while the standard version prioritizes speed. Despite the difference in architecture, both scored higher than any competing model in blind preference and benchmark testing.

    Leading the Field in Human and Expert Evaluation

    On the LMArena Text Arena leaderboard, Grok 4.1 Thinking briefly held the top position with a normalized Elo score of 1483 — then was dethroned a few hours later with Google's release of Gemini 3 and its incredible 1501 Elo score.

    The non-thinking version of Grok 4.1 also fares well on the leaderboard, scoring 1465.

    These scores place Grok 4.1 above Google’s Gemini 2.5 Pro, Anthropic’s Claude 4.5 series, and OpenAI’s GPT-4.5 preview.

    In creative writing, Grok 4.1 ranks second only to Polaris Alpha (an early GPT-5.1 variant), with the “thinking” model earning a score of 1721.9 on the Creative Writing v3 benchmark. This marks a roughly 600-point improvement over previous Grok iterations.

    Similarly, in the Arena Expert leaderboard, which aggregates feedback from professional reviewers, Grok 4.1 Thinking again leads the field with a score of 1510.

    The gains are especially notable given that Grok 4.1 was released only two months after Grok 4 Fast, highlighting the accelerated development pace at xAI.

    Core Improvements Over Previous Generations

    Technically, Grok 4.1 represents a significant leap in real-world usability. Visual capabilities—previously limited in Grok 4—have been upgraded to enable robust image and video understanding, including chart analysis and OCR-level text extraction. Multimodal reliability was a pain point in prior versions and has now been addressed.

    Token-level latency has been reduced by approximately 28 percent while preserving reasoning depth.

    In long-context tasks, Grok 4.1 maintains coherent output up to 1 million tokens, improving on Grok 4’s tendency to degrade past the 300,000 token mark.

    xAI has also improved the model's tool orchestration capabilities. Grok 4.1 can now plan and execute multiple external tools in parallel, reducing the number of interaction cycles required to complete multi-step queries.

    According to internal test logs, some research tasks that previously required four steps can now be completed in one or two.

    Other alignment improvements include better truth calibration—reducing the tendency to hedge or soften politically sensitive outputs—and more natural, human-like prosody in voice mode, with support for different speaking styles and accents.

    Safety and Adversarial Robustness

    As part of its risk management framework, xAI evaluated Grok 4.1 for refusal behavior, hallucination resistance, sycophancy, and dual-use safety.

    The hallucination rate in non-reasoning mode has dropped from 12.09 percent in Grok 4 Fast to just 4.22 percent — a roughly 65% improvement.
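    The arithmetic behind that figure, using the two rates the white paper reports, checks out:

```python
# Relative drop in non-reasoning hallucination rate, per the reported figures.
old_rate, new_rate = 12.09, 4.22
improvement = (old_rate - new_rate) / old_rate  # relative reduction, ~0.65
```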

    The model also scored 2.97 percent on FActScore, a factual QA benchmark, down from 9.89 percent in earlier versions.

    In the domain of adversarial robustness, Grok 4.1 has been tested with prompt injection attacks, jailbreak prompts, and sensitive chemistry and biology queries.

    Safety filters showed low false negative rates, especially for restricted chemical knowledge (0.00 percent) and restricted biological queries (0.03 percent).

    The model’s ability to resist manipulation in persuasion benchmarks, such as MakeMeSay, also appears strong—it registered a 0 percent success rate as an attacker.

    Limited Enterprise Access via API

    Despite these gains, Grok 4.1 remains unavailable to enterprise users through xAI’s API. According to the company’s public documentation, the latest available models for developers are Grok 4 Fast (both reasoning and non-reasoning variants), each supporting up to 2 million tokens of context at pricing tiers ranging from $0.20 to $0.50 per million tokens. These are backed by a 4M tokens-per-minute throughput limit and 480 requests per minute (RPM) rate cap.
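    For teams budgeting against those tiers, per-request cost is a straightforward function of token counts and per-million prices. The helper below is a generic sketch: the default prices reflect the $0.20–$0.50 per-million range cited above for Grok 4 Fast, but actual xAI pricing varies by model and configuration, so treat them as placeholders.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_m=0.20, output_price_per_m=0.50):
    """Estimate one request's cost from per-million-token prices.

    Default prices are placeholders based on the $0.20-$0.50 per-million
    range reported for Grok 4 Fast; check current xAI pricing before use.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# A 100K-token prompt with a 2K-token reply:
cost = request_cost(100_000, 2_000)  # $0.021
```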

    By contrast, Grok 4.1 is accessible only through xAI’s consumer-facing properties—X, Grok.com, and the mobile apps. This means organizations cannot yet deploy Grok 4.1 via fine-tuned internal workflows, multi-agent chains, or real-time product integrations.

    Industry Reception and Next Steps

    The release has drawn strong reactions from the public and industry observers. Elon Musk, founder of xAI, posted a brief endorsement, calling it “a great model” and congratulating the team. AI benchmark platforms have praised the leap in usability and linguistic nuance.

    For enterprise customers, however, the picture is more mixed. Grok 4.1’s performance represents a breakthrough for general-purpose and creative tasks, but until API access is enabled, it will remain a consumer-first product with limited enterprise applicability.

    As competitive models from OpenAI, Google, and Anthropic continue to evolve, xAI’s next strategic move may hinge on when—and how—it opens Grok 4.1 to external developers.

  • Microsoft is fundamentally restructuring its Windows operating system to become what executives call the first "agentic OS," embedding the infrastructure needed for autonomous AI agents to operate securely at enterprise scale — a watershed moment in the evolution of personal computing that positions the 40-year-old platform as the foundation for a new era of human-machine collaboration.

    The company announced Tuesday at its Ignite conference that it is introducing native agent infrastructure directly into Windows 11, allowing AI agents — autonomous software programs that can perform complex, multi-step tasks on behalf of users — to discover tools, execute workflows, and interact with applications through standardized protocols while operating in secure, policy-controlled environments separate from user sessions.

    The shift is Microsoft's most significant architectural evolution of Windows since the introduction of the modern security model, transforming the operating system from a platform where users manually orchestrate applications into one where they can "simply express your desired outcome, and agents handle the complexity," according to Pavan Davuluri, President of Windows & Devices at Microsoft.

    "Windows 11 starts with this notion of secure by design, secure by default," Davuluri said in an exclusive interview with VentureBeat. "And a lot of the work that we're doing today, when we think about the engagement we have with our customers, the expectations they have with us is making sure we are building upon the fact that Windows is the most secure platform for them and is the most resilient platform as well."

    The announcements arrive as enterprises are experimenting with AI agents but struggling with fragmented tooling, security concerns, and lack of centralized management — challenges that Microsoft believes only operating system-level integration can solve. The stakes are enormous: with Windows running on an estimated 1.4 billion devices globally, Microsoft's architectural choices will likely shape how organizations deploy autonomous AI systems for years to come.

    New platform primitives create foundation for agent computing

    At the core of Microsoft's vision are three new platform capabilities entering preview that fundamentally change how agents operate on Windows. Agent Connectors provide native support for the Model Context Protocol (MCP), an open standard introduced by Anthropic that allows AI agents to connect with external tools and data sources. Microsoft has built what it calls an "on-device registry" — a secure, manageable repository where developers can register their applications' capabilities as agent connectors, making them discoverable to any compatible agent on the system.

    "These are platform capabilities that then become available to all of our customers," Davuluri explained, describing how the Windows file system, for example, becomes an agent connector that any MCP-compatible agent can access with user consent. "We're able to do this in a fashion that can scale for one but it also allows others to participate in the Windows registry for MCP."

    The architecture introduces an MCP proxy layer that handles authentication, authorization, and auditing for all communication between agents and connectors. Microsoft is launching with two built-in agent connectors for File Explorer and System Settings, allowing agents to manage files or adjust system configurations like switching between light and dark mode — all with explicit user permission.
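    Microsoft has not published the connector API, but the registry-plus-proxy pattern described above, where connectors declare capabilities, the user grants consent per agent, and every invocation is audit-logged, can be sketched abstractly. All class and method names below are hypothetical illustrations, not Windows APIs.

```python
import datetime

# Illustrative sketch only -- not Microsoft's actual API. Connectors register
# declared capabilities; a proxy layer enforces per-agent user consent and
# audit-logs every invocation, allowed or denied.
class ConnectorRegistry:
    def __init__(self):
        self._connectors = {}

    def register(self, name, capabilities, handler):
        self._connectors[name] = {"capabilities": set(capabilities),
                                  "handler": handler}

    def get(self, name):
        return self._connectors[name]

class MCPProxy:
    def __init__(self, registry):
        self.registry = registry
        self.consents = set()   # (agent, connector) pairs the user approved
        self.audit_log = []     # every invocation attempt, allowed or denied

    def grant(self, agent, connector):
        self.consents.add((agent, connector))

    def invoke(self, agent, connector, capability, *args):
        entry = {"ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
                 "agent": agent, "connector": connector, "capability": capability}
        conn = self.registry.get(connector)
        allowed = ((agent, connector) in self.consents
                   and capability in conn["capabilities"])
        entry["allowed"] = allowed
        self.audit_log.append(entry)   # agent actions stay distinguishable
        if not allowed:
            raise PermissionError(f"{agent} may not call {connector}.{capability}")
        return conn["handler"](capability, *args)

# Example: a file-explorer-style connector exposing only "list_files".
registry = ConnectorRegistry()
registry.register("file_explorer", ["list_files"],
                  lambda cap, *a: ["report.docx", "budget.xlsx"])
proxy = MCPProxy(registry)
proxy.grant("assistant", "file_explorer")
files = proxy.invoke("assistant", "file_explorer", "list_files")
```

    The point of the pattern is that the agent never touches a connector directly: consent and capability checks happen in one choke point, which is also where the audit trail enterprises need gets written.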

    Agent Workspace, entering private preview, represents perhaps the most significant security innovation. It creates what Microsoft describes as "a contained, policy-controlled, and auditable environment where agents can interact with software" — essentially a parallel desktop session where agents operate with their own distinct identity, completely separate from the user's primary session.

    "We want to be able to have clarity in the identity of the agent that is operating in the local operating system," Davuluri said, addressing security concerns about agents accessing sensitive data. "We want that session to be a session that is secure, that is policy control, that is manageable, that has transparency and auditability."

    Each agent workspace runs with minimal privileges by default, accessing only explicitly granted resources. The system maintains detailed audit logs distinguishing agent actions from user actions — critical for enterprises that need to prove compliance and track all changes to systems and data.

    Windows 365 for Agents extends this infrastructure to the cloud, turning Microsoft's Cloud PC offering into execution environments for agents. Instead of running on local devices, agents can operate in secure, policy-controlled virtual machines in Azure, enabling what Microsoft calls "computer-using agents" to interact with legacy applications and perform automation tasks at scale without consuming local compute resources.

    Taskbar becomes command center for monitoring AI agents at work

    The infrastructure enables significant user interface changes designed to make agents as commonplace as applications. Microsoft is introducing "Ask Copilot on the taskbar," a unified entry point in preview that combines Microsoft 365 Copilot, agent invocation, and traditional search in a single interface.

    Users will be able to invoke agents using "@" mentions directly from the taskbar, then monitor their progress through familiar UI patterns like hover cards, progress badges, and notifications — all while continuing other work. When an agent completes a task or needs input, it surfaces updates through the taskbar without disrupting the user's primary workflow.

    "We've evolved and created new UX in the taskbar to reflect the unique needs of agents performing background tasks on your behalf," said Navjot Virk, Corporate Vice President of Windows Experiences, describing features like progress bars and status badges that indicate when agents are working, need approval, or have completed tasks.

    The design philosophy, Virk emphasized, centers on user control. "These experiences are designed to be opt in. We want to give customers full control over when and how they engage with copilots and agents."

    For commercial Microsoft 365 Copilot users, the integration goes deeper. Microsoft is embedding Copilot directly into File Explorer, allowing users to ask questions, generate summaries, or draft emails based on document contents without leaving the file management interface. On Copilot+ PCs — devices with neural processing units capable of 40 trillion operations per second — new capabilities include converting any on-screen table into an Excel spreadsheet through the Click to Do feature.

    Microsoft bets on open standards against Apple and Google's proprietary approaches

    Microsoft's embrace of the open Model Context Protocol, created by Anthropic, marks a strategic bet on openness as enterprises evaluate competing AI platforms from Apple and Google that use proprietary frameworks.

    "Windows is an open platform, and by virtue [of being] an open platform, we certainly have the ability to take existing technologies, evolve, harden, adapt those, but we also allow customers to bring their own capabilities to the platform as well," Davuluri said when asked about competing with Apple Intelligence and Google's Android AI for Enterprise.

    The company demonstrated this openness with Claude, Anthropic's AI assistant, accessing the Windows file system through agent connectors with user consent — one of numerous partnerships Microsoft has secured. Dynamics 365 is using the File Explorer connector to streamline expense reporting, reducing what was previously a 30-minute, dozen-step process to "one sentence with high accuracy," according to Microsoft's blog post. Other early partners include Manus AI, Dropbox Dash, Roboflow, and Infosys.

    "Windows is the platform in which they build upon," Davuluri said of enterprise customers. "And so our ability to take those existing bodies of work they have, and extend them is the, I think, the least friction way for them to go, learn, adopt, experiment and find ways to [scale]."

    Security model enforces strict containment and mandatory user consent

    Microsoft's security model for agents adheres to what it calls "secure by default" policies aligned with the company's broader Secure Future Initiative. All agent connectors registered in the on-device registry must meet strict requirements around packaging and identity, with applications properly packaged and signed by trusted sources. Developers must explicitly declare the minimum capabilities their agent connectors require, and agents and connectors run in isolated environments with dedicated agent user accounts, separate from human user accounts. Windows requires explicit user approval when agents first access sensitive resources like files or system settings.

    "We give Windows the ability to go deliver on the security expectations, and then it is auditable at the end of the day," Davuluri said. "You still want an auditability log that looks similar to perhaps what you use in the cloud. And so all three pieces are built into the design and architecture of Agent Workspace."

    For IT administrators, Microsoft is introducing management policies through Intune and Group Policy that allow organizations to enable or disable agent features at device and account levels, set minimum security policy levels, and access event logs enumerating all agent connector invocations and errors. The company emphasized that agents operate with restricted privileges, with minimal permissions by default and access granted only to explicitly approved resources that users can revoke at any time. 

    Post-quantum cryptography and recovery tools address emerging and persistent threats

    Beyond agent infrastructure, Microsoft announced significant security and resilience updates addressing both emerging and persistent enterprise challenges. Post-Quantum Cryptography APIs are now generally available in Windows, allowing organizations to begin migrating to encryption algorithms designed to withstand future quantum computing attacks that could break today's cryptographic standards. Microsoft worked closely with the National Institute of Standards and Technology to implement these algorithms.

    "We are introducing post quantum cryptography APIs in Windows," Davuluri said. "For customers who want to be able to do cryptographic encryption in their workloads, they can start taking advantage of these APIs in Windows for the first time. That is a huge step forward for us when we think about the future of windows."

    Hardware-accelerated BitLocker will arrive on new devices starting spring 2026, offloading disk encryption to dedicated silicon for faster performance while providing hardware-level key protection. Sysmon functionality is becoming generally available as part of Windows in early 2026, bringing advanced forensics and threat detection capabilities previously available only as a separate download directly into the operating system's event logging system.

    The company also detailed progress on its Windows Resiliency Initiative, launched a year ago following the CrowdStrike incident that disrupted 8.5 million Windows devices globally. New recovery capabilities include Quick Machine Recovery with expanded networking support and Autopatch management, allowing IT to remotely fix devices stuck in Windows Recovery Environment. Point-in-time restore entering preview rolls back devices to earlier states to resolve update conflicts or configuration errors, while Cloud rebuild in preview allows IT to remotely rebuild malfunctioning devices by downloading fresh installation media and using Autopilot for zero-touch provisioning.

    Microsoft is also raising security requirements for third-party drivers across the Windows ecosystem. Following updated requirements for antivirus drivers effective April 1, 2025, the company is expanding this approach to other driver classes including networking, cameras, USB, printers, and storage — requiring higher certification standards, adding compiler safeguards, and providing more Windows in-box drivers to reduce reliance on third-party kernel-mode code.

    Measured rollout reflects enterprise caution around autonomous software

    Microsoft is positioning these updates as essential infrastructure for what it calls "Frontier Firms" — organizations that "blend human ingenuity with intelligent systems to deliver real outcomes." However, the company emphasized a cautious, opt-in approach that reflects enterprise concerns about autonomous software agents.

    "The principles we're using in designing these new platform capabilities accounts for the reality that we have a very, very broad user base," Davuluri said. "A lot of the features and capabilities we're building are opt in capabilities. And so it is our goal to be able to have users find value in the workflow and meet them."

    Virk emphasized the measured approach: "This is more about meeting customers where they are and then taking them on this journey when they are ready. So there's the optionality, but also having support for it. And really important thing is that they should feel comfortable. They should feel secure."

    Microsoft's bet is that only operating system-level integration can provide the security, governance, and user experience required for mainstream AI agent adoption. Whether that vision materializes will depend on developer adoption, enterprise comfort with autonomous software, and Microsoft's ability to balance innovation with the stability that 40 years of Windows customers expect. After four decades of putting users in control of their computers, Windows is now asking them to share that control with machines.

  • Writer, a San Francisco-based artificial intelligence startup, is launching a unified AI agent platform designed to let any employee automate complex business workflows without writing code — a capability the company says distinguishes it from consumer-oriented tools like Microsoft Copilot and ChatGPT.

    The platform, called Writer Agent, combines chat-based assistance with autonomous task execution in a single interface. Starting Tuesday, enterprise customers can use natural language to instruct the AI to create presentations, analyze financial data, generate marketing campaigns, or coordinate across multiple business systems like Salesforce, Slack, and Google Workspace—then save those workflows as reusable "Playbooks" that run automatically on schedules.

    The announcement comes as enterprises struggle to move AI initiatives beyond pilot programs into production at scale. Writer CEO May Habib has been outspoken about this challenge, recently revealing that 42% of Fortune 500 executives surveyed by her company said AI is "tearing their company apart" due to coordination failures between departments.

    "We're delivering an agent interface that is both incredibly powerful and radically simple to transform individual productivity into organizational impact," Habib said in a statement. "Writer Agent is the difference between a single sales rep asking a chatbot to write an outreach email and an enterprise ensuring that 1,000 reps are all sending on-brand, compliant, and contextually-aware messages to target accounts."

    How Writer is putting workflow automation in the hands of non-technical workers

    The platform's core innovation centers on making workflow automation accessible to non-technical employees—what Writer executives call "democratizing who gets to be a builder."

    In an exclusive interview with VentureBeat, Doris Jwo, Writer's director of product management, demonstrated how the system works: A user types a request in plain English — for example, "Create a two-page partnership proposal between [Company A] and [Company B], make it a branded deck, include impact metrics and partnership tiers."

    The AI agent then breaks down that request into discrete steps, conducts web research, generates graphics and charts on the fly, creates individual slides with sourced information, and assembles a complete presentation. The entire process, which might take an employee hours or days, can be completed in 10-12 minutes.

    "The agent basically looks at the request, breaks it down, does research, understands what pieces it needs, creates a detailed plan at a step-by-step level," Jwo explained during a product demonstration. "It might say, 'I need to do web research,' or 'This user needs information from Gong or Slack,' and it reaches out to those connectors, grabs the data, and executes the plan."

    Crucially, users can save these multi-step processes as Playbooks—reusable templates that colleagues can deploy with a single click. Routines allow those Playbooks to run automatically at scheduled intervals, essentially putting knowledge work "on autopilot."
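The loop Jwo describes — break the request down, plan, execute via connectors, then save the result for reuse — can be sketched in miniature. Everything below is illustrative pseudostructure, not Writer's actual API; the step names and `Playbook` class are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    action: str                 # e.g. "web_research", "assemble_deck"
    params: dict = field(default_factory=dict)

@dataclass
class Playbook:
    """A saved, reusable sequence of steps (hypothetical structure)."""
    name: str
    steps: list

def plan(request: str) -> list:
    # In the real product an LLM decomposes the request; here we hard-code
    # an illustrative breakdown of the partnership-proposal prompt.
    return [
        Step("web_research", {"query": request}),
        Step("generate_charts", {"metrics": ["impact", "tiers"]}),
        Step("assemble_deck", {"pages": 2, "branded": True}),
    ]

def execute(steps: list) -> list:
    # Each step would call a connector or generator; we just record it.
    return [f"done: {s.action}" for s in steps]

steps = plan("Create a two-page partnership proposal")
results = execute(steps)
# Saving the plan is what turns a one-off request into a one-click template.
playbook = Playbook("partnership-proposal", steps)
```

A Routine, in this framing, is simply a saved Playbook plus a schedule on which to re-run it.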

    Security and compliance controls: Writer's answer to enterprise IT concerns

    Writer positions these enterprise-focused controls as a key differentiator from competitors. While Microsoft, OpenAI, and Anthropic offer powerful AI capabilities, Writer's executives argue those tools weren't designed from the ground up for the security, compliance, and governance requirements of large regulated organizations.

    "All of the products you mentioned are great products, but even Copilot is very much focused on personal productivity—summarizing email, for example, which is important, but that's not the component we're focusing on," said Matan-Paul Shetrit, Writer's director of product management, in an exclusive interview with VentureBeat.

    Shetrit emphasized Writer's "trust, security, and interoperability" approach. IT administrators can granularly control what the AI can access — for instance, preventing market research agents from mentioning competitors, or restricting which employees can use web search capabilities. All activity is logged with detailed audit trails showing exactly what data the agent touched and what actions it took.

    "These fine-grained controls are what make products enterprise-ready," Shetrit said. "We can deploy to tens of thousands or hundreds of thousands of employees while maintaining the security and guardrails you need for that scale."

    This architecture reflects Writer's origin story. Unlike OpenAI or Anthropic, which started as research labs and later added enterprise offerings, Writer has targeted Fortune 500 companies since its 2020 founding. "We're not a research lab that went to consumer and is dabbling in enterprise," Shetrit said. "We are first and foremost targeting the Global 2000 and Fortune 500, and our research is in service of these customers' needs."

    Inside Writer's strategy to connect AI agents across enterprise software systems

    A critical technical component is Writer's approach to system integrations. The platform includes pre-built connectors to more than a dozen enterprise applications—Google Workspace, Microsoft 365, Snowflake, Asana, Slack, Gong, HubSpot, Atlassian, Databricks, PitchBook, and FactSet—allowing the AI to retrieve information and take actions across those systems.

    Writer built these connectors using the Model Context Protocol (MCP), an emerging standard for AI system integrations, but added what Shetrit described as an "enterprise-ready" layer on top.

    "We took a first-principle approach of: You have this MCP connector infrastructure—how do you build it in a way that's enterprise-ready?" Shetrit explained. "What we have today in the industry is definitely not it."
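One plausible reading of an "enterprise-ready layer" on MCP is a policy gate and audit trail wrapped around every tool call. The sketch below is a minimal illustration of that idea under assumed names (`POLICY`, `call_tool`); it is not Writer's implementation or the MCP wire protocol itself:

```python
import datetime

AUDIT_LOG = []

# Hypothetical per-tool role policy, standing in for admin-configured controls
# like "only sales may query Gong" or "restrict who can use web search."
POLICY = {
    "web_search": {"allowed_roles": {"marketing", "research"}},
    "gong_fetch": {"allowed_roles": {"sales"}},
}

def call_tool(tool: str, args: dict, user_role: str):
    """Gate an MCP-style tool call behind a role policy; audit every attempt."""
    allowed = user_role in POLICY.get(tool, {}).get("allowed_roles", set())
    AUDIT_LOG.append({
        "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "tool": tool, "args": args, "role": user_role, "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{user_role!r} may not call {tool!r}")
    return {"tool": tool, "result": "..."}  # real call would go over MCP

call_tool("web_search", {"q": "market trends"}, user_role="research")
```

Note that denied calls are logged too: the audit trail records what the agent tried to touch, not just what it succeeded in touching.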

    The system can write and execute code on the fly to handle unexpected scenarios. If a user uploads an unfamiliar file format, for instance, the agent will generate code to extract and process the text without requiring a human to intervene.

    Jwo demonstrated this capability with a daily workflow she runs: Every morning at 10 a.m., a Routine automatically summarizes her Google Calendar meetings, identifies external participants, finds their LinkedIn profiles, and sends the summary to her via Slack — all without her involvement.

    "This was pretty simple, but you can imagine for a salesperson it might say, 'At the end of the day, wrap up a summary of all the calls I had, send me action items, post it to the account-specific Slack channel, and tag these folks so they can accomplish those workflows,'" Jwo said. "That can run continuously each day, each week, or on demand."
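Scheduling logic of this kind reduces to a cron-style check: on each tick, fire whichever routines match the current time. A minimal sketch, with entirely hypothetical routine names mirroring the examples above:

```python
from datetime import datetime

ROUTINES = [
    # Assumed schedule entries; the 10:00 entry mirrors Jwo's daily
    # calendar-summary routine described above.
    {"name": "morning-calendar-summary", "hour": 10, "minute": 0},
    {"name": "end-of-day-call-wrapup", "hour": 17, "minute": 30},
]

def due_routines(now: datetime) -> list:
    """Return names of routines that should fire at this minute."""
    return [r["name"] for r in ROUTINES
            if (r["hour"], r["minute"]) == (now.hour, now.minute)]

due_routines(datetime(2025, 6, 2, 10, 0))  # -> ["morning-calendar-summary"]
```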

    From mortgage lenders to CPG brands: Real-world AI agent use cases across industries

    The platform is attracting customers across multiple industries. New American Funding, a mortgage lender, uses Writer Agent to automate marketing workflows. Senior Content Marketing Manager Karen Rodriguez uploads Asana project tickets with creative briefs, and the AI executes tasks like updating email campaigns or transforming articles into social media carousels, video scripts, and captions.

    Other use cases span financial services teams creating investment dashboards with PitchBook and FactSet data, consumer packaged goods companies brainstorming new product lines based on social media trends, and marketing teams generating partnership presentations with branded assets.

    Writer has added customers including TikTok, Comcast, Keurig Dr Pepper, CAA, and Aptitude Health, joining an existing base that includes Accenture, Qualcomm, Uber, Vanguard, and Marriott. The company now serves more than 300 enterprises and has secured over $50 million in signed contracts, with projections to double that to $100 million this year.

    The startup's net retention rate — a measure of how much existing customers expand their usage — stands at 160%, meaning customers on average increase their spending by 60% after initial contracts. Twenty customers who started with $200,000-$300,000 contracts now spend about $1 million annually, according to company data.

    'Vibe working': Writer's vision for AI-powered productivity beyond coding

    Writer executives frame the platform as enabling what they call "vibe working" — a playful nod to "vibe coding," the popular term for using AI tools like Cursor to dramatically accelerate software development.

    "We used to call it transformation when we took 12 steps and made them nine. That's optimizing the world as it is," Habib said at Writer's AI Leaders Forum earlier this month, according to Forbes. "We can now create a new world. That is the greenfield mindset."

    Shetrit echoed this framing: "Vibe coding is the theme of 2025. Our view is that 'vibe working' is the theme of 2026. How do you bring the same productivity gains you've seen with coding agents into the workspace in a way that non-technical users can maximize them?"

    The platform is powered by Palmyra X5, Writer's proprietary large language model featuring a one-million-token context window — among the largest commercially available. Writer trained the model for approximately $700,000, a fraction of the estimated $100 million OpenAI spent on GPT-4, by using synthetic data and techniques that halt training when returns diminish.

    The model can process one million tokens in about 22 seconds and costs 60 cents per million input tokens and $6 per million output tokens — significantly cheaper than comparable offerings, according to company specifications.
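As a back-of-the-envelope check on those quoted rates, per-request cost is a simple linear function of token counts:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float = 0.60, out_rate: float = 6.00) -> float:
    """Dollar cost at the quoted Palmyra X5 rates (dollars per million tokens)."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Filling the full one-million-token context window and generating a
# 2,000-token answer: 0.60 + 0.012 = about 61 cents.
cost = request_cost(1_000_000, 2_000)
```

Even a maximal-context request therefore stays under a dollar at these rates, which is the practical point of the pricing comparison.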

    Making AI decisions visible: Writer's approach to trust and transparency

    A distinctive aspect of Writer's approach is transparency into the AI's decision-making process. The interface displays the agent's step-by-step reasoning, showing which data sources it accessed, what code it generated, and how it arrived at outputs.

    "There's a very clear exhibition of how the agent is thinking, what it's doing, what it's touching," Shetrit said. "This is important for the end user to trust it, but also important for the IT person or security professional to see what's going on."

    This "supervision" model goes beyond simple observability of API calls to encompass what Shetrit described as "a superset of observability" — giving organizations the ability to not just monitor but control AI behavior through policies and permissions.

    Session logs capture all agent activity when enabled by administrators, and users can submit feedback on every output to help improve system performance. The platform also emphasizes providing sources and citations for generated content, allowing users to verify information.

    "With any sort of chat assistant, agentic or not, trust but verify is really important," Jwo said. "That's part of the pillars of us building this and making it enterprise-grade."

    What Writer Agent costs and why it's included in the base platform

    Writer is including all the new capabilities—Playbooks, Routines, Connectors, and Personality customization—as part of its core platform without additional charges, according to Jwo.

    "This is fully included as part of the Writer platform," she said. "We're not charging additional for using Writer Agent."

    The "Personality" feature allows individual users, teams, or entire organizations to customize the AI's communication style, ensuring generated content matches brand voice and tone guidelines. This works alongside company-level controls that enforce terminology and style requirements.

    For highly structured, repetitive tasks, Writer also offers a library of more than 100 pre-built agents and an AI Studio for building custom multi-agent systems aligned with specific business use cases.

    The race to define enterprise AI: Can purpose-built platforms beat tech giants?

    The launch crystallizes a fundamental tension in how enterprises will adopt AI at scale. While consumer-facing AI tools emphasize individual productivity gains, companies need systems that work reliably across thousands of employees, integrate with existing software infrastructure, maintain regulatory compliance, and deliver measurable business impact.

    Writer's wager is that these requirements demand purpose-built enterprise platforms rather than consumer tools adapted for business use. The company's $1.9 billion valuation — achieved in a November 2024 funding round that raised $200 million — suggests investors see merit in this thesis. Backers include Premji Invest, Radical Ventures, ICONIQ Growth, Salesforce Ventures, and Adobe Ventures.

    Yet the competitive landscape remains formidable. Microsoft and Google command enormous distribution advantages through their existing enterprise software relationships. OpenAI and Anthropic possess research capabilities that have produced breakthrough models. Whether Writer can maintain its differentiation as these giants expand their enterprise offerings will test the startup's core premise: that serving Fortune 500 companies from day one creates advantages that research labs turned enterprise vendors cannot easily replicate.

    "We're entering an era where if you can describe a better way to work, you can build it," Jwo said. "The new Writer Agent democratizes who gets to be a builder, empowering the operational experts and creative problem-solvers in every department to become the architects of their own transformation. That's how you unlock innovation that competitors can't replicate."

    The promise is alluring — AI capabilities powerful enough to transform how work gets done, accessible enough for any employee to use, and controlled enough for enterprises to deploy safely at scale. Whether Writer can deliver on that promise at the speed and scale required will determine if its vision of "vibe working" becomes the 2026 theme Shetrit predicts, or just another ambitious attempt to solve enterprise AI's execution problem.

    But one thing is certain: In a market where 85% of AI initiatives fail to escape pilot purgatory, Writer is betting that the winners won't be the companies with the most powerful models—they'll be the ones that make those models actually work inside the enterprise.

  • The keynote for the supercomputing industry’s flagship event focused on AI’s collision course: supercharging human potential and breaking the grid.
  • At Microsoft Ignite, the company aggressively positioned Azure as the unified platform for all data and AI workloads.
  • The internet infrastructure company says the incident is resolved, though some residual issues may persist.