Presented by BlueOcean
AI has become a central part of how marketing teams work, but the results often fall short. Models can generate content at scale and summarize information in seconds, yet the outputs are not always aligned with the brand, the audience, or the company’s strategic goals. The problem is not capability. The problem is the absence of context.
The bottleneck is no longer computational power. It is contextual intelligence.
Generative AI is powerful, but it doesn’t understand the nuances of the business it supports. It doesn’t have the context for why customers choose one brand over another or what creates competitive advantage. Without that grounding, AI operates as a fast executor rather than a strategic partner. It produces more, but it does not always help teams make better decisions.
This becomes even more visible inside complex marketing organizations where insights live in different corners of the business and rarely come together in a unified way.
As Grant McDougall, CEO of BlueOcean, explains, “Inside large marketing organizations, the data is vertical. Digital has theirs, loyalty has theirs, content has theirs, media has theirs. But CMOs think horizontally. They need to combine customer insight, competitive movement, creative performance, and sales signals into one coherent view. Connecting that data fundamentally changes how decisions get made.”
This shift from vertical data to horizontal intelligence reflects a new phase in AI adoption. The emphasis is shifting from output volume to decision quality. Marketers are recognizing that the future of AI is intelligence that understands who you are as a company and why you matter to your customers.
In BlueOcean’s work with global brands across technology, healthcare, and consumer industries, including Amazon, Cisco, SAP, and Intel, the same pattern appears. Teams move faster and make better decisions when AI is grounded in structured brand and competitive context.
Why context is becoming the critical ingredient
Large language models excel at producing language. They do not inherently understand brand, meaning, or intention. This is why generic prompts often lead to generic outputs. The model executes based on statistical prediction, not strategic nuance.
Context changes that. When AI systems are supplied with structured inputs about brand strategy, audience insight, and creative intent, the output becomes sharper and more reliable. Recommendations become more specific. Creative stays on brief. The AI begins to act less like a content generator and more like a partner that understands the boundaries and goals of the business.
This shift mirrors a key theme from BlueOcean’s recent report, Building Marketing Intelligence: The CMO Blueprint for Context-Aware AI. The report explains that AI is most effective when it is grounded in a clear frame of reference. CMOs who design these context-aware workflows see better performance, stronger creative, and more reliable decision-making.
For a deeper exploration of these principles, the full report is available here.
The industry’s pivot: From execution to understanding
Many teams remain in an experimentation phase with AI. They test tools, run pilots, and explore new workflows. This creates productivity gains but not intelligence. Without shared context, every team uses AI differently, and the result is fragmentation.
The companies making the clearest progress treat context as a shared layer across workflows. When teams pull from the same brand strategy, insights, and creative guidance, AI becomes more predictable and more valuable. It supports decisions rather than contradicting them. This becomes especially effective when the context includes external signals such as shifts in sentiment, competitor movement, content performance, and broader category trends.
Brand-context AI connects brand identity, customer sentiment, competitive movement, and creative performance in a single environment. It strengthens workflows in practical ways: briefs become more strategic, content reviews more accurate, and insights faster because the system synthesizes patterns teams once assembled manually.
Across enterprise teams supported by BlueOcean, this shift consistently unlocks clarity. AI becomes a contributor to strategic understanding rather than a generator of disconnected output. With shared context in place, teams make more confident, coherent, and aligned decisions.
Structured context: What it actually includes
Structured context is the intelligence marketers already curate to understand how their brand shows up in the world. It brings together the narrative elements that shape the brand’s voice, the customer motivations that influence messaging, the competitive signals unfolding in the market, and the creative patterns that have historically performed. It also includes the external brand signals teams monitor every day: sentiment shifts, content dynamics, press and social movement, and how competitors position themselves across channels.
When this information is organized into a coherent frame, AI can interpret direction and creative choices with the same clarity strategists use. The value does not come from giving AI more data; it comes from giving it structure so it can reason through decisions the way marketers already do.
The new division of labor between humans and AI
The strongest AI-enabled marketing teams have one thing in common. They are clear about what humans own and what AI owns. Humans define purpose, strategy, and creative judgment. They understand emotion, cultural nuance, competitive meaning, and brand intent.
AI delivers speed, scale, and precision. It excels at synthesizing information, producing iterations, and following structured instruction.
“AI works best when it is given clear boundaries and clear intent,” says McDougall. “Humans set the direction led by creativity and imagination. AI executes with precision. That partnership is where the real value emerges.”
The systems that perform best are the ones guided by human-defined boundaries and human-led strategy. AI provides scale, but people provide meaning.
CMOs are recognizing that governing context is becoming a leadership responsibility. They already own brand, messaging, and customer insight. Extending this ownership into AI systems ensures the brand shows up consistently across every touchpoint, whether a human or a model produced the work.
A practical example of context in action
Consider a team preparing a global campaign. Without context, an AI system might generate copy that sounds polished but generic. It may overlook claims the brand can make, reference benefits competitors own, or ignore differentiators that matter most. It may even amplify a competitor’s message simply because that language appears frequently in public data.
With structured context, the experience changes. The model understands the audience, the brand tone, the competitive landscape, and the objective. It knows which competitors are gaining attention, which messages resonate in the market, and where the brand has permission to play. It can propose angles that strengthen positioning rather than dilute it. It can generate variations that stay on brief and avoid competitor-owned territory.
BlueOcean has observed this shift inside enterprise teams including Amazon, Intel, and SAP, where structured brand and competitive context has improved alignment and reduced drift at scale.
Creative, brand, and competitive signals are no longer separate inputs. When they are connected and contextualized, AI begins supporting decision-making in a meaningful way. The technology stops producing output for its own sake and starts helping marketers understand where the brand stands and what actions will grow it.
What comes next
A new phase of AI is beginning. AI agents are evolving from task assistants to systems that collaborate across tools and workflows. As these systems become more capable, context will determine whether they behave unpredictably or perform as trusted extensions of the team.
Brand-context AI provides a path forward. It gives AI systems the structure they need to operate consistently. It supports the teams responsible for protecting brand integrity. In practice, these agents can already assemble context-aware creative briefs, review content for competitive and brand alignment, monitor shifts in category messaging, and synthesize insights across products or markets. It creates intelligence that adapts rather than overwhelms.
In the coming years, success will not come from producing more content, but from producing content anchored in brand context, the kind that sharpens decisions, strengthens positioning, and drives long-term growth.
The companies that build on context today will define the generative enterprise of tomorrow. BlueOcean is helping leading enterprises shape the next generation of context-aware AI systems.
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.
- The two companies are launching the Accenture Anthropic Business Group to bring Anthropic's AI to Accenture's employees.
- Empromptu claims all a user has to do is tell the platform's AI chatbot what they want — like a new HTML or JavaScript app — and the AI will go ahead and build it.
- With evolving export controls, data center operators must act now to ensure compliance and secure access to critical AI chip technology.
- The shift marks a significant change in Canada’s energy policy, enabling new growth opportunities for natural gas-fired power plants and carbon capture initiatives.
- Neoclouds excel in specialized, cost-efficient AI infrastructure, but power, supply chain, and talent challenges limit their chances of a full takeover.
Chinese AI startup Zhipu AI aka Z.ai has released its GLM-4.6V series, a new generation of open-source vision-language models (VLMs) optimized for multimodal reasoning, frontend automation, and high-efficiency deployment.
The release includes two models in "large" and "small" sizes:
-
GLM-4.6V (106B), a larger 106-billion parameter model aimed at cloud-scale inference
-
GLM-4.6V-Flash (9B), a smaller model of only 9 billion parameters designed for low-latency, local applications
Recall that generally speaking, models with more parameters — or internal settings governing their behavior, i.e. weights and biases — are more powerful, performant, and capable of performing at a higher general level across more varied tasks.
However, smaller models can offer better efficiency for edge or real-time applications where latency and resource constraints are critical.
The defining innovation in this series is the introduction of native function calling in a vision-language model—enabling direct use of tools such as search, cropping, or chart recognition with visual inputs.
With a 128,000 token context length (equivalent to a 300-page novel's worth of text exchanged in a single input/output interaction with the user) and state-of-the-art (SoTA) results across more than 20 benchmarks, the GLM-4.6V series positions itself as a highly competitive alternative to both closed and open-source VLMs. It's available in the following formats:
-
API access via OpenAI-compatible interface
-
Try the demo on Zhipu’s web interface
-
Download weights from Hugging Face
-
Desktop assistant app available on Hugging Face Spaces
Licensing and Enterprise Use
GLM‑4.6V and GLM‑4.6V‑Flash are distributed under the MIT license, a permissive open-source license that allows free commercial and non-commercial use, modification, redistribution, and local deployment without obligation to open-source derivative works.
This licensing model makes the series suitable for enterprise adoption, including scenarios that require full control over infrastructure, compliance with internal governance, or air-gapped environments.
Model weights and documentation are publicly hosted on Hugging Face, with supporting code and tooling available on GitHub.
The MIT license ensures maximum flexibility for integration into proprietary systems, including internal tools, production pipelines, and edge deployments.
Architecture and Technical Capabilities
The GLM-4.6V models follow a conventional encoder-decoder architecture with significant adaptations for multimodal input.
Both models incorporate a Vision Transformer (ViT) encoder—based on AIMv2-Huge—and an MLP projector to align visual features with a large language model (LLM) decoder.
Video inputs benefit from 3D convolutions and temporal compression, while spatial encoding is handled using 2D-RoPE and bicubic interpolation of absolute positional embeddings.
A key technical feature is the system’s support for arbitrary image resolutions and aspect ratios, including wide panoramic inputs up to 200:1.
In addition to static image and document parsing, GLM-4.6V can ingest temporal sequences of video frames with explicit timestamp tokens, enabling robust temporal reasoning.
On the decoding side, the model supports token generation aligned with function-calling protocols, allowing for structured reasoning across text, image, and tool outputs. This is supported by extended tokenizer vocabulary and output formatting templates to ensure consistent API or agent compatibility.
Native Multimodal Tool Use
GLM-4.6V introduces native multimodal function calling, allowing visual assets—such as screenshots, images, and documents—to be passed directly as parameters to tools. This eliminates the need for intermediate text-only conversions, which have historically introduced information loss and complexity.
The tool invocation mechanism works bi-directionally:
-
Input tools can be passed images or videos directly (e.g., document pages to crop or analyze).
-
Output tools such as chart renderers or web snapshot utilities return visual data, which GLM-4.6V integrates directly into the reasoning chain.
In practice, this means GLM-4.6V can complete tasks such as:
-
Generating structured reports from mixed-format documents
-
Performing visual audit of candidate images
-
Automatically cropping figures from papers during generation
-
Conducting visual web search and answering multimodal queries
High Performance Benchmarks Compared to Other Similar-Sized Models
GLM-4.6V was evaluated across more than 20 public benchmarks covering general VQA, chart understanding, OCR, STEM reasoning, frontend replication, and multimodal agents.
According to the benchmark chart released by Zhipu AI:
-
GLM-4.6V (106B) achieves SoTA or near-SoTA scores among open-source models of comparable size (106B) on MMBench, MathVista, MMLongBench, ChartQAPro, RefCOCO, TreeBench, and more.
-
GLM-4.6V-Flash (9B) outperforms other lightweight models (e.g., Qwen3-VL-8B, GLM-4.1V-9B) across almost all categories tested.
-
The 106B model’s 128K-token window allows it to outperform larger models like Step-3 (321B) and Qwen3-VL-235B on long-context document tasks, video summarization, and structured multimodal reasoning.
Example scores from the leaderboard include:
-
MathVista: 88.2 (GLM-4.6V) vs. 84.6 (GLM-4.5V) vs. 81.4 (Qwen3-VL-8B)
-
WebVoyager: 81.0 vs. 68.4 (Qwen3-VL-8B)
-
Ref-L4-test: 88.9 vs. 89.5 (GLM-4.5V), but with better grounding fidelity at 87.7 (Flash) vs. 86.8
Both models were evaluated using the vLLM inference backend and support SGLang for video-based tasks.
Frontend Automation and Long-Context Workflows
Zhipu AI emphasized GLM-4.6V’s ability to support frontend development workflows. The model can:
-
Replicate pixel-accurate HTML/CSS/JS from UI screenshots
-
Accept natural language editing commands to modify layouts
-
Identify and manipulate specific UI components visually
This capability is integrated into an end-to-end visual programming interface, where the model iterates on layout, design intent, and output code using its native understanding of screen captures.
In long-document scenarios, GLM-4.6V can process up to 128,000 tokens—enabling a single inference pass across:
-
150 pages of text (input)
-
200 slide decks
-
1-hour videos
Zhipu AI reported successful use of the model in financial analysis across multi-document corpora and in summarizing full-length sports broadcasts with timestamped event detection.
Training and Reinforcement Learning
The model was trained using multi-stage pre-training followed by supervised fine-tuning (SFT) and reinforcement learning (RL). Key innovations include:
-
Curriculum Sampling (RLCS): Dynamically adjusts the difficulty of training samples based on model progress
-
Multi-domain reward systems: Task-specific verifiers for STEM, chart reasoning, GUI agents, video QA, and spatial grounding
-
Function-aware training: Uses structured tags (e.g., <think>, <answer>, <|begin_of_box|>) to align reasoning and answer formatting
The reinforcement learning pipeline emphasizes verifiable rewards (RLVR) over human feedback (RLHF) for scalability, and avoids KL/entropy losses to stabilize training across multimodal domains
Pricing (API)
Zhipu AI offers competitive pricing for the GLM-4.6V series, with both the flagship model and its lightweight variant positioned for high accessibility.
-
GLM-4.6V: $0.30 (input) / $0.90 (output) per 1M tokens
-
GLM-4.6V-Flash: Free
Compared to major vision-capable and text-first LLMs, GLM-4.6V is among the most cost-efficient for multimodal reasoning at scale. Below is a comparative snapshot of pricing across providers:
USD per 1M tokens — sorted lowest → highest total cost
Model
Input
Output
Total Cost
Source
Qwen 3 Turbo
$0.05
$0.20
$0.25
ERNIE 4.5 Turbo
$0.11
$0.45
$0.56
GLM‑4.6V
$0.30
$0.90
$1.20
Grok 4.1 Fast (reasoning)
$0.20
$0.50
$0.70
Grok 4.1 Fast (non-reasoning)
$0.20
$0.50
$0.70
deepseek-chat (V3.2-Exp)
$0.28
$0.42
$0.70
deepseek-reasoner (V3.2-Exp)
$0.28
$0.42
$0.70
Qwen 3 Plus
$0.40
$1.20
$1.60
ERNIE 5.0
$0.85
$3.40
$4.25
Qwen-Max
$1.60
$6.40
$8.00
GPT-5.1
$1.25
$10.00
$11.25
Gemini 2.5 Pro (≤200K)
$1.25
$10.00
$11.25
Gemini 3 Pro (≤200K)
$2.00
$12.00
$14.00
Gemini 2.5 Pro (>200K)
$2.50
$15.00
$17.50
Grok 4 (0709)
$3.00
$15.00
$18.00
Gemini 3 Pro (>200K)
$4.00
$18.00
$22.00
Claude Opus 4.1
$15.00
$75.00
$90.00
Previous Releases: GLM‑4.5 Series and Enterprise Applications
Prior to GLM‑4.6V, Z.ai released the GLM‑4.5 family in mid-2025, establishing the company as a serious contender in open-source LLM development.
The flagship GLM‑4.5 and its smaller sibling GLM‑4.5‑Air both support reasoning, tool use, coding, and agentic behaviors, while offering strong performance across standard benchmarks.
The models introduced dual reasoning modes (“thinking” and “non-thinking”) and could automatically generate complete PowerPoint presentations from a single prompt — a feature positioned for use in enterprise reporting, education, and internal comms workflows. Z.ai also extended the GLM‑4.5 series with additional variants such as GLM‑4.5‑X, AirX, and Flash, targeting ultra-fast inference and low-cost scenarios.
Together, these features position the GLM‑4.5 series as a cost-effective, open, and production-ready alternative for enterprises needing autonomy over model deployment, lifecycle management, and integration pipel
Ecosystem Implications
The GLM-4.6V release represents a notable advance in open-source multimodal AI. While large vision-language models have proliferated over the past year, few offer:
-
Integrated visual tool usage
-
Structured multimodal generation
-
Agent-oriented memory and decision logic
Zhipu AI’s emphasis on “closing the loop” from perception to action via native function calling marks a step toward agentic multimodal systems.
The model’s architecture and training pipeline show a continued evolution of the GLM family, positioning it competitively alongside offerings like OpenAI’s GPT-4V and Google DeepMind’s Gemini-VL.
Takeaway for Enterprise Leaders
With GLM-4.6V, Zhipu AI introduces an open-source VLM capable of native visual tool use, long-context reasoning, and frontend automation. It sets new performance marks among models of similar size and provides a scalable platform for building agentic, multimodal AI systems.
-
- Bill Kleyman reflects on a whirlwind year of conferences, exploring AI, power, and the evolving role of data centers in a rapidly changing world.
Anthropic on Monday launched a beta integration that connects its fast-growing Claude Code programming agent directly to Slack, allowing software engineers to delegate coding tasks without leaving the workplace messaging platform where much of their daily communication already happens.
The release, which Anthropic describes as a "research preview," is the AI safety company's latest move to embed its technology deeper into enterprise workflows — and comes as Claude Code has emerged as a surprise revenue engine, generating over $1 billion in annualized revenue just six months after its public debut in May.
"The critical context around engineering work often lives in Slack, including bug reports, feature requests, and engineering discussion," the company wrote in its announcement blog post. "When a bug report appears or a teammate needs a code fix, you can now tag Claude in Slack to automatically spin up a Claude Code session using the surrounding context."
From bug report to pull request: how the new Slack integration actually works
The mechanics are deceptively simple but address a persistent friction point in software development: the gap between where problems get discussed and where they get fixed.
When a user mentions @Claude in a Slack channel or thread, Claude analyzes the message to determine whether it constitutes a coding task. If it does, the system automatically creates a new Claude Code session. Users can also explicitly instruct Claude to treat requests as coding tasks.
Claude gathers context from recent channel and thread messages in Slack to feed into the Claude Code session. It will use this context to automatically choose which repository to run the task on based on the repositories you've authenticated to Claude Code on the web.
As the Claude Code session progresses, Claude posts status updates back to the Slack thread. Once complete, users receive a link to the full session where they can review changes, along with a direct link to open a pull request.
The feature builds on Anthropic's existing Claude for Slack integration and requires users to have access to Claude Code on the web. In practical terms, a product manager reporting a bug in Slack could tag Claude, which would then analyze the conversation context, identify the relevant code repository, investigate the issue, propose a fix, and post a pull request—all while updating the original Slack thread with its progress.
Why Anthropic is betting big on enterprise workflow integrations
The Slack integration arrives at a pivotal moment for Anthropic. Claude Code has already hit $1 billion in revenue six months since its public debut in May, according to a LinkedIn post from Anthropic's chief product officer, Mike Krieger. The coding agent continues to barrel toward scale with customers like Netflix, Spotify, and Salesforce.
The velocity of that growth helps explain why Anthropic made its first-ever acquisition earlier this month. Anthropic declined to comment on financial details. The Information earlier reported on Anthropic's bid to acquire Bun.
Bun is a breakthrough JavaScript runtime that is dramatically faster than the leading competition. As an all-in-one toolkit — combining runtime, package manager, bundler, and test runner — it's become essential infrastructure for AI-led software engineering, helping developers build and test applications at unprecedented velocity.
Since becoming generally available in May 2025, Claude Code has grown from its origins as an internal engineering experiment into a critical tool for many of the world's category-leading enterprises, including Netflix, Spotify, KPMG, L'Oreal, and Salesforce — and Bun has been key in helping scale its infrastructure throughout that evolution.
The acquisition signals that Anthropic views Claude Code not as a peripheral feature but as a core business line worth substantial investment. The Slack integration extends that bet, positioning Claude Code as an ambient presence in the workspaces where engineering decisions actually get made.
According to an Anthropic spokesperson, companies including Rakuten, Novo Nordisk, Uber, Snowflake, and Ramp now use Claude Code for both professional and novice developers. Rakuten, the Japanese e-commerce giant, has reportedly reduced software development timelines from 24 days to just 5 days using the tool — a 79% reduction that illustrates the productivity claims Anthropic has been making.
Claude Code's rapid rise from internal experiment to billion-dollar product
The Slack launch is the latest in a rapid series of Claude Code expansions. In late November, Claude Code was added to Anthropic's desktop apps including the Mac version. Claude Code was previously limited to mobile apps and the web. It allows software engineers to code, research, and update work with multiple local and remote sessions running at the same time.
That release accompanied Anthropic's unveiling of Claude Opus 4.5, its newest and most capable model. Claude Opus 4.5 is available today on the company's apps, API, and on all three major cloud platforms. Pricing is $5/$25 per million tokens — making Opus-level capabilities accessible to even more users, teams, and enterprises.
The company has also invested heavily in the developer infrastructure that powers Claude Code. In late November, Anthropic released three new beta features for tool use: Tool Search Tool, which allows Claude to use search tools to access thousands of tools without consuming its context window; Programmatic Tool Calling, which allows Claude to invoke tools in a code execution environment reducing the impact on the model's context window; and Tool Use Examples, which provides a universal standard for demonstrating how to effectively use a given tool.
The Model Context Protocol (MCP) is an open standard for connecting AI agents to external systems. Connecting agents to tools and data traditionally requires a custom integration for each pairing, creating fragmentation and duplicated effort that makes it difficult to scale truly connected systems. MCP provides a universal protocol — developers implement MCP once in their agent and it unlocks an entire ecosystem of integrations.
Inside Anthropic's own AI transformation: what happens when engineers use Claude all day
Anthropic has been unusually transparent about how its own engineers use Claude Code — and the findings offer a preview of broader workforce implications. In August 2025, Anthropic surveyed 132 engineers and researchers, conducted 53 in-depth qualitative interviews, and studied internal Claude Code usage data to understand how AI use is changing work at the company.
Employees self-reported using Claude in 60% of their work and achieving a 50% productivity boost, a 2-3x increase from this time last year. This productivity looks like slightly less time per task category, but considerably more output volume.
Perhaps most notably, 27% of Claude-assisted work consists of tasks that wouldn't have been done otherwise, such as scaling projects, making nice-to-have tools like interactive data dashboards, and exploratory work that wouldn't be cost-effective if done manually.
The internal research also revealed how Claude is changing the nature of engineering collaboration. The maximum number of consecutive tool calls Claude Code makes per transcript increased by 116%. Claude now chains together 21.2 independent tool calls without need for human intervention versus 9.8 tool calls from six months ago.
The number of human turns decreased by 33%. The average number of human turns decreased from 6.2 to 4.1 per transcript, suggesting that less human input is necessary to accomplish a given task now compared to six months ago.
But the research also surfaced tensions. One prominent theme was that Claude has become the first stop for questions that once went to colleagues. "It has reduced my dependence on [my team] by 80%, [but] the last 20% is crucial and I go and talk to them," one engineer explained. Several engineers said they "bounce ideas off" Claude, similar to interactions with human collaborators.
Others described experiencing less interaction with colleagues. Some appreciate the reduced social friction, but others resist the change or miss the older way of working: "I like working with people and it is sad that I 'need' them less now."
How Anthropic stacks up against OpenAI, Google, and Microsoft in the enterprise AI race
Anthropic is not alone in racing to capture the enterprise coding market. OpenAI, Google, and Microsoft (through GitHub Copilot) are all pursuing similar integrations. The Slack launch gives Anthropic a presence in one of the most widely-used enterprise communication platforms — Slack claims over 750,000 organizations use its software.
The deal comes as Anthropic pursues a more disciplined growth path than rival OpenAI, focusing on enterprise customers and coding workloads. Internal financials reported by The Wall Street Journal show Anthropic expects to break even by 2028 — two years earlier than OpenAI, which continues to invest heavily in infrastructure as it expands into video, hardware, and consumer products.
The move also marks an increased push into developer tooling. Anthropic has recently seen backing from some of tech's biggest titans. Microsoft and Nvidia pledged up to $15 billion in fresh investment in Anthropic last month, alongside a $30 billion commitment from Anthropic to run Claude Code on Microsoft's cloud. This is in addition to the $8 billion invested from Amazon and $3 billion from Google.
The cross-investment from both Microsoft and Google — fierce competitors in the cloud and AI spaces — highlights how valuable Anthropic's enterprise positioning has become. By integrating with Slack (which is owned by Salesforce), Anthropic further embeds itself in the enterprise software ecosystem while remaining platform-agnostic.
What the Slack integration means for developers — and whether they can trust it
For engineering teams, the Slack integration promises to collapse the distance between problem identification and problem resolution. A bug report in a Slack channel can immediately trigger investigation. A feature request can spawn a prototype. A code review comment can generate a refactor.
But the integration also raises questions about oversight and code quality. Most Anthropic employees use Claude frequently while reporting they can "fully delegate" only 0-20% of their work to it. Claude is a constant collaborator but using it generally involves active supervision and validation, especially in high-stakes work — versus handing off tasks requiring no verification at all.
Some employees are concerned about the atrophy of deeper skillsets required for both writing and critiquing code — "When producing output is so easy and fast, it gets harder and harder to actually take the time to learn something."
The Slack integration, by making Claude Code invocation as simple as an @mention, may accelerate both the productivity benefits and the skill-atrophy concerns that Anthropic's own research has documented.
The future of coding may be conversational—and Anthropic is racing to prove it
The beta launch marks the beginning of what Anthropic expects will be a broader rollout, with documentation forthcoming for teams looking to deploy the integration and refinements planned based on user feedback during the research preview phase.
For Anthropic, the Slack integration is a calculated bet on a fundamental shift in how software gets written. The company is wagering that the future of coding will be conversational — that the walls between where developers talk about problems and where they solve them will dissolve entirely. The companies that win enterprise AI, in this view, will be the ones that meet developers not in specialized tools but in the chat windows they already have open all day.
Whether that vision becomes reality will depend on whether Claude Code can deliver enterprise-grade reliability while maintaining the security that organizations demand. The early returns are promising: a billion dollars in revenue, a roster of Fortune 500 customers, and a growing ecosystem of integrations suggest Anthropic is onto something real.
But in one of Anthropic's own internal interviews, an engineer offered a more cautious assessment of the transformation underway: "Nobody knows what's going to happen… the important thing is to just be really adaptable."
In the age of AI coding agents, that may be the only career advice that holds up.
When many enterprises weren’t even thinking about agentic behaviors or infrastructures, Booking.com had already “stumbled” into them with its homegrown conversational recommendation system.
This early experimentation has allowed the company to take a step back and avoid getting swept up in the frantic AI agent hype. Instead, it is taking a disciplined, layered, modular approach to model development: small, travel-specific models for cheap, fast inference; larger large language models (LLMs) for reasoning and understanding; and domain-tuned evaluations built in-house when precision is critical.
With this hybrid strategy — combined with selective collaboration with OpenAI — Booking.com has seen accuracy double across key retrieval, ranking and customer-interaction tasks.
As Pranav Pathak, Booking.com’s AI product development lead, posed to VentureBeat in a new podcast: “Do you build it very, very specialized and bespoke and then have an army of a hundred agents? Or do you keep it general enough and have five agents that are good at generalized tasks, but then you have to orchestrate a lot around them? That's a balance that I think we're still trying to figure out, as is the rest of the industry.”
Check out the new Beyond the Pilot podcast here, and continue reading for highlights.
Moving from guessing to deep personalization without being ‘creepy’
Recommendation systems are core to Booking.com’s customer-facing platforms; however, traditional recommendation tools have been less about recommendation and more about guessing, Pathak conceded. So, from the start, he and his team vowed to avoid generic tools: As he put it, the price and recommendation should be based on customer context.
Booking.com’s initial pre-gen AI tooling for intent and topic detection was a small language model, what Pathak described as “the scale and size of BERT.” The model ingested the customer’s inputs around their problem to determine whether it could be solved through self-service or bumped to a human agent.
“We started with an architecture of ‘you have to call a tool if this is the intent you detect and this is how you've parsed the structure,” Pathak explained. “That was very, very similar to the first few agentic architectures that came out in terms of reason and defining a tool call.”
His team has since built out that architecture to include an LLM orchestrator that classifies queries, triggers retrieval-augmented generation (RAG) and calls APIs or smaller, specialized language models. “We've been able to scale that system quite well because it was so close in architecture that, with a few tweaks, we now have a full agentic stack,” said Pathak.
As a result, Booking.com is seeing a 2X increase in topic detection, which in turn is freeing up human agents’ bandwidth by 1.5 to 1.7X. More topics, even complicated ones previously identified as ‘other’ and requiring escalation, are being automated.
Ultimately, this supports more self-service, freeing human agents to focus on customers with uniquely-specific problems that the platform doesn’t have a dedicated tool flow for — say, a family that is unable to access its hotel room at 2 a.m. when the front desk is closed.
That not only “really starts to compound,” but has a direct, long-term impact on customer retention, Pathak noted. “One of the things we've seen is, the better we are at customer service, the more loyal our customers are.”
Another recent rollout is personalized filtering. Booking.com has between 200 and 250 search filters on its website — an unrealistic amount for any human to sift through, Pathak pointed out. So, his team introduced a free text box that users can type into to immediately receive tailored filters.
“That becomes such an important cue for personalization in terms of what you're looking for in your own words rather than a clickstream,” said Pathak.
In turn, it cues Booking.com into what customers actually want. For instance, hot tubs — when filter personalization first rolled out, jacuzzi’s were one of the most popular requests. That wasn’t even a consideration previously; there wasn’t even a filter. Now that filter is live.
“I had no idea,” Pathak noted. “I had never searched for a hot tub in my room honestly.”
When it comes to personalization, though, there is a fine line; memory remains complicated, Pathak emphasized. While it’s important to have long-term memories and evolving threads with customers — retaining information like their typical budgets, preferred hotel star ratings or whether they need disability access — it must be on their terms and protective of their privacy.
Booking.com is extremely mindful with memory, seeking consent so as to not be “creepy” when collecting customer information.
“Managing memory is much harder than actually building memory,” said Pathak. “The tech is out there, we have the technical chops to build it. We want to make sure we don't launch a memory object that doesn't respect customer consent, that doesn't feel very natural.”
Finding a balance of build versus buy
As agents mature, Booking.com is navigating a central question facing the entire industry: How narrow should agents become?
Instead of committing to either a swarm of highly specialized agents or a few generalized ones, the company aims for reversible decisions and avoids “one-way doors” that lock its architecture into long-term, costly paths. Pathak’s strategy is: Generalize where possible, specialize where necessary and keep agent design flexible to help ensure resiliency.
Pathak and his team are “very mindful” of use cases, evaluating where to build more generalized, reusable agents or more task-specific ones. They strive to use the smallest model possible, with the highest level of accuracy and output quality, for each use case. Whatever can be generalized is.
Latency is another important consideration. When factual accuracy and avoiding hallucinations is paramount, his team will use a larger, much slower model; but with search and recommendations, user expectations set speed. (Pathak noted: “No one’s patient.”)
“We would, for example, never use something as heavy as GPT-5 for just topic detection or for entity extraction,” he said.
Booking.com takes a similarly elastic tack when it comes to monitoring and evaluations: If it's general-purpose monitoring that someone else is better at building and has horizontal capability, they’ll buy it. But if it’s instances where brand guidelines must be enforced, they’ll build their own evals.
Ultimately, Booking.com has leaned into being “super anticipatory,” agile and flexible. “At this point with everything that's happening with AI, we are a little bit averse to walking through one way doors,” said Pathak. “We want as many of our decisions to be reversible as possible. We don't want to get locked into a decision that we cannot reverse two years from now.”
What other builders can learn from Booking.com’s AI journey
Booking.com’s AI journey can serve as an important blueprint for other enterprises.
Looking back, Pathak acknowledged that they started out with a “pretty complicated” tech stack. They’re now in a good place with that, “but we probably could have started something much simpler and seen how customers interacted with it.”
Given that, he offered this valuable advice: If you’re just starting out with LLMs or agents, out-of-the-box APIs will do just fine. “There's enough customization with APIs that you can already get a lot of leverage before you decide you want to go do more.”
On the other hand, if a use case requires customization not available through a standard API call, that makes a case for in-house tools.
Still, he emphasized: Don't start with the complicated stuff. Tackle the “simplest, most painful problem you can find and the simplest, most obvious solution to that.”
Identify the product market fit, then investigate the ecosystems, he advised — but don’t just rip out old infrastructures because a new use case demands something specific (like moving an entire cloud strategy from AWS to Azure just to use the OpenAI endpoint).
Ultimately: “Don't lock yourself in too early,” Pathak noted. “Don't make decisions that are one-way doors until you are very confident that that's the solution that you want to go with.”


