• When many enterprises weren’t even thinking about agentic behaviors or infrastructures, Booking.com had already “stumbled” into them with its homegrown conversational recommendation system.

    This early experimentation has allowed the company to take a step back and avoid getting swept up in the frantic AI agent hype. Instead, it is taking a disciplined, layered, modular approach to model development: small, travel-specific models for cheap, fast inference; large language models (LLMs) for reasoning and understanding; and domain-tuned evaluations built in-house when precision is critical.

    With this hybrid strategy — combined with selective collaboration with OpenAI — Booking.com has seen accuracy double across key retrieval, ranking and customer-interaction tasks.

    As Pranav Pathak, Booking.com’s AI product development lead, posed to VentureBeat in a new podcast: “Do you build it very, very specialized and bespoke and then have an army of a hundred agents? Or do you keep it general enough and have five agents that are good at generalized tasks, but then you have to orchestrate a lot around them? That's a balance that I think we're still trying to figure out, as is the rest of the industry.”

    Check out the new Beyond the Pilot podcast here, and continue reading for highlights.

    Moving from guessing to deep personalization without being ‘creepy’

    Recommendation systems are core to Booking.com’s customer-facing platforms; however, traditional recommendation tools have been less about recommendation and more about guessing, Pathak conceded. So, from the start, he and his team vowed to avoid generic tools: As he put it, the price and recommendation should be based on customer context.

    Booking.com’s initial pre-gen AI tooling for intent and topic detection was a small language model, what Pathak described as “the scale and size of BERT.” The model ingested the customer’s inputs around their problem to determine whether it could be solved through self-service or bumped to a human agent.

    “We started with an architecture of ‘you have to call a tool if this is the intent you detect and this is how you've parsed the structure,’” Pathak explained. “That was very, very similar to the first few agentic architectures that came out, in terms of reasoning and defining a tool call.”

    His team has since built out that architecture to include an LLM orchestrator that classifies queries, triggers retrieval-augmented generation (RAG) and calls APIs or smaller, specialized language models. “We've been able to scale that system quite well because it was so close in architecture that, with a few tweaks, we now have a full agentic stack,” said Pathak.
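The routing pattern Pathak describes can be sketched in a few lines. This is a highly simplified illustration, not Booking.com's implementation: the handler names are hypothetical, and a keyword heuristic stands in for the BERT-sized intent model and LLM orchestrator.

```python
# Hypothetical handler names; Booking.com has not published its internal APIs.
def answer_with_rag(query):
    return f"[RAG answer for: {query}]"

def call_support_api(query):
    return f"[support API result for: {query}]"

def detect_intent(query):
    # Stand-in for the BERT-sized intent classifier: a keyword heuristic.
    q = query.lower()
    if "refund" in q or "cancel" in q:
        return "support"
    if "recommend" in q or "where" in q:
        return "recommendation"
    return "other"

def orchestrate(query):
    """Orchestrator sketch: classify the query, then route it to
    retrieval-augmented generation, an API call, or a human agent."""
    intent = detect_intent(query)
    if intent == "support":
        return call_support_api(query)
    if intent == "recommendation":
        return answer_with_rag(query)
    return "escalate-to-human"
```

The key design point is the same one Pathak makes: once classification and tool-calling are separated this cleanly, swapping the classifier for an LLM turns the flow into a full agentic stack with few changes.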

    As a result, Booking.com is seeing a 2X increase in topic detection accuracy, which in turn is freeing up human agents’ bandwidth by 1.5 to 1.7X. More topics, even complicated ones previously bucketed as ‘other’ and requiring escalation, are being automated.

    Ultimately, this supports more self-service, freeing human agents to focus on customers with uniquely specific problems that the platform doesn’t have a dedicated tool flow for — say, a family that is unable to access its hotel room at 2 a.m. when the front desk is closed.

    That not only “really starts to compound,” but has a direct, long-term impact on customer retention, Pathak noted. “One of the things we've seen is, the better we are at customer service, the more loyal our customers are.”

    Another recent rollout is personalized filtering. Booking.com has between 200 and 250 search filters on its website — an unrealistic number for any human to sift through, Pathak pointed out. So, his team introduced a free-text box that users can type into to immediately receive tailored filters.

    “That becomes such an important cue for personalization in terms of what you're looking for in your own words rather than a clickstream,” said Pathak.

    In turn, it cues Booking.com into what customers actually want. Take hot tubs: When filter personalization first rolled out, jacuzzis were among the most popular requests. Previously, that hadn’t even been a consideration; no such filter existed. Now it’s live.

    “I had no idea,” Pathak noted. “I had never searched for a hot tub in my room honestly.”
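The free-text-to-filter flow can be sketched in simplified form. This is an illustrative toy, assuming a synonym-matching approach; a production system would use an LLM or embeddings, and the filter IDs and synonyms below are made up.

```python
def match_filters(free_text, filter_catalog):
    """Toy sketch: surface every existing filter whose synonyms appear
    in the user's own words. The catalog entries are illustrative only."""
    text = free_text.lower()
    return [fid for fid, synonyms in filter_catalog.items()
            if any(s in text for s in synonyms)]

# Hypothetical slice of a filter catalog.
catalog = {
    "hot_tub": ["hot tub", "jacuzzi", "whirlpool"],
    "pet_friendly": ["pet friendly", "dog", "cat"],
    "free_parking": ["parking"],
}
```

Calling `match_filters("a room with a jacuzzi and free parking", catalog)` would surface the hot tub and parking filters — and, as in the jacuzzi case above, logging unmatched requests is exactly how a platform discovers filters it doesn't yet have.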

    When it comes to personalization, though, there is a fine line; memory remains complicated, Pathak emphasized. While it’s important to have long-term memories and evolving threads with customers — retaining information like their typical budgets, preferred hotel star ratings or whether they need disability access — it must be on their terms and protective of their privacy.

    Booking.com is extremely mindful with memory, seeking consent so as to not be “creepy” when collecting customer information.

    “Managing memory is much harder than actually building memory,” said Pathak. “The tech is out there, we have the technical chops to build it. We want to make sure we don't launch a memory object that doesn't respect customer consent, that doesn't feel very natural.”

    Finding a balance of build versus buy

    As agents mature, Booking.com is navigating a central question facing the entire industry: How narrow should agents become?

    Instead of committing to either a swarm of highly specialized agents or a few generalized ones, the company aims for reversible decisions and avoids “one-way doors” that lock its architecture into long-term, costly paths. Pathak’s strategy is: Generalize where possible, specialize where necessary and keep agent design flexible to help ensure resiliency.

    Pathak and his team are “very mindful” of use cases, evaluating where to build more generalized, reusable agents or more task-specific ones. They strive to use the smallest model possible, with the highest level of accuracy and output quality, for each use case. Whatever can be generalized is.

    Latency is another important consideration. When factual accuracy and avoiding hallucinations are paramount, his team will use a larger, much slower model; but with search and recommendations, user expectations dictate speed. (Pathak noted: “No one’s patient.”)

    “We would, for example, never use something as heavy as GPT-5 for just topic detection or for entity extraction,” he said.

    Booking.com takes a similarly elastic tack when it comes to monitoring and evaluations: If a vendor is better at building general-purpose, horizontal monitoring, they’ll buy it. But where brand guidelines must be enforced, they’ll build their own evals.

    Ultimately, Booking.com has leaned into being “super anticipatory,” agile and flexible. “At this point with everything that's happening with AI, we are a little bit averse to walking through one-way doors,” said Pathak. “We want as many of our decisions to be reversible as possible. We don't want to get locked into a decision that we cannot reverse two years from now.”

    What other builders can learn from Booking.com’s AI journey

    Booking.com’s AI journey can serve as an important blueprint for other enterprises.

    Looking back, Pathak acknowledged that they started out with a “pretty complicated” tech stack. They’re now in a good place with that, “but we probably could have started something much simpler and seen how customers interacted with it.”

    Given that, he offered this valuable advice: If you’re just starting out with LLMs or agents, out-of-the-box APIs will do just fine. “There's enough customization with APIs that you can already get a lot of leverage before you decide you want to go do more.”

    On the other hand, if a use case requires customization not available through a standard API call, that makes a case for in-house tools.

    Still, he emphasized: Don't start with the complicated stuff. Tackle the “simplest, most painful problem you can find and the simplest, most obvious solution to that.”

    Identify the product market fit, then investigate the ecosystems, he advised — but don’t just rip out old infrastructures because a new use case demands something specific (like moving an entire cloud strategy from AWS to Azure just to use the OpenAI endpoint).

    Ultimately: “Don't lock yourself in too early,” Pathak noted. “Don't make decisions that are one-way doors until you are very confident that that's the solution that you want to go with.”

  • LinkedIn is launching its new AI-powered people search this week, after what seems like a very long wait for what should have been a natural offering for generative AI.

    It comes a full three years after the launch of ChatGPT and six months after LinkedIn launched its AI job search offering. For technical leaders, this timeline illustrates a key enterprise lesson: Deploying generative AI in real enterprise settings is challenging, especially at a scale of 1.3 billion users. It’s a slow, brutal process of pragmatic optimization.

    The following account is based on several exclusive interviews with the LinkedIn product and engineering team behind the launch.

    First, here’s how the product works: A user can now type a natural language query like, "Who is knowledgeable about curing cancer?" into LinkedIn’s search bar.

    LinkedIn's old search, based on keywords, would have been stumped. It would have looked only for references to "cancer". If a user wanted to get sophisticated, they would have had to run separate, rigid keyword searches for "cancer" and then "oncology" and manually try to piece the results together.

    The new AI-powered system, however, understands the intent of the search because the LLM under the hood grasps semantic meaning. It recognizes, for example, that "cancer" is conceptually related to "oncology" and even less directly, to "genomics research." As a result, it surfaces a far more relevant list of people, including oncology leaders and researchers, even if their profiles don't use the exact word "cancer."

    The system also balances this relevance with usefulness. Instead of just showing the world's top oncologist (who might be an unreachable third-degree connection), it will also weigh who in your immediate network — like a first-degree connection — is "pretty relevant" and can serve as a crucial bridge to that expert.

    See the video below for an example.

    Arguably, though, the more important lesson for enterprise practitioners is the "cookbook" LinkedIn has developed: a replicable, multi-stage pipeline of distillation, co-design, and relentless optimization. LinkedIn had to perfect this on one product before attempting it on another.

    "Don't try to do too much all at once," writes Wenjing Zhang, LinkedIn's VP of Engineering, in a post about the product launch. Zhang, who also spoke with VentureBeat last week, notes that an earlier "sprawling ambition" to build a unified system for all of LinkedIn's products "stalled progress."

    Instead, LinkedIn focused on winning one vertical first. The success of its previously launched AI Job Search — which led to job seekers without a four-year degree being 10% more likely to get hired, according to VP of Product Engineering Erran Berger — provided the blueprint.

    Now, the company is applying that blueprint to a far larger challenge. "It's one thing to be able to do this across tens of millions of jobs," Berger told VentureBeat. "It's another thing to do this across north of a billion members."

    For enterprise AI builders, LinkedIn's journey provides a technical playbook for what it actually takes to move from a successful pilot to a billion-user-scale product.

    The new challenge: a 1.3 billion-member graph

    The job search product created a robust recipe that the new people search product could build upon, Berger explained. 

    The recipe started with a "golden data set" of just a few hundred to a thousand real query-profile pairs, meticulously scored against a detailed 20- to 30-page "product policy" document. To scale this for training, LinkedIn used this small golden set to prompt a large foundation model to generate a massive volume of synthetic training data. This synthetic data was used to train a 7-billion-parameter "Product Policy" model — a high-fidelity judge of relevance that was too slow for live production but perfect for teaching smaller models.

    However, the team hit a wall early on. For six to nine months, they struggled to train a single model that could balance strict policy adherence (relevance) against user engagement signals. The "aha moment" came when they realized they needed to break the problem down. They distilled the 7B policy model into a 1.7B teacher model focused solely on relevance. They then paired it with separate teacher models trained to predict specific member actions, such as job applications for the jobs product, or connecting and following for people search. This "multi-teacher" ensemble produced soft probability scores that the final student model learned to mimic via KL divergence loss.
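The core of that multi-teacher objective can be sketched with plain arithmetic. This is an illustrative sketch, not LinkedIn's exact recipe: the blending weights and example scores are assumptions, and a real pipeline would compute this loss over batches in a training framework.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete probability distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def multi_teacher_loss(student_probs, teacher_probs_list, weights):
    """Sketch of the multi-teacher objective: blend each teacher's soft
    probability scores (relevance teacher, engagement teacher, ...) into one
    target distribution, then penalize the student's divergence from it.
    The equal weighting below is an assumption for illustration."""
    n = len(student_probs)
    target = [sum(w * t[i] for w, t in zip(weights, teacher_probs_list))
              for i in range(n)]
    return kl_divergence(target, student_probs)

# Toy soft scores over three candidate profiles.
relevance_teacher = [0.7, 0.2, 0.1]
engagement_teacher = [0.5, 0.3, 0.2]
student = [0.5, 0.3, 0.2]

loss = multi_teacher_loss(student, [relevance_teacher, engagement_teacher],
                          weights=[0.5, 0.5])
```

The point of the soft targets is that the student learns *how relevant* each candidate is according to the ensemble, not just a hard yes/no label — which is what lets a small student mimic much larger teachers.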

    The resulting architecture operates as a two-stage pipeline. First, a larger 8B-parameter model handles broad retrieval, casting a wide net to pull candidates from the graph. Then, the highly distilled student model takes over for fine-grained ranking. While the job search product successfully deployed a 0.6B-parameter (600 million) student, the new people search product required even more aggressive compression. As Zhang notes, the team pruned their new student model from 440M down to just 220M parameters, achieving the necessary speed for 1.3 billion users with less than 1% relevance loss.
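The two-stage shape of the pipeline can be sketched as follows. Everything here is a stand-in: a dot product plays the role of the 8B retrieval model, a lookup table plays the 220M distilled ranker, and the profile names and vectors are invented for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def two_stage_search(query_vec, index, rank_fn, retrieve_k=3):
    """Two-stage sketch: a cheap broad retriever scores every profile in
    the index, then a fine-grained ranker re-scores only the short list.
    This is the pattern, not LinkedIn's implementation."""
    # Stage 1: broad retrieval over the full index.
    scored = sorted(index.items(), key=lambda kv: dot(query_vec, kv[1]),
                    reverse=True)
    candidates = [name for name, _ in scored[:retrieve_k]]
    # Stage 2: expensive fine-grained ranking on the short list only.
    return sorted(candidates, key=rank_fn, reverse=True)

# Toy "profiles" embedded in 2D; the query vector stands in for "curing cancer".
profiles = {
    "oncologist": [1.0, 0.9],
    "genomics_researcher": [0.9, 0.8],
    "biologist": [0.8, 0.6],
    "chef": [0.0, 0.1],
}
fine_scores = {"oncologist": 0.95, "biologist": 0.8,
               "genomics_researcher": 0.7, "chef": 0.1}
results = two_stage_search([1.0, 1.0], profiles, fine_scores.get)
```

The design rationale is cost: the expensive model never sees the full billion-record index, only the handful of candidates the cheap retriever lets through.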

    But applying this to people search broke the old architecture. The new problem included not just ranking but also retrieval.

    “A billion records," Berger said, is a "different beast."

    The team’s prior retrieval stack was built on CPUs. To handle the new scale and the latency demands of a "snappy" search experience, the team had to move its indexing to GPU-based infrastructure. This was a foundational architectural shift that the job search product did not require.

    Organizationally, LinkedIn benefited from multiple approaches. For a time, LinkedIn had two separate teams, job search and people search, attempting to solve the problem in parallel. But once the job search team achieved its breakthrough using the policy-driven distillation method, Berger and his leadership team intervened. They brought over the architects of the job search win, product lead Rohan Rajiv and engineering lead Wenjing Zhang, to transplant their 'cookbook' directly to the new domain.

    Distilling for a 10x throughput gain

    With the retrieval problem solved, the team faced the ranking and efficiency challenge. This is where the cookbook was adapted with new, aggressive optimization techniques.

    Zhang’s technical post provides the specific details our audience of AI engineers will appreciate. One of the more significant optimizations was input size.

    To feed the model, the team trained another LLM with reinforcement learning (RL) for a single purpose: to summarize the input context. This "summarizer" model was able to reduce the model's input size by 20-fold with minimal information loss.

    The combined result of the 220M-parameter model and the 20x input reduction? A 10x increase in ranking throughput, allowing the team to serve the model efficiently to its massive user base.

    Pragmatism over hype: building tools, not agents

    Throughout our discussions, Berger was adamant about something else that might catch people’s attention: The real value for enterprises today lies in perfecting recommender systems, not in chasing "agentic hype." He also refused to talk about the specific models that the company used for the searches, suggesting it almost doesn't matter. The company selects models based on which one it finds the most efficient for the task.

    The new AI-powered people search is a manifestation of Berger’s philosophy that it’s best to optimize the recommender system first. The architecture includes a new "intelligent query routing layer," as Berger explained, that itself is LLM-powered. This router pragmatically decides if a user's query — like "trust expert" — should go to the new semantic, natural-language stack or to the old, reliable lexical search.
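The routing decision can be sketched as a small function. This is an illustrative stand-in: in LinkedIn's system the router is itself LLM-powered, while here a hypothetical heuristic classifier makes the call.

```python
def simple_classifier(query):
    # Placeholder heuristic standing in for the LLM-powered router:
    # longer or question-like queries are treated as natural language.
    q = query.strip().lower()
    return len(q.split()) > 3 or q.startswith(("who", "what", "which", "how"))

def route_query(query, is_natural_language):
    """Sketch of an intelligent query routing layer: send natural-language
    queries to the new semantic stack, everything else to the old,
    reliable lexical search."""
    return "semantic" if is_natural_language(query) else "lexical"
```

The pragmatic point is that the legacy lexical engine is never thrown away; the router lets the new semantic stack earn traffic query by query.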

    This entire, complex system is designed to be a "tool" that a future agent will use, not the agent itself.

    "Agentic products are only as good as the tools that they use to accomplish tasks for people," Berger said. "You can have the world's best reasoning model, and if you're trying to use an agent to do people search but the people search engine is not very good, you're not going to be able to deliver." 

    Now that the people search is available, Berger suggested that one day the company will be offering agents to use it. But he didn’t provide details on timing. He also said the recipe used for job and people search will be spread across the company’s other products.

    For enterprises building their own AI roadmaps, LinkedIn's playbook is clear:

    1. Be pragmatic: Don't try to boil the ocean. Win one vertical, even if it takes 18 months.

    2. Codify the "cookbook": Turn that win into a repeatable process (policy docs, distillation pipelines, co-design).

    3. Optimize relentlessly: The real 10x gains come after the initial model, in pruning, distillation, and creative optimizations like an RL-trained summarizer.

    LinkedIn's journey shows that for real-world enterprise AI, emphasis on specific models or cool agentic systems should take a back seat. The durable, strategic advantage comes from mastering the pipeline — the 'AI-native' cookbook of co-design, distillation, and ruthless optimization.

    (Editor's note: We will be publishing a full-length podcast with LinkedIn's Erran Berger, which will dive deeper into these technical details, on the VentureBeat podcast feed soon.)

  • Nous Research launches Hermes 4 open-source AI models that outperform ChatGPT on math benchmarks with uncensored responses and hybrid reasoning capabilities.
  • Salesforce launches CRMArena-Pro, a simulated enterprise AI testing platform, to address the 95% failure rate of AI pilots and improve agent reliability, performance, and security in real-world business deployments.
  • Anthropic launches a limited pilot of Claude for Chrome, allowing its AI to control web browsers while raising critical concerns about security and prompt injection attacks.
  • Take this blind test to discover whether you truly prefer OpenAI's GPT-5 or the older GPT-4o—without knowing which model you're using.
  • A new MIT report reveals that while 95% of corporate AI pilots fail, 90% of workers are quietly succeeding with personal AI tools, driving a hidden productivity boom.
  • The Chan Zuckerberg Initiative unveils rBio, a groundbreaking AI model that simulates cell biology without lab experiments to accelerate drug discovery and disease research.
  • CodeSignal Inc., the San Francisco-based skills assessment platform trusted by Netflix, Meta, and Capital One, launched Cosmo on Wednesday, a mobile learning application that transforms spare minutes into career-ready skills through artificial intelligence-powered micro-courses. The app represents a strategic pivot for CodeSignal, which built its reputation assessing technical talent for major corporations but always harbored […]

  • China's DeepSeek has released a 685-billion parameter open-source AI model, DeepSeek V3.1, challenging OpenAI and Anthropic with breakthrough performance, hybrid reasoning, and zero-cost access on Hugging Face.