Benchmarks Archives - Optimus Technology Group

Nous Research drops Hermes 4 AI models that outperform ChatGPT without content restrictions

Nous Research launches Hermes 4 open-source AI models that outperform ChatGPT on math benchmarks with uncensored responses and hybrid reasoning capabilities.

Salesforce builds ‘flight simulator’ for AI agents as 95% of enterprise pilots fail to reach production

Salesforce launches CRMArena-Pro, a simulated enterprise AI testing platform, to address the 95% failure rate of AI pilots and improve agent reliability, performance, and security in real-world business deployments.

MCP-Universe benchmark shows GPT-5 fails more than half of real-world orchestration tasks

A new benchmark from Salesforce Research evaluates model and agentic performance on real-world enterprise tasks.

Stop benchmarking in the lab: Inclusion Arena shows how LLMs perform in production

Researchers from Inclusion AI and Ant Group proposed a new LLM leaderboard that takes its data from real, in-production apps.
