Today marks an exciting milestone in AI technology with the release of upgraded models: Claude 3.5 Sonnet and Claude 3.5 Haiku. These models reflect significant advancements in performance, particularly in coding and tool use. Claude 3.5 Sonnet excels with a 49.0% performance on the SWE-bench Verified, eclipsing all publicly available models. Meanwhile, Claude 3.5 Haiku maintains speed and affordability while topping intelligence benchmarks previously set by Claude 3 Opus.
In a pioneering move, the public beta of computer use is now available through the API. This feature allows Claude to operate computers as humans do—viewing screens, clicking, and typing. Though experimental, companies like Replit and The Browser Company are already innovating with this capability, automating complex processes with a fraction of the effort. While still ironing out kinks, we anticipate rapid improvements, making this a potential game-changer for various industries.
Claude 3.5 Haiku is more than just an upgrade—it’s a beacon of efficiency and precision. Despite matching the cost and speed of its predecessor, it surpasses many existing models in coding tasks, scoring 40.6% on SWE-bench Verified. This makes it an excellent candidate for tasks that demand quick, accurate responses, whether in user interaction scenarios or intensive data analysis.
The concept of computer use pushes AI into new realms of capability. Rather than building specific tools, developers can use general computer skills for automating repetitive tasks, creating software, and even conducting extensive research. Our API facilitates Claude's interaction with computer interfaces, translating user instructions into actions. As this technology matures, we are cautiously optimistic about its potential to enhance productivity without compromising safety.
Responsible deployment remains central to our mission. We’ve introduced new safety measures and classifiers to ensure computer use is conducted securely. This includes monitoring potential misuse, such as fraud or misinformation. We’re learning from our early deployments, and openly invite feedback to optimize this innovative journey.
We are thrilled about the potential these new models unlock. As developers explore these tools, the feedback will be invaluable in refining and expanding the capabilities of Claude. Whether you're innovating how AI interacts with existing platforms or creating entirely new experiences, these advancements are just the beginning.
What new possibilities can you imagine with the groundbreaking features of Claude 3.5?