Claude Opus 4.1 has been released, bringing several improvements and new features for software developers. Here are some of the key updates:
1. "Enhanced Performance": The latest version offers improved speed and efficiency, allowing developers to handle more complex tasks with greater ease.
2. "New API Endpoints": Additional API endpoints have been introduced to provide more flexibility and functionality, enabling developers to integrate Claude Opus 4.1 into their applications more seamlessly.
3. "Improved Natural Language Understanding": The model now has better natural language processing capabilities, making it more accurate and responsive to developer queries and commands.
4. "Expanded Knowledge Base": Claude Opus 4.1 has been updated with the latest information, ensuring that developers have access to the most current and relevant data.
5. "Enhanced Security Features": New security measures have been implemented to protect sensitive data and ensure that developers can work with confidence.
6. "Better Collaboration Tools": Improved collaboration features allow multiple developers to work together more effectively, streamlining the development process.
7. "Customization Options": Developers can now customize the model to better suit their specific needs, including the ability to fine-tune parameters for optimal performance.
8. "Debugging and Error Handling": Enhanced debugging tools and improved error handling make it easier for developers to identify and resolve issues quickly.
If you have any specific questions or need further details about any of these features, feel free to ask!
相關(guān)內(nèi)容:
作者 | Hien Luu
譯者 | 田橙
Anthropic 已推出 Claude Opus 4.1,這是針對 Opus 4 的重要升級版,顯著增強了模型在多文件項目中的代碼可靠性,并提升了模型在長鏈式交互中的推理能力。該版本在 SWE-bench Verified 基準測試 中的得分由 72.5% 改進至 74.5%,說明模型在真實世界編程任務中更加可靠。

圖 1:Opus 4.1 與 Opus 4 在 SWE-bench Verified 準確率上的對比
在 Opus 4 的基礎(chǔ)上,新版本進一步強化了 Claude 作為編程助手的能力,尤其在開發(fā)者常用的多文件場景中,其代碼重構(gòu)的可靠性有了提升——這是許多 AI 助手的薄弱環(huán)節(jié)。Anthropic 還指出,模型在長時間交互中跟蹤推理鏈和狀態(tài)的能力有所提升,這對類代理(agent-like)工作流程至關(guān)重要。他們將這些更新視為循序漸進但意義顯著的改進,助力 Claude 向更實用、可應用于企業(yè)級場景的 AI 助手發(fā)展。
SWE-bench Verified 被廣泛認為是衡量編碼助手在真實 GitHub 項目中解決問題能力的重要基準測試。相比于合成基準,SWE-bench 更貼近真實開發(fā)場景,因此其得分提升被視為模型在實際編程任務中能力增強的重要指標。
據(jù)發(fā)布說明所述,GitHub 反饋稱 Opus 4.1 在復雜重構(gòu)任務上性能更強;Rakuten Group 表示,Claude 能在大型代碼庫中準確指出修正位置,且不會引入無關(guān)改動;而 Windsurf 在內(nèi)部面向初級開發(fā)者的基準測試中,觀察到比 Opus 4 高出一個標準差的性能躍升——這一跨越被比作從 Sonnet 3.7 升級到 Sonnet 4 的提升。
安全性方面,Claude Opus 4.1 的“無害響應率”(harmless response rate)提升至 98.76%,相比 Opus 4 的 97.27% 有明顯提高。這意味著模型在拒絕違規(guī)請求時更加可靠。同時,在涉及武器或毒品合成等高風險濫用場景中,模型的合作率下降了 25%,有效降低企業(yè)在合規(guī)與品牌方面的風險。
“無害響應率”是衡量模型在對抗違禁或危險內(nèi)容請求時保持安全響應的一項核心指標,尤其對企業(yè)部署而言,這關(guān)系到合規(guī)性與品牌形象。
Claude Opus 4.1 目前已向以下用戶開放使用:已付費的 Claude 用戶、通過 Claude Code 用于終端工作流的用戶,以及通過 API、Amazon Bedrock 和 Google Cloud 的 Vertex AI 平臺接入者。值得一提的是,其定價保持與 Opus 4 相同。
原文鏈接:
https://www.infoq.com/news/2025/08/anthropic-claude-opus-4-1/
聲明:本文為 InfoQ 翻譯,未經(jīng)許可禁止轉(zhuǎn)載。
今日好文推薦
叮!極客邦 2025 秋招“通關(guān)文牒”已送達!