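Article Abstract

OpenAI has released the system card for GPT-5.4 Thinking, its latest reasoning model, outlining where the technology stands on safety and research. The system card describes the model's capabilities and potential applications in detail, and emphasizes OpenAI's continued focus on AI safety and alignment.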

Article Summary

March 5, 2026
Publication · Safety

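GPT-5.4 Thinking is the latest reasoning model in the GPT-5 series (see the blog post for details). Its safety mitigations largely follow the approach used throughout the series, but it is the first general-purpose model to implement mitigations for High capability in cybersecurity. That approach builds on the one recently deployed for GPT-5.3 Codex in ChatGPT and the API.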

Note: In the system card, GPT-5.4 Thinking is referred to as gpt-5.4-thinking. There is no GPT-5.3 Thinking model, so the main comparison baseline is GPT-5.2 Thinking.

The following is a summary of the comments, presenting the different viewpoints in a balanced way:

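Performance doubts
- Some users note that benchmark gains are limited, and that results are even worse in some areas: "Benchmarks barely improved it seems" (world2vec); "Health category seems to report worse performance compared to 5.2" (cj)
Cost controversy
- Dissatisfaction with pricing seen as too high: "$30/M Input and $180/M Output Tokens is nuts" (nthypes); "Ridiculous expensive for not that great bump on intelligence" (nthypes)
Technical concerns
- Enabling "thinking" mode is reported to lower scores on math and browser-agent benchmarks: "Significantly worse results when enabling thinking...Especially for Math" (ZeroCool2u)
- Doubts about the credibility of the benchmarks: "I wouldn't trust any of these benchmarks unless...accompanied by proof" (iamleppert)
Criticism of product direction
- A view that the focus should be on products rather than incremental model tuning: "It's time for a product, not for a marginally improved model" (yanis_t)
- A view that OpenAI is at a disadvantage relative to competitors: "People are more excited by Anthropic and Google releases" (beernet)
Positive feedback
- Operating-system use on OSWorld exceeds the human score: "75% on os world surpassing humans at 72%" (iamronaldo)
- Praise for dynamic tool search: "Most exciting change...use of tool search to dynamically load tools" (rbitar)
Other points of interest
- Speculation about military applications: "I'm sure the military will enjoy it" (Chance-Device)
- Requests for a coding comparison: "5.4 vs 5.3-Codex? Which is better for coding?" (jcmontx)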

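Note: No score is shown for any of the comments (None); they mainly reflect users' spontaneous opinions. Negative feedback centers on limited performance gains, high pricing, and the transparency of the testing methodology, while positive feedback focuses on specific capability improvements and new features.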