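Article Abstract
Anthropic has open-sourced its original performance test project, originalperformancetakehome, which is now public on GitHub for developers to try. The project may be intended for evaluating system performance or for use as a technical interview exercise.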
Article Summary
Anthropic Open-Sources a Performance Test Project: Can You Beat Claude Opus 4.5?
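Main points:
1. Project background
   - Anthropic has made its original performance test project, "originalperformancetakehome", public
   - The project was originally used to evaluate the performance of Anthropic's AI model, Claude
   - It is now open to the public as a challenge to beat Claude Opus 4.5's results
2. Performance benchmarks
   - Results for several Claude versions under different conditions, measured in clock cycles (lower is better):
     - Claude Opus 4: 2164 cycles (after many test runs)
     - Claude Opus 4.5: 1790 cycles (2-hour test)
     - Best result: 1363 cycles (under improved test conditions)
3. Challenge invitation
   - Developers are encouraged to try optimizing the performance themselves
   - If your optimized result comes in below 1487 cycles (beating Claude Opus 4.5's best result at launch), you can send your code and résumé to performance-recruiting@anthropic.com
   - Strong performers may be offered an interview
4. Project information
   - Code is Python (88.7%) and HTML (11.3%)
   - Includes a test script, tests/submission_tests.py, for verifying optimized results
   - Has received 284 stars and 53 forks
5. Contact
   - Performance-related questions: performance-recruiting@anthropic.com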
(Note: the GitHub navigation menu, footer, and other parts unrelated to the core content have been omitted.)
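The benchmark figures and the 1487-cycle threshold above reduce to simple arithmetic. The sketch below is illustrative only: the names `BENCHMARKS`, `RECRUITING_THRESHOLD`, `speedup`, and `qualifies` are hypothetical and are not part of the repository. It computes how much faster each listed result is than Claude Opus 4 and checks whether a cycle count falls strictly below the stated threshold. It does not measure cycles itself; verifying a real submission is the job of the repo's tests/submission_tests.py.

```python
# Hypothetical helper; not part of the originalperformancetakehome repository.
# Cycle counts as reported in the summary (lower is better).
BENCHMARKS = {
    "Claude Opus 4 (after many test runs)": 2164,
    "Claude Opus 4.5 (2-hour test)": 1790,
    "Best result (improved test conditions)": 1363,
}

# Threshold from the challenge invitation: results below this may be submitted.
RECRUITING_THRESHOLD = 1487


def speedup(baseline_cycles: int, new_cycles: int) -> float:
    """Return how many times faster new_cycles is than baseline_cycles."""
    return baseline_cycles / new_cycles


def qualifies(cycles: int, threshold: int = RECRUITING_THRESHOLD) -> bool:
    """Return True if a result is strictly below the threshold."""
    return cycles < threshold


if __name__ == "__main__":
    baseline = BENCHMARKS["Claude Opus 4 (after many test runs)"]
    for name, cycles in BENCHMARKS.items():
        print(
            f"{name}: {cycles} cycles, "
            f"{speedup(baseline, cycles):.2f}x vs. Opus 4, "
            f"below threshold: {qualifies(cycles)}"
        )
```

Running it prints one line per benchmark. Only the 1363-cycle result falls below 1487, roughly a 1.59x speedup over the Opus 4 figure.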
Comment Summary
- Confusion about the task description
  - Main point: the task description is unclear and gives no concrete requirements or scoring criteria
  - Key quotes: "What is the actual assignment here? The README only gives numbers without any information on what you're supposed to do or how you are rated." (koolba); "This is a knowledge test of GPU architecture?" (greesil)
- Criticism of the hiring approach
  - Main point: a time-limited optimization test is too narrow to assess a candidate's overall ability
  - Key quotes: "Seems like they're trying to hire nerds who know a lot about hardware or compiler optimizations...hiring for creativity is a lot harder." (jackblemming); "It shocks me that anyone supposedly good enough for anthropic would subject themselves to such a one sided waste of time." (zeroCalories)
- Positive views of the technical challenge
  - Main point: the challenge is an interesting learning opportunity, particularly for practicing optimization techniques
  - Key quotes: "Having recently learned more about SIMD, PTX and optimization techniques, this is a nice little challenge to learn even more." (sureglymop); "It's pretty interesting how close this assignment looks to demoscene golf." (avaer)
- Skepticism about the company's attitude
  - Main point: dissatisfaction with the tone and approach of Anthropic's recruiting
  - Key quotes: "The snarky writing...is really something, innit?" (tucnak); "I suspect this was released by Anthropic as a DDOS attack on other AI companies." (pvalue005)
- Discussion of AI performance
  - Main point: interest in how AI performs in programming competitions and what that implies
  - Key quotes: "The oAI 2nd place at the atcoder world championship competition was the first one...Sakana also got 1st place in another atcoder competition a few weeks ago." (NitpickLawyer); "Was the screening format here that this problem was sent out, and candidates had to reply with a solution within 2 hours?" (Maro)
- Speculation about other companies
  - Main point: commenters wonder whether OpenAI will adopt a similar approach
  - Key quote: "I wonder if OpenAI follows suit." (dhruv3006)