Hacker News Digest

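Article Abstract

The article examines how, in the wake of GPT-5, performance-efficiency optimized routing can lower the cost of large language models (LLMs) while improving their results. By optimizing the strategy that routes queries to models, the work aims to use compute more efficiently, cutting running costs while maintaining or improving model performance.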

Article Summary

Title: Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing

Main points:

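Balancing performance and efficiency is a central challenge in the development of large language models (LLMs). GPT-5 addresses it with test-time routing, dynamically assigning each query to either an efficient model or a high-capacity model. The paper proposes Avengers-Pro, a test-time routing framework that ensembles LLMs of varying capacity and efficiency and offers a unified solution across the full range of performance-efficiency tradeoffs.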

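Avengers-Pro embeds and clusters incoming queries, then routes each query to the most suitable model according to a performance-efficiency score. Across 6 challenging benchmarks and 8 leading models (including GPT-5-medium, Gemini-2.5-pro, and Claude-opus-4.1), Avengers-Pro achieves state-of-the-art results: by varying a performance-efficiency tradeoff parameter, it can surpass the strongest single model (GPT-5-medium) by 7% in average accuracy. It can also match the strongest single model's average accuracy at 27% lower cost, and reach 90% of that performance at 63% lower cost. Most notably, it achieves a Pareto frontier: compared with every single model, it consistently yields the highest accuracy for a given cost and the lowest cost for a given accuracy.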
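
The routing idea described above (embed, assign to a cluster, score each model, pick the best) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the 2-D "embeddings", the cluster names, the model names, the accuracy and cost numbers, and the exact scoring formula (a weighted sum of min-max normalized accuracy and inverted cost, with weight `alpha`) are all assumptions made for the example.

```python
import math

# Hypothetical 2-D cluster centroids; a real system would cluster
# high-dimensional text embeddings produced by an embedding model.
CENTROIDS = {
    "math": (1.0, 0.0),
    "chat": (0.0, 1.0),
}

# Hypothetical per-cluster accuracy of each model, estimated offline.
ACCURACY = {
    "math": {"small-model": 0.55, "large-model": 0.90},
    "chat": {"small-model": 0.80, "large-model": 0.85},
}

# Hypothetical cost per query, in arbitrary units.
COST = {"small-model": 1.0, "large-model": 10.0}


def nearest_cluster(embedding):
    """Return the name of the centroid closest to the embedding (Euclidean)."""
    return min(CENTROIDS, key=lambda name: math.dist(embedding, CENTROIDS[name]))


def normalize(values):
    """Min-max normalize a dict of numbers to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def route(embedding, alpha):
    """Pick the model with the highest performance-efficiency score.

    alpha = 1.0 considers accuracy only; alpha = 0.0 considers cost only.
    The weighted-sum form of the score is an assumption for this sketch.
    """
    cluster = nearest_cluster(embedding)
    perf = normalize(ACCURACY[cluster])
    # Efficiency is inverted normalized cost: the cheapest model scores 1.0.
    eff = {m: 1.0 - c for m, c in normalize(COST).items()}
    scores = {m: alpha * perf[m] + (1.0 - alpha) * eff[m] for m in COST}
    return max(scores, key=scores.get)
```

With these made-up numbers, `route((0.9, 0.1), alpha=0.9)` sends a math-like query to the large model, while lowering `alpha` to 0.3 sends the same query to the small one. Sweeping the tradeoff parameter is what traces out a cost-accuracy curve.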

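The code is open source and available at the following link: GitHub link.

Keywords: large language models, performance-efficiency optimization, test-time routing, Avengers-Pro, GPT-5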

Comment Summary

Summary of the comments:

Optimism about LLM ensembling
- Comment 1 expects ensembling approaches to become the next stage of LLM development and notes that the open-weight community has a chance to do better than OpenAI here.
  Quotes:
  - "ensembling approaches would become the next stage of LLM development"
  - "The open weight community has an opportunity to take these ideas and run with them better than OpenAI has."
- Comment 2 is interested in "LLM routing" as a new paradigm and believes that, although it is rough today, it will improve substantially.
  Quotes:
  - "What GPT-5 auto (and this paper) are doing is a step further: 'LLM routing' across multiple distinct models."
  - "It’s still rough right now, but it feels inevitable that this will get much better over time."
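Doubts about routing latency and the method
- Comment 3 points out that the paper does not mention routing latency and criticizes how its charts are presented.
  Quotes:
  - "Paper and repo do not mention routing latency, which I think is a concern."
  - "the paper has some pie chart crimes on page 6."
- Comment 10 argues that such a simple routing method is likely to fail in practice, especially when all datasets are mixed into one.
  Quotes:
  - "this almost certainly doesn’t work in practice."
  - "if you were to mix all datasets into one... this approach would surely break down."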
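Thoughts on commercialization and model optimization
- Comment 4 wonders when an AI business will become profitable, suggesting it may happen once the race to train better models slows and capacity no longer needs constant upgrades.
  Quotes:
  - "I wonder how long it will be before someone manages to make a profitable AI business."
  - "Maybe when they race to train better models slows down and they don’t need to constantly upgrade capacity."
- Comment 7 questions the GPT-5 router, saying it is either not very smart or deliberately configured to be stingy.
  Quotes:
  - "the GPT-5 router either isn’t very smart or is deliberately configured to be very stingy."
  - "It basically never uses the reasoning model by itself, even if that means it hallucinates nonsense."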
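Mentions of existing solutions
- Comment 8 notes that NotDiamond may already be working on the same problem and hopes someone from its team will respond.
  Quotes:
  - "Isn’t this what NotDiamond (founded 2 years ago!) has been working to solve for?"
  - "Maybe someone from their team will chime in."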
Explanations of the routing mechanism
- Comment 9 explains how the system directs each prompt to the LLM best suited to handle it.
  Quotes:
  - "the system intelligently directs the prompt to the LLM that is best suited to handle it."
  - "It’s externally optimizing people’s prompts."
- Comment 6 describes the paper's benchmark setup and finds the Pareto-efficient result unsurprising given that setup.
  Quotes:
  - "they use 70% of the benchmark query-answer pairs to cluster and determine which models work best for each cluster."
  - "It doesn’t seem surprising that this approach would give you Pareto efficiency on those benchmarks."
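
To make the "Pareto efficiency" point in the last comment concrete, here is a small sketch of how one might check which configurations lie on a cost-accuracy Pareto frontier. The configuration names and the (cost, accuracy) pairs are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def pareto_frontier(points):
    """Return the names of configurations not dominated by any other.

    `points` maps a name to a (cost, accuracy) pair. A configuration is
    dominated if another one is no more expensive and no less accurate,
    and strictly better on at least one of the two.
    """
    frontier = []
    for name, (cost, acc) in points.items():
        dominated = any(
            c <= cost and a >= acc and (c < cost or a > acc)
            for other, (c, a) in points.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical (cost, accuracy) pairs, for illustration only.
EXAMPLE = {
    "cheap-model": (1.0, 0.60),
    "mid-model": (4.0, 0.58),
    "strong-model": (10.0, 0.70),
    "router": (7.0, 0.70),
}
```

Here `pareto_frontier(EXAMPLE)` keeps `cheap-model` and `router`: `mid-model` costs more than `cheap-model` while being less accurate, and `strong-model` matches the router's accuracy at a higher cost. The commenter's caveat still applies: a frontier measured on the same benchmarks used to fit the clusters says little about queries from a different distribution.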

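Summary: Commenters are broadly optimistic about LLM ensembling and routing, but they also raise doubts about routing latency, the method's limitations, and commercialization. Some comments point to existing solutions, while others explain or criticize the routing mechanism.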

Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing
