Hacker News Chinese Digest

Interaction Models

Article Abstract

, 2024) notes that "the model is not designed for real-time interaction." But we think interactivity should scale alongside intelligence—the way we work with AI should not be treated as an afterthought.

Interaction models

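Interaction models handle interaction natively rather than through external scaffolding. They continuously take in audio, video, and text, and think, respond, and act in real time. We train an interaction model from scratch. To ensure real-time responsiveness, we adopt a multi-stream, micro-turn design. Our research preview demonstrates qualitatively new interaction capabilities, as well as state-of-the-art combined performance in intelligence and responsiveness.

The article introduces interaction models, a new way of collaborating with AI, and argues that interactivity should be a core capability rather than an afterthought. By natively supporting real-time multimodal input and output and adopting a multi-stream, micro-turn design, the model advances both intelligence and responsiveness, letting AI collaborate as naturally as a person would. The research preview shows marked gains in the model's interaction capabilities and overall performance.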

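Article Summary

Interaction Models: A Scalable Approach to Human-AI Collaboration

Core Innovations

We are releasing a research preview of "interaction models," which achieve real-time multimodal interaction through native design rather than external scaffolding. The key advances are:

  1. Real-time full-duplex interaction: the model processes audio, video, and text input concurrently while generating responses in real time.
  2. Micro-turn architecture: streams are processed in 200 ms units, removing the boundaries of conventional turn-taking.
  3. Dual-model coordination: a foreground interaction model keeps responses real-time while a background model handles deep reasoning tasks.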
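
One way to picture the micro-turn architecture described above is to bucket time-stamped events from several input streams into fixed 200 ms windows, so that each window can be handled as one small unit. The sketch below is only an illustration under assumptions: the `(timestamp_ms, stream, payload)` tuple layout, the stream names, and the function `group_into_micro_turns` are hypothetical and do not come from the article. Only the 200 ms figure is taken from the summary.

```python
from collections import defaultdict

MICRO_TURN_MS = 200  # unit size quoted in the summary


def group_into_micro_turns(events, turn_ms=MICRO_TURN_MS):
    """Bucket (timestamp_ms, stream, payload) events by micro-turn index.

    Hypothetical helper: returns {turn_index: {stream_name: [payloads]}},
    with turn indices in ascending order.
    """
    turns = defaultdict(lambda: defaultdict(list))
    for timestamp_ms, stream, payload in events:
        # Integer division assigns each event to its 200 ms window.
        turns[timestamp_ms // turn_ms][stream].append(payload)
    return {index: dict(streams) for index, streams in sorted(turns.items())}


# Made-up events from three streams, for illustration only.
events = [
    (0, "audio", "hel"),
    (120, "video", "frame0"),
    (210, "audio", "lo"),
    (390, "text", "hi"),
    (400, "audio", "!"),
]
turns = group_into_micro_turns(events)
for index, streams in turns.items():
    print(index, streams)
```

Events from different streams that fall in the same window end up side by side, which is one simple reading of "time-aligned" processing. How the actual model aligns and consumes its streams is not specified in this summary.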

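Technical Breakthroughs

  • Multimodal micro-turn design: time-aligned streaming enables natural interaction such as overlapping speech and visual interruptions.
  • Lightweight encoding: the model directly processes dMel audio signals and 40x40 image patches, avoiding conventional heavyweight encoders.
  • Inference optimization: a streaming session manager cuts GPU memory overhead by 95%.
  • Safety: spoken refusal strategies and long-horizon robustness training were developed for the specifics of real-time interaction.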
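
To make the 40x40 image patching concrete, here is a minimal sketch that splits a 2-D grid into 40x40 tiles in row-major order. It rests on assumptions the summary does not confirm: that patches are non-overlapping, that the image sides are multiples of 40, and that an image can be represented as a list of rows. The function name `split_into_patches` and the 80x120 example size are hypothetical.

```python
PATCH = 40  # patch side length quoted in the summary


def split_into_patches(image, patch=PATCH):
    """Split a 2-D grid (list of rows) into non-overlapping patch x patch tiles.

    Assumption: height and width are exact multiples of `patch`.
    Tiles are returned in row-major order (left to right, then top to bottom).
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [row[left:left + patch] for row in image[top:top + patch]]
            patches.append(tile)
    return patches


# Illustrative 80x120 "image" whose pixels record their own (y, x) position.
image = [[(y, x) for x in range(120)] for y in range(80)]
patches = split_into_patches(image)
print(len(patches), "patches of", len(patches[0]), "x", len(patches[0][0]))
```

Each tile could then be passed to a lightweight embedding step. The summary does not say what that step looks like, so it is left out here.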

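Performance

The TML-Interaction-Small model is at the frontier in both interactivity and intelligence:

  • Interaction latency: 0.4 s (industry average: 1.18 s)
  • FD-bench V1.5: 77.8 points (baseline-model average: 47.8)
  • Audio MultiChallenge: 43.4% accuracy (GPT-3.5: 37.6%)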
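
As a quick check on the size of these gaps, the arithmetic can be worked out directly. The figures below are copied verbatim from this summary and have not been verified against the original article. The variable names are illustrative only.

```python
# Figures copied verbatim from the summary above; not independently verified.
latency_s = {"TML-Interaction-Small": 0.4, "industry average": 1.18}
fd_bench = {"TML-Interaction-Small": 77.8, "baseline average": 47.8}
audio_multichallenge = {"TML-Interaction-Small": 43.4, "GPT-3.5": 37.6}

# Ratio of the quoted average latency to the model's latency.
speedup = latency_s["industry average"] / latency_s["TML-Interaction-Small"]
# Absolute gaps, in points and percentage points respectively.
fd_gap = fd_bench["TML-Interaction-Small"] - fd_bench["baseline average"]
amc_gap = (audio_multichallenge["TML-Interaction-Small"]
           - audio_multichallenge["GPT-3.5"])

print(f"latency ratio: {speedup:.2f}x")
print(f"FD-bench V1.5 gap: {fd_gap:.1f} points")
print(f"Audio MultiChallenge gap: {amc_gap:.1f} percentage points")
```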

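Use Cases

  1. Real-time translation: simultaneous interpretation that handles overlapping speech
  2. Coding collaboration: real-time suggestions as code is being written
  3. Fitness coaching: accurate rep counting and immediate form correction
  4. Industrial quality inspection: real-time defect detection in video streams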

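Roadmap

  • 2024 Q3: open the research preview
  • 2024 Q4: release a larger-parameter version (the current one is a 276B MoE)
  • 2025: improve context management for long sessions

We invite researchers to join this work on transforming the paradigm of human-AI interaction. For details, contact interaction@thinkingmachines.ai.

Note: this summary keeps the core technical details and performance figures, omits some references and implementation details, and focuses on the model's core innovations and practical value.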

Comment Summary

The views expressed in the comments are summarized below:

Positive feedback:

  1. Praise for the demos
  • "Aside from how impressive the model is, the demos here are very well done!" (comment 1)
  • "incredibly impressive demos" (comment 2)

  2. Recognition of the technical advance
  • "That's probably the main thing that distinguishes it from the multimodal models" (comment 5)
  • "Very appealing...lots of hope that we could see lower latency" (comment 6)

Doubts and suggestions:

  1. Practicality concerns
  • "The demos felt fairly contrived...wonder what more useful applications look like" (comment 4)
  • "I don't want an AI talk to me like that" (comment 3)

  2. Technical questions
  • "how can they guarantee the models won't lose a skill?" (comment 2)
  • "what's the economic model for a company like this?" (comment 6)

Key quotes:

  1. Real-time interaction technology: "Working with 200ms chunks enables near real-time concurrency" (comment 5)
  2. Human-like behavior: "the model does nothing, just waits" (comment 6)
  3. Room to grow: "lots of room to add intelligence" (comment 6)