Hacker News Digest

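Article Abstract

The article describes how writing AI agents as HTML files lets them run directly in the browser, reducing dependence on extra tools and frameworks. This experimental project aims to promote wider adoption of open-source agents: users only need to open an HTML file to run an agent, with no other dependencies to install.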

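Article Summary

The article "Wasm-agents: AI agents running in your browser" describes how to run AI agents in the browser without installing extra tools or frameworks. Its main points are summarized below: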

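Background and Motivation

A major obstacle to wide adoption of open-source AI agents is that extra tools and frameworks must be installed before an agent can run. To address this, the Mozilla AI team proposed the Wasm agents blueprint project, which shows how to write agents as HTML files that open and run directly in the browser, with no additional dependencies.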

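Technical Foundations

WebAssembly (Wasm): a binary instruction format that lets code written in languages such as C, C++, Rust, and Python run in the browser at near-native speed.
Pyodide: a WebAssembly-based Python distribution that runs Python code and its libraries directly in the browser.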

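Implementation

The agent code is packaged as a standalone HTML file containing both the UI and the code to run. A call to pyodide.runPythonAsync installs the required Python dependencies, disables tracing, and runs the actual agent code. The agent code is built on the openai-agents-python library, which runs AI agents through an OpenAI-compatible API. It uses the gpt-4o model by default but also supports self-hosted models served through HuggingFace TGI, vLLM, and Ollama.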
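The packaging step described above can be sketched as a small generator that writes such an HTML file. This is a minimal illustration under stated assumptions, not the blueprint's actual code: the helper name build_agent_html, the page layout, and the Pyodide version in the CDN URL are all hypothetical placeholders, and API key handling plus any browser-specific HTTP client setup are left out. The embedded agent snippet uses Agent, Runner, and set_tracing_disabled from openai-agents, and loadPyodide, loadPackage, and runPythonAsync from Pyodide.

```python
import json

# Assumption: the version in this CDN URL is a placeholder; pin the release you actually use.
PYODIDE_JS = "https://cdn.jsdelivr.net/pyodide/v0.27.7/full/pyodide.js"

# Python source that Pyodide would execute inside the browser.
# Top-level await is allowed because it is run through runPythonAsync.
AGENT_CODE = '''
import micropip
await micropip.install("openai-agents")

from agents import Agent, Runner, set_tracing_disabled

set_tracing_disabled(True)  # tracing is turned off, as the summary describes

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant.",
    model="gpt-4o",
)
result = await Runner.run(agent, "Say hello in one sentence.")
result.final_output  # value of the last expression is returned to JavaScript
'''


def build_agent_html(agent_code: str, pyodide_js: str = PYODIDE_JS) -> str:
    """Return a single-file HTML page that runs agent_code with Pyodide (hypothetical helper)."""
    # json.dumps produces a valid JavaScript string literal. Escaping "</" keeps a
    # literal "</script>" inside the Python source from closing the script tag early.
    code_literal = json.dumps(agent_code).replace("</", "<\\/")
    return f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><script src="{pyodide_js}"></script></head>
<body>
<pre id="output">Loading...</pre>
<script>
async function main() {{
  const pyodide = await loadPyodide();
  await pyodide.loadPackage("micropip");
  const result = await pyodide.runPythonAsync({code_literal});
  document.getElementById("output").textContent = String(result);
}}
main();
</script>
</body>
</html>
"""
```

Writing the returned string to a file, for example with open("hello_agent.html", "w").write(build_agent_html(AGENT_CODE)), would give a page that can be opened directly in a browser. Whether the request to the model endpoint succeeds from there depends on the key configuration and CORS handling that this sketch omits.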

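Usage Examples

The article provides several example HTML files that demonstrate what Wasm agents can do:
- hello_agent.html: a simple conversational agent for understanding the basics of Wasm agents.
- handoff_demo.html: a multi-agent system that routes requests to specialized agents based on characteristics of the prompt.
- tool_calling.html: a more advanced agent with built-in tools for practical tasks.
- ollama_local.html: an agent that relies on a local model, suited to offline tasks.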

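Known Limitations

Dependence on the openai-agents framework: only this framework is currently supported, and agent tracing must be disabled.
CORS issues: browsers block cross-origin requests by default, so they must be handled manually.
Inference engine assumptions: using open-source models still requires extra installation steps.
Model resource requirements: large models may not run on low-performance hardware.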
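To make the CORS limitation concrete, the sketch below shows the kind of response headers a local endpoint would need to send before a browser lets a page call it. It is an illustration only, assuming a hypothetical local server on 127.0.0.1; the allow-list, the port, and the names cors_headers, PreflightHandler, and serve are invented for this example and do not come from the article. Pages opened from file:// typically send the origin "null", which is why it appears in the allow-list.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Optional

# Assumption: origins allowed to call the local endpoint.
# "null" is what browsers typically send for pages opened from file://.
ALLOWED_ORIGINS = {"null", "http://localhost:8000"}


def cors_headers(origin: Optional[str]) -> Dict[str, str]:
    """Return the CORS response headers for an allowed origin, or {} if it is not allowed."""
    if origin not in ALLOWED_ORIGINS:
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "POST, OPTIONS",
        "Access-Control-Allow-Headers": "Authorization, Content-Type",
        "Vary": "Origin",
    }


class PreflightHandler(BaseHTTPRequestHandler):
    """Answers the browser's preflight (OPTIONS) request; real request handling is omitted."""

    def do_OPTIONS(self) -> None:
        headers = cors_headers(self.headers.get("Origin"))
        # 204 with CORS headers lets the browser proceed; 403 rejects unknown origins.
        self.send_response(204 if headers else 403)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()


def serve(port: int = 8081) -> None:
    """Start the sketch server on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), PreflightHandler).serve_forever()
```

Existing inference servers usually expose their own setting for this instead of requiring a wrapper. Ollama, for example, reads allowed origins from the OLLAMA_ORIGINS environment variable.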

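Customization and Extension

The article encourages users to explore and extend Wasm agents by:
- Studying the code and running the examples.
- Adjusting prompts and testing how well tool calling works.
- Understanding the characteristics and limitations of different models.
- Testing the limits of models and hardware.
- Trying other agent frameworks and tools.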

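Conclusion

Whether Wasm agents are a great idea or merely an interesting hack remains unclear, but they resonate with many important concepts, such as tool ownership, local execution, and data protection. The Mozilla AI team encourages users to try them and give feedback to help move the project forward.

From this article, readers can learn how to run AI agents easily in the browser and explore their potential applications and limitations.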

Comment Summary

Summary of main points:

Discussion of the CORS issue:
- Point: browser extensions may be able to bypass CORS restrictions.
- Quotes:
  - "Can you bypass the cors issue with a browser extension? I seem to recall CORS doesn't apply to extensions, or at least the part that isn't injected to the webpages." (Comment 1)
  - "Firefox might be the future of agents, due to its extensibility." (Comment 7)
Skepticism about AI agents:
- Point: the current concept of AI agents is overhyped, and the actual applications are not complex.
- Quotes:
  - "This is trying to use the word agent to make it sound cool, but it doesn't make a case for why it's particularly about agents and not just basic level AI stuff." (Comment 4)
  - "The only difference here is that the client-side code is in Python, which by itself doesn't make creating agents any simpler - I would argue that it complicates things a tone." (Comment 8)
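Advantages of running AI agents locally:
- Point: running AI agents locally can address privacy and security problems and reduce the environmental footprint.
- Quotes:
  - "LLMs that act (a.k.a. agents) bring a whole lot of new security and privacy issues." (Comment 5)
  - "As a plus, because they're small, their environmental footprint will also be smaller." (Comment 5)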
Limitations of the browser environment:
- Point: the browser environment is poorly suited to AI agents that run as long-lived processes, and it needs to evolve.
- Quotes:
  - "The frustrating thing about this is the limitation of using a browser. Agents should be long-running processes that exist external to a browser." (Comment 11)
  - "I think we are looking at a true evolution of the web now if this is the way it's going to go." (Comment 11)
Diversity of technical implementations:
- Point: AI agents can be implemented in many ways, including browser extensions and local execution.
- Quotes:
  - "i build an opensource mobile browser - we create ai agents (that run in the background) on the mobile browser." (Comment 6)
  - "When I saw the title, I thought this was running models in the browser. IMO that's way more interesting and you can do it with transformers.js and onnx runtime." (Comment 9)

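Summary:

The comments mainly discuss the technical implementation of AI agents, privacy and security concerns, the limitations of the browser environment, and the advantages of running locally. Some commenters question the concept of AI agents as overhyped, while others emphasize the potential of local execution and browser extensions. Overall, commenters feel that AI agent implementations need more innovation and evolution to meet current technical and privacy challenges.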

WASM Agents: AI agents running in the browser