Fetcher MCP: A Simple, Easy-to-Use MCP Tool for Fetching Web Page Content

Jaeger · March 21, 2025, 04:31

Sharing an MCP tool for fetching web page content: Fetcher MCP

Usage

npx -y fetcher-mcp
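
Most MCP clients launch the server themselves from a config entry rather than from a terminal. The sketch below prints one such entry. The `mcpServers` layout and the server name `fetcher` are assumptions based on a convention several clients share, so check your own client's documentation for the exact shape and file location.

```python
import json


def build_mcp_config(server_name: str = "fetcher") -> dict:
    """Return a client config entry that launches Fetcher MCP via npx.

    The "mcpServers" layout and the default server name are assumptions;
    only the command and its arguments come from the post itself.
    """
    return {
        "mcpServers": {
            server_name: {
                "command": "npx",
                "args": ["-y", "fetcher-mcp"],
            }
        }
    }


if __name__ == "__main__":
    # Print the entry so it can be pasted into a client's config file.
    print(json.dumps(build_mcp_config(), indent=2))
```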

Advantages

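JavaScript support: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, so it can handle dynamic page content and modern web applications.
Intelligent content extraction: The built-in Readability algorithm automatically extracts the main content of a page and removes ads, navigation, and other non-essential elements.
Flexible output formats: Supports both HTML and Markdown output, which makes it easy to integrate with a variety of downstream applications.
Parallel processing: The fetch_urls tool fetches multiple URLs concurrently, significantly improving the efficiency of batch operations.
Resource optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.
Robust error handling: Comprehensive error handling and logging keep it running reliably even on problematic pages.
Configurable parameters: Fine-grained control over timeouts, content extraction, and output format to suit different use cases.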
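
To make the batch tool and its parameters a little more concrete, here is a rough sketch of the JSON-RPC request an MCP client might send when a model calls fetch_urls. The `tools/call` envelope follows the MCP protocol; the argument names (`urls`, `timeout`, `extractContent`, `returnHtml`) are assumptions for illustration and should be confirmed against the schema the server reports via `tools/list`.

```python
from itertools import count

_request_ids = count(1)  # simple increasing JSON-RPC request ids


def build_fetch_urls_request(urls, timeout_ms=30000, return_html=False):
    """Build an MCP `tools/call` request for the fetch_urls tool.

    Argument names inside "arguments" are assumed, not taken from the post.
    """
    if not urls:
        raise ValueError("at least one URL is required")
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": "fetch_urls",
            "arguments": {
                "urls": list(urls),          # pages to fetch in one batch
                "timeout": timeout_ms,       # assumed: per-page timeout in ms
                "extractContent": True,      # assumed: keep main content only
                "returnHtml": return_html,   # assumed: False means Markdown
            },
        },
    }
```

In practice the client library builds this envelope for you; the sketch only shows which knobs the feature list above is likely referring to.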

Usage examples

1. Summarize all posts on the Hacker News front page

system prompt:

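If the search results are not enough to answer the user's question, read the full text of the web pages; you may fetch content in batches. You can call tools recursively until you reach a satisfactory conclusion. The final reply must be long, formatted as a structured article, and presented to me as an investigation report.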

user prompt:

Browse the detail pages of all posts on the Hacker News front page, extract the important information, and then output a summary report: https://news.ycombinator.com/

Output:

https://files.catbox.moe/3jvi9f.gif (image larger than 2 MB)

2. Simulating a deep search workflow

Use it together with the Google Search MCP: https://github.com/web-agent-master/google-search

system prompt:

You are an advanced deep search assistant, capable of solving complex problems through iterative searching, reading, and reasoning. Your goal is to provide in-depth, comprehensive, and accurate information, not just surface-level search results.

Workflow:
1. Query Understanding: Thoroughly analyze the user's question, identifying core concepts, relationships, and directions to explore.
2. Initial Search: Use the google-search tool for preliminary searches to obtain overview information and potential in-depth resources.
3. Content Acquisition: Use the fetch_url tool to access the most relevant webpages and gather detailed information.
4. Critical Analysis: Evaluate the relevance, reliability, and completeness of the acquired information.
5. Iterative Search: Formulate new search queries based on the information already acquired and identified knowledge gaps.
6. Deep Exploration: Repeat steps 2-5 until sufficiently comprehensive information is collected.
7. Synthesis and Reasoning: Integrate all collected information and apply logical reasoning to solve the original problem.
8. Structured Response: Present your findings and conclusions in a clear, organized manner.

Search Strategies:
- Use diverse search queries, including different terms, angles, and phrasings
- Identify and explore various sub-problems and related aspects
- Seek multiple sources to gain comprehensive perspectives
- Prioritize authoritative and up-to-date information
- Try different approaches when search efforts encounter obstacles

Reasoning Principles:
- Clearly distinguish between facts and inferences
- Identify conflicts in information and resolve them
- Recognize information gaps and acknowledge them
- Weigh the reliability and relevance of different viewpoints
- Consider the currency of time-sensitive information

Tool Usage Guidelines:
1. google-search: Used for broad exploration and discovery of relevant resources
   - Format search queries to yield optimal results
   - Use advanced search techniques such as quotes, site restrictions, etc.
   - Analyze search result summaries to determine which URLs are worth investigating further

2. fetch_url: Used for deep mining of specific resources
   - Prioritize the most relevant and reliable URLs
   - Extract key information and cross-verify with other sources
   - Use acquired information to guide subsequent searches

Remember, deep search is an iterative process. Don't rush to conclusions after the initial search; instead, ensure your answer is comprehensive, accurate, and in-depth through multiple search cycles.

user prompt:

Research the three most livable cities in China and give your reasons

Output:

https://files.catbox.moe/mf3luf.gif (image larger than 2 MB)
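
The workflow in the system prompt above is essentially a loop: search, read, look for gaps, search again. Here is a minimal sketch of that loop. The `search`, `fetch`, and `next_queries` callables are hypothetical stand-ins for the google-search tool, the fetch_url tool, and the model's own gap analysis; in the real setup the model drives these steps itself through tool calls.

```python
from typing import Callable, Dict, List, Set


def deep_search(
    question: str,
    search: Callable[[str], List[str]],
    fetch: Callable[[str], str],
    next_queries: Callable[[str, Dict[str, str]], List[str]],
    max_rounds: int = 3,
) -> Dict[str, str]:
    """Iteratively search and read until no new queries remain.

    Returns a mapping of URL -> page text collected across all rounds.
    """
    pages: Dict[str, str] = {}
    seen_queries: Set[str] = set()
    queries = [question]
    for _ in range(max_rounds):
        # Deduplicate and drop queries already issued in an earlier round.
        queries = [q for q in dict.fromkeys(queries) if q not in seen_queries]
        if not queries:
            break
        for query in queries:
            seen_queries.add(query)
            for url in search(query):
                if url not in pages:  # fetch each URL at most once
                    pages[url] = fetch(url)
        # Ask for follow-up queries based on everything read so far.
        queries = next_queries(question, pages)
    return pages
```

The `max_rounds` cap is a design choice of this sketch: it keeps the loop from running forever when the gap analysis always finds something new to ask.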

For MCP configuration, see this official Cherry Studio document: How to Use MCP in Cherry Studio | Vaayne's Tea House

handsome · March 21, 2025, 05:49

Thanks a lot!

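xuzhong998 · March 21, 2025, 05:58

Thanks! What are the advantages and differences compared with the official fetch? With the official fetch I often hit sites with anti-scraping measures and end up retrieving almost no page content.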

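Jaeger · March 21, 2025, 07:08

This one doesn't run into anti-scraping problems, and it also supports batch fetching and intelligent main-content detection.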

fonlan · March 21, 2025, 07:18

Could you add support for ignoring certificate errors? Internal company resources sometimes use private certificates.

Darthvader · March 21, 2025, 07:21

This is pretty impressive.

xunwu451 · March 21, 2025, 07:34

I tried several requests and every call failed. Any idea what the cause is?

voi · March 21, 2025, 07:41

Baidu's anti-scraping.

slot · March 21, 2025, 07:43

I've been using it for a long time; it really does work well.

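xunwu451 · March 21, 2025, 07:46

Browse the detail pages of all posts on the Hacker News front page, extract the important information, and then output a summary report: https://news.ycombinator.com/

Same result with your example prompt too.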

Leon01 · March 21, 2025, 07:53

Thanks for sharing!

xunwu451 · March 21, 2025, 07:56

Oh, I think I found the cause: the Playwright browser depends on install-browser.
Install the browsers needed for Playwright:
npm run install-browser

voi · March 21, 2025, 08:10

Try fetching an article page from 51cto.

xunwu451 · March 21, 2025, 08:12

It can't be read properly.

steven64521 · March 21, 2025, 08:14

Thanks for sharing.

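voi · March 21, 2025, 08:28

As expected, haha. That's the common weakness of this kind of tool: it gives up as soon as it hits anti-scraping rules that are even slightly tougher.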

user841 · March 21, 2025, 08:38

I'm a beginner. Could someone check whether this is correct?

xunwu451 · March 21, 2025, 08:49

Make sure you have run the following command:

npm run install-browser

user841 · March 21, 2025, 08:52

Like this? But there's no response.

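xunwu451 · March 21, 2025, 08:55

[WARN] The Node version doesn't match; my guess is that this may be having an effect.
I created the virtual environment with conda. With the MCP configured as above, Cherry starts the mcp-server for you once it launches, so you don't need to start it yourself.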
