[Repost] Scraping Web Pages Quickly with Jumony: Jumony Usage Notes - icode - 郝喜路

[Repost] Scraping Web Pages Quickly with Jumony: Jumony Usage Notes - icode - 郝喜路 - cnblogs (博客园).

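I recently came across a post on cnblogs titled "Scraping Web Pages with HttpWebRequest and HtmlAgilityPack (No Garbled Text, No Regular Expressions)" (《使用HttpWebRequest和HtmlAgilityPack抓取网页（拒绝乱码，拒绝正则表达式）》). It is a good, well-written post, and the readers' comments on it pointed me to an even more capable tool: Jumony. Jumony is simple and efficient to use, so I am recording and sharing how to use it here.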

Jumony is an open-source project whose source code is currently hosted on GitHub at https://github.com/Ivony/Jumony. I tested it with Visual Studio 2012, using cnblogs as the target site.

Here is how to use it:

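1. After creating a new project, add Jumony to it. You can either download the source code and use it directly, or search for "Jumony Core" in NuGet and add it to the project, in which case NuGet adds the required references automatically.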

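2. Once the reference is in place, you can write the project code. (The code below fetches the article titles and links from the cnblogs home page.)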

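```csharp
using System;
using Ivony.Html;        // Jumony extension methods such as Find() and InnerText() (namespace assumed from Jumony's samples)
using Ivony.Html.Parser; // JumonyParser (namespace assumed from Jumony's samples)

// Members of the page's code-behind class:
public string Html = string.Empty; // the assembled HTML string, rendered by the front-end page

protected void Page_Load(object sender, EventArgs e)
{
    // Load the cnblogs home page and select every post-title link
    var htmlSource = new JumonyParser().LoadDocument("http://www.cnblogs.com").Find(".post_item a.titlelnk");
    int count = 0;
    foreach (var htmlElement in htmlSource)
    {
        count++;
        // {0} = article URL, {1} = title text, {2} = sequence number
        Html += string.Format("<ul><li>{2}. <a href=\"About.aspx?Url={0}\" target=\"_blank\">{1}</a></li></ul>",
            htmlElement.Attribute("href").Value(), htmlElement.InnerText(), count);
    }
}
```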

[Screenshot of the result: the list of article titles scraped from cnblogs]
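The loop in step 2 builds the list by concatenating strings with `+=` and inserts the raw `href` and title into the markup unencoded. As a minimal sketch of a more defensive variant, assuming the (href, title) pairs have already been extracted with `Attribute("href").Value()` and `InnerText()`, the list can be assembled with a `StringBuilder`, URL-escaping the link and HTML-encoding the title. `LinkListBuilder` and `Build` are hypothetical names, not part of Jumony or the original post; only .NET base-library calls are used.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper (not part of Jumony): renders (href, title) pairs that were already
// extracted from the page as one numbered <ul> of links to About.aspx.
public static class LinkListBuilder
{
    public static string Build(IEnumerable<KeyValuePair<string, string>> items)
    {
        var sb = new StringBuilder("<ul>");
        int count = 0;
        foreach (var item in items)
        {
            count++;
            // Escape the article URL so characters such as '&' and '?' stay inside one query value.
            string query = Uri.EscapeDataString(item.Key);
            // HTML-encode the title so any markup in it is displayed as text.
            string title = WebUtility.HtmlEncode(item.Value);
            sb.AppendFormat("<li>{0}. <a href=\"About.aspx?Url={1}\" target=\"_blank\">{2}</a></li>",
                count, query, title);
        }
        sb.Append("</ul>");
        return sb.ToString();
    }
}
```

ASP.NET decodes query-string values, so the page that reads `Request["Url"]` still receives the original article URL.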

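3. Next, when the user clicks one of the article titles scraped from cnblogs (shown in the screenshot above), the page displays the full text of that post itself instead of opening the article on cnblogs.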

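```csharp
// Presumably in About.aspx's Page_Load (the link target from step 2); HtmlText and Html are
// string fields of the page, declared like Html in step 2. Needs using System.Linq for FirstOrDefault().
string url = Request["Url"]; // article URL passed in the query string
var htmlSource = new JumonyParser().LoadDocument(url);
// FirstOrDefault() returns null when a selector matches nothing, so check before use
var title = htmlSource.Find(".postTitle2").FirstOrDefault();
var body = htmlSource.Find("#cnblogs_post_body").FirstOrDefault();
HtmlText = title != null ? title.InnerText() : string.Empty;
Html = body != null ? body.InnerHtml() : string.Empty;
```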
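Step 3 passes whatever arrives in `Request["Url"]` straight to `LoadDocument`, so the page will fetch any address a visitor supplies. A minimal sketch of a guard, assuming only cnblogs posts should ever be loaded, checks that the value is an absolute http(s) URL on cnblogs.com before it is used. `UrlGuard.IsAllowed` is a hypothetical helper, not part of Jumony or the original post, and the allowed host is an assumption based on the site used in this walkthrough.

```csharp
using System;

// Hypothetical guard for the Url query parameter (not part of Jumony):
// accepts only absolute http/https URLs on cnblogs.com or one of its subdomains.
public static class UrlGuard
{
    public static bool IsAllowed(string url)
    {
        Uri uri;
        // TryCreate returns false for null, empty, or relative input instead of throwing.
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
            return false;

        // Allow only web schemes; file://, ftp:// and similar are rejected.
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
            return false;

        // Compare the host case-insensitively; "evilcnblogs.com" does not match ".cnblogs.com".
        string host = uri.Host;
        return string.Equals(host, "cnblogs.com", StringComparison.OrdinalIgnoreCase)
            || host.EndsWith(".cnblogs.com", StringComparison.OrdinalIgnoreCase);
    }
}
```

The page would call `UrlGuard.IsAllowed` on the value read from `Request["Url"]` and skip `LoadDocument` (or show an error) when it returns false.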
