Parsing HTML with XPath in C# Using Html Agility Pack (HAP)

Published: 2022-10-22 15:33:34


Installation

Html Agility Pack (HAP) is an open-source HTML parser for .NET, written in C#, that supports XPath queries.

Official site: https://html-agility-pack.net/

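Install the HtmlAgilityPack package from NuGet, either through the NuGet package manager in Visual Studio or from the command line with dotnet add package HtmlAgilityPack.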

Loading a file with HtmlDocument.Load

using System;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        // Write a sample file first so that there is something to load.
        SaveHtmlFile();

        var path = @"test.html";
        var doc = new HtmlDocument();
        doc.Load(path);

        // SelectSingleNode returns null when the XPath matches nothing.
        var node = doc.DocumentNode.SelectSingleNode("//body");
        if (node != null)
        {
            Console.WriteLine(node.OuterHtml);
        }
    }

    // Saves a small HTML document to test.html in the working directory.
    private static void SaveHtmlFile()
    {
        var html =
@"<!DOCTYPE html>
<html>
<body>
    <h1>This is <b>bold</b> heading</h1>
    <p>This is <u>underlined</u> paragraph</p>
    <h2>This is <i>italic</i> heading</h2>
</body>
</html>";
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);
        htmlDoc.Save("test.html");
    }
}
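When the file contains non-ASCII text, it can help to state the encoding explicitly instead of relying on detection. The sketch below assumes the Load(string, Encoding) overload of HtmlDocument; the EncodedFileLoader class, the LoadUtf8 method, and the temporary file name are hypothetical names made up for this illustration.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

// Hypothetical helper, not part of HAP.
public static class EncodedFileLoader
{
    // Loads an HTML file, reading it as UTF-8.
    public static HtmlDocument LoadUtf8(string path)
    {
        var doc = new HtmlDocument();
        doc.Load(path, Encoding.UTF8);
        return doc;
    }
}

public class Program
{
    public static void Main()
    {
        // Write a small UTF-8 file into the temp directory (illustrative file name).
        var path = Path.Combine(Path.GetTempPath(), "hap-encoding-demo.html");
        File.WriteAllText(path, "<html><body><p>你好, HAP</p></body></html>", Encoding.UTF8);

        var doc = EncodedFileLoader.LoadUtf8(path);
        var paragraph = doc.DocumentNode.SelectSingleNode("//p");

        // Guard against a missing node instead of dereferencing null.
        Console.WriteLine(paragraph?.InnerText ?? "(no <p> element found)");
    }
}
```

If the encoding of the input is unknown, the single-argument Load(path) from the example above remains the simpler choice.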


Loading a string with HtmlDocument.LoadHtml

using System;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        var html =
@"<!DOCTYPE html>
<html>
<body>
    <h1>This is <b>bold</b> heading</h1>
    <p>This is <u>underlined</u> paragraph</p>
    <h2>This is <i>italic</i> heading</h2>
</body>
</html>";

        // Parse the markup directly from the in-memory string.
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // SelectSingleNode returns null when the XPath matches nothing.
        var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");
        if (htmlBody != null)
        {
            Console.WriteLine(htmlBody.OuterHtml);
        }
    }
}


Loading HTML from a URL with HtmlWeb.Load

using HtmlAgilityPack;
using System;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            var url = @"https://www.baidu.com/";

            // HtmlWeb downloads the page and parses it in one step.
            var web = new HtmlWeb();
            var doc = web.Load(url);

            // The node is null if the page has no <title> inside <head>.
            var node = doc.DocumentNode.SelectSingleNode("//head/title");
            if (node != null)
            {
                Console.WriteLine(node.OuterHtml);
            }
        }
    }
}

Selecting multiple nodes with SelectNodes()

// @nuget: HtmlAgilityPack

using System;
using System.Linq;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        var html =
@"<TD class=texte width=""50%"">
<DIV align=right>Name :<B> </B></DIV>
</TD>
<TD width=""50%"">
    <INPUT class=box value=John maxLength=16 size=16 name=user_name>
    <INPUT class=box value=Tony maxLength=16 size=16 name=user_name>
    <INPUT class=box value=Jams maxLength=16 size=16 name=user_name>
</TD>
<TR vAlign=center>";

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = htmlDoc.DocumentNode.SelectNodes("//td/input");
        if (nodes == null)
        {
            Console.WriteLine("No matching nodes.");
            return;
        }

        // Take the first of the matched <input> elements and read its value attribute.
        string name = nodes.First().Attributes["value"].Value;

        Console.WriteLine(name);
    }
}
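The example above only reads the first match. To use every node that SelectNodes() returns, the collection can be enumerated directly. Here is a minimal sketch under that assumption; the InputValueReader class and its ReadValues method are hypothetical helpers written for this illustration, and the markup is a shortened, quoted variant of the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HAP.
public static class InputValueReader
{
    // Returns the value attribute of every node matched by the XPath expression.
    public static List<string> ReadValues(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so map that to an empty list.
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return new List<string>();
        }

        // GetAttributeValue falls back to the given default when the attribute is missing.
        return nodes
            .Select(node => node.GetAttributeValue("value", string.Empty))
            .ToList();
    }
}

public class Program
{
    public static void Main()
    {
        var html =
@"<td>
    <input class=""box"" value=""John"" name=""user_name"">
    <input class=""box"" value=""Tony"" name=""user_name"">
    <input class=""box"" value=""Jams"" name=""user_name"">
</td>";

        foreach (var value in InputValueReader.ReadValues(html, "//td/input"))
        {
            Console.WriteLine(value);
        }
    }
}
```

Returning an empty list for "no match" keeps the calling code free of null checks, which is a design choice of this sketch rather than something HAP does itself.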

Selecting the first matching node with SelectSingleNode(String)

// @nuget: HtmlAgilityPack

using System;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        var html =
@"<TD class=texte width=""50%"">
<DIV align=right>Name :<B> </B></DIV>
</TD>
<TD width=""50%"">
    <INPUT class=box value=First maxLength=16 size=16 name=user_name>
    <INPUT class=box value=Second maxLength=16 size=16 name=user_name>
    <INPUT class=box value=Third maxLength=16 size=16 name=user_name>
</TD>
<TR vAlign=center>";

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // SelectSingleNode returns only the first match, or null when nothing matches.
        var node = htmlDoc.DocumentNode.SelectSingleNode("//td/input");
        if (node == null)
        {
            Console.WriteLine("No matching node.");
            return;
        }

        string name = node.Attributes["value"].Value;

        Console.WriteLine(name);
    }
}
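Because SelectSingleNode() always returns the first match, picking a different element is a matter of narrowing the XPath expression itself. The sketch below assumes standard XPath 1.0 predicates (a position such as [2], or an attribute test such as [@value='Third']); SingleNodeReader and ReadValue are hypothetical names introduced only for this example.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper, not part of HAP.
public static class SingleNodeReader
{
    // Returns the value attribute of the first node matching the XPath,
    // or the fallback when no node (or no value attribute) is found.
    public static string ReadValue(string html, string xpath, string fallback)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? fallback : node.GetAttributeValue("value", fallback);
    }
}

public class Program
{
    public static void Main()
    {
        var html =
@"<td>
    <input class=""box"" value=""First"" name=""user_name"">
    <input class=""box"" value=""Second"" name=""user_name"">
    <input class=""box"" value=""Third"" name=""user_name"">
</td>";

        // Positional predicate: the second <input> child of a <td>.
        Console.WriteLine(SingleNodeReader.ReadValue(html, "//td/input[2]", "(none)"));

        // Attribute predicate: the <input> whose value attribute equals 'Third'.
        Console.WriteLine(SingleNodeReader.ReadValue(html, "//td/input[@value='Third']", "(none)"));

        // No <select> exists in the markup, so the fallback is printed.
        Console.WriteLine(SingleNodeReader.ReadValue(html, "//td/select", "(none)"));
    }
}
```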

Reading node properties

// @nuget: HtmlAgilityPack

using System;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        var html =
@"<body>
    <h1>This is <b>bold</b> heading</h1>
    <p>This is <u>underlined</u> paragraph</p>

    <h1>This is <i>italic</i> heading</h1>
    <p>This is <u>underlined</u> paragraph</p>
</body>";

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var htmlNodes = htmlDoc.DocumentNode.SelectNodes("//body/h1");
        if (htmlNodes == null)
        {
            Console.WriteLine("No matching nodes.");
            return;
        }

        foreach (var node in htmlNodes)
        {
            // InnerHtml: markup inside the element; OuterHtml: the element including its own tags.
            Console.WriteLine("InnerHtml:" + node.InnerHtml);
            Console.WriteLine("OuterHtml:" + node.OuterHtml);
            // InnerText: the text content with the tags stripped.
            Console.WriteLine("InnerText:" + node.InnerText);
            Console.WriteLine("ParentNode:" + node.ParentNode.Name);
            Console.WriteLine("===========");
        }
    }
}
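One detail worth knowing when reading InnerText: to my understanding HAP leaves character entities such as &amp; encoded in the returned text, so they usually need decoding before display. The sketch below assumes the static HtmlEntity.DeEntitize method for that step; TextReader and ReadDecodedText are hypothetical names used only here, and the sample markup is made up.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper, not part of HAP.
public static class TextReader
{
    // Returns the text of the first node matching the XPath with entities decoded,
    // or null when no node matches.
    public static string ReadDecodedText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node == null)
        {
            return null;
        }

        // DeEntitize turns entities such as &amp; back into plain characters.
        return HtmlEntity.DeEntitize(node.InnerText);
    }
}

public class Program
{
    public static void Main()
    {
        var html = "<body><h1>Tom &amp; Jerry <b>bold</b></h1></body>";

        var text = TextReader.ReadDecodedText(html, "//h1");
        Console.WriteLine(text ?? "(no <h1> element found)");
    }
}
```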


References

https://html-agility-pack.net/parser


Source: https://blog.51cto.com/lilongsy/5785892
