
Extracting <a> tag links and innerHTML in C# with regular expressions

黄舟
Release: 2017-06-04 09:40:04

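This article shows how to extract the links and innerHTML of <a> tags in C# with regular expressions. It uses a worked example to demonstrate matching and extracting page elements with a regex.

The example below reads a saved HTML page, matches every <a> tag, and writes each link URL and its innerHTML to a file: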

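using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Read the page HTML. On .NET Core / .NET 5+, "gb2312" is available only after
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has been called.
string text = File.ReadAllText(Path.Combine(Environment.CurrentDirectory, "test.txt"), Encoding.GetEncoding("gb2312"));
// Named groups: "url" captures the href value, "text" captures the inner HTML.
string pattern = "<a(\\s+(href=\"(?<url>([^\"])*)\"|'([^'])*'|\\w+=\"(([^\"])*)\"|'([^'])*'))+>(?<text>(.*?))</a>";
var matches = Regex.Matches(text, pattern);
// Write the extracted results to a file
using (FileStream w = new FileStream(Path.Combine(Environment.CurrentDirectory, "writer.txt"), FileMode.Create))
{
    for (int i = 0; i < matches.Count; i++)
    {
      string line = string.Format("Link URL: {0},  innerhtml: {1}", matches[i].Groups["url"].Value,
        matches[i].Groups["text"].Value);
      byte[] bs = Encoding.UTF8.GetBytes(line + "\r\n");
      w.Write(bs, 0, bs.Length);
      Console.WriteLine(line); // echo each result to the console as well
    }
}
Console.ReadKey();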
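To see what the two named groups capture without preparing a test.txt file, the same anchor pattern can be applied to an in-memory string. This is only a minimal sketch: the sample markup is hypothetical, and the pattern is rewritten as a verbatim string (where `""` stands for one double quote) purely for readability.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample markup; not taken from the article's test.txt.
string html = "<p><a class=\"nav\" href=\"/docs/index.html\">Docs <b>home</b></a></p>";

// Same anchor pattern as above, as a verbatim string ("" is an escaped quote).
string pattern = @"<a(\s+(href=""(?<url>([^""])*)""|'([^'])*'|\w+=""(([^""])*)""|'([^'])*'))+>(?<text>(.*?))</a>";

foreach (Match m in Regex.Matches(html, pattern))
{
    // "url" holds the href value; "text" holds everything between <a ...> and </a>.
    Console.WriteLine("url  = " + m.Groups["url"].Value);   // url  = /docs/index.html
    Console.WriteLine("text = " + m.Groups["text"].Value);  // text = Docs <b>home</b>
}
```

Because `.` does not match a newline by default, an <a> element whose content spans several lines is skipped by this pattern unless `RegexOptions.Singleline` is passed to `Regex.Matches`.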

[Figure: diagram of the regular expression]

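A friend needed to extract the src and data-url attributes of img tags. The approach is much the same as above, so it is included here as well: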

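using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Read the page HTML (see the gb2312 note in the first example).
string text = File.ReadAllText(Path.Combine(Environment.CurrentDirectory, "test.txt"), Encoding.GetEncoding("gb2312"));
// Named groups: "src" and "dataurl"; any other attribute is matched and ignored.
// Only self-closing tags (ending in "/>") are matched.
string pattern = "<img(\\s*(src=\"(?<src>[^\"]*?)\"|data-url=\"(?<dataurl>[^\"]*?)\"|[-\\w]+=\"[^\"]*?\"))*\\s*/>";
var matches = Regex.Matches(text, pattern);
// Write the extracted results to a file
using (FileStream w = new FileStream(Path.Combine(Environment.CurrentDirectory, "writer.txt"), FileMode.Create))
{
    for (int i = 0; i < matches.Count; i++)
    {
      string line = string.Format("Image src: {0},  image data-url: {1}", matches[i].Groups["src"].Value,
        matches[i].Groups["dataurl"].Value);
      byte[] bs = Encoding.UTF8.GetBytes(line + "\r\n");
      w.Write(bs, 0, bs.Length);
      Console.WriteLine(line); // echo each result to the console as well
    }
}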
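The img pattern accepts attributes in any order but only matches self-closing tags. The sketch below illustrates both points on a hypothetical in-memory string (the sample markup and paths are assumptions, not taken from the article); the pattern is the same one as above, written as a verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample: one self-closing <img ... /> and one plain <img ...>.
string html = "<img alt=\"logo\" data-url=\"/full/logo.png\" src=\"/thumb/logo.png\" />"
            + "<img src=\"/plain.png\">";

// Same img pattern as above, as a verbatim string ("" is an escaped quote).
string pattern = @"<img(\s*(src=""(?<src>[^""]*?)""|data-url=""(?<dataurl>[^""]*?)""|[-\w]+=""[^""]*?""))*\s*/>";

MatchCollection matches = Regex.Matches(html, pattern);

// Only the first tag ends in "/>", so the second tag is not matched.
Console.WriteLine("matches  = " + matches.Count);                  // matches  = 1
foreach (Match m in matches)
{
    Console.WriteLine("src      = " + m.Groups["src"].Value);      // src      = /thumb/logo.png
    Console.WriteLine("data-url = " + m.Groups["dataurl"].Value);  // data-url = /full/logo.png
}
```

If the pages being scraped also contain `<img ...>` without the trailing slash, one option is to change the final `/>` in the pattern to `/?>`. For anything beyond simple, well-formed markup, an HTML parser is usually more robust than a regular expression.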

source:php.cn