Code example of how C# uses regular expressions to crawl website information

Mar 27, 2017 am 11:47 AM
c# regular expression

This article shows how to use regular expressions in C# to capture website information, and works through an example that scrapes data from a web page. It may be a useful reference for readers who need to do something similar.

The example below shows one way to capture website information with regular expressions in C#. The details are as follows:

The example captures product details from Jingdong Mall (JD.com).

1. Create the JdRobber.cs program class

using System;
using System.Text.RegularExpressions;

public class JdRobber
{
  /// <summary>
  /// Checks whether the URL is a JD.com product link.
  /// </summary>
  /// <param name="url">The URL to check</param>
  /// <returns>true if the URL has the product-page format</returns>
  public bool ValidationUrl(string url)
  {
    bool result = false;
    if (!String.IsNullOrEmpty(url))
    {
      // The dots are escaped so that "." matches only a literal period.
      Regex regex = new Regex(@"^http://item\.jd\.com/\d+\.html$");
      Match match = regex.Match(url);
      if (match.Success)
      {
        result = true;
      }
    }
    return result;
  }
  /// <summary>
  /// Fetches product information from JD.com and prints it to the console.
  /// </summary>
  /// <param name="url">The product page URL</param>
  public void GetInfo(string url)
  {
    if (ValidationUrl(url))
    {
      string htmlStr = WebHandler.GetHtmlStr(url, "Default");
      if (!String.IsNullOrEmpty(htmlStr))
      {
        string pattern = "";     // regular expression
        string sourceWebID = "";   // product ID
        string title = "";      // title
        decimal price = 0;      // price
        string picName = "";     // image
        // Extract the product ID from the URL
        pattern = @"http://item\.jd\.com/(?<Object>\d+)\.html";
        sourceWebID = WebHandler.GetRegexText(url, pattern);
        // Extract the title
        pattern = @"<p.*id=\""name\"".*>[\s\S]*<h1>(?<Object>.*?)</h1>";
        title = WebHandler.GetRegexText(htmlStr, pattern);
        // Extract the image
        int begin = htmlStr.IndexOf("<p id=\"spec-n1\"");
        int end = htmlStr.IndexOf("</p>", begin + 1);
        // IndexOf returns -1 when nothing is found; index 0 is a valid position.
        if (begin >= 0 && end > begin)
        {
          string subPicHtml = htmlStr.Substring(begin, end - begin);
          pattern = @"<img.*src=\""(?<Object>.*?)\"".*/>";
          picName = WebHandler.GetRegexText(subPicHtml, pattern);
        }
        // Extract the price, which is served by a separate endpoint
        if (sourceWebID != "")
        {
          string priceUrl = @"http://p.3.cn/prices/get?skuid=J_" + sourceWebID + "&type=1";
          string priceJson = WebHandler.GetHtmlStr(priceUrl, "Default");
          pattern = @"\""p\"":\""(?<Object>\d+(\.\d{1,2})?)\""";
          price = WebHandler.GetValidPrice(WebHandler.GetRegexText(priceJson, pattern));
        }
        Console.WriteLine("Product name: {0}", title);
        Console.WriteLine("Image: {0}", picName);
        Console.WriteLine("Price: {0}", price);
      }
    }
  }
}
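
To see what the URL check in ValidationUrl accepts and rejects, the following minimal sketch runs a pattern of the same shape against a few sample URLs, without any network access. The class name UrlCheckDemo, the method ExtractId, and the sample URLs are made up for illustration. The pattern escapes the dots so that "." matches only a literal period; with unescaped dots, the third sample would be accepted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlCheckDemo
{
    // Same shape as the pattern in ValidationUrl, with the literal dots escaped
    // and the numeric ID captured in the named group "Object".
    private static readonly Regex ProductUrl =
        new Regex(@"^http://item\.jd\.com/(?<Object>\d+)\.html$");

    // Returns the numeric product ID, or an empty string if the URL does not match.
    public static string ExtractId(string url)
    {
        if (String.IsNullOrEmpty(url)) return "";
        Match match = ProductUrl.Match(url);
        return match.Success ? match.Groups["Object"].Value : "";
    }

    public static void Main()
    {
        string[] samples =
        {
            "http://item.jd.com/1234567.html",   // hypothetical product page
            "http://item.jd.com/abc.html",       // non-numeric ID, rejected
            "http://itemXjd.com/1234567.html"    // rejected only because the dots are escaped
        };
        foreach (string url in samples)
        {
            string id = ExtractId(url);
            Console.WriteLine("{0} -> {1}", url, id == "" ? "(no match)" : id);
        }
    }
}
```

Combining validation and ID extraction in one anchored pattern is a design choice of this sketch, not something the original code does; it avoids running two separate expressions over the same URL.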

2. Create the WebHandler.cs utility class

using System;
using System.Globalization;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

/// <summary>
/// Shared helper methods.
/// </summary>
public class WebHandler
{
  /// <summary>
  /// Downloads the HTML source of a web page.
  /// </summary>
  /// <param name="url">Page URL</param>
  /// <param name="encoding">Encoding name: "UTF8" or "Default"</param>
  /// <returns>The page content, or an empty string on failure</returns>
  public static string GetHtmlStr(string url, string encoding)
  {
    string htmlStr = "";
    try
    {
      if (!String.IsNullOrEmpty(url))
      {
        WebRequest request = WebRequest.Create(url); // create the request
        // Any value other than "UTF8" falls back to the system default encoding.
        Encoding ec = (encoding == "UTF8") ? Encoding.UTF8 : Encoding.Default;
        // The using blocks release the response, stream, and reader even if reading fails.
        using (WebResponse response = request.GetResponse())
        using (Stream datastream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(datastream, ec))
        {
          htmlStr = reader.ReadToEnd(); // read the whole response body
        }
      }
    }
    catch { } // errors are swallowed; the caller receives an empty string
    return htmlStr;
  }
  /// <summary>
  /// Returns the text captured by the named group "Object" in the first match.
  /// </summary>
  /// <param name="input">Text to search</param>
  /// <param name="pattern">Regular expression containing a group named "Object"</param>
  /// <returns>The captured text, or an empty string if there is no match</returns>
  public static string GetRegexText(string input, string pattern)
  {
    string result = "";
    if (!String.IsNullOrEmpty(input) && !String.IsNullOrEmpty(pattern))
    {
      Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
      Match match = regex.Match(input);
      if (match.Success)
      {
        result = match.Groups["Object"].Value;
      }
    }
    return result;
  }
  /// <summary>
  /// Converts a price string to a decimal.
  /// </summary>
  /// <param name="strPrice">Price text, for example "19.90"</param>
  /// <returns>The parsed price, or 0 if the text is not a valid price</returns>
  public static decimal GetValidPrice(string strPrice)
  {
    decimal price = 0;
    try
    {
      if (!String.IsNullOrEmpty(strPrice))
      {
        Regex regex = new Regex(@"^\d+(\.\d{1,2})?$", RegexOptions.IgnoreCase);
        Match match = regex.Match(strPrice);
        if (match.Success)
        {
          // Parse with the invariant culture so that "." is always the decimal separator.
          price = decimal.Parse(strPrice, CultureInfo.InvariantCulture);
        }
      }
    }
    catch { } // parse errors are swallowed; the caller receives 0
    return price;
  }
}
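
The named group Object is what GetRegexText reads back from each match. As a minimal sketch of that idea, the snippet below applies it to hard-coded strings, so it runs offline. The HTML fragment, the JSON fragment, and the names RegexDemo and FirstObject are assumptions for illustration; they are not taken from the real JD.com pages, whose markup and price response may differ.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class RegexDemo
{
    // Returns the text captured by the named group "Object", or "" if nothing matches.
    public static string FirstObject(string input, string pattern)
    {
        if (String.IsNullOrEmpty(input) || String.IsNullOrEmpty(pattern)) return "";
        Match match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups["Object"].Value : "";
    }

    public static void Main()
    {
        // Hypothetical fragments standing in for a downloaded page and a price response.
        string html = "<div id=\"name\"><h1>Sample Phone 64GB</h1></div>";
        string json = "[{\"id\":\"J_1234567\",\"p\":\"1999.00\"}]";

        // The lazy quantifier .*? stops at the first closing </h1>.
        string title = FirstObject(html, @"<h1>(?<Object>.*?)</h1>");
        // In a verbatim string, "" stands for one literal double quote.
        string rawPrice = FirstObject(json, @"""p"":""(?<Object>\d+(\.\d{1,2})?)""");

        // Parse with the invariant culture so "." is always the decimal separator.
        decimal price;
        if (!decimal.TryParse(rawPrice, NumberStyles.Number, CultureInfo.InvariantCulture, out price))
        {
            price = 0;
        }

        Console.WriteLine("Title: {0}", title);
        Console.WriteLine("Price: {0}", price.ToString(CultureInfo.InvariantCulture));
    }
}
```

Using decimal.TryParse instead of a try/catch around decimal.Parse is a substitution made in this sketch; it reports failure through its return value rather than through an exception.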
