How to Extract Data from HTML Elements Within Frames and IFrames?

Mary-Kate Olsen
Release: 2025-01-18 23:16:12

Parse HTML elements within frames and iframes

If you cannot find the <video> tag while trying to extract a video link from a website, the likely cause is that the site uses frames or iframes. Each frame isolates part of the content in its own HTML document, so searching the main document alone will not reach elements that live inside a frame.

To solve this problem, you need to dig into the collection of frames in the main document. Each frame contains its own HTML document, and access to these individual documents is necessary to extract data from all parts of the website.

Solution:

Use the WebBrowser.Document.Window.Frames property to access the frame collection. Each HtmlWindow in this collection has its own HtmlDocument object.

Modify your code to iterate over each frame's document, using the Frame.Document.Body.GetElementsByTagName() method to retrieve the element you need. Use HtmlElement.GetAttribute to extract element attributes.

Example:

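<code class="language-csharp">using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

// MovieLink is the author's own type with Hash, VideoLink, and ImageLink members (definition not shown).
List<MovieLink> moviesLinks = new List<MovieLink>();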

private void Browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    var browser = sender as WebBrowser;
    if (browser.ReadyState != WebBrowserReadyState.Complete) return;

    var documentFrames = browser.Document.Window.Frames;
    foreach (HtmlWindow frame in documentFrames)
    {
        try
        {
            // Document or Body can be null (for example in a frameset), so guard the chain.
            var videoElement = frame.Document?.Body?
                .GetElementsByTagName("VIDEO").OfType<HtmlElement>().FirstOrDefault();

            if (videoElement != null)
            {
                string videoLink = videoElement.GetAttribute("src");
                int hash = videoLink.GetHashCode();
                if (moviesLinks.Any(m => m.Hash == hash))
                {
                    return; // This URL has already been parsed
                }

                string sourceImage = videoElement.GetAttribute("poster");
                moviesLinks.Add(new MovieLink()
                {
                    Hash = hash,
                    VideoLink = videoLink,
                    ImageLink = sourceImage
                });
            }
        }
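        catch (UnauthorizedAccessException) { } // Frame is not accessible (typically cross-domain); skip it
        catch (InvalidOperationException) { } // Frame document cannot be read in its current state; skip it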
    }
}</code>
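The example above inspects only the top-level frames. Frames can themselves contain frames, so the following is a minimal sketch of a recursive walk, assuming the same WinForms WebBrowser types. FrameWalker and Collect are hypothetical names introduced here for illustration; they are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

static class FrameWalker
{
    // Hypothetical helper: adds every element with the given tag name found in
    // this window's document, then repeats the search in each nested frame.
    public static void Collect(HtmlWindow window, string tagName, List<HtmlElement> results)
    {
        try
        {
            HtmlDocument document = window.Document;
            if (document != null)
            {
                foreach (HtmlElement element in document.GetElementsByTagName(tagName))
                {
                    results.Add(element);
                }
            }

            HtmlWindowCollection frames = window.Frames;
            if (frames != null)
            {
                foreach (HtmlWindow child in frames)
                {
                    Collect(child, tagName, results);
                }
            }
        }
        // Inaccessible frames are skipped, as in the example above.
        catch (UnauthorizedAccessException) { }
        catch (InvalidOperationException) { }
    }
}
```

Inside the DocumentCompleted handler, this could be called as FrameWalker.Collect(browser.Document.Window, "VIDEO", found) with an empty List<HtmlElement> named found. Note that, unlike the original loop, this also searches the main document itself.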

Notes:

  • The DocumentCompleted event may fire multiple times, because the browser raises it as each frame document finishes loading.
  • Some frames may not be accessible, or their elements may throw exceptions when you read their properties. The example catches and ignores these exceptions because they cannot be prevented from the calling code.
  • A hash of the video URL is stored with each link to avoid saving duplicates. When the handler finds a hash it has already stored, it returns early because that URL has already been parsed.
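One caveat on the hashing approach: string.GetHashCode values are not guaranteed to be unique, so two different URLs could in principle produce the same hash and one of them would be skipped. As an alternative sketch (not the original author's method), the URL strings themselves can be tracked in a HashSet<string>. The MovieLink class below is a simplified stand-in for the article's type, and MovieLinkStore and TryAdd are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the article's MovieLink type (assumption).
class MovieLink
{
    public string VideoLink { get; set; }
    public string ImageLink { get; set; }
}

// Hypothetical helper that stores each distinct video URL once.
class MovieLinkStore
{
    private readonly HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);
    private readonly List<MovieLink> links = new List<MovieLink>();

    public IReadOnlyList<MovieLink> Links => links;

    // Returns true if the link was new and stored, false if it was empty or a duplicate.
    public bool TryAdd(string videoLink, string imageLink)
    {
        if (string.IsNullOrEmpty(videoLink)) return false;
        if (!seen.Add(videoLink)) return false; // HashSet.Add returns false for an existing entry

        links.Add(new MovieLink { VideoLink = videoLink, ImageLink = imageLink });
        return true;
    }
}

class Program
{
    static void Main()
    {
        var store = new MovieLinkStore();
        Console.WriteLine(store.TryAdd("https://example.com/a.mp4", "a.jpg")); // True
        Console.WriteLine(store.TryAdd("https://example.com/a.mp4", "a.jpg")); // False
        Console.WriteLine(store.Links.Count);                                  // 1
    }
}
```

Comparing the full strings costs slightly more memory than storing an int hash, but it removes the possibility of a collision silently dropping a link.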

source:php.cn