Extracting Video Links from Nested IFrames within WinForms WebBrowser
Scraping video links with the WinForms WebBrowser control gets tricky when the page uses nested iframes. Calling GetElementsByTagName("video") on the top-level document often finds no <video> tags, because each iframe hosts its own separate document that the top-level search does not cover.
The Solution: Recursive IFrame Traversal
The key is to traverse the iframe hierarchy recursively. Each iframe has its own HtmlDocument, so the extraction method must run on the top-level document and then again on the document of every nested frame.
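The traversal itself can be separated from the extraction logic. The following is a minimal sketch of a depth-first walk over the frame tree; the `FrameWalker` class and its `Walk` method are hypothetical names introduced for illustration. It assumes that a frame whose document cannot be read (typically a cross-origin frame) raises `UnauthorizedAccessException`, and simply skips such frames.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper, not part of the WinForms API.
public static class FrameWalker
{
    // Calls visit for the document of the given window, then for the
    // document of every frame nested beneath it (depth first).
    public static void Walk(HtmlWindow window, int depth, Action<HtmlDocument, int> visit)
    {
        if (window == null) return;

        HtmlDocument document;
        try
        {
            document = window.Document;
        }
        catch (UnauthorizedAccessException)
        {
            // Assumption: the frame is cross-origin and its document is off limits.
            return;
        }

        if (document == null) return;
        visit(document, depth);

        HtmlWindowCollection frames = window.Frames;
        if (frames == null) return;

        for (int i = 0; i < frames.Count; i++)
        {
            Walk(frames[i], depth + 1, visit);
        }
    }
}
```

A possible call, once the page has loaded, would be `FrameWalker.Walk(webBrowser1.Document.Window, 0, (doc, depth) => Console.WriteLine(new string(' ', depth * 2) + doc.Url));`, which prints the URL of each reachable document indented by its nesting level.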
Leveraging the DocumentCompleted Event
To ensure the page is fully loaded before parsing, subscribe to the DocumentCompleted event. On pages with frames this event can fire more than once (once per frame), so process the iframes only once ReadyState is WebBrowserReadyState.Complete.
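As a rough sketch of the wiring described above, the form below creates the control in code, subscribes to DocumentCompleted, and starts navigation. The `BrowserForm` class name and the `https://example.com/` URL are placeholders chosen for illustration; in a designer-generated form the subscription would normally already exist.

```csharp
using System;
using System.Windows.Forms;

public class BrowserForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser
    {
        Dock = DockStyle.Fill,
        ScriptErrorsSuppressed = true // avoid script error dialogs while scraping
    };

    public BrowserForm()
    {
        Controls.Add(webBrowser1);
        webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;

        // Placeholder URL; replace with the page you want to inspect.
        Load += (s, e) => webBrowser1.Navigate("https://example.com/");
    }

    private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The event can fire for individual frames, so wait for the whole page.
        if (webBrowser1.ReadyState != WebBrowserReadyState.Complete) return;

        Text = "Loaded: " + webBrowser1.Url;
        // Parsing of webBrowser1.Document would start here.
    }
}

public static class Program
{
    [STAThread] // the WebBrowser control requires a single-threaded apartment
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new BrowserForm());
    }
}
```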
Example Implementation (Improved Error Handling and Clarity)
The following code collects the src and poster URL of each <video> element, skips duplicates, and logs errors instead of letting them stop the traversal:
<code class="language-csharp">using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

public class MovieLink
{
    public int Hash { get; set; }
    public string VideoLink { get; set; }
    public string ImageLink { get; set; }
}

// The members below belong inside the Form that hosts webBrowser1.
private readonly List<MovieLink> movieLinks = new List<MovieLink>();

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // DocumentCompleted may fire once per frame; wait until the whole page is ready.
    if (webBrowser1.ReadyState != WebBrowserReadyState.Complete) return;
    ExtractVideoLinks(webBrowser1.Document);
}

private void ExtractVideoLinks(HtmlDocument document)
{
    if (document == null) return;

    foreach (HtmlElement videoElement in document.GetElementsByTagName("video"))
    {
        string videoLink = videoElement.GetAttribute("src");
        if (string.IsNullOrEmpty(videoLink)) continue; // Skip if src is missing

        int hash = videoLink.GetHashCode();
        // Compare the link itself as well, since different strings can share a hash code.
        if (movieLinks.Any(m => m.Hash == hash && m.VideoLink == videoLink)) continue; // Skip duplicates

        string posterImage = videoElement.GetAttribute("poster");
        movieLinks.Add(new MovieLink { Hash = hash, VideoLink = videoLink, ImageLink = posterImage });
    }

    // Recursively process iframes
    if (document.Window == null) return;
    foreach (HtmlWindow frame in document.Window.Frames)
    {
        try
        {
            ExtractVideoLinks(frame.Document);
        }
        catch (Exception ex)
        {
            // Log and continue, so that one inaccessible iframe (for example,
            // a cross-origin one) does not stop the remaining frames.
            Console.WriteLine($"Error processing iframe: {ex.Message}");
        }
    }
}</code>
The code calls ExtractVideoLinks recursively for each iframe and catches exceptions, so a frame that cannot be read is logged rather than crashing the application. It also skips <video> elements whose src attribute is missing or empty, and ignores video links that have already been collected. Together, these checks make extraction from pages with deeply nested frames more reliable.
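The empty-value and duplicate checks can also be tried out on their own, without a browser. The sketch below is one possible variant that tracks the link strings themselves in a `HashSet<string>` rather than their hash codes; `LinkFilter` and `DistinctNonEmpty` are hypothetical names introduced for illustration, not part of the example above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class LinkFilter
{
    // Returns the links in their original order, dropping null, empty,
    // or whitespace-only entries and any exact (case-sensitive) repeats.
    public static List<string> DistinctNonEmpty(IEnumerable<string> links)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var result = new List<string>();

        foreach (string link in links)
        {
            if (string.IsNullOrWhiteSpace(link)) continue;

            // HashSet.Add returns false when the value is already present.
            if (seen.Add(link)) result.Add(link);
        }

        return result;
    }
}
```

Comparing whole strings avoids the rare case where two different URLs produce the same hash code and one of them is dropped by mistake.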