How to Retrieve Dynamically Generated HTML Code Using .NET's WebBrowser or mshtml.HTMLDocument?
Problem:
Retrieving the HTML of a web page after its scripts have run is harder than it looks with either the WebBrowser class or the mshtml.HTMLDocument interface. Reading the page source from the WebBrowser control right after navigation does not capture the rendered HTML, and an mshtml.HTMLDocument loaded with the downloaded source returns raw HTML that differs from the content the page actually displays.
Solution:
Using WebBrowser Class:
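Although the WebBrowser class does not provide a direct method for obtaining the rendered HTML, a workaround is possible. Add a WebBrowser control to a form, have it navigate to the desired URL, wait for the DocumentCompleted event, and then read the HTML from the live DOM (for example, the outer HTML of the document's root element) rather than from the originally downloaded source. Because scripts may keep modifying the page after that event fires, the snapshot can be polled until it stops changing.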
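The WebBrowser approach might look roughly like the sketch below. It is a minimal illustration, not a definitive implementation: the class name `RenderedHtmlReader` and the method `GetRenderedHtmlAsync` are hypothetical, the 500 ms polling interval is borrowed from the example code in this article, and the method is assumed to be called on the UI thread that owns the control.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

public static class RenderedHtmlReader
{
    // Hypothetical helper; assumes it runs on the UI thread that owns 'browser'.
    public static async Task<string> GetRenderedHtmlAsync(
        WebBrowser browser, string url, CancellationToken token)
    {
        var loaded = new TaskCompletionSource<bool>();
        // DocumentCompleted can fire more than once (e.g. for frames), hence TrySetResult.
        WebBrowserDocumentCompletedEventHandler onCompleted =
            (sender, e) => loaded.TrySetResult(true);

        browser.DocumentCompleted += onCompleted;
        try
        {
            browser.Navigate(url);
            // Abandon the wait if the caller cancels before the page loads.
            using (token.Register(() => loaded.TrySetCanceled()))
            {
                await loaded.Task;
            }
        }
        finally
        {
            browser.DocumentCompleted -= onCompleted;
        }

        // Poll the live DOM until two consecutive snapshots match.
        string html = ReadOuterHtml(browser);
        while (true)
        {
            await Task.Delay(500, token);
            if (browser.IsBusy) continue; // still loading something
            string htmlNow = ReadOuterHtml(browser);
            if (htmlNow == html) break;
            html = htmlNow;
        }
        return html;
    }

    private static string ReadOuterHtml(WebBrowser browser)
    {
        // The <html> element's OuterHtml reflects the current DOM, not the original source.
        HtmlElementCollection roots = browser.Document.GetElementsByTagName("html");
        return roots.Count > 0 ? roots[0].OuterHtml : string.Empty;
    }
}
```

From a form, a call could look like `string html = await RenderedHtmlReader.GetRenderedHtmlAsync(webBrowser1, url, cts.Token);`, where `webBrowser1` and `cts` are placeholder names.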
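Using mshtml.HTMLDocument Interface:
Alternatively, the downloaded page source can be written into an mshtml.HTMLDocument instance, and the outer HTML of its root element can be polled until it stops changing. The example code takes this route.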
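Additional Considerations:
The polling loop in the example code treats two identical snapshots taken 500 ms apart as a sign that the page has finished rendering. This is a heuristic: a page that updates less often than that may be captured too early, and a page that never stops changing keeps the loop running until the CancellationToken is cancelled.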
Example Code:
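<code class="C#">
using System.Net;
using System.Threading;
using System.Threading.Tasks;
using mshtml; // COM reference: Microsoft HTML Object Library

public static class DynamicPageLoader
{
    // Assumes an STA thread with a message loop (e.g. a WinForms UI thread),
    // since MSHTML is an STA COM component.
    public static async Task<string> LoadDynamicPage(string url, CancellationToken token)
    {
        string source;
        using (var client = new WebClient())
        {
            source = await client.DownloadStringTaskAsync(url);
        }

        // write() and close() are called through IHTMLDocument2
        var doc = (IHTMLDocument2)new HTMLDocument();
        doc.write(source);
        doc.close();
        var doc3 = (IHTMLDocument3)doc; // exposes documentElement

        // Poll for changes in the HTML snapshot
        var html = doc3.documentElement.outerHTML;
        while (true)
        {
            await Task.Delay(500, token); // throws if the token is cancelled
            var htmlNow = doc3.documentElement.outerHTML;
            if (html == htmlNow) break; // two matching snapshots: treat as settled
            html = htmlNow;
        }
        return html;
    }
}
</code>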
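The poll-until-unchanged idea in the example can also be separated from the browser so that it is easier to reason about and to bound. The sketch below is one possible shape under stated assumptions: `SnapshotPolling`, `WaitUntilStableAsync`, and the `maxPolls` limit are hypothetical names and are not part of the original example, which loops without an upper bound.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SnapshotPolling
{
    // Hypothetical helper: returns the first snapshot equal to the one before it,
    // or the most recent snapshot once maxPolls attempts have been made.
    public static async Task<string> WaitUntilStableAsync(
        Func<string> takeSnapshot, TimeSpan interval, int maxPolls, CancellationToken token)
    {
        string previous = takeSnapshot();
        for (int i = 0; i < maxPolls; i++)
        {
            await Task.Delay(interval, token); // throws if the token is cancelled
            string current = takeSnapshot();
            if (current == previous) return current; // unchanged: treat as settled
            previous = current;
        }
        return previous; // gave up waiting; return the latest snapshot
    }
}
```

The snapshot delegate would typically read the outer HTML of the document's root element. Capping the number of polls trades completeness for a guaranteed return on pages that never stop changing.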