A headless browser is a browser that runs without a graphical interface. It can be controlled programmatically to perform tasks automatically, such as running tests or taking screenshots of web pages.
The term "headless" comes from "headless computer". Wikipedia's entry on "headless computer" reads:
A headless system is a computer system that has been configured to operate without a monitor (the "head"), keyboard, and mouse. Headless systems are usually controlled through a network connection, but some headless devices require management through an RS-232 serial connection. Servers usually run in headless mode to reduce operating costs.
In addition to the two harmless use cases mentioned above, headless browsers can be used to automate malicious tasks. The most common ones are crawling websites, disguising automated traffic, and probing websites for vulnerabilities.
A very popular headless browser is PhantomJS. Because it is built on the Qt framework, it differs from mainstream browsers in many ways, so there are many ways to identify it.
Starting with Chrome 59, however, Google released a headless mode for Google Chrome. Unlike PhantomJS, it is built on regular Google Chrome rather than on another framework, which makes it hard for a program to tell whether it is running in a normal browser or a headless one.
Below, we will introduce several methods to determine whether a program is running in a normal browser or a headless browser.
Note: These methods have only been tested on four devices (2 Linux, 2 Mac), so the list is not exhaustive. There are very likely many other ways to detect headless browsers.
Let’s start with the most common way to identify a browser: checking the user agent. On a Linux machine, the user agent of headless Chrome 59 is:
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/59.0.3071.115 Safari/537.36"
So, we can detect whether it is a headless Chrome browser like this:
if (/HeadlessChrome/.test(window.navigator.userAgent)) { console.log("Chrome headless detected"); }
The user agent can also be read from the HTTP headers. In both cases, however, it is easy to spoof.
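The header-based variant could look roughly like the sketch below. The function name `isHeadlessChromeRequest` is hypothetical, and the sketch assumes the headers arrive as a plain object with lower-cased names, which is how Node.js exposes them on `request.headers`. Like the client-side check, it only catches clients that do not bother to change their user agent.

```javascript
// Hypothetical helper: inspects the User-Agent request header on the server.
// `headers` is assumed to be a plain object with lower-cased header names.
function isHeadlessChromeRequest(headers) {
  var userAgent = (headers && headers["user-agent"]) || "";
  return /HeadlessChrome/.test(userAgent);
}

// Example using the user agent string quoted earlier in this article.
var sample = {
  "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 " +
    "(KHTML, like Gecko) HeadlessChrome/59.0.3071.115 Safari/537.36"
};
console.log(isHeadlessChromeRequest(sample)); // true
```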
navigator.plugins returns an array-like list describing the plug-ins available in the current browser. Regular Chrome normally ships with some default plug-ins, such as Chrome PDF viewer or Google Native Client. In headless mode, by contrast, there are no plug-ins and the list is empty.
if(navigator.plugins.length == 0) { console.log("It may be Chrome headless"); }
In Google Chrome, two JavaScript properties expose the browser's language settings: navigator.language and navigator.languages. The first is the language of the browser interface, and the second returns an array of the user's preferred languages. In headless mode, however, navigator.languages returns an empty string.
if(navigator.languages == "") { console.log("Chrome headless detected"); }
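The plug-in and language checks above can also be written as one small function that takes the navigator object as a parameter, which makes them easy to try outside a browser. This is only a sketch: the name `navigatorSignals` is hypothetical, and treating a missing property the same as an empty one is an assumption made here, not something the tests above established.

```javascript
// Hypothetical helper: `nav` is any navigator-like object.
function navigatorSignals(nav) {
  var plugins = nav.plugins;
  var languages = nav.languages;
  return {
    // True when the plug-in list is missing or empty.
    noPlugins: !plugins || plugins.length === 0,
    // True when languages is missing, an empty string, or an empty array.
    noLanguages: !languages || languages.length === 0
  };
}

// In a browser, pass the real object: navigatorSignals(window.navigator)
var headlessLike = navigatorSignals({ plugins: [], languages: "" });
console.log(headlessLike.noPlugins, headlessLike.noLanguages); // true true
```

Both signals are hints rather than proof, so they are reported separately instead of being merged into a single verdict.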
WebGL provides a set of APIs that can perform 3D rendering in HTML canvas. Through these APIs, we can query the graphics driver vendor and renderer.
In the ordinary Google Chrome on Linux, the renderer and vendor values we get are: "Google SwiftShader" and "Google Inc.".
In headless mode, we get "Mesa OffScreen", the name of a rendering technology that does not use any window system, and "Brian Paul", the name of the original author of the open-source Mesa graphics library.
var canvas = document.createElement('canvas');
var gl = canvas.getContext('webgl');
// getContext and getExtension return null when WebGL or the extension is unavailable
var debugInfo = gl && gl.getExtension('WEBGL_debug_renderer_info');
if (debugInfo) {
    var vendor = gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL);
    var renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
    if (vendor == "Brian Paul" && renderer == "Mesa OffScreen") {
        console.log("Chrome headless detected");
    }
}
Not every version of headless Chrome reports these two values. When "Mesa OffScreen" and "Brian Paul" do appear, however, they currently indicate a headless browser.
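A slightly more defensive version of the WebGL check is sketched below. The names `isMesaOffscreen` and `readWebGLInfo` are hypothetical. Comparing the strings case-insensitively is an assumption, made in case the capitalisation of the reported values varies between builds, and the reader function returns null when WebGL or the debug extension is not available.

```javascript
// Hypothetical helper: case-insensitive comparison of the two values.
function isMesaOffscreen(vendor, renderer) {
  var v = String(vendor || "").trim().toLowerCase();
  var r = String(renderer || "").trim().toLowerCase();
  return v === "brian paul" && r === "mesa offscreen";
}

// Reads the unmasked vendor and renderer, or returns null when WebGL or the
// debug extension is unavailable. `doc` is assumed to be a browser document.
function readWebGLInfo(doc) {
  var gl = doc.createElement("canvas").getContext("webgl");
  var debugInfo = gl && gl.getExtension("WEBGL_debug_renderer_info");
  if (!debugInfo) {
    return null;
  }
  return {
    vendor: gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL),
    renderer: gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL)
  };
}

// In a browser:
//   var info = readWebGLInfo(document);
//   if (info && isMesaOffscreen(info.vendor, info.renderer)) { ... }
console.log(isMesaOffscreen("Brian Paul", "Mesa OffScreen")); // true
console.log(isMesaOffscreen("Google Inc.", "Google SwiftShader")); // false
```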
Modernizr detects which HTML and CSS features the current browser supports. The only difference I found between normal Chrome and headless Chrome is that the hairline feature, which tests whether hidpi/retina hairlines are supported, is missing in headless mode.
if(!Modernizr["hairline"]) { console.log("It may be Chrome headless"); }
The last method I found is also the most effective one. The idea is to check the width and height of an image that fails to load.
In normal Chrome, the size of an image that failed to load depends on the browser's zoom level, but it is never zero. In headless Chrome, both the width and the height of such an image are 0.
var body = document.getElementsByTagName("body")[0];
var image = document.createElement("img");
// Register the handler before setting src so the error event cannot be missed
image.onerror = function () {
    if (image.width == 0 && image.height == 0) {
        console.log("Chrome headless detected");
    }
};
image.setAttribute("id", "fakeimage");
image.src = "http://iloveponeydotcom32188.jg";
body.appendChild(image);
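The broken-image check is asynchronous, so it can be convenient to wrap it in a function that reports the result through a callback. The sketch below is one way to do that. The name `checkBrokenImageSize` and its parameters are hypothetical, and reporting false when the image unexpectedly loads is a design choice made here, not part of the original test.

```javascript
// Hypothetical helper: loads a deliberately broken image and reports whether
// it ended up with a 0x0 size. `doc` is a document-like object and `brokenUrl`
// is any URL that is expected to fail to load.
function checkBrokenImageSize(doc, brokenUrl, callback) {
  var image = doc.createElement("img");
  // Attach the handlers before setting src so that no event can be missed.
  image.onerror = function () {
    callback(image.width === 0 && image.height === 0);
  };
  // If the image loads after all, the probe says nothing: report false.
  image.onload = function () {
    callback(false);
  };
  image.src = brokenUrl;
  doc.body.appendChild(image);
}

// In a browser:
//   checkBrokenImageSize(document, "http://iloveponeydotcom32188.jg",
//     function (zeroSized) {
//       if (zeroSized) { console.log("Chrome headless detected"); }
//     });
```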