Ever wonder what happens when you click a link? How The Internet Works takes you behind the scenes of the digital world, breaking down complex tech into simple, bite-sized insights. From data packets to servers and beyond, discover the magic that powers your online experience! (Hook written with the help of AI, because I can't :D)
Let me explain the physical keyboard actions and the OS interrupts. When you press the "g" key, the browser registers the event, triggering the auto-complete functions. Based on your browser's algorithm and whether you're in regular or private/incognito mode, various suggestions appear in a dropdown beneath the URL bar.
These suggestions are typically prioritized and sorted using factors such as your search history, bookmarks, cookies, and popular internet searches. As you continue typing "google.com," numerous processes run in the background, and the suggestions refine with each keystroke. The browser might even predict "google.com" before you've finished typing.
Browsing Autocomplete Sequences
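To make the ranking idea concrete, here is a minimal Python sketch of prefix-based suggestions. The `Candidate` fields, the scoring rule, and the bookmark bonus are all made up for illustration; real browsers combine many more signals in their own ways.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    url: str
    visits: int        # hypothetical signal: how often the URL appears in history
    bookmarked: bool   # hypothetical signal: whether the URL is bookmarked

BOOKMARK_BONUS = 10    # arbitrary weight chosen for illustration

def suggest(typed: str, candidates: List[Candidate], limit: int = 5) -> List[str]:
    """Return URLs that start with the typed text, highest toy score first."""
    typed = typed.lower()
    matches = [c for c in candidates if c.url.lower().startswith(typed)]
    matches.sort(key=lambda c: c.visits + (BOOKMARK_BONUS if c.bookmarked else 0),
                 reverse=True)
    return [c.url for c in matches[:limit]]

history = [
    Candidate("google.com", 50, False),
    Candidate("github.com", 80, False),
    Candidate("gmail.com", 5, True),
    Candidate("news.ycombinator.com", 100, False),
]

# Each keystroke re-runs the query, so the list narrows as typing continues.
for typed in ("g", "go", "goo"):
    print(typed, "->", suggest(typed, history))
```

Each extra keystroke re-filters the same candidates, which is why the dropdown gets shorter as you type.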
The "enter" key bottoms out
To establish a starting point, let's consider the Enter key on a keyboard when it reaches the bottom of its travel range. At this moment, an electrical circuit dedicated to the Enter key is closed (either mechanically or capacitively), allowing a small current to flow into the keyboard's logic circuitry. This circuitry scans the state of each key switch, filters out electrical noise from the rapid closure of the switch (debouncing), and translates the action into a keycode—in this case, the integer 13. The keyboard controller then encodes this keycode for transmission to the computer. Today, this is almost always done over a Universal Serial Bus (USB) or Bluetooth connection, though older systems used PS/2 or ADB.
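Debouncing is easy to misread, so here is a small sketch of one common approach: only accept a new key state once it has been seen on several consecutive scans. The `threshold` of 3 and the scan sequences are assumptions for the example; actual keyboard firmware picks its own timing and strategy.

```python
from typing import List

class Debouncer:
    """Accept a new key state only after it is seen on `threshold` consecutive scans."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self.stable = False      # debounced state reported to the rest of the firmware
        self.candidate = False   # raw state that is currently being confirmed
        self.count = 0           # consecutive scans that agreed with `candidate`

    def sample(self, raw: bool) -> bool:
        if raw == self.stable:
            # Contact bounced back: abandon any pending change.
            self.candidate, self.count = raw, 0
        elif raw == self.candidate:
            self.count += 1
            if self.count >= self.threshold:
                self.stable, self.count = raw, 0   # change confirmed
        else:
            self.candidate, self.count = raw, 1    # start confirming a new state
        return self.stable

def debounce(scans: List[bool], threshold: int) -> List[bool]:
    d = Debouncer(threshold)
    return [d.sample(s) for s in scans]

# A bouncy press: the contact flickers once before settling closed.
print(debounce([True, False, True, True, True, True], threshold=3))
```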
In the case of a USB keyboard:
In the case of a virtual keyboard (such as on touch screen devices):
For non-USB keyboards, such as those using legacy connections (e.g., PS/2), the keyboard signals an interrupt via its interrupt request line (IRQ). This IRQ is mapped to an interrupt vector (an integer) by the system's interrupt controller. The CPU consults the Interrupt Descriptor Table (IDT), which links each interrupt vector to a corresponding function known as an interrupt handler, supplied by the operating system’s kernel.
When the interrupt is triggered, the CPU uses the interrupt vector to index into the IDT and execute the appropriate interrupt handler. This process causes the CPU to transition into kernel mode, allowing the operating system to manage the keypress event.
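The IDT lookup boils down to "use the vector as an index into a table of handler functions". The toy model below (plain Python, not real hardware) mirrors that, including the switch into kernel mode while the handler runs. `ToyCPU` and the vector value 0x21 are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical vector number; the real value depends on how the OS programs
# the interrupt controller.
KEYBOARD_VECTOR = 0x21

class ToyCPU:
    """Model of vector-indexed interrupt dispatch (not real hardware behaviour)."""

    def __init__(self) -> None:
        self.mode = "user"
        self.idt: Dict[int, Callable[[], str]] = {}   # vector -> interrupt handler

    def install_handler(self, vector: int, handler: Callable[[], str]) -> None:
        self.idt[vector] = handler        # the kernel fills the IDT during boot

    def raise_interrupt(self, vector: int) -> str:
        handler = self.idt[vector]        # index into the IDT with the vector
        interrupted_mode = self.mode
        self.mode = "kernel"              # handlers run in kernel mode
        try:
            return handler()
        finally:
            self.mode = interrupted_mode  # resume the interrupted code

cpu = ToyCPU()
cpu.install_handler(KEYBOARD_VECTOR, lambda: f"keyboard handler ran in {cpu.mode} mode")
print(cpu.raise_interrupt(KEYBOARD_VECTOR))
```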
When the Enter key is pressed, the Human Interface Device (HID) transport passes the key-down event to the KBDHID.sys driver, which converts the HID usage data into a scan code. That scan code is later translated into the virtual-key code VK_RETURN (0x0D), which represents the Enter key. The KBDHID.sys driver then communicates with the KBDCLASS.sys driver (the keyboard class driver), which handles all keyboard input securely. Before proceeding, the signal may pass through any third-party keyboard filters installed on the system; all of this still happens in kernel mode.
Next, Win32K.sys comes into play, determining which window is currently active by invoking the GetForegroundWindow() API. This function retrieves the window handle (hWnd) of the active application, such as the browser’s address bar. At this point, the Windows "message pump" calls SendMessage(hWnd, WM_KEYDOWN, VK_RETURN, lParam). The lParam parameter contains a bitmask with additional information about the keypress, including the repeat count, the hardware scan code, whether the key is an extended key, and the previous key state.
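Strictly speaking, SendMessage does not queue anything: it calls the window procedure (WindowProc) assigned to hWnd directly and waits for it to return. Keyboard input normally takes the queued route instead: it is posted to the message queue of the thread that owns the window, and the application's message loop retrieves each message and dispatches it to that WindowProc.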
The active window in this case is an edit control, and its WindowProc function has a message handler that responds to WM_KEYDOWN events. The handler checks the third parameter (wParam) passed by SendMessage, recognizes that the value is VK_RETURN, and thus determines that the user has pressed the Enter key. This triggers the appropriate response for the application.
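The lParam bitmask packs several fields into one integer. This sketch decodes it with plain bit shifts, following the field layout documented for WM_KEYDOWN. The sample value assumes scan code 0x1C, the usual set-1 scan code for the main Enter key; treat that as an assumption, since scan codes can be keyboard dependent. The VK_RETURN value itself arrives separately, in wParam.

```python
def decode_keydown_lparam(lparam: int) -> dict:
    """Split a WM_KEYDOWN lParam into the bit fields documented for that message."""
    return {
        "repeat_count": lparam & 0xFFFF,          # bits 0-15
        "scan_code": (lparam >> 16) & 0xFF,       # bits 16-23
        "extended": bool((lparam >> 24) & 1),     # bit 24: e.g. numeric-keypad Enter
        "context_code": (lparam >> 29) & 1,       # bit 29: always 0 for WM_KEYDOWN
        "previous_state": (lparam >> 30) & 1,     # bit 30: 1 if the key was already down
        "transition_state": (lparam >> 31) & 1,   # bit 31: always 0 for WM_KEYDOWN
    }

# Main Enter key, first press: repeat count 1, assumed scan code 0x1C.
print(decode_keydown_lparam(0x001C0001))
```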
When a key is pressed on OS X, the interrupt signal triggers an event in the I/O Kit keyboard driver (a kernel extension or "kext"). This driver translates the hardware signal into a key code. The key code is then passed to the WindowServer, which manages the graphical user interface.
The WindowServer dispatches the key press event to the appropriate applications (such as the active or listening ones) by sending it through their Mach port, where it is placed into an event queue. Applications with the proper privileges can access this event queue by calling the mach_ipc_dispatch function.
Most applications handle this process through the NSApplication main event loop, which is responsible for processing user input. When the event is a key press, it is represented as an NSEvent of type NSEventTypeKeyDown. The application then reads this event and responds accordingly, triggering any code related to keypress actions based on the key code received.
When a key is pressed in a graphical environment using the X server, the X server employs the evdev (event device) driver to capture the keypress event. The keycode from the physical keyboard is then re-mapped into a scancode using X server-specific keymaps and rules.
Once the mapping is complete, the X server forwards the resulting scancode to the window manager (such as DWM, Metacity, i3, etc.). The window manager, in turn, sends the character or key event to the currently focused window. The graphical API of the focused window processes this event and displays the corresponding symbol in the appropriate field, using the correct font, based on the key pressed.
This flow ensures that the character is correctly rendered in the active application’s interface, completing the keypress interaction from hardware to graphical output.
When the browser parses the URL (Uniform Resource Locator), it extracts the following components:
Each of these components helps the browser interpret and fetch the desired resource from the web.
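Python's standard library ships a URL splitter, so here is a quick way to see the pieces a browser pulls out of an address. The dictionary keys are my own naming, not browser terminology.

```python
from urllib.parse import urlsplit

def url_components(url: str) -> dict:
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,       # protocol, e.g. "http"
        "host": parts.hostname,       # domain name that will go to DNS (lower-cased)
        "port": parts.port,           # None unless given explicitly
        "path": parts.path or "/",    # resource requested from the server
        "query": parts.query,
    }

print(url_components("http://google.com/"))
```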
When no protocol (e.g., "http") or valid domain name is provided, the browser interprets the text in the address bar as a potential search term. Instead of trying to resolve it as a URL, the browser forwards the text to its default web search engine.
In most cases, the browser appends a special identifier to the search query, indicating that the request originated from the browser's URL bar. This allows the search engine to handle and prioritize these searches accordingly, improving the relevance of the results based on the context.
This process helps the browser determine whether it should attempt to navigate directly to a website or provide search results based on the entered text.
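Here is a rough sketch of that "URL or search?" decision. The regular expression and the search URL are assumptions for illustration; real browsers use much more elaborate checks (known top-level domains, intranet hosts, and so on).

```python
import re
from urllib.parse import quote_plus

# Hypothetical search endpoint; the real one is whatever the default engine uses.
SEARCH_URL = "https://www.google.com/search?q="

# Rough "looks like a host name" test: dot-separated labels, optional port and path.
_HOSTLIKE = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)+(:\d+)?(/\S*)?$", re.IGNORECASE)

def interpret_address_bar(text: str) -> str:
    text = text.strip()
    if "://" in text:
        return text                        # explicit protocol: navigate as typed
    if _HOSTLIKE.match(text):
        return "http://" + text            # bare domain: navigate (HSTS may upgrade it)
    return SEARCH_URL + quote_plus(text)   # anything else: hand it to the search engine

print(interpret_address_bar("google.com"))
print(interpret_address_bar("how does dns work"))
```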
The browser first checks its preloaded HSTS (HTTP Strict Transport Security) list, which contains websites that have explicitly requested to be accessed only via HTTPS.
If the requested website is found on this list, the browser automatically sends the request using HTTPS rather than HTTP. If the website is not in the HSTS list, the initial request is sent via HTTP.
It’s important to note that a website can still implement HSTS without being included in the preloaded list. In such cases, the first HTTP request made by the user will return a response instructing the browser to only send subsequent requests via HTTPS. However, this initial HTTP request could expose the user to a downgrade attack, where an attacker might intercept the request and force it to remain unencrypted. This vulnerability is why modern web browsers include the HSTS list, enhancing security for users by preventing insecure connections from being established in the first place.
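Below is a minimal sketch of the HSTS decision, assuming a tiny made-up preload set (`preloaded.example`) and handling only the `max-age` directive of the Strict-Transport-Security header. Directives such as `includeSubDomains` are parsed but ignored here, and real preload matching also covers subdomains.

```python
import time
from typing import Dict, Optional, Set

# Made-up stand-in for the preload list that ships inside the browser.
PRELOADED: Set[str] = {"preloaded.example"}

class HstsStore:
    def __init__(self) -> None:
        self.dynamic: Dict[str, float] = {}   # host -> time when its HSTS policy expires

    def record_header(self, host: str, value: str, now: Optional[float] = None) -> None:
        """Store a Strict-Transport-Security header value received over HTTPS."""
        now = time.time() if now is None else now
        for directive in value.split(";"):
            name, _, arg = directive.strip().partition("=")
            if name.lower() == "max-age":
                self.dynamic[host] = now + int(arg.strip().strip('"'))

    def scheme_for(self, host: str, now: Optional[float] = None) -> str:
        """Pick the scheme for the next request to `host`."""
        now = time.time() if now is None else now
        if host in PRELOADED:
            return "https"                     # never send plain HTTP to preloaded hosts
        expiry = self.dynamic.get(host)
        if expiry is not None and expiry > now:
            return "https"                     # learned from an earlier response
        return "http"                          # first contact: the downgrade window
```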
The browser begins the DNS lookup process by checking if the domain is already present in its cache. (To view the DNS cache in Google Chrome, navigate to chrome://net-internals/#dns.)
If the domain is not found in the cache, the browser calls the gethostbyname library function (the specific function may vary depending on the operating system) to perform the hostname resolution.
Local Hosts File Check:
DNS Server Request:
ARP Process for DNS Server:
This systematic approach ensures that the browser efficiently resolves domain names to IP addresses, enabling it to establish a connection to the desired website. By checking the cache first, using the local hosts file, and finally querying the DNS server, the browser minimizes the time spent on hostname resolution.
Sequence Diagram
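The lookup order described above (cache, then hosts file, then a real DNS query) can be sketched like this. `Resolver` and `parse_hosts` are hypothetical helpers; the sketch calls Python's `socket.getaddrinfo` (the modern counterpart of `gethostbyname`) for the final step and keeps cache entries forever, whereas a real cache honours DNS TTLs. On a real system, `getaddrinfo` usually consults the hosts file itself as well.

```python
import socket
from typing import Dict, Optional

def parse_hosts(text: str) -> Dict[str, str]:
    """Parse hosts-file lines of the form "IP name [aliases...]" into name -> IP."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)    # first matching line wins
    return table

class Resolver:
    def __init__(self, hosts_text: str) -> None:
        self.cache: Dict[str, str] = {}           # no expiry: a real cache honours DNS TTLs
        self.hosts = parse_hosts(hosts_text)

    def resolve(self, name: str) -> str:
        name = name.lower()
        if name in self.cache:                    # 1. cached answer
            return self.cache[name]
        ip: Optional[str] = self.hosts.get(name)  # 2. local hosts file
        if ip is None:
            # 3. ask the system resolver, which may send a DNS query
            ip = socket.getaddrinfo(name, None)[0][4][0]
        self.cache[name] = ip
        return ip
```

On a Unix-like machine, `Resolver(open("/etc/hosts").read()).resolve("google.com")` would go through all three steps (the last one needs network access).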
In order to send an ARP (Address Resolution Protocol) broadcast, the network stack library needs two key pieces of information: the target IP address that needs to be looked up and the MAC address of the interface that will be used to send out the ARP broadcast.
The ARP cache is first checked for an entry corresponding to the target IP address. If an entry exists, the library function returns the result in the format:
Target IP = MAC.
If the Entry is Not in the ARP Cache:
If there is no entry for the target IP address, the following steps are taken:
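The network library constructs and sends a Layer 2 (data link layer of the OSI model) ARP request with the following format:
ARP Request:
Sender MAC: interface:mac:address:here
Sender IP: interface.ip.goes.here
Target MAC: FF:FF:FF:FF:FF:FF (Broadcast)
Target IP: target.ip.goes.here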
Depending on the hardware setup between the computer and the router, the behavior of the ARP request varies:
If the computer is directly connected to the router, the router will respond with an ARP Reply (see below).
If the computer is connected to a hub, the hub will broadcast the ARP request out of all its other ports. If the router is connected to the same "wire," it will respond with an ARP Reply (see below).
If the computer is connected to a switch, the switch floods the ARP request out of all its other ports, because the request is addressed to the broadcast MAC address. The switch's local CAM/MAC table matters for unicast frames such as the ARP reply: if the table has an entry for the destination MAC address, the switch forwards the frame only to the corresponding port; otherwise, it floods the frame to all other ports.
The ARP reply will have the following format:
Sender MAC: target:mac:address:here
Sender IP: target.ip.goes.here
Target MAC: interface:mac:address:here
Target IP: interface.ip.goes.here
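The ARP request and reply are small fixed-layout packets. This sketch packs and unpacks the Ethernet/IPv4 form with `struct`, following the RFC 826 field order. One detail worth knowing: the broadcast is carried in the Ethernet frame's destination address, and the target-MAC field inside the ARP request itself is commonly left as zeros, which is what this sketch does. The addresses in the usage line are made up.

```python
import socket
import struct

# RFC 826 field order for Ethernet (hardware) + IPv4 (protocol): 28 bytes in total.
ARP_FORMAT = "!HHBBH6s4s6s4s"
ARP_REQUEST, ARP_REPLY = 1, 2

def mac_to_bytes(mac: str) -> bytes:
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return raw

def mac_to_str(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)

def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    return struct.pack(
        ARP_FORMAT,
        1,                 # hardware type: Ethernet
        0x0800,            # protocol type: IPv4
        6, 4,              # hardware and protocol address lengths
        ARP_REQUEST,       # opcode
        mac_to_bytes(sender_mac), socket.inet_aton(sender_ip),
        bytes(6),          # target MAC unknown: left as zeros inside the ARP payload
        socket.inet_aton(target_ip),
    )

def parse_arp(packet: bytes) -> dict:
    *_, op, sha, spa, tha, tpa = struct.unpack(ARP_FORMAT, packet[:28])
    return {
        "op": op,
        "sender_mac": mac_to_str(sha), "sender_ip": socket.inet_ntoa(spa),
        "target_mac": mac_to_str(tha), "target_ip": socket.inet_ntoa(tpa),
    }

# Made-up addresses: our interface asks who has the gateway's IP.
request = build_arp_request("aa:bb:cc:dd:ee:ff", "192.168.1.10", "192.168.1.1")
print(len(request), parse_arp(request))
```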
Now that the network library has obtained the MAC address of either the DNS server or the default gateway, it can resume its DNS process:
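For a feel of what the DNS client actually sends once it can reach the server, here is a sketch that builds a standard query for an A record in the RFC 1035 wire format. The fixed query ID is only for readability; real resolvers randomize it. Sending it would be a UDP `sendto` to port 53 of the DNS server's address, which is left out so the example runs offline.

```python
import struct

def encode_qname(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte (RFC 1035)."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"bad label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_dns_query(name: str, query_id: int = 0x1234) -> bytes:
    # Header: ID, flags (0x0100 = RD bit, "recursion desired"), QDCOUNT=1,
    # ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question: QNAME, QTYPE=1 (A record), QCLASS=1 (IN, Internet).
    return header + encode_qname(name) + struct.pack("!HH", 1, 1)

print(build_dns_query("google.com").hex())
```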
Once the browser receives the IP address of the destination server, it combines this with the port number specified in the URL (where HTTP defaults to port 80 and HTTPS to port 443). The browser then makes a call to the system library function named socket, requesting a TCP socket stream using AF_INET or AF_INET6 and SOCK_STREAM.
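In Python, the same `socket` call looks like this. `open_tcp_socket` is a hypothetical helper; it picks AF_INET6 when the address contains a colon and AF_INET otherwise, and falls back to the scheme's default port.

```python
import socket
from typing import Optional

DEFAULT_PORTS = {"http": 80, "https": 443}

def open_tcp_socket(ip: str, scheme: str, port: Optional[int] = None) -> socket.socket:
    """Create a TCP stream socket and connect it (connect() runs the TCP handshake)."""
    if port is None:
        port = DEFAULT_PORTS[scheme]                  # default port from the URL scheme
    family = socket.AF_INET6 if ":" in ip else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_STREAM)  # same arguments as the C socket() call
    sock.connect((ip, port))
    return sock
```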
At this point, the packet is ready to be transmitted via one of the following methods:
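For most home or small-business Internet connections, the packet passes from your computer, possibly through a local network, and then through a modem (MOdulator/DEModulator). The modem converts digital 1s and 0s into an analog signal suitable for transmission over telephone, cable, or wireless telephony connections. At the other end of the connection, another modem converts the analog signal back into digital data to be processed by the next network node, where the source and destination addresses are analyzed further.
In contrast, larger businesses and some newer residential connections use fiber or direct Ethernet connections, in which case the data stays digital and is passed directly to the next network node for processing.
Eventually, the packet reaches the router managing the local subnet. From there, it continues to the border routers of the autonomous system (AS), traverses other ASes, and finally arrives at the destination server. Each router along the way extracts the destination address from the IP header and routes the packet to the appropriate next hop. The time-to-live (TTL) field in the IP header is decremented by one for each router that handles the packet. The packet is dropped if the TTL field reaches zero or if the current router has no space in its queue (which can happen due to network congestion).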
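This send-and-receive process happens multiple times, following the TCP connection flow:
Copies (client ISN + 1) to its ACK field and adds the ACK flag to indicate that it is acknowledging receipt of the first packet.
The client acknowledges the connection by sending a packet that:
Data transfer proceeds as follows:
Closing the connection:
Opening a Socket: Sequence Diagram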
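The sequence and acknowledgement arithmetic of the handshake is easy to get off by one, so here it is spelled out in Python. The ISN values are arbitrary examples (real stacks choose them unpredictably), and `Segment` is a hypothetical record type, not a real packet format.

```python
from dataclasses import dataclass
from typing import List

SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32-bit and wrap around

@dataclass
class Segment:
    flags: str
    seq: int
    ack: int

def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    syn = Segment("SYN", seq=client_isn, ack=0)
    # Server: its own ISN as seq, (client ISN + 1) in the ACK field.
    syn_ack = Segment("SYN,ACK", seq=server_isn, ack=(client_isn + 1) % SEQ_SPACE)
    # Client: seq advances by one (the SYN consumed a number) and acknowledges server ISN + 1.
    ack = Segment("ACK", seq=(client_isn + 1) % SEQ_SPACE, ack=(server_isn + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]

def ack_for(seq: int, payload_len: int) -> int:
    """During data transfer, the receiver acknowledges seq + number of bytes received."""
    return (seq + payload_len) % SEQ_SPACE

for seg in three_way_handshake(client_isn=1000, server_isn=5000):
    print(seg)
```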
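This handshake process establishes a secure connection between the client and the server, ensuring that data transmitted over the connection is protected from eavesdropping and tampering.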
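With Python's `ssl` module, the TLS handshake happens inside `wrap_socket` once a TCP connection exists. This is a minimal client-side sketch; the TLS 1.2 minimum is my own choice, and `tls_connect` needs network access, so only the context settings can be checked offline.

```python
import socket
import ssl

def make_client_context() -> ssl.SSLContext:
    ctx = ssl.create_default_context()            # system CA store, hostname checks on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return ctx

def tls_connect(host: str, port: int = 443) -> ssl.SSLSocket:
    raw = socket.create_connection((host, port))  # TCP three-way handshake first
    try:
        # wrap_socket runs the TLS handshake; server_hostname enables SNI and
        # certificate hostname verification.
        return make_client_context().wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

With network access, `tls_connect("google.com").version()` would report the negotiated protocol version.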
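Sometimes, due to network congestion or flaky hardware connections, packets may be dropped before they reach their final destination. In that case, the sender has to decide how to react. The algorithm that governs this reaction is called TCP congestion control. The exact implementation varies by sender; the most common algorithms are Cubic on newer operating systems and New Reno on many others.
This congestion control mechanism helps optimize network performance and stability, ensuring that data is transmitted efficiently while minimizing the impact of packet loss.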
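To show how the two families differ, here is a simplified sketch: Reno-style additive increase / multiplicative decrease counted per round trip, and the CUBIC window curve from RFC 8312 (C = 0.4, β = 0.7). It ignores timeouts, fast-recovery details, and CUBIC's TCP-friendly region, so treat it as an illustration of the curve shapes rather than an implementation.

```python
from typing import Tuple

# Simplified Reno-style window updates, in units of segments per round trip.
def reno_next_cwnd(cwnd: int, ssthresh: int) -> int:
    if cwnd < ssthresh:
        return cwnd * 2          # slow start: roughly doubles every RTT
    return cwnd + 1              # congestion avoidance: +1 segment per RTT

def reno_on_loss(cwnd: int) -> Tuple[int, int]:
    ssthresh = max(cwnd // 2, 2) # multiplicative decrease on loss (duplicate ACKs)
    return ssthresh, ssthresh    # (new cwnd, new ssthresh)

# CUBIC window growth after a loss, following the RFC 8312 formula.
C = 0.4             # scaling constant
BETA_CUBIC = 0.7    # window is reduced to 70% of W_max on loss

def cubic_k(w_max: float) -> float:
    """Time (seconds) for the window to grow back to W_max."""
    return (w_max * (1 - BETA_CUBIC) / C) ** (1 / 3)

def cubic_window(t: float, w_max: float) -> float:
    """W(t) = C*(t - K)^3 + W_max, t seconds after the last window reduction."""
    return C * (t - cubic_k(w_max)) ** 3 + w_max

print([round(cubic_window(t, 100), 1) for t in range(0, 9, 2)])
```

The printed values rise quickly, flatten out near W_max (the window size where the last loss happened), and then accelerate again, which is the characteristic CUBIC shape.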
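If the web browser was developed by Google, instead of sending a standard HTTP request to retrieve the page, it may try to negotiate an "upgrade" from HTTP to the SPDY protocol with the server. (SPDY has since been deprecated in favor of HTTP/2.)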
If the client uses plain HTTP and does not support SPDY, it sends a request to the server in the following form:
GET / HTTP/1.1
Host: google.com
Connection: close
[other headers]
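Here, [other headers] refers to a series of colon-separated key-value pairs formatted according to the HTTP specification, each terminated by a line break (CRLF in HTTP/1.1). This assumes the web browser has no bugs that violate the HTTP spec and that it uses HTTP/1.1. With an older version, such as HTTP/1.0 or HTTP/0.9, it might not include the Host header in the request.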
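HTTP/1.1 defines the "close" connection option so that the sender can signal that the connection will be closed after the response is complete. For example: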
Connection: close
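HTTP/1.1 applications that do not support persistent connections MUST include the "close" connection option in every message.
After sending the request and headers, the web browser sends a single blank line to the server, indicating that the request is complete.
The server then responds with a response code denoting the status of the request, structured as follows: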
200 OK
[response headers]
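This is followed by a blank line and then a payload containing the HTML content of www.google.com. The server may then either close the connection or, if the headers sent by the client requested it, keep the connection open to be reused for further requests.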
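Building the request and reading the status line by hand shows how little framing HTTP/1.1 needs. This sketch assumes a response that is not chunked; `build_request` and `parse_response` are hypothetical helpers. In practice, the request bytes would be written to a connected TCP socket with `sendall`, and the bytes read back passed to `parse_response`.

```python
from typing import Dict, Optional, Tuple

CRLF = "\r\n"

def build_request(host: str, path: str = "/", extra: Optional[Dict[str, str]] = None) -> bytes:
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    lines += [f"{name}: {value}" for name, value in (extra or {}).items()]
    # The empty line (CRLF CRLF) tells the server the request headers are finished.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")

def parse_response(raw: bytes) -> Tuple[int, Dict[str, str], bytes]:
    head, _, body = raw.partition(b"\r\n\r\n")      # headers end at the first blank line
    status_line, *header_lines = head.decode("iso-8859-1").split(CRLF)
    status_code = int(status_line.split(" ", 2)[1])  # "HTTP/1.1 200 OK" -> 200
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return status_code, headers, body

print(build_request("google.com"))
```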
If the HTTP headers sent by the web browser contained sufficient information for the web server to determine whether the version of the file cached by the web browser has been unmodified since the last retrieval (for example, if the web browser sent an If-None-Match header carrying the ETag from an earlier response), the server may instead respond with:
304 Not Modified
[response headers]
This response will have no payload, and the web browser will retrieve the HTML from its cache.
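On the server side, the decision is essentially a string comparison. In this sketch, `conditional_get` is a hypothetical function that assumes lower-cased header names and uses strong comparison only (weak `W/` validators are not special-cased).

```python
from typing import Dict, Tuple

def conditional_get(current_etag: str, body: bytes,
                    request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Server side: reply 304 with no payload if the client's cached ETag still matches."""
    header = request_headers.get("if-none-match", "")   # assumes lower-cased header names
    client_tags = [tag.strip() for tag in header.split(",") if tag.strip()]
    if "*" in client_tags or current_etag in client_tags:
        return 304, {"ETag": current_etag}, b""          # browser reuses its cached copy
    return 200, {"ETag": current_etag}, body              # full response with payload

print(conditional_get('"v1"', b"<html>...</html>", {"if-none-match": '"v1"'}))
```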
After parsing the HTML, the web browser (and server) repeats this process for every resource (image, CSS, favicon.ico, etc.) referenced in the HTML page. In these cases, instead of GET / HTTP/1.1, the request will be structured as:
GET /$(URL relative to www.google.com) HTTP/1.1
If the HTML references a resource on a different domain than www.google.com, the web browser returns to the steps involved in resolving the other domain, following all steps up to this point for that domain. The Host header in the request will be set to the appropriate server name instead of google.com.
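Finding those follow-up requests amounts to scanning the parsed HTML for resource URLs and resolving them against the page address. This sketch uses Python's `html.parser` and `urljoin`; it only looks at `img`/`script` `src` and `link` `href` attributes (treating every `link` as a resource), which is a simplification. Grouping by host shows which domains need their own DNS lookup.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urljoin, urlsplit

class ResourceCollector(HTMLParser):
    """Collect absolute URLs of subresources referenced by src/href attributes."""
    WANTED = {"img": "src", "script": "src", "link": "href"}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.WANTED.get(tag)
        for name, value in attrs:
            if name == attr_name and value:
                self.urls.append(urljoin(self.base_url, value))  # resolve relative URLs

def group_by_host(urls: List[str]) -> Dict[str, List[str]]:
    groups: Dict[str, List[str]] = {}
    for url in urls:
        groups.setdefault(urlsplit(url).hostname, []).append(url)
    return groups

collector = ResourceCollector("http://www.google.com/")
collector.feed('<link rel="icon" href="/favicon.ico"><img src="logo.png">'
               '<script src="https://cdn.example.com/a.js"></script>')
print(group_by_host(collector.urls))
```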
The HTTPD (HTTP Daemon) server is responsible for handling requests and responses on the server side. The most common HTTPD servers include Apache and Nginx for Linux, as well as IIS for Windows.
By following these steps, the HTTPD server efficiently processes incoming requests and returns the appropriate responses to the client.
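For a hands-on feel of the server side, here is a tiny HTTP server built on Python's `http.server`. It is nothing like Apache, Nginx, or IIS in scope: it serves one page at `/` and answers 404 elsewhere, and `BaseHTTPRequestHandler` replies 501 on its own to methods that have no `do_*` handler. The page content and the default port 8080 are assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Hello from a toy HTTPD</body></html>"

class ToyHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # self.path holds the request target, e.g. "/" from "GET / HTTP/1.1".
        if self.path != "/":
            self.send_error(404, "Not Found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()          # writes the blank line that ends the headers
        self.wfile.write(PAGE)

    def log_message(self, format: str, *args) -> None:
        pass                        # silence the default per-request log line

def make_server(port: int = 8080) -> HTTPServer:
    return HTTPServer(("127.0.0.1", port), ToyHandler)
```

Calling `make_server().serve_forever()` and then visiting http://127.0.0.1:8080/ in a browser should show the page.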
The primary functionality of a browser is to present the web resources you choose by requesting them from a server and displaying them in the browser window. The resource is typically an HTML document but may also include PDFs, images, or other types of content. The location of the resource is specified by the user using a URI (Uniform Resource Identifier).
The way a browser interprets and displays HTML files is defined by the HTML and CSS specifications. The HTML specification is maintained by the WHATWG (Web Hypertext Application Technology Working Group) as a living standard, while CSS is standardized by the W3C (World Wide Web Consortium).
Browser user interfaces share many common features, including:
The components of a browser can be broken down as follows:
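These components work together to create a seamless browsing experience, enabling users to access and interact with web resources efficiently.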
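The rendering engine starts retrieving the contents of the requested document from the networking layer, usually in 8 kB chunks. The main job of the HTML parser is to turn the HTML markup into a structured representation called a parse tree.
The output tree, the "parse tree", is a hierarchy of DOM (Document Object Model) element and attribute nodes. The DOM is the object representation of the HTML document and the interface through which HTML elements are exposed to external scripts such as JavaScript. The root of the tree is the "Document" object, and before any script manipulates it, the DOM has an almost one-to-one correspondence with the original markup.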
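The parse-tree idea can be sketched on top of Python's `html.parser` tokenizer by keeping a stack of open elements. This is a toy: the real HTML tree-construction algorithm has many insertion modes and error-recovery rules that this hypothetical `TreeBuilder` does not attempt.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

class Node:
    def __init__(self, name: str, attrs: Optional[Dict[str, str]] = None, text: str = "") -> None:
        self.name = name              # tag name, "#text", or "#document"
        self.attrs = attrs or {}
        self.text = text
        self.children: List["Node"] = []

class TreeBuilder(HTMLParser):
    """Toy tree construction on top of Python's tokenizer (no HTML5 error recovery)."""
    VOID = {"area", "br", "hr", "img", "input", "link", "meta"}

    def __init__(self) -> None:
        super().__init__()
        self.document = Node("#document")     # root of the tree
        self.open_elements = [self.document]  # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag, {name: value or "" for name, value in attrs})
        self.open_elements[-1].children.append(node)
        if tag not in self.VOID:              # void elements never have children
            self.open_elements.append(node)

    def handle_endtag(self, tag):
        # Close the most recently opened element with this name (and anything inside it).
        for i in range(len(self.open_elements) - 1, 0, -1):
            if self.open_elements[i].name == tag:
                del self.open_elements[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.open_elements[-1].children.append(Node("#text", text=data))

def dump(node: Node, depth: int = 0) -> None:
    label = node.text.strip() if node.name == "#text" else node.name
    print("  " * depth + label)
    for child in node.children:
        dump(child, depth + 1)

builder = TreeBuilder()
builder.feed("<html><body><p class='greeting'>Hi<br></p></body></html>")
builder.close()
dump(builder.document)
```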
HTML cannot be parsed effectively with conventional top-down or bottom-up parsers, for several reasons:
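Once parsing is finished, the browser proceeds to fetch external resources linked to the page, such as CSS stylesheets, images, and JavaScript files. At this stage, the browser marks the document as interactive and starts parsing scripts that are in "deferred" mode, meaning scripts that should run after the document has been parsed. The document state is then set to "complete", and a "load" event is fired.
Notably, browsers never throw an "Invalid Syntax" error for an HTML page. Instead, they automatically correct any invalid content and carry on processing the document, so users can view the page with minimal disruption.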
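Interpreting CSS involves several key steps:
Through this interpretation, the browser builds a complete picture of how styles apply to the HTML elements in the DOM, so the page can be rendered with its intended visual presentation.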
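To illustrate the parsing side of this, here is a deliberately naive stylesheet parser that turns CSS text into (selectors, declarations) pairs. It assumes a flat stylesheet: no at-rules such as `@media`, no nested braces, and no braces or semicolons inside strings. Real CSS parsing follows a full tokenizer and grammar.

```python
import re
from typing import Dict, List, Tuple

Rule = Tuple[List[str], Dict[str, str]]   # (selectors, declarations)

_RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")

def parse_stylesheet(css: str) -> List[Rule]:
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)   # strip comments
    rules: List[Rule] = []
    for selector_text, body in _RULE_RE.findall(css):
        selectors = [s.strip() for s in selector_text.split(",") if s.strip()]
        declarations: Dict[str, str] = {}
        for decl in body.split(";"):
            prop, sep, value = decl.partition(":")
            if sep and prop.strip():
                declarations[prop.strip().lower()] = value.strip()
        rules.append((selectors, declarations))
    return rules

print(parse_stylesheet("h1, .title { color: red; font-size: 2em }"))
```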
Rendering a web page involves several structured steps:
The image can also be rendered by the GPU.
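Once rendering is complete, the browser executes JavaScript code triggered by various events, such as timers (for example, a Google Doodle animation) or user interaction (for example, typing a query into the search box and receiving suggestions).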
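The event-driven model can be illustrated with a toy loop on a simulated clock: timer callbacks (like animation frames) wait in a heap ordered by due time, while input events sit in a queue. Giving pending input priority over timers is a simplification chosen for this sketch; real browser event loops follow the scheduling rules in the HTML specification. `ToyEventLoop` is hypothetical.

```python
import heapq
from collections import deque
from typing import Callable, Deque, List, Tuple

class ToyEventLoop:
    """Run timer callbacks and queued input events on a simulated clock."""

    def __init__(self) -> None:
        self.now = 0.0
        self.timers: List[Tuple[float, int, Callable[[], None]]] = []
        self.events: Deque[Callable[[], None]] = deque()
        self._seq = 0   # tie-breaker so callbacks themselves are never compared

    def set_timeout(self, delay: float, callback: Callable[[], None]) -> None:
        heapq.heappush(self.timers, (self.now + delay, self._seq, callback))
        self._seq += 1

    def dispatch(self, handler: Callable[[], None]) -> None:
        self.events.append(handler)          # e.g. a keystroke in the search box

    def run(self) -> None:
        while self.events or self.timers:
            if self.events:
                self.events.popleft()()      # pending input runs first
            else:
                due, _, callback = heapq.heappop(self.timers)
                self.now = due               # jump the simulated clock forward
                callback()

log: List[str] = []
loop = ToyEventLoop()
loop.set_timeout(100, lambda: log.append("doodle frame"))
loop.dispatch(lambda: log.append("keystroke"))
loop.run()
print(log)
```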