3.2 Implementation Principle of Caching
3.2.1 What is Web Cache
A Web cache sits between the Web server and the client.
The cache saves a copy of the content returned for a request, such as an HTML page, an image, or a file. When a later request arrives for the same URL, the cache answers it directly from the saved copy instead of forwarding the request to the origin server again.
The HTTP protocol defines a set of message headers that let Web caches work as effectively as possible.
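The behaviour described above can be sketched as a toy cache keyed by URL. This is only an illustration under simplifying assumptions: the class name UrlCache and the fetch_from_origin callback are hypothetical, and the sketch ignores the freshness and validation rules that a real HTTP cache must apply.

```python
from typing import Callable, Dict


class UrlCache:
    """Toy cache keyed by URL; it never expires or revalidates a copy."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # fetch_from_origin stands in for a real request to the origin server.
        self._fetch = fetch_from_origin
        self._copies: Dict[str, bytes] = {}
        self.origin_hits = 0

    def get(self, url: str) -> bytes:
        if url in self._copies:
            # Same URL as before: answer from the saved copy.
            return self._copies[url]
        # First request for this URL: go to the origin and keep a copy.
        self.origin_hits += 1
        body = self._fetch(url)
        self._copies[url] = body
        return body
```

Calling get twice with the same URL reaches the origin only once; the second call is served from the stored copy.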
3.2.2 Advantages of caching
Reduced response latency: Because requests are answered by the cache server (which is closer to the client) rather than by the origin server, each response takes less time, and the Web server appears to respond faster.
Reduced network bandwidth consumption: When cached copies are reused, the client consumes less bandwidth; customers save on bandwidth costs and can keep the growth of their bandwidth requirements under control, which makes it easier to manage.
3.2.3 HTTP extension message headers related to cache
Expires: The expiration time of the response content, expressed in Greenwich Mean Time (GMT).
Cache-Control: Finer-grained control over how the content is cached.
Last-Modified: The time at which the resource in the response was last modified.
ETag: A check value for the resource in the response; it uniquely identifies that version of the resource on the server for a certain period of time.
Date: The server time at which the response was generated.
If-Modified-Since: The last-modified time of the copy held by the client; its value is the Last-Modified value received earlier.
If-None-Match: The check value of the copy held by the client; its value is the ETag value received earlier.
3.2.4 Common process for client cache to take effect
When the server receives the first request, it returns the resource in a 200 OK response together with its Last-Modified and ETag headers. The client stores the resource in its cache and records these two attributes. When the client later needs to send the same request, it includes two headers, If-Modified-Since and If-None-Match, whose values are the Last-Modified and ETag values from the earlier response. From these two headers the server determines whether the client's local copy is still current; if the resource has not changed, the server returns a 304 response with no body, and the client does not need to download the resource again. The accompanying figure shows this common process.
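The server side of this exchange can be sketched as a single decision function. This is a minimal sketch, not a complete implementation: the function name respond and its parameters are hypothetical, ETags are compared as exact strings (lists of tags, the "*" value, and weak comparison are ignored), and dates are assumed to be in the standard HTTP format ending in GMT. When both request headers are present, If-None-Match is checked and If-Modified-Since is ignored, which is the precedence HTTP specifies.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Tuple


def respond(
    request_headers: Dict[str, str],
    etag: str,
    last_modified: str,
    body: bytes,
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET on one resource."""
    validators = {"ETag": etag, "Last-Modified": last_modified}
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    if if_none_match is not None:
        # The ETag check takes precedence over the date check.
        not_modified = if_none_match == etag
    elif if_modified_since is not None:
        # Unchanged if the resource is no newer than the client's copy.
        not_modified = parsedate_to_datetime(last_modified) <= parsedate_to_datetime(
            if_modified_since
        )
    else:
        # No validators sent: this is a first request.
        not_modified = False

    if not_modified:
        return 304, validators, b""  # client reuses its cached copy
    return 200, validators, body
```

A first request with no conditional headers yields 200 with the full body; repeating it with the recorded ETag yields 304 with an empty body.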
3.2.5 Web caching mechanism
Caching in HTTP/1.1 has two goals: in many cases it removes the need to send a request at all, and in many other cases it removes the need to send a complete response. The former reduces the number of network round trips; HTTP uses an "expiration" mechanism for this purpose. The latter reduces the network bandwidth required; HTTP uses a "validation" mechanism for this purpose.
HTTP defines 3 caching mechanisms:
1) Freshness: allows a cached response to be reused without being rechecked at the origin server, and can be controlled by both the server and the client. For example, the Expires response header gives the time after which a document is no longer fresh, and the max-age directive in Cache-Control gives the maximum time for which the response may be cached;
2) Validation: used to check whether a cached response is still valid. For example, if a response has a Last-Modified response header, the cache can send If-Modified-Since to ask whether the resource has changed, and so decide whether the full content needs to be transferred again;
3) Invalidation: usually a side effect of another request passing through the cache. For example, if a URL is associated with a cached response and a POST, PUT, or DELETE request for that URL follows, the cached response is invalidated.
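The freshness mechanism in item 1 can be illustrated with a small check that a cache might run before reusing a stored response. This is a simplified sketch: the helper names max_age_seconds and is_fresh are hypothetical, the age of the response is estimated from the Date header alone, and other Cache-Control directives are ignored. It assumes that max-age, when present, takes priority over Expires.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Extract max-age from a Cache-Control value, if present."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(headers: Dict[str, str], now: datetime) -> bool:
    """Decide whether a stored response may be reused without revalidation."""
    date = parsedate_to_datetime(headers["Date"])
    max_age = max_age_seconds(headers.get("Cache-Control", ""))
    if max_age is not None:
        # max-age is relative to the time the response was generated.
        return now - date < timedelta(seconds=max_age)
    if "Expires" in headers:
        # Expires is an absolute point in time.
        return now < parsedate_to_datetime(headers["Expires"])
    # No explicit freshness information: treat the copy as stale.
    return False
```

When is_fresh returns False, the cache would fall back to validation (item 2) rather than discarding the copy outright.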
3.3 The implementation principle of breakpoint resumption and multi-threaded download
The GET method of the HTTP protocol supports requesting only a certain part of a resource;
206 Partial Content: the status code of a response that carries only part of the resource;
Range: the range of the resource being requested;
Content-Range: the range of the resource carried in the response;
When the connection is disconnected and reconnected, the client only requests the undownloaded part of the resource instead of re-requesting the entire resource to achieve breakpoint resumption.
Example of a ranged resource request:
Eg1: Range: bytes=306302-: requests the part of the resource from byte offset 306302 to the end;
Eg2: Content-Range: bytes 306302-604047/604048: indicates that the response carries bytes 306302 through 604047 of the resource, and that the resource is 604048 bytes in total;
By concurrently requesting different fragments of the same resource, the client downloads the resource in blocks in parallel and thereby speeds up the download. The currently popular FlashGet and Thunder basically use this principle.
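The two header values in these examples can be built and parsed with a few lines of code. The following is a sketch with hypothetical helper names (resume_range_header and parse_content_range); it handles only the single-range "bytes" form shown above.

```python
import re
from typing import Optional, Tuple

# Matches values such as "bytes 306302-604047/604048" or "bytes 0-99/*".
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def resume_range_header(bytes_already_downloaded: int) -> str:
    """Range value asking for everything after the bytes already on disk."""
    return f"bytes={bytes_already_downloaded}-"


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Return (first_byte, last_byte, total_size); total is None if unknown."""
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total
```

Byte positions are zero-based and inclusive, so a client that already holds 306302 bytes asks for offset 306302 onward, and the response in Eg2 carries 604047 - 306302 + 1 = 297746 bytes.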
The principle of multi-threaded downloading:
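The download tool starts multiple threads, each of which issues HTTP requests;
Each HTTP request asks for only one part of the resource file, for example Range: bytes=20000-40000, and the matching response carries Content-Range: bytes 20000-40000/47000;
The parts downloaded by the threads are merged into the complete file.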
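The three steps above can be sketched as follows. This is an illustration rather than a description of how any particular download tool works: split_ranges and download are hypothetical names, and fetch_range is an assumed callback that would, in a real client, send a GET with a Range: bytes=first-last header and return the body of the 206 response.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_ranges(total_size: int, parts: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into inclusive (first, last) byte ranges."""
    if total_size < 1 or parts < 1:
        raise ValueError("total_size and parts must be positive")
    chunk = -(-total_size // parts)  # ceiling division
    return [
        (start, min(start + chunk, total_size) - 1)
        for start in range(0, total_size, chunk)
    ]


def download(
    total_size: int,
    parts: int,
    fetch_range: Callable[[int, int], bytes],
) -> bytes:
    """Fetch every range on its own thread and merge the parts in order."""
    ranges = split_ranges(total_size, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        # pool.map returns results in the order of the input ranges,
        # so the parts can be concatenated directly.
        chunks = list(pool.map(lambda r: fetch_range(*r), ranges))
    return b"".join(chunks)
```

For a 47000-byte resource split three ways, split_ranges produces the ranges 0-15666, 15667-31333, and 31334-46999. The total size would normally be learned beforehand, for example from the figure after the slash in a Content-Range header.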
3.4 HTTPS communication process
3.4.1 What is HTTPS
HTTPS (full name: Hypertext Transfer Protocol over Secure Socket Layer) is an HTTP channel designed for security. Simply put, it is the secure version of HTTP: an SSL layer is added beneath HTTP. The security foundation of HTTPS is SSL, so the details of the encryption are covered under SSL.
See the picture below:
The default port number used by HTTPS is 443.