In-depth understanding of HTTP protocol
1. Basic concepts
1.1 Introduction
HTTP stands for HyperText Transfer Protocol. It was developed jointly by the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF), which eventually published a series of RFCs. RFC 1945 defines HTTP/1.0. The best known of them is RFC 2616, which defines the version in common use today, HTTP/1.1.
HTTP is the protocol used to transfer hypertext from a WWW server to the local browser. It makes the browser more efficient and reduces network traffic. It not only ensures that hypertext documents are transferred correctly and quickly, but also determines which part of a document is transferred and which part is displayed first (for example, text before graphics).
HTTP is an application layer protocol, consisting of requests and responses, and is a standard client-server model. HTTP is a stateless protocol.
1.2 Position in the TCP/IP protocol stack
HTTP is usually carried over TCP, and sometimes over a TLS or SSL layer on top of TCP, in which case it is what we usually call HTTPS. As shown in the figure below:
The default port number for HTTP is 80 and the port number for HTTPS is 443.
1.3 HTTP request response model
In HTTP, the client always initiates a request and the server sends back a response. See the picture below:
This limits how HTTP can be used: the server cannot push a message to the client unless the client has first sent a request.
HTTP is a stateless protocol: a request has no relationship to the previous request from the same client.
1.4 Workflow
An HTTP operation is called a transaction, and its working process can be divided into four steps:
1) First, the client and the server establish a connection. Clicking a hyperlink is enough to start HTTP working.
2) Once the connection is established, the client sends a request to the server. The request consists of the Uniform Resource Identifier (URI) and the protocol version number, followed by MIME-like information that includes request modifiers, client information and possibly a body.
3) After receiving the request, the server sends back a response. It consists of a status line, which includes the protocol version number and a success or error code, followed by MIME-like information that includes server information, entity information and possibly a body.
4) The client receives the information returned by the server and displays it to the user through the browser; the client then disconnects from the server.
If an error occurs at any step of this process, an error message is returned to the client and shown on the screen. For the user, all of this is handled by HTTP itself: the user only has to click and wait for the information to appear.
1.5 Using Wireshark to capture TCP and HTTP packets
Open Wireshark, select "Capture" -> "Options" on the toolbar, the interface selection is shown in Figure 1:
General readers only need to select the top drop-down box, select the appropriate Device, and then click "Capture Filter". The selection here is "HTTP TCP port (80)". After selection, click "Start" in the picture above to start capturing packets.
For example, open http://image.baidu.com/ in the browser, and the packet capture is shown in Figure 3:
http://www.blogjava.net/images/blogjava_net/amigoxie/40799/o_http%e5%8d%8f%e8%ae%ae%e5%ad%a6%e4%b9%a0-%e6%a6%82%e5%bf%b5-3.jpg
In the picture above, you can clearly see the interaction process between the client browser (ip is 192.168.2.33) and the server:
1) No1: The browser (192.168.2.33) sends a connection request to the server (220.181.50.118). This is the first step of the TCP three-way handshake. As the figure shows, the segment is a SYN with seq: x (x is 0);
2) No2: The server (220.181.50.118) answers the connection request and asks for confirmation. The segment is SYN, ACK, with seq: y (y is 0) and ACK: x+1 (is 1). This is the second step of the three-way handshake;
3) No3: The browser (192.168.2.33) confirms the reply from the server (220.181.50.118) and the connection is established. The segment is an ACK, with seq: x+1 (is 1) and ACK: y+1 (is 1). This is the third step of the three-way handshake;
4) No4: The browser (192.168.2.33) sends the HTTP request for the page;
5) No5: The server (220.181.50.118) confirms;
6) No6: The server (220.181.50.118) sends data;
7) No7: The client browser (192.168.2.33) confirms;
8) No14: The client (192.168.2.33) sends the HTTP request for an image;
9) No15: The server (220.181.50.118) sends the status response code 200 OK.
......
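The sequence and acknowledgement numbers of the three handshake segments follow a simple arithmetic pattern, which can be sketched as below. This is only an illustration: `three_way_handshake` is a hypothetical helper, and x and y stand for the relative initial sequence numbers that Wireshark displays as 0.

```python
def three_way_handshake(x, y):
    """Return (sender, flags, seq, ack) for the three handshake segments.

    x and y are the initial sequence numbers chosen by client and server.
    """
    return [
        ("client", "SYN", x, None),        # step 1: client asks to connect
        ("server", "SYN, ACK", y, x + 1),  # step 2: server confirms and asks back
        ("client", "ACK", x + 1, y + 1),   # step 3: client confirms
    ]


for sender, flags, seq, ack in three_way_handshake(0, 0):
    print(sender, flags, "seq =", seq, "ack =", ack)
```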
1.6 Header field
Each header field consists of a field name, a colon (:) and a field value. Field names are case-insensitive. Any amount of whitespace may precede the field value. A header field can be extended over multiple lines by starting each continuation line with at least one space or tab.
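These rules can be sketched in a few lines of Python. The function name `parse_header_fields` and the sample host `www.example.com` are assumptions made for illustration; note also that line folding is the RFC 2616 behaviour described here, and later revisions of the specification deprecate it.

```python
def parse_header_fields(lines):
    """Parse raw header lines into a dict keyed by lower-cased field name."""
    unfolded = []
    for line in lines:
        if line[:1] in (" ", "\t") and unfolded:
            # A line starting with a space or tab continues the previous field.
            unfolded[-1] += " " + line.strip()
        else:
            unfolded.append(line)
    headers = {}
    for line in unfolded:
        name, _, value = line.partition(":")
        # Field names are case-insensitive; whitespace around the value is dropped.
        headers[name.strip().lower()] = value.strip()
    return headers


raw = ["HOST:    www.example.com", "User-Agent: demo-client", "\t(continued value)"]
print(parse_header_fields(raw))
```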
In the packet capture picture, click on No14 to see Figure 4:
http://www.blogjava.net/images/blogjava_net/amigoxie/40799/o_http%e5%8d%8f%e8%ae%ae%e5%ad%a6%e4%b9%a0-%e6%a6%82%e5%bf%b5-4.jpg
The response message is shown in Figure 5:
1.6.1 Host header field
The Host header field specifies the Internet host and port number of the requested resource; it must identify the origin server or gateway given by the requested URL. An HTTP/1.1 request must include the Host header field, otherwise the server responds with status code 400.
The Host line in Figure 5 is:
1.6.2 Referer header field
The Referer header field lets the client specify the address of the resource from which the request URI was obtained. This allows the server to build lists of back-links, which can be used for logging, optimized caching and so on. It also allows obsolete or mistyped links to be traced for maintenance. If the request URI was obtained from a source that has no URI of its own (such as keyboard input), the Referer field must not be sent. If the field value is a partial URI, it should be interpreted relative to the request URI.
In Figure 4, the content of the Referer line is:
1.6.3 User-Agent header field
The User-Agent header field contains information about the user agent that originated the request.
In Figure 4, the content of the User-Agent line is:
http://www.blogjava.net/images/blogjava_net/amigoxie/40799/o_http%e5%8d%8f%e8%ae%ae%e5%ad%a6%e4%b9%a0-%e6%a6%82%e5%bf%b5-8.jpg
1.6.4 Cache-Control header field
Cache-Control specifies the caching mechanism that requests and responses must follow. Setting Cache-Control in a request or response message does not modify the caching behaviour applied while another message is processed. The request directives are no-cache, no-store, max-age, max-stale, min-fresh and only-if-cached; the response directives are public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate and max-age.
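A Cache-Control value is a comma-separated list of directives, some of which carry an argument. The sketch below splits such a value into a dictionary; `parse_cache_control` is a hypothetical helper, and as a simplification it does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a dict of directive -> argument (or None)."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are compared case-insensitively.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("private, max-age=600, must-revalidate"))
```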
The header field in Figure 5 is:
1.6.5 Date header field
The Date header field indicates the time at which the message was sent. The time format is defined by RFC 822, for example Date: Mon, 31 Dec 2001 04:25:57 GMT. The time is expressed in universal time (GMT); to convert it to local time you need to know the user's time zone.
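As a small illustration of the conversion to local time, the sketch below parses a Date value with Python's standard library and shifts it to a fixed UTC offset. The helper name `date_header_to_local` and the choice of UTC+8 in the usage line are assumptions for the example.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime


def date_header_to_local(value, utc_offset_hours):
    """Parse a Date header value and express it at a fixed UTC offset."""
    moment = parsedate_to_datetime(value)  # timezone-aware for a "GMT" date
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return moment.astimezone(local_zone)


print(date_header_to_local("Mon, 31 Dec 2001 04:25:57 GMT", 8).isoformat())
```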
In Figure 5, the header field is as shown below:
1.7 Several important concepts of HTTP
1.7.1 Connection
A transport-layer virtual circuit established between two applications for the purpose of communication.
In HTTP/1.1, a Connection header may appear in request and response headers. It tells the client and the server how to handle persistent (long-lived) connections.
In HTTP/1.1, the client and server support persistent connections by default. If a client uses HTTP/1.1 but does not want a persistent connection, it must set the Connection header to close; if the server does not want to keep the connection open, it must likewise send Connection: close in the response. Whichever message carries Connection: close, the TCP connection currently in use is closed once the current request has been processed, and the client must open a new TCP connection for its next request.
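The rule for Connection: close can be written as a small decision function. This is a sketch under stated assumptions: `connection_persists` is a hypothetical name, headers are passed as dicts with lower-cased field names, and HTTP/1.0 is simply treated as non-persistent.

```python
def connection_persists(version, request_headers, response_headers):
    """Decide whether the TCP connection stays open after this exchange."""
    tokens = set()
    for headers in (request_headers, response_headers):
        value = headers.get("connection", "")
        tokens.update(t.strip().lower() for t in value.split(",") if t.strip())
    if "close" in tokens:
        return False  # either side asked for the connection to be closed
    # HTTP/1.1 is persistent by default; other versions are treated as
    # non-persistent here, which is a simplification.
    return version == "HTTP/1.1"


print(connection_persists("HTTP/1.1", {"connection": "close"}, {}))
```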
1.7.2 Message
The basic unit of HTTP communication, consisting of a structured sequence of octets transmitted over the connection.
1.7.3 Request
A request message from the client to the server includes the method to be applied to the resource, the identifier of the resource and the protocol version in use.
1.7.4 Response
A response message from the server includes the HTTP protocol version, the status of the request (such as "successful" or "not found") and the MIME type of the document.
1.7.5 Resource
A network data object or service identified by a URI.
1.7.6 Entity
A particular representation of a data resource, or a reply from a service resource, that may be enclosed in a request or response message. An entity consists of entity header information and the entity content itself.
1.7.7 Client
An application that establishes connections for the purpose of sending requests.
1.7.8 User agent
The client that initiates a request. These are often browsers, editors or other end-user tools.
1.7.9 Server
An application that accepts connections and sends back responses to requests.
1.7.10 Origin server
A server on which a given resource resides or is to be created.
1.7.11 Proxy
An intermediary program that acts as both a server and a client in order to make requests on behalf of other clients. Requests are serviced internally or passed on, possibly after translation, to other servers. A proxy must interpret and, if necessary, rewrite a request message before forwarding it.
Proxies are often used as client-side portals through firewalls, and as helper applications that handle requests over protocols not implemented by the user agent.
1.7.12 Gateway
A server that acts as an intermediary for other servers. Unlike a proxy, a gateway accepts requests as if it were the origin server for the requested resource; the requesting client is unaware that it is dealing with a gateway.
A gateway is often used as a server-side portal through a firewall, and as a protocol translator for access to resources stored in non-HTTP systems.
1.7.13 Tunnel
An intermediary program that acts as a blind relay between two connections. Once active, a tunnel is not considered a party to the HTTP communication, although it may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connection are closed. Tunnels are used when a portal is necessary and the intermediary cannot, or should not, interpret the relayed traffic.
1.7.14 Cache
A local store of response messages.
2. Detailed explanation of the protocol
2.1 HTTP/1.0 Comparison with HTTP/1.1
RFC 1945 defines the HTTP/1.0 version, and RFC 2616 defines the HTTP/1.1 version.
The author provides the download addresses of the Chinese versions of these two RFCs on the blog.
RFC1945 download address:
http://www.blogjava.Net/Files/amigoxie/RFC1945 (HTTP) Chinese version.rar
RFC2616 download address:
http://www.blogjava.net/Files/amigoxie/RFC2616 (HTTP) Chinese version.rar
2.1.1 Connection establishment aspects
In HTTP/1.0, each request needs a new TCP connection and the connection cannot be reused. In HTTP/1.1, a new request can be sent over the TCP connection established by a previous request, so the connection is reused. The advantage is that the overhead of repeated TCP three-way handshakes is avoided, which improves efficiency.
Note: on a single TCP connection, unless the client pipelines its requests, a new request is sent only after the response to the previous one has been received.
2.1.2 Host field
HTTP/1.1 adds a Host field to the request header; HTTP/1.0 has no such field.
Eg:
GET /pub/WWW/TheProject.html HTTP/1.1
Host: www.w3.org
HTTP/1.0 presumably assumed that the IP address given when the TCP connection was established identified a single host, so that only one host lived at that IP address.
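To see why the Host field matters once several sites share one IP address, consider the following sketch of a server choosing a site from the header. The `SITES` table, its entries and the function name `select_site` are all hypothetical; headers are a dict with lower-cased field names.

```python
# Hypothetical sites that share a single IP address.
SITES = {"www.w3.org": "w3-site", "www.example.org": "example-site"}
DEFAULT_SITE = "w3-site"


def select_site(version, headers):
    """Return (status_code, site) for a request, based on its Host header."""
    host = headers.get("host")
    if host is None:
        if version == "HTTP/1.1":
            return 400, None          # HTTP/1.1 requests must carry Host
        return 200, DEFAULT_SITE      # HTTP/1.0: only one site can be assumed
    name = host.split(":")[0].lower() # drop an optional port number
    if name in SITES:
        return 200, SITES[name]
    return 404, None


print(select_site("HTTP/1.1", {"host": "www.w3.org"}))
```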
2.1.3 Date and time stamp
(receiving direction)
Whether it is HTTP1.0 or HTTP1.1, it must be able to parse the following three date/time stamps:
Sun, 06 Nov 1994 08:49:37 GMT ; RFC 822, updated by RFC 1123
Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
Sun Nov 6 08:49:37 1994 ; ANSI C's asctime() format
(Sending direction)
HTTP1.0 requires that date/time stamps in the third asctime format cannot be generated;
HTTP1.1 requires that only date/time stamps in RFC 1123 (first) format be generated.
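A receiver that accepts all three formats but only generates the RFC 1123 form could look roughly like this. The function names are hypothetical, and the sketch assumes the default English (C) locale, since `strptime` matches day and month names according to the current locale.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

ACCEPTED_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 822, updated by RFC 1123
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850
    "%a %b %d %H:%M:%S %Y",       # ANSI C asctime()
)


def parse_http_date(value):
    """Try each accepted format in turn and return an aware UTC datetime."""
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("unrecognised HTTP date: %r" % value)


def format_http_date(moment):
    """Generate the RFC 1123 form, the only one a sender should produce."""
    return format_datetime(moment, usegmt=True)


print(format_http_date(parse_http_date("Sunday, 06-Nov-94 08:49:37 GMT")))
```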
2.1.4 Status response code
Status code 100 (Continue) lets the client use the request headers to test the server before sending the request body: the client finds out whether the server is willing to accept the body and then decides whether to send it.
The client includes
Expect: 100-continue
in the request header. If the server, on seeing it, returns status code 100 (Continue), the client goes on to send the request body. This is only available in HTTP/1.1.
In addition, HTTP/1.1 added further status response codes such as 101, 203 and 205.
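2.1.5 Request methods
HTTP/1.1 added request methods such as OPTIONS, PUT, DELETE, TRACE and CONNECT.
Method = "OPTIONS" ; Section 9.2
| "GET" ; Section 9.3
| "HEAD" ; Section 9.4
| "POST" ; Section 9.5
| "PUT" ; Section 9.6
| "DELETE" ; Section 9.7
| "TRACE" ; Section 9.8
| "CONNECT" ; Section 9.9
| extension-method
extension-method = token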
2.2 HTTP Request Message
2.2.1 Request message format
The format of the HTTP request message is as follows:
Request line
General information header | Request header | Entity header
CRLF (carriage return and line feed)
Entity content
Where: Request line = Method [space] Request URI [space] Version number [carriage return and line feed]
Request line example:
Eg1:
GET /index.html HTTP/1.1
Eg2:
POST http://192.168.2.217:8080/index.jsp HTTP/1.1
HTTP request message example:
GET /hello.htm HTTP/1.1
Accept: */*
Accept-Language: zh-cn
Accept-Encoding: gzip, deflate
If-Modified-Since: Wed, 17 Oct 2007 02:15:55 GMT
If-None-Match: W/"158-1192587355000"
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Host: 192.168.2.162:8080
Connection: Keep-Alive
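The request format above (request line, header lines, blank line, optional entity content) can be taken apart with a short parser. This is a minimal sketch: `parse_request` is a hypothetical helper, and it ignores details such as folded header lines and repeated fields.

```python
def parse_request(raw):
    """Split a raw HTTP request into (method, uri, version, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")   # blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    method, uri, version = lines[0].split(" ")   # the request line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers, body


raw_request = (
    b"GET /hello.htm HTTP/1.1\r\n"
    b"Accept: */*\r\n"
    b"Host: 192.168.2.162:8080\r\n"
    b"Connection: Keep-Alive\r\n"
    b"\r\n"
)
print(parse_request(raw_request))
```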
2.2.2 Request methods
HTTP request methods include the following:
- GET
- POST
- PUT
- DELETE
- OPTIONS
- TRACE
- CONNECT
2.3 HTTP Response message
2.3.1 Response message format
The format of the HTTP response message is as follows:
Status line
General information header | Response header | Entity header
CRLF
Entity content
Where: Status line = Version number [space] Status code [space] Reason [carriage return and line feed]
Status line example:
Eg1:
HTTP/1.0 200 OK
Eg2:
HTTP/1.1 400 Bad Request
An example of an HTTP response message is as follows:
HTTP/1.1 200 OK
ETag: W/"158-1192590101000"
Last-Modified: Wed, 17 Oct 2007 03:01:41 GMT
Content-Type: text/html
Content-Length: 158
Date: Wed, 17 Oct 2007 03:01:59 GMT
Server: Apache-Coyote/1.1
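To make the response layout concrete, the sketch below assembles a response from a status code, a list of headers and a body. `build_response` is a hypothetical helper; the reason phrase is looked up in Python's `http.HTTPStatus` table, and the sample body is made up for the example.

```python
from http import HTTPStatus


def build_response(version, code, headers, body=b""):
    """Assemble status line, header lines, blank line and entity content."""
    reason = HTTPStatus(code).phrase             # e.g. 200 -> "OK"
    lines = ["%s %d %s" % (version, code, reason)]
    lines += ["%s: %s" % (name, value) for name, value in headers]
    head = "\r\n".join(lines) + "\r\n\r\n"       # blank line ends the headers
    return head.encode("iso-8859-1") + body


page = b"<html></html>"
response = build_response(
    "HTTP/1.1", 200,
    [("Content-Type", "text/html"), ("Content-Length", str(len(page)))],
    page,
)
print(response.decode("iso-8859-1"))
```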
2.3.2 HTTP status response codes
2.3.2.1 1**: Request received, processing continues
100 - Continue: the client should go on with its request
101 - Switching Protocols: the server is switching protocols as the client requested
2.3.2.2 2**: The request was successfully received, understood and accepted
200 - OK: the transaction succeeded
2.3.2.3 3**: Further action must be taken to complete the request
300 - Multiple Choices: the requested resource is available at several locations
301 - Moved Permanently: the requested resource has been permanently moved to a new address
302 - Found: the requested resource temporarily resides at another address
303 - See Other: the client should retrieve the response from another URL using GET
304 - Not Modified: the client performed a conditional GET, but the document has not changed
305 - Use Proxy: the requested resource must be accessed through the proxy specified by the server
306 - Used in an earlier version of the specification; no longer used
307 - Temporary Redirect: the requested resource temporarily resides at another address
2.3.2.4 4**: The request contains incorrect syntax or cannot be fulfilled
400 - Bad Request, such as a syntax error
401 - Unauthorized (the dotted sub-codes below are sub-status codes specific to Microsoft IIS)
HTTP 401.1 - Unauthorized: Logon failed
HTTP 401.2 - Unauthorized: Logon failed due to server configuration
HTTP 401.3 - Unauthorized: Access to the resource denied by ACL
HTTP 401.4 - Unauthorized: Authorization denied by a filter
HTTP 401.5 - Unauthorized: Authorization by ISAPI or CGI application failed
402 - Payment Required: reserved for future use
403 - Forbidden (the dotted sub-codes below are sub-status codes specific to Microsoft IIS)
HTTP 403.1 - Forbidden: Execute access is forbidden
HTTP 403.2 - Forbidden: Read access is forbidden
HTTP 403.3 - Forbidden: Write access is forbidden
HTTP 403.4 - Forbidden: SSL required
HTTP 403.5 - Forbidden: SSL 128 required
HTTP 403.6 - Forbidden: IP address rejected
HTTP 403.7 - Forbidden: Client certificate required
HTTP 403.8 - Forbidden: Site access denied
HTTP 403.9 - Forbidden: Too many users connected
HTTP 403.10 - Forbidden: Invalid configuration
HTTP 403.11 - Forbidden: Password change
HTTP 403.12 - Forbidden: Mapper denied access
HTTP 403.13 - Forbidden: Client certificate revoked
HTTP 403.15 - Forbidden: Client access licenses exceeded
HTTP 403.16 - Forbidden: Client certificate is untrusted or invalid
HTTP 403.17 - Forbidden: Client certificate has expired or is not yet valid
404 - Not Found: no file, query or URL was found
405 - Method Not Allowed: the method given in the Request-Line is not allowed for the resource
406 - Not Acceptable: the resource cannot be delivered in a form that matches the Accept headers sent by the client
407 - Proxy Authentication Required: similar to 401, but the user must first authenticate with the proxy server
408 - Request Timeout: the client did not complete the request within the time the server was prepared to wait
409 - Conflict: the request cannot be completed because of the current state of the resource
410 - Gone: the resource no longer exists on the server and no forwarding address is known
411 - Length Required: the server refuses the request because it has no defined Content-Length
412 - Precondition Failed: one or more preconditions in the request header fields evaluated to false
413 - Request Entity Too Large: the request entity is larger than the server allows
414 - Request-URI Too Long: the request URI is longer than the server allows
415 - Unsupported Media Type: the request entity is in a format the requested resource does not support
416 - Requested Range Not Satisfiable: the request contains a Range header field, none of its range values falls within the current extent of the resource, and the request has no If-Range header field
417 - Expectation Failed: the server cannot meet the expectation given in the Expect request header field or, if it is a proxy server, the next-hop server may be unable to fulfil the request
2.3.2.5 5**: The server failed to fulfil an apparently valid request
500 - Internal Server Error (the dotted sub-codes below are sub-status codes specific to Microsoft IIS)
HTTP 500.100 - Internal Server Error: ASP error
HTTP 500.11 - Server shutting down
HTTP 500.12 - Application restarting
HTTP 500.13 - Server too busy
HTTP 500.14 - Application invalid
HTTP 500.15 - Requests for global.asa not allowed
501 - Not Implemented
502 - Bad Gateway
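The first digit of a status code determines its class, which a program can exploit directly. The sketch below is illustrative only: `describe_status` and the `CLASSES` table are hypothetical names, and the reason phrases come from Python's `http.HTTPStatus`, which does not know the IIS sub-status codes or the unused code 306.

```python
from http import HTTPStatus

CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Return a one-line description: code, reason phrase and class."""
    category = CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "(no standard reason phrase)"
    return "%d %s [%s]" % (code, phrase, category)


for code in (100, 200, 301, 404, 502):
    print(describe_status(code))
```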
2.4 Use telnet for http testing
Under Windows, you can use the command window to perform simple http testing.
Enter cmd to enter the command window, type the following command on the command line and press Enter:
telnet www.baidu.com 80
After connecting, press "Ctrl+]" in the window and then press Enter so that the input and the returned results are echoed.
Then start sending request messages. For example, send the following request for Baidu's home page, using HTTP/1.1:
GET /index.html HTTP/1.1
Note: after copying the above message into the command window, press Enter twice to get the response. The first carriage return and line feed ends the request line; the second produces the blank line that HTTP requires to mark the end of the request headers, at which point the request is complete.
You can see that a 200 OK message is returned, as shown in the figure below:
You can see that with HTTP/1.1 the connection is not closed after the request completes. If you use HTTP/1.0 instead, type in the command window:
GET /index.html HTTP/1.0
At this point you can see that the connection is closed as soon as the request ends.
Readers can also try to bring header information when using GET or POST, such as typing the following information:
GET /index.html HTTP/1.1
connection: close
Host: www.baidu.com
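The same experiment can be scripted instead of typed into telnet. The following is a rough sketch, not a robust client: `build_get_request` and `fetch` are hypothetical helpers, the request always carries a Host header, and `fetch` needs network access, so it is only shown in a comment.

```python
import socket


def build_get_request(path, host, version="HTTP/1.1", close=True):
    """Build the bytes that would otherwise be typed into telnet."""
    lines = ["GET %s %s" % (path, version), "Host: %s" % host]
    if close:
        lines.append("Connection: close")
    # The final blank line (a second CRLF) marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch(host, path="/index.html", port=80, timeout=10):
    """Send the request over a plain TCP socket and return the raw response."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_get_request(path, host))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


# Example (requires network access):
# print(fetch("www.baidu.com").split(b"\r\n")[0])
print(build_get_request("/index.html", "www.baidu.com").decode("ascii"))
```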
2.5 Commonly used request methods
The commonly used request methods are GET and POST.
- GET method: retrieves, in the form of an entity, whatever information is identified by the request URI. If the request URI refers to a data-producing process, the response entity returns the data produced by that process, not a description of the process.
- POST method: asks the destination server to accept the entity enclosed in the request as a new subordinate of the resource identified by the request URI in the request line. POST is designed to let a uniform method cover the following functions:
1: Annotation of existing resources;
2: Posting a message to a bulletin board, newsgroup, mailing list or similar discussion group;
3: Submitting a block of data;
4: Extending a database through an append operation.
As the description above shows, GET asks the server for data, while POST submits data to the server. The submitted data is placed in the entity that follows the headers.
The GET and POST methods differ in the following ways:
(1) On the client side, GET submits data through the URL, where the data is visible; POST places the data in the body of the HTTP request.
(2) HTTP itself does not limit the length of a URL, but browsers and servers do, so the amount of data that can be submitted with GET is limited (a figure of 1024 bytes is often quoted); POST is not subject to this limit.
(3) Security issues. As mentioned in (1), with GET the parameters are displayed in the address bar, while with POST they are not. GET is therefore acceptable for non-sensitive data, while data that contains sensitive information is better submitted with POST.
(4) Safe and idempotent. Safe means that the operation is used to obtain information rather than to modify it. Idempotent means that multiple requests to the same URL should return the same result. The full definition is not as strict as it sounds: in other words, GET requests should generally have no side effects. Fundamentally, the goal is that when a user opens a link, she can be confident that, from her point of view, the resource has not been changed. For example, the front page of a news site is constantly updated. Although a second request returns a different batch of news, the operation is still considered safe and idempotent, because it always returns the current news. POST requests are a different matter. POST represents a request that may change a resource on the server. Taking the news site as an example again, readers' comments on an article should be submitted with POST requests, because the site is different after a comment has been submitted (for example, a comment appears below the article).
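The difference in where the data travels can be made visible by building both kinds of request from the same form fields. This is a sketch with made-up names: `build_form_request`, the path `/search`, the host `www.example.com` and the field values are all assumptions for illustration.

```python
from urllib.parse import urlencode


def build_form_request(method, path, host, fields):
    """Build a GET or POST request that carries the same form fields."""
    data = urlencode(fields)
    if method == "GET":
        target = path + "?" + data   # data is visible in the URL
        headers = ["Host: " + host]
        body = ""
    else:
        target = path                # data travels in the request body
        headers = [
            "Host: " + host,
            "Content-Type: application/x-www-form-urlencoded",
            "Content-Length: %d" % len(data),
        ]
        body = data
    head = "\r\n".join(["%s %s HTTP/1.1" % (method, target)] + headers)
    return head + "\r\n\r\n" + body


fields = {"user": "amigo", "q": "http test"}
print(build_form_request("GET", "/search", "www.example.com", fields))
print(build_form_request("POST", "/search", "www.example.com", fields))
```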
2.6 Request Headers
The most common HTTP request headers are as follows:
- Accept: the MIME types the browser accepts;
- Accept-Charset: the character sets the browser accepts;
- Accept-Encoding: the content encodings the browser can decode, such as gzip. Servlets can return gzip-encoded HTML pages to browsers that support gzip. In many cases this reduces download time by a factor of 5 to 10;
- Accept-Language: the language the browser prefers, used when the server can provide more than one language version;
- Authorization: authorization credentials, usually sent in reply to a WWW-Authenticate header from the server;
- Connection: indicates whether a persistent connection is required. If the Servlet sees the value "Keep-Alive" here, or sees that the request uses HTTP 1.1 (which makes connections persistent by default), it can take advantage of persistent connections to significantly reduce download time when the page contains several elements (e.g. applets, pictures). To do so, the Servlet must send a Content-Length header in the response. The simplest way is to write the content to a ByteArrayOutputStream first and work out its size before actually sending the content;
- Content-Length: the length of the request message body;
- Cookie: one of the most important request headers;
- From: the e-mail address of the request sender; used by some special web client programs but not by browsers;
- Host: the host and port from the original URL;
- If-Modified-Since: return the requested content only if it has been modified after the specified date, otherwise return a 304 "Not Modified" response;
- Pragma: the value "no-cache" indicates that the server must return a fresh document, even if it is a proxy server that already has a local copy of the page;
- Referer: contains the URL of the page from which the user reached the currently requested page;
- User-Agent: the browser type; useful when the content returned by the Servlet depends on the browser type;
- UA-Pixels, UA-Color, UA-OS, UA-CPU: non-standard request headers sent by some versions of IE, indicating screen size, color depth, operating system and CPU type.
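As an example of how a server might act on one of these headers, the sketch below decides between 200 and 304 for a request carrying If-Modified-Since. `conditional_status` is a hypothetical helper; headers are a dict with lower-cased field names, `last_modified` is assumed to be a timezone-aware UTC datetime, and an unparseable date is simply ignored.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, last_modified):
    """Return 304 if the resource is unchanged since If-Modified-Since, else 200."""
    value = request_headers.get("if-modified-since")
    if value is None:
        return 200
    try:
        since = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the condition
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


modified = datetime(2007, 10, 17, 2, 15, 55, tzinfo=timezone.utc)
headers = {"if-modified-since": "Wed, 17 Oct 2007 02:15:55 GMT"}
print(conditional_status(headers, modified))
```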
2.7 Response headers
The most common HTTP response headers are as follows:
- Allow: the request methods the server supports (such as GET, POST, etc.);
- Content-Encoding: the encoding method of the document. The content type given by the Content-Type header is obtained only after decoding. Compressing documents with gzip can significantly reduce the download time of HTML documents. Java's GZIPOutputStream makes gzip compression easy, but only Netscape on Unix and IE 4 and IE 5 on Windows support it. The Servlet should therefore check the Accept-Encoding header (i.e. request.getHeader("Accept-Encoding")) to see whether the browser supports gzip, returning a gzip-compressed HTML page to browsers that support it and a normal HTML page to the others;
- Content-Length: the length of the content. It is needed only when the browser uses a persistent HTTP connection. To take advantage of persistent connections, write the output document to a ByteArrayOutputStream, check its size when finished, put that value into the Content-Length header, and finally send the content with byteArrayStream.writeTo(response.getOutputStream());
- Content-Type: the MIME type of the document that follows. Servlets default to text/plain, but usually the type must be set explicitly to text/html. Because Content-Type is set so often, HttpServletResponse provides a dedicated method, setContentType. The mapping between file extensions and MIME types can be configured in the web application's deployment descriptor;
- Expires: the time at which the document should be considered expired and no longer cached;
- Last-Modified: the time the document was last modified. The client can supply a date in the If-Modified-Since request header; the request is then treated as a conditional GET, and the document is returned only if it was modified after the specified time, otherwise a 304 (Not Modified) status is returned. Last-Modified can also be set with the setDateHeader method;
- Location: where the client should go to retrieve the document. It is usually not set directly but through the sendRedirect method of HttpServletResponse, which also sets the status code to 302;
- Refresh: the number of seconds after which the browser should refresh the document. Besides refreshing the current document, you can have the browser load a specified page with setHeader("Refresh", "5; URL=http://host/path"). Note that this function is usually implemented with a META HTTP-EQUIV="Refresh" tag in the HEAD section of the HTML page, because automatic refresh or redirection matters to HTML authors who cannot use CGI or Servlets. For a Servlet, however, setting the Refresh header directly is more convenient. Note that Refresh means "refresh this page or load the specified page after N seconds", not "every N seconds". Continuous refreshing therefore requires sending a Refresh header each time, and sending a 204 status code stops the browser from refreshing further, whether the Refresh header or the META tag is used. Note that the Refresh header is not part of the official HTTP 1.1 specification but an extension, although both Netscape and IE support it.
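The gzip negotiation and the "buffer first, then set Content-Length" approach described for Servlets can be sketched in Python as follows. This is an analogue, not Servlet code: `encode_body` is a hypothetical helper, request headers are a dict with lower-cased names, and quality values such as `gzip;q=0` are ignored for brevity.

```python
import gzip


def encode_body(request_headers, html):
    """Return (response_headers, body), compressing only if the client accepts gzip."""
    body = html.encode("utf-8")
    response_headers = {"Content-Type": "text/html; charset=utf-8"}
    accepted = [
        token.split(";")[0].strip().lower()
        for token in request_headers.get("accept-encoding", "").split(",")
    ]
    if "gzip" in accepted:
        body = gzip.compress(body)
        response_headers["Content-Encoding"] = "gzip"
    # The full body is in memory, so its exact length is known before sending.
    response_headers["Content-Length"] = str(len(body))
    return response_headers, body


headers, body = encode_body({"accept-encoding": "gzip, deflate"}, "<html>hello</html>" * 50)
print(headers)
```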
2.8 Entity header
The entity header carries meta-information about the entity content and describes its attributes, including the type, length, compression method, last modification time and validity of the entity.
- Allow: the request methods the resource supports, for example: GET, POST;
- Content-Encoding: the encoding method of the document, for example: gzip, see "2.7 Response headers";
- Content-Language: the language of the content, for example: zh-cn;
- Content-Length: the length of the content, for example: 80, see "2.7 Response headers";
- Content-Location: where the client should go to retrieve the document, for example: http://www.dfdf.org/dfdf.html, see the Location entry in "2.7 Response headers";
- Content-MD5: an MD5 digest of the entity, used as a checksum. Both sender and receiver calculate the MD5 digest, and the receiver compares its calculated value with the value passed in this header. Eg1: Content-MD5:
- Content-Range: sent with a partial entity; indicates the first and last byte offsets of the part being sent, as well as the total length of the entity. Eg1: Content-Range: bytes 1001-2000/5000, eg2: Content-Range: bytes 2543-4532/7898
- Content-Type: the MIME type (main type/subtype) of the entity being sent or received. Eg: text/html; charset=GB2312;
- Expires: the time after which the entity is considered stale; a value of 0 means it must not be cached;
- Last-Modified: the time at which the web server believes the object was last modified, such as the last modification time of a file or the last generation time of a dynamic page. For example: Last-Modified: Tue, 06 May 2008 02:42:43 GMT.
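The Content-MD5 check can be sketched as follows. Per RFC 2616 the header carries the base64 encoding of the 128-bit MD5 digest of the entity body; the helper names `content_md5` and `verify_content_md5` and the sample body are assumptions for the example.

```python
import base64
import hashlib


def content_md5(body):
    """Return the Content-MD5 value (base64 of the MD5 digest) for a body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def verify_content_md5(body, header_value):
    """Receiver side: recompute the digest and compare it with the header."""
    return content_md5(body) == header_value.strip()


body = b"<html>hello</html>"
header = content_md5(body)
print("Content-MD5:", header)
print(verify_content_md5(body, header))
```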
2.9 Extension headers
HTTP messages may also use header fields that are not defined in the official HTTP 1.1 specification. These header fields are collectively called custom HTTP headers or extension headers, and they are usually treated as a kind of entity header.
Popular browsers today support several commonly used extension header fields, such as Cookie, Set-Cookie, Refresh and Content-Disposition.
- Refresh: 1; url=http://www.dfdf.org //Jump to the specified location after 1 second;
- Content-Disposition: tells the browser how to handle the content, for example as an attachment to be downloaded (see eg2 below);
- Content-Type: the web server tells the browser the type of the object it is returning.
eg1: Content-Type: application/xml;
eg2: Content-Type: application/octet-stream;
Content-Disposition: attachment; filename=aaa.zip.
3. In-depth understanding
3.1 Cookie and Session
Cookie and Session are both mechanisms for saving client state information, and both are attempts to work around the statelessness of HTTP.
A Session can be implemented using Cookies or using the URL rewriting mechanism. A Session implemented using Cookies can be considered a more advanced application of Cookies.
3.1.1 Comparison between the two
Cookies and Sessions have the following obvious differences:
1) Cookies save the state on the client side, and Sessions save the state on the server side;
2) A Cookie is a small piece of text that the server stores on the client's local machine and that is sent back to the same server with every request. Cookies were first standardized in RFC 2109 and subsequently enhanced in RFC 2965. The web server sends cookies to the client in HTTP headers. On the client side, the browser parses these cookies and saves them to a local file, and it automatically attaches them to every request to the same server. Session is not defined in the HTTP protocol;
3) A Session is kept per user, and the values of its variables are stored on the server. A session ID is used to tell which user a set of session variables belongs to, and the user's browser returns this value to the server on each access. When the client disables cookies, the value can instead be returned to the server as a GET parameter;
4) In terms of security: when you visit a site that uses Sessions, a cookie is also created on your own machine. The server-side Session mechanism is the recommended, more secure choice, because it does not arbitrarily read the information stored by the client.
3.1.2 Session Mechanism
The Session mechanism is a server-side mechanism. The server uses a structure similar to a hash table (or an actual hash table) to save information.
When the program needs to create a session for a client's request, the server first checks whether the request already contains a session identifier, called the session id. If it does, a session has already been created for this client, so the server retrieves that session by its session id and uses it (if it cannot be retrieved, the server may create a new one). If the request does not contain a session id, the server creates a session for the client and generates a session id associated with it. The value of the session id should be a string that is never repeated and whose pattern is hard to discover and forge. The session id is returned to the client in the current response for the client to store.
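The lookup-or-create logic described above can be sketched as follows. This is only an illustration under stated assumptions: SessionStore and get_or_create are hypothetical names, the table is an ordinary in-memory dict, and the id is generated with Python's secrets module so that it is hard to guess.

```python
import secrets


class SessionStore:
    """In-memory session table keyed by session id (a plain dict, i.e. a hash table)."""

    def __init__(self):
        self._sessions = {}

    def get_or_create(self, session_id=None):
        """Return (session_id, session_data, created) for one request."""
        if session_id is not None and session_id in self._sessions:
            return session_id, self._sessions[session_id], False
        # Missing or unknown id: create a fresh session with an unguessable id.
        new_id = secrets.token_hex(16)
        self._sessions[new_id] = {}
        return new_id, self._sessions[new_id], True


store = SessionStore()
sid, data, created = store.get_or_create()          # first request: no id yet
data["user"] = "alice"
sid2, data2, created2 = store.get_or_create(sid)    # later request carrying the id
```

The second call finds the existing session, so data2 still holds the value stored during the first request. A request carrying an id the server does not know simply gets a new session.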
3.1.6 How to implement Session
3.1.6.1 Use Cookie to implement
The server assigns a unique JSESSIONID to each Session and sends it to the client through Cookie.
When the client initiates a new request, it will carry this JSESSIONID in the Cookie header. In this way, the server can find the Session corresponding to the client.
The process is shown in the figure below:
3.1.6.2 Use URL rewriting to implement
URL rewriting means that the server includes the JSESSIONID parameter in every link on the pages it sends to the browser, so that whichever link the client clicks, the JSESSIONID is sent back to the server.
If you type the URL of a server resource directly into the browser to request it, no Session will be matched.
Tomcat's Session implementation uses both the Cookie and the URL rewriting mechanisms at the start. If it finds that the client supports Cookies, it continues with Cookies and stops using URL rewriting. If it finds that Cookies are disabled, it keeps using URL rewriting. When handling Sessions in JSP development, remember to apply response.encodeURL() to the links in the page.
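The two ways of carrying the session id can be sketched with Python's standard library. The helper names below are hypothetical, and encode_url is only a rough analogue of what response.encodeURL() does in a servlet container. It assumes the id is appended to the path as ";jsessionid=...", the form typically used by containers such as Tomcat.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit, urlunsplit


def set_cookie_header(session_id: str) -> str:
    """Value of the Set-Cookie header that hands the id to the client."""
    cookie = SimpleCookie()
    cookie["JSESSIONID"] = session_id
    cookie["JSESSIONID"]["path"] = "/"
    return cookie["JSESSIONID"].OutputString()


def session_id_from_cookie(cookie_header: str):
    """Extract JSESSIONID from a request's Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("JSESSIONID")
    return morsel.value if morsel is not None else None


def encode_url(url: str, session_id: str, client_has_cookie: bool) -> str:
    """Append ;jsessionid=... to the path when the cookie cannot be relied on."""
    if client_has_cookie:
        return url
    parts = urlsplit(url)
    path = f"{parts.path};jsessionid={session_id}"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


sent = set_cookie_header("abc123")
received = session_id_from_cookie("JSESSIONID=abc123; theme=dark")
rewritten = encode_url("/cart/view.jsp?item=3", "abc123", client_has_cookie=False)
```

When the client accepts cookies, encode_url returns the link unchanged. Otherwise every link on the page carries the id, which is why a URL typed by hand does not match any session.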
3.1.3 Several situations of Session failure in J2EE projects
1) Session timeout: the Session expires after a specified period of inactivity, such as 30 minutes. If there is no operation within 30 minutes the Session becomes invalid, for example with the following settings in web.xml:
()Explicitly remove the Session.
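The two failure cases above, an idle timeout and explicit removal, can be modelled with a small class. This is a language-neutral sketch rather than the J2EE implementation: the class and method names are hypothetical, and the 30-minute default simply mirrors the example in the text.

```python
import time


class Session:
    """Session that expires after max_inactive seconds without access."""

    def __init__(self, max_inactive=30 * 60, now=None):
        self.max_inactive = max_inactive
        self.last_access = time.time() if now is None else now
        self.valid = True

    def touch(self, now=None):
        """Record an access, which restarts the idle timer."""
        self.last_access = time.time() if now is None else now

    def invalidate(self):
        """Explicitly remove the session, regardless of the timer."""
        self.valid = False

    def is_alive(self, now=None):
        """True while the session is neither timed out nor invalidated."""
        now = time.time() if now is None else now
        return self.valid and (now - self.last_access) < self.max_inactive


session = Session(now=0)                          # created at t = 0 seconds
alive_before_timeout = session.is_alive(now=1799)
alive_after_timeout = session.is_alive(now=1800)
```

The optional now argument exists only so that the behaviour can be checked without waiting for real time to pass.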
3.2 Implementation principle of cache
3.2.1 What is Web cache
WEB cache is located between the Web server and the client. The cache saves a copy of the output content for each request, such as an html page, picture, or file. When the next request comes for the same URL, the cache responds directly with the copy instead of sending the request to the origin server again. The HTTP protocol defines relevant message headers to make WEB caching work as well as possible.
3.2.2 Advantages of caching
q Reduced response latency: Because requests are answered by the cache server (closer to the client) rather than the origin server, the process takes less time and the web server appears to respond faster.
q Reduced network bandwidth consumption: When copies are reused, the client's bandwidth consumption falls; customers save bandwidth costs, keep the growth of bandwidth requirements under control, and find it easier to manage.
3.2.3 HTTP extension headers related to cache
q Expires: The expiration time of the response content, in Greenwich Mean Time (GMT)
q Cache-Control: Controls caching of the content in more detail
q Last-Modified: The time when the resource in the response was last modified
q ETag: A check value for the resource in the response, which uniquely identifies it on the server for a certain period of time.
q Date: Server time
q If-Modified-Since: The last-modified time of the copy the client holds, taken from Last-Modified.
q If-None-Match: The check value of the copy the client holds, taken from ETag.
3.2.4 Common process for client cache to take effect
When the server first receives the request, it sends back the resource's Last-Modified and ETag headers in the 200 OK response. The client saves the resource in its cache and records these two attributes. When the client needs to send the same request again, it carries two headers, If-Modified-Since and If-None-Match, whose values are those of the Last-Modified and ETag headers in the earlier response. From these two headers the server determines that the resource has not changed, so it returns a 304 response and the client does not need to download the resource again. The common process is shown in the figure below:
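The server side of this exchange can be sketched as a single function. It is a simplified illustration: conditional_get is a hypothetical name, headers are plain dicts, and the validators are compared as exact strings, whereas a real server would parse and compare dates and handle lists of entity tags.

```python
def conditional_get(request_headers, etag, last_modified, body):
    """Return (status, headers, body) for a GET on one resource.

    etag and last_modified are the resource's current validators,
    already formatted as header values.
    """
    validators = {"ETag": etag, "Last-Modified": last_modified}
    checks = []
    if "If-None-Match" in request_headers:
        checks.append(request_headers["If-None-Match"] == etag)
    if "If-Modified-Since" in request_headers:
        checks.append(request_headers["If-Modified-Since"] == last_modified)

    # Unchanged only if the client sent at least one validator and all match.
    if checks and all(checks):
        return 304, validators, b""
    return 200, validators, body


ETAG = '"v1"'
MODIFIED = "Tue, 06 May 2008 02:42:43 GMT"

first = conditional_get({}, ETAG, MODIFIED, b"<html>page</html>")
revalidation = conditional_get(
    {"If-None-Match": first[1]["ETag"], "If-Modified-Since": first[1]["Last-Modified"]},
    ETAG, MODIFIED, b"<html>page</html>",
)
```

The first call returns 200 with the body and both validators. The second call echoes them back and receives 304 with an empty body, so only headers cross the network.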
3.2.5 Web caching mechanism
The goal of caching in HTTP/1.1 is to eliminate the need to send requests in many cases, and to eliminate the need to send complete responses in many other cases. The former reduces the number of network round trips; HTTP uses an "expiration" mechanism for this purpose. The latter reduces network bandwidth requirements; HTTP uses a "validation" mechanism for this purpose.
HTTP defines 3 caching mechanisms:
1) Freshness: allows a response to be used without being rechecked at the origin server, and can be controlled by both the server and the client. For example, the Expires response header gives the time after which a document is no longer fresh, and the max-age directive in Cache-Control gives the maximum cache time;
2) Validation: used to check whether a cached response is still valid. For example, if a response has a Last-Modified response header, the cache can send If-Modified-Since to the origin server to find out whether the resource has changed, and thus whether the full response needs to be transferred again;
3) Invalidation: usually a side effect of another request passing through the cache. For example, if a URL has a cached response and a POST, PUT, or DELETE request for that URL then passes through, the cached response is invalidated.
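The freshness check in item 1) can be sketched as below. It is a minimal illustration and not a complete cache: is_fresh is a hypothetical name, only the max-age directive and the Expires header are considered, and a response without either is treated as needing revalidation.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh(headers, stored_at, now):
    """Decide whether a cached response may be reused without revalidation.

    headers   -- response headers as stored in the cache
    stored_at -- timezone-aware datetime when the response was cached
    now       -- timezone-aware current time
    """
    cache_control = headers.get("Cache-Control", "")
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            # max-age takes priority over Expires when both are present.
            return now - stored_at < timedelta(seconds=int(value))
    expires = headers.get("Expires")
    if expires is None:
        return False  # no explicit lifetime: this sketch always revalidates
    try:
        return now < parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        return False  # an invalid date such as "0" counts as already expired


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
stored = now - timedelta(seconds=60)
fresh = is_fresh({"Cache-Control": "max-age=120"}, stored, now)
stale = is_fresh({"Cache-Control": "max-age=30"}, stored, now)
```

A response cached 60 seconds ago is still fresh under max-age=120 and stale under max-age=30. Once it is stale, the cache falls back to the validation mechanism in item 2).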
3.3 The implementation principle of breakpoint resumption and multi-threaded download
q The GET method of the HTTP protocol supports requesting only a certain part of a resource;
q 206 Partial Content: the response status for partial content;
q Range: the request header giving the range of the resource being requested;
q Content-Range: the response header giving the range of the resource carried in the response;
q When the connection is dropped and re-established, the client requests only the part of the resource that has not yet been downloaded instead of the entire resource, which is how breakpoint resumption is achieved.
Example of a segmented resource request:
Eg1: Range: bytes=306302-: requests the part of the resource from byte 306302 to the end;
Eg2: Content-Range: bytes 306302-604047/604048: indicates that the response carries bytes 306302-604047 of the resource, and that the resource has 604048 bytes in total;
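The two header values in these examples can be built and parsed as follows. This is a small sketch with hypothetical helper names; it handles only the single-range "bytes first-last/total" form shown above.

```python
import re


def resume_range_header(bytes_already_saved: int) -> str:
    """Range header value asking for everything after the saved prefix."""
    return f"bytes={bytes_already_saved}-"


def parse_content_range(value: str):
    """Parse 'bytes first-last/total' into (first, last, total)."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = (int(group) for group in match.groups())
    return first, last, total


request_value = resume_range_header(306302)
first, last, total = parse_content_range("bytes 306302-604047/604048")
bytes_in_response = last - first + 1   # both offsets are inclusive
```

Because both offsets are inclusive and zero-based, the last byte of a 604048-byte resource is number 604047.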
By requesting different fragments of the same resource concurrently, the client downloads the resource in blocks in parallel and so downloads it faster. The currently popular FlashGet and Thunder basically use this principle.
The principle of multi-threaded downloading:
q The download tool opens multiple threads that issue HTTP requests;
q Each HTTP request asks for only a part of the resource file, for example Range: bytes=20000-40000, and the response carries Content-Range: bytes 20000-40000/47000;
q The files downloaded by each thread are merged.
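The three steps above can be sketched with a thread pool. To keep the example self-contained, the network is replaced by a stand-in function that slices an in-memory byte string; split_ranges, download, and fake_fetch are hypothetical names, and a real client would send a Range header and expect a 206 response for each piece.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total: int, parts: int):
    """Split [0, total) into at most `parts` inclusive (first, last) ranges.

    Assumes total > 0 and parts > 0.
    """
    size = -(-total // parts)  # ceiling division
    return [(start, min(start + size, total) - 1) for start in range(0, total, size)]


def download(fetch_range, total: int, parts: int = 4) -> bytes:
    """Fetch every range concurrently and merge the pieces in order."""
    ranges = split_ranges(total, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        chunks = pool.map(lambda r: fetch_range(*r), ranges)  # map preserves order
        return b"".join(chunks)


# Stand-in for the network: serves slices of an in-memory "file".
resource = bytes(range(256)) * 10


def fake_fetch(first: int, last: int) -> bytes:
    # A real client would send "Range: bytes=first-last" here.
    return resource[first:last + 1]


result = download(fake_fetch, len(resource), parts=3)
```

Since pool.map yields results in the order of the input ranges, the pieces can be joined directly even when the threads finish out of order.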
3.4 https communication process
3.4.1 What is https
HTTPS (full name: Hypertext Transfer Protocol over Secure Socket Layer) is an HTTP channel designed for security. Simply put, it is the secure version of HTTP: an SSL layer is added underneath HTTP. The security foundation of HTTPS is SSL, so see SSL for the details of the encryption.
See the picture below:
The port number used by https is 443.
3.4.2 The implementation principle of https
There are two basic types of encryption and decryption algorithms:
1) Symmetric encryption: there is only one key, the same key is used for encryption and decryption, and encryption and decryption are fast. Typical symmetric encryption algorithms include DES, AES, etc.;
2) Asymmetric encryption: keys come in pairs (and the private key cannot feasibly be derived from the public key), and different keys are used for encryption and decryption (data encrypted with the public key must be decrypted with the private key, and data encrypted with the private key must be decrypted with the public key). It is slower than symmetric encryption. Typical asymmetric algorithms include RSA, DSA, etc.
Let’s take a look at the communication process of https:
Advantages of https communication:
1) The key generated by the client is known only to the client and the server;
2) Only the client and the server can obtain the plaintext of the encrypted data;
3) The communication between the client and the server is secure.
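How the two kinds of encryption combine can be shown with a toy example. It assumes the common pattern in which the client generates a session key, sends it encrypted with the server's public key, and both sides then switch to symmetric encryption. Everything here is deliberately simplified and insecure: the RSA numbers are tiny textbook values, and an XOR keystream stands in for a real symmetric cipher such as AES.

```python
import hashlib

# Toy RSA key pair for the server (textbook numbers, far too small for real use).
N, E, D = 3233, 17, 2753   # modulus, public exponent, private exponent


def xor_stream(key: int, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# 1) Client picks a session key and encrypts it with the server's public key.
session_key = 1234                        # must be smaller than N
encrypted_key = pow(session_key, E, N)

# 2) Server recovers the session key with its private key.
server_key = pow(encrypted_key, D, N)

# 3) Both sides now use the fast symmetric cipher with the shared key.
ciphertext = xor_stream(session_key, b"GET /account HTTP/1.1")
plaintext = xor_stream(server_key, ciphertext)
```

Only the holder of the private exponent can recover the session key, which is the basis of advantage 1) above. The slow asymmetric step is used once, and the bulk of the traffic is protected by the fast symmetric cipher.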
3.5 http proxy
3.5.1 http proxy server
A proxy server (Proxy Server) acts as an agent for network users in obtaining network information. Figuratively speaking, it is a transfer station for network information.
A proxy server sits between the browser and the web server. With it, the browser does not go directly to the web server to retrieve a web page but sends its request to the proxy server. The proxy server retrieves the information the browser needs and sends it back to the browser.
Moreover, most proxy servers have a caching function, like a big Cache. A proxy has a large storage space and continuously stores newly obtained data in its own storage. If the data requested by the browser already exists in that storage and is up to date, the proxy does not fetch it from the Web server again but sends the stored data directly to the user's browser, which can significantly improve browsing speed and efficiency.
More importantly, the proxy server is an important security function provided by Internet circuit-level gateways, and it works mainly at the session layer of the Open Systems Interconnection (OSI) model.
3.5.2 The main functions of http proxy server
The main functions are as follows:
1) Break through restrictions on your own IP address and access foreign sites. For example, users of networks such as the Education Network and the 169 Network can access foreign websites through proxies;
2) Access the internal resources of some organizations or groups, such as a university's FTP (provided that the proxy address is within the range allowed to access the resource). Using a free proxy server in the Education Network's address range, you can use the FTP download and upload services open to the Education Network, as well as various data query and sharing services;
3) Break through China Telecom's IP blockade: China Telecom users are restricted from accessing many websites. The restriction is imposed manually, and different servers block different addresses, so if you cannot reach a site you can try a foreign proxy server;
4) Improve access speed: a proxy server usually sets up a large hard disk buffer. When external information passes through, it is also saved to the buffer; when other users request the same information again, it is taken directly from the buffer and passed to the user, which improves access speed;
5) Hide the real IP: Internet users can also hide their own IP address in this way and protect themselves from attacks.
3.5.3 http proxy diagram
The http proxy diagram is shown below:
For the client browser, the http proxy server is equivalent to the server.
For the web server, the http proxy server plays the role of the client.
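The two roles of the proxy, together with the buffering behaviour described in 3.5.1, can be sketched as a small class. This is only an illustration: CachingProxy and fake_origin are hypothetical names, the origin server is replaced by a local function, and the cache never checks whether its copy is still up to date.

```python
class CachingProxy:
    """Answers clients like a server and calls the origin like a client."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: url -> response body
        self._cache = {}

    def handle(self, url):
        """Return (body, served_from_cache) for a client request."""
        if url in self._cache:
            return self._cache[url], True
        body = self._fetch(url)           # the proxy acts as the client here
        self._cache[url] = body
        return body, False


origin_hits = []


def fake_origin(url):
    """Stand-in for the web server; records every request that reaches it."""
    origin_hits.append(url)
    return f"<html>content of {url}</html>"


proxy = CachingProxy(fake_origin)
first = proxy.handle("http://example.org/index.html")
second = proxy.handle("http://example.org/index.html")
```

The second request is answered from the proxy's own storage, so the origin server sees only one request.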
3.6 Implementation of virtual host
3.6.1 What is a virtual host
Virtual host: a certain amount of disk space is set aside on a network server for users to place sites, application components, etc., and the necessary site functions, data storage, and data transfer functions are provided.
The so-called virtual host, also called "website space", divides a server running on the Internet into multiple "virtual" servers. Each virtual host has an independent domain name and complete Internet server functions (WWW, FTP, E-mail, etc.). The virtual hosts on one server are independent of each other and are managed by their own users. However, a server host can only support a certain number of virtual hosts; when this number is exceeded, users experience a sharp drop in performance.
3.6.2 Implementation principle of virtual host
Virtual host is a technology that uses the same WEB server to provide services for websites with different domain names. Apache, Tomcat, etc. can realize this function through configuration.
Related HTTP message header: Host.
For example: Host: www.baidu.com
When the client sends an HTTP request, it will carry the Host header. The Host header records the domain name entered by the client. In this way, the server can confirm which domain name the client wants to access based on the Host header.
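Selecting a site from the Host header can be sketched with a lookup table. The domain names and directories below are made up for illustration, and a real server such as Apache or Tomcat is configured declaratively rather than in code.

```python
# Hypothetical mapping from domain name to the site's document root.
VIRTUAL_HOSTS = {
    "www.example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}
DEFAULT_ROOT = "/var/www/default"


def document_root(request_headers):
    """Pick the site directory from the Host header (port and case ignored)."""
    host = request_headers.get("Host", "")
    hostname = host.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(hostname, DEFAULT_ROOT)


main_site = document_root({"Host": "www.example.com"})
blog_site = document_root({"Host": "BLOG.example.com:8080"})
fallback = document_root({})
```

Both domain names can resolve to the same IP address; the Host header is the only thing that tells the server which site the client wants.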