Introduction to HTTP: an object-oriented protocol belonging to the application layer

Introduction

HTTP is an object-oriented protocol belonging to the application layer. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously improved and extended over years of use and development. At the time this article was written, the WWW used the sixth version of HTTP/1.0, standardization of HTTP/1.1 was in progress, and proposals had been made for HTTP-NG (Next Generation of HTTP).
The main features of the HTTP protocol can be summarized as follows:
1. Supports the client/server model.
2. Simple and fast: when a client requests a service from the server, it only needs to send the request method and path. Commonly used request methods are GET, HEAD, and POST. Each method specifies a different type of interaction between the client and the server. Because the HTTP protocol is simple, HTTP server programs are small and communication is fast.
3. Flexible: HTTP allows data objects of any type to be transferred. The type being transferred is indicated by Content-Type.
4. Connectionless: each connection is limited to processing a single request. After the server has processed the client's request and received the client's acknowledgement, it closes the connection. This saves transmission time.
5. Stateless: HTTP is a stateless protocol, meaning the protocol keeps no memory of previous transactions. Because there is no state, any later processing that needs earlier information must have that information sent again, which may increase the amount of data transferred per connection. On the other hand, when the server does not need earlier information, it responds faster.

1. Detailed explanation of the HTTP protocol: URL

HTTP (Hypertext Transfer Protocol) is a stateless, application-layer protocol based on a request/response model. It usually runs over TCP connections, and HTTP/1.1 provides a persistent-connection mechanism. The vast majority of web development consists of web applications built on the HTTP protocol.

An HTTP URL (a URL is a special type of URI that contains enough information to locate a resource) has the following format:
http://host[":"port][abs_path]
http indicates that the network resource is located via the HTTP protocol; host is a valid Internet host domain name or IP address; port specifies a port number, and if it is empty the default port 80 is used; abs_path specifies the URI of the requested resource. If the URL contains no abs_path, it must be given as "/" when used as the Request-URI. The browser normally does this for us automatically.
eg:
1. Enter: www.guet.edu.cn
The browser automatically converts it to: http://www.guet.edu.cn/
2. http://192.168.0.116:8080/index.jsp
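The defaulting rules above (port 80 when the port is empty, "/" when abs_path is missing) can be sketched in Java with java.net.URI. This is a minimal illustration, not how any particular browser does it; the class name HttpUrlParts is hypothetical, and the sketch also keeps the query string as part of the path sent in the request line.

```java
import java.net.URI;

// Hypothetical helper: split an http URL into the parts used by the request.
final class HttpUrlParts {
    final String host;
    final int port;
    final String absPath;

    private HttpUrlParts(String host, int port, String absPath) {
        this.host = host;
        this.port = port;
        this.absPath = absPath;
    }

    static HttpUrlParts parse(String url) {
        URI uri = URI.create(url);
        if (!"http".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not an http URL: " + url);
        }
        // URI.getPort() returns -1 when no port is given: use the default, 80.
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        // A missing abs_path must be sent as "/" in the Request-URI.
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        // Keep any query string; it is part of the Request-URI too.
        if (uri.getRawQuery() != null) {
            path = path + "?" + uri.getRawQuery();
        }
        return new HttpUrlParts(uri.getHost(), port, path);
    }

    public static void main(String[] args) {
        for (String url : new String[] {"http://www.guet.edu.cn",
                                        "http://192.168.0.116:8080/index.jsp"}) {
            HttpUrlParts p = parse(url);
            System.out.println(p.host + " " + p.port + " " + p.absPath);
        }
    }
}
```

Running main prints `www.guet.edu.cn 80 /` and `192.168.0.116 8080 /index.jsp`. Input without a scheme (such as the bare www.guet.edu.cn) is rejected here; adding "http://" first is the browser's job.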

2. Detailed explanation of the HTTP protocol: Request

An HTTP request consists of three parts: the request line, the message headers, and the request body.

1. The request line begins with a method token, followed by the Request-URI and the protocol version, separated by spaces. Its format is: Method Request-URI HTTP-Version CRLF
Method is the request method; Request-URI is a Uniform Resource Identifier; HTTP-Version is the HTTP protocol version of the request; CRLF is a carriage return followed by a line feed (apart from the terminating CRLF, no standalone CR or LF characters are allowed).
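A minimal Java sketch of checking the request-line format described above. It assumes a strict reading: uppercase letters for the method (as in the method list that follows), single spaces as separators, and exactly one terminating CRLF; the class name RequestLine is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for "Method SP Request-URI SP HTTP-Version CRLF".
final class RequestLine {
    // Strict form: uppercase method, single spaces, one trailing CRLF.
    // \S+ cannot match spaces, CR or LF, so no bare CR/LF can slip in.
    private static final Pattern FORMAT =
            Pattern.compile("([A-Z]+) (\\S+) (HTTP/\\d+\\.\\d+)\r\n");

    final String method;
    final String requestUri;
    final String version;

    private RequestLine(String method, String requestUri, String version) {
        this.method = method;
        this.requestUri = requestUri;
        this.version = version;
    }

    static RequestLine parse(String line) {
        Matcher m = FORMAT.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("malformed request line: " + line);
        }
        return new RequestLine(m.group(1), m.group(2), m.group(3));
    }
}
```

A line ending in a bare LF, such as "GET /form.html HTTP/1.1\n", is rejected, matching the CRLF rule above.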

There are several request methods (all method names are uppercase). They are explained below:
GET Requests the resource identified by the Request-URI
POST Appends new data to the resource identified by the Request-URI
HEAD Requests only the response message headers of the resource identified by the Request-URI
PUT Requests the server to store a resource, using the Request-URI as its identifier
DELETE Requests the server to delete the resource identified by the Request-URI
TRACE Requests the server to send back the request it received, mainly for testing or diagnosis
CONNECT Reserved for use with a proxy that can switch to being a tunnel
OPTIONS Queries the server's capabilities, or the options and requirements associated with a resource
Application examples:
GET method: when a web page is accessed by typing its URL into the browser's address bar, the browser uses the GET method to fetch the resource from the server, eg: GET /form.html HTTP/1.1 (CRLF)

The POST method asks the server to accept the data attached to the request; it is often used to submit forms.
eg: POST /reg.jsp HTTP/ (CRLF)
Accept:image/gif,image/x-xbit,... (CRLF)
...
Host:www.guet.edu.cn (CRLF)
Content-Length:21 (CRLF)
Connection:Keep-Alive (CRLF)
Cache-Control:no-cache (CRLF)
(CRLF) // This CRLF marks the end of the message headers; everything before it is the message header
user=jeffrey&pwd=1234 // This line is the submitted data (21 bytes)
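The message above can be assembled by hand as a sketch. The body user=jeffrey&pwd=1234 is 21 bytes, so the Content-Length value is computed from the UTF-8 bytes of the body rather than typed in. The class name, the HTTP/1.1 version, and the Content-Type value application/x-www-form-urlencoded (the usual type for HTML form data, not shown in the example) are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder for a raw POST request like the example above.
final class PostRequestBuilder {
    static String build(String path, String host, String body) {
        // Content-Length counts bytes of the body, not characters.
        int length = body.getBytes(StandardCharsets.UTF_8).length;

        Map<String, String> headers = new LinkedHashMap<>(); // keeps insertion order
        headers.put("Host", host);
        headers.put("Content-Type", "application/x-www-form-urlencoded"); // assumed
        headers.put("Content-Length", Integer.toString(length));
        headers.put("Connection", "Keep-Alive");
        headers.put("Cache-Control", "no-cache");

        StringBuilder sb = new StringBuilder();
        sb.append("POST ").append(path).append(" HTTP/1.1\r\n");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n");
        }
        sb.append("\r\n"); // empty line: end of the message headers
        sb.append(body);   // request body, with no trailing CRLF
        return sb.toString();
    }
}
```

Computing the length from bytes matters as soon as the body contains non-ASCII text: a single "\u00e9" character is two bytes in UTF-8.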

The HEAD method is almost the same as GET, except that the server returns no message body: the HTTP headers in the response to a HEAD request contain the same information as those in the response to a GET request. This makes it possible to obtain information about the resource identified by the Request-URI without transferring the entire resource content. The method is often used to test whether a hyperlink is valid, whether it is accessible, and whether it has been updated recently.
2. Request headers: described later
3. Request body (omitted)

3. Detailed explanation of the HTTP protocol: Response

After receiving and interpreting the request message, the server returns an HTTP response message.

An HTTP response also consists of three parts: the status line, the message headers, and the response body.
1. The status line has the following format:
HTTP-Version Status-Code Reason-Phrase CRLF
where HTTP-Version is the HTTP protocol version of the server; Status-Code is the response status code sent back by the server; and Reason-Phrase is a textual description of the status code.
The status code consists of three digits. The first digit defines the class of the response and has five possible values:
1xx: Informational--the request has been received and processing continues
2xx: Success--the request has been successfully received, understood, and accepted
3xx: Redirection--further action must be taken to complete the request
4xx: Client error--the request contains a syntax error or cannot be fulfilled
5xx: Server error--the server failed to fulfill an apparently valid request
Common status codes, reason phrases, and explanations:
200 OK //The client request succeeded
400 Bad Request //The client request has a syntax error and cannot be understood by the server
401 Unauthorized //The request is not authorized; this status code must be used together with the WWW-Authenticate header field
403 Forbidden //The server received the request but refuses to serve it
404 Not Found //The requested resource does not exist, eg: a wrong URL was entered
500 Internal Server Error //An unexpected error occurred on the server
503 Service Unavailable //The server cannot currently handle the client's request; it may recover after some time
eg: HTTP/1.1 200 OK (CRLF)
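The status-line format and the five classes above can be combined in a small Java sketch. The class and method names are hypothetical, and the parser accepts any three-digit code from 100 to 599, including codes not listed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for "HTTP-Version SP Status-Code SP Reason-Phrase CRLF".
final class StatusLine {
    // Three-digit code whose first digit is 1-5; the reason phrase may be empty.
    private static final Pattern FORMAT =
            Pattern.compile("(HTTP/\\d+\\.\\d+) ([1-5]\\d\\d) ([^\r\n]*)\r\n");

    final String version;
    final int code;
    final String reason;

    private StatusLine(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    static StatusLine parse(String line) {
        Matcher m = FORMAT.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("malformed status line: " + line);
        }
        return new StatusLine(m.group(1), Integer.parseInt(m.group(2)), m.group(3));
    }

    // The first digit of the code selects the class of the response.
    String category() {
        switch (code / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client error";
            default: return "Server error"; // only 5 remains after parsing
        }
    }
}
```

Clients generally branch on the class rather than on every individual code, which is why the first digit is checked separately from the reason phrase.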

2. Response headers: described later

3. The response body is the content of the resource returned by the server

4. Detailed explanation of the HTTP protocol: Message Headers

HTTP messages are either requests from client to server or responses from server to client. Both request and response messages consist of a start line (for a request message this is the request line; for a response message it is the status line), message headers (optional), an empty line (a line containing only CRLF), and a message body (optional).

HTTP message headers include general headers, request headers, response headers, and entity headers.
Each header field consists of a name, a ":", optional whitespace, and a value. Header field names are case-insensitive.
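Because header field names are case-insensitive, a parser should look them up without regard to case. One way to do that in Java, sketched here under the assumption that each header line has already been split off the message, is a TreeMap ordered by String.CASE_INSENSITIVE_ORDER; HeaderParser is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical parser: "name:value" lines into a case-insensitive map.
final class HeaderParser {
    static Map<String, List<String>> parse(List<String> lines) {
        // Keys compare case-insensitively, so "HOST" and "Host" are one field.
        Map<String, List<String>> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("malformed header: " + line);
            }
            String name = line.substring(0, colon);
            // Whitespace after the colon is optional, so trim the value.
            String value = line.substring(colon + 1).trim();
            // A field may repeat; keep every value in arrival order.
            headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return headers;
    }
}
```

Values are kept in a list because some fields (X-Cache, for example) can legitimately appear more than once in one message.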

1. General headers
A few header fields are used in all request and response messages; they apply only to the message being transmitted, not to the entity being transferred.
eg:
Cache-Control specifies caching directives. Caching directives are one-way (a directive that appears in a response does not necessarily appear in the request) and independent (the caching directives of one message do not affect how caching handles another message). A similar header field used in HTTP/1.0 is Pragma.
Caching directives in requests include: no-cache (indicates that the request or response message must not be cached), no-store, max-age, max-stale, min-fresh, only-if-cached;
Caching directives in responses include: public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age, s-maxage.
eg: To instruct the IE browser (the client) not to cache a page, a server-side JSP program can write: response.setHeader("Cache-Control","no-cache");
//response.setHeader("Pragma","no-cache"); has the same effect as the line above; the two are usually used together
This code sets the general header field Cache-Control:no-cache in the response message it sends.
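A Cache-Control value is a comma-separated list of directives, some of which take an argument (max-age=3600). The sketch below splits one into a map; the class name is hypothetical, and it deliberately ignores the rare case of a quoted argument that itself contains a comma.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: split a Cache-Control value into directive -> argument.
// Directives without an argument map to "". Quoted arguments containing a
// comma (e.g. private="a, b") are not handled by this simple split.
final class CacheControl {
    static Map<String, String> parse(String value) {
        Map<String, String> directives = new LinkedHashMap<>();
        for (String part : value.split(",")) {
            String d = part.trim();
            if (d.isEmpty()) {
                continue;
            }
            int eq = d.indexOf('=');
            // Directive names are case-insensitive; normalize to lower case.
            String name = (eq < 0 ? d : d.substring(0, eq)).trim().toLowerCase(Locale.ROOT);
            String arg = eq < 0 ? "" : d.substring(eq + 1).trim();
            // Strip optional surrounding quotes, e.g. no-cache="Set-Cookie".
            if (arg.length() >= 2 && arg.startsWith("\"") && arg.endsWith("\"")) {
                arg = arg.substring(1, arg.length() - 1);
            }
            directives.put(name, arg);
        }
        return directives;
    }
}
```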

The Date general header field gives the date and time at which the message was generated.

The Connection general header field specifies options for a particular connection, for example that the connection should stay open (persistent), or the "close" option, which tells the server to close the connection once the response is complete.

2. Request headers
Request headers let the client pass additional information about the request, and information about the client itself, to the server.
Commonly used request headers
Accept
The Accept request header field is used to specify what types of information the client accepts. eg: Accept: image/gif, indicating that the client wishes to accept resources in GIF image format; Accept: text/html, indicating that the client wishes to accept html text.
Accept-Charset
The Accept-Charset request header field is used to specify the character set accepted by the client. eg: Accept-Charset:iso-8859-1, gb2312. If this field is not set in the request message, the default is that any character set is acceptable.
Accept-Encoding
The Accept-Encoding request header field is similar to Accept, but it specifies acceptable content encodings. eg: Accept-Encoding:gzip,deflate. If this field is not set in the request message, the server assumes that the client can accept any content encoding.
Accept-Language
The Accept-Language request header field is similar to Accept, but it is used to specify a natural language. eg: Accept-Language:zh-cn. If this header field is not set in the request message, the server assumes that the client can accept various languages.
Authorization
The Authorization request header field is mainly used to prove that the client has the right to view a certain resource. When the browser accesses a page and receives a response code of 401 (Unauthorized) from the server, it can send a request containing the Authorization request header field to ask the server to verify it.
Host (this header field is required when sending a request)
Host request header field is mainly used to specify the Internet host and port number of the requested resource. It is usually extracted from the HTTP URL, eg:
We enter in the browser: http://www.guet.edu.cn/
The request message sent by the browser will include the Host request header field, as follows:
Host: www.guet.edu.cn
The default port number 80 is used here. If a port number is specified, the field becomes: Host: www.guet.edu.cn:port (the specified port number)
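Deriving the Host field from a URL, as a browser does, can be sketched like this. The class name is hypothetical; the rule that the port is left out when it is the default 80 follows the description above.

```java
import java.net.URI;

// Hypothetical helper: build the Host request header from an http URL.
final class HostHeader {
    static String fromUrl(String url) {
        URI uri = URI.create(url);
        int port = uri.getPort(); // -1 when the URL has no explicit port
        // Omit the port when it is absent or equal to the HTTP default (80).
        if (port == -1 || port == 80) {
            return "Host: " + uri.getHost();
        }
        return "Host: " + uri.getHost() + ":" + port;
    }
}
```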
User-Agent
When we log in to an online forum, we often see a welcome message listing the name and version of our operating system and of the browser we are using, which many people find amazing. In fact, the server application obtains this information from the User-Agent request header field, which lets the client tell the server its operating system, browser, and other attributes. However, this header field is not mandatory: if we wrote our own browser and did not send User-Agent, the server could not learn this information.
Request header example:
GET /form.html HTTP/1.1 (CRLF)
Accept:image/gif,image/x-xbitmap,image/jpeg,application/x-shockwave-flash,application/vnd.ms-excel,application/vnd.ms-powerpoint,application/msword,*/* (CRLF)
Accept-Language:zh-cn (CRLF)
Accept-Encoding:gzip,deflate (CRLF)
If-Modified-Since:Wed,05 Jan 2007 11:21:25 GMT (CRLF)
If-None-Match:W/"80b1a4c018f3c41:8317" (CRLF)
User-Agent:Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) (CRLF)
Host:www.guet.edu.cn (CRLF)
Connection:Keep-Alive (CRLF)
(CRLF)

3. Response headers
Response headers let the server pass additional response information that cannot be placed in the status line, as well as information about the server and about further access to the resource identified by the Request-URI.
Commonly used response headers
Location
The Location response header field redirects the recipient to a new location. It is often used when a domain name changes.
Server
The Server response header field contains information about the software the server uses to handle the request; it is the counterpart of the User-Agent request header field. The following is an example of the Server response header field:
Server:Apache-Coyote/1.1
WWW-Authenticate
The WWW-Authenticate response header field must be included in 401 (Unauthorized) response messages. It tells the client how to authenticate: after receiving the 401 response, the client can resend the request with an Authorization header field for the server to verify.
eg: WWW-Authenticate:Basic realm="Basic Auth Test!" //This shows that the server uses the Basic authentication scheme for the requested resource.
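With the Basic scheme, the Authorization value the client sends after a 401 is the word Basic followed by base64 of "user:password". The sketch below uses java.util.Base64; the class and method names are hypothetical, and the credentials in the check are the well-known example from RFC 2617, not real ones. Note that base64 is an encoding, not encryption, so Basic credentials are readable by anyone who sees the request.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for the Basic scheme: "Basic " + base64("user:password").
final class BasicAuth {
    // Client side: build the header sent in reply to a 401 challenge.
    static String authorizationHeader(String user, String password) {
        byte[] credentials = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Authorization: Basic " + Base64.getEncoder().encodeToString(credentials);
    }

    // Server side: recover "user:password" from the value after "Basic ".
    // This sketch only accepts the exact spelling "Basic".
    static String decodeCredentials(String headerValue) {
        String prefix = "Basic ";
        if (!headerValue.startsWith(prefix)) {
            throw new IllegalArgumentException("not Basic credentials");
        }
        byte[] raw = Base64.getDecoder().decode(headerValue.substring(prefix.length()).trim());
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```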

4. Entity header
Both request and response messages can carry an entity. An entity consists of entity header fields and an entity body, but this does not mean the two must be sent together; the entity header fields can be sent on their own. Entity headers define meta-information about the entity body (eg: whether an entity body is present) and about the resource identified by the request.
Commonly used entity headers
Content-Encoding
The Content-Encoding entity header field is used as a modifier of the media type. Its value indicates which additional content codings have been applied to the entity body, and therefore which decoding mechanisms must be applied to obtain the media type referenced by the Content-Type header field. Content-Encoding is mainly used to record the compression method of a document, eg: Content-Encoding: gzip
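The gzip content coding can be demonstrated with java.util.zip: the sender compresses the body and labels it Content-Encoding: gzip, and the receiver reverses the coding to recover the bytes of the media type named in Content-Type. A sketch with hypothetical class and method names:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: apply and remove the gzip content coding.
final class GzipCoding {
    // Sender: compress the entity body (then send Content-Encoding: gzip).
    static byte[] encode(byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer.
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Receiver: undo the coding to get the bytes described by Content-Type.
    static byte[] decode(byte[] encoded) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(encoded))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Content-Length, when present, describes the encoded (compressed) bytes actually sent, not the original document size.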
Content-Language
Content-Language entity header field describes the natural language used by the resource. If this field is not set, it is assumed that the entity content will be available to readers in all languages. eg: Content-Language:da
Content-Length
The Content-Length entity header field indicates the length of the entity body, given as a decimal number of bytes.
Content-Type
The Content-Type entity header field specifies the media type of the entity body sent to the recipient. eg:
Content-Type:text/html;charset=ISO-8859-1
Content-Type:text/html;charset=GB2312
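A small Java sketch that splits a Content-Type value into its media type and charset parameter. The fallback charset is left to the caller (for text types, RFC 2616 specifies ISO-8859-1 when no charset is given); ContentType is a hypothetical name, and Charset.forName throws if the JVM does not support the named charset.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper for values like "text/html;charset=ISO-8859-1".
final class ContentType {
    // The media type is everything before the first ';', case-insensitive.
    static String mediaType(String value) {
        int semi = value.indexOf(';');
        String type = semi < 0 ? value : value.substring(0, semi);
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the charset parameter, or the caller's fallback if none is given.
    static Charset charset(String value, Charset fallback) {
        for (String param : value.split(";")) {
            String p = param.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                return Charset.forName(name);
            }
        }
        return fallback;
    }
}
```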
Last-Modified
The Last-Modified entity header field indicates the date and time at which the resource was last modified.
Expires
The Expires entity header field gives the date and time after which the response is considered expired. So that a proxy server or browser updates a cached page after a certain time (revisiting a previously visited page loads it directly from the cache, which shortens response time and reduces server load), we can use the Expires entity header field to set the page's expiration time. eg: Expires: Thu, 15 Sep 2006 16:23:12 GMT
HTTP/1.1 clients and caches MUST treat other invalid date formats (including 0) as already expired. eg: To prevent the browser from caching a page, we can also use the Expires entity header field and set it to 0. In JSP: response.setDateHeader("Expires",0);
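HTTP-date values such as the Expires example use the RFC 1123 form. The sketch below formats and parses that form with java.time and applies the rule above that an unparseable value, including 0, means "already expired". The class name and the choice to treat the exact expiry instant as expired are assumptions. DateTimeFormatter.RFC_1123_DATE_TIME is not used because it does not zero-pad the day of the month.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for HTTP-dates like "Thu, 08 Mar 2007 07:16:51 GMT".
final class HttpDate {
    // Fixed English names and a two-digit day; the zone is always GMT (UTC).
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);

    static String format(Instant t) {
        return t.atOffset(ZoneOffset.UTC).format(FORMAT);
    }

    static Instant parse(String value) {
        return LocalDateTime.parse(value.trim(), FORMAT).toInstant(ZoneOffset.UTC);
    }

    // Invalid dates, including "0", count as already expired.
    static boolean isExpired(String expiresValue, Instant now) {
        try {
            return !now.isBefore(parse(expiresValue));
        } catch (DateTimeException e) {
            return true;
        }
    }
}
```

The parser also rejects a weekday that does not match the date, which counts as an invalid date under the rule above.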

5. Using telnet to observe HTTP communication

Purpose and principle of the experiment: use Microsoft's telnet tool to send a request to the server by typing the HTTP request by hand. After the server receives, interprets, and accepts the request, it returns a response, which is displayed in the telnet window. This gives a hands-on understanding of how HTTP communication works.

Experimental steps:

1. Open telnet
1.1 Open telnet: Run-->cmd-->telnet

1.2 Turn on telnet's local echo: set localecho

2. Connect to the server and send a request
2.1 open www.guet.edu.cn 80 //Note that the port number cannot be omitted

HEAD /index.asp HTTP/1.0
Host:www.guet.edu.cn

/* We can change the request method to fetch the content of the Guilin University of Electronic Technology homepage; enter the message as follows */
open www.guet.edu.cn 80

GET /index.asp HTTP/1.0 // request the content of the resource
Host:www.guet.edu.cn

2.2 open www.sina.com.cn 80 // or enter directly at the command prompt: telnet www.sina.com.cn 80
HEAD /index.asp HTTP/1.0
Host:www.sina.com.cn
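The telnet session above can be reproduced with a plain java.net.Socket: write the same request text, then read lines until the server closes the connection (with HTTP/1.0 and no Keep-Alive, the server closes after the response). This is a sketch; the class name, the default host example.com, and the 5-second read timeout are assumptions, and the hosts used in the experiment may no longer answer the same way.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the telnet session: send raw text, collect the reply.
final class RawHttpClient {
    // The same text typed into telnet: request line, Host line, empty line.
    static String headRequest(String host, String path) {
        return "HEAD " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "\r\n";
    }

    // Sends the request and collects response lines until the server closes.
    static List<String> send(String host, int port, String request) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            socket.setSoTimeout(5000); // give up if the server stays silent for 5 s
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.ISO_8859_1));
            List<String> lines = new ArrayList<>();
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                lines.add(line);
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "example.com";
        for (String line : send(host, 80, headRequest(host, "/"))) {
            System.out.println(line);
        }
    }
}
```

Unlike typing into telnet, the program sends the whole request at once, so mistyped input (the first of the notes below) cannot occur.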

3 Experimental results:

3.1 The response to request 2.1 is:

HTTP/1.1 200 OK
Date: Thu, 08 Mar 2007 07:17:51 GMT
Connection: Keep-Alive
Content-Length: 23330
Content-Type: text/html
Expires: Thu,08 Mar 2007 07:16:51 GMT
Set-Cookie: ASPSESSIONIDQAQBQQQB=BEJCDGKADEDJKLKKAJEOIMMH; path=/
Cache-control: private

//Resource content omitted

3.2 The response to request 2.2 is:

HTTP/1.0 404 Not Found //Request failed
Date: Thu, 08 Mar 2007 07:50:50 GMT
Server: Apache/2.0.54
Last-Modified: Thu, 30 Nov 2006 11:35:41 GMT
ETag: "6277a-415-e7c76980"
Accept-Ranges: bytes
X-Powered-By: mod_xlayout_jh/0.0.1vhs.markII.remix
Vary: Accept-Encoding
Content-Type: text/html
X-Cache: MISS from zjm152-78.sina.com.cn
Via: 1.0 zjm152-78.sina.com.cn:80
X-Cache: MISS from th-143.sina.com.cn
Connection: close

Lost connection to host

Press any key to continue...

4. Notes:
1. If anything is mistyped, the request will not succeed.
2. Header field names are not case-sensitive.
3. To learn more about the HTTP protocol, consult RFC 2616.
4. To develop back-end programs, you must master the HTTP protocol.

6. Supplementary technical notes on the HTTP protocol

1. Basics:
Higher-level protocols include the File Transfer Protocol (FTP), the Simple Mail Transfer Protocol (SMTP), the Domain Name System (DNS), the Network News Transfer Protocol (NNTP), HTTP, and others.
There are three kinds of intermediaries: proxies, gateways, and tunnels. A proxy accepts requests whose URIs are in absolute form, rewrites all or part of the message, and forwards the reformatted request to the server identified by the URI. A gateway is a receiving agent that acts as a layer above some other server and, if necessary, translates requests into the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages. Tunnels are often used when communication must pass through an intermediary (such as a firewall) or when the intermediary cannot recognize the content of the messages.
Proxy: An intermediary program that acts as both a server and a client, making requests on behalf of other clients. Requests are serviced internally or passed on, possibly with translation, to other servers. A proxy must interpret and, if necessary, rewrite a request message before forwarding it. Proxies often act as client-side portals through firewalls, and can also serve as helper applications that handle requests over protocols the user agent does not implement.
Gateway: A server that acts as an intermediary for some other server. Unlike a proxy, a gateway receives requests as if it were the origin server for the requested resource; the requesting client may not be aware that it is communicating with a gateway.
Gateways often serve as server-side portals through firewalls, and can also act as protocol translators to access resources stored on non-HTTP systems.
Tunnel: An intermediary program that acts as a relay between two connections. Once active, a tunnel is not considered a party to the HTTP communication, although it may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connections are closed. Tunnels are used when a portal is necessary or when the intermediary cannot interpret the relayed traffic.

2. Advantages of protocol analysis: HTTP analyzers for detecting network attacks
Analyzing and processing higher-level protocols in a modular way will be the future direction of intrusion detection.
The commonly used ports of HTTP and its proxies (80, 3128, and 8080) are specified with the port tag in the network section.

3. The HTTP Content-Length restriction vulnerability and denial-of-service attacks
When the POST method is used, the Content-Length header can be set to declare the length of the data to be transmitted, for example Content-Length:999999999. The server does not release the reserved memory until the transfer has completed, so an attacker can exploit this flaw by continuously sending junk data to the web server until it runs out of memory. This attack leaves almost no trace.
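On the server side, the flaw comes from trusting the declared Content-Length. A hedged sketch of the obvious check, assuming a hypothetical per-request limit chosen by the server operator: reject non-numeric values with 400 and oversized ones with 413 (Request Entity Too Large) before reserving memory for the body. It does not address a client that declares a small length and then sends very slowly.

```java
// Hypothetical server-side check run before any memory is reserved for the body.
final class ContentLengthGuard {
    static long checkedLength(String headerValue, long maxBytes) {
        String v = headerValue.trim();
        // Content-Length must be one or more decimal digits.
        if (!v.matches("\\d+")) {
            throw new IllegalArgumentException("400 Bad Request: invalid Content-Length");
        }
        // More than 18 digits cannot fit in a long and is far beyond any sane limit.
        if (v.length() > 18 || Long.parseLong(v) > maxBytes) {
            throw new IllegalArgumentException("413 Request Entity Too Large");
        }
        return Long.parseLong(v);
    }
}
```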

4. Some ideas for using the characteristics of the HTTP protocol to carry out denial-of-service attacks
When a server is busy processing TCP connection requests forged by an attacker, it has no time to attend to clients' normal requests (after all, normal requests make up only a very small share), so from a normal client's point of view the server stops responding. This situation is called a SYN flood attack on the server.
Smurf, TearDrop, and similar attacks use ICMP messages for flooding and IP fragmentation attacks. This article instead uses "normal connections" to produce a denial of service.
Port 19 was used early on for Chargen attacks (Chargen_Denial_of_Service), but the method they used was to set up a UDP connection between two Chargen servers, making the servers process so much data that they went down. To bring down a web server this way, two conditions must hold: 1. a Chargen service is available; 2. an HTTP service is available.
Method: the attacker forges the source IP and sends connection requests (Connect) to N Chargen servers. After accepting a connection, each Chargen server sends a character stream of 72 bytes per second to the server (in practice, depending on network conditions, the rate is higher).

5. HTTP fingerprinting
All HTTP fingerprinting works on essentially the same principle: record the minor differences in how different servers implement the HTTP protocol and use them for identification. HTTP fingerprinting is much more complicated than TCP/IP stack fingerprinting, because the HTTP response can easily be changed by customizing the HTTP server's configuration file or adding plug-ins or components, which makes identification difficult; customizing the behavior of the TCP/IP stack, by contrast, requires modifying the kernel layer, so the stack is easy to identify.
Making a server return different banner information is very simple. For open-source HTTP servers such as Apache, users can modify the banner in the source code and restart the HTTP service for the change to take effect. For HTTP servers without open source code, such as Microsoft's IIS or Netscape, the banner can be modified in the DLL file that stores it. Related articles have discussed this, so I won't go into details here. Such modifications are fairly effective. Another way to obscure banner information is to use a plug-in.
Common test requests:
1: HEAD / HTTP/1.0 sends a basic HTTP request
2: DELETE / HTTP/1.0 sends a request that is not allowed, such as DELETE
3: GET / HTTP/3.0 sends a request with an invalid HTTP protocol version
4: GET / JUNK/1.0 sends a request that does not follow the HTTP protocol specification
The HTTP fingerprinting tool Httprint uses statistical principles combined with fuzzy logic to determine the type of HTTP server effectively. It can be used to collect and analyze the signatures produced by different HTTP servers.

6. Others: to improve performance for users, modern browsers also support concurrent access: when browsing a web page, they open several connections at the same time to fetch the page's many icons quickly, so the whole page is transferred faster.
HTTP/1.1 provides persistent connections, and the next-generation HTTP protocol, HTTP-NG, adds support for session control, rich content negotiation, and other mechanisms to provide more efficient connections.
