How long should a URL be? I ask because many optimization guides say things like: minimize cookies, shorten URLs, and use GET requests wherever possible, all in order to optimize the requesting and loading of web pages. However, "minimize" and "wherever possible" are only qualitative descriptions. Quantitatively, how many bytes does a URL have to be cut down to before it counts as short?
For example, during one of our homepage revisions, I used an HTTP analyzer and noticed the URLs of several interesting .js files, which looked like this:
Pay attention to the last item. Don't be surprised: the URL really is that long, 443 bytes to be exact. But is that too long, or not?
Taking IE as an example, the longest URL it can process is about 2048 bytes (2,083 characters in total, of which at most 2,048 may be in the path), so IE can handle this one easily. In fact, browsers in general have no trouble with it, so correctness is not an issue. What we want to talk about next is efficiency.
1. Packet header issues in TCP/IP protocol
In a TCP/IP network, the underlying protocols are one thing and the application-layer protocol is another. As an application-layer protocol, HTTP decides for itself how much content it can transmit and how (for example, HTTP packets are generally bounded at 48K; beyond 48K, application-layer splitting occurs, the so-called multipart). All of that is agreed at the application layer. In the underlying protocols, the link layer and the transport layer each have their own agreement on how large a transmitted packet should be. Simply put, the transport layer agrees on the MSS (maximum segment size) of the data carried in a TCP segment, and the link layer agrees on the MTU (maximum transmission unit). If the size of an IP packet exceeds the MTU (i.e., MSS + TCP header + IP header > MTU), the IP packet is split into multiple fragments before it is transmitted over the link.
The MSS depends on the transmission environment and has two recommended values. Generally speaking:
- when the destination address is not local (it is in a different network segment from the source address), the default MSS is usually 536;
- otherwise, the default MSS is usually 1460.
The MTU depends on the network environment and also has two recommended values. In general:
- 576 bytes for serial lines;
- 1500 bytes for Ethernet.
Each recommended MTU value is 40 bytes larger than the corresponding MSS value, and 40 bytes is the usual size of (TCP header + IP header). That combined header is capped at 120 bytes (20 + 20 bytes of fixed IP/TCP headers, plus up to 40 + 40 bytes of IP/TCP options). Therefore, in a complex network environment, the size of a single data packet available to the application-layer protocol should ideally stay under 536 - 80 = 456 bytes, or failing that under 1460 - 80 = 1380 bytes. These limits come from considering the transport-layer and link-layer protocols together. Some common recommendations use the two values 536/1460 directly, which is not fundamentally different from the discussion here. I am only emphasizing what the limit should be if we want a "sufficiently optimized request".
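The arithmetic above can be written out as a small sketch. It only encodes the figures quoted in this section (20-byte fixed IP and TCP headers, up to 40 bytes of options each); the function names are made up for illustration.

```python
# Header sizes quoted in the text, in bytes.
IP_HEADER = 20
TCP_HEADER = 20
IP_OPTIONS_MAX = 40
TCP_OPTIONS_MAX = 40


def mss_for_mtu(mtu: int) -> int:
    """MSS left after subtracting the fixed IP and TCP headers from the MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def conservative_payload(mss: int) -> int:
    """Payload budget if both option areas are filled to their 40-byte maximum."""
    return mss - IP_OPTIONS_MAX - TCP_OPTIONS_MAX


def exceeds_mtu(payload: int, mtu: int) -> bool:
    """True when payload + TCP header + IP header no longer fits in one MTU."""
    return payload + TCP_HEADER + IP_HEADER > mtu


if __name__ == "__main__":
    for mtu in (576, 1500):
        mss = mss_for_mtu(mtu)
        print(f"MTU {mtu}: MSS {mss}, conservative payload {conservative_payload(mss)}")
```

With the two recommended MTU values this reproduces the pairs used in the rest of the article: 536/456 and 1460/1380.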
2. Header issues in HTTP protocol
So now we come to HTTP, the application-layer protocol. An HTTP request consists of a header and a data area. An HTTP GET request can consist of a header alone, with no data area. To see why, look at the content of the HTTP header, which is as follows (the header must end with two consecutive CRLF sequences):
The GET (...) here is followed by the complete URL of the GET request, and the request's parameters are placed in that URL too, so no separate data area is needed. A particular client may send a few more or a few fewer HTTP header fields than shown above, but those fields are usually short. We use this example only as an illustration. So how many bytes does this "default (incomplete) HTTP header" take up?
The answer is 184 bytes. It must be emphasized, however, that the Referer is tied directly to the URL of the page currently being browsed. If that page has a 500-byte URL, then when a hyperlink on it is clicked, the Referer field is filled with that 500-byte URL. An overly long page URL therefore costs extra transmission on every link clicked from that page. That is a separate case.
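To make the Referer cost concrete, here is a minimal sketch that assembles a GET request header and measures it in bytes. The field list, host names and URLs are hypothetical and are not the 184-byte header measured above; only the mechanism (the Referer carries the full URL of the current page) comes from the text.

```python
def build_get_request(path: str, host: str, referer: str = "") -> bytes:
    """Assemble a minimal GET request header; the fields are illustrative only."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept: */*",
        "Connection: Keep-Alive",
    ]
    if referer:
        lines.append(f"Referer: {referer}")
    # The header ends with an empty line, i.e. two consecutive CRLF sequences.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# Hypothetical page URLs: a short one and one padded to exactly 500 bytes.
short_referer = "http://www.example.com/"
long_referer = "http://www.example.com/?" + "x" * 476

short_request = build_get_request("/a.js", "static.example.com", short_referer)
long_request = build_get_request("/a.js", "static.example.com", long_referer)

if __name__ == "__main__":
    print(len(short_request), len(long_request))
```

Every extra byte in the page URL shows up one-for-one in each request made from that page.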
Setting the Referer field aside and taking the header above as an example, the best value we can use is only 456 - 184 = 272 bytes. These 272 bytes are shared among the three places marked (...) above: GET, User-Agent and Cookie.
The User-Agent field depends on the browser: different browsers, and the same browser on different operating systems, send different values. JavaScript and server-side statistics software often use this field to determine the browser environment, such as the OS and version. Its value is sometimes quite long. On my current machine, for example, the value is:
---------
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; QQWubi 108 ; EmbeddedWB 14.52 from:http://www.bsalsa.com/ EmbeddedWB 14.52; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET CLR 1.1.4322; .NET CLR 3.5.21022; .NET4.0C; .NET4.0E)
---------
This occupies 274 bytes. In other words, even the ideal 456-byte budget is already insufficient. As discussed previously, we can settle for the next best thing:
- Use the boundary value of 536 bytes, which does not allow for the 80 bytes of TCP/IP options.
It should also be emphasized that the length of User-Agent varies. For example, the 64 bytes of "EmbeddedWB..." above come from a third-party component and may not be present on an ordinary computer. Conversely, other browser environments (such as Maxthon) may make this field even longer. With that in mind, I will still base the analysis on this particular case.
Taking 536 bytes as the limit, we actually have 536 - 184 - 274 = 78 bytes available, so we set the first-level optimization target at 70 bytes. I recommend that the company choose a balanced value based on data collected on the server side.
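The budget calculation can be sketched as follows, using only the figures from this section (184 bytes of fixed header, a 274-byte User-Agent, and the 456/536 limits). The function names and the `fits` helper are illustrative assumptions, not part of any tool mentioned here.

```python
FIXED_HEADER = 184  # bytes of the default header, excluding the (...) parts
USER_AGENT = 274    # bytes of the User-Agent value quoted above


def remaining_budget(limit: int) -> int:
    """Bytes left for the GET URL and the Cookie value within one segment."""
    return limit - FIXED_HEADER - USER_AGENT


def fits(limit: int, url_bytes: int, cookie_bytes: int = 0) -> bool:
    """True if the URL plus the cookie still fit in the remaining budget."""
    return url_bytes + cookie_bytes <= remaining_budget(limit)


if __name__ == "__main__":
    for limit in (456, 536):
        print(f"limit {limit}: {remaining_budget(limit)} bytes left")
    print("443-byte URL fits in 536:", fits(536, 443))
```

A negative result for the 456-byte limit is what "no longer enough" means in the text: this User-Agent alone overruns the ideal budget by 2 bytes.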
3. COOKIE consumption can be reduced to 0
Cookies are now the largest consumer. On my current machine, the value varies with the protocol and domain:
(1) For the homepage http://www.alipay.net/, the value is 49 bytes: ali_apache_id=12.1.11.70.1275978936200.5; lastpg=
(2) For http://*.alipay.net/, the value is 171 bytes: ali_apache_id=12.1.11.70.1275978936200.5; ali_apache_sid=12.1.46.46.128998714836.4|1289988948; ALIPAYJSESSIONID=bYWcn4Wq0Z5FBCoHzfpn2f1XxDAmBepay; ali_apache_tracktmp=uid=
(3) For https://static.alipay.net/, the value is 307 bytes: cna=AKaaAhYBhU0BAeMdAHlnHNcd; ali_apache_id=169.17.198.19.1272623861747.7; payMethod=directPay; _tb_order=38016166656317; defaultBank=ICBC; __utma=22931947.260433774.1277279158.1277279158.1282287558.2; __utmz=22931947.1282287558.2.2.utmcsr=life.alipay.net|utmccn=(referral)|utmcmd=referral|utmcct=/index.php
(4) For http(s)://img.alipay.net/, the value is 379 bytes: apay_id=159588238.127262386236866.128979461890689.1289969142342368.137; cna=AKaaAhYBhU0BAeMdAHlnHNcd; ali_apache_id=169.17.198.19.1272623861747.7; payMethod=directPay; _tb_order=38016166656317; defaultBank=ICBC; __utma=22931947.260433774.1277279158.1277279158.1282287558.2; __utmz=22931947.1282287558.2.2.utmcsr=life.alipay.net|utmccn=(referral)|utmcmd=referral|utmcct=/index.php
(5) Other situations.
Why does cookie usage surge in cases 2, 3, and 4? Although cases 3 and 4 differ slightly, the root cause is exactly the same as in case 2, so the rest of this article takes only case 2 as an example. Tracing the HTTP requests shows:
- When the homepage was requested, the server returned four Set-Cookie responses.
The four responses (HTTP response headers) are as follows:
--------
Set-Cookie:ali_apache_sid=10.2.46.46.128998714836.4|1289988948; path=/; domain=.alipay.net
Set-Cookie:JSESSIONID=A8CE523AEA03E2C990D6796D6BAEC81E; =; Domain=.alipay.net; Path=/
--------
So every subsequent HTTP request carries the 171-byte cookie from case (2) above. Yet these cookies are obviously meaningless in at least the following situations:
- the request hits a redirect page, including redirects that return Status Code: 302 and redirects done with a meta (http-equiv) tag in an HTML page;
- the requested page is cached, for example when Status Code: 304 "Not Modified" is returned;
- the requested resource is static and needs no cookie identification, such as the image, .js and .css files on img and static.alipay.net.
Clearly, our images and other static resources on img and static can be cached, and whether they are served from cache or fetched for the first time, the cookie value is completely meaningless. For static pages (.html), these cookies are not needed either unless we use the HTTP server to gather visit statistics for them. For these resources and content, therefore, we should make sure the cookies are not sent at all, or are sent as sparingly as possible (for some static .html pages we may need only the session ID to analyze the user's access chain).
Optimizing the cookies is very simple: deploy these static resources on a server or server group whose domain is not under .alipay.net, or use a separate, independent domain name. For this specific portion of resources, which is certainly the largest one, COOKIE consumption can then be reduced to 0.
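Why a separate domain drops the cookies can be sketched with a simplified version of the domain-matching rule browsers apply. This is an approximation written for illustration, not a complete cookie implementation; the cookie values are the two Set-Cookie responses shown earlier, and `static.example-cdn.com` is a hypothetical independent domain.

```python
def domain_matches(cookie_domain: str, request_host: str) -> bool:
    """Simplified matching: the host equals the cookie domain or is a subdomain of it."""
    cookie_domain = cookie_domain.lstrip(".").lower()
    request_host = request_host.lower()
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)


def cookie_header_bytes(cookies: list, request_host: str) -> int:
    """Byte length of the Cookie header value a browser would send to request_host."""
    pairs = [
        f"{name}={value}"
        for name, value, domain in cookies
        if domain_matches(domain, request_host)
    ]
    return len("; ".join(pairs).encode("ascii"))


# (name, value, domain) triples taken from the Set-Cookie responses above.
COOKIES = [
    ("ali_apache_sid", "10.2.46.46.128998714836.4|1289988948", ".alipay.net"),
    ("JSESSIONID", "A8CE523AEA03E2C990D6796D6BAEC81E", ".alipay.net"),
]

if __name__ == "__main__":
    for host in ("static.alipay.net", "img.alipay.net", "static.example-cdn.com"):
        print(host, cookie_header_bytes(COOKIES, host))
```

Any host under .alipay.net receives both cookies, while the independent domain receives none, which is the "reduced to 0" case.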
4. Shorten URL
Finally we come to the main topic: how long can a URL be? From the analysis above, we still have 70 bytes available. Even if, under certain conditions, some page visits need to carry tracking data (such as a session ID), we still have 40 to 50 bytes to use. But that is all, and it is a long way from the 443 bytes mentioned at the beginning of this article.
But do we really need such a long URL?
The answer is no: we can shorten the URL. In the earlier example, the GET part of our original URL is:
Looking closely, it actually has this form:
---------
/min?b=javascript&f=...
---------
What follows the f field is actually a concatenation of static resources from the arale script project. On the server side, the min program concatenates the script fragments into a single .js file according to the parameters "b=javascript&f=..." and returns it to the browser. If nothing has changed, it simply returns Status Code: 304.
So the parameter block after "f=" will in fact be exactly the same on every request. Even if the list of files to concatenate differs from case to case, there is only a fairly limited number of combinations. This naturally suggests a digest: compute a key (a hash, MD5 or CRC, for example) for the string above, and then use that key to look up the concatenated .js content. It also means the min program no longer has to concatenate text on every request. The URL above can then become (taking the CRC32 of the 396 bytes after the f field as an example):
---------
/min?b=javascript&f=313466DB
---------
Taking version management into account:
---------
/min?b=javascript&v=0.9b&f=313466DB
---------
Now we have kept the URL within a fairly small size while adding version management and content validity checking, and the server-side min program can still generate and cache the content dynamically when necessary. None of these changes conflicts with our original requirements. What matters is that we have brought the GET request down to 35 bytes, and the remaining space fully meets our overall optimization target:
The first level of optimization, 70 bytes!
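The key-based scheme described in this section might look like the following sketch. It uses CRC32 from Python's standard `zlib` module as the digest. The file list, the `BundleRegistry` class and its methods are hypothetical and do not describe how the actual min program works. Because CRC32 can collide, the sketch refuses to register two different file lists under the same key.

```python
import zlib


def bundle_key(file_list: str) -> str:
    """CRC32 of the f= parameter value, as 8 uppercase hex digits."""
    return format(zlib.crc32(file_list.encode("utf-8")) & 0xFFFFFFFF, "08X")


def short_url(file_list: str, version: str) -> str:
    """Short request URL that carries only the version and the key."""
    return f"/min?b=javascript&v={version}&f={bundle_key(file_list)}"


class BundleRegistry:
    """Server-side lookup from key back to the original file list (illustrative)."""

    def __init__(self):
        self._by_key = {}

    def register(self, file_list: str) -> str:
        key = bundle_key(file_list)
        existing = self._by_key.get(key)
        if existing is not None and existing != file_list:
            # Two different lists produced the same CRC32; a longer digest would be needed.
            raise ValueError(f"CRC32 collision for key {key}")
        self._by_key[key] = file_list
        return key

    def resolve(self, key: str):
        """Return the registered file list, or None for an unknown key."""
        return self._by_key.get(key)


if __name__ == "__main__":
    files = "arale/a.js,arale/b.js"  # hypothetical file list
    registry = BundleRegistry()
    registry.register(files)
    url = short_url(files, "0.9b")
    print(url, len(url))
```

Whatever the length of the file list, the resulting URL has the same 35-byte shape as the versioned example above, because the key is always 8 hex digits.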
5. Technology maturity and value
1. Twitter has long used this technique.
2. Like the arale project, the YQL (Yahoo! Query Language) project has a similar need, so it uses the technique above to turn "an SQL statement typed into the URL" into a short name. For example, http://y.ahoo.it/iHQ8c0sv is equivalent to http://developer.yahoo.com/yql/console/?q=select woeid from geo.places where text="san francisco, ca"
3. Microsoft still seems oblivious to all this, which may be why their official website loads so slowly. ^^
4. Whenever we are in a position to get the HTTP header under 456 bytes, we should do our best to. For example, because Wangwang has its own client, it can customize the HTTP request header and trim fields such as User-Agent.
5. If the browser always issues minimized HTTP requests, the network can always deliver the request to the server as quickly as possible, without waiting for several packets to be reassembled. The effect is especially noticeable on slow networks and on networks with heavy packet loss. Simply put, if someone on the LAN is using Thunder (Xunlei) or BT, minimizing HTTP requests will noticeably improve the web browsing experience.
6. We should put static resources such as scripts under version management.