## 1. Background introduction
Most people who use a mobile phone have received marketing text messages, and these messages often contain a URL. The URL is usually very short, but if you watch carefully when you open it, there is a redirect along the way, and the URL shown in the browser address bar is not the one you saw in the text message. That is a short URL.

## 2. Principle and application
A short URL generally uses a very short domain name, and its path parameter usually consists of only 3 to 6 characters, which keeps it concise.

Before a short URL can be used, it has to be generated. Generation uses some algorithm to map a short string to a long one. For example, choosing 6 characters from the 62 commonly used characters (0-9, a-z, A-Z) gives 62^6 combinations, which is about 56.8 billion unique short URLs.

The server looks up the real long URL by the path parameter and then redirects to it with a 301 or 302 response. 301 is a permanent redirect and 302 is a temporary redirect. A short address does not change once it is generated, so 301 matches HTTP semantics: the browser remembers the redirect target, which also reduces the load on the server to some extent. However, because later visits can be answered from the browser's cache without reaching the server, 301 makes it impossible to count clicks on the short address accurately. If click statistics are required, a 302 redirect may be the better choice.

The main benefit of short URLs is that they are easy to pass along and remember. They are especially useful in text messages, because SMS limits the length of the content, and they are also used when sharing on Weibo.

## 3. Existing cases on the market
There are many free short link services on the market. Their features are basically the same, though some impose limits.

(1) Baidu's short link (dwz.cn/): Baidu provides not only a web page entry but also an API and developer documentation, and it is simple to use.

(2) Sina's short link (sina.lt/): currently only a web page entry is provided, and no API service has been found.

(3) Taobao's short link (tb.am/): currently only a web page entry is provided, and no API service has been found.

Many smaller companies also provide short link services. Some are only partially free, and some short links have a validity period, so they are not introduced one by one here.

## 4. Commonly used algorithms
The more popular algorithms on the Internet include the base conversion algorithm, the digest (hash) algorithm, and the random number algorithm. Each is briefly introduced below.

### 4.1 Base conversion algorithm
This algorithm is also called the auto-increment sequence algorithm. Its defining property is that it never produces duplicates: the ID is auto-incremented, and each decimal ID maps one-to-one to a base-62 value. It takes advantage of the fact that converting a number from a lower base to a higher base reduces the number of digits.

Common bases in computing are binary, octal, decimal, and hexadecimal. The larger the base, the larger the number that can be expressed with the same number of digits, and the fewer characters a given number takes up. For example, 1000 in decimal is 1750 in octal and 3E8 in hexadecimal. What about base 62? Base 62 is not a standard base in computing, but we can build one. The conversion algorithm is fixed, and the most common one is the "divide by the base and take the remainder" method. Assume the base-62 character sequence is 0-9a-zA-Z. The order can be shuffled, but it must then stay fixed. Treat it as an array indexed from 0 to 61 and call it the alphabet.

- 1000 / 62 = 16, remainder 8
- 16 / 62 = 0, remainder 16

Reading the remainders from last to first gives 16 and 8. The alphabet characters at indexes 16 and 8 are g and 8, which together give g8: only 2 characters. To generate codes of at least 6 characters, start the ID from a relatively large number. The ranges are listed in the table below:

| Code length | Combinations (62^n) | Decimal ID range |
| --- | --- | --- |
| 1 character | 62 | 0 - 61 |
| 2 characters | 3844 | 62 - 3843 |
| 3 characters | about 238,000 | 3844 - 238327 |
| 4 characters | about 14.8 million | 238328 - 14776335 |
| 5 characters | about 916 million | 14776336 - 916132831 |
| 6 characters | about 56.8 billion | 916132832 - 56800235583 |

### 4.2 Hash algorithm
First method: simply add a salt to the long link and compute its MD5, which produces a 32-character hexadecimal string. Then pick 6 characters from it at random, or just take the last 6. However, an MD5 hex string contains only hexadecimal digits (0-9, a-f), far fewer than the 62-character alphabet, so the chance of collision is greater.

Second method:

1. Compute the MD5 of the long URL to get a 32-character signature string, and split it into 4 segments of 8 characters each.
2. Process the four segments in a loop: take the 8 characters, treat them as a hexadecimal number, and AND it with 0x3fffffff (30 one-bits), that is, discard everything above 30 bits.
3. Split these 30 bits into 6 groups of 5 bits. Use each 5-bit number as an index into the alphabet to get a character, which yields a 6-character string. Note that a 5-bit index only covers 0 to 31, so only the first 32 characters of the alphabet are reachable this way.
4. The full MD5 string therefore yields four 6-character strings, and any one of them can be used as the short URL for this long URL.

This generation method is more complicated and the probability of duplication is low, but collisions will still occur.
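The second method can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `md5_short_codes`, the optional `salt` parameter, and the choice to read the 5-bit groups from the lowest bits upward are not specified by the description above.

```python
import hashlib

# Assumed alphabet order: 0-9, a-z, A-Z.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def md5_short_codes(long_url, salt=""):
    """Return the four 6-character candidate codes derived from one MD5 digest."""
    # hexdigest() is always 32 hexadecimal characters.
    digest = hashlib.md5((salt + long_url).encode("utf-8")).hexdigest()
    codes = []
    for start in range(0, 32, 8):
        # Treat 8 hex characters as a number and keep only the low 30 bits.
        value = int(digest[start:start + 8], 16) & 0x3FFFFFFF
        chars = []
        for _ in range(6):
            # Each 5-bit group (0-31) is an index into the alphabet.
            # Assumption: groups are consumed from the lowest bits upward.
            chars.append(ALPHABET[value & 0x1F])
            value >>= 5
        codes.append("".join(chars))
    return codes


if __name__ == "__main__":
    # Hypothetical long URL, used only for demonstration.
    print(md5_short_codes("https://example.com/some/very/long/path?x=1"))
```

The same input always produces the same four candidates, so a caller that hits a collision on one candidate can fall back to the next before giving up.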
### 4.3 Random number algorithm
This one is simpler: pick 6 characters at random from the 62-character array. Generating short link codes this way is simple and easy, but duplicate collisions will inevitably occur.
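A minimal sketch of the random approach, assuming Python's standard `secrets` module as the random source. The `existing` set is a hypothetical stand-in for the database lookup that a real service would need in order to detect duplicates, and the function names are illustrative only.

```python
import secrets

# The 62-character array: 0-9, a-z, A-Z.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def random_code(length=6):
    """Pick `length` characters at random from the alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def unique_random_code(existing, length=6, max_attempts=10):
    """Retry until an unused code is found, then record it in `existing`.

    `existing` is a set standing in for a database uniqueness check.
    """
    for _ in range(max_attempts):
        code = random_code(length)
        if code not in existing:
            existing.add(code)
            return code
    raise RuntimeError("could not find an unused code")


if __name__ == "__main__":
    used = set()
    print(unique_random_code(used))
```

The retry loop is where the cost of this approach shows up: every new code needs a duplicate check before it can be stored.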
### 4.4 Algorithm comparison
The first algorithm avoids collisions as long as the auto-increment ID is handled correctly. The ID can come from a database auto-increment primary key, so generating a short code takes only one database operation: insert the row, read back the primary key ID, then compute the short code from it.

The second and third algorithms are actually similar. Both rely on values generated by the program that can collide, so every insert into the database needs a duplicate check, which is less efficient.
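The first approach (insert, read back the auto-increment ID, convert it to base 62) might look like the sketch below. It uses an in-memory SQLite database purely for illustration; the `links` table, its columns, and the function names are hypothetical.

```python
import sqlite3

# Alphabet order 0-9, a-z, A-Z, as assumed in the base conversion section.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base62(number):
    """Convert a non-negative integer to base 62 by repeated division."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    # Remainders come out lowest digit first, so reverse them.
    return "".join(reversed(digits))


def open_demo_db():
    """Create an in-memory database with a hypothetical `links` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE links ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, long_url TEXT NOT NULL)"
    )
    return conn


def create_short_code(conn, long_url):
    """One insert per short code: store the URL, then encode the new row ID."""
    cursor = conn.execute("INSERT INTO links (long_url) VALUES (?)", (long_url,))
    conn.commit()
    return to_base62(cursor.lastrowid)


if __name__ == "__main__":
    print(to_base62(1000))  # g8, matching the worked example
    db = open_demo_db()
    print(create_short_code(db, "https://example.com/a/long/url"))
```

Because the code is derived from the primary key, no duplicate check is needed. To get codes of at least 6 characters, the ID sequence would have to start at 916132832 or an equivalent offset would have to be added before encoding.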
## 5. Security
Although short links are easy to pass along and remember, the small number of characters in the link makes them more vulnerable to brute-force and guessing attacks: an attacker can easily enumerate the links formed by every combination of characters.

Therefore, sending private URLs such as password reset links through short links is not recommended, and links that carry permissions or sensitive information must be protected by secondary authentication.
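A rough back-of-the-envelope calculation shows why enumeration is practical. The request rate and the number of live links used below are purely hypothetical inputs, not measurements, and the function names are illustrative.

```python
def keyspace(alphabet_size, length):
    """Number of distinct codes of exactly `length` characters."""
    return alphabet_size ** length


def seconds_to_enumerate(alphabet_size, length, requests_per_second):
    """Time to try every code once at a given (assumed) request rate."""
    return keyspace(alphabet_size, length) / requests_per_second


def average_guesses_per_hit(alphabet_size, length, live_links):
    """Rough average number of random guesses before one matches a live link."""
    return keyspace(alphabet_size, length) / live_links


if __name__ == "__main__":
    # Hypothetical attacker sending 1000 requests per second.
    for n in (4, 6):
        days = seconds_to_enumerate(62, n, 1000) / 86400
        print(n, "characters:", keyspace(62, n), "codes,", round(days, 1), "days")
    # Hypothetical service with one million live 6-character links.
    print(average_guesses_per_hit(62, 6, 1_000_000))
```

The more live links a service holds, the fewer guesses an attacker needs on average to land on a valid one, which is why sensitive links should not rely on the short code alone.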