The site uses a verification code (CAPTCHA) precisely to block automated programs such as crawlers. As far as I know, the usual way to crack verification codes is AI-based image recognition. Ready-made tools for this seem to exist, but their accuracy is not very high.
For verification code problems there are two options. First, you can use an API from a professional service provider (they rely on machine learning or artificial intelligence), such as Youyoutu. Second, you can write your own recognition program; here is a project for reference: https://github.com/luyishisi/...
One solution is to log in manually in a browser, extract the cookies, and send them directly with the crawler's requests.
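The cookie approach above might look roughly like this. It is only a minimal sketch using the standard library: the URL, the cookie name `sessionid`, and the helper `build_request` are made up for illustration, and the real cookie names depend on the site you logged in to.

```python
import urllib.request


def build_request(url, cookies):
    """Build a GET request that carries cookies copied from a logged-in browser.

    `cookies` is a dict of name -> value, e.g. copied from the browser's
    developer tools. The names used below are hypothetical.
    """
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(
        url,
        headers={
            "Cookie": cookie_header,
            # Many sites reject the default Python user agent.
            "User-Agent": "Mozilla/5.0",
        },
    )


# Hypothetical URL and cookie value; paste the real ones from your browser.
req = build_request("https://example.com/protected", {"sessionid": "PASTE_VALUE_HERE"})

# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     html = resp.read().decode("utf-8", errors="replace")
```

Because the login (and the verification code) was handled by a human in the browser, the crawler never has to solve the code itself; it only works for as long as the server keeps accepting those cookies.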
Picture 1 is easy to handle: the verification code is just an image, so it can be read with image processing (OCR).
Picture 2 is more troublesome. With the first method, the digits are overlaid on the text, so extracting the content from the image is harder. I don't have a good method for the second case; I hope someone with experience in this area can help answer.
Verification codes exist to keep out machines and crawlers. If your automated crawler could easily bypass one, could it still be called a verification code? You should first find out how the verification code mechanism works, and then judge whether it can really be bypassed as easily as you imagine. In short, unless the site's verification code implementation has a loophole, you cannot bypass the mechanism; you can only recognize the text in the image. OCR (Optical Character Recognition) is the technology for that. OCR is the process in which an electronic device (such as a scanner) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then uses character recognition methods to translate those shapes into computer text.
Basic steps for verification code recognition:
1. Preprocessing
2. Grayscale
3. Binarization
4. Denoising
5. Segmentation
6. Recognition
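The middle steps of the list above (grayscale, binarization, denoising, segmentation) can be sketched in plain Python on a small pixel grid. This is only an illustration under simple assumptions: the image is a list of rows of (r, g, b) tuples, a fixed threshold of 128 separates ink from background, noise means isolated single pixels, and characters do not touch each other. All function names are made up for this sketch, and the final recognition step is left out; each segment would be handed to an OCR engine or a trained classifier.

```python
def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples to 0-255 luminance values."""
    return [[int(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in rgb]


def binarize(gray, threshold=128):
    """Mark dark pixels as ink (1) and light pixels as background (0)."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def denoise(binary):
    """Clear ink pixels that have no ink among their 8 neighbours."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            # Count ink in the 3x3 window, minus the pixel itself.
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ) - 1
            if neighbours == 0:
                out[y][x] = 0
    return out


def segment(binary):
    """Split the image into character column ranges by vertical projection.

    Returns half-open (start, end) column spans that contain ink.
    """
    w = len(binary[0])
    col_has_ink = [any(row[x] for row in binary) for x in range(w)]
    spans, start = [], None
    for x, ink in enumerate(col_has_ink):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, w))
    return spans
```

Vertical projection only works when there is a blank column between characters. For staggered or overlapping characters it returns one wide span instead of separate characters, which is exactly why that kind of verification code is so much harder.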
In short, verification code recognition has a high barrier to entry and a high cost, and there is no way around that.
For example, in the picture below, the characters are staggered and overlapping, which makes them hard to recognize.
You can use a verification code service, such as 9eu, which is the one I use.
The easiest way is to copy the cookies out of the browser and hard-code them in your crawler, but cookies expire after a while.
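To get a rough idea of how long a copied cookie will last, you can read the Expires attribute from the Set-Cookie response header. The sketch below uses only the standard library; the header string and the cookie name `sessionid` are invented for illustration, and the server may still invalidate a session earlier than the date it advertises.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from http.cookies import SimpleCookie


def cookie_expiry(set_cookie_header, name):
    """Return the expiry of cookie `name` as a datetime, or None for a session cookie."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    expires = jar[name]["expires"]
    if not expires:
        return None  # no Expires attribute: lasts until the browser session ends
    return parsedate_to_datetime(expires)


def is_expired(set_cookie_header, name, now=None):
    """True if the cookie's advertised expiry time has already passed."""
    now = now or datetime.now(timezone.utc)
    expiry = cookie_expiry(set_cookie_header, name)
    return expiry is not None and expiry <= now


# Hypothetical header value, as it might appear in the browser's developer tools.
header = "sessionid=abc123; Expires=Wed, 01 Jan 2025 00:00:00 GMT; Path=/"
```

When `is_expired` returns True, or the site starts redirecting to the login page again, log in manually once more and copy the fresh cookies.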
For complex verification codes, the more efficient and time-saving approach is to connect to a coding platform (a service that solves verification codes for you) and let its staff solve them manually.