This tutorial records the experience of using Laravel-s to hold off the Baidu crawler. I hope it will be helpful to anyone facing the same problem!
What is LaravelS

LaravelS is a glue project that quickly integrates Swoole into Laravel or Lumen to give them better performance.

GitHub address: https://github.com/hhxsv5/laravel-s

Why use LaravelS
After our Baidu mini program launched, the Baidu crawler's high QPS (concurrency) pushed the CPU to full load and took the server down. The server had 4 cores, 8 GB of memory, and 5 Mbps of bandwidth. What do you do at that point?
Tune the php-fpm parameters and set pm to static. Static mode performs better than dynamic mode: with the number of child processes set to 255 or even higher, more children means more concurrency can be absorbed, but also more memory consumed. Conclusion: it helps to a point, but under real high concurrency it is not enough.
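For reference, the static-mode settings live in php-fpm's pool config (e.g. www.conf; the path and the numbers here are illustrative, not from the original article):

    ; fixed-size pool: faster under load, but all workers are allocated up front
    pm = static
    pm.max_children = 255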
Give feedback to Baidu asking it to lower the crawler's crawl frequency. Conclusion: by the time they act on it, it is far too late, but filing the feedback is still worthwhile.
Load balancing, letting other servers share the pressure. The premise is that you have enough servers, they all deploy the same code, and the business those servers normally handle must not be affected. Or temporarily rent N servers from some cloud, but you never know when the crawler will come or go, so that is unrealistic.
Which brings us to the topic of this article: using Laravel-s to speed up the HTTP response.
Because we did not record QPS figures for each period at the time, there is no way to draw precise conclusions; we can only compare the machine load before and after the change.

Before deployment: the CPU was fully loaded, the machine went down N times and was effectively paralyzed, and the external 5 Mbps bandwidth was saturated.

After deployment: CPU usage dropped sharply right away. After temporarily upgrading the bandwidth to 15 Mbps, CPU usage reached 60% and the external bandwidth was still fully occupied (you have to hand it to the Baidu crawler: it will take every bit of bandwidth you give it).

Conclusion: at least a 5x performance improvement.
The crawler fetches only some of the pages, so converting the whole live project to laravel-s is also unrealistic. We only need to split the crawled pages out and deploy them on laravel-s separately:

1. Create a new empty project whose business logic handles only the crawled pages.

2. Deploy laravel-s, test the API, and run an ab stress test (a command sketch follows the nginx example below).

3. In the live project, proxy the page paths the crawler fetches to the new project, e.g. to 127.0.0.1:6501:

    location ~ ^/v1/test.* {
        proxy_pass http://127.0.0.1:6501;
        proxy_set_header Host $host;
    }
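As a rough sketch of step 2, assuming LaravelS is installed per its README (the package name and commands come from the LaravelS project; the port and path are the ones used in this article):

    composer require hhxsv5/laravel-s
    # publish the config file (config/laravels.php)
    php artisan laravels publish
    # start the Swoole-based server
    php bin/laravels start
    # simple ab stress test: 10000 requests, 100 concurrent
    ab -n 10000 -c 100 http://127.0.0.1:6501/v1/test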
Notes:

1. In config/laravels.php, the number of worker processes enabled is twice the number of CPU cores (see the config excerpt after these notes).

2. The application is resident in memory, so every time you change the code you need to restart laravel-s.

3. Because of point 2, database connections cannot be released, so enable persistent connections in the mysql section of config/database.php:

    'options' => [
        // enable persistent connections
        \PDO::ATTR_PERSISTENT => true,
    ],
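To make note 1 concrete, here is a minimal excerpt of what config/laravels.php can look like. This assumes the stock LaravelS config layout (keys may differ by version); the port matches the nginx example above:

    // config/laravels.php (excerpt; keys per the stock LaravelS config)
    return [
        'listen_ip'   => '127.0.0.1',
        'listen_port' => 6501,
        'swoole'      => [
            // twice the 4 CPU cores of the machine described above
            'worker_num' => 8,
        ],
    ];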