python - pyspider scheduler stops scheduling and takes a long time to restart

Question

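I am currently running pyspider (0.3.9) on Python 2.7.5 with about 200 projects. Some of them are stopped, and a little over 100 are running. projectdb and resultdb use MongoDB, and the collections hold over a million records. The task data for some projects in projectdb also runs to tens of...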

巴扎黑 · Answer

When the scheduler stops scheduling, does it stop for all projects or only for the one you are trying to restart?

Search the scheduler log for lines of the form "project %s updated, status:%s, paused:%s, %d tasks" to see whether the scheduler knows that the project status has changed.
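A minimal sketch of that log check, assuming the scheduler log is available as plain text lines. The regular expression only mirrors the format string quoted above; the function name and the sample lines are hypothetical and made up for illustration.

```python
import re

# Mirrors the format string quoted above:
# "project %s updated, status:%s, paused:%s, %d tasks"
PROJECT_UPDATED = re.compile(
    r"project (\S+) updated, status:(\S+), paused:(\S+), (\d+) tasks"
)


def find_project_updates(lines, project=None):
    """Return (project, status, paused, task_count) for each matching line.

    If `project` is given, only updates for that project are returned.
    """
    updates = []
    for line in lines:
        match = PROJECT_UPDATED.search(line)
        if match is None:
            continue
        name, status, paused, tasks = match.groups()
        if project is None or name == project:
            updates.append((name, status, paused, int(tasks)))
    return updates


# Hypothetical log lines, for illustration only (not real pyspider output).
sample_log = [
    "INFO project demo updated, status:RUNNING, paused:False, 42 tasks",
    "INFO project other updated, status:STOP, paused:False, 0 tasks",
    "INFO some unrelated message",
]

for update in find_project_updates(sample_log, project="demo"):
    print(update)
```

If the project you changed never shows up in the result after you edit its status, the scheduler has probably not picked up the change.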

unknown project: this should not appear if the project really exists.
not processing pack: this is normal. After the scheduler restarts, it can no longer track tasks that were dispatched before the restart.
On startup, the scheduler has to restore the status of all active tasks from the database. With many tasks, this does take a long time.

巴扎黑 · Answer

The cause has been found. In the pyspider source code, the status_count query in the MongoDB backend (the mongodb package under database) is very slow when the amount of data is extremely large, which makes the scheduler take a very long time to start.
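One possible way to make such a count cheaper is to index the status field and issue one filtered count per status instead of grouping over the whole collection. The sketch below is an assumption, not a confirmed description of pyspider's code: the status codes, the function names, and the in-memory stand-in class are hypothetical. With a real PyMongo collection, create_index and count_documents are the calls used here (count_documents needs PyMongo 3.7 or later).

```python
# Assumed task status codes; check them against your pyspider version.
ASSUMED_STATUSES = (1, 2, 3, 4)


def ensure_status_index(collection):
    """Create an index on `status` so per-status counts can use it."""
    return collection.create_index("status")


def status_count(collection, statuses=ASSUMED_STATUSES):
    """Count tasks per status with one filtered query each, skipping zeros."""
    counts = {}
    for status in statuses:
        total = collection.count_documents({"status": status})
        if total:
            counts[status] = total
    return counts


class InMemoryCollection(object):
    """Hypothetical stand-in for a PyMongo collection, so the sketch runs
    without a MongoDB server. It implements only the two calls used above."""

    def __init__(self, documents):
        self.documents = list(documents)
        self.indexes = []

    def create_index(self, field):
        self.indexes.append(field)
        return field + "_1"

    def count_documents(self, query):
        return sum(
            1 for doc in self.documents
            if all(doc.get(key) == value for key, value in query.items())
        )


tasks = InMemoryCollection(
    [{"taskid": "a", "status": 1},
     {"taskid": "b", "status": 1},
     {"taskid": "c", "status": 2}]
)
ensure_status_index(tasks)
print(status_count(tasks))
```

Against a real deployment you would pass the project's task collection in place of InMemoryCollection. Whether this actually helps depends on the data distribution and the MongoDB version, so measure before and after.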