


java - PHP or python for data collection and analysis, what are the more mature frameworks?
I now need to automatically collect data from the article list of a website and the actual content in the list. The id of each article can be obtained in the list, and each article is passed through a unified interface (the parameter brings the article id that is The corresponding json can be obtained) and there is some data that needs to be collected and then analyzed.
Are there any relatively mature frameworks or wheels that can meet my needs? (It needs to be multi-threaded and can run stably 24/7 because the number of collections is huge)
In addition, I would like to ask how to store the collected content (millions to tens of millions). There is some numerical data in the data that needs statistical analysis. Can I use mysql? Or are there other more mature and simple wheels that can be used?
Reply content:
I now need to automatically collect data from the article list of a website and the actual content in the list. The id of each article can be obtained in the list, and each article is passed through a unified interface (the parameter brings the article id that is The corresponding json can be obtained) and there is some data that needs to be collected and then analyzed.
Are there any relatively mature frameworks or wheels that can meet my needs? (It needs to be multi-threaded and can run stably 24/7 because the number of collections is huge)
In addition, I would like to ask how to store the collected content (millions to tens of millions). There is some numerical data in the data that needs statistical analysis. Can I use mysql? Or are there other more mature and simple wheels that can be used?
If it is data analysis.
map-reduce does log analysis
Dpark can solve PV and UV analysis
Spark is also good.
After producing the data report, you can use Pandas for analysis and display. .
If it is data collection. There are many tools.
Why do I think you want to start a search engine? . . The quantity is relatively large. Distributed stuff is recommended.
It is not practical to use MYSQL. . .
Young man, isn’t this what you want from a reptile?
Crawler framework: scrapy
Database selection: You can use MySQL to index at your level for another 500 years
You can also try MongoDB
You didn’t say anything about the language or environment. For multi-threading, nodejs and python are currently generally used. Both of these can use mysql and the like to store data. Millions or tens of millions is not a problem.
Have you ever played with python selenium + PhantomJs?
This is scrapy in python language or this is

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



PHP is a server-side scripting language used for dynamic web development and server-side applications. 1.PHP is an interpreted language that does not require compilation and is suitable for rapid development. 2. PHP code is embedded in HTML, making it easy to develop web pages. 3. PHP processes server-side logic, generates HTML output, and supports user interaction and data processing. 4. PHP can interact with the database, process form submission, and execute server-side tasks.

PHP is suitable for web development and content management systems, and Python is suitable for data science, machine learning and automation scripts. 1.PHP performs well in building fast and scalable websites and applications and is commonly used in CMS such as WordPress. 2. Python has performed outstandingly in the fields of data science and machine learning, with rich libraries such as NumPy and TensorFlow.

Golang and C each have their own advantages in performance competitions: 1) Golang is suitable for high concurrency and rapid development, and 2) C provides higher performance and fine-grained control. The selection should be based on project requirements and team technology stack.

The core benefits of PHP include ease of learning, strong web development support, rich libraries and frameworks, high performance and scalability, cross-platform compatibility, and cost-effectiveness. 1) Easy to learn and use, suitable for beginners; 2) Good integration with web servers and supports multiple databases; 3) Have powerful frameworks such as Laravel; 4) High performance can be achieved through optimization; 5) Support multiple operating systems; 6) Open source to reduce development costs.

PHP has shaped the network over the past few decades and will continue to play an important role in web development. 1) PHP originated in 1994 and has become the first choice for developers due to its ease of use and seamless integration with MySQL. 2) Its core functions include generating dynamic content and integrating with the database, allowing the website to be updated in real time and displayed in personalized manner. 3) The wide application and ecosystem of PHP have driven its long-term impact, but it also faces version updates and security challenges. 4) Performance improvements in recent years, such as the release of PHP7, enable it to compete with modern languages. 5) In the future, PHP needs to deal with new challenges such as containerization and microservices, but its flexibility and active community make it adaptable.

Python is more suitable for beginners, with a smooth learning curve and concise syntax; JavaScript is suitable for front-end development, with a steep learning curve and flexible syntax. 1. Python syntax is intuitive and suitable for data science and back-end development. 2. JavaScript is flexible and widely used in front-end and server-side programming.

Golang is more suitable for high concurrency tasks, while Python has more advantages in flexibility. 1.Golang efficiently handles concurrency through goroutine and channel. 2. Python relies on threading and asyncio, which is affected by GIL, but provides multiple concurrency methods. The choice should be based on specific needs.

PHP is suitable for web development and rapid prototyping, and Python is suitable for data science and machine learning. 1.PHP is used for dynamic web development, with simple syntax and suitable for rapid development. 2. Python has concise syntax, is suitable for multiple fields, and has a strong library ecosystem.
