PHP or Python for data collection and analysis: what are the more mature frameworks?

Oct 22, 2016 am 12:14 AM
c++ java node.js php python

I need to automatically collect data from a website's article list and from the content of each article in that list. Each article's id can be obtained from the list, and every article is served through a unified interface (passing the article id as a parameter returns the corresponding JSON). Some of this data needs to be collected and then analyzed.

Are there any relatively mature frameworks or ready-made libraries that can meet these needs? (It needs to be multi-threaded and able to run stably 24/7, because the volume of data to collect is huge.)

I would also like to ask how to store the collected content (millions to tens of millions of records). Some of the data is numerical and needs statistical analysis. Can I use MySQL, or are there other mature and simple tools for this?

Reply content:

For data analysis:
MapReduce is used for log analysis
DPark can handle PV and UV (page view / unique visitor) analysis
Spark is also a good option
Once the data report is produced, you can use pandas for further analysis and display.
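For reference, PV (page views) counts every hit and UV (unique visitors) counts distinct visitors. Below is a minimal MapReduce-style sketch of that calculation in plain Python, assuming each log record is a hypothetical `(day, visitor_id, url)` tuple. On real data the map and reduce steps would run on Hadoop MapReduce, DPark or Spark; here the group-by-day that a framework's shuffle step would do is folded into the reduce function.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical log record layout: (day, visitor_id, url)
LogRecord = tuple[str, str, str]


def map_phase(records: Iterable[LogRecord]) -> Iterator[tuple[str, str]]:
    """Map step: emit one (day, visitor_id) pair per page view."""
    for day, visitor_id, _url in records:
        yield day, visitor_id


def reduce_phase(pairs: Iterable[tuple[str, str]]) -> dict[str, tuple[int, int]]:
    """Reduce step: per day, PV = number of views, UV = number of distinct visitors."""
    views: dict[str, int] = defaultdict(int)
    visitors: dict[str, set[str]] = defaultdict(set)
    for day, visitor_id in pairs:
        views[day] += 1
        visitors[day].add(visitor_id)
    return {day: (views[day], len(visitors[day])) for day in views}


logs = [
    ("2016-10-21", "u1", "/a"),
    ("2016-10-21", "u1", "/b"),
    ("2016-10-21", "u2", "/a"),
    ("2016-10-22", "u3", "/c"),
]
print(reduce_phase(map_phase(logs)))
# {'2016-10-21': (3, 2), '2016-10-22': (1, 1)}
```

The result maps each day to `(PV, UV)`: user `u1` viewing two pages on the same day adds 2 to PV but only 1 to UV.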

For data collection, there are many tools.

Why do I get the feeling you want to build a search engine... The volume is fairly large, so I recommend a distributed setup.
Using MySQL is not practical.

Young man, isn't what you want just a crawler?

  1. Crawler framework: scrapy

  2. Database: at your scale, MySQL with proper indexes will hold up for another 500 years

You can also try MongoDB
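A minimal sketch of the storage side, using Python's built-in `sqlite3` as a stand-in for MySQL so it runs without a server. The table layout and the `category`/`views` fields are hypothetical (the question doesn't say what the JSON contains). The idea is to keep the raw JSON, pull the numeric fields you want to analyze into their own columns, and index the columns you group or filter by.

```python
import json
import sqlite3

# sqlite3 stands in for MySQL so this runs without a server; the table,
# index and GROUP BY query look much the same in MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE articles (
        article_id INTEGER PRIMARY KEY,   -- id taken from the article list
        category   TEXT NOT NULL,         -- hypothetical field
        views      INTEGER NOT NULL,      -- hypothetical numeric field to analyze
        raw_json   TEXT NOT NULL          -- full JSON response, kept for reprocessing
    )
    """
)
# Index the column used for grouping/filtering so aggregates stay fast at millions of rows.
conn.execute("CREATE INDEX idx_articles_category ON articles (category)")


def save_article(payload: dict) -> None:
    """Insert or overwrite one article's JSON payload, so re-crawls don't create duplicates."""
    conn.execute(
        "INSERT OR REPLACE INTO articles (article_id, category, views, raw_json) "
        "VALUES (?, ?, ?, ?)",
        (payload["id"], payload["category"], payload["views"], json.dumps(payload)),
    )


for item in [
    {"id": 1, "category": "tech", "views": 120},
    {"id": 2, "category": "tech", "views": 80},
    {"id": 3, "category": "news", "views": 50},
    {"id": 1, "category": "tech", "views": 130},  # article 1 crawled again
]:
    save_article(item)
conn.commit()

# Per-category statistics: article count, total views, average views.
stats = conn.execute(
    "SELECT category, COUNT(*), SUM(views), AVG(views) "
    "FROM articles GROUP BY category ORDER BY category"
).fetchall()
print(stats)
# [('news', 1, 50, 50.0), ('tech', 2, 210, 105.0)]
```

In MySQL the upsert would be written as `INSERT ... ON DUPLICATE KEY UPDATE` (or `REPLACE INTO`) instead of SQLite's `INSERT OR REPLACE`.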

You didn't mention the language or environment. For concurrent (multi-threaded or asynchronous) crawling, Node.js and Python are the common choices today. Both can store data in MySQL and similar databases, and millions or tens of millions of records are not a problem.
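A rough sketch of multi-threaded collection in Python using the standard library's `ThreadPoolExecutor`, assuming the article ids are already known from the list page. `API_URL` is a placeholder because the real interface isn't given. `fetch` is injectable so the crawler can be exercised without network access, and a simple retry with exponential backoff keeps one failed request from stopping a 24/7 run.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# Placeholder endpoint (hypothetical): the question only says "a unified interface"
# that returns JSON when given an article id.
API_URL = "https://example.com/api/article?id={article_id}"


def http_fetch(article_id: int) -> dict:
    """Download and decode the JSON for one article id."""
    url = API_URL.format(article_id=article_id)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_with_retry(fetch: Callable[[int], dict], article_id: int,
                     attempts: int = 3, backoff: float = 1.0) -> Optional[dict]:
    """Call fetch(article_id), retrying with exponential backoff; None if every attempt fails."""
    for attempt in range(attempts):
        try:
            return fetch(article_id)
        except Exception:  # in a 24/7 crawler, log the error here instead of crashing
            if attempt < attempts - 1:
                time.sleep(backoff * 2 ** attempt)
    return None


def crawl(article_ids: list[int], fetch: Callable[[int], dict] = http_fetch,
          workers: int = 8, backoff: float = 1.0) -> dict[int, dict]:
    """Fetch many articles concurrently and return {article_id: payload} for the successes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        payloads = pool.map(lambda i: fetch_with_retry(fetch, i, backoff=backoff), article_ids)
        return {i: p for i, p in zip(article_ids, payloads) if p is not None}


if __name__ == "__main__":
    # Offline demo with a fake fetch: article 3 always fails and is dropped.
    def fake_fetch(article_id: int) -> dict:
        if article_id == 3:
            raise OSError("simulated network error")
        return {"id": article_id}

    print(crawl([1, 2, 3], fetch=fake_fetch, backoff=0))
    # {1: {'id': 1}, 2: {'id': 2}}
```

Threads are a reasonable fit here because the job is I/O-bound and CPython releases the GIL while waiting on the network. For much larger volumes, a framework like scrapy (which is asynchronous rather than threaded) or a distributed setup is the next step.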

Have you tried Python with Selenium + PhantomJS?

This is scrapy in python language or this is
