Big data full-stack development language – Python-Python Tutorial-php.cn

巴扎黑

Mar 29, 2017 03:51 PM

Some time ago, ThoughtWorks held a community event in Shenzhen featuring a talk titled "Fullstack JavaScript", about using JavaScript for front-end, server-side, and even database (MongoDB) development. A web application developer needs to learn only one language to implement an entire application.

This inspired me to realize that Python deserves to be called the full-stack development language of big data, because Python is a popular language in cloud infrastructure, DevOps, big data processing, and other fields.

Field	Popular languages
Cloud infrastructure	Python, Java, Go
DevOps	Python, Shell, Ruby, Go
Web crawlers	Python, PHP, C++
Data processing	Python, R, Scala

Just as you can write a complete web application knowing only JavaScript, you can build a complete big data processing platform knowing only Python.

Cloud infrastructure

These days, if a system doesn't support cloud platforms, massive data volumes, and dynamic scaling, we hardly dare claim that we do big data; at most, we'd tell people we do business intelligence (BI).

Cloud platforms fall into two kinds: private clouds and public clouds. OpenStack, the popular private cloud platform, is written in Python. Its former rival CloudStack loudly emphasized at launch that it was written in Java and claimed this as an advantage over Python. Yet in early 2015, Citrix, the company behind CloudStack, announced that it would join the OpenStack Foundation, which suggested that CloudStack's days were numbered.

If building your own private cloud seems like too much trouble, you can use a public cloud. AWS, GCE, Azure, Alibaba Cloud, and Qingyun all provide Python SDKs; at the time of writing, GCE provided only Python and JavaScript SDKs, and Qingyun provided only a Python SDK. Clearly, cloud platforms attach great importance to Python.

When it comes to infrastructure, we have to mention Hadoop. Hadoop is no longer the first choice for big data processing, because its MapReduce engine is not fast enough. However, two of its components, HDFS and YARN, keep gaining popularity. Hadoop is written in Java and has no official Python support, but many third-party libraries (pydoop, hadoopy, etc.) wrap its APIs.
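Besides those wrapper libraries, Hadoop ships a Streaming utility that accepts any executable as mapper and reducer, as long as it reads lines from stdin and writes tab-separated key/value lines to stdout. Below is a minimal word-count sketch in that style; it is not code from pydoop or hadoopy. The function names and the "map"/"reduce" command-line switch are illustrative assumptions, and run_local only imitates Hadoop's shuffle-and-sort step with sorted() so the logic can be tried without a cluster.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every word, the key/value line format Streaming expects."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_pairs: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; relies on the input being sorted by key."""
    parsed = (pair.rstrip("\n").split("\t", 1) for pair in sorted_pairs)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines: Iterable[str]) -> list[str]:
    """Hypothetical test helper: map, then sort (standing in for the shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Assumed switch: "map" or "reduce" selects the role; anything else runs a local demo.
    stage = sys.argv[1] if len(sys.argv) > 1 else "local"
    if stage == "map":
        sys.stdout.writelines(line + "\n" for line in mapper(sys.stdin))
    elif stage == "reduce":
        sys.stdout.writelines(line + "\n" for line in reducer(sys.stdin))
    else:
        print("\n".join(run_local(["big data big python", "python data"])))
```

Under Hadoop Streaming, the same file would be invoked as the mapper with the argument map and as the reducer with reduce. Hadoop, not the script, guarantees that the reducer sees its input sorted by key, which is what makes the groupby step valid.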

Spark, the replacement for Hadoop MapReduce, is said to be up to 100 times faster. It is written in Scala but offers development interfaces for Scala, Java, and Python; a platform hoping to win over the many data scientists who work in Python could hardly leave Python out. HDFS alternatives such as GlusterFS and Ceph provide Python support directly. Mesos, an alternative to YARN, is implemented in C++ and, besides C++, provides bindings for Java and Python.

DevOps

DevOps has a Chinese name that roughly translates as "development doing its own operations and maintenance". In the Internet era, staying competitive means testing new ideas quickly and delivering business value safely, reliably, and as early as possible. The technical practices DevOps advocates, such as automated build, test, and deployment and system metrics, are indispensable in this era.

How easy automated builds are depends on the application. For a Python application, tools such as setuptools, pip, virtualenv, tox, and flake8 make automated builds very simple. Moreover, because almost every Linux system ships with a Python interpreter, automating tasks with Python requires no extra software to be preinstalled.

For automated testing, the Python-based Robot Framework is a favorite among enterprise applications, and it is language-independent: the system under test can be written in any language. Cucumber also has many supporters, and its Python counterpart, Lettuce, can do exactly the same job. For automated performance testing, Locust is attracting growing attention.

Among configuration management tools, the older Chef and Puppet are written in Ruby and remain strong. However, the newer generation, Ansible and SaltStack, both written in Python, are more lightweight than those two, are welcomed by more and more developers, and have begun to put considerable pressure on their predecessors.

For system monitoring and metrics, the traditional Nagios is gradually declining, newcomers such as Sensu are well received, and New Relic, delivered as a cloud service, has become standard for startups. None of these is implemented in Python, but connecting Python to them is not difficult.
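One common way to hook Python into these tools is a check script that follows the Nagios plugin convention: print a one-line status (optionally followed by | and performance data) and exit with 0 for OK, 1 for WARNING, 2 for CRITICAL, or 3 for UNKNOWN. Sensu can run checks written to the same exit-code convention. The following is a minimal disk-usage check as a sketch; the thresholds, the default path, and the function names are assumptions chosen for illustration.

```python
import shutil

# Exit codes defined by the Nagios plugin convention; Sensu checks use the same scheme.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_pct: float, warn_pct: float, crit_pct: float) -> tuple[int, str]:
    """Map a usage percentage to a state plus a status line with performance data."""
    perfdata = f"used={used_pct:.1f}%;{warn_pct};{crit_pct}"
    if used_pct >= crit_pct:
        state, label = CRITICAL, "CRITICAL"
    elif used_pct >= warn_pct:
        state, label = WARNING, "WARNING"
    else:
        state, label = OK, "OK"
    return state, f"DISK {label} - {used_pct:.1f}% used | {perfdata}"


def check_disk(path: str = "/", warn_pct: float = 80.0, crit_pct: float = 90.0) -> tuple[int, str]:
    """Check the filesystem holding `path`; report UNKNOWN if it cannot be read."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero capacity"
    return classify(100.0 * usage.used / usage.total, warn_pct, crit_pct)


def main(argv: list[str]) -> int:
    """Print the one-line status and return the exit code the monitoring system reads."""
    code, message = check_disk(argv[0] if argv else "/")
    print(message)
    return code


if __name__ == "__main__":
    main([])
```

Deployed as a real plugin, the script would end with sys.exit(main(sys.argv[1:])) so that Nagios or Sensu sees the exit code; the sketch returns the code instead so the functions can be imported and tested.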

Beyond these tools, Python-based PaaS platforms that provide complete DevOps functionality, such as Cloudify and Deis, are not yet widespread but have already attracted a lot of attention.

Web crawlers

Where does the data in big data come from? Apart from a few companies that can generate large amounts of data themselves, most have to rely on crawlers that capture Internet data for analysis.

Web crawling is one of Python's traditional strengths. The popular crawler framework Scrapy, the HTTP toolkit urllib2, the HTML parser Beautiful Soup, and the XML parser lxml are all capable libraries in their own right.
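To give a feel for the parsing half of a crawler without third-party packages, the sketch below swaps in the standard library's html.parser for Beautiful Soup, and urllib.request (Python 3's successor to urllib2) for fetching. LinkExtractor, extract_links, and fetch are hypothetical names; fetch is included for completeness but needs network access, so only the parser is exercised in the demo.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags (a stdlib stand-in for Beautiful Soup)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        # html.parser lower-cases tag and attribute names before calling this hook.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links such as "/about" against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    """Parse an HTML string and return the links it contains, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page with urllib.request (needs network access; not called below)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    page = '<p><a href="/about">About</a> <a href="https://example.org/">Elsewhere</a></p>'
    print(extract_links(page, "https://example.com/index.html"))
    # prints ['https://example.com/about', 'https://example.org/']
```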

However, a web crawler is not as simple as opening web pages and parsing HTML. An efficient crawler must support large numbers of flexible concurrent operations, often crawling thousands or even tens of thousands of pages at the same time. The traditional thread-pool approach wastes a lot of resources: once the thread count reaches the thousands, most system resources go into thread scheduling. Because Python supports coroutines well, many concurrency libraries such as Gevent and Eventlet have been built on them, and there are also distributed task frameworks such as Celery. ZeroMQ, which is considered more efficient than AMQP, was also quick to provide a Python version. With support for high concurrency, web crawlers can truly reach big data scale.
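The coroutine idea can be sketched with the standard library's asyncio instead of Gevent or Eventlet (a swapped-in technique, not either library's API): all fetches share one thread, and a semaphore caps how many are in flight at once. fake_fetch is a hypothetical stand-in that sleeps instead of doing network I/O; a real crawler would call an asynchronous HTTP client at that point.

```python
import asyncio


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP GET: yields to the event loop, then returns a fake page."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def crawl(urls: list[str], max_concurrency: int = 100) -> tuple[dict[str, str], int]:
    """Fetch every URL on one thread, with at most max_concurrency requests in flight.

    Returns the pages keyed by URL and the peak number of simultaneous fetches observed.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def worker(url: str) -> tuple[str, str]:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once max_concurrency fetches are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                body = await fake_fetch(url)
            finally:
                in_flight -= 1
        return url, body

    results = await asyncio.gather(*(worker(url) for url in urls))
    return dict(results), peak


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(10_000)]
    pages, peak = asyncio.run(crawl(urls, max_concurrency=500))
    print(f"fetched {len(pages)} pages, at most {peak} at once")
```

Because each waiting fetch is a suspended coroutine rather than an OS thread, ten thousand pending requests are far cheaper than ten thousand threads, and max_concurrency bounds the load placed on the target sites.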

The crawled text then needs word segmentation, and Python holds its own here too: the well-known natural language processing package NLTK and Jieba, which specializes in Chinese word segmentation, are both powerful tools for the job.
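Chinese text has no spaces between words, so a segmenter has to decide where each word begins and ends. Jieba itself builds a word graph from a prefix dictionary, picks the most probable path with dynamic programming, and falls back on an HMM for unknown words. The sketch below instead shows forward maximum matching, a simpler classic dictionary-based method, just to illustrate the task; the miniature dictionary in the demo is made up.

```python
from typing import Iterable


def forward_max_match(text: str, dictionary: Iterable[str]) -> list[str]:
    """Segment text greedily from left to right, always taking the longest dictionary word.

    A character that starts no dictionary word becomes a single-character token.
    """
    words = set(dictionary)
    max_len = max((len(word) for word in words if word), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first and shrink it until it is a known word;
        # a single character is always accepted, so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in words:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    # A made-up miniature dictionary, for illustration only.
    vocabulary = {"大数据", "数据", "全栈", "开发", "语言", "开发语言"}
    print(forward_max_match("大数据全栈开发语言", vocabulary))
    # prints ['大数据', '全栈', '开发语言']
```

Greedy matching is fast but can pick the wrong split on ambiguous strings, which is one reason libraries such as Jieba score whole candidate segmentations instead.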

Data processing

Everything is ready except the east wind (a Chinese idiom for the one missing ingredient), and here the east wind is the data processing algorithm. From statistical theory to data mining, machine learning, and the deep learning theory of recent years, data science is in an era of a hundred flowers blooming. So which programming languages do data scientists use?

In theoretical research, R may be the most popular language among data scientists, but its problems are also obvious. Because R was created by statisticians, its syntax is somewhat quirky. Moreover, R still has a long way to go in engineering before it can support large-scale distributed systems. Many companies therefore use R for prototyping and, once the algorithm is settled, translate it into an engineering language.

Python is also one of data scientists' favorite languages. Unlike R, Python is an engineering language in its own right: algorithms that data scientists implement in Python can be used directly in products, which helps big data startups save costs. Precisely because data scientists love Python and R, Spark provides very good support for both languages to win them over.

Python has many data processing libraries. The high-performance scientific computing libraries NumPy and SciPy lay a solid foundation for more advanced algorithms, and matplotlib makes plotting in Python as easy as in Matlab. Scikit-learn and Milk implement many machine learning algorithms. Theano uses GPU acceleration for high-performance symbolic mathematics and multi-dimensional array computation, and Pylearn2, built on top of Theano, is an important member of the deep learning field. And of course there is Pandas, a data processing library widely used in engineering; its DataFrame design borrows from R and later inspired the Spark project to implement a similar mechanism.

By the way, there is also IPython, a tool so useful that I almost took it for part of the standard library and forgot to introduce it. IPython is an interactive Python environment that shows you the result of each piece of code in real time. By default, IPython runs on the command line; running "ipython notebook" opens it in a web page instead, and figures drawn with matplotlib are displayed inline in the IPython Notebook.

Notebook files can be shared with others so that they can reproduce your results in their own environment; if they have no suitable environment, a notebook can also be converted directly into HTML or PDF.

Why Python

It is precisely because application developers, operations engineers, and data scientists all like Python that Python has become the full-stack development language of big data systems.

For developers, Python's elegance and simplicity are undoubtedly its biggest attraction. Run import this in the Python interactive shell and read the Zen of Python, and you will understand why Python is so appealing. The Python community has always been vibrant. Unlike the explosive growth of packages in the Node.js community, the number of Python packages has grown at a relatively steady pace, and package quality is relatively high. Many people criticize Python for its strict whitespace (indentation) rules, but it is precisely this strictness that gives Python an advantage over other languages in large-scale projects; the OpenStack projects, with more than 2 million lines of code in total, are proof.
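Following that suggestion, import this prints the Zen of Python. The small sketch below captures the printed text with contextlib.redirect_stdout so it can be inspected, and also decodes this.s, where CPython keeps the same text ROT13-encoded. That attribute is an implementation detail of the this module rather than a documented API, so treat it as a curiosity.

```python
import codecs
import contextlib
import io

# `import this` prints the Zen of Python as a side effect; capture that output here.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    import this

printed = buffer.getvalue()
zen_lines = [line for line in printed.splitlines() if line]

# CPython stores the same text ROT13-encoded in this.s (an implementation detail).
decoded = codecs.decode(this.s, "rot13")

print(zen_lines[0])
print(f"{len(zen_lines) - 1} aphorisms follow the title")
print("Matches the decoded source:", decoded.strip() == printed.strip())
```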

For operations engineers, Python's biggest advantage is that almost every Linux distribution ships with a Python interpreter. Shell is powerful, but its syntax is not elegant, and writing complex tasks in it is painful. Using Python instead of Shell for complex tasks is liberating for operations staff.
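As an illustration of replacing a Shell pipeline with Python, the sketch below does roughly what "awk '{print $9}' access.log | sort | uniq -c" does for a web server access log (count HTTP status codes), and additionally lists the most requested paths that returned errors. The Common Log Format input, the sample lines, and the function names are assumptions; only the standard library is used, so it runs on a stock Linux Python.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path
from typing import Iterable

# Common Log Format: host ident user [time] "request" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines: Iterable[str], top: int = 3) -> tuple[Counter, list[tuple[str, int]]]:
    """Count status codes and find the most frequent paths that returned 4xx/5xx."""
    statuses: Counter = Counter()
    error_paths: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip malformed lines rather than miscounting them
        status = match["status"]
        statuses[status] += 1
        if status[0] in "45":
            error_paths[match["path"]] += 1
    return statuses, error_paths.most_common(top)


def summarize_file(path: Path) -> tuple[Counter, list[tuple[str, int]]]:
    """Stream a log file line by line, like a Shell pipeline would."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return summarize(handle)


if __name__ == "__main__":
    sample = (
        '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326\n'
        '10.0.0.2 - - [10/Oct/2000:13:55:37 -0700] "GET /missing HTTP/1.0" 404 209\n'
        "garbage line\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "access.log"  # hypothetical file name
        log.write_text(sample, encoding="utf-8")
        statuses, errors = summarize_file(log)
    print(dict(statuses), errors)
```

Unlike a fixed awk field index, the regular expression names each field and simply skips lines it cannot parse, which is the kind of robustness that becomes painful to add in Shell.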

For data scientists, Python is simple yet powerful. Compared with C/C++, it spares you a lot of low-level work, so models can be validated quickly; compared with Java, its syntax is concise and expressive, and the same work takes only about a third of the code; compared with Matlab and Octave, Python is more mature as an engineering language. More than one programming expert has said that Python is the most suitable language for university computer science programming courses (MIT's introductory computer science course uses Python), because Python lets students learn the most important thing about programming: how to solve problems.

By the way, Microsoft took part in PyCon 2015 and made a high-profile announcement about improving the Python programming experience on Windows, including Python support in Visual Studio and better compilation of Python C extensions on Windows. One can imagine a future in which Python ships as a default component of Windows.
