Detailed comparison between the R language and Python

爱喝马黛茶的安东尼
Release: 2019-06-15 13:17:10
Original

R and Python each have their own merits. Which one to learn depends on your actual needs; ideally, learn both.



01 Development purpose

R Language

R was developed by statisticians, with a clear mission from the start: statistical analysis, plotting, and data mining. As a result, a great deal of statistical theory and knowledge is built into the R ecosystem.

If you have some statistics background, R makes working with all kinds of models and complex formulas enjoyable, because you can almost always find a package that implements what you need and call it with just a few lines of code.


Python


Python's creator originally intended it to be an open language for non-professional programmers. Elegance, clarity, and simplicity are its hallmarks, which is why you often hear the slogan "Life is short, I use Python".

Data analysis, web crawling, software development, artificial intelligence, and more: as a versatile glue language, Python serves many purposes, and its learning paths are correspondingly diverse.


02 Target users

Although both are popular in the data science community, the right tool depends on your field and the problem you want to solve, and that varies from person to person.

R Language


R was initially used mostly in academic research and survey work, and gradually spread to the corporate and commercial world. Users do not need a computer science background: statistics, finance, economics, nuclear power, environmental science, healthcare, logistics management, and even the humanities all have a foothold in R.

Likewise, because R is an efficient standalone tool for data exploration and statistical analysis, people with a good background in mathematical statistics will feel especially comfortable using it. It comes with modules such as base (core functionality), mle (maximum likelihood estimation), ts (time series analysis), and mva (multivariate statistical analysis); in current R versions, ts and mva have been merged into the stats package, and mle() lives in stats4.


Python


Compared with R, whose coding conventions are less standardized, Python is known for its concise syntax, which is especially friendly to people with a little programming experience and removes many stumbling blocks along the way.

Complete programming novices can also get started with Python, and it is used across industries such as finance, healthcare, management, and communications.


If, beyond data analysis, you also need to integrate with web applications, connect to and read from data sources, or call other languages, Python is the more convenient choice: a "one-stop solution".


03 Learning Curve

Which one is harder to learn? This is one of the questions beginners care about most before getting started.

In fact, because everyone's background and available learning time differ, there is no black-and-white answer. That is why R and Python users on forums hold such different opinions about how hard each one is to pick up.


R Language


Getting started with R is not difficult once you understand the basics and the logic of the language. If you have a solid foundation in mathematical statistics, you will feel more and more comfortable as you learn; if you have no mathematical background at all, the difficulty rises noticeably.

Python


Python values readability and ease of use, and its learning curve is relatively gentle. It is friendly to beginners, but if you want to go deeper and broaden your direction, you still need to master many packages and how to use them.


If you really want to compare the learning curves of the two, first clarify what you are learning for.


04 Industry Selection & Development Direction

There is plenty of data online comparing the popularity of R and Python. Overall, Python ranks higher, mainly because R is used almost exclusively in data science, while Python, as a general-purpose language, is used far more widely.


R language


Scenarios for applying R: data exploration, statistical analysis, data visualization

Positions that use R skills: data analysts, data scientists, investment analysts, tax professionals, managers, scientific researchers, etc.


Development direction: combining domain knowledge from various industries to do in-depth business data processing and statistical analysis.


Python

Scenarios for applying Python: data analysis, web crawling, system programming, graphics processing, text processing, database programming, network programming, web programming, database connections, artificial intelligence, machine learning, etc.

Positions using Python: data architect, data analyst, data engineer, data scientist, program developer, etc.


Development direction: combining domain knowledge from various industries to do all kinds of development or collaborative work.


05 Comparative analysis of advantages and disadvantages

Here comes the key part! In practice, each tool has its own strengths, weaknesses, and focus. Knowing which point matters most to you is the key to choosing.

Data visualization

Words are not as good as tables, and tables are not as good as pictures. R and visualization are a perfect match; essential visualization packages include ggplot2, ggvis, googleVis, and rCharts. Thanks to complete statistical models and careful attention to detail, R lets you produce a polished, well-laid-out data graphic with one or a few lines of code, so you can clearly see the characteristics and trends of the data.

Python also has good visualization libraries, such as Matplotlib, Seaborn, Bokeh, and Pygal. They can produce graphics as attractive as R's, but you need to write more code yourself to specify the details, such as line or bar chart styling, the spacing and scale of the horizontal and vertical axes, and color choices.

Data Analysis

R has more built-in functions for data analysis. For example, you can call the built-in summary() function directly, and the data frame is one of R's built-in data structures.

Python relies on third-party packages, such as pandas and statsmodels, to provide powerful data analysis features.
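For comparison, the six numbers that R's built-in summary() prints for a numeric vector can be reproduced in Python. This is a minimal sketch using only the standard library; the helper name r_style_summary is hypothetical, and in everyday work pandas' describe() fills this role. It assumes quartiles computed with statistics.quantiles(..., method="inclusive"), which interpolates the same way as R's default quantile type (type 7).

```python
import statistics


def r_style_summary(values):
    """Return the six statistics R's summary() prints for a numeric vector.

    Quartiles use method="inclusive", which interpolates like R's
    default quantile type 7.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


# Same input as R's summary(1:10); quartiles come out as 3.25, 5.5 and 7.75.
print(r_style_summary(list(range(1, 11))))
```

The point of the contrast: in R this is one built-in call, while in Python you either write a helper like this or pull in pandas.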

Data structures

R's data structures are simple: mainly vectors (one-dimensional), matrices (two-dimensional) and arrays (multi-dimensional), lists (unstructured data), and data frames (structured data). R's variable types are also relatively simple and are the same across different packages.

Python has richer data structures that allow more precise control over data access and memory: lists (mutable, ordered), tuples (read-only, ordered), sets (no duplicates, unordered), dictionaries (key-value pairs), and more. Different packages also define their own types: for example, pandas represents a one-dimensional column of data as a Series, while NumPy uses its ndarray.
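A short sketch of the four built-in containers named above; the values are made up for illustration. The pandas Series and NumPy ndarray mentioned in the text are third-party types and are not shown here.

```python
# Python's core built-in containers (values are made up for illustration).
scores = [88, 92, 75, 92]        # list: mutable, ordered, duplicates allowed
scores.append(60)                # lists can be modified in place

point = (3.0, 4.0)               # tuple: immutable (read-only), ordered
try:
    point[0] = 1.0               # assigning into a tuple is not allowed
    tuple_is_read_only = False
except TypeError:
    tuple_is_read_only = True

unique_scores = set(scores)      # set: unordered, duplicates removed

ages = {"alice": 31, "bob": 27}  # dict: key-value pairs
ages["carol"] = 45               # add a new key

print(scores)
print(sorted(unique_scores))     # sort for display; sets have no order
print(tuple_is_read_only, ages["carol"])
```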

In comparison, Python's richer data structures raise the learning cost, but in return they give more precise control over data and memory and can make programs run faster.

Running speed

R runs relatively slowly, and when fitting regressions on large samples, careless use can run out of memory. Big data usually has to be reduced to small data in the database first (for example, with a GROUP BY aggregation) before it is handed to R for analysis, or R has to be combined with other big data tools such as Spark.
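The "shrink big data to small data in the database" step can be sketched as follows. This is a minimal example that uses Python's built-in sqlite3 module as a stand-in for a production database; the sales table, its columns, and its rows are hypothetical. Only the aggregated result leaves the database, and that small table is what you would then hand to R (or any other analysis tool).

```python
import sqlite3

# In-memory database standing in for a large production table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 100.0),
     ("east", 50.0), ("south", 40.0)],
)

# GROUP BY collapses many rows into one summary row per region,
# so the heavy lifting happens inside the database.
small = conn.execute(
    """
    SELECT region, COUNT(*) AS n, SUM(amount) AS total, AVG(amount) AS mean
    FROM sales
    GROUP BY region
    ORDER BY region
    """
).fetchall()
conn.close()

for row in small:
    print(row)  # (region, n, total, mean)
```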

Although Python is not as fast as C, it still has a clear advantage over R: it can process gigabytes of data directly and copes somewhat better with very large computations.
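Working with gigabyte-scale input in Python typically relies on streaming the data rather than loading it all into memory. A minimal sketch, assuming a hypothetical text file with one number per line; the function name streaming_mean is illustrative, and io.StringIO stands in for a real file.

```python
import io


def streaming_mean(lines):
    """Compute the mean of one numeric value per line in a single pass.

    Only a running count and total are kept, so memory use stays constant
    no matter how large the input is.
    """
    count = 0
    total = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        total += float(line)
        count += 1
    if count == 0:
        raise ValueError("no numeric lines in input")
    return total / count


# A real run would pass an open file object, e.g. open("huge.txt")
# (hypothetical path); any iterable of lines works.
fake_file = io.StringIO("1.5\n2.5\n\n3.5\n")
print(streaming_mean(fake_file))
```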

Help documentation and self-study costs

Compared with Python, which has a broader user base, R's help documentation is less detailed and less complete. Its accompanying examples are also terse, offering only general explanations and usage.

Python's documentation, by contrast, is relatively thorough in its code, examples, and parameter descriptions, and documentation authors more often provide a complete demo, which makes it friendlier for self-learners.

In addition, Python is a general-purpose language. You can share notebooks with friends, often without them having to install anything. More importantly, it brings together people from different backgrounds. Its flexibility, extensibility, and versatility make it likely to spark more ideas.

Examples

Text mining is a common data processing and analysis scenario, for example in e-commerce product reviews, social network tags, and sentiment analysis of news.

For sentiment analysis in R, you first preprocess the data: remove useless symbols and segment the text into words. You then build a word-document-label dataset to create a document-term matrix, and finally apply machine learning algorithms from various packages.

Because sentiment analysis usually involves very large volumes of text, processing in R is relatively slow, and several packages have to be combined.
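The preprocessing and document-term matrix steps described above for R are the same in any language. Here is a minimal sketch in Python using only the standard library, assuming English text and two made-up reviews; the regex tokenizer is a simplification that stands in for real word segmentation (languages such as Chinese need a dedicated segmenter).

```python
import re
from collections import Counter

docs = [
    "Great phone, great battery!",
    "Terrible battery; screen is great.",
]


def tokenize(text):
    """Lowercase the text, drop punctuation and symbols, split into words."""
    return re.findall(r"[a-z]+", text.lower())


tokenized = [tokenize(d) for d in docs]

# Vocabulary: every distinct term, sorted so the columns have a stable order.
vocab = sorted({term for tokens in tokenized for term in tokens})

# Document-term matrix: one row per document, one column per term,
# each cell holding how often the term occurs in that document.
dtm = []
for tokens in tokenized:
    counts = Counter(tokens)
    dtm.append([counts[term] for term in vocab])

print(vocab)
for row in dtm:
    print(row)
```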

For sentiment analysis in Python, you first split sentences into words, then extract features and remove stop words, reduce dimensionality, and finally train and evaluate a classification model.
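The Python workflow just described (split into words, drop stop words, extract features, train, evaluate) can be sketched end to end with the standard library. This is a minimal illustration, not a production pipeline: the stop-word list and the four training reviews are made up, the classifier is a multinomial Naive Bayes chosen for brevity, and the dimensionality reduction step is omitted. In practice, libraries such as scikit-learn provide these pieces (for example CountVectorizer and MultinomialNB).

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "it", "this", "and", "was"}  # tiny illustrative list


def features(text):
    """Split into lowercase words and drop stop words (bag-of-words features)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def train(examples):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of training documents
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(features(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior score."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # log prior + sum of log likelihoods with add-one (Laplace) smoothing
        score = math.log(label_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for w in features(text):
            score += math.log((counts[w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


train_set = [
    ("I love this phone, it is great", "pos"),
    ("great battery and excellent screen", "pos"),
    ("terrible screen, I hate it", "neg"),
    ("awful battery, really bad", "neg"),
]
test_set = [
    ("excellent phone, love it", "pos"),
    ("bad screen, awful", "neg"),
]

model = train(train_set)
# Model evaluation: accuracy on held-out examples.
accuracy = sum(predict(model, t) == y for t, y in test_set) / len(test_set)
print(f"accuracy = {accuracy:.2f}")
```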

Python's packages are highly integrated, especially for text mining and sentiment analysis, which makes this workflow faster and easier.

Time series analysis is the theory and practice of building mathematical models, through curve fitting and parameter estimation, from time series data obtained by observing a system. It is used in finance, weather forecasting, market analysis, and more.
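As a concrete instance of "curve fitting and parameter estimation", here is a minimal sketch that fits a straight-line trend to a short series by ordinary least squares and uses it for a one-step-ahead forecast. The data are made up, the function name is hypothetical, and a linear trend is only the simplest possible model; real series usually also need seasonality or autocorrelation terms.

```python
def fit_linear_trend(series):
    """Fit y[t] = a + b * t by ordinary least squares, with t = 0, 1, 2, ...

    Returns the estimated intercept a and slope b.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    times = range(n)
    t_mean = (n - 1) / 2          # mean of 0, 1, ..., n-1
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, series))
    sxx = sum((t - t_mean) ** 2 for t in times)
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


observations = [10.0, 12.1, 13.9, 16.2, 17.8]  # made-up observations
a, b = fit_linear_trend(observations)
next_value = a + b * len(observations)           # one-step-ahead forecast
print(f"a = {a:.2f}, b = {b:.2f}, forecast = {next_value:.2f}")
```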

For time series analysis, R has a clear advantage: many packages handle both regular and irregular time series, such as xts, timeSeries, and zoo (the basic time series packages, loaded with library(xts), library(timeSeries), and library(zoo)), and FinTS (for autoregression-related tests). The results they produce are also intuitive and clear.

For time series analysis in Python, there is no particularly comprehensive package, and few functions are written specifically for forecasting. As with visualization, you need to write more code yourself. The commonly used statsmodels module supports time series differencing, modeling, and model testing.
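To make two of those steps concrete, here is a minimal sketch of first-order differencing and of fitting a simple autoregressive AR(1) model by least squares, written with the standard library only so the mechanics are visible. The function names and the data are hypothetical, and model testing is omitted; statsmodels provides production versions of this kind of model (for example its AutoReg and ARIMA classes).

```python
def difference(series, lag=1):
    """Return x[t] - x[t - lag] for every t where the lag exists."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def fit_ar1(series):
    """Estimate c and phi in x[t] = c + phi * x[t-1] + e[t] by least squares."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((p - mean_prev) * (q - mean_next) for p, q in zip(x_prev, x_next))
    var = sum((p - mean_prev) ** 2 for p in x_prev)
    if var == 0:
        raise ValueError("series is constant; phi cannot be estimated")
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


levels = [100.0, 102.0, 101.0, 104.0, 106.0, 105.0, 108.0]  # made-up trending series
changes = difference(levels)  # first differences remove the linear upward trend
c, phi = fit_ar1(changes)
print(changes)
print(f"c = {c:.3f}, phi = {phi:.3f}")
```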

Do these two examples give you a feel for the difference?

No tool is simply good or bad; the choice depends on the specific problem you want to solve.


Source: php.cn
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn