Python TutorialThe column introduces the overview data.
Recommended (free): Python Tutorial
Article Directory
1. Fundamentals of the big data era
The development status of the big data industry:
Now data has shownexplosive
growth, and there may be 100% of data every minute :
The domestic big data application status is as follows (from CSDN):
It can be seen that the application of big data has reached a certain scale, but there is still a lot of room for development. .
Talent needs mainly include:
Data Analyst2. Data analyst career prospects
Problems that data analysts need to solve:
Estimated demand, allocation Production Capacity
In the era of big data, the ability to interpret data is even more needed.star products
.
The key is to find the star product, which requires counting the total turnover of bread, and then calculating the relative proportion of each type of bread to the total turnover, and giving priority to the production of product combinations that can account for 70% of the turnover. This will use the statistical distribution table and histogram. This analysis method is also called the ABC analysis method, as follows:
Evaluate the effectiveness of the marketing plan
Q: If you want to sell bread online, which kind of advertising is more effective?
A: Write two types of copywriting and advertise them for a period of time to see how effective they are. To compare advertising effectiveness, the best way is to use statistical randomized controlled experiments
, where two types of advertising appear randomly. After a period of time, observe which advertising effect is better, and then use it on a large scale. Advertising that is more effective.
Product Quality Control
You need to know the average weight of the bread first, and then sample the bread to see if the weight of the bread shows a bell-shaped curve with a normal distribution? If it deviates from the curve, it may indicate a problem with the quality of the bread. as follows:
A good data analyst is a good product planner and a leader in the industry;
In IT companies, excellent data analysts are very promising Become a senior member of the company.
The workflow of a data analyst is as follows:
The three major tasks of a data analyst:
8 skills required by data analysts:
Three major abilities required by data analysts:
Typical Data Analyst Growth History:
3. The road to becoming a data analyst
Self-cultivation to become a data analyst:
The skills that data analysts need to possess are as follows:
Python to help you learn: Python is not only a programming language , and is the basis of data mining machine learning and other technologies, which facilitates the establishment of automated workflows;
It is not difficult to get started with Python, and its mathematical requirements are not too high. The important thing is to know how to express an algorithmic logic in language;
Python has many encapsulated tool libraries and commands. What needs to be done is to use mathematical methods to solve a problem and build it.
(1) The biggest feature of Python is that it has a huge and active
Scientific Computing community , the trend of using python for scientific computing is becoming more and more obvious. (2) Because Python has continuously improved libraries, it has become a major alternative for data processing tasks. Combined with its strong strength in general programming, you can just use Python as a language to build data-based applications. Central applications, including:
1.Python version
Python is divided into two major versions: 3.X and 2.X. 2. Install Python on different systems (1)Unix & Linux system comes with python 2.7, you can execute 3. Environment variable configuration XXX\PythonXXX 4. Install pip 5. Integrated development environment selection PyCharm is a Python IDE created by JetBrains, supporting Mac OS, Windows, and Linux systems. 3. Introduction and installation of Anaconda 1.What is Anaconda The Python distribution of Scientific Computing supports Linux, Mac, and Windows systems, and has built-in commonly used scientific computing libraries. It solves two major pain points of official Python: 2. Download and install Anaconda Python3 The installation package of .8Personal Edition is enough, but the download speed from the official website is slow, so I have downloaded and sorted out the Anaconda installation package corresponding to Python 3.8. You can directly click to add the QQ group
963624318 Just download it from the group folder Python related installation package. Finally click Next. After the installation is complete, click the Win key (under Windows system) to see the recently added or application list A as shown below: At this time, you can click Anaconda Navigator, as shown below: You can see that the environment is Python 3.8.3, and the basic environment created by Anaconda is named base. It is also the default environment, and you can also see the libraries installed by default. Open the Anaconda command line tool Anaconda Powershell Prompt, enter You can also create a new conda environment through commands, such as Activate the environment and execute the command You can execute 3. Introduction to the conda tool and package management conda is a tool for package management and environment management under Anaconda. Its function is similar to the combination of pip and virtualenv. Conda’s environment management is basically the same as virtualenv. It's a similar operation. Common conda commands and their meanings are as follows: Common conda package management commands are as follows: In conda, 四、Jupyter Notebook 1.Jupyter Notebook基本介绍 Jupyter Notebook(此前被称为IPython notebook)是一个交互式笔记本,支持运行40多种编程语言。 在开始使用notebook之前,需要先安装该库: 在命令行中执行 可以看到,notebook界面由以下部分组成: 2.Jupyter Notebook的使用 在Jupyter页面下方的主要区域,由被称为单元格的部分组成。每个notebook由多个单元格构成,而每个单元格又可以有不同的用途。 如果想新建一个notebook,只需要点击New,选择希望启动的notebook类型即可。 简单使用示意如下: 可以看到,notebook可以修改之前的单元格,对其重新计算,这样就可以更新整个文档了。如果你不想重新运行整个脚本,只想用不同的参数测试某个程式的话,这个特性显得尤其强大。 再测试标题和其他代码如下: 可以看到,在顶部添加了一个notebook的标题,还可以执行for循环等语句。 3.Jupyter中使用Python Jupyter测试Python变量和数据类型如下: 测试Python函数如下: 测试Python模块如下: 可以看到,在执行出错时,也会抛出异常。 测试数据读写如下: 数据读写很重要,因为进行数据分析时必须先读取数据,进行数据处理后也要进行保存。 4.数据交互案例 加载csv数据,处理数据,保存到MongoDB数据库 有csv文件Python Data Analysis Practical Overview Data Analysis.csv和Python Data Analysis Practical Overview Data Analysis.csv,分别是商品数据和用户评分数据,如下: 如需获取数据、代码等相关文件进行测试学习,可以直接点击加QQ群
963624318 在群文件夹Python数据分析实战中下载即可。 现在需要通过Python将其读取出来,并将指定的字段保存到MongoDB中,需要在Anaconda中执行命令 Python代码如下: 在启动MongoDB服务后,运行Python代码,运行完成后,再通过Robo 3T查看数据库如下: 显然,保存数据成功。 使用Jupyter处理商铺数据 待处理的数据是商铺数据,如下: 包括名称、评论数、价格、地址、评分列表等,其中评论数、价格和评分均不规则、需要进行数据清洗。 如需获取数据、代码等相关文件进行测试学习,可以直接点击加QQ群
963624318 在群文件夹Python数据分析实战中下载即可。 Jupyter中处理如下: 可以看到,最后得到了经过清洗后的规则数据。 完整Python代码如下: 更多编程相关知识,请访问:编程教学!!
Version 3.0 of Python is often called Python 3000, or Py3k for short. This is a major upgrade compared to earlier versions of Python.
In order not to bring too much burden, Python 3.X was not designed with downward compatibility in mind. Many programs designed for earlier Python versions cannot run normally on Python 3.X.
Most third-party libraries are working hard to be compatible with Python 3.X versions.
(2) Window system
./configure
scriptmake
Visit http://www.python.org/download/
(3) Mac system
963624318 Just download it from the group folder Python related installation package.
brew install python to install the new version.
and
XXX\PythonXXX\Scripts To the environment variables, there are two ways:
Command line addition
Finally click Confirm to exit.
path=%path%;XXX\PythonXXX and
path=% respectively in CMD path%;XXX\PythonXXX\Scripts is enough.
XXX\PythonXXX and
XXX\PythonXXX\ ScriptsInstallation path is as follows:
Linux or Mac
pip install -U pip
python -m pip install -U pip
Includes
debugging, syntax highlighting, Project management, code jump, smart prompts, auto-complete, unit testing, version control and other functions.
(1) Provides package management function, solving the scenario where third-party package installation on Windows platform often fails;
(2) Provides environment management function, The function is similar to virtualenv, which solves the problem of coexistence and switching of multiple versions of Python. python -V
, and Python 3.8.3
will also be printed. conda create --name py27 python=2.7
After execution, a conda environment with a Python version of 2.7 named py27 will be created. conda activate py27
, and deactivate the command conda deactivate
. conda list
on the command line to view the installed libraries, as follows: # packages in environment at E:\Anaconda3:
#
# Name Version Build Channel
_ipyw_jlab_nb_ext_conf 0.1.0 py38_0
alabaster 0.7.12 py_0
anaconda 2020.07 py38_0
anaconda-client 1.7.2 py38_0
anaconda-navigator 1.9.12 py38_0
...
zlib 1.2.11 h62dcd97_4
zope 1.0 py38_1
zope.event 4.4 py38_0
zope.interface 4.7.1 py38he774522_0
zstd 1.4.5 ha9fde0e_0
After successful installation, conda will be added to the environment variables by default, so you can run the conda command directly in the command line window.
Command meaning
conda command
conda –h
View help
Create an environment named python36 based on python3.6 version
conda create - -name python36 python=3.6
Activate this environment
activate python36 (Windows), source activate python36 (linux/mac)
View python version
python -V
Exit the current environment
deactivate python36
Delete the environment
conda remove -n py27 --all
View all installed environments
conda info -e
Package management command meaning
Package management command
Install matplotlib
conda install matplotlib
View installed packages
conda list
Package update
conda update matplotlib
Remove package
conda remove matplotlib
anything is a package, everything is a package
, conda itself can be regarded as a package, the python environment can be regarded as a package, anaconda can also be regarded as It is a package, so in addition to ordinary third-party packages supporting updates, these 3 packages also support the following commands:
Operation
Command
Update conda itself
conda update conda
Update anaconda application
conda update anaconda
Update python, assuming the current python environment is 3.8.1 and the latest version is 3.8.2, then it will be upgraded to 3.8.2
conda update python
(1)在命令行中执行pip install jupyter
来安装;
(2)安装Anaconda后自带Jupyter Notebook。jupyter notebook
,就会在当前目录下启动Jupyter服务并使用默认浏览器打开页面,还可以复制链接到其他浏览器中打开,如下:
(1)notebook名称;
(2)主工具栏,提供了保存、导出、重载notebook,以及重启内核等选项;
(3)notebook主要区域,包含了notebook的内容编辑区。
上图中看到的是一个代码单元格(code cell),以[ ]
开头,在这种类型的单元格中,可以输入任意代码并执行。
例如,输入1 + 2
并按下Shift + Enter
,单元格中的代码就会被计算,光标也会被移动到一个新的单元格中。
不过,也可以重新计算整个notebook,只要点击Cell -> Run all
即可。conda install pymongo
安装pymongo。import pymongoclass Product:
def __init__(self,productId:int ,name, imageUrl, categories, tags):
self.productId = productId
self.name = name
self.imageUrl = imageUrl
self.categories = categories
self.tags = tags def __str__(self) -> str:
return self.productId +'^' + self.name +'^' + self.imageUrl +'^' + self.categories +'^' + self.tagsclass Rating:
def __init__(self, userId:int, productId:int, score:float, timestamp:int):
self.userId = userId
self.productId = productId
self.score = score
self.timestamp = timestamp def __str__(self) -> str:
return self.userId +'^' + self.productId +'^' + self.score +'^' + self.timestampif __name__ == '__main__':
myclient = pymongo.MongoClient("mongodb://127.0.0.1:27017/")
mydb = myclient["goods-users"]
# val attr = item.split("\\^")
# // 转换成Product
# Product(attr(0).toInt, attr(1).trim, attr(4).trim, attr(5).trim, attr(6).trim)
Python Data Analysis Practical Overview Data Analysis = mydb['Python Data Analysis Practical Overview Data Analysis']
with open('Python Data Analysis Practical Overview Data Analysis.csv', 'r',encoding='UTF-8') as f:
item = f.readline()
while item:
attr = item.split('^')
product = Product(int(attr[0]), attr[1].strip(), attr[4].strip(), attr[5].strip(), attr[6].strip())
Python Data Analysis Practical Overview Data Analysis.insert_one(product.__dict__)
# print(product)
# print(json.dumps(obj=product.__dict__,ensure_ascii=False))
item = f.readline()
# val attr = item.split(",")
# Rating(attr(0).toInt, attr(1).toInt, attr(2).toDouble, attr(3).toInt)
Python Data Analysis Practical Overview Data Analysis = mydb['Python Data Analysis Practical Overview Data Analysis']
with open('Python Data Analysis Practical Overview Data Analysis.csv', 'r',encoding='UTF-8') as f:
item = f.readline()
while item:
attr = item.split(',')
rating = Rating(int(attr[0]), int(attr[1].strip()), float(attr[2].strip()), int(attr[3].strip()))
Python Data Analysis Practical Overview Data Analysis.insert_one(rating.__dict__)
# print(rating)
item = f.readline()
# 数据读取f = open('商铺数据.csv', 'r', encoding='utf8')for i in f.readlines()[1:15]:
print(i.split(','))# 创建comment、price、commentlist清洗函数def fcomment(s):
'''comment清洗函数:用空格分段,选取结果list的第一个为点评数,并且转化为整型'''
if '条' in s:
return int(s.split(' ')[0])
else:
return '缺失数据'def fprice(s):
'''price清洗函数:用¥分段,选取结果list的最后一个为人均价格,并且转化为浮点型'''
if '¥' in s:
return float(s.split('¥')[-1])
else:
return '缺失数据'def fcommentl(s):
'''commentlist清洗函数:用空格分段,分别清洗出质量、环境及服务数据,并转化为浮点型'''
if ' ' in s:
quality = float(s.split(' ')[0][2:])
environment = float(s.split(' ')[1][2:])
service = float(s.split(' ')[2][2:-1])
return [quality, environment, service]
else:
return '缺失数据'# 数据处理清洗datalist = [] # 创建空列表f.seek(0)n = 0 # 创建计数变量for i in f.readlines():
data = i.split(',')
# print(data)
classify = data[0] # 提取分类
name = data[1] # 提取店铺名称
comment_count = fcomment(data[2]) # 提取评论数量
star = data[3] # 提取星级
price = fprice(data[4]) # 提取人均
address = data[5] # 提取地址
quality = fcommentl(data[6])[0] # 提取质量评分
env = fcommentl(data[6])[1] # 提取环境评分
service = fcommentl(data[6])[2] # 提取服务评分
if '缺失数据' not in [comment_count, price, quality]: # 用于判断是否有数据缺失
n += 1
data_re = [['classify', classify],
['name', name],
['comment_count', comment_count],
['star', star],
['price', price],
['address', address],
['quality', quality],
['environment', env],
['service', service]]
datalist.append(dict(data_re)) # 字典生成,并存入列表datalist
print('成功加载%i条数据' % n)
else:
continueprint(datalist)print('总共加载%i条数据' % n)f.close()
The above is the detailed content of Python Data Analysis Practical Overview Data Analysis. For more information, please follow other related articles on the PHP Chinese website!