Python crawler implementation: a code example for choosing a baby name

Y2J

May 10, 2017 am 11:42 AM

python crawler

Everyone runs into something in life that they never think about until it arrives, at which point it turns out to be extremely important and demands a big decision within a short time: naming your newborn baby. This article shows how to use a Python crawler to help pick a good name for your child. Readers who need it can use it as a reference.

Preface

I suspect every parent has been through this: a child has to be named within two weeks of birth, because the name is needed to apply for the birth certificate. Many people are probably like me and start out quite lost. With so many Chinese characters available, I assumed I could pick almost any of them and make a name, but it turned out not to be a casual matter at all. Whatever I came up with felt wrong, so I searched dictionaries and the web, and read Tang and Song poetry, the Book of Songs, and even martial arts novels. Yet a name I had spent a long time on would often meet objections from family members, for example that it was awkward to pronounce or sounded the same as a relative's name. I ended up in a loop of searching and rejecting that only grew more confusing.

So we went back to the Internet and searched again, and found many articles such as "A complete list of good baby boy names". These articles list hundreds or thousands of names at once, which is too overwhelming to be useful. There are also many websites and apps that score names: you enter a name and get a BaZi (eight characters) score or a WuGe (five grids) score. This feature is quite good and works as a reference. However, either names have to be entered and tested one at a time, or the site or app offers very few names, or it cannot handle requirements such as a limiting character, or it starts charging money. In the end we found nothing that really worked.

So I decided to write a program that does the following:

Its main function is to provide a batch of candidate names for reference, scored according to the baby's date and time of birth (BaZi);
The name library can be extended. For example, if you find a batch of good names from the Book of Songs online and want to see how they score, you can add them and use them;
The characters used in the name can be restricted. For example, some family trees impose a generation character. If the current generation is "国", the name must contain the character "国";
Each name in the list gets a score, so after sorting in descending order you can read the names from the highest score to the lowest;

This gives you a list of names that fits your child's date of birth, your family tree restrictions, and your own preferences, with scores attached for reference. From there you can go through the names one by one to find one you like. If you have new ideas, you can add new names to the dictionary at any time and recalculate.

Code structure of the program

Code introduction:

/chinese-name-score Code root directory
/chinese-name-score/main Code directory
/chinese-name-score/main/dicts Dictionary file directory
/chinese-name-score/main/dicts/names_boys_double.txt Dictionary file, two-character names for boys
/chinese-name-score/main/dicts/names_boys_single.txt Dictionary file, single characters for boys' names
/chinese-name-score/main/dicts/names_girls_single.txt Dictionary file, single characters for girls' names
/chinese-name-score/main/dicts/names_grils_double.txt Dictionary file, two-character names for girls
/chinese-name-score/main/outputs Output data directory
/chinese-name-score/main/outputs/names_girls_source_wxy.txt Sample output file
/chinese-name-score/main/scripts Scripts for preprocessing the dictionary files
/chinese-name-score/main/scripts/unique_file_lines.py Removes duplicate names and blank lines from a dictionary file
/chinese-name-score/main/sys_config.py System configuration, including the target URL to crawl and the dictionary file paths
/chinese-name-score/main/user_config.py User configuration, including the baby's birth year, month, day, time, sex, and other settings
/chinese-name-score/main/get_name_score.py Program entry point
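The listing describes unique_file_lines.py only by its purpose, and the article does not show its source. As a rough illustration of that preprocessing step, here is a minimal sketch; the function name unique_nonblank_lines and the sample data are hypothetical, not taken from the repository.

```python
def unique_nonblank_lines(lines):
    """Strip each line, drop blanks and duplicates, keep first-seen order."""
    seen = set()
    result = []
    for line in lines:
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result


if __name__ == "__main__":
    # Hypothetical dictionary content: one name per line, with noise
    sample = ["国浩\n", "\n", "浩然\n", "国浩\n", "  \n"]
    print(unique_nonblank_lines(sample))
```

Keeping the first-seen order means names appended at the end of a dictionary file stay at the end after cleaning.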

How to use the code:

If there is no limiting character, open the dictionary files names_boys_double.txt and names_grils_double.txt and append any name lists you have found, one name per line;
If there is a limiting character, open the dictionary files names_boys_single.txt and names_girls_single.txt and append the single characters you like, one per line;
Open user_config.py and configure it. See the next section for the configuration items;
Run the script get_name_score.py;
Check your output file in the outputs directory. It can be copied into Excel for sorting and other operations;

Program configuration

The configuration of the program is as follows:

# coding:GB18030

"""
Write your configuration here
"""

setting = {}

# Limiting character: if set, the single-character dictionary is used; otherwise the multi-character dictionary is used
setting["limit_world"] = "国"
# Surname
setting["name_prefix"] = "李"
# Sex, either 男 (male) or 女 (female)
setting["sex"] = "男"
# Province
setting["area_province"] = "北京"
# City
setting["area_region"] = "海淀"
# Year of birth (Gregorian calendar)
setting['year'] = "2017"
# Month of birth (Gregorian calendar)
setting['month'] = "1"
# Day of birth (Gregorian calendar)
setting['day'] = "11"
# Hour of birth
setting['hour'] = "11"
# Minute of birth
setting['minute'] = "11"
# Name of the output file
setting['output_fname'] = "names_girls_source_xxx.txt"


Based on the configuration item setting["limit_world"], the program automatically decides whether to use the single-character dictionary or the multi-character dictionary:

If this item is set, for example to "国", the program combines it with the characters in the single-character dictionary to form names for scoring, in both orders. For example, both Guohao and Haoguo are scored;
If this item is left as an empty string, the program reads only the two-character dictionaries, *_double.txt
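The two rules above can be sketched as a small function. This is only an illustration of the described behavior, not the code from get_name_score.py: the function name candidate_given_names, its parameters, and the choice to skip pairing the limiting character with itself are all assumptions.

```python
def candidate_given_names(limit_char, single_chars, double_names):
    """Build given-name candidates (surname not included)."""
    if not limit_char:
        # No limiting character: use the two-character dictionary as-is
        return list(double_names)
    candidates = []
    for ch in single_chars:
        if ch == limit_char:
            continue  # assumption: do not pair the limiting character with itself
        candidates.append(limit_char + ch)  # limiting character first
        candidates.append(ch + limit_char)  # limiting character second
    return candidates


if __name__ == "__main__":
    # Hypothetical dictionary entries for illustration
    print(candidate_given_names("国", ["浩", "天"], ["浩然"]))
    print(candidate_given_names("", ["浩", "天"], ["浩然"]))
```

With a limiting character, each dictionary character yields two candidates, so the number of names to score is roughly twice the size of the single-character dictionary.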

Principle of the program

This is a simple crawler. Open life.httpcn.com/xingming.asp to see the site: it is a form submitted by POST. Fill in the required parameters and click submit, and a results page opens. The bottom of the results page contains the BaZi (eight characters) score and the WuGe (five grids) score.

Getting the scores takes two steps. The first is to have the crawler submit the form automatically and fetch the results page; the second is to extract the scores from the results page.

The first step is simple and can be done with urllib2 (the code is in /chinese-name-score/main/get_name_score.py):

import urllib
import urllib2
# Python 2: URL-encode the form fields and send them as the POST body
post_data = urllib.urlencode(params)
req = urllib2.urlopen(sys_config.REQUEST_URL, post_data)
content = req.read()


Here params is a dict of form parameters. Passing the encoded data as the second argument makes urlopen send a POST request, and the result page is then read into content.
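The snippet above is Python 2 code; urllib2 does not exist in Python 3. As a hedged sketch of the same form submission in Python 3, the following builds the POST request with urllib.parse and urllib.request. The helper name build_post_request and the placeholder URL are hypothetical, and encoding the form values as GB18030 is an assumption based on the encoding used elsewhere in the article, not something verified against the site.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request


def build_post_request(url, params, encoding="gb18030"):
    """Create a POST request whose body is the URL-encoded form fields."""
    # urlencode percent-encodes the values, so the result is plain ASCII
    body = urlencode(params, encoding=encoding).encode("ascii")
    return Request(url, data=body, method="POST")


if __name__ == "__main__":
    # Placeholder URL and fields; nothing is sent over the network here
    req = build_post_request("http://example.com/form", {"xing": "李", "ming": "国浩"})
    print(req.get_method(), req.data)
    # Decoding the body again shows the fields survive the round trip
    print(parse_qs(req.data.decode("ascii"), encoding="gb18030"))
    # Sending it would look like: urllib.request.urlopen(req).read()
```

Building the request separately from sending it makes it easy to inspect the encoded body before hitting the real site.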

The parameters of params are set as follows:

params = {}

# Date type: 0 for the Gregorian calendar, 1 for the lunar calendar
params['data_type'] = "0"
params['year'] = "%s" % str(user_config.setting["year"])
params['month'] = "%s" % str(user_config.setting["month"])
params['day'] = "%s" % str(user_config.setting["day"])
params['hour'] = "%s" % str(user_config.setting["hour"])
params['minute'] = "%s" % str(user_config.setting["minute"])
params['pid'] = "%s" % str(user_config.setting["area_province"])
params['cid'] = "%s" % str(user_config.setting["area_region"])
# Preferred five elements: 0 for automatic analysis, 1 for a custom favorable element
params['wxxy'] = "0"
params['xing'] = "%s" % (user_config.setting["name_prefix"])
params['ming'] = name_postfix
# Sex: 0 means female, 1 means male
if user_config.setting["sex"] == "男":
    params['sex'] = "1"
else:
    params['sex'] = "0"

params['act'] = "submit"
params['isbz'] = "1"


The second step is to extract the required scores from the web page. BeautifulSoup4 can do this, and its syntax is simple:

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(content, 'html.parser', from_encoding="GB18030")
full_name = get_full_name(name_postfix)

# print soup.find(string=re.compile(u"姓名五格评分"))
# Look through every <p class="chaxun_b"> block for the two score labels
for node in soup.find_all("p", class_="chaxun_b"):
    node_cont = node.get_text()
    # WuGe (five grids) score: the value is in the <b> inside the label's next sibling
    if u'姓名五格评分' in node_cont:
        name_wuge = node.find(string=re.compile(u"姓名五格评分"))
        result_data['wuge_score'] = name_wuge.next_sibling.b.get_text()

    # BaZi (eight characters) score, extracted the same way
    if u'姓名八字评分' in node_cont:
        name_bazi = node.find(string=re.compile(u"姓名八字评分"))
        result_data['bazi_score'] = name_bazi.next_sibling.b.get_text()


In this way the HTML is parsed and the BaZi and WuGe scores are extracted.
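To make the extraction step easier to try without the live site or BeautifulSoup installed, here is a standard-library sketch that uses a regular expression instead. It swaps in a different technique from the article's code, and the sample markup is hypothetical: it is shaped only to be consistent with the traversal shown above (a label followed by an element that contains the score in a <b> tag). The real page may differ.

```python
import re

# Hypothetical markup, not copied from the real results page
SAMPLE_HTML = """
<p class="chaxun_b">姓名五格评分：<font color="red"><b>78.6</b></font>分</p>
<p class="chaxun_b">姓名八字评分：<font color="red"><b>61.5</b></font>分</p>
"""


def extract_score(html, label):
    """Return the first numeric <b>...</b> value after label, or None."""
    pattern = re.escape(label) + r".*?<b>\s*([0-9]+(?:\.[0-9]+)?)\s*</b>"
    match = re.search(pattern, html, re.S)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    wuge = extract_score(SAMPLE_HTML, "姓名五格评分")
    bazi = extract_score(SAMPLE_HTML, "姓名八字评分")
    print(wuge, bazi)
```

A regular expression is more brittle than a real HTML parser, so for anything beyond a quick experiment the BeautifulSoup approach above is the safer choice.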

Example of running results

1/1287 李国锦 姓名八字评分=61.5 姓名五格评分=78.6 总分=140.1
2/1287 李国铁 姓名八字评分=61 姓名五格评分=89.7 总分=150.7
3/1287 李国晶 姓名八字评分=21 姓名五格评分=81.6 总分=102.6
4/1287 李鸣国 姓名八字评分=21 姓名五格评分=90.3 总分=111.3
5/1287 李柔国 姓名八字评分=64 姓名五格评分=78.3 总分=142.3
6/1287 李国经 姓名八字评分=21 姓名五格评分=89.8 总分=110.8
7/1287 李国蒂 姓名八字评分=22 姓名五格评分=87.2 总分=109.2
8/1287 李国登 姓名八字评分=21 姓名五格评分=81.6 总分=102.6
9/1287 李略国 姓名八字评分=21 姓名五格评分=83.7 总分=104.7
10/1287 李国添 姓名八字评分=21 姓名五格评分=81.6 总分=102.6
11/1287 李国天 姓名八字评分=22 姓名五格评分=83.7 总分=105.7
12/1287 李国田 姓名八字评分=22 姓名五格评分=93.7 总分=115.7


With these scores we can sort the names, which makes the list a very practical reference.
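Sorting does not have to happen in Excel. The sketch below parses lines in the format of the sample output and orders them by total score, highest first. The regular expression is inferred from the sample lines only, and the function names parse_line and sort_by_total are hypothetical.

```python
import re

# Pattern inferred from the sample output lines in this article
LINE_RE = re.compile(
    r"^\d+/\d+\s+(\S+)\s+姓名八字评分=([\d.]+)\s+姓名五格评分=([\d.]+)\s+总分=([\d.]+)$"
)


def parse_line(line):
    """Return (name, bazi, wuge, total), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    name, bazi, wuge, total = match.groups()
    return name, float(bazi), float(wuge), float(total)


def sort_by_total(lines):
    """Parse result lines and sort them by total score, highest first."""
    rows = [row for row in map(parse_line, lines) if row is not None]
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    sample = [
        "1/1287 李国锦 姓名八字评分=61.5 姓名五格评分=78.6 总分=140.1",
        "2/1287 李国铁 姓名八字评分=61 姓名五格评分=89.7 总分=150.7",
        "3/1287 李国晶 姓名八字评分=21 姓名五格评分=81.6 总分=102.6",
    ]
    for row in sort_by_total(sample):
        print(row)
```

Lines that do not match the expected format are skipped rather than raising an error, which keeps the sorter usable on files with stray blank lines.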

Friendly reminder

The score depends on many factors, such as the time of birth, the limiting character, and the stroke count of the limiting character. These conditions mean that some names will never score highly, so don't be discouraged by that; just look for the names with relatively high scores;
Currently the program can only crawl one website, at http://life.httpcn.com/xingming.asp;
This list is for reference only. From the articles I have read, many celebrities and great figures in history had names that score very low, yet they all achieved great things. A name does have some influence, but sometimes a name that simply rolls off the tongue is the best;
After choosing a name from the list, search for it on Baidu, Renren, and similar sites, in case someone notorious has the same name or too many people share it;
The BaZi score comes from Chinese tradition, while the WuGe score was invented in Japan in modern times. You could also try the Western zodiac approach to naming. Oddly, BaZi and WuGe scores differ a lot between websites, which is further evidence that all of this is for reference only;

The code for this article has been uploaded to GitHub
