
Use Python crawler to give your child a good name

高洛峰
Release: 2017-02-20 10:13:24

Preface

I believe every parent has been through this. A name has to be chosen within two weeks of the child's birth (you need it to apply for the birth certificate), and I suspect many people were as confused as I was at first. With so many Chinese characters available, it seemed I could pick almost any character and make a name. Later I realized it is really not a casual matter: no matter what I came up with, something felt wrong. So I searched through dictionaries, looked online, and combed Tang poetry, Song lyrics, the Book of Songs, and even martial arts novels. Yet names I had pondered for a long time often met objections from family members: hard to pronounce, the same as a relative's name, awkward in the local accent, and so on. I fell into a cycle of searching and rejecting, and became more and more confused.

So we went back online and found many articles such as "A complete list of good baby boy names". These articles list hundreds or even thousands of names at once, far too many to be useful. There are also many websites and apps that test names: enter a name and you get a Bazi (eight characters) or Wuge (five grids) score. This feature is quite good and works well as a reference. However, either we had to enter names one at a time, or the site or app offered very few names, or it could not meet needs such as a required character, or it started charging. In the end we could not find anything that really worked.

So I decided to write a program that:

  1. Scores names in batches for reference, with each score calculated from the baby's birth date and time;

  2. Lets you extend the name library. For example, if you find a batch of good names from the Book of Songs online and want to see how they score, you can simply add them;

  3. Lets you require certain characters in the name. For example, some family trees prescribe a generation character: if the current generation uses "国", every name must contain "国";

  4. Gives every name a score, so that after sorting in descending order you can review the names from the highest score to the lowest.

This way you get a list of names that fits your child's birth date and time, your family-tree rules, and your own preferences, with a score for each name as a reference. From there you can go through the names one by one to find one you like. And whenever you have new ideas, you can add new names to the dictionaries at any time and recalculate.

Code structure of the program


Code overview:

  • /chinese-name-score: code root directory
  • /chinese-name-score/main: code directory
  • /chinese-name-score/main/dicts: dictionary file directory
  • /chinese-name-score/main/dicts/names_boys_double.txt: dictionary file, boys' two-character names
  • /chinese-name-score/main/dicts/names_boys_single.txt: dictionary file, single characters for boys' names
  • /chinese-name-score/main/dicts/names_girls_single.txt: dictionary file, single characters for girls' names
  • /chinese-name-score/main/dicts/names_grils_double.txt: dictionary file, girls' two-character names
  • /chinese-name-score/main/outputs: output data directory
  • /chinese-name-score/main/outputs/names_girls_source_wxy.txt: sample output file
  • /chinese-name-score/main/scripts: scripts for preprocessing the dictionary files
  • /chinese-name-score/main/scripts/unique_file_lines.py: cleans a dictionary file by removing duplicate names and blank lines
  • /chinese-name-score/main/sys_config.py: system configuration, including the target URL to crawl and the dictionary file paths
  • /chinese-name-score/main/user_config.py: user configuration, including the baby's birth year, month, day, hour, gender and other settings
  • /chinese-name-score/main/get_name_score.py: entry point of the program
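The dictionary files are plain text with one entry per line. Below is a minimal Python 3 sketch of what a cleanup script like unique_file_lines.py might do; it is not the repository's code. The function name and the UTF-8 default encoding are assumptions (if your dictionaries are saved in GB18030, like the configuration file, pass that encoding instead).

```python
import sys
from pathlib import Path


def unique_file_lines(path, encoding="utf-8"):
    """Rewrite a dictionary file in place without blank lines or duplicate entries.

    Hypothetical helper: keeps the first occurrence of each entry, preserves
    the original order, and returns the number of entries kept.
    """
    file = Path(path)
    seen = set()
    kept = []
    for line in file.read_text(encoding=encoding).splitlines():
        entry = line.strip()
        # Skip blank lines and entries that were already seen
        if entry and entry not in seen:
            seen.add(entry)
            kept.append(entry)
    file.write_text(("\n".join(kept) + "\n") if kept else "", encoding=encoding)
    return len(kept)


if __name__ == "__main__":
    # Usage: python unique_file_lines.py dicts/names_boys_single.txt ...
    for name in sys.argv[1:]:
        print(name, unique_file_lines(name))
```

Running it after appending new names keeps repeated entries from being scored twice.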

How to use the code:

  1. If you have no required character, open the dictionary files names_boys_double.txt and names_grils_double.txt and append any names you have collected, one per line;
  2. If you have a required character, open the dictionary files names_boys_single.txt and names_girls_single.txt and append the single characters you like, one per line;
  3. Open user_config.py and set the options; the configuration items are described in the next section;
  4. Run the script get_name_score.py;
  5. Check your output file in the outputs directory; you can copy it into Excel to sort it or process it further.

Program configuration

The user configuration (user_config.py) is as follows:

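# coding:GB18030

"""
Write your configuration here
"""

setting = {}

# Required character: if set, the single-character dictionaries are used; otherwise the two-character dictionaries are used
setting["limit_world"] = "国"
# Surname
setting["name_prefix"] = "李"
# Gender: 男 (male) or 女 (female)
setting["sex"] = "男"
# Province
setting["area_province"] = "北京"
# City
setting["area_region"] = "海淀"
# Year of birth (Gregorian calendar)
setting['year'] = "2017"
# Month of birth (Gregorian calendar)
setting['month'] = "1"
# Day of birth (Gregorian calendar)
setting['day'] = "11"
# Hour of birth (Gregorian calendar)
setting['hour'] = "11"
# Minute of birth (Gregorian calendar)
setting['minute'] = "11"
# Name of the output file
setting['output_fname'] = "names_girls_source_xxx.txt"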

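Depending on the configuration item setting["limit_world"], the program automatically decides whether to use the single-character or the two-character dictionaries:

  1. If it is set, for example to "国", the program combines it with every single character in the dictionary to form names and scores them; for example, both 国浩 and 浩国 are scored;

  2. If it is not set (left as an empty string), the program reads only the two-character dictionaries, *_double.txt.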
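The dictionary choice and name combination described above can be sketched as follows. This is a hypothetical Python 3 illustration, not code from the repository: the helper names, the UTF-8 default encoding, and skipping a dictionary character equal to the required one are assumptions; only the file names come from the code overview.

```python
from pathlib import Path

# Dictionary file per (sex, required character set?); file names as listed in the code overview
DICT_FILES = {
    ("男", True): "names_boys_single.txt",
    ("男", False): "names_boys_double.txt",
    ("女", True): "names_girls_single.txt",
    ("女", False): "names_grils_double.txt",
}


def candidate_names(limit_word, entries):
    """Return the given names (without surname) to score, in dictionary order."""
    entries = [e.strip() for e in entries if e.strip()]
    if not limit_word:
        # No required character: the two-character names are used as they are
        return list(dict.fromkeys(entries))
    names = []
    for ch in entries:
        if ch == limit_word:
            continue  # assumption: skip pairs such as 国国
        # Put the required character in both positions, e.g. 国浩 and 浩国
        names.append(limit_word + ch)
        names.append(ch + limit_word)
    return list(dict.fromkeys(names))  # drop duplicates, keep first occurrence


def load_candidates(dicts_dir, sex, limit_word, encoding="utf-8"):
    """Read the matching dictionary file and build the candidate list (hypothetical helper)."""
    path = Path(dicts_dir) / DICT_FILES[(sex, bool(limit_word))]
    return candidate_names(limit_word, path.read_text(encoding=encoding).splitlines())
```

Each returned given name is then prefixed with the surname and scored, which is why the sample output contains both 李国锦 and 李鸣国 style names.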

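How the program works

This is a simple crawler. Open the website http://www.php.cn/ and you will see a POST form: fill in the required parameters and click submit, and a results page opens. At the very bottom of the results page are the Bazi score and the Wuge score.

To obtain the scores, the program has to do two things: first, have the crawler submit the form automatically and fetch the results page; second, extract the scores from that page.

The first part is simple and can be done with urllib2 (the code is in /chinese-name-score/main/get_name_score.py):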

import urllib
import urllib2
# Passing data to urlopen makes it send the form as a POST request
post_data = urllib.urlencode(params)
req = urllib2.urlopen(sys_config.REQUEST_URL, post_data)
content = req.read()

Here params is a dict of form parameters. Passing the encoded data to urlopen submits it as a POST request, and content then holds the returned results page.
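urllib2 exists only in Python 2. In Python 3 the same POST can be sketched with urllib.request, as below. The function names are hypothetical, the url argument stands in for sys_config.REQUEST_URL, and encoding the form values as GB18030 is an assumption based on the results page being decoded as GB18030 in the parsing code.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_post_request(url, params, encoding="GB18030"):
    """Build (but do not send) a form POST request; params is a dict of strings."""
    # urlencode percent-encodes the values in the given charset; the body must be bytes
    body = urlencode(params, encoding=encoding).encode("ascii")
    # Supplying data makes urllib send the request as POST
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_result_page(url, params, timeout=30):
    """Submit the form and return the raw bytes of the results page."""
    with urlopen(build_post_request(url, params), timeout=timeout) as resp:
        return resp.read()
```

The page is returned as undecoded bytes so the parsing step can choose the decoding, just as the original passes from_encoding="GB18030" to BeautifulSoup.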

The parameters in params are set as follows:

params = {}

# Date type: 0 = Gregorian calendar, 1 = lunar calendar
params['data_type'] = "0"
params['year'] = str(user_config.setting["year"])
params['month'] = str(user_config.setting["month"])
params['day'] = str(user_config.setting["day"])
params['hour'] = str(user_config.setting["hour"])
params['minute'] = str(user_config.setting["minute"])
params['pid'] = str(user_config.setting["area_province"])
params['cid'] = str(user_config.setting["area_region"])
# Favorable element (喜用五行): 0 = analyze automatically, 1 = specify it yourself
params['wxxy'] = "0"
# Surname and given name; name_postfix is the given name currently being scored
params['xing'] = user_config.setting["name_prefix"]
params['ming'] = name_postfix
# Sex: 0 = female, 1 = male
if user_config.setting["sex"] == "男":
    params['sex'] = "1"
else:
    params['sex'] = "0"

params['act'] = "submit"
params['isbz'] = "1"

The second part is extracting the scores from the page, which we can do with BeautifulSoup4; its syntax is simple:

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(content, 'html.parser', from_encoding="GB18030")
full_name = get_full_name(name_postfix)

# The scores are in <p class="chaxun_b"> paragraphs near the bottom of the results page
for node in soup.find_all("p", class_="chaxun_b"):
    node_cont = node.get_text()
    # Wuge (five grids) score: the value is in the <b> inside the element right after the label
    if u'姓名五格评分' in node_cont:
        name_wuge = node.find(string=re.compile(u"姓名五格评分"))
        result_data['wuge_score'] = name_wuge.next_sibling.b.get_text()

    # Bazi (eight characters) score, located the same way
    if u'姓名八字评分' in node_cont:
        name_bazi = node.find(string=re.compile(u"姓名八字评分"))
        result_data['bazi_score'] = name_bazi.next_sibling.b.get_text()

This parses the HTML and extracts the Bazi and Wuge scores.
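If installing BeautifulSoup4 is not an option, the same extraction can be sketched with the standard library's html.parser, swapped in here for BeautifulSoup. The class name chaxun_b and the two label strings come from the code above; the rest is illustrative, including the assumption that the first number after each label is the score, and it may not match the live page.

```python
import re
from html.parser import HTMLParser

# Result keys mapped to the label text shown on the results page
LABELS = {"wuge_score": "姓名五格评分", "bazi_score": "姓名八字评分"}


class ChaxunParser(HTMLParser):
    """Collect the text content of every <p class="chaxun_b"> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "chaxun_b" in classes:
            self._inside = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside = False

    def handle_data(self, data):
        # Text of nested tags such as <b> is appended to the current paragraph
        if self._inside:
            self.paragraphs[-1] += data


def extract_scores(html):
    """Return a dict with 'wuge_score' and/or 'bazi_score' strings found in the page."""
    parser = ChaxunParser()
    parser.feed(html)
    parser.close()
    result = {}
    for text in parser.paragraphs:
        for key, label in LABELS.items():
            # Assumption: the first number following the label is the score
            match = re.search(re.escape(label) + r"\D*?(\d+(?:\.\d+)?)", text)
            if match:
                result[key] = match.group(1)
    return result
```

Decode the fetched bytes first, for example extract_scores(page.decode("GB18030")). Matching on paragraph text rather than on a specific sibling tag makes the sketch a little more tolerant of layout changes, at the cost of relying on the label being followed by the number.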

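Sample output

In the output below, 姓名八字评分 is the Bazi score, 姓名五格评分 is the Wuge score, and 总分 is their sum.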

1/1287 李国锦 姓名八字评分=61.5 姓名五格评分=78.6 总分=140.1
2/1287 李国铁 姓名八字评分=61 姓名五格评分=89.7 总分=150.7
3/1287 李国晶 姓名八字评分=21 姓名五格评分=81.6 总分=102.6
4/1287 李鸣国 姓名八字评分=21 姓名五格评分=90.3 总分=111.3
5/1287 李柔国 姓名八字评分=64 姓名五格评分=78.3 总分=142.3
6/1287 李国经 姓名八字评分=21 姓名五格评分=89.8 总分=110.8
7/1287 李国蒂 姓名八字评分=22 姓名五格评分=87.2 总分=109.2
8/1287 李国登 姓名八字评分=21 姓名五格评分=81.6 总分=102.6
9/1287 李略国 姓名八字评分=21 姓名五格评分=83.7 总分=104.7
10/1287 李国添 姓名八字评分=21 姓名五格评分=81.6 总分=102.6
11/1287 李国天 姓名八字评分=22 姓名五格评分=83.7 总分=105.7
12/1287 李国田 姓名八字评分=22 姓名五格评分=93.7 总分=115.7

With these scores we can sort the names, which makes the list a very practical reference.
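Because the output lines have a fixed format, they can also be ranked without Excel. A minimal sketch, assuming the line format of the sample output above; the regular expression and function names are illustrative, not part of the project.

```python
import re

# Matches lines like "2/1287 李国铁 姓名八字评分=61 姓名五格评分=89.7 总分=150.7"
LINE_RE = re.compile(
    r"^\d+/\d+\s+(?P<name>\S+)\s+"
    r"姓名八字评分=(?P<bazi>[\d.]+)\s+"
    r"姓名五格评分=(?P<wuge>[\d.]+)\s+"
    r"总分=(?P<total>[\d.]+)"
)


def rank_names(lines):
    """Parse result lines into (name, bazi, wuge, total) tuples, highest total first."""
    rows = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # skip anything that is not a score line
            rows.append((m["name"], float(m["bazi"]), float(m["wuge"]), float(m["total"])))
    # sorted() is stable, so names with equal totals keep their original order
    return sorted(rows, key=lambda row: row[3], reverse=True)


def to_tsv(rows):
    """Tab-separated text that pastes cleanly into a spreadsheet."""
    header = "name\tbazi\twuge\ttotal"
    return "\n".join([header] + ["\t".join(map(str, row)) for row in rows])
```

The total is read from the 总分 field rather than recomputed, so the ranking matches what the program printed.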

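Tips

  1. The score depends on many factors, such as the time of birth, the required character, and that character's stroke count. These conditions mean some names simply cannot score high; don't let that bother you, just look for the names that score relatively high;

  2. At present the program can crawl only one website: http://life.httpcn.com/xingming.asp;

  3. This list is for reference only. I have read articles noting that many famous and great people in history had very low Bazi name scores yet still achieved great things. A name does have some influence, but sometimes a name that simply sounds good is the best choice;

  4. After picking a name from the list, search for it on Baidu, Renren and similar sites, in case it is shared by someone notorious or is so common that it has become a cliché;

  5. The Bazi score comes from Chinese tradition, while the Wuge score was invented in Japan in modern times; you could also try Western astrology-based naming. Oddly, different websites give very different Bazi and Wuge scores for the same name, which further shows that these scores are only a reference.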

The code for this article has been uploaded to GitHub.

source:php.cn