Table of Contents
Crawling analysis
Home Backend Development Python Tutorial Python is a favorite name for crawling girls

Python is a favorite name for crawling girls

Jun 20, 2017 pm 02:17 PM
python girl

I went to Zhihu when I had nothing to do, and saw a lot of girls, so I grabbed a bunch of them.

Are you interested? ?

Target URL

Crawling analysis

Crawling analysis

Use pandas to operate files

import pandas as pd
fp = pd.read_excel('D:\Backup\桌面\lunzige.xlsx')

fp
Copy after login

name = fp['name'].tolist()
li1 = list(set(name))
li1

['阿蕾',
 '杨面',
 '陈10',
 '杨顺顺',
 '霧橤',
 '真顺顺真',
 '谢椿明',
 '刀刀',
 '水枪大帝',
 '倾浅',
 'Listening',
 '小火龙',
 '包子琛',
 '杨笋笋',
 '蜉蝣',
 '十元',
 '靡靡之音',
 'Real机智张',
 '陈梓小童鞋',
 '花甲',
 '窗里窗外',
 '刘梓乔',
 '璇璇97',
 'Olivia菊香小姐姐',
 '牛奶小夏目',
 '周依宁',
 '万阿咸',
 '一蓑烟雨任平生',
 '来都来了',
 '就像周一',
 'Mc蛋蛋',
 '秉剑侯',
 '李大梦Lee',
 'Diss锐雯',
 '雨音眞白',
 '半仙幺幺',
 'Natsuki是只蠢兔纸',
 '夏冰莹',
 'guuweihai',
 '阿舞',
 '肖柚妮',
 '墨脱要开',
 '芷珞',
 '舒西婷',
 'Childe0Q',
 '被压扁的海螺',
 'snow arc',
 '灰灰灰灰灰plus',
 '小兔子菲呀',
 '士多啤梨羊咩咩',
 '李小可可',
 '谁来拽我的尾巴',
 '飞鸽之舞',
 '小美',
 '樱雪绫sama',
 'zshiyao',
 '王漠里',
 'Slivan',
 '喵小虾',
 'SUSAN苏',
 '上官兰颜',
 '这个杀手不太冷',
 '看朱成碧纷思君',
 '情绪',
 '我系小忌廉',
 '一只兔',
 'June',
 '我就想改名而已',
 '温柔的大猫Leo',
 '猫芙琳',
 '以太',
 '博丽魔理沙',
 '洛丽塔',
 '羽小团',
 '娄良',
 'Rosi',
 '叶以北',
 '吃不胖的小猫',
 'Lina',
 'ingrid',
 'itttttx',
 '胡杨',
 '孙阿童',
 '林美珍',
 '赫蘿Taiga',
 '宫曼曼',
 'Yoonyicc',
 'ZW711',
 '笙箫',
 'KIKI.Liu',
 '另一只袜子',
 '荒野大嫖客',
 '少女诗',
 '芸豆豆豆豆',
 '璐璐噜',
 '棹歌',
 '梦里有只独角兽',
 'Oo澄子oO',
 '雷梅苔丝',
 'CherryZhao',
 '李萬一',
 '琴脂',
 '鹿斑比',
 'Chris姬-云烟',
 'hyoram',
 '蔗蔗蔗',
 '柚子Ruby',
 'Sheena',
 '孟德尔',
 'kaka小师妹',
 '桢视明',
 '大豆苗',
 '少女开膛手',
 '陈诗茗']
Copy after login

Then, the next step is to segment the name, jieba segmentation, you deserve it. fxsjy/jieba

<code>li2 = ''.join(li1)
li2

'阿蕾杨面陈10杨顺顺霧橤真顺顺真谢椿明刀刀水枪大帝倾浅Listening小火龙包子琛杨笋笋蜉蝣十元靡靡之音Real机智张陈梓小童鞋花甲窗里窗外刘梓乔璇璇97Olivia菊香小姐姐牛奶小夏目周依宁万阿咸一蓑烟雨任平生来都来了就像周一Mc蛋蛋秉剑侯李大梦LeeDiss锐雯雨音眞白半仙幺幺Natsuki是只蠢兔纸夏冰莹guuweihai阿舞肖柚妮墨脱要开芷珞舒西婷Childe0Q被压扁的海螺snow arc灰灰灰灰灰plus小兔子菲呀士多啤梨羊咩咩李小可可谁来拽我的尾巴飞鸽之舞小美樱雪绫samazshiyao王漠里Slivan喵小虾SUSAN苏上官兰颜这个杀手不太冷看朱成碧纷思君情绪我系小忌廉一只兔June我就想改名而已温柔的大猫Leo猫芙琳以太博丽魔理沙洛丽塔羽小团娄良Rosi叶以北吃不胖的小猫Linaingriditttttx胡杨孙阿童林美珍赫蘿Taiga宫曼曼YoonyiccZW711笙箫KIKI.Liu另一只袜子荒野大嫖客少女诗芸豆豆豆豆璐璐噜棹歌梦里有只独角兽Oo澄子oO雷梅苔丝CherryZhao李萬一琴脂鹿斑比Chris姬-云烟hyoram蔗蔗蔗柚子RubySheena孟德尔kaka小师妹桢视明大豆苗少女开膛手陈诗茗'</code><br><br><br>
Copy after login

The next step is to segment the words and create the image cloud

import jieba
seg_list = jieba.cut(li2)
word = "/".join(seg_list)
print("Full Mode: " + "/ ".join(seg_list)) 

Building prefix dict from the default dictionary ...
Dumping model to file cache C:\Users\ADMINI~1\AppData\Local\Temp\jieba.cache
Loading model cost 1.148 seconds.
Prefix dict has been built succesfully.
Full Mode: 阿蕾/ 杨/ 面陈/ 10/ 杨/ 顺顺/ 霧/ 橤/ 真/ 顺顺/ 真/ 谢椿明/ 刀刀/ 水枪/ 大帝/ 倾浅/ Listening/ 小/ 火龙/ 包子/ 琛/ 杨笋/ 笋/ 蜉蝣/ 十元/ 靡靡之音/ Real/ 机智/ 张/ 陈梓/ 小/ 童鞋/ 花甲/ 窗里/ 窗外/ 刘梓乔/ 璇/ 璇/ 97Olivia/ 菊香/ 小姐姐/ 牛奶/ 小夏目/ 周依宁/ 万/ 阿/ 咸一/ 蓑/ 烟雨任/ 平生/ 来/ 都/ 来/ 了/ 就/ 像/ 周一/ Mc/ 蛋蛋/ 秉剑侯/ 李大梦/ LeeDiss/ 锐雯雨/ 音眞白/ 半仙/ 幺/ 幺/ Natsuki/ 是/ 只/ 蠢/ 兔纸/ 夏/ 冰莹/ guuweihai/ 阿舞/ 肖柚妮/ 墨脱/ 要/ 开芷/ 珞/ 舒西婷/ Childe0Q/ 被/ 压扁/ 的/ 海螺/ snow/  / arc/ 灰灰/ 灰灰/ 灰/ plus/ 小兔子/ 菲/ 呀/ 士多啤梨/ 羊/ 咩/ 咩/ 李小/ 可可/ 谁/ 来/ 拽/ 我/ 的/ 尾巴/ 飞鸽/ 之舞/ 小美/ 樱雪/ 绫/ samazshiyao/ 王漠/ 里/ Slivan/ 喵/ 小虾/ SUSAN/ 苏/ 上官/ 兰颜/ 这个/ 杀手/ 不/ 太冷/ 看朱成碧/ 纷思君/ 情绪/ 我系/ 小忌廉/ 一只/ 兔/ June/ 我/ 就/ 想/ 改名/ 而已/ 温柔/ 的/ 大猫/ Leo/ 猫/ 芙琳/ 以太/ 博丽/ 魔理沙/ 洛丽塔/ 羽小团/ 娄良/ Rosi/ 叶/ 以北/ 吃不胖/ 的/ 小猫/ Linaingriditttttx/ 胡杨/ 孙阿童/ 林美珍/ 赫蘿/ Taiga/ 宫曼曼/ YoonyiccZW711/ 笙箫/ KIKI/ ./ Liu/ 另一只/ 袜子/ 荒野/ 大/ 嫖客/ 少女/ 诗/ 芸豆/ 豆豆/ 豆璐璐噜/ 棹/ 歌梦里/ 有/ 只/ 独角兽/ Oo/ 澄子/ oO/ 雷梅/ 苔丝/ CherryZhao/ 李萬/ 一琴脂/ 鹿斑/ 比/ Chris/ 姬/ -/ 云烟/ hyoram/ 蔗蔗蔗/ 柚子/ RubySheena/ 孟德尔/ kaka/ 小/ 师妹/ 桢视/ 明大/ 豆苗/ 少女/ 开膛手/ 陈诗/ 茗
Copy after login

The next step is to draw the image cloud. I encountered many pitfalls using jupyter. .

<code># -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
from wordcloud import WordCloud,STOPWORDS,ImageColorGenerator

# 直接从文件读取数据

text = '''阿蕾/杨/面陈/10/杨/顺顺/霧/橤/真/顺顺/真/谢椿明/刀刀/水枪/大帝/倾浅/Listening/小/火龙/包子/琛/杨笋/笋/蜉蝣/十元/靡靡之音/Real/机智/张/陈梓/小/童鞋/花甲/窗里/窗外/刘梓乔/璇/璇/97Olivia/菊香/小姐姐/牛奶/小夏目/周依宁/万/阿/咸一/蓑/烟雨任/平生/来/都/来/了/就/像/周一/Mc/蛋蛋/秉剑侯/李大梦/LeeDiss/锐雯雨/音眞白/半仙/幺/幺/Natsuki/是/只/蠢/兔纸/夏/冰莹/guuweihai/阿舞/肖柚妮/墨脱/要/开芷/珞/舒西婷/Childe0Q/被/压扁/的/海螺/snow/ /arc/灰灰/灰灰/灰/plus/小兔子/菲/呀/士多啤梨/羊/咩/咩/李小/可可/谁/来/拽/我/的/尾巴/飞鸽/之舞/小美/樱雪/绫/samazshiyao/王漠/里/Slivan/喵/小虾/SUSAN/苏/上官/兰颜/这个/杀手/不/太冷/看朱成碧/纷思君/情绪/我系/小忌廉/一只/兔/June/我/就/想/改名/而已/温柔/的/大猫/Leo/猫/芙琳/以太/博丽/魔理沙/洛丽塔/羽小团/娄良/Rosi/叶/以北/吃不胖/的/小猫/Linaingriditttttx/胡杨/孙阿童/林美珍/赫蘿/Taiga/宫曼曼/YoonyiccZW711/笙箫/KIKI/./Liu/另一只/袜子/荒野/大/嫖客/少女/诗/芸豆/豆豆/豆璐璐噜/棹/歌梦里/有/只/独角兽/Oo/澄子/oO/雷梅/苔丝/CherryZhao/李萬/一琴脂/鹿斑/比/Chris/姬/-/云烟/hyoram/蔗蔗蔗/柚子/RubySheena/孟德尔/kaka/小/师妹/桢视/明大/豆苗/少女/开膛手/陈诗/茗'''

backgroud_Image = plt.imread('girl.jpg')
wc = WordCloud( background_color = 'white',    # 设置背景颜色
                mask = backgroud_Image,        # 设置背景图片
                max_words = 2000,            # 设置最大现实的字数
                stopwords = STOPWORDS,        # 设置停用词
                font_path = 'C:/Users/Windows/fonts/msyh.ttf',# 设置字体格式,如不设置显示不了中文
                max_font_size = 300,            # 设置字体最大值
                random_state = 50,            # 设置有多少种随机生成状态,即有多少种配色方案
                )
wc.generate(text)
image_colors = ImageColorGenerator(backgroud_Image)
#wc.recolor(color_func = image_colors)
plt.imshow(wc)
plt.axis('off')
plt.show()</code><br><br>
Copy after login

0    陈诗茗    
1    李大梦Lee    
2    snow arc    
3    夏冰莹    
4    Sheena    
5    喵小虾    
6    李大梦Lee    
7    李大梦Lee    
8    以太    
9    zshiyao    
10    SUSAN苏
Copy after login

The above is the detailed content of Python is a favorite name for crawling girls. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Repo: How To Revive Teammates
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Hello Kitty Island Adventure: How To Get Giant Seeds
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

How to efficiently integrate Node.js or Python services under LAMP architecture? How to efficiently integrate Node.js or Python services under LAMP architecture? Apr 01, 2025 pm 02:48 PM

Many website developers face the problem of integrating Node.js or Python services under the LAMP architecture: the existing LAMP (Linux Apache MySQL PHP) architecture website needs...

How to solve the permissions problem encountered when viewing Python version in Linux terminal? How to solve the permissions problem encountered when viewing Python version in Linux terminal? Apr 01, 2025 pm 05:09 PM

Solution to permission issues when viewing Python version in Linux terminal When you try to view Python version in Linux terminal, enter python...

What is the reason why pipeline persistent storage files cannot be written when using Scapy crawler? What is the reason why pipeline persistent storage files cannot be written when using Scapy crawler? Apr 01, 2025 pm 04:03 PM

When using Scapy crawler, the reason why pipeline persistent storage files cannot be written? Discussion When learning to use Scapy crawler for data crawler, you often encounter a...

What is the reason why the Python process pool handles concurrent TCP requests and causes the client to get stuck? What is the reason why the Python process pool handles concurrent TCP requests and causes the client to get stuck? Apr 01, 2025 pm 04:09 PM

Python process pool handles concurrent TCP requests that cause client to get stuck. When using Python for network programming, it is crucial to efficiently handle concurrent TCP requests. ...

How to view the original functions encapsulated internally by Python functools.partial object? How to view the original functions encapsulated internally by Python functools.partial object? Apr 01, 2025 pm 04:15 PM

Deeply explore the viewing method of Python functools.partial object in functools.partial using Python...

Python Cross-platform Desktop Application Development: Which GUI Library is the best for you? Python Cross-platform Desktop Application Development: Which GUI Library is the best for you? Apr 01, 2025 pm 05:24 PM

Choice of Python Cross-platform desktop application development library Many Python developers want to develop desktop applications that can run on both Windows and Linux systems...

Python hourglass graph drawing: How to avoid variable undefined errors? Python hourglass graph drawing: How to avoid variable undefined errors? Apr 01, 2025 pm 06:27 PM

Getting started with Python: Hourglass Graphic Drawing and Input Verification This article will solve the variable definition problem encountered by a Python novice in the hourglass Graphic Drawing Program. Code...

Do Google and AWS provide public PyPI image sources? Do Google and AWS provide public PyPI image sources? Apr 01, 2025 pm 05:15 PM

Many developers rely on PyPI (PythonPackageIndex)...

See all articles