<p class="l_post l_post_bright j_l_post clearfix " data-field='{"author":{"user_id":348570172, "user_name":"\u6446\u6446\u821e\u66f2","props":null},"content":{"post_id":31489927386,"is_anonym":false,"forum_id":874949,"thread_id":2108034524,"content":"912904081@qq.com\u8c22\u8c22\u6492","post_no":94,"type":"0","comment_num":0,"props":null,"post_index":0,"pb_tpoint":null}}'> <p class="d_author"> <ul class="p_author"> ... </p>
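I want to scrape user_name and content from the outermost p tag shown above (they sit in its data-field attribute). There are many nested tags inside it, and my code ends up grabbing everything inside the p. How do I keep only the outermost part that I need?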
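import requests
from bs4 import BeautifulSoup

r = requests.get("http://tieba.baidu.com/p/2108034524?pn=4")
soup = BeautifulSoup(r.content, "lxml")
# Match only the outer post containers; read their attribute instead of their text
users = soup.find_all("p", class_="l_post")
for user in users:
    print(user["data-field"])
    # further processing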
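Then process the extracted value further. The data-field attribute is a JSON string, so it can be parsed to get user_name and content.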
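A minimal sketch of that further processing step, assuming the string comes from user["data-field"] and has the same shape as the attribute in the question. The helper name extract_fields and the sample value are made up for illustration. Only the standard json module is used.

```python
import json


def extract_fields(data_field):
    """Parse one data-field attribute value and return (user_name, content)."""
    data = json.loads(data_field)
    # "or {}" guards against a key that is missing or explicitly null
    author = data.get("author") or {}
    post = data.get("content") or {}
    return author.get("user_name"), post.get("content")


# Hypothetical sample value, shaped like the attribute shown in the question
sample = (
    '{"author": {"user_id": 1, "user_name": "example_user", "props": null},'
    ' "content": {"post_id": 2, "content": "hello", "post_no": 94}}'
)

user_name, content = extract_fields(sample)
print(user_name, content)
```

Inside the loop this would be called as extract_fields(user["data-field"]). Because only the attribute is read, the tags nested inside the p never enter the result.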