<p class="l_post l_post_bright j_l_post clearfix " data-field='{"author":{"user_id":348570172, "user_name":"\u6446\u6446\u821e\u66f2","props":null},"content":{"post_id":31489927386,"is_anonym":false,"forum_id":874949,"thread_id":2108034524,"content":"912904081@qq.com\u8c22\u8c22\u6492","post_no":94,"type":"0","comment_num":0,"props":null,"post_index":0,"pb_tpoint":null}}'> <p class="d_author"> <ul class="p_author"> ... </p>
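I want to scrape user_name and content from the attributes of this outermost p tag. There are many, many tags nested inside it, and my current code scrapes everything inside this p. How do I keep only the outermost part that I need?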
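import requests
from bs4 import BeautifulSoup

r = requests.get("http://tieba.baidu.com/p/2108034524?pn=4")
soup = BeautifulSoup(r.content, "lxml")
# Select the post containers by their l_post class
users = soup.find_all("p", class_="l_post")
for user in users:
    # Read only the tag's own data-field attribute, not its children
    print(user["data-field"])  # further processing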
Then process the extracted value further.
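The data-field value in the tag above is a JSON string, so one possible way to do the further processing is to decode it with the standard json module and pick out the two fields. This is only a minimal sketch: extract_post and sample are hypothetical names, and the sample is a trimmed copy of the attribute shown in the question, assuming every post follows the same layout.

```python
import json


def extract_post(data_field):
    """Return (user_name, content) from one post's data-field JSON string.

    Hypothetical helper; assumes the "author" and "content" keys
    are present, as in the sample tag from the question.
    """
    data = json.loads(data_field)
    return data["author"]["user_name"], data["content"]["content"]


# Trimmed copy of the data-field attribute from the question (raw string,
# so the \uXXXX escapes are left for json.loads to decode)
sample = (
    r'{"author":{"user_id":348570172, '
    r'"user_name":"\u6446\u6446\u821e\u66f2","props":null},'
    r'"content":{"post_id":31489927386,'
    r'"content":"912904081@qq.com\u8c22\u8c22\u6492","post_no":94}}'
)

user_name, content = extract_post(sample)
print(user_name, content)
```

Inside the loop this would be called as extract_post(user["data-field"]). The nested tags never come into play, because only the attribute of the outer tag is read.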