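First of all, you don't need the csv module for this at all. CSV separates columns with a half-width (ASCII) comma by default, and if a single field itself contains a half-width comma, opening the file in Excel gets awkward. I suggest using TAB as the delimiter and writing the file directly with with open(...) as fh.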
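To illustrate the delimiter problem described above, here is a minimal sketch. The rows and the helper name to_delimited are made up for the example; nothing here comes from the real page. It joins fields without any quoting and then counts how many columns a naive reader would see.

```python
# Hypothetical sample rows; the first title contains a half-width comma.
rows = [
    ("http://example.com/a.htm", "Stocks rise, bonds fall"),
    ("http://example.com/b.htm", "Market update"),
]


def to_delimited(rows, sep):
    """Join each row's fields with sep, one row per line, with no quoting."""
    return "".join(sep.join(fields) + "\n" for fields in rows)


comma_text = to_delimited(rows, ",")
tab_text = to_delimited(rows, "\t")

# Split each line on its delimiter and count the resulting columns.
comma_columns = [len(line.split(",")) for line in comma_text.splitlines()]
tab_columns = [len(line.split("\t")) for line in tab_text.splitlines()]

print(comma_columns)  # the first row is split into too many columns
print(tab_columns)    # every row keeps exactly two columns
```

This only holds as long as the fields themselves contain no TAB or newline characters, which is usually a safe assumption for link text.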
Beyond that, your code has two small problems:
1. The function get_data only needs to be called once; there is no need to call it twice.
2. The URL has an extra slash (/).
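On the extra-slash point, one option is to let the standard library join the base URL and the href instead of concatenating strings by hand. A minimal sketch using urllib.parse.urljoin (Python 3); the href values are hypothetical examples, not taken from the actual page:

```python
from urllib.parse import urljoin

BASE = "http://finance.qq.com/"

# Hypothetical href values: root-relative, relative, and already absolute.
hrefs = [
    "/a/20160901/000001.htm",
    "a/20160901/000002.htm",
    "http://stock.qq.com/x.htm",
]

# urljoin resolves each href against BASE without doubling the slash,
# and leaves an absolute URL unchanged.
joined = [urljoin(BASE, href) for href in hrefs]

for url in joined:
    print(url)
```

Whether this is worth it depends on what the href attributes on the page actually look like; if they are always root-relative, plain concatenation without the trailing slash works too.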
# -*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup

user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
URL = 'http://finance.qq.com'


def get_data(url):
    # Send the browser-like User-Agent defined above with the request
    response = requests.get(url, headers={'User-Agent': user_agent})
    soup = BeautifulSoup(response.text, 'lxml')
    # Collect every link inside the element with id "listZone"
    return soup.find('p', {'id': 'listZone'}).findAll('a')


def main():
    with open("hello.tsv", "w") as fh:
        # Header row, TAB-separated
        fh.write("url\ttitle\n")
        # One call to get_data is enough: each link yields both the URL and the title
        for item in get_data(URL + "/gdyw.htm"):
            fh.write("{}\t{}\n".format(URL + item.get("href"), item.get_text()))


if __name__ == "__main__":
    main()
You get this result because you wrote all of csvrow1 first and only then wrote csvrow2. You should iterate over csvrow1 and csvrow2 together, for example:
for i in zip(csvrow1, csvrow2):
    csvfile.write(i[0] + ',' + i[1] + '\n')
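A self-contained version of the same idea, as a rough sketch: the contents of csvrow1 and csvrow2 below are invented sample data, and an in-memory io.StringIO stands in for the real csvfile so the snippet runs without touching disk.

```python
import io

# Hypothetical sample data standing in for the scraped lists.
csvrow1 = ["http://finance.qq.com/a/1.htm", "http://finance.qq.com/a/2.htm"]
csvrow2 = ["First headline", "Second headline"]

# In-memory stand-in for the opened output file.
csvfile = io.StringIO()

# zip pairs the n-th URL with the n-th title, so each line holds one record.
for url, title in zip(csvrow1, csvrow2):
    csvfile.write(url + "," + title + "\n")

output = csvfile.getvalue()
print(output)
```

Note that zip stops at the shorter of the two lists, so if the lengths ever differ the extra items are silently dropped.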
# -*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup
import csv

user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'


def get_data(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    soup = soup.find('p', {'id': 'listZone'}).findAll('a')
    return soup


urls = []
titles = []

for url in get_data('http://finance.qq.com/gdyw.htm'):
    urls.append('http://finance.qq.com/' + url.get('href'))

for title in get_data('http://finance.qq.com/gdyw.htm'):
    titles.append(title.get_text())

data = []
for url, title in zip(urls, titles):
    row = {
        'url': url,
        'title': title
    }
    data.append(row)

with open('a.csv', 'w') as csvfile:
    fieldnames = ['url', 'title']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)
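For anyone who does go the csv.DictWriter route, it may help to see what the writer does with a title that contains a comma. The sketch below uses made-up rows and writes to an in-memory io.StringIO instead of a.csv; lineterminator="\n" is set only to keep the printed output simple and is not part of the code above.

```python
import csv
import io

# Hypothetical rows; the first title contains a half-width comma.
data = [
    {"url": "http://example.com/a.htm", "title": "Stocks rise, bonds fall"},
    {"url": "http://example.com/b.htm", "title": "Market update"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["url", "title"], lineterminator="\n")
writer.writeheader()
writer.writerows(data)

csv_text = buffer.getvalue()
print(csv_text)

# Reading it back with the csv module restores the original two fields per row.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

With the default dialect, the writer wraps the comma-containing field in double quotes, so a CSV-aware reader still sees two columns.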