javascript - Reading URLs line by line from a txt file in Python and crawling each one

WBOY

Released: 2016-06-06 20:11:35

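For my graduation project I need to crawl Coursera course data. I have already scraped the URLs of all the courses and saved them in a txt file, one course URL per line. Now I want to get the details of each course, such as the instructor, syllabus, and detail information, but that requires visiting each course's page and scraping it. I'm a novice coder and would appreciate some guidance; a bit of pseudocode would be even better. Thanks!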

Replies:

Hello! I'm not sure whether this is the answer you're looking for:

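<code>import requests

# Read one course URL per line, skipping blank lines and trailing newlines
with open("coursera.txt", "r") as f:
    url_list = [line.strip() for line in f if line.strip()]

for url in url_list:
    r = requests.get(url, timeout=10)
    # Parse r.text here to extract the instructor, syllabus, and other details</code>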

Good luck! ^_^
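
To make the loop above a little more concrete, here is a minimal sketch that separates reading the file, downloading a page, and parsing it. It is only an illustration under several assumptions: it uses the standard library's `urllib` and `html.parser` instead of `requests` so that it runs without extra installs, the function names (`read_urls`, `extract_title`, `fetch`, `crawl`) are made up for this example, and it extracts only the page `<title>` as a stand-in, since the markup that holds the instructor or syllabus on a real course page is not known here and would have to be checked by hand.

```python
import time
import urllib.request
from html.parser import HTMLParser


def read_urls(path):
    """Return the non-empty, stripped lines of a text file (one URL per line)."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


class TitleParser(HTMLParser):
    """Collects the text found inside <title> tags."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Stand-in for the real parsing step: return the page title."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch(url, timeout=10):
    """Download a page and return its body as text (this is the network call)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(urls, fetch_page=fetch, delay=1.0):
    """Fetch each URL in turn and record either its title or the error."""
    results = {}
    for url in urls:
        try:
            results[url] = extract_title(fetch_page(url))
        except Exception as exc:  # keep going if a single page fails
            results[url] = "ERROR: {}".format(exc)
        time.sleep(delay)  # pause between requests to be polite to the server
    return results
```

Calling `crawl(read_urls("coursera.txt"))` would then return a dictionary keyed by URL. Passing the download function in as `fetch_page` makes it easy to swap in `requests` or a fake fetcher for testing.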

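If you are crawling Coursera course data, I'd suggest using Scrapy. That way you don't need to collect all the course URLs in advance; you only need to write the URL-matching rules.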

Scrapy tutorial: http://scrapy-chs.readthedocs.org/zh_CN/0.24/intro/tutorial.html
Reference project: https://github.com/Junnplus/OnlineJudgeCrawlerCore
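
The idea of "writing the URL-matching rules" can be sketched in plain Python. This is not Scrapy's API (in Scrapy the same job is done by link-extraction rules attached to a spider); it only shows the underlying technique of collecting the links on a page and keeping those whose URL matches a regular expression. The names `LinkCollector`, `matching_links`, and `COURSE_PATTERN` are hypothetical, and the `/learn/` pattern is an assumed example of what a course URL might look like, not something confirmed by the thread.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def matching_links(html, base_url, pattern):
    """Return absolute, de-duplicated links whose URL matches the regex."""
    collector = LinkCollector()
    collector.feed(html)
    matched = []
    for href in collector.links:
        link = urljoin(base_url, href)  # resolve relative links
        if re.search(pattern, link) and link not in matched:
            matched.append(link)
    return matched


# Hypothetical rule: treat any URL containing "/learn/<slug>" as a course page.
COURSE_PATTERN = r"/learn/[\w-]+"
```

A crawler built this way starts from a listing page, keeps only the links that match the pattern, and visits each of them, so no pre-built txt file of URLs is needed.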
