[Python] Web crawler (3): Exception handling and classification of HTTP status codes

Aug 08, 2016, 09:27 AM

Let's start with HTTP exception handling.
When urlopen cannot handle a response, it raises a URLError.
Ordinary Python exceptions such as ValueError and TypeError can also be raised.
HTTPError is a subclass of URLError that is raised in the specific case of HTTP URLs.
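
The relationship between these exceptions can be checked without touching the network. The following is a minimal sketch that assumes Python 3, where the urllib2 names live in urllib.request and urllib.error; the helper name try_build_request is made up for illustration.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request

# HTTPError derives from URLError, so a handler for URLError also catches it.
print(issubclass(HTTPError, URLError))


def try_build_request(url):
    """Hypothetical helper: return the error message if the URL is rejected, else None."""
    try:
        Request(url)  # building a Request does not open a connection
    except ValueError as e:
        # A URL without a scheme raises a plain ValueError, not a URLError.
        return str(e)
    return None


print(try_build_request("www.example.com"))
print(try_build_request("http://www.example.com"))
```

The first call to the helper returns an "unknown url type" message, while the second returns None because the URL is well formed.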

1.URLError
URLError is usually raised when there is no network connection (no route to the specified server) or when the server does not exist.

In this case the exception has a "reason" attribute, which is a tuple (think of it as an immutable array) containing an error number and an error message.

Let's create urllib2_test06.py to try out exception handling:

    import urllib2

    req = urllib2.Request('http://www.baibai.com')
    try:
        urllib2.urlopen(req)
    except urllib2.URLError as e:
        # reason holds the error number and the error message
        print e.reason

Press F5 to run it. The printed content is:

[Errno 11001] getaddrinfo failed

That is, the error number is 11001 and the error message is "getaddrinfo failed".
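
As a rough illustration of pulling the number and the message apart, here is a sketch that assumes Python 3, where the reason is typically an exception object or a string rather than a tuple. The helper name split_reason is hypothetical, and the error is built by hand so the example runs offline.

```python
import socket
from urllib.error import URLError


def split_reason(err):
    """Hypothetical helper: return (error_number, message) from a URLError."""
    reason = err.reason
    if isinstance(reason, OSError):
        # Socket-level failures carry errno and strerror.
        return reason.errno, reason.strerror
    # Otherwise the reason is assumed to be a plain message string.
    return None, str(reason)


# Constructed manually here; urlopen would raise a similar object
# when name resolution fails.
err = URLError(socket.gaierror(11001, "getaddrinfo failed"))
print(split_reason(err))
```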


2.HTTPError
Every HTTP response from the server contains a numeric "status code".

Sometimes the status code indicates that the server cannot fulfil the request. The default handlers take care of some of these responses for you.

For example, if the response is a "redirect" that asks the client to fetch the document from a different address, urllib2 handles it for you.

For responses it cannot handle, urlopen raises an HTTPError.

Typical errors include "404" (page not found), "403" (request forbidden), and "401" (authentication required).

The HTTP status code indicates the status of the response returned over the HTTP protocol.

For example, when the client sends a request and the requested resource is retrieved successfully, the returned status code is 200, meaning the request succeeded.

If the requested resource does not exist, a 404 error is usually returned.

HTTP status codes are three-digit integers, usually divided into five classes according to the first digit, 1 to 5:

------------------------------------------------------------------------------------------

200: The request succeeded. Processing method: get the response content and process it.

201: The request was completed and a new resource was created. The URI of the new resource can be found in the response entity. Processing method: not encountered in a crawler.

202: The request was accepted, but processing has not finished yet. Processing method: block and wait.

204: The server fulfilled the request but returned no new information. If the client is a user agent, it does not need to update its document view. Processing method: discard.

300: This status code is not used directly by HTTP/1.0 applications; it only serves as the default interpretation of 3XX responses. Several versions of the requested resource are available. Processing method: process further if the program can handle it, otherwise discard.

301: The requested resource has been assigned a permanent URL and should be accessed through that URL in the future. Processing method: redirect to the assigned URL.

302: The requested resource temporarily resides at a different URL. Processing method: redirect to the temporary URL.

304: The requested resource has not been modified. Processing method: discard.

400: Bad request. Processing method: discard.

401: Unauthorized. Processing method: discard.

403: Forbidden. Processing method: discard.

404: Not found. Processing method: discard.

5XX: A status code starting with "5" means the server encountered an error and cannot continue processing the request. Processing method: discard.

------------------------------------------------------------------------------------------
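
The handling rules in this list could be encoded in a crawler roughly as follows. This is only a sketch: the function names and the action labels are invented for illustration, and treating every 4XX code like the four listed ones is an assumption.

```python
def status_class(code):
    """Return the class of a status code, i.e. its first digit (1 to 5)."""
    return code // 100


# Actions taken from the list above; the label strings are illustrative.
ACTIONS = {
    200: "process",
    201: "ignore",
    202: "wait",
    204: "discard",
    300: "process-or-discard",
    301: "redirect",
    302: "redirect",
    304: "discard",
}


def crawler_action(code):
    """Hypothetical helper: map a status code to the handling listed above."""
    if code in ACTIONS:
        return ACTIONS[code]
    if status_class(code) in (4, 5):
        # Assumption: all client and server errors are discarded.
        return "discard"
    return "unknown"


for code in (200, 302, 404, 503):
    print(code, status_class(code), crawler_action(code))
```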

The HTTPError instance that is raised has an integer 'code' attribute, which is the status code sent by the server.

Error Codes
Because the default handlers take care of redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.
BaseHTTPServer.BaseHTTPRequestHandler.responses is a very useful dictionary of response codes that lists all the response codes used by the HTTP protocol.
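
A small lookup against that dictionary might look like the sketch below. It assumes Python 3, where the class lives in http.server instead of BaseHTTPServer; the helper name phrase is made up.

```python
from http.server import BaseHTTPRequestHandler

# Each entry maps a status code to a (short phrase, longer description) pair.
responses = BaseHTTPRequestHandler.responses


def phrase(code):
    """Hypothetical helper: short phrase for a status code, or None if unknown."""
    entry = responses.get(code)
    return entry[0] if entry else None


for code in (200, 404, 500):
    print(code, phrase(code))
```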

When an error occurs, the server returns an HTTP error code together with an error page.

You can use the HTTPError instance as the response object for the returned page.

This means that, in addition to the code attribute, it also has the read, geturl, and info methods.
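
To see the response-like behaviour without contacting a server, an HTTPError can be constructed by hand. The sketch below assumes Python 3 (urllib.error); the URL, header, and page body are invented for the example.

```python
import io
from email.message import Message
from urllib.error import HTTPError

# Build the error manually so the example runs offline; urlopen would raise
# a comparable object for a 404 response.
headers = Message()
headers["Content-Type"] = "text/html"
err = HTTPError(
    "http://www.example.com/missing",
    404,
    "Not Found",
    headers,
    io.BytesIO(b"<html>page not found</html>"),
)

print(err.code)                     # status code sent by the server
body = err.read()                   # body of the error page, as bytes
print(body)
print(err.geturl())                 # URL that produced the error
print(err.info()["Content-Type"])   # response headers
```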

Let's create urllib2_test07.py to try it out:

    import urllib2

    req = urllib2.Request('http://bbs.csdn.net/callmewhy')
    try:
        urllib2.urlopen(req)
    except urllib2.HTTPError as e:
        # Only HTTPError is guaranteed to carry a status code
        print e.code
        # print e.read()

Press F5 and you can see that the 404 error code is output, which means that this page is not found.


3.Wrapping

If you want to be prepared for both HTTPError and URLError, there are two basic approaches. The second one is recommended.

Let's create urllib2_test08.py to demonstrate the first approach:

    from urllib2 import Request, urlopen, URLError, HTTPError

    req = Request('http://bbs.csdn.net/callmewhy')

    try:
        response = urlopen(req)
    except HTTPError as e:
        # Must come first: HTTPError is a subclass of URLError
        print "The server couldn't fulfill the request."
        print 'Error code: ', e.code
    except URLError as e:
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    else:
        # everything is fine
        print 'No exception was raised.'

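As in other languages, the try block is followed by handlers that catch the exception and print its contents.

One thing to note here: except HTTPError must come first, otherwise except URLError will also receive the HTTPError.
Because HTTPError is a subclass of URLError, a URLError handler placed first catches every URLError, including HTTPError.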
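
The effect of the handler order can be demonstrated offline. The following sketch assumes Python 3 (urllib.error) and raises a hand-built HTTPError; the two function names are hypothetical.

```python
import io
from urllib.error import HTTPError, URLError


def handle_http_first(exc):
    """Correct order: the subclass handler comes before the base class handler."""
    try:
        raise exc
    except HTTPError:
        return "HTTPError handler"
    except URLError:
        return "URLError handler"


def handle_url_first(exc):
    """Wrong order: URLError also matches HTTPError, so the second handler never runs."""
    try:
        raise exc
    except URLError:
        return "URLError handler"
    except HTTPError:
        return "HTTPError handler"


not_found = HTTPError("http://www.example.com/missing", 404, "Not Found", None, io.BytesIO())
print(handle_http_first(not_found))
print(handle_url_first(not_found))
```

With the wrong order, the 404 error ends up in the URLError handler and its status code is never inspected.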



Let's create urllib2_test09.py to demonstrate the second approach:

    from urllib2 import Request, urlopen, URLError, HTTPError

    req = Request('http://bbs.csdn.net/callmewhy')

    try:
        response = urlopen(req)
    except URLError as e:
        # A single handler; the attributes tell the two cases apart
        if hasattr(e, 'code'):
            print "The server couldn't fulfill the request."
            print 'Error code: ', e.code
        elif hasattr(e, 'reason'):
            print 'We failed to reach a server.'
            print 'Reason: ', e.reason
    else:
        # everything is fine
        print 'No exception was raised.'
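
The same pattern can be wrapped in a reusable function. This is a sketch under several assumptions: it targets Python 3 (urllib.request and urllib.error), the function fetch and its opener parameter are invented for illustration, and the stand-in openers exist only so the code can be tried without a network connection.

```python
import io
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch(url, opener=urlopen):
    """Hypothetical helper: return ("ok", body), ("http_error", code) or ("url_error", reason)."""
    try:
        response = opener(url)
    except URLError as e:
        # Check for code first: in Python 3 an HTTPError also exposes reason.
        if hasattr(e, "code"):
            return "http_error", e.code
        return "url_error", str(e.reason)
    else:
        return "ok", response.read()


# Stand-in openers (made up for this sketch) that simulate the three outcomes.
def opener_404(url):
    raise HTTPError(url, 404, "Not Found", None, io.BytesIO())


def opener_unreachable(url):
    raise URLError("getaddrinfo failed")


def opener_ok(url):
    return io.BytesIO(b"hello")


print(fetch("http://www.example.com/a", opener_404))
print(fetch("http://www.example.com/b", opener_unreachable))
print(fetch("http://www.example.com/c", opener_ok))
```

Passing the opener as a parameter keeps the error handling in one place and lets the crawler decide what to do with each outcome.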

