0x00 Preface
HTTP proxies should be familiar to everyone; they are used extremely widely. HTTP proxies are divided into forward proxies and reverse proxies. The latter are generally used to give users access to services behind a firewall, or for load balancing; typical examples are Nginx and HAProxy. This article discusses forward proxies.
The most common uses of an HTTP proxy are network sharing, network acceleration, and bypassing network restrictions. HTTP proxies are also commonly used for debugging web applications and for monitoring and analyzing the Web APIs called by Android/iOS apps; well-known tools for this include Fiddler, Charles, Burp Suite, and mitmproxy. An HTTP proxy can also be used to modify request/response content, adding functionality to a web application or changing its behavior without touching the server.
0x01 What is HTTP proxy
An HTTP proxy is essentially a web application, and it is not fundamentally different from any other ordinary web application. After receiving a request, the HTTP proxy determines the target host from the hostname in the Host header and the GET/POST request address, establishes a new HTTP request, forwards the request data, and then forwards the response it receives back to the client.
If the request address is an absolute URL, the HTTP proxy uses the host from that URL; otherwise it uses the Host field in the header. Let's do a simple test. Assume the network environment is as follows:
192.168.1.2 Web server
192.168.1.3 HTTP proxy server
Use telnet to test (the proxy is assumed to listen on port 8080 in this example):
$ telnet 192.168.1.3 8080
GET / HTTP/1.0
Host: 192.168.1.2
Note that two consecutive carriage returns (a blank line) are required at the end; this is an HTTP protocol requirement. After that, you will receive the page content of http://192.168.1.2/. Now let's adjust it slightly and use an absolute URL in the GET request:
$ telnet 192.168.1.3 8080
GET http://httpbin.org/ip HTTP/1.0
Host: 192.168.1.2
Note that the Host header is still set to 192.168.1.2, yet the result returns the content of the http://httpbin.org/ip page, i.e. the public IP address information (the address httpbin.org sees, which is the proxy's outbound IP).
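Conceptually, the forwarding process exercised by the tests above can be sketched in a few dozen lines of Python. The following is only an illustrative toy under assumptions of my own (a hypothetical PROXY_PORT, HTTP/1.0 style, one request per connection, no HTTPS CONNECT, no error handling), not production code:

import socket
import threading
from urllib.parse import urlsplit

PROXY_PORT = 8080  # assumed listening port for this sketch

def handle(client):
    data = client.recv(65536)
    if not data:
        client.close()
        return
    head = data.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    request_line, *header_lines = head.split("\r\n")
    _method, target, _version = request_line.split(" ", 2)
    headers = {k.lower(): v for k, _, v in (h.partition(": ") for h in header_lines)}
    if target.lower().startswith("http://"):
        # absolute URL in the request line: take the target host from the URL
        parts = urlsplit(target)
        host, port = parts.hostname, parts.port or 80
    else:
        # otherwise fall back to the Host header
        host, _, port = headers.get("host", "").partition(":")
        port = int(port or 80)
    upstream = socket.create_connection((host, port))
    upstream.sendall(data)  # forward the original request bytes as-is
    while True:             # relay the response back to the client
        chunk = upstream.recv(65536)
        if not chunk:
            break
        client.sendall(chunk)
    upstream.close()
    client.close()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("", PROXY_PORT))
server.listen(5)
while True:
    conn, _addr = server.accept()
    threading.Thread(target=handle, args=(conn,), daemon=True).start()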
As you can see from the test process above, an HTTP proxy is not a very complicated thing, as long as the original request is sent to the proxy server. When an application does not allow an HTTP proxy to be configured, the simplest workaround for a small number of target hosts is to point the target hosts' domain names at the proxy server's IP, which can be done by modifying the hosts file.
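For instance, on a *nix machine an /etc/hosts entry like the following (with a hypothetical domain) would send that domain's traffic to the proxy at 192.168.1.3. Note that this trick only works if the proxy listens on port 80 and, as described above, resolves the real target from the Host header:

# /etc/hosts
192.168.1.3    www.example.com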
0x02 Set HTTP proxy in Python program
urllib2/urllib proxy settings
urllib2 is a powerful Python standard library, if a little cumbersome to use. In Python 3, urllib2 no longer exists as such; its functionality was merged into the urllib package (mainly urllib.request). In urllib2, a ProxyHandler is used to set the proxy server:
import urllib2

proxy_handler = urllib2.ProxyHandler({'http': '121.193.143.249:80'})
opener = urllib2.build_opener(proxy_handler)
r = opener.open('http://httpbin.org/ip')
print(r.read())
You can also use install_opener to install the configured opener globally, so that every urllib2.urlopen call automatically uses the proxy:
urllib2.install_opener(opener)
r = urllib2.urlopen('http://httpbin.org/ip')
print(r.read())
In Python 3, use urllib.request instead:
import urllib.request

proxy_handler = urllib.request.ProxyHandler({'http': 'http://121.193.143.249:80/'})
opener = urllib.request.build_opener(proxy_handler)
r = opener.open('http://httpbin.org/ip')
print(r.read())
requests proxy settings
requests is one of the best HTTP libraries available, and it is the one I use most when constructing HTTP requests. Its API is very friendly and easy to use. Setting a proxy for requests is simple: just pass a proxies parameter in the form {'http': 'x.x.x.x:8080', 'https': 'x.x.x.x:8080'}. The http and https proxies are configured independently of each other.
In [5]: requests.get('http://httpbin.org/ip', proxies={'http': '121.193.143.249:80'}).json()
Out[5]: {'origin': '121.193.143.249'}
You can also set the proxies attribute directly on a session, which saves you from passing the proxies parameter with every request:
import requests

s = requests.session()
s.proxies = {'http': '121.193.143.249:80'}
print(s.get('http://httpbin.org/ip').json())
0x03 HTTP_PROXY / HTTPS_PROXY environment variables
Both urllib2 and the requests library recognize the HTTP_PROXY and HTTPS_PROXY environment variables, and automatically use the proxy once these variables are set. This is very useful when debugging through an HTTP proxy, because you can change the proxy server's IP address and port via environment variables without modifying any code. Most *nix software also recognizes the HTTP_PROXY environment variable, for example curl, wget, axel, and aria2c.
$ http_proxy=121.193.143.249:80 python -c 'import requests; print(requests.get("http://httpbin.org/ip").json())'
{u'origin': u'121.193.143.249'}

$ http_proxy=121.193.143.249:80 curl httpbin.org/ip
{
  "origin": "121.193.143.249"
}
In an IPython interactive session you often need to debug HTTP requests temporarily; you can enable or disable the HTTP proxy simply by setting os.environ['http_proxy']:
In [245]: os.environ['http_proxy'] = '121.193.143.249:80'
In [246]: requests.get("http://httpbin.org/ip").json()
Out[246]: {u'origin': u'121.193.143.249'}

In [249]: os.environ['http_proxy'] = ''
In [250]: requests.get("http://httpbin.org/ip").json()
Out[250]: {u'origin': u'x.x.x.x'}
0x04 MITM-Proxy
MITM comes from Man-in-the-Middle Attack, which refers to intercepting, monitoring, and tampering with the data exchanged between the client and the server on the network.
mitmproxy is an excellent open-source man-in-the-middle proxy written in Python. It supports SSL, transparent proxying, reverse proxying, traffic recording and replay, and custom scripts. Its functionality is somewhat similar to Fiddler on Windows, but mitmproxy is a console program without a GUI, which is nevertheless quite convenient to use. With mitmproxy you can easily filter, intercept, and modify any HTTP request/response passing through the proxy, and you can even use its scripting API to automate that interception and modification. For example:
# test.py
def response(flow):
    flow.response.headers["BOOM"] = "boom!boom!boom!"
The script above adds a header named BOOM to every HTTP response passing through the proxy. Start mitmproxy with the command mitmproxy -s 'test.py', then verify with curl: the response indeed carries an extra BOOM header.
$ http_proxy=localhost:8080 curl -I 'httpbin.org/get'
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 03 Nov 2016 09:02:04 GMT
Content-Type: application/json
Content-Length: 186
Connection: keep-alive
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
BOOM: boom!boom!boom!
...
Obviously, mitmproxy scripts can do far more than this; combined with the power of Python, many applications can be built on top of them. Beyond that, mitmproxy also provides a powerful API, and on top of that API you can build your own dedicated proxy server with custom features.
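As one more illustration of the scripting API (a hypothetical sketch in the same event-hook style as test.py above), a request hook can transparently reroute traffic from one host to another; start it the same way, with mitmproxy -s 'redirect.py':

# redirect.py -- hypothetical example: reroute requests for one host to another
def request(flow):
    if flow.request.host == "example.com":
        flow.request.host = "httpbin.org"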
Performance testing shows that mitmproxy is not especially efficient. That is fine for debugging purposes, but for production use, with a large number of concurrent requests going through the proxy, its performance falls somewhat short. I implemented a simple proxy with Twisted to add features to an internal company website and improve the user experience; I will share it another time if I get the chance.