mrjob lets you write MapReduce jobs in Python 2.5+ and run them on several platforms. You can:
Write multi-step MapReduce jobs using pure Python
Test on your local machine
Run on a Hadoop cluster
Run in the cloud using Amazon Elastic MapReduce (EMR)
Installation with pip is simple and requires no configuration. Just run: pip install mrjob
Code example:
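from mrjob.job import MRJob


class MRWordCounter(MRJob):

    def mapper(self, key, line):
        # key is unused for raw text input; line is one line of the input
        for word in line.split():
            yield word, 1

    def reducer(self, word, occurrences):
        # occurrences iterates over every 1 emitted for this word
        yield word, sum(occurrences)


if __name__ == '__main__':
    MRWordCounter.run()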
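To see what the word-count job computes, it can help to simulate the map, shuffle, and reduce phases in plain Python. The following is only a minimal sketch and does not use mrjob at all: the standalone `mapper` and `reducer` functions mirror the methods above, while `run_local` is a hypothetical helper invented here for illustration. It is not how mrjob schedules work internally.

```python
from collections import defaultdict


def mapper(_, line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.split():
        yield word, 1


def reducer(word, occurrences):
    # Sum all the 1s that were emitted for this word.
    yield word, sum(occurrences)


def run_local(lines):
    # Hypothetical helper: an in-memory stand-in for the framework.
    # Map phase plus shuffle: group every emitted value by its key.
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(None, line):
            grouped[key].append(value)

    # Reduce phase: call the reducer once per key, in sorted key order.
    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    counts = run_local(["the quick brown fox", "the lazy dog", "the fox"])
    for word, count in counts.items():
        print(word, count)
```

The grouping step is the part a real MapReduce framework handles for you: the reducer only ever sees one key together with all of the values emitted for it.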