Writing MapReduce jobs using Python

高洛峰
Release: 2016-10-18 10:28:52

mrjob lets you write MapReduce jobs in Python 2.5+ and run them on several platforms. You can:

Write multi-step MapReduce jobs using pure Python

Test on your local machine

Run on a Hadoop cluster

Run in the cloud using Amazon Elastic MapReduce (EMR)

Installation with pip is simple and requires no configuration. Just run: pip install mrjob

Code example:

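from mrjob.job import MRJob


class MRWordCounter(MRJob):
    # Called once per input line; the key is unused for raw text input.
    def mapper(self, key, line):
        for word in line.split():
            yield word, 1

    # Called once per word with an iterator over all of its counts.
    def reducer(self, word, occurrences):
        yield word, sum(occurrences)


if __name__ == '__main__':
    MRWordCounter.run()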
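
To make the data flow behind the word counter easier to follow, here is a minimal in-memory sketch of one map/shuffle/reduce pass, written in plain Python without mrjob. It is an illustration only, not how mrjob is implemented: `run_step`, the sample `lines`, and the second step (`mapper_to_single_key`, `reducer_max`) are hypothetical names introduced here. The second step is an assumed example of chaining two steps, in the spirit of the multi-step jobs mentioned above.

```python
from collections import defaultdict


def mapper(_, line):
    # Same logic as MRWordCounter.mapper: emit (word, 1) for every word.
    for word in line.split():
        yield word, 1


def reducer(word, occurrences):
    # Same logic as MRWordCounter.reducer: sum the counts for one word.
    yield word, sum(occurrences)


def run_step(records, mapper, reducer):
    """Hypothetical helper: run one map/shuffle/reduce pass in memory."""
    groups = defaultdict(list)
    # Map phase: feed every (key, value) record to the mapper.
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            # Shuffle phase: collect the values that share a key.
            groups[out_key].append(out_value)
    # Reduce phase: call the reducer once per key.
    # Keys are sorted here only to make the output deterministic.
    results = []
    for out_key in sorted(groups):
        results.extend(reducer(out_key, iter(groups[out_key])))
    return results


def mapper_to_single_key(word, count):
    # Second step: route every pair to one key so one reducer call sees them all.
    yield None, (count, word)


def reducer_max(_, pairs):
    # Emit the (count, word) pair with the highest count.
    yield max(pairs)


# Sample input, made up for this sketch. Raw text lines carry a key of None.
lines = ["the quick brown fox", "the lazy dog", "the end"]

counts = run_step([(None, line) for line in lines], mapper, reducer)
print(counts)

top = run_step(counts, mapper_to_single_key, reducer_max)
print(top)  # [(3, 'the')]
```

The output of the first step is a list of (word, count) pairs, which is fed unchanged into the second step as its input records. A real cluster distributes the map and reduce calls across machines, but the contract seen by the mapper and reducer functions is the same.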

