Introduction to three methods to implement WordCount

不言
Release: 2018-10-19 16:17:28

This article introduces three ways to implement WordCount: a shell pipeline, a Hadoop MapReduce job, and a Spark one-liner. It is meant as a quick reference for anyone who needs one of them.

1. Streamlined Shell

cat /home/sev7e0/access.log | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{ print $2, $1 }'
# cat: writes the file's contents to standard output
# tr -s ' ' '\n': replaces each space with a newline and squeezes repeated newlines, so every word ends up on its own line
# sort: sorts the lines so that identical words become adjacent
# uniq -c: collapses adjacent identical lines into one; -c prefixes each line with the number of times it occurred
# sort | uniq -c: used together, these count how many times each word occurs
# sort -nr: sorts by count, numerically and in descending order (plain sort -r would compare the counts as text)
# awk '{ print $2, $1 }': prints each result with the word first and the count second
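
To make the role of each pipeline stage concrete, here is a minimal Java sketch that mirrors the stages one by one. The class name ShellStyleWordCount and the sample sentence are hypothetical and not part of the original article, and the order of words with equal counts may differ from what the shell prints.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical class name; each step mirrors one stage of the shell pipeline.
final class ShellStyleWordCount {

    static List<String> wordCount(String text) {
        // tr -s ' ' '\n': one word per entry, skipping empty tokens from repeated separators
        List<String> words = new ArrayList<>();
        for (String token : text.split("[ \\n]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }

        // sort: identical words become adjacent
        Collections.sort(words);

        // uniq -c: count each run of adjacent identical words
        List<Map.Entry<String, Integer>> counts = new ArrayList<>();
        for (String word : words) {
            int last = counts.size() - 1;
            if (last >= 0 && counts.get(last).getKey().equals(word)) {
                counts.set(last, Map.entry(word, counts.get(last).getValue() + 1));
            } else {
                counts.add(Map.entry(word, 1));
            }
        }

        // sort -nr: highest count first (stable, so ties stay in alphabetical order)
        counts.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

        // awk '{ print $2, $1 }': word first, count second
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts) {
            lines.add(entry.getKey() + " " + entry.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        wordCount("to be or not to be").forEach(System.out::println);
    }
}
```

The sketch shows why the first sort matters: the counting loop only compares each word with its immediate predecessor, exactly as uniq does, so unsorted input would be undercounted.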

2. Cumbersome MapReduce

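// MapReduce version
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // args[0] is the input path, args[1] is the output path.
    // WordCountMapper and WordCountReduce are not shown in this article.
    public static void main(String[] args) throws Exception {

        Configuration conf = new Configuration();
//        conf.set("fs.defaultFS", "hdfs://spark01:9000");
//        conf.set("yarn.resourcemanager.hostname", "spark01");

        Path out = new Path(args[1]);
        FileSystem fs = FileSystem.get(conf);

        // Check whether the output path exists; MapReduce fails if it already does
        if (fs.exists(out)) {
            fs.delete(out, true);
            System.out.println("output path already exists and has been deleted");
        }

        // Create the job
        Job job = Job.getInstance(conf, "wordcountDemo");
        // Set the job's main class
        job.setJarByClass(WordCount.class);

        // Set the job's input path
        FileInputFormat.setInputPaths(job, new Path(args[0]));

        // Configure the map phase
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Configure the reduce phase
        job.setReducerClass(WordCountReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Set the job's output path
        FileOutputFormat.setOutputPath(job, out);
        job.setNumReduceTasks(2);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}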
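
The driver above refers to WordCountMapper and WordCountReduce, which the article does not show. The following is not those classes and does not use the Hadoop API. It is only a plain-Java, in-memory sketch of the data flow such a job presumably follows: map each line to (word, 1) pairs, group the pairs by word, then sum each group. The class name LocalMapReduceWordCount and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of map -> shuffle -> reduce; not Hadoop code.
final class LocalMapReduceWordCount {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1L));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, with keys in sorted order.
    static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static long reduce(String key, List<Long> values) {
        long sum = 0;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in sequence over all input lines.
    static Map<String, Long> run(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Long> result = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> result.put(word, reduce(word, ones)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a b a", "b c")));
    }
}
```

The counts are kept as long values to match the LongWritable value type that the driver configures for both the map output and the final output.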

3. Easy-to-use Spark

// Spark version of WordCount (sc is the SparkContext provided by spark-shell)
sc.textFile("/home/sev7e0/access.log").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).foreach(println(_))
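
For readers who do not have Spark at hand, the same chain of operations can be approximated with Java streams. This is only an analogy running on a local list, not Spark code: the class name StreamWordCount is hypothetical, and Collectors.toMap with a merge function stands in for map((_, 1)) followed by reduceByKey(_+_).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical class name; a local, single-machine analogy of the Spark chain.
final class StreamWordCount {

    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                // flatMap(_.split(" ")): turn every line into a stream of words
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toMap(
                        word -> word,   // key of the (word, 1) pair
                        word -> 1,      // value of the (word, 1) pair
                        Integer::sum,   // reduceByKey(_+_): add values that share a key
                        TreeMap::new)); // sorted map, only to make the output order predictable
    }

    public static void main(String[] args) {
        count(List.of("a b a", "b c")).forEach((word, n) -> System.out.println(word + " " + n));
    }
}
```

Like the Spark one-liner, this splits on a single space, so consecutive spaces produce empty strings that are counted as words.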

source:segmentfault.com