hadoop - Not getting correct output when running standard "WordCount" program using Hadoop 0.20.2 -
I'm new to Hadoop. I'm trying to run the famous "WordCount" program - which counts the number of occurrences of each word in a list of files - using Hadoop 0.20.2. I am using a single-node cluster.
Following is my program:
```java
import java.io.File;
import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emit (word, 1) for every token in the input line.
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        // ...
    }
}
```

When I run this program using Hadoop 0.20.2 (not showing the command for clarity), the output I get is

A 1
A 1
B 1
B 1
C 1
C 1
D 1
D 1

which is wrong. The actual output should be:

A 2
B 2
C 2
D 2

This "WordCount" program is a very standard program. I'm not really sure what is wrong with this code, or with the content of the configuration files like mapred-site.xml, core-site.xml, etc. I would be happy if someone could help me. Thank you.

This code actually runs a local MapReduce job. If you want to submit it to the actual cluster, you must provide the fs.default.name and mapred.job.tracker configuration parameters. These keys map to host:port pairs for your machine; they are set in your mapred-site.xml and core-site.xml files.
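For a single-node setup, the relevant entries look roughly like the following sketch. The host names and port numbers shown here (localhost:9000 and localhost:9001) are assumptions for illustration; use whatever your cluster is actually configured with:

```xml
<!-- core-site.xml: where HDFS (the NameNode) listens; host/port assumed -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- mapred-site.xml: where the JobTracker listens; host/port assumed -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```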
Also ensure that your data is present in HDFS, not on the local disk, and decrease the number of reducers: at about 2 records per reducer, you should set the number of reducers to 1.
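Independently of the cluster configuration, the counting behavior the job should produce can be checked with plain Java. This is only a sketch of the map-then-reduce logic (tokenize, emit a 1 per word, sum per key), not Hadoop code; the class and method names are made up for illustration:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {
    // Tokenize the input and sum a count of 1 per occurrence of each word,
    // mirroring what the Mapper emits and what the Reducer should aggregate.
    public static Map<String, Integer> countWords(String input) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokenizer = new StringTokenizer(input);
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken();
            counts.merge(word, 1, Integer::sum); // the "reduce" step: sum the 1s
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same data as in the question: two of each word.
        for (Map.Entry<String, Integer> e : countWords("A A B B C C D D").entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        // prints:
        // A 2
        // B 2
        // C 2
        // D 2
    }
}
```

In the job itself, the reducer count is set with job.setNumReduceTasks(1).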