hadoop - Not getting correct output when running standard "WordCount" program using Hadoop 0.20.2 -
I'm new to Hadoop. I'm trying to run the famous "WordCount" program - which counts the number of occurrences of each word in a list of files - using Hadoop 0.20.2. I am using a single-node cluster.
Following is my program:
```java
import java.io.File;
import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emit (word, 1) for every token in the input line.
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        // ...
    }
}
```

When I run this program using Hadoop 0.20.2 (not showing the command for clarity), the output I get is

A 1
A 1
B 1
B 1
C 1
C 1
D 1
D 1

which is wrong. The actual output should be:

A 2
B 2
C 2
D 2

This "WordCount" program is a very standard program. I'm not really sure what is wrong with this code, or with the content of the configuration files like mapred-site.xml, core-site.xml, etc. I would be happy if someone could help me. Thank you.

This code actually runs a local MapReduce job. If you want to submit it to the actual cluster, you must provide the fs.default.name and mapred.job.tracker configuration parameters. These keys map to host:port pairs for your machine; they are set in your mapred-site.xml and core-site.xml files.
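For a single-node setup, the relevant entries look roughly like the following sketch. The host names and port numbers shown here (localhost:9000 and localhost:9001) are assumptions for illustration; use whatever your cluster is actually configured with:

```xml
<!-- core-site.xml: where HDFS (the NameNode) listens; host/port assumed -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- mapred-site.xml: where the JobTracker listens; host/port assumed -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```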
Also ensure that your data is present in HDFS, not on the local disk, and decrease the number of reducers: at about 2 records per reducer, you should set the number of reducers to 1.
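Independently of the cluster configuration, the counting behavior the job should produce can be checked with plain Java. This is only a sketch of the map-then-reduce logic (tokenize, emit a 1 per word, sum per key), not Hadoop code; the class and method names are made up for illustration:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {
    // Tokenize the input and sum a count of 1 per occurrence of each word,
    // mirroring what the Mapper emits and what the Reducer should aggregate.
    public static Map<String, Integer> countWords(String input) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokenizer = new StringTokenizer(input);
        while (tokenizer.hasMoreTokens()) {
            String word = tokenizer.nextToken();
            counts.merge(word, 1, Integer::sum); // the "reduce" step: sum the 1s
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same data as in the question: two of each word.
        for (Map.Entry<String, Integer> e : countWords("A A B B C C D D").entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        // prints:
        // A 2
        // B 2
        // C 2
        // D 2
    }
}
```

In the job itself, the reducer count is set with job.setNumReduceTasks(1).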