sorting - Error during benchmarking Sort in Hadoop2 - Partitions do not match


I am trying to benchmark the Hadoop 2 MapReduce framework. This is not TeraSort; it is testmapredsort.

Step 1: Create random data:

hadoop jar hadoop/ randomwriter -Dtest.randomwrite.bytes_per_map=100 -Dtest.randomwriter.maps_per_host=10 /data/unsorted-data

Step 2: Sort the random data created in step 1:

hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar sort /data/unsorted-data /data/sorted-data 

Step 3: Check whether the MR sort worked:

hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar testmapredsort -sortInput /data/unsorted-data -sortOutput /data/sorted-data

I get the following error during step 3. I want to know how to fix it.

java.lang.Exception: java.io.IOException: Partitions do not match for record# 0 ! - '0' v/s '5'
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.io.IOException: Partitions do not match for record# 0 ! - '0' v/s '5'
    at org.apache.hadoop.mapred.SortValidator$RecordStatsChecker$Map.map(SortValidator.java:266)
    at org.apache.hadoop.mapred.SortValidator$RecordStatsChecker$Map.map(SortValidator.java:191)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:695)
14/08/18 11:07:39 INFO mapreduce.Job: Job job_local2061890210_0001 failed with state FAILED due to: NA
14/08/18 11:07:39 INFO mapreduce.Job: Counters: 23
    File System Counters
        FILE: Number of bytes read=1436271
        FILE: Number of bytes written=1645526
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1077294840
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=13
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=1
    Map-Reduce Framework
        Map input records=102247
        Map output records=102247
        Map output bytes=1328251
        Map output materialized bytes=26
        Input split bytes=102
        Combine input records=102247
        Combine output records=1
        Spilled Records=1
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=22
        Total committed heap usage (bytes)=198766592
    File Input Format Counters
        Bytes Read=1077294840
java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
    at org.apache.hadoop.mapred.SortValidator$RecordStatsChecker.checkRecords(SortValidator.java:367)
    at org.apache.hadoop.mapred.SortValidator.run(SortValidator.java:579)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.mapred.SortValidator.main(SortValidator.java:594)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:115)
    at org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:123)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Edit:

hadoop fs -ls /data/unsorted-data
-rw-r--r--   3 david supergroup          0 2014-08-14 12:45 /data/unsorted-data/_SUCCESS
-rw-r--r--   3 david supergroup 1077294840 2014-08-14 12:45 /data/unsorted-data/part-m-00000

hadoop fs -ls /data/sorted-data
-rw-r--r--   3 david supergroup          0 2014-08-14 12:55 /data/sorted-data/_SUCCESS
-rw-r--r--   3 david supergroup  137763270 2014-08-14 12:55 /data/sorted-data/part-m-00000
-rw-r--r--   3 david supergroup  134220478 2014-08-14 12:55 /data/sorted-data/part-m-00001
-rw-r--r--   3 david supergroup  134219656 2014-08-14 12:55 /data/sorted-data/part-m-00002
-rw-r--r--   3 david supergroup  134218029 2014-08-14 12:55 /data/sorted-data/part-m-00003
-rw-r--r--   3 david supergroup  134219244 2014-08-14 12:55 /data/sorted-data/part-m-00004
-rw-r--r--   3 david supergroup  134220252 2014-08-14 12:55 /data/sorted-data/part-m-00005
-rw-r--r--   3 david supergroup  134224231 2014-08-14 12:55 /data/sorted-data/part-m-00006
-rw-r--r--   3 david supergroup  134210232 2014-08-14 12:55 /data/sorted-data/part-m-00007

Aside from the change in keys from test.randomwrite.bytes_per_map and test.randomwriter.maps_per_host to mapreduce.randomwriter.bytespermap and mapreduce.randomwriter.mapsperhost, which causes your settings not to get through to randomwriter, the core of the problem is indicated by the filenames listed under /data/sorted-data: your sorted data consists of map outputs (part-m-*), whereas correctly sorted output comes from reduce outputs. Essentially, your sort command is performing only the map portion of the sort and never performing the merge in the subsequent reduce stage. Because of this, your testmapredsort command is correctly reporting that the sort did not work.
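The filename check described above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, and it assumes the usual new-API naming in which map outputs are called part-m-NNNNN and reduce outputs part-r-NNNNN.

```java
import java.util.Arrays;
import java.util.List;

public class OutputKindCheck {
    // Returns true when the listing contains part files and all of them
    // follow the map-output naming (part-m-NNNNN), i.e. no reduce ran.
    static boolean isMapOnlyOutput(List<String> fileNames) {
        boolean sawPartFile = false;
        for (String name : fileNames) {
            if (name.startsWith("part-r-")) {
                return false; // at least one reduce output exists
            }
            if (name.startsWith("part-m-")) {
                sawPartFile = true;
            }
        }
        return sawPartFile;
    }

    public static void main(String[] args) {
        // File names taken from the /data/sorted-data listing in the question.
        List<String> sorted = Arrays.asList("_SUCCESS", "part-m-00000", "part-m-00001");
        System.out.println(isMapOnlyOutput(sorted));
    }
}
```

A directory written by a sort that actually ran its reduce stage would be expected to contain part-r-* files, and the check would then return false.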

Checking the code of Sort.java, you can see that there is in fact no protection against num_reduces somehow getting set to 0. The typical behavior of Hadoop MR is that setting the number of reduces to 0 indicates a "map only" job, where the map outputs go directly to HDFS rather than being intermediate outputs passed to reduce tasks. Here are the relevant lines:

int num_reduces = (int) (cluster.getMaxReduceTasks() * 0.9);
String sort_reduces = conf.get(REDUCES_PER_HOST);
if (sort_reduces != null) {
   num_reduces = cluster.getTaskTrackers() *
                   Integer.parseInt(sort_reduces);
}

Now, in a normal setup, all of that logic using "default" settings should provide a nonzero number of reduces, such that the sort works. I was able to repro your problem by running:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar sort -r 0 /data/unsorted-data /data/sorted-data 

using -r 0 to force 0 reduces. In your case, it is more likely that cluster.getMaxReduceTasks() is returning 1 (or possibly even 0 if your cluster is broken); with a value of 1, (int) (1 * 0.9) truncates to 0. I don't know off the top of my head all the ways that method could return 1; it appears that simply setting mapreduce.tasktracker.reduce.tasks.maximum to 1 doesn't apply to that method. Other factors that go into task capacity include the number of cores and the amount of memory available.
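The arithmetic in the quoted Sort.java line can be checked in isolation. The sketch below reproduces only the cast and the 0.9 factor; the class name, the method names, and the Math.max guard are hypothetical and are not part of Sort.java.

```java
public class ReduceCountSketch {
    // Same arithmetic as the quoted line: 90% of the reported reduce
    // capacity, truncated toward zero by the int cast.
    static int defaultReduces(int maxReduceTasks) {
        return (int) (maxReduceTasks * 0.9);
    }

    // Hypothetical guard (an assumption, not in Sort.java): never let
    // the job silently become map-only.
    static int guardedReduces(int maxReduceTasks) {
        return Math.max(1, defaultReduces(maxReduceTasks));
    }

    public static void main(String[] args) {
        for (int max : new int[] {0, 1, 2}) {
            System.out.println(max + " -> " + defaultReduces(max)
                    + " (guarded: " + guardedReduces(max) + ")");
        }
    }
}
```

A reported capacity of 0 or 1 both yield 0 reduces under the unguarded formula, which is the map-only case described above; a capacity of 2 yields 1.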

Assuming your cluster is at least capable of 1 reduce task per TaskTracker, you can retry your sort step using -r 1:

hadoop fs -rmr /data/sorted-data
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar sort -r 1 /data/unsorted-data /data/sorted-data
