Apache Pig - DataStax Pig - Can't load any data from Cassandra
I am trying to run a test Pig script to load data from Cassandra on DataStax Enterprise, but I am getting an error. Let me show the whole scenario:
Cassandra schema:
create keyspace libdata with replication = {'class': 'SimpleStrategy', 'replication_factor': 1 };
create table libout ("stabr" text, "fscskey" text, "fscs_seq" text, "libid" text, "libname" text, "address" text, "city" text, "zip" text, "zip4" text, "cnty" text, "phone" text, "c_out_ty" text, "c_msa" text, "sq_feet" int, "f_sq_ft" text, "l_num_bm" int, "f_bkmob" text, "hours" int, "f_hours" text, "wks_open" int, "f_wksopn" text, "yr_sub" int, "statstru" int, "statname" int, "stataddr" int, "longitud" float, "latitude" float, "fipsst" int, "fipsco" int, "fipsplac" int, "cntypop" int, "locale" text, "centract" float, "cenblock" int, "cdcode" text, "mat_cent" text, "mat_type" int, "cbsa" int, "microf" text, primary key ("fscskey", "fscs_seq"));
cqlsh:libdata> create table libsqft ( year int, state text, sqft bigint, primary key (year, state) );
The second table is going to be used to store data from Pig into Cassandra.
At the Pig grunt prompt:
grunt> libdata = load 'cql://libdata/libout' using CqlStorage();
grunt> dump libdata;
This is the output:
2014-08-18 23:02:11,603 [main] info org.apache.pig.tools.pigstats.scriptstate - pig features used in script: unknown 2014-08-18 23:02:11,607 [main] info org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mrcompiler - file concatenation threshold: 100 optimistic? false 2014-08-18 23:02:11,608 [main] info org.apache.pig.backend.hadoop.executionengine.mapreducelayer.multiqueryoptimizer - mr plan size before optimization: 1 2014-08-18 23:02:11,608 [main] info org.apache.pig.backend.hadoop.executionengine.mapreducelayer.multiqueryoptimizer - mr plan size after optimization: 1 2014-08-18 23:02:11,613 [main] info org.apache.pig.tools.pigstats.scriptstate - pig script settings added job 2014-08-18 23:02:11,613 [main] info org.apache.pig.backend.hadoop.executionengine.mapreducelayer.jobcontrolcompiler - mapred.job.reduce.markreset.buffer.percent not set, set default 0.3 2014-08-18 23:02:11,613 [main] info org.apache.pig.backend.hadoop.executionengine.mapreducelayer.jobcontrolcompiler - creating jar file job5135328249315577655.jar 2014-08-18 23:02:14,378 [main] info org.apache.pig.backend.hadoop.executionengine.mapreducelayer.jobcontrolcompiler - jar file job5135328249315577655.jar created 2014-08-18 23:02:14,386 [main] info org.apache.pig.backend.hadoop.executionengine.mapreducelayer.jobcontrolcompiler - setting single store job 2014-08-18 23:02:14,400 [main] info org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - 1 map-reduce job(s) waiting submission. 
2014-08-18 23:02:14,783 [thread-12] info org.apache.pig.backend.hadoop.executionengine.util.mapredutil - total input paths (combined) process : 1 2014-08-18 23:02:14,901 [main] info org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - 0% complete 2014-08-18 23:02:15,439 [main] info org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - hadoopjobid: job_201408182033_0011 2014-08-18 23:02:15,439 [main] info org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - more information at: http://ip:50030/jobdetails.jsp?jobid=job_201408182033_0011 2014-08-18 23:03:00,167 [main] info org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - job job_201408182033_0011 has failed! stop running dependent jobs 2014-08-18 23:03:00,167 [main] info org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - 100% complete 2014-08-18 23:03:00,169 [main] warn org.apache.pig.backend.hadoop.executionengine.mapreducelayer.launcher - there no log file write to. 
2014-08-18 23:03:00,169 [main] error org.apache.pig.backend.hadoop.executionengine.mapreducelayer.launcher - backend error message java.lang.runtimeexception @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader$rowiterator.executequery(cqlpagingrecordreader.java:657) @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader$rowiterator.(cqlpagingrecordreader.java:301) @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader.initialize(cqlpagingrecordreader.java:167) @ org.apache.pig.backend.hadoop.executionengine.mapreducelayer.pigrecordreader.initialize(pigrecordreader.java:181) @ org.apache.hadoop.mapred.maptask$newtrackingrecordreader.initialize(maptask.java:522) @ org.apache.hadoop.mapred.maptask.runnewmapper(maptask.java:763) @ org.apache.hadoop.mapred.maptask.run(maptask.java:370) @ org.apache.hadoop.mapred.child$4.run(child.java:266) @ java.security.accesscontroller.doprivileged(native method) @ javax.security.auth.subject.doas(subject.java:415) @ org.apache.hadoop.security.usergroupinformation.doas(usergroupinformation.java:1121) @ org.apache.hadoop.mapred.child.main(child.java:260) caused by: unavailableexception() @ org.apache.cassandra.thrift.cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultstandardscheme.read(cassandra.java:53662) @ org.apache.cassandra.thrift.cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultstandardscheme.read(cassandra.java:53630) @ org.apache.cassandra.thrift.cassandra$execute_prepared_cql3_query_result.read(cassandra.java:53545) @ org.apache.thrift.tserviceclient.receivebase(tserviceclient.java:78) @ org.apache.cassandra.thrift.cassandra$client.recv_execute_prepared_cql3_query(cassandra.java:1820) @ org.apache.cassandra.thrift.cassandra$client.execute_prepared_cql3_query(cassandra.java:1805) @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader$rowiterator.executequery(cqlpagingrecordreader.java:635) ... 11 more
2014-08-18 23:03:00,173 [main] error org.apache.pig.tools.pigstats.simplepigstats - error 2997: unable recreate exception backed error: java.lang.runtimeexception @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader$rowiterator.executequery(cqlpagingrecordreader.java:657) @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader$rowiterator.(cqlpagingrecordreader.java:301) @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader.initialize(cqlpagingrecordreader.java:167) @ org.apache.pig.backend.hadoop.executionengine.mapreducelayer.pigrecordreader.initialize(pigrecordreader.java:181) @ org.apache.hadoop.mapred.maptask$newtrackingrecordreader.initialize(maptask.java:522) @ org.apache.hadoop.mapred.maptask.runnewmapper(maptask.java:763) @ org.apache.hadoop.mapred.maptask.run(maptask.java:370) @ org.apache.hadoop.mapred.child$4.run(child.java:266) @ java.security.accesscontroller.doprivileged(native method) @ javax.security.auth.subject.doas(subject.java:415) @ org.apache.hadoop.security.usergroupinformation.doas(usergroupinformation.java:1121) @ org.apache.hadoop.mapred.child.main(child.java:260) caused by: unavailableexception() @ org.apache.cassandra.thrift.cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultstandardscheme.read(cassandra.java:53662) @ org.apache.cassandra.thrift.cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultstandardscheme.read(cassandra.java:53630) @ org.apache.cassandra.thrift.cassandra$execute_prepared_cql3_query_result.read(cassandra.java:53545) @ org.apache.thrift.tserviceclient.receivebase(tserviceclient.java:78) @ org.apache.cassandra.thrift.cassandra$client.recv_execute_prepared_cql3_query(cassandra.java:1820) @ org.apache.cassandra.thrift.cassandra$client.execute_prepared_cql3_query(cassandra.java:1805) @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader$rowiterator.executequery(cqlpagingrecordreader.java:635) ... 11 more
2014-08-18 23:03:00,173 [main] error org.apache.pig.tools.pigstats.pigstatsutil - 1 map reduce job(s) failed! 2014-08-18 23:03:00,174 [main] info org.apache.pig.tools.pigstats.simplepigstats - script statistics:
hadoopversion pigversion userid startedat finishedat features 1.0.4.13 0.10.1 ubuntu 2014-08-18 23:02:11 2014-08-18 23:03:00 unknown
failed!
failed jobs: jobid alias feature message outputs job_201408182033_0011 libdata map_only message: job failed! error - # of failed map tasks exceeded allowed limit. failedcount: 1. lastfailedtask: task_201408182033_0011_m_000000 cfs://10.82.31.13/tmp/temp-1734707970/tmp1694465949,
input(s): failed read data "cql://libdata/libout"
output(s): failed produce result in "cfs://10.82.31.13/tmp/temp-1734707970/tmp1694465949"
counters: total records written : 0 total bytes written : 0 spillable memory manager spill count : 0 total bags proactively spilled: 0 total records proactively spilled: 0
job dag: job_201408182033_0011
2014-08-18 23:03:00,174 [main] info org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher - failed! 2014-08-18 23:03:00,215 [main] error org.apache.pig.tools.grunt.grunt - error 2997: unable recreate exception backed error: java.lang.runtimeexception @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader$rowiterator.executequery(cqlpagingrecordreader.java:657) @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader$rowiterator.(cqlpagingrecordreader.java:301) @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader.initialize(cqlpagingrecordreader.java:167) @ org.apache.pig.backend.hadoop.executionengine.mapreducelayer.pigrecordreader.initialize(pigrecordreader.java:181) @ org.apache.hadoop.mapred.maptask$newtrackingrecordreader.initialize(maptask.java:522) @ org.apache.hadoop.mapred.maptask.runnewmapper(maptask.java:763) @ org.apache.hadoop.mapred.maptask.run(maptask.java:370) @ org.apache.hadoop.mapred.child$4.run(child.java:266) @ java.security.accesscontroller.doprivileged(native method) @ javax.security.auth.subject.doas(subject.java:415) @ org.apache.hadoop.security.usergroupinformation.doas(usergroupinformation.java:1121) @ org.apache.hadoop.mapred.child.main(child.java:260) caused by: unavailableexception() @ org.apache.cassandra.thrift.cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultstandardscheme.read(cassandra.java:53662) @ org.apache.cassandra.thrift.cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultstandardscheme.read(cassandra.java:53630) @ org.apache.cassandra.thrift.cassandra$execute_prepared_cql3_query_result.read(cassandra.java:53545) @ org.apache.thrift.tserviceclient.receivebase(tserviceclient.java:78) @ org.apache.cassandra.thrift.cassandra$client.recv_execute_prepared_cql3_query(cassandra.java:1820) @ org.apache.cassandra.thrift.cassandra$client.execute_prepared_cql3_query(cassandra.java:1805) @ 
org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader$rowiterator.executequery(cqlpagingrecordreader.java:635) ... 11 more
2014-08-18 23:03:00,215 [main] warn org.apache.pig.tools.grunt.grunt - there no log file write to. 2014-08-18 23:03:00,215 [main] error org.apache.pig.tools.grunt.grunt - org.apache.pig.impl.logicallayer.frontendexception: error 1066: unable open iterator alias libdata. backend error : unable recreate exception backed error: java.lang.runtimeexception @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader$rowiterator.executequery(cqlpagingrecordreader.java:657) @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader$rowiterator.(cqlpagingrecordreader.java:301) @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader.initialize(cqlpagingrecordreader.java:167) @ org.apache.pig.backend.hadoop.executionengine.mapreducelayer.pigrecordreader.initialize(pigrecordreader.java:181) @ org.apache.hadoop.mapred.maptask$newtrackingrecordreader.initialize(maptask.java:522) @ org.apache.hadoop.mapred.maptask.runnewmapper(maptask.java:763) @ org.apache.hadoop.mapred.maptask.run(maptask.java:370) @ org.apache.hadoop.mapred.child$4.run(child.java:266) @ java.security.accesscontroller.doprivileged(native method) @ javax.security.auth.subject.doas(subject.java:415) @ org.apache.hadoop.security.usergroupinformation.doas(usergroupinformation.java:1121) @ org.apache.hadoop.mapred.child.main(child.java:260) caused by: unavailableexception() @ org.apache.cassandra.thrift.cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultstandardscheme.read(cassandra.java:53662) @ org.apache.cassandra.thrift.cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultstandardscheme.read(cassandra.java:53630) @ org.apache.cassandra.thrift.cassandra$execute_prepared_cql3_query_result.read(cassandra.java:53545) @ org.apache.thrift.tserviceclient.receivebase(tserviceclient.java:78) @ org.apache.cassandra.thrift.cassandra$client.recv_execute_prepared_cql3_query(cassandra.java:1820) @ 
org.apache.cassandra.thrift.cassandra$client.execute_prepared_cql3_query(cassandra.java:1805) @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader$rowiterator.executequery(cqlpagingrecordreader.java:635) ... 11 more
@ org.apache.pig.pigserver.openiterator(pigserver.java:856) @ org.apache.pig.tools.grunt.gruntparser.processdump(gruntparser.java:683) @ org.apache.pig.tools.pigscript.parser.pigscriptparser.parse(pigscriptparser.java:303) @ org.apache.pig.tools.grunt.gruntparser.parsestoponerror(gruntparser.java:190) @ org.apache.pig.tools.grunt.gruntparser.parsestoponerror(gruntparser.java:166) @ org.apache.pig.tools.grunt.grunt.run(grunt.java:69) @ org.apache.pig.main.run(main.java:490) @ org.apache.pig.main.main(main.java:111) @ sun.reflect.nativemethodaccessorimpl.invoke0(native method) @ sun.reflect.nativemethodaccessorimpl.invoke(nativemethodaccessorimpl.java:57) @ sun.reflect.delegatingmethodaccessorimpl.invoke(delegatingmethodaccessorimpl.java:43) @ java.lang.reflect.method.invoke(method.java:606) @ org.apache.hadoop.util.runjar.main(runjar.java:156) caused by: org.apache.pig.backend.executionengine.execexception: error 2997: unable recreate exception backed error: java.lang.runtimeexception @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader$rowiterator.executequery(cqlpagingrecordreader.java:657) @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader$rowiterator.(cqlpagingrecordreader.java:301) @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader.initialize(cqlpagingrecordreader.java:167) @ org.apache.pig.backend.hadoop.executionengine.mapreducelayer.pigrecordreader.initialize(pigrecordreader.java:181) @ org.apache.hadoop.mapred.maptask$newtrackingrecordreader.initialize(maptask.java:522) @ org.apache.hadoop.mapred.maptask.runnewmapper(maptask.java:763) @ org.apache.hadoop.mapred.maptask.run(maptask.java:370) @ org.apache.hadoop.mapred.child$4.run(child.java:266) @ java.security.accesscontroller.doprivileged(native method) @ javax.security.auth.subject.doas(subject.java:415) @ org.apache.hadoop.security.usergroupinformation.doas(usergroupinformation.java:1121) @ org.apache.hadoop.mapred.child.main(child.java:260) caused by: unavailableexception() @ 
org.apache.cassandra.thrift.cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultstandardscheme.read(cassandra.java:53662) @ org.apache.cassandra.thrift.cassandra$execute_prepared_cql3_query_result$execute_prepared_cql3_query_resultstandardscheme.read(cassandra.java:53630) @ org.apache.cassandra.thrift.cassandra$execute_prepared_cql3_query_result.read(cassandra.java:53545) @ org.apache.thrift.tserviceclient.receivebase(tserviceclient.java:78) @ org.apache.cassandra.thrift.cassandra$client.recv_execute_prepared_cql3_query(cassandra.java:1820) @ org.apache.cassandra.thrift.cassandra$client.execute_prepared_cql3_query(cassandra.java:1805) @ org.apache.cassandra.hadoop.cql3.cqlpagingrecordreader$rowiterator.executequery(cqlpagingrecordreader.java:635) ... 11 more
@ org.apache.pig.backend.hadoop.executionengine.mapreducelayer.launcher.geterrormessages(launcher.java:217) @ org.apache.pig.backend.hadoop.executionengine.mapreducelayer.launcher.getstats(launcher.java:149) @ org.apache.pig.backend.hadoop.executionengine.mapreducelayer.mapreducelauncher.launchpig(mapreducelauncher.java:383) @ org.apache.pig.pigserver.launchplan(pigserver.java:1279) @ org.apache.pig.pigserver.executecompiledlogicalplan(pigserver.java:1264) @ org.apache.pig.pigserver.storeex(pigserver.java:961) @ org.apache.pig.pigserver.store(pigserver.java:928) @ org.apache.pig.pigserver.openiterator(pigserver.java:841) ... 12 more
It seems Pig can't read data from Cassandra. Does anyone have an idea of what's going on?
Thanks a lot.
Bruno
The good news is that the exact sequence of steps you provided works correctly on a fresh install of a single-node DSE 4.5.1 cluster.
The logs don't indicate any problem other than the "UnavailableException", which means the query is making it to Cassandra, and Cassandra doesn't think enough nodes are available to fulfill the request.
With RF=1, the implication is that if any node in the cluster is down, you will hit this exception for some portion of the data set.
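To make the RF=1 point concrete, here is a small Python sketch of the availability check. It is only a toy model, not Cassandra's actual code: the names `Unavailable`, `check_available`, and `unreadable_ranges`, and the node and range labels, are all made up for illustration. It assumes the read needs one live replica (consistency level ONE) unless stated otherwise.

```python
class Unavailable(Exception):
    """Illustrative stand-in for Cassandra's UnavailableException."""


# Number of live replicas each consistency level needs, given the
# replication factor (RF) of the keyspace.
REQUIRED = {
    "ONE": lambda rf: 1,
    "QUORUM": lambda rf: rf // 2 + 1,
    "ALL": lambda rf: rf,
}


def check_available(replicas, live_nodes, consistency="ONE"):
    """Raise Unavailable if too few replicas of a token range are alive."""
    rf = len(replicas)
    alive = [node for node in replicas if node in live_nodes]
    needed = REQUIRED[consistency](rf)
    if len(alive) < needed:
        raise Unavailable(f"need {needed} live replica(s), have {len(alive)}")
    return alive


def unreadable_ranges(range_replicas, live_nodes, consistency="ONE"):
    """Return the token ranges whose reads would fail as unavailable."""
    failed = []
    for token_range, replicas in range_replicas.items():
        try:
            check_available(replicas, live_nodes, consistency)
        except Unavailable:
            failed.append(token_range)
    return failed


if __name__ == "__main__":
    # Hypothetical 3-node cluster with RF=1: each range lives on one node.
    ranges = {"range-a": ["node1"], "range-b": ["node2"], "range-c": ["node3"]}
    # node2 is down, so only the data it owns cannot be read.
    print(unreadable_ranges(ranges, live_nodes={"node1", "node3"}))
```

With RF=1 every range has exactly one replica, so a single down node makes its share of the data unreadable, while a full-table scan such as the Pig `load` touches every range and therefore fails.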
My recommendation:
- Try again on a single node with DSE 4.5.1 installed.
- Assuming that works, double-check the topology and status of the cluster with "nodetool ring" or OpsCenter's ring view; both are quite helpful for this.
The Cassandra logs should also show the UnavailableException, and possibly point more directly at the underlying source of the problem.
I don't see any indication that the problem is specific to your configuration or use of Pig itself.