hadoop - Hive & RegexSerDe returning just NULL



I'm trying to parse the following example line using RegexSerDe in Hive:

2011-07-22 20:34:51 808 8b1f27d094fb33ea - - - observed "unavailable" http://www.4shared.com/ 200 tcp_nc_miss text/javascript;charset=utf-8 http dc413.4shared.com 80 /network/search-suggest.jsp ?search=2 kfzhnit2lhyqa==&format=jsonp jsp "mozilla/5.0 (windows; u; windows nt 6.1; en-us; rv:1.9.2.18) gecko/20110614 firefox/3.6.18" 82.137.200.42 484 852 -

My table definition is this:

create external table browsing_data_ext(
  cdate string,
  ctime string,
  time_taken string,
  c_ip string,
  cs_username string,
  cs_auth_group string,
  x_exception_id string,
  sc_filter_result string,
  cs_categories string,
  cs_referer string,
  sc_status string,
  s_action string,
  cs_method string,
  rs_content_type string,
  cs_uri_scheme string,
  cs_host string,
  cs_uri_port string,
  cs_uri_path string,
  cs_uri_query string,
  cs_uri_extension string,
  cs_user_agent string,
  s_ip string,
  sc_bytes string,
  cs_bytes string,
  x_virus_id string
)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties (
  "input.regex" = "([\\-0-9]*) ([\\:0-9]*) ([\\d]*) ([\\.a-z0-9]*) ([\\-a-z0-9]*) ([\\-a-z0-9]*) ([\\-a-z0-9]*) ([\\w]*) (\\\"[\\w]*\\\") ([\\.\\-\\=\\&:\\/\\?a-z0-9]*) ([\\d]*) ([\\_\\w]*) ([\\w]*) ([\\/\\w]*) ([\\w]*) ([\\.\\w]*) ([\\d]*) ([\\.\\-\\=\\&:\\/\\?a-z0-9]*) ([\\.\\-\\=\\&:\\/\\?a-z0-9]*) ([\\.\\w]*) (\\\"[\\w\\w]*\\\") ([.:a-z0-9]*) ([\\d]*) ([\\d]*) ([\\-a-z0-9]*)",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s %11$s %12$s %13$s %14$s %15$s %16$s %17$s %18$s %19$s %20$s %21$s %22$s %23$s %24$s %25$s"
)
stored as textfile
location '/user/hdfs/data'
tblproperties ("skip.header.line.count"="6");



I've tested the regex in Rubular and a few other regex validation tools, and it passes, but when I select from the table I receive only NULL values.

Thanks, Daniel

I had to read a long log file, and my procedure to solve it was:

1) Create the regex at https://regex101.com/#java

2) Double every backslash in the regex: replace "\w" with "\\w", "\s" with "\\s", and so on.

Inside each pair of parentheses I used "+" rather than "*", meaning "one or more of" the preceding characters.

Without step 2) the result was a whole line of NULL values; after doubling the "\" in front of the special characters, the test line was parsed.
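
The escaping point can be checked outside Hive. Below is a minimal Java sketch, not the SerDe's actual code: the class RegexSerdeCheck, its helper methods, and the shortened three-field patterns are all made up for illustration. It assumes that RegexSerDe applies the pattern with java.util.regex and needs the whole line to match, and that a regex whose backslash was lost reaches the matcher as plain "d+" instead of "\d+".

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class RegexSerdeCheck {

    // Returns one string per capture group if the WHOLE line matches,
    // otherwise null (standing in for a row of NULL columns).
    static String[] parse(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] cols = new String[m.groupCount()];
        for (int i = 0; i < cols.length; i++) {
            cols[i] = m.group(i + 1);
        }
        return cols;
    }

    // What a search-style online tester typically reports:
    // true if the pattern matches anywhere in the line.
    static boolean findsSomewhere(String regex, String line) {
        return Pattern.compile(regex).matcher(line).find();
    }

    public static void main(String[] args) {
        // First three fields of the sample log line from the question.
        String line = "2011-07-22 20:34:51 808";

        // Doubled backslashes in the literal: the regex engine sees \d+.
        String escaped = "([\\-0-9]+) ([:0-9]+) (\\d+)";
        // Assumed effect of a lost backslash: the engine sees a literal d+.
        String lost = "([-0-9]+) ([:0-9]+) (d+)";
        // Covers only the first two fields, so it cannot match the whole line.
        String partial = "([\\-0-9]+) ([:0-9]+)";

        System.out.println(Arrays.toString(parse(escaped, line)));
        System.out.println(Arrays.toString(parse(lost, line)));
        System.out.println(findsSomewhere(partial, line));
        System.out.println(Arrays.toString(parse(partial, line)));
    }
}
```

The last two calls show why a pattern can look fine in a tester and still fail here: the partial pattern is found inside the line, but it does not match the line as a whole, so parse returns null.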

