How to get first n elements in an array in Hive


I use the split function to create an array in Hive. How can I get the first n elements of the array? I want to go through a sub-array.

Code example:

select col1 from table where split(col2, ',')[0:5]

'[0:5]' looks like Python-style slicing, but it doesn't work here.
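For reference, this is the Python behaviour the `[0:5]` syntax is borrowed from: split on commas, then keep positions 0 to 4. A minimal Python sketch of the intended result (the helper name `first_n` is hypothetical and has nothing to do with Hive):

```python
from typing import List


def first_n(csv: str, n: int) -> List[str]:
    """Split on ',' and keep at most the first n items (Python slice semantics)."""
    # Slicing past the end of the list returns fewer items instead of raising.
    return csv.split(",")[0:n]


print(first_n("a,b,c,d,e,f,g", 5))  # ['a', 'b', 'c', 'd', 'e']
print(first_n("a,b", 5))  # ['a', 'b']
```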

This is a tricky one.
First, grab the Brickhouse jar and add it to Hive:
add jar /path/to/jars/brickhouse-0.7.0-SNAPSHOT.jar;

Now create two functions using it:

create temporary function array_index as 'brickhouse.udf.collect.ArrayIndexUDF';
create temporary function numeric_range as 'brickhouse.udf.collect.NumericRange';

Then the query:

select a, n as array_index, array_index(split(a,','),n) as value_from_array
from (select "abc#1,def#2,hij#3" as a from dual
      union all
      select "abc#1,def#2,hij#3,zzz#4" as a from dual) t1
lateral view numeric_range( length(a)-length(regexp_replace(a,',',''))+1 ) n1 as n

Explained:
select "abc#1,def#2,hij#3" as a from dual union all select "abc#1,def#2,hij#3,zzz#4" as a from dual

This just selects test data; in your case, replace it with your table name.

lateral view numeric_range( length(a)-length(regexp_replace(a,',',''))+1 ) n1 as n

The numeric_range UDTF returns a table for a given range. In this case, I asked for the range between 0 (the default start) and the number of elements in the string (calculated as the number of commas + 1).
This way, each row is multiplied by the number of elements in the given column.
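The element count relies on the fact that a string with k commas splits into k + 1 pieces. A small Python sketch of the same arithmetic, with `str.replace` standing in for `regexp_replace` (equivalent here only because the pattern is a plain comma), checked against the actual split length:

```python
def count_elements(a: str) -> int:
    """Number of comma-separated elements, computed like the Hive expression."""
    # length(a) - length(regexp_replace(a, ',', '')) is the number of commas.
    commas = len(a) - len(a.replace(",", ""))
    return commas + 1


for s in ["abc#1,def#2,hij#3", "abc#1,def#2,hij#3,zzz#4"]:
    # str.split(",") always returns (number of commas + 1) pieces, so the two agree.
    assert count_elements(s) == len(s.split(","))
    # The answer describes numeric_range(k) as covering 0 .. k-1, like range(k).
    print(s, list(range(count_elements(s))))
```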

array_index(split(a,','),n)

This is the same as using split(a,',')[n], but Hive doesn't support that here.
It gets the n-th element for each duplicated row of the initial string, resulting in:

a                         array_index  value_from_array
abc#1,def#2,hij#3,zzz#4   0            abc#1
abc#1,def#2,hij#3,zzz#4   1            def#2
abc#1,def#2,hij#3,zzz#4   2            hij#3
abc#1,def#2,hij#3,zzz#4   3            zzz#4
abc#1,def#2,hij#3         0            abc#1
abc#1,def#2,hij#3         1            def#2
abc#1,def#2,hij#3         2            hij#3
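The rows above can be reproduced with a plain-Python emulation of the query. This only models the semantics (no Hive involved, and `explode_with_index` is a hypothetical name); Hive itself does not guarantee this row order.

```python
from typing import List, Tuple


def explode_with_index(rows: List[str]) -> List[Tuple[str, int, str]]:
    """Plain-Python emulation of the lateral view query (not Hive itself)."""
    out = []
    for a in rows:
        parts = a.split(",")                      # split(a, ',')
        k = len(a) - len(a.replace(",", "")) + 1  # number of commas + 1
        for n in range(k):                        # numeric_range(k): 0 .. k-1
            out.append((a, n, parts[n]))          # array_index(split(a, ','), n)
    return out


for row in explode_with_index(["abc#1,def#2,hij#3,zzz#4", "abc#1,def#2,hij#3"]):
    print(*row, sep="\t")
```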

If you want a specific number of elements (say 5), use:
lateral view numeric_range(5) n1 as n
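With a fixed range, the number of generated rows no longer depends on the string, so a row whose array has fewer than 5 elements gets indices past its end. A Python sketch of that case, assuming (not confirmed by the answer) that an out-of-range array_index lookup yields NULL, shown here as `None`; `first_n_via_range` is a hypothetical name:

```python
from typing import List, Optional


def first_n_via_range(a: str, n: int) -> List[Optional[str]]:
    """Emulate numeric_range(n) + array_index for a single input string."""
    parts = a.split(",")
    # numeric_range(n) yields indices 0 .. n-1 regardless of the array length.
    # ASSUMPTION: an index past the end comes back as None (NULL), not an error.
    return [parts[i] if i < len(parts) else None for i in range(n)]


row = first_n_via_range("abc#1,def#2,hij#3,zzz#4", 5)
print(row)  # ['abc#1', 'def#2', 'hij#3', 'zzz#4', None]
# Dropping the NULL padding leaves the first-n sub-array.
print([x for x in row if x is not None])  # ['abc#1', 'def#2', 'hij#3', 'zzz#4']
```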

