Monday 15 March 2010

hive - Count and find maximum number in Hadoop using Pig




I have a table containing sample CDR information. Column A holds the calling person's mobile number and column B holds the called person's mobile number. I need to find which number made the maximum number of calls (column A), and also which number (column B) was called the most.

The table structure is shown below:

calling       called
889578226     77382596
889582256     77382596
889582256     7736368296
7785978214    782987522

In the table above, 889578226 has the most outgoing calls and 77382596 is the most-called number; I need the output in that form.

In Hive I run the query below:

select calling_a, called_b, count(called_b) from cdr_data group by calling_a, called_b;

What would be the equivalent of the above query in Pig?

Anas, please let me know whether this is what you are expecting or something different.

input.txt:

a,100
a,101
a,101
a,101
a,103
b,200
b,201
b,201
c,300
c,300
c,301
d,400

PigScript:

-- load (name, phone) pairs from the comma-separated file
a = load 'input.txt' using PigStorage(',') as (name:chararray, phone:long);
-- count how many times each (name, phone) pair occurs
b = group a by (name, phone);
c = foreach b generate flatten(group), COUNT(a) as cnt;
-- for each name, keep only the phone with the highest count
d = group c by $0;
e = foreach d {
    sortedlist = order c by cnt desc;
    top = limit sortedlist 1;
    generate flatten(top);
}
dump e;

Output:

(a,101,3)
(b,201,2)
(c,300,2)
(d,400,1)
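As a cross-check on the output listed above, the same steps (count each name/phone pair, then keep the highest-count phone per name) can be sketched in plain Python. This is only an illustration of the logic, not part of the Pig answer. The names `rows`, `pair_counts`, `best` and `result` are hypothetical, and the data is the `input.txt` sample typed in by hand.

```python
from collections import Counter

# Sample rows copied from input.txt: (name, phone)
rows = [
    ("a", 100), ("a", 101), ("a", 101), ("a", 101), ("a", 103),
    ("b", 200), ("b", 201), ("b", 201),
    ("c", 300), ("c", 300), ("c", 301),
    ("d", 400),
]

# Step 1: count occurrences of each (name, phone) pair,
# like grouping by (name, phone) and applying COUNT.
pair_counts = Counter(rows)

# Step 2: for each name, keep the phone with the highest count,
# like grouping by name, ordering by cnt desc, and limiting to 1.
best = {}
for (name, phone), cnt in pair_counts.items():
    if name not in best or cnt > best[name][1]:
        best[name] = (phone, cnt)

# Flatten to (name, phone, cnt) tuples, sorted by name for stable output.
result = [(name, phone, cnt) for name, (phone, cnt) in sorted(best.items())]
print(result)
```

On ties, this sketch keeps the pair it saw first. The sample data contains no ties within a name, so the sketch does not show which row the Pig script would keep in that case.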

hadoop hive apache-pig cdr
