hive - Count and find maximum number in Hadoop using Pig
I have a table containing sample CDR info. Column A holds the calling person's mobile number and column B holds the called person's mobile number. I need to find the number that made the maximum number of calls (column A), and I also need to find the number (column B) that was called the most.
The table is constructed as below:

calling       called
889578226     77382596
889582256     77382596
889582256     7736368296
7785978214    782987522

In the above table, 889578226 is a number with outgoing calls and 77382596 is a called number; I need the output in that form.
In Hive I run the query below:

select calling_a, called_b, count(called_b) from cdr_data group by calling_a, called_b;
What would be the equivalent of the above query in Pig?
Anas, please let me know whether this is what you are expecting or something different.
input.txt

a,100
a,101
a,101
a,101
a,103
b,200
b,201
b,201
c,300
c,300
c,301
d,400

PigScript:

-- Load (name, phone) pairs from the comma-separated input
A = LOAD 'input.txt' USING PigStorage(',') AS (name:chararray, phone:long);
-- Count how many times each (name, phone) pair occurs
B = GROUP A BY (name, phone);
C = FOREACH B GENERATE FLATTEN(group), COUNT(A) AS cnt;
-- For each name, keep only the phone with the highest count
D = GROUP C BY $0;
E = FOREACH D {
    SortedList = ORDER C BY cnt DESC;
    top = LIMIT SortedList 1;
    GENERATE FLATTEN(top);
}
DUMP E;

Output:

(a,101,3)
(b,201,2)
(c,300,2)
(d,400,1)
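The script above returns the most frequent phone per name. If what you need is the two overall answers from the question (the number that made the most calls and the number that was called the most), a minimal sketch could look like the following. It assumes a hypothetical comma-separated file 'cdr.txt' with one call per line in the form calling,called; the file name, the relation names and the field names are illustrative and do not come from the question.

```pig
-- Assumption: 'cdr.txt' is a hypothetical file with one call per line: calling,called
cdr = LOAD 'cdr.txt' USING PigStorage(',') AS (calling:chararray, called:chararray);

-- Number that made the most calls (column A)
by_calling     = GROUP cdr BY calling;
calling_cnt    = FOREACH by_calling GENERATE group AS calling, COUNT(cdr) AS calls;
calling_sorted = ORDER calling_cnt BY calls DESC;
top_calling    = LIMIT calling_sorted 1;

-- Number that was called the most (column B)
by_called      = GROUP cdr BY called;
called_cnt     = FOREACH by_called GENERATE group AS called, COUNT(cdr) AS calls;
called_sorted  = ORDER called_cnt BY calls DESC;
top_called     = LIMIT called_sorted 1;

DUMP top_calling;
DUMP top_called;
```

Note that LIMIT 1 keeps a single row, so if two numbers tie for the highest count only one of them is reported. The numbers are loaded as chararray here so that leading zeros or long values are not altered; long would also work for purely numeric data.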
hadoop hive apache-pig cdr