Saturday 15 June 2013

regex - Creating structured Hive table with unstructured GPS packets in CSV format




I have a CSV file with the contents shown below.

vts,51,0071,9739965515,nm,gp,inf01,v,19,072219,291014,0000.0000,n,00000.0000,e,07ae
vts,01,0097,9739965515,sp,gp,18,072253,v,0000.0000,n,00000.0000,e,0.0,0.0,291014,0000,00,4000,11,999,169,b205
vts,51,0071,9739965515,nm,gp,inf01,v,18,072311,291014,0000.0000,n,00000.0000,e,c24e
vts,01,0097,9739965515,nm,gp,19,072311,v,0000.0000,n,00000.0000,e,0.0,0.0,291014,0000,00,4000,11,999,171,b358
vts,51,0071,9739965515,nm,gp,inf01,v,18,072319,291014,0000.0000,n,00000.0000,e,012f
vts,51,0071,9739965515,nm,gp,inf01,v,19,072326,291014,0000.0000,n,00000.0000,e,b2e6
vts,01,0097,9739965515,nm,gp,18,072326,v,0000.0000,n,00000.0000,e,0.0,0.0,291014,0000,00,4000,11,999,173,eaa0
vts,51,0071,9739965515,nm,gp,inf01,v,18,072333,291014,0000.0000,n,00000.0000,e,9896
vts,51,0071,9739965515,nm,gp,inf01,v,18,072340,291014,0000.0000,n,00000.0000,e,9b23

This has to be mapped to the following fields:

pkt_header,gprs_pkt_id,pkt_length,sim_no,msg_id,gprs_pkt,gsm_sig_strength,utc_time,pkt_validation,latitude,direction_n_s,longitude,direction_e_w,speed,track_angle,utc_date,fuel_adc_values,ignition,odometer_values,supply_int,battery_adc,pkt_id,check_sum

The second field, i.e. gprs_pkt_id, has the value 01 for a valid packet. My use case is to filter the CSV data down to the valid packets only. I tried using a regex, but I am not able to get the entire data. Any help is appreciated.
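To make the goal concrete, here is a minimal Python sketch of the intended result: keep only the packets whose second field is 01 and map them onto the 23 field names listed above. It is only an illustration of the mapping, not a Hive solution. The helper name `valid_packets` is hypothetical, and the sketch assumes one packet per line.

```python
import csv
import io

# Field names exactly as listed in the question (23 columns).
FIELDS = (
    "pkt_header,gprs_pkt_id,pkt_length,sim_no,msg_id,gprs_pkt,"
    "gsm_sig_strength,utc_time,pkt_validation,latitude,direction_n_s,"
    "longitude,direction_e_w,speed,track_angle,utc_date,fuel_adc_values,"
    "ignition,odometer_values,supply_int,battery_adc,pkt_id,check_sum"
).split(",")

# Two sample packets copied from the question (assumed: one packet per line).
SAMPLE = (
    "vts,51,0071,9739965515,nm,gp,inf01,v,19,072219,291014,"
    "0000.0000,n,00000.0000,e,07ae\n"
    "vts,01,0097,9739965515,sp,gp,18,072253,v,0000.0000,n,00000.0000,e,"
    "0.0,0.0,291014,0000,00,4000,11,999,169,b205\n"
)


def valid_packets(text):
    """Yield one dict per packet whose gprs_pkt_id is '01' (hypothetical helper)."""
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and anything that is not a full 23-field '01' packet.
        if len(row) == len(FIELDS) and row[1] == "01":
            yield dict(zip(FIELDS, row))


packets = list(valid_packets(SAMPLE))
```

Note that the packets with gprs_pkt_id 51 carry only 16 fields, so they would not fit the 23-column layout even without the filter.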

The Hive query I used is shown below.

create external table sky_track_testing1 (
  pkt_header string, gprs_pkt_id string, pkt_length string, sim_no string,
  msg_id string, gprs_pkt string, gsm_sig_strength string, utc_time string,
  pkt_validation string, latitude string, direction_n_s string,
  longitude string, direction_e_w string, speed string, track_angle string,
  utc_date string, fuel_adc_values string, ignition string,
  odometer_values string, supply_int string, battery_adc string,
  pkt_id string, check_sum string
)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties (
  "input.regex" = "^(vts,01).*$"
)
stored as textfile
location '/user/root/sky_track';

This query is wrong. Please help me.

I recommend using Pig for this:

-- load the comma-separated packets and name the fields
a = load '/user/root/sky_track' using PigStorage(',') as (pkt_header,gprs_pkt_id,pkt_length,sim_no,msg_id,gprs_pkt,gsm_sig_strength,utc_time,pkt_validation,latitude,direction_n_s,longitude,direction_e_w,speed,track_angle,utc_date,fuel_adc_values,ignition,odometer_values,supply_int,battery_adc,pkt_id,check_sum);
-- keep only the valid packets
b = filter a by gprs_pkt_id == '01';
store b into '/user/root/sky_track_valid' using PigStorage(',');
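If you would rather stay in Hive, keep in mind that RegexSerDe fills one column per capturing group, so a pattern with a single group such as `^(vts,01).*$` cannot populate 23 columns. Also, according to the RegexSerDe documentation, lines that do not match the pattern come back as rows of NULLs rather than being dropped, so a `where` clause would still be needed. Below is a rough Python sketch that builds a 23-group pattern and checks it against two packets from the question. The pattern uses only constructs that Python and Java regexes share, but it has not been tested on a Hive cluster, and the names `N_COLUMNS` and `packet_re` are hypothetical.

```python
import re

N_COLUMNS = 23  # number of columns declared in the table

# Group 1 is the header, group 2 is pinned to "01", and the
# remaining 21 groups each capture one comma-free field.
pattern = "^(vts),(01)," + ",".join(["([^,]*)"] * (N_COLUMNS - 2)) + "$"
packet_re = re.compile(pattern)

# One valid (01) packet and one other (51) packet from the question.
valid = ("vts,01,0097,9739965515,sp,gp,18,072253,v,0000.0000,n,00000.0000,e,"
         "0.0,0.0,291014,0000,00,4000,11,999,169,b205")
other = ("vts,51,0071,9739965515,nm,gp,inf01,v,19,072219,291014,"
         "0000.0000,n,00000.0000,e,07ae")

m = packet_re.match(valid)
columns = m.groups() if m else ()
```

The string in `pattern` is what would go into "input.regex". It contains no backslashes, so no extra escaping should be needed inside the Hive string literal.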

regex csv hadoop filter hive
