Wednesday 15 May 2013

hadoop - batch insert millions of records to Hive using Hive SQL?

I want to prepare some sample data to test on a Hive table stored in Parquet format. The table is:

hive> create table exps (sn string, buildnum string, shortprodname string, useriv string, cfs struct<version : string, name : string, objarray : array<struct<id : string, properties : int>>>) stored as parquet;

Then I write a SQL file, "sample.sql", that contains millions of lines of SQL INSERT commands, and run it:

$ /opt/hive-0.13.1/bin/hive -f sample.sql
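The question does not show the file's contents, but each line is presumably a single-row INSERT. Hive 0.13 has no INSERT ... VALUES (that arrived in 0.14) and no built-in one-row DUAL table, so the statements likely follow the Oracle INSERT ... SELECT style, something like this (the values and the source table name are invented for illustration):

hive> insert into table exps
    > select 'sn0001', 'build42', 'prodA', 'iv0001',
    >        named_struct('version', '1.0', 'name', 'cfg',
    >                     'objarray', array(named_struct('id', 'a1', 'properties', 1)))
    > from some_one_row_table;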

This results in Hive starting lots of MapReduce jobs and executing them one by one, which is quite slow.

So my question is: is there a better way to do this?

There is no dummy table in Hive, so sample.sql won't work as written.
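Even if you create a stand-in dummy table yourself (a minimal sketch, not part of the original answer, assuming a local one-line file), it doesn't help with speed:

$ echo X > /tmp/one_row.txt
hive> create table one_row (x string);
hive> load data local inpath '/tmp/one_row.txt' into table one_row;

Each insert into table exps select ... from one_row; statement in sample.sql would then parse, but Hive still compiles every statement into its own MapReduce job, so millions of single-row inserts remain exactly as slow as described above.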

Since you need to try the Parquet format in Hive using SQL, my suggestion is:

1. Load the data into a relational database such as MySQL.
2. Import the data from the relational database into HDFS using Apache Sqoop.
3. Create a Hive table in Parquet format and load the data from HDFS into it.
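A sketch of that pipeline (the database, table, and path names here are made up, and the nested cfs column is assumed to be flattened into plain columns in MySQL, then rebuilt in the final SELECT):

$ sqoop import \
    --connect jdbc:mysql://dbhost/testdb \
    --username hiveuser --password secret \
    --table exps_flat \
    --target-dir /user/hive/staging/exps \
    --fields-terminated-by '\t'

hive> create external table exps_staging (
    >   sn string, buildnum string, shortprodname string, useriv string,
    >   cfs_version string, cfs_name string, obj_id string, obj_properties int)
    > row format delimited fields terminated by '\t'
    > location '/user/hive/staging/exps';

hive> insert overwrite table exps
    > select sn, buildnum, shortprodname, useriv,
    >        named_struct('version', cfs_version, 'name', cfs_name,
    >                     'objarray', array(named_struct('id', obj_id, 'properties', obj_properties)))
    > from exps_staging;

The final INSERT ... SELECT runs as a single MapReduce job that writes all the rows into the Parquet table at once, instead of one job per row.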

hadoop hive
