Tuesday 15 January 2013

jdbc - Pointing HiveServer2 to MiniMRCluster for Hive Testing

I've been wanting to do Hive integration testing for some code I've been developing. There are two major requirements of the testing framework I need:

1. It needs to work with the Cloudera version of Hive and Hadoop (preferably 2.0.0-cdh4.7.0).
2. It needs to be entirely local. Meaning, the Hadoop cluster and the Hive server should start up at the beginning of the test, run a few queries, and tear down after the test is over.

So I broke the problem down into three parts:

1. Getting the code for the HiveServer2 part (I decided to use the JDBC connector over the Thrift service client).
2. Getting the code for building an in-memory MapReduce cluster (I decided to use MiniMRCluster for this).
3. Setting up both (1) and (2) above to work with each other.

I was able to get (1) out of the way by looking at many resources. Some of the useful ones are:

- Cloudera Hadoop Google user group
- Hive JDBC client wiki

For (2), I followed an excellent post on Stack Overflow:

Integration testing Hive jobs

So far, so good. At this point, the pom.xml in my Maven project, after including both of the above functionalities, looks like this:

<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.1</version>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
  </dependency>
  <!-- start: dependencies for getting MiniMRCluster to work -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-auth</artifactId>
    <version>2.0.0-cdh4.7.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-test</artifactId>
    <version>2.0.0-mr1-cdh4.7.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.0.0-cdh4.7.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.0.0-cdh4.7.0</version>
    <classifier>tests</classifier>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.0.0-cdh4.7.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.0.0-cdh4.7.0</version>
    <classifier>tests</classifier>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>2.0.0-mr1-cdh4.7.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>2.0.0-mr1-cdh4.7.0</version>
    <classifier>tests</classifier>
  </dependency>
  <!-- end: dependencies for getting MiniMRCluster to work -->
  <!-- start: dependencies for getting Hive JDBC to work -->
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-builtins</artifactId>
    <version>${hive.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-cli</artifactId>
    <version>${hive.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>${hive.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-serde</artifactId>
    <version>${hive.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-common</artifactId>
    <version>${hive.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>${hive.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>${hive.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.thrift</groupId>
    <artifactId>libfb303</artifactId>
    <version>0.9.1</version>
  </dependency>
  <dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.15</version>
  </dependency>
  <dependency>
    <groupId>org.antlr</groupId>
    <artifactId>antlr-runtime</artifactId>
    <version>3.5.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.derby</groupId>
    <artifactId>derby</artifactId>
    <version>10.10.1.1</version>
  </dependency>
  <dependency>
    <groupId>javax.jdo</groupId>
    <artifactId>jdo2-api</artifactId>
    <version>2.3-ec</version>
  </dependency>
  <dependency>
    <groupId>jpox</groupId>
    <artifactId>jpox</artifactId>
    <version>1.1.9-1</version>
  </dependency>
  <dependency>
    <groupId>jpox</groupId>
    <artifactId>jpox-rdbms</artifactId>
    <version>1.2.0-beta-5</version>
  </dependency>
  <!-- end: dependencies for getting Hive JDBC to work -->
</dependencies>

Now I'm on step (3). I tried running the following code:

@Test
public void testHiveMiniDFSClusterIntegration() throws IOException, SQLException {
    Configuration conf = new Configuration();

    /* Build MiniDFSCluster */
    MiniDFSCluster miniDFS = new MiniDFSCluster.Builder(conf).build();

    /* Build MiniMRCluster */
    System.setProperty("hadoop.log.dir", "/Users/nishantkelkar/IdeaProjects/" +
            "nkelkar-incubator/hive-test/target/hive/logs");
    int numTaskTrackers = 1;
    int numTaskTrackerDirectories = 1;
    String[] racks = null;
    String[] hosts = null;
    MiniMRCluster miniMR = new MiniMRCluster(numTaskTrackers,
            miniDFS.getFileSystem().getUri().toString(),
            numTaskTrackerDirectories, racks, hosts, new JobConf(conf));

    System.setProperty("mapred.job.tracker", miniMR.createJobConf(
            new JobConf(conf)).get("mapred.job.tracker"));

    try {
        String driverName = "org.apache.hive.jdbc.HiveDriver";
        Class.forName(driverName);
    } catch (ClassNotFoundException e) {
        e.printStackTrace();
        System.exit(1);
    }

    Connection hiveConnection = DriverManager.getConnection(
            "jdbc:hive2:///", "", "");
    Statement stm = hiveConnection.createStatement();

    // Create test tables and query them
    stm.execute("set hive.support.concurrency = false");
    stm.execute("drop table if exists test");
    stm.execute("create table if not exists test(a int, b int) " +
            "row format delimited fields terminated by ' '");
    stm.execute("create table dual as select 1 as one from test");
    stm.execute("insert into table test select stack(1,4,5) as (a,b) from dual");
    stm.execute("select * from test");
}
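One gap worth noting against requirement (2): the test above never tears the clusters down. A minimal sketch of a teardown, assuming JUnit 4 and assuming the two clusters are promoted from method locals to instance fields of the test class (the field names miniDFS and miniMR here mirror the locals above but are otherwise my own):

```java
// Sketch only: assumes MiniDFSCluster miniDFS and MiniMRCluster miniMR
// are fields of the test class, assigned during setup.
@After
public void tearDownClusters() {
    if (miniMR != null) {
        // Stops the in-memory MapReduce cluster's task trackers and job tracker.
        miniMR.shutdown();
    }
    if (miniDFS != null) {
        // Stops the in-memory HDFS namenode and datanodes.
        miniDFS.shutdown();
    }
}
```

Shutting down MapReduce before HDFS avoids in-flight tasks hitting a dead filesystem.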

My hope was that (3) would be solved by the following line of code from the above method:

Connection hiveConnection = DriverManager.getConnection(
        "jdbc:hive2:///", "", "");

However, I'm getting the following error:

java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:161)
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:150)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:207)
    at com.ask.nkelkar.hive.HiveUnitTest.testHiveMiniDFSClusterIntegration(HiveUnitTest.java:54)

Can anyone please let me know what I need to do in addition, or what I'm doing wrong, to get this to work?

P.S. I looked at the HiveRunner and hive_test projects as options, but I wasn't able to get these to work with the Cloudera versions of Hadoop.

Your test is failing at the first CREATE TABLE statement. Hive is unhelpfully suppressing the following error message:

file:/user/hive/warehouse/test is not a directory or unable to create one

Hive is attempting to use the default warehouse directory /user/hive/warehouse, which doesn't exist on your filesystem. You could create the directory, but for testing you'll want to override the default value. For example:

import static org.apache.hadoop.hive.conf.HiveConf.ConfVars;
...
System.setProperty(ConfVars.METASTOREWAREHOUSE.toString(),
        "/Users/nishantkelkar/IdeaProjects/" +
        "nkelkar-incubator/hive-test/target/hive/warehouse");
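For reference, ConfVars.METASTOREWAREHOUSE is backed by the hive.metastore.warehouse.dir property, so the same override can instead live in a test-scoped hive-site.xml on the test classpath. A sketch, reusing the warehouse path from the snippet above (adjust to your own project layout):

```xml
<configuration>
  <property>
    <!-- Same setting as ConfVars.METASTOREWAREHOUSE, expressed as config -->
    <name>hive.metastore.warehouse.dir</name>
    <value>/Users/nishantkelkar/IdeaProjects/nkelkar-incubator/hive-test/target/hive/warehouse</value>
  </property>
</configuration>
```

Keeping this in src/test/resources means the System.setProperty call can be dropped from the test body.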

Labels: jdbc, hive, integration-testing
