Breedlove: Indexing - Titan Cassandra does not use defined indexes for custom Gremlin steps

Tuesday, 15 March 2011

We have defined 5 indexes using Titan Cassandra in the following block of code:

    def mgmt = g.managementSystem
    try {
        // Composite indexes for exact-match lookups
        if (!mgmt.containsGraphIndex("byid")) {
            def key = mgmt.makePropertyKey('__id').dataType(String.class).make()
            mgmt.buildIndex("byid", Vertex.class).addKey(key).buildCompositeIndex()
        }
        if (!mgmt.containsGraphIndex("bytype")) {
            def key = mgmt.makePropertyKey('__type').dataType(String.class).make()
            mgmt.buildIndex("bytype", Vertex.class).addKey(key).buildCompositeIndex()
        }
        // Mixed indexes; index_name (defined elsewhere) names the indexing backend
        if (!mgmt.containsGraphIndex("lastname")) {
            def key = mgmt.makePropertyKey('lastname').dataType(String.class).make()
            mgmt.buildIndex('lastname', Vertex.class).addKey(key).buildMixedIndex(index_name)
        }
        if (!mgmt.containsGraphIndex("firstname")) {
            def key = mgmt.makePropertyKey('firstname').dataType(String.class).make()
            mgmt.buildIndex('firstname', Vertex.class).addKey(key).buildMixedIndex(index_name)
        }
        if (!mgmt.containsGraphIndex("vin")) {
            def key = mgmt.makePropertyKey('vin').dataType(String.class).make()
            mgmt.buildIndex('vin', Vertex.class).addKey(key).buildMixedIndex(index_name)
        }
        mgmt.commit()
    } catch (Exception e) {
        System.err.println("an error occurred initializing indices")
        e.printStackTrace()
    }

We then execute the following query:

    g.V.has('__id','49fb8bae5f994cf5825b849a5dd9b49a')

This produces the following warning:

"Query requires iterating over all vertices [{}]. For better performance, use indexes"

I'm confused because, according to the documentation, these indexes are set up correctly. Is there any reason why Titan is not using them?

The indexes were created before there was any data in the graph, so reindexing is not necessary. Any help is appreciated.

Update: I've managed to break this down into a simple test. In our code we have developed a custom Gremlin step to use for the stated query:

    Gremlin.defineStep('hasid', [Vertex,Pipe], { String id -> _().has('__id', id) })

Then in our code we call:

    g.V.hasid(id)

It appears that when we use the custom Gremlin step the query does not use the index, but when we use the vanilla Gremlin call the index is used.

This looks similar to the oddity noted in this post: https://groups.google.com/forum/#!topic/aureliusgraphs/6dqmg13_4eq

I would prefer to check for the existence of the property key, which would mean adjusting your checks to:

    if (!mgmt.containsRelationType("__id")) {

I tried out your code in the Titan Gremlin console and I'm not seeing the issue:

    gremlin> g = TitanFactory.open("conf/titan-cassandra.properties")
    ==>titangraph[cassandrathrift:[127.0.0.1]]
    gremlin> mgmt = g.managementSystem
    ==>com.thinkaurelius.titan.graphdb.database.management.ManagementSystem@2227a6c1
    gremlin> key = mgmt.makePropertyKey('__id').dataType(String.class).make()
    ==>__id
    gremlin> mgmt.buildIndex("byid",Vertex.class).addKey(key).buildCompositeIndex()
    ==>com.thinkaurelius.titan.graphdb.database.management.TitanGraphIndexWrapper@6d4c273c
    gremlin> mgmt.commit()
    ==>null
    gremlin> mgmt = g.managementSystem
    ==>com.thinkaurelius.titan.graphdb.database.management.ManagementSystem@79d743e6
    gremlin> mgmt.containsGraphIndex("byid")
    ==>true
    gremlin> mgmt.rollback()
    ==>null
    gremlin> v = g.addVertex()
    ==>v[256]
    gremlin> v.setProperty("__id","123")
    ==>null
    gremlin> g.commit()
    ==>null
    gremlin> g.V
    12:56:45 WARN  com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
    ==>v[256]
    gremlin> g.V("__id","123")
    ==>v[256]
    gremlin> g.V.has("__id","123")
    ==>v[256]

Note that the two queries on "__id" do not produce the ugly "...use indexes" message; only the unfiltered g.V does. Perhaps you can try my example here and see if it behaves as expected before going back to your code.

Update: In reply to the updated question above with respect to the custom step. As the post you found noted, Titan's query optimizer doesn't seem to be able to sort this one out. I think it's easy to see why in this example:

    gremlin> g = TinkerGraphFactory.createTinkerGraph()
    ==>tinkergraph[vertices:6 edges:6]
    gremlin> Gremlin.defineStep('hasname', [Vertex,Pipe], { n -> _().has('name',n) })
    ==>null
    gremlin> g.V.hasname('marko')
    ==>v[1]
    gremlin> g.V.hasname('marko').toString()
    ==>[GremlinStartPipe, GraphQueryPipe(vertex), [GremlinStartPipe, PropertyFilterPipe(name,EQUAL,marko)]]

The "compiled" Gremlin looks like the last line above. Note that the custom step compiles to an "inner" pipe with a new GremlinStartPipe. Compare that to the same query without the custom step:

    gremlin> g.V.has('name','marko').toString()
    ==>[GremlinStartPipe, GraphQueryPipe(has,vertex), IdentityPipe]

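Titan can optimize the "GraphQueryPipe" with the embedded has, but it seems that isn't the case with the custom step's signature, where the has ends up as a separate PropertyFilterPipe inside the inner pipe. I think the workaround (at least for this particular scenario) is to write a function that returns the pipe.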

    gremlin> def hasname(g,n){g.V.has('name',n)}
    ==>true
    gremlin> hasname(g,'marko')
    ==>v[1]
    gremlin> hasname(g,'marko').toString()
    ==>[GremlinStartPipe, GraphQueryPipe(has,vertex), IdentityPipe]

Passing 'g' around kinda stinks. Perhaps you could write your DSL so that 'g' gets wrapped in a class that lets you do:

    with(g).hasname('marko')
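The wrapper idea above could be sketched roughly as follows. This is only a minimal illustration, not a tested Titan recipe: GraphDsl is a hypothetical class name, and the sketch assumes the wrapped object exposes g.V the way a Gremlin-Groovy graph does. The point is that the method builds the pipeline from g.V itself, so has() should stay attached to the graph query just as in the plain function shown earlier.

```groovy
// Hypothetical wrapper class: GraphDsl is an illustrative name,
// not part of Titan, Blueprints, or Gremlin.
class GraphDsl {
    // The wrapped graph; assumed to expose g.V as in Gremlin-Groovy.
    final Object graph

    GraphDsl(Object graph) {
        this.graph = graph
    }

    // Starts the pipeline at graph.V and applies has() directly, instead of
    // going through a custom step that wraps has() in an inner pipe.
    def hasname(String name) {
        graph.V.has('name', name)
    }
}
```

With this in place, new GraphDsl(g).hasname('marko') plays the role of with(g).hasname('marko'); a small helper that returns new GraphDsl(g) would give the shorter spelling. Calling toString() on the result is a quick way to check whether the has still appears inside GraphQueryPipe.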

A final thought is to use Groovy's meta-programming facilities:

    gremlin> Graph.metaClass.hasname = { n -> delegate.V.has('name',n) }
    ==>groovysh_evaluate$_run_closure1@600b9d27
    gremlin> g.hasname("marko").toString()
    ==>[GremlinStartPipe, GraphQueryPipe(has,vertex), IdentityPipe]
    gremlin> g.hasname("marko")
    ==>v[1]
