Friday 15 March 2013

canonicalization - Rejecting isomorphisms from collection of graphs -



canonicalization - Rejecting isomorphisms from collection of graphs -

i have collection of 15m (million) dags (directed acyclic graphs - directed hypercubes actually) remove isomorphisms from. mutual algorithm this? each graph small, hybercube of dimension n n 3 6 (for now) resulting in graphs of 64 nodes each n=6 case.

using networkx , python, implemented works little sets 300k (thousand) fine (runs in few days time).

def isisomorphicduplicate(hcl, hc): """checks if hc isomorphism of of hc's in hcl returns true if hcl contains isomorphism of hc returns false if not found""" #for each cube in hcl, check if hc isomorphic #if isomorphic, check if #if isomorphic, homecoming true #if comparisons have been made already, not isomorphism , homecoming false saved_hc in hcl: if nx.faster_could_be_isomorphic(saved_hc, hc): if nx.fast_could_be_isomorphic(saved_hc, hc): if nx.is_isomorphic(saved_hc, hc): homecoming true homecoming false

one improve way convert each graph canonical ordering, sort collection, remove duplicates. bypasses checking each of 15m graphs in binary is_isomophic() test, believe above implementation o(n!n) (not taking isomorphic time account) whereas clean convert canonical ordering , sort should take o(n) conversion + o(log(n)n) search + o(n) removal of duplicates. o(n!n) >> o(log(n)n)

i found paper on canonical graph labeling, tersely described mathematical equations, no pseudocode: "mckay's canonical graph labeling algorithm" - http://www.math.unl.edu/~aradcliffe1/papers/canonical.pdf

tldr: have impossibly big number of graphs check via binary isomorphism checking. believe mutual way done via canonical ordering. packaged algorithms or published straightforward implement algorithms (i.e. have pseudocode) exist?

here breakdown of mckay ’ s canonical graph labeling algorithm, presented in paper hartke , radcliffe [link paper].

i should start pointing out open source implementation available here: nauty , traces source code.

ok, let's this! unfortunately algorithm heavy in graph theory, need terms. first start defining isomorphic , automorphic.

isomorphism:

two graphs isomorphic if same, except vertices labelled differently. next 2 graphs isomorphic.

automorphic:

two graphs automorphic if same, including vertex labeling. next 2 graphs automorphic. seems trivial, turns out of import technical reasons.

graph hashing:

the core thought of whole thing have way hash graph string, given graph compute hash strings graphs isomorphic it. isomorphic hash string alphabetically (technically lexicographically) largest called "canonical hash", , graph produced called "canonical isomorph", or "canonical labelling".

with this, check if 2 graphs isomorphic need check if canonical isomporphs (or canonical labellings) equal (ie automorphs of each other). wow jargon! unfortuntately more confusing without jargon :-(

the hash function going utilize called i(g) graph g: build binary string looking @ every pair of vertices in g (in order of vertex label) , set "1" if there border between 2 vertices, "0" if not. way j-th bit in i(g) represents presense of absence of border in graph.

mckay ’ s canonical graph labeling algorithm

the problem graph on n vertices, there o( n! ) possible isomorphic hash strings based on how label vertices, , many many more if have compute same string multiple times (ie automorphs). in general have compute every isomorph hash string in order find biggest one, there's no magic sort-cut. mckay's algorithm search algorithm find canonical isomoprh faster pruning automorphs out of search tree, forcing vertices in canonical isomoprh labelled in increasing grade order, , few other tricks cut down number of isomorphs have hash.

(1) sect 4: first step of mckay's sort vertices according degree, prunes out bulk of isomoprhs search, not guaranteed unique ordering since there may more 1 vertex of given degree. example, next graph has 6 vertices; verts {1,2,3} have grade 1, verts {4,5} have grade 2 , vert {6} has grade 3. it's partial ordering according vertex grade {1,2,3|4,5|6}.

(2) sect 5: impose artificial symmetry on vertices not distinguished vertex degree; take 1 of groups of vertices same degree, , in turn pick 1 @ time come first in total ordering (fig. 2 in paper), in our illustration above, node {1,2,3|4,5|6} have children { {1|2,3|4,5|6}, {2|1,3|4,5|6}}, {3|1,2|4,5|6}} } expanding grouping {1,2,3} , children { {1,2,3|4|5|6}, {1,2,3|5|4|6} } expanding grouping {4,5}. splitting can done way downwards leaf nodes total orderings {1|2|3|4|5|6} describe total isomorph of g. allows to take partial ordering vertex grade (1), {1,2,3|4,5|6}, , build tree listing candidates canonical isomorph -- way fewer n! combinations since, example, vertex 6 never come first. note mckay evaluates children in depth-first way, starting smallest grouping first, leads deeper narrower tree improve online pruning in next step. note each total ordering leaf node may appear in more 1 subtree, there's pruning comes in!

(3) sect. 6: while searching tree, automorphisms , utilize prune tree. math here bit above me, think thought if find 2 nodes in tree automorphisms of each other can safely prune 1 of subtrees because know both yield same leaf nodes.

i have given high-level description of mckay's, paper goes lot more depth in math, , building implementation require understanding of math. i've given plenty context either go , re-read paper, or read source code of implementation.

graph canonicalization isomorphism

No comments:

Post a Comment