Sunday, 15 May 2011

Python: sorting a dependency list -



Python: sorting a dependency list -

i'm trying work out if problem solvable using builtin sorted() function or if need myself - old school using cmp have been relatively easy.

my data-set looks like:

x = [ ('business', set('fleet','address')) ('device', set('business','model','status','pack')) ('txn', set('device','business','operator')) ....

the sort rule should value of n & y y > n, x[n][0] not in x[y][1]

although i'm using python 2.6 cmp argument still available i'm trying create python 3 safe.

so, can done using lambda magic , key argument?

-== update ==-

thanks eli & winston! didn't think using key work, or if suspected shoe horn solution isn't ideal.

because problem database table dependencies had create minor add-on eli's code remove item list of dependencies (in designed database wouldn't happen, lives in magical perfect world?)

my solution:

def topological_sort(source): """perform topo sort on elements. :arg source: list of ``(name, set(names of dependancies))`` pairs :returns: list of names, dependancies listed first """ pending = [(name, set(deps)) name, deps in source] emitted = [] while pending: next_pending = [] next_emitted = [] entry in pending: name, deps = entry deps.difference_update(set((name,)), emitted) # <-- pop self dep, req py2.6 if deps: next_pending.append(entry) else: yield name emitted.append(name) # <-- not required, preserves original order next_emitted.append(name) if not next_emitted: raise valueerror("cyclic dependancy detected: %s %r" % (name, (next_pending,))) pending = next_pending emitted = next_emitted

what want called topological sort. while it's possible implement using builtin sort(), it's rather awkward, , it's improve implement topological sort straight in python.

why going awkward? if study 2 algorithms on wiki page, both rely on running set of "marked nodes", concept that's hard contort form sort() can use, since key=xxx (or cmp=xxx) works best stateless comparing functions, particularly because timsort doesn't guarantee order elements examined in. i'm (pretty) sure solution does utilize sort() going end redundantly calculating info each phone call key/cmp function, in order around statelessness issue.

the next alg i've been using (to sort javascript library dependancies):

edit: reworked greatly, based on winston ewert's solution

def topological_sort(source): """perform topo sort on elements. :arg source: list of ``(name, [list of dependancies])`` pairs :returns: list of names, dependancies listed first """ pending = [(name, set(deps)) name, deps in source] # re-create deps can modify set in-place emitted = [] while pending: next_pending = [] next_emitted = [] entry in pending: name, deps = entry deps.difference_update(emitted) # remove deps emitted lastly pass if deps: # still has deps? recheck during next pass next_pending.append(entry) else: # no more deps? time emit yield name emitted.append(name) # <-- not required, helps preserve original ordering next_emitted.append(name) # remember emitted difference_update() in next pass if not next_emitted: # entries have unmet deps, 1 of 2 things wrong... raise valueerror("cyclic or missing dependancy detected: %r" % (next_pending,)) pending = next_pending emitted = next_emitted

sidenote: is possible shoe-horn cmp() function key=xxx, outlined in python bug tracker message.

python sorting topological-sort

No comments:

Post a Comment