Wednesday 15 July 2015

optimization - I need some help to optimize Python code

I'm working on a kNN classifier in Python and I'm having problems. The following piece of code takes 7.5 s to 9.0 s to complete, and I'll have to run it 60,000 times.

    for fold in folds:
        for dot2 in fold:
            # distances[x][0] = class of dot2
            # distances[x][1] = distance between dot1 and dot2
            distances.append([dot2[0], calc_distance(dot1[1:], dot2[1:], method)])
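
The question doesn't show what happens to `distances` afterward. Assuming the usual kNN step of taking the k closest rows and voting on their class, a minimal pure-Python sketch of the same loop could look like this. `knn_predict`, `manhattan`, and `k` are hypothetical names, not from the original. This does not make any single distance cheaper; it only avoids building and fully sorting a 60,000-element list for every query.

```python
import heapq
from collections import Counter


def manhattan(p, q):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(p, q))


def knn_predict(dot1, folds, k, distance_fn=manhattan):
    # dot1 and every dot2 are rows of the form [class, x1, x2, ...].
    features = dot1[1:]
    # One generator over every training row in every fold; no intermediate list.
    scored = (
        (distance_fn(features, dot2[1:]), dot2[0])
        for fold in folds
        for dot2 in fold
    )
    # heapq.nsmallest keeps only k items instead of sorting every distance.
    nearest = heapq.nsmallest(k, scored, key=lambda item: item[0])
    # Majority vote among the classes of the k nearest rows.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


folds = [[[0, 0, 0], [0, 1, 0]], [[1, 9, 9], [1, 8, 9]]]
print(knn_predict([0, 0, 1], folds, k=3))  # 0
```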

The "folds" variable is a list of 10 folds that together contain 60,000 image inputs in .csv format. The first value of each dot is the class it belongs to. All values are integers. Is there a way to make this loop run faster?

Here is the calc_distance function:

    import math

    def calc_distance(dot1, dot2, distance):
        if distance == "manhanttan":
            total = 0
            # for each coordinate, take the absolute difference
            for x in range(0, len(dot1)):
                total = total + abs(dot1[x] - dot2[x])
            return total
        elif distance == "euclidiana":
            total = 0
            for x in range(0, len(dot1)):
                total = total + (dot1[x] - dot2[x])**2
            return math.sqrt(total)
        elif distance == "supremum":
            total = 0
            # keep the largest absolute coordinate difference
            for x in range(0, len(dot1)):
                if abs(dot1[x] - dot2[x]) > total:
                    total = abs(dot1[x] - dot2[x])
            return total
        elif distance == "cosseno":
            p1_p2_mul = 0
            p1_sum = 0
            p2_sum = 0
            for x in range(0, len(dot1)):
                p1_p2_mul = p1_p2_mul + dot1[x]*dot2[x]
                p1_sum = p1_sum + dot1[x]**2
                p2_sum = p2_sum + dot2[x]**2
            p1_sum = math.sqrt(p1_sum)
            p2_sum = math.sqrt(p2_sum)
            quociente = p1_sum*p2_sum
            # cosine distance = 1 - cosine similarity, so smaller means closer
            dist = 1 - p1_p2_mul/quociente
            return dist
        else:
            raise ValueError("unknown distance method: %r" % distance)
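
Before reaching for NumPy, a leaner pure-Python version is possible. The sketch below assumes Python 3.8+ (for `math.dist`) and uses `zip` with the built-ins `sum` and `max` instead of index loops, plus a dictionary that maps the question's method strings to functions so the `if`/`elif` chain is not re-evaluated on every call. The function and dictionary names are hypothetical. It usually trims some interpreter overhead, but it is still a per-element Python loop, so measure before relying on it.

```python
import math


def manhattan(p, q):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    # math.dist (Python 3.8+) computes the Euclidean distance in C.
    return math.dist(p, q)


def supremum(p, q):
    # Largest absolute coordinate difference (Chebyshev distance).
    return max(abs(a - b) for a, b in zip(p, q))


def cosine_distance(p, q):
    # 1 - cosine similarity, so that smaller values mean "closer".
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1 - dot / (norm_p * norm_q)


# Keys mirror the method strings used in the question.
DISTANCES = {
    "manhanttan": manhattan,
    "euclidiana": euclidean,
    "supremum": supremum,
    "cosseno": cosine_distance,
}


def calc_distance(dot1, dot2, distance):
    return DISTANCES[distance](dot1, dot2)
```

Inside the fold loop it is cheaper still to look the function up once, for example `dist_fn = DISTANCES[method]` before the loop and `dist_fn(dot1[1:], dot2[1:])` inside it.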

Edit: I found a way to make it faster, at least for the "manhanttan" method. Instead of:

    if distance == "manhanttan":
        total = 0
        # for each coordinate, take the absolute difference
        for x in range(0, len(dot1)):
            total = total + abs(dot1[x] - dot2[x])
        return total

I put:

    if distance == "manhanttan":
        totalp1 = 0
        totalp2 = 0
        # sum each point's coordinates, then take a single absolute difference
        for x in range(0, len(dot1)):
            totalp1 += dot1[x]
            totalp2 += dot2[x]
        return abs(totalp1 - totalp2)

The abs() call was the heavy part.
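
One caution about that edit: `abs(totalp1 - totalp2)` is the absolute difference of the two coordinate sums, not the Manhattan distance (the sum of the per-coordinate absolute differences). By the triangle inequality it is only a lower bound, and two very different images can score 0. The hypothetical functions below show the two quantities side by side on a tiny example.

```python
def manhattan(p, q):
    # True Manhattan distance: sum of per-coordinate absolute differences.
    return sum(abs(a - b) for a, b in zip(p, q))


def sum_difference(p, q):
    # What the edited branch computes: absolute difference of the coordinate sums.
    return abs(sum(p) - sum(q))


p = [0, 10]
q = [10, 0]
print(manhattan(p, q))       # 20
print(sum_difference(p, q))  # 0
```

So the speedup comes from computing a different, cheaper quantity, which will change which neighbors the classifier picks.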

There are many guides on "profiling Python"; you should search for some, read them, and walk through the profiling process to make sure you know which parts of your work are taking the time.
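
As a starting point, the standard library's `cProfile` and `pstats` modules can show where the time goes. The sketch below profiles a hypothetical driver (`classify_all`) over random data; the 784-value rows are an assumption (28x28 images), since the question doesn't state the image size. For a whole script, `python -m cProfile -s cumulative your_script.py` gives a similar report (the script name is a placeholder).

```python
import cProfile
import pstats
import random


def calc_distance(p, q):
    # Stand-in for the question's distance function (Manhattan only).
    return sum(abs(a - b) for a, b in zip(p, q))


def classify_all(queries, training):
    # Hypothetical driver: for each query, find the closest training row.
    results = []
    for dot1 in queries:
        distances = [(calc_distance(dot1, dot2), i) for i, dot2 in enumerate(training)]
        results.append(min(distances))
    return results


random.seed(0)
# Assumed shape: 784 integer pixel values per row.
training = [[random.randint(0, 255) for _ in range(784)] for _ in range(100)]
queries = training[:3]

profiler = cProfile.Profile()
profiler.enable()
classify_all(queries, training)
profiler.disable()

# Show the 10 entries with the highest cumulative time.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(10)
```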

But if this is the core of your work, it's a fair bet that calc_distance is where the bulk of the running time is being consumed.

Optimizing it substantially will likely require NumPy-accelerated math or a similar lower-level approach.
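
A sketch of what the NumPy route could look like, assuming all folds fit in memory as one array (the question only says the inputs come from .csv files). Instead of a Python loop over coordinates and rows, each call computes the distance from one query to every training row using array operations. The function names are hypothetical; the method strings mirror the question's.

```python
import numpy as np


def stack_folds(folds):
    # Do this once, outside the query loop: one (n, 1 + d) array, column 0 is the class.
    data = np.vstack([np.asarray(fold) for fold in folds])
    return data[:, 0], data[:, 1:]


def all_distances(query, features, method):
    # query: shape (d,); features: shape (n, d). Returns n distances at once.
    # float64 avoids overflow or wrap-around with small integer pixel dtypes.
    query = np.asarray(query, dtype=np.float64)
    features = np.asarray(features, dtype=np.float64)
    if method == "cosseno":
        # 1 - cosine similarity; an all-zero row or query yields nan.
        norms = np.linalg.norm(features, axis=1) * np.linalg.norm(query)
        return 1.0 - (features @ query) / norms
    diff = features - query  # broadcasting: (n, d) - (d,) -> (n, d)
    if method == "manhanttan":
        return np.abs(diff).sum(axis=1)
    if method == "euclidiana":
        return np.sqrt((diff ** 2).sum(axis=1))
    if method == "supremum":
        return np.abs(diff).max(axis=1)
    raise ValueError(f"unknown distance method: {method!r}")


def k_nearest_labels(query, labels, features, k, method):
    d = all_distances(query, features, method)
    # argpartition puts the k smallest distances first without a full sort
    # (their order among themselves is not guaranteed).
    idx = np.argpartition(d, k - 1)[:k]
    return labels[idx]
```

Because `stack_folds` runs once and each query becomes a handful of vectorized calls, the Python-level loop over 60,000 rows disappears; how much that saves in practice should be confirmed with the profiler.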

As a quick and dirty approach that requires less invasive profiling and rewriting, try installing the PyPy implementation of Python and running your code under it. I have seen easy speedups of 2x or more compared to the standard (CPython) implementation.
