c# - Compute the Cartesian Distance between points in high-dimensional space optimally -
summary:
my class squaredistance computes square of cartesian distance in 4 ways using methods these names:
signed unsignedbranching unsigneddistribute casttosignedlongthe first 1 fastest , uses signed integers, info must unsigned (for reasons given below). other 3 methods start unsigned numbers. goal write method in squaredistance takes unsigned info , performs improve 3 wrote, close possible in performance #1. code benchmark results follows. (unsafe code permitted, if think help.)
details:
i developing algorithm solve k-nearest neighbour problems using index derived hilbert curve. execution time naive, linear scan algorithm grows in time quadratically number of points , linearly number of dimensions, , spends time computing , comparing cartesian distances.
the motivation behind special hilbert index cut down number of times distance function called. however, must still called millions of times, must create fast possible. (it called function in program. recent failed effort optimize distance function doubled programme execution time 7 minutes 15 minutes, no, not premature or superfluous optimization.)
dimensions: points may have anywhere 10 5 one thousand dimensions.
constraints. have 2 annoying constraints:
the hilbert transformation logic requires points expressed uint (unsigned integer) arrays. (the code written another, magic , uses shifts, ands, ors, , , can't changed.) storing points signed integers , incessantly casting them uint arrays produced wretched performance, must @ to the lowest degree store uint array re-create of each point.
to improve efficiency, made signed integer re-create of each point speed distance calculations. worked well, 1 time 3,000 dimensions, run out of memory!
to save on memory, removed memoized signed integer arrays , tried write unsigned version of distance calculation. best results 2.25 times worse signed integer version.
the benchmarks create 1000 random points of 1000 dimensions each , perform distance calculations between every point , every other point, 1,000,000 comparisons. since care relative distance, save time not performing square root.
in debug mode:
signedbenchmark ratio: 1.000 seconds: 3.739 unsignedbranchingbenchmark ratio: 2.731 seconds: 10.212 unsigneddistributebenchmark ratio: 3.294 seconds: 12.320 casttosignedlongbenchmark ratio: 3.265 seconds: 12.211
in release mode:
signedbenchmark ratio: 1.000 seconds: 3.494 unsignedbranchingbenchmark ratio: 2.672 seconds: 9.334 unsigneddistributebenchmark ratio: 3.336 seconds: 11.657 casttosignedlongbenchmark ratio: 3.471 seconds: 12.127
the above benchmarks run on dell intel core i7-4800mq cpu @ 2.70ghz 16 gb memory. larger algorithm uses task parallel library larger tasks, fruitless parallelize inner loop.
question: can think of faster algorithm unsignedbranching?
below benchmark code.
update
this uses loop unrolling (thanks @dasblinkenlight) , 2.7 times faster:
public static long unsignedloopunrolledbranching(uint[] x, uint[] y) { var distance = 0ul; var leftovers = x.length % 4; var dimensions = x.length; var rounddimensions = dimensions - leftovers; (var = 0; < rounddimensions; += 4) { var x1 = x[i]; var y1 = y[i]; var x2 = x[i+1]; var y2 = y[i+1]; var x3 = x[i+2]; var y3 = y[i+2]; var x4 = x[i+3]; var y4 = y[i+3]; var delta1 = x1 > y1 ? x1 - y1 : y1 - x1; var delta2 = x2 > y2 ? x2 - y2 : y2 - x2; var delta3 = x3 > y3 ? x3 - y3 : y3 - x3; var delta4 = x4 > y4 ? x4 - y4 : y4 - x4; distance += delta1 * delta1 + delta2 * delta2 + delta3 * delta3 + delta4 * delta4; } (var = rounddimensions; < dimensions; i++) { var xi = x[i]; var yi = y[i]; var delta = xi > yi ? xi - yi : yi - xi; distance += delta * delta; } homecoming (long)distance; }
squaredistance.cs:
using system; using system.collections.generic; using system.linq; using system.text; using system.threading.tasks; namespace distancebenchmark { /// <summary> /// provide several alternate methods computing square of cartesian distance /// allow study of relative performance. /// </summary> public static class squaredistance { /// <summary> /// compute square of cartesian distance between 2 n-dimensional points /// calculations done on signed numbers using signed arithmetic, /// single multiplication , no branching. /// </summary> /// <param name="x">first point.</param> /// <param name="y">second point.</param> /// <returns>square of distance.</returns> public static long signed(int[] x, int[] y) { var distance = 0l; var dimensions = x.length; (var = 0; < dimensions; i++) { var delta = x[i] - y[i]; distance += delta * delta; } homecoming distance; } /// <summary> /// compute square of cartesian distance between 2 n-dimensional points /// calculations done on unsigned numbers using unsigned arithmetic, single multiplication /// , branching instruction (the ternary operator). /// </summary> /// <param name="x">first point.</param> /// <param name="y">second point.</param> /// <returns>square of distance.</returns> public static long unsignedbranching(uint[] x, uint[] y) { var distance = 0ul; var dimensions = x.length; (var = 0; < dimensions; i++) { var xi = x[i]; var yi = y[i]; var delta = xi > yi ? xi - yi : yi - xi; distance += delta * delta; } homecoming (long)distance; } /// <summary> /// compute square of cartesian distance between 2 n-dimensional points /// calculations done on unsigned numbers using unsigned arithmetic , distributive law, /// requires 4 multiplications , no branching. /// /// prevent overflow, coordinates cast ulongs. /// </summary> /// <param name="x">first point.</param> /// <param name="y">second point.</param> /// <returns>square of distance.</returns> public static long unsigneddistribute(uint[] x, uint[] y) { var distance = 0ul; var dimensions = x.length; (var = 0; < dimensions; i++) { ulong xi = x[i]; ulong yi = y[i]; distance += xi * xi + yi * yi - 2 * xi * yi; } homecoming (long)distance; } /// <summary> /// compute square of cartesian distance between 2 n-dimensional points /// calculations done on unsigned numbers using signed arithmetic, /// first casting values longs. /// </summary> /// <param name="x">first point.</param> /// <param name="y">second point.</param> /// <returns>square of distance.</returns> public static long casttosignedlong(uint[] x, uint[] y) { var distance = 0l; var dimensions = x.length; (var = 0; < dimensions; i++) { var delta = (long)x[i] - (long)y[i]; distance += delta * delta; } homecoming distance; } } }
randompointfactory.cs:
using system; using system.collections.generic; using system.linq; using system.text; using system.threading.tasks; namespace distancebenchmark { public static class randompointfactory { /// <summary> /// random list of signed integer points given number of dimensions utilize test data. /// </summary> /// <param name="recordcount">number of points get.</param> /// <param name="dimensions">number of dimensions per point.</param> /// <returns>signed integer test data.</returns> public static ilist<int[]> getsignedtestpoints(int recordcount, int dimensions) { var testdata = new list<int[]>(); var random = new random(datetime.now.millisecond); (var irecord = 0; irecord < recordcount; irecord++) { int[] point; testdata.add(point = new int[dimensions]); (var d = 0; d < dimensions; d++) point[d] = random.next(100000); } homecoming testdata; } /// <summary> /// random list of unsigned integer points given number of dimensions utilize test data. /// </summary> /// <param name="recordcount">number of points get.</param> /// <param name="dimensions">number of dimensions per point.</param> /// <returns>unsigned integer test data.</returns> public static ilist<uint[]> getunsignedtestpoints(int recordcount, int dimensions) { var testdata = new list<uint[]>(); var random = new random(datetime.now.millisecond); (var irecord = 0; irecord < recordcount; irecord++) { uint[] point; testdata.add(point = new uint[dimensions]); (var d = 0; d < dimensions; d++) point[d] = (uint)random.next(100000); } homecoming testdata; } } }
program.cs:
using system; using system.collections.generic; using system.diagnostics; using system.linq; using system.text; using system.threading.tasks; namespace distancebenchmark { public class programme { private static ilist<int[]> signedtestdata = randompointfactory.getsignedtestpoints(1000, 1000); private static ilist<uint[]> unsignedtestdata = randompointfactory.getunsignedtestpoints(1000, 1000); static void main(string[] args) { var baseline = timeit("signedbenchmark", signedbenchmark); timeit("unsignedbranchingbenchmark", unsignedbranchingbenchmark, baseline); timeit("unsigneddistributebenchmark", unsigneddistributebenchmark, baseline); timeit("casttosignedlongbenchmark", casttosignedlongbenchmark, baseline); timeit("signedbenchmark", signedbenchmark, baseline); console.writeline("done. type key exit."); console.readline(); } public static void signedbenchmark() { foreach(var p1 in signedtestdata) foreach (var p2 in signedtestdata) squaredistance.signed(p1, p2); } public static void unsignedbranchingbenchmark() { foreach (var p1 in unsignedtestdata) foreach (var p2 in unsignedtestdata) squaredistance.unsignedbranching(p1, p2); } public static void unsigneddistributebenchmark() { foreach (var p1 in unsignedtestdata) foreach (var p2 in unsignedtestdata) squaredistance.unsigneddistribute(p1, p2); } public static void casttosignedlongbenchmark() { foreach (var p1 in unsignedtestdata) foreach (var p2 in unsignedtestdata) squaredistance.casttosignedlong(p1, p2); } public static double timeit(string testname, action benchmark, double baseline = 0.0) { var stopwatch = new stopwatch(); stopwatch.start(); benchmark(); stopwatch.stop(); var seconds = stopwatch.elapsed.totalseconds; var ratio = baseline <= 0 ? 1.0 : seconds/baseline; console.writeline(string.format("{0,-32} ratio: {1:0.000} seconds: {2:0.000}", testname, ratio, seconds)); homecoming seconds; } } }
you should able shave off lot of execution time unrolling loops:
public static long signed(int[] x, int[] y) { var distance = 0l; var dimensions = x.length; var stop = dimensions - (dimensions % 4); (var = 0; < stop; i+=4) { var delta0 = x[i] - y[i]; var delta1 = x[i+1] - y[i+1]; var delta2 = x[i+2] - y[i+2]; var delta3 = x[i+3] - y[i+3]; distance += (delta0 * delta0) + (delta1 * delta1) + (delta2 * delta2) + (delta3 * delta3); } (var = stop; < dimensions; i++) { var delta = x[i] - y[i]; distance += delta * delta; } homecoming distance; }
this alter lone reduced execution time 8.325s 4.745s on local scheme - 43% improvement!
the thought 4 points @ time while can, , finish off remaining points in separate loop.
c# optimization
No comments:
Post a Comment