import cProfile def vector_op(v1,v2,op): return [op(x,y) for x,y in zip(v1,v2)] v1 = range(1, 10000000) v2 = range(1, 10000000) cProfile.run('print sum(vector_op(v1,v2, lambda x,y: x + y))')

99999990000000 10000046 function calls (10000040 primitive calls) in 59.064 seconds ncalls tottime percall cumtime percall filename:lineno(function) 9999999 25.622 0.000 25.622 0.000:1( ) 1 31.126 31.126 58.165 58.165 PythonApplication1.py:21(vector_op) 1 0.837 0.837 0.837 0.837 {sum} 1 1.417 1.417 1.417 1.417 {zip}

Let's not return an array but a generator expression.

import cProfile def vector_op_lazy(v1,v2,op): return (op(x,y) for x,y in zip(v1,v2)) v1 = range(1, 10000000) v2 = range(1, 10000000) cProfile.run('print sum(vector_op_lazy(v1,v2, lambda x,y: x + y))')

What is really surprising is that it even takes more time now. So what are we doing still wrong?

99999990000000 20000046 function calls (20000040 primitive calls) in 91.824 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 9999999 26.132 0.000 26.132 0.000:1( ) 10000000 46.346 0.000 72.478 0.000 PythonApplication1.py:25( ) 1 17.966 17.966 90.444 90.444 {sum} 1 1.378 1.378 1.378 1.378 {zip}

Let's also use xrange which is lazy evaluated and see how far that gets us.

import cProfile def vector_op_lazy(v1,v2,op): return (op(x,y) for x,y in zip(v1,v2)) v1 = xrange(1, 10000000) v2 = xrange(1, 10000000) cProfile.run('print sum(vector_op_lazy(v1,v2, lambda x,y: x + y))')

99999990000000 20000046 function calls (20000040 primitive calls) in 96.393 seconds ncalls tottime percall cumtime percall filename:lineno(function) 9999999 27.329 0.000 27.329 0.000:1( ) 10000000 48.976 0.000 76.305 0.000 PythonApplication1.py:27( ) 1 19.059 19.059 95.364 95.364 {sum} 1 1.028 1.028 1.028 1.028 {zip}

Will izip do us anygood in this case?

import cProfile from itertools import izip def vector_op_lazy(v1,v2,op): return (op(x,y) for x,y in izip(v1,v2)) v1 = xrange(1, 10000000) v2 = xrange(1, 10000000) cProfile.run('print sum(vector_op_lazy(v1,v2, lambda x,y: x + y))')

99999990000000 20000045 function calls (20000039 primitive calls) in 91.346 seconds ncalls tottime percall cumtime percall filename:lineno(function) 9999999 26.304 0.000 26.304 0.000:1( ) 10000000 46.759 0.000 73.063 0.000 PythonApplication1.py:27( ) 1 18.280 18.280 91.343 91.343 {sum}

What about not using a lambda expression?

import cProfile from itertools import izip def vector_op_lazy(v1,v2,op): return (op(x,y) for x,y in izip(v1,v2)) def op_add(x,y): return x + y v1 = xrange(1, 10000000) v2 = xrange(1, 10000000) cProfile.run('print sum(vector_op_lazy(v1,v2, op_add))')

99999990000000 20000045 function calls (20000039 primitive calls) in 89.603 seconds ncalls tottime percall cumtime percall filename:lineno(function) 10000000 46.009 0.000 71.799 0.000 PythonApplication1.py:27() 9999999 25.790 0.000 25.790 0.000 PythonApplication1.py:29(op_add) 1 17.804 17.804 89.603 89.603 {sum}

Ok... last hope?! Let's inline the addition instead of calling operator function.

import cProfile from itertools import izip def vector_op_lazy(v1,v2): return (x + y for x,y in izip(v1,v2)) v1 = xrange(1, 10000000) v2 = xrange(1, 10000000) cProfile.run('print sum(vector_op_lazy(v1,v2))')

99999990000000 10000046 function calls (10000040 primitive calls) in 45.078 seconds ncalls tottime percall cumtime percall filename:lineno(function) 10000000 27.925 0.000 27.925 0.000 PythonApplication1.py:27() 1 17.153 17.153 45.077 45.077 {sum}

That seemed to help a lot, so beware of using too much abstraction when performance really matters. Not making the extra number of function calls reduces the execution time a lot. So now just for fun let us not return a generator expression but an array again.

import cProfile from itertools import izip def vector_op(v1,v2): return [x + y for x,y in zip(v1,v2)] v1 = xrange(1, 10000000) v2 = xrange(1, 10000000) cProfile.run('print sum(vector_op(v1,v2))')

What the f@ck... The generator expression in combination with the sum function seems like total killing performance. Storing the result from the zip call in an array and next calling sum boosted performance.

99999990000000 47 function calls (41 primitive calls) in 14.603 seconds ncalls tottime percall cumtime percall filename:lineno(function) 1 12.678 12.678 13.733 13.733 PythonApplication1.py:25(vector_op) 1 0.810 0.810 0.810 0.810 {sum} 1 1.056 1.056 1.056 1.056 {zip}

This answer from stackoverflow is very useful to understand the outcome of this test. I conclude this article that it always makes good sense to profile your app and decide for yourself what is the best approach.

Very interesting observations indeed.

ReplyDeleteBut it's quite easy to understand that many function calls can lead to significant overhead. That's what you should really take into account with functional programming in your hands.

yep... first of all it is a micro benchmark and not a valid representation of a real life program. Secondly I was still curious to find out what would be the impact of applying pure lazy evaluation with regard to performance. Like I said, if performance in extreme cases really matters, just Profile the damn thing and do whatever is necessary ;-)

ReplyDelete