### Tutorial: Accelerating py_vollib with Concurrency

Posted by Larry Richards on 03 May, 2017

## Introduction

Speed is one of the best features of vollib and its variants. But you might still find yourself waiting for your program to finish if you regularly compute many millions of implied volatility points. That may be because each Python process uses only a single CPU core by default. If you have four CPU cores, you can potentially exploit that extra horsepower to speed things up significantly using concurrency.

Concurrency takes on many forms, but today we are going to look at a trivially simple and effective example. Using py_vollib and Python 3.6.0 with Numba installed, we'll compare the speed of calculating Black implied volatility with and without concurrency.

#### A note about the hardware

I'm using macOS with a 2.2 GHz Intel Core i7 with four cores. The Core i7 supports hyper-threading, which presents two logical processors for every physical CPU core. As a result, the operating system sees eight cores instead of four, so we might hope for even more throughput with 8x rather than 4x concurrency. The experiment below will show whether that holds.
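If you're not sure what your own machine reports, a quick standard-library check shows the logical core count the operating system sees:

    import os

    # os.cpu_count() reports *logical* cores; on a hyper-threaded CPU
    # this is typically twice the number of physical cores.
    print(os.cpu_count())

On the machine described above this prints 8.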

## Initial Setup

First, let's create some code to do the work.

    import time
    import numpy
    from py_vollib.black.implied_volatility import implied_volatility as iv

    def f(x):
        # One million iv calls, each with a slightly different strike
        for epsilon in numpy.linspace(0, 10, 1000000):
            iv(.01, 100, 120 + epsilon, .01, .5, 'c')


Our function f(x) simply calculates the Black implied volatility of an OTM call with a penny of premium. To prevent any unintended speedup due to caching of results, we add a different offset to the strike price for each call to iv.

Now let's add some code to measure the execution time:

    t1 = time.time()
    for x in range(1):
        f(x)
    t2 = time.time()
    print('execution time was {:0.4} seconds'.format(t2 - t1))


With this I got the result "execution time was 18.92 seconds."

## Going Parallel

Now let's introduce some concurrency. From Python's multiprocessing package we'll use the Pool class to create multiple worker processes.

    from multiprocessing import Pool

    pool = Pool(processes=8)              # start 8 worker processes
    t1 = time.time()
    pool.map(f, range(8))                 # run f once per process
    t2 = time.time()
    print('execution time was {:0.4} seconds'.format(t2 - t1))


While this script is running, you should notice eight python processes in your performance monitor. In my case, each process is using over 80% of its corresponding virtual CPU core capacity.

My result was "execution time was 43.79 seconds," or about 5.5 seconds per one million calls to iv. Note that we called the function 8 million times this time around, so the end result is a 3.5x speedup. Why not 8x? Even though we are using eight virtual cores, our computations appear to be limited by the number of physical cores, which would explain a speedup close to 4x. Hyper-threading does not seem to provide much benefit here, if any; its benefits do not scale linearly with the number of concurrent processes. You might try experimenting with more than eight processes. I did find a small incremental gain (from 3.5x to 3.6x) when using 18 processes.
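The speedup arithmetic is easy to check directly. Using the timings reported above (18.92 s for 1 million serial calls, 43.79 s for 8 million calls across 8 processes):

    # Numbers taken from the timings reported in this post.
    serial_time_per_million = 18.92   # seconds for 1M calls, single process
    parallel_time = 43.79             # seconds for 8M calls, 8 processes
    calls_parallel = 8                # millions of calls in the parallel run

    # Time the serial version would need for the same 8M calls:
    serial_equivalent = serial_time_per_million * calls_parallel  # 151.36 s

    speedup = serial_equivalent / parallel_time
    per_million = parallel_time / calls_parallel

    print(round(speedup, 1))      # 3.5
    print(round(per_million, 1))  # 5.5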

## Conclusion

Concurrency is a vast topic, and we've only scratched the surface. Other possibilities include offloading work to a GPU, or using a pool of servers in a scheduler-worker configuration. Consider the specific needs of your application: if you need to communicate with your processes while they're running, the pool.map approach above will not work, since it blocks until every result is in.
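If you only need results back as they complete, rather than full two-way communication, Pool.imap_unordered is one option. Here is a minimal sketch; work and run are hypothetical names, and the squaring function is just a lightweight stand-in for a chunk of iv calls:

    from multiprocessing import Pool

    def work(x):
        # Stand-in for a real workload such as a batch of iv calls.
        return x * x

    def run():
        with Pool(processes=4) as pool:
            # imap_unordered yields each result as soon as a worker
            # finishes it, so the parent can react while the pool
            # is still busy with the remaining tasks.
            return sorted(pool.imap_unordered(work, range(8)))

    if __name__ == '__main__':
        print(run())  # [0, 1, 4, 9, 16, 25, 36, 49]

For richer communication patterns, such as streaming intermediate data to and from long-running workers, you would reach for tools like multiprocessing.Queue instead.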

But if you're running everything on a single machine and are tired of waiting while computing many millions of implied volatilities with py_vollib plus Numba, you might consider the simple technique above.