Optimal Number of Threads in Python

One of the most infamous features of Python is the GIL (Global Interpreter Lock), which significantly limits thread performance. The GIL is responsible for protecting access to Python objects, because CPython’s memory management is not thread safe. Threads in Python essentially interrupt one another, meaning that only one thread can execute Python bytecode and access Python objects at any one time. In many situations this can cause a program to run slower using threads, particularly if those threads are doing CPU bound work. Despite their limitations, Python threads can be very performant for I/O bound work, such as making a request to a web server, because the GIL is released while a thread waits on I/O. Unlike in highly concurrent languages such as Golang and Erlang, where we can launch thousands of Goroutines or Erlang ‘processes’, we cannot cheaply launch thousands of threads in Python.
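To illustrate why I/O bound work can still benefit from threads, here is a minimal sketch in which time.sleep stands in for waiting on a network response. This is an assumption made purely for illustration, and fake_io_task, run_sequential and run_threaded are hypothetical names, not part of the original script. A blocking wait releases the GIL, so the waits can overlap across threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(delay=0.05):
    """Hypothetical stand-in for a network call: blocks without using the CPU."""
    time.sleep(delay)  # sleeping releases the GIL, much like waiting on a socket
    return delay


def run_sequential(n_tasks):
    """Run the tasks one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n_tasks):
        fake_io_task()
    return time.perf_counter() - start


def run_threaded(n_tasks, n_threads):
    """Run the same tasks on a thread pool and return the elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() forces every task to finish before we stop the clock
        list(pool.map(lambda _: fake_io_task(), range(n_tasks)))
    return time.perf_counter() - start
```

Calling run_sequential(8) and run_threaded(8, 8) should show the threaded version finishing in a fraction of the sequential time, since the eight waits happen side by side. The same experiment with a pure Python CPU bound loop in place of the sleep would not be expected to show this speed-up.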

This makes it hard to determine the correct number of threads we should use to boost performance. At some point adding more threads is likely to degrade overall performance. I wanted to take a look at the optimal number of threads for an I/O bound task, namely making an HTTP request to a web server.

The Code

I wrote a short script using Python 3.6, requests and the concurrent.futures library, which makes a GET request to each of the top 1,000 sites according to Amazon’s Alexa web rankings. I then reran the script with different numbers of threads to see where performance would begin to drop off. To account for uncontrollable variables, I ran each thread count 5 times and averaged the results.
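The script itself is not reproduced here, but a minimal sketch of the approach might look like the following. Note the assumptions: it swaps in the standard library's urllib.request in place of requests so that it has no third-party dependency, and the names fetch and requests_per_second are hypothetical rather than taken from the original script.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=10):
    """Make a GET request; return the HTTP status code, or None on any failure.

    Assumption: urllib.request stands in for the requests library here.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except Exception:
        # Unreachable hosts, timeouts and HTTP errors all count as failures
        return None


def requests_per_second(urls, n_threads, fetch_fn=fetch):
    """Fetch every URL on a pool of n_threads and return the request rate."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() waits for all requests to complete before timing stops
        results = list(pool.map(fetch_fn, urls))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed
```

For example, requests_per_second(urls, 50) would time one pass over a list of URLs with 50 threads. Passing fetch_fn as a parameter is a design choice for this sketch: it lets the timing logic be tried out with a fake fetch function before pointing it at real servers.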

The Results

As you can see, as we first start increasing the number of threads used by our demo program, the number of HTTP requests we can make per second increases quite rapidly. The performance gains start to tail off once we reach around 50 threads. Finally, once we use a total of 60 threads, we actually start to see our HTTP request rate decrease, before it again starts to get faster as we approach a total of 200 threads.

This is presumably due to the GIL. Once we add a significant number of threads, the threads essentially interfere with one another, slowing down our program. To increase the performance of our code, we would have to look into ways of releasing the GIL, such as writing a C extension or releasing the GIL within Cython code. For those looking to get a better understanding of the GIL, I highly recommend watching this talk from David Beazley.

Despite the GIL, our best result saw us make a total of 1,000 HTTP GET requests in just nine seconds. That works out to roughly 111 HTTP requests per second, which isn’t too bad for what is meant to be a slow language.

Caveats

The results from this experiment suggest that those writing threaded Python applications should certainly take some time to run tests to determine the optimum number of threads. The example used for this test was I/O bound code with little CPU overhead. Those running code with a greater share of CPU bound work may find that they get less benefit from upping the number of threads. Despite this, I hope that this post encourages people to look into using threads within their applications. The performance increase achievable will depend highly on what exactly is being done within the threads.

There is also reason to believe that the optimal number of threads may differ from machine to machine, which is another reason why it is certainly worth taking the time to test a varying number of threads when you need to achieve maximum performance.
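One way to run such a test on your own machine is to sweep over a handful of thread counts and average several runs of each. The sketch below is a rough illustration rather than the script used for this post; time_run, sweep_thread_counts and best_thread_count are hypothetical names, and task is whatever function your threads would actually run.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def time_run(task, items, n_threads):
    """Apply task to every item on a pool of n_threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(task, items))  # list() waits for every task to finish
    return time.perf_counter() - start


def sweep_thread_counts(task, items, thread_counts, repeats=5):
    """Return a dict mapping each thread count to its mean time over `repeats` runs."""
    results = {}
    for n_threads in thread_counts:
        timings = [time_run(task, items, n_threads) for _ in range(repeats)]
        results[n_threads] = statistics.mean(timings)
    return results


def best_thread_count(results):
    """Pick the thread count with the lowest mean elapsed time."""
    return min(results, key=results.get)
```

A call such as sweep_thread_counts(task, urls, [10, 25, 50, 100, 200]) would give a mean time per thread count, and best_thread_count would pick the fastest. Averaging smooths out some of the noise from the network, but the answer is still only valid for the machine and workload it was measured on.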
