Finding Prime Numbers

Summary

Here I explore coding a simple script for finding prime numbers using Python. I try my first attempt without any optimization by creating the script off the top of my head. As you will see, that script works but runs very slowly. I then follow some online lessons on how to optimize the script and get much faster results.

Prime Finder version 1

Now, here is the code for figuring out if any one number is prime.

But let's allow the user to get all the prime numbers from 1 to the value they enter:

Cool! Now turn that into a function:

And now run this:

This seems to work, but let's time how long this takes for larger numbers since it has to search all those numbers and might take a while!

So, finding all primes between 1 and 10,000 required 8+ seconds! What about 100,000?

start = time.time() find_primes(100000) end = time.time() elapsed = end - start print(f"Time elapsed: ",{elapsed})

Ok, so after a few minutes I gave up and terminated the kernel. Is there a more efficient way to do this??

Prime Finder 2.0

Need to optimize the prime number search space with each iteration so that the code doesn't just use brute force and check every single number. For example, we know that even numbers are not prime, so why not remove those first? The following methodology does that, but also removes multiples of every new prime number found.

Essentially we can work in reverse. Rather than finding each number that doesn't have any multiples, work through the list of multiples and remove all numbers from the final list that those multiples can create.

For example, the first prime number is 2, so we can remove all other values from the set that are multiples of 2. The next prime number is 3, so we can remove all other values from the set that are multiples of 3. The next is 5, since the 4 was already removed when the multiples of 2 were removed, and so on. Thus, the search list becomes smaller and smaller!

Now each time you repeat that set of code, you add values to the actual_primes set and remove values from the potential_primes set, making the search space smaller and smaller.

So let's repeat that process with a while loop:

Great! Now time to package it up into a function again:

Now let's time it again!

All primes through 10,000:

Wow!! Only 0.006 seconds compared to 8+ seconds with version 1!

All primes through 100,000:

Only 0.035 seconds for 100,000 compared to several minutes + with version 1

All primes through 1,000,000:

Now for the mega test: search for all prime numbers from one to one million:

Less. Than. A. Second. WOW

Conclusion

So, as you can see, code optimization can make all the difference between having time for happy hour or not :P

This was a great experience in learning how to optimize code, and that it is not just about the types of data structures (though I'm sure using sets here helped a ton), it's also about optimizing the algorithm itself so that it does as little reduntant work as possible. I credit Andrew Jones (at Data Science Infinity) for sharing this very elegant but powerful algorithm for cutting through the redundancy and efficiently extracting a list of prime numbers.