Efficient prime k-tuple search
Last updated
Last updated
I made this since I have been struggling to find a complete, from start to finish and more importantly beginner friendly explanation on how to search for prime k-tuples or prime constellations quicker, excluding the obvious brute force solution. To follow along, some mathematical background is required such as modular arithmetic, what is a prime constellation is and basic understanding of algorithm complexity.
The problem that we want to solve is: Given a constellation of length in the form of and a target number find a prime number such that are all prime and . In our examples we will use the pattern for a prime triplet. You can find further info about the possible patterns here.
A primorial with primorial number is the multiplication of the first primes and it is denoted by . For example , , and so on. We will use the primorial to quickly eliminate candidates by understanding a recurring pattern around the primorial.
The most important category of tools that we will require is primality testing algorithms. We have multiple options in our disposal, some exact while others are probabilistic. Since rieMiner is using the Fermat test we will be using that. It seems to be fast enough as well as simple to implement, with negligible probability of a false positive (i.e. a pseudo-prime) and with ever more decreasing probability for false positives the longer the candidates we test are. The Fermat primality test works as follows:
If you want to test if is prime
Pick a random number such that
Check for this equality
If true then is prime
In our case the following implementation is used:
The Sieve of Eratosthenes is a popular algorithm for "sieving" out composite numbers from a prime table up to a given limit. It is usually used to quickly find all prime numbers up to a given limit. One important property of this algorithm is that it trade's off memory with time. The idea is to initialize a boolean array with 1's where each index represents a number and its value represents if its prime or not. The idea is that after we initialize the array with 1's we use known primes such as 2 and 3 and eliminate their multiples by setting them to zero. Then we move on and implement this for the next non eliminated number which in our case will be 5 (another prime). And so on until we reach the end of the table. When we are done we have a table where each prime number is set to 1 and each composite number is set to false.
The following function is used in the python script to quickly generate all prime numbers up to num
and returns a list of all the primes up to num
.
We will use the sieve of Eratosthenes to build a prime table in order to be able to quickly use small primes in our calculations. The use-case will become clearer once we see how to sieve factors in the final version of the algorithm.
The script provided doesn't implement a brute force search but its quite trivial to do:
This is by far the slowest way to do things but its quite important to understand this first before we move on to a more optimized version
Despite the fact that we have implemented some optimizations in our code there is a lot to be desired. Firstly we could be using a more suited programming language like C, C++ or Rust which are all generally faster than Python. Since our goal is to be as efficient as possible, this is a must for the final implementation. Another thing to consider is to use an efficiently implemented boolean array to sieve out the candidates. This is usually done by using an unsigned integer array where each bit of the integer represents a number. Then use bit-maks to be able to set a specific bit with 1 or 0 depending on whether that number is a prime or not. This will be a lot quicker and memory efficient as well. Finally we can compute multiple offsets (as many as the CPU cores in our disposal) and parallelize the search on each core, giving each core its own offset. Finally, you can find the complete code here.
The multiplicative inverse of a number in modular arithmetic is the number such that . It can be computed using the extended Euclidean algorithm but in our case we will use the build in python function pow()
. We require the multiplicative inverse when we compute the inverses of the primorial p_m
mod each prime p
in our prime table. This is done in the following function:
The most obvious solution is to brute-force our way into testing each odd to see if its the base of a constellation with our given pattern. The following function is used to test if is a base of our constellation pattern.
The idea of this optimization is based on the fact that we would like to test as few as possible using the fermat_is_valid_pow()
function which will be in a tight loop taking most of the program's time. To do this we would like to sieve out some candidates that are definitely not going to be the base of a prime constellation. We have already done that even in the brute-force version since we ignore even numbers. If we generalize this idea to sieve not only the even numbers, but also odds that are not primes, and even primes that when adding the offset of the desired constellation, produce composites. To be able to sieve out these numbers systematically we need to establish a pattern. To do that we will build a sieve of Eratosthenes, and sieve out and , which are the primes the third primorial or is comprised of.
The above table represents our sieve of Eratosthenes, where the first row represents each number from 0-29 (the decimal parts of the numbers are left out for readability purposes). The next two rows represent the sieved out numbers based on primes which our primorial of choice is comprised of. The final row represents all the eliminated/sieved numbers when taking the whole primorial into account. What we can see is a pattern that emerges. Specifically the pattern xoxxxxxoxxxoxoxxxoxoxxxoxxxxxo
of eliminated numbers. Note that x
is an eliminated number and o
is a number that has not been sieved out. We can see that this pattern repeats forever. Thus, we can use this observation to our advantage. Since the pattern of primes relative to the primorial repeats, then the pattern of prime k-tuples also repeats (relative to that primorial).
The idea is to calculate the offsets based on the above table, up to . In our example we notice that we have a few numbers left until after we sieve out primes and . For those remaining numbers we also have to check if are also divided by and . We will choose a simpler pattern for this example but this applied to any pattern. Lets use the triplet as an example. For those left numbers we have to check if and are divisible by our primes . If so we have to also eliminate them. This makes sense as we don't want to consider prime bases that don't match our desired pattern! After doing so the left numbers are only . These are the offsets for the primorial . Now we only need to take into account numbers of the form and starting from some such that our numbers are bigger than our target . This is done by calculating and checking for numbers of the form
or more concretely for our example and
In practice for larger primorials these offsets will be a lot more so reality we do something a bit different. The idea is to quickly take the first offset which in the previous example was but without doing the whole sieve for primorial . This is achieved by using the base prime of a constellation (with the same pattern as our desired pattern) that is larger than the prime of the primorial of choice . In practice the offsets are pre-calculated and given as input to the algorithm.
The last step in our algorithm is to eliminate possible 's that produce candidates that are definitely composite. If we can also eliminate them based on a pattern, and also efficiently compute this pattern, then we could save a lot of time in the primality testing phase as we skip on these 's. In the python implementation a dictionary is used, which in a more real world setting would be replaced by a boolean array. Since we are only considering numbers of the form if we require that where p is a prime in the prime table we can solve for
where is the offset in our desired constellation pattern, is the multiplicative inverse of our selected primorial modulo and is the pre-calculated offset that is given. Using that we can calculate an such that is divisible by thus we can eliminate and all of its multiples. The following Python code is the implementation:
As we can see, we consider all primes in our prime table that are larger than the largest number in our primorial or where the multiplicative inverse of exists. We then calculate for each in our constellation pattern and eliminate and its multiples up to an arbitrarily selected .
Finally, when we do wheel factorization we skip over the 's that provably do not produce prime candidates. The following code demonstrates that: