Efficient prime k-tuple search

Prime Constellations Demystified

I made this since I have been struggling to find a complete, from start to finish and more importantly beginner friendly explanation on how to search for prime k-tuples or prime constellations quicker, excluding the obvious brute force solution. To follow along, some mathematical background is required such as modular arithmetic, what is a prime constellation is and basic understanding of algorithm complexity.

The problem to solve

The problem that we want to solve is: Given a constellation of length $k$ in the form of $(o_1, o_2, ..., o_k)$ and a target number $T$ find a prime number $n$ such that $(n+o_1, n+o_2, ... ,n+o_k)$ are all prime and $n \geq T$ . In our examples we will use the pattern for a prime triplet. You can find further info about the possible patterns here.

Required Mathematical Tools

Primorial

A primorial with primorial number $m$ is the multiplication of the first $m$ primes and it is denoted by $p_m \#$ . For example $p_1 \# = 2$ , $p_2 \# = 2 \cdot 3 = 6$ , $p_3 \# = 2 \cdot 3 \cdot 5 = 30$ and so on. We will use the primorial to quickly eliminate candidates by understanding a recurring pattern around the primorial.

Fermat Primality Test

The most important category of tools that we will require is primality testing algorithms. We have multiple options in our disposal, some exact while others are probabilistic. Since rieMiner is using the Fermat test we will be using that. It seems to be fast enough as well as simple to implement, with negligible probability of a false positive (i.e. a pseudo-prime) and with ever more decreasing probability for false positives the longer the candidates we test are. The Fermat primality test works as follows:

If you want to test if $n$ is prime

Pick a random number $a$ such that $1 < a < n$
Check for this equality $a^{n-1} \equiv 1 \; (mod \; n)$
If true then $n$ is prime

In our case the following implementation is used:

def fermat_prime_p(n):
    if n == 2:
        return True
    if not n & 1:
        return False
    if pow(2, n-1, n) == 1:
        return True
    return False

Sieve of Eratosthenes

The Sieve of Eratosthenes is a popular algorithm for "sieving" out composite numbers from a prime table up to a given limit. It is usually used to quickly find all prime numbers up to a given limit. One important property of this algorithm is that it trade's off memory with time. The idea is to initialize a boolean array with 1's where each index represents a number and its value represents if its prime or not. The idea is that after we initialize the array with 1's we use known primes such as 2 and 3 and eliminate their multiples by setting them to zero. Then we move on and implement this for the next non eliminated number which in our case will be 5 (another prime). And so on until we reach the end of the table. When we are done we have a table where each prime number is set to 1 and each composite number is set to false.

The following function is used in the python script to quickly generate all prime numbers up to num and returns a list of all the primes up to num.

def sieve_of_eratosthenes(num):
    prime = [True for i in range(num+1)]
    # boolean array
    p = 2
    while (p * p <= num):
  
        # If prime[p] is not
        # changed, then it is a prime
        if (prime[p] == True):
  
            # Updating all multiples of p
            for i in range(p * p, num+1, p):
                prime[i] = False
        p += 1
  
    primes = []

    # Print all prime numbers
    for p in range(2, num+1):
        if prime[p]:
            primes.append(p)
    
    return primes

We will use the sieve of Eratosthenes to build a prime table in order to be able to quickly use small primes in our calculations. The use-case will become clearer once we see how to sieve factors in the final version of the algorithm.

Multiplicative Inverse

The multiplicative inverse of a number $a$ in modular arithmetic is the number $x$ such that $ax \equiv 1 (mod \; n)$ . It can be computed using the extended Euclidean algorithm but in our case we will use the build in python function pow(). We require the multiplicative inverse when we compute the inverses of the primorial p_m mod each prime p in our prime table. This is done in the following function:

def inverses_of_eratosthenes(p_m, primes):
    inverses = []

    for p in primes:
        try:
            inverse = pow(p_m, -1, p)
        except:
            inverse = 0 # if no multiplicative inverse

        inverses.append(inverse)

    return inverses

Trivial Solution: Brute-force

The most obvious solution is to brute-force our way into testing each odd $n \geq T$ to see if its the base of a constellation with our given pattern. The following function is used to test if $n$ is a base of our constellation pattern.

def fermat_is_valid_pow(n, constellation_pattern):
    for offset in constellation_pattern:
        if not fermat_prime_p(n+offset):
            return False
    return True

The script provided doesn't implement a brute force search but its quite trivial to do:

n_max = 100000
for i in range(5,n_max,2):
    if fermat_is_valid_pow(i, constellation_pattern):
        print(i)

This is by far the slowest way to do things but its quite important to understand this first before we move on to a more optimized version

Optimized Solution: Wheel Factorization

The idea of this optimization is based on the fact that we would like to test as few $n$ as possible using the fermat_is_valid_pow() function which will be in a tight loop taking most of the program's time. To do this we would like to sieve out some candidates $c$ that are definitely not going to be the base of a prime constellation. We have already done that even in the brute-force version since we ignore even numbers. If we generalize this idea to sieve not only the even numbers, but also odds that are not primes, and even primes that when adding the offset of the desired constellation, produce composites. To be able to sieve out these numbers systematically we need to establish a pattern. To do that we will build a sieve of Eratosthenes, and sieve out $2,3$ and $5$ , which are the primes the third primorial or $p_3\#$ is comprised of.

        012345678901234567890123456789
p = 2   xoxoxoxoxoxoxoxoxoxoxoxoxoxoxo
p = 3   xooxooxooxooxooxooxooxooxooxoo
p = 5   xooooxooooxooooxooooxooooxoooo
p_3#    xoxxxxxoxxxoxoxxxoxoxxxoxxxxxo

The above table represents our sieve of Eratosthenes, where the first row represents each number from 0-29 (the decimal parts of the numbers are left out for readability purposes). The next two rows represent the sieved out numbers based on primes $2,3,5$ which our primorial of choice $p_3\#$ is comprised of. The final row represents all the eliminated/sieved numbers when taking the whole primorial into account. What we can see is a pattern that emerges. Specifically the pattern xoxxxxxoxxxoxoxxxoxoxxxoxxxxxo of eliminated numbers. Note that x is an eliminated number and o is a number that has not been sieved out. We can see that this pattern repeats forever. Thus, we can use this observation to our advantage. Since the pattern of primes relative to the primorial repeats, then the pattern of prime k-tuples also repeats (relative to that primorial).

The idea is to calculate the offsets based on the above table, up to $p_m\#$ . In our example we notice that we have a few numbers left until $p_3\#$ after we sieve out primes $2,3$ and $5$ . For those remaining numbers we also have to check if $i+o_1, i+o_2, ..., i+o_k$ are also divided by $2,3$ and $5$ . We will choose a simpler pattern for this example but this applied to any pattern. Lets use the triplet $n + (0, 2, 6)$ as an example. For those left numbers $i$ we have to check if $i+2$ and $i+6$ are divisible by our primes $2,3,5$ . If so we have to also eliminate them. This makes sense as we don't want to consider prime bases that don't match our desired pattern! After doing so the left numbers are only $(11, 17)$ . These are the offsets for the primorial $p_3\#$ . Now we only need to take into account numbers of the form $p_3\# \cdot f + 11$ and $p_3\# \cdot f + 17$ starting from some $f$ such that our numbers are bigger than our target $T$ . This is done by calculating $T^{'} = T + p_m \# - (T \; mod \; p_m\#)$ and checking for numbers of the form

$T_f = T^{'} + p_m \# \cdot f + o$

or more concretely for our example $T_f = T^{'} + p_3 \# \cdot f + 11$ and $T_f = T^{'} + p_3 \# \cdot f + 17$

In practice for larger primorials these offsets will be a lot more so reality we do something a bit different. The idea is to quickly take the first offset $o_1$ which in the previous example was $11$ but without doing the whole sieve for primorial $p_m \#$ . This is achieved by using the base prime $n$ of a constellation (with the same pattern as our desired pattern) that is larger than the $m^{th}$ prime $p_m$ of the primorial of choice $p_m \#$ . In practice the offsets are pre-calculated and given as input to the algorithm.

More optimized: Eliminating factors

The last step in our algorithm is to eliminate possible $f$ 's that produce candidates that are definitely composite. If we can also eliminate them based on a pattern, and also efficiently compute this pattern, then we could save a lot of time in the primality testing phase as we skip on these $f$ 's. In the python implementation a dictionary is used, which in a more real world setting would be replaced by a boolean array. Since we are only considering numbers of the form $p_3 \# \cdot f + o + c_i$ if we require that $o \leq f \leq p$ where p is a prime in the prime table we can solve for

$f_p = ((p - ((T^{'} + o + c_i) \; mod \; p)) \cdot {p_m\#}^{-1}) \; mod \; p$

where $c_i$ is the offset in our desired constellation pattern, ${p_m\#}^{-1}$ is the multiplicative inverse of our selected primorial modulo $p$ and $o$ is the pre-calculated offset that is given. Using that we can calculate an $f$ such that $T_{f_p}$ is divisible by $p$ thus we can eliminate $f_p$ and all of its multiples. The following Python code is the implementation:

def get_eliminated_factors(primes, inverses, offset, p_m, m, t, constellation_pattern):
    T2 = t + p_m - (t % p_m) # T'
    eliminated_factors = {}
    k_max = 100

    i = 0
    for p in primes:
        p_m_inverse = inverses[i]

        if p_m_inverse != 0:

            for c_i in constellation_pattern:
                f_p = ((p - ((T2 + o + c_i) % p))*p_m_inverse) % p
                eliminated_factors[f_p] = True

                for k in range(0, k_max):
                        f_p+=p
                        eliminated_factors[f_p] = True

        i+=1
    return eliminated_factors

As we can see, we consider all primes in our prime table that are larger than $p_m$ the largest number in our primorial or where the multiplicative inverse of $p_m\# \; mod \; p$ exists. We then calculate $f_p$ for each $c_i$ in our constellation pattern and eliminate $f_p$ and its multiples up to an arbitrarily selected $k_{max}$ .

Finally, when we do wheel factorization we skip over the $f$ 's that provably do not produce prime candidates. The following code demonstrates that:

for f in range(f_start, f_start+f_max):
    if f not in eliminated_factors:
        primality_tests +=1
        candidate = p_m*f + o
        if fermat_is_valid_pow(candidate, constellation_pattern):
            print(candidate)
            primes+=1