September 16, 2022
A positive integer is called square-free if it is a product of distinct primes. For example, 1222 is square-free since its prime factorization is 2 × 13 × 47. However, 1204 is not square-free since its prime factorization is 2 × 2 × 7 × 43 with the prime factor 2 appearing twice.
In this article, we compare two ways of generating a sorted list of all positive square-free numbers containing only prime factors from a given list of primes. For simplicity, we assume that the given list of primes is sorted in ascending order.
To streamline experimentations, we create a cabal
project called squarefree
with the following folder
structure
squarefree
├── CHANGELOG.md
├── app
│ └── Main.hs
└── squarefree.cabal
where squarefree.cabal
contains, among other things, the
following
executable squarefree
main-is: Main.hs
build-depends: base ^>=4.14.3.0
, criterion
, arithmoi
hs-source-dirs: app
default-language: Haskell2010
In other words, we the packages criterion
and
arithmoi
as dependencies.
The initial content of Main.hs
is
module Main where
import Criterion.Main (bench, bgroup, defaultMain, nf)
import Data.List (sort, subsequences)
import Math.NumberTheory.Primes (factorise, primes, unPrime)
naive :: [Integer] -> [Integer]
= undefined
naive
mergeBased :: [Integer] -> [Integer]
= undefined
mergeBased
numPrimes :: Int
= 10
numPrimes
main :: IO ()
= do
main let ps = map unPrime $ take numPrimes primes
let n = 2^numPrimes - 1
let n1 = 100
let n2 = floor $ fromIntegral n * 0.50
let n3 = floor $ fromIntegral n * 0.99
defaultMain
["naive"
bgroup show n1) $ nf (\xs -> naive xs !! n1) ps
[ bench (show n2) $ nf (\xs -> naive xs !! n2) ps
, bench (show n3) $ nf (\xs -> naive xs !! n3) ps
, bench (
]"mergeBased"
, bgroup show n1) $ nf (\xs -> mergeBased xs !! n1) ps
[ bench (show n2) $ nf (\xs -> mergeBased xs !! n2) ps
, bench (show n3) $ nf (\xs -> mergeBased xs !! n3) ps
, bench (
] ]
We will gradually fill in the details for the functions naive
and mergeBased
.
Essentially, we want to see how fast accessing elements at various indices is. The speed clearly depends on how fast such elements are computed. The indices are chosen to be near the front, around the middle, and near the end.
For the first method, we generate a list of all possible nonemtpy sublists of the given list of primes, compute the product of the elements of each sublist, and then sort the resulting list. This is implemented in the following:
naive :: [Integer] -> [Integer]
= sort $ map product (tail $ subsequences ps) naive ps
Let us give this a try in ghci
by issuing the command
cabal repl
in the squarefree
folder:
> naive [2,5,11]
λ2,5,10,11,22,55,110] [
The result is as expected.
One main drawback of naive
is that the entire list must
be constructed. If we are only interested in the entries near the front,
then a lot of computation is wasted. Is there a way to generate the
entries in the order they should appear? The following merge-based
method does the trick.
mergeBased :: [Integer] -> [Integer]
= []
mergeBased [] :ps) = let ys = mergeBased ps in p:merge ys (map (p*) ys)
mergeBased (p
merge :: (Ord a) => [a] -> [a] -> [a]
@(x:_) [] = xs
merge xs= ys
merge [] ys :xs) (y:ys) = case compare x y of
merge (xLT -> x:merge xs (y:ys)
GT -> y:merge (x:xs) ys
EQ -> x:merge xs ys
The function merge
merges two sorted list into one
sorted list and is used in merge sort. The reasoning behind
mergeBased
is that given an ascending list of primes
p:ps
, to obtain an ascending list of square-free numbers
from these primes, we can first obtain a list of square-free numbers,
which we call ys
, from the primes in ps
, merge
it with the list with every element in ys
multiplied by
p
, and put p
in front of the result.
With cabal run
, we obtain the following results:
benchmarking naive/100
time 101.7 μs (100.7 μs .. 102.7 μs)
0.999 R² (0.999 R² .. 1.000 R²)
mean 101.0 μs (100.4 μs .. 101.6 μs)
std dev 1.990 μs (1.551 μs .. 2.523 μs)
variance introduced by outliers: 15% (moderately inflated)
benchmarking naive/511
time 155.3 μs (154.3 μs .. 156.3 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 154.6 μs (154.0 μs .. 155.2 μs)
std dev 2.087 μs (1.815 μs .. 2.439 μs)
benchmarking naive/1012
time 208.9 μs (207.2 μs .. 210.7 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 208.8 μs (207.5 μs .. 210.0 μs)
std dev 4.064 μs (3.473 μs .. 4.871 μs)
variance introduced by outliers: 13% (moderately inflated)
benchmarking mergeBased/100
time 4.672 μs (4.651 μs .. 4.692 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 4.651 μs (4.627 μs .. 4.668 μs)
std dev 68.44 ns (57.62 ns .. 81.46 ns)
variance introduced by outliers: 13% (moderately inflated)
benchmarking mergeBased/511
time 22.22 μs (22.14 μs .. 22.31 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 22.24 μs (22.19 μs .. 22.29 μs)
std dev 192.6 ns (159.3 ns .. 237.1 ns)
benchmarking mergeBased/1012
time 41.83 μs (41.60 μs .. 42.08 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 41.66 μs (41.56 μs .. 41.78 μs)
std dev 393.6 ns (292.2 ns .. 521.1 ns)
Changing numPrimes
to 20, we obtain
benchmarking naive/100
time 204.5 ms (195.9 ms .. 213.8 ms)
0.998 R² (0.992 R² .. 1.000 R²)
mean 211.4 ms (206.9 ms .. 215.3 ms)
std dev 5.542 ms (3.585 ms .. 8.460 ms)
variance introduced by outliers: 14% (moderately inflated)
benchmarking naive/524287
time 814.7 ms (784.7 ms .. 843.9 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 814.6 ms (807.2 ms .. 822.0 ms)
std dev 9.378 ms (4.381 ms .. 12.05 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking naive/1038089
time 1.348 s (1.286 s .. 1.439 s)
0.999 R² (0.998 R² .. 1.000 R²)
mean 1.365 s (1.353 s .. 1.386 s)
std dev 20.47 ms (1.835 ms .. 25.43 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking mergeBased/100
time 4.989 μs (4.967 μs .. 5.011 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 5.005 μs (4.994 μs .. 5.017 μs)
std dev 38.32 ns (29.76 ns .. 50.07 ns)
benchmarking mergeBased/524287
time 47.19 ms (45.64 ms .. 48.69 ms)
0.997 R² (0.995 R² .. 0.999 R²)
mean 47.18 ms (46.56 ms .. 47.75 ms)
std dev 1.175 ms (989.0 μs .. 1.397 ms)
benchmarking mergeBased/1038089
time 89.15 ms (86.89 ms .. 90.85 ms)
0.999 R² (0.998 R² .. 1.000 R²)
mean 89.65 ms (88.88 ms .. 90.40 ms)
std dev 1.272 ms (921.1 μs .. 1.703 ms)
The results are overwhelmingly in favour of
mergeBased
.
We could have ended the article if there weren’t the problem of generating all square-free positive integers without restrictions on the primes used. A lot can be said about these numbers. (See the Wikipedia article for more.)
There is no simple way to modify naive
to accept an
infinite list of primes. But mergeBased
works fine without
any modification:
> sf = 1:(mergeBased $ map unPrime primes)
λ> take 28 sf
λ1,2,3,5,6,7,10,11,13,14,15,17,19,21,22,23,26,29,30,31,33,34,35,37,38,39,41,42]
[> sf !! 999 -- the 1000th square-free integer
λ1637
One can check that these results are indeed correct. (By convention, 1 is regarded as a square-free integer.) Unfortunately, trying to get to the 50,000th square-free number takes more than a few seconds.
> sf !! 49999
λ82257
Surprisingly, the following code lifted from Rosetta Code is almost instantaneous on my 2020 27-inch iMac with a 3.8GHz 8-Core Intel Core i7 processor running macOS Monterey 12.6:
> sf' = filter (all ((== 1) . snd) . factorise) [1..]
λ> sf' !! 49999
λ82257
Remarkably, the following takes less than a second
> sf' !! 499999
λ822463
whereas the following didn’t terminate even after 10 minutes with
peak memory usage by ghc
over 60GB!
> sf !! 499999
λ^CInterrupted.
On the surface, running through every positive integer and filtering out those that are not square-free by looking at its prime factorization does not seem as elegant as the merge-based approach. Two immediate questions follow:
What exactly is making mergeBased
so slow?
Is there an even faster way to generate square-free integers?
As of writing, I have no answers to this. If you have ideas, feel free to contact me.