Using Haskell library random>=1.2

Kevin Cheung

August 30, 2022

Using Haskell library random version 1.2

There are a number of Haskell libraries for pseudo-random number generation. Among these, perhaps random is the most popular based on a search on the term “random” on Hackage. The random library has a rather long history. Version 1.0 was available since 2007 and its usage has been covered in numerous tutorials and books.

In 2020, version 1.2.0 was released. It was a massive effort by a number of individuals, solving the long-standing performance issue and introducing new tools for users of this library. Those interested to know a bit about the history surrouding version 1.2.0 can read Alexey Kuleshevich’s blog post or watch his talk at Zurich Haskell meetup. The talk contains an excellent introduction to what is new. The slides for the talk are hosted on Github. There is also a podcast with Leonhard Markert and Dominic Steinitz that is quite informative. The new features add tremendous power to the library, making it a great choice for situations where pseudo-random number generation is required.

The purpose of this article is to illustrate the basic usage of random version 1.2. It is meant to help Haskell programmers new to pseudo-random number generation in Haskell to get started. The readers are assumed to be familiar with monadic and stateful computations in Haskell.

Preliminary considerations

Working with random numbers in Haskell is arguably not as straightforward as in imperative languages such as C/C++, Python, Julia, etc. In Julia, for example, we can create a vector v of random numbers between 0 and 1 of length 100 as follows:

using Random

v = rand(100)

For reproducibility, we can specify a random seed before generating pseudo-random numbers.

In Haskell, the dependence on a generator is often quite explicit unless one uses an interface that is wrapped in a context where mutations can happen behind the scenes. The need to specify a generator even for the simplest pseudo-random number generation functions seems like an annoyance but such is the price to pay for pure computations. Fortunately, there are patterns that one can follow to make the experience as painless as possible.

In any case, random>=1.2 provides a pure pseudo-random number interface through System.Random and a monadic one through System.Random.Stateful. We will look at each in turn.

Basic usage of System.Random

System.Random contains the functions

uniform :: (RandomGen g, Uniform a) => g -> (a, g)
uniformR :: (RandomGen g, UniformRange a) => (a, a) -> g -> (a, g)

The latter takes a range specified as a pair of (inclusive) end-values and a random generator, and returns an element from the range along with a new generator.

The type class RandomGen requires at least the implementation of split and one of genWord32, genWord64, or next together with genRange. Implementing next and genRange seems to be discouraged but is left in the interface for backward compatibility.

The type class UniformRange has been defined for a wide range of types which include types such as Int, Word, and Double. The full list can be found in the documentation.

For the code that follows, we need to import System.Random. We can simulate a four-sided die and a six-sided die via the functions

fourSidedDie :: (RandomGen g) => g -> (Int, g)
fourSidedDie = uniformR (1, 4)

and

sixSidedDie :: (RandomGen g) => g -> (Int, g)
sixSidedDie = uniformR (1, 6)

Casting simultaneously a four-sided die and a six-sided can then be simulated via the function

rollFourAndSix :: (RandomGen g) => g -> ((Int,Int), g)
rollFourAndSix g = let (r4, g') = fourSidedDie g
                       (r6, g'') = sixSidedDie g'
                   in ((r4, r6), g'')

“Threading through” the new generators in the let bindings looks a bit tedious and suggests sequencing the computations monadically which we will see later on. In the meantime, we can put the function into action in ghci:

λ> rollFourAndSix (mkStdGen 134)
((3,3),StdGen {unStdGen = SMGen 15797293889117377061 16743392734764332447})

The function mkStdGen takes an integer random seed (134 in this case) and returns an object of type StdGen which is an instance of RandomGen. (In the above, SMGen is from splitmix.) Unless the value of the seed to mkStdGen is changed, we will get the same result every time the code above is run with no sign of randomness. This of course is to be expected since rollFourAndSix is a pure function and it will always returns the same output on the same input.

Aside: The custom prompt λ> for the REPL in the above can be set via :set prompt "λ> " It can also be added to .ghci in the home folder to set it as the default.

We can simulate a sequence of pairs of rolls until the sum of the values is at least eight with

untilAtLeastEight :: (RandomGen g) => g -> ([(Int,Int)], g)
untilAtLeastEight g
  | uncurry (+) r >= 8 = ([r], g')
  | otherwise          = let (rs, g'') = untilAtLeastEight g'
                         in (r:rs, g'')
  where
    (r, g') = rollFourAndSix g

Let us give this a run in ghci:

λ> untilAtLeastEight (mkStdGen 134)
([(3,3),(4,3),(3,1),(1,5),(2,6)],StdGen {unStdGen = SMGen 2170483177555623709 16743392734764332447})

Exercise

Implement the function

untilFourSidedBigger :: (RandomGen g) => g -> ([Int], g)

that alternately rolls a 6-sided die and a 4-sided die, starting with a 6-sided die, until the value of the 4-sided die roll is strictly bigger than the value of immediately preceding 6-sided die roll.

Going monadic

We saw in the above that every time we generate a pseudo-random number, we need a new generator. Hence, if we want to obtain a sequence of pseudo-random numbers, we can wrap functions that return random numbers in some sort of context (monad) that mutates the generator and sequence our computations monadically. There are a number of ways towards achieving this goal. In fact, System.Random.Stateful provides all the necessary machinery for monadic pseudo-random number generation. However, we opt to first roll our own “stateful” solution to develop better intuition.

Using State

One option for our purpose is the State monad. Let us begin by creating a file named stateful.hs:

-- stateful.hs
module Stateful where

import Control.Monad.State
import System.Random

fourSidedDie' :: (RandomGen g) => State g Int
fourSidedDie' = do
  (val, g) <- gets $ uniformR (1,4)
  put g
  return val

Note that fourSidedDie' is not a function; it is an Int wrapped in State g. To obtain a die roll and a new generator, we perform

runState fourSidedDie' g

for some g of type that is an instance of RandomGen. Let us try this in ghci using mkStdGen :: Int -> StdGen to create a generator:

λ> :l stateful.hs
[1 of 1] Compiling Stateful         ( stateful.hs, interpreted )
Ok, one module loaded.
λ> runState fourSidedDie' (mkStdGen 134)
(3,StdGen {unStdGen = SMGen 17500645228062596230 16743392734764332447})

We can use the function

state :: MonadState s m => (s -> (a, s)) -> m a

to simplify fourSidedDie' to

fourSidedDie'' :: (RandomGen g) => State g Int
fourSidedDie'' = state $ uniformR (1,4)

Adding this to stateful.hs and trying it out in ghci, we obtain the same result:

λ> :l stateful.hs
[1 of 1] Compiling Stateful         ( stateful.hs, interpreted )
λ> runState fourSidedDie'' (mkStdGen 134)
(3,StdGen {unStdGen = SMGen 17500645228062596230 16743392734764332447})

Using State makes it easy to generate a list of pseudo-random numbers as illustrated by the following function that simulates rolling a k-sided die n times:

kSidedRolls :: (RandomGen g) => Int -> Int -> g -> ([Int], g)
kSidedRolls k n g = let kSided = state $ uniformR (1, k)
                    in runState (replicateM n kSided) g

We can now simulate rolling a 12-sided die 10 times in ghci:

λ> :l stateful.hs
[1 of 1] Compiling Stateful         ( stateful.hs, interpreted )
λ> kSidedRolls (mkStdGen 134) 12 10
([3,3,12,11,3,9,10,8,6,2],StdGen {unStdGen = SMGen 13803821895484298649 16743392734764332447})

Using ST

Another option is to wrap a generator within an STRef and sequence the generation of pseudo-random numbers in ST s. Note that the type s here is a phantom type and is not the type of a random generator. Working with ST can feel a bit magical. However, no attempt will be made in this article to demystify it as some good answers are already available on Stack Overflow. Meanwhile, we illustrate using ST with a concrete example.

Let us add the following imports to stateful.hs:

import Control.Monad.ST
import Data.STRef

We define an alternate version of kSidedRolls, using ST under the hood:

kSidedRolls' :: (RandomGen g) => Int -> Int -> g -> ([Int], g)
kSidedRolls' k n g =
  runST $ do
            gRef  <- newSTRef g
            rs <- replicateM n (kSided gRef)
            g' <- readSTRef gRef
            return (rs, g')
  where
    kSided gRef = do
      (val, g') <- uniformR (1, k) <$> readSTRef gRef 
      writeSTRef gRef g'
      return val
λ> :l stateful.hs
[1 of 1] Compiling Stateful         ( stateful.hs, interpreted )
λ> kSidedRolls' (mkStdGen 134) 12 10
([3,3,12,11,3,9,10,8,6,2],StdGen {unStdGen = SMGen 13803821895484298649 16743392734764332447})

Note that the output is exactly the same as that of kSidedRolls above. But we might ask, is there any practical difference between kSidedRolls and kSidedRolls'? To find out, let us do a bit of benchmarking using criterion. (Those new to criterion could consult this tutorial.)

To manage the growing codebase and accommodate future examples, we create a cabal project called randomthings with the following folder structure

randomthings
├── CHANGELOG.md
├── app
│   └── Main.hs
├── lib
│   └── stateful.hs
└── randomthings.cabal

where randomthings.cabal contain the lines

library stateful
    exposed-modules:  Stateful
    hs-source-dirs:   lib
    build-depends:    base ^>=4.14.3.0
                    , random
                    , mtl
                    , vector
    ghc-options: -Wall -O2
    default-language: Haskell2010

executable randomthings
    main-is:          Main.hs

    -- Modules included in this executable, other than Main.
    -- other-modules:

    -- LANGUAGE extensions used by modules in this package.
    -- other-extensions:
    build-depends:    base ^>=4.14.3.0
                    , criterion
                    , random
                    , stateful
    hs-source-dirs:   app
    default-language: Haskell2010

and the content of Main.hs is

module Main where

import Criterion.Main (bench, bgroup, defaultMain, nf)
import System.Random (mkStdGen)
import Stateful

main :: IO ()
main = do
  let g = mkStdGen 134
  let k = 6
  let n1 = 10000
  let n2 = 100000
  let n3 = 1000000
  defaultMain 
    [
      bgroup "State" [ bench (show n1) $ nf (kSidedRolls k n1) g
                     , bench (show n2) $ nf (kSidedRolls k n2) g
                     , bench (show n3) $ nf (kSidedRolls k n3) g
                     ]
    , bgroup "ST" [ bench (show n1) $ nf (kSidedRolls' k n1) g
                  , bench (show n2) $ nf (kSidedRolls' k n2) g
                  , bench (show n3) $ nf (kSidedRolls' k n3) g
                  ]
    ]

On my machine, cabal -v0 run gives the following results:

benchmarking State/10000
time                 706.8 μs   (702.2 μs .. 712.0 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 716.2 μs   (711.3 μs .. 724.7 μs)
std dev              20.84 μs   (12.31 μs .. 32.71 μs)
variance introduced by outliers: 20% (moderately inflated)

benchmarking State/100000
time                 7.297 ms   (7.255 ms .. 7.344 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 7.236 ms   (7.202 ms .. 7.270 ms)
std dev              96.04 μs   (78.23 μs .. 120.2 μs)

benchmarking State/1000000
time                 72.95 ms   (72.34 ms .. 73.76 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 72.59 ms   (72.33 ms .. 72.96 ms)
std dev              554.9 μs   (345.3 μs .. 741.5 μs)

benchmarking ST/10000
time                 279.9 μs   (278.0 μs .. 281.8 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 279.6 μs   (278.3 μs .. 280.6 s)
std dev              3.767 μs   (3.366 μs .. 4.438 μs)

benchmarking ST/100000
time                 5.023 ms   (5.006 ms .. 5.040 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 4.999 ms   (4.976 ms .. 5.031 ms)
std dev              80.93 μs   (58.87 μs .. 106.1 μs)

benchmarking ST/1000000
time                 56.60 ms   (55.52 ms .. 57.56 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 56.75 ms   (56.07 ms .. 58.09 ms)
std dev              1.757 ms   (740.4 μs .. 3.160 ms)

Judging from the mean times, the ST version is speedier. However, we can notice that as the number of rolls increases, the relative performance gap decreases. It might be that the overhead in list creation begins to dominate the computation time as the list becomes longer. To test this, let us use vectors instead of lists.

We add the following to stateful.hs:

import Data.Vector.Unboxed (Vector)
import qualified Data.Vector.Unboxed as V

kSidedRollsV :: (RandomGen g) => Int -> Int -> g -> (Vector Int , g)
kSidedRollsV k n g = let kSided = state $ uniformR (1, k)
                    in runState (V.replicateM n kSided) g

kSidedRollsV' :: (RandomGen g) => Int -> Int -> g -> (Vector Int, g)
kSidedRollsV' k n g =
  runST $ do
            gRef <- newSTRef g
            rs <- V.replicateM n (kSided gRef)
            g' <- readSTRef gRef
            return (rs, g')
  where
    kSided gRef = do
      (val, g') <- uniformR (1, k) <$> readSTRef gRef 
      writeSTRef gRef g'
      return val

We also replace the defaultMain section in Main.hs with

  defaultMain 
    [ 
      bgroup "State/Vector" [ bench (show n1) $ nf (kSidedRollsV k n1) g
                            , bench (show n2) $ nf (kSidedRollsV k n2) g
                            , bench (show n3) $ nf (kSidedRollsV k n3) g
                            ]
    , bgroup "ST/Vector" [ bench (show n1) $ nf (kSidedRolls' k n1) g
                         , bench (show n2) $ nf (kSidedRollsV' k n2) g
                         , bench (show n3) $ nf (kSidedRollsV' k n3) g
                         ]
    ]

Here are the results:

benchmarking State/Vector/10000
time                 516.3 μs   (513.4 μs .. 519.4 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 529.3 μs   (522.5 μs .. 539.7 μs)
std dev              26.82 μs   (18.43 μs .. 35.02 μs)
variance introduced by outliers: 45% (moderately inflated)

benchmarking State/Vector/100000
time                 4.974 ms   (4.923 ms .. 5.033 ms)
                     0.999 R²   (0.999 R² .. 0.999 R²)
mean                 5.262 ms   (5.196 ms .. 5.365 ms)
std dev              241.3 μs   (151.7 μs .. 396.2 μs)
variance introduced by outliers: 24% (moderately inflated)

benchmarking State/Vector/1000000
time                 49.52 ms   (48.76 ms .. 50.27 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 51.15 ms   (50.62 ms .. 52.00 ms)
std dev              1.314 ms   (821.0 μs .. 2.094 ms)

benchmarking ST/Vector/10000
time                 162.5 μs   (161.4 μs .. 163.4 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 161.3 μs   (160.7 μs .. 162.0 μs)
std dev              2.126 μs   (1.831 μs .. 2.480 μs)

benchmarking ST/Vector/100000
time                 1.677 ms   (1.669 ms .. 1.685 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.679 ms   (1.674 ms .. 1.684 ms)
std dev              16.41 μs   (14.24 μs .. 20.49 μs)

benchmarking ST/Vector/1000000
time                 15.93 ms   (15.83 ms .. 16.01 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 16.03 ms   (15.97 ms .. 16.10 ms)
std dev              165.3 μs   (124.5 μs .. 222.1 μs)

For each vector lenght, the mean time of the ST version is just over 30% of the mean time of the State version. Also, the ST version with vectors is substantially faster than ST version with lists. In particular, for 1 million entries, the mean time for the ST version with vectors is 28.2% of the mean time for the ST version with lists. And for 100,000 entries, it is 33.6%.

Indeed, there is a lot of overhead in creating long lists. For ideas on generating large vectors of random numbers even faster, see this response on Github.

Basic usage of System.Random.Stateful

The System.Random.Stateful interface can be a bit overwhelming at first sight. There are a number of available options designed for different purposes. We will begin by walking through the types and discussing some of the nuances.

First of all, the monadic counterpart to uniform is

uniformM :: (Uniform a, StatefulGen g m) => g -> m a

The type class StatefulGen is multi-parametered and therefore requires the language extension MultiParamTypeClasses. Let us get some info in ghci:

λ> import System.Random.Stateful
λ> :i StatefulGen
type StatefulGen :: * -> (* -> *) -> Constraint
class Monad m => StatefulGen g m where
  uniformWord32R :: GHC.Word.Word32 -> g -> m GHC.Word.Word32
  uniformWord64R :: GHC.Word.Word64 -> g -> m GHC.Word.Word64
  uniformWord8 :: g -> m GHC.Word.Word8
  uniformWord16 :: g -> m GHC.Word.Word16
  uniformWord32 :: g -> m GHC.Word.Word32
  uniformWord64 :: g -> m GHC.Word.Word64
  uniformShortByteString :: Int
                            -> g
                            -> m bytestring-0.10.12.0:Data.ByteString.Short.Internal.ShortByteString
  default uniformShortByteString :: MonadIO m =>
                                    Int
                                    -> g
                                    -> m bytestring-0.10.12.0:Data.ByteString.Short.Internal.ShortByteString
  {-# MINIMAL (uniformWord32 | uniformWord64) #-}
        -- Defined in ‘System.Random.Internal’
instance RandomGen g => StatefulGen (STGenM g s) (ST s)
  -- Defined in ‘System.Random.Stateful’
instance (RandomGen g, MonadIO m) => StatefulGen (IOGenM g) m
  -- Defined in ‘System.Random.Stateful’
instance (RandomGen g, MonadIO m) => StatefulGen (AtomicGenM g) m
  -- Defined in ‘System.Random.Stateful’
instance (RandomGen g, MonadState g m) =>
         StatefulGen (StateGenM g) m
  -- Defined in ‘System.Random.Internal’

We can see four defined instances already. If we add import Control.Concurrent.STM, we will also see instance RandomGen g => StatefulGen (TGenM g) STM which is available from version 1.2.1 onwards. In particular, the instance

instance (RandomGen g, MonadState g m) =>
         StatefulGen (StateGenM g) m

means that there exists, for example, an implementation of uniformWord32::StatefulGen g m => g -> m Word32 that takes an input of type StateGenM g and returns something of type m Word32.

These instances bring into light the existence of various monadic adapters: StateGenM, AtomicGenM, IOGenM, STGenM and TGenM. The documentation describes how they differ in performance and safety in the presence of exceptions and concurrency. The following functions help create the appropriate adapter from from a pure pseudo-random generator:

StateGenM g is a bit special as it is a phantom type with constructor StateGenM:

λ> :t StateGenM
StateGenM :: StateGenM g

Recall the type signature of uniformM:

uniformM :: (Uniform a, StatefulGen h m) => h -> m a

(We have renamed the type of the input argument to h to minimize confusion in what follows.) The input argument type h can be anything provided that there exists an instance definition for StatefulGen h m. Hence, the function

f = uniformM StateGenM

is a specialization of uniformM with type

(Uniform a, StatefulGen (StateGenM g) m) => m a

The instance definition that matches StatefulGen (StateGenM g) m) is

instance (RandomGen g, MonadState g m) => StatefulGen (StateGenM g) m

Hence, the type of f simplifies to

(Uniform a, RandomGen g, MonadState g m) => m a

which agrees with what we get in ghci:

λ> :t f
f :: (Uniform a, RandomGen g, MonadState g m) => m a

To obtain an element of type that is an instance of Uniform (e.g. Int), we apply runState to f and an initial generator:

λ> fst $ runState f (mkStdGen 137)::Int
7879794327570578227

Instead, we could have used runStateGen_ provided by the interface:

λ> runStateGen_ (mkStdGen 137) uniformM::Int
7879794327570578227

We can use runStateGen instead if also want the new generator:

λ> runStateGen (mkStdGen 137) uniformM::(Int, StdGen)
(7879794327570578227,StdGen {unStdGen = SMGen 11285859549637045894 7641485672361121627})

We mention in passing that the library mwc-random since version 0.15.0 has the following instance definition:

import qualified System.Random.Stateful as Random

instance (s ~ PrimState m, PrimMonad m) => Random.StatefulGen (Gen s) m where
  uniformWord32R u = uniformR (0, u)
  {-# INLINE uniformWord32R #-}
  uniformWord64R u = uniformR (0, u)
  {-# INLINE uniformWord64R #-}
  uniformWord8 = uniform
  {-# INLINE uniformWord8 #-}
  uniformWord16 = uniform
  {-# INLINE uniformWord16 #-}
  uniformWord32 = uniform
  {-# INLINE uniformWord32 #-}
  uniformWord64 = uniform
  {-# INLINE uniformWord64 #-}
  uniformShortByteString n g = unsafeSTToPrim (Random.genShortByteStringST n (uniform g))
  {-# INLINE uniformShortByteString #-}

The constraint PrimMonad m essentially says that m is IO or ST s. PrimState m is an associated data type. Those interested in learning more can see the discussion on Stack Overflow.

Example

For this example, we need random and mwc-random instsalled. Let us define a couple of functions

λ> import System.Random.Stateful
λ> :{
kSidedDieM :: (StatefulGen g m) => Int -> g -> m Int
kSidedDieM k = uniformRM (1, k)

kSidedRollsM :: (StatefulGen g m) => Int -> Int -> g -> m [Int]
kSidedRollsM k n = replicateM n . kSidedDieM k
:}

We can now simulate rolling a 6-sided die 12 times in a few different ways using different adapters:

λ> pureGen = mkStdGen 134
λ> :{
go :: (StatefulGen g m) => g -> m [Int]
go = kSidedRollsM 6 12
:}
λ> runStateGen_ pureGen go
[3,3,4,3,3,1,5,5,2,6,6,6]
λ> newIOGenM pureGen >>= go
[3,3,4,3,3,1,5,5,2,6,6,6]
λ> newAtomicGenM pureGen >>= go
[3,3,4,3,3,1,5,5,2,6,6,6]
λ> runST $ newSTGenM pureGen >>= go
[3,3,4,3,3,1,5,5,2,6,6,6]

We can also use a generator from mwc-random:

λ> import qualified System.Random.MWC as MWC
λ> MWC.create >>= go
[3,4,3,1,4,6,1,6,1,4,2,2]

Exercise

Write benchmarking code comparing the performance of the different monadic adapters and the generator from mwc-random.

Final thoughts

We have covered the basics of random>=1.2. The improved performance and the new features help make the library become a unifying framework for pseudo-random number generation in Haskell. The flexibility of the inferface offers fine-grained control on computing with generate pseudo-random numbers depending on the needs of an application. Consequently, the library has an inherent complexity that takes time to master. In future articles, we will look at more examples of usage and other advanced features of the library.