September 5, 2022
Generating a random string of balanced parentheses of a given length is a classic problem. The main difficulty, at least mathematically, is to ensure that such strings are generated uniformly. In this article, we will see an implementation of the algorithm by D. B. Arnold and M. R. Sleep. in Haskell.
In computing, the character ‘(’ is a called left parenthesis and the character ‘)’ a right parenthesis. A string of length \(2n\) consisting of \(n\) left parentheses and \(n\) right parentheses is said to be balanced if the number of left parentheses is at least the number of right parentheses in the substring consisting of the first \(k\) characters for \(k = 1,\ldots, 2n\).
For example, the following are strings of balanced parentheses of length six:
"()(())"
"(()())"
"((()))"
The following strings of parentheses are not balanced:
"())(()"
")()(()"
It is known that the number of strings of balanced parentheses of length \(n\) is given by the Catalan number \[C_n = \binom{2n}{n} - \binom{2n}{n+1}.\] It is also known that strings of balanced parentheses of length \(2n\) are in one-to-one correspondence with binary trees with \(n\) nodes. (See section 7.2.1.6 of Knuth’s The Art of Computer Programming volume 4, pre-fascicle 4A.)
The algorithm generates the string from left to right one character
at a time with the probability of choosing a right parenthesis given by
\[P(r,k) = \frac{r(k+r+2)}{2k(r+1)}\]
where \(k\) is the number of characters
remaining to be produced and \(r\) is
the number of unmatched left parentheses. That is, \(r\) is the number of existing left
parentheses minus the number of existing right parentheses. For example,
for the string "(()(()"
with \(n
= 8\), we have \(k = r = 2\) and
so the probability of choosing ')'
is \(\frac{2(2+2+2)}{2(2)(2+1)} = 1\). This is
indeed correct since choosing '('
will lead to a partial
string with five left parentheses which exceeds the number of allowable
left parentheses in a balanced string of length 8.
We can also see from the formula that when \(r = 0\), the probability of choosing a right parenthesis is zero which is precisely what is needed because choosing a right parenthesis will result in a partial string where the number of right parentheses exceeds the number of left parentheses.
Amazingly, this simple algorithm will generate random strings of balanced parentheses of length \(2n\) uniformly. A proof of this fact is given in the original paper. In the following, we will verify this empirically.
We will use the random
library for pseudo-random number
generation. Optionally, we could set up a local environment for
experimentation to avoid using the global environment:
cabal install --env . --lib random
Let us create a file named random_bal_paren.hs
with the
following lines:
module RandomBalParen where
import System.Random
genRandPar :: RandomGen g => Int -- n: number of pairs of parentheses
-> g
-> ([Char], g)
Since we need to keep track of the values of \(r\) and \(k\), we will implement a helper function
genRandParWork :: RandomGen g => [Char] -- what's generated so far
-> Int -- r: number of unmatched L's
-> Int -- k: number of symbols remaining
-> g
-> ([Char], g)
For better efficiency, we will generate the list of parentheses in
reverse. Also, to avoid dealing with floating-point errors, we choose
the right parenthesis with probability \(P(r,k)\) by obtaining a random non-negative
integer less than \(2k(r+1)\), the
denominator in the formula for \(P(r,k)\), and check if it is less than
\(r(k+r+2)\), the numerator in the
formula for \(P(r,k)\). Here is the
full code for genRandParWork
:
0 g = (acc, g)
genRandParWork acc _ =
genRandParWork acc r k g let denom = 2*k*(r+1)
= r*(k+r+2)
numer = uniformR (0, denom-1) g
(a, g') = if a < numer then (')':acc, r-1) else ('(':acc, r+1)
(acc', r') in genRandParWork acc' r' (k-1) g'
Then genRandPar
is simply
= let (ps, g') = genRandParWork [] 0 (2*n) g
genRandPar n g in (reverse ps, g')
Let us now give the function a test drive in ghci
:
> :l random_bal_paren.hs
λ> g = mkStdGen 134
λ> (s, g') = genRandPar 3 g
λ> s
λ"()(())"
> (s', g'') = genRandPar 4 g'
λ> s'
λ"()(()())"
So far so good.
Next, we use genRandPar
to generate 100,000 random
strings of 4 matched pairs of parentheses and check empircally if the
strings are uniformly generated. Note that the Catalan number \(C_4 = 14\). We will use a Map
from the
containers
library to store the frequencies for each of the
14 strings. To this end, we add to our file the list of imports
import qualified Data.Map as M
import Data.List (foldl', unfoldr)
and the functions
trials :: RandomGen g => Int -> Int -> g -> [String]
= take nTrials (unfoldr (Just . genRandPar n) gen)
trials n nTrials gen
toFreqMap :: [String] -> M.Map String Int
= foldl' (\m k -> M.insertWith (+) k 1 m) M.empty toFreqMap
We then perform the following in ghci
:
> :l random_bal_paren.hs
λ> g = mkStdGen 134
λ> xs = trials 4 100000 g
λ> m = toFreqMap xs λ
We now check that the frequency map m
does have 14
keys:
> M.size m
λ14
We can also take a look at all the keys:
> M.keys m
λ"(((())))","((()()))","((())())","((()))()","(()(()))","(()()())","(()())()","(())(())","(())()()","()((()))","()(()())","()(())()","()()(())","()()()()"] [
As expected, all the strings have balanced parentheses. Let us now obtain a list of frequencies:
> M.elems m
λ7091,7166,7234,7114,7105,7089,7087,7151,6995,7124,7094,7164,7295,7291] [
The numbers look reasonably close to each other, which is good news.
Implement a function that accepts a positive integer \(n\) as argument and outputs a list of all the strings of balanced parentheses of length \(2n\).
How would you generate uniformly at random a string of length
\(2n\) consisting of brackets
[]
, braces {}
, and parentheses ()
so that all pairs are matched and properly nested? Note that such a
string does not need to contain all three types of brackets. For
example, "[][]"
and "{}(())"
are valid. The
following strings do not have proper nesting:
"({)}"
"()({[}])"
If you cannot prove mathematically that your algorithm generates such strings uniformaly at random, at least verify that it does empirically.