Random Strings of Balanced Parentheses

Kevin Cheung

September 5, 2022

Uniform random generation of balanced parenthesis strings

Generating a random string of balanced parentheses of a given length is a classic problem. The main difficulty, at least mathematically, is to ensure that such strings are generated uniformly. In this article, we will see an implementation of the algorithm by D. B. Arnold and M. R. Sleep. in Haskell.

Introduction

In computing, the character ‘(’ is a called left parenthesis and the character ‘)’ a right parenthesis. A string of length \(2n\) consisting of \(n\) left parentheses and \(n\) right parentheses is said to be balanced if the number of left parentheses is at least the number of right parentheses in the substring consisting of the first \(k\) characters for \(k = 1,\ldots, 2n\).

For example, the following are strings of balanced parentheses of length six:

The following strings of parentheses are not balanced:

It is known that the number of strings of balanced parentheses of length \(n\) is given by the Catalan number \[C_n = \binom{2n}{n} - \binom{2n}{n+1}.\] It is also known that strings of balanced parentheses of length \(2n\) are in one-to-one correspondence with binary trees with \(n\) nodes. (See section 7.2.1.6 of Knuth’s The Art of Computer Programming volume 4, pre-fascicle 4A.)

The algorithm

The algorithm generates the string from left to right one character at a time with the probability of choosing a right parenthesis given by \[P(r,k) = \frac{r(k+r+2)}{2k(r+1)}\] where \(k\) is the number of characters remaining to be produced and \(r\) is the number of unmatched left parentheses. That is, \(r\) is the number of existing left parentheses minus the number of existing right parentheses. For example, for the string "(()(()" with \(n = 8\), we have \(k = r = 2\) and so the probability of choosing ')' is \(\frac{2(2+2+2)}{2(2)(2+1)} = 1\). This is indeed correct since choosing '(' will lead to a partial string with five left parentheses which exceeds the number of allowable left parentheses in a balanced string of length 8.

We can also see from the formula that when \(r = 0\), the probability of choosing a right parenthesis is zero which is precisely what is needed because choosing a right parenthesis will result in a partial string where the number of right parentheses exceeds the number of left parentheses.

Amazingly, this simple algorithm will generate random strings of balanced parentheses of length \(2n\) uniformly. A proof of this fact is given in the original paper. In the following, we will verify this empirically.

Implementation in Haskell

We will use the random library for pseudo-random number generation. Optionally, we could set up a local environment for experimentation to avoid using the global environment:

cabal install --env . --lib random

Let us create a file named random_bal_paren.hs with the following lines:

module RandomBalParen where

import System.Random

genRandPar :: RandomGen g =>    Int -- n: number of pairs of parentheses
                             -> g
                             -> ([Char], g)

Since we need to keep track of the values of \(r\) and \(k\), we will implement a helper function

genRandParWork :: RandomGen g =>    [Char] -- what's generated so far
                                 -> Int     -- r: number of unmatched L's
                                 -> Int     -- k: number of symbols remaining
                                 -> g
                                 -> ([Char], g)

For better efficiency, we will generate the list of parentheses in reverse. Also, to avoid dealing with floating-point errors, we choose the right parenthesis with probability \(P(r,k)\) by obtaining a random non-negative integer less than \(2k(r+1)\), the denominator in the formula for \(P(r,k)\), and check if it is less than \(r(k+r+2)\), the numerator in the formula for \(P(r,k)\). Here is the full code for genRandParWork:

genRandParWork acc _ 0 g = (acc, g)
genRandParWork acc r k g =
  let denom = 2*k*(r+1)
      numer = r*(k+r+2)
      (a, g') = uniformR (0, denom-1) g 
      (acc', r') = if a < numer then (')':acc, r-1) else ('(':acc, r+1) 
  in genRandParWork acc' r' (k-1) g'

Then genRandPar is simply

genRandPar n g = let (ps, g') = genRandParWork [] 0 (2*n) g
                 in (reverse ps, g')

Computational experiments

Let us now give the function a test drive in ghci:

λ> :l random_bal_paren.hs
λ> g = mkStdGen 134
λ> (s, g') = genRandPar 3 g
λ> s
"()(())"
λ> (s', g'') = genRandPar 4 g'
λ> s'
"()(()())"

So far so good.

Next, we use genRandPar to generate 100,000 random strings of 4 matched pairs of parentheses and check empircally if the strings are uniformly generated. Note that the Catalan number \(C_4 = 14\). We will use a Map from the containers library to store the frequencies for each of the 14 strings. To this end, we add to our file the list of imports

import qualified Data.Map as M
import Data.List (foldl', unfoldr)

and the functions

trials :: RandomGen g => Int -> Int -> g -> [String]
trials n nTrials gen = take nTrials (unfoldr (Just . genRandPar n) gen)

toFreqMap :: [String] -> M.Map String Int
toFreqMap = foldl' (\m k -> M.insertWith (+) k 1 m) M.empty

We then perform the following in ghci:

λ> :l random_bal_paren.hs
λ> g = mkStdGen 134
λ> xs = trials 4 100000 g
λ> m = toFreqMap xs

We now check that the frequency map m does have 14 keys:

λ> M.size m
14

We can also take a look at all the keys:

λ> M.keys m
["(((())))","((()()))","((())())","((()))()","(()(()))","(()()())","(()())()","(())(())","(())()()","()((()))","()(()())","()(())()","()()(())","()()()()"]

As expected, all the strings have balanced parentheses. Let us now obtain a list of frequencies:

λ> M.elems m
[7091,7166,7234,7114,7105,7089,7087,7151,6995,7124,7094,7164,7295,7291]

The numbers look reasonably close to each other, which is good news.

Challenges

  1. Implement a function that accepts a positive integer \(n\) as argument and outputs a list of all the strings of balanced parentheses of length \(2n\).

  2. How would you generate uniformly at random a string of length \(2n\) consisting of brackets [], braces {}, and parentheses () so that all pairs are matched and properly nested? Note that such a string does not need to contain all three types of brackets. For example, "[][]" and "{}(())" are valid. The following strings do not have proper nesting:

    If you cannot prove mathematically that your algorithm generates such strings uniformaly at random, at least verify that it does empirically.