Generating bitmask permutations in Python


Posted by Diego Assencio on 2014.12.23 under Programming (Python)

Suppose you need to generate all possible bitmasks of length $n$ with $m$ bits set, i.e., all bitmasks with $n$ bits of length having $m$ bits equal to $1$ and $(n-m)$ bits equal to $0$. For small values of $n$, the desired bitmasks can be easily hard-coded, but for general values of $n$ and $m$, this ceases to be a practical way to accomplish the task since the total number $N$ of bitmasks that need to be generated can be very large: $$ \displaystyle N = \left(\matrix{n \\ m}\right) = \frac{n!}{m!(n-m)!} $$

One way to solve this problem is by using generator functions. A generator function is a special type of function which can be used in iterative loops: it behaves like an iterator and serves the purpose of dynamically generating sequences of elements in a simplified way. Before explaining all this in more detail, take a look at the code below; it solves the problem stated above (the values of $m$ and $n$ are assumed to satisfy $0 \leq m \leq n$):

from bitarray import bitarray

def bitmasks(n,m):
    if m < n:
        if m > 0:
            for x in bitmasks(n-1,m-1):
                yield bitarray([1]) + x
            for x in bitmasks(n-1,m):
                yield bitarray([0]) + x
        else:
            yield n * bitarray('0')
    else:
        yield n * bitarray('1')

for b in bitmasks(4,2):
    print(b)

The output of the code above is shown below:

bitarray('1100')
bitarray('1010')
bitarray('1001')
bitarray('0110')
bitarray('0101')
bitarray('0011')

Here we store each bitmask on a special data structure called bitarray, which is appropriate for bitmasks as it contains lots of methods typically used with arrays of bits such as bitwise operations, encoding and decoding etc. On Ubuntu/Debian, support for bitarrays can be added by installing the python-bitarray package (or python3-bitarray for Python 3); for that you just need to open a terminal and run:

sudo apt-get install python-bitarray

If you wish to print the bitmasks in a cleaner way, apply the following change to the code above:

for b in bitmasks(4,2):
    print(b.to01())

The output now becomes easier to read:

1100
1010
1001
0110
0101
0011

The bitmasks function is a generator function. Instead of returning a value, it returns a generator. Indeed, if you execute:

print(bitmasks(4,2))

the output you will get will be similar to the one shown below:

<generator object bitmasks at 0x7fa2a0c99f78>

As mentioned above, this generator can be used in loops which iterate over a set of values. Roughly speaking, the generator will yield each value on the loop as needed by executing the contents of bitmasks until a yield statement is found, at which point the value from the yield statement is assigned to the loop variable and the function is frozen until the next iteration of the loop. In the example above, the bitmasks of length $n$ are built one by one at each iteration of the for loop and assigned to the loop variable b.

Notice that yield is very different from return: yield produces the next value in the sequence and freezes the execution of the function, while return returns the specified value and terminates the execution of the function immediately.

One interesting aspect of generator functions is the fact that since values are generated one by one (lazily), less memory is needed than if all elements were first generated and returned in a list.

Solution without the bitarray data structure

If you do not wish to store the generated bitmasks on bitarrays, you can also store them directly as integers. The solution below implements this:

def bitmasks(n,m):
    if m < n:
        if m > 0:
            for x in bitmasks(n-1,m-1):
                yield (1 << (n-1)) + x
            for x in bitmasks(n-1,m):
                yield x
        else:
            yield 0
    else:
        yield (1 << n) - 1

# print each value as a 4 bit binary number
for b in bitmasks(4,2):
    print('{:04b}'.format(b))

The implementation above does the same as the previous one and actually generates the bitmasks in the same order. Here is the output produced:

1100
1010
1001
0110
0101
0011

Comments

No comments posted yet.

Leave a reply

NOTE: A name and a comment (max. 1024 characters) must be provided; all other fields are optional. Equations will be processed if surrounded with dollar signs (as in LaTeX). You can post up to 5 comments per day.