Writing into binary files in Python


Posted by Diego Assencio on 2015.09.18 under Programming (Python)

Writing into binary files in Python is very easy. One way to do it is to open the file for writing in binary mode and then write bytes objects to it; in the example below, each byte is specified with a hexadecimal escape sequence:

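# open the file for writing ("w") in binary mode ("b");
# the file is closed automatically when the with block ends
with open("myfile.bin", "wb") as output_file:
    output_file.write(b"\x0a\x1b\x2c")
    output_file.write(b"\x3d\x4e\x5f")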

On Linux, the contents of the generated binary file can be read directly with the hexdump command:

hexdump -C myfile.bin

The -C option instructs hexdump to display the contents of the file in hexadecimal form (and also as an ASCII string on the right side). The output of the command above is:

00000000  0a 1b 2c 3d 4e 5f                                 |..,=N_|
00000006

A dot on the right side represents either a byte which holds the ASCII code for the dot character ('.') or a byte which does not correspond to a printable ASCII character.
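If hexdump is not available, the file can also be inspected from Python itself. The snippet below is a minimal sketch: it writes the same six bytes and reads them back, using a file inside a temporary directory (an assumption made here only to keep the example self-contained):

```python
import os
import tempfile

# write the same six bytes as above into a file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "myfile.bin")

with open(path, "wb") as output_file:
    output_file.write(b"\x0a\x1b\x2c")
    output_file.write(b"\x3d\x4e\x5f")

# read the whole file back as a single bytes object
with open(path, "rb") as input_file:
    data = input_file.read()

# show each byte as a two-digit hexadecimal number
print(" ".join("%02x" % byte for byte in bytearray(data)))
```

This should print the same six byte values which hexdump shows in its middle column.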

The problem with the approach discussed above is that it becomes cumbersome when we want to write several objects into the binary file. For instance, consider what we would have to do if we wished to write integer values, strings and perhaps even the contents of a list into the file. How would we read the contents of this file later? We would also have to write some metadata into the file to specify its structure in order to be able to retrieve each stored object later. This is not a trivial task, especially if some of the objects can have variable lengths (e.g. strings, lists, etc.).
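To get a feeling for the amount of work involved, here is a rough sketch of such a hand-made format. The layout (a one-byte type tag, followed by a length prefix for strings) and the helper names write_int, write_string and read_object are invented for this illustration; only the standard struct and io modules are used:

```python
import io
import struct

def write_int(stream, value):
    # tag byte "i" followed by a 4-byte little-endian signed integer
    stream.write(b"i" + struct.pack("<i", value))

def write_string(stream, text):
    # tag byte "s", then the 4-byte length of the UTF-8 data, then the data
    data = text.encode("utf-8")
    stream.write(b"s" + struct.pack("<I", len(data)) + data)

def read_object(stream):
    # the tag byte tells us how to interpret the bytes which follow
    tag = stream.read(1)
    if tag == b"i":
        return struct.unpack("<i", stream.read(4))[0]
    if tag == b"s":
        (length,) = struct.unpack("<I", stream.read(4))
        return stream.read(length).decode("utf-8")
    raise ValueError("unknown tag: %r" % tag)

# an in-memory stream stands in for a file opened in binary mode
stream = io.BytesIO()
write_int(stream, 42)
write_string(stream, "Hello, world!")

# rewind and read the objects back in the order they were written
stream.seek(0)
first = read_object(stream)
second = read_object(stream)
print(first)
print(second)
```

Even this small sketch handles only two data types; supporting lists, dictionaries and nested objects would require considerably more code.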

Fortunately, Python has a module which does this work for us and is extremely easy to use. This module is called pickle; it provides us with the ability to serialize and deserialize objects, i.e., to convert objects into byte streams which can be stored into files and later be used to reconstruct the original objects. There are some data types which pickle cannot serialize, but it is still capable of serializing most of the objects typically used in Python programs. A comprehensive list of data types which pickle can serialize can be found in the official documentation of the pickle module.
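As a small illustration of serialization and deserialization, the sketch below uses pickle.dumps and pickle.loads, which work on bytes objects in memory instead of files. The dictionary contents are made up for this example, and a generator is used as one instance of an object which pickle refuses to serialize:

```python
import pickle

original = {"name": "Bob", "pets": ["dog", "cat", "lizard"], "age": 42}

# dumps() serializes to a bytes object instead of writing into a file
data = pickle.dumps(original)

# loads() reconstructs an equal (but distinct) object from those bytes
restored = pickle.loads(data)
print(restored == original)
print(restored is original)

# a generator is one example of an object which cannot be serialized
try:
    pickle.dumps(x for x in range(3))
    generator_pickled = True
except (TypeError, pickle.PicklingError):
    generator_pickled = False
print(generator_pickled)
```

The restored dictionary compares equal to the original one but is a separate object, and the attempt to serialize the generator fails.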

Even if all of this sounds complicated, the examples below show how easy it is to use pickle. To write ("dump") objects directly into a binary file called myfile.bin, do as shown below:

import pickle

myint = 42
mystring = "Hello, world!"
mylist = ["dog", "cat", "lizard"]
mydict = { "name": "Bob", "job": "Astronaut" }

# the file is closed automatically when the with block ends
with open("myfile.bin", "wb") as output_file:
    pickle.dump(myint, output_file)
    pickle.dump(mystring, output_file)
    pickle.dump(mylist, output_file)
    pickle.dump(mydict, output_file)

The contents of the generated binary file will be different for Python 2.x and 3.x because the default serialization protocol which pickle uses has changed over time; it is recommended that you always use Python 3.x with pickle to avoid compatibility issues. For curiosity's sake, these are the contents of myfile.bin obtained with hexdump if the script above is executed using Python 3.4.0:

00000000  80 03 4b 2a 2e 80 03 58  0d 00 00 00 48 65 6c 6c  |..K*...X....Hell|
00000010  6f 2c 20 77 6f 72 6c 64  21 71 00 2e 80 03 5d 71  |o, world!q....]q|
00000020  00 28 58 03 00 00 00 64  6f 67 71 01 58 03 00 00  |.(X....dogq.X...|
00000030  00 63 61 74 71 02 58 06  00 00 00 6c 69 7a 61 72  |.catq.X....lizar|
00000040  64 71 03 65 2e 80 03 7d  71 00 28 58 04 00 00 00  |dq.e...}q.(X....|
00000050  6e 61 6d 65 71 01 58 03  00 00 00 42 6f 62 71 02  |nameq.X....Bobq.|
00000060  58 03 00 00 00 6a 6f 62  71 03 58 09 00 00 00 41  |X....jobq.X....A|
00000070  73 74 72 6f 6e 61 75 74  71 04 75 2e              |stronautq.u.|
0000007c
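The first two bytes of each pickled object in the dump above (80 03) are the protocol marker followed by the protocol number. The protocol can be chosen explicitly through the protocol argument of pickle.dump and pickle.dumps, which helps when the data must be read by a different Python version. The sketch below assumes Python 3.x, since protocol 3 is not available in Python 2.x:

```python
import pickle

# protocol 3 is the one which produced the dump shown above
data = pickle.dumps(42, protocol=3)
print(data)

# protocol 2 is the highest protocol which Python 2.x can also read
portable = pickle.dumps(42, protocol=2)
print(portable)

# both byte strings decode to the same integer
print(pickle.loads(data), pickle.loads(portable))
```

The five bytes obtained with protocol 3 are exactly the first five bytes of myfile.bin: 80 03 4b 2a 2e.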

Now the original objects can be retrieved ("loaded") from myfile.bin in the same order as they were written ("dumped") into it:

import pickle

# load the objects in the same order in which they were dumped
with open("myfile.bin", "rb") as input_file:
    myint = pickle.load(input_file)
    mystring = pickle.load(input_file)
    mylist = pickle.load(input_file)
    mydict = pickle.load(input_file)

print("myint = %s" % myint)
print("mystring = %s" % mystring)
print("mylist = %s" % mylist)
print("mydict = %s" % mydict)

The output of the program above shows that the original objects have been properly retrieved from the binary file (the order in which the dictionary keys are printed may vary with the Python version):

myint = 42
mystring = Hello, world!
mylist = ['dog', 'cat', 'lizard']
mydict = {'job': 'Astronaut', 'name': 'Bob'}
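If the number of objects stored in the file is not known in advance, one possibility is to keep calling pickle.load until it raises EOFError, which it does when there is nothing left to read. The helper name load_all below is made up for this sketch, and an in-memory stream is used in place of a real file:

```python
import io
import pickle

def load_all(stream):
    # keep loading objects until pickle reports the end of the stream
    objects = []
    while True:
        try:
            objects.append(pickle.load(stream))
        except EOFError:
            break
    return objects

# an in-memory stream stands in for myfile.bin opened in binary mode
stream = io.BytesIO()
for obj in [42, "Hello, world!", ["dog", "cat", "lizard"]]:
    pickle.dump(obj, stream)

# rewind and retrieve everything which was dumped
stream.seek(0)
objects = load_all(stream)
print(objects)
```

The same function should work with a file object obtained from open("myfile.bin", "rb").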

As an interesting note, you can dump objects directly into stdout or stderr if you wish. All you need to do is replace output_file above with sys.stdout.buffer (or sys.stderr.buffer), as shown in the example below:

import sys
import pickle

pickle.dump(42, sys.stdout.buffer)
pickle.dump("Hello, world!", sys.stdout.buffer)
pickle.dump(["dog", "cat", "lizard"], sys.stdout.buffer)
pickle.dump({ "name": "Bob", "job": "Astronaut" }, sys.stdout.buffer)

# flush the stdout buffer
sys.stdout.flush()

The code just shown works with Python 3.x but not with Python 2.x, where sys.stdout has no buffer attribute and accepts binary data directly (everything else presented here works with both versions of Python). This is the version of the code above which works with Python 2.x:

import sys
import pickle

pickle.dump(42, sys.stdout)
pickle.dump("Hello, world!", sys.stdout)
pickle.dump(["dog", "cat", "lizard"], sys.stdout)
pickle.dump({ "name": "Bob", "job": "Astronaut" }, sys.stdout)

# flush the stdout buffer
sys.stdout.flush()
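The two versions can be merged into a single script by picking the binary stream at run time. This is only a sketch: the helper names binary_stream and dump_all are invented here, and the approach relies on the fact that the buffer attribute exists on sys.stdout in Python 3.x but not in Python 2.x:

```python
import io
import pickle
import sys

def binary_stream(text_stream):
    # Python 3.x text streams expose their binary stream as .buffer;
    # if that attribute is missing, fall back to the stream itself
    return getattr(text_stream, "buffer", text_stream)

def dump_all(objects, stream):
    # dump each object in turn, then flush whatever is still buffered
    for obj in objects:
        pickle.dump(obj, stream)
    stream.flush()

# usage (writes binary data to the terminal):
# dump_all([42, "Hello, world!"], binary_stream(sys.stdout))
```

Passing sys.stderr instead of sys.stdout to binary_stream should work in the same way.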
