Reed-Solomon Codec
June 08, 2012

Some Background

I'm working on an image processing project for the university, whose purpose is to embed (an extract) a print-scan resilient watermark into an image. This project has (sadly) gotten me acquainted with Matlab, from which I quickly ran way into the friendlier realms of Scipy and friends (Skimage rocks, by the way). I must say I really learned to appreciate the Scipy/Numpy gang in the last two weeks :)

If it wasn't already obvious, it's time to admit I'm a n00b when it comes to signal processing and applied math in general. I know the Fourier transform in broad terms, and have heard of discrete cosine transform and wavelets some time ago... but it's not my cup of tea, to say the least. Luckily for clueless people like me, Matlab (and its kin) enables us to summon the dark powers of mathematics, without ever having to know what we're doing. Hurrah!

So I'm DFT'ing, DCT'ing and DWT'ing like a pro, embedding my watermark using CDMA/spread-spectrum techniques in the frequency domain, and then inverting the process... and all I know is I'm supposed to keep myself in the mid-frequency range, i.e., in a certain region of the matrix. Math for n00bs.

My original idea was to take a string, encode it as a QR code, and then embed this QR image into the host image. I thought it would be a nice shortcut, as it provided me with a synchronization pattern and error correction out of the box, but it quickly turned out QR codes generate a payload that's too big for unobservable embedding. So I set out to find some error correcting code (ECC) library for python, but it proved to be a really difficult task. I found some packages, most of them are haven't been maintained in over 7 years nows, and all of them make use of extension modules that failed to compile. Then there's zfec, but for the life of me, I couldn't figure out how to use it as a simple encoder/decoder.

The Library

I almost gave up and resorted to triplicating my payload (at the bit level), and using majority-selection for each bit, when I came across an amazing python tutorial (with runnable code) that covers Reed Solomon codes and QR in depth. I simply extracted the code, added a usable API, wrote some examples and quickly uploaded it to PyPI, so now there's a pure-python Reed-Solomon encoder/decoder: pip install reedsolo.

The library should support python 2.4-3.2, using strings or bytes. I really can't verify the correctness of the algorithm (it's beyond me), but it seems to work so I'm fine with it. Here's a short demo:

>>> from reedsolo import RSCodec
>>> rs = RSCodec(10)     # 10 bytes of ECC will be added to the output, 
...                      # which allows us to correct up to 5 byte-level errors
>>> rs.encode([1,2,3,4])
'\x01\x02\x03\x04,\x9d\x1c+=\xf8h\xfa\x98M'
>>> rs.encode("hello world")
'hello world\xed%T\xc4\xfd\xfd\x89\xf3\xa8\xaa'
>>> rs.decode('hello world\xed%T\xc4\xfd\xfd\x89\xf3\xa8\xaa')
'hello world'

Now let's add some errors:

>>> rs.decode('hXXlo worXd\xed%T\xc4\xfdX\x89\xf3\xa8\xaa')     # 4 errors - ok
'hello world'
>>> rs.decode('hXXXo worXd\xed%T\xc4\xfdXX\xf3\xa8\xaa')        # 6 errors - fail
Traceback (most recent call last):
  ...
reedsolo.ReedSolomonError: Could not locate error

It's pure python and highly unoptimized... I think someone acquainted with Numpy a little more than I am could improve it blindfolded by a factor 10, but even now, on my dinosaur machine, it encodes a 400kB message in 2.9 seconds and decodes it in 1.9 seconds. I'll drink to that. By the way, it seems that the library can only handle messages that are less than 255 bytes long... but then you can simply encode/decode in chunks. I'll include it in later versions.

I think a good ECC library for python is very useful... if anyone wants to join in on it, feel free to drop me a line at the comments or just fork the repo.