Python is Messy
May 07, 2011

A couple of days ago, Rudiger, a user of RPyC, found a rather surprising bug that in turn revealed just how gruesome Python's inner workings are. Rudiger was working with two machines, one 32-bit and the other 64-bit, and one machine had a netref to a remote list. He then tried to execute something as simple as mylist[1:], which, to everyone's surprise, threw a very peculiar exception: OverflowError: Python int too large to convert to C long.

At first, it seemed that the exception originated from the server side, but further investigation showed it actually originated from the client side, propagated to the server side, and then came back to the client side: very weird indeed. I recreated the scenario with two of my machines and fired up Wireshark to see exactly what was going on there. The last packet before the "resonating" exception seemed to be invoking __getslice__ on the client-side list object, passing it 1 as the start index and 9223372036854775807 as the stop index. Where the heck was that number coming from? I added lots of debug prints and whatnot, but that strange number kept appearing there, and it was obviously not my code that placed it there.

A day later, the answer finally struck me: 9223372036854775807 is actually 0x7fffffffffffffff, which is sys.maxint on 64-bit machines. From that point on, the solution was simple, but it had revealed a nasty implementation detail of CPython 2.x. You see, when getting a slice of an object, two methods come into play. The first, deprecated, method is __getslice__, which simply takes two arguments for start and stop. The second, recommended, method is __getitem__, which accepts a slice object instead of an integer. Sadly, <type 'list'> has both, and both are reflected on the netref proxy, which causes this rather surprising behavior:

>>> class Foo(object):
...     def __getitem__(self, x):
...         print "getitem", x
...     def __getslice__(self, *args):
...         print "getslice", args
...
>>> y = Foo()
>>> y[7]
getitem 7
>>> y[7:8]
getslice (7, 8)
>>> y[7:]
getslice (7, 2147483647)
>>> y[7:None]
getitem slice(7, None, None)

As you can see, y[7:] invokes __getslice__ with sys.maxint, while y[7:None] (which is equivalent) invokes __getitem__ with a slice object... how lame! So when the server-side (64-bit) code attempts to execute mylist[1:], it invokes __getslice__ on the client (32-bit) side, passing it the server's "version" of sys.maxint. That value does not fit in a 32-bit C long, so it goes into the C implementation and blows up. Monkeyballs!
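To make the size mismatch concrete, here is a rough sketch that uses the struct module as a stand-in for the conversion to a C long. That stand-in is my assumption for illustration only: the real failure happens inside the list's C implementation, not in struct, and the names MAXINT_64, MAXINT_32 and fits are made up for this example.

```python
import struct

MAXINT_64 = 0x7fffffffffffffff  # sys.maxint on a 64-bit CPython 2.x build
MAXINT_32 = 0x7fffffff          # sys.maxint on a 32-bit CPython 2.x build

# The "strange number" from the packet capture is just 2**63 - 1.
assert MAXINT_64 == 9223372036854775807 == 2**63 - 1
assert MAXINT_32 == 2147483647 == 2**31 - 1


def fits(value, fmt):
    """Return True if value can be packed into the signed C type named by fmt."""
    try:
        struct.pack(fmt, value)
        return True
    except (struct.error, OverflowError):
        return False


print(fits(MAXINT_64, "<q"))  # True: fits a signed 64-bit integer
print(fits(MAXINT_64, "<i"))  # False: too large for a signed 32-bit integer
print(fits(MAXINT_32, "<i"))  # True: the 32-bit side's own maxint is fine
```

The middle case is roughly what the 32-bit side runs into when it is handed the 64-bit side's idea of "until the end".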

So the simple (and only) solution to this issue is to use mylist[1:None] when working across platforms with different word sizes... sorry.
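Why the None form is safe can be sketched on a plain local list (no RPyC involved, and the variable names are just for illustration): an open-ended slice object stores None rather than a platform-sized integer, and the concrete stop index is only computed against the length of the sequence it is finally applied to.

```python
items = list(range(10))

# An open-ended slice object keeps None as its stop, not sys.maxint.
open_ended = slice(1, None)
print(open_ended.stop)  # None

# The concrete bounds are resolved only against a specific length,
# so they are computed by whichever side actually owns the sequence.
start, stop, step = open_ended.indices(len(items))
print((start, stop, step))  # (1, 10, 1)

# On a local list, all three spellings give the same result.
assert items[1:None] == items[1:] == items[open_ended]
```

Since nothing word-size dependent is baked into the slice object, there is nothing in it for the other machine to choke on.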