Hooking Imports for Fun and Profit June 17, 2011
I really love Python... it's so hackable that it just calls for hacking, inspiring your imagination to find ways to stretch its boundaries. This time I decided to investigate into import hooks, to add some missing functionality I wanted to have.
As you probably know, Python uses a flat namespace for packages, that works on a
"first found first served" basis. Packages are simply searched in linear order, as they appear
sys.path: if two directories contain a package named
foo will fetch
package in the first directory. This is normally the desired behavior (as it allows you to
override some modules by changing
PYTHONPATH), but it's also quite limiting.
Consider the nested package namespace used by Java and various other languages (e.g., Haskell),
where packages are normally "deeply nested", as in
In Haskell, for instance, packages are normally placed under their appropriate "categories",
Control.Monad. When you write a new data type, you'll probably put
Data, as in
If we were to use something like that in Python, if would require us to have a single
directory, under which lots of sub-packages must be placed. This might be possible, but it's
certainly not the "right way"; and besides, it implies that code written by Sun and IBM is
somehow related (after all, they share the
__init__.py file in
com/), which means one may
affect the other (for better or worse :)).
It would make much more sense to have separately-installed packages, e.g.,
site-packages/com.sun.foo.bar, where each package is
independent of the other. Sure, they might share a common
com prefix -- but that's all.
This is especially useful in corporate environments, where multiple teams share common packages
(which usually get very unoriginal names, say,
common), and name collisions are very likely.
It's not a joke: at my work-place, we're know reorganizing our code after such problems. Also,
from a marketing point-of-view, it might make more sense for your customers to
import mycompany.foobar than just
But most importantly -- it's composable. Nested packages allow you to "inject" your package
into another namespace. Take
twisted for instance: it's become so large that it had made more
sense to split it up into sub-packages (
twisted.news, ...), and allow
end users to choose which of them they wish to install. However, since Python wouldn't let you have
site-packages/twisted-conch, they resorted to hacking
into doing what they want. If nested packages were supported, you would have a core
package, with separate add-on packages like
twisted-conch. So why not, really?
Enter nimp ("nested imports"). Without going into too many
nimp is a meta-import hook -- it modifies the way
import statements work.
Specifically, it scans
sys.path and "merges" packages that begin with a common prefix into
"logical packages". For instance, if you have
com-sun-bar on your
nimp will create namespace packages for
This would allow code like
import com.ibm.foo or
from com.sun.bar import vodka to work
transparently. All you need to do is run
import nimp; nimp.install() (you can also put it
site.py, so it would happen every time you run a Python process), and you're ready to go.
This was the first time I wrote an import hook, and I really liked how I easy it was to change
the import mechanism. So today I had another idea -- lazy imports. Of course there's
PEAK's lazyModule and this
quite complicated recipe,
but I thought, why not combine the two. Writing code like
from peak... import lazyModule; foo = lazyModule("foo")
is cumbersome, while the recipe attempts is too make everything lazy.
Instead, I created a module called
__lazy__, that when imported, installs a meta-import hook.
This import hook handles only modules that begin with
__lazy__, so instead of importing them,
it returns an "on-demand-loaded module" (i.e., when you try to access an its attributes).
Using it is really simple:
>>> from __lazy__ import telnetlib >>> telnetlib <OnDemandModule 'telnetlib'> >>> telnetlib.Telnet # forces loading <class telnetlib.Telnet at 0x015B08B8> >>> telnetlib <module 'telnetlib' from 'C:\Python27\lib\telnetlib.pyc'> >>> from __lazy__.xml.dom import minidom >>> minidom <OnDemandModule 'xml.dom.minidom'> >>> minidom.parseString # forces loading <function parseString at 0x01659CF0> >>> minidom <module 'xml.dom.minidom' from 'C:\Python27\lib\xml\dom\minidom.pyc'>
Note, though, that using
from __lazy__.x.y import z forces the loading of
x, since we use the
dot operator on it. The
from __lazy__ import foo is "truly lazy"
You can get the code of