Hi There!
I'm Tomer Filiba, the author of several open source Python projects (most notably RPyC, Construct, and Agnos). After a long while of scattering my stuff around the web (the Python Cookbook, my old site, SourceForge, Google Code, and what not), I decided to "consolidate my portfolio" into a single website.
This site also serves as my blog (including my [projects](/projects)' development blogs), and I hope you'll find it interesting.
Latest blog posts
Following my Deducible UI post, and following some of the criticism it had received, I'd like to share something I've been working on (read: experimenting with) at my work place. You see, we have some "interactive wizards" that storage admins use to connect storage arrays to their hosts (say, a DB server). These wizards prompt you with questions like what's your username, the name of the pool/volume, whether it's an iSCSI or a Fibre Channel connection, etc., and then they go and perform what you've asked for.

These wizards operate in a terminal environment, but we've had thoughts to make GUI/web versions of them. This would be a considerable effort with the current design. Another issue they currently have is the mixing of "business logic" and presentation together. For instance, the code that scans the devices attached to your host also prints ANSI-colored messages or reports its progress. All in all it works fine, but there's lots of room for improvement.
I began to investigate this corner a month or two ago. The initial observation was that such wizards have a pretty rigid and repetitive structure, thus we can find some abstraction or a "toolkit" for "expressing" wizards more compactly. This has also led to the realization that once the business logic and presentation are separate, there's no reason to limit ourselves to terminal-based interaction: our wizard-toolkit could do the plumbing and work with terminals, ncurses, GUIs, web-browsers, etc. The business logic would remain oblivious, and we could have a nice GUI at zero-cost!
There was also a second issue of styling, i.e., printing text in color, that I wanted to get rid of.
This part was easy: I thought, why not employ the model of HTML and CSS? Let's separate the
structure (semantics) of the text from its styling. Instead of printing a banner for titles,
we'll display a Title object, whose exact appearance is determined by a "style sheet" (a class,
of course, not actually a text document).
For instance, when we're using a color-enabled terminal, the title would be printed in bold and
followed by an empty line; but if our terminal is color-blind, we'll render the text centered and
surrounded by = marks. Another example is error-handling: instead of printing an error message
in red every time, we'll display an Error object; on a terminal, this would be rendered as
red text, but when running in a GUI, rendering this object would pop up a message box.
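To make the structure/style split concrete, here is a minimal sketch of how it might look. The class names `AnsiStyle` and `PlainStyle` and the `render()` method are made up for illustration; they are assumptions, not the toolkit's actual API.

```python
class Title:
    """Semantic element: a title, with no opinion on how it looks."""
    def __init__(self, text):
        self.text = text


class Error:
    """Semantic element: an error message."""
    def __init__(self, text):
        self.text = text


class AnsiStyle:
    """Hypothetical style sheet for color-enabled terminals."""
    def render(self, elem):
        if isinstance(elem, Title):
            # bold; the trailing newline becomes an empty line once printed
            return "\033[1m" + elem.text + "\033[0m\n"
        if isinstance(elem, Error):
            # red text
            return "\033[31m" + elem.text + "\033[0m"
        return str(elem)


class PlainStyle:
    """Hypothetical style sheet for terminals without color support."""
    def __init__(self, width=40):
        self.width = width

    def render(self, elem):
        if isinstance(elem, Title):
            # centered and surrounded by = marks
            return (" " + elem.text + " ").center(self.width, "=")
        if isinstance(elem, Error):
            return "ERROR: " + elem.text
        return str(elem)


# The calling code only says *what* it displays; the style decides *how*.
for style in (AnsiStyle(), PlainStyle()):
    print(style.render(Title("hello world")))
    print(style.render(Error("device not found")))
```

A GUI style sheet would follow the same shape, except that rendering an `Error` would open a message box instead of returning a string.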
I'm going to ignore this for the rest of this post, as this is really a side issue.
Now let's get to expressing wizards, or more generally, dialogs. Following some earlier iterations,
I came to the model where a dialog is a "container object" that's made of dialog elements. These
elements can be output-only (such as a welcome message), or input-output (such as a message
telling you to choose one of the available options). A dialog is "executed" by a DialogRunner that
renders it and returns the results gotten from the user. It's quite important to note that dialog
elements within a single dialog cannot be interdependent -- that is, if you want to ask the user
for his name and then show "Hi there %s" with the user's name, this has to be done as two,
serial dialogs.
That was quite a lot of babble -- let's see this in action:
class MyApp(WizardApp):
    def main(self):
        iscsi = Option("iSCSI")
        fc = Option("FC")
        d = Dialog(
            Text(Title("hello world")),
            Input("un", "Username"),
            Password("pw", "Password"),
            Choice("conf", "What do you want to configure?", [iscsi, fc]),
        )
        res = self.ui.run(d)
        if res["conf"] == fc:
            self.config_fc(res)
        else:
            self.config_iscsi(res)
    ...
if len(sys.argv) == 2 and sys.argv[1] == "--gtk":
    MyApp.run(GtkDialogRunner("My App"))
else:
    MyApp.run(TerminalDialogRunner(ANSIRenderer))
It's a short and incomplete snippet of course, as I'm only going to cover the big picture. The
main function creates a dialog object d and passes it to ui.run, which "runs" the dialog
and returns the results, as a dictionary. Notice that the dialog elements Input, Password
and Choice all take a first parameter -- this is the key under which the result would be placed
in the returned dictionary, e.g., res["un"] would hold the user-provided user name, and
res["pw"] would hold the password. Text, on the other hand, is an output-only element,
so it doesn't return anything and doesn't take a key. Long story short, we're asking the user
to enter some information and choose one of two options, and then continue processing based on the
selected option. At the bottom, we determine how to run the application based on a command-line
switch: if --gtk is given, we'll run the dialogs through the GtkDialogRunner; otherwise,
we'll use the TerminalDialogRunner.
And what does it look like? When running on a terminal:
And with a single command-line switch, we run as a GTK application:
So of course it's far from perfect, but then again, it's a small research project I've only put ~15 hours into. It suffers from some of the problems I've listed in the deducible UI post, for instance, the GUI hangs when the business logic performs blocking tasks. This could be solved by moving to a reactor-based model, but I've tried to keep the existing wizard code intact as much as possible. A hanging GUI is not nice, but it's not the end of the world either, and there are numerous ways to overcome this.
Another benefit this design brings along is the ability to automate testing by using mock dialog
runners. Since our business logic is only exposed to the returned dictionary, we can use a
dialog runner that actually displays nothing and returns a scripted scenario each time. We can even
go further: because our business logic "talks" in high-level primitives like Choice, we can compute
the Cartesian product of all choices and run through each of them. We can show that we've covered
all paths! And we can do this automatically... without people hitting buttons and keeping logs of
their progress.
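As a rough illustration of the testing idea, here is a self-contained sketch. `Input`, `Choice` and `Dialog` are simplified stand-ins for the real elements, and `ScriptedRunner`, `all_choice_combinations`, `business_logic` and the extra "mode" choice are hypothetical names invented for this example.

```python
import itertools


class Input:
    def __init__(self, key, prompt):
        self.key, self.prompt = key, prompt


class Choice:
    def __init__(self, key, prompt, options):
        self.key, self.prompt, self.options = key, prompt, options


class Dialog:
    def __init__(self, *elements):
        self.elements = elements


class ScriptedRunner:
    """Displays nothing; answers every dialog from a prepared dictionary."""
    def __init__(self, answers):
        self.answers = answers

    def run(self, dialog):
        return {e.key: self.answers[e.key] for e in dialog.elements}


def all_choice_combinations(dialog):
    """Yield one dict per element of the Cartesian product of all Choices."""
    choices = [e for e in dialog.elements if isinstance(e, Choice)]
    keys = [c.key for c in choices]
    for combo in itertools.product(*(c.options for c in choices)):
        yield dict(zip(keys, combo))


DIALOG = Dialog(
    Input("un", "Username"),
    Choice("conf", "What do you want to configure?", ["iSCSI", "FC"]),
    Choice("mode", "Mode", ["auto", "manual"]),   # made-up second choice
)


def business_logic(ui):
    """Stand-in for wizard code: it only ever sees the returned dictionary."""
    res = ui.run(DIALOG)
    return "fc" if res["conf"] == "FC" else "iscsi"


# Drive the business logic through every combination of choices.
visited = []
for combo in all_choice_combinations(DIALOG):
    answers = dict(combo, un="test-user")
    path = business_logic(ScriptedRunner(answers))
    visited.append((combo["conf"], combo["mode"], path))

print(visited)
```

This only enumerates the choices of a single dialog; a wizard made of several serial dialogs would need a harness that also follows the branches taken between them.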
Anyway, I just wanted to show that it's feasible. I'm not releasing any code as this project is currently in very early stages, and it's something I do at work. Perhaps we'll open-source it in the future, if it proves useful enough.
Hurrah! I just got me these:
After the CS secretariat refused to let me take math courses as electives (thus forcing me into taking boring stuff like SQL), I think Dover books and I are going to become good friends. It's not like I have plenty of time to read them, but even just seeing them on my bookshelf would make me a happier person :)
Next on my wish-list are some books on logics and proof theory, category theory, information theory, and automata and computability. I'll get there some day...

When I was working on Agnos, a cross-language RPC framework, I had a lot of code-generation to do in a variety of languages (Python, Java, C#, and C++). At the early stages, I just appended strings to a list. It was quick and dirty, and it got the job done... but that wasn't enough, of course. I've lost the original code already, but it looked something like this:
def generate_proxy(typeinfo):
    lines = [
        "public class %sProxy {" % (typeinfo.name,),
        "    private int uid;",
        "    public %sProxy(int uid) {" % (typeinfo.name,),
        "        this.uid = uid;",
        "    }",
    ]
    for attr in typeinfo.attributes:
        if attr.get:
            lines.append("    public %s get%s() {" % (attr.typename,
                attr.name,))
            lines.append("        // ...")
            lines.append("    }")
        if attr.set:
            lines.append("    public void set%s(%s value) {" % (attr.name,
                attr.typename))
            lines.append("        // ...")
            lines.append("    }")
    lines.append("}")
    return lines
# ...
lines = []
for ti in typeinfos:
    lines.extend(generate_proxy(ti))
# open for writing; the default mode is read-only
with open("foo.java", "w") as f:
    f.write("\n".join(lines))
There are several problems with this approach. First of all, it's very cumbersome and fragile. If you forget a comma in the list, two adjacent strings will be concatenated. Also, you have to do everything yourself, like remembering to close brackets, add semicolons, do the right indentation, etc. If you wished to split this code into functions, the functions you call would have to know the indentation level you're calling them at, or the generated code would be unreadable. This might seem negligible, but think of languages where indentation matters, like Python...
The fundamental problem with this approach (and similar ones) is that the code generator does not reflect the structure of the generated code. The two are disparate, while it's quite obvious they should be correlated.
In order to solve this, I turned to context managers, a feature I highly value. Conceptually, context managers provide a way to bind beginning-and-end into a single entity; this is normally used for resource management -- but we can leverage this construction further (I'll review this in a different post). Here, I've used them to create nested blocks, which allowed me to reflect the structure of the generated code in the code generator.
Without going into too many details, I defined a Module class that exposes a block()
context manager and a stmt() function. The module holds a "stack" of blocks, and entering a new
block pushes it onto the stack. Statements are then appended to the topmost block on the stack.
Now, because this framework is "language-aware", it can encapsulate language-specific details.
For instance, in Java, a block will be indented correctly and wrapped by brackets; in Python,
we'll append colons to the opening line and indent the block; in C++, if the block begins with
class, struct or enum, we'll append a trailing semicolon as well.
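For readers who want to see the mechanics, here is a minimal sketch of such a block stack, built on `contextlib.contextmanager`. It is not the Agnos implementation: the names `Block` and `JavaModuleSketch` and the `render()` method are assumptions made for this example, and only Java-style rendering is covered.

```python
from contextlib import contextmanager


class Block:
    """A header line plus nested children (statement strings or Blocks)."""
    def __init__(self, header):
        self.header = header
        self.children = []


class JavaModuleSketch:
    INDENT = "    "

    def __init__(self):
        self.root = Block(None)
        self.stack = [self.root]   # the topmost block receives new statements

    def stmt(self, text, *args):
        # the language-aware part: Java statements end with a semicolon
        self.stack[-1].children.append(text.format(*args) + ";")

    @contextmanager
    def block(self, header, *args):
        blk = Block(header.format(*args))
        self.stack[-1].children.append(blk)
        self.stack.append(blk)     # entering the block pushes it
        try:
            yield blk
        finally:
            self.stack.pop()       # leaving the block pops it

    def _render(self, blk, level, out):
        for child in blk.children:
            if isinstance(child, Block):
                out.append(self.INDENT * level + child.header + " {")
                self._render(child, level + 1, out)
                out.append(self.INDENT * level + "}")
            else:
                out.append(self.INDENT * level + child)
        return out

    def render(self):
        return "\n".join(self._render(self.root, 0, []))


def emit_getter(m, typename, name):
    # called from inside someone else's block; no indentation level is passed
    with m.block("public {0} get{1}()", typename, name):
        m.stmt("return this.{0}", name.lower())


m = JavaModuleSketch()
with m.block("public class {0}Proxy", "Foo"):
    m.stmt("private int uid")
    emit_getter(m, "int", "Uid")

rendered = m.render()
print(rendered)
```

Because the stack lives in the module object, `emit_getter` lands at the right nesting depth without knowing where it was called from.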
Here's how it works:
m = JavaModule()
m.stmt("import foo")
m.stmt("import bar")
m.sep()  # an empty line
#...
def generate_proxy(m, typeinfo):
    BLOCK = m.block
    STMT = m.stmt
    with BLOCK("public class {0}Proxy", typeinfo.name):
        STMT("private int uid")
        with BLOCK("public {0}Proxy(int uid)", typeinfo.name):
            STMT("this.uid = uid")
        for attr in typeinfo.attributes:
            if attr.get:
                with BLOCK("public {0} get{1}()", attr.typename, attr.name):
                    pass
            if attr.set:
                with BLOCK("public void set{0}({1} value)", attr.name,
                        attr.typename):
                    pass
# ...
for ti in typeinfos:
    generate_proxy(m, ti)
m.render_to_file("foo.java")
So what have we gained?
- The code is much shorter and more concise
- Brackets, semicolons and indentation come out-of-the-box
- We're no longer working with flat lists of strings -- we're working with hierarchical entities that reflect the structure of the generated code
- And the other way around -- the structure of the generated code is reflected in the generating code; nested code is indeed nested inside BLOCKs, thus the "generatee" and generator are visually and semantically correlated.
- We can easily split our code into functions, as the module maintains an internal stack. If f() opened a block and called g() under it, the code that g() generates will be placed and indented correctly.
I tried to keep my code quite general, so I haven't defined all of the target language's constructs, but of course we could do that, or at least head in that direction. It might look like this:
def generate_proxy(m, typeinfo):
    with m.CLASS(typeinfo.name + "Proxy", ["public"]):
        m.FIELD("int", "uid", ["private"])
        with m.CTOR(["int uid"]):  # CTOR gets the name of the current class
            m.stmt("this.uid = uid")
However, there's a question of where we "put our foot down", or we'll end up writing Java Combinators for Python... and then we'll be writing Java in Python. No need for that, thank you very much.
The full source code can be found in the Agnos repository.
A Brief History
I like automating things. I don't like having to repeat myself: my dream is to always be able
to add only the necessary amount of information in order to make something possible.
This is one reason, for instance, why I hate expressions like
ArrayList<String> x = new ArrayList<String>(); ... it always makes me feel like I'm talking to a
dimwit (the compiler).
In 2006/7, I wrote some demos for RPyC to show how easy network-related
tasks become. I chose something rather complex, a chat client, to show that all the code sums
up to a few lines: clients invoke a method on the server, say broadcast(str), and the server
then invokes callbacks on all of its clients, sending them the message.
In order to make it usable, I had to write a GUI: I chose Tk, because it comes with Python and is quite simple; I knew there were better toolkits, but my GUI was meant to be basic enough to be doable in any toolkit. It occurred to me, then, that I wrote ~20 lines showcasing RPyC and ~100 lines of horrible GUI code, and that something must be really wrong here. And by here I mean everywhere.
Note: throughout the article, I'm using the word GUI to mean any interactive user interface, be it graphical (Qt, GTK, wxWidgets, ...) or terminal-based (curses and the like). Basically, anything that doesn't block on a single line of input, like shells.
GUI Designers
So you might say, "Dude, just use QtDesigner or something". A GUI designer lets you visually place components and makes your life much easier -- drag and drop your widgets and double-click on a button to write its action. Very easy indeed. But I would like to offer a different angle on the subject: just like the invention of the teacup has hindered the technological advance of China, so do GUI designers hinder us from developing better GUIs. These designers offer a local optimum which we fail to surpass, and this leaves us with the mediocre UIs and development tools we have today. And don't get me going about XAML.
Think about it: you have to design a GUI. So yeah, it's kind of simple, but doesn't that break DRY? You have the code and you have the GUI -- two faces of the same idea. Obviously, one should be derived from the other.
For the lion's share of programs, the UI is highly deterministic -- there's some information that needs to be displayed to, or gotten from, the user, and the binding is trivial. Consider a login screen: you want to get a username and a password, use them somehow, and proceed to the next screen. This is a repeating task, and I'd guess that for ~80% of the programs in the world, it's easy enough to automatically deduce how the UI should look, given the task at hand. And I'm not talking about machine learning algorithms or designing "families of tasks" -- way simpler than that! Just define a mapping between programmatic primitives and their visual representation.
Deducing UI
The ultimate goal is to take "pure code", unaware of UI, and by adding the necessary metadata, be able to automatically create ("deduce") a GUI for it. In fact, I'd like to expose programmatic APIs to a human -- completely interchangeable programmatic- and human- interfaces. Think how cool it could be to import Adobe Photoshop and run a directory full of pictures through its filters, instead of doing so through the UI... without Adobe having to write a separate GUI and programming toolkit.
The UI needn't be eye-candy, at least at the beginning; it just has to be good-enough. It won't work for games or complex applications like Office, but it would be just fine for a chat client. Let's assume the following mapping:
- An object is represented by a window
- Read-only instance attributes are represented as labels
- Writable instance attributes are represented as textboxes
- Methods are represented by buttons. If a method requires arguments, it would be preceded by textboxes
Of course we could change textboxes and labels to reflect the attribute's or argument's type --
DateTime would be represented by a DatePicker, int could be represented by a number box with
up/down arrows, etc. And of course the framework is free to change the mapping however it wants,
to achieve better, more coherent representation. The mapping above is just a rough draft.
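The rough mapping above can be sketched with plain introspection. This is only an illustration in Python, not the prototype mentioned in this post: `deduce_widgets` is a hypothetical helper, it returns widget descriptions rather than building a real window, and the read-only `full_name` property is added here just to exercise the "label" rule.

```python
import inspect


def deduce_widgets(obj):
    """Return a flat list of (widget_kind, name) pairs describing a window for obj."""
    widgets = []
    # writable instance attributes -> textboxes
    for name in vars(obj):
        if not name.startswith("_"):
            widgets.append(("textbox", name))
    for name, member in inspect.getmembers(type(obj)):
        if name.startswith("_"):
            continue
        if isinstance(member, property):
            # a property without a setter is read-only -> label
            kind = "label" if member.fset is None else "textbox"
            widgets.append((kind, name))
        elif inspect.isfunction(member):
            # one textbox per argument (skipping self), then the button itself
            params = list(inspect.signature(member).parameters)[1:]
            for param in params:
                widgets.append(("textbox", name + "." + param))
            widgets.append(("button", name))
    return widgets


class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @property
    def full_name(self):
        return self.first_name + " " + self.last_name

    def dance(self):
        pass

    def eat(self, foodstuff):
        pass


widgets = deduce_widgets(Person("John", "Smith"))
for kind, name in widgets:
    print(kind, name)
```

A real framework would replace the tuples with toolkit widgets and pick the widget type from the attribute's or argument's type, as described above.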
Now, instances of a class like the following:
class Person {
    public String firstName;
    public String lastName;
    public DateTime birthdate;
    public void dance() {...}
    public void eat(String foodstuff) {...}
}
would turn into

with just a little bit of binding in the form of:
class Main {
    static public void Main(String[] args) {
        Person p = new Person(...);
        guify(p);
    }
}
Straightforward, isn't it? You can already begin to see the benefits. This framework would obviously require some sort of decoration (annotations in Java, attributes in C#, ...) on which classes and which class members are to be exposed, and perhaps some extra metadata, like a picture to show instead of a method's name, or some layout information -- but it's perfectly doable. And we can make it better looking (unlike my beautiful ASCII art example), by using better UI primitives and a better mapping between objects and their representation; but let's leave it for now.
I wrote a simple prototype of this and lo and behold, it actually worked! But when you try to
use it in real-life applications, the going gets tough: things are updated behind the scenes
(not through our UI framework) and we need to reflect these changes in the UI. For instance,
an element is added to a list via the list's add() method -- how can the GUI become aware of that?
Well, we can use observable objects, which the GUI
would observe; so instead of using an ArrayList<T>, you'd simply use an ObservableArrayList<T>.
But creating an observable counterpart for every class is a considerable effort on the framework's side, and it breaks software modularity: the framework has to be aware of every 3rd party class that you wish to expose, or at least allow you to provide the means to expose them. Another downside of this scheme is that your code becomes aware of the GUI: if we've so far managed to keep our code clean of GUI primitives (we only required some metadata), all of a sudden you must replace your lists with GUI-observable-lists. Bummer.
Another issue is that using synchronous programming techniques (blocking operations) does not play well with this model: when does the GUI get its "runtime"? How can we keep it from freezing? Who's providing the entry-point of the program? Does it run in a separate thread? If so, we risk polluting our code with GUI-related locks (which defeats the whole purpose); and besides, threads suck and the incurred complexity is never worth it. The only feasible option is asynchronous programming (via a reactor) -- but that requires that your code be programmed this way, and it's quite nasty to write such code without proper language support (e.g., closures, coroutines, etc.).
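To give a feel for what an observable counterpart involves, here is a small Python sketch. `ObservableList` and its `observe()` method are hypothetical names for this example; note that it wraps only two of the list's mutators, which hints at the effort described above.

```python
class ObservableList(list):
    """A list that notifies registered callbacks on append() and remove().

    Sketch only: extend(), insert(), pop(), slice assignment and the rest
    would each need the same treatment before a GUI could rely on it.
    """
    def __init__(self, *args):
        super().__init__(*args)
        self._observers = []

    def observe(self, callback):
        # callback(event, item) is invoked after every wrapped mutation
        self._observers.append(callback)

    def _notify(self, event, item):
        for callback in self._observers:
            callback(event, item)

    def append(self, item):
        super().append(item)
        self._notify("added", item)

    def remove(self, item):
        super().remove(item)
        self._notify("removed", item)


# A GUI would register a callback that refreshes its list widget;
# here the events are simply recorded.
events = []
contacts = ObservableList(["alice"])
contacts.observe(lambda event, item: events.append((event, item)))
contacts.append("bob")
contacts.remove("alice")
print(events)
```

The business code now has to construct an `ObservableList` instead of a plain list, which is exactly the GUI-awareness leak the paragraph above complains about.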
"No Way"
As I said, I've been toying with this idea since 2006, and I always get the same response from fellow programmers: "it would never work", "it won't be good enough", "no one would want to use it", "users need their eye-candy", "I need tight control over the layout", and what not. Skeptics galore. My answer is always the same: you never know what your user wants -- so who are you to decide? And besides, you're always too lazy to support proper customization of UI, so your user must live with your decisions.
Sure, there are books and dissertations about UX, and you've read them all; but why not just provide good-enough defaults, and let the rest be customized by the user? Let the framework deduce a sane layout for your code -- but let's make everything movable/resizable/dockable. This way, if it makes more sense to place button X to the left of button Y, the user can do so himself. And let's remember the user's preferences in a file, which we'll load each time the application runs. And by "user" I'm also talking about your UX expert -- let him/her decide on a default look (i.e., the preferences file) for the application, which will be shipped with your product, but the end-user would still be able to move things around. Wouldn't it be easier? And if you insist, here's the place to stick some machine-learning magic, in order to deduce better UIs by default.
So anyhow, I had a working proof-of-concept somewhere, but I think I lost it. It wouldn't be too hard to recreate it, but at the moment I'm more concerned with UI combinators, which I'll cover in a future post. Fully deducing a UI is quite a challenge, as I detailed above, but it's doable nonetheless, and the added-value is huge! I'll get back to it some day, perhaps after I have better UI combinators... but in the meantime, is there anyone in the audience who's willing to pick it up?







