Easy Syntax for Representing Trees March 07, 2012
I’m working on a parser for Tree Adjoining Grammar (TAG)
for this seminar I’m taking. TAG is an extension of context-free grammar (CFG) that’s more powerful
while still being polynomially-parsable. Anyhow, TAG makes use of “tree production rules” instead of
the “linear” production rules of CFG: instead of S -> NP VP
, you’d have a small tree, the root of
which being S
, having NP
and VP
as its children. Of course these trees can be more than
two-level deep, and they go all sorts of operations such as substitution and adjunction, but that’s
for the parser.
So I needed a compact and (hopefully) readable way to express such trees in my code. At first I used lots of parenthesis, which was ugly and cumbersome, but then I devised this:
And here’s how you use it:
The peculiar +S
is a way to mark that a leaf node is a foot, which is part of the semantics TAG
(that’s where adjunction takes place). It’s represented in the diagram by the more conventional S*
,
but I had to resort to a unary operator in the code. Anyway, I’m not sure if it’s a recipe or just
a nice trick, but I thought I’d share this.
By the way, you can use both __sub__
and __neg__
to achieve things like S-----X
(i.e.,
more than a single -
sign, to allow for better padding), but I tried to avoid too much ASCII art.
I’d love to hear about other such ideas!