Easy Syntax for Representing Trees March 07, 2012

I’m working on a parser for Tree Adjoining Grammar (TAG)
for this seminar I’m taking. TAG is an extension of context-free grammar (CFG) that’s more powerful
while still being polynomially-parsable. Anyhow, TAG makes use of “tree production rules” instead of
the “linear” production rules of CFG: instead of S -> NP VP
, you’d have a small tree, the root of
which being S
, having NP
and VP
as its children. Of course these trees can be more than
two-level deep, and they go all sorts of operations such as substitution and adjunction, but that’s
for the parser.
So I needed a compact and (hopefully) readable way to express such trees in my code. At first I used lots of parenthesis, which was ugly and cumbersome, but then I devised this:
class NonTerminal(object):
def __init__(self, name):
self.name = name
def __sub__(self, children):
return Tree(self, children)
def __pos__(self):
return Foot(self)
class Foot(object):
def __init__(self, nonterm):
self.nonterm = nonterm
class Tree(object):
def __init__(self, root, children):
self.root = root
self.children = children
And here’s how you use it:
S = NonTerminal("S")
t1 = S-["e"]
t2 = S-["a", S-["c", +S, "d"], "b"]
# Which represents the following two trees:
#
# t1: t2:
# S S
# | / | \
# | / | \
# e a S b
# / | \
# / | \
# c S* d
#
The peculiar +S
is a way to mark that a leaf node is a foot, which is part of the semantics TAG
(that’s where adjunction takes place). It’s represented in the diagram by the more conventional S*
,
but I had to resort to a unary operator in the code. Anyway, I’m not sure if it’s a recipe or just
a nice trick, but I thought I’d share this.
By the way, you can use both __sub__
and __neg__
to achieve things like S-----X
(i.e.,
more than a single -
sign, to allow for better padding), but I tried to avoid too much ASCII art.
I’d love to hear about other such ideas!