Easy Syntax for Representing Trees March 07, 2012
I'm working on a parser for Tree Adjoining Grammar (TAG)
for this seminar I'm taking. TAG is an extension of context-free grammar (CFG) that's more powerful
while still being polynomially-parsable. Anyhow, TAG makes use of "tree production rules" instead of
the "linear" production rules of CFG: instead of
S -> NP VP, you'd have a small tree, the root of
VP as its children. Of course these trees can be more than
two-level deep, and they go all sorts of operations such as substitution and adjunction, but that's
for the parser.
So I needed a compact and (hopefully) readable way to express such trees in my code. At first I used lots of parenthesis, which was ugly and cumbersome, but then I devised this:
class NonTerminal(object): def __init__(self, name): self.name = name def __sub__(self, children): return Tree(self, children) def __pos__(self): return Foot(self) class Foot(object): def __init__(self, nonterm): self.nonterm = nonterm class Tree(object): def __init__(self, root, children): self.root = root self.children = children
And here's how you use it:
S = NonTerminal("S") t1 = S-["e"] t2 = S-["a", S-["c", +S, "d"], "b"] # Which represents the following two trees: # # t1: t2: # S S # | / | \ # | / | \ # e a S b # / | \ # / | \ # c S* d #
+S is a way to mark that a leaf node is a foot, which is part of the semantics TAG
(that's where adjunction takes place). It's represented in the diagram by the more conventional
but I had to resort to a unary operator in the code. Anyway, I'm not sure if it's a recipe or just
a nice trick, but I thought I'd share this.
By the way, you can use both
__neg__ to achieve things like
more than a single
- sign, to allow for better padding), but I tried to avoid too much ASCII art.
I'd love to hear about other such ideas!