June 06, 2013

I thought it’d be useful to publish some of the seminars/technical papers that I worked on when studying at Tel Aviv University, instead of letting them rot in a drawer. Hope you find it useful/interesting.

Chierchia: On Plurality of Mass Nouns

A technical report on Gennaro Chierchia’s paper, Plurality of Mass Nouns and the Notion of “Semantic Parameter”. Chierchia offers a lattice-theoretical treatment for pluralities of all sorts (especially the mass/count noun distinction). I found it very interesting, but it quite technical and requires some courses in formal semantics so as not to get lost.

I began working on this project with very little expectations, but it introduced me to the world of signal processing in general, and SciPy more specifically. It also forced me to learn Reed-Solomon codes (and since I couldn’t find any Python implementation back in the day, I released reedsolo).

The purpose of the project was to develop an invisible image-watermarking scheme that survives printing (e.g., billboard signs) and sustain considerable damage and noise, much like QR codes. The idea was that you could embed invisible “QR codes” into images, which could be scanned by mobile devices and open a URL. I didn’t get that far, but “lab tests” were positive :)

Anyhow, it could serve as a great introduction to anyone who’s starting with watermarking, wavelets or SciPy.


Earley CFG Parser

A Python (and Java) implementation of the Earley parser for Context Free Grammar (CFG). Earley is the “most efficient” parser for CFG, as it does only the “necessary amount of computation”. It achieves O(n) time for LR(k) grammars, O(n^2) for unambiguous grammars, and O(n^3) in the general case.

This implementation includes more than just the parser itself - it is also able to extract the entire parse forest (all possible derivations) from a sentence. It relates quite well to my previous blog post, Cartesian Tree-Products.

Tree Insertion Grammar (TIG) Parser

Tree Insertion Grammar (TIG) is a formalism that spun off of Tree Adjoining Grammar (TAG). The two are very similar and share common properties – in both, the grammar is defined as trees which are “embedded” one into the other – but TIG allows trees to be wrapped from only one side, not both. This restriction makes TIG of equivalent power to CFG, while TAG is strictly stronger and requires O(n^6) work.

The work below presents the subject in further detail as well as a parser (most probably the only Python TIG parser).

TIG parser code

Iterated Learning Model: a Review (ILM)

Iterated Learning Model (ILM) is an interesting approach to language evolution, which boils down to the observation that all languages must be able to “squeeze” into the bottleneck of language acquisition: if a language is “too complex”, it won’t be able to “compress itself” into the bottleneck (pass on to future generations) and thus must adapt. In other words, languages evolve so that they are “regular enough” to be acuirable by their users.

My paper sums up some of Kirby’s and Brighton’s experiments and offers my criticism concerning ILM. I was very enthusiastic when I approached the subject, but I grew skeptic as I delved into the matter.

More resources: