TAU Computational Linguistics Lab

Research

The Tel Aviv University Computational Linguistics Lab uses computational methods to investigate human language acquisition. Researchers in the lab seek to offer a perspective on learning and learnability that is informed by work in theoretical linguistics, psychology, and computer science, and to support collaboration between these fields in addressing a shared research question: what can be learned? The lab's overarching research project is the creation of a fully general model of language acquisition that will allow divergent representations of grammar proposed in the linguistic literature to be evaluated and compared on a computationally and cognitively sound basis. Current projects include research on the acquisition of phonology, using both Optimality Theory and rule-based systems as models; research on the acquisition of mildly context-sensitive grammars in syntax; and a study of the computational advantages and disadvantages of segregating the lexicon from the syntactic component. The Computational Linguistics Lab is part of the Linguistics Department and the Sagol School of Neuroscience.


People



Lab manager: Dani Rodov
danirodov@mail.tau.ac.il

Nur Lan
nlan@ens.fr

Alma Frischoff
almaf@mail.tau.ac.il

Matan Abudy
matan.abudy@gmail.com



TAU Collaborators




Alumni

Maike Züfle
Adi Behar Medrano
Taly Rabinerson
Itamar Shefi
Noa Peled
Iddo Berger
Adam Rimon
Tomer Avraham
Victoria Costa
Sefi Potashnik
Tali Arad

Courses

Fall semester
0627.4090 Advanced Computational Linguistics
0627.4191 Parsing: Computation and Cognition

Spring semester
0627.2222 Computational Linguistics for Beginners
0627.4095 Learning: Computation and Cognition


Course descriptions

Syllabus for 0627.2222 Computational Linguistics for Beginners
This is an introductory class in computational linguistics designed for linguists with little or no background in the subject. The class will not require a programming background. By the end of the course, students will be able to do basic-to-intermediate programming in a language chosen for its suitability to the relevant work. Other topics will include the basics of data structures and algorithms and an introduction to Formal Language Theory.
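To give a flavor of the formal-language material, here is a minimal sketch of a deterministic finite automaton, written in Python purely for illustration (the course does not commit to a particular language); it recognizes the regular language a*b*, i.e., any number of a's followed by any number of b's.

# A minimal sketch of a deterministic finite automaton (DFA), for
# illustration only: it recognizes the regular language a*b* over {a, b}.
TRANSITIONS = {
    ("q0", "a"): "q0",  # still reading a's
    ("q0", "b"): "q1",  # switch to reading b's
    ("q1", "b"): "q1",  # still reading b's
    # ("q1", "a") is missing: an 'a' after a 'b' sends the DFA to a sink.
}
ACCEPTING = {"q0", "q1"}

def accepts(word):
    """Run the DFA on `word`; return True iff it ends in an accepting state."""
    state = "q0"
    for symbol in word:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # implicit non-accepting sink state
            return False
    return state in ACCEPTING

print(accepts("aabbb"))  # True
print(accepts("aba"))    # False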

Syllabus for 0627.4090 Advanced Computational Linguistics
This course will continue to build the skills needed to conduct original research in computational linguistics. Programming ability is required. We will discuss finite-state tools (automata and transducers), as well as Hidden Markov and Maximum Entropy models. We will build a morphological analyzer and a syntactic parser.
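As a hedged illustration of the Hidden Markov model material, here is a minimal Viterbi decoder for a toy part-of-speech HMM; the tags, words, and probabilities are invented for the example and are not taken from the course.

import math

# Toy part-of-speech HMM; all tags, words, and probabilities are invented.
states = ["DET", "NOUN"]
start_p = {"DET": 0.7, "NOUN": 0.3}
trans_p = {"DET": {"DET": 0.1, "NOUN": 0.9},
           "NOUN": {"DET": 0.4, "NOUN": 0.6}}
emit_p = {"DET": {"the": 0.9, "dog": 0.05, "barks": 0.05},
          "NOUN": {"the": 0.05, "dog": 0.5, "barks": 0.45}}

def viterbi(obs):
    """Return the most probable tag sequence for the observed words."""
    # V[t][s] = (log-prob of the best path ending in state s at time t,
    #            backpointer to the best previous state)
    V = [{s: (math.log(start_p[s] * emit_p[s][obs[0]]), None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prev = max(states,
                       key=lambda p: V[t - 1][p][0] + math.log(trans_p[p][s]))
            score = (V[t - 1][prev][0] + math.log(trans_p[prev][s])
                     + math.log(emit_p[s][obs[t]]))
            V[t][s] = (score, prev)
    # Trace the backpointers from the best final state.
    best = max(states, key=lambda s: V[-1][s][0])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

print(viterbi(["the", "dog", "barks"]))  # ['DET', 'NOUN', 'NOUN']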

Syllabus for 0627.4095 Learning: Computation and Cognition
Part I: Computation.
In the first half of the semester we will explore mathematical and computational approaches to learning and learnability. We will study Gold's (1967) theorem, which shows that, under certain assumptions, even very simple classes of languages cannot be learned. We will proceed to discuss probabilistic approaches to learning, such as Horning's modification of Gold's paradigm and Valiant's paradigm of Probably Approximately Correct (PAC) learning. We will discuss artificial neural networks, which have been proposed as a general, biologically motivated approach to learning. We will also cover Monte Carlo methods and genetic algorithms, which have been used to search through large and complex hypothesis spaces. We will end the mathematical part of the semester with the notions of Kolmogorov complexity and Solomonoff induction, which allow us to quantify the total amount of information in a given input.
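As a toy illustration of Gold's identification-in-the-limit setting (our own construction, not course material), the sketch below has a learner conjecture the smallest language in a small, finite, nested class that is consistent with the text seen so far; for such a class this subset-style strategy converges on the target and never changes its mind afterwards.

# A minimal sketch of identification in the limit for a toy, finite,
# nested class of languages; the class itself is invented.
languages = {
    "L1": {"a"},
    "L2": {"a", "ab"},
    "L3": {"a", "ab", "abb"},
}

def learner(text_so_far):
    """Return the first (smallest) language containing all observed strings."""
    for name, lang in languages.items():
        if set(text_so_far) <= lang:
            return name
    return None

# A "text" for L2: an enumeration of its strings, repetitions allowed.
text = ["a", "ab", "a", "ab"]
for i in range(1, len(text) + 1):
    print(text[:i], "->", learner(text[:i]))
# The learner's guesses converge to "L2" and never change again.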

Part II: Cognition.
In the second half of the semester we will look at experimental attempts to determine what can and cannot be learned. We will review the experiments that led behaviorists such as Watson and Skinner to adopt a radical empiricist approach and the evidence that convinced ethologists such as Lorenz and Tinbergen to emphasize instinct. We will examine results that show that humans are very good at extracting certain kinds of statistical regularities from unanalyzed data but very bad at learning other, seemingly similar patterns. We will end the semester by looking at what can be said about the division of labor between innateness and learning based on typological generalizations and at the nuanced view of this connection offered by evolutionary approaches to language change.
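One concrete example of such a regularity, sketched below with an invented syllable stream, is the transitional probability between adjacent syllables, which tends to drop at word boundaries and which infants have been shown to track.

from collections import Counter

# A minimal sketch (our own illustration): transitional probabilities
# between adjacent syllables in an unsegmented stream. Low TP values
# suggest word boundaries. The stream and "words" are invented.
stream = "golatupabikugolatutibudopabiku"  # golatu pabiku golatu tibudo pabiku
syllables = [stream[i:i + 2] for i in range(0, len(stream), 2)]

pairs = Counter(zip(syllables, syllables[1:]))
firsts = Counter(syllables[:-1])

# TP(y | x) = count(x, y) / count(x)
for (x, y), n in sorted(pairs.items()):
    print(f"TP({y} | {x}) = {n / firsts[x]:.2f}")
# Within-word TPs come out at 1.00; TPs across word boundaries are lower.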

Syllabus for 0627.4191 Parsing: Computation and Cognition
Part I: Computation.
In the first half of the semester we will cover mathematical and computational approaches to parsing. We start by reviewing the basic algorithms for parsing with regular and context-free formalisms, both deterministically and probabilistically. We discuss the notions of weak and strong generative capacity, looking in detail at context-sensitive node admissibility conditions, generalized phrase-structure grammar, and the Lambek calculus. We then turn to mildly context-sensitive formalisms, focusing in particular on combinatory categorial grammars, tree-adjoining grammars, and minimalist grammars.
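As an illustration of the basic context-free parsing algorithms mentioned above, here is a minimal CKY recognizer for a toy grammar in Chomsky normal form; the grammar, lexicon, and choice of Python are ours, not the course's.

from itertools import product

# A minimal sketch of the CKY algorithm: recognition with a toy
# context-free grammar in Chomsky normal form. For simplicity the
# grammar maps each pair of daughters to at most one parent.
grammar = {
    ("NP", "VP"): "S",
    ("DET", "N"): "NP",
    ("V", "NP"): "VP",
}
lexicon = {"the": "DET", "dog": "N", "cat": "N", "saw": "V"}

def cky_recognize(words):
    n = len(words)
    # chart[i][j] holds the nonterminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1].add(lexicon[w])  # assumes every word is known
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, c in product(chart[i][k], chart[k][j]):
                    if (b, c) in grammar:
                        chart[i][j].add(grammar[(b, c)])
    return "S" in chart[0][n]

print(cky_recognize("the dog saw the cat".split()))  # True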

Part II: Cognition.
In the second half of the semester we discuss attempts to understand how human parsing works. We start with the classical proposals of Yngve, Miller and Chomsky, and Kimball, and then proceed to characterizations of the memory loads in different parsing strategies. We discuss the Strong Competence Hypothesis and its relation to the question of whether non-canonical constituents should be part of the grammar. We look at proposals that tie processing difficulties to the geometric notion of open dependencies in proof nets, along with other attempts to capture processing costs in terms of resource management, such as Gibson's dependency locality theory. We also discuss approaches, such as Hale's surprisal and entropy-reduction proposals, that relate processing difficulty to the information content of the current input element.
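To make the surprisal idea concrete, the sketch below computes per-word surprisal, -log2 P(w_t | w_{t-1}), under a simple bigram model; the corpus is invented and smoothing is omitted.

import math
from collections import Counter

# A minimal sketch (our own illustration) of surprisal under a bigram
# model. The corpus is invented; there is no smoothing, so an unseen
# bigram would raise an error in a real application.
corpus = "the dog saw the cat the dog ran".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def surprisal(prev, word):
    """Surprisal of `word` given `prev`, in bits: -log2 P(word | prev)."""
    return -math.log2(bigrams[(prev, word)] / unigrams[prev])

sentence = "the dog saw".split()
for prev, word in zip(sentence, sentence[1:]):
    print(f"surprisal({word} | {prev}) = {surprisal(prev, word):.2f} bits")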

Publications


Tel Aviv University ◆ Department of Linguistics ◆ Webb 407

rkatzir@post.tau.ac.il
Lab's Github