Mildly context-sensitive language
From Wikipedia, the free encyclopedia
In formal grammar theory, mildly context-sensitive languages are a class of formal languages which can be efficiently parsed, but still possess enough context sensitivity to allow the parsing of natural languages. The concept was first introduced by Aravind Joshi in 1985.
The formal conditions imposed on the class are:
1: The languages must be parsable in polynomial time.
2: The language must have constant growth; this means that the distribution of string lengths should be linear rather than supralinear. This is often guaranteed by proving a pumping lemma for some class of mildly context-sensitive languages.
3: The language should admit limited cross-serial dependencies, allowing grammatical agreement to be enforced between two arbitrarily long subphrases; this condition is not satisfied by context-free grammars. This is formally enforced by requiring that the language consisting of strings concatenated with themselves belong to the class of mildly context-sensitive languages.
Some attempts at creating mildly context-sensitive language formalisms include the linear context-free rewriting systems developed by D. J. Weir, the minimalist grammars of Edward P. Stabler, the head grammars of Carl Pollard, the combinatory categorial grammars developed by Mark Steedman, the linear indexed grammars defined by Gerald Gazdar, and the tree-adjoining grammars developed by Aravind Joshi. The first two of these grammar classes define the same set of languages, while the rest define a single, strictly smaller class; while all languages in both classes are mildly context-senstive and both classes support some cross-serial dependency, Laura Kallmeyer believes that neither class exhausts the full set of mildly context-sensitive languages.
The larger of the classes described above may be parsed by thread automatons, while the smaller one may be parsed by embedded pushdown automatons.
Contents |
[edit] Control Language Hierarchy
A more precisely defined hierarchy of languages that correspond to the mildly context-sensitive class was defined by David J. Weir. Based on the work of Nabil A. Khabbaz, Weir's Control Language Hierarchy is a containment hierarchy of countable set of language classes where the Level-1 is defined as context-free, and Level-2 is the class of tree-adjoining and the other three grammars.
Following are some of the properties of Level-k languages in the hierarchy:
- Level-k languages are properly contained by Level-k+1 language class
- Level-k languages can be parsed in
time - Level-k contains the language
, but not 
- Level-k contains the language
, but not 
Those properties correspond (at least for small k>1) well to Joshi's definition of the mildly context-sensitive languages.
[edit] See also
[edit] Further reading
- Joshi, Aravind; Vijay-Shanker, K. & Weir, David (1991), “The Convergence of Mildly Context-Sensitive Grammar Formalisms”, in Sells, Peter; Shieber, Stuart & Wasow, Thomas, Foundational Issues in Natural Language Processing, Cambridge, MA: MIT Press, pp. 31–81, ISBN 0-262-19303-5, <http://repository.upenn.edu/cgi/viewcontent.cgi?article=1571&context=cis_reports>.
- Vijay-Shanker, K. & Weir, David (1994), “The Equivalence of Four Extensions of Context-Free Grammars”, Mathematical Systems Theory 27 (6): 511–546, ISSN 0025-5661, <http://citeseer.ist.psu.edu/vijay-shanker94equivalence.html>.
- Weir, D. J. (1992). "A geometric hierarchy beyond context-free languages". Jounal of Computer and System Sciences 104 (2): 235-261..
[edit] External links
- Parsing Beyond Context-Free Grammar, by Laura Kallmeyer
- Seminar on tree-adjoining grammars and mildly context-sensitive languages and formalisms, by Laura Kallmeyer
- Mildly Context-Sensitive Grammars, by Aravind Joshi
| Chomsky hierarchy |
Grammars | Languages | Minimal automaton |
|---|---|---|---|
| Type-0 | Unrestricted | Recursively enumerable | Turing machine |
| n/a | (no common name) | Recursive | Decider |
| Type-1 | Context-sensitive | Context-sensitive | Linear-bounded |
| n/a | Indexed | Indexed | Nested stack |
| n/a | Tree-adjoining etc. | (Mildly context-sensitive) | Embedded pushdown |
| Type-2 | Context-free | Context-free | Nondeterministic pushdown |
| n/a | Deterministic context-free | Deterministic context-free | Deterministic pushdown |
| Type-3 | Regular | Regular | Finite |
| n/a | Star-free | Counter-Free | |
| Each category of languages or grammars is a proper subset of the category directly above it, and any automaton in each category has an equivalent automaton in the category directly above it. |
|||

