Mildly context-sensitive language

From Wikipedia, the free encyclopedia

In formal grammar theory, mildly context-sensitive languages are a class of formal languages which can be efficiently parsed, but still possess enough context sensitivity to allow the parsing of natural languages. The concept was first introduced by Aravind Joshi in 1985.

The formal conditions imposed on the class are:

1: The languages must be parsable in polynomial time.

2: The language must have constant growth; this means that the distribution of string lengths should be linear rather than supralinear. This is often guaranteed by proving a pumping lemma for some class of mildly context-sensitive languages.

3: The language should admit limited cross-serial dependencies, allowing grammatical agreement to be enforced between two arbitrarily long subphrases; this condition is not satisfied by context-free grammars. This is formally enforced by requiring that the language consisting of strings concatenated with themselves belong to the class of mildly context-sensitive languages.

Some attempts at creating mildly context-sensitive language formalisms include the linear context-free rewriting systems developed by D. J. Weir, the minimalist grammars of Edward P. Stabler, the head grammars of Carl Pollard, the combinatory categorial grammars developed by Mark Steedman, the linear indexed grammars defined by Gerald Gazdar, and the tree-adjoining grammars developed by Aravind Joshi. The first two of these grammar classes define the same set of languages, while the rest define a single, strictly smaller class; while all languages in both classes are mildly context-senstive and both classes support some cross-serial dependency, Laura Kallmeyer believes that neither class exhausts the full set of mildly context-sensitive languages.

The larger of the classes described above may be parsed by thread automatons, while the smaller one may be parsed by embedded pushdown automatons.

Contents

[edit] Control Language Hierarchy

A more precisely defined hierarchy of languages that correspond to the mildly context-sensitive class was defined by David J. Weir. Based on the work of Nabil A. Khabbaz, Weir's Control Language Hierarchy is a containment hierarchy of countable set of language classes where the Level-1 is defined as context-free, and Level-2 is the class of tree-adjoining and the other three grammars.

Following are some of the properties of Level-k languages in the hierarchy:

  • Level-k languages are properly contained by Level-k+1 language class
  • Level-k languages can be parsed in O(n^{3\cdot2^{k-1}}) time
  • Level-k contains the language \{a_1^n ... a_{2^k}^n|n\geq0\}, but not \{a_1^n ... a_{2^{k+1}}^n|n\geq0\}
  • Level-k contains the language \{w^{2^{k-1}}|w\in\{a,b\}^*\}, but not \{w^{2^{k-1}+1}|w\in\{a,b\}^*\}

Those properties correspond (at least for small k>1) well to Joshi's definition of the mildly context-sensitive languages.

[edit] See also

[edit] Further reading

  • Weir, D. J. (1992). "A geometric hierarchy beyond context-free languages". Jounal of Computer and System Sciences 104 (2): 235-261. .

[edit] External links


Automata theory: formal languages and formal grammars
Chomsky
hierarchy
Grammars Languages Minimal
automaton
Type-0 Unrestricted Recursively enumerable Turing machine
n/a (no common name) Recursive Decider
Type-1 Context-sensitive Context-sensitive Linear-bounded
n/a Indexed Indexed Nested stack
n/a Tree-adjoining etc. (Mildly context-sensitive) Embedded pushdown
Type-2 Context-free Context-free Nondeterministic pushdown
n/a Deterministic context-free Deterministic context-free Deterministic pushdown
Type-3 Regular Regular Finite
n/a Star-free Counter-Free
Each category of languages or grammars is a proper subset of the category directly above it,
and any automaton in each category has an equivalent automaton in the category directly above it.
Languages