Mildly context-sensitive language

From Wikipedia, the free encyclopedia

In formal grammar theory, mildly context-sensitive languages are a class of formal languages which can be efficiently parsed, but still possess enough context sensitivity to allow the parsing of natural languages. The concept was first introduced by Aravind Joshi in 1985.

The formal conditions imposed on the class are:

1: The languages must be parsable in polynomial time.

2: The language must have constant growth; this means that the distribution of string lengths should be linear rather than supralinear. This is often guaranteed by proving a pumping lemma for some class of mildly context-sensitive languages.

3: The language should admit limited cross-serial dependencies, allowing grammatical agreement to be enforced between two arbitrarily long subphrases; this condition is not satisfied by context-free grammars. This is formally enforced by requiring that the language consisting of strings concatenated with themselves belong to the class of mildly context-sensitive languages.

Some attempts at creating mildly context-sensitive language formalisms include the linear context-free rewriting systems developed by D. J. Weir, the minimalist grammars of Edward P. Stabler, the head grammars of Carl Pollard, the combinatory categorial grammars developed by Mark Steedman, the linear indexed grammars defined by Gerald Gazdar, and the tree-adjoining grammars developed by Aravind Joshi. The first two of these grammar classes define the same set of languages, while the rest define a single, strictly smaller class; while all languages in both classes are mildly context-senstive and both classes support some cross-serial dependency, Laura Kallmeyer believes that neither class exhausts the full set of mildly context-sensitive languages.

The larger of the classes described above may be parsed by thread automatons, while the smaller one may be parsed by embedded pushdown automatons.

1 Control Language Hierarchy
2 See also
3 Further reading
4 External links

[edit] Control Language Hierarchy

A more precisely defined hierarchy of languages that correspond to the mildly context-sensitive class was defined by David J. Weir. Based on the work of Nabil A. Khabbaz, Weir's Control Language Hierarchy is a containment hierarchy of countable set of language classes where the Level-1 is defined as context-free, and Level-2 is the class of tree-adjoining and the other three grammars.

Following are some of the properties of Level-k languages in the hierarchy:

Level-k languages are properly contained by Level-k+1 language class
Level-k languages can be parsed in $O(n^{3\cdot2^{k-1}})$ time
Level-k contains the language $\{a_1^n ... a_{2^k}^n|n\geq0\}$ , but not $\{a_1^n ... a_{2^{k+1}}^n|n\geq0\}$
Level-k contains the language $\{w^{2^{k-1}}|w\in\{a,b\}^*\}$ , but not $\{w^{2^{k-1}+1}|w\in\{a,b\}^*\}$

Those properties correspond (at least for small k>1) well to Joshi's definition of the mildly context-sensitive languages.

[edit] See also

[edit] Further reading

Joshi, Aravind; Vijay-Shanker, K. & Weir, David (1991), “The Convergence of Mildly Context-Sensitive Grammar Formalisms”, in Sells, Peter; Shieber, Stuart & Wasow, Thomas, Foundational Issues in Natural Language Processing, Cambridge, MA: MIT Press, pp. 31–81, ISBN 0-262-19303-5, <http://repository.upenn.edu/cgi/viewcontent.cgi?article=1571&context=cis_reports> .

Vijay-Shanker, K. & Weir, David (1994), “The Equivalence of Four Extensions of Context-Free Grammars”, Mathematical Systems Theory 27 (6): 511–546, ISSN 0025-5661, <http://citeseer.ist.psu.edu/vijay-shanker94equivalence.html> .

Weir, D. J. (1992). "A geometric hierarchy beyond context-free languages". Jounal of Computer and System Sciences 104 (2): 235-261. .

[edit] External links

Automata theory: formal languages and formal grammars
Chomsky hierarchy	Grammars	Languages	Minimal automaton
Type-0	Unrestricted	Recursively enumerable	Turing machine
n/a	(no common name)	Recursive	Decider
Type-1	Context-sensitive	Context-sensitive	Linear-bounded
n/a	Indexed	Indexed	Nested stack
n/a	Tree-adjoining etc.	(Mildly context-sensitive)	Embedded pushdown
Type-2	Context-free	Context-free	Nondeterministic pushdown
n/a	Deterministic context-free	Deterministic context-free	Deterministic pushdown
Type-3	Regular	Regular	Finite
n/a		Star-free	Counter-Free
Each category of languages or grammars is a proper subset of the category directly above it, and any automaton in each category has an equivalent automaton in the category directly above it.

Categories: Formal languages

Mildly context-sensitive language

From Wikipedia, the free encyclopedia

Contents

[edit] Control Language Hierarchy

[edit] See also

[edit] Further reading

[edit] External links

Views

Navigation

Interaction

Search

Languages