Context-sensitive grammar

From Wikipedia, the free encyclopedia

A context-sensitive grammar (CSG) is a formal grammar in which the left-hand side and right-hand side of any production rule may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough that their languages can be recognized by a linear bounded automaton.

The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where a word is often appropriate or inappropriate in a given place depending on its context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.

1 Formal definition
2 Examples
3 Normal forms
4 Computational properties and uses
5 See also
6 References

Formal definition

A formal grammar G = (N, Σ, P, S) is context-sensitive if all rules in P are of the form

αAβ → αγβ

where A ∈ N (i.e., A is a single nonterminal), α, β ∈ (N ∪ Σ)* (i.e., α and β are strings of nonterminals and terminals) and γ ∈ (N ∪ Σ)⁺ (i.e., γ is a nonempty string of nonterminals and terminals).

In addition, a rule of the form

S → λ provided S does not appear on the right side of any rule

where λ represents the empty string, is permitted. Allowing this rule makes it possible to state that the context-sensitive languages are a proper superset of the context-free languages, rather than having to make the weaker statement that all context-free grammars with no → λ productions are also context-sensitive grammars.

The name context-sensitive is explained by the α and β that form the context of A and determine whether A can be replaced with γ or not. This is different from a context-free grammar where the context of a nonterminal is not taken into consideration.
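The rule shape αAβ → αγβ described above can be checked mechanically. The following is a minimal sketch, assuming each grammar symbol is a single character and that the set of nonterminals is supplied by the caller; the function name `is_context_sensitive_rule` is hypothetical and chosen only for illustration.

```python
def is_context_sensitive_rule(lhs, rhs, nonterminals):
    """Return True if lhs -> rhs can be written as αAβ -> αγβ with γ nonempty.

    Assumption: every symbol is one character, so strings act as symbol lists.
    """
    for i, symbol in enumerate(lhs):
        if symbol not in nonterminals:
            continue  # A must be a single nonterminal
        alpha, beta = lhs[:i], lhs[i + 1:]
        # γ is what remains of rhs once the prefix α and the suffix β are
        # removed; the length test guarantees that γ is nonempty.
        long_enough = len(rhs) >= len(alpha) + len(beta) + 1
        if long_enough and rhs.startswith(alpha) and rhs.endswith(beta):
            return True
    return False


N = {"S", "R", "T", "U", "V", "W", "X", "Y"}
print(is_context_sensitive_rule("bTc", "bbcc", N))  # True: α = b, A = T, β = c, γ = bc
print(is_context_sensitive_rule("Xb", "bX", N))     # False: no choice of A preserves the context
```

The check tries every nonterminal position in the left-hand side as the candidate A, because a rule may fit the pattern in more than one way.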

A noncontracting grammar, in which no rule has a right-hand side shorter than its left-hand side, can never generate the empty string. If the rule S → λ is also permitted for noncontracting grammars under the same proviso, then the two kinds of grammar define exactly the same class of languages.

Examples

This grammar generates the canonical non-context-free language $\{ a^n b^n c^n : n \ge 1 \}$:

S → aRc

R → aRT | b

bTc → bbcc

bTT → bbUT

UT → UU

UUc → VUc

VUc → Vcc

UV → VV

bVc → bbcc

bVV → bbWV

WV → WW

WWc → TWc

TWc → Tcc

WT → TT

The generation chain for aaa bbb ccc is:

aRc

aaRTc

aaaRTTc

aaabTTc

aaabbUTc

aaabbUUc

aaabbVUc

aaabbVcc

aaabbbccc
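The chain above can be replayed step by step. The sketch below assumes single-character symbols and reads the chained entries "UUc → VUc → Vcc" and "WWc → TWc → Tcc" as two separate rules each; the helper names `successors` and `is_valid_chain` are hypothetical.

```python
# Rules of the a^n b^n c^n grammar above, as (left-hand side, right-hand side).
RULES = [
    ("S", "aRc"), ("R", "aRT"), ("R", "b"),
    ("bTc", "bbcc"), ("bTT", "bbUT"), ("UT", "UU"),
    ("UUc", "VUc"), ("VUc", "Vcc"), ("UV", "VV"),
    ("bVc", "bbcc"), ("bVV", "bbWV"), ("WV", "WW"),
    ("WWc", "TWc"), ("TWc", "Tcc"), ("WT", "TT"),
]

CHAIN = ["S", "aRc", "aaRTc", "aaaRTTc", "aaabTTc", "aaabbUTc",
         "aaabbUUc", "aaabbVUc", "aaabbVcc", "aaabbbccc"]


def successors(form, rules):
    """Return every string reachable from `form` by one rule application."""
    result = set()
    for lhs, rhs in rules:
        pos = form.find(lhs)
        while pos != -1:
            result.add(form[:pos] + rhs + form[pos + len(lhs):])
            pos = form.find(lhs, pos + 1)
    return result


def is_valid_chain(chain, rules):
    """Check that each string follows from its predecessor by a single rule."""
    return all(b in successors(a, rules) for a, b in zip(chain, chain[1:]))


print(is_valid_chain(CHAIN, RULES))  # True
```

The start symbol S is prepended to the chain here so that the first listed string, aRc, is itself checked as a derivation step.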

More complicated grammars can be used to generate $\{ a^n b^n c^n d^n : n \ge 1 \}$, and other languages with even more letters:

S → abcd

S → aXbcd

Xb → bX

Xc → bYc

Yc → cY

Yd → Rcdd

cR → Rc

bR → Rb

aR → aaX | aa

(This grammar is not context-sensitive in the strict sense of the definition above, because productions such as Xb → bX do not have the form αAβ → αγβ. It is, however, noncontracting, so a context-sensitive grammar for this language does exist.)

The generation chain for aaa bbb ccc ddd is:

aXbcd

abXcd

abbYcd

abbcYd

abbcRcdd

abbRccdd

abRbccdd

aRbbccdd

aaXbbccdd

aabXbccdd

aabbXccdd

aabbbYccdd

aabbbcYcdd

aabbbccYdd

aabbbccRcddd

aabbbcRccddd

aabbbRcccddd

aabbRbcccddd

aabRbbcccddd

aaRbbbcccddd

aaabbbcccddd
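Although the grammar for a^n b^n c^n d^n is not strictly context-sensitive, none of its rules shortens the string it rewrites. The following sketch checks that property for the rules and for the chain above, assuming single-character symbols; the function names are hypothetical.

```python
# Rules of the a^n b^n c^n d^n grammar above, as (left-hand side, right-hand side).
RULES = [
    ("S", "abcd"), ("S", "aXbcd"), ("Xb", "bX"), ("Xc", "bYc"),
    ("Yc", "cY"), ("Yd", "Rcdd"), ("cR", "Rc"), ("bR", "Rb"),
    ("aR", "aaX"), ("aR", "aa"),
]

CHAIN = [
    "aXbcd", "abXcd", "abbYcd", "abbcYd", "abbcRcdd", "abbRccdd",
    "abRbccdd", "aRbbccdd", "aaXbbccdd", "aabXbccdd", "aabbXccdd",
    "aabbbYccdd", "aabbbcYcdd", "aabbbccYdd", "aabbbccRcddd",
    "aabbbcRccddd", "aabbbRcccddd", "aabbRbcccddd", "aabRbbcccddd",
    "aaRbbbcccddd", "aaabbbcccddd",
]


def is_noncontracting(rules):
    """True if no rule's right-hand side is shorter than its left-hand side."""
    return all(len(lhs) <= len(rhs) for lhs, rhs in rules)


def lengths_never_shrink(chain):
    """True if the strings of a derivation never get shorter."""
    return all(len(a) <= len(b) for a, b in zip(chain, chain[1:]))


print(is_noncontracting(RULES))     # True
print(lengths_never_shrink(CHAIN))  # True
```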

Normal forms

Every context-sensitive grammar that does not generate the empty string can be transformed into an equivalent one in Kuroda normal form. "Equivalent" here means that the two grammars generate the same language. The normal form will not in general be context-sensitive, but it will be a noncontracting grammar.
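The text above does not spell out what Kuroda normal form looks like. Under the standard definition, every rule has one of the shapes AB → CD, A → BC, A → B, or A → a, where A, B, C, D are nonterminals and a is a terminal. The sketch below tests a single rule against those shapes; it assumes single-character symbols, and the name `is_kuroda_rule` is hypothetical.

```python
def is_kuroda_rule(lhs, rhs, nonterminals, terminals):
    """Return True if lhs -> rhs has shape AB -> CD, A -> BC, A -> B, or A -> a."""
    if not lhs or any(s not in nonterminals for s in lhs):
        return False  # the left-hand side must consist of nonterminals only
    rhs_all_nonterminals = all(s in nonterminals for s in rhs)
    if len(lhs) == 2:
        return len(rhs) == 2 and rhs_all_nonterminals          # AB -> CD
    if len(lhs) == 1 and len(rhs) == 2:
        return rhs_all_nonterminals                            # A -> BC
    if len(lhs) == 1 and len(rhs) == 1:
        return rhs in nonterminals or rhs in terminals         # A -> B or A -> a
    return False


N = {"A", "B", "C", "D"}
T = {"a", "b"}
print(is_kuroda_rule("AB", "CD", N, T))  # True
print(is_kuroda_rule("A", "aB", N, T))   # False: mixes a terminal into a two-symbol right side
```

A rule of the shape AB → CD rewrites two symbols at once, which is why a grammar in this normal form is noncontracting but need not match the pattern αAβ → αγβ.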

Computational properties and uses

The decision problem that asks whether a given string s belongs to the language of a given context-sensitive grammar G is PSPACE-complete. There are even some context-sensitive grammars whose fixed-grammar recognition problem is PSPACE-complete.
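Membership is decidable because no rule shortens a string, so a search never needs to look at strings longer than the one being tested. The following is a naive sketch of that idea using the a^n b^n c^n grammar from the Examples section; it is a brute-force breadth-first search, not the space-efficient procedure behind the PSPACE bound, and the function name `generates` is hypothetical.

```python
from collections import deque

# Rules of the a^n b^n c^n grammar, as (left-hand side, right-hand side).
RULES = [
    ("S", "aRc"), ("R", "aRT"), ("R", "b"),
    ("bTc", "bbcc"), ("bTT", "bbUT"), ("UT", "UU"),
    ("UUc", "VUc"), ("VUc", "Vcc"), ("UV", "VV"),
    ("bVc", "bbcc"), ("bVV", "bbWV"), ("WV", "WW"),
    ("WWc", "TWc"), ("TWc", "Tcc"), ("WT", "TT"),
]


def generates(rules, start, target):
    """Search all derivations from `start`, pruning strings longer than `target`.

    Assumption: the rules are noncontracting, so discarding longer strings
    cannot lose a derivation of `target`.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for lhs, rhs in rules:
            pos = form.find(lhs)
            while pos != -1:
                nxt = form[:pos] + rhs + form[pos + len(lhs):]
                if len(nxt) <= len(target) and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                pos = form.find(lhs, pos + 1)
    return False


print(generates(RULES, "S", "aaabbbccc"))  # True
print(generates(RULES, "S", "aabbbcc"))    # False
```

The search terminates because only finitely many strings of bounded length exist over a finite alphabet, but the `seen` set can grow exponentially with the length of the target.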

The emptiness problem for context-sensitive grammars (given a context-sensitive grammar G, is $L(G)=\emptyset$?) is undecidable.

It has been shown that nearly all natural languages can be characterized by context-sensitive grammars, but the whole class of CSGs seems to be much larger than what natural languages require. Moreover, since the aforementioned decision problem for CSGs is PSPACE-complete, they are unworkable for practical use in general: a polynomial-time algorithm for a PSPACE-complete problem would imply P = PSPACE, and hence P = NP. Ongoing research in computational linguistics has focused on formulating other classes of languages that are "mildly context-sensitive" and whose decision problems are feasible, such as tree-adjoining grammars, combinatory categorial grammars, coupled context-free languages, and linear context-free rewriting systems. The languages generated by these formalisms lie properly between the context-free and context-sensitive languages.

See also

Chomsky hierarchy

References

Introduction to Languages and the Theory of Computation by John C. Martin McGraw Hill 1996 (2nd edition)

Automata theory: formal languages and formal grammars
Chomsky hierarchy	Grammars	Languages	Minimal automaton
Type-0	Unrestricted	Recursively enumerable	Turing machine
n/a	(no common name)	Recursive	Decider
Type-1	Context-sensitive	Context-sensitive	Linear-bounded
n/a	Indexed	Indexed	Nested stack
n/a	Tree-adjoining etc.	(Mildly context-sensitive)	Embedded pushdown
Type-2	Context-free	Context-free	Nondeterministic pushdown
n/a	Deterministic context-free	Deterministic context-free	Deterministic pushdown
Type-3	Regular	Regular	Finite
n/a		Star-free	Counter-Free
Each category of languages or grammars is a proper subset of the category directly above it, and any automaton in each category has an equivalent automaton in the category directly above it.
