LanguageWare

From Wikipedia, the free encyclopedia

LanguageWare is a natural language processing (NLP) technology, developed by IBM, that allows applications to "understand" natural language text. It comprises a lightweight set of Java libraries that provide a range of NLP functions: language identification, text segmentation and normalization, entity and relationship extraction, and semantic analysis and disambiguation.

The LanguageWare analysis engine is based on a finite-state machine. It achieves high speed while keeping the dictionary reasonably small because it uses a variety of node layout types, choosing the representation of each node individually [1]. This approach works well because the statistics of finite-state networks built from natural language dictionaries obey a power law [2].
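The paragraph above does not describe LanguageWare's actual data structures, so the following Python code is only a minimal sketch of the general idea of per-node layout selection, under stated assumptions. A word list is compiled into a trie (an acyclic finite-state automaton, without the minimization a production dictionary would apply), and each node receives one of several layouts depending on how many outgoing arcs it has. The class names (`LeafNode`, `ChainNode`, `SortedNode`, `DenseNode`), the helper functions and the `DENSE_THRESHOLD` cut-off are all hypothetical and chosen for illustration.

```python
from bisect import bisect_left
from collections import Counter

END = object()        # marks "a word ends here" in the intermediate trie
DENSE_THRESHOLD = 8   # hypothetical cut-off: nodes with this many arcs get a dense table

class LeafNode:
    """Layout for a node with no outgoing arcs (the end of a word)."""
    __slots__ = ("final",)
    def __init__(self, final):
        self.final = final
    def step(self, ch):
        return None

class ChainNode:
    """Layout for a node with exactly one outgoing arc: one label, one target."""
    __slots__ = ("final", "label", "target")
    def __init__(self, final, label, target):
        self.final, self.label, self.target = final, label, target
    def step(self, ch):
        return self.target if ch == self.label else None

class SortedNode:
    """Layout for a node with a few arcs: sorted labels searched by bisection."""
    __slots__ = ("final", "labels", "targets")
    def __init__(self, final, arcs):
        self.final = final
        self.labels = [c for c, _ in arcs]
        self.targets = [t for _, t in arcs]
    def step(self, ch):
        i = bisect_left(self.labels, ch)
        if i < len(self.labels) and self.labels[i] == ch:
            return self.targets[i]
        return None

class DenseNode:
    """Layout for a high fan-out node: direct table indexed by code-point offset."""
    __slots__ = ("final", "base", "table")
    def __init__(self, final, arcs):
        self.final = final
        self.base = ord(arcs[0][0])  # arcs are sorted, so this is the smallest label
        self.table = [None] * (ord(arcs[-1][0]) - self.base + 1)
        for c, t in arcs:
            self.table[ord(c) - self.base] = t
    def step(self, ch):
        i = ord(ch) - self.base
        return self.table[i] if 0 <= i < len(self.table) else None

def build_trie(words):
    """Build an intermediate trie of nested dicts."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def compile_node(trie, stats):
    """Choose a layout for each node from its fan-out; count layouts in stats."""
    final = END in trie
    labels = sorted(k for k in trie if k is not END)
    arcs = [(c, compile_node(trie[c], stats)) for c in labels]
    if not arcs:
        node = LeafNode(final)
    elif len(arcs) == 1:
        node = ChainNode(final, *arcs[0])
    elif len(arcs) < DENSE_THRESHOLD:
        node = SortedNode(final, arcs)
    else:
        node = DenseNode(final, arcs)
    stats[type(node).__name__] += 1
    return node

def contains(root, word):
    """Walk the automaton one character at a time."""
    node = root
    for ch in word:
        node = node.step(ch)
        if node is None:
            return False
    return node.final

stats = Counter()
root = compile_node(build_trie(["cat", "cats", "car", "card", "care", "dog", "dot"]), stats)
print(contains(root, "card"), contains(root, "ca"))  # True False
```

If node fan-out follows a power law, as [2] reports for morphological networks, most nodes have only one or a few arcs and can use the compact layouts, while the rare high fan-out nodes can afford a direct-indexed table that makes their transitions fast. A real engine would also share identical suffixes between words and pack nodes into a contiguous byte array rather than Python objects.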

The behaviour of the system is driven by a set of lexico-semantic resources that describe the characteristics of the language and domain. A default set of resources comes as part of LanguageWare; these describe the native characteristics of the language, such as morphology, and its basic vocabulary. Supplemental resources are available that capture additional vocabularies, terminologies, rules and grammars, which may be generic to the language or specific to one or more domains.
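The idea that behaviour comes from data rather than code can be illustrated with a small Python sketch. It is not LanguageWare's API: the `Resource` class, the `annotate` function, the sample entries, and the rule that later resources override earlier ones on identical keys are all hypothetical. The same lookup code produces different annotations when a domain resource is layered over a default one.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """Hypothetical resource: lowercased token sequences mapped to (lemma, semantic type)."""
    name: str
    entries: dict = field(default_factory=dict)

def annotate(tokens, resources):
    """Greedy longest-match lookup; later resources override earlier ones on equal keys."""
    merged = {}
    for res in resources:
        merged.update(res.entries)
    max_len = max((len(key) for key in merged), default=0)
    lowered = [t.lower() for t in tokens]
    annotations, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, shrinking until an entry matches.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in merged:
                lemma, sem_type = merged[key]
                annotations.append((" ".join(tokens[i:i + n]), lemma, sem_type))
                i += n
                break
        else:
            i += 1  # no entry starts at this token; skip it
    return annotations

base = Resource("base-en", {
    ("patient",): ("patient", "Noun"),
    ("heart",): ("heart", "Noun"),
    ("attack",): ("attack", "Noun"),
})
medical = Resource("medical", {("heart", "attack"): ("myocardial infarction", "Condition")})
tokens = "The patient had a heart attack".split()
print(annotate(tokens, [base]))
# [('patient', 'patient', 'Noun'), ('heart', 'heart', 'Noun'), ('attack', 'attack', 'Noun')]
print(annotate(tokens, [base, medical]))
# [('patient', 'patient', 'Noun'), ('heart attack', 'myocardial infarction', 'Condition')]
```

Greedy longest match is just one simple policy chosen for the sketch; a real system would combine dictionary lookup with morphology, rules and disambiguation.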

A set of Eclipse-based customization tools, the LanguageWare Resource Workbench, is provided so that domain knowledge of any type can be compiled into these resources and thereby incorporated into the analysis process.

LanguageWare can be deployed as a set of UIMA-compliant annotators, Eclipse plug-ins or Web Services.


Citations

[1] A. Troussov, B. O'Donovan, S. Koskenniemi, and N. Glushnev, "Per-Node Optimization of Finite-State Mechanisms for Natural Language Processing," Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science, vol. 2588, Springer-Verlag, pp. 223-227, 2003. http://www.springerlink.com/content/dl6l5qq3wq0be630/

[2] A. Troussov and B. O'Donovan, "Statistics of Morphological Finite-State Transition Networks Obey the Power Law," Proceedings of the International Conference RANLP 2003 (Recent Advances in Natural Language Processing), Borovets, Bulgaria, 10-12 September 2003. http://www.iol.ie/~bodonovan/pubs/RA-NLP-03.pdf