Human Speechome Project

The Human Speechome Project (pronounced "speech-ome", rhymes with "genome") is being conducted at the Massachusetts Institute of Technology's Media Laboratory by the Cognitive Machines Group, headed by Associate Professor Deb Roy. It is an effort to unobtrusively observe and model, in great detail, the language acquisition of a single child at his English-speaking home over the first three years of his life. The resultant data is being used to create computational models that could yield further insight into language acquisition.[1]

Rationale

Most studies of human speech acquisition in children have been conducted in laboratory settings, with sampling of only a few hours of recording per week. The need for studies in the more natural setting of the child's home, at a much higher sampling rate approaching the child's total experience, motivated the design of this project.[1]

"Just as the Human Genome Project illuminates the innate genetic code that shapes us, the Speechome project is an important first step toward creating a map of how the environment shapes human development and learning."
Frank Moss, director of the Media Lab[2]

Methodology

A digital network consisting of eleven video cameras, fourteen microphones, and an array of data capture hardware has been installed in the home of the subject, providing coverage of the child's experiences that is as complete as possible, 24 hours a day. The motion-activated cameras are ceiling-mounted, wide-angle, unobtrusive units providing overhead views of all primary living areas. Sensitive boundary layer microphones are located in the ceilings near the cameras.
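
The project materials cited here do not describe the recording software itself; as a rough sketch of how motion-activated capture of this kind can work, the following Python fragment (all names and the threshold value are illustrative assumptions, not the project's code) triggers recording when consecutive grayscale frames differ, on average, by more than a fixed amount:

    import numpy as np

    MOTION_THRESHOLD = 12.0  # mean absolute pixel difference; illustrative value

    def motion_detected(prev_frame: np.ndarray, curr_frame: np.ndarray) -> bool:
        """Return True when two consecutive grayscale frames differ enough,
        on average, to suggest movement in the scene."""
        diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
        return float(diff.mean()) > MOTION_THRESHOLD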

Video image resolution is sufficient to capture gestures and head orientation of people and identity of mid-sized objects anywhere in a room, but insufficient to resolve direction of eye gaze and similar subtle details. Audio is sampled at greater than CD quality, yielding recordings of speech that are easily transcribed. A cluster of ten computers and audio samplers with a capacity of five terabytes[2] is located in the basement of the house to capture the data. Data from the cluster is moved manually to the MIT campus as necessary for storage in a one-million-gigabyte (one-petabyte) storage facility.[1]
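
As a back-of-the-envelope check on these volumes (the exact audio format is not stated in the cited sources; 48 kHz, 16-bit mono per microphone is assumed here, slightly above the 44.1 kHz CD rate), uncompressed audio alone would account for a large share of the daily data:

    # Rough estimate of raw audio volume under assumed recording parameters.
    SAMPLE_RATE_HZ = 48_000      # assumed; CD quality is 44.1 kHz
    BYTES_PER_SAMPLE = 2         # 16-bit samples
    MICROPHONES = 14
    SECONDS_PER_DAY = 86_400

    bytes_per_day = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * MICROPHONES * SECONDS_PER_DAY
    print(f"{bytes_per_day / 1e9:.0f} GB per day of raw audio")  # ~116 GB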

Privacy issues

To provide control of the observation system to the occupants of the house, eight touch-activated displays have been wall-mounted throughout the house. These allow for stopping and starting video and/or audio recording, and also provide an "oops" capability wherein the occupants can permanently erase any number of minutes of recording from the system. Motorized "privacy shutters" move to cover the cameras when video recording is turned off, providing natural feedback of the state of the system. On most days, audio recording is turned off throughout the house at night after the child is asleep and then turned back on in the morning. Audio and/or video are also often turned off periodically at the discretion of the participants, for example, during the adult dinner time.[1]
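
The sources do not specify how the "oops" feature is implemented. One plausible design, sketched below with hypothetical names and an assumed hold window, delays committing recent per-minute segments to permanent storage so they can still be discarded on request:

    from collections import deque

    def commit_to_storage(segment: bytes) -> None:
        ...  # placeholder: write the segment to the capture cluster

    class OopsBuffer:
        """Hypothetical sketch: recent per-minute recording segments are held
        back so they can still be erased; only segments older than the hold
        window reach permanent storage. The window length is an assumption."""

        def __init__(self, hold_minutes: int = 15):
            self.pending = deque(maxlen=hold_minutes)

        def add_segment(self, segment: bytes) -> None:
            if len(self.pending) == self.pending.maxlen:
                commit_to_storage(self.pending[0])  # oldest leaves the hold window
            self.pending.append(segment)

        def oops(self, minutes: int) -> None:
            """Permanently discard up to `minutes` of the newest recording."""
            for _ in range(min(minutes, len(self.pending))):
                self.pending.pop()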

Data analysis tools

Data is being gathered at an average rate of 200 gigabytes per day. This has necessitated the development of sophisticated data-mining tools, including tools for analyzing audio spectrograms, to reduce analysis effort to a manageable level. Transcription of significant speech (everything heard and produced by the child) adds a labor-intensive dimension to the study, and advanced techniques are being developed to cope with this burden.[1] To store the project's data securely, a large storage array is being constructed at the MIT Media Lab in collaboration with Bell Microproducts, Seagate, and Zetera Corporation.[3]
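
The group's analysis software is not described in detail in the cited sources; as a minimal illustration of the kind of spectrogram computation involved, the following sketch (sample rate and window size are assumptions) uses SciPy on a stand-in signal:

    import numpy as np
    from scipy.signal import spectrogram

    fs = 48_000                        # assumed sample rate in Hz
    audio = np.random.randn(fs * 10)   # stand-in for 10 s of one audio channel
    freqs, times, power = spectrogram(audio, fs=fs, nperseg=1024)
    print(power.shape)                 # (frequency bins, time frames)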

Modeling efforts

Building upon earlier efforts of the Cognitive Machines Group, researchers are advancing from simpler modeling of noun-picture relationships to address semantic grounding in terms of physical and social action, and recognition of intentions. Semi-automated learning of behavior grammars from video data is being advanced to construct a behavior lexicon. Extensions of this work focus on developing a video parser that uses grammars constructed from acquired behavior patterns to infer the latent structure underlying movement patterns. Cross-situational learning algorithms are being developed to learn mappings from spoken words and phrases to these latent structures.[1]
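
The project's own algorithms are not specified here; in its simplest textbook form, cross-situational learning resolves the ambiguity of any single word-scene pairing by aggregating co-occurrence statistics across many situations, as in this toy sketch (names and data are illustrative):

    from collections import Counter, defaultdict

    def cross_situational_learn(situations):
        """`situations` is a list of (words, referents) pairs observed
        together; each word is mapped to the referent it co-occurs with
        most often across all situations."""
        counts = defaultdict(Counter)
        for words, referents in situations:
            for word in words:
                counts[word].update(referents)
        return {word: c.most_common(1)[0][0] for word, c in counts.items()}

    # "ball" appears alongside the BALL referent in both situations, so
    # the ambiguity within each single situation is resolved across them.
    data = [({"look", "ball"}, {"BALL", "DOG"}),
            ({"the", "ball"}, {"BALL", "CUP"})]
    print(cross_situational_learn(data)["ball"])  # BALL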

References

  1. Roy, Deb, et al. (2006). "The Human Speechome Project". Retrieved 2008-01-03.
  2. Wright, Sarah H. (2006). "Media Lab project explores language acquisition". MIT News Office. Retrieved 2008-01-03.
  3. "News Announcement" (2006). Retrieved 2008-01-03.

External links

  • Language Acquisition, an article by Steven Pinker of MIT (a non-final draft version).