Towards Automated Sign Language Recognition from Video

 

Goal:

 

To advance the design of robust computer representations and algorithms for recognizing American Sign Language (ASL) from video. (An overview presentation is available here.)

Broader Impact:

 

*  To facilitate communication between the Deaf and hearing populations.

*  To bridge the gap in access to next-generation human-computer interfaces.

Scientific Contributions:

 

We have developed representations and approaches that can

 

*  Capture the global (Gestalt) configuration of the hands and face using relational distributions. The representation is somewhat robust to segmentation errors and does not require part tracking (see the first sketch after this list).

*  Learn sign models from example sentences, without supervision, via automated common-motif extraction using Markov chain Monte Carlo methods (see the second sketch after this list).

*  Recognize signs in the presence of movement epenthesis, i.e., the extraneous hand movements that appear between two signs, using an enhanced Level Building approach (sketched below).

*  Automatically segment an ASL sentence into signs using Conditional Random Fields (see the CRF sketch below).

*  Match signs and gestures in the presence of segmentation noise using fragment hidden Markov models (frag-HMMs); a sketch appears in the Code section below.
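 

As an illustration of the first item above, here is a minimal sketch of a relational distribution: a normalized histogram of pairwise geometric relationships (here, x and y offsets) among low-level feature points such as motion-edge pixels from the hands and face. The function name, bin count, and choice of offset features are illustrative assumptions, not the project's actual code.

    import numpy as np

    def relational_distribution(points, bins=16, max_offset=100.0):
        """Normalized 2-D histogram of pairwise (dx, dy) offsets between
        feature points (e.g., motion-edge pixels of the hands and face).
        The histogram captures the global (Gestalt) configuration of the
        scene without tracking individual body parts. (Illustrative sketch.)"""
        pts = np.asarray(points, dtype=float)
        dx = pts[:, 0][:, None] - pts[:, 0][None, :]   # all pairwise x offsets
        dy = pts[:, 1][:, None] - pts[:, 1][None, :]   # all pairwise y offsets
        hist, _, _ = np.histogram2d(
            dx.ravel(), dy.ravel(), bins=bins,
            range=[[-max_offset, max_offset], [-max_offset, max_offset]])
        return hist / hist.sum()    # normalize to a probability distribution

Two frames can then be compared with any distance between their distributions, e.g. np.abs(rd1 - rd2).sum().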
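 

For the unsupervised motif extraction, the following toy Gibbs sampler (one standard MCMC scheme) recovers the start position of a common length-W pattern in each of several discretized motion sequences. The symbol alphabet, add-one smoothing, and function names are assumptions for illustration; the project's actual method is described in the publications below.

    import numpy as np

    def gibbs_motif(seqs, W, alphabet, iters=500, seed=0):
        """Sample the start of a shared length-W motif in each symbol sequence."""
        rng = np.random.default_rng(seed)
        idx = {a: i for i, a in enumerate(alphabet)}
        S = [[idx[a] for a in s] for s in seqs]
        starts = [int(rng.integers(0, len(s) - W + 1)) for s in S]

        def profile(skip):
            counts = np.ones((W, len(alphabet)))      # add-one smoothing
            for j, s in enumerate(S):
                if j != skip:
                    for w in range(W):
                        counts[w, s[starts[j] + w]] += 1
            return counts / counts.sum(axis=1, keepdims=True)

        for _ in range(iters):
            for j, s in enumerate(S):
                p = profile(j)                        # leave sequence j out
                scores = np.array([np.prod(p[np.arange(W), s[t:t + W]])
                                   for t in range(len(s) - W + 1)])
                starts[j] = int(rng.choice(len(scores), p=scores / scores.sum()))
        return starts

    # Toy usage: "abc" is embedded in each sequence; the sampler should
    # usually recover starts [2, 0, 3].
    print(gibbs_motif(["xxabcx", "abcxxx", "xxxabc"], W=3, alphabet="abcx"))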

 

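The enhanced Level Building idea can be sketched as a dynamic program: each level places one segment, which is either a sign (scored by a model-matching cost) or a movement-epenthesis gap (scored by a fixed per-frame penalty, so no explicit epenthesis model is needed). The function and parameter names are placeholders assuming a precomputed match_cost; see the 2007 CVPR and 2010 TPAMI papers below for the actual algorithm.

    import numpy as np

    def enhanced_level_building(T, signs, match_cost, me_penalty, max_levels=8):
        """Explain frames [0, T) by a sequence of segments, each either a sign
        (cost match_cost(sign, a, b) for frames [a, b)) or a movement-
        epenthesis (ME) gap costing me_penalty per frame. (Simplified sketch.)"""
        INF = float("inf")
        D = np.full((max_levels + 1, T + 1), INF)  # D[l, t]: l segments cover [0, t)
        back = {}
        D[0, 0] = 0.0
        for l in range(1, max_levels + 1):
            for t in range(1, T + 1):
                for a in range(t):                 # segment occupies frames [a, t)
                    if D[l - 1, a] == INF:
                        continue
                    for label in list(signs) + ["ME"]:
                        seg = ((t - a) * me_penalty if label == "ME"
                               else match_cost(label, a, t))
                        if D[l - 1, a] + seg < D[l, t]:
                            D[l, t] = D[l - 1, a] + seg
                            back[(l, t)] = (label, a)
        l = int(np.argmin(D[:, T]))                # best level covering all frames
        t, out = T, []
        while t > 0:
            label, a = back[(l, t)]
            out.append((label, a, t))
            t, l = a, l - 1
        return out[::-1]                           # [(label, start, end), ...]

In practice, match_cost(sign, a, b) would be something like a dynamic-time-warping distance between the stored sign model and the observed features of frames [a, b).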
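 

For the CRF-based segmentation, the sketch below uses the third-party sklearn-crfsuite package as a stand-in for a linear-chain CRF, labeling each frame as part of a sign or of the coarticulation/movement-epenthesis transition. The per-frame features (hand speed and acceleration) and the toy training data are illustrative assumptions only.

    import sklearn_crfsuite                      # pip install sklearn-crfsuite

    def frame_features(speeds, t):
        """Feature dict for frame t of a per-frame hand-speed signal."""
        f = {"bias": 1.0, "speed": speeds[t]}
        if t > 0:
            f["accel"] = speeds[t] - speeds[t - 1]
        return f

    def featurize(sequences):
        return [[frame_features(s, t) for t in range(len(s))] for s in sequences]

    # Toy data: fast inter-sign transitions labeled ME, sign frames labeled SIGN.
    X = featurize([[0.1, 0.2, 0.9, 1.1, 0.8, 0.2, 0.1]])
    y = [["SIGN", "SIGN", "ME", "ME", "ME", "SIGN", "SIGN"]]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
    crf.fit(X, y)
    print(crf.predict(X))                        # per-frame SIGN/ME labels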

Publications

 

*  S. Nayak, K. Duncan, S. Sarkar, and B. Loeding, Finding Recurrent Patterns from Continuous Sign Language Sentences for Automated Extraction of Signs, Journal of Machine Learning Research, vol. 13, pp. 2589-2615, 2012.

 

*  S. Nayak, S. Sarkar, and B. Loeding, Automated Extraction of Signs from Continuous Sign Language Sentences Using Iterated Conditional Modes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, pp. 795-810, 2012.

 

*  S. Sarkar, B. Loeding, R. Yang, S. Nayak, and A. Parashar, Segmentation-Robust Representations, Matching, and Modeling for Sign Language, IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 13-19, 2011.

 

*  R. Yang, S. Sarkar, and B. Loeding, Handling Movement Epenthesis and Hand Segmentation Ambiguities in Continuous Sign Language Recognition Using Nested Dynamic Programming, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 3, pp. 462-477, 2010.

 

*  S. Nayak, S. Sarkar, and B. Loeding, Automated Extraction of Signs from Continuous Sign Language Sentences Using Iterated Conditional Modes, IEEE Conference on Computer Vision and Pattern Recognition, June 2009. (Poster; results on complete data.)

 

*  R. Yang and S. Sarkar, Coupled Grouping and Matching for Sign and Gesture Recognition, Computer Vision and Image Understanding, vol. 113, no. 6, pp. 663-681, June 2009.

 

*  S. Nayak, S. Sarkar, and B. Loeding, Distribution-Based Dimensionality Reduction Applied to Articulated Motion Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, May 2009.

 

*  Ruiduo Yang, Dynamic Programming with Multiple Candidates and Its Applications to Sign Language and Hand Gesture Recognition, Ph.D. dissertation, University of South Florida, Tampa, 2008.

 

*  Sunita Nayak, Representation and Learning for Sign Language Recognition, Ph.D. dissertation, University of South Florida, Tampa, 2008.

 

*  R. Yang, S. Sarkar, and B. Loeding, Enhanced Level Building Algorithm for the Movement Epenthesis Problem in Sign Language Recognition, IEEE Conference on Computer Vision and Pattern Recognition, 2007.

 

*  R. Yang and S. Sarkar, Gesture Recognition Using Hidden Markov Models from Fragmented Observations, IEEE Conference on Computer Vision and Pattern Recognition, pp. 766-773, June 2006.

 

*  R. Yang and S. Sarkar, Detecting Coarticulation in Sign Language Using Conditional Random Fields, International Conference on Pattern Recognition, vol. 2, pp. 108-112, August 2006.

 

*  S. Nayak, S. Sarkar, and B. Loeding, Unsupervised Modeling of Signs Embedded in Continuous Sentences, IEEE Workshop on Vision for Human-Computer Interaction, vol. 3, p. 81, June 2005.

 

*  R. Yang, S. Sarkar, B. L. Loeding, and A. I. Karshmer, Efficient Generation of Large Amounts of Training Data for Sign Language Recognition: A Semi-automatic Tool, International Conference on Computers Helping People with Special Needs, pp. 635-642, 2006.

 

*  S. Nayak, S. Sarkar, and K. Sengupta, Modeling Signs Using Functional Data Analysis, IAPR Conference on Computer Vision, Graphics, and Image Processing, 2004.

 

*  B. L. Loeding, S. Sarkar, A. Parashar, and A. Karshmer, Progress in Automated Computer Recognition of Sign Language, International Conference on Computers Helping People with Special Needs, pp. 1079-1087, 2004.

 

*  Sunita Nayak, A Vision-Based Approach for Unsupervised Modeling of Signs Embedded in Continuous Sentences, Master's thesis, University of South Florida, Tampa, 2005.

 

*  Ayush S. Parashar, Representation and Interpretation of Manual and Non-manual Information for Automated American Sign Language Recognition, Master's thesis, University of South Florida, Tampa, 2003.

Data Sets

 

A data set of 25 sentences, with 5 instances of each sentence recorded against a plain background, is available for distribution. The vocabulary is drawn from an airport security check scenario. For 4 instances of each sentence, we also provide manually generated ground truth: every pixel belonging to the dominant hand is marked, and the head location is marked with a rectangle. To receive the data, a release document must be signed by a permanent faculty member or researcher; we require the original signed document.

 

Code

*  Frag-HMM source code is available here. The frag-HMM is a variant of the HMM whose observations are groupings of fragments produced by a low-level over-segmentation; a minimal sketch of the matching idea appears after this list.

*  Matching of signs to sentences using enhanced Level Building, along with other associated tools

*  A tool to annotate sign language sentences
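 

A minimal sketch of the frag-HMM matching idea: a standard HMM forward pass in which each frame contributes several candidate observations (alternative groupings of segmentation fragments), and the emission term sums over the candidates instead of committing to one. The function names and the exact summation form are illustrative assumptions; the released source code implements the actual model.

    import numpy as np

    def fraghmm_likelihood(pi, A, emit, candidates):
        """Forward pass of an HMM with uncertain observations.

        pi: (n,) initial state probabilities; A: (n, n) transition matrix;
        emit(state, cand): likelihood of one candidate fragment grouping;
        candidates[t]: the alternative groupings offered at frame t."""
        pi, A = np.asarray(pi, float), np.asarray(A, float)
        n = len(pi)
        b = np.array([sum(emit(s, c) for c in candidates[0]) for s in range(n)])
        alpha = pi * b
        for cands in candidates[1:]:
            b = np.array([sum(emit(s, c) for c in cands) for s in range(n)])
            alpha = b * (alpha @ A)          # marginalize state and candidate
        return float(alpha.sum())            # likelihood of the fragmented sequence

    # Toy usage: two states; candidates are integer codes scored by a table.
    emit = lambda s, c: [[0.8, 0.2], [0.3, 0.7]][s][c]
    print(fraghmm_likelihood([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], emit, [[0, 1], [1]]))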

Funding Acknowledgement

This work was supported in part by the National Science Foundation under ITR grant IIS-0312993 and is currently supported by funds from the USF Center for Pattern Recognition. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Last revised: 19 October 2009