Concept Embedding through Canonical Forms and Zero-Shot ASL Recognition

In the recognition problem, a canonical form that expresses the spatio-temporal relation of concepts for a given class can potentially increase accuracy. Concepts are defined as attributes that can be recognized using a soft-matching paradigm. We consider the specific case study of American Sign Language (ASL) to show that canonical forms of classes can be used to recognize unseen gestures. A canonical form of gestures has several advantages, including translation between gestures, gesture-based searching, and automated transcription of gestures into any spoken language. We applied our technique to two independently collected datasets: a) the IMPACT Lab dataset: 23 ASL gestures, each executed three times, from 130 first-time ASL learners as training data, and b) the ASLTEXT dataset: 190 gestures, each executed six times on average. Our technique recognized 19 arbitrarily chosen, previously unseen gestures in the IMPACT dataset from seven individuals who are not part of the 130 learners, and 34 unseen gestures from the ASLTEXT dataset, without any retraining. Our normalized accuracy on the ASLTEXT dataset is 66%, which is 13.6% higher than the state-of-the-art technique.

Dataset: 

Please find below the links to the datasets used for this project. If you use the data in your research, please cite the publication below:

Publication: 

ICPR 2020 (accepted): Azamat Kamzin, Apurupa Amperayani, Prasanth Sukhapalli, Ayan Banerjee, Sandeep Gupta: Concept Embedding through Canonical Forms: A Case Study on Zero-Shot ASL Recognition.

IMPACT Lab ASL dataset: 23 ASL gesture videos, each with three repetitions, recorded in real-world settings

ASLTEXT with PoseNet joint data

Dataset Description:

The IMPACT Lab dataset consists of .csv files collected using the PoseNet framework. The files are named [id]_[gesture_name]. The folder contains test and training splits.

Each file has 17 columns, and each column represents a joint-location coordinate from PoseNet (more details in the official PoseNet blog). Each row represents a frame from the video. For privacy purposes, the videos cannot be shared at this point in time.
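As an illustration, a minimal sketch for reading one of these files might look like the following. The exact column layout should be checked against the released .csv files; the path, helper name, and the assumption that the [id]_[gesture_name] convention carries a ".csv" extension are hypothetical:

```python
import os
import pandas as pd

def load_gesture_csv(path):
    """Load one PoseNet keypoint file.

    Rows correspond to video frames; columns correspond to PoseNet
    joint-location coordinates, as described above.
    """
    # Filename convention described above: [id]_[gesture_name]
    base = os.path.splitext(os.path.basename(path))[0]
    subject_id, gesture_name = base.split("_", 1)
    frames = pd.read_csv(path)  # shape: (num_frames, num_joint_columns)
    return subject_id, gesture_name, frames

# Example usage (hypothetical file path):
# sid, gesture, frames = load_gesture_csv("train/001_hello.csv")
# print(sid, gesture, frames.shape)
```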

ASLTEXT is a subset of the ASL Lexicon Video Dataset, which was collected at Boston University from native ASL signers. ASLTEXT contains 250 unique gestures. There are 1598 videos, of which we use 1200 videos covering 190 gestures that do not appear in the IMPACT dataset. Our aim in this paper is to use all 190 unique gestures as a test set to validate our zero-shot capabilities; we do not use any part of the ASLTEXT dataset for training. We have also included generated PoseNet keypoints for each of the ASLTEXT video files.
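For reference, a sketch of how such a zero-shot test split could be assembled from the released ASLTEXT keypoint files is shown below. The directory name, file-naming assumption, and the placeholder IMPACT vocabulary are illustrative only, not the project's actual script:

```python
import glob
import os

# Placeholder for the 23 IMPACT gesture labels (hypothetical examples).
impact_gestures = {"hello", "thanks", "book"}

# Assumes ASLTEXT keypoint files follow the same [id]_[gesture_name].csv
# convention; keep only gestures absent from the IMPACT training vocabulary.
asltext_files = glob.glob("asltext_keypoints/*.csv")
zero_shot_files = [
    f for f in asltext_files
    if os.path.splitext(os.path.basename(f))[0].split("_", 1)[-1]
    not in impact_gestures
]
print(f"{len(zero_shot_files)} files selected for zero-shot evaluation")
```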

Methods: 

Faculty Advisor: 
Dr. Sandeep Gupta

Research Faculty:
Dr. Ayan Banerjee

Current Students:
Azamat Kamzin
Apurupa Amperayani

Collaborators:
Paul Quinn