
Action Verb Corpus (AVC)

The Action Verb Corpus (download link) comprises multimodal data from 46 episodes (recordings) performed by 12 humans, with a total of 390 instances of the simple actions take, put, and push. Audio, video, and motion data (hand and arm) were recorded while participants performed an action and described what they were doing. Details on how the data were collected can be found in Stephanie Gross, Matthias Hirschmanner, Brigitte Krenn, Friedrich Neubarth, Michael Zillich: Action Verb Corpus. LREC 2018. An extension of AVC focusing on visual action recognition is available here.

The data are licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). This license allows you to use the data free of charge for non-commercial purposes. You may modify and redistribute the data as long as you keep the attribution to the original in all files, publish your work under the same license, and cite Stephanie Gross, Matthias Hirschmanner, Brigitte Krenn, Friedrich Neubarth, Michael Zillich: Action Verb Corpus. LREC 2018.

The dataset consists of the following information:

  • the merged output of the object trackers, including the object poses and their reliability estimates calculated by the object tracker, whether an object is touched by or is in the hand of the instructor, and whether the object touches the table; for the coordinate system used, see this picture (one csv file per episode/recording, named Objects.csv),
  • the head, hand and arm positions, including, per frame, the 3D positions of the elbow, wrist, and knuckle joints of the instructor's hands (one csv file per episode/recording, named Hands.csv); the fields are interpreted as follows (see the decoding sketch after this list):
    • HandID: 0 right, 1 left
    • FingerID: 0 thumb, 1 index, 2 middle, 3 ring, 4 pinky
    • BoneID: 0 metacarpal, 1 proximal, 2 intermediate, 3 distal
  • the merged hand and object positions (one file per episode/recording, named Merged.csv),
  • the videos from the Leap Motion sensor showing the hand movements and objects (one avi file per episode/recording, named HandsObjects_libm.avi),
  • an animation of the merged hand and object tracking (one avi file per episode/recording),
  • the following annotations, synchronized with the real-time animation of the hand and object tracking and with the speech stream (one eaf (ELAN) file and one csv file per episode/recording; see the reading sketch below the list):
    • manual orthographic transcriptions and transliterations of utterances,
    • part-of-speech tags, automatically generated with the Tree-Tagger (Schmid 1995) and manually corrected,
    • lemmata, automatically generated with the Tree-Tagger and manually corrected,
    • information about which object is currently moved, and where it is moved to (manually annotated),
    • information about whether the left or right hand touches a particular object (manually annotated),
    • information about whether a particular object touches the ground/table (automatically identified by the object tracker and manually corrected),
    • positions of stationary objects in the scene (automatically calculated from the output of the object tracker).
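
The following is a minimal sketch in Python of how the HandID/FingerID/BoneID codes in a Hands.csv file could be decoded into readable labels. The column names used below are assumptions based on the field descriptions above and may differ from the actual CSV header.

    # Minimal sketch: decode the HandID/FingerID/BoneID codes of one episode's
    # Hands.csv into readable labels. The column names "HandID", "FingerID" and
    # "BoneID" are assumptions; adjust them to the actual CSV header.
    import pandas as pd

    HAND = {0: "right", 1: "left"}
    FINGER = {0: "thumb", 1: "index", 2: "middle", 3: "ring", 4: "pinky"}
    BONE = {0: "metacarpal", 1: "proximal", 2: "intermediate", 3: "distal"}

    def load_hands(path="Hands.csv"):
        """Load one episode's hand-tracking frames and add readable labels."""
        frames = pd.read_csv(path)
        frames["hand"] = frames["HandID"].map(HAND)
        frames["finger"] = frames["FingerID"].map(FINGER)
        frames["bone"] = frames["BoneID"].map(BONE)
        return frames

    if __name__ == "__main__":
        df = load_hands()
        # e.g. all frames of the distal bone of the right index finger
        print(df[(df.hand == "right") & (df.finger == "index") & (df.bone == "distal")].head())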

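For the ELAN annotation files, the sketch below shows one way to read the time-aligned tiers, assuming the third-party pympi-ling package; the file name and tier name used are placeholders, since the actual tier names are not listed here and should be checked with get_tier_names().

    # Minimal sketch: read the time-aligned annotation tiers from one episode's
    # ELAN (.eaf) file using the pympi-ling package. The file name and tier
    # name below are placeholders; list the real tiers with get_tier_names().
    from pympi.Elan import Eaf

    eaf = Eaf("episode01.eaf")  # hypothetical file name
    print(eaf.get_tier_names())

    # Each entry on a tier is a (start_ms, end_ms, value) triple.
    for start, end, value in eaf.get_annotation_data_for_tier("transcription"):
        print(f"{start:>8} {end:>8}  {value}")
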
Acknowledgments

Corpus creation and annotation were supported by the WWTF project RALLI and the CHIST-ERA HLU project ATLANTIS. The dataset was recorded at ACIN, TUW.