The Charades Activity Challenge aims at automatic understanding of daily activities by providing realistic videos of people performing everyday tasks. The Charades dataset offers a unique insight into daily activities such as drinking coffee, putting on shoes while sitting in a chair, or snuggling with a blanket on the couch while watching something on a laptop. This enables computer vision algorithms to learn from real, diverse examples of everyday dynamic scenarios. The challenge consists of two separate tracks: classification and localization. The classification track ('Activity Classification') asks participants to recognize all activity categories in a given video, where multiple overlapping activities can occur. The localization track ('Activity Localization') asks participants to find the temporal locations of all activities in a video.
The test data has been released, and the evaluation server, hosted on Codalab, can be found here.
A prize of $3,000 will be awarded separately to the highest-scoring entrant in each of the classification and localization tracks.
To submit results, participants should:
- 1) Sign up for a Codalab account by clicking the Sign Up button on the competition page linked above.
- 2) Click on the Participate tab, agree to the terms and conditions, and click register.
- 3) If needed, navigate to the Participate tab and click on the Get Data side tab to download the train and dev sets.
- 4) Navigate to the Learn the Details tab and click on the Evaluation tab in the sidebar to read about the submission format required.
- 5) After your request is approved, navigate to the Participate tab and then click on the Submit/View Results tab in the sidebar.
- 6) Click the submit button and upload a results file.
- 7) After your submission is scored you will have a chance to review it before posting it to the leaderboard.
- 8) If you have questions, please ask them in the Charades competition forum (located under the Forum tab).
The training and validation sets for this challenge come directly from the Charades dataset (http://allenai.org/plato/charades), comprising 9,848 videos with 66,500 temporal annotations across 157 action classes. A new, unseen test set of 2,000 videos will be released closer to the submission deadline. The following components are made available:
Training Set: 7,985 videos of average length 30.1 seconds, with rich annotations for 157 activities including temporal boundaries, objects, and verbs.
Validation Set: 1,863 videos with the same level of annotation detail.
Test Set: 2,000 videos with withheld ground truth, made available closer to the submission deadline.
The videos in this challenge contain on average 6.8 actions per video and were created by hundreds of people in their own homes.
VU17_Charades.zip (Annotations and evaluation scripts)
Training and Validation videos (scaled to 480p, 13 GB)
Training and Validation videos (original size) (55 GB)
Training and Validation videos as RGB frames at 24fps (76 GB)
Training and Validation videos as Optical Flow at 24fps (45 GB)
Training and Validation videos as Two-Stream FC7 features (RGB stream, 12 GB)
Training and Validation videos as Two-Stream FC7 features (Flow stream, 15 GB)
Code for baseline algorithms @ GitHub
VU2017 Test Set Videos (List of test videos)
Testing Videos (scaled to 480p, 2 GB)
Testing Videos (original size) (13 GB)
Testing Videos as RGB frames at 24fps (15 GB)
Testing Videos as Optical Flow at 24fps (10 GB)
Testing Videos as Two-Stream FC7 features (RGB stream, 2.6 GB)
Testing Videos as Two-Stream FC7 features (Flow stream, 3.2 GB)
The tentative dates for the competition are:
| Milestone | Date |
| --- | --- |
| Test set released | June 6th |
| Final submission | July 15th (11:59 PM GMT) |
| Winners announced | July 20th |
| Conference workshop | July 26th |
The VU2017 Charades challenge has the following two tasks:
Multi-label Video Classification Task
This task accepts submissions that use any available training data to train a model that predicts a score for each of the 157 classes in every video. Performance is evaluated with mean average precision (mAP) across the videos.
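As an illustrative sketch of this kind of evaluation — one common convention computes average precision per class over the video-level scores and then averages over the 157 classes; the official evaluation scripts shipped in VU17_Charades.zip are authoritative, and this is only an assumed implementation:

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class. scores: (num_videos,) predicted scores;
    labels: (num_videos,) binary ground truth."""
    order = np.argsort(-scores)          # rank videos by descending score
    labels = labels[order]
    if labels.sum() == 0:                # no positives for this class
        return 0.0
    cum_pos = np.cumsum(labels)
    precision = cum_pos / (np.arange(len(labels)) + 1)
    # AP = mean of precision values at the ranks of true positives
    return float((precision * labels).sum() / labels.sum())

def mean_average_precision(score_matrix, label_matrix):
    """mAP over classes. Both matrices: (num_videos, num_classes)."""
    num_classes = score_matrix.shape[1]
    aps = [average_precision(score_matrix[:, c], label_matrix[:, c])
           for c in range(num_classes)]
    return float(np.mean(aps))
```

A submission would thus be scored by assembling a (num_videos, 157) score matrix and comparing it against the multi-label ground truth.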
Multi-label Action Localization Task
This task accepts submissions that use any available training data to train a model that predicts a score for each of the 157 classes at 25 equally spaced time-points in every video. Performance is evaluated with mean average precision (mAP) across all time-points in all the videos.
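The per-time-point ground truth could be derived from the temporal annotations along these lines. This is a sketch under assumed conventions (e.g. that the 25 points span the full video duration inclusively, and that a point is positive for a class when it falls inside an annotated interval); the official evaluation code in VU17_Charades.zip defines the exact protocol:

```python
import numpy as np

def localization_labels(annotations, duration, num_classes=157, k=25):
    """Binary label matrix of shape (k, num_classes) for one video.

    annotations: list of (class_id, start_sec, end_sec) tuples
    duration:    video length in seconds
    """
    times = np.linspace(0.0, duration, k)        # 25 equally spaced points
    labels = np.zeros((k, num_classes), dtype=int)
    for cls, start, end in annotations:
        # a time-point is positive if it lies inside the annotated interval
        labels[(times >= start) & (times <= end), cls] = 1
    return labels
```

Predictions at the same 25 time-points can then be flattened across videos and scored with the same per-class average precision used for classification.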
For any questions, please contact email@example.com.
Carnegie Mellon University
Allen Institute for AI