THOR Challenge

Navigate and find objects in a virtual environment

For the full release of AI2-THOR, please refer to this link.

In this challenge, the task is to navigate toward a set of objects in indoor scenes using only visual input. The challenge is built on AI2-THOR, a framework that enables navigation in a set of near-photo-realistic indoor scenes. Each participant's agent receives visual input from THOR and must choose an action based on that input.

Competition Conclusion

A team from National Tsing Hua University (NTHU) won the THOR challenge with a strong showing on the discrete navigation track.

The NTHU team will present their work at the Workshop on Visual Understanding Across Modalities at CVPR, 2:00-2:30 pm on July 26. Everyone is encouraged to attend!

Discrete Navigation Leaderboard

Rank	Entrant	Score
1	JamesChuang	0.04248
2	phunghx	0.01651
3	random	0.00001

Test Data

The test data for the THOR challenge is now available. To use it, download the test target definition file and the THOR build for your platform with the commands below. Installation is otherwise unchanged from the Installation section; simply substitute the test THOR build wherever the original build appears.

wget https://s3-us-west-2.amazonaws.com/ai2-vision-robosims/builds/thor-challenge-targets-test.zip
wget https://s3-us-west-2.amazonaws.com/ai2-vision-robosims/builds/thor-201706291201-OSXIntel64.zip
wget https://s3-us-west-2.amazonaws.com/ai2-vision-robosims/builds/thor-201706291201-Linux64.zip

The instructions for running the robosims framework are likewise unchanged. You need only replace the training targets archive with thor-challenge-targets-test.zip and point unity_path at the appropriate OSX or Linux test build listed above.

Tasks

The task is to navigate and find objects using a set of predefined actions. The agent is placed at a random location within the scene and given a target image (e.g. an image of an apple). The action set is: MoveAhead, MoveRight, MoveLeft, MoveBack, LookUp, LookDown, RotateRight, RotateLeft. The agent must not only reach the closest possible point to the object but also be looking at it. The challenge release includes 30 scenes (15 kitchens and 15 living rooms), of which 20 are training scenes and 10 are validation scenes. We will release the test scenes later.
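
As a rough illustration of the action interface, the sketch below samples uniformly from the eight discrete actions and wraps each name in the dict(action=...) form that the robosims examples in this document pass to env.step. The helper name random_action and the seeded random.Random instance are assumptions made for illustration; they are not part of robosims.

```python
import random

# The eight discrete actions listed in the task description.
ACTIONS = [
    "MoveAhead", "MoveRight", "MoveLeft", "MoveBack",
    "LookUp", "LookDown", "RotateRight", "RotateLeft",
]


def random_action(rng):
    """Return an action dict for a uniformly sampled discrete action (hypothetical helper)."""
    return dict(action=rng.choice(ACTIONS))


rng = random.Random(0)  # seeded so repeated runs sample the same sequence
sampled = [random_action(rng) for _ in range(5)]
```

Each element of sampled has the same shape as the action argument used in the examples, so a baseline of this kind could be dropped into the step loop unchanged.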

[Figure: example target image, agent start position, and agent target position]

Competition

The evaluation server is up and running! It is hosted by CodaLab and can be found here.

A separate prize of $3,000 will be awarded to the highest-scoring entrant in each of the two subtasks, discrete navigation and continuous navigation.

To submit results, participants should:

1) Sign up for a CodaLab account by clicking the Sign Up button on the competition page linked above.
2) Click on the Participate tab, agree to the terms and conditions, and click register.
3) If needed, navigate to the Participate tab and click on the Get Data side tab to download the train and dev sets.
4) Navigate to the Learn the Details tab and click on the Evaluation tab in the sidebar to read about the submission format required.
5) After your request is approved, navigate to the Participate tab and then click on the Submit/View Results tab in the sidebar.
6) Click the submit button and upload a results file.
7) After your submission is scored you will have a chance to review it before posting it to the leaderboard.
8) If you have questions, please ask them in the THOR competition forum (located under the Forum tab).

Dates

The tentative dates for the competition are:

Test set is released	June 26th (tentative)
Final submission	July 15th (11:59PM GMT)
Winners announced	July 20th
Conference workshop	July 26th

Installation

The framework for the challenge consists of three parts:

THOR: Game engine that hosts the scenes (Linux/OSX)
Robosims: Python framework which interacts with the game engine
Targets: zip file containing a JSON targets file as well as target images

OSX Installation

pip install https://s3-us-west-2.amazonaws.com/ai2-vision-robosims/builds/robosims-0.0.9-py2.py3-none-any.whl
wget https://s3-us-west-2.amazonaws.com/ai2-vision-robosims/builds/thor-201705011400-OSXIntel64.zip
wget https://s3-us-west-2.amazonaws.com/ai2-vision-robosims/builds/thor-challenge-targets.zip
unzip thor-challenge-targets.zip
unzip thor-201705011400-OSXIntel64.zip

Linux Installation

To run on Linux, you must have a running X server with the GLX module enabled.
pip install https://s3-us-west-2.amazonaws.com/ai2-vision-robosims/builds/robosims-0.0.9-py2.py3-none-any.whl
wget https://s3-us-west-2.amazonaws.com/ai2-vision-robosims/builds/thor-201705011400-Linux64.zip
wget https://s3-us-west-2.amazonaws.com/ai2-vision-robosims/builds/thor-challenge-targets.zip
unzip thor-challenge-targets.zip
unzip thor-201705011400-Linux64.zip

Robosims Example - Training


#!/usr/bin/env python
import json

import robosims

env = robosims.controller.ChallengeController(
    # On OSX, use unity_path='thor-201705011400-OSXIntel64.app/Contents/MacOS/thor-201705011400-OSXIntel64'
    unity_path='projects/thor-201705011400-Linux64',
    # Ignored on OSX; on Linux, set this to the appropriate X display
    x_display="0.0"
)
env.start()

# Load the list of training targets
with open("thor-challenge-targets/targets-train.json") as f:
    targets = json.load(f)

for target in targets:
    env.initialize_target(target)
    # Possible actions are: MoveLeft, MoveRight, MoveAhead, MoveBack, LookUp, LookDown, RotateRight, RotateLeft
    event = env.step(action=dict(action='MoveLeft'))

    # Path to the target image (e.g. apple, lettuce, keychain, etc.)
    print(target['targetImage'])

    # target_found() returns True if the target has been found, False otherwise
    print(env.target_found())

    # Current frame from the agent: numpy array of shape (300, 300, 3) in RGB order
    image = event.frame

    # True if the last action succeeded, False otherwise. Actions fail if you attempt
    # to move into a wall or LookUp/LookDown beyond the allowed range.
    print(event.metadata['lastActionSuccess'])

The structure of the target definition:

    {
        "agentPositionIndex": 102,  # corresponds to a point on the grid
        "sceneIndex": 0,  # which configuration of the scene (0-4)
        "sceneName": "FloorPlan1",  # the name of the scene
        "startingHorizon": 330.0,  # starting horizon (up/down angle) of the agent's camera, in degrees
        "startingRotation": 90.0,  # global rotation of the agent
        "targetAgentHorizon": 60.0,  # where the agent should be looking to see the target object
        "targetAgentRotation": 180.0,  # how the agent should be rotated to see the target object
        "targetImage": "images/FloorPlan1/spoon.png",  # path to the image file that shows the target
        "targetObjectId": "Spoon|-01.73|+00.90|-00.95",  # internal id of the target object
        "targetPosition": {
            "x": -0.6949999928474426,
            "y": 0.9800000190734863,
            "z": -1.7799999713897705
        },  # final position from which to see the target object
        "uuid": "a27cb81c-5f68-4f90-80e8-e01e49ab188d"  # unique identifier for the task
    },
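
The # annotations above are explanatory only. Standard JSON has no comment syntax, so the sketch below assumes the real targets file omits them and holds a list of such objects. It parses one target (values copied from the example above) and pulls out the fields an agent would most likely need. The summarize_target helper and its output keys are hypothetical.

```python
import json

# One target definition, copied from the example above without the annotations.
TARGET_JSON = """
[{
    "agentPositionIndex": 102,
    "sceneIndex": 0,
    "sceneName": "FloorPlan1",
    "startingHorizon": 330.0,
    "startingRotation": 90.0,
    "targetAgentHorizon": 60.0,
    "targetAgentRotation": 180.0,
    "targetImage": "images/FloorPlan1/spoon.png",
    "targetObjectId": "Spoon|-01.73|+00.90|-00.95",
    "targetPosition": {
        "x": -0.6949999928474426,
        "y": 0.9800000190734863,
        "z": -1.7799999713897705
    },
    "uuid": "a27cb81c-5f68-4f90-80e8-e01e49ab188d"
}]
"""


def summarize_target(target):
    """Collect the scene, target image, and goal pose of one task (hypothetical helper)."""
    pos = target["targetPosition"]
    return {
        "scene": (target["sceneName"], target["sceneIndex"]),
        "image": target["targetImage"],
        "goal_xyz": (pos["x"], pos["y"], pos["z"]),
        "goal_rotation": target["targetAgentRotation"],
        "goal_horizon": target["targetAgentHorizon"],
    }


targets = json.loads(TARGET_JSON)
summary = summarize_target(targets[0])
```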

Challenge Submission

Participants should submit a file containing the actions their agent took. The code below shows how to use the API to record them. When running against the validation scenes, the target_found() method will never return True, so the agent must decide for itself when it has reached the goal and stop issuing actions. Once the agent has iterated through all the targets defined in the targets file, save the actions to a JSON file and submit it to the CodaLab challenge.


#!/usr/bin/env python
import json

import robosims

env = robosims.controller.ChallengeController(
    # On OSX, use:
    # unity_path='thor-201705011400-OSXIntel64.app/Contents/MacOS/thor-201705011400-OSXIntel64',
    unity_path='projects/thor-201705011400-Linux64',
    x_display="0.0",
    # Record every action so it can be saved for submission
    record_actions=True
)
env.start()

# Load the list of validation targets
with open("thor-challenge-targets/targets-val.json") as f:
    targets = json.load(f)

for target in targets:
    env.initialize_target(target)
    # Navigate to the target
    event = env.step(action=dict(action='MoveLeft'))

# Write the recorded actions for all targets to the submission file
env.save_actions('submission-discrete.json')
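
Because target_found() gives no signal on the validation scenes, the agent needs its own stopping rule. The sketch below shows one possible shape for that: a per-target loop that ends when a policy returns None or a step budget runs out. The names run_episode, make_fixed_policy, max_steps, and the _StubEnv stand-in (included only so the sketch runs without THOR) are all assumptions; only initialize_target and step(action=...) come from the examples above.

```python
def run_episode(env, target, policy, max_steps=50):
    """Run one task until the policy returns None or the step budget is spent."""
    env.initialize_target(target)
    event = None
    steps = 0
    while steps < max_steps:
        action_name = policy(event)  # policy sees the latest event, or None at the start
        if action_name is None:      # the agent decides it has reached the goal
            break
        event = env.step(action=dict(action=action_name))
        steps += 1
    return steps


class _StubEnv(object):
    """Stand-in for ChallengeController so the sketch runs without THOR (not part of robosims)."""

    def __init__(self):
        self.actions = []

    def initialize_target(self, target):
        self.actions.append([])  # start a fresh action list for this task

    def step(self, action):
        self.actions[-1].append(action["action"])
        return {"lastActionSuccess": True}  # the real API returns an event object instead


def make_fixed_policy(plan):
    """Return a policy that replays a fixed list of action names, then stops."""
    remaining = list(plan)

    def policy(event):
        return remaining.pop(0) if remaining else None

    return policy


env = _StubEnv()
taken = run_episode(env, {"uuid": "demo"}, make_fixed_policy(["RotateRight", "MoveAhead"]))
```

In a real agent the policy would inspect event.frame and the target image rather than replay a fixed plan; the step budget is simply a guard against episodes that never terminate.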

Continuous vs Discrete Track

The challenge has two settings: discrete and continuous. By default, the framework operates in discrete mode, in which the agent navigates on a predefined grid, can look up and down only in 30-degree increments, and can rotate only in 90-degree increments. If an action would push the agent into a wall, the agent remains on the grid and the lastActionSuccess value is set to False. In continuous mode, the agent is free to rotate, look up/down, and move by any amount. This allows the agent to collide with walls, reach an object in fewer steps, and rotate with greater precision.
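
To make the discrete increments concrete, the sketch below counts the minimum number of 90-degree rotations and 30-degree look actions separating a start pose from a target pose. It assumes that angles wrap around at 360 degrees (suggested by values such as 330.0 in the target definition) and that the two angles differ by a whole number of increments. The function name is hypothetical, and this is not necessarily how the challenge computes its optimal path.

```python
def min_discrete_steps(start_deg, target_deg, increment_deg):
    """Fewest fixed-size turns between two angles, assuming wrap-around at 360 degrees."""
    diff = (target_deg - start_deg) % 360.0
    shortest = min(diff, 360.0 - diff)  # turn whichever way is shorter
    steps, remainder = divmod(shortest, increment_deg)
    if remainder != 0:
        raise ValueError("angles are not separated by a whole number of increments")
    return int(steps)


# Values from the example target definition: rotation 90 -> 180, horizon 330 -> 60.
rotation_steps = min_discrete_steps(90.0, 180.0, 90.0)
look_steps = min_discrete_steps(330.0, 60.0, 30.0)
```

The count says nothing about which action (RotateLeft or RotateRight, LookUp or LookDown) moves an angle in which direction; that mapping would have to be checked against the simulator.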


# to enable continuous mode, pass the mode to the constructor
env = robosims.controller.ChallengeController(
    # Use for OSX
    # unity_path='thor-201705011400-OSXIntel64.app/Contents/MacOS/thor-201705011400-OSXIntel64',
    unity_path='projects/thor-201705011400-Linux64',
    x_display="0.0",
    mode='continuous'
)

Continuous actions:


# All Move actions must include the moveMagnitude parameter
event = env.step(action=dict(action='MoveLeft', moveMagnitude=0.25))

# Rotate the agent to a specific rotation in global space
event = env.step(action=dict(action='Rotate', rotation=75.0))

# Move the horizon of the agent's camera (up/down) 
event = env.step(action=dict(action='Look', horizon=35.0))

# Save actions for submission
env.save_actions('submission-continuous.json')
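
Since each continuous action carries its own parameter, a small builder can catch a missing one before the action reaches the simulator. The sketch below is one way to do that. The helper continuous_action is hypothetical, and the assumption that MoveAhead, MoveBack, and MoveRight take moveMagnitude in the same way as MoveLeft is inferred from the comment above rather than confirmed.

```python
# Move action names are taken from the discrete action set (assumed to carry over).
MOVE_ACTIONS = {"MoveAhead", "MoveBack", "MoveLeft", "MoveRight"}
# Parameter shown for each non-move action in the examples above.
REQUIRED_PARAM = {"Rotate": "rotation", "Look": "horizon"}


def continuous_action(name, **params):
    """Build a continuous-mode action dict, checking the parameter each action needs."""
    key = "moveMagnitude" if name in MOVE_ACTIONS else REQUIRED_PARAM.get(name)
    if key is None:
        raise ValueError("unknown continuous action: %s" % name)
    if key not in params:
        raise ValueError("%s requires the %s parameter" % (name, key))
    return dict(action=name, **params)


move = continuous_action("MoveLeft", moveMagnitude=0.25)
turn = continuous_action("Rotate", rotation=75.0)
look = continuous_action("Look", horizon=35.0)
```

The resulting dicts have the same shape as the action arguments in the examples above, so they could be passed as env.step(action=move).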

Evaluation

The evaluation score is the average, over all N tasks, of the ratio between the optimal number of actions (o_t) and the number of actions in the predicted set (p_t):

\[ \text{score} = \frac{1}{N} \sum_{t=1}^{N} \frac{o_t}{p_t} \]

If the predicted set of actions does not place the agent in the correct location for a task, the score for that task is zero. The tasks include finding objects in different configurations of a scene from different starting locations.
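
As a worked example of the scoring rule described above, the sketch below averages o_t / p_t over tasks and gives zero credit to any task whose actions do not end at the correct location. The function name, its argument layout, and the made-up task values are assumptions for illustration; the official scoring code may differ in details such as how an empty action list is handled.

```python
def challenge_score(optimal_counts, predicted_counts, reached_goal):
    """Mean of o_t / p_t over tasks, with zero credit for tasks that miss the goal."""
    if not optimal_counts:
        raise ValueError("no tasks to score")
    total = 0.0
    for o_t, p_t, ok in zip(optimal_counts, predicted_counts, reached_goal):
        # Assumption: a task with no predicted actions earns no credit.
        if ok and p_t > 0:
            total += float(o_t) / p_t
    return total / len(optimal_counts)


# Three made-up tasks: an optimal path, a path twice the optimal length, and a miss.
score = challenge_score([4, 5, 6], [4, 10, 6], [True, True, False])
```

Here the three tasks contribute 1.0, 0.5, and 0.0, so the score is 0.5. A single missed task pulls the average down as much as an arbitrarily long path would.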

Organizers

Roozbeh Mottaghi

Allen Institute for AI

Eric Kolve

Allen Institute for AI

Daniel Gordon

University of Washington