
ReferIt3D Benchmarks

Intro

With the ReferIt3D benchmarks we wish to aggregate and report the progress happening in the emerging field of language-assisted understanding and learning in real-world 3D environments. To this end, we investigate the same questions posed in our ReferIt3D paper and compare methods that attempt to identify a single object among the many objects of a 3D scene, given appropriate referential language.

Specifically, we consider:
  • How well do such learning methods work when the input referential language is Natural, i.e., produced by humans speaking while solving the task (Nr3D challenge), vs. template-based, concerning only Spatial relations among the objects of a scene (Sr3D challenge)?
  • How are such methods affected when we vary the number of distracting instances of the target's class in the 3D scene? E.g., an "Easy" case, where exactly one such distractor co-exists with the target, vs. a "Hard" case, where more distractors are present.
  • Last, how do such methods perform when the input language is View-Dependent, e.g., "Facing the couch, pick the ... on your right side", vs. View-Independent, e.g., "It's the ... between the bed and the window"?
In a nutshell, these questions about the object-identification problem in 3D environments aim to disentangle the performance characteristics of the compared approaches, in addition to providing a single "aggregate" performance score, as explained in the ReferIt3D paper; a sketch of how these breakdowns can be computed follows.
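To make the breakdowns concrete, here is a minimal sketch of how per-split accuracies could be computed from a method's predictions. The record fields (n_distractors, utterance, correct) and the view-dependence cue list are illustrative assumptions, not the official ReferIt3D evaluation code, which derives the View-Dependent split from its own curated criteria.

```python
from typing import Iterable, Mapping

# Illustrative cues that often signal a speaker-relative viewpoint;
# NOT the official list used by the benchmark.
VIEW_DEPENDENT_CUES = {"facing", "left", "right", "front", "behind",
                       "looking", "across"}

def bucket(example: Mapping) -> tuple:
    """Assign one prediction record to its (difficulty, view) splits."""
    # "Easy" = exactly one same-class distractor; "Hard" = more than one.
    difficulty = "easy" if example["n_distractors"] == 1 else "hard"
    tokens = set(example["utterance"].lower().split())
    view = "view-dep" if tokens & VIEW_DEPENDENT_CUES else "view-indep"
    return difficulty, view

def per_split_accuracy(examples: Iterable[Mapping]) -> dict:
    """Fraction of correctly identified targets, per split and overall."""
    totals, hits = {}, {}
    for ex in examples:
        for key in bucket(ex) + ("overall",):
            totals[key] = totals.get(key, 0) + 1
            hits[key] = hits.get(key, 0) + int(ex["correct"])
    return {split: hits[split] / totals[split] for split in totals}

# Tiny usage example with two hypothetical prediction records:
records = [
    {"utterance": "the chair facing the window", "n_distractors": 1, "correct": True},
    {"utterance": "the chair between the beds", "n_distractors": 3, "correct": False},
]
print(per_split_accuracy(records))
# {'easy': 1.0, 'view-dep': 1.0, 'overall': 0.5, 'hard': 0.0, 'view-indep': 0.0}
```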

Rules

Please use our published datasets (Nr3D, Sr3D) and follow the official ScanNet train/val splits. Since these benchmarks tackle the identification problem among all objects in a scene (and not only among the same-class distractors), when using Nr3D make sure to use only the utterances where the target class is explicitly mentioned (mentions_target_class=True) and which were guessed correctly by the human listener (correct_guess=True).
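For concreteness, here is a minimal sketch of this filtering with pandas. It assumes the dataset ships as a CSV (the file name below is hypothetical) whose mentions_target_class and correct_guess columns load as booleans:

```python
import pandas as pd

# Hypothetical path; point this at the released Nr3D data.
nr3d = pd.read_csv("nr3d.csv")

# Keep only utterances that explicitly name the target class and that the
# human listener resolved correctly, per the rule above.
eval_set = nr3d[nr3d["mentions_target_class"] & nr3d["correct_guess"]]
print(f"Kept {len(eval_set)} of {len(nr3d)} utterances for evaluation.")
```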

Nr3D Challenge

Paper              Overall  Easy   Hard   View-Dependent  View-Independent
ReferIt3D          35.6%    43.6%  27.9%  32.5%           37.1%
FFL-3DOG           41.7%    48.2%  35.0%  37.1%           44.7%
Text-Guided-GNNs   37.3%    44.2%  30.6%  35.8%           38.0%
InstanceRefer      38.8%    46.0%  31.8%  34.5%           41.9%
TransRefer3D       42.1%    48.5%  36.0%  36.5%           44.9%
SAT                49.2%    56.3%  42.4%  46.9%           50.4%

Sr3D Challenge

Paper              Overall  Easy   Hard   View-Dependent  View-Independent
ReferIt3D          40.8%    44.7%  31.5%  39.2%           40.8%
Text-Guided-GNNs   45.0%    48.5%  36.9%  45.8%           45.0%
InstanceRefer      48.0%    51.1%  40.5%  45.4%           48.1%
TransRefer3D       57.4%    60.5%  50.2%  49.9%           57.7%
SAT                57.9%    61.2%  50.0%  49.2%           58.3%

Reporting new results

If you have new results on Sr3D or Nr3D to report, please send your performance numbers and a link to the accompanying paper to Panos Achlioptas.