ReferIt3D Benchmarks


With the ReferIt3D benchmarks, we aim to aggregate and report progress in the emerging field of language-assisted understanding and learning in real-world 3D environments. To this end, we investigate the same questions posed in our ReferIt3D paper and compare methods that attempt to identify a single object among the many objects of a 3D scene, given appropriate referential language.

Specifically, we consider:
  • How well do such learning methods work when the input referential language is Natural, as produced by humans speaking to solve the task (Nr3D challenge), vs. template-based and concerning only Spatial relations among the objects of a scene (Sr3D challenge)?
  • How are such methods affected when we vary the number of distracting instances of the same class as the target in the 3D scene? E.g., in an "Easy" case, exactly 1 such distractor co-exists with the target, whereas in a "Hard" case there are more distractors.
  • Last, how do such methods perform when the input language is View-Dependent, e.g., "Facing the couch, pick the ... on your right side", vs. View-Independent, e.g., "It's the ... between the bed and the window"?
In a nutshell, these questions about the object identification problem in 3D environments aim to disentangle the performance characteristics of the compared approaches, in addition to providing a single "aggregate" performance score, as explained in the ReferIt3D paper.
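As a rough illustration of how the breakdown above could be computed, the following sketch derives the overall, Easy/Hard, and View-Dependent/View-Independent accuracies from per-utterance predictions. The record keys (`correct`, `n_distractors`, `view_dependent`) and the function name are hypothetical and chosen only for this example; they are not the column names of the released datasets or of any official evaluation script.

```python
from collections import defaultdict


def breakdown_accuracy(records):
    """Return accuracy per evaluation subset.

    `records` is an iterable of dicts with hypothetical keys:
      'correct'        -> bool, whether the method picked the target object
      'n_distractors'  -> int, same-class distractors in the scene (assumed >= 1)
      'view_dependent' -> bool, whether the utterance is view-dependent
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        # Each utterance counts toward the overall score and exactly one
        # subset of each split (Easy vs. Hard, View-Dep. vs. View-Indep.).
        subsets = [
            "overall",
            "easy" if r["n_distractors"] == 1 else "hard",
            "view_dependent" if r["view_dependent"] else "view_independent",
        ]
        for s in subsets:
            totals[s] += 1
            hits[s] += int(r["correct"])
    return {s: hits[s] / totals[s] for s in totals}
```

Reporting all five numbers side by side is what allows two methods with similar overall scores to be told apart, for instance by their behavior on Hard or View-Dependent cases.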


Please use our published datasets (Nr3D, Sr3D), following the official ScanNet train/val splits. Since in these benchmarks we tackle the identification problem among all objects in a scene (and not only among the same-class distractors), when using Nr3D make sure to use only the utterances where the target class is explicitly mentioned (mentions_target_class=True) and which were guessed correctly by the human listener (correct_guess=True).
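A minimal sketch of this filtering step is given below. It assumes the Nr3D utterances have been loaded as rows with `mentions_target_class` and `correct_guess` fields (for example via `csv.DictReader`, where the values arrive as the strings "True"/"False"); the helper names and the `utterance` column in the toy data are illustrative assumptions, not part of an official loader.

```python
import csv
import io


def _is_true(value):
    # Accept real booleans as well as the strings a CSV reader would yield.
    return str(value).strip().lower() == "true"


def filter_nr3d(rows):
    """Keep only rows where the target class is mentioned AND the
    human listener guessed the target correctly."""
    return [
        r for r in rows
        if _is_true(r["mentions_target_class"]) and _is_true(r["correct_guess"])
    ]


# Toy data standing in for the real file (contents are made up for illustration).
sample = io.StringIO(
    "utterance,mentions_target_class,correct_guess\n"
    "the chair by the door,True,True\n"
    "the one by the door,False,True\n"
    "the chair near the desk,True,False\n"
)
kept = filter_nr3d(csv.DictReader(sample))
print(len(kept))  # number of utterances that pass both filters
```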

Nr3D Challenge

Paper             Overall  Easy   Hard   View-Dependent  View-Independent
ReferIt3D         35.6%    43.6%  27.9%  32.5%           37.1%
FFL-3DOG          41.7%    48.2%  35.0%  37.1%           44.7%
Text-Guided-GNNs  37.3%    44.2%  30.6%  35.8%           38.0%
InstanceRefer     38.8%    46.0%  31.8%  34.5%           41.9%
TransRefer3D      42.1%    48.5%  36.0%  36.5%           44.9%
SAT               49.2%    56.3%  42.4%  46.9%           50.4%

Sr3D Challenge

Paper             Overall  Easy   Hard   View-Dependent  View-Independent
ReferIt3D         40.8%    44.7%  31.5%  39.2%           40.8%
Text-Guided-GNNs  45.0%    48.5%  36.9%  45.8%           45.0%
InstanceRefer     48.0%    51.1%  40.5%  45.4%           48.1%
TransRefer3D      57.4%    60.5%  50.2%  49.9%           57.7%
SAT               57.9%    61.2%  50.0%  49.2%           58.3%

Reporting new results

If you have new results on Sr3D or Nr3D to report, please send your performance numbers and the accompanying paper link to Panos Achlioptas.