They're talking about determining the origin of a sound. They're saying that if a sound originates from anywhere on the surface of the cone, the timing difference between your two ears is the same, so that cue alone can't tell you where on the cone the sound came from. Humans actually have decent vertical sound localization, but just like dogs do when they're trying to work out how high up a sound is, we can also face the approximate origin and tilt our heads to improve accuracy. Tilting puts one ear higher than the other, so the brain can use the difference in what the left and right ears hear to help pinpoint the sound's elevation.
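To make the head-tilt point concrete, here is a rough sketch, not a model of real hearing. It treats the ears as two points and looks only at arrival-time differences. The ear spacing, the speed of sound, the axis convention, and the name `itd_with_roll` are all assumptions made up for the illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
EAR_OFFSET = 0.09       # m, assumed half-distance between the ears


def itd_with_roll(source, roll):
    """Left-minus-right arrival time in seconds with the head rolled by `roll` radians.

    Assumed axes: x to the listener's right, y forward, z up.
    A positive roll raises the right ear and lowers the left ear.
    """
    right = (EAR_OFFSET * math.cos(roll), 0.0, EAR_OFFSET * math.sin(roll))
    left = (-right[0], 0.0, -right[2])
    return (math.dist(source, left) - math.dist(source, right)) / SPEED_OF_SOUND


# Two sources at the same distance: one straight ahead, one 45 degrees up.
ahead = (0.0, 2.0, 0.0)
above = (0.0, math.sqrt(2.0), math.sqrt(2.0))

tilt = math.radians(20)
for name, src in (("ahead", ahead), ("above", above)):
    # With the head level both sources give the same (zero) time difference;
    # with the head tilted only the elevated source produces a difference.
    print(name, itd_with_roll(src, 0.0), itd_with_roll(src, tilt))
```

With the head level the two sources are indistinguishable by timing alone. After the tilt, the elevated source reaches the raised ear first while the source straight ahead still arrives at both ears together, which is the extra information the tilt buys you.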
It refers to the ability to locate the source of a sound. The model assumes that localization is achieved only through the difference in the sound's time of arrival at the two ears. That's why the curve is a hyperbola: it is the set of points whose distances to the two foci (the ears) differ by the same amount, so you can't tell which of the points on the hyperbola is the actual source (hence "confusion"). But this is too simplistic; the auditory system is much more sophisticated, and the source can be localized by other means.
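A small numeric check of the hyperbola argument, as a sketch under simplifying assumptions: the ears are two points on the x axis, sound travels in straight lines, and only arrival time matters. The constants and the helper names (`itd`, `hyperbola_point`, `rotate_about_ear_axis`) are hypothetical, chosen just for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature
EAR_OFFSET = 0.09       # m, assumed half-distance between the ears

# Assumed axes: x runs through both ears, y points forward, z points up.
LEFT_EAR = (-EAR_OFFSET, 0.0, 0.0)
RIGHT_EAR = (EAR_OFFSET, 0.0, 0.0)


def itd(source):
    """Interaural time difference in seconds (left arrival minus right arrival)."""
    return (math.dist(source, LEFT_EAR) - math.dist(source, RIGHT_EAR)) / SPEED_OF_SOUND


def hyperbola_point(path_diff, t):
    """Point on the hyperbola with the ears as foci and the given path difference.

    Uses the standard parametrization (a*cosh t, b*sinh t) with a = path_diff / 2
    and b^2 = EAR_OFFSET^2 - a^2. Requires 0 < path_diff < 2 * EAR_OFFSET.
    """
    a = path_diff / 2.0
    b = math.sqrt(EAR_OFFSET ** 2 - a ** 2)
    return (a * math.cosh(t), b * math.sinh(t), 0.0)


def rotate_about_ear_axis(point, angle):
    """Rotate a point about the x axis (the line through both ears)."""
    x, y, z = point
    return (x,
            y * math.cos(angle) - z * math.sin(angle),
            y * math.sin(angle) + z * math.cos(angle))


# Three points at different distances along one hyperbola in the horizontal plane.
candidates = [hyperbola_point(0.10, t) for t in (0.5, 2.0, 3.5)]
# Spinning a point about the ear axis keeps both ear distances unchanged.
candidates.append(rotate_about_ear_axis(candidates[1], math.radians(90)))   # overhead
candidates.append(rotate_about_ear_axis(candidates[1], math.radians(180)))  # behind

for p in candidates:
    print([round(v, 3) for v in p], round(itd(p) * 1e6, 1), "microseconds")
```

Every candidate yields the same time difference even though the points sit near, far, in front, overhead, and behind. Spinning the hyperbola about the ear axis sweeps out the full surface, which far from the head looks like a cone.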