RGB-D and Point Clouds in visionOS

Dear all,

We are building an XR application demonstrating our research on open-vocabulary 3D instance segmentation for assistive technology. We intend to bring it to visionOS using the new Enterprise APIs. Our method was trained on datasets resembling ScanNet, which contain the following:

  • localized (1) RGB camera frames (2) with per-pixel depth (3) and camera intrinsics (4)
  • point clouds (5)

I understand that we can query (1), (2), and (4) from the CameraFrameProvider; a sketch of what we do today follows below. As for (3) and (5), it is unclear to me if/how we can obtain that data.
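For reference, this is roughly how we currently pull (1), (2), and (4). A minimal sketch, assuming the Enterprise main-camera-access entitlement is in place; function name and error handling are simplified from our actual pipeline:

```swift
import ARKit

// Sketch (visionOS Enterprise APIs): stream main-camera frames and read out
// the RGB image, intrinsics, and extrinsics of each sample.
func streamMainCamera() async throws {
    let session = ARKitSession()
    let provider = CameraFrameProvider()

    // Pick a supported video format for the left main camera.
    guard let format = CameraVideoFormat
        .supportedVideoFormats(for: .main, cameraPositions: [.left])
        .first else { return }

    try await session.run([provider])

    guard let updates = provider.cameraFrameUpdates(for: format) else { return }
    for await frame in updates {
        guard let sample = frame.sample(for: .left) else { continue }
        let rgb        = sample.pixelBuffer            // (2) RGB frame
        let intrinsics = sample.parameters.intrinsics  // (4) 3x3 K matrix
        let extrinsics = sample.parameters.extrinsics  // (1) camera pose
        _ = (rgb, intrinsics, extrinsics)              // hand off to segmentation
    }
}
```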

In handheld ARKit, this example project demonstrates how depthMap can be used to reconstruct raw point clouds, essentially the unprojection sketched below. However, that property doesn't seem to be available in visionOS.
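For context, the handheld approach boils down to unprojecting each depthMap pixel through the camera intrinsics. A sketch of that on iOS, assuming sceneDepth is enabled on the session:

```swift
import ARKit
import simd

// iOS-side sketch: unproject ARFrame.sceneDepth.depthMap into camera-space
// points. Intrinsics are given at RGB resolution, so rescale them first.
func pointCloud(from frame: ARFrame) -> [SIMD3<Float>] {
    guard let depthMap = frame.sceneDepth?.depthMap else { return [] }

    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }
    guard let baseAddress = CVPixelBufferGetBaseAddress(depthMap) else { return [] }

    let width  = CVPixelBufferGetWidth(depthMap)
    let height = CVPixelBufferGetHeight(depthMap)
    let stride = CVPixelBufferGetBytesPerRow(depthMap) / MemoryLayout<Float32>.size
    let depth  = baseAddress.assumingMemoryBound(to: Float32.self)

    // simd matrices are column-major: fx = K[0][0], fy = K[1][1],
    // cx = K[2][0], cy = K[2][1].
    var K = frame.camera.intrinsics
    let sx = Float(width)  / Float(frame.camera.imageResolution.width)
    let sy = Float(height) / Float(frame.camera.imageResolution.height)
    K[0][0] *= sx; K[2][0] *= sx
    K[1][1] *= sy; K[2][1] *= sy

    var points: [SIMD3<Float>] = []
    for v in 0..<height {
        for u in 0..<width {
            let z = depth[v * stride + u]
            guard z > 0, z.isFinite else { continue }
            let x = (Float(u) - K[2][0]) * z / K[0][0]
            let y = (Float(v) - K[2][1]) * z / K[1][1]
            points.append(SIMD3<Float>(x, y, z))
        }
    }
    return points
}
```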

Is there some way for us to obtain depth data associated with camera frames?

"Faking" depth data from the SceneReconstructionProvider-generated meshes is too coarse for our method. I hope I'm just missing some detail and there's some way to configure CameraFrameProvider to also deliver depth and/or point clouds.

Thanks for any help or pointers in the right direction!

~ Alex
