Synchronized depth and video data not being received with builtInLiDARDepthCamera

Hello,

I'm facing a really perplexing issue. The primary problem is that sometimes I get depth and video data as expected, but at other times I don't. Sometimes I'll get both data outputs for 4-5 frames and then it just stops. The source code I implemented is a modified version of the sample code provided by Apple, and interestingly enough I can't reproduce this issue with the Apple sample app. So I'm wondering what I could be doing wrong.

Here's the code for setting up the capture input. preferredWidthResolution is 1280 in my case. I'm running this on an iPad Pro (6th gen) with iOS 17.0.3 (21A360). I encounter this issue on an iPhone 13 Pro as well, running iOS 17.0 (21A329).

private func setupLiDARCaptureInput() throws {
        // Look up the LiDAR camera.
        guard let device = AVCaptureDevice.default(.builtInLiDARDepthCamera, for: .video, position: .back) else {
            throw ConfigurationError.lidarDeviceUnavailable
        }
        
        guard let format = (device.formats.last { format in
            format.formatDescription.dimensions.width == preferredWidthResolution &&
            format.formatDescription.mediaSubType.rawValue == kCVPixelFormatType_420YpCbCr8BiPlanarFullRange &&
            format.videoSupportedFrameRateRanges.first(where: {$0.maxFrameRate >= 60}) != nil &&
            !format.isVideoBinned &&
            !format.supportedDepthDataFormats.isEmpty
        }) else {
            throw ConfigurationError.requiredFormatUnavailable
        }
        
        guard let depthFormat = (format.supportedDepthDataFormats.last { depthFormat in
            depthFormat.formatDescription.mediaSubType.rawValue == kCVPixelFormatType_DepthFloat16
        }) else {
            throw ConfigurationError.requiredFormatUnavailable
        }
        
        // Begin the device configuration.
        try device.lockForConfiguration()

        // Configure the device and depth formats.
        device.activeFormat = format
        device.activeDepthDataFormat = depthFormat
        
        let desc = format.formatDescription
        dimensions = CMVideoFormatDescriptionGetDimensions(desc)
        
        let duration = CMTime(value:1, timescale:CMTimeScale(60))
        device.activeVideoMinFrameDuration = duration
        device.activeVideoMaxFrameDuration = duration
        
        // Finish the device configuration.
        device.unlockForConfiguration()
        
        self.device = device
        
        print("Selected video format: \(device.activeFormat)")
        print("Selected depth format: \(String(describing: device.activeDepthDataFormat))")
        
        // Add a device input to the capture session.
        let deviceInput = try AVCaptureDeviceInput(device: device)
        captureSession.addInput(deviceInput)
                
        guard let audioDevice = AVCaptureDevice.default(for: .audio) else {
            return
        }
        
        // Configure audio input - always configure audio even if isAudioEnabled is false.
        // Use `try` rather than `try!` so a failure propagates to the caller instead of crashing.
        audioDeviceInput = try AVCaptureDeviceInput(device: audioDevice)
        captureSession.addInput(audioDeviceInput)

        deviceSystemPressureStateObservation = device.observe(
            \.systemPressureState,
             options: .new
        ) { _, change in
            guard let systemPressureState = change.newValue else { return }
            print("system pressure \(systemPressureState.levelAsString()) due to \(systemPressureState.factors)")
        }
    }

Here's how I'm setting up the output:

private func setupLiDARCaptureOutputs() {
        // Create an object to output video sample buffers.
        videoDataOutput = AVCaptureVideoDataOutput()
        captureSession.addOutput(videoDataOutput)
        
        // Create an object to output depth data.
        depthDataOutput = AVCaptureDepthDataOutput()
        depthDataOutput.isFilteringEnabled = false
        captureSession.addOutput(depthDataOutput)
        
        audioDeviceOutput = AVCaptureAudioDataOutput()
        audioDeviceOutput.setSampleBufferDelegate(self, queue: videoQueue)
        captureSession.addOutput(audioDeviceOutput)
        
        // Create an object to synchronize the delivery of depth and video data.
        outputVideoSync = AVCaptureDataOutputSynchronizer(dataOutputs: [depthDataOutput, videoDataOutput])
       
        outputVideoSync.setDelegate(self, queue: videoQueue)
        
        // Enable camera intrinsics matrix delivery.
        guard let outputConnection = videoDataOutput.connection(with: .video) else { return }
        if outputConnection.isCameraIntrinsicMatrixDeliverySupported {
            outputConnection.isCameraIntrinsicMatrixDeliveryEnabled = true
        }
    }

The top part of my delegate implementation is as follows:

func dataOutputSynchronizer(
        _ synchronizer: AVCaptureDataOutputSynchronizer,
        didOutput synchronizedDataCollection: AVCaptureSynchronizedDataCollection
    ) {
        // Retrieve the synchronized depth and sample buffer container objects.
        guard let syncedDepthData = synchronizedDataCollection.synchronizedData(for: depthDataOutput) as? AVCaptureSynchronizedDepthData,
              let syncedVideoData = synchronizedDataCollection.synchronizedData(for: videoDataOutput) as? AVCaptureSynchronizedSampleBufferData else {
            if synchronizedDataCollection.synchronizedData(for: depthDataOutput) == nil {
                print("no depth data at time \(mach_absolute_time())")
            }
            if synchronizedDataCollection.synchronizedData(for: videoDataOutput) == nil {
                print("no video data at time \(mach_absolute_time())")
            }
            return
        }
        
        print("received depth data \(mach_absolute_time())")
}

As you can see, I'm logging to the console whenever depth data is not received. Note that because I'm driving the video frames at 60 fps and the selected depth format tops out at 30 fps, it's expected that I'll only receive depth data for every alternate video frame.
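
To tell a frame that simply has no depth sample (expected with 60 fps video and 30 fps depth) apart from a buffer that was dropped, the synchronized data objects can be checked for their drop flags. Below is a minimal sketch, not a confirmed fix. The helper names `dropReasonLabel` and `logDeliveryState` are hypothetical and not part of the code above; the sketch assumes it is called at the top of the synchronizer delegate callback with the same two outputs.

```swift
import AVFoundation

/// Human-readable label for a drop reason (helper name is illustrative).
func dropReasonLabel(_ reason: AVCaptureOutput.DataDroppedReason) -> String {
    switch reason {
    case .none: return "none"
    case .lateData: return "late data"
    case .outOfBuffers: return "out of buffers"
    case .discontinuity: return "discontinuity"
    @unknown default: return "unknown (\(reason.rawValue))"
    }
}

/// Logs, for one synchronized collection, whether each output has no entry
/// at all or has an entry that is flagged as dropped.
func logDeliveryState(
    of collection: AVCaptureSynchronizedDataCollection,
    depthOutput: AVCaptureDepthDataOutput,
    videoOutput: AVCaptureVideoDataOutput
) {
    if let depth = collection.synchronizedData(for: depthOutput) as? AVCaptureSynchronizedDepthData {
        if depth.depthDataWasDropped {
            print("depth dropped: \(dropReasonLabel(depth.droppedReason))")
        }
    } else {
        print("no depth entry in this collection")
    }

    if let video = collection.synchronizedData(for: videoOutput) as? AVCaptureSynchronizedSampleBufferData {
        if video.sampleBufferWasDropped {
            print("video dropped: \(dropReasonLabel(video.droppedReason))")
        }
    } else {
        print("no video entry in this collection")
    }
}
```

A reason of "late data" or "out of buffers" would suggest that the delegate queue is not keeping up or that buffers are being retained too long, whereas a missing entry on alternate frames is the expected 60/30 fps pattern.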

The console output is posted as a follow-up comment (because of the character limit). I edited some lines out for brevity. You'll see it started streaming correctly, but after a while it stopped receiving both video and depth outputs (in some other runs it works perfectly, and in others I receive no depth data whatsoever). One thing to note: I sometimes run QuickTime mirroring to see what the app is displaying on the device screen, so I'm not sure if that's causing any interference. That said, I don't see any system pressure changes either.
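
Since system pressure shows nothing, one more thing that could be logged is whether the capture session itself reports an interruption or a runtime error around the time the data stops. The following is a rough sketch under assumptions: `observeSessionHealth` is a hypothetical helper name, and whether QuickTime mirroring actually triggers either notification is not something the output above confirms.

```swift
import AVFoundation

/// Registers observers that log capture-session interruptions and runtime errors.
/// The function name is illustrative. Keep the returned tokens alive while observing,
/// and pass them to NotificationCenter.removeObserver(_:) when done.
func observeSessionHealth(of session: AVCaptureSession) -> [NSObjectProtocol] {
    let center = NotificationCenter.default

    let interruption = center.addObserver(
        forName: AVCaptureSession.wasInterruptedNotification,
        object: session,
        queue: .main
    ) { notification in
        // The raw value maps to an AVCaptureSession.InterruptionReason case.
        let rawReason = notification.userInfo?[AVCaptureSessionInterruptionReasonKey] as? Int
        print("session interrupted, raw reason: \(String(describing: rawReason))")
    }

    let runtimeError = center.addObserver(
        forName: AVCaptureSession.runtimeErrorNotification,
        object: session,
        queue: .main
    ) { notification in
        let error = notification.userInfo?[AVCaptureSessionErrorKey] as? AVError
        print("session runtime error: \(String(describing: error))")
    }

    return [interruption, runtimeError]
}
```

Calling this once with captureSession before startRunning() and storing the returned tokens in a property would put any interruption or error message in the same console stream as the depth and video log lines.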

Any help is most appreciated! Thanks.

Here's the console output referenced above.

Selected video format: <AVCaptureDeviceFormat: 0x283edddd0 'vide'/'420f' 1280x 720, { 1- 60 fps}, photo dims:{1280x720,4224x2376}, fov:64.717, supports vis (max strength:Low), max zoom:123.75 (upscales @3.00), AF System:2, ISO:18.0-1728.0, SS:0.000024-1.000000, supports wide color, supports depth>
Selected depth format: Optional('dpth'/'hdep'  320x 180, { 1- 30 fps}, photo dims:{}, fov:64.717)
no depth data at time 4660461719679
received depth data 4660461721647
no depth data at time 4660461721931
received depth data 4660461722090
no depth data at time 4660461722241
received depth data 4660461722574
no depth data at time 4660461733834
received depth data 4660461754095
no depth data at time 4660484422372
received depth data 4660484427828
no depth data at time 4660484551473
no depth data at time 4660484557872
received depth data 4660484560137
received depth data 4660484587264
received depth data 4660484625637
received depth data 4660484723590
received depth data 4660484890588
received depth data 4660484894113
received depth data 4660485028919
received depth data 4660485065300
received depth data 4660485108980
received depth data 4660485117934
received depth data 4660485311522
received depth data 4660485374234
received depth data 4660485409137
received depth data 4660485587031
received depth data 4660485717468
received depth data 4660485774764
received depth data 4660485965976
received depth data 4660486050028
received depth data 4660486050960
received depth data 4660486141976
no depth data at time 4660486148485
received depth data 4660486255161
received depth data 4660486255353
received depth data 4660486255451
received depth data 4660486255518
received depth data 4660486255590
received depth data 4660486255661
received depth data 4660486323853
received depth data 4660486343897
received depth data 4660486346269
received depth data 4660486586520
received depth data 4660486739514
received depth data 4660486845554
received depth data 4660486925583
received depth data 4660487127930
received depth data 4660487324394
received depth data 4660487377842
received depth data 4660487413297
received depth data 4660487533840
received depth data 4660487539837
received depth data 4660487539986
received depth data 4660487606854
received depth data 4660487730023
received depth data 4660487731258
received depth data 4660487731424
received depth data 4660487943556
received depth data 4660487945158
received depth data 4660488042570
received depth data 4660488089371
received depth data 4660488089543
received depth data 4660488161973
no depth data at time 4660488378041
no video data at time 4660488378151
no depth data at time 4660488795023
no video data at time 4660488795134
no depth data at time 4660488925324
no video data at time 4660488925465
no depth data at time 4660488931173
no video data at time 4660488931281
no depth data at time 4660489036864
no video data at time 4660489036974
no depth data at time 4660489098026
no video data at time 4660489098133
no depth data at time 4660489177999
no video data at time 4660489178109
no depth data at time 4660489181387
no video data at time 4660489181546
no depth data at time 4660489239588
no video data at time 4660489239690
no depth data at time 4660489455661
no video data at time 4660489455773
no depth data at time 4660489565211
no video data at time 4660489565315
no depth data at time 4660489683372
no video data at time 4660489683486
no depth data at time 4660489912688
no video data at time 4660489912809
no depth data at time 4660489925852
no video data at time 4660489926002
no depth data at time 4660490016836
no video data at time 4660490016941
no depth data at time 4660490017016
no video data at time 4660490017061
...

One finding - my project is very similar to Apple's WWDC sample, available for download here: https://developer.apple.com/documentation/avfoundation/additional_data_capture/capturing_depth_using_the_lidar_camera

The only difference is that instead of using a Metal texture, I'm using an AVCaptureVideoPreviewLayer to show the camera view. After much comparison and experimentation, I'm finding that when I don't add my preview layer, the video and depth data flow reliably. When I add it, the data flows correctly only about 50% of the time.

This appears to be the most likely issue - any clues as to why this can happen?

Given the errors I have seen, I suspect that the time spent generating the preview layer by buffer copying is what is disrupting the data flow. The problem is that the cost of such operations is "hidden" by being embedded deep in kernel modules that we are not allowed to see. If you want to get an idea of how long it takes, capture a timestamp before and after you request the image. I have also found that installing the Kernel Debugger will give insights into such behaviors by generating trace entries when extended intervals are detected in the kernel.
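
The before-and-after timestamp idea can be sketched as a small helper. This is only an illustration: `measureMilliseconds` is a hypothetical name, and the summation in the example is a stand-in workload, not the actual image request.

```swift
import Dispatch

/// Runs `work` and returns its result together with the elapsed time in milliseconds.
func measureMilliseconds<T>(_ work: () throws -> T) rethrows -> (result: T, milliseconds: Double) {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = try work()
    let end = DispatchTime.now().uptimeNanoseconds
    return (result, Double(end - start) / 1_000_000)
}

// Example: time a stand-in workload.
let (sum, elapsed) = measureMilliseconds { (1...100_000).reduce(0, +) }
print("sum \(sum) took \(elapsed) ms")
```

In the app, the closure would wrap the call under suspicion, for example the body of the synchronizer delegate callback. At 60 fps each frame has a budget of roughly 16.7 ms, so measurements that regularly approach or exceed that would support the theory.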
