VNCoreMLRequest Callback Not Triggered in Modified Video Classification App

Hi everyone,

I'm working on integrating object recognition from live video feeds into my existing app by following Apple's sample code. My original project captures video and records it successfully. However, after integrating the Vision-based object detection components (VNCoreMLRequest), no detections occur, and the callback for the request is never triggered.

As part of the integration, I've added the following:

  • Set up AVCaptureVideoDataOutput for processing video frames.

  • Created a VNCoreMLRequest using my Core ML model.

The video recording functionality works as expected, but no object detection happens. I’d like to know:

  1. How can I debug this further? Which debug points or logs would help identify where the issue lies?

  2. Have I missed any key configurations? Below is a diff of the modifications I’ve made to my project for the new feature.

Diff of Changes: (see the diff at the end of this post)

Specific Observations:

  • The captureOutput method is invoked correctly, but there is no output or error from the Vision request callback.

  • Print statements in my setup function setForVideoClassify() show that the setup executes without errors.
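To rule out the most common cause (the request never actually being performed), this is the kind of check I was planning to add inside `captureOutput` — a sketch based on Apple's sample code, where `CameraController` and `visionRequests` are placeholders for my own controller class and however the requests are stored:

```swift
import AVFoundation
import Vision

extension CameraController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // If this guard fails, the Vision callback can never fire,
        // because Vision is never asked to run on the frame.
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            print("captureOutput: no pixel buffer in sample")
            return
        }
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                            orientation: .up)
        do {
            // perform(_:) is synchronous; the VNCoreMLRequest completion
            // handler runs before this call returns.
            try handler.perform(visionRequests)
        } catch {
            // A model or pixel-format mismatch surfaces here as a thrown
            // error rather than in the request's completion handler.
            print("Vision perform failed: \(error)")
        }
    }
}
```

My understanding is that if `perform(_:)` is never reached (or throws), the completion handler is silently never invoked, which matches what I'm seeing.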

Questions:

  • Could this be due to issues with my Core ML model compatibility or configuration?

  • Is the VNCoreMLRequest setup incorrect, or do I need to ensure specific image formats for processing?
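In case it helps, this is how I've been inspecting what the model expects as input — a sketch, where `MyModel` is a placeholder for my generated .mlmodel class:

```swift
import CoreML
import Vision

// Print what the compiled model actually expects; a mismatch between this
// and the capture session's pixel format / dimensions is a common silent
// failure mode.
if let model = try? MyModel(configuration: MLModelConfiguration()).model {
    for (name, desc) in model.modelDescription.inputDescriptionsByName {
        print("input \(name): \(desc)")
    }
    // Wrapping the model throws here if Vision can't drive it at all.
    do {
        let vnModel = try VNCoreMLModel(for: model)
        print("VNCoreMLModel created: \(vnModel)")
    } catch {
        print("VNCoreMLModel failed: \(error)")
    }
}
```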

Platform:

Xcode 16.1, iOS 18.1, Swift 5, SwiftUI, iPhone 11,
Darwin MacBook-Pro.local 24.1.0 Darwin Kernel Version 24.1.0: Thu Oct 10 21:02:27 PDT 2024; root:xnu-11215.41.3~2/RELEASE_X86_64 x86_64

Any guidance or advice is appreciated! Thanks in advance.

The diff below highlights the code taken from Apple's sample.

---
> 
68,69d67
<     private let videoDataOutputQueue = DispatchQueue(label: "VideoDataOutput", qos: .userInitiated, attributes: [], autoreleaseFrequency: .workItem)
<     var bufferSize: CGSize = .zero
---
>     
102d99
<             print("captureOutput error")
114,142d110
< 
<     private func setForVideoClassify() {
<         print("setForVideoClassify()")
<         // Add the video data output for classification if it's not already added
<         let videoDataOutput = AVCaptureVideoDataOutput()
<         videoDataOutput.alwaysDiscardsLateVideoFrames = true
<         videoDataOutput.videoSettings = [
<             kCVPixelBufferPixelFormatTypeKey as String: Int(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)
<         ]
<         videoDataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
< 
<         if captureSession.canAddOutput(videoDataOutput) {
<             captureSession.addOutput(videoDataOutput)
<         } else {
<             print("Could not add video data output")
<         }
<         let captureConnection = videoDataOutput.connection(with: .video)
<         // Always process the frames
<         captureConnection?.isEnabled = true
<         do {
<             let videoDevice = currentDevice
<             try  videoDevice.lockForConfiguration()
<             let dimensions = CMVideoFormatDescriptionGetDimensions((videoDevice.activeFormat.formatDescription))
<             bufferSize.width = CGFloat(dimensions.width)
<             bufferSize.height = CGFloat(dimensions.height)
<             videoDevice.unlockForConfiguration()
<         } catch {
<             print("setForVideoClassify error: \(error)")
<         }
144,145d111
<     }
<     
262,265c228
<             setupVision()
<             setForVideoClassify()
<             captureSession.sessionPreset = .vga640x480
<             //captureSession.sessionPreset = .high
---
>             captureSession.sessionPreset = .high
526a490
>             setForVideoClassify()
529a494,510
>     private func setForVideoClassify() {
>         print("setForVideoClassify()")
>         // Add the video data output for classification if it's not already added
>         let videoDataOutput = AVCaptureVideoDataOutput()
>         videoDataOutput.alwaysDiscardsLateVideoFrames = true
>         videoDataOutput.videoSettings = [
>             kCVPixelBufferPixelFormatTypeKey as String: Int(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)
>         ]
>         videoDataOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "VideoDataOutputQueue"))
>         
>         if captureSession.canAddOutput(videoDataOutput) {
>             captureSession.addOutput(videoDataOutput)
>         } else {
>             print("Could not add video data output")
>         }
>     }
> 
567a549
> 