Feasibility of Real-Time Object Detection in Live Video with Core ML on M1 Pro and A-Series Chips

Hello,

I am exploring real-time object detection on live video streams, with each detected object replaced or overlaid by another shape, in an iOS app using the Core ML and Vision frameworks. My target is high-speed, real-time detection with no noticeable latency, analogous to how page-fault handling and associative caching keep an OS responsive, but applied to video processing.
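
To make the goal concrete, here is a minimal sketch of the per-frame pipeline I have in mind. `YOLOv3Tiny` is a placeholder for whatever Xcode-generated model class the app would actually ship, and the overlay rendering is elided:

```swift
import AVFoundation
import CoreML
import Vision

final class FrameDetector: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {

    // YOLOv3Tiny is a placeholder; Xcode generates this class from the .mlmodel file.
    private lazy var request: VNCoreMLRequest = {
        let model = try! VNCoreMLModel(for: YOLOv3Tiny(configuration: MLModelConfiguration()).model)
        let req = VNCoreMLRequest(model: model) { request, _ in
            guard let boxes = request.results as? [VNRecognizedObjectObservation] else { return }
            DispatchQueue.main.async {
                // Hand the bounding boxes to the overlay layer here (rendering elided).
                print("detected \(boxes.count) objects")
            }
        }
        req.imageCropAndScaleOption = .scaleFill
        return req
    }()

    // AVCaptureVideoDataOutput delivers one CMSampleBuffer per camera frame.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right)
        try? handler.perform([request])  // Vision/Core ML route the model to ANE/GPU/CPU
    }
}
```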

Given that this requires consistent, real-time model inference, I’m curious about how well the Neural Engine or GPU can handle such tasks on A-series chips in iPhones versus M-series chips (specifically M1 Pro and possibly M4) in MacBooks. Here are a few specific points I’d like insight on:

  1. Hardware Suitability: How feasible is it to perform real-time object detection with Core ML on the Neural Engine (i.e., can it maintain low latency)? Would the M-series chips (e.g., M1 Pro or newer) offer a tangible benefit for this type of task compared to the A-series in mobile devices? Which A- and M-series chips would be the minimum feasible recommendation for such a task?

  2. Performance Expectations: For continuous, live-video object detection, what frame rate or latency should I expect from an optimized Core ML model? Has anyone benchmarked such applications, and is M-series hardware required to achieve smooth, real-time processing? (See the timing sketch just after this list.)

  3. Differences Across Apple Hardware: How does performance scale between the A-series Neural Engine and the M-series GPU and Neural Engine? Is the M-series vastly superior for real-time Core ML tasks like object detection on live video feeds?
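
For point 2, this is roughly how I plan to measure per-frame inference latency; anything consistently under ~33 ms would leave headroom for 30 fps capture. A minimal sketch, with the request and pixel buffer supplied by the capture pipeline above:

```swift
import CoreVideo
import QuartzCore
import Vision

// Measure wall-clock latency of a single Vision/Core ML inference call.
// CACurrentMediaTime() is a monotonic clock, suitable for short intervals.
func timedPerform(_ request: VNRequest, on pixelBuffer: CVPixelBuffer) {
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer)
    let start = CACurrentMediaTime()
    try? handler.perform([request])
    let latencyMs = (CACurrentMediaTime() - start) * 1000
    print(String(format: "inference latency: %.1f ms", latencyMs))
}
```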

If anyone has attempted live object detection on these chips, any insights on real-time performance, limitations, or optimizations would be highly appreciated.

Please refer to the relevant Apple APIs (Core ML and Vision).

Thank you in advance for your help!

In WWDC 2024 session 10223, Apple showcased the machine-learning frameworks available alongside Core ML, including Vision, Natural Language, Sound Analysis, Speech, and Translation. These frameworks can be leveraged concurrently within an app, allowing tasks to run in parallel.

Could you provide guidance on best practices for performing multiple Core ML tasks simultaneously on iPhone and MacBook hardware? Specifically, I'd like to know how feasible and efficient it is to run three parallel tasks on different Apple chipsets (A-series for iPhone, M-series for MacBook). Are there particular recommendations for optimizing parallel tasks on these chips while maintaining performance and battery efficiency?
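
As a starting point, here is the structure I am considering: a minimal sketch assuming three hypothetical async wrappers (`detectObjects`, `tagLanguage`, `classifySound`) around the respective frameworks, with the actual model calls elided:

```swift
import Foundation

// Hypothetical wrappers around Vision, NaturalLanguage, and SoundAnalysis;
// each would load its own model and return results (bodies elided).
func detectObjects() async throws -> [String] { [] }
func tagLanguage()   async throws -> [String] { [] }
func classifySound() async throws -> [String] { [] }

// `async let` launches all three child tasks concurrently; the runtime
// schedules them across CPU cores while Core ML dispatches each model
// to the Neural Engine, GPU, or CPU as it sees fit.
func runAllTasks() async throws {
    async let objects = detectObjects()
    async let tags    = tagLanguage()
    async let sounds  = classifySound()
    let results = try await (objects, tags, sounds)
    print(results)
}
```

My open question is whether letting the system schedule these child tasks is enough, or whether the compute units should be steered explicitly per model.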

Thank you!
