Hello,
I am exploring real-time object detection, and its replacement/overlay with another shape, on live video streams for an iOS app using Core ML and Vision frameworks. My target is to achieve high-speed, real-time detection without noticeable latency, similar to what’s possible with PageFault handling and Associative Caching in OS, but applied to video processing.
Given that this requires consistent, real-time model inference, I’m curious about how well the Neural Engine or GPU can handle such tasks on A-series chips in iPhones versus M-series chips (specifically M1 Pro and possibly M4) in MacBooks. Here are a few specific points I’d like insight on:
Hardware Suitability: How feasible is it to perform real-time object detection with Core ML on the Neural Engine (i.e., can it maintain low latency)? Would the M-series chips (e.g., M1 Pro or newer) offer a tangible benefit for this type of task compared to the A-series in mobile devices? Which A-series and M-series chips would be the minimum feasible recommendation for such a task?
Performance Expectations: For continuous, live video object detection, what would be the expected frame rate or latency using an optimized Core ML model? Has anyone benchmarked such applications, and is the M-series required to achieve smooth, real-time processing?
Differences Across Apple Hardware: How does performance scale between the A-series Neural Engine and M-series GPU and Neural Engine? Is the M-series vastly superior for real-time Core ML tasks like object detection on live video feeds?
If anyone has attempted live object detection on these chips, any insights on real-time performance, limitations, or optimizations would be highly appreciated.
Please refer: Apple APIs
Thank you in advance for your help!
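To make "without noticeable latency" concrete, it can help to express it as a per-frame time budget. The sketch below is plain arithmetic, not a benchmark: the helper names and the sample latencies are hypothetical placeholders, and real numbers would have to come from timing the actual Core ML model on the target device.

```swift
import Foundation

/// Time available per frame, in milliseconds, at a given frame rate.
func frameBudgetMs(fps: Double) -> Double {
    1000.0 / fps
}

/// True when the per-frame work fits inside the budget for the target rate.
func fitsBudget(latencyMs: Double, targetFPS: Double) -> Bool {
    latencyMs <= frameBudgetMs(fps: targetFPS)
}

// Hypothetical per-frame latencies in ms (placeholders, not measurements).
let sampleLatenciesMs: [Double] = [12.0, 14.5, 13.2, 30.1, 12.8]

// Judge by the worst frame: a single slow frame is what shows up as a stutter.
let worstMs = sampleLatenciesMs.max() ?? 0

for fps in [30.0, 60.0] {
    let budget = frameBudgetMs(fps: fps)
    let verdict = fitsBudget(latencyMs: worstMs, targetFPS: fps) ? "fits" : "misses"
    print("target \(fps) fps: budget \(budget) ms, worst frame \(verdict)")
}
```

The budget has to cover capture, inference, and drawing the overlay together, so the inference time alone needs to sit comfortably below it.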
Performance
Improve your app's performance.
Posts under Performance tag
I'm currently pulling device-specific data for my app, and I'm manually listing 150 models like this:
device_models = [ "iPhone1_1", "iPhone1_2", "iPhone2_1", ... "iPad16_6"]
Is there an API endpoint or an automated method to dynamically retrieve a complete list of device models?
I'm specifically looking to connect this with the performance metrics API to monitor launch times per device type. Any suggestions on how to streamline or automate this list would be greatly appreciated. Thanks!
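I'm not aware of a call that returns the complete list, so the following only sketches an alternative: read the identifier of the device the app is running on and build the list from devices actually observed. The function name and the `seenModels` set are hypothetical, and the identifier comes back in the form "iPhone14,5", so mapping it to the underscore form used in the list above is left as an assumption.

```swift
import Foundation

/// Reads the hardware identifier of the device this code runs on
/// (for example "iPhone14,5"; the simulator reports the host architecture).
func hardwareModelIdentifier() -> String {
    var systemInfo = utsname()
    uname(&systemInfo)
    // `machine` is a fixed-size C char tuple; measure it before borrowing it.
    let capacity = MemoryLayout.size(ofValue: systemInfo.machine)
    return withUnsafePointer(to: &systemInfo.machine) { tuplePointer in
        tuplePointer.withMemoryRebound(to: CChar.self, capacity: capacity) {
            String(cString: $0)
        }
    }
}

// Collect identifiers as they are observed instead of hard-coding them.
var seenModels = Set<String>()
seenModels.insert(hardwareModelIdentifier())
print(seenModels)
```

Reporting this value alongside each launch-time sample would let the per-device grouping happen on the server side without a maintained list.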
I created a simple animation of a circle that changes sizes. The circle pulses like a heartbeat in the center of the screen. My expectation was for the CPU use to be very low, but that is not the case. In addition, even if the CPU use isn't as low as I would expect, I did not expect the CPU use to increase over time because nothing else is happening in the app. Here is the code:
import SwiftUI

@main
struct TestApp: App {
    var body: some Scene {
        WindowGroup {
            SplashScreenView()
        }
    }
}

import SwiftUI

struct SplashScreenView: View {
    var body: some View {
        ZStack {
            SplashNucleusView(minSize: 50, maxSize: 100)
        }
    }
}

import SwiftUI

struct SplashNucleusView: View {
    let minSize: Double
    let maxSize: Double
    @State private var nucleusColor: Color = .primary
    @State private var nucleusRadius: Double = 10
    @State private var nucleusOpacity: Double = 1.0
    private var nucleusAnimation: Animation {
        .easeInOut(duration: 0.25)
        .repeatForever(autoreverses: true)
    }
    let timer = Timer.publish(every: 0.5, on: .main, in: .common).autoconnect()
    var body: some View {
        Circle()
            .fill(nucleusColor)
            .frame(width: nucleusRadius)
            .opacity(nucleusOpacity)
            .onReceive(timer) { _ in
                withAnimation(nucleusAnimation) {
                    nucleusRadius = Double.random(in: minSize...maxSize)
                }
            }
    }
}
This is how the animation looks:
The animation is snappy until the CPU use reaches 95%, at which point there is visible stuttering. Here is how the CPU looks when the animation duration value is 0.5 seconds and the timer publishing interval is 3 seconds:
Changing the animation duration value to 0.25 seconds and the timer publishing interval to 0.5 seconds changes the CPU use as follows:
The complete view has many other moving parts, which make the issue much worse. The issue is evident with only the circle view. I spent hours working with the debugger, reading about animations, and testing new approaches, but the result is always the same.
Why is this happening?
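One hypothesis that may be worth testing (an assumption, not a confirmed diagnosis): every timer tick starts another repeatForever animation on the same property, so the amount of animation work could grow with each tick. The sketch below is a variation of the view above that uses a one-shot animation per tick; `PulsingCircle` and `diameter` are hypothetical names.

```swift
#if canImport(SwiftUI)
import SwiftUI

/// Variation of the pulsing circle that never uses a repeating animation:
/// each timer tick animates once to a new random size and then stops.
struct PulsingCircle: View {
    let minSize: Double
    let maxSize: Double
    @State private var diameter: Double = 10

    private let timer = Timer.publish(every: 0.5, on: .main, in: .common).autoconnect()

    var body: some View {
        Circle()
            .fill(Color.primary)
            .frame(width: diameter)
            .onReceive(timer) { _ in
                // One-shot animation; the timer itself provides the repetition.
                withAnimation(.easeInOut(duration: 0.25)) {
                    diameter = Double.random(in: minSize...maxSize)
                }
            }
    }
}
#endif
```

If CPU use stays flat with this version, that would point at the stacked repeating animations rather than at the timer or the view itself.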
In my React Native mobile application, we are experiencing app termination issues on a few devices (iPhone 13 & 14). We are not getting any logs in the Xcode Organizer when the app is terminated, whether because of a memory leak or because the OS terminates it.
Could you please suggest a way to log app terminations or recommend any other platform where we can log such events? Alternatively, do you have any suggestions on how to resolve app termination issues?
In my React Native mobile application, we are experiencing app termination issues on a few devices (iPhone 13 & 14). We have implemented Firebase Crashlytics and we are getting crash logs, but we are not receiving any logs when the app is terminated because of a memory leak or by the OS.
Could you please suggest a way to log app terminations or recommend any other platform where we can log such events? Alternatively, do you have any suggestions on how to resolve app termination issues?
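One option that might help here is MetricKit, which reports aggregated exit counts (including memory-limit and watchdog terminations) on iOS 14 and later. The sketch below is native Swift, so a React Native app would need to call it from its native layer; `ExitMetricsLogger` is a hypothetical name, and `print` stands in for whatever logging backend is in use. Payloads arrive aggregated, at most about once a day, so this gives counts rather than a log line per termination.

```swift
import Foundation
#if canImport(MetricKit)
import MetricKit

/// Subscribes to MetricKit and logs how often the OS ended the app, by reason.
final class ExitMetricsLogger: NSObject, MXMetricManagerSubscriber {

    /// Call once, early in app launch.
    func start() {
        MXMetricManager.shared.add(self)
    }

    func didReceive(_ payloads: [MXMetricPayload]) {
        for payload in payloads {
            // Exit metrics are optional; skip payloads that carry none.
            guard let exits = payload.applicationExitMetrics else { continue }
            let foreground = exits.foregroundExitData
            let background = exits.backgroundExitData
            print("foreground memory-limit exits:", foreground.cumulativeMemoryResourceLimitExitCount)
            print("foreground watchdog exits:", foreground.cumulativeAppWatchdogExitCount)
            print("background memory-limit exits:", background.cumulativeMemoryResourceLimitExitCount)
            print("background memory-pressure exits:", background.cumulativeMemoryPressureExitCount)
        }
    }
}
#endif
```

The logger has to stay alive for the lifetime of the app (for example as a property of the app delegate), otherwise no payloads will be delivered to it.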
I generate images with command line apps in Swift on macOS. Under the prior Xcode/macOS, my code had been running at the same performance for years. After converting to Swift 6 (no code changes) and running on Sequoia, I noticed a massive slowdown. Running Profile, I tracked it down to a single line:
var values = ContiguousArray<Double>(repeating: 0.0, count: localData.options.count)
The count for my current test case is 4, so it's allocating 4 doubles at a time, around 40,000 times in this test. This one line takes 42 seconds out of a run time of 52 seconds, with the profile shown as:
26 41.62 s 4.8% 26.00 ms specialized ContiguousArray.init(_uninitializedCount:)
42 41.57 s 4.8% 42.00 ms _ContiguousArrayBuffer.init(_uninitializedCount:minimumCapacity:)
40730 40.93 s 4.7% 40.73 s _swift_allocObject_
68 68.00 ms 0.0% 68.00 ms std::__1::pair<MallocTypeCacheEntry*, unsigned int> swift::ConcurrentReadableHashMap<MallocTypeCacheEntry, swift::LazyMutex>::find<unsigned int>(unsigned int const&, swift::ConcurrentReadableHashMap<MallocTypeCacheEntry, swift::LazyMutex>::IndexStorage, unsigned long, MallocTypeCacheEntry*)
7 130.00 ms 0.0% 7.00 ms swift::swift_slowAllocTyped(unsigned long, unsigned long, unsigned long long)
which is clearly inside the OS allocator somewhere. What happened? Previously this would have taken closer to 8 seconds or so for the entire run.
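This does not explain the regression, but as a possible workaround the scratch array could be allocated once and reset in place, so the allocator is not hit on every pass. A minimal sketch, assuming the array is only needed within one pass; `optionCount`, `iterations`, and the loop body are hypothetical stand-ins for the real code.

```swift
// Hypothetical stand-ins: `optionCount` plays the role of
// localData.options.count, `iterations` the number of passes.
let optionCount = 4
let iterations = 40_000

// Allocate the scratch buffer once, outside the hot loop.
var values = ContiguousArray<Double>(repeating: 0.0, count: optionCount)
var checksum = 0.0

for pass in 0..<iterations {
    // Reset in place instead of constructing a new array on each pass.
    for i in values.indices {
        values[i] = 0.0
    }

    // Placeholder work that fills the buffer and consumes the result.
    for i in values.indices {
        values[i] += Double(pass % 7) + Double(i)
    }
    checksum += values.reduce(0, +)
}

print(checksum)
```

If the run time drops back to the old level with this change, that would at least confirm the cost is in the per-pass allocation rather than in the work done on the values.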
Recently, we reworked a crucial part of our app and managed to halve the number of CPU cycles our app requires (according to Xcode Instruments).
Nonetheless, when using the Time Profiler component in instruments, it shows that the CPU time spent was either higher or the same (depending on execution).
The main time-consuming factor here: libsystem_pthread.dylib - the amount of CPU time spent by this library has doubled from original implementation to reworked implementation.
Therefore, I have a few questions:
How should I interpret this result?
How is this even possible if the CPU clock cycles halved?
What is the better metric here, the CPU cycles or the time profiler?
How can I reduce the impact of that said library? What does that library do and how can I influence its performance?
Thanks in advance.
On and off I've been trying to figure out how to do hang detection in-application (at least from the user's point of view). Qualitatively what I'd like to do is have a process which runs sample(1) on the application after it's been unresponsive for more than a second or so. Basically, an in-app replacement for Spin Control. The problem I've been stuck on is: how do I tell?
There used to be Core Graphics SPI (CGSRegisterNotifyProc with a value of kCGSEventNotificationAppIsUnresponsive) for doing this, but it doesn't work anymore (either due to sandboxing or system-wide security changes, I can't tell which but it doesn't matter).
One thought I had was to have an XPC service which would expect to receive a checkin once per second from the host (via a timer set up by the host). If it didn't, it would start sample(1). This seems pretty heavyweight to me, since it means that once per second, I'm going to be consuming cycles to check in with the service. But I haven't been able to come up with a scheme that doesn't include some kind of check-in by the target process.
Are there any APIs or strategies that I could use to accomplish this? Or is there some entitlement which would allow the application to request "application became unresponsive"/"application became responsive" notifications from the window server?
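For comparison with the XPC idea, here is a minimal in-process sketch of the same check-in scheme: a background timer pings the watched queue and treats a missed deadline as a hang. It still costs one wake-up per interval and it does not cover launching sample(1), which the sandbox may not allow. `isResponsive` and `HangWatchdog` are hypothetical names.

```swift
import Foundation

/// Returns true if `queue` manages to run a trivial block within `timeout` seconds.
func isResponsive(_ queue: DispatchQueue, within timeout: TimeInterval) -> Bool {
    let semaphore = DispatchSemaphore(value: 0)
    queue.async { semaphore.signal() }
    return semaphore.wait(timeout: .now() + timeout) == .success
}

/// Pings `queue` from a background timer and calls `onHang` when a ping times out.
final class HangWatchdog {
    private let timer: DispatchSourceTimer

    init(watching queue: DispatchQueue = .main,
         interval: TimeInterval = 1.0,
         threshold: TimeInterval = 1.0,
         onHang: @escaping @Sendable () -> Void) {
        let monitorQueue = DispatchQueue(label: "hang-watchdog", qos: .utility)
        timer = DispatchSource.makeTimerSource(queue: monitorQueue)
        timer.schedule(deadline: .now() + interval, repeating: interval)
        timer.setEventHandler {
            // The wait happens on the monitor queue, never on the watched queue.
            if !isResponsive(queue, within: threshold) {
                onHang()
            }
        }
        timer.resume()
    }

    deinit {
        timer.cancel()
    }
}
```

In an app this would be created once, for example `let watchdog = HangWatchdog { print("main thread unresponsive") }`, and kept alive for as long as monitoring is wanted.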
What are the possible KPI requirements set by Apple AI for cellular networks, e.g. regarding latency, throughput or jitter?
What is the expected effect on iPhone energy consumption?
Hi Everyone, I would appreciate your help with the topic mentioned above. I'm seeking a solution for the issue I linked below.
https://discussions.apple.com/thread/255668660?sortBy=best
Apple Support said I could get a faster response. I've also submitted the issue to Apple Support, and they said it's currently with an Apple Engineer, but things are moving a bit slowly there. I'm writing the similar explanation I wrote on the discussion forum here as well. It's been months, and I hope we can get a result here:
Here is the problem:
I've noticed that the "CursorUIViewService" process in Activity Monitor is becoming 'not responding' and causing significant lag on my MacBook Air (M3), especially when typing and switching between upper and lower case letters. It appears this process also controls the blue caps-lock indicator, which stops working when the process is unresponsive. This issue seems to cause the lag, and currently, it is using about 170MB of RAM.
Additionally, the "com.apple.hiservices-xpcservice" process also becomes unresponsive, though it usually doesn't exceed 3.5MB of RAM. In fact, this process becomes 'not responding' much more frequently than the CursorUIViewService process. The possibility that it might be related to CursorUIViewService pushed me to research this issue as well. I can see that there have been complaints about this process for years, but it seems no solution has been produced.
By the way, I've tried everything. I did a clean install, ran diagnostics, performed first aid, and still encountered the problem.
Has anyone else experienced this issue or found a solution?
As an update, I would like to inform you that the "com.apple.hiservices-xpcservice" process is still experiencing not responding issues with macOS Sequoia (15.0). However, because "cursoruiviewservice" was causing problems less often on Sonoma, I can't say the issue is completely resolved just yet; I need to monitor the situation.
Thank you!
Hello there!
I wanted to try the native scrolling mechanism for Swift Charts and experiment a bit to see whether the scenario we are trying to achieve is possible, but the Swift Charts scrolling performance seems to be very poor.
The graph was created as follows:
The X-axis is based on a date range.
The Y-axis is based on integer values between roughly 0 and 320.
The graph is scrollable horizontally only (X-axis).
The time range (X-axis) for the scrolling content is set to one year back from the current date, so the user can scroll up to one year into the past (.chartXScale).
The X-axis shows 3 hours of data per screen width (.chartXVisibleDomain).
The data points for the graph are generated once, when the screen is about to appear, so that the Charts engine can use them (no lazy loading implemented yet).
The line data points (LineMark views) consist of 2880 data points spaced 5 minutes apart, intended to simulate two days of continuous data that we want to present. The rest of the graph displays no data at all.
The performance result:
The graph on the initial loading phase is frozen for about 10-15 seconds until the data appears on the graph.
Scrolling is very laggy - the CPU usage is 100% and is unacceptable for the end users.
If we show no data at all on the graph (so no LineMark views are created at all) - the result is similar - the empty graph scrolling is also very laggy.
Below I am sharing a test code:
import SwiftUI
import Charts

@main
struct ChartsTestApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()
            Spacer()
        }
    }
}

struct LineDataPoint: Identifiable, Equatable {
    var id: Int
    let date: Date
    let value: Int
}

actor TestData {
    func generate(startDate: Date) async -> [LineDataPoint] {
        var values: [LineDataPoint] = []
        for i in 0..<(1440 * 2) {
            values.append(
                LineDataPoint(
                    id: i,
                    date: startDate.addingTimeInterval(
                        TimeInterval(60 * 5 * i) // Every 5 minutes
                    ),
                    value: Int.random(in: 1...100)
                )
            )
        }
        return values
    }
}

struct ContentView: View {
    var startDate: Date {
        return endDate.addingTimeInterval(-3600*24*30*12) // about one year (360 days) into the past from now
    }
    let endDate = Date()
    @State var dataPoints: [LineDataPoint] = []
    var body: some View {
        Chart {
            ForEach(dataPoints) { item in
                LineMark(
                    x: .value("Date", item.date),
                    y: .value("Value", item.value),
                    series: .value("Series", "Test")
                )
            }
        }
        .frame(height: 200)
        .chartScrollableAxes(.horizontal)
        .chartYAxis(.hidden)
        .chartXScale(domain: startDate...endDate) // one year possibility to scroll back
        .chartXVisibleDomain(length: 3600 * 3) // 3 hours visible on screen
        .onAppear {
            Task {
                dataPoints = await TestData().generate(startDate: startDate)
            }
        }
    }
}
I would be grateful for any insights or suggestions on how to improve it or if it's planned to be improved in the future.
Currently, I use UIKit CollectionView where we split the graph into smaller chunks of the graph and we present the SwiftUI Chart content in the cells, so we use the scrolling offered there. I wonder if it's possible to use native SwiftUI for such a scenario so that later on we could also implement some kind of lazy loading of the data as the user scrolls into the past.
When using xctrace to record mobile process performance parameters, long recordings (longer than about 20 minutes) result in a trace file that is missing some information, as shown in the diagram. However, there is no issue when recording directly with Instruments on the problematic machine. At this point, recording with xctrace for over 20 minutes almost certainly causes this problem. The macOS versions are all 12.5.1, and the processors are M1 and Intel Core i7, respectively. The Xcode versions are all 13.4 (13F17a). There is no problem with short recordings; only longer recordings lose nearly all of the data. The situation is much the same with the other templates as well.
I'm looking at performance around large codable nested structures that come in from HTTP/JSON.
We are seeing stalls on the main thread, and after reviewing all the code, the web requests and parsing are async and run in the background. The post that sets the new struct value (80K) is handled on the main thread.
When I looked at the nested structures, they are about 80K.
Reading several articles and posts suggested that observing structs will cause a refresh on any change. And that large structures will take longer as they have to be copied for passing to each observer. And that more observers will slow things down.
So I made a test app to verify these premises.
The app has a timer animating a slider.
A VM with a structure containing a byte array.
Sliders to scale the size of the byte array from 10K to 200K and to scale the number of observers from 1 to 100.
It also measures the actual duration between the timer ticks. My intention is to be able to visually see main-thread stalls, measure them, and see the average and maximum frame delays.
Using this to test, I found little difference in performance across different structure sizes or numbers of observers. I'm not certain if this is expected or if I'm missing something in creating my test app.
I have also created a variation where the top struct is an observable class. I see no difference between struct and class.
I'm wondering if this is due to copy-on-write causing the struct to actually be passed by reference under the hood?
I wonder if other optimizations are minimizing the effect of scaling from 1 to 100 observers.
I appreciate any insights & critiques.
import SwiftUI
import CryptoKit
import RealityKit
import RealityKitContent // provides realityKitContentBundle (local package from the Xcode template)

#if CLASS_BASED
class LargeStruct: ObservableObject {
    @Published var data: [UInt8]
    init(size: Int = 80_000) {
        self.data = [UInt8](repeating: 0, count: size)
    }
    func regenerate(size: Int) {
        self.data = [UInt8](repeating: UInt8.random(in: 0...255), count: size)
    }
    var hashValue: String {
        let hash = SHA256.hash(data: Data(data))
        return hash.compactMap { String(format: "%02x", $0) }.joined()
    }
}
#else
struct LargeStruct {
    var data: [UInt8]
    init(size: Int = 80_000) {
        self.data = [UInt8](repeating: 0, count: size)
    }
    mutating func regenerate(size: Int) {
        self.data = [UInt8](repeating: UInt8.random(in: 0...255), count: size)
    }
    var hashValue: String {
        let hash = SHA256.hash(data: Data(data))
        return hash.compactMap { String(format: "%02x", $0) }.joined()
    }
}
#endif

class ViewModel: ObservableObject {
    @Published var largeStruct = LargeStruct()
}

struct ContentView: View {
    @StateObject var vm = ViewModel()
    @State private var isRotating = false
    @State private var counter = 0.0
    @State private var size: Double = 80_000
    @State private var observerCount: Double = 10
    // Variables to track time intervals
    @State private var lastTickTime: Date?
    @State private var minInterval: Double = .infinity
    @State private var maxInterval: Double = 0
    @State private var totalInterval: Double = 0
    @State private var tickCount: Int = 0
    var body: some View {
        VStack {
            Model3D(named: "Scene", bundle: realityKitContentBundle)
                .padding(.bottom, 50)
            // A rotating square to visualize stalling
            Rectangle()
                .fill(Color.blue)
                .frame(width: 50, height: 50)
                .rotationEffect(isRotating ? .degrees(360) : .degrees(0))
                .animation(.linear(duration: 2).repeatForever(autoreverses: false), value: isRotating)
                .onAppear {
                    isRotating = true
                }
            Slider(value: $counter, in: 0...100)
                .padding()
                .onAppear {
                    Timer.scheduledTimer(withTimeInterval: 0.005, repeats: true) { timer in
                        let now = Date()
                        if let lastTime = lastTickTime {
                            let interval = now.timeIntervalSince(lastTime)
                            minInterval = min(minInterval, interval)
                            maxInterval = max(maxInterval, interval)
                            totalInterval += interval
                            tickCount += 1
                        }
                        lastTickTime = now
                        counter += 0.2
                        if counter > 100 {
                            counter = 0
                        }
                    }
                }
            HStack {
                Text(String(format: "Min: %.3f ms", minInterval * 1000))
                Text(String(format: "Max: %.3f ms", maxInterval * 1000))
                Text(String(format: "Avg: %.3f ms", (totalInterval / Double(tickCount)) * 1000))
            }
            .padding()
            Text("Hash: \(vm.largeStruct.hashValue)")
                .padding()
            Text("Hello, world!")
            Button("Regenerate") {
                vm.largeStruct.regenerate(size: Int(size)) // Trigger the regeneration with the selected size
            }
            Button("Clear Stats") {
                minInterval = .infinity
                maxInterval = 0
                totalInterval = 0
                tickCount = 0
                lastTickTime = nil
            }
            .padding(.bottom)
            Text("Size: \(Int(size)) bytes")
            Slider(value: $size, in: 10_000...200_000, step: 10_000)
                .padding()
            Text("Number of Observers: \(observerCount)")
            Slider(value: $observerCount, in: 1...100, step: 5)
                .padding()
            HStack {
                ForEach(0..<Int(observerCount), id: \.self) { index in
                    Text("Observer \(index + 1): \(vm.largeStruct.data[index])")
                        .padding(5)
                }
            }
        }
        .padding()
    }
}
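On the copy-on-write question, a small standalone check may help. Swift arrays share their storage between copies until one of them is mutated, so copying a struct that holds an 80K array copies only a reference at first. The sketch below uses a stripped-down `LargeStruct` and a hypothetical `storageAddress` helper to observe that; it says nothing about how SwiftUI itself diffs or notifies observers.

```swift
struct LargeStruct {
    var data: [UInt8]
}

/// Address of the array's element storage, used only to observe sharing.
func storageAddress(_ value: LargeStruct) -> UInt {
    value.data.withUnsafeBufferPointer { UInt(bitPattern: $0.baseAddress) }
}

let original = LargeStruct(data: [UInt8](repeating: 0, count: 80_000))

// Copying the struct copies the array reference, not the 80K buffer.
var copy = original
let sharedBeforeMutation = storageAddress(original) == storageAddress(copy)

// The first write to the copy is what triggers the real buffer copy.
copy.data[0] = 1
let sharedAfterMutation = storageAddress(original) == storageAddress(copy)

print(sharedBeforeMutation, sharedAfterMutation) // prints "true false"
```

This would be consistent with the test app showing little sensitivity to the array size: passing the struct to many readers is cheap as long as none of them writes to it.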
Here's a video clearly demonstrating the problem:
https://youtu.be/-IbyaaIzh0I
This is a major issue for my game, because it's not meant to be played multiple times. My game is designed to only play once, so it really ruins the experience if it runs poorly until someone force quits or crashes the game.
Does anyone have a solution to this, or has encountered this issue of poor initial launch performance?
I made this game in Unity and I'm not sure if this is an Apple issue or a Unity issue.
[Submitted as FB14860454, but posting here since I rarely get responses in Feedback Assistant]
In a simple SwiftData app that adds items to a list, memory usage drastically increases as items are added. After a few hundred items, the UI lags and becomes unusable.
In comparison, a similar app built with CoreData shows only a slight memory increase in the same scenario and does NOT lag, even past 1,000 items.
In the SwiftData version, as each batch is added, memory spikes the same amount…or even increases! In the CoreData version, the increase with each batch gets smaller and smaller, so the memory curve levels off.
My Question
Are there any ways to improve the performance of adding items in SwiftData, or is it just not ready for prime time?
Example Projects
Here are the test projects on GitHub if you want to check it out yourself:
PerfSwiftData
PerfCoreData
I'm building an iOS/iPadOS app for iOS 18+ using the new RealityView in SwiftUI. (I may add visionOS, but I'm not focusing on it right now.) The 3D scene I'm rendering is fairly simple (just a few dozen vertices and a couple of textures), and I'd like to render it at 120fps on ProMotion devices if possible. I tried setting CADisableMinimumFrameDurationOnPhone to true in the info plist, but it had no effect. The frame rate in the GPU Report in Xcode stays capped at 60fps, and the gauge even tops out at 60.
My question is kind of the opposite of this post, which asks how to limit the frame rate of a RealityView.
I'm on Xcode 16 beta 5 on macOS Sonoma and iOS 18.0 beta 6 on my iPhone 15 Pro.
import CoreML
import Foundation

func testMLTensor() {
    let t1 = MLTensor(shape: [2000, 1], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 2000), scalarType: Float.self)
    let t2 = MLTensor(shape: [1, 3000], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 3000), scalarType: Float.self)
    for _ in 0...50 {
        let t = Date()
        let x = (t1 * t2)
        // timeIntervalSinceNow is negative for a past date, so negate it to get the elapsed time
        print("MLTensor", -t.timeIntervalSinceNow * 1000, "ms")
    }
}

testMLTensor()
The above code took more time than expected, especially in the early stage of iteration.
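Since the early iterations are the slow ones, it may help to report warm-up and steady-state timings separately. The harness below is a generic, synchronous sketch: `benchmark` is a hypothetical helper and the workload is a placeholder, not MLTensor. As I understand the Core ML documentation, MLTensor operations are dispatched asynchronously and only materialized when the result is read (for example with `await x.shapedArray(of: Float.self)`), so a faithful MLTensor timing would need that read inside the timed body.

```swift
import Foundation

/// Runs `body` for `warmup + iterations` passes and returns the per-pass times
/// in milliseconds, split into warm-up and steady-state samples.
func benchmark(warmup: Int, iterations: Int, _ body: () -> Void)
    -> (warmup: [Double], steady: [Double]) {
    var samples: [Double] = []
    samples.reserveCapacity(warmup + iterations)

    for _ in 0..<(warmup + iterations) {
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1_000_000)
    }
    return (Array(samples.prefix(warmup)), Array(samples.dropFirst(warmup)))
}

// Placeholder workload standing in for the operation being measured.
var sink = 0.0
let result = benchmark(warmup: 5, iterations: 50) {
    var acc = 0.0
    for i in 0..<100_000 {
        acc += Double(i).squareRoot()
    }
    sink += acc
}

let steadyAverage = result.steady.reduce(0, +) / Double(result.steady.count)
print("warm-up passes (ms):", result.warmup)
print("steady-state average (ms):", steadyAverage)
```

Comparing the two groups shows how much of the measured time is one-time setup rather than the cost of the operation itself.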
I'm struggling with getting a hierarchical SwiftUI List to perform well with large data sets.
Here's a demo repo: https://github.com/lemonmojo/swiftui-hierarchical-list-performance
There are three main problems:
Rendering of the list is slow if there are many items. (Just start the app and wait for the list to be rendered.)
Changing the selected item is very slow. (Tap/Click an item and wait for the selection to change.)
Updating the list is slow. (Press the "Shuffle" button.)
On an iPhone 13 Pro, it takes 6 seconds from tapping the app icon to the list being rendered. This was timed with a release build. Once the list has been rendered, it takes 8 seconds for the selection to change when tapping an item. Tapping the "Shuffle" button results in a 2 seconds delay before the updated list is rendered.
All three problems are much more pronounced on macOS (tested on a Mac Studio M2) where it even takes minutes(!) for the app to become responsive.
Instruments shows that 99% of the CPU time is spent somewhere deep inside SwiftUI. Various attempts have been made to fix the problems (as documented in the code) but none of them have been successful.
There are two different data sets included in the demo. One can switch between them by commenting out one of the let rootItem … declarations and commenting in the other in ContentView.swift. The default example has a flat list with 100.000 items. The "real-world" example has 4 folders, each containing 25.000 items which is faster to render initially, but as soon as you start expanding folders it's slow again.
Is there any way to make SwiftUI's list perform well on iOS and macOS with data sets greater than a couple thousand items? I'm especially worried about the selection performance. Why is selecting an item so slow once the list has been rendered?
I've created a repo that uses AppKit's NSOutlineView here and another one using UIKit's UICollectionView here. Obviously, both are blazing fast and have none of the issues I encounter with SwiftUI List.
Any ideas on how to improve the SwiftUI performance?
Many thx,
Felix
I have a game for iOS where I use CADisplayLink to animate a simulation, and for some reason the animation is not getting the full 120hz on capable devices (like iPhone 15 Pro). When I enable a 120hz refresh target, the animation is capped at only 90hz. This looks terrible because the animation works best when doubled (30, 60, 120, 240, etc).
The really bizarre thing is that when I turn on Screen Recording, my frame rate instantly jumps to 120, and everything looks perfectly smooth. My game has never looked better on iPhone! When recording is stopped, the animation drops back down to 90 fps. What in the world is going on?
[displayLink setPreferredFrameRateRange:CAFrameRateRangeMake(100, 240, 120)]; // Min, Max, Preferred
[displayLink addToRunLoop:[NSRunLoop currentRunLoop] forMode:NSDefaultRunLoopMode];
(Also, CADisableMinimumFrameDurationOnPhone is set to True in info.plist)
let metrics: [XCTMetric] = [
    XCTClockMetric(),   // to measure time
    XCTCPUMetric(),     // to measure CPU cycles
    XCTStorageMetric(), // to measure storage use
    XCTMemoryMetric(),  // to measure memory use
]
let measureOptions = XCTMeasureOptions.default
measureOptions.iterationCount = 1
// Pass the options explicitly; otherwise iterationCount above has no effect
measure(metrics: metrics, options: measureOptions) {
    // App flow
}
I want to capture the values of XCTCPUMetric, XCTMemoryMetric, XCTStorageMetric, etc. in variables so that I can send them on elsewhere if needed.
Example -
// let cpuMetric = CPU measure object should be here & I can get each information from this object.
// let MemoryMetric = Memory measure object should be here & I can get each information from this object.
But this isn't available in XCUITest; we can only find the values in the test result file. Please suggest any code that can retrieve each metric object and its value within the XCUITest run rather than from the test result.