RunLoop behaviour change with performBlock?

I have a dedicated render thread with a run loop that has a CADisplayLink added to it (that's the only input source attached). The render thread has this loop in it:

  while (_continueRunLoop)
  {
    [runLoop runMode:NSDefaultRunLoopMode beforeDate:[NSDate distantFuture]];
  }

I have some code to stop the render thread that sets _continueRunLoop to false in a block, and then does a pthread_join on the render thread:

  [_renderThreadRunLoop performBlock:^{
    self->_continueRunLoop = NO;
  }];
  pthread_join(_renderThread, NULL);

I have noticed recently (iOS 18?) that if the Display Link is paused or invalidated before trying to stop the loop then the pthread_join blocks forever and the render thread is still sitting in the runMode:beforeDate: method. If the display link is still active then it does exit the loop, but only after one more turn of the display link callback.

The most likely explanation I can think of is there has been a behaviour change to performBlock - I believe this used to "consume" a turn of the run loop, and exit the runMode:beforeDate call but now it happens without leaving that function.

I can't find specific mention in the docs of the expected behaviour for performBlock - just that other RunLoop input sources cause the run method to exit, and timer sources do not. Is it possible that the behaviour has changed here?

Answered by DTS Engineer in 814484022

The most likely explanation I can think of is there has been a behaviour change to performBlock - I believe this used to "consume" a turn of the run loop, and exit the runMode:beforeDate call but now it happens without leaving that function.

There has not, but that's because CFRunLoop is documented in a way that gives it plenty of room to do whatever it wants. In this case, from CFRunLoopPerformBlock:

"This method enqueues the block only and does not automatically wake up the specified run loop. Therefore, execution of the block occurs the next time the run loop wakes up to handle another input source. If you want the work performed right away, you must explicitly wake up that thread using the CFRunLoopWakeUp function."

However my bigger concern is that you're doing this at all:

I have some code to stop the render thread that sets _continueRunLoop to false in a block, and then does a pthread_join on the render thread:

In practice, returning from a runloop and destroying the thread:

while (_continueRunLoop)
{
	[runLoop runMode:NSDefaultRunLoopMode beforeDate:[NSDate distantFuture]];
}

...doesn't really work, at least not very well. Depending on the API usage, it either leaks mach ports (because the thread was destroyed) or the thread doesn't get destroyed (because an input source leak prevents a return). The actual issue here is what was specifically called out in the documentation for NSRunLoop.runMode:beforeDate:

"...Manually removing all known input sources and timers from the run loop does not guarantee that the run loop will exit immediately. macOS may install and remove additional input sources as needed to process requests targeted at the receiver’s thread. Those sources could therefore prevent the run loop from exiting."

In practical terms, this means that outside of extremely simple usage patterns, it's basically "impossible" to control what's using a given runloop and, by extension, guarantee that NSRunLoop.runMode:beforeDate: will ever return.

More specifically, this kind of code tends to:

  • ...work fine in VERY simple, highly controlled cases where you know and control EVERY input source that interacts with the runloop. In practice, VERY few of our APIs actually fit within that category.

  • ...work fine in development because your usage accounts for every input source that happens to be occurring in that particular system version. More importantly, when it fails to work (in development), the assumption is that the failure is "a bug", so you go to the trouble of tracking down and stopping the "extra" input source.

  • ...fail in real world use because system level changes and your own code's evolution will inevitably cause "extra" sources to attach.

In terms of what you do about it, you basically have 3 choices:

  1. Track down the "extra" input sources and stop them. This is entirely possible (though often annoying and time consuming), however, it's important to understand that this will be an ongoing maintenance task, not a one time bug. The problem you're seeing here will happen again.

  2. You can use CFRunLoopStop to force the return, as shown in the code snippet I posted on this forum thread. Note that you can also enjoy My Take On RunLoops™, a topic Quinn and I have both spent many an hour trying to explain. In any case, while this does work, it also means that you'll leak mach ports, which is always worth avoiding*. In practice, CFRunLoopStop can be useful in very specific circumstances** but shouldn't really be used in long lived processes/apps.

*Mach port leaks are just as annoying as the input source leak you're tracking down in #1, except you can leak a lot of mach ports without noticing, and when you run out basically "anything" in your app can fail.

**For example, many command line tools work by using CFRunLoopStop to break out of their main thread runloop, running any finalization/cleanup code, and then calling "exit". Mach port leaks don't matter, as exit() solves all leaks.

  3. Use a single, long lived runloop thread that you never destroy. This is the system preferred solution and what I would recommend as well. The resource cost of a single thread is trivial and it's a far simpler architecture to manage and maintain.

The idea with #3 is that you may still need to track down input source leaks (as in #1), but the leak itself won't cause any immediate problems. That's somewhat true of #2 as well, however, #2 also tends to magnify the leak issue. Many input sources only attach to the runloop "once", which means #2 turns "my app leaks 1 mach port" into "my app leaks 1 mach port every time I create my thread".
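
As a rough illustration of choice #3, a single long lived runloop thread could be set up along these lines. This is only a sketch: the RenderThread class, its sharedThread method and the keep-alive port are hypothetical names and choices that do not come from this thread, and the port is there simply so the run loop always has at least one source.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): one process-wide run loop
// thread that is started on first use and never torn down.
@interface RenderThread : NSObject
+ (NSThread *)sharedThread;
@end

@implementation RenderThread

+ (NSThread *)sharedThread {
    static NSThread *thread;
    static dispatch_once_t once;
    dispatch_once(&once, ^{
        // The class object itself is the target of the thread entry point.
        thread = [[NSThread alloc] initWithTarget:self
                                         selector:@selector(threadMain:)
                                           object:nil];
        thread.name = @"render";
        [thread start];
    });
    return thread;
}

+ (void)threadMain:(id)unused {
    @autoreleasepool {
        NSRunLoop *runLoop = [NSRunLoop currentRunLoop];
        // Assumption: a keep-alive port so the run loop always has a source
        // and runMode:beforeDate: blocks instead of returning immediately.
        [runLoop addPort:[NSMachPort port] forMode:NSDefaultRunLoopMode];
        // Intentionally never exits; the thread lives as long as the process.
        while (YES) {
            @autoreleasepool {
                [runLoop runMode:NSDefaultRunLoopMode
                      beforeDate:[NSDate distantFuture]];
            }
        }
    }
}

@end
```

With something like this, a view would attach or invalidate its display link on the shared thread (for example via performSelector:onThread:withObject:waitUntilDone:) rather than creating and joining a thread each time it moves on or off a window.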

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for the reply, great info.

So actually it's probably the case that in iOS 18 the system has added another input source to the runloop, whereas in earlier versions when I invalidated the displayLink the thread basically busy-looped on the run method so the performBlock happened essentially immediately.

I've switched the code for exiting the render loop to this, which shouldn't rely on any of the undefined behaviour and has been working in my testing.

// Schedule a block on the run loop to allow the thread to exit
CFRunLoopRef rl = [_renderThreadRunLoop getCFRunLoop];
CFRunLoopPerformBlock(rl, kCFRunLoopDefaultMode, ^{
    self->_continueRunLoop = NO;
    // Exit current runMode:beforeDate: method
    CFRunLoopStop(rl);
});

// Wake up the run loop to execute the scheduled block immediately
CFRunLoopWakeUp(rl);

Can you clarify the circumstances in which mach ports are leaked? The while loop in the original post is in a function launched as a new thread with pthread_create, my code only adds a CADisplayLink as an input source before the while loop, and when ending the thread I call [displayLink invalidate] and then set the loop variable so the pthread function can exit cleanly. Is there still a mach port leak in that case?

To give a bit more context, this is a dedicated render thread for a custom metal view. The starting point for my code came from this Apple sample: https://developer.apple.com/documentation/metal/onscreen_presentation/creating_a_custom_metal_view?language=objc

It spins up a render thread when didMoveToWindow is called, as that's the first time there's a UIScreen available to get a CADisplayLink, and that's the only app-side input source the run loop will ever need. When the view moves off a window, [displayLink invalidate] is called, which removes the source from the run loop.

I saw a couple of issues with the code in that example - firstly the render thread isn't stopped when the display link is invalidated. If the system hasn't added any other input sources, the render thread will be effectively busy-looping at this point (runMode would return immediately as there are no sources).

Secondly, the next time the view is added back to a window, a new render thread is started. There's some code that sets _continueRunLoop to NO before starting the new render thread to allow any previous render threads to exit, but no actual guarantee the render thread has checked that flag and exited before setting the same ivar back to YES to get the new thread to run.

My code fixes these issues (and also switched to pthreads) but it still seemed reasonable to me to have the lifetime of the render thread (and related display link, and run loop) all driven by whether the view is attached to a window. If it's not possible to implement that without leaking mach ports then I can have a rethink...

Can you clarify the circumstances in which mach ports are leaked?

Well, the unhelpful (and slightly flippant) answer is basically "anytime they're not properly deallocated" when they're no longer useful. More practically, the whole reason mach ports create issues IS that it's often quite difficult to connect any given mach port with the specific component responsible for managing that port. In terms of run loop semantics specifically, I think you'd (potentially) generate a leak anytime you destroy a thread without removing/stopping all of its input sources*.

*Strictly speaking, that isn't necessarily a guaranteed leak. For example, some of our frameworks register thread destruction handlers to handle cleanup. However, at that point you're relying on the details of our own implementation, which is never ideal. Also, keep in mind that this isn't just about mach port leakage, but could involve any other resource whose management happened to be tied to that thread.

The while loop in the original post is in a function launched as a new thread with pthread_create, my code only adds a CADisplayLink as an input source before the while loop, and when ending the thread I call [displayLink invalidate] and then set the loop variable so the pthread function can exit cleanly. Is there still a mach port leak in that case?

In terms of a "base" implementation, no, I don't think so. Of course, that's what makes all of this so messy- the issue here isn't the existing base implementation it's:

  • Anything "else" you do on that thread.

  • Any future changes to that implementation.

Strictly speaking, you can use NSRunLoop.runUntilDate to "know" that all input sources have been removed; however, that also means that when something changes you'll basically end up leaking a thread.

To give a bit more context, this is a dedicated render thread for a custom metal view. The starting point for my code came from this Apple sample:

Huh. I'm checking in with the DTS engineer who handles metal, but my initial read of AAPLUIView is that its implementation is broken/incomplete. The obvious fix would be to set "_continueRunLoop = NO" in stopRenderLoop, but the whole thing feels a bit "janky".

I saw a couple of issues with the code in that example - firstly the render thread isn't stopped when the display link is invalidated. If the system hasn't added any other input sources, the render thread will be effectively busy-looping at this point (runMode would return immediately as there are no sources).

Secondly, the next time the view is added back to a window, a new render thread is started. There's some code that sets _continueRunLoop to NO before starting the new render thread to allow any previous render threads to exit, but no actual guarantee the render thread has checked that flag and exited before setting the same ivar back to YES to get the new thread to run.

Yeah, that's all true. Frankly, the thread management in AAPLUIView is just weird and doesn't really use the tools at hand very well. Case in point, given an NSThread with an operational run loop, you can actually just message the thread "directly" with performSelector:onThread:.
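
To make that concrete, here is a minimal sketch of messaging a run loop thread with performSelector:onThread:withObject:waitUntilDone: to clear the loop flag. The RenderLoopOwner class, its method names and the keep-alive port are hypothetical; a real view would attach its CADisplayLink where the comment indicates. The sketch assumes the selector is delivered as a run loop input source, so runMode:beforeDate: returns afterwards and the while condition is re-checked; that is worth verifying on the system versions you care about.

```objc
#import <Foundation/Foundation.h>

// Hypothetical owner of a render thread; names are illustrative only.
@interface RenderLoopOwner : NSObject
- (void)start;
- (void)requestStop;
@end

@implementation RenderLoopOwner {
    NSThread *_renderThread;
    BOOL _continueRunLoop;   // written before start, then only on the render thread
}

- (void)start {
    _continueRunLoop = YES;
    _renderThread = [[NSThread alloc] initWithTarget:self
                                            selector:@selector(renderThreadMain:)
                                              object:nil];
    [_renderThread start];
}

- (void)renderThreadMain:(id)unused {
    @autoreleasepool {
        NSRunLoop *runLoop = [NSRunLoop currentRunLoop];
        // Placeholder source so the loop blocks; a display link would be
        // added to the run loop here instead.
        [runLoop addPort:[NSMachPort port] forMode:NSDefaultRunLoopMode];
        while (_continueRunLoop) {
            [runLoop runMode:NSDefaultRunLoopMode
                  beforeDate:[NSDate distantFuture]];
        }
    }
}

// Executes on the render thread, so the flag is only touched there.
- (void)stopOnRenderThread {
    _continueRunLoop = NO;
}

// Safe to call from any thread; does not wait for the render thread.
- (void)requestStop {
    [self performSelector:@selector(stopOnRenderThread)
                 onThread:_renderThread
               withObject:nil
            waitUntilDone:NO];
}

@end
```

Note that this only addresses how the message gets to the thread. The earlier caveats about sources the system may attach to the run loop, and about what is leaked when the thread goes away, still apply.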

One small comment here:

My code fixes these issues (and also switched to pthreads)

You're welcome to use pthreads if you choose, but NSThread is in fact a wrapper around pthread*. Indeed, you don't actually need to "create" an NSThread in order to "get" an NSThread. What [NSThread currentThread] normally does is return an NSThread object stored in thread local storage, but if you call it on a "standard" pthread, what it actually does is create a "new" NSThread object for that thread, then return the object it just created. If there is some larger reason to use the "direct" pthread API then you certainly can, but there isn't any fundamental difference between the two APIs, aside from the additional convenience provided by NSThread.

*Keep in mind that, as far as our code is concerned, "all threads are pthreads". Strictly speaking it is possible to create a thread using mach but, as far as I've been able to tell, none of our code actually does so.
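
A small sketch of that behaviour, assuming a thread created with plain pthread_create. The function names and the global are hypothetical and exist only so the creating thread can inspect the result after joining.

```objc
#import <Foundation/Foundation.h>
#import <pthread.h>

// Hypothetical global, only here so the creating thread can see the result.
static NSThread *gRenderNSThread;

static void *renderThreadMain(void *context) {
    @autoreleasepool {
        // No NSThread was created explicitly for this pthread; asking for
        // the current thread still yields an NSThread object representing it.
        NSThread *current = [NSThread currentThread];
        current.name = @"render";
        gRenderNSThread = current;
    }
    return NULL;
}

static void runRenderThreadOnce(void) {
    pthread_t thread;
    if (pthread_create(&thread, NULL, renderThreadMain, NULL) == 0) {
        // Joining orders the write to gRenderNSThread before any later read.
        pthread_join(thread, NULL);
    }
}
```

In a real render thread the NSThread object would be handed to whoever needs to message the thread while it is still alive and running its run loop; here the thread has already exited by the time the global is read, so it is only useful for inspection.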

still seemed reasonable to me to have the lifetime of the render thread (and related display link, and run loop) all driven by whether the view is attached to a window.

How many of these views are you actually planning to have "live" at any given time? The one concern I'd have with this kind of architecture is that, in practice, I think it can end up introducing unnecessary complexity with less benefit than it would otherwise "seem". More specifically:

  • When the view count is low, you end up wasting a thread unnecessarily. The point of metal is to offload work to the GPU, so this thread shouldn't actually be doing very much. This is actually why the "default" is the main thread- for many apps, the main thread will work just fine.

  • As the view count increases, eventually it falls apart from thread overhead and general "load".

If it's not possible to implement that without leaking mach ports then I can have a rethink...

Again, this all depends on exactly what you're doing on that thread. If your usage is truly confined to pure metal rendering then, yes, it probably can be done safely. However, I think you may still be better off using one thread for other reasons beyond the immediate one.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for all the info, I think I'll stick with my current setup in that case. There's usually only one of these views on screen at once - and typically only one ever created in the app, but I wanted it to clean up after itself if apps did choose to dealloc it. It felt like a reasonable architecture to encapsulate the render thread behaviour in the view class as it is tied to when the view is added to a window.

I do want a dedicated render thread and also wanted to follow the guidelines for them from this WWDC talk: https://developer.apple.com/videos/play/wwdc2018/612/?time=1146

I switched to pthreads for the thread creation so I could set those scheduler attributes at thread creation time as shown in that video. I'm happy enough with using pthreads directly for now.
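
For anyone following along, setting scheduler attributes at thread creation time looks roughly like the sketch below. The SCHED_RR policy and the priority value of 45 are my recollection of the talk rather than something stated in this thread, so treat both as assumptions and check them against the video. The function names are hypothetical.

```objc
#import <Foundation/Foundation.h>
#import <pthread.h>
#import <sched.h>

// Hypothetical entry point; the run loop setup and the while loop from the
// original post would go here.
static void *renderThreadMain(void *context) {
    return NULL;
}

// Creates the render thread with explicit scheduler attributes.
// Returns 0 on success or a pthread error code.
static int startRenderThread(pthread_t *outThread) {
    pthread_attr_t attr;
    int err = pthread_attr_init(&attr);
    if (err != 0) {
        return err;
    }

    // Assumed values: round-robin policy with a fixed priority of 45.
    // Set the policy first, since valid priorities depend on the policy.
    err = pthread_attr_setschedpolicy(&attr, SCHED_RR);
    if (err == 0) {
        struct sched_param param = { .sched_priority = 45 };
        err = pthread_attr_setschedparam(&attr, &param);
    }
    if (err == 0) {
        err = pthread_create(outThread, &attr, renderThreadMain, NULL);
    }

    // The attribute object is no longer needed once the thread exists.
    pthread_attr_destroy(&attr);
    return err;
}
```

The attributes have to be in place before pthread_create is called, which is why this path goes through pthreads rather than creating an NSThread first.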
