NEMachServiceName failure to access after network extension upgrade

We have a product which uses a Network Extension (a socket filter and a packet content filter). The application contains the network extension, as well as an unsandboxed LaunchDaemon which connects to the Mach service named by NEMachServiceName.

Occasionally, usually after an upgrade where the system extension is swapped for the new version, our unsandboxed process isn't able to contact the network extension. In the unsandboxed process, the logs show the following XPC error:

(libxpc.dylib) [com.apple.xpc:connection] [0x7fd6d0307f40] failed to do a bootstrap look-up: xpc_error=[3: No such process]

Eventually, we receive an invalidated callback on the XPC connection with the error "Couldn’t communicate with a helper application." We have confirmed via the launchctl command that an appropriate service is running, and the network extension process appears to have initialised correctly. However, we don't see any indication of a received connection in the network extension process (probably not surprising given the error).

Once a system enters this state, repeated attempts to connect are unsuccessful and continue to produce the same error.

We've also confirmed that there are no XPC codec exceptions apparent that might cause the connection to fail.
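For readers following along, the connection attempt described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the `ExtensionControl` protocol, its `ping` method, and the `connectToExtension` function are hypothetical names, and `serviceName` is assumed to match the NEMachServiceName declared in the extension's Info.plist.

```swift
import Foundation

// Hypothetical protocol for illustration; the real interface is whatever
// the network extension actually exports.
@objc protocol ExtensionControl {
    func ping(reply: @escaping (Bool) -> Void)
}

/// Opens a connection to the extension's Mach service and reports each failure path.
/// `serviceName` is assumed to be the extension's NEMachServiceName value.
func connectToExtension(serviceName: String) -> NSXPCConnection {
    let connection = NSXPCConnection(machServiceName: serviceName, options: [])
    connection.remoteObjectInterface = NSXPCInterface(with: ExtensionControl.self)

    // Runs when the connection can no longer be used at all.
    connection.invalidationHandler = {
        print("XPC connection to \(serviceName) invalidated")
    }
    // Runs when the remote process goes away; the connection object stays usable.
    connection.interruptionHandler = {
        print("XPC connection to \(serviceName) interrupted")
    }
    connection.resume()

    // The look-up happens lazily, so a failure only surfaces once a message is sent.
    let proxy = connection.remoteObjectProxyWithErrorHandler { error in
        print("XPC message failed: \(error.localizedDescription)")
    } as? ExtensionControl
    proxy?.ping { alive in
        print("Extension answered ping: \(alive)")
    }
    return connection
}
```

Logging from all three handlers makes it easier to tell a failed look-up apart from an extension that was found but later exited.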

I'm at a bit of a loss to explain why this failure might be occurring, other than launchd's bootstrap look-up failing to find the appropriate service. Is there possibly some problem with unsandboxed processes accessing the sandboxed network extension via XPC? Both are provisioned in the same app group. Is there possibly some issue where attempting to connect at a critical point during network extension installation causes it to become inaccessible?

We've observed this specifically on macOS 14.5 (23F79), but we've also noticed it on other versions of macOS and of our code. The problem isn't systematic, and systems end up in this state only occasionally. Some customers do seem to hit this problem more often than others, but we haven't been successful at teasing out any common thread that might explain why.

I think I’ve seen this before. Check out this thread.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"

Ah yes, this one looks fairly familiar. Thanks for pointing me to it. If I follow it correctly, it seems the currently available workaround is to unload and reload the system extension?
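If that reading is right, a reload from the host app might look roughly like the sketch below. It assumes that "unload and reload" means a deactivation request followed by an activation request through the SystemExtensions framework; the `ExtensionReloader` class name is hypothetical, and `identifier` is assumed to be the system extension's bundle identifier.

```swift
import Foundation
import SystemExtensions

/// Deactivates a system extension, then activates it again once deactivation completes.
/// The request's delegate is held weakly, so the caller must keep this object alive.
final class ExtensionReloader: NSObject, OSSystemExtensionRequestDelegate {
    private let identifier: String        // assumed: the extension's bundle identifier
    private var activationPending = false

    init(identifier: String) {
        self.identifier = identifier
    }

    func reload() {
        activationPending = true
        let request = OSSystemExtensionRequest.deactivationRequest(
            forExtensionWithIdentifier: identifier, queue: .main)
        request.delegate = self
        OSSystemExtensionManager.shared.submitRequest(request)
    }

    private func activate() {
        let request = OSSystemExtensionRequest.activationRequest(
            forExtensionWithIdentifier: identifier, queue: .main)
        request.delegate = self
        OSSystemExtensionManager.shared.submitRequest(request)
    }

    func request(_ request: OSSystemExtensionRequest,
                 actionForReplacingExtension existing: OSSystemExtensionProperties,
                 withExtension ext: OSSystemExtensionProperties)
        -> OSSystemExtensionRequest.ReplacementAction {
        return .replace
    }

    func requestNeedsUserApproval(_ request: OSSystemExtensionRequest) {
        print("Request is waiting for user approval")
    }

    func request(_ request: OSSystemExtensionRequest,
                 didFinishWithResult result: OSSystemExtensionRequest.Result) {
        guard result == .completed else {
            // The change only takes effect after a reboot; do not chain further requests.
            activationPending = false
            print("Request will complete after reboot")
            return
        }
        if activationPending {
            activationPending = false
            activate()                     // deactivation finished, now reactivate
        } else {
            print("Extension reactivated")
        }
    }

    func request(_ request: OSSystemExtensionRequest, didFailWithError error: Error) {
        activationPending = false
        print("System extension request failed: \(error.localizedDescription)")
    }
}
```

Usage would be something like `let reloader = ExtensionReloader(identifier: "…"); reloader.reload()` from the host app, with `reloader` retained until the callbacks finish. On machines where approvals are not managed via MDM, both requests may require user interaction.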

Our app (along with a few other things) is distributed as a .pkg rather than a dmg drag-n-drop install. Installing system extensions is done by our host app's main binary. We install a LaunchAgent that detects that installation needs to take place and triggers an app launch via NSWorkspace in a user's session. Most customers perform the pkg installation via Fleet Management (like Jamf). I suppose it's possible they are doing this outside a user session, which could cause some problems?

Is there anything we can provide that would let Apple do further diagnostics on this? Most of our customers probably wouldn't notice since they manage system extension permissions via MDM, but for those few working without this, the user experience is pretty sub-optimal.
