hi all.
I subscribe the notify write event, every time I recieve a notify write event message i will send log data and reply block(didn't do nothing) with async method to host app(Objc XPC API).host app will reply immediately once it recieves data. after a while my sysext crashed, then I checked system log find the log below.
launchd: exited with exit reason (namespace: 30 code: 0xc40000000004aaaa) - (unknown reason)
is it because of exceeding the maximum limit of xpc's block queue length, or too many memory allocation, or...
by the way, host app didn't crash.
how this happened exactly? how could i solve it?
So, starting with the error:
launchd: exited with exit reason (namespace: 30 code: 0xc40000000004aaaa) - (unknown reason)
The "namespace" values comes from reason.h in XNU and "30" is "OS_REASON_PORT_SPACE". Basically, something went wrong with your mach port. The error code is in the kernel's structured error format, which uses a bit field structured as (system | subsystem | code). IOKit's IOReturn.h has the clearest description of the format that I'm aware though, to be clear, that is NOT an IOKit error. You could decode the specific error by digging into mach's error handling, but I'm fairly sure I know what's going wrong based on this:
I subscribe the notify write event, every time I recieve a notify write event message i will send log data
The key word there is "every time". Those word are a warning sign with any ES message, especially when tied to a high volume message like notify write. What you're describing means that every call to "write" across the entire monitored system will end up generating a mach message to a single mach port being processed by a single thread. If that thread can't process those messages as fast as they're generated, then the only question is what will break first*.
*As a side note here, I assume you muted the host app (the receiving app)? If you haven't and the host app then writes every log message to disk... well, that's going to fail a lot faster.
The big thing to understand here is that the actual risk here ISN'T normal system load, it's an attacker who is actively trying to break your ES client. How many write calls per-second can a process generate? Say 100 threads calling "write(1 byte)" in a tight loop? I'd need to test it to say for certain, but it's certainly "1000s", probably "10,000s", maybe "100,000s"? And, of course, there's no reason an attacker needs to stick to one process. Whatever the "base" count is, an attacker can multiply it indefinitely by duplicating itself. The key point here is that the issue here isn't something you can fix by optimizing or tweaking your current approach. The attacker can generate LOTS of writes with very little effort.
The solution here is to break the direct connection between notify messages your own messaging. Typically, that means your ES client collect the data over time, then pushes the data to your other process at a slower/throttled pace.
-Kevin Elliott
DTS Engineer, CoreOS Hardware