Neural Engine Request Overhead

I have several CoreML models that I've set up to run in sequence where one of the outputs from each model is passed as one of the inputs to the next.

For the most part, there is very little overhead between each sub-model "chunk" in the Instruments trace.

However, a couple of the models (e.g. the first two in the trace) spend a noticeable amount of time in "Prepare Neural Engine Request". From Instruments, this time appears to be spent doing some sort of model loading.

Given that I'm calling these models in sequence and in a fixed order, is there some way to reduce or amortize this cost? Thanks!
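For context, here is a hedged sketch of what I'm doing to try to amortize the cost: load every model once up front with a shared `MLModelConfiguration`, keep the `MLModel` instances alive for the life of the pipeline, and run one throwaway prediction per model so any "Prepare Neural Engine Request" work happens before the real inputs arrive. The model paths are placeholders, and the warm-up assumes each model's first input is an `MLMultiArray`:

```swift
import CoreML

/// Load a fixed sequence of compiled models once and pre-warm each one.
/// Paths are hypothetical; substitute your own .mlmodelc bundles.
func prewarmPipeline() throws -> [MLModel] {
    let modelURLs = [
        URL(fileURLWithPath: "ChunkA.mlmodelc"),
        URL(fileURLWithPath: "ChunkB.mlmodelc"),
    ]

    let config = MLModelConfiguration()
    config.computeUnits = .all  // allow the Neural Engine

    // Load each model once and keep the instances alive; re-creating
    // MLModel per inference re-pays the preparation cost every time.
    let models = try modelURLs.map { try MLModel(contentsOf: $0, configuration: config) }

    // Warm-up: one dummy prediction per model, assuming a multi-array input.
    for model in models {
        guard let input = model.modelDescription.inputDescriptionsByName.first?.value,
              let constraint = input.multiArrayConstraint else { continue }
        let dummy = try MLMultiArray(shape: constraint.shape, dataType: constraint.dataType)
        let provider = try MLDictionaryFeatureProvider(dictionary: [input.name: dummy])
        _ = try? model.prediction(from: provider)  // ignore the warm-up result
    }
    return models
}
```

Even with this, the per-request preparation time shows up again in the trace, which is what prompted the question.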
