CoreML model using excessive RAM during prediction

I have an mlprogram of size 127.2 MB that was created with TensorFlow and then converted to CoreML. When I request a prediction, memory usage shoots up to 2-2.5 GB every time. I've tried the optimization techniques in coremltools, but nothing seems to work; it still shoots up to the same 2-2.5 GB of RAM. I've attached a graph showing that it doesn't seem to be a leak, since the memory goes back down afterwards.

Hi @Michi314,

I am running into a similar issue, where the memory usage is around 2GB for a 100MB model. Did you find a solution to this?

When instantiating the CoreML model, try passing an MLModelConfiguration object and playing around with the config's computeUnits options. For example:

import CoreML

let config = MLModelConfiguration()
config.computeUnits = .cpuOnly

let model = try MLModel(contentsOf: localModelUrl, configuration: config)

In my case .cpuOnly worked best, but it varies from model to model, so try the other options as well.
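If you want to compare the options systematically, here's a rough sketch (modelURL and inputFeatures are placeholders for your own model and input) that loads the model once per compute-unit option and runs a single prediction, so you can watch peak memory in Xcode's memory gauge or Instruments for each:

import CoreML

// Load the model once per compute-unit option and run one prediction with each,
// so peak memory can be compared per option.
// Note: .cpuAndNeuralEngine requires iOS 16 / macOS 13 or later.
func compareComputeUnits(modelURL: URL, inputFeatures: MLFeatureProvider) throws {
    let options: [MLComputeUnits] = [.cpuOnly, .cpuAndGPU, .cpuAndNeuralEngine, .all]
    for units in options {
        let config = MLModelConfiguration()
        config.computeUnits = units
        let model = try MLModel(contentsOf: modelURL, configuration: config)
        _ = try model.prediction(from: inputFeatures)
        print("Finished prediction with computeUnits \(units.rawValue)")
    }
}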

The Core ML runtime needs to allocate intermediate tensors, which can be large depending on the model architecture, the data type, and the compute device.

It's hard to say more without looking at the actual model. We would appreciate it if you could submit a problem report through Feedback Assistant with the model attached. A few important data points to include in the report are:

  1. The Model (.mlpackage, .mlmodel, or .mlmodelc file)
  2. The compute unit (See https://developer.apple.com/documentation/coreml/mlmodelconfiguration/computeunits)
  3. (If the model uses a flexible shape) the shape of the actual input.
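For point 3, if you are unsure what shapes the model declares, here is a quick sketch (assuming `model` is your already-loaded MLModel) that prints the input descriptions:

import CoreML

// Print each declared input and its multi-array shape/data type so they can be
// included in the report. `model` is an already-loaded MLModel instance.
for (name, description) in model.modelDescription.inputDescriptionsByName {
    if let constraint = description.multiArrayConstraint {
        print("\(name): shape \(constraint.shape), dataType \(constraint.dataType.rawValue)")
    }
}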

@Frameworks Engineer how can we further optimize memory usage for CoreML? I find that for my model, also around 100+ MB in size, it takes up ~1 GB of memory on CPU, but more than 1.7 GB on GPU.

Could you explain in more detail how memory allocation happens on CPU / GPU / ANE, and whether there is a way we can tune it? (E.g. on GPU, I understand that CoreML uses MPSGraph, so is there a way to reduce the number of concurrent ops passed into the MTLCommandQueue to reduce peak memory usage?)
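For reference, the only GPU-related knobs I can find on MLModelConfiguration are the ones below (just a sketch reusing localModelUrl from above; I'm not sure they actually reduce peak memory, and I don't see a public API for tuning how work is batched onto the command queue):

import CoreML
import Metal

// GPU-related options publicly exposed on MLModelConfiguration. Whether they
// lower peak memory appears to depend on the model.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU
config.allowLowPrecisionAccumulationOnGPU = true   // permit reduced-precision accumulation on the GPU
config.preferredMetalDevice = MTLCreateSystemDefaultDevice()
let model = try MLModel(contentsOf: localModelUrl, configuration: config)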
