CIImageProcessorKernel using Metal Compute Pipeline error

Greetings! I have been battling a tough issue. My use case is running a pixel-wise regression model on a 2D array of images using CIImageProcessorKernel and a custom Metal shader.

It mostly works great, but if the regression calculation in Metal takes too long, an error occurs and the resulting output texture has strange artifacts.

The specific error is:

Error excuting command buffer = Error Domain=MTLCommandBufferErrorDomain Code=1 "Internal Error (0000000e:Internal Error)" UserInfo={NSLocalizedDescription=Internal Error (0000000e:Internal Error), NSUnderlyingError=0x60000320ca20 {Error Domain=IOGPUCommandQueueErrorDomain Code=14 "(null)"}} (com.apple.CoreImage)

There are multiple levels of concurrency: Swift Concurrency calling the Core Image code (which shouldn't have an impact) and of course the Metal command buffer.

Is there any way to ensure the compute command encoder can complete its work?

Here is the full implementation of my CIImageProcessorKernel subclass:

class ParametricKernel: CIImageProcessorKernel {
    static let device = MTLCreateSystemDefaultDevice()!

    override class var outputFormat: CIFormat {
        return .BGRA8
    }

    override class func formatForInput(at input: Int32) -> CIFormat {
        return .BGRA8
    }

    override class func process(with inputs: [CIImageProcessorInput]?, arguments: [String : Any]?, output: CIImageProcessorOutput) throws {
        guard
            let commandBuffer = output.metalCommandBuffer,
            let images = arguments?["images"] as? [CGImage],
            let mask = arguments?["mask"] as? CGImage,
            let fillTime = arguments?["fillTime"] as? CGFloat,
            let betaLimit = arguments?["betaLimit"] as? CGFloat,
            let alphaLimit = arguments?["alphaLimit"] as? CGFloat,
            let errorScaling = arguments?["errorScaling"] as? CGFloat,
            let timing = arguments?["timing"],
            let TTRThreshold = arguments?["ttrthreshold"] as? CGFloat,
            let input = inputs?.first,
            let sourceTexture = input.metalTexture,
            let destinationTexture = output.metalTexture
        else {
            return
        }

        guard let kernelFunction = device.makeDefaultLibrary()?.makeFunction(name: "parametric") else {
            return
        }

        guard let commandEncoder = commandBuffer.makeComputeCommandEncoder() else {
            return
        }

        let imagesTexture = Texture.textureFromImages(images)

        let pipelineState = try device.makeComputePipelineState(function: kernelFunction)
        commandEncoder.setComputePipelineState(pipelineState)
        commandEncoder.setTexture(imagesTexture, index: 0)

        let maskTexture = Texture.textureFromImages([mask])
        
        commandEncoder.setTexture(maskTexture, index: 1)
        commandEncoder.setTexture(destinationTexture, index: 2)

        var errorScalingFloat = Float(errorScaling)
        let errorBuffer = device.makeBuffer(bytes: &errorScalingFloat, length: MemoryLayout<Float>.size, options: [])
        commandEncoder.setBuffer(errorBuffer, offset: 0, index: 1)

        // Other buffers omitted....

        let threadsPerThreadgroup = MTLSizeMake(16, 16, 1)
        let width = Int(ceil(Float(sourceTexture.width) / Float(threadsPerThreadgroup.width)))
        let height = Int(ceil(Float(sourceTexture.height) / Float(threadsPerThreadgroup.height)))
        let threadGroupCount = MTLSizeMake(width, height, 1)

        commandEncoder.dispatchThreadgroups(threadGroupCount, threadsPerThreadgroup: threadsPerThreadgroup)
        commandEncoder.endEncoding()
    }
}

I'm not sure using Core Image is the best choice here. CI might impose limits on the runtime of kernels, and your regression kernel seems too expensive.

It's also not intended that you pass the images and mask as CGImages via the arguments dictionary into the kernel. It would be better to convert them to CIImages first and pass them via the inputs parameter; Core Image would then convert them to Metal textures for you. Unfortunately, Core Image doesn't support texture arrays, so you would need to find a workaround for the image stack (see the sketch below).
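A rough sketch of what that call site could look like (hypothetical variable names; only the mask is shown as an input, since the image stack would still need a workaround):

let maskImage = CIImage(cgImage: maskCGImage)    // instead of passing the CGImage in arguments

let result = try ParametricKernel.apply(
    withExtent: sourceImage.extent,
    inputs: [sourceImage, maskImage],            // CI hands these to process() as Metal textures
    arguments: ["errorScaling": errorScaling]    // scalars can stay in the arguments dictionary
)

// Inside process(), inputs?[0] would then be the source and inputs?[1] the mask.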

Have you tried running your kernel in a pure Metal pipeline? It might be the better choice here. Or do you need it to be part of a Core Image pipeline?
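For reference, driving the same kernel from a plain Metal command queue would look roughly like this (a sketch, assuming your "parametric" function and the textures already exist):

import Metal

func runParametricPass(device: MTLDevice,
                       imagesTexture: MTLTexture,
                       maskTexture: MTLTexture,
                       destinationTexture: MTLTexture) throws {
    let queue = device.makeCommandQueue()!
    let library = device.makeDefaultLibrary()!
    let pipelineState = try device.makeComputePipelineState(
        function: library.makeFunction(name: "parametric")!)

    let commandBuffer = queue.makeCommandBuffer()!
    let encoder = commandBuffer.makeComputeCommandEncoder()!
    encoder.setComputePipelineState(pipelineState)
    encoder.setTexture(imagesTexture, index: 0)
    encoder.setTexture(maskTexture, index: 1)
    encoder.setTexture(destinationTexture, index: 2)

    // Same 16x16 threadgroups as in your Core Image version.
    let threadsPerThreadgroup = MTLSize(width: 16, height: 16, depth: 1)
    let threadgroups = MTLSize(width: (destinationTexture.width + 15) / 16,
                               height: (destinationTexture.height + 15) / 16,
                               depth: 1)
    encoder.dispatchThreadgroups(threadgroups, threadsPerThreadgroup: threadsPerThreadgroup)
    encoder.endEncoding()

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
    if let error = commandBuffer.error {
        print("Command buffer failed: \(error)")   // this is where a timeout would surface
    }
}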

"if the regression calculation in Metal takes too long"

After how long do you see it failing?

Do you have a specific requirement to hardcode the threadgroup size, or did you find that it performs best at 16x16? https://developer.apple.com/documentation/metal/compute_passes/calculating_threadgroup_and_grid_sizes provides sample code for computing this size dynamically.
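For reference, that sample boils down to deriving the size from the pipeline state (a sketch using your variable names):

let w = pipelineState.threadExecutionWidth
let h = pipelineState.maxTotalThreadsPerThreadgroup / w
let threadsPerThreadgroup = MTLSize(width: w, height: h, depth: 1)

// On devices that support non-uniform threadgroup sizes, dispatching the exact
// grid also removes the manual ceil() math for partial threadgroups:
let threadsPerGrid = MTLSize(width: sourceTexture.width,
                             height: sourceTexture.height,
                             depth: 1)
commandEncoder.dispatchThreads(threadsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)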

As a side note, although I don't expect this to be the core of your issue, you shouldn't instantiate the compute pipeline state inside the process() function. It only needs to be created once, not on every process() call, so it should be created beforehand or at least cached.
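For example, something along these lines on your kernel class (a sketch):

// Built once, lazily, on first use instead of on every process() call.
static let pipelineState: MTLComputePipelineState = {
    let library = device.makeDefaultLibrary()!
    let function = library.makeFunction(name: "parametric")!
    return try! device.makeComputePipelineState(function: function)
}()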

Thank you for your reply! I have no specific reason to use the Core Image pipeline besides ease of use with the API. I tried switching to a Metal pipeline, but am still seeing the image artifacts and this corresponding error:

metal Execution of the command buffer was aborted due to an error during execution. Internal Error (0000000e:Internal Error)

Another error mentioned the GPU was timing out, but I'm having trouble reproducing that one. Is there any way to make the command buffer wait longer?
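Would opting into Metal's enhanced command buffer error reporting help narrow down which encoder is timing out? Something like this (assuming macOS 11 / iOS 14 or later and my own command queue):

let descriptor = MTLCommandBufferDescriptor()
descriptor.errorOptions = .encoderExecutionStatus
let commandBuffer = commandQueue.makeCommandBuffer(descriptor: descriptor)!

// Registered before commit; reports per-encoder fault state when the buffer fails.
commandBuffer.addCompletedHandler { buffer in
    if let error = buffer.error as NSError?,
       let infos = error.userInfo[MTLCommandBufferEncoderInfoErrorKey] as? [MTLCommandBufferEncoderInfo] {
        for info in infos {
            print("encoder \(info.label): errorState \(info.errorState.rawValue)")
        }
    }
}

// ... encode the compute pass and commit as usual ...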
