Metal Performance Shader color issue with yCbCr buffer

Question

Created Aug ’24

Replies 3

Participants 2

I'm making an app that reads a ProRes file, processes each frame through Metal to scale it, then writes a new ProRes file. The app will support other codecs in the future, but for now it handles only ProRes. I'm reading the ProRes 422 buffers in the kCVPixelFormatType_422YpCbCr16 pixel format, which is what Apple recommends in this video: https://developer.apple.com/wwdc20/10090?time=599.

When the MTLTexture is run through a Metal Performance Shader, the output is all green/purple, as if MPS forces an RGB interpretation or simply does not accept YCbCr textures. The render code contains a commented-out block that just blit-copies to the outputTexture; if you perform that copy instead of scaling through MPS, the output colors are fine. So the issue appears to come from Metal Performance Shaders.

Side note: I noticed that this format brings in the YpCbCr texture as a single plane. I thought it was preferred to handle this as two separate planes? That said, two separate planes would make my app more complicated, since I would need to either scale both planes or merge them to RGB, and I'm going for the best performance possible.

A sample project can be found here: https://www.dropbox.com/scl/fo/jsfwh9euc2ns2o3bbmyhn/AIomDYRhxCPVaWw9XH-qaN0?rlkey=sp8g0sb86af1u44p3xy9qa3b9&dl=0

Inside the supporting files, there is a test movie. For convenience, move it somewhere easily accessible (e.g. the Desktop).

Load and run the example project.
Click 'Select Video'.
Select the video you placed on your Desktop.
The app will now write a new video next to the selected one, named "Output.mov".
The new video should simply be scaled to 50%, but its colors are all wrong.

Below is a before-and-after comparison of a frame run through the Metal Performance Shader.

[Image: Screenshot 2024-08-20 at 10.05.47 PM.png]

Answered by DTS Engineer in 801071022

Answer 1

DTS Engineer

Apple

Aug ’24

Recommended

Hello @Joebayld,

kCVPixelFormatType_422YpCbCr16 is a packed YUV format with chroma subsampling, where 2 pixels are stored in 64 bits. (For a detailed description of the pixel format, take a look at https://developer.apple.com/library/archive/technotes/tn2162/_index.html#//apple_ref/doc/uid/DTS40013070-CH1-TNTAG8-V216__4_2_2_COMPRESSION_TYPE.)
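To make the "2 pixels in 64 bits" layout concrete, here is a minimal Swift sketch that unpacks one 8-byte group. It assumes the component order Cb, Y0, Cr, Y1 with 16-bit little-endian samples, as I understand the linked technote to describe for 'v216'. The names `PixelPair422` and `unpackPair` are made up for illustration and are not part of any Apple API.

```swift
// Sketch only: one 64-bit group of a packed 4:2:2 buffer holds two pixels.
// Assumption: components are ordered Cb, Y0, Cr, Y1, each a 16-bit
// little-endian sample. Both pixels share the single Cb/Cr pair.
struct PixelPair422: Equatable {
    var cb: UInt16
    var y0: UInt16
    var cr: UInt16
    var y1: UInt16
}

/// Reads the 8 bytes starting at `offset` as one Cb Y0 Cr Y1 group.
func unpackPair(_ bytes: [UInt8], at offset: Int = 0) -> PixelPair422 {
    precondition(offset >= 0 && offset + 8 <= bytes.count, "need 8 bytes per pixel pair")
    // Assemble a little-endian 16-bit word from two consecutive bytes.
    func word(_ i: Int) -> UInt16 {
        return UInt16(bytes[offset + i]) | (UInt16(bytes[offset + i + 1]) << 8)
    }
    return PixelPair422(cb: word(0), y0: word(2), cr: word(4), y1: word(6))
}

// Arbitrary sample bytes, chosen so each component is easy to recognise.
let sampleBytes: [UInt8] = [0x34, 0x12, 0x78, 0x56, 0xBC, 0x9A, 0xF0, 0xDE]
let samplePair = unpackPair(sampleBytes)
print(samplePair)
```

The point of the sketch is that a single 64-bit group is not "one pixel with four channels": it carries two luma samples and one shared chroma pair.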

Your code is taking each pixel from this buffer and mapping it into a 32-bit RGBA texture (using CVMetalTextureCache), and then filtering that texture with MPSImageBilinearScale. One of the problems is that the filter has no way of knowing that the data represents packed YUV data with chroma subsampling, so I'm not surprised that it produced the result in your screenshot. You are welcome to file an enhancement request using Feedback Assistant, but in the meantime, you will need to find a different solution.
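As a rough illustration of why a format-unaware filter goes wrong, the sketch below compares averaging two 16-bit samples as whole values with averaging their bytes independently. It assumes the texture view exposes each 16-bit component as two separate 8-bit channels, which is one way a 32-bit RGBA mapping of this data could behave; the function names are hypothetical, and this is not a model of what MPSImageBilinearScale does internally.

```swift
// Sketch only. Assumption: a 32-bit RGBA view splits every 16-bit sample
// into two 8-bit channels, so a filter blends low and high bytes separately.

/// Correct blend: treat each sample as one 16-bit value.
func averageAsWords(_ a: UInt16, _ b: UInt16) -> UInt16 {
    return UInt16((UInt32(a) + UInt32(b)) / 2)
}

/// Format-unaware blend: average the low bytes and the high bytes on their own.
func averageAsBytes(_ a: UInt16, _ b: UInt16) -> UInt16 {
    let low = ((a & 0xFF) + (b & 0xFF)) / 2
    let high = ((a >> 8) + (b >> 8)) / 2
    return (high << 8) | low
}

let first: UInt16 = 0x00FF   // 255
let second: UInt16 = 0x0100  // 256
print(averageAsWords(first, second))  // 255
print(averageAsBytes(first, second))  // 127
```

Two nearly identical samples blend to roughly half their value in the byte-wise case. On top of that, neighbouring texels in such a view would hold different components (Cb and Y0 in one, Cr and Y1 in the next), so blending neighbours also mixes chroma with chroma of the wrong kind.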

My recommendation is to have AVFoundation convert the sample to an RGB format for you; kCVPixelFormatType_64RGBAHalf would maintain the precision of the source.
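A minimal sketch of that recommendation, assuming the frames are read with an AVAssetReaderTrackOutput (the sample project is not shown in this thread, so that is a guess). The helper names `rgbaHalfOutputSettings` and `makeRGBAHalfOutput` are hypothetical.

```swift
import AVFoundation
import CoreVideo

// Sketch only: output settings asking AVFoundation to hand back decoded
// frames as 64-bit half-float RGBA instead of packed YpCbCr.
func rgbaHalfOutputSettings() -> [String: Any] {
    return [
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_64RGBAHalf,
        // Request buffers that can be wrapped in Metal textures.
        kCVPixelBufferMetalCompatibilityKey as String: true,
    ]
}

// Assumption: `track` is the video track of the asset being read.
func makeRGBAHalfOutput(for track: AVAssetTrack) -> AVAssetReaderTrackOutput {
    let output = AVAssetReaderTrackOutput(track: track,
                                          outputSettings: rgbaHalfOutputSettings())
    // Avoid an extra copy when the sample buffers are only read.
    output.alwaysCopiesSampleData = false
    return output
}
```

With buffers in this format, the texture created through CVMetalTextureCache would presumably use a matching half-float pixel format such as MTLPixelFormat.rgba16Float, so that MPSImageBilinearScale sees genuine RGBA channels.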

Best regards,

Greg

Answer 2

Joebayld OP

Aug ’24

@DTS Engineer Thanks for the quick response.

I'm not sure what the enhancement request would be, other than maybe support for specifying the pixel format on an MPS shader so it knows how to handle the data.

Unfortunately, the AVFoundation RGB format makes performance significantly worse, so ideally we want to decode natively. If I keep the decoded frames in YUV, which would perform better: converting to RGB in a shader and then doing the MPS scale, or using a bi-planar decode format with two separate MPS filters? The bi-planar approach seems better for performance, because I need to encode back to a ProRes file quickly.

Thanks, Joe

Answer 3

DTS Engineer

Apple

Aug ’24

Hey @Joebayld,

> I'm not sure what the enhancement request would be, other than maybe support for specifying the pixel format on an MPS shader so it knows how to handle the data.

You've got it, more or less. I recommend requesting that MPSImageBilinearScale be enhanced to support scaling of packed YUV data.

> Unfortunately, the AVFoundation RGB format makes performance significantly worse, so ideally we want to decode natively.

You could stick with the native format, but you would need to implement your own kernel to do the scaling. Depending on the implementation, it is possible that the performance would actually be worse overall, compared to converting to a different format and then scaling.
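To give a feel for what such a custom kernel has to compute, here is a CPU reference in Swift for halving the width of one packed 4:2:2 row. It is a sketch under stated assumptions, not a Metal kernel: samples are taken to be already unpacked 16-bit values in Cb Y0 Cr Y1 order, the filter is a plain 2-tap box average, and vertical scaling is left out. The name `halveWidth(ofPackedRow:)` is hypothetical.

```swift
// CPU reference only; a real implementation would be a Metal compute kernel.
// Assumption: `row` holds 16-bit samples in repeating Cb Y0 Cr Y1 groups,
// where each group of 4 samples covers 2 pixels.
func halveWidth(ofPackedRow row: [UInt16]) -> [UInt16] {
    // Two input groups (4 pixels) collapse into one output group (2 pixels).
    precondition(row.count % 8 == 0, "row must hold a multiple of 4 pixels")

    // Rounded average of two samples, widened to avoid overflow.
    func avg(_ a: UInt16, _ b: UInt16) -> UInt16 {
        return UInt16((UInt32(a) + UInt32(b) + 1) / 2)
    }

    var out: [UInt16] = []
    out.reserveCapacity(row.count / 2)
    for base in stride(from: 0, to: row.count, by: 8) {
        let (cbA, y0A, crA, y1A) = (row[base], row[base + 1], row[base + 2], row[base + 3])
        let (cbB, y0B, crB, y1B) = (row[base + 4], row[base + 5], row[base + 6], row[base + 7])
        out.append(avg(cbA, cbB))  // Cb is only ever blended with Cb
        out.append(avg(y0A, y1A))  // output pixel 0 luma: input pixels 0 and 1
        out.append(avg(crA, crB))  // Cr is only ever blended with Cr
        out.append(avg(y0B, y1B))  // output pixel 1 luma: input pixels 2 and 3
    }
    return out
}

let packedRow: [UInt16] = [100, 10, 200, 20, 300, 30, 400, 40]
print(halveWidth(ofPackedRow: packedRow))  // [200, 15, 300, 35]
```

The key difference from a generic image filter is that luma and chroma are blended separately and with different neighbours, which is exactly the knowledge a format-unaware scaler lacks.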

> If I keep the decoded frames in YUV, which would perform better: converting to RGB in a shader and then doing the MPS scale, or using a bi-planar decode format with two separate MPS filters? The bi-planar approach seems better for performance, because I need to encode back to a ProRes file quickly.

As a general rule of thumb for video pipeline performance: video pipelines tend to be quite complex, with lots of small but significant details. Often the only way to truly discover whether one implementation is more performant than another is to implement both, then measure and compare.
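One simple way to run that comparison is a small wall-clock harness like the sketch below. `measureAverage` is a hypothetical helper, and the two closures are stand-in workloads; in the real app each closure would process one frame through a candidate pipeline and wait for the GPU work to finish before returning.

```swift
import Foundation

// Sketch only: average wall-clock time per iteration, in milliseconds.
func measureAverage(iterations: Int, _ work: () -> Void) -> Double {
    precondition(iterations > 0, "need at least one iteration")
    let start = DispatchTime.now().uptimeNanoseconds
    for _ in 0..<iterations {
        work()
    }
    let end = DispatchTime.now().uptimeNanoseconds
    return Double(end - start) / Double(iterations) / 1_000_000
}

// Stand-in workloads (assumptions, not real pipelines).
var sink = 0
let pipelineA = measureAverage(iterations: 50) { sink += (0..<10_000).reduce(0, +) }
let pipelineB = measureAverage(iterations: 50) { sink += (0..<20_000).reduce(0, +) }
print("A: \(pipelineA) ms/frame, B: \(pipelineB) ms/frame")
```

For GPU-side timing specifically, MTLCommandBuffer exposes gpuStartTime and gpuEndTime, which can be read once the command buffer has completed. Measuring the whole read, process, and encode loop on a representative clip is likely to be more informative than timing the scale step alone.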

Best regards,

Greg
