Running 120Hz with low latency on M1 Max

I am trying to get a small game prototype up and running on Metal via the metal-cpp libraries. Everything runs natively at 120Hz with a coupled renderer and Vsync turned on, so that I get the minimum physically possible input-to-photon latency.

    // Create the Metal view
    SDL_MetalView metal_view = SDL_Metal_CreateView(window);
    CA::MetalLayer *swap_chain = (CA::MetalLayer *)SDL_Metal_GetLayer(metal_view);

    // Set up the Metal device
    MTL::Device *device = MTL::CreateSystemDefaultDevice();
    swap_chain->setDevice(device);
    swap_chain->setPixelFormat(MTL::PixelFormat::PixelFormatBGRA8Unorm);
    swap_chain->setDisplaySyncEnabled(true);
    swap_chain->setMaximumDrawableCount(2);

I am using SDL3 only to create the window. When I go through my game/render loop, I stall for a long time getting the next drawable, which is understandable: my app runs in about 2-3ms.

    m_CurrentContext->m_Drawable = m_SwapChain->nextDrawable();
    m_CurrentContext->m_CommandBuffer = m_CommandQueue->commandBuffer()->retain();
    char frame_label[32];
    snprintf(frame_label, sizeof(frame_label), "Frame %d", m_FrameIndex);
    m_CurrentContext->m_CommandBuffer->setLabel(NS::String::string(frame_label, NS::UTF8StringEncoding));

    m_CurrentContext->m_RenderPassDescriptor[ERenderPassTypeNormal] = MTL::RenderPassDescriptor::alloc()->init();
    MTL::RenderPassColorAttachmentDescriptor* cd = m_CurrentContext->m_RenderPassDescriptor[ERenderPassTypeNormal]->colorAttachments()->object(0);
    cd->setTexture(m_CurrentContext->m_Drawable->texture());
    cd->setLoadAction(MTL::LoadActionClear);
    cd->setClearColor(MTL::ClearColor( 0.53f, 0.81f, 0.98f, 1.0f ));
    cd->setStoreAction(MTL::StoreActionStore);

However, my ProMotion display does not reliably run at 120Hz when fullscreen and using the direct-to-display path. It seems to run faster when windowed and composited, which is the opposite of what I would expect. The Metal HUD says 120Hz, but the delay in getting the next drawable and the Instruments trace say otherwise.

When I profile it, the game loop has completed and is sitting there waiting for the next drawable, but the screen does not want to complete in 8.33ms, so the whole thing slows down for no discernible reason.

Also as a game developer it is very strange for the command buffer to actually need the drawable texture free to be allowed to encode commands - usually the command buffers and swapping the front and back render buffers are not directly dependent on each other. Usually you only actually need the render buffer texture free when you want to draw to it. I could give myself another drawable, but because I am completing in less than 3ms, all it would do would be to add another frame of latency.

I also looked at the FramePacing example and its behaviour is even worse at having high framerate with low latency - the direct to display is always rejected for some reason.

Is this just a flaw in the Metal API? Or am I missing something important? I hope someone can help - the behaviour of the display is baffling.

Answered by DTS Engineer in 804003022

Hi,

Alecazam is partially correct about using 2 drawables i.e. not recommended for iOS or iPadOS; however, 2 drawables is ok on macOS assuming you're within frame budget.

Overall, we recommend "Improving your game’s graphics performance and settings" for guidance.

Other than adjusting the number of drawables you may want to reconsider:

swap_chain->setDisplaySyncEnabled(true);

because "that determines whether the layer synchronizes its updates to the display’s refresh rate." i.e. attempts to lock display updates to vsync.

For frame pacing in particular we strongly advise using CAMetalDisplayLink over CADisplayLink (as shown in combination with MTKView in many of our samples). If it isn't meeting your needs for some reason then let's discuss in a new thread. Note that SDL3 is out of scope.

Lastly, you may find "Optimizing ProMotion refresh rates for iPhone 13 Pro and iPad Pro" helpful.

In fact, I get the same behaviour if I modify the LearnMetalCPP texturing example to make the window resizable, set the preferred framerate to 120, and change the number of frames in flight from 3 to 2 in the two places where it is defined:

void MyAppDelegate::applicationDidFinishLaunching( NS::Notification* pNotification )
{
    CGRect frame = (CGRect){ {100.0, 100.0}, {1024.0, 1024.0} };

    _pWindow = NS::Window::alloc()->init(
        frame,
        NS::WindowStyleMaskClosable|NS::WindowStyleMaskTitled | NS::WindowStyleMaskResizable,
        NS::BackingStoreBuffered,
        false );

    _pDevice = MTL::CreateSystemDefaultDevice();

    _pMtkView = MTK::View::alloc()->init( frame, _pDevice );
    _pMtkView->setColorPixelFormat( MTL::PixelFormat::PixelFormatBGRA8Unorm_sRGB );
    _pMtkView->setClearColor( MTL::ClearColor::Make( 0.1, 0.1, 0.1, 1.0 ) );
    _pMtkView->setDepthStencilPixelFormat( MTL::PixelFormat::PixelFormatDepth16Unorm );
    _pMtkView->setClearDepth( 1.0f );
    _pMtkView->setPreferredFramesPerSecond(120);

    _pViewDelegate = new MyMTKViewDelegate( _pDevice );
    _pMtkView->setDelegate( _pViewDelegate );

    _pWindow->setContentView( _pMtkView );
    _pWindow->setTitle( NS::String::string( "07 - Texture Mapping", NS::StringEncoding::UTF8StringEncoding ) );

    _pWindow->makeKeyAndOrderFront( nullptr );

    NS::Application* pApp = reinterpret_cast< NS::Application* >( pNotification->object() );
    pApp->activateIgnoringOtherApps( true );
}
static constexpr size_t kMaxFramesInFlight = 2;
const int Renderer::kMaxFramesInFlight = 2;

I get 80Hz on average, rather than a reliable 120Hz, when I switch into fullscreen.
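One pattern that would produce exactly this average is offered here as an assumption, not something the trace confirms. If frames alternate between staying on screen for one refresh interval and for two, the mean frame time on a 120Hz display is 12.5ms, which is 80Hz. The sketch below only does that arithmetic; `averageRateHz` is a hypothetical helper and the alternating pattern is illustrative, not measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average presented frame rate when each frame stays on screen for a whole
// number of refresh intervals. Hypothetical helper, not part of Metal.
double averageRateHz(double refreshHz, const std::vector<int>& intervalsPerFrame)
{
    assert(refreshHz > 0.0);
    if (intervalsPerFrame.empty()) return 0.0;

    long totalIntervals = 0;
    for (std::size_t i = 0; i < intervalsPerFrame.size(); ++i) {
        assert(intervalsPerFrame[i] >= 1);   // a frame is shown for at least one interval
        totalIntervals += intervalsPerFrame[i];
    }

    // frames shown / time taken, where time = totalIntervals / refreshHz seconds
    return refreshHz * static_cast<double>(intervalsPerFrame.size())
                     / static_cast<double>(totalIntervals);
}
```

With a 120Hz refresh, the pattern `{1, 2, 1, 2}` averages 80Hz, while `{1, 1, 1, 1}` gives the full 120Hz. Checking in Instruments whether presented frames really alternate like this would confirm or rule out the assumption.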

You use multiple command buffers, and request the drawable as late as humanly possible. You enqueue them, so they can be filled out in threads and sequenced properly.

It's lame but nextDrawable is how the compositor throttles the app.

Also make sure you have the MTKView set to framebuffer only. This will keep it linear for video scan out.

Finally, don't run x64 through Rosetta 2. Apple has had a bug since macOS 11 that limits the top framerate to 60Hz there, so you'll never hit 120Hz.

Also don't use 2 drawables. You need to triple buffer. 2 drawables is basically impossible to make work with the macOS and iOS compositor.
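The "request the drawable as late as possible" advice can be sketched as an ordering of calls. The types below (`FakeLayer`, `FakeQueue`, `FakeCommandBuffer`, `FakeDrawable`) are hypothetical stand-ins that only record call order, so the sketch compiles without Metal headers. It also assumes the scene can be rendered to offscreen targets first, which the original code does not do. In the snippet from the question, `nextDrawable()` is called before the command buffer is even created, whereas here it comes after all other work has been committed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order of calls so the sequencing is visible without Metal.
typedef std::vector<std::string> CallLog;

// Hypothetical stand-ins. With metal-cpp the corresponding real calls would be
// CA::MetalLayer::nextDrawable(), MTL::CommandQueue::commandBuffer(),
// MTL::CommandBuffer::presentDrawable() and MTL::CommandBuffer::commit().
struct FakeDrawable { int id; };

struct FakeCommandBuffer {
    CallLog* log;
    void encode(const std::string& pass) { log->push_back("encode " + pass); }
    void presentDrawable(const FakeDrawable&) { log->push_back("presentDrawable"); }
    void commit() { log->push_back("commit"); }
};

struct FakeQueue {
    CallLog* log;
    FakeCommandBuffer commandBuffer() { return FakeCommandBuffer{log}; }
};

struct FakeLayer {
    CallLog* log;
    // The real call can block until the compositor releases a drawable.
    FakeDrawable nextDrawable() { log->push_back("nextDrawable"); return FakeDrawable{0}; }
};

// Work that does not touch the drawable's texture is encoded and committed
// first; the drawable is requested only for the final pass that writes to it.
template <typename Layer, typename Queue>
void renderFrame(Layer& layer, Queue& queue)
{
    FakeCommandBuffer offscreen = queue.commandBuffer();
    offscreen.encode("scene to offscreen targets");
    offscreen.commit();

    FakeDrawable drawable = layer.nextDrawable();   // as late as possible

    FakeCommandBuffer finalPass = queue.commandBuffer();
    finalPass.encode("copy to drawable");
    finalPass.presentDrawable(drawable);
    finalPass.commit();
}
```

Whether this shortens the stall in practice depends on how much of the frame can be encoded without the drawable. A renderer that draws straight into the drawable's texture has little to move ahead of `nextDrawable()`.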

I would use the CAMetalDisplayLink, but when I adapt the example code for it I get similar issues - it seems like the fullscreen mode management of the drawables is not doing what I would expect.

I suspect that the bugs come from setting the display sync which is somehow using up one of the drawables (through not releasing at the right time) interacting with whatever code decides how long to display the current frame.

The two things seem to influence each other in bad ways. For example, I would have expected that if I pack CPU and GPU time together into less than 8ms, then the frame should show at the very next vsync and the drawable should be released for the next frame. But that is not happening: the drawable is being held on to, and then the timer that is trying to manage ProMotion thinks things have gone on too long and holds the frame time too.

I suspect that the display-sync-enabled code path has just not been tested properly at 120Hz when there is more than nothing going on, but far less than 8ms worth.

The CAMetalDisplayLink also does not seem to like the sync being enabled, especially if you have an external monitor attached. It can't seem to tell which monitor has which Vsync.

Just in case it wasn't clear: I want to lock to Vsync because that is optimal from a latency point of view. It should guarantee the smallest time from input to screen, because you are synchronising the update code to it.

I am deliberately not running decoupled, in order to have the smallest latency possible.

This is an example where everything is complete well before vsync and yet everything just decides to wait for another 8ms. The encoder and the GPU work is completing on the verge of 8 ms. This happens regularly.

In the examples given, the encoder is running and the frame starts and ends when the command buffer is committed. But if vsync is enabled and the sync point is getting the drawable, then the encoder has to eat up GPU time to be done by the next vsync.

If I go to three drawables to "fix" this timing issue, then it holds back a frame rather than going as early as possible.
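To put a rough number on "holds back a frame": under the assumption that the app always finishes early and keeps the drawable queue full, each finished frame waiting ahead of the newest one costs one whole refresh interval. The helpers below (`refreshIntervalMs`, `queuedLatencyMs`) are hypothetical names for that arithmetic, not Metal API.

```cpp
#include <cassert>

// Duration of one refresh interval in milliseconds.
double refreshIntervalMs(double refreshHz)
{
    assert(refreshHz > 0.0);
    return 1000.0 / refreshHz;
}

// Extra input-to-photon latency when `framesQueuedAhead` finished frames are
// waiting in front of the newest one. Assumes the app always finishes early
// and keeps the queue full, so each queued frame costs one whole interval.
double queuedLatencyMs(double refreshHz, int framesQueuedAhead)
{
    assert(framesQueuedAhead >= 0);
    return framesQueuedAhead * refreshIntervalMs(refreshHz);
}
```

At 120Hz, one extra queued frame comes to about 8.33ms, which is several times the 2-3ms the app needs to produce a frame.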

I am currently calling presentDrawableAfterMinimumDuration with a 4ms duration to try to trick it into releasing early, but it doesn't seem to make much of a difference.
