Hi, I'm trying to modify the ScreenCaptureKit sample code by implementing an actor for Metal rendering, but I'm experiencing issues with the frame rendering order.
My app workflow is:
ScreenCapture -> createFrame -> setRenderData
Metal draw callback -> renderAsync (getData from renderData)
I've added timestamps to verify frame ordering, and I'm also using a binary search to insert each frame by its timestamp. While the timestamps appear to be in sequence, the actual rendered output seems out of order.
// ScreenCaptureKit sample
func createFrame(for sampleBuffer: CMSampleBuffer) async {
    if let surface: IOSurface = getIOSurface(for: sampleBuffer) {
        await renderer.setRenderData(surface: surface, timeStamp: sampleBuffer.presentationTimeStamp.seconds)
    }
}
class Renderer {
    ...
    func setRenderData(surface: IOSurface, timeStamp: Double) async {
        _ = await renderSemaphore.getSetBuffers(
            isGet: false,
            surface: surface,
            timeStamp: timeStamp
        )
    }

    func draw(in view: MTKView) {
        Task {
            await renderAsync(view)
        }
    }

    func renderAsync(_ view: MTKView) async {
        guard await renderSemaphore.beginRender() else { return }

        guard let frame = await renderSemaphore.getSetBuffers(
            isGet: true, surface: nil, timeStamp: nil
        ) else {
            await renderSemaphore.endRender()
            return
        }

        guard let texture = await renderSemaphore.getRenderData(
            device: self.device,
            surface: frame.surface) else {
            await renderSemaphore.endRender()
            return
        }

        guard let commandBuffer = _commandQueue.makeCommandBuffer(),
              let renderPassDescriptor = await view.currentRenderPassDescriptor,
              let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor) else {
            await renderSemaphore.endRender()
            return
        }

        // Shaders ..
        renderEncoder.endEncoding()

        commandBuffer.addCompletedHandler { @Sendable _ in
            updateFPS()
        }

        // commit frame in actor
        let success = await renderSemaphore.commitFrame(
            timeStamp: frame.timeStamp,
            commandBuffer: commandBuffer,
            drawable: view.currentDrawable!
        )
        if !success {
            print("Frame dropped due to out-of-order timestamp")
        }
        await renderSemaphore.endRender()
    }
}
actor RenderSemaphore {
    private var frameBuffers: [FrameData] = []
    private var lastReadTimeStamp: Double = 0.0
    private var lastCommittedTimeStamp: Double = 0
    private var activeTaskCount = 0
    private var activeRenderCount = 0
    private let maxTasks = 3
    private var textureCache: CVMetalTextureCache?
    private var isTextureLoaded = false

    init() {
    }

    func initTextureCache(device: MTLDevice) {
        CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &self.textureCache)
    }

    func beginRender() -> Bool {
        guard activeRenderCount < maxTasks else { return false }
        activeRenderCount += 1
        return true
    }

    func endRender() {
        if activeRenderCount > 0 {
            activeRenderCount -= 1
        }
    }

    func setTextureLoaded(_ loaded: Bool) {
        isTextureLoaded = loaded
    }

    func getSetBuffers(isGet: Bool, surface: IOSurface?, timeStamp: Double?) -> FrameData? {
        if isGet {
            if !frameBuffers.isEmpty {
                let frame = frameBuffers.removeFirst()
                if frame.timeStamp > lastReadTimeStamp {
                    lastReadTimeStamp = frame.timeStamp
                    print(frame.timeStamp)
                    return frame
                }
            }
            return nil
        } else {
            // Set
            let frameData = FrameData(
                surface: surface!,
                timeStamp: timeStamp!
            )
            // insert at the right position to keep the buffer sorted by timestamp
            let insertIndex = binarySearch(for: timeStamp!)
            frameBuffers.insert(frameData, at: insertIndex)
            return frameData
        }
    }

    private func binarySearch(for timeStamp: Double) -> Int {
        var left = 0
        var right = frameBuffers.count
        while left < right {
            let mid = (left + right) / 2
            if frameBuffers[mid].timeStamp > timeStamp {
                right = mid
            } else {
                left = mid + 1
            }
        }
        return left
    }

    // for setRenderDataNormalized
    func tryEnterTask() -> Bool {
        guard activeTaskCount < maxTasks else { return false }
        activeTaskCount += 1
        return true
    }

    func exitTask() {
        activeTaskCount -= 1
    }

    func commitFrame(timeStamp: Double,
                     commandBuffer: MTLCommandBuffer,
                     drawable: MTLDrawable) async -> Bool {
        guard timeStamp > lastCommittedTimeStamp else {
            print("Drop frame at commit: \(timeStamp) <= \(lastCommittedTimeStamp)")
            return false
        }
        commandBuffer.present(drawable)
        commandBuffer.commit()
        lastCommittedTimeStamp = timeStamp
        return true
    }

    func getRenderData(
        device: MTLDevice,
        surface: IOSurface,
        depthData: [Float]
    ) -> (MTLTexture, MTLBuffer)? {
        let _textureName = "RenderData"
        var px: Unmanaged<CVPixelBuffer>?
        let status = CVPixelBufferCreateWithIOSurface(kCFAllocatorDefault, surface, nil, &px)
        guard status == kCVReturnSuccess, let screenImage = px?.takeRetainedValue() else {
            return nil
        }

        CVMetalTextureCacheFlush(textureCache!, 0)
        var texture: CVMetalTexture? = nil
        let width = CVPixelBufferGetWidthOfPlane(screenImage, 0)
        let height = CVPixelBufferGetHeightOfPlane(screenImage, 0)
        let result2 = CVMetalTextureCacheCreateTextureFromImage(
            kCFAllocatorDefault,
            self.textureCache!,
            screenImage,
            nil,
            MTLPixelFormat.bgra8Unorm,
            width,
            height,
            0, &texture)
        guard result2 == kCVReturnSuccess,
              let cvTexture = texture,
              let mtlTexture = CVMetalTextureGetTexture(cvTexture) else {
            return nil
        }
        mtlTexture.label = _textureName
        let depthBuffer = device.makeBuffer(bytes: depthData, length: depthData.count * MemoryLayout<Float>.stride, options: [])!
        return (mtlTexture, depthBuffer)
    }
}
The code above is what I have - could someone point out what might be wrong?
Metal
Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Posts under the Metal tag:
In our app we use CoreML, but ever since macOS 15.x was released we have started to get a large number of crashes like this:
Incident Identifier: 424041c3-884b-4e50-bb5a-429a83c3e1c8
CrashReporter Key: B914246B-1291-4D44-984D-EDF84B52310E
Hardware Model: Mac14,12
Process: <REMOVED> [1509]
Path: /Applications/<REMOVED>
Identifier: com.<REMOVED>
Version: <REMOVED>
Code Type: arm64
Parent Process: launchd [1]
Date/Time: 2024-11-13T13:23:06.999Z
Launch Time: 2024-11-13T13:22:19Z
OS Version: Mac OS X 15.1.0 (24B83)
Report Version: 104
Exception Type: SIGABRT
Exception Codes: #0 at 0x189042600
Crashed Thread: 36
Thread 36 Crashed:
0 libsystem_kernel.dylib 0x0000000189042600 __pthread_kill + 8
1 libsystem_c.dylib 0x0000000188f87908 abort + 124
2 libsystem_c.dylib 0x0000000188f86c1c __assert_rtn + 280
3 Metal 0x0000000193fdd870 MTLReportFailure.cold.1 + 44
4 Metal 0x0000000193fb9198 MTLReportFailure + 444
5 MetalPerformanceShadersGraph 0x0000000222f78c80 -[MPSGraphExecutable initWithMPSGraphPackageAtURL:compilationDescriptor:] + 296
6 Espresso 0x00000001a290ae3c E5RT::SharedResourceFactory::GetMPSGraphExecutable(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, NSDictionary*) + 932
.
.
.
43 CoreML 0x0000000192d263bc -[MLModelAsset modelWithConfiguration:error:] + 120
44 CoreML 0x0000000192da96d0 +[MLModel modelWithContentsOfURL:configuration:error:] + 176
45 <REMOVED> 0x000000010497b758 -[<REMOVED> <REMOVED>] (<REMOVED>)
No similar crashes on macOS 12-14!
MetalPerformanceShadersGraph.log
Any clue what is causing this?
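For reference, frames 44-45 correspond to the standard model-loading call. A minimal sketch of what that looks like (not our exact code; the URL is a placeholder, and restricting computeUnits to .cpuOnly is just an experiment to see whether the MPSGraph/GPU path is the trigger):

import CoreML

let url = URL(fileURLWithPath: "/path/to/Model.mlmodelc")  // placeholder
let configuration = MLModelConfiguration()
// Assumption: forcing the CPU path would skip the MPSGraphExecutable compilation seen in the crash.
configuration.computeUnits = .cpuOnly
do {
    let model = try MLModel(contentsOf: url, configuration: configuration)
    print("Loaded \(model)")
} catch {
    print("Model loading failed: \(error)")
}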
Thanks! :)
I want to turn off my ray-tracing conditionally. There's is_null_acceleration_structure but when I don't bind an acceleration structure (or pass nil to setFragmentAccelerationStructure), I get the following API validation error:
-[MTLDebugRenderCommandEncoder validateCommonDrawErrors:]:5782: failed assertion `Draw Errors Validation
Fragment Function(vol_deferred_lighting): missing instanceAccelerationStructure binding at index 6 for accelerationStructure[0].
I can turn off API validation and it works, but it seems like I should be able to use nil for the acceleration structure without triggering a validation error. Seems like a bug, right?
I suppose I can work around this by creating a separate pipeline with the ray-tracing disabled via a function constant instead of using is_null_acceleration_structure.
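For reference, the pipeline-variant workaround I have in mind looks roughly like this (a sketch only; the vertex function name, constant index, and pixel format are placeholders rather than my actual setup):

import Metal

// Bake the ray-tracing on/off decision in at pipeline-creation time so the
// disabled variant never references the acceleration structure binding.
func makeLightingPipeline(device: MTLDevice,
                          library: MTLLibrary,
                          rayTracingEnabled: Bool) throws -> MTLRenderPipelineState {
    var enabled = rayTracingEnabled
    let constants = MTLFunctionConstantValues()
    constants.setConstantValue(&enabled, type: .bool, index: 0) // pairs with [[function_constant(0)]] in MSL

    let descriptor = MTLRenderPipelineDescriptor()
    descriptor.vertexFunction = try library.makeFunction(name: "deferred_lighting_vertex", constantValues: constants) // placeholder name
    descriptor.fragmentFunction = try library.makeFunction(name: "vol_deferred_lighting", constantValues: constants)
    descriptor.colorAttachments[0].pixelFormat = .bgra8Unorm // placeholder
    return try device.makeRenderPipelineState(descriptor: descriptor)
}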
(Can we get a ray-tracing tag for questions?)
The Problem
When transitioning between view controllers that each have their own MTKView but share a Metal renderer backend, we run into delegate ownership conflicts. Only one MTKView can successfully render at a time, since setting the delegate on one view requires removing it from the other, leading to paused views during transitions.
For my app, I need to display the same visuals across multiple views and have them all render correctly.
Current Implementation Approach
I've created a container object that manages the MTKView and its relationship with the shared renderer:
class RenderContainer {
    let metalView: MTKView
    private let renderer: MetalRenderer

    func startRendering() {
        metalView.delegate = renderer
        metalView.isPaused = false
    }

    func stopRendering() {
        metalView.isPaused = true
        metalView.delegate = nil
    }
}
View controllers manage the rendering lifecycle in their view appearance methods:
override func viewWillAppear(_ animated: Bool) {
    super.viewWillAppear(animated)
    renderContainer.startRendering()
}

override func viewWillDisappear(_ animated: Bool) {
    super.viewWillDisappear(animated)
    renderContainer.stopRendering()
}
Observations & Issues
During view controller transitions, one MTKView must stop rendering before the other can start, and with the current API design there is no guarantee that the old view will actually stop before the new one starts.
This creates a visual "pop" during animated transitions
Setting isPaused = true helps prevent unnecessary render calls but doesn't solve the core delegate ownership problem
The shared renderer maintains its state but can only output to one view at a time
Questions
What's the recommended approach for handling MTKView delegate ownership during animated transitions?
Are there ways to maintain visual continuity without complex view hierarchies?
Should I consider alternative architectures for sharing the Metal content between views?
Any insights for this scenario would be appreciated.
I am using SCNTechnique in combination with ARSCNView.
The technique does some minor post-processing. I have written several filter variants for this post-processing, but with one of the filters/fragment shaders, SCNTechnique discards my output and just presents the plain camera feed on screen instead. This is clearly visible in the Metal pipeline, using the GPU frame debugger.
Let me stress that my setup works for 90% of my filters, but not this one and I want to know why.
iOS 18.1, iPhone 13 Mini. Xcode 16.1.
Encoders 0 and 1 are injected by the system.
Render encoders 2 and 3 correspond to my SCNTechnique's render passes: one to manipulate pixel data (darken it in this case) and another to BLIT it back to the main texture. I know the separate buffer is not strictly needed for this particular operation, but it shouldn't matter.
Note that the issue occurs in encoder 4 (not mine but ARKit's).
In render encoder 4, scn_postprocess_AR_fragment handles my texture (#0, ending in f980) and another from the camera feed (Texture 2). I know this pass is typically used for grain because that's what it used to do before I disabled grain on ARSCNView (and the buffer still contains grain parameters).
I have other post-processing filters that work just fine. By what magic is ARKit determining to use Texture 2 instead of my Texture 0?
Sure, I could keep digging into the minute differences between my shaders to find out which LoC affects how some ARKit shader down the line operates, but it's awfully opaque so far.
I plan to create a simple motion graphics software for macOS that animates text, basic shapes, and handles audio. I'll use SwiftUI for the UI.
What are the commonly used technologies for rendering animated graphics? Core Animation is suitable for UI animations, but not for exporting animations to video or controlling them precisely.
Basic requirements:
Timeline user interface
Animation of text and basic shapes
Viewer in SwiftUI GUI with transport control (play, pause, scrub, …)
Export to video file
Is Metal or Core Graphics typically used directly? I want to keep it as simple as possible.
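To make the export requirement concrete, here is roughly how I picture the Core Graphics route for a single frame (a sketch with placeholder size, codec, and timing, not code from a real project) - is this the typical approach?

import AVFoundation
import CoreGraphics

// Sketch: draw one frame into a CVPixelBuffer with Core Graphics and hand it to
// AVAssetWriter. A real exporter would loop over the timeline and use per-frame
// presentation times.
func exportSingleFrame(to url: URL) throws {
    let width = 1920, height = 1080 // placeholder output size
    let writer = try AVAssetWriter(outputURL: url, fileType: .mov)
    let input = AVAssetWriterInput(mediaType: .video, outputSettings: [
        AVVideoCodecKey: AVVideoCodecType.h264,
        AVVideoWidthKey: width,
        AVVideoHeightKey: height
    ])
    let adaptor = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: input,
                                                       sourcePixelBufferAttributes: nil)
    writer.add(input)
    guard writer.startWriting() else { return }
    writer.startSession(atSourceTime: .zero)

    var pixelBuffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault, width, height, kCVPixelFormatType_32BGRA, nil, &pixelBuffer)
    guard status == kCVReturnSuccess, let buffer = pixelBuffer else { return }

    CVPixelBufferLockBaseAddress(buffer, [])
    if let context = CGContext(data: CVPixelBufferGetBaseAddress(buffer),
                               width: width, height: height, bitsPerComponent: 8,
                               bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
                               space: CGColorSpaceCreateDeviceRGB(),
                               bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue | CGBitmapInfo.byteOrder32Little.rawValue) {
        // Draw whatever the timeline says this frame should look like: shapes, text, etc.
        context.setFillColor(CGColor(red: 0.1, green: 0.1, blue: 0.4, alpha: 1))
        context.fill(CGRect(x: 0, y: 0, width: width, height: height))
    }
    CVPixelBufferUnlockBaseAddress(buffer, [])

    _ = adaptor.append(buffer, withPresentationTime: .zero)
    input.markAsFinished()
    writer.finishWriting { }
}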
I would like to take YCbCr CVPixelBuffers from AVCaptureVideoDataOutput, apply some processing in RGB space, render to an MTKView, and pass to AVAssetWriter for recording.
Right now, I'm doing this all manually – deswing the incoming data if necessary, choose the right matrix to convert to RGB, apply processing, etc. I also have to convert back to YCbCr before feeding the frames to AVAssetWriter because encoding performs much better if I do. Is there any efficient, built-in way to achieve the same?
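Concretely, the plane-to-texture step of that manual path looks roughly like this (simplified; it assumes an 8-bit bi-planar 4:2:0 buffer, and the texture cache is created elsewhere):

import CoreVideo
import Metal

// Wrap the luma and chroma planes of a bi-planar YCbCr pixel buffer as Metal
// textures via a CVMetalTextureCache; an RGB conversion pass then samples both.
// 10-bit formats would use .r16Unorm / .rg16Unorm instead.
func makePlaneTextures(from pixelBuffer: CVPixelBuffer,
                       cache: CVMetalTextureCache) -> (luma: MTLTexture, chroma: MTLTexture)? {
    func planeTexture(_ plane: Int, _ format: MTLPixelFormat) -> MTLTexture? {
        var cvTexture: CVMetalTexture?
        let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, plane)
        let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, plane)
        let status = CVMetalTextureCacheCreateTextureFromImage(
            kCFAllocatorDefault, cache, pixelBuffer, nil,
            format, width, height, plane, &cvTexture)
        guard status == kCVReturnSuccess, let texture = cvTexture else { return nil }
        return CVMetalTextureGetTexture(texture)
    }
    guard let luma = planeTexture(0, .r8Unorm),
          let chroma = planeTexture(1, .rg8Unorm) else { return nil }
    return (luma, chroma)
}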
I can't use AVCaptureVideoPreviewLayer, since I need to do some further processing before display. I can't use AVCaptureVideoDataOutput's videoSettings to get automatic BGRA conversion because that would lose bit depth for 10 bit video formats (and isn't available on all formats anyway).
I see these Accelerate functions, but they seemingly don't use the GPU, nor do they support all the formats and bit depths I'd need.
I found reference to some undocumented MTLPixelFormats that seem to do exactly what I want, but I don't want to rely on something like this unless it's explicitly endorsed. This would also incur an RGB/YCbCr conversion on every texture read and write, right?
Is there anything I'm missing here?
I am building a MacOS desktop app (https://anukari.com) that is using Metal compute to do real-time audio/DSP processing, as I have a problem that is highly parallelizable and too computationally expensive for the CPU.
However, it seems that with the way I am using the GPU, the OS never increases the power/performance state, even when my app is fully compute-limited. Because this is a real-time audio synthesis application, it's a huge problem to not be able to take advantage of the full clock speeds that the GPU is capable of, because the app can't keep up with real-time.
I discovered this issue while profiling the app using Instrument's Metal tracing (and Game tracing) modes. In the profiling configuration under "Metal Application" there is a drop-down to select the "Performance State." If I run the application under Instruments with Performance State set to Maximum, it runs amazingly well, and all my problems go away.
For comparison, when I run the app on its own, outside of Instruments, the expensive GPU computation it's doing takes around 2x as long to complete, meaning that the app performs half as well.
I've done a ton of work to micro-optimize my Metal compute code, based on every scrap of information from the WWDC videos, etc. A problem I'm running into is that I think that the more efficient I make my code, the less it signals to the OS that I want high GPU clock speeds!
I think part of why the OS is confused is that in most use cases, my computation can be done using only a small number of Metal threadgroups. I'm guessing that the OS heuristics see that only a small fraction of the GPU is saturated and fail to scale up the power/clock state.
I'm not sure what to do here; I'm in a bit of a bind. One possibility is that I intentionally schedule busy work -- spin threadgroups just to waste energy and signal to the OS that I need higher clock speeds. This is obviously a really bad idea, but it might work.
Is there any other (better) way for my app to signal to the OS that it is doing real-time latency-sensitive computation on the GPU and needs the clock speeds to be scaled up?
Note that game mode is not really an option, as my app also runs as an AU plugin inside hosts like Garageband, so it can't be made fullscreen, etc.
I have to guess which command queue is the one I want to debug in Xcode.
I have the label set, but it doesn't seem to affect this part of Xcode.
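For reference, this is the kind of labeling I mean (a trivial sketch; the names are just examples):

import Metal

let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!
commandQueue.label = "Main Render Queue"       // the label I expect to see in the debugger
let commandBuffer = commandQueue.makeCommandBuffer()!
commandBuffer.label = "Frame Command Buffer"   // buffer and encoder labels do show up elsewhere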
I have multiple CAMetalLayers that I render content to and noticed that the graphics overview HUD does not function properly when I have more than one CAMetalLayer. The values reported are very strange. For example, FPS may report 999 or some large negative value. Is the HUD simply not designed to work with multiple CAMetalLayers or MTKViews? When I disable all but one of my CAMetalLayers, the HUD works as expected.
Hi!
How do I define and call an inline function in Metal, or a simple function that returns a value?
Case:
inline uint index4D(constant _4D& shape,
                    constant uint& n,
                    constant uint& c,
                    constant uint& h,
                    constant uint& w) {
    return n * shape.C * shape.H * shape.W + c * shape.H * shape.W + h * shape.W + w;
}
When I call it from my kernel function, I get a "No matching function for call" error.
Thanks in advance.
When generating large arrays of random numbers, NaNs show up. They also show up at the same indices when using the same seed, leading me to believe that this is a bug with MPSMatrixRandom's normally distributed Float32 random number distribution.
Happens with both Philox and MTGP32.
Is this intentional and how do I work around this?
See the original post for a MWE in Swift and Julia: https://github.com/JuliaGPU/Metal.jl/issues/474
Hi,
A user sent us a crash report that indicates an error occurring just after loading the default Metal library of our app.
Application Specific Information:
Crashing on exception: *** -[__NSArrayM objectAtIndex:]: index 0 beyond bounds for empty array
The report pointed me to these (simplified) lines of codes in the library setup:
_vertexFunctions = [[NSMutableArray alloc] init];
_fragmentFunctions = [[NSMutableArray alloc] init];
id<MTLLibrary> library = [device newDefaultLibrary];
2 vertex shaders and 5 fragment shaders are then loaded and stored in these two arrays using this method:
- (BOOL)addShaderNamed:(NSString *)name library:(id<MTLLibrary>)library isFragment:(BOOL)isFragment {
    id shader = [library newFunctionWithName:name];
    if (!shader) {
        ALOG(@"Error : Unable to find the shader named : “%@”", name);
        return NO;
    }
    [(isFragment ? _fragmentFunctions : _vertexFunctions) addObject:shader];
    return YES;
}
As you can see, the arrays are not filled if the method fails... however, a few lines later, they are used without checking if they are really filled, and that causes the crash...
But this coding error doesn't explain why no shader of a given type (or of both types) was added to its array, in other words why -newFunctionWithName: returned nil for all of the given names (since the array in question appears completely empty).
Clue
This error has only been detected once, by a user running the app on macOS 10.13 with an NVIDIA Web Driver instead of the default macOS graphics driver. Moreover, it wasn't possible to reproduce the problem on the same OS using the native macOS driver.
So my question is: are there any known conflicts between NVIDIA web drivers and the use of Metal libraries? Or would this case require some specific options in the Metal implementation?
Any help appreciated, thanks!
Hi,
When using a High Definition Display, is there a way to render at exactly the target resolution on the physical screen?
My understanding is that the default behavior is to render to a backing store with a resolution (in pixels) which can be twice the size of the logical resolution (in points). Then we let the OS handle the down-scaling to the actual target resolution on the screen. This is all nice for non-graphics intensive apps, but it means that my game will render at a higher resolution than needed, which seems like an obvious loss of performance.
My expectation is that, for graphics intensive application such as games, we should be able to query and render to the final resolution on the display. Can it / should it be done?
Thank you for your help :)
FYI, I did find a document which explains how to set up your CAMetalLayer to render at a custom resolution. I suspect that this may be what I have to do?
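My reading of that document boils down to configuring the layer roughly like this (a sketch only, assuming a layer-backed NSView on macOS; I haven't verified this is the intended approach):

import AppKit
import QuartzCore

// Drive the CAMetalLayer at an explicit pixel resolution instead of the default
// points-times-backing-scale size.
func configureLayer(_ metalLayer: CAMetalLayer, in view: NSView) {
    guard let window = view.window else { return }
    let scale = window.backingScaleFactor            // e.g. 2.0 on a Retina display
    metalLayer.contentsScale = scale
    // Pick the exact drawable size in pixels; a game could choose something lower
    // than the full backing-store resolution here.
    metalLayer.drawableSize = CGSize(width: view.bounds.width * scale,
                                     height: view.bounds.height * scale)
}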
If you check the code here:
https://developer.apple.com/documentation/compositorservices/interacting-with-virtual-content-blended-with-passthrough
var body: some Scene {
    ImmersiveSpace(id: Self.id) {
        CompositorLayer(configuration: ContentStageConfiguration()) { layerRenderer in
            let pathCollection: PathCollection
            do {
                pathCollection = try PathCollection(layerRenderer: layerRenderer)
            } catch {
                fatalError("Failed to create path collection \(error)")
            }

            let tintRenderer: TintRenderer
            do {
                tintRenderer = try TintRenderer(layerRenderer: layerRenderer)
            } catch {
                fatalError("Failed to create tint renderer \(error)")
            }

            Task(priority: .high) { @RendererActor in
                Task { @MainActor in
                    appModel.pathCollection = pathCollection
                    appModel.tintRenderer = tintRenderer
                }

                let renderer = try await Renderer(layerRenderer,
                                                  appModel,
                                                  pathCollection,
                                                  tintRenderer)
                try await renderer.renderLoop()

                Task { @MainActor in
                    appModel.pathCollection = nil
                    appModel.tintRenderer = nil
                }
            }

            layerRenderer.onSpatialEvent = {
                pathCollection.addEvents(eventCollection: $0)
            }
        }
    }
    .immersionStyle(selection: .constant(appModel.immersionStyle), in: .mixed, .full)
    .upperLimbVisibility(appModel.upperLimbVisibility)
}
the only way it deals with errors is fatalError.
And I don't think I can throw anything or return anything else from that closure.
Is there a way I can gracefully handle this and show a message box in the UI?
I was hoping I could somehow trigger a failure and have https://developer.apple.com/documentation/swiftui/openimmersivespaceaction return a failure,
but couldn't find a nice way to do so.
Let me know if you have ideas.
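The direction I'm considering looks roughly like this (untested; rendererSetupError and showRendererError are hypothetical properties on my app model, and it assumes a regular 2D window is available to present the alert):

import SwiftUI
import CompositorServices

// Inside the CompositorLayer closure: record the failure instead of crashing.
CompositorLayer(configuration: ContentStageConfiguration()) { layerRenderer in
    let pathCollection: PathCollection
    do {
        pathCollection = try PathCollection(layerRenderer: layerRenderer)
    } catch {
        Task { @MainActor in appModel.rendererSetupError = error }  // hypothetical property
        return  // leave the space empty rather than calling fatalError
    }
    // ... rest of the setup as in the sample ...
}

// In a regular window scene, observe the flag, show an alert, and dismiss the space.
struct RendererErrorView: View {
    @Environment(\.dismissImmersiveSpace) private var dismissImmersiveSpace
    @Bindable var appModel: AppModel   // assumes an @Observable app model

    var body: some View {
        Color.clear
            .alert("Rendering setup failed", isPresented: $appModel.showRendererError) {
                Button("OK") { Task { await dismissImmersiveSpace() } }
            }
    }
}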
Xcode: 16.0
MacBook Pro: M1 Pro, Sonoma 14.5
Device: iPhone16, 18.0.1
Game: UnrealEngine 5.4.2 based
100% crash after GPU capture, even when only rendering a login UI.
I've been working with ARKit and Metal on the Vision Pro, but I've encountered a slight flickering issue with the mesh rendered using Metal. The flickering tends to occur around the edges of objects or on pixels with high color contrast, and it becomes more noticeable as the distance increases. Is there any way to resolve this issue?
I have been concentrating on visionOS application development. While I am quite familiar with RealityKit, CompositorServices has also caught my attention, but I have not learned it yet. Could you clarify whether it is essential for me to learn CompositorServices? I would also appreciate some insight into the respective advantages of RealityKit and CompositorServices.
When I was using Instruments to test the Display on my phone, I discovered a long-duration frame. Below that frame, there were some gaps in the vsync queue. As I understand it, vsync signals should appear consistently and steadily. How can this behavior be explained?
My phone is iPhone 15 Plus and iOS 18.
Hey guys,
is it possible to implement mirror-like reflections, like in this project:
https://developer.apple.com/documentation/metal/metal_sample_code_library/rendering_reflections_in_real_time_using_ray_tracing
for visionOS? Or is the hardware not capable of Metal ray tracing?
Thanks in advance
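In case someone with a device can check, these are the capability flags I'd query (a minimal sketch):

import Metal

if let device = MTLCreateSystemDefaultDevice() {
    print("supportsRaytracing: \(device.supportsRaytracing)")                      // ray tracing support at all
    print("supportsRaytracingFromRender: \(device.supportsRaytracingFromRender)")  // ray tracing from render pipeline stages
}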