Render advanced 3D graphics and perform data-parallel computations on graphics processors using Metal.

Posts under Metal tag

200 Posts
Post

Replies

Boosts

Views

Activity

OS choosing performance state poorly for GPU use case
I am building a macOS desktop app (https://anukari.com) that uses Metal compute for real-time audio/DSP processing, as I have a problem that is highly parallelizable and too computationally expensive for the CPU. However, with the way I am using the GPU, the OS never increases the power/performance state, even when my app is fully compute-limited. Because this is a real-time audio synthesis application, it's a huge problem to not be able to take advantage of the full clock speeds that the GPU is capable of, because the app can't keep up with real-time.
I discovered this issue while profiling the app using Instruments' Metal tracing (and Game tracing) modes. In the profiling configuration under "Metal Application" there is a drop-down to select the "Performance State." If I run the application under Instruments with Performance State set to Maximum, it runs amazingly well, and all my problems go away. For comparison, when I run the app on its own, outside of Instruments, the expensive GPU computation takes around 2x as long to complete, meaning that the app performs half as well.
I've done a ton of work to micro-optimize my Metal compute code, based on every scrap of information from the WWDC videos, etc. A problem I'm running into is that the more efficient I make my code, the less it seems to signal to the OS that I want high GPU clock speeds! I think part of why the OS is confused is that in most use cases, my computation can be done using only a small number of Metal threadgroups. I'm guessing that the OS heuristics see that only a small fraction of the GPU is saturated and fail to scale up the power/clock state.
I'm not sure what to do here; I'm in a bit of a bind. One possibility is that I intentionally schedule busy work -- spin threadgroups just to waste energy and signal to the OS that I need higher clock speeds. This is obviously a really bad idea, but it might work.
Is there any other (better) way for my app to signal to the OS that it is doing real-time latency-sensitive computation on the GPU and needs the clock speeds to be scaled up? Note that Game Mode is not really an option, as my app also runs as an AU plugin inside hosts like GarageBand, so it can't be made fullscreen, etc.
0
0
43
6h
Cannot use Metal graphics overview HUD with multiple CAMetalLayers
I have multiple CAMetalLayers that I render content to and noticed that the graphics overview HUD does not function properly when I have more than one CAMetalLayer. The values reported are very strange. For example, FPS may report 999 or some large negative value. Is the HUD simply not designed to work with multiple CAMetalLayers or MTKViews? When I disable all but one of my CAMetalLayers, the HUD works as expected.
0
0
104
1d
Metal Inline Functions
Hi! How do I define and call an inline function in Metal, or a simple function that returns some value? Case:
inline uint index4D(constant _4D& shape, constant uint& n, constant uint& c, constant uint& h, constant uint& w) {
    return n * shape.C * shape.H * shape.W + c * shape.H * shape.W + h * shape.W + w;
}
When I call it in my kernel function I get a "No matching function for call" error. Thx in advance.
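One likely cause, assuming the kernel passes ordinary thread-local values for n, c, h, and w: a `constant uint&` parameter can only bind to an object in the constant address space, so overload resolution fails for locals. Taking the indices by value avoids this. Metal Shading Language is C++-based, so the sketch below is plain C++ that checks the index arithmetic on the host. `Shape4D` is a hypothetical stand-in for the post's `_4D` struct, with the field names N, C, H, W assumed from the snippet. In the Metal version the shape could keep its `constant` reference while the four indices become plain `uint` parameters.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the post's `_4D` struct (field names assumed).
struct Shape4D {
    std::uint32_t N, C, H, W;
};

// Flattens (n, c, h, w) into a row-major offset. The indices are taken by
// value, so the caller may pass any local variable.
inline std::uint32_t index4D(const Shape4D& shape, std::uint32_t n, std::uint32_t c,
                             std::uint32_t h, std::uint32_t w) {
    // Each index must lie inside its dimension.
    assert(n < shape.N && c < shape.C && h < shape.H && w < shape.W);
    return n * shape.C * shape.H * shape.W
         + c * shape.H * shape.W
         + h * shape.W
         + w;
}
```

For a shape of 2 x 3 x 4 x 5, the last element (1, 2, 3, 4) maps to 60 + 40 + 15 + 4 = 119, which is the final valid offset in a 120-element buffer.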
1
0
102
2d
Normally distributed MPSMatrixRandom number generation generates NaN
When generating large arrays of random numbers, NaNs show up. They also show up at the same indices when using the same seed, leading me to believe that this is a bug with MPSMatrixRandom's normally distributed Float32 random number distribution. Happens with both Philox and MTGP32. Is this intentional and how do I work around this? See the original post for a MWE in Swift and Julia: https://github.com/JuliaGPU/Metal.jl/issues/474
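Until the root cause is known, one stopgap is to scan the generated values on the CPU and redraw any NaN entries. This is a minimal sketch, assuming the Float32 results have already been copied into a host-side `std::vector<float>`. `patchNaNs` is a hypothetical helper, not part of MPS, and the replacement values come from the C++ standard library rather than from Philox or MTGP32.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Replaces every NaN in `values` with a fresh draw from a CPU-side standard
// normal distribution and returns how many entries were patched.
inline std::size_t patchNaNs(std::vector<float>& values, std::mt19937& rng) {
    std::normal_distribution<float> normal(0.0f, 1.0f);
    std::size_t patched = 0;
    for (float& v : values) {
        if (std::isnan(v)) {
            v = normal(rng);
            assert(!std::isnan(v));  // the replacement draw must be a real number
            ++patched;
        }
    }
    return patched;
}
```

The returned count is also a cheap way to log how often the problem occurs for a given seed and array size.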
0
1
123
1w
Metal and NVIDIA graphic driver
Hi, A user sent us a crash report that indicates an error occurring just after loading the default Metal library of our app.
Application Specific Information: Crashing on exception: *** -[__NSArrayM objectAtIndex:]: index 0 beyond bounds for empty array
The report pointed me to these (simplified) lines of code in the library setup:
_vertexFunctions = [[NSMutableArray alloc] init];
_fragmentFunctions = [[NSMutableArray alloc] init];
id<MTLLibrary> library = [device newDefaultLibrary];
2 vertex shaders and 5 fragment shaders are then loaded and stored in these two arrays using this method:
-(BOOL) addShaderNamed:(NSString *)name library:(id<MTLLibrary>)library isFragment:(BOOL)isFragment {
    id shader = [library newFunctionWithName:name];
    if (!shader) {
        ALOG(@"Error : Unable to find the shader named : “%@”", name);
        return NO;
    }
    [(isFragment ? _fragmentFunctions : _vertexFunctions) addObject:shader];
    return YES;
}
As you can see, the arrays are not filled if the method fails. However, a few lines later, they are used without checking whether they are really filled, and that causes the crash. But this coding error doesn't explain why no shader of a certain type (or both types) has been added to the array, meaning: why did -newFunctionWithName: return nil for all given names (since the array in question appears completely empty)?
Clue: This error has only been detected once, by a user running the app on macOS 10.13 with an NVIDIA Web Driver instead of the default macOS graphics driver. Moreover, it wasn't possible to reproduce the problem on the same OS using the native macOS driver.
So my question is: are there known conflicts between NVIDIA drivers and the use of Metal libraries? Or would this case require some specific options in the Metal implementation? Any help appreciated, thanks!
0
0
138
1w
Resolution for Games
Hi, When using a High Definition Display, is there a way to render at exactly the target resolution on the physical screen? My understanding is that the default behavior is to render to a backing store with a resolution (in pixels) which can be twice the size of the logical resolution (in points). Then we let the OS handle the down-scaling to the actual target resolution on the screen. This is all nice for apps that are not graphics-intensive, but it means that my game will render at a higher resolution than needed, which seems like an obvious loss of performance. My expectation is that, for graphics-intensive applications such as games, we should be able to query and render to the final resolution on the display. Can it / should it be done? Thank you for your help :) FYI I did find a document which explains how to set up your CAMetalLayer to render at a custom resolution. I suspect that this may be what I have to do?
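The points-versus-pixels arithmetic described above can be sketched as follows. This is a minimal C++ sketch, not a definitive answer: `drawableSizeFor` and `renderScale` are hypothetical names, and the sketch only computes a size that an app could then assign to the `drawableSize` property of its CAMetalLayer. It does not query the physical panel resolution.

```cpp
#include <cassert>
#include <cmath>

struct PixelSize {
    int width;
    int height;
};

// Converts a view size in points to a drawable size in pixels.
// backingScale: pixels per point reported by the system (e.g. 2.0 on Retina).
// renderScale:  app-chosen factor in (0, 1] to render below the backing resolution.
inline PixelSize drawableSizeFor(double widthPoints, double heightPoints,
                                 double backingScale, double renderScale) {
    assert(backingScale > 0.0 && renderScale > 0.0 && renderScale <= 1.0);
    const double s = backingScale * renderScale;
    return PixelSize{static_cast<int>(std::lround(widthPoints * s)),
                     static_cast<int>(std::lround(heightPoints * s))};
}
```

With a 1440 x 900 point view and a backing scale of 2.0, a render scale of 1.0 gives 2880 x 1800 pixels, while 0.5 gives 1440 x 900 pixels, which is a quarter of the fragment work.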
2
0
312
1w
Proper way of handling opening ImmersiveSpace?
If you check the code here, https://developer.apple.com/documentation/compositorservices/interacting-with-virtual-content-blended-with-passthrough
var body: some Scene {
    ImmersiveSpace(id: Self.id) {
        CompositorLayer(configuration: ContentStageConfiguration()) { layerRenderer in
            let pathCollection: PathCollection
            do {
                pathCollection = try PathCollection(layerRenderer: layerRenderer)
            } catch {
                fatalError("Failed to create path collection \(error)")
            }
            let tintRenderer: TintRenderer
            do {
                tintRenderer = try TintRenderer(layerRenderer: layerRenderer)
            } catch {
                fatalError("Failed to create tint renderer \(error)")
            }
            Task(priority: .high) { @RendererActor in
                Task { @MainActor in
                    appModel.pathCollection = pathCollection
                    appModel.tintRenderer = tintRenderer
                }
                let renderer = try await Renderer(layerRenderer, appModel, pathCollection, tintRenderer)
                try await renderer.renderLoop()
                Task { @MainActor in
                    appModel.pathCollection = nil
                    appModel.tintRenderer = nil
                }
            }
            layerRenderer.onSpatialEvent = {
                pathCollection.addEvents(eventCollection: $0)
            }
        }
    }
    .immersionStyle(selection: .constant(appModel.immersionStyle), in: .mixed, .full)
    .upperLimbVisibility(appModel.upperLimbVisibility)
the only way it deals with the error is fatalError, and I don't think I can throw anything or return anything else. Is there a way I can gracefully handle this and show a message box in the UI? I was hoping I could somehow trigger a failure and have https://developer.apple.com/documentation/swiftui/openimmersivespaceaction return a failure, but couldn't find a nice way to do so. Let me know if you have ideas.
1
0
183
1w
CompositorServices Or RealityKit
I have been concentrating on developing the visionOS application. While I am currently quite familiar with RealityKit, CompositorServices has also captured my attention. I have not yet acquired knowledge of CompositorServices. Could you please clarify whether it is essential for me to learn CompositorServices? Additionally, I would appreciate it if you could provide insights into the advantages of RealityKit and CompositorServices.
1
0
201
3w
Reasonable time for fix to easy-to-reproduce kernel panic?
Since I haven't heard so much as a peep from Apple on this, I thought I'd take a poll here on how long I could expect an easily reproducible (albeit possibly obscure) kernel panic to be fixed. I was under the impression that kernel panics were a big deal but it's been almost 2 months since I updated from macOS 14 to macOS 15.0 dev beta 7 / public beta 5 when I originally came across and reported a panic triggered while playing StarCraft II. I've been able to consistently trigger panics playing certain (maybe all) Co-op maps in SC2 and since my first report Aug 22, I've filed 8 additional bug reports, each automatically generated after hitting yet another panic. (I'm not sure exactly who is able to view these but for what it's worth, these are the reports I've filed so far: FB14886510, FB14905773, FB14960435, FB15304609, FB15391195, FB15467943, FB15468127, FB15491485, FB15491684.) A few other people have reported the issue to SC2's developer, Blizzard, and apparently Blizzard has acknowledged they're aware of the problem so it's safe to rule out the possibility of a hardware defect or other issue specific only to my computer. The logs point the blame at the AppleDCP driver, although I suppose the problem could technically be in the DCP firmware instead. Regardless, Apple's code is clearly at fault here. I'll admit the importance of a video game isn't exactly like keeping the power on at a hospital but I don't know why it would be deemed particularly unimportant either. At 53 days in, am I wrong to expect this to have been fixed by now or is Apple really being that slow?
0
0
187
3w
Page-Curl Shader -- Pixel transparency check is wrong?
Given:
- I do not understand much at all about how to write shaders
- I do not understand the math associated with page-curl effects
I am trying to: implement a page-curl shader for use on SwiftUI views. I've lifted a shader from HIROKI IKEUCHI that I believe they lifted from a non-Metal shader resource online, and I'm trying to digest it. One thing I want to do is to paint the "underside" of the view with a given color and maintain the transparency of rounded corners when they are flipped over. So, if an underside pixel is "clear" then I want to sample the pixel at that position on the original layer instead of the "curl effect" pixel. There are two comments in the shader below where I check the alpha and underside flags, and paint the color red as a debug test. The shader gives this result: The outside of those rounded corners is appropriately red and the white border pixels are detected as "not-clear". But the "inner" portion of the border is... mistakenly red? I don't get it. Any help would be appreciated. I feel tapped out and I don't have any IRL resources I can ask.
//
//  PageCurl.metal
//  ShaderDemo3
//
//  Created by HIROKI IKEUCHI on 2023/10/17.
//
#include <metal_stdlib>
#include <SwiftUI/SwiftUI_Metal.h>
using namespace metal;
#define pi float(3.14159265359)
#define blue half4(0.0, 0.0, 1.0, 1.0)
#define red half4(1.0, 0.0, 0.0, 1.0)
#define radius float(0.4)
// Return the color of that pixel
[[ stitchable ]] half4 pageCurl
(
    float2 _position,
    SwiftUI::Layer layer,
    float4 bounds,
    float2 _clickedPoint,
    float2 _mouseCursor
) {
    half4 undersideColor = half4(0.5, 0.5, 1.0, 1.0);
    float2 originalPosition = _position;
    // Correct the y coordinate
    float2 position = float2(_position.x, bounds.w - _position.y);
    float2 clickedPoint = float2(_clickedPoint.x, bounds.w - _clickedPoint.y);
    float2 mouseCursor = float2(_mouseCursor.x, bounds.w - _mouseCursor.y);
    float aspect = bounds.z / bounds.w;
    float2 uv = position * float2(aspect, 1.) / bounds.zw;
    float2 mouse = mouseCursor.xy * float2(aspect, 1.) / bounds.zw;
    float2 mouseDir = normalize(abs(clickedPoint.xy) - mouseCursor.xy);
    float2 origin = clamp(mouse - mouseDir * mouse.x / mouseDir.x, 0., 1.);
    float mouseDist = clamp(length(mouse - origin) + (aspect - (abs(clickedPoint.x) / bounds.z) * aspect) / mouseDir.x, 0., aspect / mouseDir.x);
    if (mouseDir.x < 0.) {
        mouseDist = distance(mouse, origin);
    }
    float proj = dot(uv - origin, mouseDir);
    float dist = proj - mouseDist;
    float2 linePoint = uv - dist * mouseDir;
    half4 pixel = layer.sample(position);
    if (dist > radius) {
        pixel = half4(0.0, 0.0, 0.0, 0.0); // background behind curling layer (note: 0.0 opacity)
        pixel.rgb *= pow(clamp(dist - radius, 0., 1.) * 1.5, .2);
    } else if (dist >= 0.0) {
        // THIS PORTION HANDLES THE CURL SHADED PORTION OF THE RESULT
        // map to cylinder point
        float theta = asin(dist / radius);
        float2 p2 = linePoint + mouseDir * (pi - theta) * radius;
        float2 p1 = linePoint + mouseDir * theta * radius;
        bool underside = (p2.x <= aspect && p2.y <= 1. && p2.x > 0. && p2.y > 0.);
        uv = underside ? p2 : p1;
        uv = float2(uv.x, 1.0 - uv.y); // invert y
        pixel = layer.sample(uv * float2(1. / aspect, 1.) * float2(bounds[2], bounds[3])); // ME<----
        if (underside && pixel.a == 0.0) { //<---- PIXEL.A IS 0.0 WHYYYYY
            pixel = red;
        }
        // Commented out while debugging alpha issues
        // if (underside && pixel.a == 0.0) {
        //     pixel = layer.sample(originalPosition);
        // } else if (underside) {
        //     pixel = undersideColor; // underside
        // }
        // Shadow the pixel being returned
        pixel.rgb *= pow(clamp((radius - dist) / radius, 0., 1.), .2);
    } else {
        // THIS PORTION HANDLES THE NON-CURL-SHADED PORTION OF THE SAMPLING.
        float2 p = linePoint + mouseDir * (abs(dist) + pi * radius);
        bool underside = (p.x <= aspect && p.y <= 1. && p.x > 0. && p.y > 0.);
        uv = underside ? p : uv;
        uv = float2(uv.x, 1.0 - uv.y); // invert y
        pixel = layer.sample(uv * float2(1. / aspect, 1.) * float2(bounds[2], bounds[3])); // ME
        if (underside && pixel.a == 0.0) { //<---- PIXEL.A IS 0.0 WHYYYYY
            pixel = red;
        }
        // Commented out while debugging alpha issues
        // if (underside && pixel.a == 0.0) {
        //     // If the new underside pixel is clear, we should sample the original image's pixel.
        //     pixel = layer.sample(originalPosition);
        // } else if (underside) {
        //     pixel = undersideColor;
        // }
    }
    return pixel;
}
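On the math behind the "map to cylinder point" step, here is a small host-side sketch in C++ (Metal Shading Language is C++-based) of what `theta`, `p1`, and `p2` appear to compute. It rests on one assumption: the page wraps around a cylinder of the given radius lying along the fold line. A screen pixel at distance `dist` past the fold then sees the cylinder where `radius * sin(theta) = dist`. The page content visible there sits at arc length `theta * radius` along the flat page for the near side (`p1`), or `(pi - theta) * radius` for the part that has wrapped over the top and shows its underside (`p2`). `CurlOffsets` and `curlOffsets` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cmath>

struct CurlOffsets {
    float front;  // arc length to the point on the near side of the cylinder
    float back;   // arc length to the point that has wrapped over the top
};

// dist:   distance of the screen pixel past the fold line, 0 <= dist <= radius
// radius: radius of the imaginary cylinder the page wraps around
inline CurlOffsets curlOffsets(float dist, float radius) {
    assert(radius > 0.0f && dist >= 0.0f && dist <= radius);
    const float pi = 3.14159265359f;
    const float theta = std::asin(dist / radius);  // angle around the cylinder
    return CurlOffsets{theta * radius, (pi - theta) * radius};
}
```

At `dist = 0` the near-side offset is zero and the wrapped offset is half the circumference, `pi * radius`. At `dist = radius` the two offsets meet at a quarter of the circumference, which is the outer silhouette of the curl.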
1
0
194
3w
RealityKit crashes randomly in the simulator but not on the device
I'm writing a RealityKit/ARKit app that runs on iOS. Starting with Xcode 16.0 beta 1, at least through Xcode 16.1 beta 2 (16B5014f), in the iOS 18 simulator, my app randomly crashes in about 20% of app sessions the first time it attempts to present an ARView. The crashes seem to occur at multiple points within RealityKit and Metal. Below, I've included screenshots of the call stacks of the crashes, which occur as a result of both EXC_BAD_ACCESS and assertion failures within RealityKit. The app only crashes in the iOS 18 simulator, and does not crash in the iOS 17 simulator or earlier. The app only crashes in the simulator, and does not crash on a device running iOS 18. Before I investigate further, I'd appreciate it if an Apple engineer could give me a sense of if these crashes are most likely the result of known issues within RealityKit and/or the simulator, or if your opinion is that there are probably bugs in my app's code. I've submitted several feedback issues in the past, and I'd love to submit this issue too, but I expect that I would spend many hours attempting to create a repro case in a sample app. Understandably, I'd rather not spend this time if an Apple engineer could tell me this is a known issue, for example. Thank you.
5
0
323
2w