How can I access the player's camera forward vector in visionOS, specifically using RealityKit?
In Unity and other game engines, there's often an API like Camera.main.transform.forward for this purpose. I've found the head anchor but haven't identified a way to obtain the forward vector in RealityKit.
Is there a related API for this? Any guidance would be greatly appreciated.
Thanks!
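One possible approach, sketched here under assumptions: on visionOS, ARKit's WorldTrackingProvider can report the device pose through queryDeviceAnchor(atTimestamp:), and the forward direction can be read as the negated Z column of that pose. This only works while an ARKitSession is running inside an immersive space. The helper names forwardVector(of:), startWorldTracking() and currentDeviceForward() are hypothetical, not part of any framework.

```swift
import ARKit
import QuartzCore
import simd

// Assumption: both objects are kept alive for as long as the immersive space is open.
let arSession = ARKitSession()
let worldTracking = WorldTrackingProvider()

/// Hypothetical helper: extracts the forward direction (-Z) from a pose matrix.
func forwardVector(of transform: simd_float4x4) -> SIMD3<Float> {
    let z = transform.columns.2
    return -simd_normalize(SIMD3<Float>(z.x, z.y, z.z))
}

/// Starts world tracking; call once after the immersive space has opened.
func startWorldTracking() async throws {
    try await arSession.run([worldTracking])
}

/// Returns the device's forward direction in world space,
/// or nil if no device pose is available yet.
func currentDeviceForward() -> SIMD3<Float>? {
    guard let device = worldTracking.queryDeviceAnchor(atTimestamp: CACurrentMediaTime()) else {
        return nil
    }
    return forwardVector(of: device.originFromAnchorTransform)
}
```

The same matrix also carries the device position in its fourth column, which may be useful when casting a ray from the user's head.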
Discuss spatial computing on Apple platforms and how to design and build an entirely new universe of apps and games for Apple Vision Pro.
I haven't found a way to programmatically position the main view in visionOS apps, which seems intentional. While this aligns with user-controlled window placement, it creates a challenge: as users move, they must constantly reposition the main window manually.
A potential solution could be a feature that quickly brings the window to the user, perhaps via a custom gesture. This might improve user experience significantly.
Given my current understanding of visionOS, I may be missing something. I'd appreciate any insights or alternative perspectives on this issue. Thoughts?
When using ARView from RealityKit, I can write let results = arView.raycast(from: point, allowing: .estimatedPlane, alignment: .any) to get the 3D position of the point I tap on a plane. In iOS 18 we can use RealityView, and I found that unproject(_:from:to:ontoPlane:) may provide the same functionality, but I don't know how to set the ontoPlane parameter.
Can someone help me with some code snippets?
Hi,
I'm building a windowed game for visionOS that requires a gamepad, and I'm using a PS5 controller.
However, I found a few problems:
When I look at the window's close button and press X on the gamepad, the window closes. This can happen accidentally while the user is playing.
When I press the PS button (the home button), I want my app to handle it, e.g. by taking the user to the title screen. But visionOS always captures this input and opens the home screen (App Library).
How can I avoid these two issues? More generally, how can I make gamepad input available only to my app and not to the visionOS system?
I have a RealityView displaying a Reality Composer Pro scene in a window. Things are generally working fine, but the content seems to appear in front of the visionOS window and block it, rather than being contained inside it. Do I need to switch to a volumetric window for this to work? My scene simply contains a flat display that renders 3D content (it has a material that sends different imagery to each eye).
I am trying to rotate topEntity around the origin point of shapeEntity, but have not found a way to do so.
topEntity is an entity group that also contains shapeEntity, so I cannot set topEntity as a child of shapeEntity.
In Blender I set the correct origin for topEntity, but when I import the USD model into Reality Composer Pro the origin point is not preserved, and there is no way to set the origin in Reality Composer Pro.
DragGesture()
    .targetedToEntity(where: .has(CustomComponent.self))
    .onChanged({ value in
        let rotation = -Float(value.translation.height)
        let clampedRotation = min(max(rotation, 0), 45)
        if value.entity.name == "grab" {
            if let topEntity = selectedEntity.findEntity(named: "top"),
               let shapeEntity = selectedEntity.findEntity(named: "Shape_1") {
                topEntity.transform.rotation = simd_quatf(
                    angle: clampedRotation * .pi / 180,
                    axis: SIMD3(x: 0, y: 0, z: 1)
                )
            }
        }
    })
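A common way to rotate about a point other than the entity's own origin is to rotate both the orientation and the offset from the pivot. The following is a minimal sketch, assuming the pivot is expressed in the same space as the entity's transform (its parent's space). rotated(_:by:around:) is a hypothetical helper, not a RealityKit API.

```swift
import RealityKit
import simd

/// Hypothetical helper: returns `start` rotated by `rotation` about `pivot`.
/// `pivot` must be given in the same space as `start` (the entity's parent space).
func rotated(_ start: Transform,
             by rotation: simd_quatf,
             around pivot: SIMD3<Float>) -> Transform {
    var result = start
    // Turn the orientation itself.
    result.rotation = rotation * start.rotation
    // Swing the entity's origin around the pivot by the same rotation.
    result.translation = pivot + rotation.act(start.translation - pivot)
    return result
}
```

In the drag gesture, this could be used by capturing topEntity.transform once when the drag begins, taking the pivot from shapeEntity.position(relativeTo: topEntity.parent), and assigning the helper's result to topEntity.transform on each change. Starting from the captured transform each time avoids accumulating rotation across updates.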
In visionOS 2.1 and 2.2, I'm encountering a significant limitation when using the mixed immersion style, .immersionStyle(selection: .constant(.mixed), in: .mixed). Here's a breakdown of the behavior:
In full immersion mode (.immersionStyle(selection: .constant(.full), in: .full)), users can interact with and manipulate system windows while inside a 3D model, allowing typical interactions like moving windows, pinching, or activating UI switches.
However, in mixed immersive mode, using the exact same layout “inside” a 3D model (which doesn’t visually obstruct the window), users are unable to interact with window content or move the window. Basic interactions like pinching or toggling switches require users to physically touch these elements in AR space, which is inconsistent with the behavior in full immersion.
From a usability perspective, this restriction seems unnecessary, as the software should ideally allow for similar interaction capabilities across both immersive styles. The expected behavior is to enable window manipulation within a 3D model in mixed mode, matching the functionality observed in full immersion.
The scene in question is a house in which the user is placed during the immersion, which is why I refer to the user being "inside" the scene.
Has anyone else experienced this or found a workaround?
Rendering the scene onto a RenderTarget with twice the resolution of the Drawable, and then downsampling to the Drawable, causes the image to appear distorted.
Modifications were made to the Xcode visionOS template.
Foveation should be enabled by default
struct ContentStageConfiguration: CompositorLayerConfiguration {
    func makeConfiguration(capabilities: LayerRenderer.Capabilities, configuration: inout LayerRenderer.Configuration) {
        configuration.depthFormat = .depth32Float
        configuration.colorFormat = .bgra8Unorm_srgb
        let foveationEnabled = capabilities.supportsFoveation
        configuration.isFoveationEnabled = foveationEnabled
        let options: LayerRenderer.Capabilities.SupportedLayoutsOptions = foveationEnabled ? [.foveationEnabled] : []
        let supportedLayouts = capabilities.supportedLayouts(options: options)
        configuration.layout = supportedLayouts.contains(.layered) ? .layered : .dedicated
    }
}
To avoid errors, rasterizationRateMap is not set.
let renderPassDescriptor = MTLRenderPassDescriptor()
renderPassDescriptor.colorAttachments[0].texture = self.renderTarget.currentFrameColor
renderPassDescriptor.renderTargetWidth = self.renderTarget.currentFrameColor.width
renderPassDescriptor.renderTargetHeight = self.renderTarget.currentFrameColor.height
renderPassDescriptor.colorAttachments[0].loadAction = .clear
renderPassDescriptor.colorAttachments[0].storeAction = .store
renderPassDescriptor.colorAttachments[0].clearColor = MTLClearColor(red: 0.0, green: 0.0, blue: 0.0, alpha: 0.0)
renderPassDescriptor.depthAttachment.texture = self.renderTarget.currentFrameDepth
renderPassDescriptor.depthAttachment.loadAction = .clear
renderPassDescriptor.depthAttachment.storeAction = .store
renderPassDescriptor.depthAttachment.clearDepth = 0.0
// renderPassDescriptor.rasterizationRateMap = drawable.rasterizationRateMaps.first
if layerRenderer.configuration.layout == .layered {
    renderPassDescriptor.renderTargetArrayLength = drawable.views.count
}
The rendering process is as follows:
We are using real-time object tracking. With enterprise permissions we can raise the tracking rate to 30 Hz, but there is still a noticeable delay. First, we would like to know why this delay occurs. Is it due to performance considerations? We ask because the delay in hand tracking is very low.
Second, we thought the delay might be caused by the complexity of the 3D objects, so we tried image tracking. However, image tracking and QR code tracking show even more serious delays. Currently, images appear to be recognized for tracking at about one frame per second, and we hope this can be increased, because object recognition and tracking can be very smooth on other Apple platforms such as iOS.
Additionally, could interfaces for depth sensing be considered so that we can obtain depth data?
We would also like to know what accuracy Vision Pro can achieve when measuring the physical world, and what accuracy it achieves when rendering on the display. We wonder whether this depends on hardware such as the LiDAR scanner. Also, what accuracy can we achieve when tracking how far an object has moved?
Is it possible to access the raw LiDAR measurements before the sceneDepth calculation that combines the LiDAR measurements with visual data? In low-light environments the LiDAR scanner should still work and provide depth information, but I cannot figure out how to access those pure LiDAR depth measurements. I am currently using:
guard let frame = arView.session.currentFrame,
      let depthData = frame.sceneDepth?.depthMap else {
    print("Depth data is unavailable.")
    return
}
but this is the depth data after sensor fusion occurs and fails in low light conditions.
Despite being enrolled, I am unable to find any option to download the 2.2 beta on my Vision Pro. All I see in Software Update is that I'm up to date with 2.1.
How do I find the beta download option?
Thanks.
We are currently using ObjectCapture from ARKit, and we would like to fix the exposure time, white balance parameters and ISO. How can we do this?
Additionally, we'd like to obtain the following information from ARKit: the white balance parameters (in case we cannot fix them) and the color correction matrices.
If I put an image texture with alpha on a model created in Blender and run it in RCP or on visionOS, the front-to-back rendering of the transparent parts produces unintended results. Details are below.
I exported a USDC file of a Blender-created cylindrical object with a PNG (with alpha) texture applied to the inside, and then imported it into Reality Composer Pro.
When multiple objects that make extensive use of transparent textures are placed in front of and behind each other, the following behaviors were observed in the transparent areas:
・The transparent areas do not become transparent
・The transparent areas become transparent together with the image behind them
・The order of the images becomes incorrect
Best regards.
I'm having trouble restoring the position of a child entity when the app reloads, even though it appears that I am correctly obtaining and persisting the translation values after a drag gesture.
The problem occurs when I drag a child element to a new location (persisting the new values) and then reload the app to force repositioning from the persisted translation values.
I notice that the parent relationship changes during interaction (tap or drag), which can be seen in the debug statements. I'm wondering whether this is related to the problem, or whether the parent change is normal during re-rendering and unrelated to my problem.
My reasoning is that, since we persist translation values relative to the parent, if the parent relationship changes just before persistence, we may be persisting and restoring the wrong values.
Project Link: Private
STEPS TO REPRODUCE
Run the app.
Drag the pre-loaded stage down the Y axis so that the floor of the stage is more visible to your eye (in order to better visualize the problem).
Tap the button in the timeline to create a new project.
Drag the only visible element from the left panel onto the timeline (element is labeled f_works_entity_1).
There should now be a green 3d model added to the stage.
Drag this green element to a new location (be careful to hover over the green element so that you don't inadvertently drag the stage).
Re-run the app to see that the green element is offset to a new location, not the last dragged location.
To reset and try again, delete the project canvas next to the project name (trash button) then restart the app.
Areas of concern:
RealityKitView is the only file you may need.
Line 119 is where we create new child entities.
Lines 185-219 are where we persist and apply persisted values.
You can also search FIXME in the file to see areas of concern.
Tip:
I have a tap gesture on each entity that produces a debug statement with info about the entity and its parent including IDs.
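If the parent really does change between saving and restoring, one way to rule that out is to persist the position relative to a fixed reference entity instead of the current parent. The sketch below assumes such a stable reference exists (for example the stage root) and uses RealityKit's position(relativeTo:) and setPosition(_:relativeTo:). SavedPlacement, savePlacement and restorePlacement are hypothetical names, not taken from the project.

```swift
import Foundation
import RealityKit

/// Hypothetical persisted record: a position expressed in the reference entity's space.
struct SavedPlacement: Codable {
    var x: Float
    var y: Float
    var z: Float
}

/// Captures `entity`'s position relative to `root`, regardless of its current parent.
@MainActor
func savePlacement(of entity: Entity, relativeTo root: Entity) -> SavedPlacement {
    let p = entity.position(relativeTo: root)
    return SavedPlacement(x: p.x, y: p.y, z: p.z)
}

/// Re-applies a saved position relative to `root`, regardless of the current parent.
@MainActor
func restorePlacement(_ saved: SavedPlacement, to entity: Entity, relativeTo root: Entity) {
    entity.setPosition(SIMD3<Float>(saved.x, saved.y, saved.z), relativeTo: root)
}
```

With this shape, a reparenting between save and restore changes the entity's local translation but not where it ends up relative to the reference entity.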
When using a trackpad (or screen-shared Mac) with the Vision Pro, moving your attention to a new window or app immediately refocuses the mouse cursor, which in many circumstances is really useful. But in circumstances where there is a viewer-only window, that window jumping gets in the way. Imagine a 3d object editor of some sort, with a live viewer in a second window, maybe a browser. Manipulating the 3d object with the mouse in the editor gets continually interrupted when looking at the live viewer because the cursor jumps to the viewer window.
Is there any way to reject that focus?
In the visionOS beta, when using ARKit for image detection, the first detection of an image arrived as an AnchorUpdate with the event .added, and subsequent detections of the same image arrived as .updated. However, after toggling the immersive space, the same image was detected with .added again. After updating to visionOS 2.1, the first detection is still .added and subsequent detections of the same image are still .updated, even after toggling the immersive space. Could this be due to a change in the processing flow?
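If app logic depends on whether an image is being seen for the first time, one defensive option is to track that yourself rather than rely on which event type arrives. The following is a minimal sketch. AnchorEvent and AnchorRegistry are hypothetical stand-ins rather than ARKit types, and the choice of key (for example the anchor's ID or the reference image's name) is an assumption to validate against real behavior.

```swift
import Foundation

/// Hypothetical mirror of the three events an anchor update stream can report.
enum AnchorEvent {
    case added, updated, removed
}

/// Tracks which keys have been seen, independent of the reported event type.
struct AnchorRegistry<Key: Hashable> {
    private(set) var known: Set<Key> = []

    /// Returns true only when `key` was not known before this call.
    mutating func handle(_ event: AnchorEvent, key: Key) -> Bool {
        switch event {
        case .added, .updated:
            // Treat an update for an unknown key as a first sighting too.
            return known.insert(key).inserted
        case .removed:
            known.remove(key)
            return false
        }
    }
}
```

Inside the loop that consumes the provider's anchor updates, one-time setup would then run only when handle(_:key:) returns true, so the behavior stays the same whether the system reports an added or an updated event after the immersive space is toggled.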
We would like to create an Immersive video and store the video file locally in Vision Pro for viewing.
By Immersive video, I mean the video that is played at the end of the Vision Pro experience at the Apple Store (LeBron's dunk, Curry's 3-point shot, tightrope walk, etc.). It is unclear if a way is currently provided to view Immersive video locally.
I can find some information about Spatial video on the Dev site, but I can't find any information about Immersive video. My understanding is:
Spatial video:
A video window appears in space and plays video with depth. Up to 4K side-by-side video can be converted to MV-HEVC format using Xcode and played back in the Photos app.
Immersive video:
180VR video, but I’m not sure how it was created. Similar to Spatial video, I converted a side-by-side 180VR video to MV-HEVC format using Xcode, but it could not be played back in the Photos app as expected.
Vision Pro's Photos app features an Immersive button during video playback, but this appears to be for zooming in on Spatial video to the full field of view, which seems different from Immersive video.
The demo video provided by Apple is streamed from Apple TV, and there are no local files available.
We are currently considering creating an app that displays different videos to each eye, but we prefer not to go this route due to licensing and distribution issues.
I have a visionOS app that plays audio using AVAudioEngine and presents both a window and an immersive space. If I close the window, the audio session gets interrupted and attempting to restart the session and audio engine has no effect. I need to dismiss the app, then reopen it, which reopens the main window, in order for audio to start playing again.
This is in all visionOS 2 betas. Note that I have background audio enabled for my app.
I am developing with Apple Vision Pro to implement object tracking functionality, but each model needs to go into Create ML for training, and the training time is very long. Are there other ways to shorten training time while obtaining reference files in the same format?
Additionally, can the delay in object tracking be further optimized? Although the refresh rate has been optimized, there is still a noticeable delay.