In visionOS, virtual content is occluded by the user's hands by default. In a hybrid (mixed) space, if an entity is positioned behind a real-world object, how can that object in the room occlude the entity in the same way the hands occlude virtual content?
Integrate iOS device camera and motion features to produce augmented reality experiences in your app or game using ARKit.
We are using ARKit image tracking on visionOS 2.0 with three pre-registered images. Image tracking works, but only one image is actively tracked at a time: when more than one target image is visible to the camera, ARKit has difficulty detecting and tracking the others.
Is this the expected behavior in visionOS, or is there something we need to do to resolve this issue?
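One app-side cause worth ruling out before concluding it is a platform limit is state that holds only a single "current" image anchor, so each update overwrites the previous one. The sketch below is plain Swift, not ARKit's API: it keeps the latest state of every image keyed by anchor ID. `ImageUpdate` and `TrackedImageStore` are hypothetical stand-ins for the `ImageAnchor` updates delivered by `ImageTrackingProvider`.

```swift
import Foundation

/// Hypothetical stand-in for one image-anchor update
/// (in ARKit this information would come from an ImageAnchor).
struct ImageUpdate {
    let anchorID: UUID
    let imageName: String
    let isTracked: Bool
}

/// Keeps the latest state of every image independently, so an update
/// for one image never replaces the state of another.
struct TrackedImageStore {
    private(set) var latest: [UUID: ImageUpdate] = [:]

    mutating func apply(_ update: ImageUpdate, removed: Bool = false) {
        if removed {
            latest.removeValue(forKey: update.anchorID)
        } else {
            latest[update.anchorID] = update
        }
    }

    /// Names of images currently reported as tracked, sorted for stable output.
    var trackedImageNames: [String] {
        latest.values.filter(\.isTracked).map(\.imageName).sorted()
    }
}
```

With this shape, three images visible at once produce three entries rather than one that keeps being overwritten.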
We are currently working with the Enterprise APIs for visionOS 2 and have obtained the necessary entitlements for passthrough camera access. Our goal is to capture images of real-world objects using the Vision Pro's passthrough camera, rather than just taking screenshots or screen recordings.
Our specific use case involves:
1. Accessing the raw passthrough camera feed.
2. Capturing high-resolution images of objects in the real world through the camera.
3. Processing and saving these images for further analysis within our custom enterprise app.
We would greatly appreciate any guidance, tutorials, or sample code that could help us achieve this functionality. If there are specific APIs or best practices for handling real-world image capture via passthrough cameras with the Enterprise APIs, please let us know.
I am new to concurrency and am working on an app that uses the HandTrackingProvider class.
In the Happy Beam sample code, there is a HeartGestureModel that holds a reference to a HandTrackingProvider() and appears to write to a struct called HandUpdates inside the HeartGestureModel class through the publishHandTrackingUpdates() function. On another thread, a function called computeTransformOfUserPerformedHeartGesture() reads the values of HandUpdates to determine whether the user is making the appropriate gesture.
My question is: how does the code handle the constant reads and writes to the HandUpdates struct?
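A common way to make such concurrent reads and writes safe is to give the shared state a single isolation domain. If a model like this is isolated to one actor (for example with `@MainActor`), every read and write runs on that actor one at a time, so the struct is never accessed simultaneously. The sketch below shows the same idea with a dedicated actor; `HandSnapshot`, `HandUpdateStore`, and `thumbsAreTouching` are hypothetical simplifications, not the sample's code.

```swift
/// Hypothetical, simplified stand-in for the HandUpdates struct: only the
/// thumb-tip positions (in meters) that a heart-gesture check might need.
struct HandSnapshot: Sendable {
    var leftThumbTip: SIMD3<Float>?
    var rightThumbTip: SIMD3<Float>?
}

/// All access to `latest` goes through the actor, so a write from the
/// hand-tracking loop can never overlap with a read from the gesture check.
actor HandUpdateStore {
    private var latest = HandSnapshot()

    func updateLeft(thumbTip: SIMD3<Float>) { latest.leftThumbTip = thumbTip }
    func updateRight(thumbTip: SIMD3<Float>) { latest.rightThumbTip = thumbTip }

    /// Returns a copy; callers work on the value without holding the actor.
    func snapshot() -> HandSnapshot { latest }
}

/// Hypothetical gesture test: both thumb tips present and closer than `threshold` meters.
func thumbsAreTouching(_ s: HandSnapshot, threshold: Float = 0.04) -> Bool {
    guard let l = s.leftThumbTip, let r = s.rightThumbTip else { return false }
    let d = l - r
    return (d * d).sum().squareRoot() < threshold
}
```

Because `snapshot()` returns a value copy, the gesture computation can run on its own without blocking the tracking loop, while each individual access to the stored struct is still serialized by the actor.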
I'm porting some code that uses ARKit to Swift 6 (with Complete strict concurrency checking enabled).
Some ARSCNViewDelegate methods, namely Coordinator.renderer(_:didAdd:for:) and at least one other, cause a consistent crash. In Swift 5 this code works absolutely fine.
The above method consistently crashes with _dispatch_assert_queue_fail. My assumption is that in Swift 6 a trap has been inserted by the compiler to validate that my downstream code is running on the main thread.
In “Implementing a Main Actor Protocol That’s Not @MainActor”, Quinn “The Eskimo!” seems to address scenarios of this nature with three proposed workarounds, yet none of them seems feasible here.
For #1, marking ContentView.addPlane(renderer:node:anchor:) nonisolated and using @preconcurrency import ARKit compiles but still crashes :(
For #2, applying @preconcurrency to the ARSCNViewDelegate conformance declaration site just yields this warning: @preconcurrency attribute on conformance to 'ARSCNViewDelegate' has no effect
For #3, as Quinn acknowledges, this is a non-starter because ARSCNViewDelegate is out of our control.
The minimal reproducible code is below. Run the app, pan the camera back and forth across a well-lit environment, and the app should crash within a few seconds. Switch Swift Language Version to Swift 5 in Build Settings, retry, and you'll see that the same code works fine.
import ARKit
import SwiftUI

struct ContentView: View {
    @State private var arViewProxy = ARSceneProxy()
    private let configuration: ARWorldTrackingConfiguration
    @State private var planeFound = false

    init() {
        configuration = ARWorldTrackingConfiguration()
        configuration.worldAlignment = .gravityAndHeading
        configuration.planeDetection = [.horizontal]
    }

    var body: some View {
        ARScene(proxy: arViewProxy)
            .onAddNode { renderer, node, anchor in
                addPlane(renderer: renderer, node: node, anchor: anchor)
            }
            .onAppear {
                // Start the AR session once the view appears.
                arViewProxy.session.run(configuration)
            }
            .onDisappear {
                arViewProxy.session.pause()
            }
            .overlay(alignment: .top) {
                if !planeFound {
                    Text("Slowly move device horizontally side to side to calibrate")
                } else {
                    Text("Plane found!")
                }
            }
    }

    private func addPlane(renderer: SCNSceneRenderer, node: SCNNode, anchor: ARAnchor) {
        guard let planeAnchor = anchor as? ARPlaneAnchor,
              let device = renderer.device,
              let planeGeometry = ARSCNPlaneGeometry(device: device)
        else { return }

        planeFound = true
        planeGeometry.update(from: planeAnchor.geometry)

        let material = SCNMaterial()
        material.isDoubleSided = true
        material.diffuse.contents = UIColor.white.withAlphaComponent(0.65)
        planeGeometry.materials = [material]

        // Attach the visualized plane to the node ARKit created for this anchor.
        let planeNode = SCNNode(geometry: planeGeometry)
        node.addChildNode(planeNode)
    }
}

struct ARScene {
    private(set) var onAddNodeAction: ((SCNSceneRenderer, SCNNode, ARAnchor) -> Void)?
    private let proxy: ARSceneProxy

    init(proxy: ARSceneProxy) {
        self.proxy = proxy
    }

    func onAddNode(
        perform action: @escaping (SCNSceneRenderer, SCNNode, ARAnchor) -> Void
    ) -> Self {
        var view = self
        view.onAddNodeAction = action
        return view
    }
}

extension ARScene: UIViewRepresentable {
    func makeUIView(context: Context) -> ARSCNView {
        let arView = ARSCNView()
        arView.delegate = context.coordinator
        arView.session.delegate = context.coordinator
        proxy.arView = arView
        return arView
    }

    func updateUIView(_ uiView: ARSCNView, context: Context) {
        context.coordinator.onAddNodeAction = onAddNodeAction
    }

    func makeCoordinator() -> Coordinator {
        Coordinator()
    }
}

extension ARScene {
    class Coordinator: NSObject, ARSCNViewDelegate, ARSessionDelegate {
        var onAddNodeAction: ((SCNSceneRenderer, SCNNode, ARAnchor) -> Void)?

        // Called by SceneKit on its rendering queue, not on the main thread.
        func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
            onAddNodeAction?(renderer, node, anchor)
        }
    }
}

class ARSceneProxy: NSObject, @preconcurrency ARSessionProviding {
    fileprivate var arView: ARSCNView!

    @objc dynamic var session: ARSession {
        arView.session
    }
}
Any help is greatly appreciated!
Hey, captureHighResolutionFrame() produces the normal camera shutter sound and that really doesn't fit the ARKit context. I can't override it the usual way because there's no AVCaptureSession object in ARSession. Any ideas on what to do? Thanks!
When using PlaneDetectionProvider in visionOS, I seem to have hit a limitation: regardless of where the headset is in the space, planes are only detected if they are (as far as I can tell) less than 5 m from the world origin. Mapping a room becomes very tricky as a result, because some walls often fall outside this radius; even when I'm standing two feet away from a ten-foot wall, it just won't detect it. I've combed through the documentation but cannot see any way to extend this distance. Am I missing something?
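To check the observation quantitatively, one could log each detected plane's distance from the world origin alongside its distance from the device. The sketch below assumes you have already extracted the translation (fourth column) of the plane's `originFromAnchorTransform` and of the device pose as `SIMD3<Float>` values in meters; `PlaneDistanceReport`, `euclideanDistance`, and `report` are hypothetical helpers.

```swift
/// Hypothetical diagnostic record: how far a plane is from the world origin
/// versus from the device. Positions are world-space translations in meters.
struct PlaneDistanceReport {
    let fromOrigin: Float
    let fromDevice: Float
}

/// Straight-line distance between two points.
func euclideanDistance(_ a: SIMD3<Float>, _ b: SIMD3<Float>) -> Float {
    let d = a - b
    return (d * d).sum().squareRoot()
}

func report(planePosition: SIMD3<Float>, devicePosition: SIMD3<Float>) -> PlaneDistanceReport {
    PlaneDistanceReport(fromOrigin: euclideanDistance(planePosition, .zero),
                        fromDevice: euclideanDistance(planePosition, devicePosition))
}
```

If planes that are close to the device (small `fromDevice`) consistently go missing once `fromOrigin` exceeds roughly 5 m, the logs would support the radius-from-origin explanation rather than a range limit around the headset.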
I am working on a React Native app, specifically on an iOS native module that uses RealityKit. A seemingly inexplicable error keeps occurring at runtime, as you can see in the following image:
[Image: crash log from Xcode]
When I try to retrieve the position of my AnchorEntity in world space (that is, using relativeTo: nil), it triggers a runtime exception in one of its internal calls:
CoreRE: re::BucketArray<unsigned short*, 32ul>::operator[](unsigned long) + 204
As you can see from the code, my AnchorEntity is not nil, because there is a guard check. I also tried moving that code into an Objective-C static function so that I could use @try/@catch to catch runtime exceptions, only to realize that RealityKit cannot be used from Objective-C. Do you have any ideas or suggestions on how to fix or prevent this?
Dear all,
We are building an XR application demonstrating our research on open-vocabulary 3D instance segmentation for assistive technology, and we intend to bring it to visionOS using the new Enterprise APIs. Our method was trained on ScanNet-like datasets, which contain the following:
localized (1) RGB camera frames (2) with Depth (3) and camera intrinsics (4)
point cloud (5)
I understand that we can query (1), (2), and (4) from the CameraFrameProvider. As for (3) and (5), it is unclear to me whether and how we can obtain that data.
In handheld ARKit, this example project demonstrates how the depthMap can be used to simulate raw point clouds. However, this property doesn't seem to be available in visionOS.
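When per-pixel metric depth and pinhole intrinsics are available (as with the handheld `depthMap`), turning them into a point cloud is a back-projection through the pinhole model: for pixel (u, v) with depth z, x = (u − cx)·z / fx and y = (v − cy)·z / fy. The sketch below assumes a row-major `[Float]` depth buffer in meters and produces camera-space points in an OpenCV-style frame (+x right, +y down, +z forward). It uses no visionOS API; `PinholeIntrinsics` and `pointCloud` are hypothetical names, and matching ARKit's camera convention (which looks down −z) would need an axis flip.

```swift
/// Pinhole intrinsics in pixels: focal lengths and principal point.
struct PinholeIntrinsics {
    let fx: Float, fy: Float, cx: Float, cy: Float
}

/// Back-projects a row-major depth image (meters) into camera-space points.
/// Pixels with non-finite or non-positive depth are skipped.
func pointCloud(depth: [Float], width: Int, height: Int,
                intrinsics k: PinholeIntrinsics) -> [SIMD3<Float>] {
    precondition(depth.count == width * height, "depth buffer size mismatch")
    var points: [SIMD3<Float>] = []
    points.reserveCapacity(depth.count)
    for v in 0..<height {
        for u in 0..<width {
            let z = depth[v * width + u]
            guard z.isFinite, z > 0 else { continue }
            let x = (Float(u) - k.cx) * z / k.fx
            let y = (Float(v) - k.cy) * z / k.fy
            points.append(SIMD3<Float>(x, y, z))
        }
    }
    return points
}
```

To place such points in world space, each point would additionally be transformed by the camera pose of the frame it came from.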
Is there some way for us to obtain depth data associated with camera frames?
"Faking" depth data from the SceneReconstructionProvider-generated meshes is too coarse for our method. I hope I'm just missing some detail and there's some way to configure CameraFrameProvider to also deliver depth and/or point clouds.
Thanks for any help or pointer in the right direction!
~ Alex
We are developing a visionOS app. We applied for the Enterprise APIs for visionOS, including Main Camera Access for Vision Pro, received the "Enterprise.license" file in the email Apple sent us, and used our developer account to import the license file into Xcode:
However, in Xcode we cannot find the Enterprise API entitlement:
If we add it to the project's entitlements file manually, Xcode shows a warning:
We also found that the app itself doesn't have the "Additional Capabilities" that include the Enterprise API:
What should we do to get the Enterprise API entitlement into the entitlements file so that we can use the Enterprise APIs?
I’m developing a visionOS app using EnterpriseKit, and I need access to the main camera for QR code detection. I’m using the ARKit CameraFrameProvider and ARKitSession to capture frames, but I’m encountering this error when trying to start the camera stream:
ar_camera_frame_provider_t: Failed to start camera stream with error: <ar_error_t Error Code=100 "App not authorized.">
Setup: a visionOS app using EnterpriseKit for camera access and QR code scanning.
My Info.plist includes necessary permissions like NSCameraUsageDescription and NSWorldSensingUsageDescription.
I’ve added the entitlement as per the official documentation here.
My app is allowed camera access as shown in the logs (Authorization status: [cameraAccess: allowed]), but the camera stream still fails to start with the “App not authorized” error.
I followed Apple’s WWDC 2024 sample code for accessing the main camera in visionOS from this session.
Sample of my code:
import ARKit
import Vision

class QRCodeScanner: ObservableObject {
    private var arKitSession = ARKitSession()
    private var cameraFrameProvider = CameraFrameProvider()
    private var pixelBuffer: CVPixelBuffer?

    init() {
        Task {
            await requestCameraAccess()
        }
    }

    private func requestCameraAccess() async {
        _ = await arKitSession.queryAuthorization(for: [.cameraAccess])

        do {
            try await arKitSession.run([cameraFrameProvider])
        } catch {
            print("Failed to start ARKit session: \(error)")
            return
        }

        // Pick the first supported format instead of indexing blindly.
        let formats = CameraVideoFormat.supportedVideoFormats(for: .main, cameraPositions: [.left])
        guard let format = formats.first,
              let cameraFrameUpdates = cameraFrameProvider.cameraFrameUpdates(for: format)
        else { return }

        Task {
            for await cameraFrame in cameraFrameUpdates {
                guard let mainCameraSample = cameraFrame.sample(for: .left) else { continue }
                self.pixelBuffer = mainCameraSample.pixelBuffer
                // QR Code detection code here
            }
        }
    }
}
Things I’ve Tried:
Verified entitlements in both Info.plist and .entitlements files. I have added the entitlement.
Confirmed camera permissions in the privacy settings.
Followed the official documentation and WWDC 2024 sample code.
Checked my provisioning profile to ensure it supports ARKit camera access.
Has anyone encountered this “App not authorized” error when accessing the main camera via ARKit in visionOS using EnterpriseKit? Are there additional entitlements or provisioning profile configurations I might be missing? Any help would be greatly appreciated! I haven't seen any official examples that use the new API for main camera access, and no open-source examples either.
I'm developing a 3D motion capture app using ARKit.
I tested this sample code, but in iOS 18 the orientations of the hands and legs seem to be wrong.
The following images are screen captures of the sample app on iOS 17 and iOS 18.
How can I create a 3D model of clothing that behaves like real fabric, with realistic physics? Is it possible to create such a model with photogrammetry? I want to use this model on Apple Vision Pro and interact with it using hand gestures.
In my visionOS app I am using plane detection, and I want to create planes that have physics so that my RealityKit entities come to rest on real-world detected planes.
I was curious whether the code below, which I found in the samples, is the most efficient way of doing this.
func processPlaneDetectionUpdates() async {
    for await anchorUpdate in planeTracking.anchorUpdates {
        let anchor = anchorUpdate.anchor

        if anchorUpdate.event == .removed {
            planeAnchors.removeValue(forKey: anchor.id)
            if let entity = planeEntities.removeValue(forKey: anchor.id) {
                entity.removeFromParent()
            }
            continue
        }

        planeAnchors[anchor.id] = anchor

        let entity = Entity()
        entity.name = "Plane \(anchor.id)"
        entity.setTransformMatrix(anchor.originFromAnchorTransform, relativeTo: nil)

        // Generate a mesh for the plane (for occlusion).
        var meshResource: MeshResource? = nil
        do {
            let contents = MeshResource.Contents(planeGeometry: anchor.geometry)
            meshResource = try MeshResource.generate(from: contents)
        } catch {
            print("Failed to create a mesh resource for a plane anchor: \(error).")
        }

        var material = UnlitMaterial(color: .red)
        material.blending = .transparent(opacity: .init(floatLiteral: 0))

        if let meshResource {
            // Make this plane occlude virtual objects behind it.
            entity.components.set(ModelComponent(mesh: meshResource, materials: [material]))
        }

        // Generate a collision shape for the plane (for object placement and physics).
        var shape: ShapeResource? = nil
        do {
            let vertices = anchor.geometry.meshVertices.asSIMD3(ofType: Float.self)
            shape = try await ShapeResource.generateStaticMesh(positions: vertices,
                                                               faceIndices: anchor.geometry.meshFaces.asUInt16Array())
        } catch {
            print("Failed to create a static mesh for a plane anchor: \(error).")
        }

        if let shape {
            entity.components.set(CollisionComponent(shapes: [shape], isStatic: true))
            // A static physics body lets dynamic entities come to rest on the plane.
            let physics = PhysicsBodyComponent(mode: .static)
            entity.components.set(physics)
        }

        let existingEntity = planeEntities[anchor.id]
        planeEntities[anchor.id] = entity

        // `rootEntity` is assumed here: the parent entity that holds all plane entities.
        rootEntity.addChild(entity)
        existingEntity?.removeFromParent()
    }
}
extension MeshResource.Contents {
    init(planeGeometry: PlaneAnchor.Geometry) {
        // Initializers declared in an extension of another module's struct must delegate to self.init().
        self.init()
        self.instances = [MeshResource.Instance(id: "main", model: "model")]
        var part = MeshResource.Part(id: "part", materialIndex: 0)
        part.positions = MeshBuffers.Positions(planeGeometry.meshVertices.asSIMD3(ofType: Float.self))
        part.triangleIndices = MeshBuffer(planeGeometry.meshFaces.asUInt32Array())
        self.models = [MeshResource.Model(id: "model", parts: [part])]
    }
}
extension GeometrySource {
    /// Copies each element of the source buffer into a Swift array of `T`.
    func asArray<T>(ofType: T.Type) -> [T] {
        assert(MemoryLayout<T>.stride == stride, "Invalid stride \(MemoryLayout<T>.stride); expected \(stride)")
        return (0..<count).map {
            buffer.contents().advanced(by: offset + stride * Int($0)).assumingMemoryBound(to: T.self).pointee
        }
    }

    /// Reads the source as an array of three-component vectors.
    func asSIMD3<T>(ofType: T.Type) -> [SIMD3<T>] {
        asArray(ofType: (T, T, T).self).map { .init($0.0, $0.1, $0.2) }
    }

    subscript(_ index: Int32) -> (Float, Float, Float) {
        precondition(format == .float3, "This subscript operator can only be used on GeometrySource instances with format .float3")
        return buffer.contents().advanced(by: offset + (stride * Int(index))).assumingMemoryBound(to: (Float, Float, Float).self).pointee
    }
}
extension GeometryElement {
    /// Returns the vertex indices of the primitive (e.g. triangle) at `index`.
    subscript(_ index: Int) -> [Int32] {
        precondition(bytesPerIndex == MemoryLayout<Int32>.size,
                     """
                     This subscript operator can only be used on GeometryElement instances with bytesPerIndex == \(MemoryLayout<Int32>.size).
                     This GeometryElement has bytesPerIndex == \(bytesPerIndex)
                     """)
        var data = [Int32]()
        data.reserveCapacity(primitive.indexCount)
        for indexOffset in 0 ..< primitive.indexCount {
            data.append(buffer
                .contents()
                .advanced(by: (Int(index) * primitive.indexCount + indexOffset) * MemoryLayout<Int32>.size)
                .assumingMemoryBound(to: Int32.self).pointee)
        }
        return data
    }

    /// Flattens all primitives' indices into a single array.
    func asInt32Array() -> [Int32] {
        var data = [Int32]()
        let totalNumberOfInt32 = count * primitive.indexCount
        data.reserveCapacity(totalNumberOfInt32)
        for indexOffset in 0 ..< totalNumberOfInt32 {
            data.append(buffer.contents().advanced(by: indexOffset * MemoryLayout<Int32>.size).assumingMemoryBound(to: Int32.self).pointee)
        }
        return data
    }

    // Traps if an index exceeds UInt16.max.
    func asUInt16Array() -> [UInt16] {
        asInt32Array().map { UInt16($0) }
    }

    public func asUInt32Array() -> [UInt32] {
        asInt32Array().map { UInt32($0) }
    }
}
I was also curious whether I can do this without ARKit by using SpatialTrackingSession. My understanding is that with SpatialTrackingSession in RealityKit I can only get the transforms of AnchorEntities, without the geometry information needed to create collision shapes.
I am working on a project that requires access to the main camera on Vision Pro. Our main account holder applied for the necessary enterprise entitlement, and we were approved and received the Enterprise.license file by email. I added the Enterprise.license file to my project and manually added the entitlement to the entitlements file, setting it to true, because it was not available in the list when I tried the + Capability button in the Signing & Capabilities tab.
I am getting an error: Provisioning profile "iOS Team Provisioning Profile: " doesn't include the entitlement. I have checked the provisioning profile settings online; there is no manual option for adding the main camera access entitlement, and the profile does not seem to pick up the approval from the license.
According to this link, the double-tap gesture is deprecated in visionOS 2.0 and available only on watchOS, right?
So how can I implement a double-tap gesture in visionOS?
In visionOS, I want to build a watch. After building it, my hand is displayed over the virtual watch, which makes the watch impossible to see.
How can I set up the watch so that it takes display priority over my hand?
I have an application running on visionOS 2.0 that uses the ARKit C API to create anchors and listen for updates.
I am running an ARKit session with a WorldTrackingProvider (and a CameraFrameProvider, if that is relevant).
Then I register a callback using ar_world_tracking_provider_set_anchor_update_handler_f.
When updates arrive I iterate over the updated anchors using ar_world_anchors_enumerate_anchors_f.
Then, as described in the documentation, I walk around and hold down the Digital Crown to reposition the current space. This resets the world origin to my current position.
When this happens, anchor updates arrive. In most cases, the anchor updates return the new transform (using ar_world_anchor_get_origin_from_anchor_transform), but sometimes I get an anchor update that reports the anchor's transform from before the world origin was repositioned. This means that instead of staying in place in the physical world, the world anchor moves relative to me.
I can work around this by calling ar_world_tracking_provider_copy_all_world_anchors_f which provides me with the correct transform, but this async method also adds some noticeable delay to the anchor updates.
Is this already a known issue?
I stumbled across the function setWorldOrigin(relativeTransform:) on ARSession, which is documented here:
I made a custom ARSession subclass that overrides this function and prints and modifies the relativeTransform parameter. The print shows that the function is called with an updated relativeTransform value, but it seems to have no impact, e.g., on the world origin when starting or continuing a scan, on the tiny dollhouse model in RoomPlan, or on any tracking position that I get from ARKit.
Does anybody have experience with this method, or know which parts are influenced by setWorldOrigin()?
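ARKit's documentation describes `relativeTransform` as the position and orientation of the new world origin relative to the current world origin. Writing $R$ for that matrix, $\mathbf{p}$ for a homogeneous world-space point, and $T$ for an anchor's world transform, the documented definition implies the following re-expression in the new world space. This is a derivation from the documented definition only, not a statement about what RoomPlan or other ARKit clients do internally; comparing an anchor's `transform` before and after the call against $R^{-1}T_{\text{old}}$ is one way to check whether the call took effect.

```latex
\mathbf{p}_{\text{old}} = R\,\mathbf{p}_{\text{new}}
\;\Longrightarrow\;
\mathbf{p}_{\text{new}} = R^{-1}\,\mathbf{p}_{\text{old}},
\qquad
T_{\text{new}} = R^{-1}\,T_{\text{old}}
```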
We tried out our Unity-based AR app for the very first time under iOS 18 and noticed an immediate, repeatable crash.
When we run it from Xcode 16, we get this error message:
Assert: /Library/Caches/ : HasValidPose()
Assert: /Library/Caches/ : HasValidPose()
That's a blocker for us.
We're using Unity 2022.3.27f1.