Explore object tracking for visionOS
Find out how you can use object tracking to turn real-world objects into virtual anchors in your visionOS app. Learn how to build spatial experiences with object tracking from start to finish. Find out how to create a reference object using machine learning in Create ML and attach content relative to your target object using Reality Composer Pro, RealityKit, or ARKit APIs.
Chapters
- 0:00 - Introduction
- 5:07 - Create reference object
- 9:28 - Anchor virtual content
Resources
- Exploring object tracking with ARKit
- Forum: Spatial Computing
- Implementing object tracking in your visionOS app
Related Videos
WWDC24
- Build a spatial drawing app with RealityKit
- Compose interactive 3D content in Reality Composer Pro
- Create enhanced spatial computing experiences with ARKit
- What’s new in Create ML
Hello, and welcome to “Explore object tracking for visionOS”. I’m Henning, an engineer on the Object Tracking team. Today, I’ll show you how to turn real-world objects into virtual anchors that your visionOS app can bring to life using our new Object Tracking technology. You may have already created spatial experiences using Reality Composer Pro or our RealityKit and ARKit frameworks. You may also be familiar with the concept of anchoring virtual objects in people’s surroundings. For example, your visionOS app can use our RealityKit and ARKit APIs to place content relative to planes, images, or hands. Anchors are a great starting point for immersive experiences that blur the line between what’s real and what’s virtual.
With object tracking, we’re now adding support for using real-world items as anchors in your app. Imagine everyday objects that reveal useful information once you glance at them. Or household appliances and devices that already come with a virtual manual. Or collectables and toys that come to life and launch you into an immersive storytelling experience. Sounds like magic? Well, let me show you! Here is a view of my space and I have a few items sitting on my desk. There is a globe, a microscope, and an oscilloscope. With Object Tracking, my app can get the position and orientation of each of these items, represented by a coordinate system and bounding box as shown here in my example. Now that my app knows exactly where my objects are sitting in spatial coordinates, I can take this one step further and augment them with interesting content.
Here is the view of my globe again, but this time there is a virtual label attached to it, telling me to tap on the globe for more information. Doing that reveals some objects orbiting around my globe. Oh, and there is a space shuttle launching into my room! Notice how the virtual moon and the space station disappear behind the real globe. I love this part, as it makes the experience feel even more immersive.
But wait, there is more! When I tap the globe again, I can see the inner core of the earth shown on top of my globe. Wow, this looks great! This is only one example I created using an object from my collection. But I’m sure you have objects of your own that you can bring to life, too. So let’s see how you can use Object Tracking for that.
We’ve made it really simple to use Object Tracking in your app.
There are just three steps you need to follow.
First, you’ll need to provide a 3D model representing the real-world object that you want to track. There are some easy-to-use tools to help you get a 3D model if you don’t have one yet. You’re going to use it to train a machine learning model, which is required by Object Tracking. For the training part, you import this 3D asset into the Create ML app. Once training is completed, the result is a reference object, a new file type that we’re introducing for Object Tracking this year.
Finally, you use this reference object in your app to anchor your virtual content and create your experience. There are several tools and frameworks available for you to enable this last step, and I will cover a few examples later in this session. So, let’s first talk about the 3D model needed for object tracking.
As mentioned before, the Create ML app requires a 3D asset to train a machine learning model for your object. For this, your 3D asset needs to be in the USDZ file format. To ensure the best tracking quality, your asset should be as photorealistic as possible. Essentially, you’re looking for a digital twin of your real-world object.
A simple way for you to obtain a photorealistic 3D model is by using our Object Capture technology.
All you need for this is an iPhone or an iPad. For objects that contain glossy or transparent parts you may also provide a multi-material asset obtained from any other acquisition workflow.
If you’re interested in learning more about the details and best practices for Object Capture, please check out our “Meet Object Capture for iOS” session.
Now let’s look at what objects are supported by Object Tracking. It works best for objects that are mostly stationary in their surroundings. Also, aim for objects that have a rigid appearance in both shape and texture. Lastly, your objects should be non-symmetrical, meaning that they have a distinct appearance from all views. Earlier, I showed you my globe, which works great because it has a non-symmetrical texture on top of its spherical shape. For capturing the globe, I removed the stand to ensure only the rigid part of my object is considered for setting up the tracking.
Let’s dive further into what it takes to make this work with your objects and your content. I’ve already covered some of the basics, but I’ll go into much more detail in this next section. I’ll begin by showing you how to create a reference object, and then I’ll walk you through an example of how to use our existing tools to anchor virtual content to your real-world items.
Let’s start with creating a reference object.
As mentioned earlier, Object Tracking requires individual machine learning training for each target object. We made this part really easy for you by integrating it in the Create ML app, which is the perfect fit for this task. All the machine learning training will run locally on your Mac.
When you launch the Create ML app, you can choose from a variety of templates, such as Image Classification or 2D Object Detection. This year, we’re introducing a category called Spatial with our new Object Tracking template.
The training workflow has three simple steps: First, configure the training session with your USDZ assets. Then, train your ML model locally on your Mac. And finally, save the reference object to build your spatial experience on Apple Vision Pro.
Let’s see how this was done for the example that I’ve shown earlier in the talk. When creating a new project with the Object Tracking template, Create ML launches a training configuration view with an empty 3D viewport. My next step is to simply drag and drop my USDZ file from my desktop into this viewport.
Here, I’m using the 3D asset of my globe that you’ve seen earlier. I find the 3D viewport very useful because I can view my 3D model from different angles and confirm that it actually matches the real-world object. It’s a good idea to verify that the scale displayed in the bottom right corner matches the actual dimensions of the object.
You can add multiple objects to your project to have them tracked, similar to the example I showed you earlier. I can simply click the plus-icon next to model sources in the left menu and import another USDZ asset.
There is one more configuration step needed before I can start training my model: selecting the most suitable viewing angle for my items. This helps to optimize the Object Tracking experience, depending on the type of object and how it’s typically placed in your space. For example, many stationary objects may only be seen from an upright or front position. I can utilize this information to guide the machine learning training, in order to achieve the best tracking quality.
There are three viewing angle categories for you to choose from: All Angles, Upright, and Front. Let’s understand each of them in detail. You can find the viewing angle selection right below the 3D viewport.
For the globe asset that I’ve added earlier, let’s take a look at the actual item.
We can see that my globe is mounted on a stand and I can rotate it around an axis that is not aligned with gravity.
Because it can be seen from any angle, I select the “All Angles” option in my configuration view. When using this option, keep in mind that your object should have a distinct and unique appearance from all views in order to achieve high tracking quality. Here is another item from my collection, a microscope, which is expected to be standing on a surface, aligned with gravity. For that reason, I’m choosing the “Upright” setting, which excludes bottom viewing angles. Lastly, let’s take a look at my third item, an oscilloscope. Again, I’m assuming this object will be standing on a surface, but it also doesn’t need to be tracked from backside views. So I’m choosing the “Front” option. This mode excludes backside and bottom viewing angles to limit Object Tracking support to only what matters for my spatial experience.
Notice that the 3D viewport displays a ground plane and a back plane as well as two axes pointing in the assumed up and front direction of the 3D model.
If your 3D model appears to be in the wrong orientation, for example facing backwards, you can use Reality Composer Pro to correct this before running the training.
Now that my configuration is done, I’m ready to start the training.
Going back to my project, I simply click the Train button in the top left corner. Training starts immediately and there is a progress bar to help me keep track of my project status. Training a reference object can take a few hours. The exact duration depends on your Mac configuration. And please note that this is only supported on Apple Silicon Macs.
Once the training is done, I can go to the output tab and save the resulting reference object.
You can find the Create ML app in the developer tools menu in Xcode. Create ML works for a variety of tasks, and it allows you to train machine learning models beyond Object Tracking. If you’re interested in learning more about Create ML, please check out our “What’s new in Create ML” talk.
Now, let’s move on to anchoring virtual content to your reference objects.
There are multiple ways to create an immersive experience with a tracked object. You can anchor your virtual content with Reality Composer Pro and also use our new RealityKit and ARKit APIs. I like to begin my authoring process with Reality Composer Pro, which offers me an intuitive way to edit and place my virtual content.
I’ll start off by creating a new Xcode project using the visionOS app template.
This will automatically create a default scene that I can open in Reality Composer Pro.
Switching over to Reality Composer Pro, I’ll find the same default scene, and I can delete the default sphere.
In this scene, I’ll first create an empty Transform entity and add an anchoring component to it.
This entity serves as the container for my object anchor. To facilitate object tracking, we've introduced a new target named "Object", which I'll go ahead and choose.
Next, I’ll import the reference object generated with Create ML and associate it with my AnchoringComponent.
While I’m using Reality Composer Pro in this example, it’s worth noting that I can also use RealityKit APIs to create my AnchoringComponent at runtime. Moving on, you’ll notice that a semi-transparent visual cue of the original USDZ model appears in the viewport. This is especially helpful when I need to place content accurately with respect to specific parts of my target object. Let’s explore the scenes used in my globe experience. I’d like to show you how I’ve set this up to create some of the immersive effects in this example.
Remember the space shuttle that launched directly from my globe? Turns out I’ve chosen a distinct launch location: Cape Canaveral in Florida.
Using the visual cue in my viewport makes it easy for me to locate this spot on the globe and set up my space shuttle entity. On to another immersive effect: how did I make the virtual moon and space station disappear as they orbit around my globe? Let’s take a look! I’m using a timeline animation for the moon and space station to make them orbit around the globe. My scene contains a separate USDZ globe entity attached as a child node to the anchor entity. This serves as the occluding shape in my scene. Since object tracking updates the transform of the parent anchor entity, the occluder will be aligned with the physical globe. To complete this part, I can apply an occlusion material to the USDZ globe entity using the ShaderGraph editor.
This makes the orbiting objects disappear once they move behind my globe entity.
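If you prefer to set the occluder up in code rather than in the ShaderGraph editor, RealityKit’s built-in OcclusionMaterial can achieve a similar effect. Here’s a minimal sketch of that alternative, assuming the occluder entity in the scene is named "GlobeOccluder" (a placeholder name for this example):

import RealityKit

// Sketch: replace the occluder entity's materials with RealityKit's OcclusionMaterial,
// as an alternative to authoring the occlusion material in the ShaderGraph editor.
// "GlobeOccluder" is an assumed entity name.
func applyOcclusion(in scene: Entity) {
    guard let occluder = scene.findEntity(named: "GlobeOccluder"),
          var model = occluder.components[ModelComponent.self] else { return }

    // OcclusionMaterial hides virtual content behind the entity while staying invisible itself.
    model.materials = [OcclusionMaterial()]
    occluder.components.set(model)
}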
Finally, I’ve also added a tap gesture to this occluder entity to switch between the two experiences using the Behaviors component. Let’s check out what I’ve built so far in action on Apple Vision Pro. I can tap to play my first animation that will launch my orbiting objects, and I can also see them disappear behind my globe.
Great! And here is my second animation after I tapped again - which works just as expected.
Nice. But so far, my app is missing a way to tell me how to start this experience, especially if I’m new to it and don’t know which objects I’m looking for. To improve this, I’ll add a coaching UI to my app that displays a preview of the target object up to the point when it gets detected by Object Tracking. And I’ll also add a virtual label to explain how to interact with my globe.
The RealityKit API provides an extensive toolset to help me achieve all of this and more. Let me go over the steps to implement my coaching UI. First, I’d like to display a preview 3D model to help me find the right object in my space. My coaching UI should react to changes in the anchor state, so I need to check for that in my code. Once the object is tracked, I’d like to show a transition where the displayed 3D model moves to the position of the anchor entity. After that, I’ll add a virtual label with instructions to tap on the globe to begin.
This code sample shows how to display the 3D model of my target object in the coaching UI. I retrieve it from the reference object file with a little help from the ARKit API, then load the USDZ file just like any other model entity. I’ve also set its opacity to 50% to indicate this is a preview entity. And lastly, I add it to my scene for display.
To know whether the object is being tracked, let’s first find the object anchor entity I created earlier in Reality Composer Pro. Then, in the update loop, I can check the entity’s isAnchored flag state and decide what to display in both cases.
For my transition animation, I want the object preview to move towards the tracked object’s position when tracking starts. For this, I need to get the anchor’s transform data. I use a SpatialTrackingSession to ask for the correct authorizations. Then I can access the transform of my object anchor and implement an animation that uses it.
Finally, I’m adding a virtual label near the globe telling me how to start this experience. With RealityView attachments, it’s easy to place SwiftUI elements on RealityKit anchor entities. First, I define the SwiftUI elements in the attachments section under RealityView. Then, in the scene setup, I can find this UI entity and add it as a child node of a reference transform previously defined in Reality Composer Pro. Let’s check out these additions on Apple Vision Pro.
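First, though, here’s a minimal sketch of how such an attachment setup might be wired up. The attachment identifier "GlobeLabel" and the transform name "LabelTransform" are placeholder names for this illustration, not necessarily the ones used in this project:

import SwiftUI
import RealityKit
import RealityKitContent

// Sketch: a RealityView whose SwiftUI attachment is parented to a transform
// entity defined in Reality Composer Pro. "GlobeLabel" and "LabelTransform"
// are assumed names.
struct GlobeLabelView: View {
    var body: some View {
        RealityView { content, attachments in
            if let scene = try? await Entity(named: "Immersive", in: realityKitContentBundle) {
                content.add(scene)

                // Parent the SwiftUI label under the reference transform from the scene.
                if let label = attachments.entity(for: "GlobeLabel"),
                   let labelTransform = scene.findEntity(named: "LabelTransform") {
                    labelTransform.addChild(label)
                }
            }
        } attachments: {
            Attachment(id: "GlobeLabel") {
                Text("Tap the globe to begin")
                    .padding()
                    .glassBackgroundEffect()
            }
        }
    }
}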
This time, before my globe gets detected, the app displays a preview of the target object for me to know what I’m looking for. Once the tracking begins, this preview moves towards the target object, guiding my view to follow it. And I can see a virtual label to start the experience.
Great! This is a good moment to wrap up my example.
We’re also releasing a new ARKit API for Object Tracking this year. It gives you access to your tracked objects’ bounding boxes along with the corresponding USDZ files, as we’ve seen in the previous section. The API delivers refined information about when your objects are ready to track, or whether there were any issues, so that your app can react to such events in a controlled manner.
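To give a rough sense of the shape of that API, here’s a minimal sketch that runs an ObjectTrackingProvider in an ARKitSession and reads anchor updates. The reference object file name is a placeholder, and error handling is kept to a minimum:

import ARKit

// Sketch: run object tracking with the ARKit API and observe anchor updates.
// "globe.referenceobject" is an assumed resource name.
func runObjectTracking() async throws {
    guard let url = Bundle.main.url(forResource: "globe", withExtension: "referenceobject") else { return }
    let referenceObject = try await ReferenceObject(from: url)

    let session = ARKitSession()
    let objectTracking = ObjectTrackingProvider(referenceObjects: [referenceObject])
    try await session.run([objectTracking])

    for await update in objectTracking.anchorUpdates {
        let anchor = update.anchor
        switch update.event {
        case .added, .updated:
            // World-space pose and size of the tracked object.
            let transform = anchor.originFromAnchorTransform
            let extent = anchor.boundingBox.extent
            print("Tracking object \(anchor.id): transform \(transform), extent \(extent)")
        case .removed:
            print("Stopped tracking object \(anchor.id)")
        }
    }
}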
Along with the new API, we’re also publishing an object tracking sample app for you to download and try out on your device.
To learn more, please check out the “Create enhanced spatial computing experiences with ARKit” session. And that’s all there is to explore for object tracking on visionOS. Object tracking can unlock new use cases for spatial computing that require precise placement of virtual content on real-world items. I’ve only shown you a few examples of what’s possible with this new technology, but there is so much more to discover. Reality Composer Pro and RealityKit offer a large variety of exciting features for you to build great spatial experiences beyond what I’ve covered here. I highly recommend that you check out these sessions to learn more.
My entire team and I are really excited to see which ideas and objects you will bring to life.
13:55 - Coaching UI - display object USDZ preview
// Display object USDZ

struct ImmersiveView: View {
    @State var globeAnchor: Entity? = nil

    var body: some View {
        RealityView { content in
            // Load the reference object with ARKit API
            let refObjURL = Bundle.main.url(forResource: "globe", withExtension: ".referenceobject")
            let refObject = try? await ReferenceObject(from: refObjURL!)

            // Load the model entity with USDZ path extracted from reference object
            let globePreviewEntity = try? await Entity.init(contentsOf: (refObject?.usdzFile)!)

            // Set opacity to 0.5 and add to scene
            globePreviewEntity!.components.set(OpacityComponent(opacity: 0.5))
            content.add(globePreviewEntity!)
        }
    }
}
14:13 - Coaching UI - check anchor state
// Check anchor state

struct ImmersiveView: View {
    @State var globeAnchor: Entity? = nil

    var body: some View {
        RealityView { content in
            if let scene = try? await Entity(named: "Immersive", in: realityKitContentBundle) {
                globeAnchor = scene.findEntity(named: "GlobeAnchor")
                content.add(scene)
            }

            let updateSub = content.subscribe(to: SceneEvents.AnchoredStateChanged.self) { event in
                if let anchor = globeAnchor, event.anchor == anchor {
                    if event.isAnchored {
                        // Object anchor found, trigger transition animation
                    } else {
                        // Object anchor not found, display coaching UI
                    }
                }
            }
        }
    }
}
14:31 - Coaching UI - Transform space with SpatialSession
// Transform space

struct ImmersiveView: View {
    @State var globeAnchor: Entity? = nil

    var body: some View {
        RealityView { content in
            // Setup anchor transform space for object and world anchor
            let trackingSession = SpatialTrackingSession()
            let config = SpatialTrackingSession.Configuration(tracking: [.object, .world])
            if let result = await trackingSession.run(config) {
                if result.anchor.contains(.object) {
                    // Tracking not authorized, adjust experience accordingly
                }
            }

            // Get tracked object's world transform, identity if tracking not authorized
            let objectTransform = globeAnchor?.transformMatrix(relativeTo: nil)

            // Implement animation ...
        }
    }
}