Streaming is available in most browsers,
and in the Developer app.
Build immersive web experiences with WebXR
Discover how WebXR empowers you to add fully immersive experiences to your website in visionOS. Find out how to build WebXR experiences that take full advantage of the input capabilities of visionOS, and learn how you can use Simulator to test WebXR experiences on macOS.
Chapters
- 0:00 - Introduction
- 1:55 - Discover WebXR
- 6:15 - Integrate WebXR
- 23:15 - Test and debug WebXR
Resources
- A-Frame WebXR framework
- Adding a web development tool to Safari Web Inspector
- Babylon.js – WebGL and WebXR library
- Forum: Safari & Web
- PlayCanvas – WebGL and WebXR library
- Safari Release Notes
- three.js – WebGL and WebXR library
- Web Inspector Reference
- WebKit Open Source Project
- WebKit.org – Bug tracking for WebKit open source project
- WebXR Chess Garden Demo
- WebXR Device API | W3C
- Wonderland Engine
Related Videos
WWDC24
Hi, my name is Ada Rose Cannon. I am a Safari engineer working on web standards. Today I'm going to show you how to add immersive virtual reality experiences to your website using the WebXR API.
In case you're not familiar with WebXR and what it can do, let's start with a demo.
Here I am, on the home view. I can launch Safari and see a website which I built earlier. There is a button that says Launch VR Experience. I glance at it and tap. Now Safari is asking for my permission to let the website launch an immersive experience. I tap Allow, and the chess garden appears all around me.
In front of me is a chess board where I can play chess against the computer, using natural input to select pieces and move them around. I built this demo to help me get a deeper understanding of chess, and I learned a lot whilst building it, although you wouldn't believe it by how quickly I lose here. I can leave this virtual reality experience by pressing the Digital Crown, or by using the gesture to get to the home view. This exits the experience and takes me back to the web page in Safari. I find it amazing creating experiences for the web using WebXR. I love building something on my Mac and then visiting my creation at full scale, something that was previously only in my imagination and on my monitor. WebXR is available to all Apple Vision Pro users this fall.
This session will introduce WebXR, show you how to add virtual reality capabilities to your website and take advantage of the features specific to Apple Vision Pro, and explain how to test and debug your website on visionOS.
Let me begin by introducing WebXR.
The chess demo you just saw showed virtual reality in action. We were transported into a virtual experience built with hardware-accelerated graphics through WebGL. Safari on visionOS 2.0 supports immersive virtual reality sessions on Apple Vision Pro.
Virtual reality, VR, and the related technology, augmented reality, AR, are together called XR. The API to use XR technology on the web is called WebXR. WebXR is a web standard developed within the W3C's Immersive Web Working Group.
The group designed WebXR to enable immersive experiences on the web which, as well as working cross-browser, can also work on a wide variety of XR hardware with minimal changes. The API is designed to be, if not future-proof, at least future-resistant, a difficult task in a field where new form factors and interaction models are not uncommon. In addition to being robust, the standard is designed to prioritize user privacy and security.
The web can be a scary place, and any new web API can be used by bad actors as well as well-meaning developers. From the very beginning, user protections were built into the WebXR standard, which is designed not to take the user by surprise and to be easy to exit.
The security begins at the server. All WebXR content requires HTTPS, so XR content cannot be injected by machine-in-the-middle attacks.
If the WebXR experience is embedded into a web page with an iframe, then the page which contains the iframe needs to include the attribute allow="xr-spatial-tracking" in the iframe's HTML tag.
This requirement prevents third-party code like advertisements from launching VR experiences without you, as the developer, allowing them to do so.
All WebXR experiences require a user interaction to request to start a session. A website must have a button or some other affordance, where the user takes the first action to begin an XR session. You cannot just drop users into XR. There is no launching a session as soon as the page loads. The user must signal they want XR.
This prevents sessions from launching unexpectedly, surprising and disorienting users with a sudden immersive experience. On visionOS, Safari will ask the user whether the site should be allowed to launch the immersive experience. A WebXR session may also choose to request persistent hand tracking; if it does, a second prompt will ask for consent for that.
It's important that users are informed and clear about what they are granting.
The WebXR specification encourages implementations to take efforts to preserve user privacy throughout a session. On visionOS, there is a gaze-and-pinch interaction model, but knowing what the user is looking at is considered very sensitive information. In WebXR in Safari on visionOS, spatial inputs only reveal where the user is looking at the moment of a pinch, and WebXR sessions only know where the user's hands are whilst they are pinching.
The WebXR standard mandates that a system interaction is reserved for exiting a WebXR session. This requirement ensures that there is a way for users to quickly leave a session if it is making them uncomfortable. Users can easily exit the immersive experience at any time by pressing the Digital Crown, or by using the gesture to get to the home view. As a designer or developer, you can, and perhaps should, provide a user interface in your experience to make it easy to exit. But you should also expect users to just leave via the system interaction if they are in a hurry.
At every level of WebXR, from the network connection, through the duration of a session, to ending a session, user safety and privacy are a priority. First, how does WebXR differ from typical web development? The web has many ways to display content to users through the document. There is the traditional HTML and CSS method, there is the canvas element for rendering pixels directly, and there is WebGL for hardware-accelerated computer graphics.
WebXR sessions, once they have started, hide the document and browser window.
And WebXR sessions exclusively use WebGL for rendering their content. So before we dive into WebXR, we shall take a look at rendering with WebGL, starting with the WebGL hello world: rendering a single square.
First, set up your canvas. Then write your fragment and vertex shaders in the shading language GLSL, define the vertices of your shape in 3D space, and make the calls to WebGL to render your square. Remember to brush up on your vector mathematics and your matrix algebra.
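To give a sense of what that involves, here is a heavily condensed sketch of a WebGL hello world that draws a red square; the shader and buffer details are illustrative rather than the exact code from the slide, and it assumes a canvas element already exists in the page.

```js
// A heavily condensed WebGL "hello world": draw a single red square.
const canvas = document.querySelector('canvas'); // assumes a <canvas> in the page
const gl = canvas.getContext('webgl');

// Vertex and fragment shaders, written in GLSL.
const vertexSource = `
  attribute vec2 position;
  void main() { gl_Position = vec4(position, 0.0, 1.0); }`;
const fragmentSource = `
  precision mediump float;
  void main() { gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); }`;

function compile(type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  return shader;
}

// Link the shaders into a program.
const program = gl.createProgram();
gl.attachShader(program, compile(gl.VERTEX_SHADER, vertexSource));
gl.attachShader(program, compile(gl.FRAGMENT_SHADER, fragmentSource));
gl.linkProgram(program);
gl.useProgram(program);

// The four corners of the square, as a triangle strip.
const vertices = new Float32Array([
  -0.5, -0.5,   0.5, -0.5,   -0.5, 0.5,   0.5, 0.5,
]);
const buffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
gl.bufferData(gl.ARRAY_BUFFER, vertices, gl.STATIC_DRAW);

const position = gl.getAttribLocation(program, 'position');
gl.enableVertexAttribArray(position);
gl.vertexAttribPointer(position, 2, gl.FLOAT, false, 0, 0);

// Finally, ask WebGL to draw the square.
gl.clearColor(0, 0, 0, 1);
gl.clear(gl.COLOR_BUFFER_BIT);
gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
```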
As you can see, that was a lot of code for such a simple output.
No one will ever expect you to work solely with raw WebGL; it's rarely necessary. I rarely dabble with raw WebGL myself; it's just not a sensible way for me to work. Fortunately, there is a wide variety of WebGL frameworks to help us get started quickly. Each of these libraries is appropriate for different levels of experience with both graphics and JavaScript. They also have WebXR support integrated, so even though WebXR isn't too difficult to use directly, they often provide abstractions for WebXR too. three.js, Babylon.js and PlayCanvas each use JavaScript, but have a slightly different syntax.
Wonderland Engine has a graphical interface to provide high-level tools for building your scene.
Finally, there is A-Frame. It is declarative. It's based on custom HTML elements for describing a 3D environment. Let's take a closer look at A-Frame.
I quite like introducing A-Frame to people new to WebXR. It is nice and concise; the code to get started with a simple demo can fit on a single screen. Its custom HTML elements give you a feel for putting together a scene built with WebGL, whilst giving web developers a familiar way of working: HTML elements and events. It has an active community and plenty of community-built components. And when you are ready to get your feet wet building your own components with JavaScript, you'll be pleased to discover that the declarative HTML elements are built using the three.js library, so you can take what you learn with A-Frame to your future projects. Let's take a look at rendering something similar to the red square example with A-Frame.
After including the library's JavaScript file in the document head, I can use its suite of custom HTML elements. I add the a-scene element, which wraps the A-Frame content, and I add a red box, placed at eye level and slightly into the scene. The box has the default width, height and depth of 1 meter. Meters are the units which WebXR uses; since everything is in real scale, the virtual objects have real sizes.
Great, we already have the box working. The library handles the WebXR parts for us, and even includes the button to start a WebXR session. Let's build things up a little. Here I have described some additional primitive shapes, such as a sphere and a cylinder. I also add a sky box and a plane, which I rotate so that it lies on the ground.
Next, I add the cursor component to the a-scene. This uses a ray caster to find what the spatial inputs are pointing at, and fires virtual events on them. The virtual events are named to match similar events in the 2D document: click, mouseenter and mouseleave, even though they aren't made using a real mouse.
Finally, I attach the animations to the shapes I want to be animated.
And that's the complete code for this demo.
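As a rough sketch, the markup might look something like this; the positions, colors, animation values and the cursor configuration here are illustrative placeholders rather than the demo's real code.

```html
<!-- A rough A-Frame sketch: a few primitives, a sky, a ground plane,
     a cursor for spatial input, and a simple animation. -->
<script src="https://aframe.io/releases/1.5.0/aframe.min.js"></script>
<a-scene cursor="rayOrigin: xrselect" raycaster="objects: [data-interactive]">
  <!-- A red box at eye level, one meter in each dimension by default. -->
  <a-box data-interactive position="-1 1.5 -3" color="#e04040"
         animation="property: rotation; to: 0 360 0; dur: 4000; loop: true"></a-box>
  <a-sphere data-interactive position="0 1.25 -5" radius="1.25" color="#4060e0"></a-sphere>
  <a-cylinder data-interactive position="1 0.75 -3" radius="0.5" height="1.5" color="#f0c040"></a-cylinder>
  <!-- Sky box, and a plane rotated so it lies flat on the ground. -->
  <a-plane position="0 0 -4" rotation="-90 0 0" width="8" height="8" color="#7bc8a4"></a-plane>
  <a-sky color="#ececec"></a-sky>
</a-scene>
```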
Here is the result. It's a variation on the A-Frame hello world. It demonstrates how the cursor component lets us look and pinch to fire events on objects. With a little imagination, you can see how you could construct more complete immersive scenes from these foundations. In A-Frame, you don't really need to touch the WebXR API directly.
But once you need to start extending components, or getting deeper into other frameworks, it can be really useful to know how WebXR works. I've broken the WebXR lifecycle into 4 segments. Before you have a session, you should find out what can even be supported.
The first thing you want to do before using WebXR is to make sure that it's even possible to start a session. You can use the isSessionSupported method on the navigator.xr object to check if an immersive VR session is supported.
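As a minimal sketch, that check might look like this; enterVRButton is an assumed element for your Launch VR Experience button, not an API name.

```js
// Only reveal the entry button if immersive VR sessions are supported.
// (Run this in a module or async function so await is allowed.)
if (navigator.xr) {
  const supported = await navigator.xr.isSessionSupported('immersive-vr');
  enterVRButton.hidden = !supported; // enterVRButton is an assumed element
}
```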
Next, if VR is supported, you can add a button to let the user enter XR. More than likely, your library of choice will take care of requesting a session, but we will take a look at how it works, because requesting a session has some interesting options.
When you request your session, you include any additional features you want, such as hand tracking. Requesting a session is what triggers the dialogue to pop up, to let the user allow the session.
There are two feature lists. If you can handle a feature not being present, put it in the optional features list. If you are building something that absolutely cannot handle a particular feature being missing, put it in the required features list. But be warned: if any required feature cannot be provided, either because it is not supported or because the user denies access to it, then your request for a session will be rejected. So use optional features wherever you can. Once you have your session, you can request your reference space. The reference space describes the origin of your coordinate system.
local-floor is pretty commonly available, and it is useful because it puts the origin of the scene near the user's feet. It's good for standing experiences.
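Continuing the sketch, requesting a session and a reference space might look like this; the particular feature choices are just examples, and enterVRButton is the same assumed element as before.

```js
// Requesting the session is what triggers Safari's permission prompt,
// so do it in response to the user's tap on the button.
enterVRButton.addEventListener('click', async () => {
  const session = await navigator.xr.requestSession('immersive-vr', {
    requiredFeatures: ['local-floor'],    // the request is rejected if unavailable
    optionalFeatures: ['hand-tracking'],  // carry on without it if denied
  });

  // The reference space is the origin of the coordinate system;
  // local-floor places that origin near the user's feet.
  const referenceSpace = await session.requestReferenceSpace('local-floor');
});
```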
Stepping away from WebXR for a second: when you are doing standard animation on the web, you use requestAnimationFrame on the window to get a callback when the next frame of the window is due to be rendered. You perform the changes you want to make for that frame, and then you call requestAnimationFrame again to queue up the actions for the next frame. This is known as a frame loop.
The WebXR frame loop is very similar, but a typical XR device has different frame timing compared to the computer driving the experience.
A web browser will typically run at 60 frames per second; WebXR displays often run much faster. So the WebXR session provides its own requestAnimationFrame method on the session, which is synced to the WebXR display rather than to the window.
Here we establish our frame loop by immediately requesting the next frame, so this function gets called again. The information you are probably most interested in with WebXR is the numerical positions of the various tracked objects. There is no global coordinate system for XR. Instead, all positions are given relative to the reference space which we requested after we started the session.
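A minimal sketch of that frame loop, assuming the session object from the earlier sketch:

```js
function onXRFrame(time, frame) {
  // Immediately queue up the next frame to keep the loop going.
  frame.session.requestAnimationFrame(onXRFrame);

  // ...update and render the scene for this frame here...
}

// Kick off the frame loop once the session has started.
session.requestAnimationFrame(onXRFrame);
```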
In WebXR, spaces, such as a controller's target ray space, represent locations but don't have numerical values themselves. The numerical position is known as a pose, which contains the actual coordinates at the time of the next frame.
You can use this for interactions and rendering.
You use the getPose methods on the frame to get these poses.
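Building on the frame loop above, here is a rough sketch of turning spaces into poses inside the frame callback; getViewerPose is the related method for the headset itself, and referenceSpace is the space requested earlier.

```js
// Inside the frame callback: turn spaces into poses for this frame.
const viewerPose = frame.getViewerPose(referenceSpace); // the headset's pose

for (const inputSource of frame.session.inputSources) {
  const pose = frame.getPose(inputSource.targetRaySpace, referenceSpace);
  if (pose) {
    // pose.transform holds the position and orientation to use for
    // interactions (such as ray casting) and for rendering.
    const { position, orientation } = pose.transform;
  }
}
```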
The session can end when you, as the developer, call the method to end the session, or when the user asks the browser to end the session themselves, for example on Apple Vision Pro by pressing the Digital Crown. When the session ends for any reason, the end event is fired to let you know; at this point you can re-show the buttons to enter WebXR again.
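A short sketch of handling the end of a session; exitButton and enterVRButton are assumed elements, not API names.

```js
// End the session from your own UI, for example an in-experience exit button.
exitButton.addEventListener('click', () => session.end());

// The end event fires however the session ends, including when the user
// presses the Digital Crown. Re-show the entry button so they can return.
session.addEventListener('end', () => {
  enterVRButton.hidden = false;
});
```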
We've just taken a look at WebXR from beginning to end. Let's take a closer look at what happens during a session, because as well as the regular animation frames, there are also interesting events to help you work with spatial inputs.
Traditional XR headsets have hardware controllers which, as well as buttons, are also tracked in 3D space, so you know their position and orientation. Some devices also use hand tracking. Hand tracking inputs contain information about what a user's hand is doing: enough information to let you animate a 3D model of each of the user's hands. These inputs in WebXR are known as tracked pointers.
visionOS has natural input. Natural input uses a combination of gaze and pinch to interact with the virtual content. We've worked within the W3C to add support for this input type to WebXR. In WebXR, these inputs are known as transient pointers, because they only exist whilst the user is making a gesture.
The XR session contains a list of all the input sources which you can iterate over.
An XR input source represents some sort of spatial input method.
Each XR input source has a target ray space which, if you draw a line along its negative Z direction, represents what the XR input source is pointing towards. For transient pointer, this space goes from the user's eyes towards what they are looking at. Inputs may have a grip space too, which represents where the input is being held. For transient pointer, this represents where the thumb and the finger are meeting. Transient pointer inputs are unusual in that they only exist for the duration of an interaction.
When the interaction starts, a number of events will be fired: an input sources change event for when an input is added or removed, and select events to describe a select action. Let's step through a play-by-play of what happens in WebXR when a user pinches in visionOS.
Initially, there are no inputs. The website doesn't know what I'm looking at or what my hands are doing. Then I pinched my thumb and my finger together.
A new input is created and added to the session's input sources list, and its target ray mode tells us that this is a transient pointer.
An inputsourceschange event is fired to let us know the list was updated and this new input was added.
And immediately, a selectstart event is also fired on the input, because we are selecting.
Next, I move my hand around a little.
There are no new events, although the positions represented by the grip space and the target ray space get updated.
The grip space still follows the point where my thumb and my finger touch. The target ray space does not continuously track my gaze. It instead moves as if it's attached to my hand, letting me adjust the ray after the initial pinch by moving my hand.
The select event is fired to let us know that the gesture has successfully completed.
selectend lets me know that the gesture has finished, regardless of whether it completed successfully. Having two events is useful for situations where an input loses tracking or gets cancelled partway through an action. Use selectend for cleaning up, and select for actions to be performed once the user has successfully confirmed something. Finally, the input is removed.
And a new event is fired, letting you know that the input sources list has changed.
Now we are no longer pinching, we are back where we started. There are no inputs and the website does not know what my hands or gaze are doing.
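Putting those steps together, a rough sketch of listening for transient pointer inputs and select gestures might look like this, assuming the session object from the earlier sketches:

```js
// Fired when inputs are added or removed, such as when a pinch begins or ends.
session.addEventListener('inputsourceschange', (event) => {
  for (const input of event.added) {
    console.log('Input added, target ray mode:', input.targetRayMode); // 'transient-pointer'
  }
  for (const input of event.removed) {
    console.log('Input removed');
  }
});

// selectstart fires as the pinch begins, select when the gesture completes
// successfully, and selectend when it finishes for any reason.
session.addEventListener('selectstart', (event) => {
  // event.frame and event.inputSource are enough to ray cast from the
  // target ray space and pick the object the user was looking at.
});
session.addEventListener('select', (event) => {
  // Confirm the action, such as placing a chess piece.
});
session.addEventListener('selectend', (event) => {
  // Clean up, whether or not the gesture completed successfully.
});
```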
Okay, now we have seen the step-by-step, let's take a look at that chess demo from earlier and notice what is happening.
At the start, the website has no idea what I am looking at or where my hands are. This is great for my privacy and the website can still deliver a great experience without this knowledge.
I am going to move this pawn.
I look at it and then pinch my finger and thumb together.
At the instant I pinched, this created a new input, with the grip space at this pinch point and the target ray space going from my eyes towards the pawn.
A selectstart event is fired.
This event contains a reference to the frame and input source. This is everything I need to find out the object I was looking at. The website uses a ray caster to pick the object I looked at.
Each frame, the experience is going to move this object in sync with the target ray space as I move my hand around. Notice how it doesn't follow my gaze, but instead moves with my hand.
I then place it where I want, then release the gesture.
The selectend event is fired and the website places my piece on the nearest square. The input is removed, and the website no longer knows what my hand is doing or where I am looking.
I've talked a lot about what happens without hand tracking. WebXR on visionOS supports hand tracking too; you have to request it as a feature when you start your session.
If you request and are granted hand tracking, then any detected hands will be made available as tracked pointer inputs. These inputs can be expected to be available for as long as the hands can be tracked.
Each of these inputs has a hand object with joint spaces, so you can display the user's hands however you please using WebGL. And you really must do so, because if the user cannot see their own hands, it can get pretty uncomfortable.
Perhaps the biggest difference between having hand tracking enabled and not is that if you request hand tracking, you are expected to draw the user's hands yourself with WebGL, but without hand tracking, we will show the user's real hands. When hand tracking is enabled, transient pointer inputs are still available. On visionOS, all select events are associated with transient pointer inputs, so you can build experiences which make use of both hand tracking and natural input together.
This does mean that if you request hand tracking, and the user pinches with both hands at the same time, you could have up to 4 inputs simultaneously.
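As a sketch, requesting hand tracking and reading the joints each frame might look something like this; it combines two fragments, and assumes the session, frame and referenceSpace from the earlier sketches.

```js
// Hand tracking must be requested as a feature when starting the session.
const session = await navigator.xr.requestSession('immersive-vr', {
  optionalFeatures: ['hand-tracking'],
});

// Inside the frame callback: read the joint poses so you can draw the hands.
for (const inputSource of frame.session.inputSources) {
  if (!inputSource.hand) continue; // not a hand-tracked input

  for (const [jointName, jointSpace] of inputSource.hand) {
    const jointPose = frame.getJointPose(jointSpace, referenceSpace);
    if (jointPose) {
      // jointPose.transform and jointPose.radius describe each joint,
      // which you can use to render a model of the user's hands in WebGL.
    }
  }
}
```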
This is the sort of thing you can do with hand tracking. It is very useful for precise close-up interactions.
But actions at range with hand tracking can be a little tricky to perform precisely. In this example, I am using the proximity of two fingers to let me pick up objects by their handles and move them around. To do this, I had to compare the distance between fingertips and work out whether the center point was intersecting the handle.
You can make use of transient pointer for both close-up interactions and for interacting with objects remotely.
And it is still available when hand tracking is enabled.
Transient pointer and hand tracking are very cool, but the web has many other exciting features you can use. Even though the document is hidden during WebXR, you still have access to many of the standard web APIs you would use when building a traditional website.
These are some features which work especially well with WebXR.
Outside of the WebXR-specific inputs, you can still use traditional gamepads, like PlayStation 5 controllers. Their buttons and analog sticks can be very useful for piloting a little VR vehicle. Speech synthesis and speech recognition are great for building vocal interfaces, where you can speak commands to perform actions and hear back a response. One of the most important things to pair with great stereo graphics is atmospheric audio and sound effects.
The Web Audio API's PannerNode is great for building spatial soundscapes for a truly immersive experience.
It really adds to the immersion when an object makes sounds that seem to come from the object itself.
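As an illustrative sketch, positioning a sound effect in space with a PannerNode could look like this; the file name and coordinates are placeholders, and it should run in a module or async function.

```js
// A small sketch of positional audio: the sound appears to come from a
// point in space that matches a virtual object's position.
const audioContext = new AudioContext();
const response = await fetch('piece-move.mp3'); // placeholder asset name
const buffer = await audioContext.decodeAudioData(await response.arrayBuffer());

const panner = new PannerNode(audioContext, {
  panningModel: 'HRTF', // richer spatialization for listening on a headset
  positionX: 0.5,       // meters, matching the virtual object's position
  positionY: 1.2,
  positionZ: -2.0,
});

const source = new AudioBufferSourceNode(audioContext, { buffer });
source.connect(panner).connect(audioContext.destination);
source.start();
```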
Now we have introduced WebXR, let's take a look at how to test and debug on visionOS specifically. When it comes to testing your content, you can use a real Apple Vision Pro or the visionOS simulator. For this example, we will use the simulator. WebXR requires a secure context; aside from an HTTPS website, the other secure context is localhost on your development machine. Here, I am already running an HTTP server serving the chess demo from earlier. I can open the page and test the button to launch an XR session. I can use the W, A, S and D keys to move around in 3D space, and I can right-click and drag to rotate the camera. Clicking on the window during WebXR will simulate transient pointer inputs.
Note that you can't test hand tracking through the simulator. For that you'll need a real device. I've set up some logging on this page to help me with performance. It tells me the current frame rate and the number of draw calls. We can use Web Inspector in macOS Safari to inspect this page to ensure it is working correctly.
The page is available from the Develop menu. From here, I can see my logs, and I can inspect, pause and step through the JavaScript, just like I would for a local web page.
I primarily use the simulator for rapid iteration when building the experience. Then I would push the changes to the server and use the real device to verify my changes.
You can find out more information about debugging and testing on visionOS in this WebKit blog post, which should hopefully get you set up for building WebXR experiences for Apple Vision Pro, even if you don't own a device yet.
Also watch Patrick's talk from WWDC23, Rediscover Safari developer features, for a complete view of the Safari developer tools. WebXR is new on Apple platforms and it is still early days. We are really excited to see what you'll be able to build with WebXR, and I hope you'll keep following these developments into the future.
Even if you don't have an Apple Vision Pro, the simulator is a great way to ensure your WebXR project works with visionOS and the new transient pointer inputs. If you've never tried WebXR before, I encourage you to give it a try yourself. There are large and active communities for WebXR and WebGL, with great resources to help developers new to graphics coding get started.
Check out the resources associated with this session and watch the related sessions. Thank you for joining me today and enjoy WWDC.