Developing for the Apple Vision Pro

An overview of the considerations and challenges we faced while developing Avatour for the Apple Vision Pro, illustrating the current gap between promise and reality, and how we managed to close it. Includes some detail on WebXR and Unity implementation for the AVP.

March 12, 2024

Avatour

There’s obviously a lot of excitement within the world of immersive developers for the new Apple Vision Pro. We’ve predicted and analyzed and anticipated Apple’s entry into the XR (excuse me, spatial computing) space for years, and the device is finally here in all its glass-and-aluminum glory. Now we get to see what we can do with it.

TL;DR: Here’s what we did with it:

I believe we’ve demonstrated the very first live 360° video on Apple Vision Pro, and certainly the first real-time 360° video call. Our beta release is coming soon. But there were a lot of learnings along the way, which is the point of this article. 

I won’t provide my own in-depth assessment of the Vision Pro as a product, as it’s been done already with great skill. (Here’s my favorite take.) I will offer that I found the hardware form factor a bit disappointing (the much-maligned Quest Pro is much more comfortable), but the overall software experience, the UX, is indeed a breakthrough. As it did for the personal computer and the cell phone, Apple has created a coherent and fully formed interaction concept for XR that sets the standard for everything to come after.

But: what’s it like to develop for? Or, more precisely - what’s it like to port a successful existing XR experience to the Vision Pro?

Crash Course

In early February, with the shrink wrap from the lovely Apple packaging still on our floor, we received an invitation to join the Apple Vision Pro Workshop at Founders Inc. This turned out to be a terrific opportunity to work in their beautiful San Francisco offices, with views of the Golden Gate, alongside almost two dozen other developers. A quick shout-out and thanks to the Founders Inc. gang for putting this together - it was a great environment and a lot of fun!

I’ve been on the business side at Avatour for years, but I still remember enough about coding to be occasionally dangerous. With our dev team mostly committed to long-standing customer requirements, and with only one AVP in our possession, I wound up digging into the development myself. This entailed a crash course in developing on Apple platforms in general, as well as for the Vision Pro. Currently there are basically three options to get something running on the device.

Apple’s own SwiftUI framework makes it very easy to get started with VisionOS development. It allows you to conjure up most major UX elements with just a few lines of almost script-like code. If you’re building a new app from scratch and you don’t need to extend beyond the Apple user ecosystem, there’s little reason to look further. But there are other options.

Unity is a game engine which - mostly due to a lack of interest in XR from its main competitor, Unreal Engine - has a de facto monopoly in the XR world. The vast majority of apps for the Meta Quest platform, including Avatour, are built on Unity. Unity announced support for the Vision Pro last summer, claiming it would be relatively easy to port existing VR apps to the new platform.

Finally, there is WebXR. Formerly WebVR, and not to be confused with the too-early-for-its-own-good VRML, WebXR is the latest attempt to create an immersive version of the Web. Chrome and Firefox have supported WebXR for years, and before the release of the Vision Pro, Apple announced they would include “experimental” support for WebXR in Safari. (But only for Vision Pro. iOS and macOS users, you’re out of luck, for now at least.) WebXR has the advantage of not requiring an app download; but it has the distinct disadvantage of requiring the user to set a bunch of obscure “feature flags” in a settings menu.
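One practical consequence for developers: you can’t assume WebXR is present at all. Here’s a minimal capability check, sketched in TypeScript against the standard WebXR Device API (types assume the @types/webxr definitions) - with the flags off, navigator.xr should simply be undefined:

```ts
// Minimal WebXR capability probe (standard WebXR Device API).
// With Safari's experimental flags off, navigator.xr is undefined,
// so this resolves to false instead of throwing.
async function webxrAvailable(): Promise<boolean> {
  if (!('xr' in navigator) || !navigator.xr) return false;
  try {
    return await navigator.xr.isSessionSupported('immersive-vr');
  } catch {
    return false;
  }
}

// Usage: only show an "Enter Immersive" button when this resolves true.
webxrAvailable().then((ok) => console.log('WebXR available:', ok));
```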

So in figuring out how to bring Avatour to VisionOS, we had a decision to make.

Decision Time

We dug into the details on our three potential options: SwiftUI, Unity, or WebXR. Right off the bat, there are major differences to consider.

The Vision Pro permits apps to run in the “Shared Space” or a “Full Space.” Shared Space apps run simultaneously in separate windows that the user can reposition - sort of like a giant desktop. Full Space apps take over the complete UX, and can run in three different “Immersion Styles”: mixed, progressive, and full. (For more information on this basic paradigm, check out this Apple developer training video.)

We figured the ideal Avatour implementation should permit operation in either a Shared or Full Space, so that users could decide whether or not to be fully immersed. In the Shared Space, Avatour should work similarly to other AVP apps, in a resizable, movable window. In Full Space mode, our app should ideally use the progressive immersion style, so that the Digital Crown could be used to “dial” the amount of immersion up or down.

Now, we’re not starting from scratch. We have an existing Web client for our plain old desktop/mobile customers – think Shared Space – and an existing Unity app for our VR users – which translates to Full Space, Full Immersion. We’d ideally want to reuse some of that existing code in order to get our AVP client out quickly. So how do we get from here to there?

At this point, the choirboys start singing, “you can’t always get what you want...” 

Among the potential development options, only SwiftUI enables use of the “progressive” immersion style. But for those of us who have complex existing XR apps, a complete re-write in SwiftUI is a daunting prospect. We didn’t know what other issues we might run into in Unity or WebXR, but we knew we didn’t have the resources to rebuild our entire app in SwiftUI, at least not without a solid business case for the AVP user community.

So we set aside SwiftUI and began work on both other options, hoping that one or the other of them might prove feasible – and quickly.

Porting a Unity VR app to Vision Pro

A lot of XR developers are in the same boat as we are: they have an existing Unity app for Meta Quest and would love to get it running on Vision Pro. But as we learned, porting an existing VR app - especially a complex one like Avatour - is no walk in the park.

First, there are significant inherent limitations that come with Unity on AVP. As mentioned above, the “progressive” immersion style is flat out not supported, and it’s not currently on the Unity VisionOS roadmap, so don’t expect it any time soon. Also, if you want to deploy a “Shared Space” version of your app, there doesn’t appear to be any tooling for Apple UI elements. This is a bigger problem than it might seem: in our new gaze/pinch world, the SwiftUI interface paradigm is quite specific, and maintaining it matters for usability and comfort. Windowed apps that lack that familiar UX will not feel right to users.

OK. So we were willing to abandon our hopes for progressive immersion and a “shared space” version, and set out to build a “fully immersive” version of our app on Vision Pro - basically a straight port. Unity’s instructions on porting a VR app to Vision Pro are where we started.

We got pretty far pretty quickly. We had already converted our app to the Universal Render Pipeline (URP), and after disabling some third-party components, we actually got the app to build and run on Vision Pro. Awesome! And so we saw the Avatour login screen on the Vision Pro for the first time!

And then … there was no way to proceed further. 

Unity PolySpatial doesn’t support keyboard text entry.

OK, maybe we skip the login scene for now. Build a version that hard-codes the login, and go straight into an Avatour session. And we know live videoconferencing will be tricky, so let’s just try to get 360° video playback working. Building our own custom video player for Vision Pro would be a whole project in itself - maybe we can use the Unity video stack?

Ah. Unity PolySpatial doesn’t work with the Unity video player.

Hm. 

OK, so let’s just go straight to live 360° videoconferencing. We have our own WebRTC wrapper for Unity … can we get that to compile for VisionOS?

Well, no. Our wrapper has a dependency on libwebrtc, the massive multi-platform open-source project which, for very good reasons, relies on the GN build system, which in turn relies on the Ninja build tool, which in turn relies on the ld64 linker for Apple targets - a linker that hasn’t been updated recently and has no idea what a Vision Pro is. We got as far as getting object files built for VisionOS, but couldn’t manage to get them to link.

At this point we were 72 hours from demo day, and it was becoming increasingly clear that no matter how many shortcuts we took, building on Unity was going to be a long road. So we tried something else.

Porting a Web app to WebXR on Vision Pro

Before even getting our hands on the Vision Pro hardware, we had access to the Vision Pro simulator, and with it the early version of Safari with the “experimental” WebXR support. We tested various WebXR video playback code in the simulator and discovered it didn’t work. More precisely, after entering immersive mode, the video playback paused inexplicably and could not be restarted, no matter how many play commands were sent.

After getting the hardware in our hands, we tried again, looking at WebXR test pages like this one. We found the same result: 360 images worked great, but 360 videos would pause.

We also looked a bit into the interaction paradigm. We discovered that hand tracking worked great, but selecting things … well, not so much. It turned out that the select callback was not triggered by the gaze/pinch action - in fact, it was not supported in any form in the initial release of WebXR on Safari. Kind of an important thing to include, wouldn’t you think?
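For context, the missing piece is the standard WebXR input flow. Here’s a rough TypeScript sketch of what an app expects to work - onUserSelected is our own hypothetical handler, and the types assume @types/webxr:

```ts
// Request an immersive session and listen for the "select" event that a
// gaze + pinch gesture is supposed to raise. On the initial VisionOS
// release this listener simply never fired; the 1.1 beta fixed that.
async function enterImmersive(
  onUserSelected: (source: XRInputSource) => void
): Promise<void> {
  const session = await navigator.xr!.requestSession('immersive-vr');
  session.addEventListener('select', (event) => {
    // inputSource.targetRaySpace is what you hit-test against the scene
    // to figure out what the user was looking at when they pinched.
    onUserSelected((event as XRInputSourceEvent).inputSource);
  });
}
```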

These two basic issues led us to focus most of our effort on Unity. It was only after we were well and truly defeated by dependencies that we came back to WebXR. And here, we caught a break.

In late February, Apple released a beta 1.1 update to VisionOS. This, it turned out, updated WebXR to include the select action. And it inspired us to dig deeper into what was causing the playback failure.

Our crack three.js programmer Pavlo was the one to figure it out. It wasn’t that the video wasn’t playing; decoding was in fact continuing in the background. It’s just that the texture was not updating. Once Pavlo started updating the texture explicitly, suddenly, we had video!
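In three.js terms, the workaround amounts to forcing a texture upload on every frame rather than relying on the automatic refresh. A simplified sketch of the idea - not our production code, and the video element id is hypothetical:

```ts
import * as THREE from 'three';

// A 360° video element assumed to already exist in the page and be playing.
const video = document.getElementById('stream360') as HTMLVideoElement;

const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(75, innerWidth / innerHeight, 0.1, 1000);
const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(innerWidth, innerHeight);
renderer.xr.enabled = true; // required for WebXR presentation
document.body.appendChild(renderer.domElement);

// Equirectangular playback: project the video onto an inside-out sphere.
const texture = new THREE.VideoTexture(video);
const sphere = new THREE.SphereGeometry(500, 60, 40);
sphere.scale(-1, 1, 1); // invert the sphere so the video faces inward
scene.add(new THREE.Mesh(sphere, new THREE.MeshBasicMaterial({ map: texture })));

// The fix: decoding kept running in immersive mode, but the texture was
// never refreshed - so we mark it dirty ourselves on every frame.
renderer.setAnimationLoop(() => {
  if (video.readyState >= video.HAVE_CURRENT_DATA) {
    texture.needsUpdate = true;
  }
  renderer.render(scene, camera);
});
```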

Next up was to test the WebRTC functionality that had been so problematic in Unity. And here, it was web technologies FTW! WebRTC worked just, well, normally. Just like it would in a browser. Because it is in a browser.
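For the curious, the receive side really is bog-standard browser WebRTC. A sketch, assuming signaling is already wired up (sendToPeer is a hypothetical stand-in) and reusing the same video element that feeds the texture:

```ts
// Hypothetical signaling hook; any WebSocket or HTTP channel works here.
declare function sendToPeer(message: unknown): void;

const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }],
});

// When the remote 360° track arrives, route it into the <video> element
// that backs the three.js texture - no platform-specific wrapper needed.
pc.ontrack = (event) => {
  const video = document.getElementById('stream360') as HTMLVideoElement;
  video.srcObject = event.streams[0];
  video.play().catch(console.error); // autoplay may require a user gesture
};

// Trickle ICE candidates to the far end over the signaling channel.
pc.onicecandidate = (event) => {
  if (event.candidate) {
    sendToPeer({ candidate: event.candidate.toJSON() });
  }
};
```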

And that, basically, was that. At 11:55pm the night before demo day, we saw live 360° video on the Vision Pro for the first time. We’re now working to clean up the user experience for a beta release, but there are no technical blockers at this point.

Concluding Thoughts

If you’re building a new, ground-up app for the Vision Pro, there’s no choice to be made: go directly to SwiftUI.

If you’ve got an existing Unity app, you have a difficult choice to make. The Unity team is moving quickly to add capability to PolySpatial - in fact, in the two weeks since we did this work, both keyboard support and video support have been released. You can expect a lot of these basic issues to get solved. But some core limitations remain - notably, the lack of a proper Swiftian UX for windowed elements in “Shared Space” mode, and the absence of any support for the “progressive” immersion style.

Like a lot of XR devs, we’re hoping Unity will support ‘Unity as a Library’ on VisionOS. This would potentially allow us to create a Swift UX for the lobby and settings scenes, switching to Unity only for the fully immersive portion of the experience. This might not be the best of both worlds, but it’s at least a way to bridge them.

WebXR is obviously not a great option at the moment, buried as it is behind several layers of settings menus. For our application, though, it largely fills the bill, and with minimal development to boot. WebXR is quite effective on the Meta browser, so the question isn’t whether WebXR could work on the Vision Pro, but whether Apple will let it. Apple has a long and ugly history of slow-walking cross-platform web technologies on iOS in order to push devs (and revenue) toward native apps. A cynic would expect the same from their VisionOS strategy. But maybe evidence of market penetration would influence their decisions.

So if you want to do something for the future of cross-platform XR, and you happen to have dropped $3500 on a headset, please go enable WebXR on your Vision Pro today!
