When Apple unveiled Vision Pro, the developer community split into two camps: those who saw it as the future of computing, and those who dismissed it as an expensive novelty. After shipping multiple visionOS apps over the past year, we've landed firmly in the first camp — but with a lot of hard-won lessons about what actually works in spatial computing.
This post is a look at the real-world patterns, pitfalls, and design decisions we've navigated while building for Vision Pro at RainByte Studios.
Unity PolySpatial vs. Native: When to Use What
The first decision every Vision Pro project faces is the toolchain. Apple offers native development with SwiftUI and RealityKit. Unity provides PolySpatial, which lets existing Unity projects render in visionOS's shared space. Both are production-ready in 2026 — but they serve very different use cases.
We use Unity PolySpatial when:
- The project involves complex 3D scenes, physics, or game mechanics that already exist in Unity
- Cross-platform deployment matters (Quest, SteamVR, iOS, and Vision Pro from one codebase)
- The team's primary expertise is in C# and Unity's ecosystem
We go native with SwiftUI + RealityKit when:
- The app is primarily UI-driven with occasional 3D elements
- Tight integration with visionOS system features (SharePlay, Persona, system gestures) is essential
- Performance is paramount and every frame counts
- The client wants the smallest possible binary and fastest startup time
The best approach isn't one or the other — it's knowing which tool fits the problem. We've shipped projects using both, and sometimes a hybrid approach where the UI layer is native SwiftUI while the 3D experience runs through Unity.
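As a rough illustration, the two checklists above can be folded into a small decision helper. This is a sketch only: `ProjectProfile`, `Toolchain`, and `recommendToolchain` are hypothetical names we made up for this post, and the vote-counting heuristic is an assumption, not a rule we apply mechanically.

```swift
// Hypothetical project traits taken from the two checklists above.
// None of these names come from Apple's or Unity's SDKs.
struct ProjectProfile {
    var hasExistingUnityContent: Bool      // complex 3D scenes, physics, game mechanics
    var needsCrossPlatform: Bool           // Quest, SteamVR, iOS, Vision Pro
    var teamIsPrimarilyCSharp: Bool
    var isMostlyUI: Bool                   // UI-driven with occasional 3D
    var needsDeepSystemIntegration: Bool   // SharePlay, Persona, system gestures
    var isPerformanceCritical: Bool
    var wantsSmallBinary: Bool
}

enum Toolchain {
    case unityPolySpatial
    case native            // SwiftUI + RealityKit
    case hybrid            // native SwiftUI UI layer, Unity for the 3D experience
}

// Assumed heuristic: count how many checklist items point each way.
func recommendToolchain(for p: ProjectProfile) -> Toolchain {
    // Existing Unity content plus a hard need for system features suggests a hybrid.
    if p.hasExistingUnityContent && p.needsDeepSystemIntegration {
        return .hybrid
    }
    let unityVotes = [p.hasExistingUnityContent, p.needsCrossPlatform,
                      p.teamIsPrimarilyCSharp].filter { $0 }.count
    let nativeVotes = [p.isMostlyUI, p.needsDeepSystemIntegration,
                       p.isPerformanceCritical, p.wantsSmallBinary].filter { $0 }.count
    // Ties fall back to native in this sketch.
    return unityVotes > nativeVotes ? .unityPolySpatial : .native
}
```

In practice the answer comes out of a conversation with the client rather than a function call, but writing the criteria down like this makes the trade-offs explicit.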
Eye Tracking Changes Everything
Designing for eye tracking is the single biggest mindset shift from traditional UI design. On Vision Pro, the user's gaze is the cursor. There's no mouse, no touch target in the traditional sense. Your interface needs to respond to where someone is looking — and it needs to do it without being creepy or distracting.
Key lessons we've learned:
- Hover states need to be obvious but subtle. A gentle scale-up (1.0 → 1.05) with a soft highlight works better than dramatic colour changes. The user's eyes are already telling them where they're looking — your UI just needs to confirm it.
- Minimum target sizes are bigger than you think. Apple's Human Interface Guidelines suggest a 60pt minimum, but in practice we've found that 72pt or more leads to far fewer accidental activations.
- Depth matters for selection clarity. When elements are at different z-distances, eye tracking precision drops. Keep interactive elements on the same plane whenever possible.
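The numbers in these lessons are easy to encode as layout checks. The sketch below is plain Swift with no visionOS APIs; `TargetSize`, `hitRegion`, and `offPlaneElements` are hypothetical helpers, and the depth tolerance is an assumed tuning value rather than anything Apple publishes.

```swift
// Sizes are in points, depths in metres from the viewer.
struct TargetSize {
    var width: Double
    var height: Double
}

let higMinimumPoints = 60.0        // minimum from Apple's HIG, as cited above
let preferredMinimumPoints = 72.0  // what has worked better for us in practice
let hoverScale = 1.05              // gentle scale-up applied while an element is looked at

// Expand the gaze hit region to the preferred minimum without
// changing how large the control is drawn.
func hitRegion(for visual: TargetSize,
               minimum: Double = preferredMinimumPoints) -> TargetSize {
    TargetSize(width: max(visual.width, minimum),
               height: max(visual.height, minimum))
}

// Flag interactive elements that sit off the shared plane.
// The reference plane is the median depth; `tolerance` is an assumption.
func offPlaneElements(_ elements: [(name: String, depth: Double)],
                      tolerance: Double) -> [String] {
    let sortedDepths = elements.map { $0.depth }.sorted()
    guard !sortedDepths.isEmpty else { return [] }
    let reference = sortedDepths[sortedDepths.count / 2]
    return elements
        .filter { abs($0.depth - reference) > tolerance }
        .map { $0.name }
}
```

A check like `offPlaneElements` could run in a debug build or a unit test to catch a stray button that ended up 30cm behind its neighbours.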
Spatial Audio is Not Optional
One of the most underestimated aspects of Vision Pro development is spatial audio. In a headset where the display wraps around your field of view, sound that doesn't come from the right direction breaks the illusion instantly.
We treat spatial audio as a first-class design element, not a nice-to-have. Every interactive element should have an audio response — a soft click when a button is tapped, a whoosh when a panel slides in, a gentle ambient hum that anchors the user in the experience.
RealityKit's spatial audio engine is genuinely excellent. It handles head tracking, room modelling, and distance attenuation out of the box. Our advice: invest time in audio design early. It will make your app feel markedly more polished.
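One way to enforce the "every interaction has a sound" rule is to route all cues through a single exhaustive `switch`. This is a sketch in plain Swift: `Interaction`, `AudioCue`, and the resource names are hypothetical, and actual playback through RealityKit is deliberately left out.

```swift
// Hypothetical catalogue of interactions that must produce audio feedback.
enum Interaction: CaseIterable {
    case buttonTap
    case panelSlideIn
    case ambient
}

// A cue is just a resource name plus a looping flag in this sketch.
struct AudioCue {
    let resourceName: String   // placeholder asset names, not real files
    let loops: Bool
}

// Because the switch has no `default`, adding a new Interaction case
// without assigning it a cue becomes a compile-time error.
func cue(for interaction: Interaction) -> AudioCue {
    switch interaction {
    case .buttonTap:
        return AudioCue(resourceName: "soft_click", loops: false)
    case .panelSlideIn:
        return AudioCue(resourceName: "whoosh", loops: false)
    case .ambient:
        return AudioCue(resourceName: "ambient_hum", loops: true)
    }
}
```

The point of the design is that silence has to be a deliberate choice: nobody can add an interaction and forget its sound.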
Hand Gestures: Keep It Simple
Vision Pro supports a rich vocabulary of hand gestures — pinch, drag, rotate, zoom. The temptation is to use all of them. Don't.
In our experience, the best Vision Pro apps use two or three gestures max. Pinch-to-select is universal. Pinch-and-drag for positioning. And maybe a two-hand pinch-and-pull for scaling. Anything beyond that and users start fumbling, especially in the first session.
Custom gestures can work for power users, but they need clear onboarding. We've had the best results with progressive disclosure — start with the basics, and introduce advanced gestures only after the user has demonstrated comfort with the core interactions.
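Progressive disclosure can be as simple as a counter that gates the advanced gestures. The sketch below assumes a hypothetical `GestureOnboarding` type and an `unlockThreshold` you would tune per app; it models only the unlocking logic, not gesture recognition itself.

```swift
enum Gesture {
    case pinchSelect      // core
    case pinchDrag        // core
    case twoHandScale     // advanced
    case customShortcut   // advanced, hypothetical power-user gesture
}

struct GestureOnboarding {
    private(set) var successfulCoreInteractions = 0
    let unlockThreshold: Int   // assumed tuning value

    init(unlockThreshold: Int) {
        self.unlockThreshold = unlockThreshold
    }

    // Core gestures are always available; advanced ones appear only
    // after the user has shown comfort with the basics.
    var availableGestures: [Gesture] {
        var gestures: [Gesture] = [.pinchSelect, .pinchDrag]
        if successfulCoreInteractions >= unlockThreshold {
            gestures += [.twoHandScale, .customShortcut]
        }
        return gestures
    }

    // Only successful core gestures count toward unlocking.
    mutating func recordSuccess(of gesture: Gesture) {
        if gesture == .pinchSelect || gesture == .pinchDrag {
            successfulCoreInteractions += 1
        }
    }
}
```

Usage might look like this: create `GestureOnboarding(unlockThreshold: 5)` at first launch, call `recordSuccess(of:)` whenever a core gesture completes, and show a short tutorial the first time `availableGestures` grows.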
Performance on visionOS: What We Watch
Vision Pro renders at 90fps by default. Its displays also support 96Hz and 100Hz refresh rates, with 96Hz used mainly for 24fps video playback. Dropping below the target frame rate is immediately noticeable and can cause discomfort. Here's what we optimise for:
- Draw call budgets. Stay under 150 draw calls for shared space apps. Full immersive mode gives you more headroom, but not infinite.
- Shader complexity. RealityKit's built-in materials are heavily optimised for Apple Silicon. Custom shaders should be used sparingly and profiled aggressively.
- Asset streaming. Large 3D models should be streamed in and given multiple levels of detail (LODs). Vision Pro has excellent memory bandwidth, but loading a 200MB model at launch will tank your startup time.
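To put numbers on the list above: a 90fps target leaves roughly 11.1ms per frame, and both the draw call budget and LOD selection reduce to simple checks. This is a plain-Swift sketch; `LODLevel`, `selectLOD`, and the distance thresholds in the usage note are hypothetical, and a real project would measure with Instruments rather than rely on constants.

```swift
let targetFPS = 90.0
let frameBudgetMs = 1000.0 / targetFPS   // about 11.1 ms per frame

// Our shared space rule of thumb: stay under 150 draw calls.
func isWithinSharedSpaceBudget(drawCalls: Int, limit: Int = 150) -> Bool {
    drawCalls < limit
}

// One level of detail, usable up to `maxDistance` metres from the viewer.
struct LODLevel {
    let name: String
    let maxDistance: Double
}

// Pick the most detailed level whose range covers the distance,
// falling back to the coarsest level for anything farther away.
func selectLOD(forDistance distance: Double,
               from levels: [LODLevel]) -> LODLevel? {
    let ordered = levels.sorted { $0.maxDistance < $1.maxDistance }
    return ordered.first(where: { distance <= $0.maxDistance }) ?? ordered.last
}
```

With assumed levels of "high" up to 1.5m, "medium" up to 4m, and "low" up to 10m, a model 2m away resolves to "medium", and anything beyond 10m still gets "low" rather than disappearing.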
What's Next for Us
We're currently exploring enterprise use cases for Vision Pro — training simulations, architectural visualisation, and remote collaboration tools. The hardware is mature enough now that businesses are asking for real deployments, not just demos.
If you're considering building for Vision Pro — whether it's a game, a productivity tool, or something entirely new — we'd love to talk. This platform is where computing is heading, and there's a real advantage to building expertise now.