Exploring VR and the Future of Digital Work
Using A-Frame and Zoom Meetings
One topic I keep coming back to is the future of digital work. I am interested in ways that the interaction between humans and computers can be improved, as this will not only benefit my day-to-day job but also change the way a large percentage of the population interacts with computers. There are two areas I am concerned with:
- Augmented interaction. For me, this includes augmented reality and brain-computer interfaces. It seeks to define a new way for people to interact with computers without screens or traditional feedback loops. I see this as experimental and typically aimed at consumer needs. It may be too early to see an impact in the workplace; the technology clearly needs room to grow before mass adoption.
- Standard, day-to-day interaction. This is how technology fits into an existing job: it lets you do the same tasks as before with minimal change or adjustment. The biggest technology in this space is working in Virtual Reality.
The Virtual Workspace
Today, many of us use laptop and desktop computers to do our work. These devices use screens to display data, and keyboards and a mouse or touchpad for input. I believe brain-computer interfaces will replace keyboards in time. However, what will replace the monitor? While we wait for augmented reality to mature, we can rely on virtual reality to replace the monitor.
Advantages of Virtual Reality:
- Larger field of vision than a monitor. This means “windows” can be arranged and managed without the user having to remember which window is behind another. It won’t be perfect, but virtual reality should allow for easier application management.
- Three dimensions can make some data more intuitive. Traditionally, most desktop applications are built in two dimensions. Part of my job requires understanding data from graphs, and a third dimension would allow segmenting on another variable, so virtual reality should improve the ability to understand this data. This is difficult in today’s two-dimensional applications mainly because of tooling; building a full-featured three-dimensional operating system would make displaying three-dimensional data the default.
- Novel operating environment. If work moves to virtual reality, interactions between people will likely move there as well. I believe this may unlock more collaborative experiences that don’t exist with two-dimensional computers. It may also accelerate remote work, since the operating environment will be virtual.
Disadvantages of Virtual Reality:
- Headset weight and heat. Wearing and using Virtual Reality headsets is uncomfortable. Given that a good percentage of office work involves more than 7 hours a day at a computer, this is the biggest obstacle to adoption.
- Lack of common tooling. Unlike desktop, mobile, or web applications, 3D applications have historically been built for games and leisure. The tooling is accessible but doesn’t overlap with the rest of application development. Most of the challenges are in the presentation layer, which is unique to each framework. There isn’t yet a framework similar to the web, and each set of tooling feels more like locking into “native” development.
- Fear/uncertainty/doubt. Doing things the old way is easy. Some people are scared of new ways of doing things, and Virtual Reality represents a massive shift in human-computer interaction.
- Sunk cost. If organizations have already invested in 2D technology, adopting new physical devices must provide considerable returns.
I don’t think there is a single winner in 2D vs. 3D operating environments. There is just enough tooling to build some early technology, but locking into a technology stack now will bring challenges. I worry that there will not be enough momentum to create a rich operating environment that makes building applications as simple as writing a web page.
The Virtual Operating System
If virtual reality headsets have been out for some time, surely someone has built a virtual operating system. What does the existing landscape look like? Googling for “VR desktop” surfaces the following projects (also, take a look at this list on Product Hunt):
- vrdesktop: the first result on Google. This appears to be desktop software for viewing 2D applications in three-dimensional space. I am not sure there is any framework here.
- supermedium: seems to be both a company and an application for the Oculus. Development appears to use the A-Frame framework, though I am not sure there are building blocks exposed for embedding your own application into their environment.
- Firefox Reality and Chrome for Daydream: these applications provide access to web applications in VR. This may not be suitable if you use applications that will never support virtual reality (like most IDEs).
Similar to what was mentioned above, we can see development taking different approaches:
- bring the desktop to VR (no tiling, window management, or pluggable framework)
- explore and aggregate VR content (incompatible with legacy windows)
- port browser to VR (but don’t create the operating environment)
Given this assessment of each platform’s limitations, I sought to create a proof of concept that would allow access to native desktop applications while creating a new operating environment for “native” VR applications.
My HackWeek Project: Desktop with Zoom
As part of my company’s HackWeek program, I decided to let my creativity get the best of me and finally explore this development area.
The goal of the project was to get my IDE (IntelliJ) into Virtual Reality. Some weeks prior, I had done some work using OBS to capture windows, ossrs/srs for hosting the stream, and dailymotion/hls.js for client-side streaming.
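For reference, the client-side piece of that earlier pipeline is only a few lines with hls.js. This is a minimal sketch; the .m3u8 URL is a placeholder for whatever srs serves:

```html
<!-- Minimal hls.js playback sketch; the stream URL is a placeholder. -->
<video id="capture" autoplay muted controls></video>
<script src="https://cdn.jsdelivr.net/npm/hls.js@latest"></script>
<script>
  const video = document.getElementById('capture');
  if (Hls.isSupported()) {
    const hls = new Hls();
    hls.loadSource('http://localhost:8080/live/livestream.m3u8'); // placeholder srs endpoint
    hls.attachMedia(video);
  } else if (video.canPlayType('application/vnd.apple.mpegurl')) {
    video.src = 'http://localhost:8080/live/livestream.m3u8'; // browsers with native HLS
  }
</script>
```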
This proof of concept took part of a Sunday, and it was pretty clear that even a little more investment could take the project further. I learned that the client-side streaming code was the most difficult piece of the problem to solve, and I wondered how I might replace it.
Because my HackWeek project needed to run in a corporate environment, I didn’t want to run both OBS and srs on my desktop computer. The only alternative I could think of was classic window sharing in conferencing software. A Zoom meeting could replicate all three parts of the stack: a client uploading screen buffers, a server sending or sharing those buffers with multiple clients, and client-side code to render the buffer.
The first thing I did was try to find a way to join a Zoom meeting on my Daydream headset. The first option was the “VR Browser” application. I believe I ran into some issues with Zoom rendering, so I didn’t pursue buying this asset for use in Unity. In theory, it would be a great option for development (rendering HTML in 3D is difficult!). I briefly considered using the Chromium Embedded Framework, but I couldn’t really understand how to use it or whether it would actually solve my problem (it was probably already in use by the “VR Browser” application, too).
Given that Unity wouldn’t work as a framework, I figured the only option left was A-Frame. I had explored A-Frame some time ago, and I am still very excited about using the Web as a platform for application development (Firefox OS, anyone?). A-Frame would mean my client-side application code lives and runs in the browser, and, while obvious, Zoom meetings already work in native mobile and desktop browsers. I wouldn’t need to render the whole page in VR; rather, I would join the meeting as I normally would and then launch the VR experience after joining.
Much of the modern Web, including Zoom’s web client, renders content into <canvas> elements. Fortunately, a canvas can be used as a texture source in A-Frame, so it should be a matter of copying this frame data across.
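As a rough sketch of that wiring (the ids and dimensions here are mine, not from the project), a canvas can back the material of an A-Frame plane:

```html
<!-- Sketch: a 2D canvas as the texture of an A-Frame plane; ids are illustrative. -->
<canvas id="screen-buffer" width="1280" height="720"></canvas>
<a-scene>
  <a-plane id="screen" material="src: #screen-buffer"
           width="4" height="2.25" position="0 1.6 -2"></a-plane>
</a-scene>
```

One caveat: after drawing into the canvas, the underlying three.js texture needs its needsUpdate flag set before A-Frame will repaint the plane (shown in the workaround sketch further down).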
To get multiple desktop windows into one VR experience, I would have to join multiple Zoom meetings, as each meeting can only share one screen. The technology for loading multiple independent webpages is the <iframe>, so I decided to add a small form that appends an <iframe> for every Zoom meeting.
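A minimal version of that form might look like the following; the element ids are illustrative:

```html
<!-- Sketch: append one <iframe> per Zoom meeting URL. -->
<form id="join-form">
  <input id="meeting-url" type="url" placeholder="Zoom meeting URL" required>
  <button type="submit">Add meeting</button>
</form>
<div id="meetings"></div>
<script>
  document.getElementById('join-form').addEventListener('submit', (event) => {
    event.preventDefault();
    const iframe = document.createElement('iframe');
    iframe.src = document.getElementById('meeting-url').value;
    iframe.allow = 'camera; microphone'; // the embedded meeting needs these permissions
    document.getElementById('meetings').appendChild(iframe);
  });
</script>
```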
Extracting the video buffers was the difficult part. I knew that I would be running this application on my Pixel, so I had two reliable browsers to choose from: Chrome and Firefox. Desktop Chrome allows arbitrary code injection into particular web pages, but this feature is not supported in the mobile version of Chrome. Firefox, on the other hand, does a great job of unifying the developer experience between its mobile and desktop applications. This is truly amazing and must take a considerable amount of coordination and cooperation! They even have a demo of this feature in their examples: https://github.com/mdn/webextensions-examples/tree/master/user-script. All that was required was setting up the Android developer tools, Node.js, and their command-line tool.
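Following the shape of that mdn example, the injection can be packaged as a WebExtension content script. A minimal manifest might look like this (the name and match pattern are my assumptions; all_frames is needed because the Zoom pages live inside <iframe>s):

```json
{
  "manifest_version": 2,
  "name": "zoom-canvas-relay",
  "version": "0.1",
  "content_scripts": [
    {
      "matches": ["*://*.zoom.us/*"],
      "all_frames": true,
      "js": ["inject.js"]
    }
  ]
}
```

web-ext run then loads the extension into Firefox for testing; pushing it to Firefox on Android goes through adb and web-ext’s Android options.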
I injected a script that would find the <canvas> element on the page and send the base64-encoded content via window.postMessage() (documentation). I had quite a bit of trouble setting this buffer as the texture on the A-Frame element. I was receiving some mysterious errors (something like “operation not permitted”), which I figured was the browser tracking where this data had come from and refusing to let it be placed on a canvas element. Oddly enough, rendering the buffer to an <img> element worked just fine. My workaround was a script on a setInterval that would copy the <img> element to the <canvas>.
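Putting the two halves together, a sketch of the approach looks like this. Message names, selectors, and intervals are illustrative, and the ids refer back to the earlier A-Frame sketch:

```js
// inject.js (content script running inside the Zoom iframe) -- a sketch.
// Snapshot the meeting <canvas> and post it up to the embedding page.
setInterval(() => {
  const canvas = document.querySelector('canvas');
  if (!canvas) return;
  // toDataURL() yields the base64-encoded frame described above.
  window.parent.postMessage({ type: 'frame', data: canvas.toDataURL('image/png') }, '*');
}, 100);

// Embedding page -- decode each frame into an <img>, then copy it to the
// <canvas> backing the A-Frame texture (the setInterval workaround).
const img = new Image();
window.addEventListener('message', (event) => {
  if (event.data && event.data.type === 'frame') img.src = event.data.data;
});

const buffer = document.getElementById('screen-buffer');
const ctx = buffer.getContext('2d');
setInterval(() => {
  if (!img.complete || !img.naturalWidth) return;
  ctx.drawImage(img, 0, 0, buffer.width, buffer.height);
  // Flag the three.js texture so A-Frame repaints the plane.
  const mesh = document.getElementById('screen').getObject3D('mesh');
  if (mesh && mesh.material.map) mesh.material.map.needsUpdate = true;
}, 100);
```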
Results
At the end of the project, I was left with the following demo of both Google Calendar and Google Mail on two separate planes in one <a-scene>! Here are the screenshots from the demo:
Key Takeaways
- Although A-Frame exists, the 3D and 2D worlds are largely incompatible. As mentioned above, this is a massive barrier to adoption and migration.
- Firefox has great developer tooling. We all know their documentation is amazing, but I didn’t realize that the developer experience for their browsers is just as great. Awesome job!
- A “virtual desktop” with desktop applications is possible (but it will probably be very slow). I didn’t have quite enough time to make the demo fast, but it wasn’t far from usable. A window shared with the native Zoom desktop client had under 10 ms of delay, whereas a window shared through a Zoom meeting launched in Chrome had somewhere between 100 ms and 500 ms of delay. Unfortunately, you can’t join multiple Zoom meetings with the desktop client.
- Multiple Zoom meetings in a mobile browser are slow. This shouldn’t come as any surprise, but I saw serious performance degradation with only two <iframe> windows. I needed to set display:none on the iframes just to make the page usable after joining the meetings. If an application like this were developed, it would need to scale to multiple windows, which would require some serious throttling to make rendering more efficient (non-focused windows at lower frame rates), as sketched below.
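One possible shape for that throttling; the window bookkeeping and frame rates here are assumptions:

```js
// Sketch: repaint the focused window often, background windows rarely.
const windows = []; // populated as meetings join; entries like { img, canvas, focused }
const FOCUSED_MS = 100;     // ~10 fps for the window being looked at
const BACKGROUND_MS = 1000; // ~1 fps for everything else

function repaint(win) {
  if (win.img.complete && win.img.naturalWidth) {
    const ctx = win.canvas.getContext('2d');
    ctx.drawImage(win.img, 0, 0, win.canvas.width, win.canvas.height);
  }
  setTimeout(() => repaint(win), win.focused ? FOCUSED_MS : BACKGROUND_MS);
}

windows.forEach(repaint);
```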
Theoretical Next Steps
This project demonstrates a framework where an arbitrary webpage can expose a video buffer for use in three dimensions. A window-specific or A-Frame-specific protocol could be built that lets web pages pass more complex objects out of the site. The closest analog is rendering a desktop website on a mobile phone: all websites started as desktop websites before adding an optional mobile version. Similarly, all “windowed” content used today can be interpreted as 2D content, and it could optionally share 3D content if the viewer is capable. This approach would allow a migration from old to new rather than building an independent platform and wishing/hoping people will migrate.
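As a purely hypothetical illustration of such a protocol (every message name here is invented), the VR shell could probe each embedded page for 3D capability and fall back to flat frames otherwise:

```js
// Shell side: ask an embedded page whether it can provide 3D content.
const iframe = document.querySelector('iframe');
iframe.contentWindow.postMessage({ type: 'vr-capabilities?' }, '*');
window.addEventListener('message', (event) => {
  if (event.data && event.data.type === 'vr-capabilities' && event.data.supports3D) {
    // Request richer scene data instead of flat frames.
    event.source.postMessage({ type: 'request-3d-scene' }, '*');
  }
});

// Page side: a site with an optional 3D representation answers the probe.
window.addEventListener('message', (event) => {
  if (event.data && event.data.type === 'vr-capabilities?') {
    event.source.postMessage({ type: 'vr-capabilities', supports3D: true }, '*');
  }
});
```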
The End!
Hopefully, you enjoyed reading this and learned something new! If I was able to build a prototype in 3 days, imagine what would be possible with a month of work!