All posts in WebRTC

Enabling Real-Time Communication on the Web Platform

Mozilla’s manifesto describes the internet as an integral part of modern life and a key component in communication. However, communication on the web has far to go before it’s as rich as face-to-face communication. Real-time video communication on the web should be easy, rich, and readily available to developers in a way that proprietary formats can’t be.

That’s why a new project is spinning up at Mozilla called WebRTC (Real-Time Communication). WebRTC will allow developers to use the web platform to include video and audio conferencing as part of their websites and applications, both mobile and on the desktop. In its first phase, WebRTC will make webcam feeds a primary object in the browser, allowing sites to create rich interactions such as video calling and conferencing. In later phases, WebRTC will allow interactions like co-browsing, in which users can share their screen with a friend.

Privacy and Security

Privacy and security are major concern in enabling open video communication on the web. A face and voice are two of the most identifiable kinds of shareable data, and keeping users in absolute control of who has access to them is vital. As the IETF states in its WebRTC draft document, the ability for users to control access to their webcam, be able to cancel communication at any time, and not be eavesdropped upon are essential.

Some of the challenges we’ll face are in giving users the most accurate information possible about the site and caller who are requesting access to their webcam. Most requests for webcam access will simply be from a trusted site itself, but a malicious site could potentially try to gain access by embedding its call request within a trusted site. In this paper, Eric Rescorla outlines how potential “ad-hoc” calling attacks could come from ads in iFrames embedded within trusted sites.  Many other potential attacks need to be dealt with.  For instance, because WebRTC would be controlled by a web server rather than conventional real-time systems, web browsers might expose JavaScript APIs which allow a server to place a call. If access to such an API were unrestricted, sites could “bug” a user’s computer and capture video camera activity (Rescorla).

Even a trusted site could be compromised, both during a call or after. And, since the sites themselves would control and display the UI of the call itself, Firefox needs to give the user both constant indication that they are in a call and the ability to disconnect at any time.

User Interface

However, guarding against threats only goes so far towards keeping users in control of their webcam communication. Clear messaging, useful tools, and sensible defaults need to be in place for video conferencing to safely take root in the browser.

The first phase of enabling WebRTC will allow the most basic use case: giving a site access to a user’s webcam and microphone. The browser already serves as a mediator for other user data, such as location and access to cookies. Firefox usually asks for permissions using a door hanger notification. Door hangers stem from the URL bar to show the site is asking for a permission, and it extends past the content area to show that Firefox is the mediator of the permission request. Using a door hanger notification for WebRTC is both consistent within Firefox and correctly conveys visually that the site has requested access, and Firefox is asking the user for that permission.

Usually, these door hangers simply ask the user for a permission, and in a click the user can give it. However, webcam access requires a secondary stage: showing a preview of the webcam feed. This approach has three benefits:

  1. It gives users the ability to make sure their webcam and microphone work correctly
  2. If users had casually or accidentally accepted the webcam permission, nothing makes people more aware of what they’re about to transmit like showing them their own grubby mug
  3. It gives users the ability to fix their hair/put on a shirt/remove incriminating items from background before beginning call

In some ways, it’s unfortunate to ask users to pass through two dialogs to give webcam feed rather than one. After all, in most cases the site itself will be providing all necessary UI, and perhaps even a video preview before a call is initiated. So, this could all be redundant in many cases.  However, we cannot predict what purpose a site may be requesting webcam feed for, nor what UI will be in place for the user on that page. Even with all our efforts against security threats, any request for webcam access must be treated as potentially malicious.

Once a user has given a site access to their webcam and is likely engaging in face-to-face communication, that interaction should be given a heightened level of priority within the browser. For a user to lose that tab or forget they are broadcasting could range from mildly embarrassing to, well, use your imagination. If a user is actively sharing their webcam feed, they should be able to jump to the tab where data’s being shared or simply cut their webcam feed from anywhere within Firefox. This will require at the very least a toolbar-level Firefox control that appears once a user’s actively sharing.

Designing and implementing a new API is always a complex process.  If you’re interested in reading more or contributing to this project, here are some resources: