Today’s topic is speaker selection. If you’ve never given it much thought, or consider it straightforward, this post is for you. Let’s start by testing your general WebRTC knowledge:
Pop Quiz: If a website wants to play out of different speakers on your system, what permission must it have?
1. Speaker-selection permission
2. Microphone permission
If you answered 2 then chances are you know your WebRTC stuff well, but you’re probably on a Chromium browser.
Pause for a moment. Why would websites need microphone permission to control speaker output? Why should users expose themselves to possibly being recorded just to redirect songs to their portable speakers? They shouldn’t. It’s a permission escalation and an entirely unnecessary invasion of privacy. Nevertheless, in some browsers this remains the only answer today.
Thankfully, Firefox provides the more straightforward and privacy-preserving answer of 1.
In this post, we’ll look at how speaker selection works in different browsers, how permissions can be abused for fingerprinting purposes, and how to add speaker selection with a fallback so it works in all browsers.
How speaker selection works in Firefox (and the spec)
On desktop, Firefox supports the standards-track navigator.mediaDevices.selectAudioOutput() API. You only need a <button> to invoke it:
speakers.onclick = async () => {
  const info = await navigator.mediaDevices.selectAudioOutput();
  await videoElement.setSinkId(info.deviceId);
  speakers.innerText = `${info.label}...`;
  localStorage.previouslyUsedSpeakersId = info.deviceId;
};
Passing in a previously used deviceId is optional, but it lets you skip the prompt in most cases.
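As a minimal sketch of that optional argument, the handler below hands the stored id back through the `deviceId` option of `selectAudioOutput()`. The helper name `chooseSpeakers` and its injected parameters are illustrative assumptions, not taken from the demo's source. In a page you would pass `navigator.mediaDevices`, your media element, and `localStorage`.

```javascript
// Hypothetical helper: dependencies are passed in to keep the sketch self-contained.
async function chooseSpeakers(mediaDevices, mediaElement, storage) {
  // Only pass deviceId when an id was remembered from an earlier choice.
  const remembered = storage.previouslyUsedSpeakersId;
  const options = remembered ? { deviceId: remembered } : {};

  // With a remembered id, the browser can often resolve without prompting.
  const info = await mediaDevices.selectAudioOutput(options);

  // Route playback to the chosen device and remember it for next time.
  await mediaElement.setSinkId(info.deviceId);
  storage.previouslyUsedSpeakersId = info.deviceId;
  return info;
}

// Assumed usage in a page (call it from a user gesture such as a click):
// speakers.onclick = () =>
//   chooseSpeakers(navigator.mediaDevices, videoElement, localStorage);
```

If the remembered device is no longer available, expect the picker to appear as usual, so treat the stored id as a hint rather than a guarantee.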
If you’re in Firefox right now, you can put on your headset and try it here:
Pressing the button brings up Firefox’s speaker picker and updates to reflect your choice. Press the reset button that appears on the page to go back to the dynamic OS default. For non-Firefox users reading this article, or if you’re on your phone, here’s what this looks like in Firefox on desktop:
And yes, my external Bluetooth speakers really are named “Marley Get Together”!
Users can sometimes change their speakers through the OS as well, but each website decides how to handle this. I will continue to hear the demo page playing through ‘Marley Get Together’ until I press the reset button. At that point it goes back to the current OS default. This is illustrative of the API. The demo page stores your choice in local storage. Note that Firefox has a bug where it doesn’t yet fire a devicechange event when the OS default speakers for Firefox are changed.
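The reset behavior described above can be sketched roughly as follows. It relies on `setSinkId("")`, where the empty string selects the browser's default output. The helper name `resetSpeakers` and the injected parameters are assumptions for illustration, and the demo's actual source may differ.

```javascript
// Hypothetical helper: return to the OS default output and forget the stored choice.
async function resetSpeakers(mediaElement, storage) {
  // An empty string asks setSinkId() for the user agent's default output device.
  await mediaElement.setSinkId("");
  // Drop the remembered id so the next visit starts from the default again.
  delete storage.previouslyUsedSpeakersId;
}

// Assumed usage in a page:
// reset.onclick = () => resetSpeakers(videoElement, localStorage);
```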
Strong privacy characteristics
Notably, this demo is from a different origin and runs here in an iframe with allow="speaker-selection" as the only permission policy delegated to it. It therefore cannot listen to you while you’re reading this. I wish it were hypothetical, but this blog site quite possibly has access to your microphone already from previous posts you’ve interacted with (unless you’re a first-time reader, in which case welcome, and never mind!). Note that Firefox does not persist permission unless you ask it to, but some other browsers do.
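For readers who build such embeds from script, here is a small sketch of delegating only that one policy. The helper name `embedSpeakerDemo`, the injected `doc` and `container` parameters, and the placeholder URL are all assumptions for illustration.

```javascript
// Hypothetical helper: embed a cross-origin page that may pick speakers and nothing else.
function embedSpeakerDemo(doc, container, src) {
  const iframe = doc.createElement("iframe");
  iframe.src = src;
  // Delegate only speaker-selection; microphone and camera are not delegated.
  iframe.allow = "speaker-selection";
  container.appendChild(iframe);
  return iframe;
}

// Assumed usage in a page (the URL is a placeholder):
// embedSpeakerDemo(document, document.body, "https://demo.example/speakers.html");
```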
Moreover, the API adheres to the W3C Privacy Interest Group and W3C Technical Architecture Group’s design principle of not exposing device information of unused devices.
This seems like a slam dunk API. So, why hasn’t it been broadly implemented yet? Let’s look at that.
The microphone loophole
Unfortunately, cameras and microphones (which are about input, not output) are exposed through an older API. That API has a poor privacy profile, and is considered a mistake. The data support this: 7.2% of the web calls the enumerateDevices() API, greatly exceeding its legitimate use at 0.2%. This highlights significant privacy concerns, with 7% likely being web trackers fingerprinting users! That’s a whole other blog post.
But what’s relevant here is a misunderstood microphone loophole where devices that act as both speakers and microphone must be exposed as both in enumerateDevices() upon live microphone access by the website. This allows for headset detection and lets websites promote full duplex audio if they wish. For example, a website might wish to auto-switch to AirPods speakers whenever the user uses the AirPods microphone, and only then. This synchronization puts everything on the same device clock, improving echo cancellation.
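One way to sketch that headset auto-switch is to match on `groupId`, which `enumerateDevices()` reports as shared by inputs and outputs of the same physical device. Pairing by `groupId` is an assumption made here for illustration, as is the helper name `speakersForMicrophone`.

```javascript
// Hypothetical helper: find speakers on the same physical device as a given microphone.
function speakersForMicrophone(devices, micDeviceId) {
  const mic = devices.find(
    (d) => d.kind === "audioinput" && d.deviceId === micDeviceId
  );
  if (!mic) return [];
  // Devices that share a groupId belong to the same physical device.
  return devices.filter(
    (d) => d.kind === "audiooutput" && d.groupId === mic.groupId
  );
}

// Assumed usage in a page, while the microphone is live:
// const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
// const micId = stream.getAudioTracks()[0].getSettings().deviceId;
// const devices = await navigator.mediaDevices.enumerateDevices();
// const [match] = speakersForMicrophone(devices, micId);
// if (match) await videoElement.setSinkId(match.deviceId);
```

Speakers without an associated microphone never show up in the result, so this is only suitable for headset pairing.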
But this loophole doesn’t work for speaker selection, because not all speakers have an associated microphone. Users would consider any website-built picker that only shows a subset of their speakers terrible (this is real, we get bug reports about it).
Even if we changed the spec to expose all speakers through the loophole, audience members would still need to grant microphone access even if they never intend to speak, a burdensome requirement.
For these reasons we recommend ignoring the microphone loophole for speaker selection. See below what we recommend instead.
How do other browsers manage speakers today?
How are the other browsers making do without a speaker selection API? The answer is different for each browser.
Safari doesn’t implement speaker switching (videoElement.setSinkId()) at all. It instead relies on the robust OS-level speaker selection offered by macOS and iOS. This might be great for privacy in Safari, but Firefox needs a solution that works on Windows, Linux, and Android as well.
Chrome is unique in exposing all speakers through the microphone loophole, in violation of the spec, and in interpreting allow="microphone" as an implicit allow="speaker-selection" (likely for lack of support for the latter).
But that seems like a dead-end. Once Chrome fixes crbug 40138537 to tighten exposure to active use instead of just permission, websites will need to actually turn on the microphone to select speakers.
This is a sad state of affairs for web developers, necessitating a different approach for each browser. Worse, many just copy what they do for microphones since this works in the dominant browser today.
As a result, this puts Firefox in a tough spot as the only browser without working speaker selection on many sites on some platforms, even though it respects user privacy by following the spec. For this reason, we’re reaching out to web developers. Your assistance can make a significant difference!
Making it work in all browsers
We’ll now demonstrate speaker selection working in all browsers. There’s no shim, but if you’ve already done this in Chromium, then the rest should be relatively simple.
We encourage websites to feature-detect the new API with a fallback to the old API. Like this (HTML):
Speakers:
<span id="newapi">
  <button id="speakers1">Default system output...</button>
  <button id="reset" hidden>Reset</button>
</span>
<span id="oldapi" style="display: none;">
  <select id="speakers2">
    <option value="">Default system output</option>
  </select>
</span>
Feature detection is then done like this in JS:
if (!("selectAudioOutput" in navigator.mediaDevices)) {
  newapi.style.display = "none";
  oldapi.style.display = "inline";
}
We won’t cover populating the <select> from the old API (it is similar to microphones), and we covered the new API already. If you want to remember the user’s choice like this demo does, have a look at its source for more details.
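For readers who want a starting point for the fallback path anyway, here is a rough sketch of filling the `<select>` from `enumerateDevices()`. The helper names `speakerOptions` and `fillSpeakerSelect`, the injected `doc` parameter, and the generic fallback labels are assumptions for illustration, not the demo's actual code.

```javascript
// Hypothetical helper: turn enumerateDevices() results into data for <option> elements.
function speakerOptions(devices) {
  const speakers = devices.filter((d) => d.kind === "audiooutput");
  return speakers.map((d, i) => ({
    value: d.deviceId,
    // Labels can be empty until the browser exposes them, so fall back to a generic name.
    label: d.label || `Speaker ${i + 1}`,
  }));
}

// Hypothetical helper: append options after the existing "Default system output" entry.
function fillSpeakerSelect(doc, select, devices) {
  for (const { value, label } of speakerOptions(devices)) {
    const option = doc.createElement("option");
    option.value = value;
    option.textContent = label;
    select.appendChild(option);
  }
}

// Assumed usage in a page, inside the fallback branch:
// const devices = await navigator.mediaDevices.enumerateDevices();
// fillSpeakerSelect(document, speakers2, devices);
// speakers2.onchange = () => videoElement.setSinkId(speakers2.value);
```

Because the default entry in the markup has an empty value, choosing it passes `""` to `setSinkId()`, which returns playback to the default output.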
This results in a demo that works in all browsers:
Hit the start button to make it go!
- In Firefox, you’ll enjoy the same great speaker selection experience as above
- In Chrome or Edge, you’ll see speaker choices listed the same way microphones are listed
- In Safari… well, then you’re on macOS or iOS where this is easily done in the OS instead
Please consider applying this to your website. It will help speaker selection work in more browsers. Other vendors will eventually implement this, so you’ll be future-proofing your website.
I hope you enjoyed this post and found it helpful. I’d love to hear your feedback or thoughts. If you have comments or questions, you can reach me on X. See you on the web!
The post How WebRTC speaker selection works appeared first on Advancing WebRTC.