No really, why can't we have raw UDP in JavaScript?

In my opinion, the pat answers about security are incomplete. I'd like to see a detailed writeup of specifically why a raw UDP API cannot be made as secure as current HTTPS.

Jul 05, 2022

[Image: a sculpture of a cartoon character stuck in a pipe.]

By now I should know better than to ask on Twitter for a “rigorous analysis” of anything. As George W. Bush said, “Fool me once, shame on you… fool me can’t get fooled again.”

I don’t want to be “fool me can’t get fooled again”, so I officially give up on technical tweets. Today’s the last day I will ever post anything technical on Twitter, I promise. Instead, you will be forced to endure yet another Substack, so I can post 3,000-word posts that no one will read.

Here we go:

Why have a browser raw UDP API?

The goal with raw UDP is very simple: better performance and security on the server side.

HTTPS is an unbaked sausage made by grinding pure text HTTP with TLS and encasing the result in an arbitrary selection of third-party animal intestine… err, I mean, “highly secure” certificates provided by arbitrarily selected certificate providers. Implementing HTTPS is a massive amount of code that is inexorably slow. It is not only theoretically difficult to secure completely, but is insecure in practice in popular implementations available to the public.

Oh, and the certificate authorities are also insecure, by the way - but that’s another story (and another, and another, and …)

It also relied (up until recently) on TCP, which, unless you plan to write a completely custom network stack for every type of server/NIC you ever use, requires the underlying kernel to understand and track network connections. This means that you inherit substantial overhead, and perhaps vulnerabilities as well, from the TCP/IP substrate before you even begin to write your server code.

If you were a large company with significant academic and engineering resources, you might instead want to design your own private secure protocol that:

Uses encryption you control, so it cannot be bypassed by hacking the certificate authority,
Uses UDP to avoid having OS connection state on the server side, and
Uses a well-designed, known packet structure to improve throughput and reduce security vulnerabilities from HTTP/TLS parsing.
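To make the third item a little more concrete, here is a minimal TypeScript sketch of a fixed-layout packet. The layout, the magic number, the 1200-byte payload cap, and the names encodePacket and decodePacket are all invented for illustration; none of it comes from an existing protocol. The only point is that decoding becomes a handful of bounds-checked reads at known offsets, with no text parsing anywhere.

```typescript
// Hypothetical fixed-layout packet (big-endian), invented for this sketch:
//   bytes 0-3  magic number          (u32)
//   byte  4    message kind          (u8)
//   byte  5    reserved, must be 0   (u8)
//   bytes 6-7  payload length, bytes (u16)
//   bytes 8..  payload
const MAGIC = 0x55445021;
const HEADER_SIZE = 8;
const MAX_PAYLOAD = 1200; // assumed cap, chosen to stay under a typical MTU

interface Packet {
  kind: number;
  payload: Uint8Array;
}

function encodePacket(kind: number, payload: Uint8Array): Uint8Array {
  if (payload.length > MAX_PAYLOAD) throw new Error("payload too large");
  const out = new Uint8Array(HEADER_SIZE + payload.length);
  const view = new DataView(out.buffer);
  view.setUint32(0, MAGIC);
  view.setUint8(4, kind);
  view.setUint8(5, 0);
  view.setUint16(6, payload.length);
  out.set(payload, HEADER_SIZE);
  return out;
}

// Returns null for anything malformed, so untrusted input never throws.
function decodePacket(datagram: Uint8Array): Packet | null {
  if (datagram.length < HEADER_SIZE) return null;
  const view = new DataView(datagram.buffer, datagram.byteOffset, datagram.byteLength);
  if (view.getUint32(0) !== MAGIC) return null;
  if (view.getUint8(5) !== 0) return null;
  const length = view.getUint16(6);
  // The declared length must match the datagram exactly: no trailing bytes.
  if (length > MAX_PAYLOAD || datagram.length !== HEADER_SIZE + length) return null;
  return { kind: view.getUint8(4), payload: datagram.slice(HEADER_SIZE) };
}
```

Every rejection path is a single comparison, which is the property being argued for: there is very little room for a parser bug when there is very little parser.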

The first thing on that list is half-possible now. Although there’s nothing you can (ever [1]) do to avoid man-in-the-middle attacks the very first time someone interacts with your server, web APIs have long made it possible to store data on the client for later use. One use for that data would be storing your own set of public keys.

So even using nothing newer than XHR and cookies, you could theoretically add your own layer of encryption to anything you send to the server. This would ensure that any subsequent hack of the certificate authority could not inspect or modify your packets. It’d be much less efficient than rolling your own top-to-bottom, because now you pay the entire cost for your encryption and TLS. But you can do it.

It’s slow, but possible. Call it half-possible, like I did above.
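The key-storage half of that idea can be sketched as a trust-on-first-use check. This is a minimal illustration under stated assumptions, not a real protocol: KeyValueStore mirrors only the getItem/setItem subset of the browser’s localStorage, while checkPinnedKey, MemoryStore, and the hex-string key format are hypothetical names and choices made for this sketch. It performs no encryption at all; it only shows how a client could notice that a server’s key changed after the first visit.

```typescript
// Minimal storage shape, matching the getItem/setItem subset of localStorage.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

type PinResult = "pinned-first-use" | "match" | "mismatch";

// Hypothetical helper: remember the first key seen for a host, then insist on it.
function checkPinnedKey(store: KeyValueStore, host: string, publicKeyHex: string): PinResult {
  const slot = "pinned-key:" + host;
  const known = store.getItem(slot);
  if (known === null) {
    // First contact: there is nothing to compare against, so we can only trust.
    store.setItem(slot, publicKeyHex);
    return "pinned-first-use";
  }
  return known === publicKeyHex ? "match" : "mismatch";
}

// In-memory stand-in for localStorage so the sketch runs outside a browser.
class MemoryStore implements KeyValueStore {
  private items: { [key: string]: string } = Object.create(null);

  getItem(key: string): string | null {
    const value = this.items[key];
    return value === undefined ? null : value;
  }

  setItem(key: string, value: string): void {
    this.items[key] = value;
  }
}
```

In a browser you could pass window.localStorage as the store. The first-use branch is exactly the unavoidable man-in-the-middle window mentioned above; everything after it is protected against a later certificate authority compromise only to the extent that the pinned key is actually used to protect the traffic.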

The second thing on the list is sort-of possible now as well. If you can somehow manage to use HTTP/3 exclusively as your target platform, you will still be talking HTTP but you’ll be doing it over UDP instead of TCP, and can manage connection state however you wish without OS intervention.

It is probably unrealistic to assume that you could do this in practice today. If you didn’t care about broad compatibility, you probably wouldn’t be deploying on the web anyway, so presumably the current adoption of HTTP/3 is insufficient. But at least it exists, and perhaps if adoption continues to grow, eventually it will be possible to require HTTP/3 without losing a significant number of users. For now, it’s only something you can do on the side - you still have to have a traditional HTTPS fallback.
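As a rough picture of what “managing connection state however you wish” could mean on the server, here is a sketch of a session table kept entirely in application code. It is not HTTP/3 or QUIC; the connection ID, the idle timeout, and the SessionTable class are assumptions made for illustration. Each datagram is assumed to carry an ID, and the server looks up the session itself instead of relying on the kernel to track a TCP connection.

```typescript
interface Session {
  firstSeenMs: number;
  lastSeenMs: number;
  packetCount: number;
}

// Hypothetical user-space session table keyed by an ID carried in each datagram.
class SessionTable {
  private sessions: { [connectionId: string]: Session } = Object.create(null);
  private idleTimeoutMs: number;

  constructor(idleTimeoutMs: number) {
    this.idleTimeoutMs = idleTimeoutMs;
  }

  // Record one datagram, creating the session (or replacing an expired one).
  onDatagram(connectionId: string, nowMs: number): Session {
    let session: Session | undefined = this.sessions[connectionId];
    if (session === undefined || nowMs - session.lastSeenMs > this.idleTimeoutMs) {
      session = { firstSeenMs: nowMs, lastSeenMs: nowMs, packetCount: 0 };
      this.sessions[connectionId] = session;
    }
    session.lastSeenMs = nowMs;
    session.packetCount += 1;
    return session;
  }

  // Drop sessions idle for longer than the timeout; returns how many were removed.
  sweep(nowMs: number): number {
    let removed = 0;
    for (const id of Object.keys(this.sessions)) {
      const session = this.sessions[id];
      if (session !== undefined && nowMs - session.lastSeenMs > this.idleTimeoutMs) {
        delete this.sessions[id];
        removed += 1;
      }
    }
    return removed;
  }

  size(): number {
    return Object.keys(this.sessions).length;
  }
}
```

The server decides what a “connection” is, how long it lives, and how much memory it costs, which is the control the second list item is asking for.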

Which brings us to the third item on the list, and the real sticking point. As far as I’m aware, no current or planned future Web API ever lets you do number three. There are many new web “technologies” swarming around the custom packet idea (WebRTC, WebSockets, WebTransport), but to the best of my knowledge, all of them require an HTTPS connection to be made first, so your “custom packet” servers still need to implement all of HTTPS anyway.

What is the deployment scenario?

I can imagine someone raising the following objection at this point: “If you don’t support HTTPS on the server, how do you serve the WASM/JavaScript/whatever with the custom packet logic in the first place?”

That’s a reasonable question.

The answer is, the two most logical deployment scenarios I can think of both involve a separate server (or process) for the initial HTTPS transaction.

The first is what I imagine would be the most common: you upload to a CDN a traditional web package containing the PWA-style web worker necessary to do your own custom packet logic. The CDN serves this (static) content everywhere for you. They obviously implement HTTPS already, because that’s what they do for a living, and they’re not your servers anyway so you don’t care.

The second would be less common, but plausible: you run your own CDN-equivalent, because you’re just that hard core. But you expect that your HTTPS code is more vulnerable than your custom code, since HTTPS is vastly more complicated and has ridiculous things in it like arbitrary text parsing, which no one in their right mind would ever put into a “secure” protocol. So you cabin your HTTPS server instances into their own restricted processes or own machines entirely. This prevents exploits of the HTTPS code from affecting anything other than newly connecting users - existing users (who are only talking to your custom servers) remain unharmed.

In neither scenario do you actually include HTTPS code in any of the processes running your actual secure server.

What precedent has been set by other “secure” APIs in the browser?

So that’s the hopefully-at-least-somewhat-convincing explanation of why someone might want raw UDP. Now the question is, can raw UDP be provided by a browser in a way that is “secure”?

I’m putting a lot of these words in scare quotes because browsers aren’t secure for any serious definition of that word, and hopefully that is overwhelmingly obvious to everyone who has ever used one. But just to be clear about the landscape, there are two different ways browsers are not secure:

The web as a platform consists of massive, overlapping, poorly-specified APIs that require millions of lines of code to fully implement. As a result, browsers inexorably have an effectively infinite number of security exploits waiting to be found.
Browsers include the ability, sans exploit, to transmit information from the client computer to any number of remote servers. Without the ability to control this behavior, the user’s data could be misappropriated.

Clearly, for raw UDP, we only care about the second one of these. The first one happens in browsers all the time already and there’s no reason to suspect that raw UDP would somehow have more implementation code vulnerabilities on average than any other part of the sprawling browser substrate.

So the question is, assuming the browser has not been exploited, what is the security standard for web features, and can raw UDP be implemented under that standard or not?

As a point of comparison, I will use the example of the current camera/microphone/location policy as it presently exists. That will be our “gold standard”, since if it were not considered “secure” by web implementers, presumably it would not have been knowingly shipped in web browsers everywhere for the past several years.

As everyone who uses a web browser knows, a web site at present is allowed to ask you for permission, temporarily or permanently (your choice), to access your camera, microphone, and location data. Once you say “yes” to any one of these things, that site can transmit that data anywhere in the world, and use it for any purpose, trivially.

Allow me to provide a worked example.

Suppose I partner with Jeffrey Toobin to make a cybersex conduit site for people who, like him, see the value in quickly switching tabs away from your work meetings to get down to some real business. We launch cyberballsdeep.net, and it’s a big success.

When a user visits our site, they see at most two security-related things:

An allow/deny request for access to the microphone and camera, and
A lock icon indicating that the connection has been signed by a third party warranting that this connection is end-to-end encrypted from the user’s machine to some server somewhere with the secure keys for cyberballsdeep.net.

Assuming you click “allow” - which you have to in order to use the service - the servers at cyberballsdeep.net can now do anything they want with your (very sensitive) video data. They can, for example, record you while you are toobin’ and play it back at any time, anywhere, at their discretion. They could play it on a billboard in Times Square, they could send it to your spouse - anything goes.

So the “security standard” that you are getting, in practice, exactly mirrors the two things you saw:

You know your sensitive data will not be captured unless you click “allow”, and
You know that nobody will be able to see your sensitive data unless either cyberballsdeep.net or the issuing certificate authority lets them (either intentionally, or unintentionally if they’ve been hacked).

That’s it. You don’t know anything else. In practice, you basically have no security guarantees other than a warrant that your sensitive data will go to a particular named party first before it goes somewhere else.

Hopefully we can all agree that this extremely low bar for security is the only hurdle one should have to clear in order to dismiss concerns of “security” as a reason not to implement a feature in a W3C spec. It’s not much, but it is something.

Can raw UDP meet this security standard?

OK, finally, with all that out of the way, this is what I actually wanted someone to point me to when I asked about this on Twitter. I just wanted to see that someone, somewhere, had worked out exactly why UDP could not be made to fit the same security model considered acceptable across other basic web features already deployed and considered “secure”.

Since nobody sent me such a thing, I am still stuck with my own security modeling, with nothing to compare against. My model goes something like this:

Step one - the “allow/deny” step - is easy for raw UDP to provide. The browser is still sitting between the JavaScript/WASM layer and the OS sockets layer, so it can ensure that inbound and outbound packets are filtered any way the browser wishes.

This means that it would be trivial for a browser to only allow UDP packets to and from servers that the user has authorized, as it does with microphone, camera, and location data. Any site that wishes to access raw UDP simply provides a hostname to the browser, and the browser asks the user whether they wish to allow the page to communicate with that site.

Furthermore, since the browser already allows the page to send as much HTTPS data as it wants back to the originating site, one could optionally allow any site to send UDP packets back to its own (exact) originating IP without asking the user. This is not necessary for raw UDP to work, but I can’t think of any violation of “step one” that would happen as a result, so it could be considered.

Note that this is not true for something like camera/microphone/location data. Those are additional data sources to which the page gets access, so if anything, raw UDP permission is less dangerous in terms of user permission, since at no time does the page itself get additional access to the user’s data, regardless of whether they allow UDP communication.
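The “step one” filter described above might look something like the following sketch. It is only a model of the policy, not a proposal for an actual browser API: PagePermissions, judgeOutbound, judgeInbound, and recordAnswer are hypothetical names, and the addresses in the usage note are from the reserved documentation ranges.

```typescript
type Verdict = "allow" | "deny" | "ask-user";

// Hypothetical per-page permission state kept by the browser.
interface PagePermissions {
  originIp: string;                  // exact IP the page's HTTPS content came from
  allowOriginWithoutPrompt: boolean; // the optional relaxation for the originating IP
  allowedHosts: string[];            // hosts the user has approved for this page
  deniedHosts: string[];             // hosts the user has refused for this page
}

// Decide what to do with an outbound datagram before it reaches the OS socket.
function judgeOutbound(p: PagePermissions, destHost: string, destIp: string): Verdict {
  if (p.deniedHosts.indexOf(destHost) !== -1) return "deny";
  if (p.allowedHosts.indexOf(destHost) !== -1) return "allow";
  if (p.allowOriginWithoutPrompt && destIp === p.originIp) return "allow";
  return "ask-user";
}

// Inbound datagrams are delivered only from addresses the page already talks to.
function judgeInbound(contactedIps: string[], sourceIp: string): Verdict {
  return contactedIps.indexOf(sourceIp) !== -1 ? "allow" : "deny";
}

// Store the user's answer to the allow/deny prompt.
function recordAnswer(p: PagePermissions, host: string, allowed: boolean): void {
  const list = allowed ? p.allowedHosts : p.deniedHosts;
  if (list.indexOf(host) === -1) list.push(host);
}
```

For example, a page served from 192.0.2.10 that tries to reach game.example at 203.0.113.5 gets "ask-user" until the user answers the prompt, and "allow" or "deny" afterward. The page’s JavaScript never sees anything it did not already have; the filter only governs where packets may go.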

Which brings us to step two.

As far as I can tell, there’s actually nothing special about step two. The original web page was served by HTTPS, obviously, since that’s the only way the browser supports getting WASM/JavaScript downloaded in the first place. So the originating server and code are already exactly as “secure” as they would be in any other scenario.

The user had to affirmatively allow the destination name, so the page can only send UDP to a specifically approved endpoint.

So the only question is, can the user be sure that the data sent to that endpoint is encrypted such that only the endpoint or the certificate authority can decrypt it?

I can’t know the hivemind of a W3C committee (thank the heavens). But if I had to guess, I would suspect that this is why they didn’t want to allow raw UDP (or raw TCP for that matter). In their mind, it probably seems less secure than HTTPS to allow a web page to implement its own secure UDP protocol.

However, to my mind, this is based upon a flawed assumption. That assumption is that somehow web implementers can be trusted to deploy their encryption keys securely, but cannot be trusted to deploy their protocol securely.

To be more specific, HTTPS can be intercepted trivially if the attacker A) has a machine on the route between the endpoints and B) has access to the server’s keys, or any certificate authority’s signing capability. (A) either happens or it doesn’t - there’s no way to control it - so (B) is really the entire question.

So the notion that allowing web pages to use UDP for transmission is less secure than HTTPS seems to me to be predicated on the notion that web developers can be trusted to do something complicated in one place (run a set of servers without leaking keys), but also cannot be trusted to do something complicated in another (download, for example, a JavaScript UDP encryption library and use it).

Stated alternately, the hard constraint on the client side that you can’t roll your packet code “for security reasons” is nowhere to be found on the server side. There is no requirement anywhere in W3C or anywhere else that says your web server has to be… well… anything at all, really. You can just go ahead and write your own code from top to bottom. You can even have a dedicated web page on your site that has the entire cryptographic key set for the server posted on it for people to cut-and-paste, so everyone can impersonate your server to anyone, anywhere, at any time. You can leave a thumb drive with your keys at the bar. You can generate your keys with a random seed of 0x000000000000000000. Anything goes.

Nobody seems to be panicked about this. Nobody has pushed the policy that the W3C should standardize on a specific web server deployment that you are forced to use, or a set of n of them made by Google/Mozilla/Apple, or what have you. It is just assumed that everyone is allowed to write their own server packet handling, but that no one is allowed to write their own client packet handling.

So that’s what I would like explained. Internet, justify this!

Appendix A.1.2b subsection 12: What about destination consent?

I have seen people mention (but not support) a claim that raw UDP would cause “denial of service” problems because malicious web pages would send UDP packets to random servers in an attempt to overload them. This claim seems completely baseless to me, because there is no reason why you can’t apply the relevant XHR DDoS restrictions to UDP. If DDoS were the concern, just require that UDP packets be sent exclusively within the same domain as the originating code.

Furthermore, you could restrict the port ranges of raw web UDP to some assigned range. A new port range could be explicitly reserved just for raw web UDP if that makes people more comfortable, so it could literally be discarded at the gateway on any network that doesn’t want to support raw UDP for web, making it easier to deal with than UDP attacks from native code and viruses which can choose their ports at will.
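A rough sketch of those two restrictions together, under loudly stated assumptions: the port range below is invented (no such range has been reserved anywhere), the “same domain” test is a deliberate simplification of what browsers actually do for origins, and all of the function names are hypothetical.

```typescript
// Invented range for illustration only; no such range has been reserved.
const WEB_UDP_PORT_MIN = 50000;
const WEB_UDP_PORT_MAX = 50999;

function inWebUdpRange(port: number): boolean {
  return port >= WEB_UDP_PORT_MIN && port <= WEB_UDP_PORT_MAX;
}

// Simplified "same domain" test: an exact match, or a subdomain of the page's host.
// Real browsers apply more careful origin and registrable-domain rules.
function isSameDomain(pageHost: string, destHost: string): boolean {
  const page = pageHost.toLowerCase();
  const dest = destHost.toLowerCase();
  if (dest === page) return true;
  const suffix = "." + page;
  return dest.length > suffix.length &&
    dest.slice(dest.length - suffix.length) === suffix;
}

// Browser side: refuse to send unless both restrictions hold.
function browserMaySend(pageHost: string, destHost: string, destPort: number): boolean {
  return isSameDomain(pageHost, destHost) && inWebUdpRange(destPort);
}

// Gateway side: a network that does not want web UDP can drop it by port alone.
function gatewayDrops(destPort: number, networkAllowsWebUdp: boolean): boolean {
  return !networkAllowsWebUdp && inWebUdpRange(destPort);
}
```

Under these rules a page on example.com could send to api.example.com on port 50001, but not to example.org, and not to port 53 on any host. A network operator who wants nothing to do with any of it needs one rule at the gateway.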

At that point, I fail to see how raw UDP from the browser could be significantly more dangerous than XHR, unless I am missing some particularly clever use of UDP. And again, that’s why I asked for writeups in my original tweet. I’m totally willing to believe I’m missing something, but I want to see a complete technical explanation about what it is.

Now, none of this is the same as saying I can’t see how you would perform DDoS attacks with raw UDP. I certainly can. I just can’t see how you would perform them more easily than with XHR, which obviously is considered “secure”.

As a simple example, suppose a commercial CDN distributes the payload of ddosfuntimes.com. On the main page, there’s an XHR to target.ddosfuntimes.com. Even though the CDN sits on a completely different set of IP addresses from target.ddosfuntimes.com, this is completely legal under XHR policy.

The owners of ddosfuntimes.com can go ahead and set the IP address in their DNS records to point target.ddosfuntimes.com at any server they want, and they will receive all the XHR traffic from every browser that visits the page. And to the best of my knowledge, there isn’t a damn thing the target can do about that.

So unless I’m missing something, XHR already allows you to target any website you wish with unwanted traffic from anyone who visits your site. So why the concern about UDP?

[1] This is way off topic, but in case it struck people as odd: all secure systems have a root trust problem. At some point you have to get something from somebody that you will just blindly trust. This is the root of the chain of trust, and unfortunately, there’s really nothing you can do to make it secure. You just have to hope that this initial exchange is trusted.

So in the case of web browsers, you have to keep in mind that HTTPS doesn’t actually guarantee you anything beyond a chain of trust. You are implicitly trusting that a) nobody messed with the browser when you downloaded it, b) none of the certificate authorities trusted by that browser download have been compromised, and c) the certificate for signing browser root certificate updates hasn’t itself been compromised.

Etc., etc.

So in general, when we talk about adding security to a protocol, we can only talk about securing it up to a point. No matter what we do, there will never be a way for it to be completely secure, because the chain of trust is not infinite, and any of its endpoints (in this case, the browser itself or any certificate authority) can lie to you for as long as it takes for a security firm to catch them doing it.

Hasen Judi

Jul 5, 2022

My sense is that the game they are playing is blame management and plausible deniability.

Without https, it becomes plausible for banks and other websites where security is paramount to blame the web standards for lacking a way to secure connections when they leak sensitive user data.

So what the committees and browser vendors really want is a way for the browsers to easily know that all connections with this site are "secured". Now, if information leaks, the blame is solely on the site operators.

Currently they can do this if the site uses https.

If you introduce UDP to the mix, and tell them "I will encrypt the packets myself", then the browser has no way to tell whether the connection is secure or not, so they will default to telling the user that this website uses an insecure connection.

This would not be so problematic, except I think they want to eventually deprecate non-secure connections.

Efficiency and simplicity are the last things they care about. They will only care about them when someone demonstrates the existence of a clearly superior web application that cannot be implemented without a certain feature. I think this is why wasm got standardized.


Tose Nikolov

Create your own client app. This is very much trying to fit a square peg into a round hole.

If you want to, you can even give your client app an address bar, and let others use your app for their servers. Then you won't even need to touch html or css or JavaScript.


Computer, Enhance!
