As any informed internet citizen of the 2020s will tell you, two things are true. The first is that privacy is important. The second is that the internet does not have a good track record of ensuring privacy.

The services we use every day are being hacked again and again, governments all over the world gather information about our contacts and habits, and employees of messaging giants read their partners' supposedly private messages.

As any informed internet citizen of the 2020s will also tell you, there is ostensibly a solution: end-to-end encryption. End-to-end encryption makes sure that a message can only be read by its intended recipient, even if the government spies on the network, even if the service you’re using has malicious employees, and even if said service gets hacked. The messaging app Signal is perhaps the prime example of an end-to-end encrypted messaging service, and it cannot be overstated how great Signal is for privacy. No one can read your messages, provably. I love Signal, and if you aren’t using it, you should.

Yet, regrettably, end-to-end encryption is not the ultimate solution.

Why not? Because it leaks metadata.

Metadata is everything in a communication except the actual message content itself. It is who you talk to, how often you talk to them and for how long, and who the people you talk to talk to. While end-to-end encryption hides the actual data, it does nothing to protect metadata.
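To make the distinction concrete, here is a minimal sketch that assumes a hypothetical relay server forwarding end-to-end encrypted messages. The `Envelope` type and the `contact_graph` helper are illustrative only and are not Signal's (or anyone's) actual wire format. The server cannot read `ciphertext`, but the routing fields it needs to deliver each message are enough to reconstruct who talks to whom, and how often.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Envelope:
    """Hypothetical view a relay server has of one end-to-end encrypted message."""
    sender: str         # visible to the server (metadata)
    recipient: str      # visible to the server (metadata)
    sent_at: datetime   # visible to the server (metadata)
    ciphertext: bytes   # opaque to the server (content)


def contact_graph(log):
    """Count messages per (sender, recipient) pair, using metadata only.

    Note that the ciphertext is never inspected.
    """
    return Counter((e.sender, e.recipient) for e in log)


# A toy server log with made-up users and placeholder ciphertexts.
log = [
    Envelope("alice", "bob", datetime(2021, 3, 1, 23, 5), b"\x8f\x12"),
    Envelope("alice", "bob", datetime(2021, 3, 1, 23, 9), b"\x01\xfe"),
    Envelope("carol", "dave", datetime(2021, 3, 2, 8, 0), b"\x77\x3a"),
]

print(contact_graph(log).most_common(1))
# → [(('alice', 'bob'), 2)]
```

Adding the `sent_at` timestamps to the analysis would let the same server learn when each pair talks, for instance late at night, without decrypting anything.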

It may seem like it’s not too much of a problem that metadata isn’t protected. After all, what’s a hacker going to do with the fact that you texted your ex 10 times in a row last night? Embarrass you, maybe. While that may or may not convince you that metadata is worth protecting[1], consider the woman who called the domestic violence hotline last week, the government employee who shared files with journalists over the past month, or the Belarusian student who joined an opposition party group chat this morning. For all of them, leaking who they talked to (i.e., metadata) could be disastrous, even if the message contents remain protected.

Indeed, metadata is incredibly powerful. The following quote from former NSA director Michael Hayden puts it in simple terms:

“We kill people based on metadata.” - General Michael Hayden

(Yes, this is a real quote.[2])

Don’t just take it from me or from the former NSA director. Academic research over the past decades has time and time again illustrated the vast amount of information that can be extracted simply from knowing who talks to who, when. Metadata is important, and end-to-end encryption doesn’t hide it at all.

It is therefore time to move beyond end-to-end encryption, and enter complete privacy: when you talk to a friend, no information at all should be revealed, be it content or metadata. No one except you and your friend should be able to know that you two are communicating. End-to-end encryption solves content privacy. To go beyond, we need metadata privacy.

How do we achieve metadata privacy? At a high level, there are two approaches. The first is to design our services such that they delete any records of who is talking to who as soon as technically possible. This is what Signal is doing. While this is a good start, it requires users to trust that the Signal servers are actually running the code that they tell us they are running. It also does not prevent a powerful network observer (e.g., the NSA) from analyzing the timing of network traffic to infer partial metadata, and it does not protect against a hacker who gains access to Signal’s servers.

The second approach is to require cryptographically complete privacy: regardless of what the server is doing, regardless of any network observers, and regardless of hackers, it should be impossible to extract any metadata at all (under standard cryptographic assumptions, e.g., that factoring integers is hard). The second approach relates to the first as end-to-end encryption relates to in-transit-only encryption. With cryptographically complete privacy for metadata, as with end-to-end encryption for content, you need to trust no one but your own computer. It is obvious that we should prefer the second approach to the first.

Today, to my knowledge, no completely private service of any kind exists. There is Tor, but it does not have provable anonymity, and as a consequence, it suffers from many different privacy attacks. The reason complete privacy doesn’t exist yet is that it’s a hard theoretical problem: researchers have studied it for years and years, trying out and inventing new cryptographic and algorithmic solutions. Complete privacy has long been at odds with scalability, and while it still is, recent research has brought complete privacy into the realm of feasibility.

We can do complete privacy now. We can protect metadata. And we should. It is time for us to move beyond end-to-end encryption.

My name is Arvid, and I’m currently working on making a completely private messaging platform. The technical problems are extremely interesting from both a performance engineering perspective and a cryptography perspective, and I believe that we will have to build on and extend recent, exciting research. In addition to being technically interesting, complete privacy is socially important, for the everyday person as well as the dissenter, the whistleblower and the repressed. If this sounds interesting to you (which it should! it is super cool!), and you are an exceptional performance or systems engineer, an ingenious cryptographer, or an excellent communicator and business person passionate about privacy, email me: [email protected].

  1. There’s a good argument to be made for why privacy should be important to you even if you have nothing to hide.

  2. The quote isn’t even out of context. See this YouTube video, at 17m50s. To paraphrase, the point the General is making is that while the NSA kills foreigners based on metadata, the metadata it collects on Americans is not used to kill anyone, and hence there should be nothing for anyone to worry about… Regardless of how comforting you find that context to be, his comments unequivocally highlight the immense power that metadata has.