Vote From Home, Validated but Anonymous
May 20, 2020 / Stefan H. Farr
There is a battle over the Internet which seems to put privacy and security at opposite ends of the front. On one side stand the privacy advocates, who believe privacy is not only a fundamental right but also an important ingredient of a healthy democracy, where people can exercise their free will without fear of being persecuted or abused by powerful organizations (companies and states alike) that might be tempted to dip their allegorical hands as deep as they can into people's private lives, seeking bigger profits, more control, or both. On the other side stand states and crime-fighting organizations, who consider sacrificing privacy a small price to pay to curb cybercrime, which affects so many, not only economically but also personally. Besides the hundreds of billions of dollars lost to crime yearly, countless children are cyber-bullied, people meet their kidnappers online, illegal substances are traded, money is laundered, intellectual property is stolen, government secrets are leaked and state affairs are interfered with.
Anonymous but validated digital identity can transform an online survey into a national election platform
Both sides are right. Both anonymity, which is essential to preserving privacy, and personal validation, which is essential to attributing deeds, are fundamentally important to a healthy cyberspace, and yet they seem irreconcilable. Not only are they difficult to enforce individually, but ensuring both at the same time seems impossible. Yet in some special situations, this combination is indispensable. Such is the case of a democratic public election, which requires a guarantee that voters are eligible to express their choice and, at the same time, a guarantee of their anonymity, to avoid any interference before or after the vote.
In this whitepaper, I am going to explore how Identity Plus, our privacy-centered identity model, can reconcile the two seemingly incompatible requirements and transform a fictional, but otherwise ordinary, online survey platform called "True Survey" into a national election platform. While this exercise is purely theoretical, all technology elements described here exist, and the only things separating this thought experiment from reality are a few logistical steps and the political decisions to make it happen.
Everybody loves online surveys. They are an extremely handy way to poll for opinions and express preference. For most situations online surveys are practical but when stakes get higher, their limitations become obvious.
Corporations, especially big ones with deep hierarchies, rely extensively on surveys to gauge employees' general state of mind. Employee satisfaction is a very important component of corporate strategy, but in order to receive actionable, honest feedback, surveys must ensure the anonymity of the answers.
Most people do not express frustration openly, especially when they feel this can create tension with superiors and potentially jeopardize their career. Yet every time such a survey comes around, people can't shake the feeling that anonymity is not completely ensured. After all, they are registered in the platform using personally identifiable information, which is obvious since surveys usually arrive by email. A unique reference in the URL accurately connects their responses to this personal information, categories are created, reports are run, and there are dashboards that employees cannot access but managers and those who run the survey can. There is uncertainty, and even though the platform usually promises to protect the anonymity of surveyees, the information is there and trust is limited. The company's need for certainty that nobody from outside the company interferes with the results effectively sacrifices the guarantee of anonymity. In a democratic election this simply won't do.
The trouble lies in how we connect real people to their digital identities, in a way that puts companies in control of people's information.
The Identity Anchor
A digital identity is a highly intangible thing. I create an account in a service but what is that account? How does that account identify me, the real person? How does the service communicate with me? How does it know it's still me when I come back? These are all very basic needs of any service-person relationship. The minimum necessary, and most commonly used today, is an email address and a password. They are convenient. An email address is a unique, pre-existing external anchor that can also be used to reach me. In fact, email addresses have become convenient online identity anchors for people in general: I am my email address in the digital space. But since email addresses are public in nature, we need passwords. They are supposed to be shared secrets (known only by the service and me), so that if I am able to provide the password upon return, the service can be confident that I am the person who created the account. This is necessary to prevent anybody who knows my email address from impersonating me.
This setup proved very useful through the history of the Internet, but lately, as privacy is ever more abused, the principles behind public anchors are being questioned. And for cases where anonymity is important, the very act of using such public identity anchors - like email - is a broken practice. It seems like an impossible problem, but the answer to it is actually quite simple.
The privacy problem with digital identity anchors is not necessarily who the anchor sits with, but rather how we use it, or better said, how we anchor things to it.
Today's go-to solution is the value share solution, in which the identity anchor is shared either by the holder of the identity - me - or by the custodian of the identity - in this case, my email service - and copied by the identity consumer. In either case, whether I use an email + password combination or a single-sign-on protocol, the identity consumer will retain the value of my digital identity anchor. The process repeats for every service I sign up with using the same email address or the same single-sign-on provider. With this model, knowledge of the digital identity anchor spreads like a virus.
Not only that, but the vast majority of services also collect other information like my name, date of birth, my address, phone number (a shipping company for instance must know my address, otherwise it would not be able to deliver the goods to me). To add insult to injury, these services share data with other services in exchange for metrics and analytics to improve their market position. Vast quantities of information flow from all directions into these aggregators, accompanied with shared identity information. Correlating information from many individual sources is so easy that identification of the real person behind the digital identity is practically unavoidable.
Single-sign-on is a mechanism through which the identity provider ensures the identity of the person sitting behind the browser and can forward this information to partner services that integrate these single-sign-on protocols. It is very convenient, as the person only needs to sign in once to have access to any of these partner services. It's fairly straightforward to see that the "authentication status" of the user can be communicated by the identity provider to the consumer with a simple, depersonalized "yes" message. However, determining the identity of the user at the identity consumer based on the identity of the user at the provider requires the exchange of some common knowledge. This is where personally identifiable information, like an email address, is used.
Traditionally, this looks something like this: I open my browser and go to "truesurvey.com". I am not signed in just yet, so I go to the menu and choose to log in with "emailservice.com". I am redirected to "emailservice.com", where I sign into my email (if not already signed in), and the service redirects me back to "truesurvey.com". At this point True Survey knows that I am logged in, but it has no idea which of its accounts belongs to me. To overcome this problem, "Email Service" will provide my email address - my digital identity reference, which it also provided when I signed up with "True Survey" - so now "True Survey" can look up all the data that is associated with me in its database. It was the process of association that required the common information, and it was facilitated by sharing my personal information.
That said, spreading people's identities all over the Internet and allowing organizations to track them everywhere is not a necessity for person-to-business relationships across the digital space. It is but one solution to the problem, which has unfortunately become wildly popular and the go-to solution for pretty much every business. Alternatives exist, albeit they require a slightly different type of single-sign-on.
To achieve that, the process can be reversed such that at sign-up time, when the common reference is established, instead of the identity provider sharing the common reference (my email address, as it was in the previous example), the reference is provided by True Survey.
Let's say I am signing up with True Survey, which now tells the identity provider to bind "id1" to my identity anchor. Note that "id1" at this stage is a completely random but unique reference: my local identity anchor at "True Survey".
"Identity Provider" signs me in and associates "id1" with my identity anchor in such a way that when "True Survey" requests in the future that I be authenticated, "Identity Provider" can supply back that reference "id1" - as was agreed at sign up time.
From an authentication perspective, the capability is identical: the identity provider will always provide a stable anchor for the returning visitor. From a privacy perspective, however, things look far better. The identity references used to correlate my accounts are unique to each identity provider - consumer relationship. The process is repeated for each identity consumer, so with this new model I will end up having distinct digital identities at each identity consumer service, in such a way that it is practically impossible to correlate them globally. Evidently, some services will need to hold personal information on me (by the nature of our relationship), but with this enclaved identity model, these private data points can be the exception rather than the norm.
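The reversed sign-up flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `IdentityProvider`, its method names and the second consumer "someshop.example" are hypothetical and do not describe any actual Identity Plus interface. The point it shows is that each consumer supplies its own random reference, and the provider hands back only that reference, never the anchor itself.

```python
import secrets


class IdentityProvider:
    """Toy identity provider (hypothetical API): one private anchor per
    person, one consumer-supplied reference per service."""

    def __init__(self):
        # (consumer name, local id) -> anchor; the anchor is never returned
        # to any consumer.
        self._bindings = {}

    def new_anchor(self):
        # The anchor is just a random value with no personal data attached.
        return secrets.token_hex(16)

    def bind(self, anchor, consumer, local_id):
        """Sign-up: the consumer supplies local_id, the provider stores it."""
        key = (consumer, local_id)
        if key in self._bindings:
            raise ValueError("local id already bound at this consumer")
        self._bindings[key] = anchor

    def authenticate(self, anchor, consumer):
        """Return the reference agreed with `consumer` at sign-up, or None."""
        for (name, local_id), bound in self._bindings.items():
            if name == consumer and bound == anchor:
                return local_id
        return None


provider = IdentityProvider()
me = provider.new_anchor()

# Each consumer picks its own random reference at sign-up time.
survey_id = secrets.token_hex(8)
shop_id = secrets.token_hex(8)
provider.bind(me, "truesurvey.com", survey_id)
provider.bind(me, "someshop.example", shop_id)

# On return, each consumer only ever sees its own reference.
print(provider.authenticate(me, "truesurvey.com") == survey_id)
```

Because the two references are generated independently, the two consumers hold nothing in common that they could use to correlate the accounts.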
Enabling Minimum Knowledge
Besides the unique anchor, there is another important problem to solve before I can go largely anonymous on the Internet and still go about my day-to-day as I normally would, without having to spend half my day trying to hide my path. Of course, the purpose of this anonymity is not to completely disappear from the map. This is not an exercise for political exiles or those hiding from justice, but rather a path to an effective, privacy-conscious future: a future in which personal information sharing is explicit, and in which it is practically impossible to correlate information on individuals outside a thorough investigation where legal procedures can be used to compel multiple services to give up information on a particular virtual identity.
Due to the need to communicate with customers, most services collect at least their email address. Communication is a basic need and, since email addresses are globally unique, they permit correlation of identities in data aggregators even if the digital identity itself is enclaved at each service. This practice, again, is not due to technical limitations, but rather to the model of communication with the customer.
Services collect email addresses from the customers so that they can in turn contact the email delivery service to send the messages to the customers. This is how the email protocol is defined - one needs to know the email address to be able to send another an email. This chain reaction however can be short-circuited by adding a simple message forwarding functionality to the identity provider service, so that the identity provider will also become the message delivery service.
Identities must be unique per customer at each service and since each service is unique, the combination of the two is also unique. This is how email addresses get to be unique across the Internet. I can have my-name@service-one.example and my-name@service-two.example, and those are two unique email addresses even though I used my-name in both cases.
Similarly, identity-consumer-one/id1 will be unique in the database of the identity provider service, so if we switched the model such that the identity provider held the email address of the owner, Identity Consumer 1 could simply instruct the identity provider to send a message to identity-consumer-one/id1, which uniquely identifies the individual from the identity consumer service.
With this new setup, the identity service still has access to email addresses in order to be able to deliver email messages to identity holders, as required by the email protocol. Nevertheless, the model reduces the exposure of the email address drastically, as a good chunk of the services people use require no more than being able to uniquely identify people time and time again and to send them messages. After all, bills can be sent this way and online payments are widely available, so unless there is a strong need like the delivery of goods, services can function with no more than a first name. In fact they don't even need a first name, but things start to become really weird that way. There is, however, room to segregate knowledge of personally identifiable information even further, in such a way that no single service in the entire chain of services a person uses will know more than what is strictly necessary to perform the individual function the consumer actually needs.
To exemplify such segregation in the context of our example, we configure our setup such that the email service also authenticates the user via the same identity provider "True Survey" does. To ensure identities are enclaved, instead of the identity provider telling the email service what the identity is, the email service will generate a random local id (other than the email address, of course) and exchange that with the identity provider. As a result we will have three enclaved identities: id(1) at the survey service, the anchor at the identity service and id(x) at the messaging service.
Even though it is impossible for our survey service to directly instruct the messaging service to deliver a message to the person - there is no way to correlate id(1) with the corresponding id(x) directly - it can instruct the identity provider to broker the message to the messaging service, because the identity provider retains the ability to correlate the two identity spaces, even though it operates in both cases with random, non-sensitive references:
survey-service/id(1) ... identity-service/anchor ... messaging-service/id(x) ... person
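The brokered delivery chain above can be sketched as follows. The class and method names (`MessagingService`, `IdentityBroker`, `relay`), the identifiers and the address "person@mail.example" are all hypothetical, and the `outbox` list merely stands in for real email delivery. What the sketch shows is that the survey service addresses the person only by its own local id, while the real address is known only to the messaging service.

```python
class MessagingService:
    """Holds the real address, keyed by its own enclaved id (hypothetical)."""

    def __init__(self):
        self._addresses = {}  # local id at the messaging service -> address
        self.outbox = []      # (address, text) pairs; stands in for delivery

    def register(self, local_id, address):
        self._addresses[local_id] = address

    def deliver(self, local_id, text):
        self.outbox.append((self._addresses[local_id], text))


class IdentityBroker:
    """Correlates enclaved ids through the anchor; never sees an address."""

    def __init__(self, messaging):
        self._messaging = messaging
        self._anchor_of = {}      # (service, local id) -> anchor
        self._mailbox_id_of = {}  # anchor -> local id at the messaging service

    def bind(self, service, local_id, anchor):
        self._anchor_of[(service, local_id)] = anchor

    def bind_mailbox(self, anchor, mailbox_id):
        self._mailbox_id_of[anchor] = mailbox_id

    def relay(self, service, local_id, text):
        # survey-service/id(1) -> anchor -> messaging-service/id(x)
        anchor = self._anchor_of[(service, local_id)]
        self._messaging.deliver(self._mailbox_id_of[anchor], text)


mail = MessagingService()
mail.register("id-x", "person@mail.example")

broker = IdentityBroker(mail)
broker.bind("survey-service", "id-1", "anchor-42")
broker.bind_mailbox("anchor-42", "id-x")

# The survey service only knows "id-1"; it never learns the address.
broker.relay("survey-service", "id-1", "Your survey is ready")
print(mail.outbox)
```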
The ability to send communication to the real person without having to hold any personally identifiable information on that person opens the path towards a truly privacy-conscious identity system. After all, an identity provider need not know any information about the user. Its sole purpose is to ensure the uniqueness of the entity in digital space, to provide a stable anchor to which local (enclaved) identities can be bound, and to facilitate transactions between parties such that each service holds only the minimum necessary knowledge on the person.
When users sign up and allocate their digital identities with such an anonymous identity provider, things look very strange indeed. Such digital identity is extremely intangible in itself, but that is actually part of its beauty and strength. It represents the guarantee that personal information will remain isolated with those that actually need it, which raises an important aspect of our digital presence: who really needs highly specific information and who can settle for the guarantee that certain conditions are met.
Things like being above a certain age, being the resident of a country or city or even unique fingerprints about my state issued identity can be crafted into simple guarantees that are nebulous enough not to uncover my identity but are sufficient to provide the necessary guarantees that I comply with the service requirements or the jurisdictional requirements the service must in turn comply with.
In between them, the services that do know certain hard facts about me - the kind of information that can bridge the gap between my digital presence and me as an individual - nowadays know pretty much every detail of my life with eerie accuracy. To name just a few,
This is not only fine but even desirable - it is the only way I can receive quality service from them - but only under the condition that information stays segregated with those that really need it, of course. At the same time, this knowledge, even in this segregated form, opens the prospect for these services to provide depersonalized guarantees about me to those who need such guarantees but do not necessarily need or want to hold the exact information on me.
An election platform needs to know whether I have reached the minimum voting age (18 or 21, depending on the state), but it does not need to know my exact age or my birthday. For a social network it is convenient to know my birthday, so it can suggest it to my friends, but for that it only needs to know the month and day, not the year of my birth. From a privacy perspective, such depersonalized guarantees are highly desirable. They tell services what they need to know, but keep me in control of my information.
The identity broker seems the perfect spot to exchange such guarantees. Similarly to how a message can be sent to a person via the broker, other types of information can also be exchanged. It is only a matter of having the ability to publish the need for certain guarantees, having services that are able to provide such guarantees, having established trust between the parties, and relaying the message.
Validation and the Network of Trust
Once you start using your anonymous identity to authenticate into services, things will radically change in terms of what that digital identity represents. It is no longer an empty hook in the digital space - with no name or any other information attached to it - but rather the anchor that tethers together the services you use in this environment: your email box, the space where you keep your family photos, where you do your shopping, your bank accounts. It's like the root of a tree, keeping all the branches together, and even though the branches cannot correlate the information among them, the sheer dependability of the root gives the digital identity weight.
It's easy to see how the simple knowledge that a digital identity binds together 10 services is more valuable in terms of trust to the identity consumer service than an identity that has no such information associated. Guarantee-based validation is aimed at providing stronger validation of your real persona while keeping your private information private. Instead of having to disclose your details to each service, services can associate depersonalized tags with your identity which other services can receive, giving them even more confidence in the person your digital identity stands for: you.
Let's assume that you are completely anonymous to "True Survey", but attached to that anonymity you have some tags from your bank, "Trusted Bank", that state simple guarantees like "is 21 or older" or "has driver's license". It does not sound like much, but in most cases that's all a company needs to know to comply with state regulations. And even if regulatory compliance is not an issue, "True Survey" now knows a great deal more about your trustworthiness than it would if it got your email address. Anybody can get a phony email address; they are free these days.
A guarantee is an exceptional exercise of trust. In a free world, anybody can state anything, but such statements are worthless if the guarantor, the authority that issues the guarantee, is not trusted by those that consume the information. There has to be a predefined trust relationship between authority and consumer, which extends beyond cyberspace, but once such a relationship is established in the real world, configuring it on the identity broker is no different from establishing friendship on a social network service. With that information at hand, the identity broker can figure out which service seeks certain guarantees and which of the services that can provide that guarantee is trusted. Should a user have an account at both services, the identity broker can then mediate the guarantee, with the user's permission, of course.
The user's ability to interfere with the guarantee is limited to allowing or blocking the guarantee in its entirety. This gives people absolute control to decide who they accept guarantees from and who can see those guarantees. This is a powerful privacy failsafe, but under normal circumstances it is in the best interest of the person to grant permission for guarantees to be forwarded, as not forwarding is equivalent to not having the guarantee at all, which may limit functionality on the receiving service side. By the same token, the mechanism ensures that services will only receive guarantees from trusted sources and that users cannot manufacture guarantees for themselves.
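A minimal sketch of this mediation logic, under stated assumptions: `GuaranteeBroker` and its methods are hypothetical names, "Unknown Site" is an invented untrusted issuer, and a real broker would also have to authenticate the authority that issues a tag, which is omitted here. The sketch shows the three conditions the text describes: the tag comes from an authority, the consumer trusts that authority, and the user has allowed forwarding.

```python
class GuaranteeBroker:
    """Toy guarantee broker (hypothetical API)."""

    def __init__(self):
        self._trust = set()    # (consumer, authority) trust relationships
        self._tags = {}        # (anchor, authority) -> set of tags
        self._allowed = set()  # (anchor, authority, consumer) user consents

    def trust(self, consumer, authority):
        self._trust.add((consumer, authority))

    def issue(self, authority, anchor, tag):
        # Only authorities call this; there is no user-facing way to add tags.
        self._tags.setdefault((anchor, authority), set()).add(tag)

    def allow(self, anchor, authority, consumer):
        # The user's only lever: allow (or withhold) a guarantee as a whole.
        self._allowed.add((anchor, authority, consumer))

    def guarantees_for(self, consumer, anchor):
        """Tags the consumer may see: trusted issuer and user consent."""
        visible = {}
        for (owner, authority), tags in self._tags.items():
            if owner != anchor:
                continue
            if (consumer, authority) not in self._trust:
                continue
            if (anchor, authority, consumer) not in self._allowed:
                continue
            visible[authority] = set(tags)
        return visible


broker = GuaranteeBroker()
broker.trust("True Survey", "Trusted Bank")
broker.issue("Trusted Bank", "anchor-42", "is 21 or older")
broker.issue("Unknown Site", "anchor-42", "is 21 or older")

before = broker.guarantees_for("True Survey", "anchor-42")  # no consent yet
broker.allow("anchor-42", "Trusted Bank", "True Survey")
broker.allow("anchor-42", "Unknown Site", "True Survey")
after = broker.guarantees_for("True Survey", "anchor-42")
print(before, after)
```

Note that consenting to forward the tag from "Unknown Site" changes nothing, because True Survey never declared trust in that issuer.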
The model ensures that everybody has complete visibility and control over the information they base their decisions on.
Strong Authentication with PKI Device Identity
Passwords have been recognized as the headache of digital security for quite some time. As technology has grown more sophisticated and compute power ever easier to come by, it has become harder and harder to secure online accounts with a password. The way the human mind learns and retains things is exactly the opposite of what passwords need to be effective. The human mind works quite differently from computers, and there are many situations where the human brain excels and computers do poorly; sadly, passwords are not one of them. If a password is easy to remember, it is easy to crack, and not remembering is not an option, as you risk losing access to your information online. For that reason people use easy-to-remember secrets and reuse them across sites, a fact that is true not only of the average person but also of system administrators and software developers. Countless security companies and products have been breached due to easy-to-guess or built-in passwords. This is why you don't want people to log into your election platform with their social media accounts. You need a stronger form of authentication where people cannot be impersonated.
PKI, short for Public Key Infrastructure, is a collection of trust and secret exchange protocols built on public key cryptography, the kind of cryptography that uses two mathematically connected keys, one for encryption and the other for decryption. The beauty of the system is that it enables parties to exchange secrets or establish authenticity without pre-shared secrets like encryption keys or, more commonly, passwords. Since the protocols use different keys for encryption and decryption (in schemes such as RSA the keys are interchangeable: either one can encrypt, with the limitation that only the counterpart can decrypt the resulting cipher text), one can be shared with the public while the other can be kept a true secret (shared secrets are not secrets). The resulting setup has massive potential in the identity and relationship space.
Pick a random secret and encrypt it with the publicly shared key. Only the possessor of the private key will be able to decrypt it, which means it will stay safe even if exchanged across a hostile network.
Encrypt a random text and challenge the other party to decrypt it and tell you what the original text was. If they can do it, it means they have the private key and they are who they claim to be - the original.
Encrypt the hash of any electronic content with your private key and publish it with the document. If anybody changes even one character in the document, the decrypted hash will no longer match and readers will know it has been tampered with.
Say something about somebody and encrypt the hash with your private key. The person can claim you said it and everybody can verify it with your public key. Such certifications can be daisy-chained and the authenticity of the entire chain can be established.
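The challenge and signature capabilities listed above can be illustrated with textbook RSA using only the Python standard library. This is a deliberately insecure toy, not how production PKI is implemented: the two primes are well-known Mersenne primes, there is no padding, and all function names are invented for this sketch. It assumes Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib
import secrets

# Toy key pair. INSECURE: the primes are publicly known Mersenne primes and
# textbook RSA without padding must never be used in practice.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def encrypt_public(number):
    """Anyone can do this with the public key (N, E)."""
    return pow(number, E, N)


def decrypt_private(cipher):
    """Only the holder of the private exponent D can do this."""
    return pow(cipher, D, N)


def digest(document):
    """SHA-256 hash of the document, reduced to fit under the toy modulus."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document):
    """'Encrypt' the hash with the private key."""
    return pow(digest(document), D, N)


def verify(document, signature):
    """Recover the hash with the public key and compare."""
    return pow(signature, E, N) == digest(document)


# Challenge-response: the prover shows possession of the private key.
challenge = secrets.randbelow(N)
answer = decrypt_private(encrypt_public(challenge))

# Signature: changing a single character invalidates it.
signature = sign(b"ballot")
print(answer == challenge, verify(b"ballot", signature),
      verify(b"ballot!", signature))
```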
If you have ever visited a site over HTTPS, you have benefited from several capabilities of PKI. When your browser connected to the service, the service presented a certificate which contained the domain name of the service, signed by a third party authority your browser trusts. Your browser then made up a random secret encryption key and encrypted it with the public key of the service, which was included in this certificate. As a result, an encrypted communication channel was established which guarantees both the privacy of your communication with the service and that you are communicating with a trusted service.
The service, however, does not know who it communicates with. That is why you need to authenticate into the service with either a username and password or an identity provider, which will in turn authenticate you with a username and password. But passwords are not only weak security measures, they can also be used from anywhere: while your browser has the certainty that it connects to a server that belongs to the service, the service has no guarantee that the device sending over your password is your device rather than any of the other 8 billion devices connected to the Internet.
Identity Plus will provide you with a certificate similar to the one the service presented to your browser, but this time its role is to certify that the device belongs to your digital identity. Your browser is quite capable of operating with this certificate, which is called a client certificate, as the SSL/TLS protocol underneath HTTPS has supported mutual authentication since the 1990s. It is the cryptographic way of saying "this is my computer" (or "these are my devices", should you have more than one) and my account should not be accessible from anywhere else. From now on, the service no longer expects you to contact it from any of the potentially 8 billion devices, but rather from the handful of devices you designated yourself. Your machine is your key to your digital identity, and your password or fingerprint is the key to your machine. Neither your secrets nor your personal details need leave the premises of your device, preserving your anonymity while access to your digital identity is ensured by one of the strongest forms of authentication available.
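On the service side, requiring a client certificate is a standard TLS setting. The sketch below uses Python's `ssl` module to build a server context that refuses connections without a valid client certificate. It illustrates the general mutual TLS mechanism only, not the actual Identity Plus integration; the function name and the file path parameters are hypothetical, and the files are only loaded when paths are supplied.

```python
import ssl


def build_server_context(server_cert=None, server_key=None, client_ca=None):
    """TLS server context that demands a client certificate.

    All three parameters are hypothetical file paths (PEM format).
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

    # Reject the handshake unless the client presents a certificate that
    # chains to a trusted issuer.
    context.verify_mode = ssl.CERT_REQUIRED

    if server_cert is not None:
        # The certificate the service presents to the browser.
        context.load_cert_chain(certfile=server_cert, keyfile=server_key)
    if client_ca is not None:
        # The authority whose signature on device certificates is trusted.
        context.load_verify_locations(cafile=client_ca)
    return context


context = build_server_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
```

After a successful handshake, a server would typically read the verified certificate with `SSLSocket.getpeercert()` and map it to the local account.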
The Election Machine
Electronic elections would be such a convenient way to exercise our democratic preference for leadership. No more standing in lines, no problem if you're not at home or even in a foreign country, counting is a breeze, logistics are minimal, results are instantaneous and the cost of organizing is a fraction of what it is today. No wonder it's been considered time and again, but to this day it has eluded practice because the prerequisites have been so difficult to satisfy. With traditional digital identity models, people are either validated or anonymous, and it's hard enough to ensure even one of these at a time, but with anonymous validation we have the ingredients to turn this around.
People can sign up for a completely anonymous digital identity. Having no personally identifiable information about the user, the identity platform is unable to share any with the electronic election platform.
Since the identity platform is anonymous, so is any local account the person signs up with. Unless people choose to disclose specific information to the platform, the platform has no way of obtaining such information from anywhere else.
Services that do collect, or have already collected, information on the individual can supply the necessary anonymized guarantees by integrating current accounts with the anonymous identity platform and using it as a guarantee broker.
Once trust has been established between the election platform and the services capable of providing guarantees, all the user needs to do is bind their digital identity to their already existing service account.
With PKI in place, it is not only the person that is validated but also the devices they log in from. Logging in from any other device in the world is practically impossible, so impersonation is not an issue.
Use the same anonymous but validated identity to ensure the origin of electoral messages and prevent the electorate from being misled by fake news and fake influencers.
Identity Plus is a cryptographic identity broker that gives you a strong digital identity that requires no personal information. It gives you the possibility to link the devices you own to your digital identity using public key cryptography, and allows services which authenticate you via your Identity Plus identity to only allow access from your predefined devices and no other device in the world.
With such privacy-focused strong authentication in place, True Survey, the imaginary survey platform, can now safely run this year's presidential elections. You can sign up anonymously and exercise your constitutional right, or even return and change your mind anytime up until the closing of the polls. Validation comes as depersonalized information from state-accredited online institutions capable of providing the necessary guarantees that you are indeed entitled to vote in a certain ballot. All that needs to be done to obtain such validation is to bind your Identity Plus digital identity to any one of these accredited services: your bank, your favorite online store or any other service that has the ability to check your state-issued identification and issue an anonymized unique fingerprint about you.
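The voting rules described above (one vote per anonymized fingerprint, eligibility supplied as a guarantee, and the option to change one's mind until the polls close) can be sketched as a small ballot box. The class `BallotBox`, its methods and the fingerprint values are hypothetical. The sketch covers only the counting rules and assumes the fingerprint and eligibility flag have already arrived through the guarantee broker.

```python
from collections import Counter


class BallotBox:
    """Toy ballot box keyed by anonymized fingerprints (hypothetical API)."""

    def __init__(self, options):
        self._options = set(options)
        self._votes = {}   # fingerprint -> currently chosen option
        self._open = True

    def cast(self, fingerprint, eligible, option):
        if not self._open:
            raise RuntimeError("polls are closed")
        if not eligible:
            raise PermissionError("no eligibility guarantee for this voter")
        if option not in self._options:
            raise ValueError("unknown option")
        # A repeated vote from the same fingerprint replaces the earlier
        # one, so each person counts once however often they return.
        self._votes[fingerprint] = option

    def close(self):
        self._open = False

    def tally(self):
        return Counter(self._votes.values())


box = BallotBox(["Candidate A", "Candidate B"])
box.cast("fp-1", True, "Candidate A")
box.cast("fp-2", True, "Candidate B")
box.cast("fp-1", True, "Candidate B")  # fp-1 changes their mind
box.close()
result = box.tally()
print(result)
```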
That's All It Takes
It's secure, tamper-proof and convenient. There is no need for purpose-built machines, no need for massive personnel or to reorganize cities to accommodate the millions of voters who flock to exercise their democratic right. In a world where time is so precious, where people are busy with their own lives and young generations expect everything at the click of a button, it's time for elections to shed their physical existence and move into the digital space.