In an attempt to understand the technical side of the cryptocurrency craze, I read the book Bitcoin and Cryptocurrency Technologies (http://bitcoinbook.cs.princeton.edu). It was a great book and I’d highly recommend it. What follows is something of a book report on the opening chapters: a look under the hood at what a blockchain is.
Act I, Primitives
A blockchain is composed of a few primitive building blocks. The first is called a hash function. Essentially, a hash function takes data of any size as input and produces a fixed-length output. The same input will always produce the same output. You can play around with one hash function, MD5, here: https://passwordsgenerator.net/md5-hash-generator/.
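To make those two properties concrete, here is a minimal sketch using Python's standard hashlib module. It uses SHA-256 rather than the MD5 tool linked above, and the helper name sha256_hex is just something I made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


short_digest = sha256_hex(b"hi")
long_digest = sha256_hex(b"a much longer input " * 1000)

# Fixed-length output: both digests are 64 hex characters (256 bits),
# no matter how large the input was.
print(len(short_digest), len(long_digest))

# Deterministic: hashing the same input again gives the same digest.
print(sha256_hex(b"hi") == short_digest)
```

Change even one character of the input and the digest comes out completely different, which is what makes the tamper check in the next paragraph work.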
If you’re wondering why a hash function is useful, one use case is checking whether something has been tampered with. An example is Dropbox (note, I don’t know how Dropbox actually works). Imagine you put a 5MB image file in your Dropbox folder. It gets uploaded to Dropbox, syncs across all your devices, and now that image is copied to the cloud, your iPhone, maybe even another computer. A week goes by and Dropbox might be wondering, did you edit that picture? How can they tell? They could upload the entire 5MB image and check if it’s the same, bit by bit. But that’s expensive, and you might have thousands of other pictures to check next. Here’s one place a hash function is useful. If they store a 256 bit hash of the image, they can verify it hasn’t changed between two locations by comparing just the hashes. They could even hash the whole image folder, or even your whole Dropbox folder, and verify nothing inside has changed.
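Here is a rough sketch of that comparison. To be clear, this is my own illustration and not how Dropbox actually works: the 5MB "image" is just random bytes, and the names fingerprint, image_in_cloud, and so on are invented for the example.

```python
import hashlib
import os


def fingerprint(content: bytes) -> str:
    """256-bit SHA-256 fingerprint of a file's content, as hex."""
    return hashlib.sha256(content).hexdigest()


# Stand-in for a 5MB image: random bytes instead of a real picture.
image_in_cloud = os.urandom(5 * 1024 * 1024)
stored_hash = fingerprint(image_in_cloud)  # 32 bytes instead of 5MB

# An untouched copy on another device.
image_on_phone = bytes(image_in_cloud)

# A copy where a single bit was flipped.
edited = bytearray(image_in_cloud)
edited[0] ^= 1
image_on_laptop = bytes(edited)

# Only the short hashes need to be compared, not the 5MB files.
print(fingerprint(image_on_phone) == stored_hash)   # unchanged copy
print(fingerprint(image_on_laptop) == stored_hash)  # edited copy
```

Each device can compute the hash locally, so the only thing that has to cross the network is the 32-byte fingerprint.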
But what if the picture did change but the hash output stayed the same? Now Dropbox would incorrectly think the picture was unchanged and that becomes a bug in their syncing. In the hashing world, this is called a collision. Collisions can’t be ruled out in theory, because there are more possible inputs than outputs. In practice they are avoided by spreading outputs uniformly across a large enough output space. For a 256 bit hash, there are 2^256, or approximately 10^77, possible outputs. The chance of stumbling on an input that collides with a given picture’s hash, even if you had quadrillions of computers guessing input values for quadrillions of years, is for all practical purposes 0.00%. If Dropbox uses a 256 bit hash they can sleep soundly at night knowing there is no syncing bug caused by collisions.
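A back-of-the-envelope version of that claim, as a sketch. The guessing rate is purely my assumption (a billion hashes per second per computer), I'm reading "quadrillions" as 10^15, and I'm assuming every output is equally likely. It estimates the chance that any one of the guesses matches one particular 256 bit hash.

```python
import math

outputs = 2 ** 256                       # possible 256 bit hash values
computers = 10 ** 15                     # "quadrillions of computers"
years = 10 ** 15                         # "quadrillions of years"
seconds_per_year = 31_557_600            # 365.25 days
hashes_per_second = 10 ** 9              # assumed rate per computer

guesses = computers * years * seconds_per_year * hashes_per_second

# With uniformly distributed outputs, each guess matches one specific
# target hash with probability 1 / outputs, so the chance that any
# guess matches is at most guesses / outputs.
chance = guesses / outputs

print(f"outputs are about 10^{math.log10(outputs):.0f}")
print(f"guesses are about 10^{math.log10(guesses):.0f}")
print(f"chance of a match is below {chance:.1e}")
```

Under those assumptions the chance comes out far below one in a billion billion billion, which is why it rounds to 0.00%.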
It’s important to note that when talking about cryptocurrencies, we’re talking about hash functions that are collision proof in practice: collisions exist, but nobody can feasibly find one. There’s a name for such hash functions; they’re called cryptographic hash functions, and the usual term for this property is collision resistance. On top of being collision resistant, cryptographic hash functions hide their input. This means the output cannot be deconstructed to draw any conclusions about the input. When you have the output of a cryptographic hash function, you can treat it as standing for one particular input, but you can say nothing about what that input was.
Act II, Hash Pointer
Now that we have hash functions, the next building block is a hash pointer. A hash pointer is a pointer, or reference, to some data, together with a hash of that data. It doesn’t include the data itself; it just includes its location and a hash of it. A hash pointer allows us to do two things: 1. use the pointer to retrieve the data; and 2. verify, via the hash, that the data hasn’t changed.
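A hash pointer is small enough to sketch in a few lines. This is my own minimal version, assuming the "location" is just a key in a Python dict and the hash is SHA-256; the names HashPointer, point_to, and fetch_verified are made up for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class HashPointer:
    location: str  # where the data lives (here, a key in a dict)
    digest: str    # SHA-256 of the data at the time the pointer was made


def point_to(storage: dict, location: str) -> HashPointer:
    """Create a hash pointer to the data currently stored at `location`."""
    data = storage[location]
    return HashPointer(location, hashlib.sha256(data).hexdigest())


def fetch_verified(storage: dict, pointer: HashPointer) -> bytes:
    """1. Retrieve the data; 2. verify it still matches the stored hash."""
    data = storage[pointer.location]
    if hashlib.sha256(data).hexdigest() != pointer.digest:
        raise ValueError(f"data at {pointer.location!r} has changed")
    return data


storage = {"photo.jpg": b"original pixels"}
pointer = point_to(storage, "photo.jpg")
print(fetch_verified(storage, pointer))  # data is returned unchanged

storage["photo.jpg"] = b"edited pixels"  # someone modifies the data
try:
    fetch_verified(storage, pointer)
except ValueError as error:
    print("tampering detected:", error)
```

Note that the pointer never holds the data, only where to find it and what its hash should be.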
Act III, Blockchain
A hash pointer works if you want to verify a static chunk of data hasn’t changed. But what if we want to store a sequence of data over time and still be able to verify none of it has changed? One way to structure that is to start with the first piece of data and a hash pointer to it. If we then want to add more data, we can group the new data with the hash pointer, and then store a hash pointer to them combined.
We end up with what we call a linked list of hash pointers. What that means is we have a hash pointer and some data, which points to another hash pointer with some data, and so on. If any data gets tampered with, the link that points to it will hold an incorrect hash. If someone tries to tamper with that hash to cover their tracks, the next hash pointer up will have an incorrect hash. This goes on and on, so that in order to change even one bit of data, they’d have to modify every hash all the way up to the head of the chain. If you knew just the root hash pointer, the one at the head, you could verify whether the chain was tampered with.
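Here is a minimal sketch of that linked list of hash pointers. It is an illustration under my own assumptions, not Bitcoin's actual block format: the chain is a plain Python list, each block stores only the hash half of the pointer to the previous block (the list position plays the role of the location), and GENESIS is a made-up placeholder for "no previous block".

```python
import hashlib
from dataclasses import dataclass

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


@dataclass
class Block:
    data: bytes
    prev_hash: str  # hash of the previous block (or GENESIS)


def block_hash(block: Block) -> str:
    """Hash the block's data together with its pointer to the previous block."""
    hasher = hashlib.sha256()
    hasher.update(block.prev_hash.encode())
    hasher.update(block.data)
    return hasher.hexdigest()


def append(chain: list, data: bytes) -> str:
    """Add data to the chain and return the new root (head) hash."""
    prev_hash = block_hash(chain[-1]) if chain else GENESIS
    chain.append(Block(data, prev_hash))
    return block_hash(chain[-1])


def verify(chain: list, root_hash: str) -> bool:
    """Walk back from the head, checking every hash pointer on the way."""
    expected = root_hash
    for block in reversed(chain):
        if block_hash(block) != expected:
            return False
        expected = block.prev_hash
    return expected == GENESIS


chain = []
for entry in (b"first", b"second", b"third"):
    root = append(chain, entry)  # remember only the latest root hash

print(verify(chain, root))       # untouched chain checks out

chain[0].data = b"forged"        # tamper with the oldest block
print(verify(chain, root))       # the remembered root exposes it
```

Even if the forger went on to recompute every prev_hash after the change, the hash of the last block would no longer equal the root hash you remembered, so verify would still fail.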
And this is a blockchain. It’s another way of saying a linked list of hash pointers. It’s a tamper-proof chain of data, in the sense that any tampering is immediately detectable. If a system can find consensus about which chunk of data to add next to the chain, everyone who participates in the system can sleep soundly at night knowing none of the past links in the chain were tampered with.
Bitcoin doesn’t have actual coins. It has a public, distributed ledger with every transaction. By looking through the ledger, you can see if a person saying they’ll send you a bitcoin actually has a bitcoin. If they send it to you, that transaction is added to the ledger, and now you have a bitcoin available to send. Without the blockchain technology the ledger would be insecure. People could fudge transactions from the past to make it look like they have more bitcoins than they do and then they could fool you into accepting them as payment.
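As a toy illustration of "looking through the ledger", here is a sketch that replays a list of transactions to see whether someone can afford a payment. This is a deliberate simplification and not how Bitcoin represents transactions (Bitcoin tracks transaction outputs rather than account balances). The names, the (sender, receiver, amount) layout, and the use of None for newly created coins are all my own assumptions.

```python
from collections import defaultdict
from typing import Optional

# (sender, receiver, amount); a sender of None means newly created coins.
Transaction = tuple[Optional[str], str, int]


def balances(ledger: list) -> dict:
    """Replay every transaction in order and total up who holds what."""
    totals = defaultdict(int)
    for sender, receiver, amount in ledger:
        if sender is not None:
            totals[sender] -= amount
        totals[receiver] += amount
    return dict(totals)


def can_pay(ledger: list, sender: str, amount: int) -> bool:
    """Check whether the ledger shows `sender` holding at least `amount`."""
    return balances(ledger).get(sender, 0) >= amount


ledger = [
    (None, "alice", 2),     # alice receives 2 newly created coins
    ("alice", "bob", 1),    # alice pays bob 1 coin
]

print(can_pay(ledger, "bob", 1))    # bob was paid 1 coin, so he can send 1
print(can_pay(ledger, "carol", 1))  # carol never appears in the ledger
```

The check is only as trustworthy as the ledger itself, which is exactly why the entries need to sit in a chain nobody can quietly rewrite.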
The technical details of a full-fledged cryptocurrency like Bitcoin get significantly more complicated. Hopefully, you can start to see how the blockchain technology underpins the security of the data. Cryptocurrencies depend on a distributed consensus about the data, so it’s fundamental that the data is tamper-proof.