Thursday, April 11, 2013

Hash functions - one of the key concepts Bitcoin relies on

Before you can truly understand the mechanics behind Bitcoin, you need to know what a hash function is. It sounds complicated, but it's not.

A hash function takes some data (like a page of text from a book) and turns it into a smaller fragment of data (like a number). If someone else takes the same page from the book, and uses the same hash function, they are guaranteed to get the same output number.

Using the same idea, here is an example of a hash function: counting "a" as 1, "b" as 2, "c" as 3, and so on, turn all the letters on the page of the book you chose into number. Then add all the numbers together. That final number is the "letters to a number hash" of your book page. Let's call this hash function LTN.

So for example, what is the LTN hash of "a cat"? 1 + 3 + 1 + 20 = 25. Simple. The number you'd get from hashing the 32nd page of Lord of the Rings would be a bit longer, but still just as simple to calculate. It would probably take you a couple of minutes, and a computer program could do it in microseconds.

Real hash functions as used by Bitcoin are quite a bit more complicated - the two used are known by the names SHA256 and RIPEMD-160.¨You could apply them to some data by hand, but it would take a long time. And the other thing is that the chance of you finding two pages of a book that each gave the same SHA256 hash output is extremely unlikely. You certainly wouldn't be able to do that by hand.

But why would we be interested in hash functions? There are a number of reasons, but the main thing is that it's easy to calculate the hash of a bit of data, but it's hard to find a bit of data that gives you a given hash. The example I've given isn't too hard - if I said "find some data that gives you a LTN hash of 26", looking back at the previous example you might suggest "a cbt". Or perhaps "z" if you're a bit smarter. But with SHA256 finding a piece of data that gives you a specific hash is really really hard. And finding two pieces of data that give you the same hash is therefore even harder.

Technical note: SHA256 takes as its input some binary data, and outputs a 256 bit number (i.e. any given piece of data, when run through SHA256, gives a number that is between 0 and about 1 followed by 77 zeroes. If you use hexadecimal notation to represent this, it's 64 characters long.)

No comments:

Post a Comment