Do you need to understand computer science and cryptography to understand Merkle Trees? No, but it might be helpful. As the applications of Merkle Trees have become more visible, there is more interest in how they work. Merkle Trees rely on cryptographic hash functions, which transform data of any size into short, fixed-length fingerprints. You can build your intuition about hash functions here and learn some of the basics of how data is stored and transformed by computers here and here.

How are Merkle Trees used? The applications of Merkle Trees are broad, which is why they are incorporated in so many exciting projects. Merkle Trees allow us to ensure the integrity of data without doing large amounts of computation.

Merkle Trees are crucial for peer-to-peer file exchange systems, where data is discovered and shared between computers instead of being downloaded from a central server. Merkle Trees allow a large collection of data (a database, document, video, etc.) to be broken up into blocks and downloaded block-by-block. Using a Merkle Tree, we can download blocks from many different sources and quickly check that the final data collection is exactly what we were expecting. If a single block of data is incorrect, we can efficiently identify that block, re-download it, and check again whether our updated dataset is correct.

Another way that Merkle Trees are used is in the verification of blockchain transactions. In their basic form, a blockchain is an immutable (unchangeable) record of transactions between parties. To make this record immutable, transactions must be verified by multiple parties which all come to a consensus about the state of the blockchain. Merkle Trees are crucial for Blockchains (including Bitcoin) to efficiently verify transactions. Because many different parties need to verify the presence of new transactions in a blockchain, it becomes infeasible for every party to verify every single transaction each time the blockchain state is updated. Instead of checking every transaction, Merkle Trees make it easy to perform a single check which validates the integrity of the entire history of transactions. At the end of this article, we will discuss how we can verify that certain data has been included in a Merkle Tree using a Merkle Proof.

One more widespread implementation of Merkle Trees is in version control systems like `git`. While this application may be less familiar to those without some experience in software development, version control is a crucial part of producing software collaboratively. Individual developers introduce changes to a software project, and version control systems track the time, lineage, and data of changes to the project (new lines of code, renamed files, deleted files, etc.). Merkle Trees allow changes to a software project to be recorded in an efficient format that can be quickly validated. Because of Merkle Trees, developers can agree on the state of a software project and incrementally introduce updates to that state, creating a version history of the project.

So how do Merkle Trees work? Merkle Trees (also called hash trees) are built out of cryptographic hashes. A cryptographic hash function takes an arbitrary array of bits and maps it to a fixed-length output that is, for all practical purposes, unique to that input.

As an example:

`SHA1(foo) = 0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33`

Our hash function here is called Secure Hashing Algorithm 1 (SHA-1) and our input is foo. The hash function returns 160 bits that, in practice, uniquely identify the input foo. This is not very helpful for storing a single word, but it is crucial for verifying the integrity of large blocks of data. Instead of comparing two blocks of data which could have thousands (or millions) of bits, we can compare two 160-bit hashes that exactly identify the content of the data blocks. Obviously, this is much more efficient.

One more note: because the output of a hash function is just an array of bits, we can get the hash of other hashes.

`SHA1(0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33) = 2865765152809a426f118f48c468c5f459425211`

This is precisely how Merkle Trees work. We build a tree of hashes, hashes of hashes, and hashes of hashes of hashes, etc.

*Note: SHA-1 is deprecated because it has been demonstrated to produce the same hash for different inputs. We are using it here because it has a manageable digest size.*
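Both examples can be reproduced with Python's `hashlib`; `sha1_hex` here is just a helper name for this sketch:

```python
import hashlib

def sha1_hex(data: str) -> str:
    """Return the SHA-1 digest of a string as 40 hex characters."""
    return hashlib.sha1(data.encode()).hexdigest()

# Hash the input "foo" ...
digest = sha1_hex("foo")
print(digest)  # 0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33

# ... then hash the digest itself: a "hash of a hash"
print(sha1_hex(digest))
```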

Now that we have covered the fundamentals, we will build ourselves a simple Merkle Tree to show exactly how it is constructed step-by-step.

A Merkle Tree is called a tree because the data structure resembles a tree which goes from a root node to a collection of leaf nodes.

The leaf nodes of a Merkle Tree are constructed from hashes of data blocks. In the peer-to-peer network case this could be sections of a file, for a blockchain it would be individual transactions, and for a version control system it would be individual lines of code, for example.

Consider an example Merkle Tree with four blocks of data. (1) We build our tree by calculating the hashes of the data blocks first. These hashes make our leaf nodes. (2) We then concatenate adjacent leaf hashes together, and calculate their combined hash, forming our intermediate nodes. (3) Finally, we calculate the hash of the concatenated hashes of the intermediate nodes. This final step produces our root hash. Note that the root hash contains information about every leaf node. If a single leaf node changes, the root hash will also change.

Two notes about Merkle Trees:

(1) We are using a simple example with 4 data blocks. In real life, our Merkle Tree could have hundreds or thousands of leaf nodes and therefore, many more intermediate nodes.

(2) Because we must use a binary tree, we may have to balance the tree if it has an odd number of leaf nodes. We could do that by creating a data block full of zeros as our final leaf node.
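With those caveats in mind, the construction steps can be sketched in Python, assuming SHA-1 and simple concatenation of hex digests (real systems define their own serialisation; this sketch balances an odd level by repeating its last hash):

```python
import hashlib

def h(data: bytes) -> str:
    """SHA-1 digest as 40 hex characters."""
    return hashlib.sha1(data).hexdigest()

def merkle_root(blocks: list[bytes]) -> str:
    # (1) Hash every data block to form the leaf nodes
    level = [h(block) for block in blocks]
    while len(level) > 1:
        # Balance an odd level by repeating its last hash
        if len(level) % 2 == 1:
            level.append(level[-1])
        # (2, 3) Concatenate adjacent hashes and hash the result,
        # repeating until a single root hash remains
        level = [h((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"block0", b"block1", b"block2", b"block3"]
print(merkle_root(blocks))  # a single 40-character root hash
```

Changing any single block changes the root, which is exactly the property the applications above rely on.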

So, now that we have put together a Merkle Tree, let's look at how we can use it to verify the integrity of our data. As we noted above, a leaf node corresponds to a section of data, and if the data in a single leaf changes, the root hash of the entire Merkle Tree changes with it.

We will use the example of peer-to-peer networks. In these networks, data can be transferred from untrusted peers, and is validated with a trusted root hash. If we are downloading data in a peer-to-peer network, we retrieve the expected root hash, then download blocks of data from many different untrusted sources. We construct the final data collection according to a defined pattern, and calculate a Merkle Tree. We then check that the root hash of our Merkle Tree is the same as the root hash we were expecting.

The important piece of this process is what we do if we find that the root hash isn't what we expected. In this case, we can efficiently identify the problem block and replace it. If the root hash isn't what we expected, we go down a level in the tree. At this level, we find that one hash is what we expect and the other isn't. For the hash that matches, we don't have to worry about any of the hashes below it. Instead, we only look at the next level under the hash that doesn't agree. In this way, we can identify the problem block without checking the hashes of every leaf node. This is exactly what makes Merkle Trees efficient for verifying data integrity.

In fact, using a Merkle Tree means that locating a corrupted block requires comparing only on the order of *log2(N)* hashes, where *N* is the number of leaf nodes, rather than checking all *N* of them.
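Here is a minimal sketch of that descent in Python. It assumes a power-of-two number of blocks, exactly one corrupted block, and hex-string concatenation; `build_tree` and `find_bad_leaf` are hypothetical helper names:

```python
import hashlib

def h(data) -> str:
    data = data if isinstance(data, bytes) else data.encode()
    return hashlib.sha1(data).hexdigest()

def build_tree(blocks):
    """Return the tree as a list of levels: leaves first, root last."""
    levels = [[h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return levels

def find_bad_leaf(expected, actual):
    """Walk down from the root, following the child whose hash disagrees."""
    index = 0  # node index within the current level
    for depth in range(len(expected) - 1, 0, -1):  # root down to leaves
        left = 2 * index
        index = left if expected[depth - 1][left] != actual[depth - 1][left] else left + 1
    return index  # index of the corrupted data block

good = [b"b0", b"b1", b"b2", b"b3"]
bad = [b"b0", b"b1", b"XX", b"b3"]  # block 2 was corrupted in transit
print(find_bad_leaf(build_tree(good), build_tree(bad)))  # 2
```

Notice that the search only ever compares two hashes per level, which is where the logarithmic cost comes from.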

Another use of Merkle Trees is for parties to verify that a certain piece of data has been included in a given tree. We do this using a Merkle Proof. These proofs are most relevant to blockchains, where they are used to demonstrate that a certain transaction has been included in a block. The terminology is a bit confusing here. Above, we described leaf nodes corresponding to blocks of data. Here, a block is a collection of transactions, recorded as a Merkle Tree, hence the terminology Block-chain.

If I have a transaction (actually, the hash of a transaction) that I would like to verify, I can ask someone who has stored the entire transaction history of a blockchain (a full node in Bitcoin parlance) to prove to me that they have included the specific transaction I am interested in. I know:

(1) The hash of my transaction.

(2) The root hash of the entire block of transactions.

To verify that my transaction was included in a block, I request only the specific intermediate nodes in the tree structure that I need to complete my own version of the Merkle Tree, using my own version of the transaction hash. I can then "rebuild" the tree and recompute the root hash, to verify that my new root hash is the same as the root hash I was expecting. This proves to me that my transaction was included in the block. If it wasn't, I would get a completely different root hash.
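A sketch of that check in Python, assuming SHA-1, hex-string concatenation, and a proof supplied as sibling hashes ordered from leaf to root (`verify_proof` is a hypothetical helper, not the Bitcoin wire format):

```python
import hashlib

def h(data: str) -> str:
    return hashlib.sha1(data.encode()).hexdigest()

def verify_proof(leaf_hash: str, proof: list, root: str) -> bool:
    """proof is a list of (sibling_hash, side) pairs from leaf to root,
    where side says whether the sibling sits to the 'left' or 'right'."""
    current = leaf_hash
    for sibling, side in proof:
        current = h(sibling + current) if side == "left" else h(current + sibling)
    return current == root

# Build a tiny 4-leaf tree by hand so we can prove tx2 is included
leaves = [h(t) for t in ["tx0", "tx1", "tx2", "tx3"]]
n01, n23 = h(leaves[0] + leaves[1]), h(leaves[2] + leaves[3])
root = h(n01 + n23)

# To prove tx2: supply its sibling leaf and the opposite intermediate node
proof = [(leaves[3], "right"), (n01, "left")]
print(verify_proof(leaves[2], proof, root))  # True
```

The prover only hands over two hashes here; for a block with *N* transactions, the proof contains about *log2(N)* hashes rather than the whole tree.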

This is an overview of how Merkle Trees work, and an example of a few of their applications. Hopefully you are encouraged to play around with a Merkle Tree data structure and better appreciate the applications that implement them.

https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html

https://en.wikipedia.org/wiki/Merkle_tree

https://en.wikipedia.org/wiki/SHA-1

https://bitcoin.stackexchange.com/questions/69018/merkle-root-and-merkle-proofs

https://en.wikipedia.org/wiki/Peer-to-peer

*Images by the author.*

You may have used a web mapping application (like Bing Maps) which presents a map containing business locations, labels, and a base map. The base map is a collection of images, typically satellite imagery, and as you zoom in on the base map, it becomes more and more detailed. Obviously, your phone cannot store a high resolution image of the entire globe, so how is it able to provide exactly the high resolution imagery that you need?

To do this, Bing Maps divides the imagery into manageable sections, called tiles. Because imagery can have different "levels" of resolution, the Bing Maps Tile System also uniquely identifies tiles in a hierarchy from the most to the least detailed. To do this, Bing Maps constructs an identifier, called a quadkey. Using quadkeys, Bing Maps can render standard base maps for everybody who needs them and send the same map tiles to any client.

A simple approach to dividing spatial data into tiles is to divide a square area into a grid of 4 tiles. This is exactly the method implemented in the Bing Tile System. To add another level of higher resolution data to our grid, we can divide each of the original grid tiles into another 4 tiles. Now, Level 1 has 4 tiles and Level 2 has 16 tiles. We repeat this division as many times as we want, increasing the number of tiles with each new level. The number of tiles in a given level will be:

4^{n}

Where n is the level of detail (the number of times we have divided the grid). The Bing Tile System has 23 levels.

Dividing data into hierarchical levels of tiles is good (and reduces the size of individual tiles), but the Bing Tile System implements a clever way of identifying each tile in the hierarchy using a quadkey.

Quadkeys can look a bit gnarly:

`1202102332221212`

In fact, they are rather simple to use and they are a very efficient way to traverse our tiled dataset. By referencing each tile with a quadkey, we reduce the amount of memory needed to store indices for specified sections of data and increase the efficiency of database indexing for tiles.

How do we generate a quadkey for any section of data in our hierarchy? To do this, we use different numeral systems (binary and quaternary). If you are not familiar with these systems you can read an introduction to binary here. Understanding quaternary is just like understanding binary, but quaternary uses four numerals.

So, let's create a quadkey.

In the simplest example, imagine our grid with four sections and assign an integer to each section (Top Left: 0, Top Right: 1, Bottom Left: 2, Bottom Right: 3). In this case, the quadkey of any tile would simply be the integer that refers to its position. For the Bottom Right tile, the quadkey would be:

`3`

For another level of nesting, we would repeat this system, and add a new integer to the quadkey. For example, at the next level down (where we have 16 cells), the quadkey:

`33`

would refer to the Bottom Right tile at level 2 within the Bottom Right tile at level 1.

This is a simple way to think about quadkeys, but it is not precisely how quadkeys are constructed. In fact, we use the x, y position of tiles in a given level to create the quadkey. With the same tile as the above example (the Bottom Right tile at level 2), we can construct a quadkey using the x and y coordinates of the tile's location in its level. Level 2 has 4^{2} tiles (and we begin counting at 0) so this tile would have the position:

`(x = 3, y = 3)`

To construct the quadkey, we convert the x and y positions to binary and interleave the bits (starting with the Y position) to produce a key:

`X = 11`

`Y = 11`

`Key = 1111`

We then convert this key (in binary) to quaternary:

`Quadkey = 33`

This gives us our quadkey.

This is a relatively simple example (it always produces 3s). But consider, for example, the tile with the position: (x = 228, y = 216). In this case, we would produce our quadkey with:

`X = 11100100`

`Y = 11011000`

`Key = 1111011010010000`

`Quadkey = 33122100`

Done! Note that the quadkey is 8 integers long. The lowest level that would support this index has 256 * 256 tiles, or 4^{8} tiles (level 8). This is one of the nice things about quadkeys: the length of a quadkey equals its level of detail.

*Note: Quadkeys always have the length of the level of detail they refer to. The tile at level 1 with position (0, 0) has the quadkey: 0. The tile at level 8 with position (0, 0) has the quadkey: 00000000.*
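The interleaving procedure can be sketched in Python (`tile_to_quadkey` is a hypothetical name for this sketch; production implementations, such as the one in the Bing Maps documentation, are equivalent):

```python
def tile_to_quadkey(x: int, y: int, level: int) -> str:
    """Interleave the bits of y and x (y first), reading the result
    two bits at a time as one quaternary digit per level of detail."""
    digits = []
    for i in range(level - 1, -1, -1):
        digit = ((y >> i & 1) << 1) | (x >> i & 1)
        digits.append(str(digit))
    return "".join(digits)

print(tile_to_quadkey(3, 3, 2))      # 33
print(tile_to_quadkey(228, 216, 8))  # 33122100

# The lineage property: drop the last digit to get the parent tile
assert tile_to_quadkey(228, 216, 8)[:-1] == tile_to_quadkey(228 // 2, 216 // 2, 7)
```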

The idea of gridded, hierarchical data is general, but how does it actually relate to spatial data? If we use a square map projection to represent the earth, we can divide that square projection into a hierarchy of tiles. To create a square map, we transfer (project) the shape of the earth from a globe to a flat surface. For the Bing Tile System, we use the Mercator Projection which projects the globe onto a cylinder (then unrolls the cylinder to make a flat map). The Mercator projection is famous for distorting areas; places near the poles look much larger than they actually are. But the projection has some useful properties for creating our tile system:

(1) It is cylindrical meaning that North, South, East and West correspond to Up, Down, Right, and Left on the map. This is helpful for navigation.

(2) It is conformal meaning that it doesn't distort the shape of things on the map, just their area. This is important for making street corners and buildings look square.

To make the map projection square, we clip the latitude of the map to approximately ±85.05 degrees.

The primary use of gridded spatial data, as we mentioned above, is to serve imagery for mobile mapping applications. Imagery requires a relatively large amount of memory to store, so we would like to reduce the amount of imagery we need to send to individual map users. We can also reduce the memory needed to store other spatial data by referencing it to quadkeys. For example, if I use a GPS signal to record my location on earth, I could report my quadkey (at some resolution level) instead of my actual latitude and longitude coordinates. This makes it easy to combine and aggregate my location with the location of others.

Quadkeys have two other nice properties that we have not yet mentioned:

(1) Quadkeys record a tile's lineage in the grid hierarchy.

(2) Quadkeys that are close to each other in X Y space are also numerically close together.

(1) For the first property: quadkeys easily identify the parents of a given tile. For example, a tile with quadkey 1320 has the parent 132 which has the parent 13. To move up a level, we simply need to drop an integer off of the quadkey.

(2) To show the second property, we can use the example of three quadkeys at level 8:

`A: XY = (255, 255), Quadkey = 33333333`

`B: XY = (250, 250), Quadkey = 33333030`

`C: XY = (100, 100), Quadkey = 03300300`

We convert the quadkeys to decimal and calculate their differences:

`A - B = 65535 - 65484 = 51`

`A - C = 65535 - 15408 = 50127`

As you can see, A and B are numerically much closer than A and C (the same goes for B and A compared to B and C). This is very helpful for optimising the storage and querying of quadkeys in a database. For example, if we know we are looking for the neighbors of a given quadkey, we can focus on a nearby numerical range when we design our query.
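You can check this arithmetic directly, since a quadkey is just a base-4 numeral that Python can parse:

```python
# A quadkey is a quaternary numeral, so int() can parse it with base 4
a = int("33333333", 4)  # tile (255, 255)
b = int("33333030", 4)  # tile (250, 250)
c = int("03300300", 4)  # tile (100, 100)

print(a - b)  # 51
print(a - c)  # 50127
```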

Hopefully you have learned a bit about the Bing Tile System, and gained a general intuition about gridded spatial datasets. The Bing Tile System is a very clever solution to a data management problem, but it is not the only gridded tile system out there. For other implementations, check out Google's S2 geometry or Uber's H3 system (this one uses hexagons). Readers are encouraged to try a few of the calculations on their own to build their intuition about the mathematical foundation of quadkeys and their numerical relationships to one another.

https://docs.microsoft.com/en-us/bingmaps/articles/bing-maps-tile-system

https://en.wikipedia.org/wiki/Mercator_projection

https://github.com/uber/h3

https://s2geometry.io/

*All images by the author.*

We will look at Python for simplicity and legibility (we are not focused on performance) and will walk step by step through the process of producing a SHA-1 hash digest from an array of bits. We are using SHA-1 because of its relatively short digest. Note that SHA-1 has been deprecated because of demonstrated collisions and has been superseded by newer hashing algorithms, like SHA-256. SHA-1 is still a great example of the construction of a secure hashing algorithm and has continued to be used (for now) in applications like git.

*Note: A hash digest is the name for the output of a hashing algorithm.*

First things first: what is a hashing algorithm? At its core, a hashing algorithm is a function that takes any input in bits and transforms it into a standard length representation. As an example, SHA-1 takes the input foo and transforms it to:

`SHA1(foo) = 0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33`

SHA-1 is a cryptographic hashing algorithm. This means that, in addition to the basic transformation operation, SHA-1 also guarantees certain security properties. Cryptographic hashing algorithms have two core characteristics:

(1) Hash digests are unique

(2) Hashes cannot be reversed

(1) A cryptographic hashing algorithm must map unique inputs to unique digests. Because these algorithms transform all inputs (of arbitrary length) into a standard length representation, it would not be useful if they returned the same hash digest for different input data. When distinct inputs are mapped to the same digest, it is called a collision. SHA-1 actually has this problem and has therefore been deprecated (researchers found two different PDF documents that produced the same digest).

(2) It must not be possible to reverse a hash digest. This means that, given some digest, it must be impossible to work backwards to identify the input data that created that digest. This is important for cryptographic hashing algorithms to ensure that once some data is hashed, the original data cannot be retrieved. Think of it like one-way, irreversible encryption. To achieve irreversibility, hashing algorithms repeatedly perform one-way mathematical operations. An easy way to intuit one-way operations is to imagine that you have the result of an addition. Although you know the result, you cannot tell which operands created it, because many different pairs of operands produce the same sum. In the case of hashing algorithms, it is computationally infeasible to calculate the original input used to create a specific hash digest.

To understand exactly how a hashing algorithm works, it is important to have an understanding of the internal representation of data in a computer: bits. Hashing algorithms operate by performing bitwise operations on sections of input data, so it is also helpful to understand the ways that arrays of bits can be transformed and rearranged.

In our example of hashing the string foo the string is actually an array of characters and these characters are uniquely identified by an integer (each integer is represented by an array of bits). SHA-1 rearranges and compresses the bits in the input message to produce the hash digest. The hash digest is simply a collection of 160 bits, represented by 40 hexadecimal characters.

To understand SHA-1, we will use an existing implementation in Python by TheAlgorithms. See the full implementation here. This program accepts either a string or a file path and hashes the corresponding bits. Readers are encouraged to clone the file and run it from a directory using the command line:

`python sha1.py -string foo`

This will print the hash digest to your command line. You can check that the resulting hash is the same as the standard SHA-1 implementation from hashlib:

`hashlib.sha1(b"foo").hexdigest()`

So, what is happening internally when we produce a SHA-1 digest? As we have said, a hash function is concerned with the rearrangement of bits to compress and transform any input into a unique, standard length representation. We do this using a series of operations on our input:

(1) Padding

(2) Splitting

(3) Unpacking

(4) Transformations with constants

(1) The first step for transforming an array of bits to a hash value is to ensure that the incoming data has a standard size. For SHA-1, the length of our padded message must be a multiple of 512 bits (64 bytes). The message is padded with a one bit and then zeros, followed by 64 bits describing the original message length. The number of zeros we choose ensures that the padded message has the appropriate size.

(2) Next, we divide the input bits into constant length sections of 64 bytes. We do this by splitting the message into appropriate length blocks of data. We will iterate through these blocks, with each block contributing to our final hash digest.

(3) In order to rearrange the bits themselves, we need to represent them as integers. We unpack the 64 bytes into 16 integers, and append 64 zeros, giving us a total of 80 integers. We then iterate through each integer, performing operations defined in the SHA-1 specification on the entire array of integers. These operations transform the array into an unpacked array of 80 integers that we will use to create our final hash digest.

(4) After preprocessing the input message to constant width blocks of 80 integers, we are now ready to produce the hash digest. We do this by performing operations on the data using a series of constants. Which operations and which constants we use depends on the index of the integer we are processing (0–19, 20–39, 40–59, 60–79; see here, section 6.1.2). The constants we use form the initial hash value and each integer of each block of the message continuously transforms these constants. Once we have iterated through each integer in every message block, we concatenate the transformed constants to create our final hash digest.

*Note: The constants we use are called Nothing-up-my-sleeve numbers and are used to initialise a hash function. These constants are unique to specific hash functions and are chosen to be free of hidden structure so that they produce appropriate hash digests for all (in practice, very many) inputs. SHA-1's round constants are derived from the square roots of 2, 3, 5, and 10 (represented in binary), while its initial hash value is a simple fixed bit pattern.*

The most intuitive way to think about a hash function is to imagine chopping an input message into constant length sections and iteratively updating a hash value based on the content of each message section. Each section of the message makes an impression on the final hash digest, resulting in a digest that is totally unique to the original message.

SHA-1 and newer hash functions in the Secure Hashing Algorithm family are carefully researched to guarantee their security properties. It is never a good idea to write your own cryptographic hash functions, but it is still fascinating to dig in and understand how they work. Hopefully you are better able to appreciate the security properties and performance of the cryptographic hash function of your choice.

Photo by Nick Fewings on Unsplash

https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html

https://crypto.stackexchange.com/questions/10829/why-initialize-sha1-with-specific-buffer

https://cs.winona.edu/lin/cs435/Presentations/SECURE%20HASHING%20ALGORITHM.ppt

https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf

https://modernresearchconsulting.com/2017/07/23/implementing-sha-1-in-python/

https://github.com/TheAlgorithms/Python/blob/8c13a7786f4fe15af4d133fed14e5e2fb0888926/hashes/sha1.py

https://crypto.stackexchange.com/questions/45377/why-cant-we-reverse-hashes

https://en.wikipedia.org/wiki/Avalanche_effect

These operators are the basis for higher level mathematical operations. It is helpful to have an understanding of binary to understand these operations, see here for an introduction. Understanding the basics of bitwise operators will help you understand the way that your computer actually performs computations using collections of bits.

We will look at three types of bitwise operator:

(1) Logical operators

(2) Shifts

(3) Rotations

(1) People familiar with basic programming should be comfortable with logical operations (AND, NOT, OR) in their own computer programs. These operations are also relevant to combining bit arrays. (2) Shifts are used to change the position of bits in a bit array (replacing bits on one side of the array). (3) Rotations move bits around an array without discarding them.

*Note: we will refer to a bit as set (1) or clear (0).*

`AND`: identifies the set bits that are shared between operands.

`0010 AND 1110 = 0010`

`OR`: identifies any set bits in either operand.

`0010 OR 1110 = 1110`

`XOR`: sets bits only if they are different between operands.

`0010 XOR 1110 = 1100`

`NOT`: also known as the complement, reverses the bits in an operand.

`NOT 0010 = 1101`
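Python's bitwise operators (`&`, `|`, `^`, `~`) behave exactly this way. Because Python integers are unbounded, we mask results to 4 bits to match the examples above (`bits` is just a formatting helper for this sketch):

```python
def bits(x: int) -> str:
    """Format an integer as a 4-bit binary string."""
    return format(x & 0b1111, "04b")

a, b = 0b0010, 0b1110

print(bits(a & b))  # 0010  AND: set bits shared by both operands
print(bits(a | b))  # 1110  OR:  set bits in either operand
print(bits(a ^ b))  # 1100  XOR: bits that differ between operands
print(bits(~a))     # 1101  NOT: every bit reversed (masked to 4 bits)
```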

Bit shifts rearrange bits by shifting them some number of positions. Shifts remove and replace bits, in this case, we replace bits with zeros.

`LEFT SHIFT`: shifts bits left by some number of positions.

Shifting bits left by 1 place:

`LEFT SHIFT 1101 BY 1 = 1010`

Shifting bits left by 3 places:

`LEFT SHIFT 1101 BY 3 = 1000`

`RIGHT SHIFT`: shifts bits right by some number of positions.

Shifting bits right by 1 place:

`RIGHT SHIFT 1101 BY 1 = 0110`

Shifting bits right by 3 places:

`RIGHT SHIFT 1101 BY 3 = 0001`

Rotations are similar to shifts but do not discard bits from the end of an array. They are also called circular shifts because they shift bits in a circle from the front to the end of a bit array.

`LEFT CIRCULAR SHIFT`: performs a left shift by some number of positions and adds the shifted bits to the end of the array, rather than discarding them.

Rotating bits left by 1 place:

`LEFT CIRCULAR SHIFT 1101 BY 1 = 1011`

`RIGHT CIRCULAR SHIFT`: performs a right shift by some number of positions and adds the shifted bits to the beginning of the array.

Rotating bits right by 1 place:

`RIGHT CIRCULAR SHIFT 1101 BY 1 = 1110`

Rotations do not remove and replace information in the same way that shifts do.
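Shifts are built into Python (`<<`, `>>`), while rotations need a small helper for a fixed width. A 4-bit sketch matching the examples above (`rotl` and `rotr` are hypothetical helper names):

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1  # 0b1111, keeps results to 4 bits

def rotl(x: int, n: int) -> int:
    """Left circular shift: bits pushed off the top re-enter at the bottom."""
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & MASK

def rotr(x: int, n: int) -> int:
    """Right circular shift: bits pushed off the bottom re-enter at the top."""
    n %= WIDTH
    return ((x >> n) | (x << (WIDTH - n))) & MASK

x = 0b1101
print(format((x << 1) & MASK, "04b"))  # 1010  left shift discards the top bit
print(format(x >> 1, "04b"))           # 0110  right shift discards the bottom bit
print(format(rotl(x, 1), "04b"))       # 1011  rotation keeps it
print(format(rotr(x, 1), "04b"))       # 1110
```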

For our bitwise operations to produce the results we expect, we need to know how a computer lays out the bytes of a value in memory. The way that we refer to this is endianness, a term that describes which end of a multi-byte value is stored at the starting position. This will be the position where the computer starts writing or reading when it is processing the value.

There are two types of endianness: Big-endian and Little-endian. Big-endian systems store the most significant byte at the smallest memory address, while Little-endian systems store the least significant byte there. Shifts and rotations are defined on values rather than on memory, so their results do not depend on endianness, but interpreting the raw bytes of a multi-byte value (for example, when unpacking data) does.

Why are bitwise operators important? They form the core operations that allow a computer to perform mathematical calculations on bits. To illustrate, we will use bitwise operators to perform an addition of two integers: `0001` and `0010` (this is `1` and `2` in decimal).

To perform an addition, we repeat the following operations, testing each time whether the carry (our second operand) has reached zero:

(1) `AND`

(2) `XOR`

(3) `LEFT SHIFT`

See the full pseudocode here.

Using our example operands, there is only one iteration needed to perform an addition.

Therefore, we can reduce the required steps to a single `XOR` operation:

`0010 XOR 0001 = 0011`

This is the same as `1` + `2` = `3` in decimal. Using a single operator is a bit of a fluke; a more complex addition would require multiple iterations. Still, recognise how we performed a mathematical operation using bitwise operators. The core mathematical operations that your processor performs are constructed this way.
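The full loop, not just the single-iteration shortcut, can be sketched in Python (`add` is a hypothetical helper; it repeats the AND/XOR/LEFT SHIFT steps until the carry is zero):

```python
def add(a: int, b: int) -> int:
    """Add two non-negative integers using only AND, XOR and LEFT SHIFT."""
    while b != 0:
        carry = a & b   # (1) AND finds the positions that generate a carry
        a = a ^ b       # (2) XOR adds each bit position, ignoring carries
        b = carry << 1  # (3) LEFT SHIFT moves each carry up one position
    return a

print(add(0b0001, 0b0010))  # 3  (a single iteration: no carries)
print(add(7, 5))            # 12 (multiple iterations needed)
```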

This is a quick reference for the basic bitwise operators. While it is almost certainly unnecessary for most people to understand bitwise operators in depth, it is worth understanding how your computer uses them to represent data and perform computations. Hopefully this helps you develop an intuitive connection between the operations performed on bits at the processor level, and the higher-level computations your computer performs.

Photo by Gary Aaronson on Unsplash

https://en.wikipedia.org/wiki/Bitwise_operation

https://en.wikipedia.org/wiki/Endianness

https://stackoverflow.com/questions/7184789/does-bit-shift-depend-on-endianness

https://stackoverflow.com/questions/3722004/how-to-perform-multiplication-using-bitwise-operators

Do you need to understand binary? Hopefully not! The history of computing, in a sense, has been a progression away from needing to understand the inner workings of a computer through layers of increasing abstraction (compilers, programming languages, operating systems, graphical user interfaces, etc.). Nonetheless, understanding binary will help you develop your understanding of the way that computers store data and perform calculations.

The most basic description of binary, of course, is that it is a base-2 numeral system, meaning that the entire system is represented using two numerals: 0 and 1. We are more familiar with the decimal numeral system (a base-10 system) which has ten numerals: 0–9. Binary works exactly the same as the decimal system, but, because there are only 2 numerals, we quickly need more digits to represent larger numbers, compared to counting in decimal. From here on, we will refer to a single digit of binary, which can have the value 0 or 1, as a bit.

When we count from 0 to 10 in the decimal system, we use the numbers 0 through 9, then add a new digit in the tens position and start counting our numerals over again in the ones position:

`0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10`

In this case (obviously), we have added a new digit and started counting again from the beginning of our numeral system.

In binary, we perform the same steps, but we quickly run out of new numerals to use:

`0, 1, 10`

Therefore, we add a new digit and continue counting (the above represents 0, 1, 2 in decimal).

This continues for higher numbers, and we need to add new digits:

`0, 1, 10, 11, 100, 101, 110, etc.`

In fact, we can represent the number of bits that is needed to store a decimal number in binary as:

`ceil(log2(10 ** n))`

Where n is the number of digits in the decimal number and ceil denotes the ceiling function. Using this equation, we find that we need 14 bits to represent any 4-digit decimal number, such as 1000.
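A quick check of the formula in Python (note that it gives the bits needed for the largest n-digit number; `bits_needed` is just a name for this sketch):

```python
import math

def bits_needed(n_digits: int) -> int:
    """Bits needed to store any decimal number with n_digits digits."""
    return math.ceil(math.log2(10 ** n_digits))

print(bits_needed(4))  # 14 -- enough for any number up to 9999
```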

A single bit can only store a small amount of information. To do anything actually useful with bits, we need to use a collection of them. For convenience, we refer to a collection of 8 bits as a byte.

`1 byte: 0000 0000`

Why does a byte have 8 bits? It has to do with the original encoding of characters in a computer. Importantly, a byte has the ability to represent the decimal numbers 0 to 255. Because there are 2^{8} combinations of bits in a byte, we can represent 256 numbers using a single byte:

`0000 0000, 0000 0001, ..., 1111 1111`

This is a useful amount of storage, and is the basis for the other standard quantities of computer memory (Kilobytes, Megabytes, Gigabytes, Terabytes, Petabytes).

Nobody except your computer wants to look at binary. Because of the large number of bits required to store information in binary, it becomes very hard for humans to work with. Imagine writing a computer program in binary! It is also onerous to print out lots of binary from a computer program. To address this, we use a simple system to represent binary: hexadecimal.

Using four bits, we have the ability to represent 16 numbers (2^{4}):

`0000, 0001, ..., 1111`

If we extend the decimal numeral system a bit to include more characters, we could represent each of these combinations of bits with a single numeral:

`0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F`

Hence, the hexadecimal system (a base-16 numeral system: hexa + decimal) with the numerals 0–9 and the characters A–F. Instead of using `0000`, we could simply use `0`, and instead of `1111`, we could use `F`.

Find the whole table here (4 bits are called a nibble, for your information).

Using hexadecimal, we can represent:

`0000 0100`

Simply as:

`04`

That saves space and is easier for a human to interpret. Using hexadecimal, the representation of a byte goes from 8 to 2 digits. In Python for example, a byte is represented as two digits of hexadecimal (with a \x prefix):

`\x8a`
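Python's `format` makes these conversions easy to check:

```python
byte = 0b00000100  # the byte from the example above

print(format(byte, "02x"))  # 04 -- two hex digits instead of eight bits
print(format(byte, "08b"))  # 00000100 -- and back to binary
print(bytes([0x8A]))        # b'\x8a' -- how Python prints a raw byte
```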

Understanding binary is like taking a deep dive into the way your computer sees the world. Luckily, there is little need for most people to interface directly with binary. It is still helpful to understand exactly how information is represented in a computer. This representation underlies the core data structures of computer science, and operations on bits are what make all computation possible. Hopefully you have gained an appreciation for the innovations that have separated us from the ones and zeros.

Cover photo by Felix Janen on Unsplash

https://en.wikipedia.org/wiki/Binary_number

https://stackoverflow.com/questions/7150035/calculating-bits-required-to-store-decimal-number

https://stackoverflow.com/questions/42842662/why-is-1-byte-equal-to-8-bits

https://en.wikipedia.org/wiki/Nibble

https://stackoverflow.com/questions/52688418/python-bytes-representation