This assumption was based on an estimate that the service would not need to handle more than 1000 writes / second. Notice the assumption that a 7 character URL with 62 available characters for each position would be sufficient. In the context of a designing a system, these concepts can be used validate critical assumptions. Most of this post has focused on the mechanics of how to convert between different bases - a topic I’ve investigated in the past, but clearly had not really stopped to think about deeply like I did today.Īnd while I feel the need to understand the mechanics before I can safely apply a concept, the why here is important. ![]() Now we can convert this into our alpha-numeric representation: 4 LcIdVX6.Īnd just like that, we’ve converted the first 43 bits of a URL into its base 62 representation. Now, we could convert this to base 62 to represent it with our available character set. Converting it first to base 10 makes this exercise a little clearer however because of the familiarity of base 10 (this is also what Tushar does, and I assume it’s for similar reasons). Taking the first 43 bits of the hashed URL then, we could convert it to base 62. For example, if, instead of converting 13, I tried converting 256 to base 2 with only a byte of space, I’d run into problems because 1. This is also how we can tell if we need more space to store a value. Since I’m using a byte, I started at 2 7 and worked down from there. If we have 8 bits, we look at each bit to see if it should be flipped. Using the example of the byte then, let’s take a quick look at going backwards from base 10 to base 2.ġ / 2^0 = 1 (equivalent to 1 % 2^0 = 0) This is analogous to how we evaluated the byte above, but now we’re going from 43 binary bits to a base 62 string. Base 62, in this context, means that we can map the value to a letter a-z, A-Z or a digit 0-9. One way to do that would be to hash the long URL and then take the first 43 bits and convert them into base 62. Now that we know how long our URLs need to be, we can figure out how to create them. If we did not add the 43rd bit, we’d be left with ~2.2 trillion positive numbers and 2.2 trillion negative - falling short of the 3.5 trillion target. So, why the 43rd bit? This is reserved for the most significant bit, which can be used to correspond to the sign. What is the value of 2 41? ~2,2 trillion. ![]() The value of a byte then is capped at ~2 8. This is because 11111111 is evaluated as: 2 The byte that represents 255 is 11111111. For starters: A bit is a single digit, 1 or 0. ![]() To see why this is the case, let’s take a quick detour to refresh ourselves on bits before determining where 43 came from. 1Īt this point, we know the number of possible permutations, but Tushar then makes the claim that we can represent numbers 0-3.5 trillion with 43 bits. If the URL is 7 characters long, then we have 62 7 possible permutations ( ~3.5 trillion). This is the total set of available characters. We can always change that assumption later.įirst, if the URL is comprised only of a-z, A-Z, and 0-9 characters, there are 62 different options for each spot (26 for a-z, 26 for A-Z and another 10 for 0-9). ![]() One thing I didn’t fully understand, and so want to explore in this post is converting bases and how to determine how much space is needed to represent a particular string / number.įor the purposes of this exercise, we’ll follow Tushar’s example and decide that a 7 character URL of strings and numbers is a good starting point. In his System Design walkthrough of TinyUrl, Tushar Roy does an excellent job of describing which assumptions he’s making and the pros and cons of the different decisions.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |