Design tinyurl system
Design a tinyurl system is a very frequently asked design question in This system converts a long URL into some small URL, typically people use it when they post something on platforms where there is limit on number of characters like Twitter.
This is system has perfect use case for hash table or hash maps. Hash tables are data structures which store key value pairs where lookup operation for a given key is O(1). So, given a key it’s corresponding value can be found in constant time.
As mentioned above, tinyurl system creates a small url for long usual url. Hence short url is key and the long url is our value. Now, how long short url can be? It depends. Let’s say one creates three character short url. As allowed characters in url are a-z, A-Z and 0-9, the number of possibilities for each place in three character long url is 62, which can represent 2^62 (approx. 250,000) urls. If urls never expire and more url needs to be shortened, the system has to use 4 character which will eventually hit the limit and hence size will be increase to 5 characters.
Well, let’s talk design now. Following are steps are must to design tinyurl system:
1. Take the url and hash it. Define a good hash function which has very low collision rate. What are good hash functions?
2. Store the entry into hash table with tiny url as key and long url as value.
3. When tinyurl is accessed, go to the hash table and access hash[tinyurl] and redirect to that page.
However, it is good to mention here that normal hashing functions will not work here. Usually the hash key generated by them is too long. For example MD5 hashing function generates 32 bit key. Our system should some kind of permutation algorithm which incrementally goes through all the permutations of characters and assign value to it.
Design tinyurl system : mapping function
One thing we need to think of in more detail is how to implement hash, where if tinyurl is pass it gives fetches the correct index in the hash, and at that index we have our long URL stored. Our tinyurl consists of alphanumeric letters [A-Z][a-z][0-9] that is total of 62 digit. If we store long URL in simple indexed based hash, and convert that index to base 62, every digit in base 62 will give a us character, for example, take index to be 134, in base 62, it can be represented as
134 = 2 X 621 + a X 620
So the digits which we get are [2,a]. Now how to get character corresponding to these digits? Create a table such that
table ='A' table ='B' table ='C' ... table ='a' table ='b' .... table ='0' table ='1' ... table ='9'
Lookup each digit in this table and get the corresponding URL. Other problem which needs to be solved is lookup. That is given a tinyurl, how to get the index in hash table?
To convert, take each character in url, do a reverse lookup on alphabets. Resolve the position of each alphabet in table and covert those digits into base 10 numbers. For example, if the tinyurl is a93, then we look up the table and find that corresponding positions are [26,61, 53], So the decimal conversion will be like:
a9362 = [26,61,53] = 26 X 622 + 62 X 611 + 53 X 620 = 99,944 + 3782 + 53 = 103,779
Go and find the long url on index 103,779. This is good hashing function because it is bijective function. A bijective function is a function f(n) such that for any i and j with i != j, f(i) != f(j); hence collision chances are none.
Below figure shows the entire design concept.
Very good points which I read from stack overflow by Abel. It’s worth mentioning as it will earn point for you in interview.j
It says that whenever someone clicks on a tiny url, HTTP request is sent to their server. Remember that out key which was generated above is last part of the real url for example, in tinyurl like biturl.in/aDffg1, on aDffg1 is our key, biturl.in is used to send request to designated server. So one of the design aspect of the system will be to set up a server with given url.
Now, server looks for the corresponding long url and uses 302, that is for temporary redirect to redirect user to actual url. Most important part Adel talks about is this and I quote
If you were to use files or first load HTML and then redirect, the browser would add TinyUrl to the history, which is not what you want. Also, the site that is redirected to will see the referrer (the site that you originally come from) as being the site the TinyUrl link is on (i.e., twitter.com, your own site, wherever the link is). This is just as important, so that site owners can see where people are coming from. This too, would not work if a page gets loaded that redirects.
PS: there are more types of redirect. HTTP 301 means: redirect permanent. If that would happen, the browser will not request the bit.ly or TinyUrl site anymore and those sites want to count the hits. That’s why HTTP 302 is used, which is a temporary redirect. The browser will ask TinyUrl.com or bit.ly each time again, which makes it possible to count the hits for you (some tiny url services offer this).
Good explanation to talk about in interview is given coding horror blog entry for tinyurl. So this is what you should talk when somebody asks you to design tinyurl system when asked in design round. Please share your views on this in comments. You can like the page for more updates and interview experiences.