Generating Unique Short Hashes

By | May 1, 2019

Ever wonder how URL shortening websites get such short hashes? Most of them simply generate a random hash, then check whether they’ve used it before. An advantage of this system is that very short hashes can be created with no possibility of collision, especially if a wide range of characters is used. While it’s acceptable to use [a-zA-Z0-9], and maybe a few special characters, we’re going to use only uppercase letters and no “O”, since it often gets confused with a zero.

Hashing is usually done by some kind of summing, but we’re going to ignore that altogether. All we want is to map a random set of characters to a URL. This can be done with hashing, but getting very short hashes of, say, 4 characters is almost impossible while preventing collision. If you consider hashing to be a summation, then the title should possibly replace “hashes” with “unique IDs.”

We’ll start by creating some global parameters for our functions. We’ll set the length of the hash to 4, but it can be any number. With our 25 characters, a hash of length 4 allows for 25^4, or 390,625, unique IDs, and a hash of length 5 allows for 25^5, or 9,765,625.
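A minimal sketch of these parameters, assuming JavaScript since the result is later written to the console. The names `CHARSET` and `HASH_LENGTH` are illustrative choices, not identifiers from the original code.

```javascript
// Illustrative names; these are assumptions, not the article's own identifiers.
// Uppercase letters without "O", which is easily mistaken for a zero.
const CHARSET = "ABCDEFGHIJKLMNPQRSTUVWXYZ"; // 25 characters
const HASH_LENGTH = 4;

// Number of distinct IDs available: charset size raised to the hash length.
const capacity = Math.pow(CHARSET.length, HASH_LENGTH);
console.log(capacity); // 390625 for 25 characters and a length of 4
```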

We’ll also make a global variable for the URL to be hashed. Ideally, you’d pass it to the function, but I’m using a variable here so you can see its value.
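For illustration, the variable might look like this. The name `url` and the address itself are placeholders.

```javascript
// Placeholder address (an assumption); substitute the URL you want to shorten.
const url = "https://www.example.com/articles/generating-unique-short-hashes";
console.log(url);
```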

Here we create an array of hashes that have already been assigned, so they can no longer be used. Ideally, you’d use a database for this. You’d only need a few columns and the query would be very simple. We’re going to use a nested array where there are only four hashes assigned so far, and each is mapped to a domain. The hash “column” must be unique, and the “url” column should be unique to save repetition, but it’s not necessary. For the purpose of this tutorial, we’re only checking for a unique hash.
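One way to model that nested array in JavaScript is an array of objects, one per assigned hash. The four entries and the `hash` and `url` keys below are made-up sample data.

```javascript
// Hypothetical sample data: four hashes already assigned, each mapped to a domain.
// In production this would be a database table with a unique index on `hash`.
const usedHashes = [
  { hash: "ABCD", url: "https://www.example.com" },
  { hash: "WXYZ", url: "https://www.example.org" },
  { hash: "KLMN", url: "https://www.example.net" },
  { hash: "QRST", url: "https://www.example.edu" },
];

console.log(usedHashes.length); // 4
```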

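We’ll now generate a random hash. We’ll need a function for this. It will create a hash of the specified length. The result is pseudo-random rather than truly random, and the generator alone does not guarantee uniqueness. That comes from checking each candidate against the hashes already in use, which is what’s important when it’s used as an ID.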
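A sketch of such a generator, assuming `Math.random()` is good enough for this purpose (it is not cryptographically secure). The function name `generateHash` is illustrative, and the character set is repeated so the snippet runs on its own.

```javascript
// Uppercase letters without "O" (25 characters).
const CHARSET = "ABCDEFGHIJKLMNPQRSTUVWXYZ";

// Build a string of `length` characters, each picked pseudo-randomly from CHARSET.
// generateHash is a hypothetical name, not taken from the original code.
function generateHash(length) {
  let hash = "";
  for (let i = 0; i < length; i++) {
    const index = Math.floor(Math.random() * CHARSET.length);
    hash += CHARSET.charAt(index);
  }
  return hash;
}

console.log(generateHash(4)); // four uppercase letters, never containing "O"
```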

We’ll need to check if the hash already exists in our array of used hashes. If it exists, we can’t assign it because it’s already been taken. Otherwise, we can use the hash as an ID.
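The check can be sketched as a simple search over the used hashes. The name `hashExists` and the sample data are assumptions. With a database, this would be a lookup on the unique hash column instead.

```javascript
// Return true when `hash` is already assigned in `usedHashes`.
// hashExists is a hypothetical name; entries are assumed to be { hash, url } objects.
function hashExists(hash, usedHashes) {
  return usedHashes.some((entry) => entry.hash === hash);
}

// Made-up sample data for the demonstration.
const sample = [{ hash: "ABCD", url: "https://www.example.com" }];
console.log(hashExists("ABCD", sample)); // true: already taken
console.log(hashExists("ZZZZ", sample)); // false: free to assign
```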

Our last function actually gets the hash. You must pass the text you want hashed, in this case a URL, and the length of the hash, in this case 4 characters.

Lastly, we’ll get the hash using the getHash() function then write it to the console:
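Here is a sketch of how `getHash()` and the final call might fit together. Only the name `getHash()` and its two arguments come from the text above. The helper names, the sample data, and the guard against an exhausted hash space are assumptions, and the helpers are repeated in compact form so the snippet runs on its own.

```javascript
const CHARSET = "ABCDEFGHIJKLMNPQRSTUVWXYZ"; // uppercase letters without "O"

// Hypothetical sample of already-assigned hashes.
const usedHashes = [{ hash: "ABCD", url: "https://www.example.com" }];

// Compact versions of the generator and the lookup (illustrative names).
const generateHash = (length) =>
  Array.from({ length }, () => CHARSET[Math.floor(Math.random() * CHARSET.length)]).join("");
const hashExists = (hash) => usedHashes.some((entry) => entry.hash === hash);

// Keep drawing candidates until one is unused, then record the mapping.
function getHash(text, length) {
  // Guard against an endless loop once the hash space is used up
  // (assumes every stored hash has this same length).
  if (usedHashes.length >= Math.pow(CHARSET.length, length)) {
    throw new Error("No unused hashes of this length remain");
  }
  let hash;
  do {
    hash = generateHash(length);
  } while (hashExists(hash));
  usedHashes.push({ hash, url: text });
  return hash;
}

const url = "https://www.example.com/a/very/long/address"; // placeholder URL
const hash = getHash(url, 4);
console.log(hash); // a 4-letter ID that was not in usedHashes before the call
```

Retrying in a loop is cheap while most of the hash space is free. As the table fills up, collisions become more frequent, which is the point at which a longer hash is worth considering.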

A note on capacity: four characters from a set of 25 give you 390,625 possible hashes. If you feel you need more, five characters give just under 10 million, and adding one more brings this up to about 244 million.

