Category Archives: Tutorials

Generating Unique Short Hashes

Ever wonder how URL shortening websites get such short hashes? Most of them simply make a random hash then check if they’ve used it before. An advantage of this system is that very short hashes can be created with no possibility of collision, especially if a wide range of characters is used. While… Read More »
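
As a rough illustration of the "make a random hash, then check if it has been used" idea from the excerpt above, here is a minimal Python sketch. The names `ALPHABET` and `new_short_code`, the 62-character alphabet, the default length of 6, and the in-memory `set` standing in for a database are all assumptions made for this example, not details taken from the post.

```python
import secrets
import string

# Assumed alphabet: upper- and lowercase letters plus digits (62 characters).
ALPHABET = string.ascii_letters + string.digits


def new_short_code(used, length=6):
    """Draw random codes until one is not in `used`, then record and return it."""
    while True:
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if code not in used:  # the "have we used it before?" check
            used.add(code)
            return code


# A set stands in for whatever storage a real service would use.
used_codes = set()
codes = [new_short_code(used_codes) for _ in range(1000)]
```

Because every code is checked against `used_codes` before it is handed out, two calls can never return the same value, and a larger alphabet leaves more free codes at any given length.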

How not to save user passwords

On March 21, 2019, Facebook announced that it had exposed hundreds of millions of its users’ passwords. A bug in its password management systems caused passwords for Facebook, Facebook Lite, and Instagram to be stored as plaintext in an internal platform. As a result, thousands of Facebook employees could have seen them. Krebs reports… Read More »

Tips on cleaning English text data for analysis

Here’s some advice on how to clean natural text for data analysis. These suggestions are meant for English. They are listed in order of how useful I think they are, not in the order that you should apply them. For example, you would need to do safe reduction before deleting stop words. 1. Keep copies Keep a… Read More »

N-Gram Tutorial in R

What are n-grams? N-grams of texts are extensively used in text mining and natural language processing tasks. They are basically a set of co-occurring words within a given window, and when computing the n-grams you typically move one word forward. For example, for the sentence: “The cow jumps over the moon”. If N=2 (known as a… Read More »
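
The sliding window described in the excerpt above can be sketched in a few lines. The tutorial itself is about R; this sketch uses Python instead, and the function name `ngrams` and the whitespace-only tokenization are assumptions made for illustration.

```python
def ngrams(text, n):
    """Return the word n-grams of `text`, moving the window one word at a time."""
    words = text.split()  # naive whitespace tokenization (an assumption)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


bigrams = ngrams("The cow jumps over the moon", 2)
# -> ['The cow', 'cow jumps', 'jumps over', 'over the', 'the moon']
```

A sentence of six words yields five 2-grams, since the window starts at each word except the last.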