Author Archives: Victor Stoddard

General Problem Solving

General problem solving (GPS) involves solving every kind of problem in a satisfactory manner. It proceeds through steps referred to as the “problem-solving cycle,” which are applied in order until a satisfactory solution is found. How acceptable the solution is depends on personal judgment. Five of the most common processes and factors that researchers have identified… Read More »

Generating Unique Short Hashes

Ever wonder how URL shortening websites get such short hashes? Most of them simply make a random hash then check if they’ve used it before. An advantage of this system is that very short hashes can be created with no possibility of collision, especially if a wide range of characters is used. While… Read More »
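
The generate-then-check approach described in the teaser can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of any particular URL shortener: the 62-character alphabet, the 6-character length, the in-memory set standing in for a database, and the name `new_short_code` are all hypothetical.

```python
import secrets
import string

# Assumed alphabet: upper- and lower-case letters plus digits (62 symbols).
ALPHABET = string.ascii_letters + string.digits


def new_short_code(used, length=6):
    """Draw random codes until one is not already in `used`, then record it.

    `used` is a plain set here; a real service would presumably check a
    database with a uniqueness constraint instead.
    """
    while True:
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if code not in used:
            used.add(code)
            return code


used_codes = set()
codes = [new_short_code(used_codes) for _ in range(1000)]
```

Because every candidate is checked against the codes already issued, duplicates are rejected before they are handed out. The trade-off is one lookup per attempt, and retries become more frequent as the code space fills up.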

How not to save user passwords

On March 21, 2019, Facebook announced that it had exposed hundreds of millions of its users’ passwords. A bug in its password management systems caused passwords for Facebook, Facebook Lite, and Instagram to be stored as plaintext in an internal platform. As a result, thousands of Facebook employees could potentially have seen them. Krebs reports… Read More »

Tips on cleaning English text data for analysis

Here’s some advice on how to clean natural text for data analysis. These suggestions are meant for English. These are in order of how useful I think they are, not the order that you should apply them. For example, you would need to do safe reduction before deleting stop words. 1. Keep copies Keep a… Read More »

Why does PHP’s password_hash() output change each time the same password is hashed?

Nota bene: the hash() algorithm in this article has been slightly altered so that the code below doesn’t work. This is intentional: this code should not be used for secure hashing as it is merely a demonstration of why the same password can generate a different hash. The hash for a password should change each… Read More »
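
To illustrate why the same password can produce a different hash on each call, here is a minimal Python sketch of salted hashing. It is not PHP's `password_hash()` and not the altered code from the article: PBKDF2-HMAC-SHA256 is swapped in purely for demonstration, and the function names, the `iterations$salt$digest` storage format, and the iteration count are assumptions.

```python
import hashlib
import hmac
import os


def hash_password(password, iterations=100_000):
    """Hash with a fresh random salt, so each call yields a different string."""
    salt = os.urandom(16)  # new salt on every call
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    # Store the parameters and salt alongside the digest so verification
    # can repeat the exact same computation later.
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Recompute the digest using the salt embedded in `stored`."""
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), digest_hex)


first = hash_password("hunter2")
second = hash_password("hunter2")
```

Here `first` and `second` differ because their salts differ, yet both verify against the original password, since the salt needed to reproduce each digest travels with the stored string.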

Decision Trees for Linguists Pt. 2 – Information Gain

Entropy and Information Gain Continued from Part 1. Let’s start with some definitions. Tree: A hierarchical structure of nodes and connections between those nodes (branches) with parent-child relationships. Child nodes have parent nodes, which in turn may have their own parent nodes. The highest node is the root node. Decision tree: a flow-chart-like structure where… Read More »
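
As a rough companion to the title's two terms, the sketch below computes Shannon entropy and information gain for a toy split. It uses the standard textbook definitions, which may differ in detail from the article's presentation; the labels and the function names `entropy` and `information_gain` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy data: an evenly mixed parent node split into two pure children.
parent = ["apple"] * 4 + ["banana"] * 4
children = [["apple"] * 4, ["banana"] * 4]
gain = information_gain(parent, children)
```

An evenly mixed two-class node has an entropy of 1 bit and each pure child has an entropy of 0, so this particular split recovers the full 1 bit as information gain.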

Decision Trees for Linguists Pt. 1 – A super simple example

Continued in Part 2. Purity If I have a basket of apples, and only apples, then it’s considered “pure.” If I put a banana in the basket, I can no longer call it a basket of apples, and it’s now considered “impure.” Purity is a measure of how varied a set is. A purity of… Read More »

N-Gram Tutorial in R

What are n-grams? N-grams of texts are extensively used in text mining and natural language processing tasks. They are basically a set of co-occurring words within a given window, and when computing the n-grams you typically move one word forward. For example, take the sentence “The cow jumps over the moon.” If N=2 (known as a… Read More »
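
The sliding window described in the teaser can be sketched in a few lines. The tutorial itself is written for R; this Python version is only an illustration, and it assumes simple whitespace tokenization and a hypothetical helper named `ngrams`.

```python
def ngrams(text, n):
    """Return the word n-grams of `text`, moving one word forward each step."""
    words = text.split()  # assumption: whitespace tokenization is good enough
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


bigrams = ngrams("The cow jumps over the moon", 2)
# bigrams == [("The", "cow"), ("cow", "jumps"), ("jumps", "over"),
#             ("over", "the"), ("the", "moon")]
```

A sentence of six words yields five bigrams, because the window of width N fits in len(words) - N + 1 positions.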

One-Shot Learning: The End of Big Data?

Recently, a Bayesian probabilistic model outperformed neural networks and humans in classifying written letters using very small datasets. One-shot learning is a type of machine learning that learns an object class after just one or a few examples. This is similar to how humans learn to identify objects, creating a rich, abstract template of objects… Read More »