Category Archives: Linguistics

Tips on cleaning English text data for analysis

Here’s some advice on how to clean natural text for data analysis. These suggestions are meant for English. They are listed in order of how useful I think they are, not the order in which you should apply them. For example, you would need to do safe reduction before deleting stop words.

1. Keep copies

Keep a…
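The ordering point and the "keep copies" tip can be illustrated with a small sketch. It assumes that "safe reduction" means something like lowercasing and stripping punctuation (an assumption; the full post may define it differently), and it uses a hypothetical, deliberately tiny stop-word list. Each stage is stored separately so the raw text is never overwritten. If stop words are removed before reduction, capitalized forms such as "The" slip past a lowercase stop-word list.

```python
import re

# Hypothetical, tiny stop-word list for illustration only; real projects
# usually rely on a curated list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is"}


def reduce_text(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens.

    Assumed stand-in for "safe reduction": it only normalizes case and
    drops punctuation, so no words are lost.
    """
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear (exactly) in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def clean(raw: str) -> dict[str, object]:
    """Run the pipeline, keeping a copy of every stage."""
    reduced = reduce_text(raw)
    return {
        "raw": raw,  # untouched original
        "reduced": reduced,
        "no_stop_words": remove_stop_words(reduced),
    }


if __name__ == "__main__":
    result = clean("The cat and THE dog.")
    print(result["no_stop_words"])  # reduction first: stop words removed
    # Wrong order: "The" and "THE" survive because the list is lowercase.
    print(remove_stop_words("The cat and THE dog.".split()))
```

Running it prints `['cat', 'dog']` for the reduce-then-filter order, but `['The', 'cat', 'THE', 'dog.']` when stop words are filtered from the unreduced text.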