Parallel Computing

Preprocess Text in Python --- A Cleaner and Faster Approach

Motivation

Well, I think it all started with one of my favorite tweets from 2013:

"In Data Science, 80% of time spent prepare data, 20% of time spent complain about need for prepare data."
Big Data Borat (@BigDataBorat), February 27, 2013

When building NLP models, preprocessing your data is extremely important. For example, different choices for stopword removal, stemming, and lemmatization can have a huge impact on the accuracy of your models.