Today I am pleased to introduce my first PyPI package, gower, for calculating Gower distance (submitting to PyPI is so much easier than submitting to CRAN). The core function was originally published by Marcelo Beckmann. Lots of R packages incorporate this method, but unfortunately nothing comparable is available for Python users. I took this chance to try the whole package-making experience for PyPI, and here we go!
What is Gower distance?
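As a rough sketch of the idea (not the gower package's own API): Gower distance scores each feature separately and averages the results. A numeric feature contributes the absolute difference divided by that feature's range, and a categorical feature contributes 0 when the values match and 1 otherwise. The function name `gower_distance_pair` and the `ranges` argument below are made up for illustration.

```python
def gower_distance_pair(a, b, ranges):
    """Gower distance between two records with mixed feature types.

    a, b   : sequences of equal length holding the feature values.
    ranges : per-feature range (max - min over the dataset) for numeric
             features, or None to mark a feature as categorical.
             (Hypothetical convention chosen for this sketch.)
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical feature: simple mismatch indicator.
            total += 0.0 if x == y else 1.0
        elif r == 0:
            # Constant numeric feature: it cannot separate any two records.
            total += 0.0
        else:
            # Numeric feature: range-normalised absolute difference.
            total += abs(x - y) / r
    return total / len(ranges)


# Two toy records: (age, favourite colour, score)
row_a = (30, "red", 1.0)
row_b = (40, "blue", 1.0)
feature_ranges = (20, None, 2.0)  # age spans 20, colour is categorical, score spans 2

distance = gower_distance_pair(row_a, row_b, feature_ranges)
print(distance)  # (10/20 + 1 + 0) / 3 = 0.5
```

Because every per-feature term lies between 0 and 1, the averaged distance does too, which is what makes the measure convenient for mixed numeric and categorical data.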
Intuition
I was working on an Elasticsearch project on AWS using Python, and the requests_aws4auth package worked like a charm for me. I never had any issues with authentication (AWS V4 can be hard to work with sometimes). However, when I tried to create a Shiny app for my project, the problem emerged: I just couldn't get the V4 auth to work with httr in R. I tried the aws.signature package on GitHub but kept getting request header issues.
Motivation
Well, I think it all started with one of my favorite tweets from 2013:
In Data Science, 80% of time spent prepare data, 20% of time spent complain about need for prepare data.
— Big Data Borat (@BigDataBorat) February 27, 2013
When building NLP models, pre-processing your data is extremely important. For example, different choices of stopword removal, stemming, and lemmatization can have a huge impact on the accuracy of your models.
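To make that concrete, here is a minimal sketch of how two pre-processing choices change the tokens a model would see. The stopword list and the `naive_stem` suffix stripper are toy stand-ins invented for this example, not a real library's behavior. A real project would more likely reach for an established NLP toolkit.

```python
import re

# Toy stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Strip one common English suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, drop_stopwords=True, stem=True):
    """Tokenize, then optionally remove stopwords and stem."""
    tokens = tokenize(text)
    if drop_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return tokens


text = "The models are learning patterns in the cleaned reviews"
raw_tokens = preprocess(text, drop_stopwords=False, stem=False)
clean_tokens = preprocess(text)

print(raw_tokens)
print(clean_tokens)  # ['model', 'learn', 'pattern', 'clean', 'review']
```

The same sentence yields nine tokens with no pre-processing and five after stopword removal and stemming, so any features built on top of it (bag-of-words counts, for instance) will differ depending on which steps you switch on.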