gsheni's comments

I've been working in the open-source community for six years, and I've learned a lot about what it takes to create successful open-source projects. In this blog post, I share the key lessons that have made the biggest impact on my work. Check it out and let me know what you think!


Featuretools-SQL: import data from your RDBMS and perform automated feature engineering. Supports Snowflake, MySQL, and Postgres.


If your dataset has hundreds of columns, do you really want to individually identify the correct type for each column?

With Woodwork, an open source library for rich semantic data typing, we made type identification fast, simple, and effortless.

Read about our work to understand how we added type inference for natural language columns.


For a long time data analysis products have had "profiling" tools

https://en.wikipedia.org/wiki/Data_profiling

which can look at the values in a column and make some inferences about the column such as "these are all integers between 35 and 89". Most of those work at the level of the whole column, but I worked at a firm that developed a convolutional network classifier that could take either a single data point (say "1999-08-24") or the column header text plus the data point ("Independence Date", "1998-08-24") and guess at the data type (e.g. "date", "address", ...)

It worked really well but wasn't explainable. Another disadvantage was that there were some things it was never going to figure out, such as this checksum on credit card numbers:

https://en.wikipedia.org/wiki/Luhn_algorithm
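
For reference, the Luhn check itself is only a few lines, which is why a rule-based test can catch it where a learned classifier can't. Here is a minimal sketch in Python; the function name `luhn_is_valid` is my own, and dropping every non-digit character before checking is an assumption about the input format, not part of the algorithm:

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Assumption: separators such as spaces or dashes are simply dropped.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False

    total = 0
    # Walk from the rightmost digit (the check digit) and double every
    # second digit; a doubled value above 9 has 9 subtracted from it.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(luhn_is_valid("79927398713"))  # example number from the Wikipedia article
```

A profiler could run a check like this on columns that look like 13 to 19 digit strings, but passing the checksum only shows that a value is well formed, not that it is a real card number.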


I haven't given much thought to type annotations.

pandas recommends using accessors to extend DataFrames (as we do in Woodwork); an accessor is actually just a class registered with a decorator: https://pandas.pydata.org/pandas-docs/stable/development/ext...
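
Roughly, registering an accessor looks like the sketch below. It uses pandas' `register_dataframe_accessor` decorator; the accessor name `typing_demo` and its `numeric_columns` method are made up for illustration and are not Woodwork's actual API:

```python
import pandas as pd


# The decorator attaches the class to every DataFrame under the given name.
# "typing_demo" is a hypothetical name chosen for this example.
@pd.api.extensions.register_dataframe_accessor("typing_demo")
class TypingDemoAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor was reached from.
        self._df = pandas_obj

    def numeric_columns(self):
        """List the names of columns that have a numeric dtype."""
        return list(self._df.select_dtypes(include="number").columns)


df = pd.DataFrame({"age": [35, 89], "name": ["a", "b"]})
print(df.typing_demo.numeric_columns())
```

The nice part of this approach is that the extra behavior lives under its own namespace, so it doesn't require subclassing DataFrame.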


I’ve been creating an open source library for rich semantic data typing, which helps you manage and communicate data typing information. If you’re curious what this library can do, check out our new blog post!

