Ask HN: Skills/Tools to Stand Out as a Data Scientist
18 points by methusala8 on June 1, 2019 | 15 comments
Apart from conventional tools like Python/R and knowledge of machine learning/statistics/SQL, are there any other skills that I can pick up in order to upskill myself as a data scientist?

I have nearly three years of experience in this field and would like to level up. Thanks.



A provable ability to conduct Bayesian data analysis: from experiment design, to modelling, to evaluation and back again.

I think it’s what makes a “data scientist” legitimate.


Some others have commented that this isn’t “required”, but I can tell you that a solid understanding of Bayesian statistics makes everything make way more sense and gives you a great basis for understanding any ML algorithm. I don’t mean just memorizing Bayes’ theorem and answering the canned “99% accurate test is positive for a disease that affects 0.01% of the population” interview questions. I mean understanding the relationships between methods (GPs, SVMs, neural nets, tree-based methods) and how they relate back to Bayesian statistics, either directly or by approximation; that is key to having true visibility into what you’re doing. Sure, you know about loss functions and regularization, but if you can’t look at a loss function and translate it into a precise statistical statement in terms of priors and likelihoods, you don’t truly know what’s going on. You will also have a hard time extending loss functions or regularization schemes in a way that isn’t problematic, and explaining or justifying the choices you make on that end. Thinking in terms of Bayesian statistics (Bayesian stats IS stats; it’s mathematically equivalent to frequentist statistics but has a different philosophical interpretation) saves you from many, many rabbit holes that, in hindsight, are obviously Bad Ideas when seen from a more Bayesian point of view. This is not to undermine people without a strong statistical background: you can still do good work, but your life becomes easier and less ad hoc (in my experience) once you understand things statistically. You should ideally understand the assumptions behind your models, and what the implications of those assumptions are, and it’s hard to do that consistently and rigorously without Bayesian statistics.

That being said, Bayesian statistics is not SUFFICIENT for doing data science well. You need to know pragmatic solutions to intractable problems, what heuristics might be ok to try, how to EXPLAIN things to people, how to code well enough, etc. But boy does knowing Bayesian statistics help.
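
One way to make the "translate a loss function into priors and likelihoods" point concrete is a minimal sketch like the following. It is not from the thread: the data, the variable names, and the values of `sigma` and `tau` are made up for illustration. Assuming a Gaussian likelihood y_i ~ N(w * x_i, sigma^2) and a zero-mean Gaussian prior w ~ N(0, tau^2), the L2-regularized squared-error loss with penalty lam = sigma^2 / tau^2 is just the negative log posterior rescaled by a constant, so both have the same minimizer.

```python
import math
import random

random.seed(0)

# Synthetic one-feature data (illustrative only).
sigma = 0.7  # assumed noise std in the likelihood y_i ~ N(w * x_i, sigma^2)
tau = 2.0    # assumed std of the prior w ~ N(0, tau^2)
true_w = 1.5
xs = [random.gauss(0.0, 1.0) for _ in range(50)]
ys = [true_w * x + random.gauss(0.0, sigma) for x in xs]

def ridge_loss(w, lam):
    """Squared-error loss plus an L2 penalty on the weight."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) + lam * w ** 2

def neg_log_posterior(w):
    """Negative log posterior of w, dropping terms that do not depend on w."""
    neg_log_lik = sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / (2 * sigma ** 2)
    neg_log_prior = w ** 2 / (2 * tau ** 2)
    return neg_log_lik + neg_log_prior

# The penalty strength implied by the assumed noise and prior scales.
lam = sigma ** 2 / tau ** 2

# Closed-form minimizers of each objective.
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
w_ridge = sxy / (sxx + lam)
w_map = (sxy / sigma ** 2) / (sxx / sigma ** 2 + 1 / tau ** 2)

# The two objectives differ only by the constant factor 2 * sigma^2.
same_objective = all(
    math.isclose(ridge_loss(w, lam), 2 * sigma ** 2 * neg_log_posterior(w))
    for w in (-1.0, 0.0, 0.5, 2.0)
)
same_minimizer = math.isclose(w_ridge, w_map)
print(same_objective, same_minimizer)
```

Read this way, choosing a larger penalty amounts to assuming a tighter prior (smaller `tau`) relative to the noise level, which is the kind of statement about assumptions the comment is asking for.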


Thanks for the detailed reply. I will try and understand the connection between all the various algos. As of now, I am only looking at the frequentist approach and the general linear models umbrella.


Cool! Kudos! Also, don’t feel too discouraged if it takes you a while. I’ve been trying to learn Bayesian-related things for years now, and there are many aspects with significant learning curves (speaking for myself, maybe you won’t have so much trouble!). It’s not trivial stuff. But it’s helpful, so every bit you can understand is a good investment, I think.


I have yet to encounter projects that require this skill, so I am not well versed in it. I will look this up. Thanks.


It’ll colour your approach to everything but it’s unlikely to ever be directly “required”.

I’d recommend BDA3 (Bayesian Data Analysis, 3rd edition). It’s a treasure of a book. The maths is relatively simple, but it does expect an early grad-level mathematical maturity.


Is this the book you are recommending? http://www.stat.columbia.edu/~gelman/book/


Yes.


I will look into this book. With the recent threads on HN debating whether data science is over-hyped, I think this knowledge would help in adding value. Thanks.


Depending on the role, data science is generally either [data analysis (with very little modelling) + business understanding + communication and presentation skills], OR it's [statistics + software development]. There can be some deviation and mixing between the two, but to help with the latter:

- linear algebra

- calculus

- software development - best practices, version control, design patterns etc.


Can you elaborate on the third point with any resources for best practices and design patterns? I looked on Amazon and came across a few books for both. Many people have commented that data science should move towards software best practices, etc. As I am a statistics major, this is a gap that I would have to bridge. Thanks.


Look up The Pragmatic Programmer and Clean Code - they're decent books and I'm sure other people on HN have even better recommendations.

Also, you don't necessarily need to cover something completely in order to get started. And these will be useful mostly if your work is in some way part of a software product instead of being some "offline" analysis, model build or forecast. Just learning to use version control, to collaborate with other developers and to run tests goes a long way.
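
As a rough illustration of the "run tests" part, here is a minimal sketch, assuming a hypothetical helper `standardize` of the kind that often sits inside an analysis script. The function name and test cases are invented for this example; the tests are plain `assert`-based functions of the sort a runner such as pytest would collect.

```python
def standardize(values):
    """Return values rescaled to zero mean and unit (population) variance."""
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        raise ValueError("values must not all be equal")
    std = var ** 0.5
    return [(v - mean) / std for v in values]

def test_zero_mean_unit_variance():
    z = standardize([1.0, 2.0, 3.0, 4.0])
    assert abs(sum(z) / len(z)) < 1e-12
    assert abs(sum(v * v for v in z) / len(z) - 1.0) < 1e-12

def test_constant_input_rejected():
    # A constant column has zero variance, so scaling it is undefined.
    try:
        standardize([5.0, 5.0, 5.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for constant input")
```

Keeping a file like this under version control next to the analysis code means a collaborator can change `standardize` and find out quickly whether the edge cases still behave as intended.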


Fundamental understanding of linear algebra and optimization techniques


Are there any particular resources that you would recommend for optimisation techniques? Thanks.


Data visualization and storytelling.



