
Every time a "what's the best language for Machine Learning/Data Science?" thread pops up, it devolves into a flame war between "real data analysts use R" and "Python has thousands more libraries!" (most recent example, 16 days ago: https://news.ycombinator.com/item?id=13110230).

My response is always use-the-most-appropriate-tool-for-the-job-dammit and don't pigeon-hole yourself into one language, since each language has its pros and cons. I am very tempted to write an HN autocomment bot at this point. (In Python instead of R, of course, since that's the most appropriate tool for this job.)



That's a trade-off. Is the cost of learning and context switching between languages/frameworks less than the benefit of using a tool that is X% more appropriate?


Are R and Python really your only choices? Basic competency in machine learning is on my todo list, and I don't relish the thought of using either language.


One pro for Python is that it makes machine learning really easy, or at least incredibly accessible. You can write and test classifiers in about 5 lines of Python using scikit-learn. The second point is that virtually all the latest deep learning packages come with Python frontends by default nowadays. For stats you could also use SPSS.

The other advantage of Python is that as a scripting language it's very powerful for data wrangling and pre-processing, without needing all the boilerplate that e.g. C++ would require.
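To make the "about 5 lines" claim concrete, here's a rough sketch of what I mean. The iris dataset, logistic regression, and the 75/25 split are just my picks for illustration; any other scikit-learn estimator would slot in the same way.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset and hold out a quarter of it for testing.
# (Dataset, model, and split size are illustrative choices, not from the thread.)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a classifier, then report mean accuracy on the held-out data.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Not counting imports, that's load, split, fit, and score.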


Not to join a flame war, but R makes it pretty easy to test multiple models on a single dataset as well. I have also noticed it does better stats and missing data handling out of the box.


I have played around with scikit-learn and love how simple and easy it is to work with, but the story for scaling it doesn't seem super straightforward - is this something anyone here has experience with?

I built a recommendation system in Spark earlier this year that used terabytes of input. I ran it on a 40-node EMR cluster, where it took less than half an hour. It wasn't trivial to make it run in a clustered environment, but it wasn't very hard either.


Out of curiosity, were you using spark-scala or pyspark?


I was using Scala.


If you consider SPSS as an alternative, you'll probably really have no use for R. I agree that Python is more approachable for people with a CS background (unless you're a fan of array processing languages), but R actually is a nice language for data-centric tasks.


Julia is another option. You can even call R and Python code from a Julia REPL.

You generally want an interactive language, though, because there is an iterative cycle in prototyping models.


They are not the only ones, but Python is definitely the one most likely to make you productive fastest.


R and Python are not your only options. Check out the article for some other languages people are using.

R and Python are probably the two with the most support/community materials around them - lots of tutorials, libraries, guides etc.


Which is the main reason why PHP is still around. If you are starting a new language or a new project, it is better to have examples and guides available.

I prefer Python to "R" because it gives me better search results.


Given that the article mentions many library implementations in Java, that puts JVM languages on the table. Scala, which the article mentions, may be a good one, but that puts Clojure on the table as well.


Just added Clojure to the query to see what it gives. You can do it too, simply follow the link I give in the article. Clojure is a bit less popular than Julia, hence not significant at this point. See https://www.indeed.com/jobtrends/q-python-and-%28%22machine-...


> don't relish the thought of using either language

Why not?

> only choices

Obviously not, according to the linked article. However, many people like Python and/or R. Perhaps you should find out why before dismissing their choices.


"Everytime a "what's the best language for Machine Learning/Data Science?" thread pops up,"

There will always be someone making your statement as well.


use-the-most-appropriate-tool-for-the-job-dammit

Wow, and to think some fools think the most inappropriate tool should be used.


Given how often it comes up, you might be better off using Erlang/Elixir for concurrency's sake.


Being able to write ML in Elixir and natively integrate the code into a Phoenix app would be great. With Elixir I think the problem is the lack of libraries, especially ML libraries.


Elixir, I'm afraid, is the wrong tool for ML. Numeric computation is Erlang's weak spot.



