henripal's comments

henripal · on Feb 1, 2023

We are a series A stage biotechnology company building a novel platform for drug discovery focusing on difficult targets. Machine learning has struggled in early stage drug discovery efforts because most of these efforts do not have enough data for the models to parse. Our technology solves the data problem with massively parallel biochemistry in the form of DNA Encoded Libraries (DELs), allowing us to analyze 100-1000x more compounds compared to traditional approaches. By feeding our algorithms with this data we can identify better compounds faster than competing solutions. We recently closed a substantial Series A, having assembled a highly interdisciplinary team of both bench and computational scientists and proven the technology by showing how our data and ML models improve on the state of the art by 3x-16x depending on the application. Come help us build the future of drug discovery!

Tech Stack: K8s, Python, Spark, AWS, PyTorch

*You’ll own:*

- Bespoke analysis of early stage drug discovery data
- Create Python tools to assist bench scientists in the day-to-day activities
- Work with ML specialists to deliver useful data

*You’ll assist:*

- Maintain and upgrade existing registration and analysis tools
- Processing and validation of large data sets
- Interpreting analyses, including ML results, for bench scientists as needed

*Skills & expertise you have:*

- A BS or MS degree in a Computer Science discipline with 0 – 3 years experience,
- OR a BS/MS Chemistry or Biology degree with at least 3 years professional Python programming experience
- Python programming
- Familiarity with SQL
- Chemistry or Biology lab-based classwork or equivalent experience
- Good communication skills. Mentoring or tutoring experience a plus as this role requires the ability to explain compute-oriented concepts to non-compute professionals

Email me directly if you're interested! henri at anagenex dot com

henripal · on June 1, 2022

We are a seed stage biotechnology company building a novel platform for drug discovery focusing on difficult targets. Machine learning has struggled in early stage drug discovery efforts because most of these efforts do not have enough data for the models to parse. Our technology solves the data problem with massively parallel biochemistry in the form of DNA Encoded Libraries (DELs), allowing us to analyze 100-1000x more compounds compared to traditional approaches. By feeding our algorithms with this data we can identify better compounds faster than competing solutions. We recently closed a substantial Series A, having assembled a highly interdisciplinary team of both bench and computational scientists and proven the technology by showing how our data and ML models improve on the state of the art by 3x-16x depending on the application. Come help us build the future of drug discovery!

Tech Stack: K8s, Python, Spark, Snowflake, AWS, PyTorch

Full job descriptions:

- Senior/Staff/Principal Backend Engineer (Dev/Data/ML Ops): https://jobs.lever.co/anagenex/51c2f505-ac83-490a-a10b-f5d88...
- Senior+ Computational Chemist: https://jobs.lever.co/anagenex/0accc8e9-a9f0-4b02-af62-8819f...

Email me directly if you're interested! henri at anagenex dot com

henripal · on April 12, 2019

I do machine learning/data science consulting, mostly for startups. Literally 80% of my time is cleaning data and coding up basic ML for their excel spreadsheets :)

zihua · on April 12, 2019

Man that's so annoying... what do you mean by "for their excel spreadsheet" though? Also, have you tried something like AutoML?

henripal · on April 12, 2019

Most of the data outside software startups is in Excel spreadsheets :) And yes - believe it or not I rolled out a Beta version of an AutoML app for spreadsheets today!

toomuchtodo · on April 12, 2019

Is this something you’ve open sourced or are charging for? Very interested either way!

henripal · on April 12, 2019

Email me (email in profile) - not open source but completely free Beta. Just trying to figure out if it's something people would be interested in. (Edit: email rather than DM)

swuber · on April 12, 2019

In my experience, even within software startups most data is in Excel spreadsheets :)

henripal · on July 15, 2018

Self plug: If you enjoy this and want some (less technical) insight on connections between ML and thermodynamics, you might enjoy my series of blog posts: http://henripal.github.io/blog/stochasticdynamics

SmooL · on July 15, 2018

Oh I'm going to enjoy this, thank you!

henripal · on July 16, 2018

Happy to talk some more about this :)

Another self plug (would have put it in first post but the video was just uploaded):

Here's my SciPy 2018 talk on the subject (from last week!!) https://m.youtube.com/watch?v=WUs0u2PJ2UU&index=46&t=0s&list...

henripal · on April 5, 2018

Cool. I'm guessing from trying it out that you're using Highcharts. I've run into really unpleasant memory leaks/slowness when streaming data (especially when the existing chart already has thousands of data points). Are you seeing something similar?

nicodjimenez · on April 5, 2018

Yes using Highcharts. You've had issues with Highcharts? Yeah it's not designed to stream data extremely rapidly but it's a great "good enough" product, especially for something like Losswise where the differentiation is the overall design and architecture and developer experience, not the prettiest possible graphs.

henripal · on April 5, 2018

Yes... For example if I'm running three experiments at the same time, auto-refreshing the chart every two seconds, it essentially slows the app to a crawl after a thousand points or so. So we reverted to manual updates.

If you know of any better alternatives for data streaming, I'm curious. I tried benchmarking a couple libs recently: https://github.com/henripal/ChartingLibBenchmark

TorsteinHonsi · on April 6, 2018

As a Highcharts developer, I had a look at your benchmarking, and have some thoughts about optimizing for Highcharts. The first step is to turn off animation, which helps a lot. The default Highcharts animation on addPoint is 250ms, so with a refresh rate of 100ms you will get a lot of redrawing going on for nothing. The second thing that possibly optimizes a bit is to use hard-coded axis values so that it doesn't have to recompute axis values for each iteration.

With those modifications the performance is much better: http://jsfiddle.net/highcharts/1o5ghqc8/
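
To make the two suggestions above concrete, here is a minimal sketch (not necessarily what the linked fiddle does). The helper names `buildStreamingOptions` and `appendPoint` and the `SeriesLike` interface are hypothetical and only for illustration; the Highcharts pieces they rely on are the `chart.animation` option, axis `min`/`max`, and the `animation` argument of `series.addPoint`.

```typescript
// Minimal shape of the one series method used here, so the sketch
// type-checks and runs without Highcharts installed (an assumption;
// a real Highcharts series object satisfies this shape).
interface SeriesLike {
  addPoint(
    point: [number, number],
    redraw?: boolean,
    shift?: boolean,
    animation?: boolean
  ): void;
}

// Hypothetical helper: builds a plain options object that could be
// passed to Highcharts.chart(container, options).
function buildStreamingOptions(xMax: number, yMin: number, yMax: number) {
  return {
    // Step 1: turn off chart-wide animation so updates are not animated.
    chart: { animation: false },
    // Step 2: hard-code axis extremes so they need not be recomputed
    // from the data on every update.
    xAxis: { min: 0, max: xMax },
    yAxis: { min: yMin, max: yMax },
    series: [{ type: "line", data: [] as [number, number][] }],
  };
}

// Hypothetical helper: appends one point, redrawing immediately,
// without shifting old points out and without animating the update.
function appendPoint(series: SeriesLike, x: number, y: number): void {
  series.addPoint([x, y], true, false, false);
}
```

With a real chart this would be used roughly as `const chart = Highcharts.chart("container", buildStreamingOptions(1000, 0, 1))` followed by `appendPoint(chart.series[0], step, loss)` on each update; the axis bounds are placeholders and would need to match the data being streamed.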

nicodjimenez · on April 5, 2018

Yeah I have no idea. If you need really high performance that Highcharts doesn't provide, you probably need to write your own specialized charting library.

henripal · on April 5, 2018

Cool, thanks.

For those of you who want to tinker, there's a much rougher, open source library based on Vue.js, Postgres, and Flask with some momentum on GitHub right now, LabNotebook: https://github.com/henripal/labnotebook

(Disclaimer: I'm one of the authors)