Hacker News | rlh2's comments

People also love to hate R, but data.table is light-years better than pandas, in my view.


I tried this out and switched to ledger-cli instead. All of the features listed above are possible, and the flexibility is incredible. The hardest part is data entry. For accounts that provide API access I wrote curl/jq/awk parsers, and for accounts where you can only get data via download, I wrote short CSV parsers in awk. I have ~15 accounts to keep track of, plus various loans/securities, and I use ledger-cli to track cost basis, interest vs. principal on loans and my mortgage, all expenses, etc.

It took some time to write the transaction parsers, but updating everything now takes about 5 min/month, and I have a complete financial picture without sending any data to something like Mint.
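To give a flavor of the format (account names and amounts here are made up, not from my actual ledger): a ledger-cli journal is plain-text double-entry, so a mortgage payment split into interest and principal looks something like:

```
2023-01-05 Mortgage payment
    expenses:mortgage:interest       $612.34
    liabilities:mortgage             $487.66
    assets:checking                $-1100.00
```

ledger-cli checks that each transaction balances to zero, which is what catches data-entry mistakes.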


I envy you; this has been my dream for years. But obscure payment gateways make this a task for the chosen few in this country. If somebody could point me to a workable solution (on Linux / cross-platform) for Germany, it would be highly appreciated.

“Every financial system is broken in its own way” (ex-coworker who had worked in financial services)


I use `aqbanking-cli` for grabbing transactions from my banks' FinTS/HBCI API and generate a CSV out of that. That CSV then goes through a bit of Python that splits up the entries into transactions. Those get rendered out as `beancount` transactions (but `ledger` works as well, I used that before I switched to beancount) and appended to my actual ledger.

I then use `fava` (a beancount web UI) to fix mistakes, and have another piece of code (this time written in Go, but could be Python/whatever as well) that takes transactions that are generated from my brokerage account and enriches them with data parsed from my brokers' PDF reports (since the FinTS/HBCI info doesn't contain stuff like ISINs or taxes/fees separately).
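A minimal sketch of that CSV-to-beancount rendering step (not my actual code; the column names and the fixed account mapping are assumptions for illustration, and real aqbanking exports have more columns):

```python
import csv
import io

def render_beancount(row):
    """Render one CSV row as a beancount transaction (hypothetical layout)."""
    amount = float(row["amount"])
    return (
        f'{row["date"]} * "{row["payee"]}"\n'
        f"  Assets:Checking       {amount:.2f} EUR\n"
        f"  Expenses:Unclassified {-amount:.2f} EUR\n"
    )

sample = "date,payee,amount\n2023-01-05,REWE,-23.45\n"
for row in csv.DictReader(io.StringIO(sample)):
    print(render_beancount(row))
```

The rendered transactions are then appended to the ledger file and cleaned up in fava afterwards.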

This is for my personal finances, but I used the same system (minus the brokerage stuff) when I managed the finances of a hackerspace in the middle of Germany for a few years.


~Look for aqbanking and HBCI/FinTS. Most of the major banks have an endpoint, though you might have to google for the exact details.~

Edit: Missed that this is about ledger-cli, not GnuCash. Sorry.


Actually, https://github.com/dpaetzel/buchhaltung could fit your needs. I just found it because I figured someone must have built this already.


OK. But written in Haskell? Why would you write this kind of stuff in anything but Python? It's trivial, after all, and it should stay trivial through all the layers…


There are companies that offer APIs to access bank accounts as a service. Plaid.com is one of them, although you might need to search around for another one that supports your bank.

At one point there was a decent standard called "OFX" which financial institutions were supposed to support. It let an app like GnuCash or QuickBooks connect to your bank account automatically. That apparently died and was replaced by an API standard called "Open Banking". It's shittier in every way, since it now seems to require a middleman (like Plaid), whereas with OFX you could just query an endpoint directly.


A neat thing about GnuCash, in Germany at least, is that most traditional banks (but not the fintechs) support HBCI/FinTS, and there is a plugin for this. Paired with the auto-assign, it's very easy to import transactions.


Any chance you open-sourced those parsers? I also switched from GnuCash to ledger-cli a long time ago, so I'm interested to see your solution.


I did not, mostly because the parsers are custom to how I want things displayed. Here is a simplified example for Chase, though:

  # Convert a Chase CSV export into ledger-cli transactions.
  # Run with gawk (gensub is a gawk extension).
  BEGIN{
      FS = ",";
  }
  {
      if(FNR > 1){  # skip the header row
          # Reorder MM/DD/YYYY into YYYY-MM-DD.
          date = gensub(/([0-9]{2})\/([0-9]{2})\/([0-9]{4})/, "\\3-\\1-\\2", 1, $1);
          print date, $3
          print "    liabilities:chase"
          if($5 != "Payment"){
              print "    expenses:" $4, "  $", $6 * -1.0
          }else{
              ## This exists because I have an offset in the account
              ## I use to pay the card with. If the amount in my residual
              ## account != 0, I know there is something wrong
              print "    liabilities:chase:residual  $", -$6
          }
      }
  }
Suppose this script is called 'parse.awk'; you would then run:

  awk -f parse.awk myfile.csv

I've been doing this for a couple of years now, and so far I haven't had file formats change on me. Most of the complexity I've run into is when the regex to parse line items into the appropriate ledger account is non-trivial. It started off as a way for me to learn awk and double-entry bookkeeping and somehow turned into something useful.


> It started off as a way for me to learn ... and somehow turned into something useful

This is the way.


My wife has used GnuCash for years, and she's an average person, not a programmer or techie. She loves GnuCash.


I use ledger, but only for accounts where I'm a trustee on a trust.

For my own accounts, I don't have time to deal with reconciliation, so I just do CSV -> paste into spreadsheet -> SQL queries that sum things up.
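The spreadsheet step can even be skipped. A sketch of the same idea with Python's built-in sqlite3 (the CSV columns and amounts here are hypothetical):

```python
import csv
import io
import sqlite3

# Load a (hypothetical) bank CSV into an in-memory SQLite table and sum
# spending per category -- the "SQL queries that sum things up" part.
sample = """date,category,amount
2023-01-02,groceries,-52.10
2023-01-03,rent,-900.00
2023-01-09,groceries,-31.40
"""

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tx (date TEXT, category TEXT, amount REAL)")
db.executemany(
    "INSERT INTO tx VALUES (?, ?, ?)",
    [(r["date"], r["category"], float(r["amount"]))
     for r in csv.DictReader(io.StringIO(sample))],
)

for category, total in db.execute(
        "SELECT category, SUM(amount) FROM tx "
        "GROUP BY category ORDER BY category"):
    print(f"{category:10s} {total:10.2f}")
```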


The obsession with CPU speed almost always confuses me in these threads. Time to program is far more important, and that's where a terse language like R shines. The base/most common functions are executing C under the hood anyway. It's like Lisp in that it's easy to write slow code, but who cares, as long as it's fast enough? It's also almost always easy to speed things up at the R level if necessary, and R's C API is easy to use for numeric computing/optimization if you want to drop down to that level.


It depends. Take, for example, any omics dataset where you might need to run a GLM on ~500,000 rows. Code I've seen for this operation can range from 30 minutes to 2 days of runtime.

My takeaway here is that, sure, for one operation the speed is not that critical, but there is always the case where that one operation gets run close to a million times in one analysis, and then it all adds up. On top of that, even if it's implemented in C, the round trip from R to C and back happens that many times, which adds to the slowness.
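A toy Python analogue of the same effect (numbers illustrative, timings machine-dependent): one call into compiled code for the whole dataset vs. crossing the interpreter boundary once per element:

```python
import timeit

xs = list(range(500_000))

# One C-level call over the whole list.
t_builtin = timeit.timeit(lambda: sum(xs), number=20)

# Interpreter-level loop: per-element work in the interpreter.
def loop_sum(values):
    total = 0
    for v in values:
        total += v
    return total

t_loop = timeit.timeit(lambda: loop_sum(xs), number=20)
print(f"builtin sum: {t_builtin:.3f}s  python loop: {t_loop:.3f}s")
```

On a typical machine the built-in is several times faster, even though both compute the same result; the gap is pure per-iteration dispatch overhead.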


Yes, I use R, Julia, and Python from time to time depending on the case and my mood and they all have their advantages and disadvantages.

R is more than fast enough for straightforward prototypical analyses where a lot of the code is calling C or something lower level and you're not introducing anything "new" to the interpreter. But if you want to do some unusual optimization, there's going to be something that bottlenecks everything unless you go into C/C++/Fortran yourself, and then Julia is a good compromise. I've had times when Julia didn't save any time whatsoever, and other times when it took something that would run for at least a week in R and finished in 30 minutes.

Having said that, the more I use Julia the more I find myself scratching my head about it. It's very elegant but it's just low-level enough that sometimes I wonder if it's worth it over, say, modern C++ or something similarly low level, which tends to have nice abstracted libraries that have accumulated over the years. I also have the general impression, mentioned in a controversial post discussed here on HN, that a lot of Julia libraries I've used just don't quite work for mysterious reasons I've never been able to figure out. Everything with Julia has gotten better with time but I still have this sense that I could put a lot of time into some codebase, and have it just hit a wall because of some dependency that's not operating as documented.

There's kind of an embarrassment of riches in numerical computing today, and yet I still have the feeling there's room for something else. Maybe that's the mythical golden language that's lured all sorts of language developers since the beginning though.


I have been thinking the same and have had similar timing experiences. As Julia is lower level than R/Python, there are a lot of annoying things to take care of that aren't needed in R/Python. And then why not use, say, Rust? Or just Rcpp in R. We recently wrote a small test program in Rust that is called very often on the command line and takes a couple of seconds to run. Very happy with the experience: the same run speed as Julia, 10 times faster than R/Python, and no 60-second load time like Julia.


Julia 1.9, now in beta, implements native code caching. Precompiling a Julia package now creates a native shared library (a ".so", ".dylib", or ".dll" file). For some packages, this lowers load time considerably. It may take some time before many packages take full advantage of this.

The promise of Julia is that you can have the high-level interface and the low-level code in the same language. The alternative would be coding the low level code in Rust or C and then creating bindings for Python or R.

For a while, Julia made the most sense for long-running code that is executed almost as often as it is modified (e.g. scientific computing); in that situation, Rust or C compilation times become a hindrance. As ahead-of-time and static compilation features get added to Julia, this scope will expand.


Yes, I follow this. The load time keeps getting better, and I am looking forward to 1.9.

I really don't want to come across as negative. Julia is a fantastic language, and my hope is that it will continue its impressive improvement path.

But to follow from the thread's sentiment, I have the feeling Julia lives in an unstable equilibrium: it is lower level than R/Python but doesn't quite deliver the benefits of Rust/C/Fortran/C++. I find my colleagues gravitate to one of the two equilibria.

Maybe your last paragraph crystallizes it: if one lives in the REPL, Julia is wonderful. That's not how I work; I prefer the command line. New data arrives, I run code on it. The data changes in real time, the code does not. My code may run millions of times on different operating systems and only infrequently change.


We already have some early prototypes of running ahead-of-time-compiled Julia native code from the command line.

https://github.com/brenhinkeller/StaticTools.jl

I think what we'll end up with is a language that can be used in both a fully static mode and in a dynamic mode along with some possible mixing. We may yet get the benefits of a statically compiled language as the tooling continues to develop. I do not see anything inherent in the language that would prevent that from happening.


In Julia you can go low-level, but there is no requirement. You can write purely high-level, generic, untyped code, with good performance. So I'm a bit reluctant to accept the claim that it's lower level.

What are the things where low-level code is required in Julia, but not in Python/R?


One of the key points of Julia is that the language you use for performance critical parts is also Julia. That applies to both the libraries like DataFrames.jl and for situations where you'd drop to a lower level language when optimising. I think being productive in Fortran or C++ is unrealistic for most scientific programmers.


It is a trade-off, and the sweet spot depends a lot on the specific context and background. Run speed matters a lot when the difference is between running your code on a dataset for half an hour vs. through the whole night. Once you have prototyped your code, you are going to use it more and more (not to mention runs to tweak parameters or validate results), and R's speed is not satisfying enough for my work. Python and MATLAB are easy and fast enough to program in, and much faster for computing-heavy tasks. If I got into C, I would not save as much time as I would have to put into learning how to run, e.g., parallel tasks there safely. Moreover, R is not necessarily faster to program in, either; real (i.e. tidyverse-style) R is quite idiosyncratic, and if you come from a programming rather than a statistics background, it will probably take more time to learn than it is worth, unless it is important in your work environment.


When someone understands what is happening when their program executes, they will write faster programs without much more effort.

You might like writing slow programs, but that doesn't mean people like using them.


I have been using R for almost 20 years now. I work on a medium-sized quant team at a large asset manager, and we run several $BN off R - we mostly trade equities and vanilla derivatives. Our models are primarily statistical/econometric. In aggregate, we have about a hundred scheduled jobs associated with a variety of models and on the order of 15 shiny applications to facilitate implementation. We have an internal CRAN-like repo, and everything we produce is packaged/versioned with GitLab CI/CD. We have RStudio Server at my firm; half my team uses that for development, and the other half, including myself, uses emacs/ess. All of us use RConnect for scheduling and application hosting - it has its quirks, but it's excellent in a constrained IT environment.

I often chuckle when people complain about R in production and how it isn't a good general purpose programming language, my experience has been the polar opposite. You can write bad code in any language, and R is no exception, but R allows you to write so much less code and R-core is truly exceptional at backwards compatibility. Our approach to R is basically:

- Don't have a lot of dependencies, and when you do have dependencies, make sure they themselves don't have a lot of dependencies. While we do use shiny as mentioned above, our core models are very dependency light and shiny is just a basic front end.

- data.table (which was designed by quants) is a zero-dependency package that is by far the best tabular data manipulation package that has ever been created since the dawn of time. We generally work on an EC2 instance running Linux with a ton of memory. In the <0.01% of cases where a dataset doesn't fit in memory (e.g. tick data), we do initial parsing with awk if it's file-based, or SQL if it's DB-based, and then work in R.

- Check/coerce argument types and lengths on function input to catch and avoid all the quirky edge cases that drive people nuts - it's so easy!

- I hate OOP and I love that R doesn't encourage it. Mutable state, especially for non-software engineers, is the devil. Don't get me wrong, OOP has its place, but the fact that R encourages functional programming is one of the best things about it. The slight inefficiency this produces is almost never a problem.

- R is not slow at all when used correctly. Additionally, the C API is a joy to use when necessary.

- Stick to the base types: vectors, matrices, lists, environments and data.tables (only exception). The fact that you can name, and then use names to index all of the above is stunningly powerful. The only "objects" we really create are lightweight extensions of lists with an S3 print method.

- We have an internal version of renv/packrat that creates a plain text "dependency file" for projects and we pin package versions in docker containers. RConnect doesn't use docker right now, but they do have a versioning system that works quite well in my experience.
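To illustrate the argument-checking point above in a language-neutral way (Python here, but stopifnot()/vapply()-style checks in R follow the same pattern): coerce and validate once at the function boundary so the quirky edge cases can't get inside.

```python
def weighted_mean(values, weights):
    """Weighted mean with coercion and validation at the boundary."""
    # Coerce early: a stray string or integer fails loudly right here,
    # not three calls deep in some numeric routine.
    values = [float(v) for v in values]
    weights = [float(w) for w in weights]
    if len(values) != len(weights):
        raise ValueError("values and weights must have equal length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total

print(weighted_mean([1, 2, 3], [1, 1, 2]))  # -> 2.25
```

The function body after the checks can then assume clean inputs, which keeps the actual logic short.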

I definitely wouldn't want to build something like a company website in R, but I wouldn't want to build that in C either. R definitely has its place as a server-side language, even outside its assumed domain of statistics.

Haters gonna hate, but the joke is on them.

