Now this is what project documentation should look like! It explains what the project does, gives some short examples of the most common features, then goes into the details - while being easy to understand for the target audience. Kudos to whoever wrote this! (and rr sounds like a nice tool too ;) )
I fully agree. I wish more intro-level documentation had this kind of easy-to-follow, progressive level of detail.
All too many things like this either dive straight into the deep end, inundating you with superfluous details when all you want is a primer, or provide so little information as to be nearly useless.
Historically, rr has not worked on AMD processors, which is a bummer. However, I have been able to make good use of it on my 5950X now with the workaround script and newer versions of rr. This is good news.
I've not read their extended technical report, but I am kind of curious exactly what performance counters AMD is implementing poorly and how that impacts rr.
vchuravy's link gives the details, but basically, there's a microarchitecture optimization in Zen that breaks determinism of the performance counters. Fortunately, there's a chicken bit that turns it off, which is what the script does. I've been trying to convince AMD to officially document the bit such that the kernel can set it automatically, but no luck so far.
There is still one remaining annoyance, which is that AMD's NMI latency is super high, which directly tanks rr's reverse execution latency. There are probably some improvements that could be made to the replayer to be more aggressive about making optimistic assumptions about NMI latency and retrying if those assumptions turn out to be off, but it'd be a fair bit of work. I don't really understand why AMD decided to use this kind of architecture. It also makes profiles much less accurate.
Thanks for the info. I was wondering what was going on in that script. It’s unfortunate that their architectural decisions had to impact rr, but I guess these days, every last bit of benchmark score really matters.
For those who have used it, how useful is it for debugging multithreading heisenbugs? Can I let a process run under rr for days, wait until it crashes due to a heisenbug, and replay the trace without rr having to go through days of recording? i.e. is it possible to fast-forward the trace somehow?
(I nerd sniped myself a bit here, wondering how fast forwarding could be implemented. I think it might be achievable with periodic process memory snapshots and incremental traces.)
You probably could record a process running for days but it would also take days to replay to the end, which would not be much fun. We don't create checkpoints during the recording.
I've found that the scheduling quanta with chaos mode are too high to hit concurrency issues in a reasonable amount of time. And IIUC --num-cpu-ticks is not randomized. So if something happens below that tick quantum it's hard to hit.
I wonder if a) rr could randomize the cpu ticks as well, at least in chaos mode, b) profiled code could somehow hint to rr that a certain instruction would be an "interesting" scheduling point.
Chaos mode varies the scale of the tick quantum to try to catch stuff like that. It doesn't always work, especially if the window of vulnerability to the bug is incredibly small (e.g. a few instructions).
I could not reproduce the bug in less than an hour of run time, which meant that analysing the bug in gdb required an hour for it to run forward to the crash point, after which it was possible to skip back and forth.
rr numbers each 'event' it records, and you can pass an event number to the gdb 'run' command to tell it to start from that event. Recent rr also supports the -e option to replay, meaning 'start the debug session pointing at the last recorded event, whatever that was'. Details in the usage page: https://github.com/rr-debugger/rr/wiki/Usage
AIUI you get 'start at an event' basically for free, because 'step backward' is implemented as 'start at the preceding event and then step forward by N', so events are frequent in the trace and the machinery to get to that point without running all the way from the start of the debug session exists anyway. There's some stuff on the website about how this is all implemented, I think.
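A rough sketch of how those two options fit together in practice (program name and event numbers are made up, and the exact output may differ; see the Usage page above for the real details):

    $ rr record ./myprog            # record once; the trace is saved for later replays
    $ rr replay -e                  # open a gdb session positioned at the last recorded event
    (rr) when                       # rr's gdb extension that prints the current event number
    (rr) run 31200                  # restart the replay from (hypothetical) event 31200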
I haven't used it, but it might be quite useful. It does force code to run on a single core, so you won't get truly concurrent execution which I guess might hide some multithreading bugs. On the other hand it does come with "chaos mode" which is basically thread schedule fuzzing.
You can "fast forward" the trace as you imagine. rr works by recording all non-deterministic input and output to the program so it can start from the core dump and step backwards.
As I understand it anyway; I've never actually used it - the one time I really wanted something like rr was on a Mac.
> You can "fast forward" the trace as you imagine. rr works by recording all non-deterministic input and output to the program so it can start from the core dump and step backwards.
Not exactly. rr can't magically inflate a core dump into all the open file descriptors and other state accumulated during a process's execution. It needs to run from the beginning.
So starting from the beginning, you can let it run to any arbitrary point. (And there are ways of knowing useful points, eg if you record with -M it will print out event counts with anything written to stdout/stderr, so you can quickly run with -g to start debugging at the point that message was emitted.) But it does still need to run from the beginning. And since you're recording a whole process tree, you need to start from the initial process and let it run forward to your requested point in time.
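A sketch of that -M/-g workflow (the program, its output, and the event numbers are made up, and the exact annotation format may not be quite what I show here):

    $ rr record -M ./server
    [rr 12345 67890] listening on port 8080       # -M tags output lines with pid and event number
    [rr 12345 70211] request failed: bad state
    $ rr replay -g 70211                          # start the debug session at that event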
In practice, I usually use it by starting a replay, continuing forward to a crash (or a breakpoint at some line if it didn't crash), and only then starting to pay attention to what's going on. It's a simple, muscle-memory process to get to that point, and if it was a long recording you kind of start it up and wait until it's ready. (Which will take roughly as long as the initial run took to get to the same point. A little slower because of the overhead, a little faster because it doesn't actually have to wait for I/O, averaging out to a mostly unnoticeable amount slower.)
I always have to mention: one of my favorite things about rr is something that doesn't even require all the sophisticated machinery. I often want to debug a single process within a whole process tree, and with most things there aren't --debugger flags (or they're broken). With rr, I can just record the whole tree, then pick out the process I care about after the fact. It's a small thing, but it saves me from my usual hairball of wrapper scripts.
Random example: when debugging a gcc plugin, I record a call to gcc, but the actual compile I care about is done by a forked cc1plus process.
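In concrete terms it looks something like this; the pids and output are made up, and I'm writing the replay flag from memory, so double-check rr replay --help for the exact option that attaches to a particular recorded process:

    $ rr record gcc -fplugin=./myplugin.so test.cpp
    $ rr ps                                   # list every process in the recorded tree
    PID     PPID    EXIT    CMD
    12345   --      0       gcc -fplugin=./myplugin.so test.cpp
    12346   12345   0       cc1plus ...
    $ rr replay -p cc1plus                    # debug the forked cc1plus rather than the gcc driver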
But rr does squash all your threads into a single virtual cpu core. It context-switches between them, but ultimately only one of them is running at a time. This makes it hard to capture some kinds of bugs. To compensate it also has a chaos mode that randomly stops switching between the threads fairly (starving some and giving others more than their fair share) in the hopes of triggering those same bugs.
For most uses rr is a major win, but for race conditions it sometimes doesn’t help.
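When I do need to chase one of those races, the usual trick is just to re-record under chaos mode until the failure shows up; the recording that finally fails is then fully deterministic to replay. A minimal sketch, assuming the test signals failure via its exit status:

    # keep re-recording under chaos mode until the test fails
    # (assumes rr record forwards the tracee's exit status, which I believe it does)
    while rr record --chaos ./mytest; do :; done
    rr replay        # replay the most recent recording, i.e. the failing one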
And a lot of multi-threaded software that previously worked fine on single-core machines required urgent fixes as multi-core CPUs became more common.
rr is a life changing experience for debugging things. One underrated thing is being able to save and share rr traces. rr + CI makes finding and potentially fixing heisenbugs a lot easier.
This kind of replayable debugging can be wonderful - especially for hard to debug issues like heap corruption and such.
Windows has something similar called Time Travel Debugging[1] but in my experience the dump files it creates can be enormous and be a pain to analyze as a result. (It also relies on WinDbg which while being extremely powerful and capable, has a huge learning and usability cliff. I’ve been using it for over a decade and I still need a cheat sheet from time to time. The revamped WinDbg Preview[2] improves the UI a lot, but ultimately it’s still WinDbg.)
As jwilk correctly says, reposts are fine after a year or so. Pointing to previous links with comments is just to satisfy users who might be curious for more. You did good!
No, that has been done but is much slower. It records all communication with the external world. The full answer is well described in https://arxiv.org/pdf/1705.05937.pdf which I'll quote here:
> We identify a boundary around state and computation, record all sources of nondeterminism within the boundary and all inputs crossing into the boundary, and reexecute the computation within the boundary by replaying the nondeterminism and inputs. If all inputs and nondeterminism have truly been captured, the state and computation within the boundary during replay will match that during recording.
So for any chunk of time spent entirely in user space doing computation, the replay starts out in the same situation and executes in exactly the same way the original process did, with zero overhead. That's what enables rr to be so low overhead overall; most programs spend the bulk of their time computing stuff and reading/writing memory. The replayed process has no way of knowing that its file descriptors aren't actually open, since anything it does with them will be provided by the recording. Quoting again:
> In particular, user-space memory and register values are preserved exactly, with a few exceptions noted later in the paper. This implies CPU-level control flow is identical between recording and replay, as is memory layout.
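To make that concrete with a toy example of my own (not from the paper): in something like the C program below, only the marked inputs cross the boundary and need to be saved in the trace; the hashing loop is replayed simply by running it again, and given the same inputs it necessarily does the same thing.

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void) {
        char buf[64];

        /* Crosses the boundary: the bytes that read() returns are an input,
           so rr records them and feeds the same bytes back during replay. */
        ssize_t n = read(0, buf, sizeof buf);

        /* Crosses the boundary: time() is nondeterministic, so its result
           is recorded and replayed verbatim. */
        time_t t = time(NULL);

        /* Pure user-space computation: nothing is recorded here. On replay
           these instructions just execute again and, with identical inputs
           and memory, produce identical results at essentially no overhead. */
        unsigned long h = 0;
        for (ssize_t i = 0; i < n; i++)
            h = h * 31u + (unsigned char)buf[i];

        printf("%lu %ld\n", h, (long)t);
        return 0;
    }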
If you use https://pernos.co/ then you don't need any of this, but I have a set of only slightly buggy gdbinit scripts that extend the rr debugging experience at <https://github.com/hotsphink/sfink-tools>. The main things it adds are:
1. a `log` command that just records whatever you give it into a plaintext file, together with its "point in time" according to rr. This is useful because when using rr, you tend to move forward and backward in time a lot, and it's hard to keep track of the actual sequence of events and where you are within them. It also creates a checkpoint so you can return to any one of your log points. It also has some niceties like replacing any expression enclosed in curly brackets with the results of executing the gdb expression given, so you can do things like
log starting execution of Init() with v={v}. About to crash.
2. a `label` command that lets you assign names to random hex values. Then in the output of `p expr` or the above `log` with no arguments, which displays the full set of log messages you've recorded, it will replace known hex values with their labels. This is so much nicer than memorizing numeric values and matching them up.
(rr) p obj
$1 = (JSObject*) 0x7f606892a200
(rr) label OUTER_OBJ=obj
(rr) p $OUTER_OBJ
(JSObject*) $OUTER_OBJ
(rr) log
701/31299795 [c4] starting processing with obj=(JSObject*) $OUTER_OBJ
983/31299 [c2] starting processing with obj=(JSObject*) 0x7f6068a2a200
2081/7382911 [c3] traversing to (JSObject*) 0x7f6069c2a7e8
3316/199 [c1] crashing while accessing field of object (JSObject*) $OUTER_OBJ
The [c2] markers are the automatically-created checkpoints, numbered in the order that you made those log entries in the debugger. It reorders the log messages to show them in execution order rather than debugging order. Pernosco has a very similar feature called the Notebook (where you only have to click on a log entry to view the state at that point in time).
The scripts are also intended for sharing log files and labels between multiple concurrent replays of the same execution, which I find useful for keeping separate windows that each maintain a different context (point in time, and the portion of the execution I'm examining). That tends to be the buggier part of the scripts, though. ;-)
If you're working with C or C++ (or Rust? haven't tried it), rr really is a superpower. I rarely bother using straight gdb anymore. It feels crippled.
It works with optimized builds, and it works better with them than gdb does.
When you debug an optimized build with debug info in gdb by stepping line by line, it is easy to accidentally step "too far" and completely lose your place. In rr, you can always step back and recover.
You might be relegated to stepping through disassembled machine code. I was able to use rr with a homemade JIT compiler, stepping through JIT’d instructions. So I see no reason why you can’t at least get that experience with a production binary.
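For reference, the instruction-level experience is just the ordinary gdb commands plus their reverse counterparts, e.g.:

    (rr) x/5i $pc            # disassemble a few instructions at the current position
    (rr) stepi               # forward one machine instruction
    (rr) reverse-stepi       # back one machine instruction
    (rr) reverse-continue    # run backwards to the previous breakpoint or watchpoint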
For those reading the above comments who haven't clicked the article link yet (like me):
>rr aspires to be your primary C/C++ debugging tool for Linux, replacing — well, enhancing — gdb. You record a failure once, then debug the recording, deterministically, as many times as you want. The same execution is replayed every time.
>rr also provides efficient reverse execution under gdb. Set breakpoints and data watchpoints and quickly reverse-execute to where they were hit.
* gdb: >1000x (note: I never tested this one myself; just heard about this overhead in an HN comment a long time ago)
* Microsoft WinDbg TimeTravel Debugger: >40x
* rr: 1.5x
rr is the only one fast enough to be used on a regular basis -- the others are slow enough that they only make sense on particularly nasty bugs (usually memory corruption)
I would be surprised if the overhead of TTD is typically 40x, given that it records multithreaded processes in parallel. Which, to my knowledge, rr does not.
It also supports selective recording so, if this is configured (e.g. selecting certain functions), only a subset of the process execution is actually committed to the trace file, further reducing the overhead.
I don't know about multithreaded processes -- the program I use it on is single-threaded.
My main use case is not making the crashes reproducible (they usually already are), but understanding where a bogus value is coming from (memory breakpoint + run backwards; a sketch follows below). The culprit often ends up being a certain third-party library that makes liberal use of C unions, sometimes accessing the wrong variant...
I'll have to look into selective recording, but I'm not sure how helpful it'll be in my use case (I don't know said library well enough to predict which functions might be causing the bogus values)
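For anyone who hasn't tried that memory-breakpoint-plus-reverse workflow, it's roughly this (the expression is a placeholder):

    (rr) continue                    # run forward to the crash
    (rr) print obj->kind             # hypothetical field holding the bogus value
    (rr) watch -l obj->kind          # hardware watchpoint on that memory location
    (rr) reverse-continue            # run backwards to the write that clobbered it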
There is also a commercial alternative, https://undo.io/solutions/products/udb/ though I have not used it myself and I don’t know what its overhead is. (I know some of the people who work on it.)
Using GDB’s reverse execution requires you to already be pretty sure where the bug you are looking for is, and then to record only a very short portion of the program, preferably just 10k instructions or so. Recording for a whole second could easily take 15 minutes. But it does work well, within those limitations.
gdb claims it. I have not once ever gotten it to work, however. For anything seemingly larger than a trivial program, the reverse-execution state grows too big and needs to be pruned. I also don't think it supports such fancy things as "floating point".
Yea, I wouldn’t call it a replacement. It acts as a GDB debugging target; basically you connect a GDB process to rr and GDB controls rr for you. (To confuse things further, the “rr replay” command starts both rr and GDB for you, so it can be difficult to see the seams.)