Segfaulting Python with afl-fuzz

__s · on March 7, 2016

I've been segfaulting CPython quite a bit with stack underflows while developing a befunge-to-python-bytecode JIT that uses the Python stack as the befunge stack. It has to include instrumentation to track the stack depth so that it substitutes 0 when the user pops a value on an empty stack. The latest issue was this weekend: I was reducing the bytecode size of `p` by converting the while loop that moves the stack into an array for recompilation to use FOR_ITER, and FOR_ITER didn't like being called on non-iterables.

https://github.com/serprex/Befunge/blob/master/funge.py
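As a rough illustration of the depth tracking described above (a minimal sketch only, not the code in funge.py; `compile_straight_line` and its op subset are hypothetical), a compiler can model the stack while translating straight-line code and emit a literal 0 wherever a pop would hit an empty stack:

```python
def compile_straight_line(ops):
    """Translate digits, '+', '*' and '$' (discard) into Python expression strings.

    `stack` is a compile-time model of the runtime stack, so the depth is
    known while compiling and no runtime check is needed.
    """
    stack = []

    def pop():
        # Befunge semantics: popping an empty stack yields 0.
        return stack.pop() if stack else "0"

    for op in ops:
        if op.isdigit():
            stack.append(op)
        elif op in "+*":
            right, left = pop(), pop()
            stack.append(f"({left} {op} {right})")
        elif op == "$":
            pop()
        else:
            raise ValueError(f"unsupported op: {op!r}")
    return stack


print(compile_straight_line("3+"))      # ['(0 + 3)']
print(compile_straight_line("12+4*"))   # ['((1 + 2) * 4)']
```

This only covers straight-line code. Once control flow is involved the depth is not always known statically, which is presumably where the runtime instrumentation comes in.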

It'd be neat to see how PyPy handles fuzzing. It uses CPython's bytecode, and I was able to get it to run beer6.bf (pretty slowly, since that's a benchmark that mostly tests recompile speed), but it locked up when testing mandel.bf (odd, since mandel.bf doesn't trigger recompilation).

mateo411 · on March 7, 2016

Here is a quick edit that you should make.

Where you write:

> In laments terms

You probably want to write:

In layman's terms

"In layman's terms" is an idiomatic way of saying "simply put", or of explaining something to somebody who might not be technically inclined.

To lament is to feel upset about something; it often refers to the grief one feels when a loved one has died.

Overall, this was an interesting read, and I'm looking forward to your next installment.

lmm · on March 7, 2016

So... what are the crashes? What was the goal of all this? I feel like the article ended just as it was about to get interesting.

JoachimSchipper · on March 7, 2016

I think the article is a tutorial. It doesn't appear to present a new result.

It's moderately well known that .pyc files execute arbitrary code (not just arbitrary Python code) inside the Python process. (See e.g. https://docs.python.org/2/library/marshal.html#module-marsha..., "Warning: The marshal module [i.e. loading .pyc files] is not intended to be secure against erroneous or maliciously constructed data. Never unmarshal data received from an untrusted or unauthenticated source.") It is nice to see that afl can rediscover this issue, but I'm pretty sure it's not new.
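For anyone unfamiliar with the mechanism, here is a small sketch of the round trip, using only the standard `marshal` module. The source string and the `<example>` filename are made up for illustration. The point is that `marshal.loads` hands back a code object that `exec` will run as-is, without the interpreter re-validating the bytecode first:

```python
import marshal

# Compile some source to a code object, as the import system does
# before writing a .pyc file.
source = "result = 6 * 7"
code = compile(source, "<example>", "exec")

# Serialize and deserialize the code object; a .pyc is essentially
# a small header followed by bytes like these.
blob = marshal.dumps(code)
loaded = marshal.loads(blob)

# The interpreter runs whatever code object it is handed.
namespace = {}
exec(loaded, namespace)
print(namespace["result"])  # 42
```

The bytes here come from the compiler, so they are well formed. A fuzzer mutates that blob instead, and nothing between `marshal.loads` and the eval loop is designed to reject the result.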

orf · on March 7, 2016

Oh my, I finished this quite late last night and forgot to add a conclusion. I thought the post was getting a bit lengthy, and the next step is to use gdb to dive into the crashes + make a patch, which I feel is too much for one post.

So tune in next week :)

stevekemp · on March 7, 2016

I started something similar recently, but figured a simpler target would be more usable.

Bug reports here, along with possible patches, for GNU Awk:

* https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=816277

* https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=816271

Writeup was pretty simple:

* https://blog.steve.org.uk/If_line_noise_is_a_program__all_fu...

masklinn · on March 7, 2016

Would the core team really be interested in that? The bytecode interpreter relies on implicit invariants from the codegen; re-checking these invariants on the bytecode means slowing down the interpreter for very little value.
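To make the trade-off concrete, here is a toy sketch on a made-up three-opcode stack machine (not CPython's opcode set; `EFFECTS`, `verify` and `run` are hypothetical names). The interpreter loop does no checks because it trusts its input, and the invariant "never pop more than was pushed" is enforced by a separate pass that would have to run over every piece of bytecode:

```python
# Toy stack machine (hypothetical; not CPython's real opcode set).
# Each opcode maps to (values popped, values pushed).
EFFECTS = {"PUSH": (0, 1), "ADD": (2, 1), "POP": (1, 0)}


def verify(program):
    """Return True if no instruction can pop more values than are available."""
    depth = 0
    for op, *_ in program:
        pops, pushes = EFFECTS[op]
        if depth < pops:
            return False  # would underflow at run time
        depth += pushes - pops
    return True


def run(program):
    """Execute without any checks, trusting that the program is well formed."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "POP":
            stack.pop()
    return stack


good = [("PUSH", 2), ("PUSH", 3), ("ADD",)]
bad = [("PUSH", 2), ("ADD",)]

print(verify(good), run(good))  # True [5]
print(verify(bad))              # False
```

This handles straight-line code only. With jumps, a verifier has to check that every path reaching an instruction agrees on the stack depth, which is a good part of why doing it for real is not free.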

orf · on March 7, 2016

Nope, they wouldn't; making the patch is just for completeness' sake, I guess. Also, the afl tool to narrow down segfaults seems to always result in the same fault, so maybe if I patch it, it will narrow down some more interesting ones.

electrum · on March 8, 2016

That's interesting, because bytecode verification is extremely well-defined for the JVM: https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.ht...

lmm · on March 12, 2016

The JVM is explicitly designed to run untrusted bytecode.

el8squad · on March 7, 2016

[flagged]

orf · on March 7, 2016

It's more of a tutorial than bragging about crashing something with no safety checks....