
Layman warning: I never really understood (as an outsider who does not code) why code can be unsafe. Is code more like art (painting) than math (writing an equation which balances itself)?


Ever seen a 3D printer in operation?

3D printers use a "language" called gcode. It's not really programming; it's a series of commands that tell the 3D printer nozzle to move to a certain location at a certain speed while extruding at a certain rate. There are a lot of ways you can mess that up: you can tell the nozzle to go as low as it can and just start extruding, giving you a big blob on the bottom of your 3D printer. You can tell it to move to a position that it physically can't reach, outside of the bounding box it can print in. Most 3D printers don't have endstops to prevent you from going too high on an axis, so they'll try to do it and tear themselves apart. You can try to extrude while your extruder isn't up to temperature and grind down your filament. You can physically jam your extruder into whatever it is you're printing. There are all kinds of things that 3D printers are physically capable of doing that are unsafe.

A computer is just a machine like a 3D printer. You're "physically" moving bytes of data around (that's where most of the heat comes from), doing operations on them, etc. Nowadays you generally can't get computers to destroy themselves, but earlier in their history you absolutely could tell the machines to tear themselves apart, just as you can tell a 3D printer to tear itself apart.

Computers are just machines for moving bytes around, and it's really hard to make a machine that you can only do safe stuff with.


> A computer is just a machine like a 3D printer. You're "physically" moving bytes of data around (that's where most of the heat comes from), doing operations on them, etc. Nowadays you generally can't get computers to destroy themselves, but earlier in their history you absolutely could tell the machines to tear themselves apart, just as you can tell a 3D printer to tear itself apart.

This is wrong. It's not what 'unsafe' means for C or C++. You could have 100% safe hardware for your code to run on, and your code could very well still be unsafe. The phrase you're looking for is Undefined Behavior.
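
For example, here is a minimal C sketch (the values are just illustrative) that runs on perfectly healthy hardware yet is unsafe, because the language makes no promise about what it does:

    #include <stdio.h>

    int main(void) {
        int values[4] = {1, 2, 3, 4};
        /* Reading one element past the end of the array is undefined
           behavior: the program may print garbage, crash, or appear to
           "work" -- the hardware itself never malfunctions. */
        printf("%d\n", values[4]);
        return 0;
    }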


I think that "you can also move a byte into the wrong place and have that change your program" was implicit, but sure.


Code can be unsafe because it is physically impossible to test every possible input to a program. This is where various engineering designs come in, which reduce the area that needs testing based on some underlying theory.

Also, even in math, there are plenty of mistakes in publications (not just typos, but reasoning errors) which hopefully do not affect the eventual results in any fundamental way. The equivalent of safe code in computer science would be completely formal proofs in mathematics (as in Coq and similar languages), but it is probably much more difficult due to the existence of temporal conditions.


Err, there are other ways to ensure (memory) safety than exhaustive testing, such as stronger type systems and static analysis (Rust) or runtime checks (any garbage-collected language).


Memory safety is not the only type of safety though. There are race conditions for example.


There is a fundamental difference between math and code. In a typical modern-day computer, the instructions and the data to be manipulated are put in the same place [1]. This is a critical feature that makes things like downloading a program and running it possible. But for programmers to keep their code "safe", they must enforce artificial boundaries between the instructions (code) and the data. Hackers are experts at crossing those boundaries and tricking computers into treating data as code when they shouldn't.
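
Here is a deliberately contrived C sketch (not a working exploit; the struct and names are made up for illustration) of how sloppy handling of data can end up redirecting which code runs:

    #include <stdio.h>
    #include <string.h>

    void greet(void) { puts("hello"); }

    struct handler {
        char name[8];            /* plain data                     */
        void (*callback)(void);  /* an address of code to execute  */
    };

    int main(void) {
        struct handler h = { "ok", greet };

        /* Copying an over-long, attacker-controlled string into the
           8-byte buffer overflows into the adjacent function pointer.
           This is undefined behavior; a real attacker would arrange
           for the overflowing bytes to be the address of code of
           their choosing, so their "data" is what gets executed. */
        const char *input = "AAAAAAAAAAAAAAAA";
        memcpy(h.name, input, strlen(input));  /* unsafe: no bounds check */
        h.callback();                          /* likely crashes here     */
        return 0;
    }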

Mathematicians have the good sense to keep their data and instructions separate. [2]

[1] https://en.wikipedia.org/wiki/Stored-program_computer [2] https://en.wikipedia.org/wiki/Field_(mathematics)


Compare it with tax law: lawyers, legislators, and many other people dedicate enormous amounts of time to creating tax laws, and still tax evaders (hackers) find loopholes to exploit. Those loopholes are the law's bugs.

One could argue that those loopholes are left intentionally (back doors), but even if all actors were honest, bugs would still happen from time to time.

Computer code has even more bugs because it is produced massively and quickly, without the bureaucracy of tax code. Anyone can create an awesome application/library/framework/etc., and share it freely with the world. People end up using these projects as stepping stones for their own projects, creating a complex layered cake where bugs can hide for years.


I don't know about the art vs math question but, taking the math example, your code can be unsafe in the same way your maths can be wrong (i.e. maybe you start with a bad premise or your derivation is invalid). More generally I'd describe both of these situations as 'unsound' and actually they manifest themselves in the same way in both disciplines (an oversight, incorrect model, complexity, etc).

You might think that if you do maths on the computer, maybe it can help you keep things valid as you execute your derivations, and something similar can be done for coding. This is true: in maths/logic they have theorem provers, and in coding we have static typing. Again, they literally manifest themselves in the same way in both disciplines due to the Curry-Howard correspondence [1].

You can also argue that in conjunction with static typing there are also linters, etc, but I anchor specifically to static typing in this example because of how directly it relates to your math comparison.

[1]: https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspon...

EDIT: spelling


Ever seen one of those prank videos where someone is in the shower rinsing shampoo off their head, and the prankster leans over the shower wall and squirts a bit more shampoo onto their head, and the prankee gets more confused and annoyed when they keep rinsing "endless" amounts of shampoo that should be done by now?

Buffer overflows and "unsafe" code are like that - the showering person isn't painting or equating, they're expecting an end condition ("when the water coming off my head stops having soapy lather and runs clear") which works every time, but it is not a "safe" pattern - it assumes no malicious intervention. Someone else can change the surrounding circumstances so that the end condition doesn't happen when it should, and "cause" the rinse routine to keep running for longer and longer.

Buffer overflow attacks are like this: the code expects to read some data and stop when it reaches an end condition; when that code is badly designed, an attacker can change something to delay the end condition and cause more data to be read. Inside a computer there are no such things as "separate applications" or "security boundaries" or "program data" or "OS instructions", except that the patterns of numbers are supposed to be interpreted as those things. If a program can write "program data" but cannot give the OS instructions, maybe it can drop some more shampoo on the OS's head and cause the OS to keep reading more and more "OS instructions", only now it's reading past the normally expected end and into the "program data" location, and the same numbers which were once "safe program data" become "OS instructions" to be executed by the OS using its OS permissions, which the program had no original right to do. It breaks the imaginary security boundary by exploiting the assumptions baked into some other code that is running.
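
A minimal C sketch of the "rinse until it runs clear" bug (the variable names are invented for illustration):

    #include <stdio.h>

    int main(void) {
        char name[16];       /* room for 15 characters plus a terminator */
        int  is_admin = 0;   /* unrelated state that may happen to sit
                                next to the buffer (layout is up to the
                                compiler) */

        /* "Rinse until the lather stops": copy characters until we see a
           newline.  Nothing checks that the input actually fits in 16
           bytes, so a long enough line keeps writing past the buffer --
           undefined behavior that can clobber is_admin or, in classic
           attacks, the saved return address. */
        int c, i = 0;
        while ((c = getchar()) != EOF && c != '\n')
            name[i++] = c;   /* unsafe: no bounds check on i */
        name[i] = '\0';

        if (is_admin)
            puts("welcome, admin");
        return 0;
    }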


I think it's more like laws (of man, not nature) than art or math. There are complex rules that define how a computer, and a programming language on that computer, function. Even if we assume these rules are perfectly defined and not bug-ridden, the programmer still needs to understand them and write contracts (code) with no logic errors, which can otherwise lead to any number of disastrous scenarios such as data loss, data corruption, or data leaks, among others.

Take the immense complexity of the computer and the programming language, compound it further with the requirements of the business problem being solved, throw in a tight deadline, and you have a recipe that leads to the vast majority of code being buggy.


Software is more like building a machine than either math or art. There've been attempts to make formally-provable programs (so it _is_ like math) but these are not in widespread use.

Go watch the Lockpicking Lawyer on YouTube pick locks and trivially crack/open every lock ever made. This is, roughly, the best physical analog to what happens with computer programs and safety. The creators are trying, but they have to be correct everywhere, from every angle, and the attacker only needs to find one weakness to break it.


Programming is a craft. In the same way the Pentagon and the White House have structural protections against certain kinds of attacks, programs have certain kinds of defences against certain kinds of attacks.

Defences are necessary whenever the program interacts with users or external inputs of any form. These can be inputs in the form of text or files, or even another program interfacing with it, e.g. malicious code that invokes system code in a specific manner to cause certain side effects.
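
As a small sketch of what such a defence can look like in C (the function name and buffer sizes are invented for illustration), the idea is simply to validate the external input before letting it touch your memory:

    #include <stdio.h>
    #include <string.h>

    /* Defensive version of "copy what the user sent us": check the
       length first instead of trusting the input to fit. */
    int store_username(char *dst, size_t dst_size, const char *input) {
        size_t len = strlen(input);
        if (len >= dst_size)        /* reject anything that does not fit */
            return -1;
        memcpy(dst, input, len + 1);
        return 0;
    }

    int main(void) {
        char username[16];
        if (store_username(username, sizeof username,
                           "much_too_long_to_fit_in_this_buffer") != 0)
            puts("input rejected");
        return 0;
    }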


In mathematics, you need to assert a set of axioms (or preconditions) under which the theorem is held to be true. These axioms can be challenging to figure out; naïve set theory was destroyed by Russell's paradox. Rather famously, the axiom of choice is equivalent (in the sense that assuming one, one can prove the other) to the well-ordering theorem, and yet one is "obviously" true and the other is "obviously" false.

Euclid proved a lot of statements in geometry using several axioms, but the last one was clunky and seemed like something that ought to be a theorem instead: it held that, given a line and a point not on that line, there is exactly one line through that point parallel to the first line. Eventually, though, it was found that there are reasonable interpretations of geometry where that axiom is not true, whence spherical geometry (parallel lines do not exist) and hyperbolic geometry (many lines can pass through that point and remain parallel).

Another example is in physics: the Crystallographic Restriction Theorem mathematically restricts the kinds of forms that crystals can take. And yet, in the 1980s, several crystals were demonstrated which had five-fold symmetry, which is forbidden by that theorem. The issue is that the theorem presupposes that crystals need to be symmetric under translations, but there exist aperiodic tilings that have rotational symmetry without translational symmetry--and these can have five-fold symmetry. (We now call these materials quasicrystals.)

In CS, "unsafe code" amounts to code where programmers did not assert all of the possible preconditions to their code. In contrast to much of mathematics, failing to assert all of the preconditions for safety is remarkably easy in some languages, chiefly C/C++.
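
A tiny C sketch of what an unasserted precondition looks like (the helper is hypothetical):

    #include <assert.h>
    #include <stddef.h>

    /* The unstated precondition is i < len.  C will not check it for
       you; a call like element_at(v, 10, 10) is undefined behavior,
       much like applying a theorem outside the axioms it assumed. */
    double element_at(const double *v, size_t len, size_t i) {
        assert(i < len);  /* make the precondition explicit, at least in debug builds */
        return v[i];
    }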


When programmers call code unsafe, what they mean is that the code can be unpredictable, and unpredictability leads to unintentional behaviors, which are generally bad since good behaviors are intentional. Code can be unpredictable because it's not written in a vacuum. Not only does code exist within the context of other code, it exists within the context of a compiler or interpreter, possibly a runtime, an operating system and firmware, a CPU architecture and memory and disk and networking and, of course, user input. All of these things make for an incredibly complex system, and incredibly complex systems produce emergent behaviors. So it is very difficult to work within such a system and add behaviors to it without creating unintended consequences. This is especially true the more complex the interactions of the components of the system, and C/C++ allow for very complex interactions. There's more to it than that of course, but I think that's what underlies most of it.


Imagine writing code for an elevator. If there is a glitch in your code such that when the date changes from 1999 to 2000 it releases all the ropes... that'd cause a bunch of people to die. Something like this is exceptionally unlikely, but if you're writing code for a real-life device you should always, always think about its implications.

Read this: https://en.wikipedia.org/wiki/Therac-25

This radiation therapy machine had a software bug which caused six people to be given massive radiation overdoses.


Imagine trying to assign a unique number to every bit of data your program uses, including stuff like text, pictures, etc., such that some text that uses 100 bytes gets 100 numbers, a picture with 1,000,000 bytes gets 1,000,000 numbers, and so on.

You can just say "Start at number 0 and create a new number for each bit of data", but then maybe that JPEG your program uses occupies the same set of numbers as the text you're writing to. So you need to make sure it's all unique, and that each logical thing you're storing gets its own unique set of numbers. Easy enough, except data changes as your program runs, so every now and then you need to say "ok there's not enough space to store this thing, so I'm going to assign it a new number so that it doesn't conflict with this other data I have."

That works well enough, except what if parts of your software do stuff like "write value X to number 103820"? Will that do what you want? Maybe that code is responsible for updating some text somewhere, but what if that text grew too big and moved somewhere else? How do you know if the number it's writing to is actually the right text?

What's way worse, is that some of these numbers are used by the processor for bookkeeping on things like "what was the last bit of code I was executing before I ran this code?" and if you overwrite those numbers, you can cause the processor to do evil things.

That's memory safety. It's the idea that, if you just let code write to arbitrary locations in memory, it's very very difficult to do this safely. The answer ends up being to have languages that simply don't let you do that, and that's a big step towards having safe code. "Safe" languages instead only let you do things like "append to this data", which will automatically move the data to another address if it's too big. But they won't let you just write to arbitrary addresses. Even "Safer" languages ensure that one thread can't be in the middle of moving some data to a new address while another thread is trying to write to it, etc etc.
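
To make the "data grew and moved somewhere else" problem from a couple of paragraphs up concrete, here is a small C sketch (error handling omitted for brevity):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        /* A growable piece of text, like the example above. */
        char *text = malloc(8);
        strcpy(text, "hi");
        char *first = text;        /* remember the "number" of the first byte */

        /* The text grows, so the allocator is free to move it elsewhere. */
        text = realloc(text, 4096);

        /* Writing through the old "number" is using a dangling pointer:
           if the data moved, we are scribbling on memory that may now
           belong to something else entirely.  Undefined behavior. */
        *first = 'H';

        printf("%s\n", text);
        free(text);
        return 0;
    }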

So to your question, it's very much like painting in that regard. If you start in one corner of the canvas and draw something way too big without leaving yourself enough room, you'll paint over parts of the painting you wanted to keep. Since programs are super dynamic, the problem of making sure everything has enough space to be represented in a real computer ends up being kinda hard, and the way older languages are designed can sometimes make it nearly impossible.


"unsafety" is a very overloaded term. In this context, one specific technical meaning is assumed: _memory safety_ [0] (not type safety, not safety from hacking, etc, although they do depend on memory safety).

Programming languages are tools for building abstractions and concrete implementations of abstractions. They are very rarely verified and described mathematically and exhaustively; it is possible to state some properties of some programs, but it is mathematically impossible to decide any non-trivial property of an arbitrary program [1].

However, it is possible to constrain the abstractions used in a way that upholds some properties (to some degree). Memory safety of a language means that a program in that language can change a variable/state if and only if it "touches" it semantically (e.g. via a valid pointer/reference). A memory-safe language creates a reliable "abstraction cage" for (almost) any program written in it that guarantees (though not necessarily mathematically) that unrelated variables cannot be changed. "Glitches in the Matrix" (changing one variable magically changes a random other one) are still possible, but very rare in practice. Examples: Java/Python (which incur significant inefficiency when executing a program), and more recently (the safe part of) Rust, which often comes very close to C/C++ in efficiency while retaining memory safety in most of its code.

C/C++ are examples of memory-unsafe languages: their memory abstractions are not even close to an "abstraction cage"/"Matrix"; they are just thin "guardrails" and guides, not enforceable rules, and it is easy to read/corrupt an unrelated variable in a program (sometimes even via a malicious input to the program). This design choice was semi-deliberate: C/C++ solve the task of programming existing computer hardware efficiently, and nobody knew how to create a practical, efficient and memory-safe systems programming language even twenty years ago. It is possible for a coder to code "defensively", using empirical best practices and tools to reduce the possibility of using program memory incorrectly. C++ has a subset and tooling that come tantalizingly close to memory safety, but it is still a costly uphill battle, and even the best C/C++ coders/organizations fail to avoid memory misuse.
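
A small C illustration of how thin those guardrails are (the struct and values are invented; the exact outcome depends on the compiler and memory layout):

    #include <stdio.h>

    struct accounts {
        int balance[4];   /* four customers' balances            */
        int audit_flag;   /* an unrelated variable stored nearby */
    };

    int main(void) {
        struct accounts a = { {10, 20, 30, 40}, 1 };

        /* A classic off-by-one bug: index 4 is one past the end of the
           array.  C performs the write anyway (undefined behavior), and
           on typical layouts the "unrelated" audit_flag is silently
           overwritten -- a guardrail, not a cage. */
        for (int i = 0; i <= 4; i++)
            a.balance[i] = 0;

        printf("audit_flag = %d\n", a.audit_flag);  /* often prints 0, not 1 */
        return 0;
    }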

[0]: https://en.wikipedia.org/wiki/Memory_safety [1]: https://en.wikipedia.org/wiki/Rice%27s_theorem


Simplest example I can think of:

Your maths function takes a variable and divides by that variable. What happens if that variable is set to zero?
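
In C, for instance, nothing in the language stops the caller from passing zero, and integer division by zero is undefined behavior (often a crash, but there is no guarantee). A minimal sketch, with a hypothetical guarded variant:

    #include <stdio.h>

    int divide(int total, int count) {
        return total / count;             /* undefined behavior if count == 0 */
    }

    int safe_divide(int total, int count, int fallback) {
        return count != 0 ? total / count : fallback;  /* guard the precondition */
    }

    int main(void) {
        printf("%d\n", safe_divide(10, 0, -1));  /* prints -1 instead of misbehaving */
        return 0;
    }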





