Very highly recommended.
That comparison sold me. You deserve a commission.
Be warned: the O'Reilly print quality is miles apart between the two. Their print-on-demand text quality and binding are real let-downs.
Are there any other suggestions?
This is not true. Cassandra doesn't support that because keys are hashed across the cluster -- you'd have to query every shard and merge the results. That has nothing to do with the LSM storage.
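To illustrate: with hash partitioning, adjacent keys scatter across nodes, so an ordered/range query degenerates into scatter-gather no matter what the storage engine does. A toy sketch (hypothetical node names; simple modulo in place of a real token ring; MD5 as in Cassandra's old RandomPartitioner, where modern clusters use Murmur3):

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical 3-node cluster

def owner(key: str) -> str:
    # Hash partitioning: the placement token comes from the key's hash,
    # not from the key's sort order.
    token = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[token % len(NODES)]

# Adjacent keys land on unrelated nodes...
for k in ["user:100", "user:101", "user:102"]:
    print(k, "->", owner(k))

# ...so a range query over user:100..user:102 cannot be routed to one node;
# every node must be asked and the results merged. This is a property of the
# partitioning scheme, independent of the on-disk LSM format.
```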
I haven't heard of anything faster...
edit: found the benchmarks I ran https://www.patreon.com/posts/more-bcachefs-16647914
Though I assume that, like Redis, a reboot would lose acked writes?
An SSD can't save a write in a microsecond, its latency is at least an order of magnitude higher, right?
Consumer drives typically don't pay for this; instead, good drives buffer writes in SLC flash, which has lower latency than MLC.
ADDENDUM: funny thing is that you have to actively manage the energy in the caps; the write buffer must never hold more data than you have energy to write back. This becomes especially important at power-up, while the still-empty caps are charging.
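That invariant is easy to state as arithmetic. A toy sketch with made-up numbers (hypothetical, not from any datasheet) of the bookkeeping a controller has to do:

```python
# Illustrative constants (hypothetical, not from any real drive):
CAP_ENERGY_J = 0.5          # usable energy stored in fully charged caps
FLUSH_COST_J_PER_MB = 0.02  # energy to program one MB of flash on power loss

def max_safe_buffer_mb(available_energy_j: float) -> float:
    # The write buffer may never hold more dirty data than the caps can
    # currently flush -- so the safe buffer size tracks the charge level.
    return available_energy_j / FLUSH_COST_J_PER_MB

print(max_safe_buffer_mb(CAP_ENERGY_J))        # full caps
print(max_safe_buffer_mb(CAP_ENERGY_J * 0.1))  # just after power-up, caps still charging
```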
Does any of this map to a GPU for column-oriented analytical data processing? Basically, machines are only good at reading and writing large, contiguous chunks of data. As these machines evolve, the optimal width of those chunks keeps getting larger. The volume of data available is growing. And the types of operations being done on data are becoming more “analytical” (meaning column-oriented and streaming access, rather than row-oriented random access). I would expect “modern storage” algorithms to therefore be cache friendly, column oriented, and take the modern in-memory storage hierarchy into account (from on-chip registers, to high-bandwidth GPU-type parallel devices, to NVRAM system memory).
This article comes off to me like a CS101 intro doing Big-O asymptotic analysis on linked lists, without even mentioning the existence and effects of memory caches.
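The row-vs-column distinction is easy to demonstrate even in plain Python (where interpreter overhead masks the actual cache-line effects -- the point here is the access pattern, not the absolute timings):

```python
from timeit import timeit

N = 1000
# Row-major layout: all fields of one row sit together (good for point lookups).
rows = [[float(i)] * N for i in range(N)]
# Column-major layout: all values of one field sit together (good for analytics).
cols = [[float(i) for i in range(N)] for _ in range(N)]

def sum_field_from_rows(field):
    # Strided access: touch one value per row, skip the other N-1 fields.
    return sum(row[field] for row in rows)

def sum_field_from_cols(field):
    # Sequential access: one contiguous scan -- the pattern caches reward.
    return sum(cols[field])

assert sum_field_from_rows(3) == sum_field_from_cols(3)
print("row layout:", timeit(lambda: sum_field_from_rows(3), number=100))
print("col layout:", timeit(lambda: sum_field_from_cols(3), number=100))
```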
Linear algebra. It’s so hot right now.
Basically: design viruses with fluorescent molecules attached to them such that they attach to specific sections of RNA in the cell that are associated with particular genes. Soak tissue in viruses. Look at tissue through a microscope and count individual fluorescent dots, each of which represents one RNA molecule (I find this absolutely mind-blowing). Wash off the viruses with a specific type of chemical, and repeat with a new one for a different gene.
You can only use a few colours at a time because otherwise the microscope cannot discern them. But that would severely limit the throughput -- there are a lot of genes we want to check, and the tissue will degenerate after repeated washing. So, what can we do? Well, as I've understood it, scientists using smFISH and similar techniques now use multiplexing with Hamming codes to get around that.
So yeah, linear algebra is definitely so, so hot right now.
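A toy sketch of the multiplexing idea (made-up round count; real codebooks like MERFISH's are designed far more carefully, but the error-correction principle is the same): each gene gets an on/off codeword across the imaging rounds, and keeping codewords far apart in Hamming distance lets you correct a misread round:

```python
from itertools import product

ROUNDS = 8  # hypothetical number of imaging rounds

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# Greedily build a small codebook: 8-bit words with exactly 4 bits on,
# pairwise Hamming distance >= 4 (so one flipped bit stays correctable).
codebook = []
for word in product([0, 1], repeat=ROUNDS):
    if sum(word) == 4 and all(hamming(word, c) >= 4 for c in codebook):
        codebook.append(word)

genes = {f"gene-{i}": code for i, code in enumerate(codebook)}

def decode(observed):
    # Pick the codeword closest to the on/off pattern seen under the scope.
    return min(genes, key=lambda g: hamming(genes[g], observed))

code = genes["gene-3"]
noisy = tuple(b ^ (i == 5) for i, b in enumerate(code))  # round 5 misread
assert decode(noisy) == "gene-3"                         # still recoverable
print(len(genes), "genes distinguishable in", ROUNDS, "rounds")
```

With more rounds the number of distinguishable genes grows combinatorially, which is how a handful of colours covers thousands of genes.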
Using a neural network in place of a B-tree. What's interesting is that a NN can use many processors at the same time, whereas a B-tree can't.
In the end it comes down to power, as in joules to get something done.
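A minimal sketch of the learned-index idea (a plain linear fit standing in for the neural net; real systems use hierarchies of models): the model predicts a key's position in the sorted data, and correctness comes from a bounded search within the model's worst-case error. The prediction itself is just multiply-adds, which is the batch-friendly, parallel part, versus the pointer-chasing of a B-tree descent:

```python
import bisect

# Sorted keys (synthetic, deliberately non-uniform data).
keys = [x * x for x in range(1000)]
n = len(keys)

# "Train" the simplest possible model: linear regression key -> position.
mean_k = sum(keys) / n
mean_p = (n - 1) / 2
slope = (sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
         / sum((k - mean_k) ** 2 for k in keys))

# Record the worst prediction error; it bounds the local search window.
# (For data this skewed the bound is large -- real learned indexes fix
# that with piecewise or hierarchical models.)
err = max(abs(i - (mean_p + slope * (k - mean_k))) for i, k in enumerate(keys))
max_err = int(err) + 1

def lookup(key):
    # Model guesses a position; search only inside the error window.
    guess = int(mean_p + slope * (key - mean_k))
    lo = max(0, guess - max_err)
    hi = max(lo, min(n, guess + max_err + 1))
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < n and keys[i] == key else None

assert lookup(123 * 123) == 123
assert lookup(7) is None
```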
Sure. My knowledge is pretty much limited to what this articles talks about, so interested to know what else is out there.
The big thing in practice for TokuDB vs InnoDB is how each deals with large datasets.
However, I don't know where the very latest MyRocks stands vs TokuDB.
I don't mean to crap on the work presented. Great article, good summary, and the tech is solid. It's just older than what excites me; a lot of progress has been made in 20 years and the majority of it hasn't found commercial applications.
But here is a citation root for a lot of amazing work in this space:
My absolute hero Edward Kmett gave a talk stitching a lot of this work together a long time ago: https://www.youtube.com/watch?v=uA0Z7_4J7u8 . I have no idea if he's pursued it, it's just one of his many talks that left me with an incredibly lasting impression.
Variants of this technique work for arbitrary documents and structures, work better at very high volume, have cache-oblivious properties, and support transactions. Universal indexes that are reasonably good for all queries (as opposed to specific queries) are also possible. Coupled with discrimination-based techniques for O(n) table joins, there's probably a whole startup around there.
Sorry I can't do better right now.
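For the curious, the O(n) join idea can be sketched like this (a Python dict grouping stands in for Henglein's generic discriminators, which derive the linear-time grouping pass from the structure of the key type rather than from a hash function):

```python
from collections import defaultdict

def join(left, right, lkey, rkey):
    # One pass over each relation to group rows by key, then emit matches
    # bucket by bucket: linear in input size plus output size, with no
    # O(n log n) sort and no O(n^2) nested loop.
    buckets = defaultdict(lambda: ([], []))
    for row in left:
        buckets[lkey(row)][0].append(row)
    for row in right:
        buckets[rkey(row)][1].append(row)
    return [(l, r) for ls, rs in buckets.values() for l in ls for r in rs]

users = [("u1", "ada"), ("u2", "bob")]
orders = [("o1", "u1"), ("o2", "u1"), ("o3", "u2")]
pairs = join(users, orders, lambda u: u[0], lambda o: o[1])
print(pairs)  # each order paired with its user
```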
This property probably doesn't get the respect it deserves in this super weird world where you can't really say how many caches are between you and your data.