There is also a nice new data structure called Zip Trees [1], which is isomorphic to skip lists in the sense that it performs exactly the same comparisons on insertion/deletion (though it can skip some redundant ones). I would expect zip trees to be faster than skip lists in practice.
Yeah, for those tempted in these directions, Zip trees seem like the clearly preferred tool. The variable-sized heap blocks of the skip list and the need for 4 pointers per entry (on average two doubly-linked list nodes) turn out to be real limitations in practice. At the end of the day, the only thing skip lists really win on is code size.
For those not tempted and wondering if they should be: no, stick with your existing (red/black or AVL) balanced trees. They're clunkier and harder to explain, but they work just as well.
Unless you absolutely require deletion of a node specified by pointer, why would you include backlinks?
(Also, if you’re fine with resorting to randomized data structures, conventional treaps are quite good while being IMO easier to understand and code than either AVL or RB. The only reason I could imagine for not using them is adversarial input, although I’ve never seen an attack demonstrated. Now that I’m reading the paper, zip trees seem quite nice and simple as well.)
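To give a sense of how little code a treap needs, here is a minimal sketch (my own toy version, not from any particular library): a BST ordered by key that is simultaneously a min-heap on random priorities, restored by single rotations on the way back up from an insert.

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.prio = random.random()   # random priority gives expected O(log n) height
        self.left = self.right = None

def rotate_right(n):
    l = n.left
    n.left, l.right = l.right, n
    return l

def rotate_left(n):
    r = n.right
    n.right, r.left = r.left, n
    return r

def insert(root, key):
    # Ordinary BST insert, then rotate the new node up while the
    # min-heap property on priorities is violated.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
        if root.left.prio < root.prio:
            root = rotate_right(root)
    else:
        root.right = insert(root.right, key)
        if root.right.prio < root.prio:
            root = rotate_left(root)
    return root
```

That's the entire balancing act: no colors, no height bookkeeping, just two rotation helpers.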
CockroachDB uses a skip list for its “timestamp cache” data structure, used to implement serializable isolation. The data structure records reads and exists to ensure that writes cannot rewrite history.
This[1] library is the basis of the implementation in the database.
ISTR SkipLists being used by the NCBI toolkit for DNA/AA sequence data way back in the day. Of course now I can't find any reference to it, but did find this paper on SkipLists in multiple alignments from 2003:
I remember implementing a skip-list in order to do some hacky ray-casting on the GPU. Pretty significant performance improvement, and fun to implement!
Actually, the “skip list” I implemented wasn’t as general as the ones talked about here. I had the “skips” pre-computed at specific intervals to skip sometimes huge numbers of entries, sometimes just a few, it wasn’t equally distributed. Didn’t know there was a more general implementation, neat!
Skip lists in practice are faster than balanced binary trees for insertion, since rebalancing is slow, while insert and delete in a skip list are very fast (if you have a fast source of random or pseudorandom numbers). Skip list insertion is also very easy to parallelize.
Insert on a balanced binary tree requires locking basically the whole tree.
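For concreteness, here is a sketch of the textbook skip list insert (my own minimal version, assuming a fixed max level and p = 1/2). The relevant point for parallelism is that an insert only touches the handful of predecessor nodes recorded in `update`, rather than rotating ancestors all over the tree:

```python
import random

MAX_LEVEL = 16

class SLNode:
    def __init__(self, key, level):
        self.key = key
        self.next = [None] * level

class SkipList:
    def __init__(self):
        self.head = SLNode(None, MAX_LEVEL)  # sentinel head spans all levels
        self.level = 1

    def _random_level(self):
        # Flip coins: each level is half as populated as the one below.
        lvl = 1
        while lvl < MAX_LEVEL and random.random() < 0.5:
            lvl += 1
        return lvl

    def insert(self, key):
        # Record, at every level, the last node with key < new key.
        update = [self.head] * MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.next[i] is not None and node.next[i].key < key:
                node = node.next[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = SLNode(key, lvl)
        # Splice the new node in; only the nodes in update[] are modified.
        for i in range(lvl):
            new.next[i] = update[i].next[i]
            update[i].next[i] = new

    def __iter__(self):
        node = self.head.next[0]
        while node is not None:
            yield node.key
            node = node.next[0]
```

A concurrent version would lock (or CAS) just those predecessor nodes, which is what makes fine-grained locking feasible here.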
If you can turn your keys into byte/bit sequences, and most of the time you can, you can use crit bit trees or binary Patricia trees. These are fast structures and support ordered traversal. They also do not rebalance and do not require locking the whole tree.
Skip lists are built upon doubly linked lists, many of them. Parallelizing operations on doubly linked lists is not easy; it is quite easy to introduce potential deadlocks.
I've always seen them with singly-linked lists; double doesn't give an average or worst-case order improvement to anything, but it does make some ops faster within the same complexity class.
Okay, skip lists are built upon singly-linked lists, but you still need to lock more than one object to modify them. The need to lock multiple objects at once is the potential source of deadlocks.
Doubly linked lists would help you navigate more quickly. But let's discuss why even a skip list built on singly linked lists is not that easy.
The problem with doubly linked lists modified in parallel is that you must lock at least two places where changes are performed. When you lock two things, you need an order between them that determines which to lock first. Otherwise, it is possible to introduce deadlocks: the first actor needs A and B and locks A; the second actor needs A and B and locks B.
Crucial part here is the need to lock more than one object.
If I understand skip lists correctly, it is possible to insert an element into several lists at once. Thus the need to work on more than one element, and thus the possibility of deadlock that needs to be accounted for.
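The standard fix for the A/B scenario described above is to impose a single global order on lock acquisition. A toy sketch (names hypothetical): both actors sort the locks they need before taking them, so neither can ever hold one while waiting on the other.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def with_both(first, second, action):
    # Acquire the two locks in a fixed global order (here: by object id),
    # regardless of the order the caller named them in. This rules out the
    # "actor 1 holds A and waits on B, actor 2 holds B and waits on A" cycle.
    lo, hi = sorted((first, second), key=id)
    with lo:
        with hi:
            action()
```

With this discipline, two threads calling `with_both(lock_a, lock_b, ...)` and `with_both(lock_b, lock_a, ...)` concurrently cannot deadlock.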
Simplicity is really the biggest advantage. Being simple, it's much easier to implement a skip list lock-free than other data structures. This helps it perform really well under highly concurrent point read and write workloads.
It's not as good at scans, but if you really care about scan performance you should be using a columnstore layout (and not a tree).
If we are going as far as using a source of randomness in our data structures, I think treaps are conceptually just as simple if not simpler, and I think they are also easier to implement than skip lists.
A hierarchy of sorted arrays, with merging when needed. Single access time is O(log N), scan is theoretically the fastest possible, and inserts and deletes are also O(log N) (amortized).
For what it’s worth, my crappy skip list implementation was a bit slower than my crappy red-black tree implementation at mixed read/write workloads. It was faster to iterate sequentially, but that never mattered for me enough to make it a net win. I ended up switching back to red-black trees.
This is just one data point. I suspect I’ll try using them again next time I’m trying to squeeze perf out of an ordered search data structure.
> I remain unconvinced that skiplists cannot be better replaced in most cases with a good hash table/dictionary implementation.
Skiplists are ordered. Perhaps without much advantage compared to, say, a balanced BST. But hash tables aren't ordered, so they aren't as good for storing data when preserving order is important.
> Also, most linked lists should actually be deques.
Yep, for almost all purposes, so long as using the extra memory for the reverse pointers isn't an issue. Deques are more flexible and much easier to work with (and especially to debug) than singly-linked lists. Not much of a take, I think a large proportion of developers would vehemently agree with you there.
You can typically insert into a skiplist in parallel with better performance than inserting into a balanced BST in parallel. So you may prefer a skiplist for concurrent write-heavy loads.
Having implemented a SkipList and compared it to various binary tree implementations, my observation is that they're essentially equivalent in terms of performance for most operations. For insertion, my SkipList implementation was faster than most binary tree implementations but I found a Red-Black Binary Tree implementation which was faster.
Still, my SkipList was much faster at batch deletion of contiguous records. This is because a SkipList does not require rebalancing after each individual deletion; once you've found the starting point and ending point (each in O(log n) time), an arbitrarily large chunk of the SkipList can be deleted in constant time.
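A minimal sketch of that range splice (my own illustration, not the poster's implementation): two O(log n) searches find, at every level, the predecessor of the range start and the first survivor after the range end; then one pointer write per level unlinks the whole chunk, with no rebalancing and no per-record work.

```python
import random

MAX_LEVEL = 12

class Node:
    def __init__(self, key, level):
        self.key = key
        self.next = [None] * level

class RangedSkipList:
    def __init__(self):
        self.head = Node(None, MAX_LEVEL)  # sentinel spanning all levels

    def insert(self, key):
        prev = [self.head] * MAX_LEVEL
        n = self.head
        for i in range(MAX_LEVEL - 1, -1, -1):
            while n.next[i] is not None and n.next[i].key < key:
                n = n.next[i]
            prev[i] = n
        lvl = 1
        while lvl < MAX_LEVEL and random.random() < 0.5:
            lvl += 1
        new = Node(key, lvl)
        for i in range(lvl):
            new.next[i] = prev[i].next[i]
            prev[i].next[i] = new

    def delete_range(self, lo, hi):
        # Search 1: predecessors of lo at every level.
        prev = [self.head] * MAX_LEVEL
        n = self.head
        for i in range(MAX_LEVEL - 1, -1, -1):
            while n.next[i] is not None and n.next[i].key < lo:
                n = n.next[i]
            prev[i] = n
        # Search 2: first node with key > hi at every level.
        succ = [None] * MAX_LEVEL
        n = self.head
        for i in range(MAX_LEVEL - 1, -1, -1):
            while n.next[i] is not None and n.next[i].key <= hi:
                n = n.next[i]
            succ[i] = n.next[i]
        # Splice: one pointer write per level drops every key in [lo, hi].
        for i in range(MAX_LEVEL):
            prev[i].next[i] = succ[i]

    def __iter__(self):
        n = self.head.next[0]
        while n is not None:
            yield n.key
            n = n.next[0]
```

However many records fall inside [lo, hi], the splice itself is bounded by the number of levels, which is what makes contiguous batch deletion so cheap.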
[1]: https://arxiv.org/abs/1806.06726