Let's Talk SkipList

anfelor · on Aug 7, 2022

There is also a nice new datastructure called Zip Trees [1] which is isomorphic to skip lists in the sense that it performs exactly the same comparisons on insertion/deletion (but it can skip some redundant ones). I would expect zip trees to be faster than skip lists in practice.

[1]: https://arxiv.org/abs/1806.06726

ajross · on Aug 8, 2022

Yeah, for those tempted in these directions, Zip trees seem like the clearly preferred tool. The variable-sized heap blocks of the skip list and the need for 4 pointers per entry (on average two doubly-linked list nodes) turn out to be real limitations in practice. At the end of the day, the only think skip lists really win on is code size.

For those not tempted and wondering if they should be: no, stick with your existing (red/black or AVL) balanced trees. They're clunkier and harder to explain, but they work just as well.

mananaysiempre · on Aug 8, 2022

Unless you absolutely require deletion of a node specified by pointer, why would you include backlinks?

(Also, if you’re fine with resorting to randomized data structures, conventional treaps are quite good while being IMO easier to understand and code than either AVL or RB. The only reason I could imagine for not using them is adversarial input, although I’ve never seen an attack demostrated. Now that I’m reading the paper, zip trees seem quite nice and simple as well.)

vvern · on Aug 7, 2022

CockroachDB uses a skip list for its “timestamp cache” data structure, used to implement serializable isolation. The data structure records reads and exists to ensure that writes cannot rewrite history.

This[1] library is the basis of the implementation in the database.

[1] https://github.com/andy-kimball/arenaskl

slyrus · on Aug 7, 2022

ISTR SkipLists being used by the NCBI toolkit for DNA/AA sequence data way back in the day. Of course now I can't find any reference to it, but did find this paper on SkipLists in multiple alignments from 2003:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC521198/

infradig · on Aug 7, 2022

Nice. I have used unrolled SkipLists extensively in Trealla Prolog, where it is integral to fast performance on large (1M+) Prolog rule-sets.

rektide · on Aug 7, 2022

Go Terps (University of Maryland, College Park)! First described by Bill Pugh in 1989.

nfgrep · on Aug 7, 2022

I remember implementing a skip-list in order to do some hacky ray-casting on the GPU. Pretty significant performance improvement, and fun to implement!

nfgrep · on Aug 7, 2022

Actually, the “skip list” I implemented wasn’t as general as the ones talked about here. I had the “skips” pre-computed at specific intervals to skip sometimes huge numbers of entries, sometimes just a few, it wasn’t equally distributed. Didn’t know there was a more general implementation, neat!

mbfg · on Aug 7, 2022

Is it me, or does the post not explain what the benefits are of a skiplist? Ok, maybe it's just "a here's how to implement" post, but seems strange.

bodhiandphysics · on Aug 7, 2022

Skip lists in practice are faster than balanced binary trees in insertion, since rebalancing is slow, while insert and delete in a sl are very fast (if you have a fast source of random or pseudorandom numbers). Skip list insertion is also very easily to parallelize. Insert on a bbt requires locking basically the whole tree.

thesz · on Aug 8, 2022

If you can turn your keys into byte/bit sequences, and most of the time you can, you can use crit bit trees or binary Patricia trees. These are fast structures and support ordered traversal. They also do not rebalance and do not require locking the whole tree.

Skip lists are built upon double linked lists, many of them. Parallelizing operations on double linked lists is not easy, it is quite easy to introduce potential deadlock.

dragonwriter · on Aug 8, 2022

> Skip lists are built upon double linked lists,

I've always seen them with singly-linked lists; double doesn't give an average or worst-case order improvement to anything, but it does make some ops faster within the same complexity class.

thesz · on Aug 9, 2022

Okay, skip lists are built upon singly-linked lists, but there are still need to lock more than one object to modify. The need to lock multiply objects at once is the potential source of deadlocks.

Tyr42 · on Aug 8, 2022

Looks like singlely linked lists in the linked article.

thesz · on Aug 9, 2022

Double linked lists would help navigate quicker. But let's discuss why even skip list on even single linked lists is not that easy.

The problem with double-linked lists modified in parallel is that one should lock at least two places where changes are performed. When you lock two things, you need an order between them on which to lock first. Otherwise, it is possible to introduce deadlocks - first actor needs A and B and locks A, second actor needs A and B and locks B.

Crucial part here is the need to lock more than one object.

If I understand skip lists correctly, it is possible to insert element into several lists at once. Thus, the need to work on more than one element. Thus, the possibility of deadlock that needs to be accounted for.

AdamProut · on Aug 8, 2022

I talked a bit about some of the advantages in this pretty old blog post: https://www.singlestore.com/blog/what-is-skiplist-why-skipli...

Simplicity is really the biggest advantage. Being simple, its much easier to implement a skiplist lock free vs other data structures. This helps it perform really well under highly concurrent point read and write workloads.

Its not as good at scans, but if you really care about scan performance you should be using a columnstore layout (and not a tree).

kccqzy · on Aug 8, 2022

If we are going as far as using a source of randomness in our data structures, I think treaps are conceptually just as simple if not simpler, and I think is also easier to implement than skip lists.

mbfg · on Aug 8, 2022

is the max tower height really just randomly determined, or is it fixed, or sized based on the total # of entries?

killingtime74 · on Aug 7, 2022

You’re right it didn’t. I believe the main benefit vs binary search in arrays is that it’s linked list like nature makes insertion and removal Faster

thesz · on Aug 8, 2022

Please consider COLA: http://people.seas.harvard.edu/~minilek/cs229r/fall13/lec/le...

A hierarchy of sorted arrays, with merging when needed. Single access time is O(logN), scan is theoretically the fastest possible, inserts and deletes are also O(logN).

Almost no pointer chasing.

pizlonator · on Aug 8, 2022

For what it’s worth, my crappy skip list implementation was a bit slower than my crappy red-black tree implementation at mixed read/write workloads, but it was faster to iterate sequentially, but that never mattered for me enough for it to be a net win. I ended up switching back to red-black trees.

This is just one data point. I suspect I’ll try using them again next time I’m trying to squeeze perf out of an ordered search data structure.

KerrAvon · on Aug 7, 2022

Cold take:

I remain unconvinced that skiplists cannot be better replaced in most cases with a good hash table/dictionary implementation.

Also, most linked lists should actually be deques.

PotatoPancakes · on Aug 7, 2022

> I remain unconvinced that skiplists cannot be better replaced in most cases with a good hash table/dictionary implementation.

Skiplists are ordered. Perhaps without much advantage compared to, say, a balanced BST. But Hash tables aren't ordered, so they aren't as good for storing data when preserving order is important.

> Also, most linked lists should actually be deques.

Yep, for almost all purposes, so long as using the extra memory for the reverse pointers isn't an issue. Deques are more flexible and much easier to work with (and especially to debug) than singly-linked lists. Not much of a take, I think a large proportion of developers would vehemently agree with you there.

dietrichepp · on Aug 8, 2022

You can typically insert into a skiplist in parallel with better performance than inserting into a balanced BST in parallel. So you may prefer a skiplist for concurrent write-heavy loads.

jongjong · on Aug 8, 2022

Having implemented a SkipList and compared it to various binary tree implementations, my observation is that they're essentially equivalent in terms of performance for most operations. For insertion, my SkipList implementation was faster than most binary tree implementations but I found a Red-Black Binary Tree implementation which was faster.

Still, my SkipList was much faster with batch-deletion of contiguous records. This is because a SkipList does not require re-balancing after each individual deletion; once you've found a starting point and ending point (which has O(log n) time complexity), the deletion of an arbitrarily large chunk of a SkipList can be done in constant time.

Here is my implementation (Node.js/JavaScript) in case anyone is interested: https://www.npmjs.com/package/proper-skip-list

ignoramous · on Aug 8, 2022

> Perhaps without much advantage compared to, say, a balanced BST.

A balanced BST comparatively does more work than a SkipList. Its cousin, SplayTrees, on the other hand, might give SkipList a run for its money.

dreamcompiler · on Aug 7, 2022

A skip list is an ordered data structure with lookup time O[logn]. Hash tables are unordered, with lookup time O[1]. They serve different purposes.

idsout · on Aug 8, 2022

Skip lists can be a good candidate for an ordered concurrent map implementation, outperforming an ordered map based on a bst/btree for instance.