Implement a SkipList based K-V Database in Java
Skiplist is linked list with layers. The implementation of a skiplist is quite easier than trees. It is a K-V Database based on SkipList written in Java.
SkipList
Skiplist is linked list with layers. The taller layer the fewer the node is. It is a probabilistic data structure that allows average $O(logn)$ operations. At the bottom level the linked list has all the data in.
Implementation
The implementation of a skiplist is quite easier than trees. Red-Black, AVL, sub-trees — seems a pain. The index layer above has half of the data in and the layer above has half of that, and so on in a ideal situation. The design of the levels of index is based on “flipping coins”: if the result is “face”, rise the level. Therefore, from level 0
there are $\frac{1}{2}$ of chance nodes goes to level 1
, $\frac{1}{4}$ chance level 2
, $\frac{1}{8}$ chance level 3
so far and so on. If we were searching for a node we would start at the top level.
Node
The node class will represent an element in the skiplist. Each node will cantain key-value and a forward
array with pointers to the next node at each level.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
/**
* Node to store data
* @param <K>
* @param <V>
*/
public static class Node<K extends Comparable<K>, V>{
K key;
V value;
int level;
/**
* forward list to store next node in different levels
*/
ArrayList<Node<K, V>> forward;
Node(K key, V value, int level){
this.key = key;
this.value = value;
this.level = level;
this.forward = new ArrayList<>(Collections.nCopies(level + 1, null));
}
public K getKey() {
return this.key;
}
public V getValue() {
return this.value;
}
public void setValue(V value) {
this.value = value;
}
}
SkipList Class
SkipList
class is used to manage nodes and levels.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
/**
* Max level of the skip list
*/
private static final int MAX_LEVEL = 32;
/**
* Dir for data persistence
*/
private static final String DATA_STORE = "./data";
/**
* header of the skip list
*/
private final Node<K, V> header;
/**
* current level of the skip list
*/
private int curLevel;
/**
* Count of node in current skip list
*/
private int nodeCnt;
Search
To find a key, we start at the highest level of the list. Move forward with forward
array in the current level until the next key is greater than or eqaul to the target. Then, drop down to a lower level and repeat the moving forward operation until you reach the 0 level.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
/**
* search and get node value
* @param key node key to search
* @return the value of key node
*/
public V get(K key){
Node<K, V> cur = this.header;
for(int i = this.curLevel; i >= 0; i--){
while(cur.forward.get(i) != null && cur.forward.get(i).getKey().compareTo(key) < 0){
cur = cur.forward.get(i);
}
}
cur = cur.forward.getFirst();
if(cur != null && cur.getKey().compareTo(key) == 0){
return cur.getValue();
}
return null;
}
Insert
Before the insertion, we need to determine the insert point by performing a search operation. Then randomly determine the level for the new node using java.util.Random
.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
/**
* insert specific node
* @param key key of the node
* @param value value of the node
* @return true if node is successfully inserted
*/
public synchronized boolean insert(K key, V value){
Node<K, V> cur = this.header;
ArrayList<Node<K, V>> updateTable = new ArrayList<>(Collections.nCopies(MAX_LEVEL + 1, null));
for(int i = this.curLevel; i >= 0; i--){ // get the last lower key
while(cur.forward.get(i) != null && cur.forward.get(i).getKey().compareTo(key) < 0){
cur = cur.forward.get(i);
}
updateTable.set(i, cur);
}
cur = cur.forward.getFirst(); // lowest level, get(0)
if(cur != null && cur.getKey().compareTo(key) == 0){ // same key, replace value
cur.setValue(value);
return true;
}
int randLevel = randomLevel();
if(cur == null || cur.getKey().compareTo(key) != 0){ // different key, insert node
if(randLevel > curLevel){
for(int i = curLevel + 1; i < randLevel + 1; i++){
updateTable.set(i, header);
}
curLevel = randLevel;
}
Node<K, V> insertNode = createNode(key, value, randLevel);
for(int i = 0; i <= randLevel; i++){ // insert node and index
insertNode.forward.set(i, updateTable.get(i).forward.get(i));
updateTable.get(i).forward.set(i, insertNode);
}
this.nodeCnt++;
return true;
}
return false;
}
If we searched and find the same key, just replace it with the newest value.
1
2
3
4
if(cur != null && cur.getKey().compareTo(key) == 0){ // same key, replace value
cur.setValue(value);
return true;
}
The updateTable
is to keep track of the nodes that need to have their forward
pointers updated to correctly insert the new node into the skiplist. By traversing down the levels and storing these predecessors, the insertion process can efficiently update the skiplist’s structure at all the necessary levels.
In Java, the keyword synchronized
is for controlling access to shared resources (like objects and variables) by multiple threads. It helps prevent race conditions and ensures thread safety. It provides mutual exclusion, meaning that only one thread can hold the lock on a particular object or class at any given time. When a thread enters a synchronized
block or method, all other threads trying to enter the same synchronized block or method on the same object will be blocked until the first thread releases the lock. Here, the synchronized
keyword ensures that only one thread can execute this insert
method at a time, preventing race conditions and maintaining the integrity of the skiplist in a multi-threaded environment.
Delete
When the deletion is performing, it will remove all the references from all levles where it is present.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
/**
* delete node from skip list
* @param key target node's key
* @return true if successfully deleted
*/
public synchronized boolean remove(K key){
Node<K, V> cur = this.header;
ArrayList<Node<K, V>> updateTable = new ArrayList<>(Collections.nCopies(MAX_LEVEL, null));
for(int i = this.curLevel; i >= 0; i--){
while(cur.forward.get(i) != null && cur.forward.get(i).getKey().compareTo(key) < 0){
cur = cur.forward.get(i);
}
updateTable.set(i, cur);
}
cur = cur.forward.getFirst();
if(cur != null && cur.getKey().compareTo(key) == 0){ // node found
for(int i = 0; i < this.curLevel; i++){
if(updateTable.get(i).forward.get(i) != cur){
break;
}
updateTable.get(i).forward.set(i, cur.forward.get(i));
}
}
while(this.curLevel > 0 && this.header.forward.get(this.curLevel) == null){
this.curLevel--; // update the list level
}
this.nodeCnt--;
return true;
}
REPL & Test
REPL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
public class Main {
public static void main(String[] args){
SkipList<String, String> skiplist = new SkipList<>();
Scanner scanner = new Scanner(System.in);
skiplist.loadFile();
boolean loop = true;
while(loop){
String cmd = scanner.nextLine();
String[] cmdList = cmd.split(" ");
switch (cmdList[0]) {
case "insert" -> {
boolean status = skiplist.insert(cmdList[1], cmdList[2]);
if(status){
System.out.println("Insert data: [" + cmdList[1] + "," + cmdList[2] + "] successfully!");
}else{
System.out.println("Failed to insert data: [" + cmdList[1] + "," + cmdList[2] + "]");
}
}
case "delete" -> {
boolean status = skiplist.remove(cmdList[1]);
if(status){
System.out.println("[" + cmdList[1] + "] deleted successfully!");
}else{
System.out.println("Failed to delete data!");
}
}
case "search" -> {
String value = skiplist.get(cmdList[1]);
if(value != null){
System.out.println("Key " + cmdList[1] + " found: " + value);
}else{
System.out.println("Key " + cmdList[1] + " not exists!");
}
}
case "exit" -> {
Scanner scan = new Scanner(System.in);
System.out.print("Save file? [Y/n]: ");
String yn = scan.next();
if(yn.equals("y") || yn.equals("Y")){
skiplist.saveFile();
}else if(yn.equals("n") || yn.equals("N")){
break;
}else{
skiplist.saveFile();
}
loop = false;
}
case "list" -> skiplist.list();
case "save" -> skiplist.saveFile();
case "load" -> skiplist.loadFile();
default -> System.out.println("Command not found");
}
}
}
}
Stress Test
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
import java.util.Random;
public class StressTest{
public static final int INSERT_TIMES = 100000;
public static final int SEARCH_TIMES = 100000;
public static String randString(){
String characters = "abcdefghijklmnopqrstuvwxyz0123456789";
int len = 10;
StringBuilder result = new StringBuilder(len);
Random random = new Random();
for(int i = 0; i < len; i++){
int index = random.nextInt(characters.length());
result.append(characters.charAt(index));
}
return result.toString();
}
public static void main(String[] args) throws InterruptedException{
int nThread = 10;
long start = System.currentTimeMillis();
Thread[] threads = new Thread[nThread];
SkipList<String, String> skipList = new SkipList<>();
for(int i = 0; i < nThread; i++){
threads[i] = new Thread(new InsertTask<>(skipList));
threads[i].start();
}
for(int i = 0; i < nThread; i++){
threads[i].join();
}
long end = System.currentTimeMillis();
System.out.println("Insert in " + nThread + " Thread " + (nThread * INSERT_TIMES) + " takes " + (end - start) + "ms");
long start2 = System.currentTimeMillis();
Thread[] threads2 = new Thread[nThread];
for(int i = 0; i < nThread; i++){
threads2[i] = new Thread(new SearchTask<>(skipList));
threads2[i].start();
}
for(int i = 0; i < nThread; i++){
threads2[i].join();
}
long end2 = System.currentTimeMillis();
System.out.println("Search in " + nThread + " Thread " + (nThread * SEARCH_TIMES) + " takes " + (end2 - start2) + "ms");
}
private static class InsertTask<K extends Comparable<K>, V> implements Runnable{
SkipList<K, V> skipList;
InsertTask(SkipList<K, V> skipList){
this.skipList = skipList;
}
@Override
@SuppressWarnings("unchecked")
public void run(){
for(int i = 0; i < INSERT_TIMES; i++){
boolean status = this.skipList.insert((K)randString(), (V)randString());
if(!status){
System.out.println("Failed to insert");
break;
}
}
}
}
private static class SearchTask<K extends Comparable<K>, V> implements Runnable{
SkipList<K, V> skipList;
SearchTask(SkipList<K, V> skipList){
this.skipList = skipList;
}
@Override
@SuppressWarnings("unchecked")
public void run(){
for(int i = 0; i < SEARCH_TIMES; i++){
this.skipList.get((K)randString());
}
}
}
}
It defines a helper function randString()
to generate random 10-character alphanumeric strings. The main
method initializes a SkipList
and creates an array of 10 threads (nThread = 10
). For each thread, it then creates InsertTask
instances, starts them to concurrently insert random key-value pairs into the SkipList
, and waits for all insertion threads to complete.
1
2
3
4
Insert in 10 Thread 1000000 takes 5248ms
Search in 10 Thread 1000000 takes 808ms
Process finished with exit code 0
Disadvantages
A database built on skiplists, while offering advantages like simpler implementation and good performance, also has disadvantages, primarily due to memory overhead and lack of guaranteed worst-case performance compared to balanced trees.
Answer from templatetypedef on StackOverflow: https://stackoverflow.com/a/23814116/26756091