Post

Implement a SkipList based K-V Database in Java

Skiplist is linked list with layers. The implementation of a skiplist is quite easier than trees. It is a K-V Database based on SkipList written in Java.

Implement a SkipList based K-V Database in Java

SkipList

Skiplist is linked list with layers. The taller layer the fewer the node is. It is a probabilistic data structure that allows average $O(logn)$ operations. At the bottom level the linked list has all the data in.

Implementation

The implementation of a skiplist is quite easier than trees. Red-Black, AVL, sub-trees — seems a pain. The index layer above has half of the data in and the layer above has half of that, and so on in a ideal situation. The design of the levels of index is based on “flipping coins”: if the result is “face”, rise the level. Therefore, from level 0 there are $\frac{1}{2}$ of chance nodes goes to level 1, $\frac{1}{4}$ chance level 2, $\frac{1}{8}$ chance level 3 so far and so on. If we were searching for a node we would start at the top level.

skiplist

Node

The node class will represent an element in the skiplist. Each node will cantain key-value and a forward array with pointers to the next node at each level.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
/**
 * Node to store data
 * @param <K>
 * @param <V>
 */
public static class Node<K extends Comparable<K>, V>{
    K key;
    V value;
    int level;
    /**
     * forward list to store next node in different levels
     */
    ArrayList<Node<K, V>> forward;

    Node(K key, V value, int level){
        this.key = key;
        this.value = value;
        this.level = level;
        this.forward = new ArrayList<>(Collections.nCopies(level + 1, null));
    }

    public K getKey() {
        return this.key;
    }

    public V getValue() {
        return this.value;
    }

    public void setValue(V value) {
        this.value = value;
    }
}

SkipList Class

SkipList class is used to manage nodes and levels.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
/**
 * Max level of the skip list
 */
private static final int MAX_LEVEL = 32;
/**
 * Dir for data persistence
 */
private static final String DATA_STORE = "./data";
/**
 * header of the skip list
 */
private final Node<K, V> header;
/**
 * current level of the skip list
 */
private int curLevel;
/**
 * Count of node in current skip list
 */
private int nodeCnt;

To find a key, we start at the highest level of the list. Move forward with forward array in the current level until the next key is greater than or eqaul to the target. Then, drop down to a lower level and repeat the moving forward operation until you reach the 0 level.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
/**
 * search and get node value
 * @param key node key to search
 * @return the value of key node
 */
public V get(K key){
    Node<K, V> cur = this.header;

    for(int i = this.curLevel; i >= 0; i--){
        while(cur.forward.get(i) != null && cur.forward.get(i).getKey().compareTo(key) < 0){
            cur = cur.forward.get(i);
        }
    }

    cur = cur.forward.getFirst();

    if(cur != null && cur.getKey().compareTo(key) == 0){
        return cur.getValue();
    }

    return null;
}

Insert

Before the insertion, we need to determine the insert point by performing a search operation. Then randomly determine the level for the new node using java.util.Random.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
/**
 * insert specific node
 * @param key   key of the node
 * @param value value of the node
 * @return  true if node is successfully inserted
 */
public synchronized boolean insert(K key, V value){
    Node<K, V> cur = this.header;
    ArrayList<Node<K, V>> updateTable = new ArrayList<>(Collections.nCopies(MAX_LEVEL + 1, null));

    for(int i = this.curLevel; i >= 0; i--){    // get the last lower key
        while(cur.forward.get(i) != null && cur.forward.get(i).getKey().compareTo(key) < 0){
            cur = cur.forward.get(i);
        }
        updateTable.set(i, cur);
    }

    cur = cur.forward.getFirst();   // lowest level, get(0)

    if(cur != null && cur.getKey().compareTo(key) == 0){    // same key, replace value
        cur.setValue(value);
        return true;
    }

    int randLevel = randomLevel();

    if(cur == null || cur.getKey().compareTo(key) != 0){    // different key, insert node
        if(randLevel > curLevel){
            for(int i = curLevel + 1; i < randLevel + 1; i++){
                updateTable.set(i, header);
            }
            curLevel = randLevel;
        }

        Node<K, V> insertNode = createNode(key, value, randLevel);

        for(int i = 0; i <= randLevel; i++){    // insert node and index
            insertNode.forward.set(i, updateTable.get(i).forward.get(i));
            updateTable.get(i).forward.set(i, insertNode);
        }
        this.nodeCnt++;
        return true;
    }
    return false;
}

If we searched and find the same key, just replace it with the newest value.

1
2
3
4
if(cur != null && cur.getKey().compareTo(key) == 0){    // same key, replace value
    cur.setValue(value);
    return true;
}

The updateTable is to keep track of the nodes that need to have their forward pointers updated to correctly insert the new node into the skiplist. By traversing down the levels and storing these predecessors, the insertion process can efficiently update the skiplist’s structure at all the necessary levels.

In Java, the keyword synchronized is for controlling access to shared resources (like objects and variables) by multiple threads. It helps prevent race conditions and ensures thread safety. It provides mutual exclusion, meaning that only one thread can hold the lock on a particular object or class at any given time. When a thread enters a synchronized block or method, all other threads trying to enter the same synchronized block or method on the same object will be blocked until the first thread releases the lock. Here, the synchronized keyword ensures that only one thread can execute this insert method at a time, preventing race conditions and maintaining the integrity of the skiplist in a multi-threaded environment.

Delete

When the deletion is performing, it will remove all the references from all levles where it is present.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
/**
 * delete node from skip list
 * @param key target node's key
 * @return true if successfully deleted
 */
public synchronized boolean remove(K key){
    Node<K, V> cur = this.header;
    ArrayList<Node<K, V>> updateTable = new ArrayList<>(Collections.nCopies(MAX_LEVEL, null));

    for(int i = this.curLevel; i >= 0; i--){
        while(cur.forward.get(i) != null && cur.forward.get(i).getKey().compareTo(key) < 0){
            cur = cur.forward.get(i);
        }
        updateTable.set(i, cur);
    }

    cur = cur.forward.getFirst();

    if(cur != null && cur.getKey().compareTo(key) == 0){    // node found
        for(int i = 0; i < this.curLevel; i++){
            if(updateTable.get(i).forward.get(i) != cur){
                break;
            }
            updateTable.get(i).forward.set(i, cur.forward.get(i));
        }
    }

    while(this.curLevel > 0 && this.header.forward.get(this.curLevel) == null){
        this.curLevel--;    // update the list level
    }

    this.nodeCnt--;
    return true;
}

REPL & Test

REPL

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
public class Main {
    public static void main(String[] args){
        SkipList<String, String> skiplist = new SkipList<>();
        Scanner scanner = new Scanner(System.in);
        skiplist.loadFile();
        boolean loop = true;
        while(loop){
            String cmd = scanner.nextLine();
            String[] cmdList = cmd.split(" ");
            switch (cmdList[0]) {
                case "insert" -> {
                    boolean status = skiplist.insert(cmdList[1], cmdList[2]);
                    if(status){
                        System.out.println("Insert data: [" + cmdList[1] + "," + cmdList[2] + "] successfully!");
                    }else{
                        System.out.println("Failed to insert data: [" + cmdList[1] + "," + cmdList[2] + "]");
                    }
                }
                case "delete" -> {
                    boolean status = skiplist.remove(cmdList[1]);
                    if(status){
                        System.out.println("[" + cmdList[1] + "] deleted successfully!");
                    }else{
                        System.out.println("Failed to delete data!");
                    }
                }
                case "search" -> {
                    String value = skiplist.get(cmdList[1]);
                    if(value != null){
                        System.out.println("Key " + cmdList[1] + " found: " + value);
                    }else{
                        System.out.println("Key " + cmdList[1] + " not exists!");
                    }
                }
                case "exit" -> {
                    Scanner scan = new Scanner(System.in);
                    System.out.print("Save file? [Y/n]: ");
                    String yn = scan.next();
                    if(yn.equals("y") || yn.equals("Y")){
                        skiplist.saveFile();
                    }else if(yn.equals("n") || yn.equals("N")){
                        break;
                    }else{
                        skiplist.saveFile();
                    }
                    loop = false;
                }
                case "list" -> skiplist.list();
                case "save" -> skiplist.saveFile();
                case "load" -> skiplist.loadFile();
                default -> System.out.println("Command not found");
            }
        }
    }
}

Stress Test

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
import java.util.Random;

public class StressTest{
    public static final int INSERT_TIMES = 100000;
    public static final int SEARCH_TIMES = 100000;

    public static String randString(){
        String characters = "abcdefghijklmnopqrstuvwxyz0123456789";
        int len = 10;
        StringBuilder result = new StringBuilder(len);
        Random random = new Random();

        for(int i = 0; i < len; i++){
            int index = random.nextInt(characters.length());
            result.append(characters.charAt(index));
        }
        return result.toString();
    }

    public static void main(String[] args) throws InterruptedException{
        int nThread = 10;
        long start = System.currentTimeMillis();

        Thread[] threads = new Thread[nThread];
        SkipList<String, String> skipList = new SkipList<>();

        for(int i = 0; i < nThread; i++){
            threads[i] = new Thread(new InsertTask<>(skipList));
            threads[i].start();
        }

        for(int i = 0; i < nThread; i++){
            threads[i].join();
        }

        long end = System.currentTimeMillis();
        System.out.println("Insert in " + nThread + " Thread " + (nThread * INSERT_TIMES) + " takes " + (end - start) + "ms");

        long start2 = System.currentTimeMillis();

        Thread[] threads2 = new Thread[nThread];
        for(int i = 0; i < nThread; i++){
            threads2[i] = new Thread(new SearchTask<>(skipList));
            threads2[i].start();
        }

        for(int i = 0; i < nThread; i++){
            threads2[i].join();
        }

        long end2 = System.currentTimeMillis();
        System.out.println("Search in " + nThread + " Thread " + (nThread * SEARCH_TIMES) + " takes " + (end2 - start2) + "ms");

    }

    private static class InsertTask<K extends Comparable<K>, V> implements Runnable{
        SkipList<K, V> skipList;
        InsertTask(SkipList<K, V> skipList){
            this.skipList = skipList;
        }

        @Override
        @SuppressWarnings("unchecked")
        public void run(){
            for(int i = 0; i < INSERT_TIMES; i++){
                boolean status = this.skipList.insert((K)randString(), (V)randString());
                if(!status){
                    System.out.println("Failed to insert");
                    break;
                }
            }
        }
    }

    private static class SearchTask<K extends Comparable<K>, V> implements Runnable{
        SkipList<K, V> skipList;
        SearchTask(SkipList<K, V> skipList){
            this.skipList = skipList;
        }

        @Override
        @SuppressWarnings("unchecked")
        public void run(){
            for(int i = 0; i < SEARCH_TIMES; i++){
                this.skipList.get((K)randString());
            }
        }
    }
}

It defines a helper function randString() to generate random 10-character alphanumeric strings. The main method initializes a SkipList and creates an array of 10 threads (nThread = 10). For each thread, it then creates InsertTask instances, starts them to concurrently insert random key-value pairs into the SkipList, and waits for all insertion threads to complete.

1
2
3
4
Insert in 10 Thread 1000000 takes 5248ms
Search in 10 Thread 1000000 takes 808ms

Process finished with exit code 0

Disadvantages

A database built on skiplists, while offering advantages like simpler implementation and good performance, also has disadvantages, primarily due to memory overhead and lack of guaranteed worst-case performance compared to balanced trees.

Answer from templatetypedef on StackOverflow: https://stackoverflow.com/a/23814116/26756091

This post is licensed under CC BY 4.0 by the author.