
#Graviton DB

Graviton Database is a pure Go embeddable key/value store having features unmatched by other software (such as BoltDB, Berkeley DB, MySQL, PostgreSQL, etc.). The goal of the project is to provide a simple, fast, reliable, versioned, authenticated database for projects which require such features.

Since Graviton DB is meant to be used as a low-level piece of functionality, simplicity is key. The API will be small and will focus only on getting and setting values and closely related operations. That's it.

##Project Status

Graviton is currently alpha software. Full unit test coverage and randomized black box testing are used to ensure database consistency and thread safety. The project already has 100% code coverage. A number of decisions, such as changing or renaming APIs, error handling, and the choice of hashing algorithm, are still being evaluated.

##Features

Graviton Database in short is "ZFS for key-value stores".

  • Authenticated data store (all keys and values are backed by a BLAKE 256-bit checksum)
  • Append-only data store
  • Support for 2^64 trees (theoretically) within a single data store. Trees can be named and thus used as buckets
  • Versioning support (all committed changes are versioned, with the ability to visit them at any point in time)
  • Snapshots (multi-tree commits in a single version, causing multi-bucket sync; each snapshot can be visited, appended to, and further modified: keys deleted, values modified, new keys and values stored)
  • Ability to iterate over all key-value pairs in a tree
  • Ability to diff 2 trees in linear time and report all changes (insertions, deletions, modifications)
  • Minimal, small, simplified API
  • Theoretically supports exabyte data stores; multi-terabyte stores tested internally
  • Decoupled storage layer, allowing use of object stores such as Ceph, AWS, etc.
  • Ability to generate cryptographic proofs which can prove key existence or non-existence (proofs are around 1 KB)
  • Superfast proof generation of around 1000 proofs per second per core
  • Support for disk-based (filesystem) persistent stores
  • Support for memory-based non-persistent stores
  • 100% code coverage
  • This is alpha software; we are still evaluating whether BLAKE or another algorithm should be used for hashing

##Table of Contents

  • Getting Started
    • Installing
    • Opening and using a database
    • Tree
    • Using key/value pairs
    • Iterating over keys
    • Snapshots
    • Diffing of 2 trees to detect changes between versions or compare 2 arbitrary trees in linear time
    • Backups
    • Stress testing
    • ToDo
  • Comparison with other databases (MySQL, Postgres, LevelDB, RocksDB, LMDB, Bolt, etc.)

##Getting Started

###Installing

To start using Graviton DB, install Go and run go get:

$ go get github.com/deroproject/graviton/...

This will retrieve the library and build it.

###Opening and using a database

The top-level object in Graviton is a Store. It is represented as a directory with multiple files on your disk and represents a consistent snapshot of your data at all times.

To open your database:

package main

import "fmt"
import "github.com/deroproject/graviton"

func main() {
    //store, _ := graviton.NewDiskStore("/tmp/testdb") // create a new testdb in "/tmp/testdb"
    store, _ := graviton.NewMemStore()       // create a new DB in RAM
    ss, _ := store.LoadSnapshot(0)           // load most recent snapshot
    tree, _ := ss.GetTree("root")            // use or create tree named "root"
    tree.Put([]byte("key"), []byte("value")) // insert a value
    graviton.Commit(tree)                    // commit the tree
    value, _ := tree.Get([]byte("key"))      // read the value back
    fmt.Printf("value retrieved from DB \"%s\"\n", string(value))
}

NOTE: Linux (and other platforms) have a default open file limit of 1024.
      You may need to raise this limit. The default limit allows Graviton databases of up to 2 TB.
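
The limit mentioned in the note can be inspected from Go before opening a large store. The following is a minimal sketch, assuming a Unix-like system where the standard syscall.Getrlimit call is available. The helper name openFileLimit is hypothetical and is not part of Graviton.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit is a hypothetical helper (not part of Graviton). It returns
// the soft and hard limits on open file descriptors for the current process.
func openFileLimit() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return uint64(rl.Cur), uint64(rl.Max), nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("could not read the open file limit:", err)
		return
	}
	fmt.Printf("open file limit: soft=%d hard=%d\n", soft, hard)
	if soft <= 1024 {
		// 1024 is the default mentioned in the note above.
		fmt.Println("consider raising the limit before storing more than about 2 TB")
	}
}
```

The soft limit is the one that applies to the running process; it can usually be raised up to the hard limit, for example with the shell's ulimit -n.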

###Tree

A Tree in Graviton DB acts like a bucket in BoltDB or a ZFS dataset. Each tree is named, and a name can be up to 128 bytes long. A store can contain an unlimited number of trees, and each tree can contain an unlimited number of key-value pairs; in practice, both are limited by the storage space of the server or system.

Each tree can be accessed by its Merkle root hash using the "GetTreeWithRootHash" API. Each tree also maintains its own separate version number, and any specific version can be loaded using GetTreeWithVersion. Each tree can also carry arbitrary tags, and any tagged tree can be accessed by its tag using GetTreeWithTag. In addition, 2 arbitrary trees can be diffed in linear time to detect the relevant changes.

NOTE: Tree tags or names cannot start with ':'
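
The two naming rules stated above (at most 128 bytes, no leading ':') can be checked before a name is handed to the store. This is a minimal sketch: validateTreeName and maxTreeNameLen are hypothetical names introduced here for illustration, and Graviton may enforce these or additional rules in its own way.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// maxTreeNameLen is the 128-byte limit stated in the text above.
const maxTreeNameLen = 128

// validateTreeName is a hypothetical pre-check (not part of Graviton) that
// applies only the two rules described in this section.
func validateTreeName(name string) error {
	if len(name) > maxTreeNameLen { // len counts bytes, not characters
		return errors.New("tree name is longer than 128 bytes")
	}
	if strings.HasPrefix(name, ":") {
		return errors.New("tree name cannot start with ':'")
	}
	return nil
}

func main() {
	for _, name := range []string{"root", ":reserved"} {
		fmt.Printf("%q: %v\n", name, validateTreeName(name))
	}
}
```

The same check applies to tags, since the note covers both tags and names.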

Using key/value pairs

To save a key/value pair to a tree (or bucket), use the tree.Put() function:

        tree, _ := ss.GetTree("root")
        tree.Put([]byte("answer"), []byte("44")) // insert a value
        graviton.Commit(tree)                    // make the tree persistent by storing it in the backend disk

This will set the value of the "answer" key to "44" in the root tree. To retrieve this value, we can use the tree.Get() function:

	tree, _ := ss.GetTree("root")
	v, _ := tree.Get([]byte("answer"))
	fmt.Printf("The answer is: %s\n", v)

The Get() function returns the value together with an error. If the key exists, it returns the key's byte slice value. If the key doesn't exist, or if there is some kind of system failure (which we try to report), it returns an error.

Iterating over keys

Graviton stores its keys in hash byte-sorted order within a tree. This makes sequential iteration over these keys extremely fast. To iterate over keys we'll use a Cursor:

	// Assume "root" tree exists and has keys
	tree, _ := ss.GetTree("root") // ss is a snapshot loaded with store.LoadSnapshot
	c := tree.Cursor()

	for k, v, err := c.First(); err == nil; k, v, err = c.Next() {
		fmt.Printf("key=%s, value=%s\n", k, v)
	}

The cursor allows you to move to a specific point in the list of keys and move forward or backward through the keys one at a time.

The following functions are available on the cursor:

First()  Move to the first key.
Last()   Move to the last key.
Next()   Move to the next key.
Prev()   Move to the previous key.

Each of those functions has a return signature of (key []byte, value []byte, err error). When you have iterated to the end of the cursor, Next() returns the error ErrNoMoreKeys. You must seek to a position using First() or Last() before calling Next() or Prev(). If you do not seek to a position first, these functions return an error.
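
The same loop shape works in the other direction by starting at Last() and stepping with Prev(). The sketch below is written against a small interface that mirrors the return signature described above. The cursor interface, sliceCursor, errNoMoreKeys and keysInReverse are all stand-ins invented for this example so that it runs without a store; they are not Graviton types, and the toy cursor's error behaviour is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// cursor lists the four methods described above (illustrative interface).
type cursor interface {
	First() (k, v []byte, err error)
	Last() (k, v []byte, err error)
	Next() (k, v []byte, err error)
	Prev() (k, v []byte, err error)
}

// errNoMoreKeys is a local stand-in for the ErrNoMoreKeys error.
var errNoMoreKeys = errors.New("no more keys")

// sliceCursor is a toy in-memory cursor used only to exercise the loop.
type sliceCursor struct {
	keys, values [][]byte
	pos          int
	seeked       bool
}

// at positions the cursor on index i, or reports that no key is there.
func (c *sliceCursor) at(i int) ([]byte, []byte, error) {
	if i < 0 || i >= len(c.keys) {
		return nil, nil, errNoMoreKeys
	}
	c.pos, c.seeked = i, true
	return c.keys[i], c.values[i], nil
}

func (c *sliceCursor) First() ([]byte, []byte, error) { return c.at(0) }
func (c *sliceCursor) Last() ([]byte, []byte, error)  { return c.at(len(c.keys) - 1) }

func (c *sliceCursor) Next() ([]byte, []byte, error) {
	if !c.seeked {
		return nil, nil, errors.New("cursor not positioned")
	}
	return c.at(c.pos + 1)
}

func (c *sliceCursor) Prev() ([]byte, []byte, error) {
	if !c.seeked {
		return nil, nil, errors.New("cursor not positioned")
	}
	return c.at(c.pos - 1)
}

// keysInReverse seeks to the last key and walks backward until an error.
func keysInReverse(c cursor) []string {
	var out []string
	for k, _, err := c.Last(); err == nil; k, _, err = c.Prev() {
		out = append(out, string(k))
	}
	return out
}

func main() {
	c := &sliceCursor{
		keys:   [][]byte{[]byte("a"), []byte("b"), []byte("c")},
		values: [][]byte{[]byte("1"), []byte("2"), []byte("3")},
	}
	fmt.Println(keysInReverse(c)) // prints [c b a]
}
```

Because the loop stops on any error, it should distinguish ErrNoMoreKeys from other errors if the caller needs to tell a normal end of iteration from a failure.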

###Snapshots

A snapshot refers to the collective state of all buckets + data + history. Each commit (tree.Commit() or Commit(tree1, tree2 .....)) creates a new snapshot in the store. Each snapshot is identified by an incrementing uint64 number; 0 refers to the most recent snapshot. Snapshots can be used to access the state of the entire database at any point in time.

Example code:

package main

import "fmt"
import "github.com/deroproject/graviton"

func main() {
   key := []byte("key1")
   //store, _ := graviton.NewDiskStore("/tmp/testdb")   // create a new testdb in "/tmp/testdb"
   store, _ := graviton.NewMemStore()          // create a new  DB in RAM
   ss, _ := store.LoadSnapshot(0)         // load most recent snapshot
   tree, _ := ss.GetTree("root")          // use or create tree named "root"
   tree.Put(key, []byte("commit_value1")) // insert a value
   commit1, _ := graviton.Commit(tree)         // commit the tree
   tree.Put(key, []byte("commit_value2")) // overwrite existing value
   commit2, _ := graviton.Commit(tree)         // commit the tree again

   // at this point, you have done 2 commits
   // at first commit or snapshot,  "root" tree contains  "key1 : commit_value1"
   // at second commit or snapshot,  "root" tree contains  "key1 : commit_value2"

   // we will traverse now commit1 snapshot
   ss, _ = store.LoadSnapshot(commit1)
   tree, _ = ss.GetTree("root")
   value, err := tree.Get(key)
   fmt.Printf(" snapshot%d  key %s value %s err %v\n", ss.GetVersion(), string(key), string(value), err)

   // we will traverse now commit2 snapshot
   ss, _ = store.LoadSnapshot(commit2)
   tree, _ = ss.GetTree("root")
   value, err = tree.Get(key)
   fmt.Printf(" snapshot%d  key %s value %s err %v\n", ss.GetVersion(), string(key), string(value), err)
}

###Diffing of 2 trees to detect changes between versions or compare 2 arbitrary trees in linear time

Two arbitrary trees can be diffed in linear time to detect changes. Changes are of 3 types: insertions, deletions, and modifications (same key but changed value). If the reported changes are applied to the base tree, the result is equivalent to the head tree being compared.

func Diff(base_tree, head_tree *Tree, deleted, modified, inserted DiffHandler) (err error)

DiffHandler is a callback function of the following type, taking k and v as arguments:

type DiffHandler func(k, v []byte)

The algorithm runs in time linear in the number of changes. For example, a tree with a billion KVs can be diffed against its parent almost instantaneously.
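
The three-callback contract, and the claim that replaying the changes on the base yields the head, can be illustrated without a store. The sketch below uses plain Go maps and a naive comparison that visits every key, so it does not reproduce Graviton's algorithm or its cost. diffMaps and applyDiff are hypothetical helpers, and which value each callback receives (old value for deletions, new value for modifications and insertions) is an assumption made for this example.

```go
package main

import "fmt"

// DiffHandler has the same shape as the callback type shown above.
type DiffHandler func(k, v []byte)

// diffMaps is a naive stand-in for a tree diff: it scans both maps fully
// and reports each difference through the matching callback.
func diffMaps(base, head map[string]string, deleted, modified, inserted DiffHandler) {
	for k, bv := range base {
		hv, ok := head[k]
		switch {
		case !ok:
			deleted([]byte(k), []byte(bv)) // key exists only in base
		case hv != bv:
			modified([]byte(k), []byte(hv)) // same key, changed value
		}
	}
	for k, hv := range head {
		if _, ok := base[k]; !ok {
			inserted([]byte(k), []byte(hv)) // key exists only in head
		}
	}
}

// applyDiff copies base and replays the reported changes onto the copy.
func applyDiff(base, head map[string]string) map[string]string {
	out := make(map[string]string, len(base))
	for k, v := range base {
		out[k] = v
	}
	set := func(k, v []byte) { out[string(k)] = string(v) }
	del := func(k, _ []byte) { delete(out, string(k)) }
	diffMaps(base, head, del, set, set)
	return out
}

func main() {
	base := map[string]string{"a": "1", "b": "2", "c": "3"}
	head := map[string]string{"a": "1", "b": "20", "d": "4"}
	fmt.Println(applyDiff(base, head)) // prints map[a:1 b:20 d:4]
}
```

With real trees, the same three handlers would be passed to Diff in the order deleted, modified, inserted, as in the signature above.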

Database backups

You can use cp (or any copy command) or rsync to back up a Graviton database, even while the database is in use. However, because the database may be continuously appending, the backup will always lag a bit. Note that the database and its backups will NEVER get corrupted while commits are being done.

###Stress Testing

A mini tool for single-threaded stress testing is available. It can be used to run various tests on the memory or disk backend.

go run github.com/deroproject/graviton/cmd/stress

See help using the --help argument. To use the disk backend, use --memory=false

Internals

Internally, all trees are stored within a base-2 Merkle tree with collapsing paths. This means that if a tree has 4 billion key-value pairs, it will only be 32 levels deep. This leads to tremendous savings in storage space. It also means that when you modify an existing key-value pair, only a limited number of nodes are touched.
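
The 32-level figure follows from simple arithmetic: a binary tree needs about log2(n) levels to give n leaves distinct paths, and log2 of 4 billion is just under 32. The sketch below computes that idealized depth. It assumes keys are spread evenly across the tree, and minDepth is a hypothetical helper, not a Graviton function; the real depth depends on how the keys are distributed.

```go
package main

import (
	"fmt"
	"math/bits"
)

// minDepth returns ceil(log2(n)): the number of levels a perfectly balanced
// binary tree needs so that each of n leaves has its own path.
func minDepth(n uint64) int {
	if n <= 1 {
		return 0
	}
	return bits.Len64(n - 1)
}

func main() {
	fmt.Println(minDepth(4_000_000_000)) // prints 32
}
```

Under the same assumption, changing one value rewrites only the nodes on that key's path, which is roughly this many nodes.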

###Lines of Code

~/tools/gocloc --by-file node_inner.go tree.go snapshot.go proof.go node_leaf.go store.go node.go hash.go const.go doc.go diff_tree.go cursor.go
-----------------------------------------------------------------
File                    files       blank     comment        code
-----------------------------------------------------------------
node_inner.go                          76          33         364
store.go                               69          22         250
tree.go                                75          71         250
proof.go                               30          16         171
snapshot.go                            36          18         155
node_leaf.go                           29           3         150
diff_tree.go                           34          33         133
cursor.go                              21          15         106
node.go                                 5           3          35
const.go                                4           0          21
hash.go                                 7           2          19
doc.go                                 16          42           1
-----------------------------------------------------------------
TOTAL                      12         402         258        1655
-----------------------------------------------------------------

##ToDo

  • Currently it is not optimized for speed or GC (garbage collection)
  • More testing should be done
  • Expose/build metrics
  • Currently, we have an error-reporting API that reports bit rot. But what about disk corruption? Should we discard this error design and make the API simpler (no errors except for snapshots, tree loading, and committing)? This needs major discussion.

Comparison with other databases

Postgres, MySQL, & other relational databases

Relational databases structure data into rows and are only accessible through the use of SQL. This approach provides flexibility in how you store and query your data but also incurs overhead in parsing and planning SQL statements. Graviton accesses all data by a byte slice key. This makes Graviton fast at reading and writing data by key but provides no built-in support for joining values together.

Most relational databases (with the exception of SQLite) are standalone servers that run separately from your application. This gives your systems flexibility to connect multiple application servers to a single database server but also adds overhead in serializing and transporting data over the network. Graviton runs as a library included in your application, so all data access has to go through your application's process. This brings data closer to your application but limits multi-process access to the data. Also, none of these databases provides the ability to go back in time to each and every commit. Graviton is the only one of them that provides this, and it can also diff 2 trees in linear time.

LevelDB, RocksDB

LevelDB and its derivatives (RocksDB, HyperLevelDB) are similar to Graviton in that they are libraries bundled into the application, however, their underlying structure is a log-structured merge-tree (LSM tree). An LSM tree optimizes random writes by using a write ahead log and multi-tiered, sorted files called SSTables. Graviton uses a base 2 merkle tree internally. Both approaches have trade-offs.

If you require high random write throughput or you need to use spinning disks, then LevelDB could be a good choice. If your application is read-heavy, does a lot of key-value lookups, needs versioning (and traversing history), or needs authenticated data, then Graviton could be a good choice.

LMDB, Bolt

LMDB and Bolt are architecturally similar. Both use a B+tree, have ACID semantics with fully serializable transactions, and support lock-free MVCC using a single writer and multiple readers.

The two projects have somewhat diverged. LMDB heavily focuses on raw performance while Bolt has focused on simplicity and ease of use. For example, LMDB allows several unsafe actions such as direct writes for the sake of performance. Bolt opts to disallow actions which can leave the database in a corrupted state. The only exception to this in Bolt is DB.NoSync. Graviton does not leave the database in a corrupted state at any point in time.

Neither supports versioning, snapshotting, diffing, etc., while Graviton supports them.