Design Principles
From the many programming projects (often API's) I have created in the past, I observed that I always need to persist data to disk in order to query/update it later. Quite frequently, I found myself (re)creating quick & dirty persistance logic. I usually store everything as JSON on disk. I don't like to use SQL databases if I don't have to.
Of course I could use something like Memcached or Redis. Or even a SQL database service such as PostgreSQL. To be honest, Memcached or Redis would have been a better solution.
Having discussed reinventing the wheel and that I like doing it, let's focus on the design principles and the capabilities that I need from db.js
:
- In-Memory: Recently stored data should be kept in an in-memory cache, since recent data is read and updated way more frequently than old data. This observation is paramount!
- Key-Value semantics: I like to associate the stored object with an unique key. Therefore, I like to work with key-value storages.
- JSON Format: I like to store data as JSON in files, since the performance benefits of other data formats don't outweight the easiness to work with JSON. Put differently: I just don't have the time to learn any other data format than JSON. JSON is easily readable and that's what matters most. Everyone understands JSON. There are other things such BSON, but no one really cares about it.
- Persistance: I don't want to care about when/why/where to persist data. This should be done by
db.js
in the background in a safe and consistent manner. Data is persisted to simple JSON files after the memory-cache reaches a certain age or size. - No SQL required: No complex SQL query semantic is needed. In fact, the only way I need to query data is:
- base on a key with lookup time
O(1)
- based on a time range
(ts0, ts1)
wherets0
andts1
are both timestamps - based on an index range
(start, stop)
wherestart
andstop
are both integers - if I don't specify any selection criteria, then
db.js
should just return the memory cache contents (lookup timeO(1)
)
- base on a key with lookup time
- Data does not need to be deleted: I don't care about deleting data. Delete operations are hard to implement, since a delete operation requires an index and reverse index update. In fact, providing a delete operation doesn't outweigh the complexity introduced by its implementation.
Quick Start & Usage
db.js
allows you to work with key/value data without caring about data persistance and storage. All that db.js
gives you is a key/value store. db.js
persists data to disk as JSON files periodically and safely. Currently db.js
does not run as a daemon, it will shut down safely when the process is terminated.
Installation:
git clone https://github.com/NikolaiT/db.js
cd db.js/
Simple Usage:
const DBjs = require('./dbjs').DBjs;
let db_js = new DBjs();
db_js.set('4343', {'name': 'test'});
console.log(db_js.get('4343'));
db_js.close();
Of course db.js
has many different configuration options that you can use:
const DBjs = require('./dbjs').DBjs;
const config = {
// after what size in MB the memory cache should be persisted to disk
persist_after_MB: 20,
// after what time in seconds the memory cache should be persisted to disk
persist_after_seconds: 12 * 60 * 60,
// absolute/relative path to database directory
database_path: '/tmp/database/',
// path to file where to log debug outputs to
logfile_path: '/tmp/dbjs.log',
// after how many seconds should the cache and index be persisted
flush_interval: 5 * 60,
// file prefix for archived files
file_prefix: 'dbjs_',
// whether to print debug output
debug: false,
// max key size in bytes
max_key_size_bytes: 1024,
// max value size in bytes
max_value_size_bytes: 1048576,
};
let db_js = new DBjs(config);
db_js.set('someKey', 'someValue');
console.log(db_js.get('someKey'));
db_js.close();
More Realistic Example
Obviously, using db.js
like above does not make much sense. I use db.js
mostly as data storage for the many web API's I am creating. Therefore, the following example using express is more useful:
const express = require('express')
const DBjs = require('./dbjs').DBjs
const config = {
// absolute/relative path to database directory
database_path: '/tmp/exampleDatabase/',
// path to file where to log debug outputs to
logfile_path: '/tmp/example.log',
// whether to print debug output
debug: true,
};
let db_js = new DBjs(config);
const app = express()
const port = 3000
function randomString(length = 100) {
let str = '';
for (let i = 0; i < length; i++) {
str += String.fromCharCode(Math.floor(65 + Math.random() * 25));
}
return str;
}
// Setting keys: http://localhost:3000/set?key=alpha&value=beta
// Getting value for a key: http://localhost:3000/get?key=alpha
app.all('/set', (req, res) => {
res.header('Content-Type', 'application/json');
if (req.query.key === undefined) {
return res.status(400).send({ msg: 'you must provide a key' })
}
let key = req.query.key;
let value = undefined;
if (req.query.value) {
value = req.query.value
}
if (req.body && req.body.value) {
value = req.body.value;
}
if (value === undefined) {
return res.status(400).send({ msg: 'you must provide a value' })
}
db_js.set(key, value);
return res.status(200).send({ msg: 'ok' })
})
app.get('/get', (req, res) => {
res.header('Content-Type', 'application/json');
if (req.query.key === undefined) {
return res.status(400).send({ msg: 'you must provide a key' })
}
let key = req.query.key;
return res.status(200).send(db_js.get(key))
})
app.get('/get_all', (req, res) => {
res.header('Content-Type', 'application/json');
return res.status(200).send(JSON.stringify(db_js._getn(100000), null, 2))
})
app.get('/insert_random', (req, res) => {
res.header('Content-Type', 'application/json');
if (req.query.num === undefined) {
return res.status(400).send({ msg: 'you must provide the number of random values to insert with the key `num`' })
}
let num = parseInt(req.query.num);
for (let i = 0; i < num; i++) {
db_js.set(randomString(5), randomString(30));
}
return res.status(200).send({ msg: 'ok' })
})
app.listen(port, () => {
console.log(`Example db.js app listening on port ${port}`)
})
db.js API
The db.js API currently has five main API methods:
set(key, value)
set(key, value)
- Assigns the value
to the key
in the storage. If the key
is already in the database, the value will be overwritten. keys are unique.
get(key)
get(key)
- Returns the value
associated with key
from the storage. The lookup time is O(1)
.
getn(index_range, time_range)
getn(index_range, time_range)
- Returns an array of values in insertion order. This means that the most recent inserted value (Inserted with set(key, value)
) is returned as first element of the array. When both index_range=null
and time_range=null
are set to null
, then getn()
returns the memory cache contents by default.
The variable index_range
selects values to be returned by index range. If you specify index_range=[0, 500]
, then the last 500 inserted values are returned.
The variable time_range
selects values to be returned by an timestamp range. If you specify time_range=[1649418657952, 1649418675192]
, then the items that were inserted between those two timestamps will be returned.
index_size()
index_size()
- Returns the index size of the database. This is equivalent to the number of all database entries and thus the size of the database.
cache_size()
cache_size()
- Returns the cache size of the database. The cache includes all database entries that are kept in memory.