Cloud Backed SQLite

The “Cloud Backed SQLite” (CBS) system allows databases to be stored within
cloud storage accounts such that they can be read and written by storage
clients without first downloading the entire database to the client. This
document should be read in concert with the detailed API documentation present
in the two main public header files:

    blockcachevfs.h
    bcvutil.h

Databases may be accessed concurrently by multiple clients, although ensuring
that only a single client is writing to a database at any time is (mostly) left
to the application. A “client” is a single operating system process – within
a single client process there may be multiple SQLite handles reading and
writing a database, using the usual SQLite WAL-mode read/write locking to
manage concurrent access between themselves. Existing clients do not see
changes made by writers automatically; they are obliged to explicitly poll
cloud storage to do so.

The system currently supports
Azure Blob Storage
and Google Cloud Storage. It also features
an API that may be used to implement support for other
cloud storage systems.

The software is currently developed on Linux with stock gcc, and on Windows-10
using the MSVC-compatible mingw64 gcc compiler. Other platforms are quite
likely to work too, but have not been tested.

1.1. Storage Format

SQLite databases are not stored within the blob storage system in the usual
format. Instead, each database is divided into one or more fixed-size blocks.
The default block-size is 4MB, although this is configurable. Each block is
assigned an id between 16 and 32 bytes (128 and 256 bits) in size (all blocks
in a single container have the same sized ids). The name of each block file
is created by encoding its id in hexadecimal format and adding the extension
“.bcv”. For example:

  787A5B026DBF882F89748C37AED04CCD.bcv

For containers that use ids smaller than 24 bytes in size, each block id is
randomly generated. For containers that use ids 24 bytes or larger, the first
16 bytes of each id may contain the md5 hash of the block contents, with
the remainder of the id made up of pseudo-random values. See the
section on block sharing below for further details.

Along with the block files is a “manifest” file, which contains, amongst other
things, a machine-readable description of how the various block files may be
assembled into one or more SQLite databases. The name of the manifest file is
always:

  manifest.bcv

There is also an SQLite database used to store the contents of the “bcv_kv”
virtual table (described below), named:

  bcv_kv.bcv

At present, manifest and block files may only be stored in the root directory
of a blob storage container, not in any sub-directory. This means that each
container contains at most a single manifest file. It may be that this
limitation can be removed if desirable. Other files may be stored in a
container along with the CBS files, but they must not use the “.bcv” extension.
Files that end with “.bcv” may be mistaken for surplus block or other files and
deleted or overwritten by CBS clients.

Note: CBS documentation uses the Azure terminology “container” to refer
to the cloud storage virtual receptacle in which block files and manifests are
stored. In Google Storage documentation the analogous concept is a “bucket”.
Other cloud storage systems may use their own vernacular.

1.2. System Components

There are three components to the system:

  1. Primitives to:
    • Create blob storage containers and populate them
      with empty manifest files (manifest files indicating that the
      container contains no databases).
    • Destroy entire blob storage containers and their contents.
    • Upload databases to cloud storage.
    • Download databases from cloud storage.
    • Create copies of existing databases within cloud storage.
    • Delete databases from cloud storage.
    • List the databases within a cloud storage container.
    • Clean up (remove) unused blocks from cloud storage.

    Each primitive is available both as a C API that can be called by
    applications, and as a command-line tool.

  2. A daemon process that, while running, provides local SQLite
    database clients in the same or other OS processes with read-only
    access to databases stored in remote blob storage containers.

  3. A VFS module that may be used in two modes, as follows:
    • For read-only access of cloud databases via a running daemon process,
      or
    • For read/write access of cloud databases in “daemonless” mode.

    The advantage of using a daemon over daemonless mode is that a single
    local cache of downloaded database blocks may be shared by multiple
    processes. In daemonless mode, each process is obliged to maintain its
    own separate cache.

Both the daemon and command line tools are packaged as a single binary
executable – “blockcachevfsd” – which may be built as described below.

2.1. Adding Application Support

To add support for the VFS and various primitives, the following C files from
the source code distribution must be built into the application:

    bcvutil.c
    bcvmodule.c
    blockcachevfs.c
    simplexml.c
    bcvencrypt.c

The following header files from the source code distribution are required:

    bcvutil.h
    bcvmodule.h
    blockcachevfs.h
    simplexml.h
    bcv_int.h
    bcvencrypt.h

As well as SQLite itself, the application must be linked against libcurl and
OpenSSL.

The application should not include “bcv_int.h”, “simplexml.h” or
“bcvencrypt.h” directly. They are required by the build, but are not intended
to be used directly by applications. The other three header files are intended
for use by applications; they contain declarations for APIs that provide the
following functionality:

  • Header file blockcachevfs.h
    contains APIs required to use the VFS module to access cloud databases
    via usual SQLite interfaces.
  • Header file bcvutil.h
    contains APIs required to use the various cloud storage primitives.
  • Header file bcvmodule.h
    contains APIs required to extend blockcachevfs to access new
    cloud storage systems (other than Azure Blob Storage and Google Cloud
    Storage, for which there is built-in support).

2.2. Building blockcachevfsd

The easiest way to build the “blockcachevfsd” executable that contains both
the daemon and the command line tools is to use the standard:

  ./configure && make

Alternatively, it may be built from all of the sources listed above as
required for adding blockcachevfs support to an application, along with file
“blockcachevfsd.c”, which contains the main routine.
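
For example, on Linux, a manual build might look something like the following
sketch. It assumes the SQLite amalgamation (sqlite3.c) is present in the same
directory; the exact set of libraries required varies by platform:

  cc -o blockcachevfsd \
      bcvutil.c bcvmodule.c blockcachevfs.c simplexml.c bcvencrypt.c \
      blockcachevfsd.c sqlite3.c \
      -lcurl -lssl -lcrypto -lpthread -ldl -lm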

2.3. Testing the System

The blockcachevfs system has an automated test suite. Notes describing its
use may be found here.

3.1. Uploading Databases

This section illustrates how to upload databases to a blob storage account
and begin using them. It assumes that the cloud storage module name and
authentication details – for example, an account name and access key – are
stored in a file named “account.txt” in the current directory. Example
account.txt contents for Azure Blob Storage:

  -module azure
  -user devstoreaccount1
  -auth Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==

The command-line options above can just as easily be specified directly on the
command-line. But storing them in a file and then using the “-f” option,
supported by all commands, to treat the contents of the named file as if they
were specified directly on the command-line can be less cumbersome.

To begin, create a container and an empty manifest to which databases may be
uploaded, where $CONTAINER is the container name:

  $ blockcachevfsd create -f account.txt $CONTAINER

If the named blob storage container already exists, CBS will clobber any
manifest file within it with a new, empty, manifest file, effectively
deleting any databases in the container.

Next, upload one or more databases from the local file system to the blob
storage account. In the following, $LOCALDB is an absolute or relative path
to an SQLite database file and $DBNAME is the name used to access the
database after it has been uploaded. It is an error if there is already
a database named $DBNAME in the manifest:

  $ blockcachevfsd upload -f account.txt -container $CONTAINER $LOCALDB $DBNAME

3.2. Accessing Databases

This section explains, by way of example code, how to access a blockcachevfs
database stored in cloud storage using an SQL client. The full program is
available here. It should be read together with
the API documentation in file blockcachevfs.h.
The example code implements a program that accepts four arguments on the
command line:

  • A path to an existing directory to use for a blockcachevfs cache
    directory,
  • The name of a container to attach to the new blockcachevfs VFS
    the program creates,
  • A path to a database to open using the blockcachevfs VFS, and
  • An SQL script to evaluate against that database.

The program creates a VFS, attaches the container, opens a database handle
and then executes the supplied SQL script. The code as presented below omits
several uninteresting features for brevity – there is no main() routine for
example. But the full version, linked above, contains everything required
for a working application.

The following block contains the required #include directives and the
authentication callback for the example application. All applications
require an authentication callback, as there is no other way to provide
authentication tokens to the system. And there are currently no
implementations of cloud storage modules that do not require authentication
tokens.

/*
** The code in this file creates and uses a VFS, but it doesn't use any 
** cloud storage primitives or implement a new cloud storage module, so 
** it only needs to include "blockcachevfs.h". And "sqlite3.h" of course.
*/
#include "blockcachevfs.h"
#include "sqlite3.h"

/*
** This program is hardcoded to access an Azure emulator running on port
** 10000 (the default for Azurite) of the localhost. It is also hardcoded
** to the demo account - the account built-in to the emulator with the
** well-known credentials reproduced below. 
*/ 
#define CS_STORAGE "azure?emulator=127.0.0.1:10000"
#define CS_ACCOUNT "devstoreaccount1"
#define CS_KEY "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=="

/*
** Authentication callback. A real application would return a different 
** authentication token based on the storage system, account name and 
** container name parameters, but since the credentials used by this
** application are hard coded, it just returns a copy of constant string
** CS_KEY.
**
** Because the API is defined such that this function must return a buffer
** allocated using sqlite3_malloc() or compatible, this implementation
** uses sqlite3_mprintf() to make a copy of the authentication token.
*/
static int csAuthCb(
  void *pCtx,
  const char *zStorage,
  const char *zAccount,
  const char *zContainer,
  char **pzAuthToken
){
  *pzAuthToken = sqlite3_mprintf("%s", CS_KEY);
  return (*pzAuthToken) ? SQLITE_OK : SQLITE_NOMEM;
}

The next block contains the start of the cloudsql() function, which does the
bulk of the work for this application. It begins by creating the VFS object.
Note that, like almost all other blockcachevfs APIs, if an error occurs the
sqlite3_bcvfs_create() function returns an error message in a buffer that
must be freed by the caller using sqlite3_free().

Once the VFS has been created successfully, the application can use the
sqlite3_bcvfs_isdaemon() API to see if the VFS has connected to a daemon or
is running in daemonless mode.

If, when sqlite3_bcvfs_create() is called, there is a daemon process using the
specified directory as its cache directory, then the VFS created by the
sqlite3_bcvfs_create() call automatically connects to the daemon and
provides read-only access. The daemon can be started using the following
command line:

    blockcachevfsd daemon $DIRECTORY

where $DIRECTORY above must be the same string as passed to cloudsql()
function below via the zDir parameter. If a daemon process was using the
directory for its cache directory, but has since exited, the
sqlite3_bcvfs_create() call will fail.

If there has never been a daemon running in the directory, then the call to
sqlite3_bcvfs_create() creates a VFS running in daemonless mode. In this mode
it requires exclusive access to the directory. If there is some other
daemonless VFS already running in the specified directory, the call to
sqlite3_bcvfs_create() fails with SQLITE_BUSY.

/*
** Open a VFS that uses directory zDir as its cache directory. Then attach
** container zCont. Next, open an SQLite handle on path zPath using the new 
** VFS and execute SQL script zSql.
*/
static int cloudsql(
  const char *zDir,               /* Directory to use for blockcachevfs cache */
  const char *zCont,              /* Container to attach */
  const char *zPath,              /* Path to open */
  const char *zSql                /* SQL to execute */
){
  int rc = SQLITE_OK;             /* Error code */
  char *zErr = 0;                 /* Error message */
  sqlite3_bcvfs *pVfs = 0;        /* VFS handle */
  sqlite3 *db = 0;                /* Database handle open on zPath */

  /* Create a VFS object. Directory zDir must already exist. If it exists
  ** and there is a daemon running in that directory, the new VFS connects
  ** to the daemon for read-only access. Or, if there is no such daemon,
  ** the new VFS will provide read-write daemonless access.  */
  rc = sqlite3_bcvfs_create(zDir, "myvfs", &pVfs, &zErr);

  /* Check if this is a daemon VFS or not */
  if( rc==SQLITE_OK ){
    if( sqlite3_bcvfs_isdaemon(pVfs) ){
      printf("VFS is using a daemon\n");
    }else{
      printf("VFS is in daemonless mode\n");
    }
  }

Following this, assuming it was created successfully, the code connects the
authorization callback to the new VFS, and then “attaches” the cloud
storage container specified by the user to the VFS.

Before the databases in a cloud storage container can be accessed, the
container must be attached to the VFS. Once it has been attached, SQL clients
may access the databases within the container by opening paths of the form
“/$CONTAINER/$DATABASE” using the blockcachevfs VFS.

  /* Configure the authorization callback. */
  if( rc==SQLITE_OK ){
    sqlite3_bcvfs_auth_callback(pVfs, 0, csAuthCb);
  }

  /* Attach the container. Specify the SQLITE_BCV_ATTACH_IFNOT flag so that
  ** it is not an error if the container is already attached. 
  **
  ** There are two reasons the container might already be attached, even
  ** though the VFS was only just created. Firstly, if this VFS is connected
  ** to a running daemon process, then some other client may have already
  ** attached the container to the daemon. Secondly, VFS objects store their
  ** state in the cache directory so that if they are restarted, all
  ** containers are automatically reattached. So if this (or some other
  ** blockcachevfs application) has run before specifying the same 
  ** cache directory, the container may already be attached.  */
  if( rc==SQLITE_OK ){
    rc = sqlite3_bcvfs_attach(pVfs, CS_STORAGE, CS_ACCOUNT, zCont, 0,
        SQLITE_BCV_ATTACH_IFNOT, &zErr
    );
  }

Once the container is attached to the VFS, the database handle is opened
and the SQL script executed. Before the script is executed though,
sqlite3_bcvfs_register_vtab() is called to make the virtual tables available
to the database handle.

For the sake of brevity, the code below omits the sqlite3_exec() callback
that is part of the full version.

  /* Open a database handle on a cloud database. */
  if( rc==SQLITE_OK ){
    rc = sqlite3_open_v2(zPath, &db, SQLITE_OPEN_READWRITE, "myvfs");
    if( rc!=SQLITE_OK ){
      zErr = sqlite3_mprintf("%s", sqlite3_errmsg(db));
    }
  }

  /* Enable the virtual table interface. */
  if( rc==SQLITE_OK ){
    rc = sqlite3_bcvfs_register_vtab(db);
  }

  /* Execute the provided SQL script. */
  if( rc==SQLITE_OK ){
    rc = sqlite3_exec(db, zSql, 0, 0, &zErr);
  }

Finally, the cloudsql() function cleans up the two handles it allocated,
outputs any error message and returns.

  sqlite3_close(db);
  sqlite3_bcvfs_destroy(pVfs);

  /* Output any error, free any error message and return. */
  if( rc!=SQLITE_OK ){
    fprintf(stderr, "Error: (%d) %s\n", rc, zErr);
  }
  sqlite3_free(zErr);
  return rc;
}

3.3. Secure Containers

When running in daemon mode, containers may be attached either securely or
insecurely. By default, containers are attached to a daemon insecurely. This
means that:

  • Any local process that can connect to the daemon’s listening socket
    may read and write the database, and that

  • Even after the daemon process has stopped, any cached parts of
    databases in attached containers may be read directly from the
    cache file.

When containers are attached securely, all data stored in the cache file
is encrypted using 128-bit AES OFB encryption. Local clients obtain the
encryption key from the daemon process by supplying valid cloud storage
credentials – a client is granted the encryption key only if it can prove
that it already has permission to access the database within cloud storage.
Cloud storage credentials are verified only once, when the client first
connects, so it is quite possible for a client to remain connected to a
daemon process and to continue accessing a database long after its supplied
credentials have expired.

A container is nominated as secure when it is first attached to a daemon,
by specifying the SQLITE_BCV_ATTACH_SECURE flag along with the
sqlite3_bcvfs_attach() call.

There are two known disadvantages to attaching containers securely:

  • Blocks are stored unencrypted in cloud storage. This means that as
    well as local clients encrypting and decrypting individual database
    pages as they are read and written, each time a block is downloaded the
    entire block must be encrypted before it can be stored in the cache
    file. Under some circumstances this overhead may be significant.

  • If the daemon process crashes or is killed after data is written
    to a secure container database but before said data has been uploaded
    to cloud storage, the data is lost.

4. Cloud Storage Modules

All CBS APIs and command-line tools accept a “module” specification, which
determines the cloud storage system that the system operates on and how
it operates. There are two built-in modules: “azure”, which works with
Azure Storage, and “google”, which works with Google Cloud Storage. There
is also an API that may be used to add new cloud storage modules.

A module specification consists of the module name optionally followed by
one or more URI style key/value parameters. For example:

  azure
  azure?sas=1
  azure?emulator=127.0.0.1:10000&sas=1

There is no default module. It must always be specified explicitly.

As well as a module specification, CBS APIs and command-line tools require
a user-name and an authentication value – a string containing cloud storage
authentication information. Exactly how these are interpreted depends on the
module specification. The values expected by the two built-in modules, and how
they are used, are described in the following two sections.

4.1. The Built-In “azure” Module

The “azure” module is used to interface with Azure Storage. The value provided
as a user-name is used as the Azure user name when connecting. The
authentication value may be either the corresponding account access-key
(shared-key access) or an Azure Shared Access Signature (SAS) token, depending
on the URI style parameters provided along with the module name.

One way to generate an SAS token for Azure is to use the “az” command line
tool. For example, given account $ACCOUNT with access-key $ACCESSKEY, to
generate an SAS token that allows read/write access to container $CONTAINER and
expires on the 15th of April, 2020:

  az storage container generate-sas -o tsv \
      --account-name $ACCOUNT \
      --account-key $ACCESSKEY \
      --expiry 2020-04-15 \
      --name $CONTAINER \
      --permissions dlrw

The daemon process and some command-line commands may also use read-only SAS
tokens. To generate a read-only token, the --permissions option in the command
above should be passed “lr” instead of “dlrw”.
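
For example, to generate the read-only equivalent of the token created by
the command above:

  az storage container generate-sas -o tsv \
      --account-name $ACCOUNT \
      --account-key $ACCESSKEY \
      --expiry 2020-04-15 \
      --name $CONTAINER \
      --permissions lr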

The following URI style parameters may be used to modify the behaviour of the
azure module:

  emulator
      If this option is present, it indicates that CBS should connect to an
      Azure emulator instead of to a real Azure cloud storage account. Both
      the open source azurite and the legacy emulator are supported. The
      value of this option should be set to the “host:portnumber” address
      of the port on which the emulated Azure blob service is running. If
      using either supported emulator on the local machine with default
      options, this address is “127.0.0.1:10000”.

  sas
      This option must be set to either “0” (the default) or “1”. If it is
      set to 1, then the module expects the provided authentication value
      to be an SAS token. Otherwise, it assumes it to be an access-key.

  customuri
      This option may only be present if “sas=1” is also specified.
      Similarly, it may only be set to “0” (the default) or “1”. If it is
      set to 1, then a base URI for the container must be specified in
      place of the account name parameter. If the specified container name
      is NULL or a zero-length string, then the base URI is used as-is as
      the URI for the remote container. Otherwise, the specified container
      name is appended as a directory component to the base URI. For
      example, if the string value passed as the account name for a
      “customuri=1” module is:

        https://photos.contoso.com

      and the container name is “myforms”, then the URI used to access the
      container manifest is:

        https://photos.contoso.com/myforms/manifest.bcv

      Alternatively, the same container can be accessed by specifying a
      NULL container name and:

        https://photos.contoso.com/myforms

      as the custom URI value.

For example, to connect to a real Azure account using an SAS token for
authentication purposes:

  azure?sas=1

To connect to an Azure blob emulator running on localhost port 10000 (the
default) using an access-key for authentication, either of the following
may be used:

  azure?emulator=127.0.0.1:10000
  azure?emulator=127.0.0.1:10000&sas=0

Or, to connect to a real Azure account using an access-key for authentication:

  azure

may be used.

4.2. The Built-In “google” Module

The “google” module is used to connect CBS to Google Cloud Storage. The
user-name parameter should be passed the project-id for the project that owns
the storage bucket or buckets to be operated on. In practice, this is only
required by CBS’s “create” primitive, and only then if it needs to create a
new cloud storage container (bucket). In all other cases, an empty string may
be specified for the CBS user-name parameter.

The authorization value parameter must be passed the text of a Google Cloud
access token.

The google module does not accept any URI style parameters.

5.1. Concurrent Access From Multiple Clients

The system supports any number of concurrent readers using either daemonless
or daemon mode. Even though no locks are taken, read transactions are not
usually interfered with, even if a new version of the database is pushed by
another host while they are ongoing. This is because block files are never
modified after they are created; new block files are uploaded for each change.
And old block files are not deleted immediately (see below), so ongoing read
transactions can usually continue to access them for as long as required.

If a database is modified within the cloud storage system, the modifications
do not become automatically visible to existing clients. Until the container
is “polled”, clients continue to see the old versions of all databases. A
container may be polled either by calling the sqlite3_bcvfs_poll() API:

    int sqlite3_bcvfs_poll(sqlite3_bcvfs *pFs, const char *zCont, char **pzErr);

or the equivalent PRAGMA interface. To be clear, any
changes made are immediately visible to other SQL clients using the same
read-write daemonless VFS object, but are not visible to SQL clients using
other VFS objects until the container has been polled.
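
For example, the following minimal sketch polls a single attached container
and reports any error. The helper name pollContainer() is hypothetical:

    #include <stdio.h>
    #include "blockcachevfs.h"
    #include "sqlite3.h"

    /*
    ** Poll container zCont on VFS pFs, so that this client sees any new
    ** database versions uploaded by other clients. The error message
    ** returned by sqlite3_bcvfs_poll() must be freed with sqlite3_free().
    */
    static int pollContainer(sqlite3_bcvfs *pFs, const char *zCont){
      char *zErr = 0;
      int rc = sqlite3_bcvfs_poll(pFs, zCont, &zErr);
      if( rc!=SQLITE_OK ){
        fprintf(stderr, "poll failed (%d): %s\n", rc, zErr ? zErr : "?");
      }
      sqlite3_free(zErr);
      return rc;
    }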

Similarly, after changes are made to a cloud storage database via a read-write
daemonless VFS object, the new version of the database is not automatically
uploaded to cloud storage. In order to upload changes to cloud storage so
that they are available to new clients or existing clients after they have
polled the container, use the sqlite3_bcvfs_upload() API:

    int sqlite3_bcvfs_upload(
      sqlite3_bcvfs *pFs,             /* VFS handle */
      const char *zCont,              /* Container (alias) to upload databases of */
      int (*xBusy)(void*,int),        /* Busy-handler callback */
      void *pBusyArg,                 /* First argument passed to xBusy */
      char **pzErr                    /* OUT: Error message */
    );

or the equivalent PRAGMA interface.
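
For example, the following sketch uploads all local changes in a container.
The helper names are hypothetical, and the busy-handler semantics are an
assumption here – this sketch treats the callback like sqlite3_busy_handler(),
with the second argument counting invocations so far and a non-zero return
value requesting a retry:

    #include <stdio.h>
    #include "blockcachevfs.h"
    #include "sqlite3.h"

    /* Busy-handler callback: allow up to 10 retries, then give up. */
    static int uploadBusy(void *pArg, int nRetry){
      (void)pArg;                   /* Unused */
      return (nRetry<10);
    }

    /* Upload all locally modified databases in container zCont. */
    static int uploadContainer(sqlite3_bcvfs *pFs, const char *zCont){
      char *zErr = 0;
      int rc = sqlite3_bcvfs_upload(pFs, zCont, uploadBusy, 0, &zErr);
      if( rc!=SQLITE_OK ){
        fprintf(stderr, "upload failed (%d): %s\n", rc, zErr ? zErr : "?");
      }
      sqlite3_free(zErr);
      return rc;
    }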

At present, it is not possible to upload changes to a database if there have
been changes made to *any* database in the container since the most recent
poll operation. This is not usually a problem: the container can be polled
and the upload reattempted.

However, the poll operation will fail if there have been changes to one or
more databases in cloud storage for which there are also local changes yet
to be uploaded, or if there are local changes to a database that has been
deleted from cloud storage. It is the responsibility of the user to avoid
this situation – concurrent modifications being made to the same database
at multiple sites – perhaps by using the bcv_kv virtual table.

5.2. Deleting Old Blocks

Once they have been uploaded, blocks are never modified within cloud storage.
Instead, when a database is modified, new block files are uploaded for the
modified regions and the manifest updated to refer to the new list of blocks.
This makes managing reader/writer concurrency easier, and facilitates
block sharing, but also means that as time passes a
large number of unused block files may accumulate within cloud storage.

Unused block files may be removed by invoking the sqlite3_bcv_cleanup()
interface from bcvutil.h:

    int sqlite3_bcv_cleanup(sqlite3_bcv *p, int nSecond);

In order to avoid disturbing existing read clients, the second parameter
may be used to specify the minimum number of seconds a block must have
been unused for before it is considered eligible for cleanup.
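
A minimal sketch follows. Only sqlite3_bcv_cleanup() is quoted from above;
the handle management routines used here (sqlite3_bcv_open(),
sqlite3_bcv_errmsg() and sqlite3_bcv_close()) and their argument order are
assumptions, so consult bcvutil.h for the exact declarations:

    #include <stdio.h>
    #include "bcvutil.h"

    /*
    ** Delete any block in container zCont that has been unused for at
    ** least one hour (3600 seconds). ASSUMPTION: sqlite3_bcv_open(),
    ** sqlite3_bcv_errmsg() and sqlite3_bcv_close() are used here as
    ** plausible bcvutil.h handle-management routines - check the header.
    */
    static int cleanupContainer(
      const char *zMod,             /* Module spec, e.g. "azure" */
      const char *zUser,            /* Cloud storage user/account name */
      const char *zAuth,            /* Authentication value */
      const char *zCont             /* Container to clean up */
    ){
      sqlite3_bcv *pBcv = 0;
      int rc = sqlite3_bcv_open(zMod, zUser, zAuth, zCont, &pBcv);
      if( rc==SQLITE_OK ){
        rc = sqlite3_bcv_cleanup(pBcv, 3600);
      }
      if( rc!=SQLITE_OK ){
        fprintf(stderr, "cleanup failed: %s\n", sqlite3_bcv_errmsg(pBcv));
      }
      sqlite3_bcv_close(pBcv);
      return rc;
    }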

5.3. Block Sharing

Sometimes, a single block may be used by multiple databases within a
manifest, saving cloud storage space and bandwidth. This is known as
“block sharing”. Block sharing occurs in two scenarios:

  • If a copy of a database is made using the
    command line copy primitive, then the copy
    shares all of its blocks with the original. This takes place regardless
    of the configured size of block ids for the manifest file.

  • If the configured block-id size for a manifest file is greater than
    or equal to 24 bytes, then the first 16 bytes of each block id are
    an md5 hash of the block contents. In this case, before each new
    block is uploaded, the system checks whether there already exists a
    block with the same md5 hash (and hence, we assume, the same content)
    that can be used in its place.

    This usually only occurs when a database being uploaded is a copy or
    slightly modified copy of an existing database, or when an existing
    database is overwritten with a copy of another database using the
    SQLite backup API.

The block id size used by a manifest file must be configured when the
manifest is first created using the
command line create command or equivalent
API. If block sharing using md5 hash values is required, then the block id size
should be set to 24 bytes. Otherwise, it is best left set to the default value
of 16 bytes to save space.

5.4. Pragma Interface

Blockcachevfs currently supports three PRAGMA statements – “PRAGMA bcv_upload”,
“PRAGMA bcv_poll” and “PRAGMA bcv_client”. If further PRAGMA commands are
added in the future, the names of the new commands will begin with “bcv_”.

PRAGMA bcv_upload

Blockcachevfs database clients may execute the following PRAGMA statement
to force any changes made locally to be uploaded to the cloud storage
account:

  PRAGMA [database].bcv_upload

The PRAGMA statement does not return until the upload is complete. If
successful, an empty string is returned. If an error occurs and the upload
cannot be completed, an SQLite error code is returned and the error message
set to an English language explanation of the error.

Invoking this PRAGMA is different to using the sqlite3_bcvfs_upload()
interface in that this command only uploads changes for the current database,
whereas sqlite3_bcvfs_upload() uploads all locally modified databases in
the container.
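
For example, a minimal sketch that uploads changes to the database attached
as “main” from C. The PRAGMA returns its result (an empty string on success)
as a row, which this sketch discards:

    #include <stdio.h>
    #include "sqlite3.h"

    /* Upload local changes to the database attached as "main". */
    static int pragmaUpload(sqlite3 *db){
      char *zErr = 0;
      int rc = sqlite3_exec(db, "PRAGMA main.bcv_upload", 0, 0, &zErr);
      if( rc!=SQLITE_OK ){
        fprintf(stderr, "bcv_upload failed (%d): %s\n", rc, zErr?zErr:"?");
      }
      sqlite3_free(zErr);
      return rc;
    }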

PRAGMA bcv_poll

Invoking “PRAGMA bcv_poll”:

  PRAGMA [database].bcv_poll

is equivalent to calling sqlite3_bcvfs_poll() on the container that the
database is a part of. If successful, an empty string is returned. Otherwise,
an SQLite error code is returned and the error message set to an English
language explanation of the error.

PRAGMA bcv_client

“PRAGMA bcv_client” has two forms:

  PRAGMA [database].bcv_client
  PRAGMA [database].bcv_client = NAME

This PRAGMA is used to set or query a cloudsqlite database connection’s client
name. The configured client name appears in the “client” column of the
bcv_http_log virtual table for requests associated
with the current database connection. It may also appear in log messages. It is
not used for any other purpose. The default client name is an empty string.

5.5. Virtual Table Interface

The virtual table interface consists of several eponymous virtual tables,
described below (not all of them are available in daemon mode). It is not
available automatically, but must be made available to each SQLite connection
explicitly by calling the sqlite3_bcvfs_register_vtab() API from
blockcachevfs.h:

    int sqlite3_bcvfs_register_vtab(sqlite3*);

The bcv_container table:

The read-only “bcv_container” table contains one row for each container
attached to the VFS. It has the equivalent of the following schema:

    CREATE TABLE bcv_container(
      name      TEXT,          -- local name (alias) of container 
      storage   TEXT,          -- cloud storage system (e.g. "azure")
      user      TEXT,          -- cloud storage username
      container TEXT,          -- container name in cloud storage
      ncleanup  INTEGER        -- number of blocks eligible for cleanup
    )

Column “name” contains the local alias of the attached container (this
may be the same as the remote container name). Columns “storage”, “user”
and “container” contain the storage module, account and cloud storage
container names as specified to sqlite3_bcvfs_attach() when the container
was attached.

The “ncleanup” column usually contains the number of blocks that will
be deleted from cloud storage if a cleanup operation (see the
sqlite3_bcv_cleanup() API) is run on the container. However, if
errors or client crashes have occurred while uploading changes to
cloud storage, then there may be extra unused blocks left in the
cloud storage container. In this case those blocks will be deleted
by the next cleanup operation, but are not included in the value
stored in the “ncleanup” column of the “bcv_container” table.

The bcv_container table is available in both daemon and daemonless
mode.
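
For example, the following sketch prints one line for each attached
container. It assumes sqlite3_bcvfs_register_vtab() has already been called
on the database handle:

    #include <stdio.h>
    #include "sqlite3.h"

    /* Print the alias and cleanup-eligible block count of each container. */
    static int listContainers(sqlite3 *db){
      sqlite3_stmt *pStmt = 0;
      int rc = sqlite3_prepare_v2(db,
          "SELECT name, ncleanup FROM bcv_container", -1, &pStmt, 0
      );
      if( rc==SQLITE_OK ){
        while( sqlite3_step(pStmt)==SQLITE_ROW ){
          printf("%s: %d blocks eligible for cleanup\n",
              (const char*)sqlite3_column_text(pStmt, 0),
              sqlite3_column_int(pStmt, 1)
          );
        }
        rc = sqlite3_finalize(pStmt);
      }
      return rc;
    }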

The bcv_database table:

The read-only “bcv_database” table contains one row for each database in each
attached container. It has the equivalent of the following schema:

    CREATE TABLE bcv_database(
      container TEXT,          -- local name (alias) of container
      database  TEXT,          -- name of database
      nblock INTEGER,          -- total number of blocks in database
      ncache INTEGER,          -- number of blocks in cache
      ndirty INTEGER,          -- number of dirty blocks in cache
      walfile BOOLEAN,         -- true if transactions in local wal file
      state TEXT               -- state of database (see below)
    )

The “container” column contains the local alias of the container (the
same value that is stored in the “name” column of the bcv_container
table). Column “database” contains the database name.

Columns “nblock”, “ncache” and “ndirty” contain the total number of
blocks in the database, the number of those blocks that are currently
cached locally, and the number of those cached blocks that are dirty
and require uploading. It is always true that:

    nblock >= ncache >= ndirty

Boolean column “walfile” is set to true if there are currently write
transactions in a local wal file (blockcachevfs databases are always
in wal mode). To determine whether or not there have been local
changes to a database, an application might use a query similar to:

    SELECT (walfile OR ndirty>0) AS has_local_changes FROM bcv_database WHERE...

The “state” column usually contains an empty string. There are two
exceptions:

  • If the database has been created using sqlite3_bcvfs_copy() but
    not yet uploaded, then this column contains the text ‘copied’.
  • If the database has been deleted using sqlite3_bcvfs_delete() but
    the delete operation has not yet been uploaded, then this column
    contains the text ‘deleted’.

The bcv_database table is available in both daemon and daemonless
mode.

The bcv_http_log table:

The “bcv_http_log” table contains one row for each HTTP request made
by the VFS or connected daemon. It has the equivalent of the following
schema:

     CREATE TABLE bcv_http_log(
       id INTEGER,              -- Unique, monotonically increasing id value
       start_time TEXT,         -- Time request was made, as iso-8601
       end_time TEXT,           -- Time reply received, as iso-8601 (or NULL)
       method TEXT,             -- "PUT", "GET" etc.
       client TEXT,             -- Name of client that caused request
       logmsg TEXT,             -- Log message associated with request
       uri TEXT,                -- URI of request
       httpcode INTEGER         -- HTTP response code (e.g. 200)
     ) 

For those requests that can be associated with a single SQLite database
handle, the contents of the “client” column is the client name as
configured using the PRAGMA bcv_client command.
For requests associated with a prefetch operation, it contains the string
‘prefetch’.

To prevent it from consuming an ever-increasing amount of memory, entries
are automatically removed from the bcv_http_log table on a first-in/first-out
basis, according to the values configured for the SQLITE_BCV_HTTPLOG_TIMEOUT
and SQLITE_BCV_HTTPLOG_NENTRY parameters. Or, in daemon mode, according
to the values passed via the --httplogtimeout and --httplognentry
command-line options.

Including various overheads, each entry may be assumed to consume
256 bytes of memory or less. So allowing 4096 entries to accumulate in
the bcv_http_log table may require up to 1MiB of memory.

The bcv_kv table:

The schema of the “bcv_kv” table is:

    CREATE TABLE bcv_kv(
      name TEXT PRIMARY KEY,   -- Key value 
      value                    -- Associated payload value
    )

Unlike the other two eponymous virtual tables described in this section, the
bcv_kv table is read-write. Applications may write arbitrary data to the table
to be stored in the cloud storage container. The table is empty when the
container is first created.

The contents of the bcv_kv table is stored as a separate file (an
SQLite database) in the cloud container – “bcv_kv.bcv”.

The bcv_kv table implements transaction-like properties, as follows:

  • The first time in a database transaction that the bcv_kv table
    is read or written, the file is downloaded from cloud storage and cached
    for the duration of the transaction. All user queries – read and write –
    are executed against this cached version of the bcv_kv table. This
    ensures that each transaction sees a consistent version of the table
    contents.
  • When a read/write transaction is committed, a new version of the bcv_kv
    table data is uploaded to cloud storage. At this point, if it is found
    that the file has been modified within cloud storage since it was
    downloaded at the beginning of the transaction (because some other
    client wrote to it), the commit fails and the transaction is rolled
    back. The extended error code in this case is SQLITE_BUSY_SNAPSHOT.

Because the entire bcv_kv table is downloaded (and, for write transactions,
uploaded) for each transaction, applications should avoid storing large
amounts of data within it.
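
The following sketch stores a single key/value pair and checks for the
commit conflict described above. The key and value used are arbitrary
examples:

    #include <stdio.h>
    #include "sqlite3.h"

    /*
    ** Store a single key/value pair in the bcv_kv table. Assumes the
    ** virtual tables have been enabled on db via
    ** sqlite3_bcvfs_register_vtab(). If some other client modified
    ** bcv_kv.bcv after this transaction first downloaded it, the COMMIT
    ** fails with extended error code SQLITE_BUSY_SNAPSHOT and, as
    ** described above, the transaction is rolled back. In that case the
    ** caller may simply retry.
    */
    static int writeKv(sqlite3 *db){
      char *zErr = 0;
      int rc = sqlite3_exec(db,
          "BEGIN;"
          "DELETE FROM bcv_kv WHERE name='last-writer';"
          "INSERT INTO bcv_kv(name, value) VALUES('last-writer','host-17');"
          "COMMIT;", 0, 0, &zErr
      );
      if( rc!=SQLITE_OK ){
        int bConflict = sqlite3_extended_errcode(db)==SQLITE_BUSY_SNAPSHOT;
        fprintf(stderr, "bcv_kv write failed (%d%s): %s\n",
            rc, bConflict ? ", snapshot conflict" : "", zErr ? zErr : "?"
        );
      }
      sqlite3_free(zErr);
      return rc;
    }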

The bcv_kv_meta table:

The schema of the “bcv_kv_meta” table is:

    CREATE TABLE bcv_kv_meta(
      name TEXT PRIMARY KEY,   -- Header field name
      value                    -- Header field value
    )

The bcv_kv_meta table is similar to the bcv_kv table, in that querying it
causes cloudsqlite to download the bcv_kv.bcv database file and cache it
for the remainder of the transaction. Instead of providing read/write access
to the contents of the bcv_kv.bcv database, however, it provides read-only
access to two of the HTTP headers received from the cloud storage server
when the file was downloaded – “Date” and “Last-Modified”.

The bcv_kv_meta table always contains exactly two rows. The first contains
the string ‘date’ in the “name” column, and the value of the timestamp from the
HTTP “Date:” header, in ISO-8601 format, in the “value” column. The second row
contains ‘last-modified’ in the “name” column, and the ISO-8601 equivalent
of the timestamp from the HTTP “Last-Modified:” header in “value”.

If, for some reason, the cloud storage server does not provide a “Date:” or
“Last-Modified:” header, or if the contents of that header cannot be parsed,
the corresponding “value” field in the bcv_kv_meta table is set to NULL.

The following options are supported by all or most of the commands enumerated
below. All commands accept unique prefixes of any option in place of the full
option string.

-user USER-NAME

The user or account name to use when accessing cloud storage.

-auth ACCESS-KEY

The authentication information to use when accessing cloud
storage. The form of this depends on the cloud storage module in use.

-container CONTAINER-NAME

The blob storage container to read from or write to. This option is
accepted (and required) by the commands “upload”, “delete”, “list”
and “download” only.

-file PATH

This option tells the command to read the local text file at relative or
absolute path PATH and treat its contents as whitespace-separated command
line arguments.

-module MODULE-NAME

The cloud storage module to use to access the remote container. This may
be a simple module name (e.g. “google”) or a module name followed by
URI style parameters (e.g. “azure?emulator=127.0.0.1:10000&sas=1”).

-log STRING

The argument to this option determines when log messages are output
to stdout. If the value contains an “h” character, a message is output
each time an HTTP(S) request is issued or a reply received. If a “v”
is present in the string, libcurl verbose logging is enabled. All other
characters are ignored. The interpretation of the argument to this
option is subject to revision at any point. The default value is an
empty string (no log messages at all).

The following commands are supported:

 blockcachevfsd copy ?OPTIONS? DATABASE1 DATABASE2

Make a copy of database DATABASE1 named DATABASE2. If
DATABASE2 exists it is clobbered by the copy. This operation
always uses block sharing to avoid uploading any
new block files.

 blockcachevfsd create ?OPTIONS? CONTAINER

Create a new container and add an empty manifest file containing zero
databases to it. It is not an error if the container already exists. In this
case any existing manifest is clobbered by the new, empty, manifest,
effectively deleting any databases currently stored in the container.

The following extra options are supported:

      -blocksize BLOCK-SIZE
      -namebytes BLOCKID-SIZE

The “-blocksize” option is used to set the size of each block file
stored within cloud storage. The default size is 4MiB. If the argument
is an integer value it is interpreted as a size in bytes. If an integer value
followed by a ‘k’ or ‘K’ character, a size in KiB. Or, if the argument
is an integer value followed by an ‘m’ or ‘M’ character, a size in MiB.
The specified size must be a power of two in bytes. The following switches
all serve to set the block-size to 2MiB:

      -blocksize 2M
      -blocksize 2048k
      -blocksize 2097152

The “-namebytes” option may be passed an integer value between
16 and 32, inclusive, to set the size in bytes of the block ids that
will be used by the manifest file. The default value is 16. The block id size
has implications for block sharing.
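
For example, to create a container whose manifest uses 2MiB blocks and
24-byte block ids (enabling md5-based block sharing, as described in
section 5.3):

  $ blockcachevfsd create -f account.txt -blocksize 2M -namebytes 24 $CONTAINER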

 blockcachevfsd delete ?OPTIONS? NAME

Remove database NAME from the container identified by the -container
option.

 blockcachevfsd destroy ?OPTIONS? CONTAINER

Delete an entire blob storage container and its contents.

WARNING: Deleting a container from Azure blob storage is (apparently) a
lazily executed operation. Even though the container appears to be removed
immediately, it still exists in some sense, as Azure prevents a new
container with the same name from being created for some time. Sometimes
this condition persists for up to 24 hours or longer. For this reason, it
is better to “create -clobber” a container instead of attempting to
delete and then recreate it.

 blockcachevfsd download ?OPTIONS? NAME ?LOCALFILE?

Download the named database from the container specified by the
-container option to the local file-system. If the LOCALFILE argument
is specified, then it is the local file-system path to store the downloaded
database at. If no LOCALFILE argument is specified, the downloaded
database is stored in the current directory using the same name as is used
in cloud storage.

The download command supports the following additional option:

      -nrequest N

This option sets the number of HTTP(S) requests that are allowed to be
outstanding at any one time. The default value is 1. Using a higher value
might increase throughput.
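
For example, assuming authentication details are stored in “account.txt” as
in section 3.1, the following might download database $DBNAME using up to 8
simultaneous requests:

  $ blockcachevfsd download -f account.txt -container $CONTAINER -nrequest 8 $DBNAME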

 blockcachevfsd upload ?OPTIONS? DATABASE-FILE ?NAME?

Upload the specified local database file and add it to the manifest in
the container identified by the -container option. If the NAME parameter is
present then it is used as the name of the uploaded remote database. Otherwise,
the remote database takes the same name as the local file, with any directory
components removed.

The upload command supports the following additional option:

      -nrequest N

This option sets the number of HTTP(S) requests that are allowed to be
outstanding at any one time. The default value is 1. Using a higher value
might increase throughput.

The commands above mirror the APIs made available to database
clients (attach and detach by blockcachevfs.h, the others by bcvutil.h).
The following commands are also supported, but are considered a convenience
only. The command-line options and the output of the following commands are
subject to change at any time:

 blockcachevfsd files ?OPTIONS? CONTAINER

List the files in the specified container, one per line. All files
are listed, even those that are not part of blockcachevfs databases.

 blockcachevfsd list ?OPTIONS? 

If there is no -container option, list all databases in all containers in
the blob storage account. Or, if the -container option is passed, list only
the databases in the specified container.

 blockcachevfsd manifest ?OPTIONS? CONTAINER

Output a human-readable version of the manifest file found in the
specified container.

The daemon process is started as follows:

 blockcachevfsd daemon ?OPTIONS? DIRECTORY 

The daemon command supports the following options:

-addr IPV4-ADDRESS (default “127.0.0.1”)

Normally, the daemon listens and accepts connections from clients on
the localhost address (127.0.0.1). This option allows the daemon to
listen for connections on some other local address.

-cachesize CACHE-FILE-SIZE (default 1G)

Database file blocks downloaded from the blob storage account are cached
locally. The cache is allowed to grow up to the size configured by this
option, then blocks are expelled on a least-recently-used basis as new
blocks are downloaded. The argument to this option must be a positive
integer followed by a single “M” or “G” (or “m” or “g”) character to
indicate whether the integer value should be interpreted as a number
of MiB or GiB. For example, to use a 1GiB cache, either “-cachesize 1024M”
or “-cachesize 1G” may be specified.

-httptimeout SECONDS (default value 600)

The number of seconds to wait before assuming an HTTPS request made to
cloud storage has been lost.

-log STRING (default value “”)

The argument to this option determines when log messages are output
to stdout by the daemon process. Each character in the argument string
turns on different log messages. Characters not listed in the following
table are ignored. The interpretation of the argument passed to this
option is subject to revision at any time.

Character Interpretation

  m    Output a log message each time a message is received from or sent
       to a connected database client.
  h    Output a log message each time an HTTP(S) request is sent or a
       reply received.
  e    Output a log message when various internal “events” occur.
  v    Turn on libcurl CURLOPT_VERBOSE logging.

-notimestamps

This option causes the daemon process to omit the high-resolution
timestamps from the log messages it outputs.

-port PORT-NUMBER (default: first free TCP port >= 22002)

The localhost port on which to listen for connections from local
database clients.

-httplogtimeout SECONDS (default value 3600)

The number of seconds after which entries are automatically removed from
the bcv_http_log virtual table. A value of less
than zero means entries are never removed.

-httplognentry INTEGER (default value -1)

The maximum number of entries allowed in the
bcv_http_log virtual table. Once this limit has been reached, old entries
are automatically removed to make way for new ones.

-autoexit

This debugging option causes the daemon process to exit
cleanly when the number of local database clients drops to zero.

-delay MS (default 0)

This debugging option causes the daemon process to pause for the
specified number of milliseconds before beginning to listen for incoming
connections.

-readymessage

This debugging option causes the daemon process to print “READY\n”
to stdout when it is ready to accept incoming client connections. This
makes it easier to write test scripts that start and stop daemon processes.
