A distributed database system is one in which the data belonging to a single logical database is distributed to two or more physical databases. Beyond that simple definition, there are a confusing number of possibilities for when, how, and why data is distributed. Some are applicable to edge and/or fog computing, others to fog and/or cloud computing, and some are applicable across the entire spectrum of edge, fog and cloud computing.
This paper will cover the types of distributed database systems in the context of edge, fog and cloud computing, explain “when, how and why” the data is distributed, and why those details make certain distributed database systems applicable (or not) to specific needs in edge, fog and cloud computing.
Wikipedia authors have taken a collective stab at defining a distributed database: “A distributed database is a database in which storage devices are not all attached to a common processor. It may be stored in multiple computers, located in the same physical location; or may be dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.” That definition, itself, derives partly from the Institute for Telecommunications Sciences associated with the U.S. Department of Commerce.
That definition is actually pretty narrow. There are at least three other use cases that I get asked about under the general heading of “distributed database”: High Availability, Cluster Database, and Blockchain. The Wikipedia definition, High Availability and Cluster all have applicability to the Internet of Things. Further, I contend that a distributed database’s shards can exist on the same physical computer. What makes a database distributed is that the various partitions are managed by separate instances of the database system, not the physical location of those partitions. More on this in section VI.
For a database system to achieve high availability (HA), it needs to maintain, in real time, an identical copy of the physical database in a separate hardware instance. By maintain, I mean keep the copy consistent with the master. In this scenario, there are (at least) two copies of the database that we call the master and the slave(s) (sometimes called replicas). Actions applied to the master database (i.e. insert, update, delete operations) must be replicated on the slave, and the slave must be ready to change its role to master at any time. This is called failover. The master and replica are normally deployed on different physical systems, though in telecommunications a common HA setup is multiple boards within a chassis: a master controller board, a standby controller board, and some number of line cards that each serve some protocol (BGP, OSPF, etc.). Here, the master database is maintained by processes on the master controller board. The database system replicates changes to a slave database on the standby controller board, which has identical processes waiting to take over processing in the event that the master controller board fails (or is simply removed in a hot-swappable setup). In the Internet of Things, High Availability is desirable for mission-critical industrial systems, to maintain availability of gateways, and in the cloud to ensure that real-time analytics can continue to execute even in the face of hardware failures.
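The master/replica relationship and failover described above can be sketched in a few lines of code. This is a minimal illustration of the concept only; the class and method names (Node, HAPair, apply, failover) are invented for this sketch and do not correspond to any real database system's API.

```python
# Minimal sketch of master/replica HA replication and failover.
# All names here are illustrative, not any product's actual API.

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}          # the physical database (a dict, for illustration)
        self.role = "replica"

    def apply(self, op, key, value=None):
        # Apply an insert/update/delete operation to the local copy.
        if op in ("insert", "update"):
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)

class HAPair:
    def __init__(self):
        self.master = Node("master")
        self.master.role = "master"
        self.replica = Node("replica")

    def execute(self, op, key, value=None):
        # Every operation committed on the master is replicated,
        # in real time, to the replica so the copies stay identical.
        self.master.apply(op, key, value)
        self.replica.apply(op, key, value)

    def failover(self):
        # On master failure, the replica changes role and takes over.
        self.replica.role = "master"
        self.master, self.replica = self.replica, self.master

pair = HAPair()
pair.execute("insert", "sensor1", 42.0)
pair.failover()
assert pair.master.data == {"sensor1": 42.0}   # the replica held an identical copy
```

The key property is that the replica is kept consistent with the master at all times, so promotion on failover loses no committed data.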
A Cluster Database is one in which there are multiple physical copies of the entire database which are kept synchronized. The difference from HA is that any physical instance of the database can be modified, and will replicate its modifications to the other database instances within the cluster. This is also known as a master-master configuration, in contrast to HA’s master-slave configuration. This is where the similarities among database cluster implementations end. Broadly speaking, there are two implementation models: ACID and Eventual Consistency. In ACID implementations, modifications are replicated synchronously in a two-phase commit protocol to assure that once they are committed, the changes are immediately reflected in every physical instance of the database. In other words, all the database instances are consistent, all the time. This architecture eliminates the potential for conflicts (or, rather, conflicts are resolved before a transaction can be successfully committed to the cluster). With Eventual Consistency, changes are replicated asynchronously, possibly long after the originating node committed the change to the database. This implies some sort of reconciliation process to resolve conflicting changes originated by two or more nodes. With Eventual Consistency, applications must be written to contend with the possibility of having stale data in the physical instance of the database to which they’re attached. For example, consider a worldwide online bookseller. There may be one copy of a certain book in stock; buyers in New York and Sydney will both see that the book is available, and both can put the book in their shopping cart and check out. The system will have to sort out who really gets the book and whose order is backordered. Users have come to accept this. But, this model would never work for a cellular telephone network needing to verify that a subscriber has subscribed to a certain service, or has sufficient funds.
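The two-phase commit protocol used by the ACID model can be sketched as follows. This is a deliberately simplified illustration, assuming a single coordinator and no failure handling; the names (ClusterNode, prepare, commit) are invented for the sketch.

```python
# Sketch of two-phase commit for ACID cluster replication: a change
# is committed only if every node votes "yes" in the prepare phase;
# otherwise every node aborts. All names here are illustrative.

class ClusterNode:
    def __init__(self):
        self.data = {}
        self.pending = None

    def prepare(self, key, value):
        # Phase 1: vote. A real node would acquire locks and check
        # for conflicts here, voting "no" if the change cannot apply.
        self.pending = (key, value)
        return True

    def commit(self):
        # Phase 2: make the prepared change durable and visible.
        key, value = self.pending
        self.data[key] = value
        self.pending = None

    def abort(self):
        self.pending = None

def two_phase_commit(nodes, key, value):
    # The originating node acts as coordinator for its own transaction.
    if all(n.prepare(key, value) for n in nodes):
        for n in nodes:
            n.commit()
        return True
    for n in nodes:
        n.abort()
    return False

cluster = [ClusterNode() for _ in range(3)]
two_phase_commit(cluster, "subscriber42", "active")
# Once committed, every physical instance is consistent:
assert all(n.data["subscriber42"] == "active" for n in cluster)
```

The synchronous round trip to every node in the prepare phase is exactly what limits horizontal scalability in the ACID model, as discussed next.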
This type of system requires a consistent database view. Because of the nature of the synchronous replication required for the ACID implementation, horizontal scalability is limited, but the implementation is straightforward (no conflict resolution needed). Scalability of Eventual Consistency implementations is quite high, but so is the complexity. Cluster implementations abound in the Internet of Things. For example, IoT gateways can be clustered for improved scalability and reliability. See Figure 1. The number of nodes in each gateway cluster is modest, so both the Immediate and Eventual Consistency models are suitable. The cluster can handle more traffic from edge devices than a single gateway would be able to, and reliability/availability is improved (the inherent limits to scalability with the Immediate Consistency model don’t come into play in small clusters).
The term “Distributed Database” is often associated with Blockchain technologies (Bitcoin being the most well-known). It is used synonymously with “Distributed Ledger”, which is more apt (in this author’s opinion). My problem with using the term distributed database in the context of Blockchain technologies is that ‘distributed database’ implies a distributed database management system. But there is rarely a database management system involved in Blockchain. Not to belabor the point, but it is important to draw a distinction between a database and a database management system. A database is simply a collection of data, which might or might not be distributed. A database management system is the software that manages a database. A Blockchain is, in fact, a distributed database. But, as previously mentioned, there is rarely a database management system involved in creating/maintaining the Blockchain distributed ledger.
The distributed (partitioned) database topology implied by the Wikipedia definition “...stored in multiple computers, located in the same physical location…” is what is colloquially called sharding a database. The key difference between sharding versus HA and Cluster distributed databases is that each physical database instance (shard) houses just a fraction of all the data. All the shards together represent a single logical database, which is manifested in many physical shards. I part company with the Wikipedia definition in that shards need not be stored in multiple computers to gain the benefits of sharding. Logically, the purpose is the same: scalability. Whether the shards are distributed across servers, or partitioned on a single server to leverage multiple CPUs or CPU cores is immaterial. In all cases, processing is parallel. How the shards are physically distributed is an unimportant artifact. For example, in STAC-M3 published benchmarks we’ve conducted since 2012, we’ve utilized single servers with 24 cores, creating 72 shards, and we’ve utilized 4 to 6 servers each with 16 to 22 cores, creating 64 to 128 shards. In all cases, the goal is to saturate the I/O channels to get data into the CPU cores for processing. While STAC-M3 is a capital markets (tick database) benchmark, the principles apply equally to the Internet of Things’ Big Data analytics. IoT data is overwhelmingly time series data (e.g. sensor measurements), just like a tick database is time series data.
Sharding a database implies support for distributed query processing. Each shard is managed by its own instance of a database server. Since each shard/server represents some fraction of the whole logical database, the potential exists that a query result returned by any shard is just a partial result set and needs to be merged with the partial result sets of every other shard/server and only then be presented to the client application as a complete result set. If the data is distributed among the shards optimally, then all of the data for a given query can be found on a single shard, and the query can be directed to the specific server instance that manages that shard. Often, both approaches must be supported. For example, consider a large smart-building IoT deployment that spans multiple campuses, each with multiple buildings. We might choose to distribute (shard) information about each campus across multiple physical databases. If we want to calculate some metric for a specific building (e.g. power consumption in 15-minute windows), we only need to query the shard that contains the data for that building. But, if we want to calculate the same metric for multiple buildings and/or across campuses, then we need to distribute that query to many shards/servers, and this is where the parallelism comes into play. Each server instance works on its portion of the problem in parallel with every other server instance.
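The scatter-gather pattern described above can be sketched briefly. The shard layout, building names and metric below are invented for the illustration; a real system would run the per-shard queries in parallel on separate servers rather than in a loop.

```python
# Sketch of distributed query processing over shards: scatter the
# query to every shard, collect the partial result sets, and merge
# them into one complete result before returning it to the client.

# Each shard holds (building, kWh) readings for one campus (illustrative).
shards = [
    [("bldg-A", 10.0), ("bldg-B", 5.0)],   # campus 1 shard
    [("bldg-C", 7.5)],                     # campus 2 shard
]

def query_shard(shard, building=None):
    # Partial result set, computed locally by one shard/server.
    return [kwh for b, kwh in shard if building is None or b == building]

def distributed_query(building=None):
    # Scatter to all shards (in a real deployment, concurrently),
    # then merge the partial result sets into a complete result set.
    partials = [query_shard(s, building) for s in shards]
    return sum((p for p in partials), [])

# A cross-campus query must merge results from every shard:
assert sum(distributed_query()) == 22.5
# A single-building query finds all its data on one shard:
assert distributed_query("bldg-C") == [7.5]
```

When the sharding key matches the query predicate (here, building-to-campus placement), the coordinator can skip the scatter step entirely and route to one shard.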
Database sharding also supports vertical scalability (i.e. being able to store 10s or 100s of terabytes, petabytes and beyond). To create a single 100 terabyte logical database, I can create 50 instances of 2 terabyte physical databases. Distributed database systems often support “elastic” scalability, allowing me to add shards, which could also mean adding servers to the distributed system so that the system is scalable in both the vertical and horizontal dimensions. Vertical and horizontal scalability are essential for large IoT systems that generate very high volumes of data. You need vertical scalability to handle the ever-growing volume of data, and you need horizontal scalability to maintain the ability for timely processing/analytics of the data as it grows from 1TB to 10TB to 100TB to petabytes and beyond.
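A common building block for this kind of elastic sharding is a deterministic routing function that maps a record key to a shard. The sketch below uses simple hash-modulo routing for clarity; it is an assumption for illustration only, and production systems often prefer consistent hashing so that adding shards moves less data.

```python
# Sketch of hash-based shard routing: the logical database grows by
# adding shards (and, potentially, servers). The routing scheme here
# is illustrative; real systems often use consistent hashing instead.

import hashlib

def shard_for(key, num_shards):
    # Deterministically map a record key onto one of num_shards.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# 50 shards of ~2 TB each make one 100 TB logical database.
num_shards = 50
placement = {k: shard_for(k, num_shards) for k in ("devA", "devB", "devC")}
assert all(0 <= s < num_shards for s in placement.values())
```

The same function lets a query router send a single-key lookup straight to the owning shard, while scans fan out to all shards.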
While, strictly speaking, not a distributed database implementation, we would be remiss if we didn’t talk about data distribution within an IoT system. The IoT ecosystem usually consists of the “Edge”, “Gateway” and “Cloud”, and databases exist in all locations. IoT data is initially generated on the edge and needs to be distributed from there to gateways, and from gateways to the cloud. Data at the edge is often used to control some “thing” in real time, e.g. to open or close a solenoid in an industrial IoT system. At the enterprise level, in many (if not most) cases, one of the objectives of an IoT system is to capture and extract value from the data. Usually, that means some sort of “monetization” of the data. That can come in the form of increased efficiency or reduction in maintenance costs or downtime in an industrial setting, or more intelligent interaction with customers leading to greater efficiency extracting dollars from their wallets.
Data distribution within an IoT system means moving the data from the edge through a gateway or gateways to a private or public cloud. This movement of data is fraught with its own issues:
Edge devices can be offline, either by design or because of faults in the communication infrastructure. For example, battery-powered edge devices will be offline by design, and only connect to a gateway on a schedule. Or, they can be mobile devices that move in and out of range of gateways or cell towers. Or, the communication link can simply be broken. In any case, the device must have the intelligence to queue data for later transport.
Security is the top consideration for IoT systems in this decade, and likely well into the next. Data in flight needs to be protected. This could be as simple as utilizing SSL/TLS.
The communication channels available at edge devices can have extremely limited bandwidth. For example, Bluetooth Low Energy (BLE) is 1 or 2 megabits per second (Mbit/s). Zigbee ranges from 20 to 250 kilobits per second (kbit/s). This compares to the slowest Ethernet at 10 Mbit/s. To maximize the available bandwidth, data should be compressed before it is put into the communication channel.
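Two of the issues above, queueing data while offline and compressing it before transmission, can be sketched together. The class and method names below are invented for this illustration and do not represent any product's actual API.

```python
# Sketch of edge-side store-and-forward: queue readings while the
# link is down (by design or by fault), then compress the backlog
# before transmitting over a constrained channel (e.g. BLE, Zigbee).

import json
import zlib
from collections import deque

class EdgeQueue:
    def __init__(self):
        self.pending = deque()   # readings awaiting transport

    def record(self, reading):
        # Always queue locally; the device may be offline by design,
        # out of range, or on a broken link.
        self.pending.append(reading)

    def flush(self, link_up):
        # When the link becomes available, send everything queued,
        # compressed to make the most of the limited bandwidth.
        if not link_up or not self.pending:
            return None
        payload = json.dumps(list(self.pending)).encode()
        self.pending.clear()
        return zlib.compress(payload)

q = EdgeQueue()
q.record({"sensor": "temp", "value": 21.5})
assert q.flush(link_up=False) is None        # offline: data stays queued
packet = q.flush(link_up=True)               # online: compressed batch sent
assert json.loads(zlib.decompress(packet)) == [{"sensor": "temp", "value": 21.5}]
```

A real implementation would also persist the queue to durable storage so a power loss does not drop queued readings, and would encrypt the payload (e.g. over TLS) per the security point above.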
Some questions that IoT system designers need to consider: Will all data be pushed from the edge to the cloud? Or is some data only useful at the edge? Will data be aggregated before it is transmitted, or is only raw granular data transmitted?
In eXtremeDB, we’ve anticipated and addressed these concerns in the Active Replication Fabric™.
In summary, the term distributed database encompasses three different database system arrangements for three distinct purposes. High Availability database systems distribute a master database to one or more replicas for the express purpose of preserving the availability of the system in the face of failure. Cluster database systems distribute a database for massive/global scalability (Eventual Consistency) or for cooperative computing among a relatively small number of nodes (ACID). Sharding partitions a logical database into multiple shards to facilitate parallel processing and horizontal scalability. All three capabilities are integral to the deployment of scalable and reliable IoT systems.
These distributed database mechanisms are often used in combination. Refer again to Figure 1 above, where we see clusters of gateways with each node in a cluster aggregating data from some number of devices. If a cluster node fails, the devices it was serving can connect to another gateway in the cluster and maintain operations. At the server level, a sharded database is depicted, with each shard receiving data from one of the gateway clusters. Collectively, the shards represent a single logical database. Each shard consists of a master/replica HA pair. This is desirable because without HA, if any shard fails, the integrity of the logical database is compromised.
Steve Graves co-founded McObject in 2001. As the company’s president and CEO, he has both spearheaded McObject’s growth and helped the company attain its goal of providing embedded database technology that makes embedded systems smarter, more reliable and more cost-effective to develop and maintain. Prior to McObject, Graves was president and chairman of Centura Solutions Corporation, and vice president of worldwide consulting for Centura Software Corporation (NASDAQ: CNTR); he also served as president and chief operating officer of Raima Corporation. Graves is a member of the advisory board for the University of Washington’s certificate program in Embedded and Real Time Systems Programming. For Steve’s updates on McObject, embedded software and the business of technology, follow him on Twitter.