Embedded Database Alternatives for Intelligent Devices
Comparing and Contrasting Two Distinct Technologies for Embedded Device Data Management
Software development for intelligent devices is in a state
of transition. These devices are no longer disconnected gadgets. The
combination of connectivity, increasing processing power, integrated sensor
technology and the growing amounts of content and data that must be processed
has transformed them into high performance, data-driven information computing
platforms. Mobile phones, set-top boxes, and in-dash infotainment and
navigation systems are just a few examples of intelligent devices that are
expected to efficiently integrate complex, real-world information to create new
functionality, new services and new user experiences. As a result, device
software applications have largely become data-driven, spending most of their
time acquiring, processing and acting upon diverse and dynamic real-time data
sets.
Much like the enterprise computing environment, today's
intelligent devices are fueled by their underlying data sources. In the
enterprise environment, the use of data management technologies in the form of
client-server relational databases is ubiquitous. However, the hardware
resource constraints of intelligent devices, and the varying forms of dynamic,
real-time stream data they must manage, demand a fundamentally different data
management technology than that used in the enterprise computing environment.
There are many client-server database technologies available
to embedded developers today. Many of these databases, by reputation, are sound
implementations of classic client-server databases that have been ported and
downsized to work with a 32-bit embedded system. Many of these databases
support a relational data model with transaction support that includes rollback
and crash recovery. Yet as client-server database management systems, these
databases are not the optimal choice for device data management.
This white paper explains some important differences between
client-server embedded database systems (including embedded databases that employ
a classic, client-server-style application library) and the approach taken by
the architects of DeviceSQL™. It also explores why
application-specific data management systems that are generated as inline code
present a better alternative for today's intelligent devices.
This white paper is brought to you by Microcontroller.com.
Design Goal Comparison
The differences between client-server databases and
technologies designed specifically for device data management are a natural
result of differences in design goals. Most classic relational database servers
are designed to perform as general, dynamic SQL engines for table data within
an ad-hoc query environment supporting concurrent clients. They assume clients
will generate new tables and new queries dynamically within a running system
and that the size of the data set is large, relatively unbounded and
unpredictable.
For this reason, most embedded databases carry all the
operational machinery required to parse and interpret new constructs and assume
a disk-based model for data access. They are founded on a "store then process"
model which was not designed to handle live data streams coming from sensors,
network I/O or other dynamic data sources. In many intelligent devices, data
constructs are often fixed and the data sets are not necessarily large or
unbounded. Applications therefore pay a significant price in terms of size and
performance for incorporating a generalized client-server database in an
embedded system where only a small percentage of the functionality may ever be
used. The extra overhead is often too steep for resource-constrained or
cost-constrained devices, therefore precluding the use of a client-server model
database.
By contrast, DeviceSQL is a next generation software
framework designed specifically for device data management. It is designed to
provide developers (and the products they build) with ultra high performance
data management capabilities that can operate on diverse data types from
multiple sources, all within the extreme design and resource constraints of
real-world intelligent devices. DeviceSQL is optimized for data-driven
applications in which the data schema, table queries and data operations are generally
known at system build time and for which the data set size can be predictably
bounded within a resident memory space, or for which data may be obtained from
both tables as well as from generalized data streams like sensors or network
connections. DeviceSQL is designed to perform efficiently and predictably
within resource-constrained as well as resource-rich environments and support
the wide variety of storage mediums that may, or may not, be present. It is
designed for systems and intelligent devices that are self-contained and
capable of running continuously without modification. For device data
management, key technical considerations tend to be reliability,
predictability, performance and optimized resource utilization. The data
management layer must also support the different kinds of data and data sources
required.
A client-server architecture does have benefits when
an added level of scalability and security is needed. That holds in an
environment such as an enterprise, where servers can be added for scalability,
and in a multi-user database system, where a single rogue application must not
bring the whole system down. In most device applications, however, these
benefits are not needed, and the client-server database model carries
undesirable performance characteristics and expense.
Design and Feature Comparison
The fundamental technical distinctions between client-server
databases and the application-specific data management solutions that can be
built with the DeviceSQL Framework include:
- Client-server database interactions are interpreted at runtime
whereas DeviceSQL's data manipulation interactions are typically pre-compiled
on the host and execute natively as inline code. However, should dynamic access
to DeviceSQL-compiled data structures be desired, DeviceSQL optionally supports
a dynamic C level API and a dynamic DeviceSQL API. This provides more
flexibility and control for developers.
- Most embedded client-server databases provide no model for data
sources and sinks other than tables. DeviceSQL goes further and extends the
relational model to support data management operations on arbitrary stream data
sources such as those generated by network connections, sensors and other I/O
sources common to device applications.
- DeviceSQL offers much higher performance (typically 15-50x
faster) and a far greater degree of predictability and determinism than
a client-server architecture and/or a disk-based database. This is primarily
due to the native in-line execution model.
- Embedded client-server databases are typically either disk-based
or in-memory only. DeviceSQL can concurrently support several persistent storage media
on a per-table basis, as well as no persistent store at all (pure in-memory
operation).
- Client-server databases may require a high degree of persistent storage
support, typically a file system. By contrast DeviceSQL can operate with or
without a file system.
- Client-server databases typically require a multi-tasking OS. DeviceSQL does
not require an underlying operating system.
- Most embedded client-server databases have no procedural language
and depend on a fixed, server-style API to access the database. DeviceSQL has a
powerful procedural language (DeviceSQL language) that greatly reduces
complexity and development time by serving as both a data definition language
(DDL) and a data manipulation language (DML). The DeviceSQL language is
derived from Oracle's PL/SQL and extends SQL with a rich, strongly-typed
procedural language that includes extensions for the unique needs of device
data management.
- Classic client-server databases require database statements to be
included throughout the main application's programming logic and call a
proprietary API. The DeviceSQL language is compiled to C source code and
executes natively as in-line code. DeviceSQL enables users to define a C-level
calling interface (a user-defined API) so DeviceSQL-compiled data management
functions can be seamlessly integrated with existing applications. In addition,
should dynamic access to DeviceSQL-compiled data structures be desired,
DeviceSQL optionally supports a pre-defined C-level API plus a dynamic
DeviceSQL API for dynamic data access.
- Most embedded client-server databases provide no capability to integrate and
use legacy data management functions written in C/C++. In addition, existing
applications must be re-architected to call the database's APIs. By contrast, DeviceSQL
allows developers to incorporate existing C code in DeviceSQL language
statements as well as export DeviceSQL-compiled C code into existing applications with a
user-specified calling interface.
- Most embedded client-server databases do not allow databases to be split across
multiple types of persistent storage media such as RAM, flash, disk, etc. DeviceSQL
allows individual tables to be placed in different storage media, allowing
developers to select the optimum place to store data by matching data usage
with media characteristics and system resources.
- DeviceSQL requires far less memory (typically 90% less) than most embedded
client-server databases for both code and data and has proven to actually reduce
overall memory consumption in many applications. Embedded databases typically
increase the memory footprint and slow performance for applications.
- Based on comparisons with published benchmarks, DeviceSQL-enabled
data management code is generally much faster (roughly 15-50x faster) than most
embedded databases on non-updating transactions, and comparable or considerably
faster on updating operations.
- DeviceSQL supports data integrity features such as crash recovery
and rollback, while many in-memory-only databases cannot support these features
and therefore do not offer full ACID compliance (Atomicity, Consistency,
Isolation, Durability).
Operational Comparisons
Interpreted vs. Compiled
Client-server embedded databases have no mechanism for
compiling data access operations to machine code. Instead, an application
interacts with a server at runtime by passing a C string containing a SQL query
or command, along with the necessary parameter bindings and output mechanism, via a
generic API call. The server parses the string and associated data for syntactic
correctness, analyzes them for semantic correctness, determines a query plan,
generates interpreted execution code and, finally, executes that code and
returns the results through the output mechanism.
This model has a number of implications.
- The server must include the parser, query plan generator, and
interpreter as part of the runtime system. It is assumed that errors in syntax
or semantics can be discovered and handled at runtime. Not only does this
require the presence of an ad-hoc query processing facility, it also requires
operational machinery to handle runtime error-checking for parser errors and
semantic errors in SQL queries in the client application.
- The server, having no idea what queries will be requested,
is forced to include code to support all possible type operations and query operators as
part of the runtime system. This increases memory requirements even though,
perhaps, only a small subset of the code is exercised in the lifetime of the
device.
- Complex expressions, even purely arithmetic ones, are evaluated
by out-of-line function calls. In non-updating table scans involving arithmetic
expressions, the cost of interpreting versus compiling is a factor of 10x or
more.
- Because a client-server database has
to interpret references to potentially unknown tables and columns along with
their types and other data in expressions, it must include all of the metadata
for the system as part of the system itself, increasing memory requirements.
By contrast, with DeviceSQL's compiled in-line execution model, the entire user
data management application is analyzed and compiled in advance. The DeviceSQL runtime
system does not require a SQL parser, query generator, or interpreter of any
kind. The DeviceSQL runtime system is specific to the application and includes
support only for those data types and table operations that are actually used.
With DeviceSQL, SQL type operations expressed in the DeviceSQL language are
analyzed for syntactic and semantic errors at build time, so the data
management part of the executable will be significantly smaller, faster and
more deterministic than a client-server database implementation.
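The difference between the two execution models can be made concrete with a small C sketch. It is an illustration only, not DeviceSQL output or any vendor's API: the `reading_t` schema, the sample rows and both function names are hypothetical. The first function walks a small "plan" of function pointers for every row, roughly as an interpreter would; the second shows the kind of inline scan a build-time compiler could emit for the same fixed query (count the readings whose Fahrenheit value exceeds a limit).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed schema, known at build time. */
typedef struct {
    int sensor_id;
    int temp_c;
} reading_t;

static const reading_t readings[] = {
    {1, 20}, {2, 35}, {3, 40}, {4, 10}
};
#define READING_COUNT (sizeof readings / sizeof readings[0])

/* Interpreter-style evaluation: one out-of-line call per operator. */
typedef int (*binop_fn)(int, int);
static int op_mul(int a, int b) { return a * b; }
static int op_div(int a, int b) { return a / b; }
static int op_add(int a, int b) { return a + b; }

typedef struct { binop_fn fn; int operand; } step_t;

/* Stand-in for a plan that a parser would build at runtime:
   ((temp_c * 9) / 5) + 32 */
static const step_t plan[] = { {op_mul, 9}, {op_div, 5}, {op_add, 32} };

size_t count_hot_interpreted(int limit_f)
{
    size_t hits = 0;
    for (size_t i = 0; i < READING_COUNT; i++) {
        int v = readings[i].temp_c;
        for (size_t s = 0; s < sizeof plan / sizeof plan[0]; s++) {
            assert(plan[s].fn != NULL);
            v = plan[s].fn(v, plan[s].operand);   /* indirect call per step */
        }
        if (v > limit_f) hits++;
    }
    return hits;
}

/* Compiled-style evaluation: the expression sits inline in the scan loop. */
size_t count_hot_inline(int limit_f)
{
    size_t hits = 0;
    for (size_t i = 0; i < READING_COUNT; i++) {
        if (readings[i].temp_c * 9 / 5 + 32 > limit_f) hits++;
    }
    return hits;
}
```

Both functions return the same count for any limit; the inline version simply needs no plan table, no operator library and no indirect calls, which is the size and speed argument made above.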
OS and File System Requirements
OS and file system requirements are built into the basic
architecture of most embedded client-server databases. Most of these databases
will not run on systems that do not have an operating system or provide basic
file system support. DeviceSQL has no operating system requirements and
persistent storage is accessed via user configurable Storage Manager services.
This allows developers to map tables directly onto the various storage devices
in the target system, and to take advantage of the characteristics of each type
of storage. Tables may be stored "in memory" (or RAM-only) if they hold data
that is no longer needed once the device is powered down. Alternatively, tables
may be stored persistently in different types of storage media appropriate to
the application and the cost model of the device.
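As a rough illustration of per-table storage mapping, the C sketch below routes each table through a small set of storage callbacks. The paper does not describe the actual Storage Manager interface, so everything here is an assumption: `storage_ops_t`, `table_t`, the two simulated media and the table names are hypothetical, and plain arrays stand in for RAM and a flash driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical storage-manager interface: one pair of callbacks per medium. */
typedef struct {
    int (*write)(size_t offset, const void *src, size_t len);
    int (*read)(size_t offset, void *dst, size_t len);
} storage_ops_t;

/* Two simulated media; a real target would wrap RAM and a flash driver. */
#define AREA_SIZE 64
static unsigned char ram_area[AREA_SIZE];
static unsigned char flash_area[AREA_SIZE];

static int in_range(size_t off, size_t len)
{
    return off <= AREA_SIZE && len <= AREA_SIZE - off;
}

static int ram_write(size_t off, const void *src, size_t len)
{
    if (!in_range(off, len)) return -1;
    memcpy(ram_area + off, src, len);
    return 0;
}

static int ram_read(size_t off, void *dst, size_t len)
{
    if (!in_range(off, len)) return -1;
    memcpy(dst, ram_area + off, len);
    return 0;
}

static int flash_write(size_t off, const void *src, size_t len)
{
    if (!in_range(off, len)) return -1;
    memcpy(flash_area + off, src, len);
    return 0;
}

static int flash_read(size_t off, void *dst, size_t len)
{
    if (!in_range(off, len)) return -1;
    memcpy(dst, flash_area + off, len);
    return 0;
}

static const storage_ops_t ram_store   = { ram_write,   ram_read   };
static const storage_ops_t flash_store = { flash_write, flash_read };

typedef struct {
    const storage_ops_t *store;  /* medium chosen per table at build time */
    size_t base;                 /* start of this table within the medium */
} table_t;

static const table_t session_cache = { &ram_store,   0 };  /* lost at power-down */
static const table_t user_settings = { &flash_store, 0 };  /* kept across resets */

/* Store one integer row; returns 0 on success, -1 if out of range. */
int table_put(const table_t *t, size_t row, int value)
{
    assert(t != NULL);
    return t->store->write(t->base + row * sizeof value, &value, sizeof value);
}

/* Load one integer row; returns 0 on success, -1 if out of range. */
int table_get(const table_t *t, size_t row, int *value)
{
    assert(t != NULL && value != NULL);
    return t->store->read(t->base + row * sizeof *value, value, sizeof *value);
}
```

Application code calls `table_put(&user_settings, 2, 1234)` without knowing which medium backs the table; moving a table to another medium changes only its `table_t` initializer.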
DeviceSQL implements persistent transaction support by
utilizing a no-overwrite strategy for updates and by using a dirty-bit approach
to keep crash-recovery-relevant state in a single, persistent data store. With DeviceSQL,
a single COMMIT operation updates the persistent store while still enabling full
crash recovery. Use of a single persistent storage area means that DeviceSQL does
not require a file system. Implementations of persistent storage can be as
simple as arrays of persistent memory in flash.
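The general idea of a no-overwrite update committed by flipping one flag can be sketched in a few lines of C. This is a generic shadow-copy illustration under stated assumptions, not DeviceSQL's actual on-media format, which the paper does not describe; `persistent_cell_t` and the `cell_*` functions are hypothetical names, and a static struct stands in for an array in flash or NVRAM.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated persistent area (for example, a struct placed in flash or NVRAM). */
typedef struct {
    uint8_t active;    /* which slot holds committed data: 0 or 1 */
    int32_t slot[2];   /* two copies; the inactive one receives updates */
} persistent_cell_t;

static persistent_cell_t cell = { 0, { 100, 0 } };

/* Committed value: always read through the selector. */
int32_t cell_read(void)
{
    assert(cell.active <= 1);
    return cell.slot[cell.active];
}

/* Stage an update in the inactive slot; committed data is never overwritten. */
void cell_stage(int32_t value)
{
    cell.slot[cell.active ^ 1u] = value;
}

/* Commit with one small write that flips the selector. */
void cell_commit(void)
{
    cell.active ^= 1u;
}
```

If power fails after `cell_stage()` but before `cell_commit()`, the selector still points at the old slot, so recovery amounts to reading `active`; no journal file and no file system are involved.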
Disk-Based vs. Memory-Resident
In most client-server embedded databases, table data is
disk-based and dynamic memory is treated as a cache into which data is paged,
as needed, for processing. With DeviceSQL, data is usually RAM-based and
flushed to or read from a persistent store only at transaction boundaries (commit
or rollback), and then only if data in persistent tables has been modified.
When persistent store can be accessed as RAM (e.g., NVRAM), there may be
no data copying at all. Specifically, for pure reads of data from tables, DeviceSQL
requires no access to persistent store. In comparison, most embedded databases
require reading a page into memory from disk.
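A minimal C sketch of this memory-resident pattern follows. It assumes a single small integer table and hypothetical names (`tbl_read`, `tbl_update`, `tbl_commit`, `tbl_rollback`); an ordinary array stands in for the persistent store, and the write counter exists only to make the behavior visible.

```c
#include <assert.h>
#include <string.h>

#define ROWS 4

static int persistent[ROWS];        /* simulated persistent store */
static int working[ROWS];           /* RAM-resident table used by all reads */
static int dirty;                   /* set by any update since the last boundary */
static unsigned persistent_writes;  /* counts flushes, for illustration only */

/* Pure read: served from RAM, never touches the persistent store. */
int tbl_read(int row)
{
    assert(row >= 0 && row < ROWS);
    return working[row];
}

/* Update the RAM copy and remember that a flush will be needed. */
void tbl_update(int row, int value)
{
    assert(row >= 0 && row < ROWS);
    working[row] = value;
    dirty = 1;
}

/* Commit: flush only if something was modified. */
void tbl_commit(void)
{
    if (!dirty) return;
    memcpy(persistent, working, sizeof persistent);
    persistent_writes++;
    dirty = 0;
}

/* Rollback: discard uncommitted changes by reloading the committed copy. */
void tbl_rollback(void)
{
    if (!dirty) return;
    memcpy(working, persistent, sizeof working);
    dirty = 0;
}
```

A commit with no pending changes returns immediately, so read-only transactions cost no persistent-store access at all.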
A disk-based model, used by many client-server databases,
entails certain costs. It may impose the kinds of operating system and file
system requirements described earlier. The caching mechanism is complex and
significantly increases the size of the runtime. This price is paid regardless
of whether a particular data set is intended to be persistent and even when
this persistent store supports random access. Execution times for even pure
read accesses may be unpredictable. Additional access time and space, relative
to a memory-based scheme, are required to access data, regardless of whether
that data is cached or on disk.
Transaction support (commit/rollback) in this environment
requires maintaining a separate journal file in addition to the database. Every
update may involve not only a write to the database, but also access to the
journal (commonly known as a transaction log) to record previous values in
order to make rollback feasible. Crash recovery in this environment is far more
complicated and time consuming. Indexes must be maintained on disk along with
tables, which requires larger and more complex index structures and can lead to
unpredictable performance even for indexed access.
Data Model Comparison
API vs. Higher-Level Language
The consequences and benefits of providing a higher-level
language interface for application data are significant. With DeviceSQL, device
data management functions are expressed in a more powerful, higher-level way
using the DeviceSQL language. The DeviceSQL language is based on Oracle's PL/SQL,
the world's most popular procedural language for data-centric development, and
has been optimized for device software development. This forms the basis for a
compiled custom interface and supports strong typing. This is very important
from a performance standpoint in that data can be processed efficiently
directly within the data management component without having to move it to
structures outside that component. The DeviceSQL language provides a much
higher level of semantic expressiveness than C or C++, which, along with strong
typing, leads to greater development productivity and improved code quality,
reliability and portability.
The style of functional interface provided by the DeviceSQL Framework
allows the creation of data management components and solutions with an
abstraction layer so that external components can remain independent of the
data management system's definition and implementation.
Most embedded databases provide no higher-level procedural
language support for data manipulation.
Opaque vs. Import / Export
Within most embedded databases, there is no mechanism for
extending the operators and types supported other than by modifying the source
code. DeviceSQL freely allows external functions to be included (Imported)
within the data management system, and even used in query evaluation.
Symmetrically, functions defined in DeviceSQL language can be freely called
(Exported) by external modules written in C, C++ or Java.
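The import and export directions can be pictured with a short C sketch. It is hypothetical throughout: `poi_t`, `legacy_dist_sq` and `pois_within` are invented for illustration, and the body of the exported function merely stands in for code that a build-time compiler might generate for a query such as "select id where legacy_dist_sq(x, y) <= radius * radius".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table row and contents. */
typedef struct { int id; int x; int y; } poi_t;

static const poi_t pois[] = { {1, 0, 3}, {2, 6, 8}, {3, 1, 1} };

/* "Imported": a legacy C helper the application already owns,
   used directly inside the query predicate. */
int legacy_dist_sq(int x, int y)
{
    return x * x + y * y;
}

/* "Exported": a calling interface chosen by the application developer.
   Writes matching ids to ids_out and returns how many were written. */
size_t pois_within(int radius, int *ids_out, size_t max_ids)
{
    size_t n = 0;
    assert(ids_out != NULL);
    for (size_t i = 0; i < sizeof pois / sizeof pois[0] && n < max_ids; i++) {
        if (legacy_dist_sq(pois[i].x, pois[i].y) <= radius * radius)
            ids_out[n++] = pois[i].id;
    }
    return n;
}
```

The rest of the application sees only `pois_within()`, an ordinary C function with a signature it chose, while the legacy helper is reused unchanged inside the data management code.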
Support for Data Streams
Most embedded client-server databases have operators that
apply only to relatively static data stored in tables. DeviceSQL was designed
to bring the power of relational data management concepts normally reserved for
tables and extend it to apply to dynamic data streams. This allows the benefits
of relational data concepts to be applied directly to data from dynamic stream
sources such as network connections, sensors and other I/O commonly found in
connected, intelligent devices. This enables a much richer data management
foundation for device applications and makes more efficient use of system
resources than with the conventional "store then process" model used by most
database technologies.
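One way to picture relational operations over a stream is a row-source interface that a table and a live input can both implement. The C sketch below is an assumption-laden illustration rather than DeviceSQL's stream mechanism: `row_source_t`, `array_next` and `max_at_least` are hypothetical, and an in-memory array stands in for samples that a real source would pull from a sensor or network driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row source: next() returns 1 and fills *out,
   or returns 0 when no more rows are available. */
typedef struct row_source {
    int (*next)(struct row_source *self, int *out);
    const int *data;   /* backing samples (table rows or a driver buffer) */
    size_t len;
    size_t pos;
} row_source_t;

/* Array-backed source; a sensor-backed source would poll a driver instead. */
static int array_next(row_source_t *self, int *out)
{
    if (self->pos >= self->len) return 0;
    *out = self->data[self->pos++];
    return 1;
}

/* Equivalent of "SELECT MAX(v) WHERE v >= min_value" over any source.
   Each row is processed as it arrives; nothing is stored first.
   Returns 1 and sets *max_out if any row qualified, otherwise 0. */
int max_at_least(row_source_t *src, int min_value, int *max_out)
{
    int v, found = 0;
    assert(src != NULL && max_out != NULL);
    while (src->next(src, &v)) {
        if (v >= min_value && (!found || v > *max_out)) {
            *max_out = v;
            found = 1;
        }
    }
    return found;
}
```

Because the query logic depends only on `next()`, the same function serves a stored table and a live feed, and memory use stays constant regardless of how many samples pass through.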
Summary
There are major differences in performance and suitability
between embedded client-server database technologies and the device data
management components and solutions that can be built with the Encirq DeviceSQL
Framework. These differences stem from fundamental differences in design goals
and architecture. Most embedded client-server databases are designed to perform
as general-purpose, dynamic SQL engines for static table data within an ad-hoc
query environment with concurrent clients. They assume clients will generate
new tables and new queries dynamically within a running system, and that the
size of the data set is large, relatively unbounded and unpredictable.
By contrast, the DeviceSQL approach trades dynamism with
respect to ad-hoc queries for higher performance and determinism, reduced
memory usage and greater control and flexibility for supporting the kind of
data sources and storage mediums used in complex devices. It is designed for
systems that are self-contained and capable of running continuously without
modification. This unique, breakthrough approach brings many of the benefits of
relational database type functionality - and more - to developers and their
products. It also allows developers to leverage more-efficient, data-centric
approaches that make it easier to solve challenging device data management issues
in their designs.
About Encirq