Database Development: Contents
- Database
management systems
- Relational
database management systems
- Post-relational
database models
- Object
database models
- DBMS
internals
- Storage
and physical database design
- Indexing
- Transactions
and concurrency
- Replication
- Security
- Locking
- Architecture
- Applications
of databases
Database Development: Database
management systems
Main article: Database management system
Database Development: Relational database
management systems
An RDBMS implements the features of the relational
model outlined above. In this context, Date's
Information Principle states:
The entire information content
of the database is represented in one and only
one way. Namely as explicit values in column positions
(attributes) and rows in relations (tuples) Therefore,
there are no explicit pointers between related
tables.
Database Development: Post-relational database
models
Several products have been identified as post-relational
because the data model incorporates relations
but is not constrained by the Information Principle,
requiring that all information is represented
by data values in relations. Products using a
post-relational data model typically employ a
model that actually pre-dates the relational model.
These might be identified as a directed graph
with trees on the nodes.
Examples of models that could
be classified as post-relational are PICK aka
MultiValue, and MUMPS.
Database Development: Object database models
In recent years, the object-oriented paradigm
has been applied to database technology, creating
a new programming model known as object databases.
These databases attempt to bring the database
world and the application programming world closer
together, in particular by ensuring that the database
uses the same type system as the application program.
This aims to avoid the overhead (sometimes referred
to as the impedance mismatch) of converting information
between its representation in the database (for
example as rows in tables) and its representation
in the application program (typically as objects).
At the same time, object databases attempt to
introduce the key ideas of object programming,
such as encapsulation and polymorphism, into the
world of databases.
A variety of these ways have
been tried for storing objects in a database.
Some products have approached the problem from
the application programming end, by making the
objects manipulated by the program persistent.
This also typically requires the addition of some
kind of query language, since conventional programming
languages do not have the ability to find objects
based on their information content. Others have
attacked the problem from the database end, by
defining an object-oriented data model for the
database, and defining a database programming
language that allows full programming capabilities
as well as traditional query facilities.
Database Development: DBMS internals
Database Development: Storage and physical
database design
Main article: Database storage structures
Please help improve this section by expanding
it. Further information might be found on the
talk page or at requests for expansion. (June
2008)
Database tables/indexes are
typically stored in memory or on hard disk in
one of many forms, ordered/unordered flat files,
ISAM, heaps, hash buckets or B+ trees. These have
various advantages and disadvantages discussed
further in the main article on this topic. The
most commonly used are B+ trees and ISAM.
Other important design choices
relate to the clustering of data by category (such
as grouping data by month, or location), creating
pre-computed views known as materialized views,
partitioning data by range or hash. As well memory
management and storage topology can be important
design choices for database designers. Just as
normalization is used to reduce storage requirements
and improve the extensibility of the database,
conversely denormalization is often used to reduce
join complexity and reduce execution time for
queries. [1]
Database Development: Indexing
All of these databases can take advantage of indexing
to increase their speed. This technology has advanced
tremendously since its early uses in the 1960s
and 1970s. The most common kind of index is a
sorted list of the contents of some particular
table column, with pointers to the row associated
with the value. An index allows a set of table
rows matching some criterion to be located quickly.
Typically, indexes are also stored in the various
forms of data-structure mentioned above (such
as B-trees, hashes, and linked lists). Usually,
a specific technique is chosen by the database
designer to increase efficiency in the particular
case of the type of index required.
Relational DBMS's have the advantage
that indexes can be created or dropped without
changing existing applications making use of it.
The database chooses between many different strategies
based on which one it estimates will run the fastest.
In other words, indexes are transparent to the
application or end-user querying the database;
while they affect performance, any SQL command
will run with or without index to compute the
result of an SQL statement. The RDBMS will produce
a plan of how to execute the query, which is generated
by analyzing the run times of the different algorithms
and selecting the quickest. Some of the key algorithms
that deal with joins are nested loop join, sort-merge
join and hash join. Which of these is chosen depends
on whether an index exists, what type it is, and
its cardinality.
An index speeds up access to
data, but it has disadvantages as well. First,
every index increases the amount of storage on
the hard drive necessary for the database file,
and second, the index must be updated each time
the data are altered, and this costs time. (Thus
an index saves time in the reading of data, but
it costs time in entering and altering data. It
thus depends on the use to which the data are
to be put whether an index is on the whole a net
plus or minus in the quest for efficiency.)
A special case of an index is
a primary index, or primary key, which is distinguished
in that the primary index must ensure a unique
reference to a record. Often, for this purpose
one simply uses a running index number (ID number).
Primary indexes play a significant role in relational
databases, and they can speed up access to data
considerably.
Database Development: Transactions and concurrency
In addition to their data model, most practical
databases ("transactional databases")
attempt to enforce a database transaction . Ideally,
the database software should enforce the ACID
rules, summarized here:
Atomicity: Either all the tasks
in a transaction must be done, or none of them.
The transaction must be completed, or else it
must be undone (rolled back).
Consistency: Every transaction must preserve the
integrity constraints — the declared consistency
rules — of the database. It cannot place
the data in a contradictory state.
Isolation: Two simultaneous transactions cannot
interfere with one another. Intermediate results
within a transaction are not visible to other
transactions.
Durability: Completed transactions cannot be aborted
later or their results discarded. They must persist
through (for instance) restarts of the DBMS after
crashes
In practice, many DBMSs allow most of these rules
to be selectively relaxed for better performance.
Concurrency control is a method
used to ensure that transactions are executed
in a safe manner and follow the ACID rules. The
DBMS must be able to ensure that only serializable,
recoverable schedules are allowed, and that no
actions of committed transactions are lost while
undoing aborted transactions.
Database Development: Replication
Replication of databases is closely related to
transactions. If a database can log its individual
actions, it is possible to create a duplicate
of the data in real time. The duplicate can be
used to improve performance or availability of
the whole database system. Common replication
concepts include:
Master/Slave Replication: All
write requests are performed on the master and
then replicated to the slaves
Quorum: The result of Read
and Write requests are calculated by querying
a "majority" of replicas.
Multimaster: Two or more replicas
sync each other via a transaction identifier.
Parallel synchronous replication of databases
enables transactions to be replicated on multiple
servers simultaneously, which provides a method
for backup and security as well as data availability.
Database Development: Security
Database security denotes the system, processes,
and procedures that protect a database from unintended
activity.
Security is usually enforced
through access control, auditing, and encryption.
Access control ensures and restricts
who can connect and what can be done to the database.
Auditing logs what action or
change has been performed, when and by whom.
Encryption: Since security has become a major
issue in recent years, many commercial database
vendors provide built-in encryption mechanism.
Data is encoded natively into the tables and deciphered
"on the fly" when a query comes in.
Connections can also be secured and encrypted
if required using DSA, MD5, SSL or legacy encryption
standard.
Enforcing security is one of the major tasks of
the DBA.
In the United Kingdom, legislation
protecting the public from unauthorized disclosure
of personal information held on databases falls
under the Office of the Information Commissioner.
United Kingdom based organizations holding personal
data in electronic format (databases for example)
are required to register with the Data Commissioner.[2]
Database Development: Locking
Please help improve this section by expanding
it. Further information might be found on the
talk page or at requests for expansion. (June
2008)
Locking is how the database
handles multiple concurrent operations. This is
how concurrency and some form of basic integrity
is managed within the database system. Such locks
can be applied on a row level, or on other levels
like page (a basic data block), extend (multiple
array of pages) or even an entire table. This
helps maintain the integrity of the data by ensuring
that only one process at a time can modify the
same data.
Unlike a basic filesystem files
or folders, where only one lock at the time can
be set, restricting the usage to one process only.
A database can set and hold mutiple locks at the
same time on the different level of the physical
data structure. How locks are set, last is determined
by the database engine locking scheme based on
the submitted SQL or transactions by the users.
Generally speaking, no activity on the database
should be translated by no or very light locking.
For most DBMS systems existing
on the market, locks are generally shared or exclusive.
Exclusive locks mean that no other lock can acquire
the current data object as long as the exclusive
lock lasts. Exclusive locks are usually set while
the database needs to change data, like during
an UPDATE or DELETE operation.
Shared locks can take ownership
one from the other of the current data structure.
Shared locks are usually used while the database
is reading data, during a SELECT operation. The
number, nature of locks and time the lock holds
a data block can have a huge impact on the database
performances. Bad locking can lead to disastrous
performance response (usually the result of poor
SQL requests, or inadequate database physical
structure)
Default locking behavior is
enforced by the isolation level of the dataserver.
Changing the isolation level will affect how shared
or exclusive locks must be set on the data for
the entire database system. Default isolation
is generally 1, where data can not be read while
it is modified, forbidding to return "ghost
data" to end user.
At some point intensive or inappropriate
exclusive locking, can lead to the "dead
lock" situation between two locks. Where
none of the locks can be released because they
try to acquire resources mutually from each other.
The Database has a fail safe mechanism and will
automatically "sacrifice" one of the
locks releasing the resource. Doing so processes
or transactions involved in the "dead lock"
will be rolled back.
Databases can also be locked
for other reasons, like access restrictions for
given levels of user. Databases are also locked
for routine database maintenance, which prevents
changes being made during the maintenance. See
"Locking tables and databases" (section
in some documentation / explanation from IBM)
for more detail.)
Database Development: Architecture
Depending on the intended use, there are a number
of database architectures in use. Many databases
use a combination of strategies. On-line Transaction
Processing systems (OLTP) often use a row-oriented
datastore architecture, while data-warehouse and
other retrieval-focused applications like Google's
BigTable, or bibliographic database(library catalogue)
systems may use a Column-oriented DBMS architecture.
Document-Oriented, XML, Knowledgebases,
as well as frame databases and rdf-stores (aka
Triple-Stores), may also use a combination of
these architectures in their implementation.
Finally it should be noted that
not all database have or need a database 'schema'
(so called schema-less databases).
Also there are other types of
database which cannot be classified as relational
databases
Database Development: Applications of databases
Databases are used in many applications, spanning
virtually the entire range of computer software.
Databases are the preferred method of storage
for large multiuser applications, where coordination
between many users is needed. Even individual
users find them convenient, and many electronic
mail programs and personal organizers are based
on standard database technology. Software database
drivers are available for most database platforms
so that application software can use a common
Application Programming Interface to retrieve
the information stored in a database. Two commonly
used database APIs are JDBC and ODBC. |