Unit 3: Introduction to Relational Database, SQL and Relational Model Basics
Relational Database
A relational database is a
collection of data items organized as a set of
formally-described tables from which data can be accessed or
reassembled in many different ways without having to reorganize the database
tables. The relational database was invented by E. F. Codd at IBM in
1970.
A relational database is a
digital database whose organization is based on the relational
model of data, as proposed by E.F. Codd in 1970. This model
organizes data into one or more tables (or "relations") of rows and
columns, with a unique key for each row. Generally, each entity type described
in a database has its own table, the rows representing instances of that type
of entity and the columns representing values attributed to that instance.
Because each row in a table has its own unique key, rows in a table can be
linked to rows in other tables by storing the unique key of the row to which it
should be linked (where such unique key is known as a "foreign key").
Codd showed that data relationships of arbitrary complexity can be represented
using this simple set of concepts.
Relational Model
The relational model for database management
is a database model based on first-order
predicate logic, first formulated and proposed in 1969 by Edgar F. Codd. In
the relational model of a database, all data is represented in terms of tuples, grouped into relations. A database organized in terms of the
relational model is a relational database.
Diagram of an example database according to the Relational model.
In the relational
model, related records are linked together with a "key".
Relations or Table
A relation is defined as a set
of tuples that have the same attributes. A Tuple usually
represents an object and information about that object. Objects are typically
physical objects or concepts. A relation is usually described as a table,
which is organized into rows and columns. All the data
referenced by an attribute are in the same domain and conform to the
same constraints.
The relational model specifies that the
tuples of a relation have no specific order and that the tuples, in turn,
impose no order on the attributes. Applications access data by specifying
queries, which use operations such as select to identify
tuples, project to identify attributes, and join to combine
relations. Relations can be modified using the insert, delete,
and update operators. New tuples can supply explicit values or be
derived from a query. Similarly, queries identify tuples for updating or
deleting.
Tuples by definition are unique. If the Tuple
contains a candidate or primary key then obviously it is unique;
however, a primary key need not be defined for a row or record to be a tuple.
The definition of a tuple requires that it be unique, but does not require a
primary key to be defined. Because a tuple is unique, its attributes by
definition constitute a super key.
Domain
A domain
describes the set of possible values for a given attribute, and can be
considered a constraint on the value of the attribute. Mathematically,
attaching a domain to an attribute means that any value for the attribute must
be an element of the specified set. The character string "ABC",
for instance, is not in the integer domain, but the integer value123 is.
Another example of domain describes the possible values for the field
"Gender" as ("Male, "Female"). So, the field
"Gender" will not accept input values like (0.1) or (M,F).
Optimization
A DBMS often has a choice about the access path for retrieving data. For
example, the DBMS can use an index (fast lookup for specific entries) or scan
the entire table to retrieve the appropriate rows. In addition, in statements
in which two tables are joined, the DBMS can choose which table to examine
first (join order) and how to join the tables (join strategy). Optimization means
that DBMS makes the best (optimal) choice of access paths, join order, and join
strategy. True query optimization means that the DBMS will usually make a good
choice regardless of how the query is written. The optimizer does not
necessarily make the best choice, just a good one.
Derby can use
indexes to improve the performance of DML (data manipulation language)
statements such as queries, updates, and deletes. The query optimizer can make
decisions about whether to use an index for a particular table (access path)
and also makes decisions about join order, type of join, and a few other matters.
Database Catalog
In computing,
a catalog is a directory of information about data sets, files, or a database .
A catalog usually describes where a data set, file or database entity is
located and may also include other information, such as the type of device on
which each data set or file is stored.
The database catalog of
a database instance consists of metadata in which
definitions of database objects such as base tables, views (virtual
tables), synonyms, value ranges, indexes, users, and user groups are
stored.
The SQL standard specifies a uniform means
to access the catalog, called the INFORMATION_SCHEMA, but not
all databases follow this, even if they implement other aspects of
the SQL standard. For an example of database-specific metadata access
methods, see Oracle metadata.
Views
A view is a
subset of a database that is generated from a query and stored as a permanent
object. Although the definition of a view is permanent, the data contained
therein is dynamic depending on the point in time at which the view is
accessed.
Views represent a subset of the data contained in a table. They can join and simplify multiple tables into one virtual table. They take up very little storage space because the database contains only the view definition, not the data. Furthermore, they can provide results for different calculations (like sum and average) along with the stored data, and can limit the degree to which tables are exposed to the outer world.
Purpose of
View
The views are required for several reasons. Some of
them are as follows:
1. It can b used
to hide sensitive columns. To do this the owner of the view must grant the
SELECT privilege on the view and revoke the privilege on the underlying tables.
2. It can be used
to hide complex queries involving multiple tables. Users need not concern with
the syntax of complex queries.
3. View created
with a check option, prevents the updating of other rows and columns.
4. It allows the
users to see the same data in different Views, at the same time.
5. It can
simplify complex operations on the base relations.
Transactions
A transaction is a unit of
work that is performed against a database. Transactions are units or sequences
of work accomplished in a logical order, whether in a manual fashion by a user
or automatically by some sort of a database program.
A transaction is the
propagation of one or more changes to the database. For example, if you are
creating a record or updating a record or deleting a record from the table,
then you are performing transaction on the table. It is important to control
transactions to ensure data integrity and to handle database errors.
Practically, you will club
many SQL queries into a group and you will execute all of them together as a
part of a transaction.
Transaction Properties
There are four
important properties of transaction that a DBMS must ensure to maintain data in the
case of concurrent access and system failures. These are:
·
Atomicity:
(all or nothing)
This property
states that a transaction must be treated as an atomic unit, that is, either
all of its operations are executed or none. There must be no state in a
database where a transaction is left partially completed. States should be
defined either before the execution of the transaction or after the
execution/abortion/failure of the transaction.
A transaction
is said to be atomic if a transaction always executes all its actions in one
step or not executes any actions at all It means either all or none of the
transactions operations are performed.
·
Consistency:
(No violation of integrity constraints)
The database
must remain in a consistent state after any transaction. No transaction should
have any adverse effect on the data residing in the database. If the database
was in a consistent state before the execution of a transaction, it must remain
consistent after the execution of the transaction as well.
A transaction
must preserve the consistency of a database after the execution. The
DBMS assumes that this property holds for each transaction. Ensuring this
property of a transaction is the responsibility of the user.
·
Isolation:
(concurrent changes invisibles)
In a database
system where more than one transaction are being executed simultaneously and in
parallel, the property of isolation states that all the transactions will be
carried out and executed as if it is the only transaction in the system. No
transaction will affect the existence of any other transaction.
The
transactions must behave as if they are executed in isolation. It means that if
several transactions are executed concurrently the results must be same as if
they were executed serially in some order. The data used during the execution
of a transaction cannot be used by a second transaction until the first one is
completed.
·
Durability:
(committed update persist)
The database
should be durable enough to hold all its latest updates even if the system
fails or restarts. If a transaction updates a chunk of data in a database and
commits, then the database will hold the modified data. If a transaction
commits but the system fails before the data could be written on to the disk,
then that data will be updated once the system springs back into action.
The effect of
completed or committed transactions should persist even after a crash. It means
once a transaction commits, the system must guarantee that the result of its
operations will never be lost, in spite of subsequent failures.
The acronym ACID
is sometimes used to refer above four properties of transaction that we have
presented here: Atomicity, Consistency, Isolation, and Durability.
In short, we can summarize
the properties of transaction as follows;
·
Atomicity: ensures that all operations within the work unit are
completed successfully; otherwise, the transaction is aborted at the point of
failure, and previous operations are rolled back to their former state.
·
Consistency: ensures that the database properly changes states upon a
successfully committed transaction.
·
Isolation: enables transactions to operate independently of and
transparent to each other.
·
Durability: ensures that the result or effect of a committed
transaction persists in case of a system failure.
Process of Transaction
The
transaction is executed as a series of reads and writes of database objects,
which are explained below:
·
Read Operation
To read a
database object, it is first brought into main memory from
disk, and then its value is copied into a program variable as shown in figure.
·
Write Operation
To write a
database object, an in-memory copy of the object is first modified and then
written to disk.
States of Transactions
A transaction in a
database can be in one of the following states.
·
Active − In this state, the transaction is
being executed. This is the initial state of every transaction.
·
Partially Committed − When a
transaction executes its final operation, it is said to be in a partially
committed state.
·
Failed − A transaction is said to be in a
failed state if any of the checks made by the database recovery system fails. A
failed transaction can no longer proceed further.
·
Aborted − If any of the checks fails and the
transaction has reached a failed state, then the recovery manager rolls back
all its write operations on the database to bring the database back to its
original state where it was prior to the execution of the transaction.
Transactions in this state are called aborted. The database recovery module can
select one of the two operations after a transaction aborts −
·
Re-start the transaction
·
Kill the transaction
·
Committed − If a transaction executes all its
operations successfully, it is said to be committed. All its effects are now
permanently established on the database system.
Advantages of Concurrent Execution of Transaction
The DBMS
interleaves the actions of different transactions to improve performance of
system as discussed below:
• Improved
Throughput: Consider that transaction are performed in serial order and
active transaction is waiting for a page to be read in from disk, then instead
of CPU waiting
for a page, it can process another transaction. This is because Input/output
activity can be done in parallel with the CPU activity. The overlapping of Input/output
activities of CPU reduces the amount of time disks and processors are idle and
increases system throughput (the average number of transaction completed in a
given time.)
• Reduced
Waiting time: Interleaved execution of a short transaction with a long
transaction usually allows the short transaction to complete quickly. In serial
execution a short transaction could get stuck behind a long transaction leading
to unpredictable delays in response time or average time taken to complete a
transaction.
Overview of SQL
SQL stands for
Structured Query Language. SQL is used to communicate with a database. According
to ANSI (American National Standards Institute), it is the standard language
for relational database management systems. SQL statements are used to perform
tasks such as update data on a database, or retrieve data from a database. Some
common relational database management systems that use SQL are: Oracle, Sybase,
Microsoft SQL Server, Access, Ingres, etc. Although most database systems use
SQL, most of them also have their own additional proprietary extensions that
are usually only used on their system. However, the standard SQL commands such
as "Select", "Insert", "Update",
"Delete", "Create", and "Drop" can be used to
accomplish almost everything that one needs to do with a database. This tutorial
will provide you with the instruction on the basics of each of these commands
as well as allow you to put them to practice using the SQL Interpreter.
Structure Query
Language (SQL) is a programming language used for storing and managing data in
RDBMS. SQL was the first commercial language introduced for E.F Codd's Relational model.
Today almost all RDBMS (MySql, Oracle, Infomix, Sybase, MS Access)
uses SQL as the standard database language. SQL is used to perform
all type of data operations in RDBMS. It can be summarized in following points;
·
SQL is an interactive language
·
SQL is a database programming language
·
SQL is a database administration language
·
SQL is a client/server language
·
SQL is an internet data access language
·
SQL is a distributed database language
·
SQL is a database gateway language
·
SQL is a standard computer database
programming Language and its popularity has expired since two decades.
·
It is portable language which supports right
from mainframe systems to personal computers and even to hand held devices.
Features of SQL
·
SQL is a High Level Language
·
SQL is a free-format syntax which gives users
the ability to structure SQL statements on any part of the screen.
·
SQL is very flexible.
·
SQL can be embedded also in front end
application code like of VB 6.0, JAVA2, VC++ 6.0 and so on.
·
SQL based applications are Portable i.e. they
can be run on any system.
·
SQL works with database programs like DB2,
Oracle, MS Access, Sybase, and MS SQL Server etc.
·
SQL allows the user to create, update,
delete, and retrieve data from a database.
·
SQL is very simple and easy to learn.
Embedded SQL
Embedded SQL is a method of combining the computing power
of a programming language and the database manipulation capabilities of SQL.
Embedded SQL statements are SQL statements written in line with the program
source code of the host language. The embedded SQL statements are parsed by an embedded
SQL preprocessor and replaced by host-language calls to a code library.
The output from the preprocessor is then compiled by the host compiler.
This allows programmers to embed SQL statements in programs written in any
number of languages such as C/C++, COBOL and FORTRAN.
The SQL standards committee defined the embedded SQL
standard in two steps: a formalism called Module Language was
defined, and then the embedded SQL standard was derived from Module Language.
The SQL standard defines embedding of SQL as embedded SQL and the
language in which SQL queries are embedded is referred to as the host language.
A popular host language is C. The mixed C and embedded SQL is called Pro*C in Oracle and Sybase database
management systems. In the PostgreSQL database management system this
pre-compiler is called ECPG. Other embedded SQL pre-compilers are Pro*Ada,
Pro*COBOL, Pro*FORTRAN, Pro*Pascal, and Pro*PL/I.
Features
of Embedded SQL
1.
Embedded SQL is prefixed
by EXEC SQL to distinguish it from host language.
2.
Delimiter
terminates every embedded SQL.
3.
If embedded SQL
statements extend over multiple lines then some strategy is used for
continuation as used by the host language.
4.
Embedded SQL code
is submitte3d to SQL precompiler.
5.
Variables of host
language can be re3ferenced in embedded SQL statements to allow the values
calculated by the program to be used.
SQL
Constraints
SQL Constraints are
compulsory rules on Data columns in a Table. These rules are used to limit the
type of data inserted into a table. SQL Constraints ensures
the accuracy and reliability of the data in the database.
Constraints could be
column level or table level:
- Column level constraints are applied only to one
column
- Table level constraints are applied to the whole
table
The below are
commonly used constraints available in SQL:
·
NOT
NULL
Ø Makes sure that a column cannot have NULL
value.
·
DEFAULT
Ø Defines a default value for a column when none
is specified.
·
UNIQUE
Ø Makes sure that all values in a column are
different.
·
PRIMARY Key
Ø Uniquely identifies each rows/records in a
database table.
·
FOREIGN Key
Ø Uniquely identifies a rows/records in any
another database table.
·
CHECK
Ø Ensures that all values in a column satisfy
certain conditions.
·
INDEX
Ø Use to create and retrieve data from the
database very quickly.
Data
Integrity
The below are
categories of the Data Integrity that exist with each RDBMS:
·
Entity
Integrity
o Ensures that there are no duplicate rows in a
table.
·
Domain
Integrity
o Ensures that only valid entries are entered
for a given column by, restricting the type, the format, or the range of
values.
·
Referential
integrity
o Ensures that Rows cannot be deleted, which are
used by other records.
·
User-Defined
Integrity
o Administers some explicit rules that do not
fall into Entity, Domain or Referential Integrity.
Basic
Data Types Used in SQL
SQL data type is an attribute
that specifies type of data of any object. Each column, variable and expression
has related data type in SQL.
You would use these data types
while creating your tables. You would choose a particular data type for a table
column based on your requirement.
SQL Server offers six
categories of data types for your use:
Exact Numeric Data Types:
Data Type
|
From
|
To
|
Bigint
|
-9,223,372,036,854,775,808
|
9,223,372,036,854,775,807
|
Int
|
-2,147,483,648
|
2,147,483,647
|
smallint
|
-32,768
|
32,767
|
tinyint
|
0
|
255
|
Bit
|
0
|
1
|
decimal
|
-10^38 +1
|
10^38 -1
|
numeric
|
-10^38 +1
|
10^38 -1
|
money
|
-922,337,203,685,477.5808
|
+922,337,203,685,477.5807
|
smallmoney
|
-214,748.3648
|
+214,748.3647
|
Approximate
Numeric Data Types:
Data Type
|
From
|
To
|
float
|
-1.79E +
308
|
1.79E +
308
|
real
|
-3.40E +
38
|
3.40E +
38
|
Date and
Time Data Types:
Data Type
|
From
|
To
|
datetime
|
Jan 1,
1753
|
Dec 31,
9999
|
smalldatetime
|
Jan 1,
1900
|
Jun 6,
2079
|
date
|
Stores a
date like June 30, 1991
|
|
time
|
Stores a
time of day like 12:30 P.M.
|
Note: Here, datetime has 3.33 milliseconds accuracy where as
smalldatetime has 1 minute accuracy.
Character
Strings Data Types:
Data Type
|
From
|
To
|
char
|
char
|
Maximum
length of 8,000 characters.( Fixed length non-Unicode characters)
|
varchar
|
varchar
|
Maximum
of 8,000 characters.(Variable-length non-Unicode data).
|
varchar(max)
|
varchar(max)
|
Maximum
length of 231characters, Variable-length non-Unicode data (SQL Server 2005
only).
|
text
|
text
|
Variable-length
non-Unicode data with a maximum length of 2,147,483,647 characters.
|
Unicode
Character String Data Types:
Data Type
|
From
|
To
|
nchar
|
Maximum
length of 4,000 characters.( Fixed length Unicode)
|
nchar
|
nvarchar
|
Maximum
length of 4,000 characters.(Variable length Unicode)
|
nvarchar
|
nvarchar(max)
|
Maximum
length of 231characters (SQL Server 2005 only).( Variable length Unicode)
|
nvarchar(max)
|
ntext
|
Maximum
length of 1,073,741,823 characters. ( Variable length Unicode )
|
ntext
|
Binary Data
Types:
Data Type
|
From
|
To
|
binary
|
Maximum
length of 8,000 bytes(Fixed-length binary data )
|
binary
|
varbinary
|
Maximum
length of 8,000 bytes.(Variable length binary data)
|
varbinary
|
varbinary(max)
|
Maximum
length of 231 bytes (SQL Server 2005 only). ( Variable length Binary data)
|
varbinary(max)
|
image
|
Maximum
length of 2,147,483,647 bytes. ( Variable length Binary Data)
|
image
|
Misc Data
Types:
Data Type
|
Description
|
sql_variant
|
Stores
values of various SQL Server-supported data types, except text, ntext, and
timestamp.
|
timestamp
|
Stores a
database-wide unique number that gets updated every time a row gets updated
|
uniqueidentifier
|
Stores a
globally unique identifier (GUID)
|
xml
|
Stores
XML data. You can store xml instances in a column or a variable (SQL Server
2005 only).
|
cursor
|
Reference
to a cursor object
|
table
|
Stores a
result set for later processing
|
The SQL supports a variety of data types as
given below:
Data
types
|
Description
|
Size
|
Number (P, S)
|
It is used to store numeric data types. P
stands for precision (total number of decimal digits) and S stands for
scale (total number of digits after decimal point).
|
Range of P is from 1 to 38. And S
is from -87 to 127.
|
Date
|
It is used to store data and time values
|
Range of date is from Jan, 47 B.C. to Dec
31 999 AD
|
Char (Size)
|
It is used to store fixed size character
data.
|
Range of char is 1 (By default) to 2000
bytes.
|
Varchar(Size)
|
It is used to store variable size character
data.
|
Range of varchar is 1 (By default) to 4000
bytes.
|
Long
|
It is used to store variable size character
data.
|
The range of Long is up to 2 GB
|
Raw(size)
|
It is used to store fixed binary data.
|
Maximum size up to 2000 bytes.
|
Long raw
|
It is used to store variable binary data.
|
Maximum size is up to 2 GB
|
Int
|
It is used to store numeric data
|
The range of int is range (2-31) to (231)-1
|
what's app?
ReplyDeleteApp is an abbreviated form of the word "application." An application is a software program that's designed to perform a specific function directly for the user or, in some cases, for another application program.
DeleteOR
a self-contained program or piece of software designed to fulfil a particular purpose; an application, especially as downloaded by a user to a mobile device.