Unit 3: Introduction to Relational Database, SQL and Relational Model Basics

Relational Database

A relational database is a collection of data items organized as a set of formally-described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. The relational database was invented by E. F. Codd at IBM in 1970.

A relational database is a digital database whose organization is based on the relational model of data, as proposed by E.F. Codd in 1970. This model organizes data into one or more tables (or "relations") of rows and columns, with a unique key for each row. Generally, each entity type described in a database has its own table, the rows representing instances of that type of entity and the columns representing values attributed to that instance. Because each row in a table has its own unique key, rows in a table can be linked to rows in other tables by storing the unique key of the row to which it should be linked (where such unique key is known as a "foreign key"). Codd showed that data relationships of arbitrary complexity can be represented using this simple set of concepts.

Relational Model

The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F. Codd. In the relational model of a database, all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.

Diagram of an example database according to the Relational model.

In the relational model, related records are linked together with a "key".

Relations or Table

A relation is defined as a set of tuples that have the same attributes. A Tuple usually represents an object and information about that object. Objects are typically physical objects or concepts. A relation is usually described as a table, which is organized into rows and columns. All the data referenced by an attribute are in the same domain and conform to the same constraints.

The relational model specifies that the tuples of a relation have no specific order and that the tuples, in turn, impose no order on the attributes. Applications access data by specifying queries, which use operations such as select to identify tuples, project to identify attributes, and join to combine relations. Relations can be modified using the insert, delete, and update operators. New tuples can supply explicit values or be derived from a query. Similarly, queries identify tuples for updating or deleting.

Tuples by definition are unique. If the Tuple contains a candidate or primary key then obviously it is unique; however, a primary key need not be defined for a row or record to be a tuple. The definition of a tuple requires that it be unique, but does not require a primary key to be defined. Because a tuple is unique, its attributes by definition constitute a super key.

Domain

A domain describes the set of possible values for a given attribute, and can be considered a constraint on the value of the attribute. Mathematically, attaching a domain to an attribute means that any value for the attribute must be an element of the specified set. The character string "ABC", for instance, is not in the integer domain, but the integer value123 is. Another example of domain describes the possible values for the field "Gender" as ("Male, "Female"). So, the field "Gender" will not accept input values like (0.1) or (M,F).

Optimization

A DBMS often has a choice about the access path for retrieving data. For example, the DBMS can use an index (fast lookup for specific entries) or scan the entire table to retrieve the appropriate rows. In addition, in statements in which two tables are joined, the DBMS can choose which table to examine first (join order) and how to join the tables (join strategy). Optimization means that DBMS makes the best (optimal) choice of access paths, join order, and join strategy. True query optimization means that the DBMS will usually make a good choice regardless of how the query is written. The optimizer does not necessarily make the best choice, just a good one.

Derby can use indexes to improve the performance of DML (data manipulation language) statements such as queries, updates, and deletes. The query optimizer can make decisions about whether to use an index for a particular table (access path) and also makes decisions about join order, type of join, and a few other matters.

Database Catalog

In computing, a catalog is a directory of information about data sets, files, or a database . A catalog usually describes where a data set, file or database entity is located and may also include other information, such as the type of device on which each data set or file is stored.

The database catalog of a database instance consists of metadata in which definitions of database objects such as base tables, views (virtual tables), synonyms, value ranges, indexes, users, and user groups are stored.

The SQL standard specifies a uniform means to access the catalog, called the INFORMATION_SCHEMA, but not all databases follow this, even if they implement other aspects of the SQL standard. For an example of database-specific metadata access methods, see Oracle metadata.

Views

A view is a subset of a database that is generated from a query and stored as a permanent object. Although the definition of a view is permanent, the data contained therein is dynamic depending on the point in time at which the view is accessed.

Views represent a subset of the data contained in a table. They can join and simplify multiple tables into one virtual table. They take up very little storage space because the database contains only the view definition, not the data. Furthermore, they can provide results for different calculations (like sum and average) along with the stored data, and can limit the degree to which tables are exposed to the outer world.

Purpose of View

The views are required for several reasons. Some of them are as follows:

1. It can b used to hide sensitive columns. To do this the owner of the view must grant the SELECT privilege on the view and revoke the privilege on the underlying tables.

2. It can be used to hide complex queries involving multiple tables. Users need not concern with the syntax of complex queries.

3. View created with a check option, prevents the updating of other rows and columns.

4. It allows the users to see the same data in different Views, at the same time.

5. It can simplify complex operations on the base relations.

Transactions

A transaction is a unit of work that is performed against a database. Transactions are units or sequences of work accomplished in a logical order, whether in a manual fashion by a user or automatically by some sort of a database program.

A transaction is the propagation of one or more changes to the database. For example, if you are creating a record or updating a record or deleting a record from the table, then you are performing transaction on the table. It is important to control transactions to ensure data integrity and to handle database errors.

Practically, you will club many SQL queries into a group and you will execute all of them together as a part of a transaction.

Transaction Properties

There are four important properties of transaction that a DBMS must ensure to maintain data in the case of concurrent access and system failures. These are:

· Atomicity: (all or nothing)

This property states that a transaction must be treated as an atomic unit, that is, either all of its operations are executed or none. There must be no state in a database where a transaction is left partially completed. States should be defined either before the execution of the transaction or after the execution/abortion/failure of the transaction.

A transaction is said to be atomic if a transaction always executes all its actions in one step or not executes any actions at all It means either all or none of the transactions operations are performed.

· Consistency: (No violation of integrity constraints)

The database must remain in a consistent state after any transaction. No transaction should have any adverse effect on the data residing in the database. If the database was in a consistent state before the execution of a transaction, it must remain consistent after the execution of the transaction as well.

A transaction must preserve the consistency of a database after the execution. The DBMS assumes that this property holds for each transaction. Ensuring this property of a transaction is the responsibility of the user.

· Isolation: (concurrent changes invisibles)

In a database system where more than one transaction are being executed simultaneously and in parallel, the property of isolation states that all the transactions will be carried out and executed as if it is the only transaction in the system. No transaction will affect the existence of any other transaction.

The transactions must behave as if they are executed in isolation. It means that if several transactions are executed concurrently the results must be same as if they were executed serially in some order. The data used during the execution of a transaction cannot be used by a second transaction until the first one is completed.

· Durability: (committed update persist)

The database should be durable enough to hold all its latest updates even if the system fails or restarts. If a transaction updates a chunk of data in a database and commits, then the database will hold the modified data. If a transaction commits but the system fails before the data could be written on to the disk, then that data will be updated once the system springs back into action.

The effect of completed or committed transactions should persist even after a crash. It means once a transaction commits, the system must guarantee that the result of its operations will never be lost, in spite of subsequent failures.

The acronym ACID is sometimes used to refer above four properties of transaction that we have presented here: Atomicity, Consistency, Isolation, and Durability.

In short, we can summarize the properties of transaction as follows;

· Atomicity: ensures that all operations within the work unit are completed successfully; otherwise, the transaction is aborted at the point of failure, and previous operations are rolled back to their former state.

· Consistency: ensures that the database properly changes states upon a successfully committed transaction.

· Isolation: enables transactions to operate independently of and transparent to each other.

· Durability: ensures that the result or effect of a committed transaction persists in case of a system failure.

Process of Transaction

The transaction is executed as a series of reads and writes of database objects, which are explained below:

· Read Operation

To read a database object, it is first brought into main memory from disk, and then its value is copied into a program variable as shown in figure.

· Write Operation

To write a database object, an in-memory copy of the object is first modified and then written to disk.

States of Transactions

A transaction in a database can be in one of the following states.

· Active − In this state, the transaction is being executed. This is the initial state of every transaction.

· Partially Committed − When a transaction executes its final operation, it is said to be in a partially committed state.

· Failed − A transaction is said to be in a failed state if any of the checks made by the database recovery system fails. A failed transaction can no longer proceed further.

· Aborted − If any of the checks fails and the transaction has reached a failed state, then the recovery manager rolls back all its write operations on the database to bring the database back to its original state where it was prior to the execution of the transaction. Transactions in this state are called aborted. The database recovery module can select one of the two operations after a transaction aborts −

· Re-start the transaction

· Kill the transaction

· Committed − If a transaction executes all its operations successfully, it is said to be committed. All its effects are now permanently established on the database system.

Advantages of Concurrent Execution of Transaction

The DBMS interleaves the actions of different transactions to improve performance of system as discussed below:

• Improved Throughput: Consider that transaction are performed in serial order and active transaction is waiting for a page to be read in from disk, then instead of CPU waiting for a page, it can process another transaction. This is because Input/output activity can be done in parallel with the CPU activity. The overlapping of Input/output activities of CPU reduces the amount of time disks and processors are idle and increases system throughput (the average number of transaction completed in a given time.)

• Reduced Waiting time: Interleaved execution of a short transaction with a long transaction usually allows the short transaction to complete quickly. In serial execution a short transaction could get stuck behind a long transaction leading to unpredictable delays in response time or average time taken to complete a transaction.

Overview of SQL

SQL stands for Structured Query Language. SQL is used to communicate with a database. According to ANSI (American National Standards Institute), it is the standard language for relational database management systems. SQL statements are used to perform tasks such as update data on a database, or retrieve data from a database. Some common relational database management systems that use SQL are: Oracle, Sybase, Microsoft SQL Server, Access, Ingres, etc. Although most database systems use SQL, most of them also have their own additional proprietary extensions that are usually only used on their system. However, the standard SQL commands such as "Select", "Insert", "Update", "Delete", "Create", and "Drop" can be used to accomplish almost everything that one needs to do with a database. This tutorial will provide you with the instruction on the basics of each of these commands as well as allow you to put them to practice using the SQL Interpreter.

Structure Query Language (SQL) is a programming language used for storing and managing data in RDBMS. SQL was the first commercial language introduced for E.F Codd's Relational model. Today almost all RDBMS (MySql, Oracle, Infomix, Sybase, MS Access) uses SQL as the standard database language. SQL is used to perform all type of data operations in RDBMS. It can be summarized in following points;

· SQL is an interactive language

· SQL is a database programming language

· SQL is a database administration language

· SQL is a client/server language

· SQL is an internet data access language

· SQL is a distributed database language

· SQL is a database gateway language

· SQL is a standard computer database programming Language and its popularity has expired since two decades.

· It is portable language which supports right from mainframe systems to personal computers and even to hand held devices.

Features of SQL

· SQL is a High Level Language

· SQL is a free-format syntax which gives users the ability to structure SQL statements on any part of the screen.

· SQL is very flexible.

· SQL can be embedded also in front end application code like of VB 6.0, JAVA2, VC++ 6.0 and so on.

· SQL based applications are Portable i.e. they can be run on any system.

· SQL works with database programs like DB2, Oracle, MS Access, Sybase, and MS SQL Server etc.

· SQL allows the user to create, update, delete, and retrieve data from a database.

· SQL is very simple and easy to learn.

Embedded SQL

Embedded SQL is a method of combining the computing power of a programming language and the database manipulation capabilities of SQL. Embedded SQL statements are SQL statements written in line with the program source code of the host language. The embedded SQL statements are parsed by an embedded SQL preprocessor and replaced by host-language calls to a code library. The output from the preprocessor is then compiled by the host compiler. This allows programmers to embed SQL statements in programs written in any number of languages such as C/C++, COBOL and FORTRAN.

The SQL standards committee defined the embedded SQL standard in two steps: a formalism called Module Language was defined, and then the embedded SQL standard was derived from Module Language. The SQL standard defines embedding of SQL as embedded SQL and the language in which SQL queries are embedded is referred to as the host language. A popular host language is C. The mixed C and embedded SQL is called Pro*C in Oracle and Sybase database management systems. In the PostgreSQL database management system this pre-compiler is called ECPG. Other embedded SQL pre-compilers are Pro*Ada, Pro*COBOL, Pro*FORTRAN, Pro*Pascal, and Pro*PL/I.

Features of Embedded SQL

1. Embedded SQL is prefixed by EXEC SQL to distinguish it from host language.

2. Delimiter terminates every embedded SQL.

3. If embedded SQL statements extend over multiple lines then some strategy is used for continuation as used by the host language.

4. Embedded SQL code is submitte3d to SQL precompiler.

5. Variables of host language can be re3ferenced in embedded SQL statements to allow the values calculated by the program to be used.

SQL Constraints

SQL Constraints are compulsory rules on Data columns in a Table. These rules are used to limit the type of data inserted into a table. SQL Constraints ensures the accuracy and reliability of the data in the database.

Constraints could be column level or table level:

Column level constraints are applied only to one column
Table level constraints are applied to the whole table

The below are commonly used constraints available in SQL:

· NOT NULL

Ø Makes sure that a column cannot have NULL value.

· DEFAULT

Ø Defines a default value for a column when none is specified.

· UNIQUE

Ø Makes sure that all values in a column are different.

· PRIMARY Key

Ø Uniquely identifies each rows/records in a database table.

· FOREIGN Key

Ø Uniquely identifies a rows/records in any another database table.

· CHECK

Ø Ensures that all values in a column satisfy certain conditions.

· INDEX

Ø Use to create and retrieve data from the database very quickly.

Data Integrity

The below are categories of the Data Integrity that exist with each RDBMS:

· Entity Integrity

o Ensures that there are no duplicate rows in a table.

· Domain Integrity

o Ensures that only valid entries are entered for a given column by, restricting the type, the format, or the range of values.

· Referential integrity

o Ensures that Rows cannot be deleted, which are used by other records.

· User-Defined Integrity

o Administers some explicit rules that do not fall into Entity, Domain or Referential Integrity.

Basic Data Types Used in SQL

SQL data type is an attribute that specifies type of data of any object. Each column, variable and expression has related data type in SQL.

You would use these data types while creating your tables. You would choose a particular data type for a table column based on your requirement.

SQL Server offers six categories of data types for your use:

Exact Numeric Data Types:

Data Type	From	To
Bigint	-9,223,372,036,854,775,808	9,223,372,036,854,775,807
Int	-2,147,483,648	2,147,483,647
smallint	-32,768	32,767
tinyint	0	255
Bit	0	1
decimal	-10^38 +1	10^38 -1
numeric	-10^38 +1	10^38 -1
money	-922,337,203,685,477.5808	+922,337,203,685,477.5807
smallmoney	-214,748.3648	+214,748.3647

Approximate Numeric Data Types:

Data Type	From	To
float	-1.79E + 308	1.79E + 308
real	-3.40E + 38	3.40E + 38

Date and Time Data Types:

Data Type	From	To
datetime	Jan 1, 1753	Dec 31, 9999
smalldatetime	Jan 1, 1900	Jun 6, 2079
date	Stores a date like June 30, 1991
time	Stores a time of day like 12:30 P.M.

Note: Here, datetime has 3.33 milliseconds accuracy where as smalldatetime has 1 minute accuracy.

Character Strings Data Types:

Data Type	From	To
char	char	Maximum length of 8,000 characters.( Fixed length non-Unicode characters)
varchar	varchar	Maximum of 8,000 characters.(Variable-length non-Unicode data).
varchar(max)	varchar(max)	Maximum length of 231characters, Variable-length non-Unicode data (SQL Server 2005 only).
text	text	Variable-length non-Unicode data with a maximum length of 2,147,483,647 characters.

Unicode Character String Data Types:

Data Type	From	To
nchar	Maximum length of 4,000 characters.( Fixed length Unicode)	nchar
nvarchar	Maximum length of 4,000 characters.(Variable length Unicode)	nvarchar
nvarchar(max)	Maximum length of 231characters (SQL Server 2005 only).( Variable length Unicode)	nvarchar(max)
ntext	Maximum length of 1,073,741,823 characters. ( Variable length Unicode )	ntext

Binary Data Types:

Data Type	From	To
binary	Maximum length of 8,000 bytes(Fixed-length binary data )	binary
varbinary	Maximum length of 8,000 bytes.(Variable length binary data)	varbinary
varbinary(max)	Maximum length of 231 bytes (SQL Server 2005 only). ( Variable length Binary data)	varbinary(max)
image	Maximum length of 2,147,483,647 bytes. ( Variable length Binary Data)	image

Misc Data Types:

Data Type	Description
sql_variant	Stores values of various SQL Server-supported data types, except text, ntext, and timestamp.
timestamp	Stores a database-wide unique number that gets updated every time a row gets updated
uniqueidentifier	Stores a globally unique identifier (GUID)
xml	Stores XML data. You can store xml instances in a column or a variable (SQL Server 2005 only).
cursor	Reference to a cursor object
table	Stores a result set for later processing

The SQL supports a variety of data types as given below:

Data types	Description	Size
Number (P, S)	It is used to store numeric data types. P stands for precision (total number of decimal digits) and S stands for scale (total number of digits after decimal point).	Range of P is from 1 to 38. And S is from -87 to 127.
Date	It is used to store data and time values	Range of date is from Jan, 47 B.C. to Dec 31 999 AD
Char (Size)	It is used to store fixed size character data.	Range of char is 1 (By default) to 2000 bytes.
Varchar(Size)	It is used to store variable size character data.	Range of varchar is 1 (By default) to 4000 bytes.
Long	It is used to store variable size character data.	The range of Long is up to 2 GB
Raw(size)	It is used to store fixed binary data.	Maximum size up to 2000 bytes.
Long raw	It is used to store variable binary data.	Maximum size is up to 2 GB
Int	It is used to store numeric data	The range of int is range (2^-31) to (2³¹)^-1

What is an Operator in SQL?

Search This Blog

Shree Secondary School