Class- X_DBMS Unit 1: Introduction to Database System Architecture
What is Data?
Data is a
singular form of "Datum" is a collection of facts and figure, such as
numbers, words, measurements, observations or even just descriptions of things.
Data can be collected in many ways. The simplest way is direct observation.
Example: you
want to find how many cars pass by a certain point on a road in a 10-minute
interval.
So: stand at
that point on the road, and count the cars that pass by in that interval.
We collect data by
doing a Survey.
There are two
types of Data
Qualitative:
Qualitative
data is descriptive information (it describes something)
Quantitative:
Quantitative
data, is numerical information (numbers).
Quantitative
data can also be Discrete or Continuous:
Discrete
data can only take certain values (like whole numbers)
Continuous
data can take any value (within a range)
Discrete
data is counted, Continuous data is measured
Qualitative:
He is brown and
black
He has long hair
He has lots of
energy
Quantitative:
Discrete:
He has 4 legs
He has 2
brothers
Continuous:
He weighs 25.5
kg
He is 565 mm
tall
To help you
remember think "Quantitative is about Quantity"
Qualitative:
Your friends'
favorite holiday destination
The most common
given names in your town
How people
describe the smell of a new perfume
Quantitative:
Height
(Continuous)
Weight
(Continuous)
Petals on a
flower (Discrete)
Customers in a
shop (Discrete)
Information:
When data are processed, organized, structured or
presented in a given context so as to make them useful, they are called
Information. Information is data that has been processed in
such a way as to be meaningful to the person who receives it. it is any thing
that is communicated. So we can say, Information
is a processed form of data which gives meaningful information about an entity
( Person, Place or Thing).
Information is needed to:
• To gain
knowledge about the surroundings, and whatever is happening in the society and
universe.
• To keep
the system up to date.
• To know
about the rules and regulations and bye laws of society, local government,
provincial and central government, associations, clients etc. as ignorance is
no bliss.
• Based on
above three, to arrive at a particular decision for planning current and
prospective actions in process of forming, running and protecting a process or
system
Knowledge
Human mind
purposefully organized the information and evaluate it to produce knowledge. In
other words the ability of the person recalls or uses his information and
experience is known as knowledge. For example, "386" is data,
"your marks are 386" is information, and "It is result of your
hard work" is knowledge.
Knowledge is of two types:
• Facts based or Information based Knowledge:
The
knowledge gained from fundamentals and through experiments. The knowledge is
derived from the information contained in fundamental science derived from
experiments, rules, regulations commonly agreed by experts.
• Heuristic Knowledge:
It is knowledge
of good practice, experience and good judgment like hypothesis. It is the
knowledge underlying "expertise", rules of thumb, rules of good
guessing, that usually achieves desired results but do not guarantee them.
Today
Knowledge Management plays a significant role in the development of an
organization.
Differences Between Data and Information
Data
|
Information
|
1.
Data
is raw fact and figures.
Eg.
55 is data
|
1.
Information
is processed form of data.
Eg.
When 55 is stored in row column form as shown below is information
Age
55
|
2.
Data
is not significant to a business and of itself.
|
2.
Information
is significant to a business and of itself. For example 55 is insignificant
for business but Age 55 is significant for a business like music.
|
3.
Data
are atomic level piece of information.
|
3.
Information
is a collection of data. For example age and 55 collected together to form
information.
|
4.
Data
does not help in decision making.
|
4.
As
explained above information helps in decision making.
|
5.
for
example in the health care industries, much activity surrounds data
collection. Nurses collect data everyday and sometimes hourly. Examples of
data include vital signs. Weight and relevant assessment parameters.
|
5.
Information,
however, provides answers to questions that guide clinicians to change their
practices. for example, the trending of vital signs over time provides a
pattern that may lead to certain clinical decisions.
|
Database:
The organized collection of stored
information that can be shared and used for multiple purposes is known as a
database. A database document is just a collection
of information stored in computerized form. Computer databases
can be highly structured, storing the same kind of information about each item
in the database in well-defined compartments. A specific set of fields and
records organized in a specific order, including the information they contain,
is called a table. In fact, tables are often displayed on the screen with each
item, or record, in a row, and each field as a column.
Structured databases can be either flat
file databases or relational databases. In a flat file database, you can work
with only one data table-one set of fields-at a time. In a relational database,
you can use multiple tables (multiple database documents) at once. Flat file
databases are much easier to understand and use, but relational databases are
much more efficient for many things you commonly do with data, especially in
businesses.
A database can also be simply a free-form
collection of information, without any particular structure. In this case, the
analogy would be to a pile of notes you've written on whatever paper was handy
at the time the information on each piece of paper doesn't have to be organized
in the same way.
The term database can also refer to the
software package itself that you use to create the database. More often, the
software is called a "database program" ("database
application" is more specific) or a "database management system"
(DBMS).
A database application is one of the most
useful tools on the computer, and is actually an incredible amount of fun.
What
is DBMS?
DBMS A database management system is the
software system that allows users to define, create and maintain a database and
provides controlled access to the data.
A Database Management System (DBMS) is
basically a collection of programs that enables users to store, modify, and
extract information from a database as per the requirements. DBMS is
an intermediate layer between programs and the data. Programs access the DBMS,
which then accesses the data. There are different types of DBMS ranging from
small systems that run on personal computers to huge systems that run on
mainframes. The following are main examples of database applications:
• Computerized library systems
• Automated teller machines
• Flight reservation systems
• Computerized parts inventory systems
A database management system is a piece
of software that provides services for accessing a database, while maintaining
all the required features of the data. Commercially available Database
management systems in the market are dbase, FoxPro, IMS and Oracle, MySQL, SQL
Servers and DB2 etc.
These systems allow users to create
update, and extract information from their databases.
Compared to a manual filing system, the
biggest advantages to a computerized database system are speed, accuracy, and'
accessibility.
Components of the Database System Environment
There are five major components in the database system environment
and their interrelationship are.
• Hardware
• Software
• Data
• Users
• Procedures
1.Hardware:
The hardware is the actual computer system used for keeping and accessing
the database. Conventional DBMS hardware consists of secondary storage devices,
usually hard disks, on which the database physically resides, together with the
associated Input-Output devices, device controllers and· so forth. Databases
run on a' range of machines, from Microcomputers to large mainframes. Other
hardware issues for a DBMS includes database machines, which is hardware
designed specifically to support a database system.
2.
Software:
The software is the actual
DBMS. Between the physical database itself (i.e. the data as actually stored)
and the users of the system is a layer of software, usually called the Database
Management System or DBMS. All requests from users for access to the database
are handled by the DBMS. One general function provided by the DBMS is thus the
shielding of database users from complex hardware-level detail.
The DBMS allows the users to
communicate with the database. In a sense, it is the mediator between the
database and the users. The DBMS controls the access and helps to maintain the
consistency of the data. Utilities are usually included as part of the DBMS.
Some of the most common utilities are report writers and application
development.
3.
Data :
It is the most important
component of DBMS environment from the end users point of view. As shown in
observes that data acts as a bridge between the machine components and the user
components. The database contains the operational data and the meta-data, the
'data about data'.
The database should contain
all the data needed by the organization. One of the major features of databases
is that the actual data are separated from the programs that use the data. A
database should always be designed, built and populated for a particular
audience and for a specific purpose.
4. Users :
There are a number of users
who can access or retrieve data on demand using the applications and interfaces
provided by the DBMS. Each type of user needs different software capabilities.
The users of a database system can be classified in the following groups,
depending on their degrees of expertise or the mode of their interactions with
the DBMS. The users can be:
• Naive Users
• Online Users
• Application Programmers
• Sophisticated Users
• Data Base Administrator
(DBA)
Naive
Users:
Naive Users are those users
who need not be aware of the presence of the database system or any other
system supporting their usage. Naive users are end users of the database who
work through a menu driven application program, where the type and range of
response is always indicated to the user.
A user of an Automatic Teller
Machine (ATM) falls in this category. The user is instructed through each step
of a transaction. He or she then responds by pressing a coded key or entering a
numeric value. The operations that can be performed by valve users are very
limited and affect only a precise portion of the database. For example, in the
case of the user of the Automatic Teller Machine, user's action affects only
one or more of his/her own accounts.
Online
Users :
Online users are those who
may communicate with the database directly via an online terminal or indirectly
via a user interface and application program. These users are aware of the
presence of the database system and may have acquired a certain amount of
expertise with in the limited interaction permitted with a database.
Sophisticated
Users :
Such users interact with the
system without ,writing programs.
Instead, they form their
requests in database query language. Each such query is submitted to a very
processor whose function is to breakdown DML statement into instructions that
the storage manager understands.
Specialized
Users :
Such users are those ,who
write specialized database application that do not fit into the fractional
data-processing framework. For example: Computer-aided design systems,
knowledge base and expert system, systems that store data with complex data types (for example, graphics data
and audio data).
Application
Programmers :
Professional programmers are
those who are responsible for developing application programs or user
interface. The application programs could be written using general purpose
programming language or the commands available to manipulate a database.
Database
Administrator:
The database administrator
(DBA) is the person or group in charge for implementing the database system
,within an organization. The "DBA has all the system privileges allowed by
the DBMS and can assign (grant) and remove (revoke) levels of access (privileges)
to and from other users. DBA is also responsible for the evaluation, selection
and implementation of DBMS package.
5.
Procedures:
Procedures refer to the
instructions and rules that govern the design and use of the database. The
users of the system and the staff that manage the database require documented
procedures on how to use or run the system.
These may consist of
instructions on how to:
• Log on to the DBMS.
• Use a particular DBMS
facility or application program.
• Start and stop the DBMS.
• Make backup copies of the
database.
• Handle hardware or software
failures.
Change the structure of a
table, reorganize the database across multiple disks, improve performance, or
archive data to secondary storage.
What are the Components of DBMS?
A typical structure of a DBMS with
its components and relationships between them is shown. The DBMS software is
partitioned into several modules. Each module or component is assigned a
specific operation to perform. Some of the functions of the DBMS are supported
by operating systems (OS) to provide basic services and DBMS is built on top of
it. The physical data and system catalog are stored on a physical disk. Access
to the disk is controlled primarily by as, which schedules disk input/output.
Therefore, while designing a DBMS its interface with the as must be taken into
account.
Components of a DBMS
The DBMS accepts the SQL
commands generated from a variety of user interfaces, produces query evaluation
plans, executes these plans against the database, and returns the answers. As
shown, the major software modules or components of DBMS are as follows:
(i)
Query processor:
The query processor transforms user queries
into a series of low level instructions. It is used to interpret the online
user's query and convert it into an efficient series of operations in a form
capable of being sent to the run time data manager for execution. The query
processor uses the data dictionary to find the structure of the relevant
portion of the database and uses this information in modifying the query and preparing
and optimal plan to access the database.
(ii)
Run time database manager:
Run time database manager is
the central software component of the DBMS, which interfaces with
user-submitted application programs and queries. It handles database access at
run time. It converts operations in user's queries coming. Directly via the
query processor or indirectly via an application program from the user's
logical view to a physical file system. It accepts queries and examines the
external and conceptual schemas to determine what conceptual records are
required to satisfy the user’s request. It enforces constraints to maintain the
consistency and integrity of the data, as well as its security. It also
performs backing and recovery operations. Run time database manager is
sometimes referred to as the database control system and has the following
components:
•
Authorization control:
The authorization control
module checks the authorization of users in terms of various privileges to
users.
•
Command processor:
The command processor
processes the queries passed by authorization control module.
Integrity
checker:
It .checks the integrity
constraints so that only valid data can be entered into the database.
Query
optimizer:
The query optimizers
determine an optimal strategy for the query execution.
•
Transaction manager:
The transaction manager
ensures that the transaction properties
should be maintained by the system.
•
Scheduler:
It provides an environment in
which multiple users can work on same piece of data at the same time in other
words it supports concurrency.
(iii)
Data Manager:
The data manager is
responsible for the actual handling of data in the database. It provides
recovery to the system which that system should be able to recover the data
after some failure. It includes Recovery manager and Buffer manager. The buffer
manager is responsible for the transfer of data between the main memory and secondary storage (such as disk or
tape). It is also referred as the cache manger.
Execution Process of a DBMS
As show, conceptually,
following logical steps are followed while executing users to request to access
the database system:
(I) Users issue a query using
particular database language, for example, SQL commands.
(ii) The passes query is presented to a query
optimizer, which uses information about how the data is stored to produce an
efficient execution plan for the evaluating the query.
(iii) The DBMS accepts the users SQL commands and
analyses them.
(iv) The DBMS produces query evaluation plans, that
is, the external schema for the user, the corresponding external/conceptual
mapping, the conceptual schema, the conceptual/internal mapping, and the
storage structure definition. Thus, an evaluation\ plan is a blueprint for
evaluating a query.
(v) The DBMS executes these plans against the
physical database and returns the answers to the user.
Using components such as
transaction manager, buffer manager, and recovery manager, the DBMS supports
concurrency and recovery.
Advantages of DBMS
The database management
system has promising potential advantages, which are explained below:
1.
Controlling Redundancy:
In file system, each
application has its own private files, which cannot be shared between multiple
applications. 1:his can often lead to considerable redundancy in the stored
data, which results in wastage of storage space. By having centralized database
most of this can be avoided. It is not possible that all redundancy should be
eliminated. Sometimes there are sound business and technical reasons for·
maintaining multiple copies of the same data. In a database system, however
this redundancy can be controlled.
For example: In case of college database, there may be the number of
applications like General Office, Library, Account Office, Hostel etc. Each of
these applications may maintain the following information into own private file
applications:
It is clear from the above
file systems, that there is some common data of the student which has to be
mentioned in each application, like Rollno, Name, Class, Phone_No~ Address etc.
This will cause the problem of redundancy which results in wastage of storage
space and difficult to maintain, but in case of centralized database, data can
be shared by number of applications and the whole college can maintain its
computerized data with the following database:
It is clear in the above
database that Rollno, Name, Class, Father_Name, Address,
Phone_No, Date_of_birth which
are stored repeatedly in file system in each application, need not be stored
repeatedly in case of database, because every other application can access this
information by joining of relations on the basis of common column i.e. Rollno.
Suppose any user of Library system need the Name, Address of any particular
student and by joining of Library and General Office relations on the basis of
column Rollno he/she can easily retrieve this information.
Thus, we can say that
centralized system of DBMS reduces the redundancy of data to great extent but
cannot eliminate the redundancy because RollNo is still repeated in all the
relations.
1.
Integrity can be enforced:
Integrity of data means that
data in database is always accurate, such that incorrect information cannot be
stored in database. In order to maintain the integrity of data, some integrity
constraints are enforced on the database. A DBMS should provide capabilities
for defining and enforcing the constraints.
For Example: Let us consider
the case of college database and suppose that college having only BTech, MTech,
MSc, BCA, BBA and BCOM classes. But if a user enters the class MCA, then this
incorrect information must not be stored in database and must be prompted that
this is an invalid data entry. In order to enforce this, the integrity
constraint must be applied to the class attribute of the student entity. But,
in case of file system tins constraint must be enforced on all the application
separately (because all applications have a class field).
In case of DBMS, this
integrity constraint is applied only once on the class field of the
General Office (because class
field appears only once in the whole database), and all other applications will
get the class information about the student from the General Office table so
the integrity constraint is applied to the whole database. So, we can conclude
that integrity constraint can be easily enforced in centralized DBMS system as
compared to file system.
3. Inconsistency can be avoided :
When the same data is
duplicated and changes are made at one site, which is not propagated to the
other site, it gives rise to inconsistency and the two entries regarding the
same data will not agree. At such times the data is said to be inconsistent.
So, if the redundancy is removed chances of having inconsistent data is also
removed.
4. Data can be shared:
As explained earlier, the
data about Name, Class, Father __name etc. of General Office is shared by
multiple applications in centralized DBMS as compared to file system so now
applications can be developed to operate against the same stored data. The
applications may be developed without having to create any new stored files.
5.
Standards can be enforced :
Since DBMS is a central
system, so standard can be enforced easily may be at Company level, Department
level, National level or International level. The standardized data is very
helpful during migration or interchanging of data. The file system is an
independent system so standard cannot be easily enforced on multiple
independent applications.
6. Restricting unauthorized access:
When multiple users share a
database, it is likely that some users will not be authorized to access all
information in the database. For example, account office data is often
considered confidential, and hence only authorized persons are allowed to
access such data. In addition, some users may be permitted only to retrieve data,
whereas other are allowed both to retrieve and to update. Hence, the type of
access operation retrieval or update must also be controlled. Typically, users
or user groups are given account numbers protected by passwords, which they can
use to gain access to the database. A DBMS should provide a security and
authorization subsystem, which the DBA uses to create accounts and to specify
account restrictions. The DBMS should then enforce these restrictions
automatically.
7. Solving Enterprise Requirement than Individual
Requirement:
Since many types of users
with varying level of technical knowledge use a database, a DBMS should provide
a variety of user interface. The overall requirements of the enterprise are
more important than the individual user requirements. So, the DBA can structure
the database system to provide an overall service that is "best for the
enterprise".
For example: A representation
can be chosen for the data in storage that gives fast access for the most
important application at the cost of poor performance in some other
application. But, the file system favors the individual requirements than the
enterprise requirements
A DBMS must provide
facilities for recovering from hardware or software failures. The backup and
recovery subsystem of the DBMS is responsible for recovery. For example, if the
computer system fails in the middle of a complex update program, the recovery
subsystem is responsible for making sure that the .database is restored to the
state it was in before the program started executing.
9. Cost of developing and maintaining system is lower:
It is much easier to respond
to unanticipated requests when data is centralized in a database than when it
is stored in a conventional file system. Although the initial cost of setting
up of a database can be large, but the cost of developing and maintaining
application programs to be far lower than for similar service using
conventional systems. The productivity of programmers can be higher in using
non-procedural languages that have been developed with DBMS than using
procedural languages.
10.
Data Model can be developed :
The centralized system is
able to represent the complex data and interfile relationships, which results
better data modeling properties. The data madding properties of relational
model is based on Entity and their Relationship, which is discussed in detail
in chapter 4 of the book.
11.
Concurrency Control :
DBMS systems provide
mechanisms to provide concurrent access of data to multiple users.
Disadvantages of DBMS
The disadvantages of the
database approach are summarized as follows:
1. Complexity :
The provision of the
functionality that is expected of a good DBMS makes the DBMS an extremely
complex piece of software. Database designers, developers, database
administrators and end-users must understand this functionality to take full
advantage of it. Failure to understand the system can lead to bad design
decisions, which can have serious consequences for an organization.
2. Size :
The complexity and breadth of
functionality makes the DBMS an extremely large piece of software, occupying
many megabytes of disk space and requiring substantial amounts of memory to run efficiently.
3. Performance:
Typically, a File Based
system is written for a specific application, such as invoicing. As result,
performance is generally very good. However, the DBMS is written to be more
general, to cater for many applications rather than just one. The effect is
that some applications may not run as fast as they used to.
4.Higher impact of a failure:
The centralization of
resources increases the vulnerability of the system. Since all users and
applications rely on the ~vailabi1ity of the DBMS, the failure of any component
can bring operations to a halt.
5. Cost of DBMS:
The cost of DBMS varies
significantly, depending on the environment and functionality provided. There
is also the recurrent annual maintenance cost.
Additional Hardware costs:
The disk storage requirements for the DBMS and the database may
necessitate the purchase of additional storage space. Furthermore, to achieve
the required performance it may be necessary to purchase a larger machine,
perhaps even a machine dedicated to running the DBMS. The procurement of
additional hardware results in further expenditure.
7. Cost of Conversion:
In some situations, the cost
oftlle DBMS and extra hardware may be insignificant compared with the cost of
converting existing applications to run on the new DBMS and hardware. This cost
also includes the cost of training staff to use these new systems and possibly
the employment of specialist staff to help with conversion and running of the
system. This cost is one of the main reasons why some organizations feel tied
to their current systems and cannot switch to modern database technology.
The DBMS can
be classified according to the number of users and the database site locations. These are:
On the basis of the number of
users:
• Single-user DBMS
• Multi-user DBMS
On the basis of the site
location
• Centralized DBMS
• Parallel DBMS
• Distributed DBMS
• Client/server DBMS
we will discuss about some of
the important types of DBMS system, which are presently being used.
The database system may be
multi-user or single-user. The configuration of the hardware and the size of
the organization will determine whether it is a multi-user system or a single
user system.
In single user system the
database resides on one computer and is only accessed by one user at a
time. This one user may design, maintain, and write database programs.
Due to large amount of data
management most systems are multi-user. In this situation the data are both
integrated and shared. A database is integrated when the same information is not recorded in two places. For
example, both the Library department and the Account department of the college
database may need student addresses. Even though both departments may access
different portions of the database, the students' addresses should only reside
in one place. It is the job of the DBA to make sure that the DBMS makes the
correct addresses available from one central storage area.
Centralized Database System
The centralized database
system consists of a single processor together with its associated data storage devices and other peripherals. It is physically
confined to a single location. Data can be accessed from the multiple sites
with the use of a computer network while the database is maintained at the
central site.
Disadvantages of Centralized
Database System
• When the central site
computer or database system goes down, then every one (users) is blocked from
using the system until the system comes back.
• Communication costs from
the terminals to the central site can be expensive.
Parallel Database System
Parallel database system
architecture consists of a multiple Central
Processing Units (CPUs) and data
storage disk in parallel. Hence, they improve processing and Input/Output (I/O)
speeds. Parallel database systems are used in the application that have to
query extremely large databases or that have to process an extremely large
number of transactions per second.
Advantages of a Parallel
Database System
• Parallel database systems
are very useful for the applications that have to query extremely large
databases (of the order of terabytes, for example, 1012 bytes) or that have to
process an extremely large number of transactions per second (of the order of
thousands of transactions per second).
• In a parallel database
system, the throughput (that is, the number of tasks that can be completed in a
given time interval) and the response time (that is, the amount of time it
takes to complete a single task from the time it is· submitted) are very high.
Disadvantages of a Parallel
Database System
• In a parallel database
system, there· is a startup cost associated with initiating a single process
and the startup-time may overshadow the actual processing time, affecting
speedup adversely.
• Since process executing in
a parallel system often access shared resources, a slowdown may result from
interference of each new process as it completes with existing processes for
commonly held resources, such as shared data storage disks, system bus and so
on.
Distributed Database System
A logically interrelated
collection of shared data physically distributed over a computer network is
called as distributed database and the software system that permits the
management of the distributed database and makes the distribution transparent
to users is called as Distributed DBMS.
It consists of a single
logical database that is split into a number of fragments. Each fragment is
stored on one or more computers under the control of a separate DBMS, with the
computers connected by a communications network. As shown, in distributed database
system, data is spread across a variety of different databases. These are
managed by a variety of different DBMS software running on a variety of
different operating systems. These machines are spread (or distributed)
geographically and connected together by a variety of communication networks.
Advantages of Distributed
Database System
• Distributed database
architecture provides greater efficiency and better performance.
• A single database (on
server) can be shared across several distinct client (application) systems.
• As data volumes and
transaction rates increase, users can grow the system incrementally.
• It causes less impact on
ongoing operations when adding new locations.
• Distributed database system
provides local autonomy.
Disadvantages of Distributed
Database System
• Recovery from failure is
more complex in distributed database systems than in centralized systems.
Client-Server DBMS
Client/Server architecture of
database system has two logical components namely client, and server. Clients
are generally personal computers or workstations whereas server is large
workstations, mini range computer system or a mainframe computer system. The
applications and tools of DBMS run on one or more client platforms, while the
DBMS soft wares reside on the server. The server computer is caned backend and
the client's computer is called front end. These server and client computers
are connected into a network. The applications and tools act as clients of the
DBMS, making requests for its services. The DBMS, in turn, processes these
requests and returns the results to the client(s). Client/Server architecture
handles the Graphical User Interface (GUI) and does computations and other
programming of interest to the end user. The server handles parts of the job
that are common to many clients, for example, database access and updates.
Multi-Tier client server
computing models
In a single-tier system the
database is centralized, which means the DBMS Software and the data reside in
one location and the dumb terminals were used to access the DBMS as shown.
The rise of personal
computers in businesses during the 1980s, the increased reliability of
networking hardware causes Two-tier and Three-tier systems became common. In a
two-tier system, different software is required for the server and for the
client. Illustrates the two-tier client server model. At the early stages
client server computing model was called two-tier-computing model in which
client is considered as data capture and validation tier and Server was
considered as data storage tier. This scenario is depicted.
Problems of two-tier
architecture
The need of enterprise
scalability challenged this traditional two-tier client-server model. In the mid-1990s, as
application became more complex and could be deployed to hundreds or thousands of end-users, the
client side, now undergoes with following problems:
• A' fat' client requiring
considerable resources on client's computer to run effectively. This includes
disk space, RAM and CPU.
• Client machines require
administration which results overhead.
Three-tier architecture
By 1995, three-tier
architecture appears as improvement over two-tier architecture. It has three
layers, which are:
• First Layer: User Interface which runs on end-user's computer (the client) .
• Second Layer: Application Server It is a business logic and data processing
layer. This middle tier runs on a server which is called as Application Server.
• Third Layer: Database Server It is a
DBMS, which stores the data required by the middle tier. This tier may run on a
separate server called the database server.
As, described earlier, the
client is now responsible for application's user interface, thus it requires
less computational resources now clients are called as 'thin client' and it
requires less maintenance.
Advantages of Client/Server
Database System
• Client/Server system has
less expensive platforms to support applications that had previously been
running only on large and expensive mini or mainframe computers
• Client offer icon-based
menu-driven interface, which is superior to the traditional command-line, dumb
terminal interface typical of mini and mainframe computer systems.
• Client/Server environment
facilitates in more productive work by the users and making better use of
existing data.
• Client/Server database
system is more flexible as compared to the Centralized system.
• Response time and
throughput is high.
• The server (database)
machine can be custom-built (tailored) to the DBMS function and thus can
provide a better DBMS performance.
• The client (application database)
might be a personnel workstation, tailored to the needs of the end users and
thus able to provide better interfaces, high availability, faster responses and
overall improved ease of use to the user. + A single database (on server) can
be shared across several distinct client (application) systems.
Disadvantages of
Client/Server Database System
• Programming cost is high in
client/server environments, particularly in initial phases.
• There is a lack of
management tools for diagnosis, performance monitoring and tuning and security
control, for the DBMS, client and operating systems and networking
environments.
What is Data Independence?
Data independence is the
type of data transparency
that matters for a centralized DBMS. It refers to the immunity of
user applications to changes made in the
definition and organization of data.
There are two kinds of data
independence:
• Logical data
independence
• Physical data independence
Logical Data Independence
Logical data independence
indicates that the conceptual schema can be changed without affecting the
existing external schemas. The change would be absorbed by the mapping between
the external and conceptual levels. Logical data independence also insulates
application programs from operations such as combining two records into one or
splitting an existing record into two or more records. This would require a.
change in the external/conceptual mapping so as to leave the external view
unchanged.
Physical Data Independence
Physical data independence
indicates that the physical storage structures or devices could be changed
without affecting conceptual schema. The change would be absorbed by the
mapping between the conceptual and internal levels. Physic 1data independence
is achieved by the presence of the internal level of the database and the n, lapping
or transformation from the conceptual level of the database to the internal
level. Conceptual level to internal level mapping, therefore provides a means
to go from the conceptual view (conceptual records) to the internal view and
hence to the stored data in the database (physical records).
If there is a need to change
the file organization or the type of physical device used as a result of growth
in the database or new technology, a change is required in the conceptual/
internal mapping between the conceptual and internal levels. This change is
necessary to maintain the conceptual level invariant. The physical data
independence criterion requires that the conceptual level does not specify
storage structures or the access methods (indexing, hashing etc.) used to
retrieve the data from the physical storage medium. Making the conceptual
schema physically data independent means that the external schema, which is
defined on the conceptual schema, is in turn physically data independent.
The Logical data independence
is difficult to achieve than physical data independence as it requires the
flexibility in the design of database and prograll1iller has to foresee the
future requirements or modifications in the design.
What is a Database
Architecture
An early proposal for a
standard terminology and general architecture for database systems was produced
in 1971 by the DBTG (Data Base Task Group) appointed by the Conference on Data
Systems and Languages (CODASYL, 1971). The DBTG recognized the need for a two
level approach with a system view called the schema and user views called
subschema. The American National Standards Institute (ANSI) Standards Planning
and Requirements Committee (SPARC) produced a similar terminology mid architecture
in 1975 (ANSI 1975). ANSI-SPARC recognized the need for a three level approach
with a system catalog.
There are following
three levels or layers of DBMS
architecture:
• External Level
•Conceptual Level
• Internal Level
Objective of the Three Level
Architecture
The objective of the three
level architecture is to separate each user's view of the database from the Way
the database is physically represented. There are several reasons why this
separation is desirable:
• Each user should be able to
access the same data, but have a different customized view of the data. Each
user should be able to change the way he or she views the data, and this change
should not affect other users.
• Users should not have to
deal directly with physical database storage details, such as indexing or
hashing. In other words a user's interaction with the database should be
independent of storage considerations.
• The Database Administrator
(DBA) should be able to change the database storage structures without
affecting the user's views.
. The internal structure of
the database should be unaffected by changes to the physical aspects of
storage, such as the changeover to a new storage device.
. The DBA should be able to
change the conceptual structure of the database without affecting all users.
External Level or View level
It is the users' view of the
database. This level describes that part of the database that is relevant to
each user. External level is the one which is closest to the end users. This
level deals with the way in which individual users vie\v data. Individual users
are given different views according to the user's requirement.
A view involves only those
portions of a database which are of concern to a user. Therefore same database
can have different views for different users. The external view insulates users
from the details of the internal and conceptual levels. External level is also
known as the view level. In addition different views may have different
representations of the same data. For example, one user may view dates in the
form (day, month, year), while another may view dates as (year, month, day).
Conceptual Level or Logical
level
It is the community view of
the database. This level describes what data is stored in the database and the
relationships among the data. The middle level in the three level architecture
is the conceptual level. This level contains the logical structure of the
entire database as seen by the DBA. It is a complete view of the data
requirements of the organization that is independent of any storage
considerations. The conceptual level represents:
• All entities, their
attributes, and their relationships;
An Entity is an object whose information is stored in the database. For
example, in student database the entity is student. An attribute is a
characteristic of interest about an entity.
For example, in case of
student database Roll No, Name, Class, Address etc. are attributes of entity
student.
• The constraints on the
data;
• Semantic information about
the data;
• Security and integrity
information.
The conceptual level supports
each external view, in that any data available to a user must be contained in,
or derivable from, the conceptual level. However, this level must not contain
any storage dependent details. For instance, the description of an entity
should contain only data types of
attributes (for example, integer, real, character) and their length (such as
the maximum number of digits or characters), but not any storage
considerations, such as the number of bytes occupied. Conceptual level is also
known as the, logical level.
Internal level or Storage
level
It is the physical
representation of the database on the computer.
This level describes how the data is stored in the database. The internal level
is the one that concerns the way the data are physically stored on the
hardware. The internal level covers the physical\ implementation of the
database to achieve optimal runtime performance and storage space utilization.
It covers the data structures and file organizations used to store data on storage devices. It interfaces with
the operating system access
methods to place the data on the storage devices, build the indexes, retrieve
the data, and so· on.
The internal level is
concerned with such things as:
• Storage space allocation
for data and indexes;
• Record descriptions for
storage (with stored sizes for data items);
• Record placement;
• Data compression and data
encryption techniques.
There will be only one
conceptual view, consisting of the abstract representation of the database in
it’s entirely. Similarly there will be only one internal or physical view,
representing the total database, as it is physically stored.
What is Schema?
It is important to note that
the data in the database changes frequently, while the plans or schemes remain
the same over long periods of time. The database plans consist of types of
entities that a database deals with, the relationship among these entities and
the ways in which the entities and relationships are expressed from one level
of abstraction to the next level for the users' view. The users' view of the
data (also called logical organization of data) should be in a form that is
most convenient for the users and they should not be concerned about the way
data is physically organized. Therefore, a DBMS should do the translation
between the logical (users' view) organization and the physical organization of
the data in the database.
The plan or scheme of the
database is known as Schema. Schema gives the names of the entities and
attributes. It specifies the relationship among them. It is a framework into
which the values of the data items (or fields) are fitted. The plans or the
format of schema remains the same. But the values fitted into this format
changes from instance to instance. In other terms, schema means overall plans
of all the data item (field) types and record types stored in a database.
Schema includes the definition of the database name, the record type and the
components that make up those records
Types of Schema
There are three different
types of schema in the database corresponding to each data view of database. In
other words, the data views at each of three levels are described by schema.
A schema is defined as an
outline or a plan that describes the records and relationships existing at the
particular level. The External view is described by means of a schema called
external schema that correspond to different views of the data. Similarly the
Conceptual view is defined by conceptual schema, which describes all the
entities, attributes, and relationship together with integrity constraints.
Internal View is defined by internal schema, which is a complete description of
the internal model, containing definition of stored records, the methods of
representation, the data fields, and the indexes used.
There is only one conceptual
schema and one internal schema per database. The schema also describes the way
in which data elements at one level can be mapped to the corresponding data
elements in the next level.
Thus, we can say that schema
establishes correspondence between the records and relationships in the two
levels. In a relational database, the schema defines the tables, the fields in
each table, and the relationships between fields and tables. Schema are
generally stored in a data dictionary.
The data in the database at
any particular point in time is called a database instance. Therefore, many
database instances can correspond to the same database schema. The schema is
sometimes called the intension of the database, while an instance is called an
extension (or state) of the database.
Example: To understand the
difference between the three levels, consider again the database schema that
describes College Database system. If User1 is a Library clerk, the external
view would contain only the student and book information. If User2 is an
account office clerk then he/she may be interested in students detail and fee
detail. Shows specific information actually available at each level regarding a
particular user.
The external view would
depend upon the user who is accessing the database. The conceptual level
contain the logical view of the whole database, it represents the data type of
each required field. The internal view represents the physical location of each
element on the disk of the servers well as how many bytes of storage each
element needs.
Mapping between Views
The DBMS is responsible for
mapping between these three types of schema. Two mappings are required in a
database system with three different views.
External/Conceptual Mapping:
Each external schema is
related to the conceptual schema by the external/conceptual mapping. A mapping
between the external and conceptual views gives the correspondence among the
records and the relationships of the external and conceptual views the external
view is an abstraction of the conceptual view, which in its turn is an
abstraction of the internal view. It describes the contents of the database as
perceived by the user or application program of that view. The user of the
external view sees and manipulates a record corresponding to the external view.
There is a mapping from0 a particular logical record in the external view to
one (or more) conceptual record(s) in the conceptual view.
Differences between
External/Conceptual views
Following could be
differences that exist between the two:
Names of the field’s and.
records, for instance, may be different. A number of conceptual fields can be
combined into a single external field, for example, Last_Name and First_Name at
the conceptual level but Name at the external level. A given external record
could be derived from a number of conceptual .records.
Conceptual/Internal Mapping:
Conceptual schema is related to the internal schema by the
conceptual/internal mapping. This enables the DBMS to find the actual record or
combination of records in physical storage that constitute a logical record in
conceptual schema. Mapping between the conceptual and the internal levels
specifies the method of deriving the conceptual record from the physical
database.
What is Mapping?
We
know that three view-levels are described by means of three schemas. These
schemas are stored in the data dictionary. In DBMS, each user refers
only to its own external schema. Hence, the DBMS
must transform a request on. A specified external schema
into a request against conceptual schema, and then into a request against
internal schema to store and retrieve data to and from the database.
The process to convert a request (from external level) and the
result between view levels is called mapping. The mapping defines the
correspondence between three view levels. The mapping description is also
stored in data dictionary. The DBMS is responsible for mapping between these
three types of schemas. There are two types of mapping.
(i)
External-Conceptual mapping
An external-conceptual mapping
defines the correspondence between a particular external view and
the conceptual view. The external-conceptual mapping tells
the DBMS which objects on the conceptual level correspond to the
objects requested on a particular user's external view. If changes
are made to either an external view or conceptual view,
then mapping must be changed accordingly.
(ii)
Conceptual-Internal
mapping
The conceptual-internal
mapping defines the correspondence between the conceptual view and
the internal view, i.e. Database stored on the physical storage device. It
describes how conceptual records are stored and retrieved to and from
the storage device. This means that conceptual-internal mapping tells the
DBMS that how the conceptual! Records are physically represented. If the
structure of the stored database is changed, then the mapping must be changed accordingly.
It is the responsibility of DBA to manage such changes.
What is Database
Administration?
Database
administration is the function of managing and maintaining database
management systems (DBMS) software. Mainstream DBMS software such as Oracle,
IBM DB2 and Microsoft SQL Server need ongoing management.
Who is DBA? What are the Role
and Responsibilities of DBA?
Database Administrator (DBA)
is a person or group in charge for implementing DBMS in
an organization. Database Administrator's job requires a high degree of
technical expertise and the ability to understand and interpret management
requirements at a senior level. In practice the DBA may consist of team of
people rather than just one person
The main responsibilities of
DBA are:
• Makes decisions
concerning the content of the database:
It is the DBA's job to decide exactly what information is to be held in the database-in other
words, to identify the' entities of interest to the enterprise and to identify
information to be recorded about those entities .
• Plans storage
structures and access strategies:
The DBA must also decide how the data is to be
represented in the database, and must specify the representation by writing the
storage structure definition (using the internal data definition language).
In addition, the associated
mapping between the storage structure definition and the conceptual schema must
also be specified.
• Provides support to
users:
It is the responsibility of
the DBA to provide support to the users, to ensure that the data they require
is available, and to write the\ necessary external schemas (using the
appropriate external data definition language).
In addition, the mapping
between any given eA1ernal schema and the conceptual' schema must also be
specified.
• Defines security and
integrity checks:
DBA is responsible for
providing the authorization and authentication checks such that no malicious
users can access database and it must remain protected. DBA must also ensure
the integrity of the database.
• Interprets backup and
recovery strategies:
In the event of damage to any
portion\ of the database-caused by human error, say, or a failure in the
hardware or supporting operating system-it is essential to be able to repair
the data concerned witl1 a minimum of delay and with as little effect as
possible on the rest of the system.
The DBA must define and
implement an appropriate recovery strategy to recover he database from all
types of failures.
• Monitoring
performance and responding to changes in requirements:
The DBA is responsible for so
organizing the system as to get the performance that is "best for the
enterprise," and for making the appropriate adjustments as requirements
change.
The DBA should posses the
following skills
(1) A good knowledge of the operating system(s)
(2) A good knowledge of
physical database design
(3) Ability to perform both
Oracle and also operating system performance monitoring and the necessary
adjustments.
(4) Be able to provide a
strategic database direction for the organization.
(5) Excellent knowledge of
Oracle backup and recovery scenarios.
(6) Good skills in all Oracle
tools.
(7) A good knowledge of
Oracle security management.
(8) A good knowledge of how
Oracle acquires and manages resources.
(9) Sound knowledge of the
applications at your site.
(10) Experience and knowledge
in migrating code, database changes, data and
Menus through the various stages
of the development life cycle.
(11) A good knowledge of the
way Oracle enforces data integrity.
(12) A sound knowledge of
both database and program code performance tuning.
(13) A DBA should possess a
sound understanding of the business.
(14) A DBA should have sound
communication skills with management, development teams, vendors, systems
administrators and other related service providers.
What is Data Manager?
The data manager is the
central software component of the DBMS. It is sometimes referred to as the
database control system. One of the functions of the data manager is to convert
operations in the user's queries coming directly via the query processor or\
indirectly via an application program from the user's logical view to a
physical file system. The data manager is responsible for interfacing with the
file system as show. In addition, the tasks of enforcing constraints to
maintain the consistency and integrity of the data, as well as its security,
are also performed by the data manager. It is also the responsibility of the
Data. Manager to provide the synchronization in the simultaneous operations
performed by concurrent users and to maintain the backup and recovery
operations.
What is File Manager?
Responsibility for the
structure of the files and managing the file space rests with the file manager.
It is also responsible for locating the block containing the required record,
requesting this block from the disk manager, and transmitting the required
record to the data manager as shown. The file manager can be implemented using
an interface to the existing file subsystem provided by the operating system of the host computer or it can include a file subsystem
written especially for the DBMS.
What is Disk Manager?
The disk manager is part of
the operating system of the host computer and all physical input and output
operations are performed by it. The disk manager transfers the block or page
requested by the file manager so that the latter need not be concerned with the
physical characteristics of the underlying storage media.
What are Data Models? Type of
Data Models.
Data Model can be defined as
an integrated collection of concepts for describing and manipulating data,
relationships between data, and constraints on the data in an organization.
A data model comprises of
three components:
i.
A structural part, consisting of a set of rules according to which
databases can be constructed.
ii.
A manipulative part, defining the types of operation that are
allowed on the data (this includes the operations that are used for updating or
retrieving data from the database and for changing the structure of the
database).
iii.
Possibly a set of integrity rules, which ensures that the data is
accurate.
The purpose of a data model
is to represent data and to make the data understandable. There have been many
data models proposed in the literature. They fall into three broad categories:
·
Object Based Data Models
·
Physical Data Models
·
Record Based Data Models
The object based and record
based data models are used to describe data at the conceptual and external
levels, the physical data model is used to· describe data at the internal
level.
Object Based Data Models
Object based data models use
concepts such as entities, attributes, and relationships. An entity is a
distinct object (a person, place, concept, and event) in the organization that
is to be represented in the database. An attribute is a property that describes
some aspect of the object that we wish to record, and a relationship is an
association between entities.
Some of the more common types
of object based data model are:
• Entity-Relationship
• Object Oriented
• Semantic
• Functional
The Entity-Relationship
model has emerged as one of the main techniques for modeling database design
and forms the basis for the database design methodology. The object oriented
data model extends the definition of an entity to include, not only the
attributes that describe the state of the object but also the actions that are
associated with the object, that is, its behavior. The object is said to
encapsulate both state and behavior. Entities in semantic systems represent the
equivalent of a record in a relational system or an object in an OO system but
they do not include behavior (methods). They are abstractions 'used to
represent real world (e.g. customer) or conceptual (e.g. bank account) objects.
The functional data model is now almost twenty years old. The original idea was
to' view the database as a collection of extensionally defined functions and to
use a functional language for querying the database.
Physical Data Models
Physical data models describe
how data is stored in the computer,
representing information such as record structures, record
ordering, and access paths. There are not as many physical data models as
logical data models, the most common one being the Unifying Model.
Record Based Logical Models
Record based logical models
are used in describing data at the logical and view levels. In contrast to
object based data models, they are used to specify the overall logical structure
of the database and to provide a higher-level description of the
implementation. Record based models are so named because the database is
structured in fixed format records of several types. Each record type defines a
fixed number of fields, or attributes, and each field is usually of a fixed
length.
The three most widely
accepted record based data models are:
• Hierarchical Model
• Network Model
• Relational Model
The relational model has
gained favor over the other two in recent years. The network and hierarchical
models are still used in a large number of older databases.
What is ER-Model? Advantages
and Disadvantages of E-R Model.
There are two techniques used
for the purpose of data base designing from the system requirements. These are:
• Top down Approach
known as Entity-Relationship Modeling
• Bottom Up approach known as
Normalization.
we will focus on top
down approach of designing database. It is a graphical technique, which is used
to convert the requirement of the system to graphical representation, so that
it can become well understandable. It also provides the framework for designing
of database.
The Entity-Relationship (ER)
model was originally proposed by Peter in 1976 as a way to unify the network
and relational database views. Simply stated, the ER model is a conceptual data
model that views the real world as entities and relationships. A basic
component of the model is the Entity-Relationship diagram, which is used to
visually represent data objects. For the database designer, the utility of the
ER model is:
• It maps well to the
relational model. The constructs used in the ER model can easily be transformed
into relational tables.
• It is simple and easy to
understand with a minimum of training. Therefore, the model can be used by the
database designer to communicate the design to the end user.
• In addition, the model can
be used as a design plan by the database developer to implement a data model in
specific database management software.
Advantages and Disadvantages of E-R Data Model
Following are advantages of
an E-R Model:
• Straightforward relation
representation: Having designed an E-R
diagram for a database application, the relational representation of the
database model becomes relatively straightforward.
• Easy conversion for E-R to
other data model: Conversion from E-R diagram
to a network or hierarchical data model can· easily be accomplished.
• Graphical representation
for better understanding: An E-R model gives graphical and diagrammatical
representation of various entities, its attributes and relationships between
entities. This is turn helps in the clear understanding of the data structure
and in minimizing redundancy and other problems.
Disadvantages of E-R Data Model
Following are disadvantages
of an E-R Model:
• No industry standard for
notation: There is no industry standard notation for
developing an E-R diagram.
• Popular for high-level
design: The
E-R data model is especially popular for high level.
What is Metadata OR Data
Dictionary?
A metadata (also called the data dictionary) is the data
about the data. It is the self describing nature of the database that provides
program-data independence. It is also called as the System Catalog. It holds
the following information about each data element in the
databases, it normally includes:
+ Name
+ Type
+ Range of values
+ Source
+ Access authorization
+ Indicates which application
programs use the data so that, when a change in a data structure is
contemplated, a list of the affected programs can be generated.
Data dictionary is used to
actually control the database operation, data integrity and accuracy. Metadata
is used by developers to develop the programs, queries, controls and procedures
to manage and manipulate the data. Metadata is available to database administrators
(DBAs), designers and authorized user as on-line system documentation. This
improves the control of database administrators (DBAs) over the information
system and the user's understanding and use of the system.
Active and Passive Data
Dictionaries
Data dictionary may be either
active or passive. An active data dictionary (also called integrated data
dictionary) is managed automatically by the database management software.
Consistent with the current structure and definition of the database. Most of
the relational database management systems contain active data dictionaries
that can be derived from their system catalog.
The passive data dictionary
(also called non-integrated data dictionary) is the one used only for
documentation purposes. Data about fields, files, people and so on, in the data
processing environment are. Entered into the dictionary and cross-referenced.
Passive dictionary is simply a self-contained application. It is managed by the
users of the system and is modified whenever the structure of the database is
changed. Since this modification must be performed manually by the user, it is
possible that the data dictionary will not be current with the current
structure of the database. However, the passive data dictionaries may be maintained
as a separate database. Thus, it allows developers to remain independent from
using a particular relational database
management system. It may be extended to contain information about
organizational data that is not computerized.
Importance of Data Dictionary
Data dictionary is essential
in DBMS because of the following reasons:
• Data dictionary provides
the name of a data element, its description and data structure in which it may
be found.
• Data dictionary provides
great assistance in producing a report of where a data element is used in all
programs that mention it.
• It is also possible to
search for a data name, given keywords that describe the name. For example, one
might want to determine the name of a variable that stands for net pay. Entering
keywords would produce a list of possible identifiers and their definitions.
Using keywords one can search the dictionary to locate the proper identifier to
use in a program.
These days, commercial data
dictionary packages are available to facilitate entry, editing and to use the
data elements.
Comments
Post a Comment