Computer Science

Topics covered: Basic Definitions, Subnetting Concepts, Algorithm and Time, Set Theory Basics, DBMS Basics, Data Communications, Computer Graphics Basics, History and Creations

DBMS Basics


Database schema

A database schema is the skeleton structure that represents the logical view of the entire database. It describes how the data is organized and how the relations among the data are associated. It formulates all the constraints that are to be applied on the data in the relations that reside in the database. A database schema defines its entities and the relationships among them. The schema is a descriptive detail of the database, which can be depicted by means of schema diagrams. All these activities are carried out by the database designer to help programmers understand all aspects of the database.

-> Physical Database Schema: This schema pertains to the actual storage of data and its form of storage, such as files and indices. It defines how the data will be stored in secondary storage.
-> Logical Database Schema: This schema defines all the logical constraints that need to be applied on the data stored. It defines tables, views, and integrity constraints.

Database Instance

It is important to distinguish these two terms. The database schema is the skeleton of the database; it is designed when the database does not yet exist, and it is very hard to change once the database is operational. A schema does not contain any data or information. A database instance, in contrast, is the state of an operational database, with data, at a given point in time; it is a snapshot of the database. Database instances tend to change with time. The DBMS ensures that every instance (state) is a valid state by enforcing all the validations, constraints, and conditions that the database designers have imposed or that are expected of the DBMS itself.




Data Independence:

A database system contains a lot of data in addition to the users' data. A DBMS comprises three kinds of schemas, which are in turn data about data (metadata). Metadata is stored along with the database and is hard to modify once stored. But as the DBMS grows, it needs to change over time to satisfy the requirements of its users. If the whole of the data were highly interdependent, changing it would become tedious and highly complex. Data about data is therefore divided into a layered architecture, so that changing data at one layer does not affect data at another layer. The layers are independent of each other but are mapped onto one another.

Logical Data Independence

Logical data is data about the database, that is, it stores information about how the data is managed internally; for example, a table (relation) stored in the database and all the constraints applied on that relation. Logical data independence is a mechanism that liberates the logical data from the actual data stored on the disk. If we change the table format, it should not change the data residing on the disk.

Physical Data Independence

All schemas are logical, and the actual data is stored in bit format on the disk. Physical data independence is the ability to change the physical data without impacting the schema or the logical data. For example, if we want to change or upgrade the storage system itself, say, use SSDs instead of hard disks, it should not have any impact on the logical data or schemas.


  
 
 

DBMS Normalization

Functional Dependency

A functional dependency (FD) is a set of constraints between two sets of attributes in a relation. A functional dependency X → Y says that if two tuples have the same values for the attributes A1, A2, ..., An of X, then those two tuples must have the same values for the attributes B1, B2, ..., Bn of Y. A functional dependency is represented by an arrow sign (→), that is, X → Y, where X functionally determines Y. The left-hand side attributes determine the values of the attributes on the right-hand side.
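
As a quick, hedged illustration (the relation and its values below are invented for this example, and fd_holds is a helper name introduced here), the following sketch checks whether an FD X → Y holds in a set of tuples represented as Python dictionaries:

    def fd_holds(rows, X, Y):
        """Return True if the functional dependency X -> Y holds in `rows`.

        `rows` is a list of dicts (the tuples of the relation); X and Y are
        lists of attribute names.  X -> Y holds when any two rows that agree
        on all attributes in X also agree on all attributes in Y.
        """
        seen = {}
        for row in rows:
            x_val = tuple(row[a] for a in X)
            y_val = tuple(row[a] for a in Y)
            if x_val in seen and seen[x_val] != y_val:
                return False          # same X values but different Y values
            seen[x_val] = y_val
        return True

    # Hypothetical data for illustration only.
    rows = [
        {"Stu_ID": 1, "Zip": "560001", "City": "Bangalore"},
        {"Stu_ID": 2, "Zip": "560001", "City": "Bangalore"},
        {"Stu_ID": 3, "Zip": "400001", "City": "Mumbai"},
    ]
    print(fd_holds(rows, ["Zip"], ["City"]))      # True: Zip -> City holds
    print(fd_holds(rows, ["City"], ["Stu_ID"]))   # False: City does not determine Stu_ID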

Armstrong's Axioms

If F is a set of functional dependencies, then the closure of F, denoted F+, is the set of all functional dependencies logically implied by F. Armstrong's axioms are a set of rules that, when applied repeatedly, generate the closure of a set of functional dependencies. A sketch of computing the closure of an attribute set using these rules follows the list below.

-> Reflexivity rule: If α is a set of attributes and β ⊆ α, then α → β holds.
-> Augmentation rule: If α → β holds and γ is a set of attributes, then αγ → βγ also holds. That is, adding attributes to a dependency does not change the basic dependency.
-> Transitivity rule: As with the transitive rule in algebra, if α → β holds and β → γ holds, then α → γ also holds. α → β is read as "α functionally determines β".
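
As a rough sketch of how these rules are applied in practice, the following computes the closure of an attribute set under a set of FDs, the standard attribute-closure procedure implied by the axioms; the FDs Stu_ID → Zip and Zip → City are hypothetical examples, and attribute_closure is a name introduced here:

    def attribute_closure(attrs, fds):
        """Compute the closure of the attribute set `attrs` under `fds`.

        `fds` is a list of (lhs, rhs) pairs, each side a frozenset of attribute
        names.  The closure contains every attribute functionally determined by
        `attrs`, obtained by repeatedly applying the axioms above.
        """
        closure = set(attrs)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                # if the closure already covers lhs, transitivity lets us add rhs
                if lhs <= closure and not rhs <= closure:
                    closure |= rhs
                    changed = True
        return closure

    # Hypothetical FDs: Stu_ID -> Zip and Zip -> City.
    fds = [(frozenset({"Stu_ID"}), frozenset({"Zip"})),
           (frozenset({"Zip"}), frozenset({"City"}))]
    print(attribute_closure({"Stu_ID"}, fds))   # {'Stu_ID', 'Zip', 'City'}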

Trivial Functional Dependency

-> Trivial: If an FD X → Y holds where Y ⊆ X, it is called a trivial FD. Trivial FDs always hold.
-> Non-trivial: If an FD X → Y holds where Y is not a subset of X, it is called a non-trivial FD.
-> Completely non-trivial: If an FD X → Y holds where X ∩ Y = ∅, it is said to be a completely non-trivial FD.


Normalization

If a database design is not perfect, it may contain anomalies, which are like a bad dream for the database itself. Managing a database with anomalies is next to impossible.
-> Update anomalies: If data items are scattered and are not linked to each other properly, then when we try to update a data item that has copies scattered over several places, a few instances may get updated properly while the rest are left with their old values. This leaves the database in an inconsistent state.
-> Deletion anomalies: We try to delete a record, but parts of it are left undeleted because, without our being aware of it, the data is also saved somewhere else.
-> Insertion anomalies: We try to insert data into a record that does not exist at all.


First Normal Form:

First Normal Form is defined in the definition of relations (tables) itself. This rule requires that all the attributes in a relation have atomic domains; the values in an atomic domain are indivisible units.
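
For instance, as a made-up example (the student record below is not from the original text), a record holding several phone numbers in one attribute violates 1NF; flattening the multi-valued attribute into one row per value restores atomic domains:

    # A non-1NF record: the "phones" attribute holds a list, not an atomic value.
    student = {"Stu_ID": 1, "Stu_Name": "Asha", "phones": ["98450-00001", "98450-00002"]}

    # 1NF version: one row per phone number, so every attribute value is atomic.
    rows_1nf = [
        {"Stu_ID": student["Stu_ID"], "Stu_Name": student["Stu_Name"], "phone": p}
        for p in student["phones"]
    ]
    print(rows_1nf)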


Second Normal Form:

Before we learn about Second Normal Form, we need to understand the following:
-> Prime attribute: an attribute that is part of the prime key is a prime attribute.
-> Non-prime attribute: an attribute that is not part of the prime key is said to be a non-prime attribute.
Second Normal Form says that every non-prime attribute should be fully functionally dependent on the prime key attributes. That is, if X → A holds, then there should be no proper subset Y of X for which Y → A also holds.
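
As a hedged sketch (the attribute names are invented and has_partial_dependency is introduced here), the check below looks for a partial dependency, i.e. a proper subset of a candidate key that already determines a non-prime attribute, which is exactly what Second Normal Form forbids:

    from itertools import combinations

    def has_partial_dependency(key, non_prime_attr, fds):
        """Return True if some proper subset of `key` determines `non_prime_attr`.

        `key` is a set of attributes forming a candidate key; `fds` is a list of
        (lhs, rhs) pairs of frozensets.  Uses a simple closure computation.
        """
        def closure(attrs):
            c, changed = set(attrs), True
            while changed:
                changed = False
                for lhs, rhs in fds:
                    if lhs <= c and not rhs <= c:
                        c |= rhs
                        changed = True
            return c

        for size in range(1, len(key)):
            for subset in combinations(sorted(key), size):
                if non_prime_attr in closure(subset):
                    return True   # partial dependency found: 2NF is violated
        return False

    # Hypothetical relation (Stu_ID, Course_ID, Course_Fee) with key {Stu_ID, Course_ID}.
    fds = [(frozenset({"Course_ID"}), frozenset({"Course_Fee"}))]
    print(has_partial_dependency({"Stu_ID", "Course_ID"}, "Course_Fee", fds))   # True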


Third Normal Form:

For a relation to be in Third Normal Form, it must be in Second Normal Form and satisfy the following:
-> No non-prime attribute is transitively dependent on the prime key attributes.
-> For any non-trivial functional dependency X → A, either X is a superkey or A is a prime attribute.

Consider a relation Student_detail (Stu_ID, Stu_Name, Zip, City), in which Stu_ID is the key and the only prime attribute. We find that City can be identified by Stu_ID as well as by Zip itself. Neither is Zip a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there exists a transitive dependency and the relation is not in Third Normal Form. To bring it into 3NF, it is decomposed into Student_Detail (Stu_ID, Stu_Name, Zip) and ZipCodes (Zip, City).

Boyce-Codd Normal Form:

BCNF is an extension of Third Normal Form in a strict way. BCNF states that:
-> For any non-trivial functional dependency X → A, X must be a superkey.
In the decomposition above, Stu_ID is a superkey in the Student_Detail relation and Zip is a superkey in the ZipCodes relation. So,
Stu_ID → Stu_Name, Zip
and
Zip → City
confirm that both relations are in BCNF.

DBMS Indexing

Indexing is a data structure technique to efficiently retrieve records from the database files based on some attributes on which the indexing has been done. Indexing in database systems is similar to what we see in books.

Indexing is defined based on its indexing attributes. Indexing can be of the following types:
-> Primary Index: If the index is built on the ordering key field of the file, it is called a primary index. Generally it is the primary key of the relation.
-> Secondary Index: If the index is built on a non-ordering field of the file, it is called a secondary index.
-> Clustering Index: If the index is built on an ordering non-key field of the file, it is called a clustering index.
The ordering field is the field on which the records of the file are ordered. It can be different from the primary or candidate key of the file.

Ordered Indexing is of two types:
-> Dense Index
-> Sparse Index

Dense Index

In a dense index, there is an index record for every search-key value in the database. This makes searching faster but requires more space to store the index records themselves. An index record contains the search-key value and a pointer to the actual record on the disk.


Sparse Index

In a sparse index, index records are not created for every search key. An index record here contains a search key and an actual pointer to the data on the disk. To search a record, we first proceed by the index record and reach the actual location of the data. If the data we are looking for is not where we arrive by following the index, the system starts a sequential search until the desired data is found.
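
A rough sketch of this lookup, assuming records are sorted by key with one index entry per block (the data and block size below are invented):

    import bisect

    # Records sorted by search key; one sparse index entry per "block" of records.
    records = [(1, "a"), (3, "b"), (5, "c"), (7, "d"), (9, "e"), (11, "f")]
    BLOCK_SIZE = 2
    sparse_index = [(records[i][0], i) for i in range(0, len(records), BLOCK_SIZE)]

    def lookup(key):
        """Jump to the right block via the sparse index, then scan sequentially."""
        keys = [k for k, _ in sparse_index]
        pos = bisect.bisect_right(keys, key) - 1   # last index entry with key <= target
        if pos < 0:
            return None
        start = sparse_index[pos][1]
        for k, payload in records[start:]:
            if k == key:
                return payload
            if k > key:                            # we passed it: the key is not present
                return None
        return None

    print(lookup(7))   # 'd'
    print(lookup(4))   # None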


Multilevel Index

Index records comprise search-key values and data pointers. The index itself is stored on the disk along with the actual database files. As the size of the database grows, so does the size of the indices. There is an immense need to keep the index records in main memory so that the search can be sped up. If a single-level index is used, then a large index cannot be kept in memory as a whole, and this leads to multiple disk accesses. A multilevel index therefore breaks the index down into several smaller indices, so that the outermost level is small enough to be kept in main memory.


B+ Tree

A B+ tree is a multilevel index format in the form of a balanced search tree. As mentioned earlier, single-level index records become large as the database size grows, which also degrades performance. All leaf nodes of a B+ tree hold the actual data pointers. The B+ tree ensures that all leaf nodes remain at the same height, thus balanced. Additionally, all leaf nodes are linked in a linked list, which makes the B+ tree support random access as well as sequential access.
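
As a minimal sketch only, covering the linked leaf level with a fixed, pre-built chain and no node-splitting logic (names invented), the following shows why the leaf links make sequential range scans cheap:

    class Leaf:
        """A B+ tree leaf node: sorted keys with record pointers, plus a link to the next leaf."""
        def __init__(self, entries, nxt=None):
            self.entries = entries    # list of (key, record_pointer) pairs, sorted by key
            self.next = nxt           # next leaf in the leaf-level linked list

    # Pre-built leaf chain standing in for the bottom level of a B+ tree.
    leaf3 = Leaf([(50, "r50"), (60, "r60")])
    leaf2 = Leaf([(30, "r30"), (40, "r40")], leaf3)
    leaf1 = Leaf([(10, "r10"), (20, "r20")], leaf2)

    def range_scan(first_leaf, lo, hi):
        """Sequential access: walk the linked leaves and emit records with lo <= key <= hi."""
        leaf = first_leaf
        while leaf is not None:
            for key, rec in leaf.entries:
                if key > hi:
                    return
                if key >= lo:
                    yield key, rec
            leaf = leaf.next

    print(list(range_scan(leaf1, 20, 50)))   # [(20, 'r20'), (30, 'r30'), (40, 'r40'), (50, 'r50')]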


Hash Organization

-> Bucket: A hash file stores data in bucket format. A bucket is considered a unit of storage. A bucket typically stores one complete disk block, which in turn can store one or more records.
-> Hash Function: A hash function h is a mapping function that maps the set of all search keys K to the addresses where the actual records are placed. It is a function from search keys to bucket addresses.


Static Hashing

In static hashing, when a search-key value is provided, the hash function always computes the same address. For example, if a mod-4 hash function is used, it generates only four values (0 to 3); the output address is always the same for a given key. The number of buckets provided remains the same at all times.
-> Insertion: When a record is to be entered using static hashing, the hash function h computes the bucket address for the search key K, where the record will be stored:
Bucket address = h(K)
-> Search: When a record needs to be retrieved, the same hash function is used to retrieve the address of the bucket where the data is stored.
-> Delete: This is simply a search followed by a deletion operation.


Bucket Overflow:

The condition of bucket overflow is known as a collision. This is a fatal state for any static hash function. In this case, overflow chaining can be used.
-> Overflow Chaining: When buckets are full, a new bucket is allocated for the same hash result and is linked after the previous one. This mechanism is called closed hashing. A small sketch follows.
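
A minimal sketch of static hashing with overflow chaining, i.e. closed hashing as described above (the bucket capacity, bucket count, and hash function are chosen arbitrarily for illustration):

    BUCKET_CAPACITY = 2    # records per bucket (stands in for one disk block)
    NUM_BUCKETS = 4        # fixed at all times: this is static hashing

    def h(key):
        return key % NUM_BUCKETS

    # Each primary bucket holds its records and a chain of overflow buckets.
    buckets = [{"records": [], "overflow": []} for _ in range(NUM_BUCKETS)]

    def insert(key, value):
        b = buckets[h(key)]
        if len(b["records"]) < BUCKET_CAPACITY:
            b["records"].append((key, value))
            return
        # Bucket overflow (collision): chain a new bucket after the previous one.
        for chained in b["overflow"]:
            if len(chained) < BUCKET_CAPACITY:
                chained.append((key, value))
                return
        b["overflow"].append([(key, value)])

    def search(key):
        b = buckets[h(key)]
        for k, v in b["records"] + [rec for chained in b["overflow"] for rec in chained]:
            if k == key:
                return v
        return None

    for k in (4, 8, 12, 16):           # all hash to bucket 0, forcing overflow chaining
        insert(k, f"rec{k}")
    print(search(16))                   # 'rec16', found in an overflow bucket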

For a hash function to work efficiently and effectively, the following must hold:
-> The distribution of records should be uniform.
-> The distribution should be random rather than following any ordering.

Dynamic Hashing

The problem with static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on demand. Dynamic hashing is also known as extendible hashing. The hash function, in dynamic hashing, is made to produce a large number of values, and only a few are used initially.
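
As a hedged, simplified sketch of extendible hashing (bucket capacity and class name chosen for illustration; a production implementation would also handle deletion and directory shrinking), the directory below doubles on demand as buckets fill:

    class ExtendibleHash:
        """Simplified extendible (dynamic) hashing: the directory doubles on demand."""

        def __init__(self, bucket_capacity=2):
            self.capacity = bucket_capacity
            self.global_depth = 1
            # directory indexed by the low `global_depth` bits of the hash value
            self.directory = [{"depth": 1, "items": {}}, {"depth": 1, "items": {}}]

        def _bucket_for(self, key):
            return self.directory[hash(key) & ((1 << self.global_depth) - 1)]

        def insert(self, key, value):
            bucket = self._bucket_for(key)
            bucket["items"][key] = value
            if len(bucket["items"]) > self.capacity:
                self._split(bucket)

        def lookup(self, key):
            return self._bucket_for(key)["items"].get(key)

        def _split(self, bucket):
            if bucket["depth"] == self.global_depth:
                self.directory = self.directory + self.directory   # double the directory
                self.global_depth += 1
            bucket["depth"] += 1
            new_bucket = {"depth": bucket["depth"], "items": {}}
            bit = 1 << (bucket["depth"] - 1)
            old_items, bucket["items"] = bucket["items"], {}
            for k, v in old_items.items():                         # redistribute on the new bit
                (new_bucket if hash(k) & bit else bucket)["items"][k] = v
            for i in range(len(self.directory)):                   # repoint half of the entries
                if self.directory[i] is bucket and i & bit:
                    self.directory[i] = new_bucket
            for b in (bucket, new_bucket):                          # rare: all items on one side
                if len(b["items"]) > self.capacity:
                    self._split(b)

    table = ExtendibleHash()
    for k in range(10):
        table.insert(k, f"rec{k}")
    print(table.lookup(7), table.global_depth, len(table.directory))   # rec7 3 8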


DBMS Transaction

ACID Properties

A transaction may contain several low-level tasks, and a transaction is a very small unit of any program. A transaction in a database system must maintain certain properties in order to ensure the accuracy of its completeness and data integrity. These properties are referred to as the ACID properties and are described below:

1. Atomicity: Though a transaction involves several low-level operations, this property states that a transaction must be treated as an atomic unit, that is, either all of its operations are executed or none of them are. There must be no state in the database where the transaction is left partially completed. States are defined either before the execution of the transaction or after the execution/abortion/failure of the transaction. (A small sketch after this list illustrates rollback on failure.)

2. Consistency: This property states that after the transaction has finished, the database must remain in a consistent state. There must be no possibility that some data is incorrectly affected by the execution of the transaction. If the database was in a consistent state before the execution of the transaction, it must remain in a consistent state after the execution of the transaction as well.

3. Durability: This property states that, in any case, all updates made to the database will persist even if the system fails and restarts. If a transaction writes or updates some data in the database and commits, that data will always be there in the database. If the transaction commits but the data is not yet written to the disk when the system fails, that data will be updated once the system comes back up.

4. Isolation: In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that all the transactions will be carried out and executed as if each were the only transaction in the system. No transaction will affect the existence of any other transaction.
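
As a hedged illustration of atomicity and durability in practice, using Python's built-in sqlite3 module (the accounts table and the amounts are invented), the sketch below wraps two updates in one transaction and rolls both back if either fails:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 50)")
    conn.commit()

    def transfer(amount):
        """Move `amount` from A to B as one atomic transaction."""
        try:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'A'", (amount,))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'B'", (amount,))
            conn.commit()                    # both updates become durable together
        except sqlite3.Error:
            conn.rollback()                  # neither update survives: no partial state

    transfer(30)
    print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
    # [('A', 70), ('B', 80)]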


States of Transactions:

-> Active: In this state the transaction is being executed. This is the initial state of every transaction.

-> Partially Committed: When a transaction executes its final operation, it is said to be in a partially committed state. After the execution of all its operations, the database system performs some checks, e.g. whether the database is in a consistent state after applying the output of the transaction to it.

-> Failed: If any check made by the database recovery system fails, the transaction is said to be in a failed state, from which it can no longer proceed.

-> Aborted: If any of the checks fails and the transaction has reached the failed state, the recovery manager rolls back all of its write operations on the database to bring the database back to the state it was in prior to the start of the transaction's execution. Transactions in this state are called aborted. The database recovery module can select one of two operations after a transaction aborts:
* Re-start the transaction
* Kill the transaction

-> Committed: If a transaction executes all of its operations successfully, it is said to be committed. All of its effects are now permanently made on the database system.
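
The states above form a small state machine. As a rough sketch (state names mirror the list; the transition table is introduced here for illustration), the allowed transitions could be modelled as:

    # Allowed transitions between transaction states, as described in the list above.
    TRANSITIONS = {
        "active":              {"partially_committed", "failed"},
        "partially_committed": {"committed", "failed"},
        "failed":              {"aborted"},
        "aborted":             set(),   # terminal: recovery may re-start or kill the transaction
        "committed":           set(),   # terminal: effects are now permanent
    }

    def advance(state, next_state):
        """Move a transaction to `next_state`, rejecting transitions the diagram does not allow."""
        if next_state not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition {state} -> {next_state}")
        return next_state

    state = "active"
    for step in ("partially_committed", "committed"):
        state = advance(state, step)
    print(state)   # committed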