Database management systems

Course Purpose

1) The aim is to present the general principles of the database systems with a practical focus, and some implementation assignments. Those assignments may or may not include some programming

2)The course assumes that students have some programming or strong logical reasoning skills in programming languages like C or C++. Some programing data structures such as (heap files, buffer manager, B+ trees, hash indexes, varoius goin methodes) are used in the course

3) The course concentrates on issues of the design, tuning , and implementation of the databases

The object of Database

A database is a collection of information, which describes the activities of one of more related organizations. The performance and productivity of organization depends on their abilities to acquire accurate and up to time data, and manage it. The amount of the data organization is exponentially growing and hard to manage by hand, so the database management system comes to help.
For example, if the database is about a food store, the information in database can be characterized as
- Entities, such as products, salespeople, etc
- Relationships between the entities such as products, and salesmen who are selling them.
- Database management system, or DBMS is a software that assists people in managing the database information.And well as its performance
The basic operations of the database systems can be characterized as
- Database Design: This is the way the user describes the real world issue in terms of the DBMS system
- Data Analysis:How a user can address the problem of analyzing information upon a request by the company, and doing it efficiently.
- Concurrency: How the DBMS can handle multiple requests accurately at the same time.

Describing and storing data in the database

Is the issue of translating the real world structure of the business into the data model. Data model is a collection of high-level presented data. The database software allows user to define data which will be stored in DBMS. Most of the databases use the relational data model. Semantic data model is a more complicated mind map of the database needed to tolerate given situation.
The semantic data model is also called the Entity Relationship model

Chapter 1: Relational Model

Data is formatted in tables, where each row represents an entry. For example, in a table of salespeople each row represents information about each particular salesperson. This data includes his/her name, age, address, etc. Such a single row of information is called a record.

The record can be treated as an array of an information, with its components treated as fields.

Relation is rule which relates different records of different tables. For example: records exist in a table of salespeople, as well as in a table of products. So it is possible to assign each salesperson as a specialist in each product. Such an assignment can be characterized as a relation.

A database query is a program which presents particular information from the DBMS upon users request.
A stored procedure is a program which performs data maintenance other than a query.

Relational Model Concepts

Relation is often represented as a table. Each table consists of several rows with many related values. One such row is called a tuple. In formal relational model a table is called table, the column headers are called attributes.

Chapter 2: File Organization and Indexing

Even though the database shows us data in the form of relations we must understand that it is just a logical representation. Finally all the data needs to be stored on the hard disk as files. It is very important in performance point of view that this organization should allow the database software to access data quickly and in an efficient way.

A Primary Storage is that storage that is directly accessible to the CPU and is limited. The data to be processed is usually brought from secondary storage to primary storage as and when needed. It is very fast but size is limited.
A Secondary Storage includes magnetic discs, tapes, optical media etc. In short the mass storage devices.
A Buffering is done to copy data from secondary storage to primary storage. In case the secondary storage is magnetic disk data is copied in units called blocks. Often a block is of same size as a cluster.

Files on Disk

Records

Data is stored in files in the form of records. Records are related type of data. Ex. Details of one employee. However the data types too are important while storing data. The various data types are integer, text, date and time etc. Once they are recorded to a file while retrieving them back we must know their correct data type. With the latest databases we have a huge number of data types. To store binary data like program files and images we have a data type called BLOBs.

Fixed Length v/s Variable Length Records

Files are accessed sequentially hence to retrieve each record we must know its size. If the records are known to be of fixed size the work is very simple. Consider a list of mobile numbers where each number is exactly 10 digits long. Retrieving number will be simple here because we can read 10 characters at a time. But consider a list of names. How do we know the size of names? They can vary. To make matters more complex we can have records of certain number of fixed length entries and some variable length entries. Variable length records can be separated by a delimiter or separator such as a comma ',' or new line. Comma Separated Values or CVS is one such file based database.

   If every record in the file has exactly the same size (in bytes) the file is said to be made up of Fixed length records.

if different records in the file have different sizes,the file is made up of variable length records.

Records on Disk: Spanned v/s Unspanned

Records need to be stored on a disk. Read and Write operations on disk are usually done in disk blocks. If the block size is greater than one record we might record many records in a single block and vice versa. In any case if the block size is not exactly divisible by the size of each records or to put in better terms the number of records that can be stored in a block is not a natural number a part of record will get stored on one block and remaining on some other disk block. In such a case there will be a pointer at the end of the disk block that will point to the next block. Such a organization is called Spanned. If Records are not allowed to cross block boundaries we call them unspanned.

Related materials

School:Computer Science:

Bibliography

Database Management Systems, second edition by Raghu Ramakrishnan(University of Wisconsin) and Johannes Gerkhe (Cornell University)

This course is a part of the computer science program.