Databases/Introduction

What is a database?

edit

A persistent repository of data stored in a computer. The data represent recorded information. By persistent we mean that the data remain available indefinitely, after the software applications that use or create the data are closed, and even when the computer systems on which the data are stored reboot or crash due to software or hardware failures.

A database management system (DBMS) is a software system specifically designed to hold databases. We usually reserve the term database for an information repository maintained in a database management system.

Most database management systems are designed to store relational databases. A relational database maintains its information in the form of a collection of relations. A relation is usually called a database table. Each table has a set of named columns and a set of records (rows). Each row contains values for the columns.

This is used to record information as follows. The columns represent properties; the rows contain the values of these properties. For instance, we may have a table with information about persons, and columns with names such as first name, last name, and date of birth, and each row containing the first name, last name, date of birth, etcetera, of a particular person.

Usually, the information in different tables is related. For instance, to record the place of birth for persons, we may have a column place of birth within the persons table, and to record more information about such places, we can have a second table with properties of such places, for instance, their name and their geographical location. Each value in the place of birth column in the first table will then systematically identify a row in the second table.

Each relational database conforms to a database schema, which describes the tables, their columns, and the relationships between them.

Once we have a database filled with information we want to use that information. Sometimes we need the full contents of tables, but often we only need a selection or a summary of the information in the table, or we want the answer to specific questions. For instance, we may want to know how many persons on which we have information were born before 1968, or how many were born in London, England, or whether there are two persons with the same place of birth, date of birth, first and last name, but a different address. Such a question is known as a query and special database query languages have been developed to allow them to be formulated. By far the most popular query language for relational databases is SQL; nearly all relational database management systems support it, although with some variation among them. SQL can not only answer queries, but also return the contents of tables, and tables constructed as selections of combinations from tables in the database. It can also be used to define and modify tables. All access to a database can be done in SQL.

The properties and design of databases with database management systems are included in the study of Computer Science. While the underlying principles are the same for every relational DBMS, many of the details differ. To really get into this, you may consider downloading and installing the DBMS of your choice to your local machine. (I use and recommend MySQL because it's free and fairly easy to understand). Later on we'll focus on SQL, learning that language to do labs.

MediaWiki, the software that runs Wikiversity, uses the MySQL database within its core to generate the pages you see and to track page histories, recent changes, what links here and many other useful functions. This database also helps authenticate and log the Wikiversity userbase, keeping track of user contributions, watchlists and many other items.

Database tables

edit

A database table can exist in a number of forms from a simple w:flat file database (not more than a spreadsheet) to full-featured SQL databases, Relational databases or object-relational databases. These tables consist of rows of cells called fields. The fields are generally positioned in columns with a top row of column headers. The rows may also contain a column of special fields at the left called row headers. This field is often called a primary key and acts as a sort of "handle" for the whole row. Sometimes rows and columns can "trade places" to form what is called a pivot table.

Table definitions are usually provided by a w:schema which preformats the information to be contained within a particular field. For example, one column may consist of fields conforming to a date format such as dd/mm/yyyy for day/month/year data. Another common example is to require field data that conforms to a monetary value - $0,000.00 for US Dollars or €0.000,00 for Euros. Learning about data types and structures are of primary importance when configuring database management systems and the schemas used in designing and manipulating them.

Data structures

edit

Data structures can be thought of as ways to make sure the correct data goes in its appropriate place within the database schema. Nearly all databases, no matter how sophisticated, are designed and understood in terms of structure. Rows within a table, for example, should all contain the same number of fields, but the number of rows in the table can grow and shrink as needed.

The rows can be ordered alphabetically, numerically or in some other way, allowing the insertion of new rows while maintaining the order. A column should also contain the same data type all the way down, for instance the name fields would be structured according to a uniform method as in Last name, First name M.I or name: Firstname Middleinitial Lastname or similar.

The database programmer may wish to disallow null values in some fields while allowing them in others. At any rate, data structures provide a framework for building large stores of information that can be readily stored and retrieved by the DBMS. The way in which data enters, is modified or presented by the database engine is through a database query, stored procedure and/or a method.

See Data structures for more in-depth study.

Labs

edit

These Labs progress from a simple text-based example to a complex study of the Wikiversity application of MediaWiki, MySQL, and PHP. Learners are asked to participate in other topics for the more advanced labs.

Our first database

edit

Our first database is a Flat file database we shall call Favorites. Most folks are familiar with the Favorites folder or bookmarks file in their browser. Our flat file database is simply a list of URLs with a face string and a rank variable.

First, open your favorite text editor. (Microsoft Windows users can use Notepad. Linux users can use Gedit or similar.). Type Favorites database on the first line and hit enter twice to create a blank line and position your cursor at our top row which will be our header row.

Now on the new line type:

| RANK || URL || FACE

This creates a header row for what can be called a pipe-delimited table. This table schema is compatible with MediaWiki table syntax. Hit enter to start a new line. Type |- and hit enter again. Now type:

| # || url || face

You should now see:

|-
| # || url || face

Now copy and paste these two lines several times below the first two. Save the new document as favorites.txt. You now have a framework for filling your first text-based database which you can convert to a subpage of your Wikiversity userpage if you like. (provided your favorite sites conform to Wikiversity:Policy i.e. relevant, decent, etc.).

Next, use your Web browser (a tabbed browser is recommended) to open a few of your frequently-visited sites. Now copy and paste the url from the address bar of each tab replacing the text url in each row. Then copy and paste or type the preferred text that best identifies the site to replace face. Repeat these for a number of your favorite sites. You can replace the # character to rank your selections. You should now have something like:

| RANK || URL || FACE
|-
| 1 || http://wikiversity.org || Wikiversity
|-
| 2 || http://google.com || Google
|-
| 3 || http://www.gnu.org || Free Software Foundation
|-
| # || url || face

To convert this to a wikitable add:

{| class=wikitable style="text-align:center"
|- bgcolor="Gainsboro"
...your table data...
|}

The resulting table will look like this:

RANK URL FACE
1 http://wikiversity.org Wikiversity
2 http://google.com Google
3 http://www.gnu.org Free Software Foundation
# url face

Simply copy and paste the contents of favorites.txt to your Wikiversity userpage or a subpage to bookmark other sites. For more on Wikitables, see Wiki 101.