Dealing with structured data

Since the 1970s, most database systems have been based on relational principles. They are often referred to as SQL databases, after the language used to access the information. The simple example we have been looking at so far was designed as a set of relational tables, but other types of database, collectively known as NoSQL databases, are becoming popular. The most commonly used of these is the document database. Let's look at the same very simple application using both approaches to identify the differences.

In the relational database example (above left), the structure of the records is constrained by four rules:

  1. The data is always stored in 'flat' records in a table. This is like a spreadsheet. There is no nested structure within a record.
  2. Every record (called a 'row' for obvious reasons) has the same fields in the same order.  This is pre-defined for the database in a 'schema'.
  3. There are many exam results for each subject, so they have to be stored in a separate table. We referred earlier to the need to ensure that you can't delete a parent record (a subject) if there are child records (exam papers) that link to it.  This issue is called referential integrity.
  4. Before entering data you have to create a 'schema' that defines what data is stored in each record.

The process of reducing the data model to individual tables is called normalisation.  This model has served the industry well for half a century. 
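The four rules above can be sketched with Python's built-in sqlite3 module. This is only an illustration, not the schema of the example application: the table names, column names and sample values are assumptions made up for the sketch.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to.
conn.execute("PRAGMA foreign_keys = ON")

# Rules 2 and 4: the schema is declared before any data is entered,
# and every row in a table has the same fields.
conn.execute("CREATE TABLE subjects (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Rule 3: exam papers live in their own table and link back to a subject.
# Rule 1: rows are flat, so the start date becomes three separate columns.
conn.execute("""
    CREATE TABLE exam_papers (
        id          INTEGER PRIMARY KEY,
        subject_id  INTEGER NOT NULL REFERENCES subjects(id),
        title       TEXT NOT NULL,
        start_day   INTEGER,
        start_month INTEGER,
        start_year  INTEGER
    )
""")

# Hypothetical sample data.
conn.execute("INSERT INTO subjects (id, name) VALUES (1, 'Mathematics')")
conn.execute(
    "INSERT INTO exam_papers (subject_id, title, start_day, start_month, start_year) "
    "VALUES (1, 'Paper 1', 12, 6, 2024)"
)

def try_delete_subject(connection, subject_id):
    """Return True if the subject was deleted, False if child rows blocked it."""
    try:
        connection.execute("DELETE FROM subjects WHERE id = ?", (subject_id,))
        return True
    except sqlite3.IntegrityError:
        # Referential integrity: a paper still links to this subject.
        return False

deleted = try_delete_subject(conn, 1)
print("Subject deleted:", deleted)
```

Here the database itself refuses the delete, because an exam paper still refers to the subject. That refusal is the referential integrity check at work.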

In the document database example (above right) a lot is new. Data is stored in records called 'documents'. 

  1. The documents are structured, so if you want to store the date as day/month/year you define a date group and then day/month/year below it. You can't do that in a flat relational record, so you end up with structured field names such as start-day and start-month.
  2. Instead of links from the papers to the subjects, we can list the papers in the subject record.  There is no preset limit to the number of papers you could store in this way.  This is sometimes called denormalised data and can improve performance.
  3. It would be perfectly possible to incorporate all of the papers' information within the subject record. In that case the referential integrity issue disappears because everything is in one record.  In this example, however, we allow the same paper to be used for different subjects, so the papers are not embedded.
  4. Different documents don't have to have the same information. If a subject needs different information to the rest, then it can be included. There is no fixed schema.
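The same ideas can be sketched with plain Python dictionaries standing in for documents. No particular document database is assumed, and the identifiers, field names and values below are invented for illustration.

```python
# Papers are documents with a nested 'start' group (point 1).
papers = {
    "P1": {"title": "Paper 1", "start": {"day": 12, "month": 6, "year": 2024}},
    "P2": {"title": "Paper 2", "start": {"day": 14, "month": 6, "year": 2024}},
}

# Each subject lists its papers directly (point 2). The same paper, P1,
# is shared by two subjects, which is why it is not embedded (point 3).
subjects = [
    {"name": "Mathematics", "papers": ["P1", "P2"]},
    # This document carries an extra field the other one lacks (point 4).
    {"name": "Statistics", "papers": ["P1"], "calculator_allowed": True},
]

def start_date(paper):
    """Format the nested start group as day/month/year."""
    start = paper["start"]
    return f"{start['day']:02d}/{start['month']:02d}/{start['year']}"

def subjects_using(paper_id):
    """Return the names of all subjects that list the given paper."""
    return [s["name"] for s in subjects if paper_id in s["papers"]]

print(start_date(papers["P1"]))
print(subjects_using("P1"))
```

Nothing here is checked by a schema: the extra calculator_allowed field is simply present on one subject and absent on the other, and it is up to the application to cope with that.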

We have used the system to model the same problem, solving it with each of these features in turn:

  1. A relational model
  2. Structured data
  3. Variable record content

Next: The relational model