- Understand what a database is and why databases are used to store data
- Identify key database structures: tables, records, and fields
- Explain the role of primary keys and foreign keys in a relational database
- Recognise common data types: TEXT, INTEGER, REAL, and NULL
- I can define the terms: database, table, record, and field
- I can explain what a primary key is and why every table needs one
- I can explain what a foreign key is and how it links two tables
- I can identify the correct data type for a given piece of data
- I can give one advantage of a relational database over a flat-file spreadsheet
Download films.sql and import it into DB Browser for SQLite before the warm-up. If you haven't set up DB Browser yet, follow the setup guide first.
Answer before the lesson begins. These check prior knowledge — it's fine if you're unsure.
1. You have used spreadsheets before. In a spreadsheet each row holds data about one thing. What is a single row of data called in a database?
2. Which of the following is not a real-world example of a database?
3. You imported films.sql into DB Browser for SQLite before this lesson. How many tables does the films database contain?
Key vocabulary
last_name).director_id.Database Concepts
What is a database?
A database is an organised collection of structured data stored electronically. Unlike a document or a spreadsheet, a database is designed so data can be stored, searched, sorted, and updated with precision and speed. Every digital service you use depends on databases: streaming a film, logging into a school app, searching for a product online — all of these involve database queries happening in the background.
A Database Management System (DBMS) is the software that controls access to the database. It handles storing data on disk, running queries, preventing two users from corrupting the same record at once, and enforcing rules about what data is allowed. Examples include SQLite, MySQL, and PostgreSQL. In this course we use SQLite via the free tool DB Browser for SQLite.
Tables, records, and fields
Data in a relational database is stored in tables. Each table stores information about one type of entity — a person, a product, a film, a director. You can think of a table like a spreadsheet grid, but with stricter rules:
- Each column is a field — it stores one piece of information about every record (e.g.
last_name,nationality). - Each row is a record — it holds all the information about one specific item (e.g. one director).
- Every cell in a table holds exactly one value (or NULL).
Our films database has two tables. The director table has 15 records and 4 fields. The film table has 52 records and 8 fields. Each table stores a different type of entity, and the two tables are linked together using a key.
Primary keys
Every table must have a primary key (PK) — a field whose value uniquely identifies each record. A primary key must satisfy three rules:
- Unique — no two records in the table can have the same value.
- Not NULL — every record must have a value; it can never be missing.
- Stable — ideally the value never changes once it has been assigned.
In the director table, the primary key is director_id. Every director is assigned a unique integer (1, 2, 3…). Using a number is more reliable than using a name: two different directors could share the same surname, but they can never share the same ID.
Foreign keys
A foreign key (FK) is a field in one table whose value refers to the primary key of another table. It creates a relationship between the two tables.
In the film table, the field director_id is a foreign key. For the film The Dark Knight, this field contains the value 2. Looking that value up in the director table tells us that director 2 is Christopher Nolan — so The Dark Knight was directed by Christopher Nolan.
Without foreign keys, we would have to copy Nolan's name, nationality, and all other details into every one of his films. Storing the same data in multiple places is called data redundancy: it wastes space, and if the data needs to change, it must be updated in every row — which is error-prone. Foreign keys eliminate this by storing director details once and linking to them.
Data types
Every field has a data type that defines what kind of value it can store. SQLite uses four main types:
- TEXT — any sequence of characters (e.g. a film title, a genre label, a certificate like "12A").
- INTEGER — whole numbers only, positive or negative (e.g. a year, a runtime in minutes, an ID).
- REAL — decimal numbers (e.g. a rating of 8.4).
- NULL — a special marker meaning the value is missing or unknown. It is not the same as zero or an empty string.
Choosing the right type matters. Storing a year as TEXT prevents the database from doing arithmetic on it (e.g. finding all films made after 2000 with year > 2000). Storing a rating as INTEGER would silently truncate the decimal — 8.4 would become 8.
NULL values
Some fields are allowed to be NULL, meaning the data simply was not recorded. In our films database, runtime_mins can be NULL — four films are missing this information. This is intentional: it teaches you how to handle missing data in queries.
NULL is not the same as zero (which would mean "zero minutes long") and not the same as a blank text string. It means the value is genuinely unknown. When querying for NULL values, you cannot use = NULL — you must use IS NULL or IS NOT NULL. (This will be covered in a later SQL lesson.)
Why use a database instead of a spreadsheet?
Spreadsheets work well for small, simple datasets handled by one person. Databases offer important advantages at scale:
- No data redundancy — data is stored once and linked by foreign keys, not duplicated in every row.
- Data integrity — the DBMS enforces rules (e.g. a foreign key value must exist in the related table), preventing invalid entries.
- Scalability — databases handle millions of records efficiently; large spreadsheets slow to a crawl.
- Multi-user access — many users can read and write simultaneously without corrupting data.
- Powerful queries — SQL lets you search, filter, sort, and join across multiple tables in ways spreadsheet formulas cannot match.
Worked examples
This is an extract from the director table in the films database:
| director_id | first_name | last_name | nationality |
|---|---|---|---|
| 1 | Steven | Spielberg | American |
| 2 | Christopher | Nolan | British |
| 3 | Ridley | Scott | British |
director. It stores information about film directors.director_id, first_name, last_name, nationality. Each field stores one type of information.last_name field in record 1. Every record has a value for every field (unless the field allows NULL).Below is one record from each table, showing how they are linked:
| director | |||
|---|---|---|---|
| director_id PK | first_name | last_name | nationality |
| 2 | Christopher | Nolan | British |
| film | |||||||
|---|---|---|---|---|---|---|---|
| film_id PK | title | year | certificate | runtime_mins | rating | genre | director_id FK |
| 9 | The Dark Knight | 2008 | 12A | 152 | 9.0 | Action | 2 |
director table, director_id is the primary key. It is unique for every director and is never NULL.film table, film_id is the primary key. Every film has its own unique ID.director_id column in film is a foreign key. Its value (2) matches Christopher Nolan's director_id in the director table.For each field in the film table, the correct data type and the reasoning:
| Field | Example value | Data type | Reason |
|---|---|---|---|
film_id | 9 | INTEGER | Whole number ID — no decimal needed. |
title | The Dark Knight | TEXT | A sequence of characters — any length. |
year | 2008 | INTEGER | Whole number — allows maths like year > 2000. |
certificate | 12A | TEXT | A label (PG, 15, 18) — not used in arithmetic. |
runtime_mins | 152 | INTEGER (nullable) | Whole minutes; some records are NULL. |
rating | 9.0 | REAL | Decimal — INTEGER would truncate 8.4 to 8. |
genre | Action | TEXT | A category label — stored as text. |
director_id | 2 | INTEGER | Whole number FK matching director.director_id. |
Look at this extract from the film table, which includes a record with a NULL value:
| film_id | title | year | certificate | runtime_mins | rating | genre | director_id |
|---|---|---|---|---|---|---|---|
| 22 | Big Fish | 2003 | PG | 125 | 8.0 | Fantasy | 5 |
| 23 | Corpse Bride | 2005 | PG | NULL | 7.3 | Animation | 5 |
For this extract, answer the following:
- What is the name of the table?
- How many records are shown? How many fields does this table have?
- Which field is the primary key? Give a reason.
- Which field is a foreign key? What table does it link to, and what does the value 5 tell us?
- Why does
runtime_minsshow NULL for Corpse Bride? What does NULL mean in this context?
- Table name:
film - Records shown: 2. Fields: 8 (
film_id,title,year,certificate,runtime_mins,rating,genre,director_id). - Primary key:
film_id— every film has a unique integer ID that is never NULL. No two films share the same ID. - Foreign key:
director_id. It links to thedirector_idfield in thedirectortable. The value 5 tells us both films were directed by the director with ID 5 — Tim Burton. - NULL in runtime_mins: The runtime for Corpse Bride was not recorded in this database. NULL means the value is missing or unknown — it does not mean the film is zero minutes long.
last_name). A record is a row — it is all the data about one specific item (e.g. one director). Fields run across the top; records run down the side.
runtime_mins is not a film with zero runtime — it is a film where the runtime was not recorded. NULL + 1 = NULL, not 1. This distinction matters in every SQL query involving NULL.
director_id in director). The foreign key is a copy of that value in another table (e.g. director_id in film). Same field name — different tables, different roles.
isbn (for books) or email could also serve as a primary key if every record is guaranteed to have a unique value.
The exam often asks you to "identify a suitable primary key and justify your choice." To earn full marks you must state both the field name and give a reason. A strong justification covers two points:
- The value is unique for every record — no two records share it.
- The value is never NULL — every record will always have one.
Example answer: "director_id is a suitable primary key because each director is assigned a unique ID number that is never empty, so every record can be individually identified."
Saying only "it identifies the record" will not score the justification mark. You must reference uniqueness and/or not NULL.
Questions 1–5 are auto-checked. Questions 6–9 are self-marked — write your answer, then reveal the model answer to check your work.
1. In a database table, what is a single column that stores one attribute for every record called? TYPE 1
2. The film table stores a film's rating, e.g. 8.4. Which data type is correct for this field? TYPE 1
3. The film table has a director_id column that holds the ID of the director of each film. What type of key is this? TYPE 1
4. Some records in the film table show NULL in the runtime_mins field. What does NULL mean? TYPE 1
5. Which of the following is not a requirement for a primary key? TYPE 1
6. Describe the difference between a primary key and a foreign key. Use the director and film tables as examples in your answer. TYPE 2
director table, director_id is the primary key — every director has a different ID number that is never empty.A foreign key is a field in one table that links to the primary key of another table. In the
film table, director_id is a foreign key — its value matches a director_id in the director table, telling us which director made each film. The same field name appears in both tables but plays different roles.
7. Give two advantages of storing film and director data in a relational database rather than a single flat-file spreadsheet. TYPE 2
1. No data redundancy — director details are stored once in the
director table and linked by a foreign key, rather than repeated in every row of the film table. If a director's details change, they only need to be updated in one place.2. Data integrity — the DBMS enforces rules (e.g. a
director_id in the film table must match an existing director), preventing invalid or inconsistent data from being entered.3. Scalability — databases handle very large numbers of records efficiently; a large spreadsheet becomes slow and difficult to manage.
4. Powerful queries — SQL allows complex searching, sorting, filtering, and joining across multiple tables in ways that spreadsheet formulas cannot match.
8. Open the director table in DB Browser for SQLite. Write down: (a) the name of the primary key field, (b) the data type of each field, and (c) the total number of records in the table. TYPE 3
director_id(b) Data types:
director_id — INTEGERfirst_name — TEXTlast_name — TEXTnationality — TEXT(c) Total records: 15
9. In the film table, four records have NULL in the runtime_mins field. (a) Name one of those films. (b) Explain what NULL means in this context. (c) Explain why a primary key can never be NULL. TYPE 3
(b) NULL means the runtime for that film was not recorded in the database. It does not mean the film is zero minutes long — it means the value is missing or unknown.
(c) A primary key must uniquely identify every record. If a primary key were NULL, there would be no way to tell records apart — two records with NULL primary keys would be indistinguishable. The uniqueness guarantee would be broken, making it impossible to link records from other tables reliably using a foreign key.
Suggested timing: 60 minutes. Warm up + vocab 10 min; notes with live DB Browser demo 20 min; worked examples 10 min; now you try 5 min; task set 15 min.
Practical setup: Pupils should have already imported films.sql into DB Browser for SQLite before this lesson begins. Tasks 8 and 9 require the database to be open. Warm-up Q3 is a quick diagnostic — any pupil who answers 1 table probably hasn't imported the file yet.
Key misconception to address: NULL ≠ 0. A useful provocation: ask "what is NULL + 1?" The answer is NULL, not 1. Ask pupils why that might matter for a query that totals runtimes.
Live demo suggestion: Open DB Browser, browse the film table, and point out the four NULL values in runtime_mins. Show the schema tab to discuss data types.
Extension question: Could last_name in the director table ever serve as a primary key? (No — two directors could share a surname. This reinforces why integer IDs are used.)
SQA command words covered: "identify", "describe", "explain". Remind pupils that for "explain" they need a reason, not just a definition. For "describe" they need more than one point about the topic.