DDD  ·  Database Design & Development

Data Dictionary and Validation

Lesson DDD4 of 10 Approx 50 min Design data validation rules to maintain data quality

Learning Intentions

  • Understand what a data dictionary is and why databases need one
  • Identify the five types of validation: presence, restricted choice, field length, range, foreign key
  • Create a complete data dictionary with all required fields and validation rules
  • Explain how validation enforces data quality and referential integrity

Success Criteria

  • You can name all five validation types with a real-world example for each
  • You can create a complete data dictionary for a multi-entity database
  • You can identify which validation type to apply to each attribute
  • You can explain why validation prevents errors and protects data integrity
🔥 Warm-up
Before we start, test what you already know.

1. What is a data dictionary? TYPE 1

2. Which validation type ensures a field is not left blank? TYPE 1

3. What does a foreign key validation check? TYPE 1

Key vocabulary

Data Dictionary
A document describing the structure of a database: all tables, fields, data types, constraints, and validation rules.
Validation
Rules enforced to ensure data entered into a database is valid, accurate, and complete before storage.
Presence Check
Validation rule that ensures a field is not left empty; the field must contain a value.
Restricted Choice
Validation rule that limits entries to a predefined set of allowed values (e.g., Grade: A, B, C, D, E only).
Field Length
Validation rule that limits how many characters can be entered into a text field.
Range Validation
Validation rule that ensures a numeric or date value falls within acceptable minimum and maximum limits.
Foreign Key
A field in one table that references the primary key of another table, enforcing referential integrity.
Referential Integrity
A database constraint ensuring foreign key values match existing primary key values in related tables.
Data Type
The category of data a field can store: Text, Integer, Decimal, Date, Boolean, etc.
Attribute
A column in a table; a property or characteristic of an entity.

Data Dictionary and Validation

What is a Data Dictionary?

A data dictionary is a document (often a table) that describes every aspect of a database's structure. It answers: What tables exist? What columns does each table have? What type of data does each column hold? What rules prevent bad data from being entered?

A data dictionary is created during the Design phase and updated as the database evolves. It serves as:

  • Documentation: Future developers understand the database without reading code.
  • Specification: Database developers know exactly what to build.
  • Quality assurance: Testers know what validation rules to test.
  • Audit trail: If data looks wrong, the data dictionary shows what rules should have prevented it.

Components of a Data Dictionary

A complete data dictionary includes columns for:

  • Entity: The table name (e.g., Student, Course, Enrollment).
  • Attribute: The field/column name (e.g., StudentID, FirstName, DateOfBirth).
  • Key Type: Is this a Primary Key (PK), Foreign Key (FK), or neither?
  • Data Type: Text, Integer, Decimal, Date, Boolean, etc.
  • Size: For text, max length (e.g., VARCHAR(50)). For numbers, precision (Integer 0–32,767).
  • Validation Rule: What constraints apply? (e.g., "Presence check," "Range 0–100," "Restricted to A–F").

The Five Validation Types

1. Presence Check

A presence check ensures a field is not left blank. If a field is mandatory, the database rejects any record where that field is empty.

Example: StudentID is a primary key, so it must have a value. If someone tries to add a student without an ID, the database rejects the entry.

  • Rule: StudentID must not be NULL
  • Error: "Student ID is required"

2. Restricted Choice

A restricted choice validation limits entries to a predefined set of allowed values. The field only accepts values from a list.

Example: A Grade field should only accept A, B, C, D, E, or F. Entering "X" is rejected.

  • Rule: Grade must be one of {A, B, C, D, E, F}
  • Error: "Grade must be A, B, C, D, E, or F"

3. Field Length Validation

Field length validation limits how many characters can be entered into a text field. This prevents data that's too long for the field.

Example: A FirstName field is defined as VARCHAR(30), meaning up to 30 characters. A 9-character name is accepted. A 50-character name is rejected.

  • Rule: FirstName must be ≤ 30 characters
  • Error: "First name cannot exceed 30 characters"

4. Range Validation

Range validation ensures numeric or date values fall within acceptable limits. Test scores must be 0–100, dates must be between 2020 and 2030, ages must be 0–120.

Example: A TestScore field should only accept 0–100. Entering –5 or 150 is rejected.

  • Rule: TestScore must be ≥ 0 AND ≤ 100
  • Error: "Test score must be between 0 and 100"

5. Foreign Key Validation

Foreign key validation is the most powerful. It ensures a value in one table exists as a primary key in another table. This enforces referential integrity.

Example: An Enrollment table has a StudentID field referencing the Student table. Before allowing enrollment for StudentID=55, the database checks: does StudentID 55 exist in Student? If not, enrollment is rejected.

  • Rule: StudentID in Enrollment must match a StudentID in Student
  • Error: "That student does not exist"

Why Validation Matters

Validation prevents garbage-in-garbage-out (GIGO). Without validation:

  • Users might leave required fields blank, creating incomplete records.
  • Typos might be entered (e.g., Grade="X" instead of "A").
  • Out-of-range values might be stored (e.g., Age=–5).
  • Orphaned records might exist (e.g., enrollment for a deleted student).
  • Reports and analysis become unreliable because data is inconsistent.

Validation catches errors before they're stored, ensuring data quality. This is much cheaper than fixing bad data later.

Worked examples

Example 1: Complete Data Dictionary for a Library System

Scenario: Design a data dictionary for a library that tracks books, borrowers, and loans.

1
Identify entities: Book, Borrower, Loan
2
Identify attributes: Book (ISBN, Title, Author, YearPublished, Copies); Borrower (BorrowerID, FirstName, LastName, MembershipDate); Loan (LoanID, BorrowerID FK, ISBN FK, DateBorrowed, DueDate)
3
Create data dictionary with validation rules for each attribute.
Example 2: Applying Validation to a Student Grading System

Scenario: Design validation rules for a Grades table with GradeID, StudentID (FK), CourseID (FK), Mark, Grade, DateRecorded.

1
GradeID (Primary Key): Presence check, Auto-increment
2
StudentID (Foreign Key): Must match StudentID in Student table (prevents grades for non-existent students)
3
CourseID (Foreign Key): Must match CourseID in Course table
4
Mark (Numeric): Range 0–100 (prevents nonsensical values like –5 or 150)
5
Grade (Letter): Restricted choice {A, B, C, D, E, F} (prevents typos like "X")
6
DateRecorded (Date): Presence check, Range ≤ today (can't record future grades)
Example 3: Preventing Orphaned Records with Foreign Key Validation

Scenario: A Booking table has CustomerID referencing Customer table. What happens without and with foreign key validation?

1
Without foreign key validation: Admin deletes customer 123. Now the booking for CustomerID=123 is orphaned—it references a non-existent customer. Reports fail.
2
With foreign key validation: Database rejects deletion: "Cannot delete customer 123: 3 bookings reference this customer." Admin must delete bookings first, or use cascade delete to delete both.
3
Result: Foreign key validation prevents orphaned records and maintains data integrity.
Now you try

Design validation rules for a Hospital Appointment System with Patient, Doctor, and Appointment tables.

Attributes to validate: PatientID, FirstName, LastName, DateOfBirth, PhoneNumber (11 chars), DoctorID, Speciality, AppointmentDate, AppointmentTime

Answer:

  • PatientID: Presence check, Primary key, Auto-increment
  • FirstName/LastName: Presence check, Field length ≤ 50
  • DateOfBirth: Presence check, Range (0–120 years old)
  • PhoneNumber: Presence check, Field length = 11
  • DoctorID (FK): Foreign key → Doctor.DoctorID
  • Speciality: Presence check, Restricted choice (Cardiology, Surgery, etc.)
  • AppointmentDate: Presence check, Range ≥ today
  • AppointmentTime: Presence check, Restricted choice (09:00, 10:00, 14:00, etc.)
⚠️ Common mistakes
Confusing data type with validation: "The field is Text" is a data type. "Field length ≤ 50" is validation. A data dictionary must include both.
Forgetting presence checks on required fields: If a field is mandatory (like StudentID), it must have a presence check. Without one, a record can be created with blank StudentID.
Not validating foreign keys: A foreign key should always reference a related table's primary key. Without it, you can create records for non-existent entities.
Backwards range validation: Students sometimes write "Range 0" and "Range 100" separately. Write "Range 0–100" or "0 ≤ Mark ≤ 100" for clarity.
Over-validating: "Field length ≤ 1000000" for a name field is overkill. Use reasonable limits (50 characters for names).
💡 Exam tip

Past paper question: "Design a data dictionary for a Hotel Booking system with Customer, Room, and Booking tables. Include validation rules."

Strong answer: Create a table with Entity, Attribute, Key Type, Data Type, Size, and Validation. Include at least one of each validation type. Explain why each validation is necessary. Weak answers list attributes without validation; strong answers explain the rules.

Task set

1. Which validation type prevents a user from leaving a field blank? TYPE 1

2. A field is defined as VARCHAR(25). Which validation does this enforce? TYPE 1

3. What does a foreign key validation ensure? TYPE 1

4. Which is NOT a standard validation type? TYPE 1

5. Describe what a data dictionary is and explain why databases need one. TYPE 2

Write 2-3 sentences.

6. A Mark field should only accept 0–100. What validation type applies and how would you specify it? TYPE 2

Write 1-2 sentences.

7. Explain how foreign key validation prevents orphaned records. TYPE 2

Write 2-3 sentences.

8. Design a data dictionary for a Cinema Booking system with Movie, Screen, and Booking tables. Create a table showing Entity, Attribute, Key Type, Data Type, and Validation. Include at least one of each validation type (10+ attributes). TYPE 3

Create a complete table.

9. A university database stores Student, Course, and Enrollment records. For each attribute (StudentID, CourseCode, EnrollmentDate, Grade), identify appropriate validation rules and explain why each is necessary. TYPE 3

Write 6-8 sentences covering all four attributes.

10. Compare a database WITH foreign key validation to one WITHOUT, using a real-world example. Explain consequences of missing validation (data corruption, orphaned records). TYPE 3

Write 6-8 sentences with a specific example.

Teacher notes — Shift+T to hide

Lesson duration: 50–60 minutes. Work through examples step-by-step with pupils.

Key points:

  • Data dictionary is a specification, not just a list. Every field needs validation.
  • Validation is a design decision, not an afterthought.
  • Foreign key validation is most powerful—it maintains referential integrity.
  • Validation prevents garbage-in-garbage-out (GIGO).

Common misconceptions: Students confuse data type with validation. Some think foreign keys are optional. Underestimate complexity of validation rules.

Past paper tips: Q5 asks students to design a data dictionary. Weak answers list attributes without validation; strong answers create clear tables with rules.