Data Representation: Text

Computer Systems  ·  Lesson 2 of 10  ·  ~45 minutes
Learning intentions
  • Understand why computers need a system to represent text characters
  • Describe how extended ASCII uses 8 bits to represent 256 characters
  • Identify the structure of the ASCII table (non-printable, uppercase, lowercase, symbols)
  • Understand how Unicode extends ASCII to support all world languages
  • Convert between a character and its ASCII decimal value
Success criteria
  • I can explain what ASCII stands for and how it works
  • I can state that extended ASCII uses 8 bits and represents 256 characters
  • I can identify where uppercase letters, lowercase letters and non-printable characters sit in the ASCII table
  • I can explain one limitation of ASCII that led to Unicode
  • I can decode a short ASCII message given a table of values
Warm up — what do you already know?

Answer before the lesson begins. These check prior knowledge — it's fine if you're unsure.

1. How many different values can be represented using 8 bits?

2. What is the binary number 01000001 in denary?

3. What does the term "bit" stand for?

Key vocabulary

ASCII
American Standard Code for Information Interchange — a system that assigns a unique number to each character.
Extended ASCII
An 8-bit version of ASCII representing 256 characters (codes 0–255).
Character
A single letter, digit, punctuation mark or symbol — anything you can type on a keyboard.
Character set
The complete list of characters a system can represent, each mapped to a unique number.
Non-printable character
ASCII codes 0–31; control signals sent to devices rather than displayed on screen (e.g. newline, beep).
Unicode
A modern character encoding standard using 16 bits, capable of representing 65,536+ characters from all world languages.
Encoding
The process of converting data (e.g. a character) into a binary representation for storage or transmission.
Bit depth
The number of bits used to represent a single unit of data — here, one character (8 bits in ASCII).

Why do computers need a code for text?

Computers store everything as binary — 1s and 0s. Numbers can be represented directly in binary, but characters (letters, punctuation, symbols) have no natural binary equivalent. For a computer to store the letter 'A', it must be agreed in advance what binary pattern represents it. Without a shared standard, a file saved on one computer might produce garbled characters when opened on another. This is why character encoding standards were developed — they create a universal agreement between all computers about which binary number maps to which character.

ASCII — the original standard

ASCII (American Standard Code for Information Interchange) was developed in the 1960s and became the dominant early standard. Original ASCII used 7 bits, providing 128 possible values (0–127). Extended ASCII added an eighth bit, doubling capacity to 256 characters (0–255). Each character is assigned a unique code number, and the computer stores that number in binary.
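
The doubling from 128 to 256 can be checked with a short Python sketch. The function name `value_count` is made up for this illustration; it simply raises 2 to the power of the number of bits.

```python
# Number of distinct values available for a given number of bits.
# value_count is a hypothetical helper name used only in this sketch.
def value_count(bits):
    return 2 ** bits

for bits in (7, 8):
    print(bits, "bits ->", value_count(bits), "values")
# 7 bits -> 128 values
# 8 bits -> 256 values
```

Each extra bit doubles the number of possible values, which is why adding one bit to 7-bit ASCII takes it from 128 to 256.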

The 256 characters break into key ranges:

  • Codes 0–31: Non-printable control characters. These don't display on screen — they send instructions to devices. Code 7 triggers a beep, code 10 is a newline, code 13 is a carriage return.
  • Code 32: The space character — the first printable character.
  • Codes 48–57: The digits 0–9.
  • Codes 65–90: Uppercase letters A–Z.
  • Codes 97–122: Lowercase letters a–z.
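  • Codes 33–47, 58–64, 91–96, 123–126: Punctuation marks and symbols.
  • Code 127: Delete, another non-printable control character.
  • Codes 128–255: The extended characters (such as accented letters and extra symbols), which vary between systems.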

A key pattern to notice: 'A' is 65 and 'a' is 97 — a difference of exactly 32. This consistent offset means software can convert between upper and lowercase by simple arithmetic, without needing a lookup table.
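
As a minimal sketch of that arithmetic, the Python below uses the built-in `ord()` (character to code) and `chr()` (code to character). The names `CASE_OFFSET`, `to_lower` and `to_upper` are invented for this example; real programs would normally use the built-in string methods instead.

```python
# Difference between a lowercase letter and its uppercase partner.
CASE_OFFSET = ord("a") - ord("A")  # 97 - 65 = 32

def to_lower(letter):
    """Convert one uppercase letter A-Z to lowercase by adding the offset."""
    if "A" <= letter <= "Z":
        return chr(ord(letter) + CASE_OFFSET)
    return letter  # anything else is left unchanged

def to_upper(letter):
    """Convert one lowercase letter a-z to uppercase by subtracting the offset."""
    if "a" <= letter <= "z":
        return chr(ord(letter) - CASE_OFFSET)
    return letter  # anything else is left unchanged

print(CASE_OFFSET)    # 32
print(to_lower("A"))  # a
print(to_upper("z"))  # Z
```

The range checks matter: adding 32 to a character that is not an uppercase letter would give an unrelated symbol.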

ASCII quick-reference (selected characters)
Decimal  |  Character  |  Notes
65  |  A  |  First uppercase letter
66  |  B  |
67  |  C  |
68  |  D  |
69  |  E  |
70  |  F  |
90  |  Z  |  Last uppercase letter
97  |  a  |  First lowercase letter (= 65 + 32)
98  |  b  |
99  |  c  |
100  |  d  |
101  |  e  |
102  |  f  |
122  |  z  |  Last lowercase letter (= 90 + 32)
32  |  space  |  First printable character
33  |  !  |  Exclamation mark
48  |  0  |  Digit zero (not binary zero)
57  |  9  |  Digit nine

How text is stored in memory

When you type the word "Hi!" on a keyboard, the computer stores three separate ASCII values: 72 (H), 105 (i), 33 (!). Each takes 8 bits (1 byte) of storage. A 100-character text file therefore requires 100 bytes of storage — before any formatting. Every space, every punctuation mark, every character you can type has its own unique ASCII code. The computer never stores the letter itself — only the number that represents it.
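
To see the numbers and bit patterns behind "Hi!", here is a small Python sketch. The function name `to_ascii_bytes` is hypothetical; it pairs each character with its decimal code and that code written as 8 binary digits.

```python
def to_ascii_bytes(text):
    """Return (character, decimal code, 8-bit binary string) for each character."""
    return [(ch, ord(ch), format(ord(ch), "08b")) for ch in text]

for ch, code, bits in to_ascii_bytes("Hi!"):
    print(ch, code, bits)
# H 72 01001000
# i 105 01101001
# ! 33 00100001
```

Three characters give three 8-bit patterns, so the word takes 3 bytes.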

Unicode — solving ASCII's limitations

Extended ASCII was created before computing was truly global. Its 256 character slots are sufficient for English and some Western European languages, but completely inadequate for Chinese (70,000+ characters), Arabic, Japanese, emoji, and many other scripts. Additionally, different companies assigned different characters to the 128–255 slots, creating compatibility problems between systems.

Unicode solves this by using more bits per character. For N5 it is described as a 16-bit code, providing 65,536 possible values, which covers the characters in everyday use across the world's major languages. The full standard extends beyond 16 bits, with room for over a million values, and that is how it also fits rarer characters and emoji. Unicode is now the global standard; when you send a text message containing an emoji, Unicode is the encoding making that possible. Crucially, Unicode is backwards-compatible with ASCII: the first 128 Unicode values are identical to ASCII, so standard English characters keep the same codes.

File size implications

Because every character is stored as 8 bits (1 byte) in ASCII, calculating the storage requirement of a text file is straightforward: number of characters × 8 bits. A 1,000-character essay would need 1,000 bytes = 8,000 bits of storage. In Unicode (16 bits per character), the same essay would need 2,000 bytes — double the storage, but with the ability to represent any character from any language on the planet. This trade-off between storage size and language coverage is a key concept in data representation.
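
The calculation can be sketched in Python as below. The function name `storage_bits` is made up for this example, and it assumes the fixed sizes used in this lesson (8 bits per character for ASCII, 16 for Unicode).

```python
def storage_bits(text, bits_per_char):
    """Bits needed to store text at a fixed number of bits per character."""
    return len(text) * bits_per_char

message = "Well done!"            # 10 characters, including the space
print(storage_bits(message, 8))   # 80
print(storage_bits(message, 16))  # 160

essay = "x" * 1000                   # stand-in for a 1,000-character essay
print(storage_bits(essay, 8) // 8)   # 1000 (bytes in ASCII)
print(storage_bits(essay, 16) // 8)  # 2000 (bytes in 16-bit Unicode)
```

Because `len()` counts every character, spaces and punctuation are included in the total.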

Worked examples

Example 1 — Decoding ASCII values to text

ASCII values given: 72, 101, 108, 108, 111. What word do they spell?

1. Look up 72 in the ASCII table → H
2. Look up 101 → e
3. Look up 108 → l
4. Look up 108 → l (same code, same character)
5. Look up 111 → o
6. Read all characters in order: "Hello". Note: the SQA always provides the ASCII table, so you do not need to memorise values.
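
The same look-up steps can be sketched in Python, where the built-in `chr()` plays the part of the table. The function name `decode` is invented for this example.

```python
def decode(codes):
    """Turn a list of ASCII decimal values into the text they spell."""
    return "".join(chr(code) for code in codes)

print(decode([72, 101, 108, 108, 111]))  # Hello
```

`chr()` works with Unicode values, but for codes 0–127 these match ASCII exactly, so it gives the same answers as the table.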

Example 2 — Encoding text to ASCII decimal values

Convert the word "Cat" to its ASCII decimal values.

1. C is uppercase. Uppercase letters start at 65 (A=65, B=66, C=67). So C → 67.
2. a is lowercase. Lowercase letters start at 97 (a=97). So a → 97.
3. t is lowercase. Count from a=97: b=98, c=99 … t is the 20th letter, so 97 + 19 = 116. So t → 116.
4. Result: 67, 97, 116.
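
Going the other way, this Python sketch encodes text with the built-in `ord()`, and also mirrors the counting method from step 3. The names `encode` and `lowercase_code` are hypothetical.

```python
def encode(text):
    """Turn text into a list of ASCII decimal values."""
    return [ord(ch) for ch in text]

def lowercase_code(letter):
    """Work out a lowercase letter's code by counting on from a = 97."""
    position = "abcdefghijklmnopqrstuvwxyz".index(letter) + 1  # t is 20th
    return 97 + (position - 1)

print(encode("Cat"))        # [67, 97, 116]
print(lowercase_code("t"))  # 116
```

Both approaches agree: looking the character up directly and counting from the start of the range give the same code.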

Example 3 — Calculating storage in bits and bytes

How many bits are needed to store the word "Computing" in extended ASCII?

1. Count the characters: C-o-m-p-u-t-i-n-g = 9 characters.
2. Extended ASCII uses 8 bits (1 byte) per character.
3. 9 × 8 = 72 bits (equivalently, 9 bytes).

Example 4 — Using the uppercase/lowercase offset

The ASCII code for 'A' is 65. What is the ASCII code for 'a'? And if 'Z' is 90, what is 'z'?

1. The offset between any uppercase letter and its lowercase equivalent is always 32.
2. 'a' = 65 + 32 = 97. ✓
3. 'z' = 90 + 32 = 122. ✓ This pattern holds for every letter A–Z / a–z.

Now you try

An ASCII message reads: 87, 101, 108, 108, 32, 100, 111, 110, 101, 33

Use the partial reference table below to decode it:

33=!  |  32=space  |  87=W  |  100=d  |  101=e  |  108=l  |  110=n  |  111=o

Decode each value in order and write the complete message.

Working:

  • 87 → W
  • 101 → e
  • 108 → l
  • 108 → l
  • 32 → [space]
  • 100 → d
  • 111 → o
  • 110 → n
  • 101 → e
  • 33 → !

Answer: "Well done!"

Common mistakes

  • "ASCII stores letters directly in binary." ASCII stores a number; that number is then stored in binary. The letter 'A' → decimal 65 → binary 01000001. There is an extra step between the character and the binary pattern.
  • "Extended ASCII can only represent 128 characters." Standard (7-bit) ASCII gives 128 characters. Extended ASCII uses 8 bits, giving 2^8 = 256 characters. Always check whether a question means standard or extended.
  • "You need to memorise the ASCII table." The SQA always provides an ASCII/character table in the exam. What is tested is your ability to use the table correctly and to explain the concept, not memory recall.
  • "Unicode completely replaces ASCII." Unicode is backwards-compatible with ASCII for the first 128 values, so standard English characters share identical codes in both systems. Unicode extends ASCII; it does not break it.

Exam tip

In N5 exam questions, you will always be given a reference table to decode ASCII. The skill being tested is your ability to use the table correctly and to explain the concept — not memorisation. Practise converting in both directions: decimal → character and character → decimal. Also be ready to calculate storage: number of characters × 8 bits (ASCII) or × 16 bits (Unicode). Watch out for spaces — a space is ASCII code 32, and it counts as a character when calculating file size.

Task Set

Questions 1–5 are auto-checked. Questions 6–10 are self-marked — write your answer, then reveal the model answer to check your work.

1. What does ASCII stand for? TYPE 1

2. How many characters can extended ASCII represent? TYPE 1

3. The ASCII code for 'A' is 65. What is the ASCII code for 'a'? TYPE 1

4. How many bits does extended ASCII use to represent each character? TYPE 1

5. ASCII codes 0–31 are known as: TYPE 1

6. Using the ASCII table, decode the following values: 78, 53, 32, 67, 83 TYPE 2

Reference: 32=space, 51=3, 53=5, 67=C, 78=N, 83=S

78 → N, 53 → 5, 32 → [space], 67 → C, 83 → S
Answer: "N5 CS"

7. A pupil types the name "Anya" into a text box. How many bits are needed to store this name in extended ASCII? Show your working. TYPE 2

"Anya" contains 4 characters.
Each character = 8 bits (1 byte).
4 × 8 = 32 bits (4 bytes).

8. Explain why Unicode was developed to replace ASCII as the global standard. TYPE 2

Extended ASCII supports only 256 characters, which is sufficient for English but not for other world languages such as Chinese, Arabic or Japanese, which have thousands of characters. Unicode uses 16 bits per character (65,536 possible values) and can represent characters from all major world languages as well as symbols and emoji. It is also backwards-compatible with ASCII for the first 128 values.

9. A text file contains 500 characters. (a) How many bytes of storage does it require using ASCII? (b) How many bytes would it require if stored using Unicode (16-bit)? (c) Explain the trade-off in using Unicode instead of ASCII. TYPE 3

(a) ASCII: 500 × 1 byte = 500 bytes.
(b) Unicode (16-bit): 500 × 2 bytes = 1,000 bytes.
(c) Unicode can represent characters from all world languages and scripts, making it globally inclusive. However, it uses twice the storage per character compared to ASCII, so files take up more space.

10. Which of the following is a reason Unicode was developed? TYPE 1

Teacher notes

Suggested timing: ~45 minutes. Warm up 8 min; notes + ASCII table 15 min; worked examples 8 min; now you try 5 min; task set 9 min.

  • Pupils often confuse 7-bit ASCII (128 chars) and 8-bit extended ASCII (256 chars). Emphasise the doubling rule — each extra bit doubles the number of possible values.
  • The uppercase/lowercase offset of 32 is a nice pattern to highlight — it sometimes appears directly in past paper questions ("what is the code for 'b' if 'B' is 66?").
  • If time allows, have pupils encode their own first name into ASCII decimal values using the reference table — effective consolidation and immediately personal.
  • Unicode depth: for N5, pupils only need to know it uses 16 bits and supports more characters than ASCII. The term "UTF-8" is not required at this level.
  • CS2 leads directly into CS3 (graphics). The underlying theme throughout CS1–CS4 is: all data is ultimately stored as binary numbers, whether it's integers, text, or images.
  • SQA command words covered in task set: identify, state, explain, calculate, describe.