Computer Science Core Concepts: Data, Systems, and Networks
💾 Data Representation Fundamentals
Binary Coded Decimal (BCD) Benefits
BCD is a method to represent decimal numbers in binary form, where each decimal digit is represented by a fixed number of bits, usually four.
Benefits of BCD include:
- Straightforward conversion between BCD and **decimal (base 10)**.
- Less complex to encode and decode for programmers.
- Easier for digital equipment to use BCD to display information.
- Can represent monetary values exactly.
Applications of BCD:
- Electronic displays (e.g., calculators, digital clocks) - easier conversion between decimal and BCD when only individual digits need to be shown.
- Storage of date and time in PC BIOS - easier conversion with decimal values.
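As a rough illustration of the digit-by-digit conversion described above, the Python sketch below stores each decimal digit in its own 4-bit group. The function names `to_bcd` and `from_bcd` are made up for this example.

```python
def to_bcd(number: int) -> str:
    """Encode a non-negative integer as BCD, one 4-bit group per decimal digit."""
    return " ".join(format(int(digit), "04b") for digit in str(number))


def from_bcd(bcd: str) -> int:
    """Decode a space-separated BCD string back into an integer."""
    return int("".join(str(int(group, 2)) for group in bcd.split()))


print(to_bcd(259))                 # 0010 0101 1001
print(from_bcd("0010 0101 1001"))  # 259
```

Each digit converts independently of the others, which is why a display only needs to look at one 4-bit group per digit shown.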
Hexadecimal Applications
Hexadecimal is used in:
- MAC addresses.
- HTML color codes.
- Memory addresses in assembly language and machine code.
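A small Python sketch of one of these uses: splitting an HTML color code into its red, green, and blue parts. Each pair of hexadecimal digits stands for one 8-bit value. The helper name `hex_color_to_rgb` is an assumption made for illustration.

```python
def hex_color_to_rgb(code: str) -> tuple[int, int, int]:
    """Split an HTML color code such as '#FF8000' into (red, green, blue)."""
    digits = code.lstrip("#")
    # Two hex digits = 8 bits = one color channel in the range 0-255.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(hex_color_to_rgb("#FF8000"))  # (255, 128, 0)
print(format(255, "08b"), "=", format(255, "02X"))  # 11111111 = FF
```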
ASCII Character Representation
Each character has a unique code, and the character is replaced by its corresponding code. Codes are stored in the same order as in the word.
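The idea can be sketched in Python, where the built-in `ord` returns a character's code and `chr` reverses it. The function name `word_to_codes` is invented for this example.

```python
def word_to_codes(word: str) -> list[int]:
    """Replace each character with its code, keeping the original order."""
    return [ord(ch) for ch in word]


codes = word_to_codes("Cat")
print(codes)                               # [67, 97, 116]
print([format(c, "07b") for c in codes])   # ['1000011', '1100001', '1110100']
print("".join(chr(c) for c in codes))      # Cat
```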
Character Sets: ASCII, UNICODE, Extended ASCII
A character set is all of the characters that the computer can represent or use. Each character has a corresponding unique binary number.
Character Set Similarities
- All can use 8 bits.
- ASCII is a subset of Unicode and Extended ASCII.
- Each represents characters using a unique code.
Character Set Differences
- Unicode can represent multiple languages and a wider range of characters than ASCII.
- ASCII uses 7 bits, Extended ASCII uses 8 bits, and Unicode uses between 8 and 32 bits per character, depending on the encoding (e.g., UTF-8 or UTF-16).
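To make the size difference concrete, the sketch below counts how many bytes a character needs when encoded as UTF-8, one common Unicode encoding (chosen here as an assumption; the notes do not name a specific encoding). The helper names are hypothetical.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes needed to store one character in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_ascii(ch: str) -> bool:
    """True if the character's code fits in 7 bits (0-127)."""
    return ord(ch) < 128


print(utf8_length("A"), utf8_length("é"), utf8_length("€"))  # 1 2 3
print(fits_in_ascii("A"), fits_in_ascii("é"))                # True False
```

The letter `A` has the same code in ASCII and Unicode, which is what "ASCII is a subset of Unicode" means in practice.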
🖼️ Multimedia Data
Graphics
Bitmap Graphics
A bitmap graphic is made up of pixels, each of a single color (with each color having a unique binary value). It is stored as a sequence of binary numbers (storing the binary value of each pixel).
Characteristics:
- Prone to **pixelation** when enlarged.
- Larger file size because data is stored for each pixel.
- Can be compressed significantly.
- More difficult to edit—each pixel needs to be edited separately.
Key Bitmap Terms
- **Pixel:** The smallest addressable element in an image.
- **File Header:** Stores metadata about the bitmap image (e.g., color depth, image resolution, file type, compression type, dimensions, file size).
- **Image Resolution:** Total number of pixels in an image (number of pixels wide × number of pixels high). Increasing resolution means more pixels are stored, resulting in a sharper image with less pixelation.
- **Bit Depth / Color Depth:** The number of bits used to represent each color (bits per pixel). Determines the number of colors that can be represented. An increase in bit depth means the image has a greater range of colors and is closer to the original, but it leads to an increased file size.
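The relationship between resolution, bit depth, and file size can be worked through with a short Python sketch. It estimates the pixel data only and ignores the file header; the function names are assumptions for illustration.

```python
import math


def bitmap_size_bytes(width: int, height: int, bit_depth: int) -> int:
    """Pixel data size: (pixels wide x pixels high x bits per pixel) / 8."""
    return math.ceil(width * height * bit_depth / 8)


def colors_available(bit_depth: int) -> int:
    """Number of distinct colors a given bit depth can represent."""
    return 2 ** bit_depth


print(bitmap_size_bytes(1920, 1080, 24))  # 6220800
print(colors_available(8))                # 256
print(colors_available(24))               # 16777216
```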
Vector Graphics
A vector graphic stores a set of instructions about how to draw the shape.
Characteristics:
- Does not pixelate when scaled or enlarged.
- Individual components of the image can be edited easily.
- Smaller file size because it contains just instructions (mathematical formulas).
- Does not compress well because it has little redundant data.
Key Vector Terms
- **Drawing Object:** A component of a vector graphic created using a formula or command.
- **Drawing Property:** Contains data about the shapes and defines aspects of the appearance of a drawing object (e.g., color, line thickness).
- **Drawing List:** The list of shapes that make up an image, storing commands required to draw each object and its attributes.
Vector Graphic Representation:
- Encoded as a series of geometric shapes.
- Stores coordinates of drawing objects in the image.
- Contains a drawing list—commands for creating each individual object and their attributes.
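One possible way to picture a drawing list is as a list of records, each holding a command, its coordinates, and its properties. The layout and the `scale_drawing` helper below are assumptions for illustration, not a real vector file format.

```python
drawing_list = [
    {"command": "rectangle", "points": [(0, 0), (40, 20)],
     "properties": {"fill": "red", "line_thickness": 1}},
    {"command": "line", "points": [(0, 20), (40, 0)],
     "properties": {"color": "black", "line_thickness": 2}},
]


def scale_drawing(objects: list[dict], factor: int) -> list[dict]:
    """Return a new drawing list with every coordinate multiplied by factor."""
    return [
        {**obj, "points": [(x * factor, y * factor) for x, y in obj["points"]]}
        for obj in objects
    ]


enlarged = scale_drawing(drawing_list, 3)
print(enlarged[0]["points"])  # [(0, 0), (120, 60)]
```

Enlarging only recalculates coordinates, so the shapes are redrawn at the new size rather than stretched pixel by pixel.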
🔈 Sound Representation
Digital Sound Recording
Amplitude is recorded a set number of times in a second. Each amplitude measurement is given a corresponding unique binary value/number. The binary numbers are saved/stored in sequence.
Key Sound Terms
- **Sampling:** Taking measurements of the amplitude of an analog signal at regular intervals and storing the values.
- **Sampling Rate:** The number of samples taken per unit time (usually per second), measured in Hertz (Hz).
Effect of Increasing Sampling Rate
- Sound is recorded more often, resulting in smaller gaps in the sound wave and between samples.
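- Reduces the error caused by changes in the wave between samples (quantization error depends on sampling resolution rather than sampling rate).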
- Improves accuracy.
- The digital waveform resembles the analog one more closely.
- Increases file size (increases the total number of samples taken, requiring more bits to store the data).
Key Term:
- **Sampling Resolution (Bit Depth):** The number of bits used to store each sample.
Effect of Increasing Sampling Resolution
- Increases the number of bits used to store each sample (more bits per sample).
- A wider range of amplitudes can be stored/represented.
- File size increases.
- The digital waveform is closer to the original (improving accuracy).
- Results in smaller quantization errors.
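The effect of both settings on file size can be estimated with a simple calculation, sketched here in Python for a single-channel recording. The function name is hypothetical and the figure ignores any file header or compression.

```python
def sound_file_size_bits(sampling_rate_hz: int, resolution_bits: int, duration_s: int) -> int:
    """Samples per second x bits per sample x length of recording in seconds."""
    return sampling_rate_hz * resolution_bits * duration_s


bits = sound_file_size_bits(44_100, 16, 10)
print(bits)       # 7056000
print(bits // 8)  # 882000 bytes
print(2 ** 16)    # 65536 distinct amplitude values at 16-bit resolution
```

Doubling either the sampling rate or the sampling resolution doubles the result.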
Analogue Data
A variable or data value that is constantly changing (continuous).
🗜️ Data Compression Techniques
Reasons for Compression
- Reduces file size, taking up less storage space (allowing more files to be stored).
- Faster download/upload rate (reduced transmission time) to/from the web.
- Less bandwidth is used during transmission.
- The original file might be too large to send via email or as an attachment.
Lossy Compression
Original data is permanently lost or deleted, and the file cannot be perfectly reconstructed. This method is generally unsuitable for text files, as losing data would corrupt the meaning.
- For videos, lossy compression can result in lower resolution but reduces buffering if streamed in real-time, lowering the required bandwidth.
- Used when not all data is required, the quality reduction is unnoticeable to the user, or if a significant reduction in file size is needed.
Lossless Compression
Original data is preserved, allowing the file to be fully restored to its original state.
- Used when all data is needed (e.g., text files, executable programs), or when a high-quality video or image is required.
Compressing Specific File Types
Sound Compression
- Reduces amplitude range (to only the range used), reducing the bits needed to store each sample.
- **Run-Length Encoding (RLE):** Consecutive sounds are grouped (the binary value of the sound is recorded along with the number of times it repeats).
- Record only changes in sound (not actual sounds).
Image Compression
- **Lossy Methods:**
- Reduce bit depth, reducing the number of bits used to store a color (fewer bits per pixel).
- Reduce the number of colors, meaning fewer bits are needed to store each color.
- Reduce resolution, resulting in fewer pixels overall (less binary data to store).
- **Lossless Methods:**
- **RLE:** Replaces sequences of the same color pixel with the color code and the number of identical pixels.
Run-length Encoding (RLE) Details
RLE identifies groups or sequences of repeated characters and replaces them with a copy of the character and the number of times it occurred consecutively.
Limitations of RLE:
- It relies on storing a color and the number of times it occurs consecutively.
- It is inefficient if there are few sequences of the same color.
- If colors rarely repeat, RLE might increase file size (e.g., RGB becomes R1 G1 B1, adding count data).
[Image: Rocket Revise logo, showing a rocket ship]
If the image had horizontal lines of the same color, RLE would be ideal for compression.
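A minimal Python sketch of RLE follows, treating each character as one pixel color. The names `rle_encode` and `rle_decode` are assumptions for this example; real image formats store the pairs in binary rather than as Python tuples.

```python
from itertools import groupby


def rle_encode(data: str) -> list[tuple[str, int]]:
    """Replace each run of identical symbols with (symbol, run length)."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Rebuild the original data exactly, showing that RLE is lossless."""
    return "".join(symbol * count for symbol, count in pairs)


print(rle_encode("WWWWBBBWW"))  # [('W', 4), ('B', 3), ('W', 2)]
print(rle_encode("RGB"))        # [('R', 1), ('G', 1), ('B', 1)]
print(rle_decode(rle_encode("WWWWBBBWW")))  # WWWWBBBWW
```

The second call shows the limitation: with no repeats, three symbols become three pairs, so the encoded form is larger than the input.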
🌐 Networks and Communication
Network Types: LAN and WAN
Local Area Network (LAN)
A LAN allows communication and sharing of data and resources (e.g., hardware/software applications) between devices on the network, enabling central management (security, backup, etc.).
Characteristics:
- Covers a small geographical area.
- The connection between devices is usually physical (wired).
- The infrastructure is privately owned (not controlled by external organizations).
- High data transfer rate.
- Protection is easier to implement and more secure than a WAN.
Wide Area Network (WAN)
A WAN covers a large geographical area, and the connection is often virtual.
Characteristics:
- Lower data transfer rate than a LAN.
- Can have private or public ownership (likely to be controlled by external organizations).
Network Models
Client-Server Network Model
Web pages and data are saved on servers. The client sends a request, and web servers process the requests, perform requested tasks, and return results to the client. The client displays the result to the user.
- The user's computer is the client.
- The server can host shared files.
- Users can request a file from any client computer.
- Files can be accessed simultaneously by several users.
Examples:
- Sending and receiving an email.
- Using a print or file server.
- A company or school centrally storing files.
Peer-to-Peer (P2P) Network
Computers are of equal status. Each computer provides access to data and resources, meaning data is distributed. Computers can communicate and share resources. Each computer is responsible for its own security.
Drawbacks of P2P:
- Reduced security (no central management of security)—each computer is at risk from viruses from other computers.
- No central management of backup—if data from one computer is not backed up, it is lost to all of them.
- No central management of files, making it hard to maintain consistency.
- Computers have slower response times because they are being accessed by other computers.
- Files may not always be available, as not all computers are always switched on.
Thick-Client vs. Thin-Client
A **Thick-Client** relies minimally on the server for processing. Most resources are installed locally, so clients perform most of their processing independently.
A **Thin-Client** relies heavily on the server, which performs all processes required for the task and data storage. Clients only send requests to the server and display returned results.
Network Topologies
Star Topology
Devices are only connected to a central device (e.g., a switch or hub).
Characteristics:
- Each computer is only connected to the central switch/server.
- Fewer collisions.
- High performance because each device is only connected to the switch.
- Easily scalable: a new device only needs one direct connection to the switch.
- More resilient—not reliant on one single cable connection between all nodes.
How data is transmitted in a Star Topology:
- Data from the sending device is transmitted to the router/switch.
- The data packet contains the address of the recipient.
- The router determines the recipient's destination address using a routing table.
- The router transmits data directly and only to the recipient.
Mesh Topology
All computers are connected to at least one other device, often multiple others.
Characteristics:
- Multiple routes exist between devices.
- Computers can act as relays and forward packets to the final destination.
Advantages:
- If one line goes down, alternate routes are available (fault tolerance).
- Improved security because it is not using one main line.
- Fewer collisions because more routes are available.
- New nodes can be added without interfering with others.
Cloud Computing
Accessing a file or service on a remote server via the internet.
- **Public Cloud:** Services offered by a third party over the public internet, available to anyone with appropriate equipment. Resources are available on the Internet.
- **Private Cloud:** Services offered via a private internal network, only available to select users (not the general public). It is a dedicated system only accessible from the organization.
Advantages of Cloud Computing
- Can be free (for basic services).
- Saves storage space on existing local devices.
- Data can be accessed from any device (with internet access).
- Data is likely to be backed up, offering a higher chance of recovery.
- Better security (managed by the provider).
- Scalable and easily shared.
Disadvantages of Cloud Computing
- Only accessible with internet access.
- Can take a long time to upload or download data.
- Can be expensive (long term subscription costs).
- May have limited storage space for free tiers.
- You are reliant on a third party (for security or backup).
- Cannot access files if the server goes down.
Disadvantages Specific to Public Cloud
- **Loss of Control:** Data is stored on remote infrastructure, relying on an external provider.
- Requires a reliable internet connection to access data.
- Increased recurring costs—providers charge a fee, whereas a LAN is often a one-time setup cost.
Wired vs. Wireless Networks
Advantages of a Wired Network
- Higher bandwidth and lower latency (good for streaming larger files).
- More reliable and stable connection—less vulnerable to interference (distance/walls).
- More secure—confidential data can be transferred securely.
Advantages of a Wireless Network
- Freedom of movement (not fixed to a single location)—devices can be portable.
- Easily expandable/scalable if more devices want to join.
- Less cabling needed, resulting in a cheaper setup.
- Allows access in remote locations (e.g., rural areas).
Disadvantages of a Wireless Network
- Higher latency.
- Affected by weather and physical obstacles.
- Slower transmission speed compared to wired connections.
- Direct line of sight may be needed for some technologies.
Network Transmission Media
Copper Cables
Data is transmitted through electrical signals.
Characteristics:
- Lower transmission rate.
- Higher chance of interference and interception.
- Require repeaters over long distances.
- More sturdy and more flexible than fibre-optic cable.
Fibre-optic Cables
Data is transmitted using light signals.
Characteristics:
- Greater bandwidth and faster transmission speed.
- Smaller risk of interference.
- Can be used over long distances, requiring less signal boosting.
- More difficult to hack into.
- More prone to damage, less flexible (can break when bent), more expensive to install, and difficult to terminate.
Radio Waves
Carries data wirelessly in the form of electromagnetic waves.
Satellite Communication
A communication device in Earth's orbit receives and transmits data.
Network Hardware Components
Switch
Allows communication between devices and connects individual devices to each other. It receives transmissions and forwards them only to their intended destination.
Server
Manages access to a centralized resource (usually between devices on a LAN).
Wireless Network Interface Controller (WNIC) Functions
- Provides an interface/allows connection to the wireless network using an antenna.
- Receives analog waves and converts them to digital signals.
- Takes digital input, converts it into analog waves, and sends radio waves through the antenna.
- Encrypts and decrypts data.
- Provides a MAC address to identify the device on the network.
Wireless Access Point (WAP)
Hardware that provides radio communication from a central device to nodes on a network.
Characteristics:
- Allows connection of devices using radio waves/signals/Wi-Fi.
- Allows wireless-enabled devices to connect to a wired network infrastructure.
Bridge
Connects two LANs with the same protocol and allows communication/data transmission between them.
Repeater
Restores or regenerates a digital signal so it can be transmitted over greater distances.
Role of the Router
- Receives packets from devices or the external network/internet.
- Stores IP and MAC addresses of all attached devices.
- Maintains a routing table.
- Routes or forwards packets to the destination.
- Finds the destination of a packet using IP addresses.
- Assigns private IP addresses to devices on a LAN (using DHCP).
- Finds the most efficient path to the destination.
- Can act as a firewall and gateway (performing protocol conversion/changes packet format).
Network Protocols and Data Transfer
Ethernet
A protocol used for data transmission over a wired network. It uses CSMA/CD. Data is transmitted in frames—each frame has source and destination addresses and error checking data.
Carrier Sense Multiple Access/Collision Detection (CSMA/CD)
A protocol used to detect and prevent collisions. A device/node listens to a communication channel (scans voltage). Data is only sent when the channel is free or idle.
Collision Handling:
- Since there are multiple nodes on the network, data from two nodes can start to transmit simultaneously, causing a collision.
- If a collision is detected, the nodes send a jamming signal and stop transmitting.
- Nodes wait a random time before attempting to send data again.
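The collision-handling steps can be sketched as a heavily simplified, slot-based simulation in Python. It assumes every frame takes exactly one time slot and that all nodes start wanting to send at once; the function name, the `max_wait` parameter, and the slot model are all assumptions, not part of the real Ethernet specification.

```python
import random


def simulate_csma_cd(nodes, rng, max_wait=8):
    """Each node sends one frame; returns a list of (time_slot, node) sends."""
    ready_at = {node: 0 for node in nodes}  # slot at which each node next tries
    sent = []
    slot = 0
    while ready_at:
        trying = [n for n, t in ready_at.items() if t <= slot]
        if len(trying) == 1:
            # Channel is idle and only one node transmits: the frame gets through.
            sent.append((slot, trying[0]))
            del ready_at[trying[0]]
        elif len(trying) > 1:
            # Collision detected: every sender stops and waits a random time.
            for n in trying:
                ready_at[n] = slot + 1 + rng.randint(0, max_wait)
        slot += 1
    return sent


result = simulate_csma_cd(["A", "B", "C"], random.Random(1))
print(result)  # order and slots depend on the random waits
```

Because the waits are random, two nodes that collided are unlikely to retry in the same slot again.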
Bit Streaming
Data is compressed before transmission. Video is transmitted continuously as a series of bits. On download, the server sends data to a buffer on the client computer. The recipient receives a bit stream from the buffer.
- **Real-time Streaming:**
- Used when watching a live stream of events currently taking place.
- The event is captured live with a video camera connected to a computer.
- Media is sent to the user's device/buffer via a bit stream directly as it is being recorded.
- Cannot typically be paused or rewound.
- **On-demand Streaming:**
- The video is already recorded/the event has taken place.
- Existing media is encoded to bit streaming format and uploaded to a server.
- Can be watched at the user's convenience (can be paused, forwarded, or rewound).
WWW vs. The Internet
- The **World Wide Web (WWW)** uses HTTP/HTTPS protocols to transmit data, and it is a collection of web pages and resources.
- The **Internet** uses TCP/IP protocols, and it is the underlying physical interconnected network of networks.
Hardware Supporting the Internet
- **Public Switched Telephone Network (PSTN):**
- Consists of many different types of communication lines.
- Allows for full-duplex data transmission.
- Communication passes through different switching centers.
- The line remains active even during a power outage.
- A dedicated channel is used between two points for the duration of the phone call (circuit switching).
IP Addressing: IPv4 and IPv6
**IPv4:** Four groups, each represented by 8 bits (32 bits total). Uses decimal numbers between 0-255 in each group, separated by full stops.
**IPv6:** Eight groups, each represented by 16 bits (128 bits total), separated by colons. Each group is a hexadecimal number between 0 and FFFF. One run of consecutive all-zero groups can be replaced with "::".
- IPv6 is used when the number of IP addresses needed exceeds the number available using IPv4.
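Both formats can be explored with Python's standard `ipaddress` module. The addresses below are arbitrary examples chosen for illustration.

```python
import ipaddress

v4 = ipaddress.ip_address("192.168.0.1")
v6 = ipaddress.ip_address("2001:0db8:0000:0000:0000:0000:0000:0001")

print(v4.version, v4.max_prefixlen)  # 4 32
print(v6.version, v6.max_prefixlen)  # 6 128
print(v6.compressed)                 # 2001:db8::1
print(v6.exploded)                   # 2001:0db8:0000:0000:0000:0000:0000:0001
print(2 ** 32)                       # 4294967296 possible IPv4 addresses
```

The compressed form drops leading zeros in each group and replaces the run of all-zero groups with "::".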
Subnetting
Benefits of Subnetting:
- Improves security—data stays within its subnet, devices do not receive unintended data, and not all devices can access all areas of the network.
- Allows the extension of the network/easier to expand—allows a greater range of IP addresses.
- Reduces the amount of traffic in a network, improving network speed, as data stays within the subnet.
- Easier maintenance/management—only one subnetwork may need taking down, and faults can be isolated more efficiently.
IP Address Structure in a Subnetwork
An IP address is made up of a **Network ID** and a **Host ID**.
- Each device on the subnetwork has the same Network ID.
- Each subnetwork has a different Network ID.
- Every device in each subnetwork has a different Host ID, which uniquely identifies a device within the same subnetwork.
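As a sketch of how an address divides into the two parts, the Python example below uses the standard `ipaddress` module. It assumes the first 24 bits form the Network ID, and `split_address` is a hypothetical helper name.

```python
import ipaddress


def split_address(address: str, prefix_length: int) -> tuple[str, int]:
    """Return (Network ID, Host ID) for an IPv4 address and prefix length."""
    interface = ipaddress.ip_interface(f"{address}/{prefix_length}")
    network_id = interface.network.network_address
    # The host mask has 1s only in the host part of the address.
    host_id = int(interface.ip) & int(interface.network.hostmask)
    return str(network_id), host_id


print(split_address("192.168.1.37", 24))  # ('192.168.1.0', 37)
print(split_address("192.168.1.80", 24))  # ('192.168.1.0', 80)
print(split_address("192.168.2.37", 24))  # ('192.168.2.0', 37)
```

The first two devices share a Network ID, so they are on the same subnetwork. The third has the same Host ID as the first but sits on a different subnetwork.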
Public IP Address
- Visible to any device on the internet.
- Assigned to allow direct access to the internet.
- Allocated by an Internet Service Provider (ISP).
- Unique throughout the internet.
Private IP Address
- Only visible to devices within the LAN.
- Used for internal LAN communication only.
- Allocated by the router.
- Only unique within the LAN.
Dynamic vs. Static IP Address
A **Dynamic IP Address** is reallocated each time a device rejoins a network.
A **Static IP Address** does not change each time a device connects to the internet; it is fixed.
URL, WWW, and DNS Use
- A **URL** is entered into a web browser and parsed to obtain the domain name.
- The domain name is sent to the **DNS (Domain Name System)** server.
- DNS has a database of domain names and their corresponding IP addresses.
- DNS searches its database for the given domain name.
- If found, the IP address is returned to the web browser.
- The web browser uses the IP address to request the resource from the web server, which returns it for the browser to display.
- If not found, the request is forwarded to a higher-level DNS, and the IP address returned is added to the database of the lower-level DNS.
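The lookup steps can be sketched in Python using dictionaries to stand in for the DNS databases. The `resolve` function is hypothetical, and the IP address is a made-up value from a range reserved for documentation.

```python
from urllib.parse import urlsplit


def resolve(url, local_dns, higher_dns):
    """Return the IP address for the URL's domain, or None if no server knows it."""
    domain = urlsplit(url).hostname       # parse the URL to get the domain name
    if domain in local_dns:               # found in the local DNS database
        return local_dns[domain]
    ip_address = higher_dns.get(domain)   # otherwise ask the higher-level DNS
    if ip_address is not None:
        local_dns[domain] = ip_address    # add the answer to the local database
    return ip_address


local = {}
higher = {"www.example.com": "203.0.113.10"}  # made-up address for illustration
print(resolve("https://www.example.com/index.html", local, higher))  # 203.0.113.10
print(local)  # {'www.example.com': '203.0.113.10'}
```

After the first request the local database holds the answer, so a repeat lookup does not need the higher-level DNS.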
💻 Computer Hardware and Peripherals
Computers and Their Components
- **Need for Secondary Storage:** To store files, data, and software long-term (non-volatile storage).
- **Need for Primary Storage (RAM/ROM):**
- To store files needed to boot the system (BIOS/ROM).
- To store the Operating System (OS) or any system software currently in use.
- To store intermediate data or current data being processed (RAM).
Embedded Systems
A microprocessor within a larger system that performs a specific task.
Characteristics:
- Has memory, input/output abilities, and a processor integrated into the machine.
- The system is not easily changed by the user/owner.
- Example: A system in a washing machine that only controls cycle programs.
- A combination of hardware and software designed for a specific function.
- Does not typically have its own general-purpose operating system.
- Does not require much processing power.
Disadvantages of Embedded Systems:
- Difficult to change or update firmware by the user.
- Difficult to upgrade to take advantage of new technology.
- Cannot be easily adapted for another task.
- Difficult to update or repair—often the entire unit is replaced.
Peripheral Device Operation
Operation of a Laser Printer
- The revolving drum is given an electrical charge.
- The contents of the page (provided by the buffer/user) are drawn on the drum as an electrostatic charge by a laser beam that moves back and forth.
- The drum is coated with oppositely charged toner, which only sticks to areas charged by the laser beam.
- The drum then rolls over electrostatically charged paper, transferring the pattern onto the page.
- The paper is passed through a fuser/is heated to seal the image permanently.
- The electrical charge is removed from the drum, and excess toner is collected.
Operation of a 3D Printer
3D printing uses **additive manufacturing**, taking a digital 3D model or CAD file and building up the model one layer at a time, starting from the bottom, using XYZ coordinates. The material is fused together layer by layer.
- **Fused Deposition Modelling (FDM):** The material is heated and pushed through a nozzle.
Use of a Temperature Sensor in 3D Printing
- Prevents overheating or ensures the material is hot enough for fusion.
- Identifies the material of the object/material being used.
Microphone Operation
- The microphone has a diaphragm.
- Incoming sound waves cause vibrations of the diaphragm.
- This causes a coil to move past a magnet.
- An electrical signal is produced corresponding to the sound wave.
Speaker Operation
- An electric current is sent to the speaker.
- The electric current passes through a coil.
- The current in the coil creates an electromagnetic field.
- The electromagnet is repelled by or attracted to the permanent magnet based on the direction of the current in the coil.
- Movement of the coil causes the diaphragm (cone) to vibrate.
- This vibration creates sound waves.
Storage Technologies
Magnetic Hard Disk Drive (HDD)
An HDD has platters divided into sectors and concentric tracks. The surface of the disk can be magnetized and has a read/write head mounted on an arm. Data is encoded as a magnetic pattern.
- **Writing:** Variation in current in the head causes variation in the magnetic field on the disk.
- **Reading:** Variation in the magnetic field causes variation in the current through the head.
Advantages of HDDs:
- Costs less per unit storage (used when a large storage capacity is required).
- Has more longevity (used with devices that work all the time and have a large number of read-write operations).
Solid State Memory (SSD/Flash)
Uses a grid of columns and rows (arrays/blocks) that has two transistors at each intersection: a **floating gate** (stores voltage, representing either a 1 or 0) and a **control gate** (controls the movement of charge/electrons during read/write operations). It is not possible to overwrite existing data directly; data must be erased first, then written into the location.
Advantages of Solid State Memory:
- No moving parts, making it more reliable and durable.
- Faster data access times.
Optical Disk Reader/Writer (CD/DVD/Blu-ray)
A rotating disk with a single spiral track on a reflective metal layer. Data is read and written using laser light (either red or blue) shone onto the disk. On pressed disks, data is stored as **pits and lands** along the track; on rewritable disks, it is stored as amorphous and crystalline regions of the recording layer. In both cases the two states correspond to 0s and 1s.
- **Reading:** Reflected light from different states is encoded as a bit pattern.
- **Writing:** The laser changes the surface to crystalline or amorphous states based on the bit pattern being stored.
- Read and write operations can occur simultaneously (on certain types).
Features/Uses:
- Used for transferring data between devices or as backup systems.
- Can be read-only, used to distribute software, movies, or games.
- Generally has lower storage capacity compared to HDDs/SSDs.
Input/Output Devices
Resistive Touch Screen
Has two layers. When the user touches the screen, the layers touch, and a circuit is completed. The processor determines the horizontal and vertical point of contact. It will work if any object touches the screen.
Capacitive Touch Screen
Has several layers. When the top layer is touched (usually by a finger or conductive stylus), there is a change in the electric current, and a microprocessor identifies the coordinates of the touch.
Virtual Reality Headset Operation
Video/data is sent from a computer to the headset.
- The video feed is sent to an LCD or OLED display.
- Two lenses are placed between the eyes and the screen, which allows for focusing and reshaping of the video for each eye, creating a 3D stereoscopic effect.
- Uses a high frame rate, typically 60 to 120 frames per second (FPS), to reduce motion sickness.
- Sensors measure/track the movements of the user (head tracking), allowing the video on the screen to react to and mimic movements.
- Uses **binaural (surround) sound**, so the sound from speakers appears to come from all directions.
- Can also use infrared sensors to monitor eye movement, allowing the depth of field on the screen to be more realistic.
Memory Management Components
Purpose of a Buffer
- To act as temporary storage and temporarily store data until it is ready to be transmitted to the device.
- Stores data before it is used by the receiving device.
- Allows processes to operate independently of each other, handling speed differences between components.
Examples include a video buffer when streaming videos, or a printer buffer when data is transferred from the computer to the printer. Process instructions and data are sent by the computer to the buffer, and data is transferred from the buffer to the device—allowing the user to continue using the computer or allowing the processor to continue processing. When the buffer is empty, an interrupt is sent to the computer, requesting more data.
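A printer buffer of this kind might be modelled as a simple queue, as in the Python sketch below. The class and its method names are assumptions for illustration, and the interrupt is represented by a counter rather than a real hardware signal.

```python
from collections import deque


class PrinterBuffer:
    """Hypothetical printer buffer: the computer fills it, the printer drains it."""

    def __init__(self):
        self.pages = deque()
        self.interrupts_sent = 0

    def fill(self, pages):
        """Computer writes a batch of pages, then carries on with other work."""
        self.pages.extend(pages)

    def print_next(self):
        """Printer takes the oldest page; an empty buffer triggers an interrupt."""
        page = self.pages.popleft()
        if not self.pages:
            self.interrupts_sent += 1  # ask the computer for more data
        return page


buffer = PrinterBuffer()
buffer.fill(["page 1", "page 2"])
print(buffer.print_next(), buffer.interrupts_sent)  # page 1 0
print(buffer.print_next(), buffer.interrupts_sent)  # page 2 1
```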
Random Access Memory (RAM)
Primary memory that stores the currently running parts of the software, data, OS programs, and processes. It is volatile. It can store data about I/O devices, contents of buffers, or information about the current process.
- **Static RAM (SRAM):** Uses 4–6 transistors arranged as flip-flops; has more complex circuitry.
- **Dynamic RAM (DRAM):** Uses a single transistor and capacitor; stores bits as a charge.
| Feature | SRAM | DRAM |
|---|---|---|
| Advantages | Faster access time, used on CPU for performance (cache memory), lower power consumption. | Lower cost per unit, higher storage/data/bit density, simpler design. |
| Disadvantages | Lower data density, more expensive per bit. | Needs to be refreshed constantly, higher power consumption, slower access speed. |
| Typical Use | Used in cache memory. | Used in main memory. |
Read Only Memory (ROM)
Primary memory that stores start-up instructions (BIOS), firmware, and any permanently required data. It stores the kernel of the operating system or parts of the OS.
- **Programmable ROM (PROM):** Can be programmed once only; the contents cannot be changed afterwards.
- **Erasable Programmable ROM (EPROM):** Erased using UV light, needs to be removed from the device, can be overwritten multiple times, and must be entirely erased to rewrite.
- **Electrically Erasable Programmable ROM (EEPROM):** Erased using voltage (no additional equipment is needed), erased within the device, can be overwritten multiple times, does not have to be entirely erased before rewriting, and the contents of the firmware can be changed easily.
ROM is used in embedded systems to store boot-up instructions and other data that does not change. Because it is non-volatile, the contents are retained when the device is powered off.
Control Systems
A control system uses feedback to produce an action.
- **Role of an Actuator:** Generates a signal or converts electrical energy into mechanical energy to produce an action (e.g., opening a valve, turning a motor).
- **Importance of Feedback:** Ensures the system operates within given criteria, allows the system output to affect the system input, and allows conditions to be automatically adjusted (closed-loop system).
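A closed-loop system of this kind can be sketched in Python using a heating example. The thermostat scenario, the function name `control_step`, and the temperature changes per step are all assumptions chosen for illustration.

```python
def control_step(reading: float, target: float, tolerance: float = 0.5) -> str:
    """Compare the sensor reading with the target and choose an actuator action."""
    if reading < target - tolerance:
        return "heater on"
    if reading > target + tolerance:
        return "heater off"
    return "no change"


print(control_step(18.0, 20.0))  # heater on
print(control_step(21.0, 20.0))  # heater off
print(control_step(20.2, 20.0))  # no change

# Feedback: the action changes the temperature, which becomes the next reading.
temperature, heater_on = 18.0, False
for _ in range(6):
    action = control_step(temperature, 20.0)
    if action != "no change":
        heater_on = action == "heater on"
    temperature += 1.0 if heater_on else -0.5
    print(action, temperature)
```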
🧮 Logic Gates and Logic Circuits
(Content related to logic gates and circuits is missing from this document.)
⚙️ CPU and Processor Fundamentals
Central Processing Unit (CPU) Architecture
Stored Program Concept
Instructions and data are stored in the same memory space (main memory).
Components in Von Neumann Architecture
- Buses (Address, Data, Control)
- Registers (General and Special Purpose)
- CPU (Central Processing Unit)
- CU (Control Unit)
- ALU (Arithmetic Logic Unit)
- IAS (Immediate Access Store / Main Memory)
- System Clock
General Purpose Registers
Hold temporary data when performing operations and can be used for any purpose. They can be used by most instructions.
Special Purpose Registers
Hold the status of a program. They are specialized for a specific use and can only be used by certain instructions. Key examples include:
- **Program Counter (PC):** Holds the address of the next instruction to be fetched and is incremented once that instruction has been fetched.
- **Memory Address Register (MAR):** Stores the address of the memory location currently being read from or written to (where data is being fetched from).
- **Memory Data Register (MDR):** Holds the data fetched from the address in the MAR or data to be written to memory. A fetched instruction is copied from the MDR to the CIR.
- **Current Instruction Register (CIR):** Holds the instruction currently being decoded or executed (copied from the MDR).
- **Index Register (IX):** Stores a value that is added to an address to give another address (used in indexed addressing).
- **Status Register (SR):** Stores flags from results of logic and arithmetic operations (e.g., overflow, zero) and interrupt flags. Contains bits that can be individually set or cleared depending on the operation.
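How these registers work together during the fetch stage can be sketched in Python. Memory is modelled as a dictionary and the instructions are placeholder strings; the addresses, the instruction text, and the `fetch` function are assumptions for illustration.

```python
def fetch(registers: dict, memory: dict) -> str:
    """Carry out one fetch stage and return the instruction now in the CIR."""
    registers["MAR"] = registers["PC"]            # address of next instruction
    registers["PC"] += 1                          # point at the following one
    registers["MDR"] = memory[registers["MAR"]]   # read memory at that address
    registers["CIR"] = registers["MDR"]           # ready to decode and execute
    return registers["CIR"]


memory = {100: "LDD 200", 101: "ADD 201"}  # placeholder instructions
registers = {"PC": 100, "MAR": None, "MDR": None, "CIR": None}

print(fetch(registers, memory))            # LDD 200
print(registers["PC"], registers["MAR"])   # 101 100
print(fetch(registers, memory))            # ADD 201
```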
Control Unit (CU) Functions
- Synchronizes the actions of other components in the CPU based on the pulses of the system clock.
- Sends and receives control signals along the control bus.
- Manages the execution of instructions and decodes an instruction's opcode during the Fetch-Execute cycle.
- Controls communication between components of the CPU.
- Types of signals it transfers: interrupt, timing, read, and write signals.
System Clock
- Synchronizes computer operations by creating time signals (pulses).
- Allows operations to be processed in the correct order or sequence.
- Keeps track of the date and time (in conjunction with other components).
Immediate Access Store (IAS)
Holds all the data and programs currently in use (synonymous with main memory/RAM).
- Volatile memory.
- Has fast access times.
Data Transfer Between Components
- The system clock gives out timing signals sent on the control bus, which synchronizes the other system components.
- The CU initializes data transfer and generates signals sent on the control bus to other components.
Role of Buses in Data Transfer
- **Address Bus:** Carries the address where data is being or going to be written to or read from (unidirectional).
- **Data Bus:** Carries data between the devices, buffer, or components (bidirectional).
- **Control Bus:** Carries control signals from the CU (bidirectional).
CPU Performance Factors
| Factor | Description |
|---|---|
| Number of Cores | Each core processes one instruction per clock pulse, so more cores mean more sequences of instructions can be carried out simultaneously. This decreases the time taken to complete a task if the software is optimized for parallel processing. |
| Bus Width | A wider bus allows the transfer of more data simultaneously during each transfer cycle. |
| Clock Speed | Each instruction is carried out on a clock pulse. The clock speed dictates the rate (in Hz) at which instructions are being run (more Fetch-Execute cycles per unit of time). |
| Cache Size | A higher capacity means it can store a higher amount of frequently used instructions and data for fast access. |
| Quantity of RAM | More applications can reside in main memory simultaneously, saving/decreasing disk access times (reducing the need for virtual memory/swapping). |
Impact of Number of Cores
- **Why performance may not increase:**
- Software may not be designed for multiple cores—one core will be left idle.
- Memory access speed might not match the speed of cores, causing a delay (bottleneck).
- Performance is also limited by other factors (e.g., the amount of RAM or bus speed).
Impact of Clock Speed
- Faster clock speed means more instructions can be run per second/time period.
Impact of Cache
- Cache is fast access memory located close to the CPU.
- Stores frequently used instructions/data.
- More cache means more instructions can be transferred faster—less swapping between RAM and cache.
- Prevents the CPU from idling while waiting for data (CPU bottleneck reduction).
Impact of Quantity of RAM
- More applications can reside in main memory simultaneously.
- Decreases disk access times by reducing reliance on virtual memory.
Ports
Purpose of Ports:
- To provide a connection to peripherals.
- To provide an interface between the computer and other devices.
Peripheral Devices and Data Transfer
Serial Communication Ports
- **USB (Universal Serial Bus):** Data is transferred 1 bit at a time (serially). USB can be asynchronous or synchronous. USB 3.0 and later are full duplex, otherwise half-duplex.
- **COM Port:** Another type of serial communication port.
Video and Display Ports
- **HDMI (High-Definition Multimedia Interface):** Allows video and audio to be transferred on one cable.
- **VGA (Video Graphics Array):** An older analog video output standard.
- **DisplayPort:** A digital display interface used for video output.
Fetch-Execute Cycle (F-E Cycle)
Here's how the fetch-execute cycle works, along with the registers involved:
- **PC (Program Counter):** Stores the address of the next instruction to be fetched and is incremented each cycle.
- **MAR (Memory Address Register):** Holds the address where data is fetched from or written into.
- **MDR (Memory Data Register):** Holds data at or from the address in MAR, or data to be entered into it.
- **CIR (Current Instruction Register):** Instruction from MDR is copied here for decoding and execution.
Register Transfer Notation (RTN)
- MAR ← [PC]: Contents of PC are copied to MAR.
- PC ← [PC] + 1: Address in PC is incremented.
- MDR ← [[MAR]]: Data located at the address held in MAR is copied to MDR.
- CIR ← [MDR]: Contents of MDR are copied to CIR.
Stages of the Fetch-Execute Cycle
- The address of the next instruction to be fetched is held in the PC.
- This address is copied into the MAR.
- The address in the PC is incremented so that it points to the following instruction.
- The address in the MAR is sent along the address bus, and the instruction located at that address is copied from main memory to the MDR using the data bus.
- The instruction is copied from the MDR to the CIR.
- The instruction is decoded by the CU (Control Unit) into opcode and operand.
- The processor executes the instruction (using the ALU if necessary).
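The cycle can be traced with a small simulation. This is a minimal sketch, assuming a hypothetical accumulator machine with three made-up opcodes (`LDM`, `ADD`, `END`) and instructions stored as `(opcode, operand)` pairs; none of these names come from a real instruction set. It uses one common ordering in which the PC is incremented straight after its value is copied to the MAR.

```python
def run(memory):
    """Simulate fetch-decode-execute on a toy accumulator machine (hypothetical)."""
    pc, acc = 0, 0                   # Program Counter and accumulator
    trace = []
    while True:
        mar = pc                     # MAR <- [PC]
        pc = pc + 1                  # PC  <- [PC] + 1
        mdr = memory[mar]            # MDR <- [[MAR]]
        cir = mdr                    # CIR <- [MDR]
        opcode, operand = cir        # decode into opcode and operand
        trace.append((mar, opcode, operand))
        if opcode == "LDM":          # load an immediate value into the accumulator
            acc = operand
        elif opcode == "ADD":        # add the contents of a memory address
            acc += memory[operand]
        elif opcode == "END":        # stop and report the result
            return acc, trace


# Addresses 0-2 hold instructions; address 3 holds a data value.
program = [("LDM", 5), ("ADD", 3), ("END", None), 10]
result, trace = run(program)
print(result)  # 15
```

The `trace` list records which address was fetched on each pass, which makes it easy to see the PC stepping through memory.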
⚠️ Interrupts and Handling
Purpose of an Interrupt
- To send a signal from a device or process seeking the attention of the processor, often indicating an urgent need for service.
Causes of Interrupts
| Category | Causes |
|---|---|
| Software Interrupts | Division by zero, I/O runtime error, attempt to access invalid memory location, array index out of bounds, stack overflow, buffer overflow. |
| Hardware Interrupts | Printer is out of paper, keyboard key press, power failure, timer signal. |
Interrupt Handling Process
- An interrupt flag is raised in the interrupt register.
- The register is checked by the Control Unit at the start or end of the F-E cycle.
- The type and source of the interrupt are identified.
- The interrupt's priority is checked against the current process.
- If the interrupt priority is lower, the F-E cycle continues with the current process.
- If the interrupt priority is higher, the contents of the registers (the state of the current process) are stored in/moved to a **stack**.
- The address of the appropriate **Interrupt Service Routine (ISR)** is loaded into the PC, and the ISR is run.
- Once the ISR finishes, the system checks for further pending interrupts.
- If found, the cycle repeats for the next highest priority interrupt.
- Otherwise, data/contents are loaded from the stack back into the registers, and the previous process resumes.
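The handling process above can be sketched as follows. This is an illustration only: the ISR addresses, the register names, and the convention that a larger number means a higher priority are all assumptions made for the example.

```python
ISR_ADDRESS = {"timer": 900, "keyboard": 920, "printer": 940}  # hypothetical addresses


def service_interrupts(registers, current_priority, pending):
    """Return the ISRs that run (in order) and the register state afterwards."""
    regs = dict(registers)
    stack, serviced = [], []
    # Only interrupts with a higher priority than the current process are serviced.
    queue = sorted((p for p in pending if p[0] > current_priority), reverse=True)
    if queue:
        stack.append(dict(regs))              # save the state of the current process
        for priority, source in queue:        # highest priority first
            regs["PC"] = ISR_ADDRESS[source]  # load the ISR address into the PC
            serviced.append((source, regs["PC"]))
        regs = stack.pop()                    # restore state; previous process resumes
    return serviced, regs


state = {"PC": 104, "ACC": 7}
pending = [(2, "printer"), (5, "timer"), (1, "keyboard")]
serviced, restored = service_interrupts(state, 1, pending)
print(serviced)  # [('timer', 900), ('printer', 940)]
print(restored)  # {'PC': 104, 'ACC': 7}
```

The keyboard interrupt is not serviced here because its priority does not exceed that of the running process.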
🗣️ Assembly Language and Addressing
Two-Pass Assembler
First Pass
- Creates a **symbol table**.
- Reads assembly language instructions line by line.
- Adds any new symbolic addresses (labels) to the symbol table along with their memory location.
- Removes comments and white space.
- Checks that the opcode is valid (in the instruction set).
Second Pass
- Generates object/machine code.
- Reads the assembly language program one line at a time.
- Uses the symbol table to replace symbolic addresses with actual memory addresses.
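A minimal sketch of the two passes, assuming a hypothetical instruction set (`OPCODES`), `;` for comments, and `label:` for symbolic addresses. It also assumes every label is followed by an instruction on the same line. The forward reference to `done` shows why the symbol table has to be complete before code is generated.

```python
OPCODES = {"LDD": 1, "ADD": 2, "JMP": 3, "END": 4}  # hypothetical instruction set


def assemble(source):
    # First pass: strip comments and white space, check opcodes, build the symbol table.
    symbol_table, cleaned = {}, []
    for line in source.splitlines():
        line = line.split(";")[0].strip()
        if not line:
            continue
        if ":" in line:                       # a label names the current address
            label, line = line.split(":", 1)
            symbol_table[label.strip()] = len(cleaned)
            line = line.strip()
        parts = line.split()
        if parts[0] not in OPCODES:
            raise ValueError(f"invalid opcode: {parts[0]}")
        cleaned.append(parts)

    # Second pass: replace symbolic addresses and emit (opcode, operand) pairs.
    code = []
    for parts in cleaned:
        operand = parts[1] if len(parts) > 1 else None
        if operand in symbol_table:
            operand = symbol_table[operand]
        elif operand is not None:
            operand = int(operand)
        code.append((OPCODES[parts[0]], operand))
    return symbol_table, code


SOURCE = """
start: LDD 10    ; load a value
       ADD 11
       JMP done  ; forward reference, resolved in the second pass
       JMP start
done:  END
"""
symbols, code = assemble(SOURCE)
print(symbols)  # {'start': 0, 'done': 4}
print(code)     # [(1, 10), (2, 11), (3, 4), (3, 0), (4, None)]
```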
Instruction Groups
| Group | Description |
|---|---|
| Data Movement | Moves data between registers, memory addresses, or other locations (e.g., LOAD, STORE). |
| Input/Output of Data | Takes input from the user or outputs characters or binary numbers. |
| Arithmetic Operations | Performs calculations such as addition or subtraction (e.g., ADD, SUB). |
| Unconditional Jumps | Moves program execution to another instruction regardless of conditions (e.g., JMP). |
| Conditional Jumps | Compares values or checks the status register flags before moving to another instruction (e.g., JNE, JGT). |
Addressing Modes
- **Immediate Addressing:** The operand is the actual data value to be used.
- **Direct Addressing:** The operand holds the memory address where the data is stored.
- **Indirect Addressing:** The operand holds the memory address that stores the memory address of the data (a pointer).
- **Indexed Addressing:** Forms the effective address by adding the address given in the operand to the contents of the Index Register.
- **Relative Addressing:** The address to be used is an offset number of locations away, relative to the address of the current instruction (PC). This allows for relocatable code.
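The difference between the modes is easiest to see when the same operand is resolved under each of them. The sketch below models memory as a dictionary; the function name, the memory contents, and the `ix` and `pc` parameters are illustrative assumptions.

```python
def fetch_operand(mode, operand, memory, ix=0, pc=0):
    """Return the data value an instruction would use under each addressing mode."""
    if mode == "immediate":
        return operand                    # the operand is the data itself
    if mode == "direct":
        return memory[operand]            # the operand is the address of the data
    if mode == "indirect":
        return memory[memory[operand]]    # the operand points to the address of the data
    if mode == "indexed":
        return memory[operand + ix]       # operand plus contents of the Index Register
    if mode == "relative":
        return memory[pc + operand]       # offset from the current instruction address
    raise ValueError(f"unknown mode: {mode}")


memory = {10: 20, 20: 99, 13: 42, 7: 5}
print(fetch_operand("immediate", 10, memory))       # 10
print(fetch_operand("direct", 10, memory))          # 20
print(fetch_operand("indirect", 10, memory))        # 99
print(fetch_operand("indexed", 10, memory, ix=3))   # 42
print(fetch_operand("relative", 2, memory, pc=5))   # 5
```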
🔢 Bit Manipulation and Shifts
Binary Shifts
Moving bits in a register a certain number of places within the register.
- **Logical Shift:** Bits shifted out of the register are lost, and the vacated positions are filled with 0s.
- **Arithmetic Shift:** Used for signed numbers. In a right shift, the vacated positions are filled with copies of the sign bit so that the sign is preserved; in a left shift, they are filled with 0s.
- **Cyclic Shift (Rotation):** No bits are lost; bits shifted out of one end appear at the other end.
- Each type of shift can be performed as a **left shift** or a **right shift**.
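The shifts can be tried out on an 8-bit value. This is a sketch, assuming an 8-bit register and the common convention that an arithmetic right shift copies the sign bit into the vacated positions; the function names are made up for the example.

```python
BITS = 8
MASK = (1 << BITS) - 1          # 0b11111111, keeps results within 8 bits


def logical_left(x, n):
    return (x << n) & MASK      # bits pushed past the left end are lost


def logical_right(x, n):
    return (x & MASK) >> n      # vacated positions on the left become 0


def arithmetic_right(x, n):
    sign = x & 0x80             # most significant bit is the sign bit
    for _ in range(n):
        x = (x >> 1) | sign     # copy the sign bit into the vacated position
    return x


def rotate_left(x, n):
    n %= BITS
    return ((x << n) | (x >> (BITS - n))) & MASK   # bits leaving the left re-enter on the right


x = 0b10010110
print(format(logical_left(x, 1), "08b"))      # 00101100
print(format(logical_right(x, 2), "08b"))     # 00100101
print(format(arithmetic_right(x, 2), "08b"))  # 11100101
print(format(rotate_left(x, 1), "08b"))       # 00101101
```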
Bit Masking
- **AND:** Used to check whether a specific bit has been set (to 1), or to clear specific bits (force them to 0) by masking them with 0.
- **OR:** Used to set specific bits (force them to 1).
- **XOR (Exclusive OR):** Used to toggle specific bits (a set bit is cleared and a clear bit is set).
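A short sketch of each masking operation on an 8-bit value. The helper names and the convention of numbering bits from 0 at the right-hand end are assumptions for illustration.

```python
def is_set(value, bit):
    return (value & (1 << bit)) != 0      # AND with a mask tests one bit


def set_bit(value, bit):
    return value | (1 << bit)             # OR forces the bit to 1


def toggle_bit(value, bit):
    return value ^ (1 << bit)             # XOR flips the bit


def clear_bit(value, bit):
    return value & ~(1 << bit) & 0xFF     # AND with an inverted mask forces the bit to 0


flags = 0b01001010
print(is_set(flags, 3))                       # True
print(format(set_bit(flags, 0), "08b"))       # 01001011
print(format(toggle_bit(flags, 3), "08b"))    # 01000010
print(format(clear_bit(flags, 6), "08b"))     # 00001010
```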
💻 System Software and OS Management
Operating Systems (OS)
The operating system provides a user interface, a platform for software to run, and hides the complexities of hardware from the user.
Operating System Management Tasks
- Memory Management
- File Management
- Security Management
- Hardware Management (I/O)
- Process Management
- Error Checking and Recovery
Memory Management Functions
Controls the movement of data between RAM and the processor.
- Allocates memory to processes dynamically.
- Reclaims unused blocks of RAM.
- Prevents two programs from occupying the same area of RAM.
- Moves data from secondary storage when needed (swapping).
- Manages paging and virtual memory.
File Management Tasks
- Allocates space to particular files.
- Maintains a directory structure.
- Provides file naming abilities.
- Implements access rights.
- Allows file sharing.
- Specifies tasks that can be performed on a file (copy, paste, delete, close).
Security Management Functions
- Creates accounts and manages passwords.
- Provides firewall or anti-malware services.
- Validates user and process authenticity.
Hardware Management (I/O)
- Receives data from input devices and sends data to output devices.
- Operates and installs device drivers.
- Allows communication between peripheral devices and computers.
- Handles buffers for data transfer, ensuring smooth transfer between devices transmitting and receiving at different speeds.
- Manages interrupts from devices.
Process Management (Scheduling)
- Manages the scheduling of processes—deciding which process to run next and the order of processes.
- Manages resources the processes require, e.g., allocating memory.
- Enables processes to share or transfer data.
- Prevents interference between processes.
- Handles process queues.
- Supports multitasking, ensuring fair access and handling priorities/interrupts.
Utility Software
Software that helps set up or maintain the computer system.
- Makes memory allocation more efficient.
- Checks the system for faults.
Disk Formatter
A disk needs to be prepared or initialized for use.
- Prepares/initializes a disk for storing files by partitioning it (generates a new file system).
- Can delete all data from the disk.
- Sets up a file allocation table (FAT).
- Checks the disk for errors.
Defragmentation
Over time, saving and deleting files fragments the disk, scattering file parts across the storage medium.
- Moves/rearranges blocks of files so that each individual file is contiguous on the disk.
- Moves free space together.
- Less time is taken to access files (less head movement as data is contiguous).
- Improves disk access times (no need to search for the next fragment).
Disk Repair
Needed to optimize performance and maintain data integrity.
- Scans for errors or inconsistencies in a disk and corrects them.
- Prevents bad sectors from being used.
- Reduces access times by optimizing storage structure.
Back-up Software
Allows retrieval of data and provides security against loss.
- Creates a copy of data in case the original is lost (at regular intervals).
- Allows retrieval of data if any is lost or corrupted.
Disk/System Clean Up
Optimizes storage by removing unwanted temporary or redundant files.
Compression Software
- Reduces file size.
- Saves storage and memory space.
- Reduces transmission time.
Virus Checker (Anti-Virus)
- Frees up RAM by removing malicious software.
- Scans files on the hard drive for malicious program code.
- Regularly scans the computer for viruses, checking against a stored database of known virus signatures. The database needs to be updated regularly.
- If a virus is detected, it is quarantined or deleted.
- Compares downloaded files to a database of known viruses and prevents the download from continuing if a match is found.
Program Libraries
Program Library Definition
- Contains pre-written functions and subroutines.
- Can be referenced or imported into a program.
- The functions/routines it contains can be called from the programmer's own programs.
- Saves time as code does not have to be written from scratch.
- More likely to work (as the code is already tested).
- Program updates automatically if the routine is updated externally.
- Can perform more complex calculations than the programmer is able to do easily.
Dynamic Link Library (DLL)
- Requires less main memory as the DLL is only loaded once when needed.
- The executable file is smaller (it does not contain all library routines).
- No maintenance is needed from the programmer (DLL is separate from the program).
- No need to recompile the main program if changes are made to the DLL.
- Changes/improvements to DLL file code are done independently of the main program.
🌐 Language Translators and IDEs
Translators convert a high-level or assembly programming language into a different form (usually machine/object code).
| Translator | Description |
|---|---|
| Assembler | Translates assembly code into machine code. |
| Compiler | Translates high-level language entirely before execution (the whole code is translated, then run). |
| Interpreter | Translates high-level language line by line (each line is translated, then run immediately). |
Compiler Details
- Used after the program is completed.
- Produces an error report after translating the entire source code.
- Creates an executable file that can be run without the source code present.
- The user cannot easily access, edit, or sell the source code.
- Users do not need a translator installed to run the compiled program.
- Can be compiled for different hardware specifications, potentially generating more income.
Compiler Drawbacks when Testing
- Code cannot be changed without recompilation.
- The program will not run if there are any syntax errors.
- Errors cannot be corrected in real-time during execution.
- One error may result in false errors being reported later in the code.
- Cannot easily test individual sections of code if the program is unfinished.
Compiler Advantages when Testing
- Can debug multiple errors simultaneously (via the error report).
- The resulting executable file runs very quickly.
- The developer can test the program multiple times without recompiling (once the executable is generated).
Interpreter Details
- Used while writing a program for testing and debugging.
- Errors can be corrected in real time.
- Stops execution when an error occurs and displays the position of the error.
Interpreter Advantages when Testing
- Allows the developer to make real-time changes that can be seen immediately.
- The program can be debugged at each stage.
- The developer can test when the program is incomplete; small parts can be tested individually.
- If one section does not work, others can still be tested to avoid dependent errors.
Partial Interpreters/Compilers (Hybrid Systems)
- Can be used on different platforms as they are interpreted when run (e.g., Java bytecode).
- Code is optimized for the CPU as machine code is generated at run time (Just-In-Time compilation).
- Source code does not need to be recompiled for every platform.
Note: Programs may not need to be compiled if the software is already an executable file, has been pre-compiled/built using a compiler, or if the source code has not been provided.
Integrated Development Environment (IDE) Features
Coding Tools
- **Context-sensitive prompts:** Displays predictions/options to complete statements and suggests additions as the code is being written.
- **Auto-complete:** Helps the programmer figure out what to type next.
- **Auto-correct** (for common typos).
Error Detection Tools
- **Dynamic syntax check:** Underlines or highlights syntax errors as code is being entered in real-time.
Presentation Tools
- **Pretty printing:** Uses color coding and formatting to help identify key terms, variables, and syntax.
- **Auto-indentation:** Automatically formats code structure.
- **Expand/collapse code blocks:** Allows the programmer to hide sections of code for easier navigation.
Debugging Tools
- **Single stepping:** Allows the programmer to run the code one line at a time (breaking in between) so the effects of each statement on variable values can be seen/checked.
- **Breakpoints:** Stop the code executing at a set line to check current variable values and program progress.
- **Report windows/Watch windows:** Output contents of variables and data structures in real-time (see how variables change).
🔒 Security, Privacy, and Data Integrity
Definitions
- **Data Security:** Protects data against loss or corruption and ensures recovery mechanisms are in place.
- **Data Privacy:** Ensuring data is protected against unauthorized access and misuse.
- **Data Integrity:** Ensures consistency, accuracy, and timeliness of data (e.g., through validation/verification rules).
Importance of Security
- **Why Data Needs to Be Kept Secure:** To protect against someone deleting, modifying, or stealing it.
- **Why the Computer System Needs to Be Kept Secure:** To protect against someone installing malware, damaging the system, or accessing data on it.
Measures to Protect Computer Systems
- Two-factor authentication (2FA).
- Strong usernames and passwords.
- Biometric authentication.
- Digital signatures.
- Firewall implementation.
- Up-to-date Anti-Malware software.
- Anti-spyware software.
- Regular Backups.
- Encryption of sensitive data.
- Access rights and permissions.
Digital Signatures Process
- The sender puts a message through a hashing algorithm to produce a unique **message digest**.
- The digest is encrypted with the sender's **private key**, creating a digital signature.
- The message and signature are sent to the receiver.
- The receiver decrypts the signature using the sender's **public key** to reproduce the original digest.
- The receiver runs the same hashing algorithm on the received document to create a second digest.
- The two digests are compared. If they match, the document is authentic and has not been tampered with.
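The signing and checking steps can be sketched with a toy RSA key pair. The key values below (`N`, `E`, `D`, built from the primes 61 and 53) are far too small for real use, and reducing the SHA-256 digest modulo `N` is a shortcut taken only so that the digest fits this tiny key; both are assumptions for illustration, not how production signature schemes work.

```python
import hashlib

N, E, D = 3233, 17, 2753   # toy key pair: (N, E) is public, D is private


def digest(message: bytes) -> int:
    """Hash the message, then shrink the digest to fit the toy key (illustration only)."""
    full = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return full % N


def sign(message: bytes) -> int:
    return pow(digest(message), D, N)               # encrypt the digest with the private key


def verify(message: bytes, signature: int) -> bool:
    return pow(signature, E, N) == digest(message)  # decrypt with the public key and compare


message = b"Transfer 100 to account 42"
signature = sign(message)
print(verify(message, signature))             # True
print(verify(message, (signature + 1) % N))   # False: the signature was altered
```

With a realistically sized key, changing the message would produce a different digest, so the comparison would fail in the same way.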
Security Tools
Firewall
- Monitors incoming and outgoing traffic/packets, comparing them to criteria set by the user or administrator (checks against whitelisted/blacklisted IP addresses and ports).
- Accepts or rejects incoming/outgoing packets based on these criteria.
- Blocks or rejects transmissions that do not match criteria and accepts ones that do.
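The accept/reject decision can be pictured as a simple rule check. This is a deliberately simplified sketch: the blocked address, the allowed ports, and the packet fields are hypothetical, and real firewalls apply far richer rule sets.

```python
BLOCKED_IPS = {"203.0.113.9"}   # hypothetical blacklist set by the administrator
ALLOWED_PORTS = {80, 443}       # hypothetical whitelist of ports


def accept(packet):
    """Accept a packet only if it meets every criterion."""
    return packet["src_ip"] not in BLOCKED_IPS and packet["port"] in ALLOWED_PORTS


print(accept({"src_ip": "198.51.100.4", "port": 443}))  # True
print(accept({"src_ip": "203.0.113.9", "port": 443}))   # False: blacklisted address
print(accept({"src_ip": "198.51.100.4", "port": 23}))   # False: port not allowed
```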
Up-to-date Anti-Malware
- Scans files on the hard drive for malicious software.
- Regularly scans the computer for viruses, checking against a stored database of known virus signatures. The database needs to be updated regularly.
- If a virus is detected, it is quarantined or deleted.
- Compares downloaded files to a database of known viruses and prevents the download from continuing if a match is found.
Anti-spyware
- Scans the computer for spyware, checking against a stored database of known spyware. The database needs to be updated regularly.
- If spyware is detected, it is quarantined or deleted.
- Compares downloaded files to a database of known spyware and prevents the download from continuing if a match is found.
Encryption
Converts data into **ciphertext** using an algorithm and a key, making it unreadable if intercepted without a decryption key.
🚨 Common Cyber Threats
Malicious Software
- **Virus/Malware:** Malicious software that replicates itself. Downloaded/run without the user's knowledge. Runs in the background and can pretend to be legitimate. Can damage, delete, or corrupt data but typically does not send data out of the computer (unlike spyware).
- **Spyware:** Malicious software downloaded/run without the user's knowledge. Runs in the background and can pretend to be legitimate. Secretly records/collects user's data/actions (e.g., keystrokes). Sends data/activity logs to a third party but does not typically replicate itself.
Hacking
- **Hacking/Hackers:** Illegal or unauthorized access to a computer system or network. Used to delete, damage, or collect data, usually done with malicious intent.
Social Engineering Attacks
- **Phishing:** Requires user action. An email pretends to be from an official body. Persuades individuals to disclose private information (e.g., login credentials). Requests authentication by redirecting the user to an unofficial website.
- **Pharming:** Automatic redirection. Redirects the user to a false/fake website, often by manipulating DNS settings or the host file, without the user clicking a malicious link.
Prevention Methods (General)
- Check URL validity (spelling, domain).
- Ensure the connection is secure (look for HTTPS and the padlock icon).
🛡️ Data Protection Methods
Access Rights
- Give certain users access to different elements of the system or data.
- Uses different accounts/logins that have different levels of access (e.g., read/write, read-only).
- Specific views can be assigned (especially in databases).
Encryption
- Data is converted into ciphertext/data is encoded.
- Cannot be understood if intercepted without a decryption key.
✅ Data Integrity Checks
Validation
Checks data is reasonable or sensible (when explaining use, refer to specific checks).
- **Format Check:** Makes sure data is in the required pattern (e.g., postcode format).
- **Length Check:** Makes sure data contains the correct number of characters or falls within a specified range.
- **Type Check:** Ensures only the correct data type (e.g., numeric, non-numeric) is entered.
- **Existence Check:** Makes sure data is present in a required field (not left blank).
- **Range Check:** Ensures data falls within acceptable minimum and maximum values.
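Each check can be written as a small function that returns `True` when the data passes. The patterns and limits used in the examples (a two-letter, four-digit code, an 8 to 20 character password, an age between 0 and 120) are made-up values for illustration.

```python
import re


def format_check(value, pattern):
    return re.fullmatch(pattern, value) is not None   # whole value must fit the pattern


def length_check(value, minimum, maximum):
    return minimum <= len(value) <= maximum


def type_check(value):
    return value.isdigit()                             # digits only, i.e. numeric input


def existence_check(value):
    return value.strip() != ""                         # field must not be left blank


def range_check(number, minimum, maximum):
    return minimum <= number <= maximum


print(format_check("AB1234", r"[A-Z]{2}\d{4}"))  # True
print(length_check("secret", 8, 20))             # False: only 6 characters
print(type_check("4x"))                          # False: contains a letter
print(existence_check("   "))                    # False: nothing entered
print(range_check(150, 0, 120))                  # False: above the maximum
```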
Verification
Checks that data matches the original source or is accurately transcribed.
- **Visual Check:** Manual comparison with the source document or material.
- **Double Entry:** Data is entered twice, and the computer system compares the two entries.
Reasons Why Data Might Still Be Incorrect
- Data on the original source document may not be correct (GIGO).
- Verification only checks if the input matches the source, not if the source is accurate.
- Validation doesn't check the accuracy of data, only if it's reasonable or meets structural rules.
Error Detection During Transfer
Parity Check
- The type of parity (even or odd) is decided upon before transfer.
- Each byte contains an additional **parity bit**.
- In parity blocks, an additional parity byte is sent with vertical AND horizontal parity (2D parity check).
- Each row/column must have an even or odd number of 1s.
- The receiver counts the number of 1s in the byte/block, allowing single-bit errors to be identified (and corrected if 2D parity is used).
- An error cannot be detected if an even number of bits has been changed, as they could cancel each other out.
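The following sketch shows a parity bit being added, a single-bit error being located with a parity block, and a two-bit error slipping through. It assumes even parity, 7 data bits per byte with the parity bit placed first, and bytes represented as strings of `0` and `1`; the function names are illustrative.

```python
def add_parity(bits7, even=True):
    """Prefix a 7-bit string with a parity bit."""
    ones = bits7.count("1")
    parity = ones % 2 if even else (ones + 1) % 2
    return str(parity) + bits7


def check_parity(byte, even=True):
    """True when the number of 1s matches the agreed parity."""
    return byte.count("1") % 2 == (0 if even else 1)


def parity_byte(rows):
    """Even-parity byte calculated column by column."""
    return "".join(str(sum(r[c] == "1" for r in rows) % 2) for c in range(len(rows[0])))


def locate_error(block):
    """Return (row, column) of a single flipped bit in an even-parity block, else None."""
    bad_rows = [i for i, r in enumerate(block) if r.count("1") % 2]
    bad_cols = [c for c in range(len(block[0])) if sum(r[c] == "1" for r in block) % 2]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    return None


data = [add_parity(b) for b in ("1010001", "0110110", "1110000")]
block = data + [parity_byte(data)]
print(block)                      # ['11010001', '00110110', '11110000', '00010111']

corrupted = list(block)
corrupted[1] = "00111110"         # bit at row 1, column 4 flipped in transit
print(locate_error(corrupted))    # (1, 4)
print(check_parity("11010010"))   # True: two flipped bits go undetected
```

Because the faulty row and column intersect at exactly one bit, the receiver can flip that bit back rather than request retransmission.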
Checksum
- A checksum value is calculated from the block of data before transmission (by summing the data units).
- The checksum value is transmitted along with the data.
- The receiving computer recalculates the checksum from the received data.
- If the received checksum and recalculated checksum match, no error has occurred. If they don't match, an error has occurred, and retransmission is requested.
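A minimal sketch of the checksum exchange, assuming the checksum is the sum of the byte values modulo 256 and is appended as one extra byte. Real protocols use a variety of checksum algorithms; this one is chosen only because it is easy to follow.

```python
def checksum(data: bytes) -> int:
    return sum(data) % 256                 # sum the data units, keep it to one byte


def send(data: bytes) -> bytes:
    return data + bytes([checksum(data)])  # transmit the checksum along with the data


def receive(frame: bytes) -> bool:
    data, received = frame[:-1], frame[-1]
    return checksum(data) == received      # recalculate and compare


frame = send(b"HELLO")
print(frame[-1])                  # 116
print(receive(frame))             # True
print(receive(b"HELLP" + frame[-1:]))  # False: data changed, retransmission needed
```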
🤝 Ethics, Copyright, and Licensing
Benefits of Joining an Ethical Body (e.g., BCS, IEEE)
- Provides set ethical guidelines to follow, ensuring clients/staff know the standards and reducing the need for subjective decision-making.
- Enhances professional integrity, assuring clients/staff that the member is reputable.
- Demonstrates skills and knowledge (certification).
- Provides help and support (e.g., legal advice).
- Training courses help to keep skills up to date.
Reasons to Act Ethically
- Ensures team members feel valued.
- Maximizes the quality of work produced.
- Promotes teamwork and cooperation.
- Enables the creation of the best product for the customer.
Professional Conduct
How to Act in the Best Interest of the Client
- Keep the client's personal data private.
- Involve the client in development and maintain clear communication.
- Provide solutions the client asked for (meeting requirements).
- Keep the project on schedule and stay within a given budget.
- Keep the client informed of any problems or delays.
How to Act Ethically (Personal Responsibility)
- Be truthful—ask for help on how to use a program if unsure.
- Perform your own research and due diligence.
- Ask for additional training or a mentor if needed.
Consequences of Being Unethical
- The product might fail if an error isn't reported.
- Code might not work, letting down the client.
- Failing duties as an employee or professional.
Intellectual Property (IP)
**Copyright:** The formal and legal rights to ownership of creative work.
**Intellectual Property Rights (IPR):** Protects against unauthorized reproduction of work and allows for legal right of redress if infringed.
Preventing Illegal Copies
- Encryption of software files.
- Use of a product key or license activation.
- Compile source code (distribute only the executable file, e.g., .exe).
Software Licensing
| Type | Description |
|---|---|
| Free Software Foundation & Open Source Initiative | The user can edit/improve the source code (which must usually be released under the same conditions as the original software) and redistribute the software. |
| Shareware | The program is copyrighted, so the user cannot legally modify it, and the developer keeps control over the product, protecting intellectual property rights. The user gets a trial period, and the developer can gain income if the full version is bought afterward. |
| Commercial Software (Proprietary) | The user must pay before being able to legally use the software and cannot redistribute or edit the source code. |
Reasons for an Open Source License
- Allows the user to customize the code.
- Allows errors to be reported and fixed quickly by the community.
- Allows additional features to be added to the code by collaborators.
- Allows for community collaboration and improvement.
Reasons Against an Open Source License
- Requires you to release the source code publicly.
- Allows anyone to edit, modify, and share the source code/program.
- Doesn't allow the original creator to profit directly from selling the software license.
Advantages of Commercial Software
- Enables the program to be copyrighted, so the user cannot legally edit it, and control over the product is maintained.
- Protects source code/prevents unauthorized changes from being made.
- A fee can be charged for the program, so the programmer gains income.
- Prevents illegal copies from being made, and legal action can be taken if this occurs.
- Likely to have fewer bugs (due to professional testing), redress available if the software is broken, and potentially better dedicated support as a fee is being charged.
🤖 Artificial Intelligence (AI) Applications
Examples of AI Use
- **Facial Recognition:** Police identifying wanted people using image/facial recognition (AI identifies features/patterns in an image and matches them to a person/object).
- **Natural Language Interfaces:** Using speech recognition (AI identifies language/words being spoken, learns accents, matches words to a database, and generates the most likely sentence).
- **Autonomous Vehicles:** Self-driving cars can detect their position on the road, self-park, avoid collisions, and follow a route.
- **Game Playing:** Models characters in computer games and allows computer characters to react according to the player's movements.
- **Surveillance:** AI can start recording to secondary storage only when a person is detected.
- **Camera Tracking:** AI can identify the direction of movement and then move the camera accordingly.
AI cameras scan the scene in real-time for facial/image recognition, taking each frame individually and analyzing pixels. Cameras focus on the pattern identified.
Social Impacts of AI
- **Privacy Issues:** People may dislike their personal data (e.g., biometric data) being stored and processed.
- **Error Risk:** Incorrect recognition or decision-making leads to mistakes (e.g., false arrests).
- **Safety:** Individuals may feel safer, as AI can help to catch criminals and reduce crime.
Economic Impacts of AI
- Reduces operational costs since less time is taken for tasks to be carried out (automation).
- Increases profits through more efficient work performance (redundant tasks are done by AI).
- Decreases cost for the customer (due to efficiency).
- May decrease profit margins initially, as the program may be expensive to maintain, buy, or update.
🗄️ Database Systems and SQL
Drawbacks of a File-Based Approach
- **Data Redundancy:** The same data is stored many times across different files.
- **Data Inconsistency (Poor Integrity):** Data is not updated across the whole system, leading to duplicates that are stored differently.
- Hard to perform complex queries since a new program has to be written each time.
- **Lack of Privacy:** User views/access rights cannot be easily implemented.
Advantages of a Relational Database (DBMS)
- **Reduced Data Redundancy:** Each data item is only stored once (due to linked tables).
- **Maintains Data Consistency (Integrity):** Changes in one table automatically update in another via relationships.
- Complex queries are easier to run using standard query languages (SQL).
- Can provide different views, so the user can only see specific aspects/parts of the database (improved privacy and security).
- **Program-Data Independence:** Programs do not need to be rewritten if the data structure is changed (data is separate from the software application).
Key Database Terms
- **Entity:** An object or concept about which data can be stored (e.g., Customer, Product).
- **Field (Attribute):** A column in a table representing a characteristic of an entity.
- **Tuple (Record):** A single row of data in a table (about one instance of an entity).
- **Primary Key:** A unique attribute used to uniquely identify a record/tuple. It can be used as a foreign key in another table to form a link between tables.
- **Candidate Key:** An attribute or set of attributes that could potentially be chosen as the primary key.
- **Secondary Key:** An alternative key used along with a primary key to locate specific data (a candidate key that has not been chosen as a primary key, often used for indexing).
- **Foreign Key:** A field in one table that links to a primary key in another table, establishing a relationship.
Importance of Referential Integrity
- Makes sure data in the database is consistent and up-to-date.
- Ensures that every foreign key has a corresponding primary key value in the linked table.
- Prevents records from being added, modified, or deleted incorrectly (e.g., preventing deletion of a primary record if foreign keys still reference it).
- Ensures any changes made to data in one place are reflected in all related records.
- Ensures non-existent data cannot be referenced.
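Referential integrity can be seen in action with Python's built-in `sqlite3` module. The table and column names below are invented for the example, and the `PRAGMA foreign_keys = ON` line is needed because SQLite only enforces foreign keys when that setting is enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # turn on foreign key enforcement
conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""CREATE TABLE CustomerOrder (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID))""")
conn.execute("INSERT INTO Customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO CustomerOrder VALUES (100, 1)")


def attempt(sql):
    """Run a statement and report whether the database allowed it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# An order cannot reference a customer that does not exist.
print(attempt("INSERT INTO CustomerOrder VALUES (101, 99)"))  # rejected
# A customer cannot be deleted while an order still references it.
print(attempt("DELETE FROM Customer WHERE CustomerID = 1"))   # rejected
```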
Normalization
- **First Normal Form (1NF):** No repeating groups of attributes. Each field must be atomic (e.g., Name split into FirstName and LastName). A primary key must be identified.
- **Second Normal Form (2NF):** Must be in 1NF and have no partial key dependencies (all non-key attributes must be fully dependent on the entire primary key).
- **Third Normal Form (3NF):** Must be in 2NF and have no transitive dependencies (all non-key attributes must be fully dependent on the primary key AND NO OTHER non-key attributes).
Steps for Normalization
- **Unnormalized Form (UNF) to 1NF:**
  - Identify and remove any repeating groups of attributes.
  - Ensure each field is atomic.
  - Identify the primary key.
- **1NF to 2NF:**
  - Remove any partial key dependencies (only relevant if the primary key is composite).
- **2NF to 3NF:**
  - Remove any non-key dependencies (transitive dependencies).
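The steps above can be sketched on a small, made-up order dataset. Everything here (the field names, the sample data, and the helper functions `to_1nf`, `to_2nf`, `to_3nf`) is a hypothetical illustration in plain Python, with dictionaries standing in for tables keyed by their primary key; it is not a general normalization algorithm.

```python
# Unnormalized form: each order holds a repeating group of items.
unf = [
    {"OrderID": 1, "CustomerID": "C1", "CustomerName": "Ada",
     "Items": [("P1", "Pen", 2), ("P2", "Pad", 1)]},
    {"OrderID": 2, "CustomerID": "C1", "CustomerName": "Ada",
     "Items": [("P1", "Pen", 5)]},
]

def to_1nf(orders):
    """Flatten the repeating group: one atomic row per (OrderID, ProductID)."""
    return [
        {"OrderID": o["OrderID"], "CustomerID": o["CustomerID"],
         "CustomerName": o["CustomerName"],
         "ProductID": pid, "ProductName": pname, "Quantity": qty}
        for o in orders
        for (pid, pname, qty) in o["Items"]
    ]

def to_2nf(rows):
    """Remove partial dependencies on the composite key (OrderID, ProductID)."""
    # Quantity depends on the whole key, so it stays with the composite key.
    order_line = {(r["OrderID"], r["ProductID"]): r["Quantity"] for r in rows}
    # ProductName depends on ProductID alone.
    product = {r["ProductID"]: r["ProductName"] for r in rows}
    # Customer details depend on OrderID alone.
    order = {r["OrderID"]: (r["CustomerID"], r["CustomerName"]) for r in rows}
    return order_line, product, order

def to_3nf(order):
    """Remove the transitive dependency OrderID -> CustomerID -> CustomerName."""
    customer = {cid: cname for (cid, cname) in order.values()}
    order_3nf = {oid: cid for oid, (cid, _name) in order.items()}
    return order_3nf, customer

rows_1nf = to_1nf(unf)
order_line, product, order_2nf = to_2nf(rows_1nf)
order_3nf, customer = to_3nf(order_2nf)
```

After the last step, each fact is stored once: the customer's name appears only in `customer`, and each product name only in `product`.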
⚙️ Database Management Systems (DBMS)
A DBMS is software that provides comprehensive data management capabilities.
Data Dictionary Contents (Metadata About Database)
- Data about data in a database (metadata).
- Table name and field name definitions.
- Data types for each field.
- Type of validation used and validation rules.
- Primary and foreign key definitions.
- Relationships between elements (tables).
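To make the idea of a data dictionary concrete, here is a rough sketch that reads back the metadata SQLite keeps about a schema, using Python's standard `sqlite3` module. The `sqlite_master` table and the `table_info` and `foreign_key_list` pragmas are specific to SQLite; other DBMSs expose similar metadata through their own catalog tables. The `Customer` and `Booking` tables and the `describe_schema` helper are hypothetical.

```python
import sqlite3

def describe_schema(conn):
    """Collect table, field, type, and key metadata from a SQLite database."""
    dictionary = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, default value, pk)
        columns = [
            {"field": name, "type": col_type,
             "not_null": bool(notnull), "primary_key": bool(pk)}
            for (_cid, name, col_type, notnull, _default, pk)
            in conn.execute(f"PRAGMA table_info({table})")
        ]
        # foreign_key_list rows: (id, seq, parent table, from, to, ...)
        foreign_keys = [
            {"field": row[3], "references": f"{row[2]}({row[4]})"}
            for row in conn.execute(f"PRAGMA foreign_key_list({table})")
        ]
        dictionary[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return dictionary

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer ("
    "CustomerID INTEGER PRIMARY KEY, Name VARCHAR(50) NOT NULL)"
)
conn.execute(
    "CREATE TABLE Booking ("
    "BookingID INTEGER PRIMARY KEY, CustomerID INTEGER, "
    "FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID))"
)
data_dictionary = describe_schema(conn)
for table_name, details in data_dictionary.items():
    print(table_name, details)
```

The output lists each table with its field names, declared data types, and primary and foreign key definitions, which are the kinds of entries described in the list above.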
Logical Schema
- Shows the structure of the database and its relationships (e.g., Entity-Relationship diagram).
- An abstract overview of a database structure.
- Models the problem using methods such as ER Diagrams.
- Independent of any particular DBMS software.
- Describes the relationship between data and its structure.
Security in a DBMS
- **Authentication:** Requiring usernames and passwords for access.
- **Backup/Recovery Procedures:** Automatically creating copies of the database and storing them off-site regularly, allowing data to be recovered if lost.
- **Access Rights:** Users are given different access permissions to different tables (e.g., read/write, read-only).
- **Views:** Different users are able to see different subsets of the database, ensuring they only see what is required (data hiding).
- **Encryption:** Data is turned into ciphertext and cannot be understood without a decryption key.
- **Record and Table Locking:** Prevents simultaneous access to data by multiple users, ensuring data is not overwritten inconsistently.
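Of the measures above, views are the easiest to illustrate in a few lines. The sketch below uses Python's standard `sqlite3` module; the `Employee` table, the `EmployeeDirectory` view, and the sample data are made up. SQLite has no user accounts or permissions, so this shows only the data-hiding aspect: in a multi-user DBMS, access rights would additionally restrict users to the view rather than the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    "EmployeeID INTEGER PRIMARY KEY, Name VARCHAR(50), Salary REAL)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Ada", 52000.0), (2, "Grace", 61000.0)],
)

# The view exposes only the columns a general user needs; Salary is hidden.
conn.execute(
    "CREATE VIEW EmployeeDirectory AS SELECT EmployeeID, Name FROM Employee"
)

cursor = conn.execute("SELECT * FROM EmployeeDirectory ORDER BY EmployeeID")
visible_columns = [column[0] for column in cursor.description]
directory_rows = cursor.fetchall()
print(visible_columns)
print(directory_rows)
```

A query against `EmployeeDirectory` returns only the employee ID and name, so the salary column never reaches users who work through the view.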
Software Tools in DBMS
- Provides a developer interface for management.
- Allows the user to create items such as tables, forms, and reports.
- Creates input and output abilities through menus, buttons, or monitors.
- **Query Processor:** Allows the user to enter criteria, searches for data that meets the entered criteria, and organizes results to be displayed to the user. This software processes and executes queries written in SQL.
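The query processor's role can be sketched as follows, again with Python's standard `sqlite3` module. The `Product` table, its data, and the `find_products` helper are hypothetical; the `?` placeholder is how `sqlite3` passes user-entered criteria into a query safely.

```python
import sqlite3

def find_products(conn, max_price):
    """Return (Name, Price) for products at or below max_price, cheapest first."""
    return conn.execute(
        "SELECT Name, Price FROM Product WHERE Price <= ? ORDER BY Price",
        (max_price,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product ("
    "ProductID INTEGER PRIMARY KEY, Name VARCHAR(50), Price REAL)"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(1, "Pen", 1.5), (2, "Notebook", 4.0), (3, "Backpack", 30.0)],
)

cheap = find_products(conn, 5.0)
print(cheap)
```

Here the user's criterion is the maximum price; the DBMS searches for matching records and returns them in the requested order.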
⌨️ SQL: DDL and DML
DDL (Data Definition Language) Statements
DDL statements define the database structure or schema. Each statement ends with a semicolon.
CREATE DATABASE:
CREATE DATABASE database_name;
CREATE TABLE:
CREATE TABLE table_name (
    column_name DATATYPE,
    column_name DATATYPE NOT NULL,
    PRIMARY KEY (column_name),
    FOREIGN KEY (column_name) REFERENCES another_table(column_name)
);
ALTER TABLE:
ALTER TABLE table_name
    ADD column_name DATATYPE;
ALTER TABLE table_name
    ADD PRIMARY KEY (column_name);
ALTER TABLE table_name
    ADD FOREIGN KEY (column_name) REFERENCES another_table(column_name);
ALTER TABLE table_name
    DROP COLUMN column_name;
ALTER TABLE table_name
    RENAME COLUMN old_column_name TO new_column_name;
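As a hedged, runnable illustration of the DDL templates above, the following sketch executes a few of them against an in-memory SQLite database through Python's standard `sqlite3` module. The `Author` and `Book` tables are hypothetical. SQLite differs from the generic syntax in two ways that matter here: it has no `CREATE DATABASE` statement (connecting creates the database), and it does not support `ALTER TABLE ... ADD PRIMARY KEY` or `ADD FOREIGN KEY`, so the keys are declared inside `CREATE TABLE`.

```python
import sqlite3

# Connecting takes the place of CREATE DATABASE in SQLite.
conn = sqlite3.connect(":memory:")

conn.execute(
    """CREATE TABLE Author (
        AuthorID INTEGER,
        Name VARCHAR(50) NOT NULL,
        PRIMARY KEY (AuthorID)
    );"""
)
conn.execute(
    """CREATE TABLE Book (
        BookID INTEGER,
        Title VARCHAR(100) NOT NULL,
        AuthorID INTEGER,
        PRIMARY KEY (BookID),
        FOREIGN KEY (AuthorID) REFERENCES Author(AuthorID)
    );"""
)

# ALTER TABLE ... ADD: extend the existing table with a new column.
conn.execute("ALTER TABLE Book ADD PublishedOn DATE;")

# Read the column names back to confirm the structure.
book_columns = [row[1] for row in conn.execute("PRAGMA table_info(Book)")]
print(book_columns)
```

The final list should contain the three original columns followed by the newly added `PublishedOn`.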
Common SQL Data Types
- **CHARACTER (CHAR):** Fixed-length string.
- **VARCHAR(N):** Variable-length string (e.g., VARCHAR(255) allows up to 255 characters).
- **BOOLEAN:** True or False value.
- **INTEGER (INT):** Whole numbers.
- **REAL:** Floating-point numbers (decimals).
- **DATE:** Date values.
- **TIME:** Time values.