Database Fundamentals: SQL Queries and Relational Concepts

Posted by Anonymous and classified in Computers

Written on in English with a size of 99.31 KB

Core Database Concepts and Relational Algebra

Question 1: Cartesian Product Size

Question 1: S is a relation instance. If S has 6 tuples, how many tuples are there in the result of the following SQL query?

SELECT * FROM S S1, S S2;

Answer: 36 (Calculated as 6 * 6, representing the Cartesian product of S with itself.)

Question 2: Maximum Tuples and Primary Keys

Question 2: Let R(A, B, C, D) be a relation, where (A, B, C) is the Primary Key (PK) of R, and attribute D cannot be NULL. Assume A's domain has 5 different values, B's domain has 2, C has 4, and D has 3. What is the maximum number of tuples that can be in an instance of R?

Answer: 40 (Calculated as 5 * 2 * 4. The maximum number of tuples is determined by the product of the domain sizes of the Primary Key attributes, as the PK must be unique.)

Question 4: Foreign Keys and Subqueries

Consider the relations Customers(cid, cname, level, type, age) and Activities(cid, slopeid, day), where the cid attribute of Activities is a Foreign Key (FK) referencing the cid attribute of Customers.

4a) Explaining the EXISTS Clause

Explain why every activity in the Activities table will always appear in the result of the following SQL query:

SELECT * FROM Activities a WHERE EXISTS ( SELECT * FROM Customers c WHERE c.cid = a.cid );

Explanation: Because of the Foreign Key constraint, every cid that appears in Activities must also appear in Customers. Since cid is the Primary Key in Customers, there will be exactly one tuple in Customers that satisfies the WHERE condition of the subquery. This means the result of the subquery will be non-empty for each tuple a in Activities (i.e., EXISTS will be TRUE), and thus all tuples a will be returned.

4b) Explaining Single-Value Subquery Errors

Explain why the following SQL query might result in an error:

SELECT * FROM Activities a WHERE a.cid = ( SELECT c.cid FROM Customers c WHERE c.cname='Ice' );

Explanation: An error might occur because the = operator expects a single value, but the subquery could return more than one value. This can happen if multiple customers in the Customers table share the name 'Ice'.

Question 5: ACID and Declarative Languages

Provide a brief explanation for the following concepts:

5a) Atomicity in ACID Properties

The 'A' in database ACID properties stands for Atomicity. Explain what Atomicity means and give one example of what could go wrong without it.

Explanation of Atomicity: Atomicity means that all the database modifications that occur within a transaction happen completely, or none of those modifications happen. It ensures the transaction is treated as a single, indivisible unit ("All-or-Nothing").

Example of Failure: If an application intends to move $50 from a person's checking account to their savings account, and the system fails after the deduction from checking but before the addition to savings, the deduction might occur without the corresponding addition. The person loses $50, compromising data integrity.

5b) SQL as a Declarative Language

What is a Declarative Language? Explain why SQL is considered one.

Explanation: A Declarative Language is a type of programming language where the programmer specifies what they want to achieve (the desired result), rather than specifying how to achieve it (the step-by-step procedure or algorithm).

Why SQL is Declarative: When writing a SQL query, we specify the data we want to retrieve (e.g., rows meeting certain conditions). The database system is then free to choose the most efficient algorithm or execution plan to evaluate the query internally, without the programmer needing to define that process.

SQL Query Practice on the Scores Relation

Questions 6–8 refer to the Scores(Team, Day, Opponent, Runs) relation instance. (Team, Day) is the Primary Key. Team, Day, and Opponent are VARCHAR(10), and Runs is INTEGER.

1y9C7W8Nhp5eMHVpqpL+OSoXG2fX4zVVXTQVYSoa+Uhalt5iNpWDsbq+lAHLSIiIiJSuZTloKswUigiIiIiUh4e6EGLiIiIiFQPxB60iIiISDVFdNAiIiIi1RTRQYuIiIhUU0QHLSIiIlJN+T9Z82rJTdh3mAAAAABJRU5ErkJggg==

Question 6: LIKE and ORDER BY

For this instance of Scores, what is the result of the following SQL query?

SELECT Team FROM Scores WHERE Opponent LIKE '%rs' ORDER BY Team DESC;

Result: [Swallows, Giants, Dragons] (The teams are listed in reverse alphabetical order based on the Team column. Swallows appears twice in the underlying data matching opponents ending in 'rs'.)

Question 7: Aggregation and DISTINCT COUNT

For this instance of Scores, what is the result of the following SQL query?

SELECT Day, COUNT(DISTINCT Runs) AS theCount FROM Scores WHERE Runs IS NOT NULL GROUP BY Day;
DaytheCount
Sunday4
Monday4

(Tuples may appear in any order since there is no ORDER BY clause.)

Question 8: Correlated Subqueries and ANY

For this instance of Scores, what is the result of the following SQL query?

SELECT S1.Day, S1.Team FROM Scores S1 WHERE S1.Runs = ANY ( SELECT S2.Runs FROM Scores S2 WHERE S1.Day = S2.Day AND S1.Team != S2.Team );

(Finds teams that scored the same number of runs as another team on the same day.)

DayTeam
MondayDragons
MondayBay Stars
MondaySwallows
MondayGiants

(Tuples may appear in any order since there is no ORDER BY clause.)

Constraints, Views, and Query Equivalence (True/False)

All TRUE or FALSE questions below refer to the Customers table created by the following statement:

CREATE TABLE Customers (
    cid INTEGER,
    cname VARCHAR(20) UNIQUE,
    type VARCHAR(20) NOT NULL,
    level VARCHAR(20) DEFAULT 'Beginner' NOT NULL,
    age INTEGER,
    PRIMARY KEY (cid)
);

Question 9: COUNT(*) vs. COUNT(Attribute)

Question 9: TRUE or FALSE: The following two SQL queries Q1 and Q2 are equivalent?

Q1: SELECT COUNT(cname) FROM Customers;
Q2: SELECT COUNT(*) FROM Customers;

Answer: FALSE. COUNT(cname) counts only non-NULL values in the cname column. Since cname has a UNIQUE constraint but is not explicitly NOT NULL, it can contain NULL values. COUNT(*) counts all tuples.

Question 10: Legal Aggregation Query

Question 10: TRUE or FALSE: The following is a legal SQL statement?

SELECT type, level, COUNT(cname) FROM Customers WHERE age >= 40 GROUP BY type, level HAVING COUNT(cname) > 1;

Answer: TRUE. This query correctly uses GROUP BY on non-aggregated attributes and filters the resulting groups using HAVING.

Question 11: NOT EXISTS vs. NOT EQUAL ALL

Question 11: TRUE or FALSE: The following two queries Q1 and Q2 are equivalent?

Q1: SELECT c.cid FROM Customers c WHERE NOT EXISTS ( SELECT * FROM Customers c2 WHERE c2.type = c.type AND c2.level = 'Advanced' );
Q2: SELECT c.cid FROM Customers c WHERE c.type != ALL ( SELECT c2.type FROM Customers c2 WHERE c2.level = 'Advanced' );

Answer: TRUE. Both queries find customers whose type is not represented among the types associated with 'Advanced' level customers.

Question 12: Inserting into a View

Question 12: TRUE or FALSE: Assume that we've already executed: CREATE VIEW Skaters AS SELECT cid, cname FROM Customers WHERE type = 'skate'; Then execution of the following statement will always result in an error?

INSERT INTO Skaters(cid, cname, type) VALUES(5678, 'Chou', 'skate');

Answer: TRUE. The view Skaters only includes the attributes cid and cname. Attempting to insert a value for the non-existent attribute type in the view definition will cause an error.

Question 13: UNION vs. UNION ALL

Question 13: TRUE or FALSE: The following two SQL queries Q1 and Q2 are equivalent?

Q1: SELECT DISTINCT age FROM Customers WHERE level = 'Beginner' OR level = 'Advanced';
Q2: ( SELECT DISTINCT age FROM Customers WHERE level = 'Beginner') UNION ALL ( SELECT DISTINCT age FROM Customers WHERE level = 'Advanced');

Answer: FALSE. Q1 uses DISTINCT across the entire result set, eliminating duplicates. Q2 uses UNION ALL, which retains duplicates if an age appears in both subqueries.

Advanced Multi-Table SQL Queries

Assume the following schema and Foreign Key constraints:

  • Persons(personID, personName, personAddress)
  • Actors(actorID, equityJoinDate)
  • ShakespearePlays(playTitle, yearWritten, category)
  • Theaters(theaterID, theaterName)
  • Productions(playTitle, productionNum, theaterID)
  • CastMembers(playTitle, productionNum, actorID)

Foreign Key Constraints:

  • Actors.actorID references Persons.personID.
  • Productions.playTitle references ShakespearePlays.playTitle.
  • Productions.theaterID references Theaters.theaterID.
  • CastMembers.(playTitle, productionNum) references Productions.(playTitle, productionNum).
  • CastMembers.actorID references Actors.actorID.

Question 14: Finding Actors in Comedies

Find the name and address for all actors whose equityJoinDate is February 14, 2003, and who have been cast members in a production of a play that is a comedy (category 'C'). The result attributes should be theName and theAddress. No duplicates should appear.

SELECT DISTINCT
    p.personName AS theName,
    p.personAddress AS theAddress
FROM
    Persons p, Actors a, CastMembers c, ShakespearePlays sp
WHERE
    p.personID = a.actorID
    AND a.actorID = c.actorID
    AND c.playTitle = sp.playTitle
    AND a.equityJoinDate = '2003-02-14'
    AND sp.category = 'C';

Question 15: Theaters Without a Specific Actor

Find the theaterID and theaterName for each Stratford theater (name contains 'Stratford') which has never had a production in which the actor named 'Edwin Booth' was a cast member. No duplicates should appear.

Solution using NOT EXISTS:

SELECT t.theaterID, t.theaterName
FROM Theaters t
WHERE t.theaterName LIKE '%Stratford%'
AND NOT EXISTS (
    SELECT *
    FROM Persons pr, CastMembers c, Productions p
    WHERE pr.personName = 'Edwin Booth'
    AND pr.personID = c.actorID
    AND c.playTitle = p.playTitle
    AND c.productionNum = p.productionNum
    AND p.theaterID = t.theaterID
);

Alternative Solution using NOT IN:

SELECT t.theaterID, t.theaterName
FROM Theaters t
WHERE t.theaterName LIKE '%Stratford%'
AND t.theaterID NOT IN (
    SELECT p.theaterID
    FROM Persons pr, CastMembers c, Productions p
    WHERE pr.personName = 'Edwin Booth'
    AND pr.personID = c.actorID
    AND c.playTitle = p.playTitle
    AND c.productionNum = p.productionNum
);

Question 16: Aggregation and Sorting Productions

For each Shakespeare play that has had at least 5 productions, output the title of the play, the year it was written, and the number of productions. Plays written later should appear first. If years are the same, sort alphabetically by title. Attributes should be playTitle, yearWritten, and prodCount.

SELECT
    sp.playTitle,
    sp.yearWritten,
    COUNT(p.productionNum) AS prodCount
FROM
    ShakespearePlays sp
JOIN
    Productions p ON sp.playTitle = p.playTitle
GROUP BY
    sp.playTitle, sp.yearWritten
HAVING
    COUNT(p.productionNum) >= 5
ORDER BY
    sp.yearWritten DESC,
    sp.playTitle ASC;

Database Terminology and SQL Cheat Sheet

Key Database Definitions

Database
A collection of data, typically describing the activities of one or more related organizations over time.
Database Management System (DBMS)
A software system designed to assist in creating, storing, accessing, and updating a database while maintaining "enterprise qualities" such as transactions, high/continuous availability, performance, and security.
NoSQL ("Not Only SQL")
Systems characterized by simplicity of design and great scale-out, often compromising consistency in favor of scalability.
Relation / Table
The fundamental structure in the Relational Data Model, defined as a structure with rows (tuples) and columns (attributes, fields). Mathematically, a relation is a subset of a Cartesian product of sets.
Stored Relations (Tables)
Relations that contain tuples and are stored in the database.
Relation Schema (Schema)
A sequence of attributes, R(A1, ..., Ak), defining the structure of a relation.
Cartesian Product
Given sets D1, D2, ..., Dk, the Cartesian product D1 × D2 × ... × Dk is the set of all k-tuples (d1, d2, ..., dk) such that each di belongs to Di.
Atomicity
A property of transactions (part of ACID). It implies that a transaction is treated as a single, indivisible unit—it either fully succeeds or fully fails.
Transaction Processing
A process managed by the DBMS covering concurrent access, updates, and crash recovery.
Data Independence
The concept that the relational data model provides a logical view of the data, hiding the physical storage representation.

CSE 180 Midterm Essentials

I. Mathematical & Counting Essentials

  • Data Model Components: A mathematical formalism consisting of (1) structure of the data, (2) operations on the data, (3) constraints on the data.
  • Cartesian Product Size: If Di has ni elements, the Cartesian product D1 × ... × Dk has n1 × ... × nk elements.
  • Maximum Tuples (PK): The maximum number of tuples in a relation R(A,B,C,...) where (A,B,C) is the Primary Key is the product of the domain sizes of the key attributes (e.g., NA × NB × NC).
  • Cross Product Result Size: If relation S1 has N1 tuples and S2 has N2 tuples, SELECT * FROM S1, S2; results in N1 × N2 tuples (e.g., 6 × 6 = 36).
  • Atomicity (ACID): All database modifications in a transaction happen or none happen ("all-or-nothing").

II. Relational Data Model (RDM) and Constraints

  • Relation (Table): A subset of a Cartesian product of sets; rows are tuples and columns are attributes.
  • First Normal Form (1NF): Attributes must be atomic (no arrays or lists).
  • Superkey: A subset of attributes such that no two tuples have the same values for all of them.
  • Key: A minimal superkey.
  • Primary Key (PK): Chosen key; enforced NOT NULL; at most one per table.
  • UNIQUE Constraint: Enforces uniqueness for non-NULL values; allows NULLs; multiple allowed.
  • Foreign Key (FK): Ensures column values in one table exist as a Primary Key in another.
  • Declarative Language: Specifies what the outcome should be, not how to compute it; SQL is declarative.

III. SQL Data Definition Language (DDL)

  • CREATE TABLE (Key): CREATE TABLE R (A1 type PRIMARY KEY, ...) or PRIMARY KEY (A1, A2, ...).
  • CREATE TABLE (FK): FOREIGN KEY (A) REFERENCES R2(B).
  • DEFAULT: Sets a default value if none is provided on insertion.
  • DROP TABLE: DROP TABLE R; removes the table definition and data.
  • ALTER TABLE: ALTER TABLE R ADD A type; or ALTER TABLE R DROP A;.

IV. SQL Data Manipulation Language (DML) — Modification

  • INSERT (Values): INSERT INTO R(A1,...,An) VALUES (v1,...,vn);.
  • INSERT (Subquery): INSERT INTO R(A1,...) SELECT ... FROM ... WHERE ...;.
  • DELETE: DELETE FROM R WHERE <condition>; deletes matching rows; DELETE FROM R; deletes all.
  • UPDATE: UPDATE R SET A1=expr1,... WHERE <condition>;.
  • Modification Semantics: Evaluated on the old database state before producing the new state.

V. SQL Query Structure (SELECT, WHERE, JOIN)

  • Basic Query Form: SELECT [DISTINCT] c1,... FROM R1,... [WHERE condition] [GROUP BY ...] [HAVING condition] [ORDER BY ...];.
  • Execution Order: (1) FROM, (2) WHERE, (3) GROUP BY, (4) HAVING, (5) SELECT, (6) ORDER BY.
  • DISTINCT: Eliminates duplicates.
  • Alias: SELECT movieTitle AS name, ....
  • Implicit Join: Multiple tables in FROM with join condition in WHERE.
  • Explicit Join: FROM R JOIN S ON condition.
  • Pattern Matching: LIKE; % matches any sequence, _ matches one character.
  • ORDER BY: ORDER BY attribute [ASC|DESC].
  • NULL in ORDER BY: Treat NULL as smallest (unless otherwise specified).

VI. Three-Valued Logic and NULL Handling

  • SQL uses three truth values: TRUE, FALSE, UNKNOWN. WHERE only accepts tuples evaluating to TRUE.
  • p AND q truth values: Note that FALSE & UNKNOWN is FALSE, and TRUE & UNKNOWN is UNKNOWN.
  • Key NULL Facts: Most comparisons with NULL result in UNKNOWN. Use A IS NULL or A IS NOT NULL to test directly.

VII. Set Operations (Union-Compatible Relations)

  • UNION (DISTINCT): Combines all tuples, removes duplicates (set semantics).
  • UNION ALL: Keeps duplicates (bag semantics).
  • INTERSECT: Tuples common to both results (set semantics).
  • EXCEPT: Tuples in first result but not second (set difference).
  • EXCEPT ALL: Retains m−n copies if m > n.

VIII. Subqueries and Quantifiers

  • IN / NOT IN: TRUE if a value exists/does not exist in subquery. Example: WHERE e.cert# IN (SELECT m.producerC# ...).
  • EXISTS / NOT EXISTS: TRUE if subquery is non-empty/empty.
  • op ANY (or SOME): TRUE if comparison holds for at least one subquery value (e.g., A > ANY (SELECT B)A > MIN(B)).
  • op ALL: TRUE if comparison holds for all subquery values (e.g., A > ALL (SELECT B)A > MAX(B)).

IX. Aggregation and Grouping

  • COUNT(*): Counts all tuples.
  • COUNT([DISTINCT] A): Counts non-NULL values (distinct if specified).
  • SUM/AVG/MIN/MAX(A): Operates on non-NULL values; ignores NULLs; returns NULL for empty set except COUNT.
  • GROUP BY: Groups tuples by attribute values; non-aggregated SELECT attributes must appear in GROUP BY; NULLs form their own group.
  • HAVING: Filters groups by aggregate condition (e.g., HAVING COUNT(A) >= 5).

Practical SQL DML Examples

The following examples demonstrate common SQL Data Manipulation Language (DML) operations based on an assumed airline schema.

  1. Find the name and email of all frequent flyers.

    SELECT passengerName, email
    FROM Passenger
    WHERE frequentFlyer = TRUE;
  2. List all flights departing from 'JFK' and arriving in 'LAX', showing flight ID, departure, and arrival times.

    SELECT flightID, scheduledDeparture, scheduledArrival
    FROM Flight
    WHERE departureAirport = 'JFK' AND arrivalAirport = 'LAX';
  3. Insert a new passenger into the Passenger table.

    INSERT INTO Passenger(passengerID, passengerName, email, dateOfBirth, frequentFlyer)
    VALUES (101, 'Alice Johnson', '[email protected]', '1985-07-12', TRUE);
  4. Update the ticket price of all Economy class reservations on flight 203 to $350.

    UPDATE Reservation
    SET ticketPrice = 350
    WHERE flightID = 203 AND seatClass = 'Economy';
  5. Delete all check-in records where the bag count is zero.

    DELETE FROM CheckIn
    WHERE bagCount = 0;
  6. Find the total number of passengers who have checked in for each flight.

    SELECT flightID, COUNT(DISTINCT passengerID) AS totalCheckedIn
    FROM CheckIn
    GROUP BY flightID;
  7. List the names of crew members who have flown on every flight departing from 'SFO'.

    SELECT crewName
    FROM CrewMember
    WHERE crewID IN (
        SELECT crewID
        FROM FlightCrewAssignment F
        JOIN Flight FL ON F.flightID = FL.flightID
        WHERE FL.departureAirport = 'SFO'
        GROUP BY crewID
        HAVING COUNT(DISTINCT F.flightID) = (
            SELECT COUNT(*) FROM Flight WHERE departureAirport = 'SFO'
        )
    );
  8. Find all passengers who have NOT paid with a credit card.

    SELECT DISTINCT passengerName
    FROM Reservation
    WHERE paymentMethod <> 'Credit Card';
  9. Show flights along with crew names assigned to them (include flights with no crew).

    SELECT F.flightID, C.crewName
    FROM Flight F
    LEFT JOIN FlightCrewAssignment FCA ON F.flightID = FCA.flightID
    LEFT JOIN CrewMember C ON FCA.crewID = C.crewID;
  10. Find the average ticket price for passengers who checked in and have at least one bag.

    SELECT AVG(R.ticketPrice) AS avgTicketPrice
    FROM Reservation R
    JOIN CheckIn C ON R.reservationID = C.reservationID
    WHERE C.bagCount > 0;
  11. Find the names of passengers who have reservations on flight 101 or flight 102 (UNION).

    SELECT passengerName
    FROM Passenger P
    JOIN Reservation R ON P.passengerID = R.passengerID
    WHERE flightID = 101
    UNION
    SELECT passengerName
    FROM Passenger P
    JOIN Reservation R ON P.passengerID = R.passengerID
    WHERE flightID = 102;
  12. Find passengers who have reservations on both flight 101 and flight 102 (INTERSECT).

    SELECT passengerName
    FROM Passenger P
    JOIN Reservation R ON P.passengerID = R.passengerID
    WHERE flightID = 101
    INTERSECT
    SELECT passengerName
    FROM Passenger P
    JOIN Reservation R ON P.passengerID = R.passengerID
    WHERE flightID = 102;
  13. Find passengers who had a reservation on flight 101 but not on flight 102 (EXCEPT).

    SELECT passengerName
    FROM Passenger P
    JOIN Reservation R ON P.passengerID = R.passengerID
    WHERE flightID = 101
    EXCEPT
    SELECT passengerName
    FROM Passenger P
    JOIN Reservation R ON P.passengerID = R.passengerID
    WHERE flightID = 102;
  14. List all distinct cities where airports are located.

    SELECT DISTINCT city
    FROM Airport;
  15. Find all passengers whose email ends with 'gmail.com'.

    SELECT passengerName, email
    FROM Passenger
    WHERE email LIKE '%gmail.com';
  16. Find flights that have unknown actual arrival times.

    SELECT flightID, actualArrival
    FROM Flight
    WHERE actualArrival IS NULL;
  17. Find passengers who either have no frequent flyer status or have an unknown date of birth.

    SELECT passengerName
    FROM Passenger
    WHERE frequentFlyer = FALSE OR dateOfBirth IS NULL;
  18. Count the number of flights each crew member has been assigned to. Include crew with zero assignments.

    SELECT C.crewName, COUNT(FCA.flightID) AS totalFlights
    FROM CrewMember C
    LEFT JOIN FlightCrewAssignment FCA ON C.crewID = FCA.crewID
    GROUP BY C.crewName;
  19. Find all reservations where the ticket price is above the average ticket price (Subquery).

    SELECT reservationID, passengerID, ticketPrice
    FROM Reservation
    WHERE ticketPrice > (SELECT AVG(ticketPrice) FROM Reservation);
  20. List passengers who have reservations in either Economy or Business class, without duplicates.

    SELECT DISTINCT passengerName
    FROM Passenger P
    JOIN Reservation R ON P.passengerID = R.passengerID
    WHERE seatClass = 'Economy' OR seatClass = 'Business';

SQL Pattern Matching: The LIKE Operator

The LIKE operator is used in the WHERE clause to search for a specified pattern in a column.

1. The Percent Sign (%)

The % wildcard matches zero or more characters.

SELECT passengerName
FROM Passenger
WHERE email LIKE '%gmail.com';
  • This matches any email that ends with gmail.com, regardless of what comes before it.

2. The Underscore (_)

The _ wildcard matches exactly one character.

SELECT passengerName
FROM Passenger
WHERE passengerName LIKE 'A_ice';
  • Matches Alice, Amice, etc., but not Annie or Alycey.

3. Character Ranges ([ ])

Character ranges (e.g., [ ]) are supported by some SQL dialects (like SQL Server) but are not standard SQL.

SELECT passengerName
FROM Passenger
WHERE passengerName LIKE '[AB]lice';
  • Matches Alice or Blice.

Related entries: