Relational Database Concepts: Keys, Views, and SQL
Classified in Computers
Written at on English with a size of 14.08 KB.
Relational Keys
Superkey: Set of attributes with all those needed to ID a particular row. Candidate key: Minimal superkey; removing any attribute means it is no longer a superkey; can be multiple per relation; "UNIQUE". Strict: Candidate key + at least 1 extra attribute. Primary: Default candidate key, reference for foreign keys; automatically unique; "PRIMARY KEY". Foreign: "Logical pointer" in a dependent relation that refers to the candidate key in the parent relation (not always primary). Natural: Represent conceptual uniqueness constraints external to the DB, e.g., name, address. Artificial: Introduced solely for the DB, no external meaning, e.g., auto-generated ID; single attribute, simple data type. Surrogate: If artificial is used as the primary key when a natural key already exists.
VIEW: Virtual relation stored as a definition rather than a concrete set of tuples; contents are computed when the view is used within an SQL statement. A mechanism to limit how much data certain users can access by presenting only necessary information without details in underlying relations, such as passwords. CREATE VIEW name AS <SQL query expression>.
Three-valued logic: Three-valued logic uses different results for logical expressions: true if the condition holds, false if the condition doesn't hold, unknown if the comparison includes null. Three-valued logic is needed because of possible NULL values in databases and because a logical condition, to be decidable, needs ALL values to be known.
Entities
Weak: Entity type that doesn't have a primary key; exclusive "part-of relationship," e.g., a child from parents, payment of a loan, room in a hotel. Discriminator (partial key): Set of attributes that distinguishes among all types of a given weak entity type related to the owning (master) entity. Identifier (key): Discriminator + key of the identifying relationship (give CREATE TABLE example from HW7).
IS-A Hierarchy
Aggregation: Relation R2 also part of R1; eliminates redundancy --> don't need a new instance of employee/project relation for each advisor; the project is still valid even without an advisor.
Relational algebra: Selection (σ) selects a subset of rows from a relation (e.g., year = 2010); Projection (π) selects only specified attributes from a relation (e.g., name, year); Cross product: Combines every combination of tuples from two relations. Join: Combines matching tuples from two relations. Conditional join:. Natural join: Joins along common attributes where values match (columns must have the same name). Union: All tuples in R1 and/or R2. Intersect: All tuples in R1 + R2. Except: All tuples in R1, not R2. SQL x RA:.
Key constraints: Specified using PRIMARY KEY and UNIQUE. Domain constraints: The field must be of the proper data domain (e.g., integer); NOT NULL; DEFAULT <default_value>. REFERENTIAL INTEGRITY: Foreign keys must reference values that exist; ON DELETE/UPDATE --> NO ACTION, CASCADE, SET NULL/SET DEFAULT. Semantic integrity constraints: Enforced across multiple attributes in one row in one relation: e.g., CONSTRAINT S_not_L CHECK (lecturer <> supervisor). ALL STATIC CONSTRAINTS.
Assertions: Predicate expressing the condition for the DB to always satisfy, across several tables; when an assertion is made, the system tests it for validity and continues testing with each update that may violate it. Ex: for all students, the sum of all marks for a course <= 100. CREATE ASSERTION mark_constraint CHECK ( NOT EXISTS ( SELECT sid FROM Assessment GROUP BY sid, uos_code HAVING sum(mark) > 100)); To enforce a disjoint: CREATE ASSERTION AssertDisjoint CHECK ( NOT EXISTS (SELECT call_num FROM Book INTERSECT SELECT call_num FROM DVD)); To ensure every rider of class 'H' vehicle is at least 19yo: CREATE ASSERTION LegalAgeHeavyVehicle CHECK ( NOT EXISTS (SELECT 1 FROM AllowedDriver JOIN Person ON (driver=licenseNum) JOIN Vehicle ON (vehicle=regNum) WHERE vehicleClass = 'H' AND age(dateOfBirth) < interval '19 years'));
Trigger: Bound to just a single table; can check for a dynamic integrity constraint --> predicates on database state changes; e.g., needed to enforce that an employee can have at most 3 supervisors or that employees must be at least 25 years old to start supervising other employees.
DB Access Control
1. Mandatory access control (authentication): Every connection must login with login and password; CREATE USER or CREATE LOGIN commands, etc. 2. Discretionary access control (authorization): Access rights/privileges for objects (tables + views), and mechanisms for granting/revoking privileges for users; the creator of a table/view automatically gets all privileges on it; DBMS tracks gain/loss of privileges, only fulfills requests from users with necessary associated privileges; privilege on view != privilege on the underlying relation!
GRANT privileges ON table TO user_list *SELECT, INSERT, DELETE, UPDATE, REFERENCES.
GRANT insert, select ON student TO bryn --> bryn can query students or insert tuples into it.
GRANT delete ON student TO jon WITH GRANT OPTION --> Jon may delete tuples and can authorize others to do so as well.
GRANT update(title) ON uos TO dustin --> dustin can update ONLY the title field of uos tuples.
REVOKE update ON uos FROM jon --> revoke propagates to anyone else who received this update privilege solely from Jon.
Functional Dependencies
Problems with redundancy: Update anomaly: Can occur if redundant copies of information are stored, and one copy of data is updated, but other(s) is not. Delete anomaly: Can occur if multiple FDs are stored in one large relation; deleting one tuple wipes out many FDs. Insert anomaly: If there are too many FDs in one table, it is difficult to input one record that contains all attributes in one.
Table Decomposition
Normalization --> decompose the table along functional dependency(s) to minimize redundancy and get the highest normal form. Lossless join: Preserve information so that the natural join among decomposed tables returns the full original table. Dependency preserving: FDs should be preserved in decomposition such that LHS is with RHS in each relation.
Determining candidate keys: Make LHS/Both/RHS table; min candidate key = all LHS + any keys not on either side of FDs; if some FDs contain multiple attributes on LHS, it might require extra attributes in the candidate key.
Normal forms: 1NF = unique rows, atomic attributes; 2NF = no partial dependencies; 3NF = no transitive dependencies; BCNF = no reflective dependencies.
Interactive vs. Non-interactive SQL
Interactive: SQL statements input from the terminal, DBMS outputs to the screen; output is limited to relations; the amount of data returned isn't known in advance; limited expressive power (SQL not Turing-complete). Non-interactive: SQL statements are included in an application program written in the host language (e.g., C, Java, Python, etc.).
Server-side application development: Triggers, stored procedures.
1-Tier Architecture: Centralized system: Client computer --> [Presentation Services][Application services] --> DBMS (SQL via API) e.g., any app with an integrated DB --> MS access systems, SQLite, esp. Smartphone apps/games such as a Chess app on a tablet that stores a history of all moves from current/previous games, along with player details.
2-Tier Architecture: Client-server model: Client computer + presentation/app services = client tier; communicate with the server tier (DBMS) via network communication; multiple concurrent users e.g., restaurant point of service (POS) system that tracks bookings/orders/payments has many terminals throughout the restaurant, all of which retrieve and update data via SQL from the central DB in the main office.
3-Tier Architecture: Client machine + presentation services = presentation tier ----> app/web server = application tier -----> DBMS and DB server = data tier; e.g., social media smartphone apps; email web app, which is accessed via a web browser (presentation tier), email contents are stored within a DB (data tier).
SQL in Application Code
Statement-level interface (SLI): Embed SQL in the host language; the app program = mix of host language statements + SQL statements/directives.
Call-level interface: Create a special API to call SQL commands (ODBC, JDBC, PHP-PDO, etc.); SQL statements are passed as arguments to the host language (library); must handle incorrect SQL syntax, integrity constraint violations, network failures.
Cursors: Point to a single row in the resulting set of tuples and fetch attribute values from the current row, then move to the next row in the result set; prevent impedance mismatch wherein the host language deals with variables, but SQL deals with sets of tuples.
DB Application APIs (application programming interface).
Supports: creating a connection to DB, executing statements (queries, updates, etc.), retrieving results of queries, handling null values, handling database exceptions/errors.
JDBC also supports metadata retrieval --> e.g., querying about relations present in the DB; names and types of relation attributes.
Application <---> Driver manager <----> SQL Server Driver <----> SQL Server Database.
JDBC
1. Open a connection: Connection conn = DriverManager.getConnection(DB_URL, userID, pwd).
2. Create a "statement" object: Statement stmt = conn.createStatement();
3. Execute queries using the statement object to send queries: stmt.executeQuery(query).
4. Use cursors wrapped as ResultSet objects to fetch results: ResultSet rs = stmt.executeQuery(query);
5. Exception mechanism to handle errors (catch SQLexception e) {...};
Prepared statements: Allow stmt to be executed with host variables AFTER the query plan has been executed; more flexible and more secure against SQL injection; written in SQL, with placeholders for parameters to be specified later; can re-use the same execution plan for different sets of parameters, saving on query costs.
SQL injection: Infiltrating an SQL database with your own SQL commands; used to execute SQL statements with elevated privileges or to impersonate another user.
Without a direct DB connection (e.g., web app): Injecting SQL via unchecked user input; exploiting buffer overflows --> replace executable code with malicious code; output --> attacker's screen.
With a direct DB connection: SQL injection or buffer overflows in built-in or user-defined procedures.
Stored procedures: Can be used as a central code base in the DB server, which can be shared by multiple applications; can reduce network traffic (both in terms of the number of requests and the size of data transmitted) between client programs and DB servers; do NOT guarantee consistent transactions.
Transactions
Atomic: Transactions execute in an 'all-or-nothing' fashion: either the entire logical unit-of-work is executed, or none of it is.
Consistent: A transaction transforms the database from one consistent state to a new consistent DB state; maintains all integrity/domain constraints, cascades, triggers.
Isolated: Transactions are executed such that they do not see the effects of any other concurrent transaction ---> concurrency control, interleaved execution schedule.
Durable: Changes of a successfully committed transaction are permanent and will not be lost by any subsequent system failure; can rollback to the most recent successful commit if the system fails mid-transaction.
Commit: A transaction requests to finish and make all its changes permanent; though no guarantee to succeed.
Rollback: All data changes of the transaction are undone; DB left in the same consistent state as when the transaction started.
Syntax: BEGIN TRANSACTION; if(cond) COMMIT; else ROLLBACK generally only need one of each per transaction.
Anomalies with Interleaved Execution
Dirty read: Reading data that's been updated but not yet committed (Write-Read conflict).
T1: R(A), W(A), Abort.
T2: R(A), W(A), Commit.
Unrepeatable read: Reading two or more different versions of a value (Read-Write conflict); querying the same values twice yields different results.
T1: R(A), R(A), W(A), commit.
T2: R(A), W(A), commit.