profile pic
⌘ '
raccourcis clavier

See also refresh on SQL and indexes with views, or this slides on SQL DSL, and final review

closure of FF

For a set of FDs FF, then closure F+F^{+} is the set of all FDs that can be derived from FF

not to be confused with closure of an attribute set

rules

RuleDescription
ReflexivityIf YXY \subseteq X, then XYX \to Y
AugmentationIf XYX \to Y, then XZYZXZ \to YZ for any ZZ
TransitivityIf XYX \to Y and YZY \to Z, then XZX \to Z
UnionIf XYX \to Y and XZX \to Z, then XYZX \to YZ
DecompositionIf XYZX \to YZ, then XYX \to Y and XZX \to Z
Lien vers l'original

closure test

Given attribute set YY and FD set FF, we have YF+Y_F^{+} is the closure of YY relative to FF

Or set of all FDs given/implied by YY

Steps:

  • Start: YF+=Y,F=FY_F^{+}=Y, F^{'}=F
  • While there exists a fFf \in F^{'} s.t LHS(F)YF+\text{LHS}(F) \subseteq Y_F^{+}:
    • YF+=YF+RHS(f)Y_F^{+} = Y_F^{+} \cup \text{RHS}(f)
    • F=FfF^{'} = F^{'} - f
  • End: YBBYF+Y \to B \forall B \in Y_F^{+}
Lien vers l'original

minimal basis

The idea is to remove redundant FDs.

for minimal cover for FDs

  • Right sides are single attributes
  • No FDs can be removed, otherwise FF^{'} is no longer a minimal basis
  • No attribute can be removed from a LEFT SIDE

construction:

  1. decompose RHS to single attributes
  2. repeatedly try to remove a FD to see if the remaining FDs are equivalent to the original set
    • or finFtest whether J=(Ff)+\forall f in F^{'} \mid \text{test whether } J=(F^{'}-f)^{+} is equivalent to F+F^{+}
  3. repeatedly try to remove an attribute from LHS to see if the removed attribute can be derived from the remaining FDs.
Lien vers l'original

Design theory

Keys

KK is a key if KK uniquely determines all of RR and no subset of KK does

K is a superkey for relation RR if KK contains a key of RR

see also: keys

functional dependency

Think of it as ”XYX \to Y holds in RR

convention: no braces used for set of attributes, just ABCABC instead of {A,B,C}\{A,B,C\}

properties

  • splitting/combining
  • trival FDs
  • Armstrong’s Axioms

FD are generalisation of keys

superkey: XRX \to R, must include all the attributes of the relation on RHS

trivial

AAABAABCADABCD\begin{aligned} A &\to A \\ AB &\to A \\ ABC &\to AD \coloneqq ABC \to D \end{aligned}

always hold (right side is a subset)

splitting/combining right side of FDs

XA1A2An holds for R X \to A_{1} A_{2} \ldots A_{n} \text{ holds for R }

when each of XA1X \to A_{1}, XA2X \to A_{2}, …, XAnX \to A_{n} holds for RR

ex: ABCA \to BC is equiv to ABA \to B and ACA \to C

ex: AFA \to F and AGA \to G can be written as AFGA \to FG

Armstrong’s Axioms

Given X,Y,ZX,Y,Z are sets of attributes

rules

RuleDescription
ReflexivityIf YXY \subseteq X, then XYX \to Y
AugmentationIf XYX \to Y, then XZYZXZ \to YZ for any ZZ
TransitivityIf XYX \to Y and YZY \to Z, then XZX \to Z
UnionIf XYX \to Y and XZX \to Z, then XYZX \to YZ
DecompositionIf XYZX \to YZ, then XYX \to Y and XZX \to Z

dependency inference

ACA \to C is implied by {AB,BC}\{A \to B, B \to C\}

transitivity

example: Key

List all the keys of R(A,B,C,D)R(A,B,C,D) with the following FDs:

  • BCB \to C
  • BDB \to D

sol:

BC and BD(given)BCD(Union)ABACD(Augmentation)ABABCD(Reflexivity and Union)\begin{aligned} B \to C &\text{ and } B \to D &(\text{given})\\ B &\to CD &(\text{Union})\\ AB &\to ACD &(\text{Augmentation})\\ AB &\to ABCD &(\text{Reflexivity and Union})\\ \end{aligned}

closure test

Given attribute set YY and FD set FF, we have YF+Y_F^{+} is the closure of YY relative to FF

Or set of all FDs given/implied by YY

Steps:

  • Start: YF+=Y,F=FY_F^{+}=Y, F^{'}=F
  • While there exists a fFf \in F^{'} s.t LHS(F)YF+\text{LHS}(F) \subseteq Y_F^{+}:
    • YF+=YF+RHS(f)Y_F^{+} = Y_F^{+} \cup \text{RHS}(f)
    • F=FfF^{'} = F^{'} - f
  • End: YBBYF+Y \to B \forall B \in Y_F^{+}

minimal basis

The idea is to remove redundant FDs.

for minimal cover for FDs

  • Right sides are single attributes
  • No FDs can be removed, otherwise FF^{'} is no longer a minimal basis
  • No attribute can be removed from a LEFT SIDE

construction:

  1. decompose RHS to single attributes
  2. repeatedly try to remove a FD to see if the remaining FDs are equivalent to the original set
    • or finFtest whether J=(Ff)+\forall f in F^{'} \mid \text{test whether } J=(F^{'}-f)^{+} is equivalent to F+F^{+}
  3. repeatedly try to remove an attribute from LHS to see if the removed attribute can be derived from the remaining FDs.

Schema decomposition

goal: avoid redundancy and minimize anomalies (update and deletion) w/o losing information

One can also think of projecting FDs as geometric projections within a given FDs space

good properties to have

  • lossless join decomposition (should be able to reconstruct after decomposed)
  • avoid anomalies (redundant data)
  • preservation: (F1F2Fn)+=F+(F_{1} \cup F_{2} \cup \ldots \cup F_n)^{+} = F^{+}

A lossy decomposition results in the reconstruction of components to include additional information that is not in the original constructions

how can we test for losslessness?

A binary decomposition of R=(R,F)R=(R,F) into R1=(R1,F1)R_{1}=(R_{1},F_{1}) and R2=(R2,F2)R_{2}=(R_{2},F_{2}) is lossless iff:

  1. (R1R2)R1(R_{1} \cap R_{2}) \to R_{1} is the F+F^{+}
  2. (R1R2)R2(R_{1} \cap R_{2}) \to R_{2} is the F+F^{+}

if R1R2R_{1} \cap R_{2} form a superkey of either R1R_{1} or R2R_{2}, then decomposition of RR is lossless

Projection

  • Starts with Fi=F_i = \emptyset
  • For each subset X of RiX \text{ of } R_i
    • Compute X+X^{+}
    • For each attribute AX+A \in X^{+}
      • If AA in RiR_i: add XAX \to A to FiF_i
  • Compute minimal basis of FiF_i

Normal forms

BCNF3NF2NF1NF\text{BCNF} \subseteq 3\text{NF} \subseteq 2\text{NF} \subseteq 1\text{NF}
Normal FormDefinitionKey RequirementsExample of ViolationHow to Fix
First Normal Form (1NF)All columns contain atomic values and there are no repeating groups.- Each cell holds a single value (atomicity)
- No repeating groups or arrays
A column storing multiple phone numbers in a single cell (e.g., “123-4567, 234-5678”).Split the values into separate rows or columns so each cell is atomic.
Second Normal Form (2NF)A 1NF table where every non-key attribute depends on the whole of a composite key.- Already in 1NF
- No partial dependencies on a composite primary key
A table with a composite primary key (e.g., StudentID, CourseID) where a non-key attribute (e.g., StudentName) depends only on StudentID.Move attributes that depend on only part of the key into a separate table.
Third Normal Form (3NF)A 2NF table with no transitive dependencies.- Already in 2NF
- No transitive dependencies (non-key attributes depend only on the key, not on other non-key attributes)
A table where AdvisorOffice depends on AdvisorName, which in turn depends on StudentID.Put attributes like AdvisorName and AdvisorOffice in a separate Advisor table keyed by AdvisorID.
Boyce-Codd Normal Form (BCNF)A stronger version of 3NF where every determinant is a candidate key.- For every functional dependency X → Y, X must be a candidate keyA table where Professor → Course but Professor is not a candidate key.Decompose the table so that every functional dependency has a candidate key as its determinant.

1NF

no multi-valued attributes allowed

idea: think of storing a list of a values in an attributes

counter: Course(name, instructor, [student, email]*)

2NF

non-key attributes depend on candidate keys

idea: consider non-key attribute AA, then there exists an FD XX s.t. XAX \to A and XX is a candidate key

Second normal form, hwere AuthorName is dependent on AuthorID
Second normal form, hwere AuthorName is dependent on AuthorID

3NF

non-prime attribute depend only on candidate keys

idea: consider FD XAX \to A, then either XX is a superkey, or AA is prime (part of a key)

counter: studiostudioAddr\text{studio} \to \text{studioAddr}, where studioAddr depends on studio which is not a candidate key

Three normal form counter example
Three normal form counter example

theorem

It is always possible to convert a schema to lossless join, dependency-preserving 3NF

what you get from 3NF

  • Lossless join
  • dependency preservation
  • anomalies (doesn’t guarantee)

Boyce-Codd normal form (BCNF)

on additional restriction over 3NF where all non-trivial FDs have superkey LHS

theorem

We say a relation RR is in BCNF if XAX \to A is a non-trivial FD that holds in RR, and XX is a superkey 1

what you get from BCNF

  • no dependency preservation (all original FDs should be satisfied)
  • no anomalies
  • Lossless join

decomposition into BCNF

relation RR with FDs FF, look for a BCNF violation XYX \to Y (XX is not a superkey)

  • Compute X+X^{+}
    • find X+X all attributes X^{+} \neq X \neq \text{ all attributes } (XX is a superkey)
  • Replace RR by relations with
    • R1=X+R_{1} = X^{+}
    • R2=R(X+X)=RX+XR_{2} = R - (X^{+} - X) = R - X^{+} \cup X
  • Continue to recursively decompose the two new relations
  • Project given FDs FF onto the two new relations.

Remarque

  1. means AA is not contained in XX

Lien vers l'original

Relational Algebra

OperatorOperationExample
σC\sigma_CSelectionσA=10(R)\sigma_{A=10}(R)
πL\pi_LProjectionπA,B(R)\pi_{A,B}(R)
×\timesCross-ProductR1×R2R_1 \times R_2
\bowtieNatural JoinR1R2R_1 \bowtie R_2
C\bowtie_CTheta JoinR1R1.A=R2.AR2R_1 \bowtie_{R_1.A=R_2.A} R_2
ρR\rho_RRenameρS(R)\rho_S(R)
δ\deltaEliminate Duplicatesδ(R)\delta(R)
τ\tauSort Tuplesτ(R)\tau(R)
γL\gamma_LGrouping & AggregationγA,AVG(B)(R)\gamma_{A,AVG(B)}(R)

selection

idea: picking certain row

R1σC(R2)R_{1} \coloneqq \sigma_C(R_{2})

CC is the condition refers to attribute in R2R_{2}

def: R1R_{1} is all those tuples of R2R_{2} that satisfies C

projection

idea: picking certain column

R1πL(R2)R_{1} \coloneqq \pi_L(R_{2})

LL is the list of attributes from R2R_{2}

R=[AB1234]πA+BC,AA1,AA2(R)=[CA1A2311733]\begin{aligned} R &= \begin{bmatrix} A & B \\ 1 & 2 \\ 3 & 4 \end{bmatrix} \\[8pt] \pi_{A+B \rightarrow C, A \rightarrow A_1, A \rightarrow A_2}(R) &= \begin{bmatrix} C & A_1 & A_2 \\ 3 & 1 & 1 \\ 7 & 3 & 3 \end{bmatrix} \end{aligned}

products

R3R1×R2R_{3} \coloneqq R_{1} \times R_{2}

theta-join

R3R1CR2R_{3} \coloneqq R_{1} \bowtie_C R_{2}

idea: product of R1R_{1} and R2R_{2} then apply σC\sigma_C to results

think of AΘBA \Theta B where Θ=,<, etc.\Theta \coloneqq =, <, \text{ etc.}

natural join

R3R1R2R_{3} \coloneqq R_{1} \bowtie R_{2}
  • equating attributes of the same name
  • projecting out one copy of each pair of equated attributes

renaming

R1ρR1(A1,,An)(R2)R_{1} \coloneqq \rho_{R_{1}(A_{1},\ldots,A_n)}(R_{2})

set operators

union compatible

two relations are said to be union compatible if they have the same set of attributes and types (domain) of the attributes are the same

i.e: Student(sNumber, sName) and Course(cNumber, cName) are not union compatible

bags

definition

modification of a set that allows repetition of elements

Think of {1,2,3,1,1}\{1,2,3,1,1\} is a bags, whereas {1,2,3}\{1,2,3\} is also considered a bag.

in a sense, {1,2,3}\{1,2,3\} happens to also be a set.

Lien vers l'original

Set Operations on Relations

For relations R1R_1 and R2R_2 that are union compatible, here’s how many times a tuple tt appears in the result:

OperationSymbolResult (occurrences of tuple tt)
Union\cupm+nm + n
Intersection\capmin(m,n)\texttt{min}(m,n)
Difference-max(0,mn)\texttt{max}(0, m-n)

where mm is the number of times tt appears in R1R_1 and nn is the number of times it appears in R2R_2.

sequence of assignments

precedence of relational operators:

σπρ×\begin{aligned} &\sigma \quad \pi \quad \rho \\[8pt] & \times \quad \bowtie \\[9pt] & \cap \\ &\cup \quad - \end{aligned}

expression tree

extended algebra

δ\delta: eliminate duplication from bags

τ\tau: sort tuples

γL(R)\gamma_{L}(R) grouping and aggregation

outerjoin: avoid dangling tuples

duplicate elimination

δ(R)\delta(R)

Think of it as converting it to set

sorting

τL(R)\tau_L(R)

with LL is a list of some attributes of RR

basically for ascending order, for descending order then use τL,DESC(R)\tau_{L, \text{DESC}}(R)

applying aggregation

or γL(R)\gamma_{L}(R)

  • group RR accordingly to all grouping attributes on LL

  • per group, compute AGG(A) for each aggrgation on LL

  • result has one tuple for each group: grouping attributes and aggregations

aggregation is applied to an entire column to produce a single results

outerjoin

essentially padding missing attributes with NULL

bag operations

remember that bag and set operations are different

set union is idempotent, whereas bags union is not.rightarrow

bag union: {1,2,1}{1,1,2,3,1}={1,1,1,1,1,2,2,3}\{1,2,1\} \cup \{1,1,2,3,1\} = \{1,1,1,1,1,2,2,3\}

bag intersection: {1,2,1,1}{1,2,1,3}={1,1,2}\{1,2,1,1\} \cap \{1,2,1,3\} = \{1,1,2\}

bag difference: {1,2,1,1}{1,2,3}={1,1}\{1,2,1,1\} - \{1,2,3\} = \{1,1\}

Lien vers l'original

Transaction

see also concurrency

A sequence of read/write

concurrency

inter-leaved processing: concurrent exec is interleaved within a single CPU.

parallel processing: process are concurrently executed on multiple CPUs.

"\\usepackage{tikz}\n\\usetikzlibrary{arrows.meta, positioning}\n\n\\begin{document}\n\\begin{tikzpicture}[font=\\small, node distance=1.5cm, >=latex]\n\n%------------------------------\n% Interleaved Processing\n%------------------------------\n\n% Place the title higher up\n\\node[font=\\bfseries, align=center] (interleavedTitle) at (5, 1) {Interleaved (Time-Sliced) Processing};\n\n% Draw the timeline axis lower down\n\\draw[->] (-0.5,2) -- (10,2) node[below]{Time};\n\n% Processes above the time line\n% P1 intervals\n\\draw[fill=blue!30] (0,2.4) rectangle (2,3) node[midway]{P1};\n\\draw[fill=blue!30] (4,2.4) rectangle (6,3) node[midway]{P1};\n\\draw[fill=blue!30] (8,2.4) rectangle (9.5,3) node[midway]{P1};\n\n% P2 intervals\n\\draw[fill=red!30] (2,2.4) rectangle (4,3) node[midway]{P2};\n\\draw[fill=red!30] (6,2.4) rectangle (8,3) node[midway]{P2};\n\n%------------------------------\n% Parallel Processing\n%------------------------------\n\\begin{scope}[yshift=-1cm]\n\n% Title for parallel processing\n\\node[font=\\bfseries, align=center] (parallelTitle) at (5,-3.2) {Parallel (Concurrent) Processing};\n\n% Timelines for parallel processing\n\\draw[->] (-0.5,-1) -- (10,-1) node[below]{Time};\n\\draw[->] (-0.5,-2.5) -- (10,-2.5) node[below]{Time};\n\n% Process 1 on core 1 (above the -1 line)\n\\draw[fill=blue!30] (0,-0.6) rectangle (9.5,-0.0) node[midway]{P1 running on Core 1};\n\n% Process 2 on core 2 (above the -2.5 line)\n\\draw[fill=red!30] (0,-2.1) rectangle (9.5,-1.5) node[midway]{P2 running on Core 2};\n\\end{scope}\n\n\\end{tikzpicture}\n\\end{document}"Interleaved(Time-Sliced)ProcessingTimeP1P1P1P2P2Parallel(Concurrent)ProcessingTimeTimeP1runningonCore1P2runningonCore2
source code

ACID

  • atomic: either performed in its entirety (DBMS’s responsibility)
  • consistency: must take database from consistent state XX to YY
  • isolation: appear as if they are being executed in isolation
  • durability: changes applied must persist, even in the events of failure

Schedule

Venn diagram for schedule
Venn diagram for schedule

definition

a schedule SS of nn transaction T1,T2,,TnT_{1}, T_{2}, \ldots, T_{n} is an ordering of operations of the transactions subject to the constrain that

For all transaction TiT_i that participates in SS, the operations of TiT_i in SS must appear in the same order in which they occur in TiT_i

For example:

Sa:R1(A),R2(A),W1(A),W2(A),Abort1,Commit2;S_a: R_{1}(A),R_{2}(A),W_{1}(A),W_{2}(A),\text{Abort1},\text{Commit2};
  • serial schedule: does not interleave the actions of different transactions
  • equivalent schedule: effect of executing any schedule are the same

serialisable schedule

a schedule that is equvalent to some serial execution of the set of committed transactions

serial

serialisable schedule

Note that this is not a serial schedule, given there are interleaved operations.
Note that this is not a serial schedule, given there are interleaved operations.

S:R1(A),W1(A),R2(A),W2(A),R1(B),W1(B),R2(B),W2(B)S:R_1(A),W_1(A), R_2(A), W_2(A), R_1(B), W_1(B), R_2(B), W_2(B)

conflict

operations in schedule

said to be in conflict if they satisfy all of the following:

  1. belong to different operations
  2. access the same item AA
  3. at least one of the operations is a write(A)
Concurrency IssueDescriptionAnnotation
Dirty ReadReading uncommitted dataWR
Unrepeatable ReadT2 changes item AA that was previously read by T1, while T1 is still in progressRW
Lost UpdateT2 overwrites item AA while T1 is still in progress, causing T1’s changes to be lostWW

conflict serialisable schedules

Two schedules are conflict equivalent if:

  • involves the same actions of the same transaction
  • every pair of conflicting actions is ordered the same way

Schedule SS is conflict serialisable if SS is conflict equivalent to some serial schedule

If two schedule S1S_{1} and S2S_{2} are conflict equivalent then they have the same effect S1S2S_{1} \leftrightarrow S_{2} by swapping non-conflicting ops

Every conflict serialisable schedule is serialisable

on conflict serialisable

only consider committed transaction

schedule with abort

Note that this schedule is unrecoverable if T2 committed
Note that this schedule is unrecoverable if T2 committed

However, if T2 did not commit, we abort T1 and cascade to T2

need to avoid cascading abort

  • if TiT_i writes an object, then TjT_j can read this only after TiT_i commits

recoverable and avoid cascading aborts

Recoverable: a XactX_\text{act} commits only after all XactX_\text{act} it depends on commits.

ACA: idea of aborting a XactX_\text{act} can be done without cascading the abort to other XactX\text{act}

ACA implies recoverable, not vice versa

precedent graph test

is a schedule conflict-serialisable?

  • build a graph of all transactions TiT_i
  • Edge from TiT_i to TjT_j if TiT_i comes first, and makes an action that conflicts with one of TjT_j

if graphs has no cycle then it is conflict-serialisable

strict

if a value written by TiT_i is not read or overwritten by another TjT_j until TiT_i abort/commit

Are recoverable and ACA

Lock-based concurrency control

think of mutex or a lock mechanism to control access to a data object

transaction must release the lock

notation

Li(A) means TiT_i acquires lock for A, where as Ui(A) releases lock for A

Lock TypeNoneSX
NoneOKOKOK
SOKOKConflict
XOKConflictConflict

lock compatibility matrix

overhead due to delays from blocking; minimize throughput

  • use smallest sized object
  • reduce time hold locks
  • reduce hotspot

shared locks

ST(A)S_T(A) for reading

exclusive lock

XT(A)X_T(A) for write/read

strict two phase locking (Strict 2PL)

  • Each XactX_\text{act} must obtain a S lock on object before reading, and an X lock on object before writing
  • All lock held by transaction will be released when transaction is completed

only schedule those precedence graph is acyclic

recoverable and ACA

Example:

T1T2
L(A);
R(A), W(A)
L(A); DENIED...
L(B);
R(B), W(B)
U(A), U(B)
Commit;
...GRANTED
R(A), W(A)
L(B);
R(B), W(B)
U(A), U(B)
Commit;

implication

  • only allow safe interleavings of transactions
  • T1T_{1} and T2T_{2} access different objects, then no conflict and each may proceed
  • serial action

two phase locking (2PL)

lax version of strict 2PL, where it allow XactX_\text{act} to release locks before the end

  • a transaction cannot request additional lock once it releases any lock

implication

  • all lock requests must precede all unlock request
  • ensure conflict serialisability
  • two phase transaction growing phase, (or obtains lock) and shrinking phase (or release locks)

isolation

Isolation LevelDescription
READ UNCOMMITTEDNo read-lock
READ COMMITTEDShort duration read locks
REPEATABLE READLong duration read/lock on individual items
SERIALIZABLEAll locks long durations

Deadlock

cycle of transactions waiting for locks to be released by each other

usually create a wait-for graph to detect cyclic actions

  • wait-die: lower transactions never wait for higher priority transactions

  • wound-wait: TiT_i is higher priority than TjT_j then TjT_j is aborted and restarts later with same timestamp, otherwise TiT_i waits

Lien vers l'original