introduction to databases

closure of $F$

For a set of FDs $F$, the closure $F^{+}$ is the set of all FDs that can be derived from $F$

not to be confused with closure of an attribute set

rules

Rule	Description
Reflexivity	If $Y \subseteq X$ , then $X \to Y$
Augmentation	If $X \to Y$ , then $XZ \to YZ$ for any $Z$
Transitivity	If $X \to Y$ and $Y \to Z$ , then $X \to Z$
Union	If $X \to Y$ and $X \to Z$ , then $X \to YZ$
Decomposition	If $X \to YZ$ , then $X \to Y$ and $X \to Z$

closure test

Given attribute set $Y$ and FD set $F$, $Y_F^{+}$ is the closure of $Y$ relative to $F$

That is, the set of all attributes $B$ such that $Y \to B$ is given or implied by $F$

Steps:

Start: $Y_F^{+}=Y, F^{'}=F$
While there exists an $f \in F^{'}$ s.t. $\text{LHS}(f) \subseteq Y_F^{+}$:
- $Y_F^{+} = Y_F^{+} \cup \text{RHS}(f)$
- $F^{'} = F^{'} - \{f\}$
End: $Y \to B$ holds for all $B \in Y_F^{+}$
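The steps above can be sketched in Python. This is a minimal sketch, not a reference implementation: the name `closure`, the encoding of an FD as a `(lhs, rhs)` pair of strings, and single-letter attribute names are all assumptions made for illustration.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds`.

    Assumed encoding: each FD is a (lhs, rhs) pair of strings of
    single-letter attributes, e.g. ("CD", "E") stands for CD -> E.
    """
    result = set(attrs)                                   # Start: Y+ = Y
    remaining = [(set(l), set(r)) for l, r in fds]        # F' = F
    changed = True
    while changed:
        changed = False
        for fd in list(remaining):
            lhs, rhs = fd
            if lhs <= result:          # LHS(f) is contained in Y+
                result |= rhs          # Y+ = Y+ U RHS(f)
                remaining.remove(fd)   # F' = F' - {f}
                changed = True
    return result


F = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(closure("A", F)))   # ['A', 'B', 'C']
print(sorted(closure("AD", F)))  # ['A', 'B', 'C', 'D', 'E']
```

Each FD is used at most once, so the loop terminates after at most one pass per FD.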

minimal basis

The idea is to remove redundant FDs.

A minimal cover (minimal basis) $F^{'}$ for a set of FDs satisfies:

Right sides are single attributes

No FD can be removed, otherwise $F^{'}$ is no longer equivalent to the original set

No attribute can be removed from a LEFT SIDE

construction:

decompose RHS to single attributes
repeatedly try to remove an FD to see if the remaining FDs are equivalent to the original set
- or $\forall f \in F^{'}$, test whether $J=(F^{'}-\{f\})^{+}$ is equivalent to $F^{+}$
repeatedly try to remove an attribute from a LHS to see if the RHS can still be derived from the reduced LHS
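A rough sketch of this construction, under the same assumptions as before (FDs as `(lhs, rhs)` string pairs, single-letter attributes). `minimal_basis` is a hypothetical helper name, and the result is one minimal basis among possibly several, depending on the order in which FDs and attributes are tried.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_basis(fds):
    # 1. decompose right sides into single attributes
    basis = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    changed = True
    while changed:
        changed = False
        basis = list(dict.fromkeys(basis))  # drop exact duplicates, keep order
        # 2. drop an FD if the remaining FDs still imply it
        for fd in list(basis):
            rest = [g for g in basis if g != fd]
            if fd[1] in closure(fd[0], rest):
                basis = rest
                changed = True
        # 3. drop an LHS attribute if the RHS is still derivable without it
        for i, (lhs, a) in enumerate(basis):
            for b in sorted(lhs):
                smaller = lhs - {b}
                if smaller and a in closure(smaller, basis):
                    basis[i] = (smaller, a)
                    changed = True
                    break
    return basis


F = [("A", "B"), ("B", "C"), ("A", "C"), ("AB", "D")]
for lhs, a in minimal_basis(F):
    print("".join(sorted(lhs)), "->", a)
# A -> B
# B -> C
# A -> D
```

Here $A \to C$ is dropped because it follows by transitivity, and $AB \to D$ shrinks to $A \to D$ because $A^{+}$ already contains $D$.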

Keys

$K$ is a key if $K$ uniquely determines all of $R$ and no proper subset of $K$ does

$K$ is a superkey for relation $R$ if $K$ contains a key of $R$

see also: keys

functional dependency

Think of it as "$X \to Y$ holds in $R$": any two tuples of $R$ that agree on $X$ also agree on $Y$

convention: no braces used for set of attributes, just $ABC$ instead of $\{A,B,C\}$
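To make "holds in $R$" concrete, here is a small sketch that checks an FD against one relation instance. The function `holds` and the sample rows are made up for illustration; note that an instance can only refute an FD, it cannot prove that the FD holds for every legal instance.

```python
def holds(rows, lhs, rhs):
    """True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time, then returns the stored value
        if seen.setdefault(key, val) != val:
            return False
    return True


rows = [
    {"sid": 1, "name": "Ann", "course": "DB"},
    {"sid": 1, "name": "Ann", "course": "OS"},
    {"sid": 2, "name": "Bob", "course": "DB"},
]
print(holds(rows, ["sid"], ["name"]))    # True
print(holds(rows, ["sid"], ["course"]))  # False
```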

properties

splitting/combining

trivial FDs

Armstrong’s Axioms

FDs are a generalisation of keys

superkey: $X \to R$, the RHS must include all the attributes of the relation

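trivial

$$
\begin{aligned} A &\to A \\ AB &\to A \end{aligned}
$$

always hold (right side is a subset of the left side)

a partly trivial FD can be simplified by dropping the RHS attributes that already appear on the left: $ABC \to AD$ is equivalent to $ABC \to D$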

splitting/combining right side of FDs

$X \to A_{1} A_{2} \ldots A_{n}$ holds for $R$ exactly when each of $X \to A_{1}$, $X \to A_{2}$, …, $X \to A_{n}$ holds for $R$

ex: $A \to BC$ is equiv to $A \to B$ and $A \to C$

ex: $A \to F$ and $A \to G$ can be written as $A \to FG$

Armstrong’s Axioms

Given sets of attributes $X,Y,Z$, the rules are the ones tabulated under closure of $F$ above. Reflexivity, augmentation, and transitivity are the axioms proper; union and decomposition are derived from them.

dependency inference

$A \to C$ is implied by $\{A \to B, B \to C\}$ (by transitivity)

example: Key

List all the keys of $R(A,B,C,D)$ with the following FDs:

$B \to C$
$B \to D$

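sol:

$$
\begin{aligned} B &\to C \text{ and } B \to D &&(\text{given})\\ B &\to CD &&(\text{Union})\\ AB &\to ACD &&(\text{Augmentation})\\ AB &\to ABCD &&(\text{Reflexivity and Union}) \end{aligned}
$$

So $AB$ is a superkey. Neither $A$ nor $B$ appears on the right side of any FD, so every key must contain both; $A^{+}=A$ and $B^{+}=BCD$ show that neither alone suffices. Hence $AB$ is the only key.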
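The same answer can be cross-checked by brute force: try attribute subsets from smallest to largest and keep those whose closure is all of $R$ and that contain no smaller key. A sketch, assuming single-letter attributes and FDs as `(lhs, rhs)` string pairs; `candidate_keys` is a hypothetical name, and the search is exponential in the number of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key: superkey, not a key
            if closure(cand, fds) >= set(attrs):
                keys.append(cand)
    return keys


keys = candidate_keys("ABCD", [("B", "C"), ("B", "D")])
print(["".join(sorted(k)) for k in keys])  # ['AB']
```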

Schema decomposition

goal: avoid redundancy and minimize anomalies (update and deletion) w/o losing information

One can also think of projecting FDs as geometric projections within a given FDs space

good properties to have

lossless join decomposition (should be able to reconstruct after decomposed)

avoid anomalies (redundant data)

dependency preservation: $(F_{1} \cup F_{2} \cup \ldots \cup F_n)^{+} = F^{+}$, where $F_i$ is the projection of $F$ onto the $i$-th relation

information loss with decomposition

Decompose $R$ into $S$ and $T$

consider FD $A \to B$ with $A$ only in $S$ and $B$ only in $T$

FD loss

Attributes $A$ and $B$ are not in the same relation (thus we must join $T$ and $S$ to enforce $A \to B$, which is expensive)

Join loss

neither $(S \cap T) \to S$ nor $(S \cap T) \to T$ is in $F^{+}$

Joining the components of a lossy decomposition yields spurious tuples that were not in the original relation

how can we test for losslessness?

A binary decomposition of $R=(R,F)$ into $R_{1}=(R_{1},F_{1})$ and $R_{2}=(R_{2},F_{2})$ is lossless iff at least one of the following holds:

$(R_{1} \cap R_{2}) \to R_{1}$ is in $F^{+}$

$(R_{1} \cap R_{2}) \to R_{2}$ is in $F^{+}$

i.e. if $R_{1} \cap R_{2}$ forms a superkey of either $R_{1}$ or $R_{2}$, then the decomposition of $R$ is lossless
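Since $X \to Y$ is in $F^{+}$ exactly when $Y \subseteq X^{+}$, the test reduces to one attribute closure. A minimal sketch, with the usual assumptions (single-letter attributes, FDs as `(lhs, rhs)` string pairs); `is_lossless` is a hypothetical name.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: (R1 n R2)+ must cover R1 or R2."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


F = [("A", "B")]                   # R(A, B, C) with A -> B
print(is_lossless("AB", "AC", F))  # True:  A+ = AB covers R1
print(is_lossless("AB", "BC", F))  # False: B+ = B covers neither
```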

Projection

To project $F$ onto a relation $R_i$ of the decomposition:

Start with $F_i = \emptyset$
For each subset $X$ of $R_i$
- Compute $X^{+}$ with respect to $F$
- For each attribute $A \in X^{+}$
  - If $A$ is in $R_i$: add $X \to A$ to $F_i$
Compute the minimal basis of $F_i$
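A sketch of the projection loop, with the same assumed encoding as the other examples. `project_fds` is a hypothetical name; the sketch skips trivial FDs ($A \in X$) and omits the final minimal-basis step, so the output may still contain redundant FDs.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(fds, sub):
    """Non-trivial FDs X -> A implied by `fds` with X and A inside `sub`."""
    sub = set(sub)
    projected = []
    for size in range(1, len(sub) + 1):
        for combo in combinations(sorted(sub), size):
            x = set(combo)
            # attributes of R_i determined by X, minus X itself
            for a in sorted((closure(x, fds) & sub) - x):
                projected.append(("".join(combo), a))
    return projected


F = [("A", "B"), ("B", "C")]
print(project_fds(F, "AC"))  # [('A', 'C')]
```

The FD $A \to C$ appears in neither input FD directly; it survives the projection because the closure is taken with respect to the full $F$.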

Normal forms

$$
\text{BCNF} \subseteq 3\text{NF} \subseteq 2\text{NF} \subseteq 1\text{NF}
$$

Normal Form	Definition	Key Requirements	Example of Violation	How to Fix
First Normal Form (1NF)	All columns contain atomic values and there are no repeating groups.	- Each cell holds a single value (atomicity) - No repeating groups or arrays	A column storing multiple phone numbers in a single cell (e.g., “123-4567, 234-5678”).	Split the values into separate rows or columns so each cell is atomic.
Second Normal Form (2NF)	A 1NF table where every non-key attribute depends on the whole of a composite key.	- Already in 1NF - No partial dependencies on a composite primary key	A table with a composite primary key (e.g., StudentID, CourseID) where a non-key attribute (e.g., StudentName) depends only on StudentID.	Move attributes that depend on only part of the key into a separate table.
Third Normal Form (3NF)	A 2NF table with no transitive dependencies.	- Already in 2NF - No transitive dependencies (non-key attributes depend only on the key, not on other non-key attributes)	A table where AdvisorOffice depends on AdvisorName, which in turn depends on StudentID.	Put attributes like AdvisorName and AdvisorOffice in a separate Advisor table keyed by AdvisorID.
Boyce-Codd Normal Form (BCNF)	A stronger version of 3NF where every determinant is a superkey.	- For every non-trivial functional dependency X → Y, X must be a superkey	A table where Professor → Course but Professor is not a superkey.	Decompose the table so that every non-trivial functional dependency has a superkey as its determinant.

1NF

no multi-valued attributes allowed

idea: think of storing a list of values in a single attribute

counterexample: Course(name, instructor, [student, email]*)

2NF

non-key attributes depend on whole candidate keys

idea: for every non-key attribute $A$, no proper subset $X$ of a candidate key has $X \to A$

figure1: Second normal form, where AuthorName is dependent on AuthorID

3NF

non-prime attributes depend only on candidate keys

idea: for every non-trivial FD $X \to A$, either $X$ is a superkey, or $A$ is prime (part of a key)

counterexample: $\text{studio} \to \text{studioAddr}$, where studioAddr depends on studio, which is not a superkey, and studioAddr is not prime

figure2: Third normal form counterexample

theorem

It is always possible to convert a schema to lossless join, dependency-preserving 3NF

what you get from 3NF

Lossless join

dependency preservation

no anomalies (not guaranteed)

Boyce-Codd normal form (BCNF)

an additional restriction over 3NF where all non-trivial FDs have a superkey LHS

definition

We say a relation $R$ is in BCNF if, whenever $X \to A$ is a non-trivial FD that holds in $R$, $X$ is a superkey ¹

what you get from BCNF

dependency preservation is not guaranteed (some original FDs may no longer be enforceable within a single relation)

no anomalies

Lossless join

decomposition into BCNF

relation $R$ with FDs $F$, look for a BCNF violation $X \to Y$ (non-trivial, and $X$ is not a superkey)

Compute $X^{+}$
- check that $X^{+} \neq X$ and $X^{+} \neq \text{all attributes}$ (otherwise $X \to Y$ is trivial or $X$ is a superkey)
Replace $R$ by relations with
- $R_{1} = X^{+}$
- $R_{2} = R - (X^{+} - X) = (R - X^{+}) \cup X$
Project given FDs $F$ onto the two new relations.
Continue to recursively decompose the two new relations
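A compact sketch of the recursion. It relies on the fact that, for $X \subseteq R_i$, the closure of $X$ under the projected FDs equals $X^{+} \cap R_i$ computed from the original $F$, so the projection step is folded into the closure instead of being materialised. `bcnf` is a hypothetical name; attributes are assumed single-letter, and the result depends on which violation is found first.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf(rel, fds):
    """Decompose `rel` into a list of BCNF relations (frozensets)."""
    rel = frozenset(rel)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & rel
            if x_plus != x and x_plus != rel:   # X is a BCNF violation
                r1 = x_plus                     # R1 = X+
                r2 = (rel - x_plus) | x         # R2 = (R - X+) U X
                return bcnf(r1, fds) + bcnf(r2, fds)
    return [rel]                                # no violation: already BCNF


parts = bcnf("ABCD", [("B", "C"), ("B", "D")])
print(["".join(sorted(p)) for p in parts])  # ['BCD', 'AB']
```

Both parts are strictly smaller than the relation they came from, so the recursion terminates.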

¹ non-trivial means $A$ is not contained in $X$

Relational Algebra

Operator	Operation	Example
$\sigma_C$	Selection	$\sigma_{A=10}(R)$
$\pi_L$	Projection	$\pi_{A,B}(R)$
$\times$	Cross-Product	$R_1 \times R_2$
$\bowtie$	Natural Join	$R_1 \bowtie R_2$
$\bowtie_C$	Theta Join	$R_1 \bowtie_{R_1.A=R_2.A} R_2$
$\rho_R$	Rename	$\rho_S(R)$
$\delta$	Eliminate Duplicates	$\delta(R)$
$\tau_L$	Sort Tuples	$\tau_{A}(R)$
$\gamma_L$	Grouping & Aggregation	$\gamma_{A,AVG(B)}(R)$

selection

idea: picking certain rows

$$
R_{1} \coloneqq \sigma_C(R_{2})
$$

$C$ is a condition that refers to attributes of $R_{2}$

def: $R_{1}$ is all those tuples of $R_{2}$ that satisfy $C$

projection

idea: picking certain columns

$$
R_{1} \coloneqq \pi_L(R_{2})
$$

$L$ is the list of attributes from $R_{2}$; in the extended form, $L$ may also contain expressions and renamings

$$
\begin{aligned} R &= \begin{bmatrix} A & B \\ 1 & 2 \\ 3 & 4 \end{bmatrix} \\[8pt] \pi_{A+B \rightarrow C, A \rightarrow A_1, A \rightarrow A_2}(R) &= \begin{bmatrix} C & A_1 & A_2 \\ 3 & 1 & 1 \\ 7 & 3 & 3 \end{bmatrix} \end{aligned}
$$
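Selection and (extended) projection can be mimicked over a list of dicts, one dict per tuple. The helpers `select` and `project` below are illustrative names, not a library API; the list representation keeps duplicates, i.e. it behaves like a bag.

```python
R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]


def select(rel, cond):
    """sigma_C: keep the tuples that satisfy the condition."""
    return [t for t in rel if cond(t)]


def project(rel, exprs):
    """Extended pi_L: `exprs` maps each output attribute to a function of the tuple."""
    return [{name: f(t) for name, f in exprs.items()} for t in rel]


print(select(R, lambda t: t["A"] > 1))
# [{'A': 3, 'B': 4}]

print(project(R, {
    "C": lambda t: t["A"] + t["B"],   # A+B -> C
    "A1": lambda t: t["A"],           # A -> A1
    "A2": lambda t: t["A"],           # A -> A2
}))
# [{'C': 3, 'A1': 1, 'A2': 1}, {'C': 7, 'A1': 3, 'A2': 3}]
```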

products

$$
R_{3} \coloneqq R_{1} \times R_{2}
$$

theta-join

$$
R_{3} \coloneqq R_{1} \bowtie_C R_{2}
$$

idea: take the product of $R_{1}$ and $R_{2}$, then apply $\sigma_C$ to the result

$C$ is a condition of the form $A \mathrel{\theta} B$ where $\theta \in \{=, <, \text{etc.}\}$

natural join

$$
R_{3} \coloneqq R_{1} \bowtie R_{2}
$$

equating attributes of the same name
projecting out one copy of each pair of equated attributes
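Both joins can be sketched over lists of dicts. The function names and sample relations are made up for illustration, and `theta_join` assumes the two relations have disjoint attribute names (otherwise a rename would be needed first).

```python
def theta_join(r1, r2, cond):
    """Product followed by selection; assumes disjoint attribute names."""
    return [{**s, **t} for s in r1 for t in r2 if cond({**s, **t})]


def natural_join(r1, r2):
    out = []
    for s in r1:
        for t in r2:
            common = s.keys() & t.keys()          # attributes of the same name
            if all(s[a] == t[a] for a in common):
                out.append({**s, **t})            # merging keeps one copy of each
    return out


student = [{"sid": 1, "name": "Ann"}, {"sid": 2, "name": "Bob"}]
enrol = [{"sid": 1, "course": "DB"}, {"sid": 1, "course": "OS"}]
print(natural_join(student, enrol))
# [{'sid': 1, 'name': 'Ann', 'course': 'DB'}, {'sid': 1, 'name': 'Ann', 'course': 'OS'}]

print(theta_join([{"A": 1}, {"A": 5}], [{"B": 3}], lambda t: t["A"] < t["B"]))
# [{'A': 1, 'B': 3}]
```

Bob has no matching enrolment, so he is a dangling tuple and does not appear in the natural join.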

renaming

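$$
R_{1} \coloneqq \rho_{R_{1}(A_{1},\ldots,A_n)}(R_{2})
$$

makes $R_{1}$ a relation with the same tuples as $R_{2}$ and attributes named $A_{1},\ldots,A_n$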

set operators

union compatible

two relations are said to be union compatible if they have the same set of attributes and the types (domains) of corresponding attributes are the same

e.g. Student(sNumber, sName) and Course(cNumber, cName) are not union compatible

bags

definition

a bag (multiset) is a modification of a set that allows repetition of elements

$\{1,2,3,1,1\}$ is a bag, and $\{1,2,3\}$ is also considered a bag.

in a sense, $\{1,2,3\}$ happens to also be a set.

Set Operations on Relations

For relations $R_1$ and $R_2$ that are union compatible, under bag semantics a tuple $t$ appears in the result the following number of times:

Operation	Symbol	Result (occurrences of tuple $t$ )
Union	$\cup$	$m + n$
Intersection	$\cap$	$\texttt{min}(m,n)$
Difference	$-$	$\texttt{max}(0, m-n)$

where $m$ is the number of times $t$ appears in $R_1$ and $n$ is the number of times it appears in $R_2$ .

sequence of assignments

precedence of relational operators (highest first):

$$
\begin{aligned} &\sigma \quad \pi \quad \rho \\[8pt] & \times \quad \bowtie \\[9pt] & \cap \\ &\cup \quad - \end{aligned}
$$

expression tree

extended algebra

$\delta$ : eliminate duplicates from bags

$\tau$ : sort tuples

$\gamma_{L}(R)$ grouping and aggregation

outerjoin: avoid dangling tuples

duplicate elimination

$$
\delta(R)
$$

Think of it as converting a bag to a set

sorting

$$
\tau_L(R)
$$

where $L$ is a list of some attributes of $R$

sorts in ascending order by default; for descending order use $\tau_{L, \text{DESC}}(R)$

applying aggregation

written $\gamma_{L}(R)$

group $R$ according to all the grouping attributes in $L$
per group, compute AGG(A) for each aggregation in $L$
result has one tuple for each group: grouping attributes and aggregations

an aggregation is applied to an entire column to produce a single result
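The three steps can be sketched as follows. `gamma` and its argument layout (`aggs` maps an output name to a `(function, attribute)` pair) are assumptions made for this example, not standard notation.

```python
from collections import defaultdict


def gamma(rel, group_by, aggs):
    """Group `rel` by the attributes in `group_by`, then aggregate each group."""
    groups = defaultdict(list)
    for t in rel:
        groups[tuple(t[a] for a in group_by)].append(t)   # step 1: group
    out = []
    for key, rows in groups.items():
        row = dict(zip(group_by, key))                    # grouping attributes
        for name, (fn, attr) in aggs.items():
            row[name] = fn([r[attr] for r in rows])       # step 2: AGG(A) per group
        out.append(row)                                   # step 3: one tuple per group
    return out


R = [{"A": 1, "B": 2}, {"A": 1, "B": 4}, {"A": 2, "B": 6}]
print(gamma(R, ["A"], {"sumB": (sum, "B"), "maxB": (max, "B")}))
# [{'A': 1, 'sumB': 6, 'maxB': 4}, {'A': 2, 'sumB': 6, 'maxB': 6}]
```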

outerjoin

essentially padding missing attributes with NULL

bag operations

remember that bag and set operations are different

set union is idempotent, whereas bag union is not: $\{1\} \cup \{1\}$ is $\{1\}$ as sets but $\{1,1\}$ as bags.

bag union: $\{1,2,1\} \cup \{1,1,2,3,1\} = \{1,1,1,1,1,2,2,3\}$

bag intersection: $\{1,2,1,1\} \cap \{1,2,1,3\} = \{1,1,2\}$

bag difference: $\{1,2,1,1\} - \{1,2,3\} = \{1,1\}$
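Python's `collections.Counter` follows the same multiplicity rules ($m+n$, $\min(m,n)$, $\max(0, m-n)$) for its `+`, `&`, and `-` operators, so the three examples above can be replayed directly. The helper `bag` is only a display convenience.

```python
from collections import Counter


def bag(counter):
    """Expand a Counter back into a sorted list of elements."""
    return sorted(counter.elements())


union = Counter([1, 2, 1]) + Counter([1, 1, 2, 3, 1])         # m + n
intersection = Counter([1, 2, 1, 1]) & Counter([1, 2, 1, 3])  # min(m, n)
difference = Counter([1, 2, 1, 1]) - Counter([1, 2, 3])       # max(0, m - n)

print(bag(union))         # [1, 1, 1, 1, 1, 2, 2, 3]
print(bag(intersection))  # [1, 1, 2]
print(bag(difference))    # [1, 1]

# duplicate elimination (delta) amounts to converting the bag to a set
print(sorted(set([1, 2, 3, 1, 1])))  # [1, 2, 3]
```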

concurrency

interleaved processing: concurrent execution is interleaved within a single CPU.

parallel processing: processes are concurrently executed on multiple CPUs.

figure: timelines contrasting interleaved (time-sliced) processing, where P1 and P2 alternate on a single CPU, with parallel processing, where P1 runs on Core 1 and P2 runs on Core 2

ACID

atomicity: either performed in its entirety or not performed at all (DBMS’s responsibility)
consistency: must take the database from a consistent state $X$ to a consistent state $Y$
isolation: transactions appear as if they are being executed in isolation
durability: changes applied must persist, even in the event of failure

Schedule

figure1: Venn diagram for schedule

definition

a schedule $S$ of $n$ transactions $T_{1}, T_{2}, \ldots, T_{n}$ is an ordering of the operations of the transactions subject to the constraint that

For every transaction $T_i$ that participates in $S$, the operations of $T_i$ in $S$ must appear in the same order in which they occur in $T_i$

For example:

$$
S_a: R_{1}(A),R_{2}(A),W_{1}(A),W_{2}(A),\text{Abort}_1,\text{Commit}_2
$$

serial schedule: does not interleave the actions of different transactions
equivalent schedules: for any database state, the effect of executing the first schedule is identical to the effect of executing the second

serialisable schedule

a schedule that is equivalent to some serial execution of the set of committed transactions

figure2: a serialisable schedule. Note that this is not a serial schedule, given there are interleaved operations.

$S:R_1(A),W_1(A), R_2(A), W_2(A), R_1(B), W_1(B), R_2(B), W_2(B)$

conflict

two operations in a schedule are said to be in conflict if they satisfy all of the following:

belong to different transactions

access the same item $A$

at least one of the operations is a write(A)

Concurrency Issue	Description	Annotation
Dirty Read	Reading uncommitted data	WR
Unrepeatable Read	T2 changes item $A$ that was previously read by T1, while T1 is still in progress	RW
Lost Update	T2 overwrites item $A$ while T1 is still in progress, causing T1’s changes to be lost	WW

conflict serialisable schedules

Two schedules are conflict equivalent if:

they involve the same actions of the same transactions
every pair of conflicting actions is ordered the same way

Schedule $S$ is conflict serialisable if $S$ is conflict equivalent to some serial schedule

If two schedules $S_{1}$ and $S_{2}$ are conflict equivalent then they have the same effect, since $S_{1}$ can be transformed into $S_{2}$ by swapping adjacent non-conflicting ops

Every conflict serialisable schedule is serialisable

on conflict serialisable

only consider committed transactions

schedule with abort

figure3: Note that this schedule is unrecoverable if T2 committed

However, if T2 did not commit, we abort T1 and cascade the abort to T2

to avoid cascading aborts:

if $T_i$ writes an object, then $T_j$ can read it only after $T_i$ commits

recoverable and avoid cascading aborts

Recoverable: a $X_\text{act}$ commits only after all $X_\text{act}$ it depends on commit.

ACA (avoids cascading aborts): aborting a $X_\text{act}$ can be done without cascading the abort to other $X_\text{act}$

ACA implies recoverable, but not vice versa

precedence graph test

is a schedule conflict-serialisable?

build a graph with one node per transaction $T_i$
add an edge from $T_i$ to $T_j$ if an action of $T_i$ comes first and conflicts with a later action of $T_j$

the schedule is conflict-serialisable iff the graph has no cycle
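A sketch of the test, assuming a schedule is given as a list of `(transaction, op, item)` triples with `op` either `"R"` or `"W"`, and that aborted transactions have already been filtered out. The function name and the two sample schedules are illustrative only.

```python
def conflict_serialisable(schedule):
    # edge Ti -> Tj for every conflicting pair where Ti's action comes first
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges.add((ti, tj))
    # acyclicity check: repeatedly remove nodes that have no incoming edge
    nodes = {t for t, _, _ in schedule}
    while nodes:
        free = {n for n in nodes
                if not any(v == n and u in nodes for u, v in edges)}
        if not free:
            return False   # every remaining node sits on a cycle
        nodes -= free
    return True


s1 = [(1, "R", "A"), (1, "W", "A"), (2, "R", "A"), (2, "W", "A"),
      (1, "R", "B"), (1, "W", "B"), (2, "R", "B"), (2, "W", "B")]
s2 = [(1, "R", "A"), (2, "R", "A"), (1, "W", "A"), (2, "W", "A")]
print(conflict_serialisable(s1))  # True  (only edge: T1 -> T2)
print(conflict_serialisable(s2))  # False (T1 -> T2 and T2 -> T1)
```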

strict

a schedule is strict if a value written by $T_i$ is not read or overwritten by another $T_j$ until $T_i$ aborts or commits

strict schedules are recoverable and ACA

Lock-based concurrency control

think of mutex or a lock mechanism to control access to a data object

transaction must release the lock

notation

$L_i(A)$ means $T_i$ acquires a lock on $A$, whereas $U_i(A)$ means $T_i$ releases its lock on $A$

Lock Type	None	S	X
None	OK	OK	OK
S	OK	OK	Conflict
X	OK	Conflict	Conflict

lock compatibility matrix
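The matrix translates into a lookup table. A minimal sketch: `can_grant` is a hypothetical helper that decides whether a requested lock mode is compatible with the modes other transactions already hold on the same object.

```python
# (held, requested) -> compatible?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held):
    """`held` lists the lock modes other transactions hold on the object."""
    return all(COMPATIBLE[(h, requested)] for h in held)


print(can_grant("S", ["S", "S"]))  # True:  readers can share
print(can_grant("X", ["S"]))       # False: a writer must wait for readers
print(can_grant("X", []))          # True:  nothing held, nothing conflicts
```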

blocking adds overhead through delays and reduces throughput; to limit this:

lock the smallest-sized object
reduce the time locks are held
reduce hotspots

shared locks

$S_T(A)$ for reading

exclusive lock

$X_T(A)$ for writing (also permits reading)

strict two phase locking (Strict 2PL)

Each $X_\text{act}$ must obtain an S lock on an object before reading, and an X lock on an object before writing
All locks held by a transaction are released when the transaction completes

allows only schedules whose precedence graph is acyclic

the resulting schedules are recoverable and ACA

Example:

T1	T2
L(A);
R(A), W(A)
	L(A); DENIED...
L(B);
R(B), W(B)
U(A), U(B)
Commit;
	...GRANTED
	R(A), W(A)
	L(B);
	R(B), W(B)
	U(A), U(B)
	Commit;

implication

only allows safe interleavings of transactions

if $T_{1}$ and $T_{2}$ access different objects, then there is no conflict and each may proceed

otherwise, their conflicting actions are ordered serially

two phase locking (2PL)

lax version of strict 2PL, which allows a $X_\text{act}$ to release locks before the end

a transaction cannot request additional locks once it releases any lock

implication

all lock requests must precede all unlock requests

ensures conflict serialisability

two phases per transaction: a growing phase (obtains locks) and a shrinking phase (releases locks)
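The two-phase rule is easy to check on the lock/unlock actions of a single transaction. A sketch, assuming the actions are given in order as `("L", item)` and `("U", item)` pairs, mirroring the L/U notation above; `is_two_phase` is a hypothetical name.

```python
def is_two_phase(ops):
    """True if no lock is requested after the first unlock."""
    shrinking = False
    for kind, _item in ops:
        if kind == "U":
            shrinking = True          # shrinking phase has started
        elif kind == "L" and shrinking:
            return False              # lock request after an unlock
    return True


print(is_two_phase([("L", "A"), ("L", "B"), ("U", "A"), ("U", "B")]))  # True
print(is_two_phase([("L", "A"), ("U", "A"), ("L", "B"), ("U", "B")]))  # False
```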

isolation

Isolation Level	Description
`READ UNCOMMITTED`	No read-lock
`READ COMMITTED`	Short duration read locks
`REPEATABLE READ`	Long duration read locks on individual items
`SERIALIZABLE`	All locks long durations

Deadlock

cycle of transactions waiting for locks to be released by each other

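a wait-for graph is usually built to detect such cycles

prevention, when $T_i$ requests a lock held by $T_j$:

wait-die: if $T_i$ has higher priority than $T_j$ then $T_i$ waits, otherwise $T_i$ aborts; lower priority transactions never wait for higher priority transactions
wound-wait: if $T_i$ has higher priority than $T_j$ then $T_j$ is aborted and restarts later with the same timestamp, otherwise $T_i$ waits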
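A sketch of both ideas. The first function detects a deadlock as a cycle in a wait-for graph; the other two encode the prevention policies. All names are illustrative, and the sketch assumes that a smaller timestamp means an older transaction with higher priority, which is a common convention rather than something fixed by these notes.

```python
def has_deadlock(waits_for):
    """`waits_for` maps a transaction to the set of transactions it waits on."""
    visiting, done = set(), set()

    def dfs(t):
        if t in done:
            return False
        if t in visiting:
            return True                # back edge: cycle found
        visiting.add(t)
        found = any(dfs(u) for u in waits_for.get(t, ()))
        visiting.discard(t)
        done.add(t)
        return found

    return any(dfs(t) for t in waits_for)


def wait_die(requester_ts, holder_ts):
    # higher-priority (older) requester waits; lower-priority requester dies
    return "wait" if requester_ts < holder_ts else "abort requester"


def wound_wait(requester_ts, holder_ts):
    # higher-priority (older) requester wounds the holder; otherwise it waits
    return "abort holder" if requester_ts < holder_ts else "wait"


print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # True
print(has_deadlock({"T1": {"T2"}, "T2": set()}))   # False
print(wait_die(1, 2), "/", wound_wait(1, 2))       # wait / abort holder
```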

Transaction

A transaction is a sequence of read/write operations; see also concurrency

concurrency

interleaved processing: concurrent execution of processes is interleaved within a single CPU.

parallel processing: processes are executed concurrently on multiple CPUs.

figure: interleaved (time-sliced) processing, with P1 and P2 alternating along a single timeline, versus parallel (concurrent) processing, with P1 running on Core 1 and P2 running on Core 2

ACID

atomic: a transaction is either performed in its entirety or not performed at all (DBMS’s responsibility)

consistency: a transaction must take the database from one consistent state $X$ to another consistent state $Y$

isolation: transactions appear as if they are being executed in isolation

durability: changes applied by a committed transaction must persist, even in the event of failure

Schedule

figure1: Venn diagram for schedule

definition

a schedule $S$ of $n$ transactions $T_{1}, T_{2}, \ldots, T_{n}$ is an ordering of the operations of the transactions, subject to the constraint that

For every transaction $T_i$ that participates in $S$ , the operations of $T_i$ in $S$ must appear in the same order in which they occur in $T_i$

For example:
$S_a: R_{1}(A),R_{2}(A),W_{1}(A),W_{2}(A),\text{Abort1},\text{Commit2};$

serial schedule: does not interleave the actions of different transactions

equivalent schedules: for any database state, the effect of executing the first schedule is the same as the effect of executing the second
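
As a small illustration of the serial-schedule definition, the sketch below checks whether the transaction ids of a schedule form contiguous blocks. The function name `is_serial` and the representation (one transaction id per operation, in schedule order) are assumptions of this example.

```python
def is_serial(txn_ids):
    """True if no transaction resumes after another one has started,
    i.e. the actions of different transactions are not interleaved."""
    finished = set()
    current = None
    for t in txn_ids:
        if t != current:
            if t in finished:
                return False        # t resumes after being interrupted
            if current is not None:
                finished.add(current)
            current = t
    return True

print(is_serial([1, 1, 2, 2]))  # True: all of T1, then all of T2
print(is_serial([1, 2, 1, 2]))  # False: T1 and T2 interleave
```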

serialisable schedule

a schedule that is equivalent to some serial execution of the set of committed transactions

serial

serialisable schedule

figure2: Note that this is not a serial schedule, given there are interleaved operations.

$S:R_1(A),W_1(A), R_2(A), W_2(A), R_1(B), W_1(B), R_2(B), W_2(B)$

conflict

operations in schedule

said to be in conflict if they satisfy all of the following:

belong to different transactions

access the same item $A$

at least one of the operations is a write(A)
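
The three conditions translate directly into a predicate. In this sketch an operation is a hypothetical `Op` tuple of transaction id, action (`"R"` or `"W"`) and item; these names are illustrative only.

```python
from typing import NamedTuple

class Op(NamedTuple):
    txn: int      # transaction id, e.g. 1 for T1
    action: str   # "R" for read, "W" for write
    item: str     # data item, e.g. "A"

def conflicts(a, b):
    """Two operations conflict if they come from different transactions,
    access the same item, and at least one of them is a write."""
    return (
        a.txn != b.txn
        and a.item == b.item
        and "W" in (a.action, b.action)
    )

print(conflicts(Op(1, "R", "A"), Op(2, "W", "A")))  # True
print(conflicts(Op(1, "R", "A"), Op(2, "R", "A")))  # False: two reads
```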

Concurrency Issue	Description	Annotation
Dirty Read	Reading uncommitted data	WR
Unrepeatable Read	T2 changes item $A$ that was previously read by T1, while T1 is still in progress	RW
Lost Update	T2 overwrites item $A$ while T1 is still in progress, causing T1’s changes to be lost	WW

conflict serialisable schedules

Two schedules are conflict equivalent if:

they involve the same actions of the same transactions

every pair of conflicting actions is ordered the same way

Schedule $S$ is conflict serialisable if $S$ is conflict equivalent to some serial schedule

If two schedules $S_{1}$ and $S_{2}$ are conflict equivalent then they have the same effect: $S_{1}$ can be transformed into $S_{2}$ (and vice versa) by swapping adjacent non-conflicting operations

Every conflict serialisable schedule is serialisable

on conflict serialisable

only consider committed transactions

schedule with abort

figure3: Note that this schedule is unrecoverable if T2 committed

However, if T2 did not commit, we abort T1 and cascade to T2

we need to avoid cascading aborts:

if $T_i$ writes an object, then $T_j$ can read it only after $T_i$ commits

recoverable and avoid cascading aborts

Recoverable: a $X_\text{act}$ commits only after all $X_\text{act}$ it depends on have committed.

ACA: aborting a $X_\text{act}$ can be done without cascading the abort to other $X_\text{act}$

ACA implies recoverable, not vice versa
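
The difference between the two properties can be made concrete with a rough checker. This sketch assumes a schedule is a list of `(txn, action, item)` tuples with actions `"R"`, `"W"` and `"C"` (commit, with `item` set to `None`), treats "depends on" as "read a value last written by another transaction", and ignores aborts. All function names are hypothetical.

```python
def reads_from(ops):
    """Return (reader, writer, read_index) for every read of a value
    last written by a different transaction."""
    last_writer = {}
    pairs = []
    for i, (txn, action, item) in enumerate(ops):
        if action == "W":
            last_writer[item] = txn
        elif action == "R" and last_writer.get(item, txn) != txn:
            pairs.append((txn, last_writer[item], i))
    return pairs

def commit_positions(ops):
    return {txn: i for i, (txn, action, _) in enumerate(ops) if action == "C"}

def is_recoverable(ops):
    """A reader may commit only after every writer it read from has committed."""
    commits = commit_positions(ops)
    for reader, writer, _ in reads_from(ops):
        if reader in commits and not (
            writer in commits and commits[writer] < commits[reader]
        ):
            return False
    return True

def avoids_cascading_aborts(ops):
    """Every read sees only values written by already-committed transactions."""
    commits = commit_positions(ops)
    return all(
        writer in commits and commits[writer] < i
        for _, writer, i in reads_from(ops)
    )

# W1(A), R2(A), C1, C2: recoverable, but T2 read uncommitted data
s = [(1, "W", "A"), (2, "R", "A"), (1, "C", None), (2, "C", None)]
print(is_recoverable(s), avoids_cascading_aborts(s))  # True False
```

The sample schedule is recoverable but not ACA, which is the "not vice versa" direction.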

precedence graph test

is a schedule conflict-serialisable?

build a graph with one node per transaction $T_i$

Edge from $T_i$ to $T_j$ if an action of $T_i$ comes first and conflicts with a later action of $T_j$

the schedule is conflict-serialisable if and only if the graph has no cycle
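
A minimal sketch of this test in Python. It assumes schedules are written in a plain `R1(A),W2(B)` notation containing only reads and writes of committed transactions; the function names and the second sample schedule are made up for the example.

```python
import re

def parse(schedule):
    """Parse 'R1(A),W2(B)' into [(txn, action, item), ...]."""
    return [
        (int(t), a, x)
        for a, t, x in re.findall(r"([RW])(\d+)\((\w+)\)", schedule)
    ]

def precedence_graph(ops):
    """Edge (Ti, Tj) when an action of Ti precedes a conflicting action of Tj."""
    edges = set()
    for i, (ti, ai, xi) in enumerate(ops):
        for tj, aj, xj in ops[i + 1:]:
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(nodes, edges):
    """Depth-first search; reaching an in-progress node means a cycle."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
    state = {n: 0 for n in nodes}  # 0 unvisited, 1 in progress, 2 done

    def visit(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1 or (state[v] == 0 and visit(v)):
                return True
        state[u] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in nodes)

def is_conflict_serialisable(schedule):
    ops = parse(schedule)
    nodes = {t for t, _, _ in ops}
    return not has_cycle(nodes, precedence_graph(ops))

s1 = "R1(A),W1(A),R2(A),W2(A),R1(B),W1(B),R2(B),W2(B)"
s2 = "R1(A),R2(A),W1(A),W2(A)"
print(is_conflict_serialisable(s1))  # True: only the edge T1 -> T2
print(is_conflict_serialisable(s2))  # False: T1 -> T2 and T2 -> T1
```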

strict

a schedule is strict if a value written by $T_i$ is not read or overwritten by another $T_j$ until $T_i$ aborts or commits

strict schedules are recoverable and ACA

Lock-based concurrency control

think of mutex or a lock mechanism to control access to a data object

transaction must release the lock

notation

Li(A) means $T_i$ acquires a lock on A, whereas Ui(A) means $T_i$ releases its lock on A

Lock Type	None	S	X
None	OK	OK	OK
S	OK	OK	Conflict
X	OK	Conflict	Conflict

lock compatibility matrix
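
The matrix can be encoded as a lookup table. In this sketch `COMPATIBLE` and `can_grant` are illustrative names, and "None" (no lock held) is modelled as an empty list of held modes.

```python
# (mode held by another transaction, requested mode) -> compatible?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_by_others):
    """A request is granted only if it is compatible with every lock
    that other transactions currently hold on the same object."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)

print(can_grant("S", ["S", "S"]))  # True: shared locks coexist
print(can_grant("X", ["S"]))       # False: a writer must wait for readers
print(can_grant("X", []))          # True: nothing is held
```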

locking adds overhead: delays from blocking reduce throughput. To limit this:

lock the smallest-sized object

reduce the time locks are held

reduce hotspots

shared locks

$S_T(A)$ for reading

exclusive lock

$X_T(A)$ for write/read

strict two phase locking (Strict 2PL)

Each $X_\text{act}$ must obtain an S lock on an object before reading it, and an X lock on an object before writing it

All locks held by a transaction are released when the transaction completes

allows only schedules whose precedence graph is acyclic

resulting schedules are recoverable and ACA

Example:

T1	T2
L(A);	
R(A), W(A)	
	L(A); DENIED...
L(B);	
R(B), W(B)	
U(A), U(B)	
Commit;	
	...GRANTED
	R(A), W(A)
	L(B);
	R(B), W(B)
	U(A), U(B)
	Commit;

implication

only allow safe interleavings of transactions

if $T_{1}$ and $T_{2}$ access different objects, then there is no conflict and each may proceed

serial action

two phase locking (2PL)

a relaxed version of strict 2PL that allows a $X_\text{act}$ to release locks before it ends

a transaction cannot request additional lock once it releases any lock

implication

all lock requests must precede all unlock requests

ensures conflict serialisability

a two-phase transaction has a growing phase (it obtains locks) and a shrinking phase (it releases locks)
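
The two-phase rule can be checked per transaction. This sketch assumes a transaction is given as a list of action strings in the `L(A)` / `U(A)` notation used earlier; `is_two_phase` is a made-up name.

```python
def is_two_phase(actions):
    """True if no lock request follows an unlock, i.e. the growing
    phase strictly precedes the shrinking phase."""
    released = False
    for act in actions:
        if act.startswith("U"):
            released = True           # shrinking phase has begun
        elif act.startswith("L") and released:
            return False              # lock requested after a release
    return True

ok = ["L(A)", "R(A)", "L(B)", "U(A)", "W(B)", "U(B)"]
bad = ["L(A)", "R(A)", "U(A)", "L(B)", "W(B)", "U(B)"]
print(is_two_phase(ok))   # True
print(is_two_phase(bad))  # False: L(B) comes after U(A)
```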

isolation

Isolation Level	Description
READ UNCOMMITTED	No read locks
READ COMMITTED	Short duration read locks
REPEATABLE READ	Long duration read locks on individual items
SERIALIZABLE	All locks long duration

Deadlock

cycle of transactions waiting for locks to be released by each other

deadlocks are usually detected by building a wait-for graph and checking it for cycles

wait-die: if $T_i$ has higher priority than $T_j$ then $T_i$ waits, otherwise $T_i$ aborts; lower priority transactions never wait for higher priority transactions

wound-wait: if $T_i$ has higher priority than $T_j$ then $T_j$ is aborted and restarts later with the same timestamp, otherwise $T_i$ waits
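
Both policies reduce to a comparison of priorities. The sketch below assumes priority is given by timestamps, with a smaller timestamp (an older transaction) meaning higher priority; that convention, the function names and the returned strings are assumptions of this example. In both functions the requester $T_i$ asks for a lock held by $T_j$.

```python
def wait_die(ts_requester, ts_holder):
    """Higher-priority requester waits; lower-priority requester aborts (dies)."""
    if ts_requester < ts_holder:
        return "requester waits"
    return "requester aborts"

def wound_wait(ts_requester, ts_holder):
    """Higher-priority requester aborts (wounds) the holder;
    lower-priority requester waits."""
    if ts_requester < ts_holder:
        return "holder aborts"
    return "requester waits"

# T_i has timestamp 5 (older), T_j has timestamp 9 (younger)
print(wait_die(5, 9))    # requester waits
print(wait_die(9, 5))    # requester aborts
print(wound_wait(5, 9))  # holder aborts
print(wound_wait(9, 5))  # requester waits
```

In either scheme waiting only ever goes in one priority direction, so the wait-for graph cannot contain a cycle.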
