CSS Prepare

Database Systems

9 min read

A database is an organised collection of related data stored to support efficient retrieval and update. A Database Management System (DBMS) is the software that manages databases — controlling access, ensuring integrity, supporting concurrency and recovery.

DBMS

A software system that enables users to define, create, maintain, query and control access to a database, providing data abstraction, persistence, concurrency and recovery.

Evolution of database systems

GenerationEraExample
File-based1960sFlat files, COBOL records
Hierarchical1960s-70sIBM IMS
Network1970sCODASYL DBTG
Relational1970s onwardsOracle, MySQL, PostgreSQL, SQL Server
Object-oriented1990sObjectStore, db4o
NoSQL2000s onwardsMongoDB, Cassandra, Redis
NewSQL2010s onwardsCockroachDB, Google Spanner

Relational model

Proposed by E. F. Codd in 1970. Built on relations (tables) of tuples (rows) with attributes (columns).

Key terminology

  • Relation — table.
  • Tuple — row.
  • Attribute — column.
  • Domain — set of permissible values.
  • Degree — number of attributes.
  • Cardinality — number of tuples.
  • Schema — structure description; instance — actual data.

Keys

  • Super key — set of attributes that uniquely identifies tuples.
  • Candidate key — minimal super key.
  • Primary key — chosen candidate key; NOT NULL, UNIQUE.
  • Alternate key — other candidate keys.
  • Foreign key — attribute(s) referencing a primary key in another table.

Integrity constraints

  • Domain — attribute values from its domain.
  • Entity — primary key cannot be NULL.
  • Referential — foreign-key values must match an existing primary-key value.
  • Key — uniqueness.

Structured Query Language (SQL)

The ISO standard relational language. Three sublanguages:

  1. DDL (Data Definition) — CREATE, ALTER, DROP, TRUNCATE.
  2. DML (Data Manipulation) — SELECT, INSERT, UPDATE, DELETE.
  3. DCL (Data Control) — GRANT, REVOKE.

Many sources add TCL (Transaction Control) — COMMIT, ROLLBACK, SAVEPOINT.

Basic query

SELECT name, salary
FROM employees
WHERE department = 'Finance'
ORDER BY salary DESC;

Joins

  • INNER JOIN — only matching rows.
  • LEFT/RIGHT OUTER JOIN — all rows from one side + matches.
  • FULL OUTER JOIN — all rows from both, NULL where no match.
  • CROSS JOIN — Cartesian product.

Aggregations

COUNT, SUM, AVG, MIN, MAX, with GROUP BY and HAVING.

Key Points
  • WHERE filters rows BEFORE grouping; HAVING filters AFTER grouping.
  • NULL represents the absence of a value; comparisons with NULL use IS NULL.
  • DISTINCT removes duplicate rows.
  • Subqueries: scalar, row, table; correlated subqueries reference outer query.

Normalisation

The process of decomposing tables to remove anomalies (insertion, update, deletion) and redundancy.

Normal formRule
1NFAtomic attribute values
2NF1NF + no partial dependency on composite key
3NF2NF + no transitive dependency
BCNFEvery determinant is a candidate key
4NFNo non-trivial multivalued dependencies
5NFNo join dependencies

Pakistani CSS papers commonly test up to BCNF; higher forms are rare.

Functional dependency (FD)

X → Y means: whenever two tuples agree on X, they must agree on Y.

Armstrong's axioms

  • Reflexivity: if Y ⊆ X, then X → Y.
  • Augmentation: X → Y implies XZ → YZ.
  • Transitivity: X → Y and Y → Z imply X → Z.

Transactions and ACID

A transaction is a logical unit of work. The DBMS guarantees ACID properties:

  1. Atomicity — all-or-nothing.
  2. Consistency — moves DB from one valid state to another.
  3. Isolation — concurrent transactions appear as if serial.
  4. Durability — once committed, persists despite failures.

Concurrency control

  • Locking — two-phase locking (2PL).
  • Timestamp ordering.
  • Optimistic concurrency control.
  • MVCC (Multi-Version Concurrency Control) — used by PostgreSQL, Oracle.

Isolation levels (SQL standard)

LevelDirty readNon-repeatable readPhantom read
Read UncommittedPossiblePossiblePossible
Read CommittedPreventedPossiblePossible
Repeatable ReadPreventedPreventedPossible
SerializablePreventedPreventedPrevented

Recovery

  • Write-Ahead Logging (WAL) — log changes before applying.
  • Checkpointing.
  • Undo / Redo logs.

Indexing

Indexes accelerate queries by maintaining a sorted structure on key columns.

  • Primary index — on primary key.
  • Secondary index — on non-key column.
  • Clustered index — table physically ordered (one per table).
  • Non-clustered index — separate structure with pointers.
  • B+ tree — dominant disk-based index (most RDBMS).
  • Hash index — O(1) lookup, no range queries.

Trade-off: indexes speed reads but slow writes (inserts, updates, deletes).

Query optimisation

A DBMS query optimiser chooses an execution plan from many alternatives:

  1. Logical optimisation — rewriting relational algebra (push selections, join reordering).
  2. Physical optimisation — choosing access paths (index scan, sequential scan) and join algorithms (nested loop, hash, merge).

NoSQL databases

Designed for web-scale workloads, often relaxing ACID for performance and partition tolerance.

Types

  • Document — JSON-like docs (MongoDB, Couchbase).
  • Key-value — Redis, DynamoDB.
  • Column-family — Cassandra, HBase.
  • Graph — Neo4j, ArangoDB.

CAP theorem (Brewer)

A distributed system can simultaneously guarantee at most two of:

  • Consistency, Availability, Partition tolerance.

Most NoSQL stores favour AP or CP over CA. Modern NewSQL systems (CockroachDB, Spanner) push back, offering strong consistency at scale.

Data warehouse vs. OLTP

  • OLTP — many short transactions, normalised schema.
  • OLAP / Data warehouse — analytics, star/snowflake schema, columnar storage (Snowflake, BigQuery, Redshift).

ETL (Extract, Transform, Load) moves data from operational systems to warehouses for reporting.

In CSS short questions, a favourite catch is the difference between DELETE (DML, can be rolled back, row-by-row) and TRUNCATE (DDL, faster, cannot be rolled back, resets identity columns). Memorise this distinction.

Pakistani context

  • NADRA maintains one of the world's largest biometric identity databases (over 240 million identities).
  • FBR's IRIS system runs on Oracle for tax returns.
  • The State Bank of Pakistan uses Oracle and SQL Server extensively.
  • The Election Commission of Pakistan maintains the electoral roll database.
  • NTC (National Telecom Corporation) offers data-centre services to the public sector.

Officers who can read schemas, write SQL, and reason about ACID guarantees are well-placed to evaluate vendor proposals and oversee digital-government initiatives.

Database Systems — Computer Science CSS Notes · CSS Prepare