Data Structures and Algorithms
As Niklaus Wirth famously put it, "Algorithms + Data Structures = Programs". The CSS Computer Science paper rewards candidates who can analyse complexity, pick the right structure, and explain trade-offs.
A data structure is a particular way of organising data in computer memory so that operations such as insertion, deletion, search and update can be performed efficiently.
Complexity analysis
Algorithms are evaluated by how their resource use grows with input size n.
- Time complexity — number of elementary operations.
- Space complexity — additional memory used.
Big-O notation
| Class | Name | Example |
|---|---|---|
| O(1) | Constant | Array index access |
| O(log n) | Logarithmic | Binary search |
| O(n) | Linear | Linear search |
| O(n log n) | Log-linear | Merge sort, heap sort |
| O(n²) | Quadratic | Bubble, insertion, selection sort |
| O(n³) | Cubic | Naive matrix multiplication |
| O(2ⁿ) | Exponential | Naive recursive Fibonacci |
| O(n!) | Factorial | Brute-force TSP |
Other notations: Ω (omega) for lower bound, Θ (theta) for tight bound.
- Big-O describes worst-case growth.
- Constants and lower-order terms are dropped — O(3n + 5) = O(n).
- A faster-growing function eventually dominates regardless of constants.
- For very small n, an O(n²) algorithm can outperform an O(n log n) one due to constants.
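The growth classes above can be made concrete by counting operations rather than timing. The sketch below (illustrative only, not a benchmark) counts comparisons made by linear search versus binary search on the same sorted data, showing O(n) versus O(log n) growth:

```python
# Count comparisons made by linear vs binary search on a sorted list,
# to illustrate O(n) vs O(log n) growth.
def linear_search(items, target):
    comparisons = 0
    for i, x in enumerate(items):
        comparisons += 1
        if x == target:
            return i, comparisons
    return -1, comparisons

def binary_search(items, target):
    comparisons = 0
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if items[mid] == target:
            return mid, comparisons
        elif items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, comparisons

data = list(range(1_000_000))
_, linear_ops = linear_search(data, 999_999)  # roughly n comparisons
_, binary_ops = binary_search(data, 999_999)  # roughly log2(n) comparisons
```

On a million elements, the linear scan makes a million comparisons while binary search needs about twenty, which is why the logarithmic class matters at scale.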
Linear data structures
Arrays
Contiguous fixed-size memory.
- Access: O(1)
- Insertion/deletion (middle): O(n)
- Search (unsorted): O(n); (sorted): O(log n) with binary search.
Linked lists
Sequence of nodes connected by pointers.
- Singly, doubly, circularly linked.
- Insertion at head: O(1); search: O(n); access by index: O(n).
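A minimal singly linked list sketch showing why head insertion is O(1) (no traversal) while search is O(n) (the chain must be walked):

```python
# Minimal singly linked list: O(1) insertion at head, O(n) search.
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class SinglyLinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):
        # O(1): just rewire the head pointer, no traversal needed
        node = Node(value)
        node.next = self.head
        self.head = node

    def search(self, value):
        # O(n): may have to walk the entire chain
        current = self.head
        while current is not None:
            if current.value == value:
                return True
            current = current.next
        return False

lst = SinglyLinkedList()
for v in [3, 2, 1]:
    lst.push_front(v)
# list is now 1 -> 2 -> 3
```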
Stack (LIFO)
- push, pop, peek — all O(1).
- Uses: function call stack, expression parsing, undo operations.
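One of the uses listed above, expression parsing, can be sketched with a Python list as the stack (append/pop at the end are amortised O(1)). This bracket-matching check is an illustrative example, not a full parser:

```python
# Stack application sketch: checking balanced brackets in an expression.
# A Python list serves as the stack; push = append, pop = pop.
def balanced(expr):
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in expr:
        if ch in '([{':
            stack.append(ch)                 # push opening bracket
        elif ch in pairs:
            # pop must yield the matching opener, else unbalanced
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                         # empty stack => balanced
```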
Queue (FIFO)
- enqueue, dequeue — both O(1).
- Variants: circular queue, double-ended queue (deque), priority queue.
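In Python, `collections.deque` gives the O(1) enqueue and dequeue promised above; a plain list would make dequeue O(n) because every remaining element shifts left:

```python
# FIFO queue using collections.deque: O(1) at both ends.
from collections import deque

queue = deque()
queue.append('first')     # enqueue at the back
queue.append('second')
front = queue.popleft()   # dequeue from the front -> 'first' (FIFO)
```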
Hash tables
Map keys to values via a hash function.
- Average operations: O(1); worst case: O(n) due to collisions.
- Collision handling: separate chaining, open addressing (linear/quadratic probing, double hashing).
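Separate chaining, the first collision strategy above, can be sketched as a toy table where each bucket is a list of key-value pairs, so colliding keys coexist in one bucket (illustrative only; Python's built-in `dict` uses open addressing):

```python
# Toy hash table with separate chaining: each bucket holds a list
# of (key, value) pairs, so colliding keys share a bucket.
class ChainedHashTable:
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        # O(1) average: hash straight to the right bucket
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # update existing key
                return
        bucket.append((key, value))

    def get(self, key):
        # O(n) worst case: every key hashed into the same bucket
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

table = ChainedHashTable()
table.put('pi', 3.14)
table.put('e', 2.72)
```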
Non-linear data structures
Trees
A tree has nodes connected hierarchically with no cycles. Key terms: root, leaf, internal node, height, depth.
- Binary Tree — each node ≤ 2 children.
- Binary Search Tree (BST) — left subtree < node < right subtree.
- Balanced BSTs — AVL trees, Red-Black trees. Guarantee O(log n) operations.
- B-trees — used in databases and file systems (multiway, balanced).
- Heap — complete binary tree satisfying heap property; min-heap or max-heap. Basis of heap sort and priority queue.
- Trie — prefix tree for strings.
Tree traversals
- In-order (Left-Root-Right): visits BST nodes in sorted order.
- Pre-order (Root-Left-Right): used to copy a tree.
- Post-order (Left-Right-Root): used for deletion.
- Level-order (BFS): uses a queue.
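The claim that in-order traversal of a BST yields sorted output can be verified with a minimal sketch (unbalanced insertion, for illustration only):

```python
# BST insertion plus in-order traversal: Left-Root-Right visits
# the keys in sorted order, as the BST property guarantees.
class BSTNode:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def in_order(root):
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)

root = None
for k in [5, 2, 8, 1, 3]:
    root = insert(root, k)
```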
Graphs
A graph G = (V, E) is a set of vertices and edges. Representations:
- Adjacency matrix — O(V²) space; O(1) edge query.
- Adjacency list — O(V + E) space; preferred for sparse graphs.
Types: directed/undirected, weighted/unweighted, cyclic/acyclic, connected/disconnected.
Graph traversals
- Breadth-First Search (BFS) — uses a queue; explores layer by layer; O(V + E).
- Depth-First Search (DFS) — uses recursion or stack; O(V + E).
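Both traversals can be sketched over an adjacency-list graph; the only structural difference is queue versus stack (the graph below is a made-up four-node example):

```python
# BFS (queue) and DFS (stack) over an adjacency list, both O(V + E).
from collections import deque

graph = {                       # hypothetical undirected graph
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D'],
    'D': ['B', 'C'],
}

def bfs(start):
    visited, order = {start}, []
    queue = deque([start])              # BFS uses a queue
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order

def dfs(start):
    visited, order = set(), []
    stack = [start]                     # DFS uses a stack (or recursion)
    while stack:
        node = stack.pop()
        if node not in visited:
            visited.add(node)
            order.append(node)
            stack.extend(reversed(graph[node]))
    return order
```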
Sorting algorithms
| Algorithm | Best | Average | Worst | Space | Stable? |
|---|---|---|---|---|---|
| Bubble | O(n) | O(n²) | O(n²) | O(1) | Yes |
| Selection | O(n²) | O(n²) | O(n²) | O(1) | No |
| Insertion | O(n) | O(n²) | O(n²) | O(1) | Yes |
| Merge | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes |
| Quick | O(n log n) | O(n log n) | O(n²) | O(log n) | No |
| Heap | O(n log n) | O(n log n) | O(n log n) | O(1) | No |
| Counting | O(n+k) | O(n+k) | O(n+k) | O(k) | Yes |
| Radix | O(n·k) | O(n·k) | O(n·k) | O(n+k) | Yes |
Quick sort is usually fastest in practice due to good cache behaviour; merge sort is preferred when stability or external (disk-based) sorting is needed.
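Merge sort's table entries (O(n log n), O(n) extra space, stable) follow directly from its structure, sketched below; the `<=` in the merge step is what preserves stability:

```python
# Merge sort: divide, recursively sort halves, merge.
# O(n log n) time, O(n) extra space, stable.
def merge_sort(items):
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= keeps equal keys in input order
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]   # append the leftover run
```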
Searching algorithms
- Linear Search — O(n); works on unsorted data.
- Binary Search — O(log n); requires sorted array.
- Hash-based — O(1) average.
- Tree-based — O(log n) in balanced trees.
Classical graph algorithms
- Dijkstra's algorithm — single-source shortest paths, non-negative weights, O((V+E) log V) with a priority queue.
- Bellman-Ford — handles negative weights; detects negative cycles; O(VE).
- Floyd-Warshall — all-pairs shortest paths; O(V³).
- Kruskal's and Prim's — minimum spanning tree; O(E log E) and O(E + V log V) respectively.
- Topological sort — DAG ordering; O(V + E).
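Dijkstra's algorithm from the list above can be sketched with Python's `heapq` as the priority queue (the four-node weighted graph is a made-up example):

```python
# Dijkstra's single-source shortest paths with a binary heap,
# O((V + E) log V). Weights must be non-negative.
import heapq

def dijkstra(graph, source):
    # graph: {node: [(neighbour, weight), ...]}
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float('inf')):
            continue                    # stale heap entry, skip
        for neighbour, weight in graph[node]:
            new_dist = d + weight
            if new_dist < dist.get(neighbour, float('inf')):
                dist[neighbour] = new_dist
                heapq.heappush(heap, (new_dist, neighbour))
    return dist

g = {
    'A': [('B', 1), ('C', 4)],
    'B': [('C', 2), ('D', 6)],
    'C': [('D', 3)],
    'D': [],
}
```

Note that A reaches C more cheaply via B (1 + 2 = 3) than directly (4), which is exactly the relaxation step at work.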
Algorithm design paradigms
- Divide and Conquer — break into smaller subproblems (Merge sort, Quick sort, FFT).
- Greedy — make locally optimal choices (Dijkstra, Kruskal, Huffman coding).
- Dynamic Programming — overlapping subproblems + optimal substructure (Fibonacci, LCS, Knapsack, Bellman-Ford).
- Backtracking — explore + prune (N-Queens, Sudoku, subset sum).
- Branch and Bound — for optimisation problems (TSP).
- Randomised algorithms — randomised quicksort, Monte Carlo, Las Vegas.
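The dynamic programming entry above uses Fibonacci as its first example; a short sketch shows the paradigm's payoff, turning the O(2ⁿ) naive recursion from the Big-O table into O(n) by caching overlapping subproblems:

```python
# Naive recursion recomputes the same subproblems exponentially often;
# memoisation (dynamic programming) computes each fib(k) once.
from functools import lru_cache

def fib_naive(n):          # O(2^n)
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)   # cache results of overlapping subproblems
def fib_memo(n):           # O(n)
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)
```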
NP-completeness
The complexity-theoretic backdrop:
- P — problems solvable in polynomial time.
- NP — problems whose solutions can be verified in polynomial time.
- NP-Complete — hardest NP problems; if any one is in P, then P = NP. Examples: SAT, Travelling Salesman, Subset Sum.
- NP-Hard — at least as hard as every problem in NP; an NP-Hard problem need not itself be in NP.
The P vs. NP question, posed by Stephen Cook (1971), remains open and is one of the seven Millennium Prize problems.
For exam answers, learn to state worst, average and best case complexities of common algorithms by heart, and be ready to explain why quicksort's worst case is O(n²) — when the pivot is always the smallest or largest element.
Practical relevance
Pakistani fintech (Easypaisa, JazzCash), e-commerce (Daraz), telecom (PTCL, Jazz) and government systems (NADRA, FBR's IRIS portal) all run on optimised data structures and algorithms. A CSS officer interfacing with such systems benefits enormously from knowing why a query is slow, why an index helps, or why a graph traversal lies under social-network recommendations.