This repository has been archived by the owner on Jan 5, 2021. It is now read-only.

loicteixeira/summer-school-2016

Summer School 2016

Summer School is a course in algorithms and data structures. The course is run via the WellingtonRuby Meetup and features weekly assignments and review sessions.

Assignments

  1. Array, Enumerable & Enumerator
  2. CircularBuffer & LinkedList
  3. Hash
  4. Binary Search Tree
  5. Binary Heap & Priority Queue
  6. Matrix & Page Rank
  7. Graph
  8. Algorithms
  9. Weighted Graph Algorithms

Notes

01 - Array, Enumerable & Enumerator

When an array reaches capacity, it must be extended: a new, larger chunk of memory is allocated and all existing elements are copied into it.

If the capacity were extended by exactly one slot each time, the very next append would trigger another copy; over n appends the copies form an arithmetic series, 1 + 2 + ... + n = (n^2 + n) / 2, i.e. O(n^2) total work. Doubling the capacity instead gives a geometric series, 1 + 2 + 4 + 8 + ... + 2^n = 2^(n+1) - 1, i.e. O(n) total work and amortised O(1) per append.
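The doubling strategy can be sketched as follows; the class and method names (`GrowableArray`, `push`) are illustrative, not the exercise's API:

```ruby
# A minimal sketch of geometric (doubling) growth, as used by dynamic arrays.
class GrowableArray
  attr_reader :capacity, :size

  def initialize
    @capacity = 1
    @size = 0
    @store = Array.new(@capacity)
  end

  def push(value)
    if @size == @capacity
      @capacity *= 2                                # double instead of adding one slot
      new_store = Array.new(@capacity)
      @size.times { |i| new_store[i] = @store[i] }  # copy existing elements over
      @store = new_store
    end
    @store[@size] = value
    @size += 1
    self
  end

  def [](index)
    @store[index]
  end
end
```

Pushing n elements copies at most 1 + 2 + 4 + ... < 2n elements in total, which is where the amortised O(1) append comes from.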

02 - Circular Buffer & Linked List

Linked list lookup is faster than array lookup if we assume:
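The CircularBuffer from the exercise can be sketched as a fixed-size array with wrapping read/write indices; the names (`write`, `read`) are illustrative, not the exercise's API:

```ruby
# A minimal circular (ring) buffer backed by a fixed-size array.
class CircularBuffer
  def initialize(capacity)
    @store = Array.new(capacity)
    @read_index = 0
    @write_index = 0
    @size = 0
  end

  def write(value)
    raise "buffer full" if @size == @store.length
    @store[@write_index] = value
    @write_index = (@write_index + 1) % @store.length  # wrap around
    @size += 1
  end

  def read
    raise "buffer empty" if @size.zero?
    value = @store[@read_index]
    @read_index = (@read_index + 1) % @store.length
    @size -= 1
    value
  end
end
```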

03 - Hashes

Use a hash when keys are sparse or are not integers at all.

For non-integer keys, Ruby calls hash on the key to obtain an integer. Ruby uses SipHash, which is collision resistant but not collision-free, so a collision resolution strategy is still needed.
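This is observable directly in Ruby: any object responds to `#hash`, and objects that are `eql?` must return the same hash value so they land in the same bucket:

```ruby
# Ruby obtains an integer for any key by calling `#hash` (SipHash-based for
# strings); equal keys must produce equal hash values.
a = "hello"
b = "hello"

raise unless a.hash.is_a?(Integer)
raise unless a.hash == b.hash   # eql? objects hash identically
```

Note that string hash values are seeded per process, so they differ between runs; only equality within a process is guaranteed.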

04 - Binary Tree

Binary Tree: each node has at most 2 children. Binary Search Tree: same, but each node holds a value, and every value in a node's left subtree is smaller than the node's value.

The shape of the tree depends on the order in which elements are inserted. Self-adjusting/balancing trees, however, keep the tree balanced regardless of insertion order.

A single insert or search is O(n) if the tree is not balanced (so building the whole tree is O(n^2)); it is O(log n) if the tree is balanced (building the whole tree is O(n log n)).

In-order traversal is easy with a recursive function (see each).

A normal implementation of eql? would only verify that the container holds the same values, but for the sake of this exercise it also checks the structure of the tree.
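A minimal sketch of recursive insert and in-order traversal (node and method names are illustrative):

```ruby
# A binary search tree node: smaller values go left, others go right.
class Node
  attr_accessor :value, :left, :right

  def initialize(value)
    @value = value
  end

  def insert(v)
    if v < @value
      @left ? @left.insert(v) : @left = Node.new(v)
    else
      @right ? @right.insert(v) : @right = Node.new(v)
    end
  end

  # In-order traversal yields the values in sorted order.
  def each(&block)
    @left&.each(&block)
    block.call(@value)
    @right&.each(&block)
  end
end
```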

05 - Binary Heap & Priority Queue

Index calculations for Binary Heaps:

  • In general the children of the value at index i are at index 2 * i + 1 and 2 * i + 2.
  • The parent of the value at index i is at index (i - 1) / 2.

The utility methods have a few preconditions:

  • sift_up assumes that the element at index j has a parent (i.e. that j >= 1).
  • sift_down assumes that the element at index i has 2 children.

Heapsort (O(n log n)) usually uses a max-heap: it repeatedly extracts the maximum from the front of the array and places it at the end.
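A heapsort sketch using the index formulas above (children of i at 2i+1 and 2i+2); it builds a max-heap in place, then repeatedly swaps the maximum to the end:

```ruby
# Restore the max-heap property below index i, considering only indices < upper.
def sift_down(a, i, upper)
  loop do
    l = 2 * i + 1
    r = 2 * i + 2
    largest = i
    largest = l if l < upper && a[l] > a[largest]
    largest = r if r < upper && a[r] > a[largest]
    break if largest == i
    a[i], a[largest] = a[largest], a[i]
    i = largest
  end
end

def heapsort(a)
  n = a.length
  # Build the max-heap, starting from the last parent node.
  ((n / 2) - 1).downto(0) { |i| sift_down(a, i, n) }
  # Swap the max to the end, shrink the heap, restore the heap property.
  (n - 1).downto(1) do |last|
    a[0], a[last] = a[last], a[0]
    sift_down(a, 0, last)
  end
  a
end
```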

06 - Matrix & Page Rank

Matrix

The inner product of two vectors is the sum of the products of corresponding elements, such as a•b = a1b1 + a2b2 + a3b3 + ... (e.g. with v1 = Vector[1, 2] and v2 = Vector[2, 3], v1 • v2 = 8).

The straightforward algorithm for calculating a floating-point dot product of vectors can suffer from catastrophic cancellation. To avoid this, approaches such as the Kahan summation algorithm are used. – Wikipedia

The product of two matrices is a matrix of the inner product of each row (of the first matrix) with each column (of the second matrix).
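Ruby's stdlib Matrix/Vector implement these definitions directly:

```ruby
require 'matrix'

# The inner product of two vectors: 1*2 + 2*3 = 8.
v1 = Vector[1, 2]
v2 = Vector[2, 3]
raise unless v1.inner_product(v2) == 8

# Each entry of a*b is the inner product of a row of `a` with a column of `b`.
a = Matrix[[1, 2], [3, 4]]
b = Matrix[[5, 6], [7, 8]]
raise unless a * b == Matrix[[19, 22], [43, 50]]
```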

Page Rank

Given the pages and links:

  • A links to B
  • B links to C
  • C links to B and D
  • D links to A

An adjacency matrix can be created to represent the pages links between each other. It will take n^2 space.

   A   B   C   D
A  0   1   0   0
B  0   0   1   0
C  0   1   0   1
D  1   0   0   0

Then it is transformed into a (right) stochastic matrix (a.k.a. probability matrix), where each value is a non-negative real number representing a probability and each row sums to 1. It gives the probability of moving from one page to another.

   A   B   C   D
A  0   1   0   0
B  0   0   1   0
C  0  0.5  0  0.5
D  1   0   0   0

The PageRank theory holds that an imaginary surfer who is randomly clicking on links will eventually stop clicking. The probability, at any step, that the person will continue is a damping factor d. Various studies have tested different damping factors, but it is generally assumed that the damping factor will be set around 0.85. – Wikipedia

The PageRank computation then multiplies this matrix by the damping factor and saves the result.

Separately, it creates another stochastic matrix of the same size, filled with 1/n, representing the probability of picking each page when the surfer stops following links. This second matrix is multiplied by the complement of the damping factor, (1 - d).

Both results are summed: follow_links_matrix * d + pick_new_page_matrix * (1 - d).

In addition to being stochastic, the resulting matrix is also irreducible (can get anywhere from anywhere) and aperiodic (it's possible to be on any page at any given step).
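The combination of the two matrices can be sketched with Ruby's stdlib Matrix, using the stochastic matrix above and d = 0.85 (variable names are illustrative):

```ruby
require 'matrix'

n = 4
d = 0.85

# Right stochastic matrix derived from the links A->B, B->C, C->B and C->D, D->A.
follow_links = Matrix[[0, 1,   0, 0],
                      [0, 0,   1, 0],
                      [0, 0.5, 0, 0.5],
                      [1, 0,   0, 0]]

# Uniform matrix: every page is equally likely when the surfer jumps.
pick_new_page = Matrix.build(n, n) { 1.0 / n }

m = follow_links * d + pick_new_page * (1 - d)
```

Each row of `m` still sums to 1, so the result remains stochastic, and every entry is now strictly positive, which makes the matrix irreducible and aperiodic.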

The PageRank is finally computed using power iteration (an eigenvalue algorithm) which, in short, starts from a non-zero vector, multiplies it by the matrix, and repeats the process with the resulting vector until the probability of being on any page reaches the stationary distribution.

# With `m` the matrix and `n` the number of pages in the matrix.
require 'matrix'

v0 = Vector.elements(Array.new(n, 1.0 / n))  # uniform starting vector
v1 = nil
loop do
  v1 = m * v0
  break if (v1 - v0).magnitude < 0.0001      # converged to the stationary distribution
  v0 = v1
end
# PageRank is the value of v1

Note: in this exercise each link is given the same weight, but the importance of a link could be weighted. Likewise, when the surfer stops following links and jumps to another page, every page has an equal chance of being selected, whereas in reality the surfer probably goes to a well-known page.

07 - Graph

Representing page links with a matrix takes n^2 space and has no explicit edges. A more efficient way to represent the relations between pages is a graph, which takes n + e space (nodes + edges, the edges being the links).

Recursive methods are simpler but might cause a stack overflow. Iterative methods can be used instead, but they then need an explicit stack to keep track of where you are conceptually.

It is not possible to produce a topological sort if the graph is cyclic (or if a portion of it is).

Note: in this exercise an array of size n is used to keep track of the visited nodes, which may not be practical when the nodes are not simple integers. Other solutions might involve hashing the node and checking membership in a set, or using a binary search tree.
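An adjacency-list graph with an iterative depth-first search can be sketched as follows; the explicit stack replaces the call stack of the recursive version, and a hash-backed Set tracks visited nodes so they need not be small integers:

```ruby
require 'set'

# Iterative DFS over an adjacency list (a hash of node => [neighbours]).
def dfs(adjacency, start)
  visited = Set.new
  order = []
  stack = [start]
  until stack.empty?
    node = stack.pop
    next if visited.include?(node)
    visited << node
    order << node
    # Push neighbours in reverse so they are visited in listed order.
    adjacency.fetch(node, []).reverse_each { |n| stack.push(n) }
  end
  order
end
```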

08 - Algorithms

Fisher–Yates shuffle

The Fisher–Yates shuffle is an algorithm for generating a random permutation of a finite set. [...] The algorithm effectively puts all the elements into a hat; it continually determines the next element by randomly drawing an element from the hat until no elements remain. The algorithm produces an unbiased permutation: every permutation is equally likely. – Wikipedia
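The modern in-place variant can be sketched in a few lines: walk the array from the end, swapping each element with a uniformly chosen element at or before it.

```ruby
# In-place Fisher–Yates shuffle; pass an explicit Random for reproducibility.
def fisher_yates_shuffle(array, rng = Random.new)
  (array.length - 1).downto(1) do |i|
    j = rng.rand(i + 1)                      # 0 <= j <= i, uniformly
    array[i], array[j] = array[j], array[i]
  end
  array
end
```

Drawing `j` from `0..i` (including `i` itself) is what makes every permutation equally likely; drawing from the whole array at each step would bias the result.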

Binary search algorithm

Binary search runs in O(log n) in the worst case (i.e. logarithmic time).
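An iterative sketch over a sorted array, halving the search interval at each step:

```ruby
# Returns the index of `target` in the sorted array, or nil if absent.
def binary_search(sorted, target)
  low = 0
  high = sorted.length - 1
  while low <= high
    mid = (low + high) / 2
    case sorted[mid] <=> target
    when 0  then return mid       # found
    when -1 then low = mid + 1    # target is in the right half
    else         high = mid - 1   # target is in the left half
    end
  end
  nil
end
```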

Knuth–Morris–Pratt algorithm

In computer science, the Knuth–Morris–Pratt string searching algorithm (or KMP algorithm) searches for occurrences of a word within a main text string by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters. – Wikipedia
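A sketch of the idea: precompute, for each prefix of the word, the length of its longest proper prefix that is also a suffix (the failure function), then scan the text without ever re-examining matched characters.

```ruby
# failure[i] = length of the longest proper prefix of word[0..i] that is
# also a suffix of it.
def kmp_failure(word)
  failure = Array.new(word.length, 0)
  k = 0
  (1...word.length).each do |i|
    k = failure[k - 1] while k > 0 && word[i] != word[k]
    k += 1 if word[i] == word[k]
    failure[i] = k
  end
  failure
end

# Index of the first occurrence of `word` in `text`, or nil.
def kmp_index(text, word)
  return 0 if word.empty?
  failure = kmp_failure(word)
  k = 0
  text.each_char.with_index do |c, i|
    # On a mismatch, fall back in the word instead of rewinding the text.
    k = failure[k - 1] while k > 0 && c != word[k]
    k += 1 if c == word[k]
    return i - word.length + 1 if k == word.length
  end
  nil
end
```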

09 - Weighted Graph

Floyd–Warshall

The Floyd–Warshall algorithm is an algorithm for finding shortest paths in a weighted graph – Wikipedia

Every combination of edges is tested by incrementally improving an estimate on the shortest path between two vertices, until the estimate is optimal.
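This can be sketched over a distance matrix (Float::INFINITY for missing edges): for each intermediate vertex k, improve the estimate dist[i][j] whenever going through k is shorter.

```ruby
# All-pairs shortest paths; mutates and returns the given distance matrix.
def floyd_warshall(dist)
  n = dist.length
  n.times do |k|
    n.times do |i|
      n.times do |j|
        through_k = dist[i][k] + dist[k][j]
        dist[i][j] = through_k if through_k < dist[i][j]
      end
    end
  end
  dist
end
```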

Dijkstra

Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a graph – Wikipedia
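A sketch using a plain array for extraction instead of the priority queue from exercise 05 (Ruby's stdlib has no binary heap); the graph shape used here (a hash of node => { neighbour => weight }) is an assumption for illustration:

```ruby
# Shortest distances from `source` to every node of a weighted digraph.
def dijkstra(graph, source)
  dist = Hash.new(Float::INFINITY)
  dist[source] = 0
  unvisited = graph.keys.dup
  until unvisited.empty?
    # With a priority queue this extraction would be O(log n) instead of O(n).
    node = unvisited.min_by { |n| dist[n] }
    unvisited.delete(node)
    graph[node].each do |neighbour, weight|
      candidate = dist[node] + weight
      dist[neighbour] = candidate if candidate < dist[neighbour]
    end
  end
  dist
end
```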

Prim

Prim's algorithm is a greedy algorithm that finds a minimum spanning tree for a weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm operates by building this tree one vertex at a time, from an arbitrary starting vertex, at each step adding the cheapest possible connection from the tree to another vertex – Wikipedia
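The "cheapest possible connection" step can be sketched naively by scanning all edges leaving the tree at each iteration; the graph shape (a hash of node => { neighbour => weight }, with both directions listed) is an assumption for illustration:

```ruby
# Minimum spanning tree of a connected, undirected weighted graph,
# returned as a list of [from, to, weight] edges.
def prim(graph, start = graph.keys.first)
  in_tree = [start]
  edges = []
  until in_tree.length == graph.length
    # Cheapest edge from a node inside the tree to a node outside it.
    from, to, weight = in_tree.flat_map { |u|
      graph[u].reject { |v, _| in_tree.include?(v) }.map { |v, w| [u, v, w] }
    }.min_by { |_, _, w| w }
    in_tree << to
    edges << [from, to, weight]
  end
  edges
end
```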

Annex

Big O notation

In computer science, big O notation is used to classify algorithms according to how their running time or space requirements grow as the input size grows. – Wikipedia

Running time   Name
O(1)           constant time
O(log n)       logarithmic time
O(n)           linear time
O(n log n)     linearithmic time
O(n^2)         quadratic time
O(n^3)         cubic time

Random

Most default implementations of random number generation are only pseudo-random, and Ruby's Random is no different, although Ruby also provides SecureRandom.