# PageRank algorithm

Aug-Nov 2020

## Recap

• Vector space $V$ over a scalar field $F$
• $F$: real field $\mathbb{R}$ or complex field $\mathbb{C}$ in this course
• $m\times n$ matrix A represents a linear map $T:F^n\to F^m$
• dim null $T$ + dim range $T$ = dim $V$ (rank-nullity theorem)
• Linear equation: $Ax=b$
• Solution (if it exists): $u+$ null$(A)$
• Four fundamental subspaces of a matrix
• Column space, row space, null space, left null space
• Eigenvalue $\lambda$ and eigenvector $v$: $Tv=\lambda v$
• Distinct eigenvalues have independent eigenvectors
• Basis of eigenvectors results in a diagonal matrix for $T$
• Every linear map has an upper triangular matrix representation

## PageRank algorithm

• Early search engines often struggled to give the most relevant results first

• Larry Page and Sergey Brin

• Developed PageRank as an algorithm to rank web pages based on their importance

• Importance was determined by the links connecting the pages

• Algorithm rooted in linear algebra, eigenvalues and eigenvectors

## Webgraph

Construction

• Each web page is a node

• Edge from Node $i$ to Node $j$ if

• web page of Node $i$ has a link to web page of Node $j$

• Entire WWW is a “huge” graph

Degrees

• Out-degree of a node: number of outgoing links

• In-degree of a node: number of incoming links (nodes that link to it)

• High in-degree: could indicate “importance” or popularity
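As a concrete sketch (in Python), degrees can be read off an adjacency-list representation of the webgraph; the edge list below is the one encoded by the 5-node example matrix that appears later in these notes:

```python
# 5-node example webgraph as adjacency lists: links[i] is the list of
# pages that page i links to (edge i -> j: page i links to page j).
links = {0: [1], 1: [2, 3], 2: [3, 4], 3: [0, 1, 4], 4: [3]}

# Out-degree: number of outgoing links from each node.
out_degree = {i: len(targets) for i, targets in links.items()}

# In-degree: number of incoming links to each node.
in_degree = {i: 0 for i in links}
for targets in links.values():
    for j in targets:
        in_degree[j] += 1
```

Here Node 3 has the highest in-degree, hinting at its importance; PageRank refines this idea by weighting links by the importance of their source.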

## Importance of nodes

• $x_i$: real number that denotes importance of Node $i$

• $n_i$: out-degree of Node $i$

• $L_i$: set of nodes that link to Node $i$

Condition enforced on importance

$x_i=\sum_{j\in L_i}\dfrac{x_j}{n_j}$
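This condition can be checked numerically. A small Python sketch, using the 5-node example graph from these notes and its importance vector $x=(2,4,2,6,3)$:

```python
# Example webgraph: links[i] lists the pages that page i links to.
links = {0: [1], 1: [2, 3], 2: [3, 4], 3: [0, 1, 4], 4: [3]}
x = [2, 4, 2, 6, 3]  # importance vector from the example in these notes

n = {j: len(t) for j, t in links.items()}                     # out-degrees n_j
L = {i: [j for j in links if i in links[j]] for i in links}   # backlink sets L_i

# Recompute each x_i from the condition x_i = sum over j in L_i of x_j / n_j.
recomputed = [sum(x[j] / n[j] for j in L[i]) for i in links]
```

The recomputed values match the original vector entry by entry, so $x$ satisfies the importance condition.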

## Importance, eigenvalues and eigenvectors

$A=\begin{bmatrix} 0&0&0&1/3&0\\ 1&0&0&1/3&0\\ 0&1/2&0&0&0\\ 0&1/2&1/2&0&1\\ 0&0&1/2&1/3&0 \end{bmatrix}$

$x=(x_0,x_1,x_2,x_3,x_4)$

Condition on importance

$x = Ax$

Importance vector $x$ is an eigenvector of $A$ with eigenvalue 1

## Existence of importance vector

$A$: $(i,j)$-th entry is $\dfrac{1}{n_j}$ if Node $j$ links to Node $i$, and $0$ otherwise

• $n_j$: out-degree of Node $j$

Columns of $A$ add to 1 (equivalently, rows of $A^T$ add to 1)

$v=(1,1,\ldots,1)$ (all-1 vector) is an eigenvector of $A^T$ with eigenvalue 1

$\lambda=1$ is an eigenvalue of $A$ ($A$ and $A^T$ have the same characteristic polynomial, hence the same eigenvalues)

Importance vector: eigenvector of $A$ corresponding to eigenvalue 1
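These facts can be verified numerically; a quick check, assuming NumPy, on the example matrix from these notes:

```python
import numpy as np

# Example matrix from these notes: A[i, j] = 1/n_j when Node j links to Node i.
A = np.array([
    [0, 0,   0,   1/3, 0],
    [1, 0,   0,   1/3, 0],
    [0, 1/2, 0,   0,   0],
    [0, 1/2, 1/2, 0,   1],
    [0, 0,   1/2, 1/3, 0],
])

col_sums = A.sum(axis=0)            # each column adds to 1
ones = np.ones(5)
At_ones = A.T @ ones                # all-1 vector is an eigenvector of A^T
eigenvalues = np.linalg.eigvals(A)  # so lambda = 1 is an eigenvalue of A
```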

## Example

$A=\begin{bmatrix} 0&0&0&1/3&0\\ 1&0&0&1/3&0\\ 0&1/2&0&0&0\\ 0&1/2&1/2&0&1\\ 0&0&1/2&1/3&0 \end{bmatrix}$

$x=(2,4,2,6,3)$

Highest importance: Node 3

## Real webgraphs

Dataset from Laboratory for Web Algorithmics

• Sample webgraph: European Union domains in 2015

• Nodes: 1,070,557,254

• Edges: 91,792,261,600

• Average degree: 85.743

• Maximum in-degree: 20,252,239

• Maximum out-degree: 35,340

At this scale, the importance vector needs to be computed efficiently
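A full eigendecomposition is infeasible for a billion-node graph. The standard approach behind practical PageRank is power iteration, which only needs repeated matrix-vector products (sparse in practice). A minimal dense sketch, assuming a column-stochastic $A$ (real webgraph implementations also add a damping factor and handle dangling nodes):

```python
import numpy as np

def importance_by_power_iteration(A, tol=1e-12, max_iter=10000):
    """Repeatedly apply A to a uniform start vector until it stabilizes.

    Sketch only: assumes A is column-stochastic and the iteration
    converges (true for this example; real webgraphs need damping).
    """
    n = A.shape[0]
    x = np.full(n, 1.0 / n)           # uniform initial importance
    for _ in range(max_iter):
        x_next = A @ x                # one step: x <- A x
        if np.linalg.norm(x_next - x, 1) < tol:
            break
        x = x_next
    return x

# Example matrix from these notes.
A = np.array([
    [0, 0,   0,   1/3, 0],
    [1, 0,   0,   1/3, 0],
    [0, 1/2, 0,   0,   0],
    [0, 1/2, 1/2, 0,   1],
    [0, 0,   1/2, 1/3, 0],
])
x = importance_by_power_iteration(A)
```

The iterate converges to the eigenvector for $\lambda=1$ normalized to sum 1; rescaling it recovers the vector $(2,4,2,6,3)$ from the example.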