Lecture Notes and Topics Covered -- CS 6913


Lecture 1: (2/3)

Lecture notes in PDF:
6 slides per page (intro, handed out in first class)

Topics covered: Introduction, what is web search, how the web works, basic search engine architecture, crawling, indexing, querying, link analysis, course overview.


Lecture 2: (2/10)

Lecture notes in PDF: (not available)

Topics covered: Guest lecture by Josh Attenberg. Overview of computational advertising and sponsored search. Privacy.

To learn more about this topic, see the lecture notes for this course at Stanford.


Lecture 3: (2/17)

Lecture notes in PDF:
6 slides per page (IR intro)

Topics covered: Storage, indexing.


Lecture 4: (2/24)

Lecture notes in PDF: (see previous lectures)

Topics covered: Introduction to information retrieval, link analysis, I/O-efficient computing, discussion of homework #2.


Lecture 5: (3/3)

Lecture notes in PDF:
4 slides per page (disks and sorting)
4 slides per page (index building)
6 slides per page (indexing and basic query processing)

Topics covered: Intro to disk models, I/O-efficient computing, index building, basics of query execution.


Lecture 6: (3/10)

Lecture notes in PDF:
4 slides per page (data compression)

Topics covered: Overview of index access and query processing: interfaces, DAAT versus TAAT, index compression (slides from last lecture).


Lecture 7: (3/24)

Lecture notes in PDF: (no lecture notes -- will cover notes from 3/10)

Topics covered: Introduction to data compression: Huffman, arithmetic, gzip.


Lecture 8: (3/31)

Lecture notes in PDF:
6 slides per page (inverted indexing compression )
4 slides per page (link ranking)

Topics covered: Data compression continued: gzip, ppm. Inverted index compression: golomb, rice, gamma, delta, vbyte, S16, PForDelta, Interpolative coding. More on link-based ranking.


Lecture 9: (4/7)

Lecture notes in PDF:
4 slides per page (computing with the web graph)
4 slides per page (graph models for the web)

Topics covered: Advanced link analysis: convergence of pagerank and HITS, related methods.


Lecture 10: (4/14)

Lecture notes in PDF:
4 slides per page (advanced crawling)

Topics covered: Computing with the web graph. Graph compression. Random graph models and the web as a graph. Advanced crawling -- focused crawling, recrawling, web site surveillance and mining, citeseer.


Lecture 11: (4/21)

Lecture notes in PDF: (no new notes)

Topics covered: More recrawling, focused crawling. High performance crawling.


Lecture 12: (4/28)

Lecture notes in PDF:
6 slides per page (advanced architecture and query processing)

Topics covered: Intro to advanced search engine architecture and query processing. Intro to Google file system


Lecture 13: (5/5)

Lecture notes in PDF:
6 slides per page (google file system)
HTML (mapreduce)

Topics covered: Google File System. Google mapreduce and (maybe) bigTable.