Lecture 1: (2/3)
Lecture notes in PDF:
6 slides per page (intro, handed out in
first class)
Topics covered: Introduction, what is web search, how the web works, basic search engine architecture, crawling, indexing, querying, link analysis, course overview.
Lecture 2: (2/10)
Lecture notes in PDF: (not available)
Topics covered: Guest lecture by Josh Attenberg. Overview of computational advertising and sponsored search. Privacy.
To learn more about this topic, see the lecture notes for this course at Stanford.
Lecture 3: (2/17)
Lecture notes in PDF:
6 slides per page (IR intro)
Topics covered: Storage, indexing.
Lecture 4: (2/24)
Lecture notes in PDF: (see previous lectures)
Topics covered: Introduction to information retrieval, link analysis, I/O-efficient computing, discussion of homework #2.
Lecture 5: (3/3)
Lecture notes in PDF:
4 slides per page (disks and sorting)
4 slides per page (index building)
6 slides per page (indexing and basic query
processing)
Topics covered: Intro to disk models, I/O-efficient computing, index building, basics of query execution.
Lecture 6: (3/10)
Lecture notes in PDF:
4 slides per page (data compression)
Topics covered: Overview of index access and query processing: interfaces, DAAT versus TAAT, index compression (slides from last lecture).
Lecture 7: (3/24)
Lecture notes in PDF: (no lecture notes -- will cover notes from 3/10)
Topics covered: Introduction to data compression: Huffman, arithmetic, gzip.
Lecture 8: (3/31)
Lecture notes in PDF:
6 slides per page (inverted indexing compression
)
4 slides per page (link ranking)
Topics covered: Data compression continued: gzip, ppm. Inverted index compression: golomb, rice, gamma, delta, vbyte, S16, PForDelta, Interpolative coding. More on link-based ranking.
Lecture 9: (4/7)
Lecture notes in PDF:
4 slides per page (computing with the web graph)
4 slides per page (graph models for the web)
Topics covered: Advanced link analysis: convergence of pagerank and HITS, related methods.
Lecture 10: (4/14)
Lecture notes in PDF:
4 slides per page (advanced crawling)
Topics covered: Computing with the web graph. Graph compression. Random graph models and the web as a graph. Advanced crawling -- focused crawling, recrawling, web site surveillance and mining, citeseer.
Lecture 11: (4/21)
Lecture notes in PDF: (no new notes)
Topics covered: More recrawling, focused crawling. High performance crawling.
Lecture 12: (4/28)
Lecture notes in PDF:
6 slides per page (advanced architecture and query
processing)
Topics covered: Intro to advanced search engine architecture and query processing. Intro to Google file system
Lecture 13: (5/5)
Lecture notes in PDF:
6 slides per page (google file system)
HTML (mapreduce)
Topics covered: Google File System. Google mapreduce and (maybe) bigTable.