Reading List -- CS 6913
The following papers are either covered in class or assigned as required
reading for this course. Links are provided. You can also find electronic
versions of most of these papers yourself by pasting the titles into Google.
Part I: Basics of HTTP, Crawling, and Search Engines
Part II: Historical Perspective on Information Science and Retrieval -- for your interest
Additional material -- if you are really interested:
Part III: Indexing and Compression
Recommended but not required additional reading:
Part IV: Link-Based Ranking and Mining of the Web Graph
- J. Kleinberg: Authoritative Sources in a Hyperlinked Environment.
Journal of the ACM, September 1999.
- Chakrabarti/Dom/Kumar/et al.:
Mining the Web's Link Structure. IEEE Computer, August 1999.
- Broder/Kumar/Maghoul/et al.:
Graph Structure in the Web. WWW Conference 2000.
- Randall/Stata/Wickremesinghe/Wiener:
The Link Database: Fast Access to Graphs of the Web.
Data Compression Conference, 2002.
Recommended but not required additional reading:
- T. Haveliwala:
Topic-Sensitive PageRank. WWW Conference, 2002.
- Kumar/Raghavan/Rajagopalan/Tomkins:
Trawling the Web for Emerging Cyber-Communities. WWW Conference, 1999.
- Kumar/Raghavan/Rajagopalan/Tomkins:
Extracting Large-Scale Knowledge Bases From the Web.
VLDB Conference, 1999.
- Boldi/Vigna: The WebGraph Framework I: Compression Techniques. WWW 2004.
Part V: Advanced crawling, web surveillance, and specialized search tools
- Lee/Leonard/Wang/Loguinov: IRLbot: Scaling to 6 Billion Pages and Beyond.
WWW Conference, 2008.
- Cho/Garcia-Molina: The Evolution of the Web and Implications for an Incremental Crawler. Int. Conference on Very Large Data Bases (VLDB), 2000.
- Chakrabarti/van den Berg/Dom: Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery. WWW Conference, 1999.
- Broder/Mitzenmacher: Network Applications of Bloom Filters: A Survey.
Internet Mathematics Vol. 1, No. 4: 485-509. 2002. (read at least first 5 pages)
Recommended but not required additional reading:
- Olston/Pandey: Recrawl Scheduling Based on Information Longevity.
WWW Conference, 2008.
- Pandey/Olston: Crawl Ordering by Search Impact. ACM WSDM Conference, 2008.
- Chakrabarti/Punera/Subramanyam: Accelerated Focused Crawling Through
Online Relevance Feedback. WWW 2002, Hawaii.
- Marc Najork and Allan Heydon: High-Performance Web Crawling.
SRC Research Report 173, Compaq Systems Research Center (2001).
- Shkapenyuk/Suel: Design and Implementation of a High-Performance Distributed Web Crawler. Int. Conference on Data Engineering, February 2002.
- Boyapati/Chevrier/Finkel/et al.: ChangeDetector: A Site-Level Monitoring Tool for the WWW. WWW Conference, 2002.
- Lawrence/Bollacker/Giles: Indexing and Retrieval of Scientific Literature.
Conference on Information and Knowledge Management (CIKM), 1999.
Part VI: Search engine architecture, scalable data mining platforms,
advanced query processing
Further optional reading:
Various Videos of Talks and Other Resources for Additional Information