COURSE ANNOUNCEMENT FOR FALL 2005
***********************************

I will offer the following special topics course for graduate students and advanced undergraduates.

CS912: Web Protocols and Web Search Engines
Instructor: Torsten Suel (suelpoly.edu)
Time: Mondays at 6pm.

Course summary: This course covers a variety of topics related to web search, information retrieval, and web protocols. The main focus of the course will be on web search engines (such as Google) and their underlying architecture and techniques. In addition, we will cover fundamentals of information retrieval and data compression, and basics of web protocols, caching, and content distribution.

Topics: Introduction to information retrieval, HTTP basics, search engine basics, web crawling, basics of data compression, indexing and index compression, boolean and ranked queries, hyperlink analysis, web data mining and content surveillance, meta search engines, web proxies, caching and content distribution networks, advanced search engine architectures, parallel and distributed search engines, and more.

The course emphasizes both algorithmic techniques and implementation aspects. Students are required to complete several substantial programming projects, including building their own mini search engine. The workload will be significant but manageable, and you will learn a lot.

Prereqs: Excellent programming skills, preferably in C/C++. General experience with the web and with HTML is expected. Knowledge of algorithms (CS603 or undergrad algorithms) is recommended. Useful but not required: basic knowledge of networking, Unix network and systems programming, scripting languages, and OS and database concepts.

Textbooks: (not yet decided)
Chakrabarti: Mining the Web, Morgan Kaufmann 2002.
Witten, Moffat, Bell: Managing Gigabytes, Morgan Kaufmann 1999.
Krishnamurthy, Rexford: Web Protocols and Practice, Addison-Wesley 2001.

In addition, numerous recent research papers will be handed out.