TR-CIS-2005-03 (02/26/2005)
Alexander Markowetz, Yen-Yu Chen, Torsten Suel, Xiaohui Long, Bernhard Seeger

Abstract
In this paper, we describe the design and initial implementation of a geographic search engine prototype for Germany, based on a large crawl of the .de domain. Geographic search engines provide a flexible interface to the Web that allows users to constrain and order search results in an intuitive manner, by focusing a query on a particular geographic region. Geographic search technology has recently received significant commercial interest, but there has been only a limited amount of academic work in this direction so far. Our prototype performs massive extraction of geographic features from crawled data, which are then mapped to coordinates and aggregated across link and site structure. This allows us to assign to each web page a set of relevant locations, called the geographic footprint of the page. The resulting footprint data is then integrated into a high-performance query processor on a cluster-based architecture. We discuss the various techniques, both new and existing, that are used for recognizing, matching, mapping, and aggregating geographic features, and describe how to integrate geographic query processing into a standard search engine architecture and search interface.
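
As an illustration of the footprint idea only, the following minimal Python sketch assigns each page a set of grid cells derived from place names found in its text, unions those cells per site, and filters pages by a query location. It is not the prototype's implementation: the toy gazetteer, the 0.5-degree grid, the union-based aggregation, and all function names (`cell`, `page_footprint`, `aggregate_by_site`, `geo_filter`) are assumptions made for this example, and the example URLs are invented.

```python
from collections import defaultdict

# Hypothetical toy gazetteer: place name -> (latitude, longitude).
GAZETTEER = {
    "berlin": (52.52, 13.40),
    "hamburg": (53.55, 9.99),
    "marburg": (50.81, 8.77),
}


def cell(lat, lon, size=0.5):
    """Map a coordinate to a grid cell of `size` degrees (assumed resolution)."""
    return (int(lat // size), int(lon // size))


def page_footprint(text):
    """Footprint of one page: grid cells of every gazetteer name found in it."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return {cell(*GAZETTEER[w]) for w in words if w in GAZETTEER}


def aggregate_by_site(pages):
    """Union page footprints per host, a crude stand-in for site-level aggregation."""
    site_cells = defaultdict(set)
    for url, text in pages.items():
        site = url.split("/")[2]  # host part of "http://host/path"
        site_cells[site] |= page_footprint(text)
    return dict(site_cells)


def geo_filter(pages, query_place):
    """Return the URLs whose own footprint contains the query place's cell."""
    target = cell(*GAZETTEER[query_place])
    return sorted(url for url, text in pages.items()
                  if target in page_footprint(text))


# Invented example pages, keyed by URL.
PAGES = {
    "http://example.de/a": "Pizza delivery in Berlin.",
    "http://example.de/b": "Opening hours and menu.",
    "http://other.de/x": "Harbour tours, Hamburg.",
}
```

Calling `geo_filter(PAGES, "berlin")` keeps only the page that mentions Berlin, while `aggregate_by_site(PAGES)` shows how a page without any place name of its own could inherit a location from other pages on the same site.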