Computer & Information Science Department   Polytechnic University

ATTENTION: THIS WEB SITE HAS MOVED. The pages you are looking at are no longer being maintained. Please go to http://www.poly.edu/cis/ to visit the new site of the Department of Computer and Information Science at Polytechnic University.

I/O-Efficient Techniques for Computing Pagerank

TR-CIS-2002-03 (11/08/2002)
Yen-Yu Chen, Qingqing Gan, Torsten Suel

pdf version of this paper

Abstract
Over the last few years, most major search engines have integrated link-based ranking techniques in order to provide more accurate search results. One widely known approach is the Pagerank technique, which forms the basis of the Google ranking scheme, and which assigns a global importance measure to each page based on the importance of other pages pointing to it. The main advantage of the Pagerank measure is that it is independent of the query posed by a user; this means that it can be precomputed and then used to optimize the layout of the inverted index structure accordingly. However, computing the Pagerank measure requires implementing an iterative process on a massive graph corresponding to hundreds of millions of web pages and billions of hyperlinks.

In this paper, we study I/O-efficient techniques to perform this iterative computation. We derive two algorithms for Pagerank based on techniques proposed for out-of-core graph algorithms, and compare them to two existing algorithms proposed by Haveliwala. We also consider the implementation of a recently proposed topic-sensitive version of Pagerank. Our experimental results show that for very large data sets, significant improvements over previous results can be achieved on machines with moderate amounts of memory. On the other hand, at most minor improvements are possible on data sets that are only moderately larger than memory, which is the case in many practical scenarios.