TR-CIS-2002-03 (11/08/2002)
Yen-Yu Chen, Qingqing Gan, Torsten Suel
Abstract
Over the last few years, most major search engines have integrated link-based ranking
techniques in order to provide more accurate search results. One widely known approach
is the Pagerank technique, which forms the basis of the Google ranking scheme, and which
assigns a global importance measure to each page based on the importance of other pages
pointing to it. The main advantage of the Pagerank measure is that it is independent of the
query posed by a user; this means that it can be precomputed and then used to optimize the
layout of the inverted index structure accordingly. However, computing the Pagerank measure
requires implementing an iterative process on a massive graph corresponding to hundreds of
millions of web pages and billions of hyperlinks.
In this paper, we study I/O-efficient techniques to perform this iterative computation. We derive two algorithms for Pagerank based on techniques proposed for out-of-core graph algorithms, and compare them to two existing algorithms proposed by Haveliwala. We also consider the implementation of a recently proposed topic-sensitive version of Pagerank. Our experimental results show that for very large data sets, significant improvements over previous results can be achieved on machines with moderate amounts of memory. On the other hand, only minor improvements are possible on data sets that are only moderately larger than memory, which is the case in many practical scenarios.
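
For readers unfamiliar with the iterative process mentioned above, the following is a minimal in-memory sketch of a standard Pagerank power iteration. It is not one of the I/O-efficient algorithms studied in this report. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the uniform redistribution of rank from pages without outgoing links are all assumptions made for illustration.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Illustrative in-memory Pagerank (hypothetical helper, not from the report).

    out_links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a link target.
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)

    # Start from a uniform distribution over all pages.
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(iterations):
        # Every page receives a base share from the random-jump term.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        dangling = 0.0
        for v in nodes:
            targets = out_links.get(v, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: rank of pages without out-links is spread uniformly.
                dangling += damping * rank[v]
        for v in nodes:
            new_rank[v] += dangling / n
        rank = new_rank
    return rank
```

Each pass reads the whole link structure and updates one rank value per page. On a graph with hundreds of millions of pages, neither the link lists nor the rank vectors necessarily fit in main memory, which is the situation that motivates the out-of-core techniques studied here.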