Fast First-Phase Candidate Generation for Cascading Rankers.
Qi Wang, Constantinos Dimopoulos and Torsten Suel.
39th ACM Int. Conference on Research and Development in Information Retrieval (SIGIR'16).
Current search engines use very complex ranking functions
based on hundreds of features. While such functions return
high-quality results, they create efficiency challenges as it is
too costly to fully evaluate them on all documents in the
union, or even intersection, of the query terms. To address
this issue, search engines use a series of cascading rankers,
starting with a very simple ranking function and then ap-
plying increasingly complex and expensive ranking functions
on smaller and smaller sets of candidate results. Researchers
have recently started studying several problems within this
framework of query processing by cascading rankers.
We focus on one such problem, the design of the initial
cascade. Thus, the goal is to very quickly identify a set of
good candidate documents that should be passed to the sec-
ond and further cascades. Our contribution
is to propose a framework that builds specialized single-term
and pairwise index structures, and then during query time
selectively accesses these structures based on a cost budget
and a set of early termination techniques. Using an end-
to-end evaluation with a complex machine-learned ranker,
we show that our approach finds candidates about an order
of magnitude faster than a conjunctive top-k computation,
while essentially matching the quality.
A Candidate Filtering Mechanism for Fast Top-K Query Processing on Modern CPUs.
Constantinos Dimopoulos, Sergey Nepomnyachiy and Torsten Suel.
36th ACM Int. Conference on Research and Development in Information Retrieval (SIGIR'13).
A large amount of research has focused on faster methods for finding top-k results in large document collections, one of the main scalability challenges for web search engines. In this paper, we propose a method for accelerating such top-k queries that builds on and generalizes methods recently proposed by several groups of researchers based on Block-Max Indexes. In particular, we describe a system that uses a new filtering mechanism, based on a combination of block maxima and bitmaps, that radically reduces the number of documents that have to be further evaluated. Our filtering mechanism exploits the SIMD processing capabilities of current microprocessors, and it is optimized through caching policies that select and store suitable filter structures based on properties of the query load. Our experimental evaluation shows that the mechanism results in very significant speed-ups for disjunctive top-k queries under several state-of-the-art algorithms, including a speed-up of more than a factor of 2 over the fastest previously known methods.
Optimizing Top-k Document Retrieval Strategies for Block-Max Indexes.
Constantinos Dimopoulos, Sergey Nepomnyachiy and Torsten Suel.
6th ACM Int. Conference on Web Search and Data Mining (WSDM'13).
Large web search engines use significant hardware and energy resources to process hundreds of millions of queries each day, and a lot of research has focused on how to improve query processing efficiency. One general class of optimizations called early termination techniques is used in all major engines, and essentially involves computing top results without an exhaustive traversal and scoring of all potentially relevant index entries. Recent work in proposed several early termination algorithms for disjunctive top-k query processing, based on a new augmented index structure called Block-Max Index that enables aggressive skipping in the index.
In this paper, we build on this work by studying new algorithms and optimizations for Block-Max indexes that achieve significant performance gains over the previous work. We start by implementing and comparing Block-Max oriented algorithms based on the well-known Maxscore and WAND approaches. Then we study how to build better Block-Max index structures and design better index-traversal strategies, resulting in new algorithms that achieve a factor of 2 speed-up over the previous best results with acceptable space overheads. We also describe and evaluate a hierarchical algorithm for a new recursive Block-Max index structure.
Text vs. Space: Efficient Geo-Search Query Processing.
Maria Christoforaki, Jinru He, Constantinos Dimopoulos
Alexander Markowetz and Torsten Suel.
20th ACM Int. Conference on Information and Knowledge Management (CIKM'11).
Many web search services allow users to constrain text queries to a geographic location (e.g., yoga classes near Santa Monica). Important examples include local search engines such as Google Local and location-based search services for smart phones. Several research groups have studied the efficient execution of queries mixing text and geography; their approaches usually combine inverted lists with a spatial access method such as an R-tree or space-filling curve. In this paper, we take a fresh look at this problem. We feel that previous work has often focused on the spatial aspect at the expense of performance considerations in text processing, such as inverted index access, compression, and caching. We describe new and existing approaches and discuss their different perspectives. We then compare their performance in extensive experiments on large document collections. Our results indicate that a query processor that combines state-of-the-art text processing techniques with a simple coarse-grained spatial structure can outperform existing approaches by up to two orders of magnitude. In fact, even a naive approach that first uses a simple inverted index and then filters out any documents outside the query range outperforms many previous methods.
A Web Page Usage Prediction Scheme using Sequence Indexing and Clustering Techniques.
Constantinos Dimopoulos, Christos Makris, Yannis Panagis,
Evangelos Theodoridis and
Data & Knowledge Engineering (DKE'10).
In this paper we consider the problem of web page usage prediction in a web site by modeling users navigation history and web page content with weighted suffix trees. This user's navigation prediction can be exploited either in an on-line recommendation system in a web site or in a web page cache system. The method proposed has the advantage that it demands a constant amount of computational effort per one user's action and consumes a relatively small amount of extra memory space. These features make the method ideal for an on-line working environment. Finally, we have performed an evaluation of the proposed scheme with experiments on various web site log files and web pages and we have found that its quality performance is fairly well and in many cases an outperforming one.
Quasi-complete list of publications
Personalized URL recommendation system.
Constantinos Dimopoulos, Ratan Dey and Yigit Kiran.
We developed a personalized URL recommendation system in Online Social Networks using vector space model and supervised text classification machine learning techniques. The real time URL recommendation system utilized public tweets, friends tweets, web search results and user's personal preferences.