Summary of a few papers from SIGIR 2012 - Part I


Here is a short summary of a few papers from SIGIR 2012:

  • Adaptation of the Concept Hierarchy Model with Search Logs for Query Recommendation on Intranets by Ibrahim Adeyanju, Dawei Song, M-Dyaa Albakour, Udo Kruschwitz, Anne De Roeck and Maria Fasli. This paper is about improving query suggestions on intranets. It combines a concept hierarchy model with query suggestions based on a query-flow graph. First, the intranet documents are hierarchically clustered to build a concept hierarchy. Then, for a given query, candidate suggestions are drawn from the concept hierarchy, and query logs are used to adapt those candidates based on past user clicks. More details about the overall project and related papers are here: http://autoadapt.essex.ac.uk/tiki/tikiwiki-3.0/tiki-index.php
  • An Exploration of Ranking Heuristics in Mobile Local Search by Yuanhua Lv, Dimitrios Lymberopoulos, Qiang Wu. This paper presents an in-depth analysis of local search features, such as the user's location, the ratings and number of reviews for a business, and the user's profile and personal preferences, and of how each of these features affects the click-rate on results. It also discusses incorporating the category of a business (as used by local search engines such as Yelp, Google local, and Bing local) into result ranking. The paper describes a machine learning approach that combines these signals to predict click-rate.
  • Detecting Quilted Web Pages at Scale by Marc Najork. Web spam detection is a serious issue in improving the quality of search results. This paper presents an algorithm for detecting 'quilted' web pages (web pages that are stitched together by combining content from other web pages). The algorithm takes a corpus of web pages as input and outputs a set of quilted web pages along with the source pages used in those quilted web pages. The algorithm first extracts patch grams by finding k-grams that are not too popular (they occur in at most m web pages) and occur in at least one web page. Then, for each document, the algorithm finds its patch grams and the source documents (other than the document under consideration) that contain those patch grams.
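
To make the patch-gram idea from the last paper more concrete, here is a rough sketch in Python. It is only my reading of the summary above, not the paper's actual implementation: the function names, the word-level k-grams, the default values of k and m, and the `min_sources` rule for deciding that a page is quilted are all assumptions made for illustration.

```python
from collections import defaultdict


def kgrams(text, k):
    """Return the set of word-level k-grams (as tuples) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def find_quilted_pages(corpus, k=3, m=2, min_sources=2):
    """corpus: dict mapping page id -> page text.

    Returns a dict mapping each suspected quilted page to the set of
    other pages that share at least one patch gram with it.
    """
    # Index every k-gram by the pages that contain it.
    pages_by_gram = defaultdict(set)
    for page_id, text in corpus.items():
        for gram in kgrams(text, k):
            pages_by_gram[gram].add(page_id)

    # Patch grams: k-grams that are not too popular (at most m pages).
    # Every indexed k-gram already occurs in at least one page.
    patch_grams = {g: p for g, p in pages_by_gram.items() if len(p) <= m}

    # For each page, collect the other pages that share a patch gram.
    sources = defaultdict(set)
    for pages in patch_grams.values():
        for page_id in pages:
            sources[page_id] |= pages - {page_id}

    # Assumption: call a page quilted if it draws on several sources.
    return {p: s for p, s in sources.items() if len(s) >= min_sources}


# Hypothetical toy corpus: page "q" is stitched from pages "a" and "b".
corpus = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "a stitch in time saves nine every single day",
    "q": "the quick brown fox a stitch in time saves",
    "c": "completely unrelated text about search engine ranking",
}
print(find_quilted_pages(corpus))
```

On this toy corpus only page "q" is flagged, with "a" and "b" as its sources. A real system would of course need to do this at web scale, which is the hard part the paper addresses.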

Also, the industry track was very interesting, and I will post a summary soon. I presented on Related Searches at LinkedIn, which I described in an earlier blog post on related searches at LinkedIn.