Vertical Search Engine Integrating Text and Image Analysis
The iLike image search method analyzes text-based and content-based image features to deliver results that match the true intention of a user's search query.
When searching for multimedia content online or in large-scale repositories (e.g., the Library of Congress Prints and Photographs Catalog), results are typically retrieved using text-based searching of surrounding text or manually annotated metadata. Content-based image retrieval (CBIR) has been intensively studied in the research community, but presents a challenging problem in real-world applications. This is primarily due to the "semantic gap" between low-level visual features and high-level content (i.e., when comparing multiple images, visual feature similarities are not necessarily correlated with content similarities). The iLike method has been developed to bridge this gap for "vertical search" applications that focus on visual content.
The iLike method has proven effective in one particular application: a product search engine for apparel and accessories. This makes it well suited to e-commerce, where shoppers often describe visual attributes in words. The same method could also improve search results in photo albums or other image collections.
How it works:
The iLike system comprises three major components: the Crawler, the (Pre-)Processor, and the Search and UI component. The Crawler fetches product pages from retailer websites. A customized parser extracts item descriptions and generates the term dictionary and inverted index. Simultaneously, the image processor extracts visual features from item images. Next, the method integrates textual and visual features in a reweighting scheme and constructs a visual thesaurus for each text term. Finally, the UI component provides a query interface and browsing views of search results.
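The indexing and reweighting steps can be illustrated with a minimal sketch. It is not the iLike implementation: the toy catalog, the three feature names, and the variance-ratio weighting rule are all assumptions chosen for illustration. The idea shown is only that a visual feature which varies little among items sharing a term, but varies a lot across the whole collection, is treated as significant for that term.

```python
from collections import defaultdict
from statistics import pvariance

# Hypothetical toy catalog: item id -> (description, visual feature vector).
# The feature dimensions are made up: [redness, stripe_strength, brightness].
ITEMS = {
    "a": ("red striped shirt", [0.9, 0.8, 0.5]),
    "b": ("red plain dress", [0.8, 0.1, 0.3]),
    "c": ("blue striped shirt", [0.1, 0.9, 0.6]),
    "d": ("blue plain skirt", [0.2, 0.0, 0.4]),
}


def build_inverted_index(items):
    """Map each description term to the set of item ids that contain it."""
    index = defaultdict(set)
    for item_id, (description, _features) in items.items():
        for term in description.lower().split():
            index[term].add(item_id)
    return index


def term_feature_weights(term, index, items, eps=1e-6):
    """Assumed weighting rule: global variance / within-term variance,
    normalized so the weights sum to 1."""
    all_vecs = [vec for _desc, vec in items.values()]
    term_vecs = [items[item_id][1] for item_id in sorted(index[term])]
    raw = []
    for d in range(len(all_vecs[0])):
        global_var = pvariance([v[d] for v in all_vecs])
        term_var = pvariance([v[d] for v in term_vecs])
        raw.append(global_var / (term_var + eps))
    total = sum(raw)
    return [r / total for r in raw]


def weighted_distance(u, v, weights):
    """Euclidean distance with per-feature weights applied."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, u, v)) ** 0.5


index = build_inverted_index(ITEMS)
red_weights = term_feature_weights("red", index, ITEMS)
striped_weights = term_feature_weights("striped", index, ITEMS)
```

In this toy data, the query term "red" puts most of its weight on the redness dimension and "striped" on the stripe dimension, so a similarity ranking computed with `weighted_distance` would emphasize the visual attribute the term actually describes.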
iLike bridges the semantic gap by capturing the meaning of each text term in the visual feature space and reweighting visual features according to their significance to the query terms. It also bridges the user intention gap because it can infer the "visual meanings" behind textual queries. Finally, it provides a visual thesaurus, which is generated from the statistical similarity between the visual space representations of textual terms.
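A visual thesaurus of this kind can be sketched as a nearest-neighbor lookup over per-term vectors. The term vectors below are invented, and cosine similarity is an assumed stand-in for whatever statistical similarity measure iLike actually uses; the sketch only shows how terms with similar visual representations end up listed as related.

```python
import math

# Hypothetical representations of text terms in a 3-dimensional visual
# feature space (values are illustrative, not taken from iLike).
TERM_VECTORS = {
    "crimson": [0.90, 0.05, 0.05],
    "red": [0.85, 0.10, 0.05],
    "striped": [0.05, 0.90, 0.05],
    "pinstripe": [0.10, 0.80, 0.10],
    "bright": [0.10, 0.10, 0.80],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def visual_thesaurus(term_vectors, top_k=1):
    """For each term, list the top_k other terms closest in visual space."""
    thesaurus = {}
    for term, vec in term_vectors.items():
        scored = [
            (cosine(vec, other_vec), other)
            for other, other_vec in term_vectors.items()
            if other != term
        ]
        scored.sort(reverse=True)
        thesaurus[term] = [other for _score, other in scored[:top_k]]
    return thesaurus


thesaurus = visual_thesaurus(TERM_VECTORS)
```

With these made-up vectors, "red" maps to "crimson" and "striped" maps to "pinstripe", even though the words share no textual similarity.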
Why it is better:
The iLike method integrates text and visual features to improve image retrieval performance. Experimental results show that this approach improves both precision and recall compared with purely content-based or purely text-based image retrieval techniques. More importantly, search results from iLike are more consistent with users' perception of the query terms.