Published on Nov 30, 2023
The availability of web search has revolutionized the way people discover information, yet as search services maintain larger and larger indexes they are in danger of becoming victims of their own success.
Many common searches return a vast number of web pages, many of them irrelevant to the searcher, and only the top ten or twenty ranked results are typically browsed.
The problem is that while pages returned by a search may be relevant to the keywords entered, the keywords generally give only a partial expression of the searcher's information need.
Personalized web search takes keywords from the user as an expression of their information need, and delivers results from a personalized indexing server belonging to their community or organization, which assists in determining the relevance of pages.
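The idea of letting a community index inform relevance can be illustrated with a minimal sketch. The text does not specify how the personalized indexing server influences ranking, so everything here is an assumption made for illustration: the function name `rerank`, the `community_scores` mapping, the linear blending formula, and the example URLs are all hypothetical.

```python
def rerank(results, community_scores, weight=0.5):
    """Re-rank results by blending a base score with a community score.

    results: list of (url, base_score) pairs, base_score in [0, 1].
    community_scores: dict mapping url -> score in [0, 1], standing in for
        what a community's personalized index might report (hypothetical).
    weight: how much the community score counts (assumed linear blend).
    """
    def combined(item):
        url, base = item
        # Pages unknown to the community index get a community score of 0.
        return (1 - weight) * base + weight * community_scores.get(url, 0.0)

    return sorted(results, key=combined, reverse=True)


# Hypothetical keyword-search results and community scores.
results = [("a.example/x", 0.9), ("b.example/y", 0.8), ("c.example/z", 0.7)]
community = {"c.example/z": 1.0}

for url, base in rerank(results, community):
    print(url, base)
```

With these made-up numbers the page known to the community moves to the top even though its keyword score was the lowest, which is the effect a personalized index is meant to have.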
The vast majority of publicly available search engines adopt a so-called query-list paradigm, whereby in response to a user's query the search engine returns a linear ranking of documents matching that query.
The higher on the list, the more relevant to the query the document is supposed to be. While this approach works efficiently for well-defined narrow queries, when the query is too general, the users may have to sift through a large number of irrelevant documents in order to identify the ones they were interested in.
This kind of situation is commonly referred to as a low precision search.
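Precision can be made concrete with a small calculation: it is the fraction of the returned documents that are actually relevant to the searcher. The sketch below computes it over the top of a ranked list; the function name `precision_at_k`, the document identifiers, and the relevance judgments are illustrative assumptions, not data from any real search.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Divide by the number of results actually inspected, which may be
    # smaller than k when the result list is short.
    return hits / len(top_k)


# Hypothetical ranking for a very general query: ten results,
# only two of which the searcher actually wanted.
ranked = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant = {"d3", "d8"}

print(precision_at_k(ranked, relevant, 10))  # 0.2
```

A value this low means the user must read through eight irrelevant results to find the two useful ones, which is the situation described above.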
More than 60% of web queries consist of one or two words, which inevitably leads to a large number of low precision searches. Several methods of dealing with the results of such searches have been proposed.
One method is pruning of the result list, ranging from simple duplicate removal to advanced Artificial Intelligence algorithms. The most common approach, however, is relevance feedback, whereby the search engine assists the user in finding additional keywords that would make the query more precise and reduce the number of returned documents. An alternative and increasingly popular method is search results clustering.
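As an illustration of the relevance feedback idea, the sketch below suggests additional keywords by counting which terms occur in the most result snippets for the original query. This is a deliberately naive frequency-based heuristic chosen for brevity, not the method of any particular search engine; the function name `suggest_terms`, the stopword list, and the example snippets are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is", "on"}


def suggest_terms(query, snippets, n=3):
    """Suggest up to n extra keywords drawn from result snippets."""
    query_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    counts = Counter()
    for snippet in snippets:
        # Count each term once per snippet, i.e. its document frequency.
        terms = set(re.findall(r"[a-z0-9]+", snippet.lower()))
        counts.update(terms - query_terms - STOPWORDS)
    # Most frequent first; ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


# Hypothetical snippets returned for the ambiguous query "jaguar".
snippets = [
    "Jaguar car dealership and service",
    "Jaguar the big cat of the Amazon",
    "Used Jaguar car prices",
]

print(suggest_terms("jaguar", snippets, n=1))  # ['car']
```

Adding a suggested term such as "car" to the one-word query narrows it to one sense of the word, which is how relevance feedback reduces the number of returned documents.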
PHP
Apache HTTP server 2.2
1 GB RAM
60 GB Hard Disk