2010

Authors

  • Pavel Kalinov

The hyperlinked part of the internet known as "the Web" arose without much planning for a future of millions of publishers and countless pieces of online content. It has no built-in mechanism for finding anything, so tools external to it were introduced: initially web directories and then search engines. Search engines are based on machine learning and have been extremely successful. However, they have some inherent limitations and cannot, by design, address some needs: they serve the "information locating" need only and not "information discovery". Search engine users have learned to accept these limitations and in many cases do not realise how their search has been limited by shortcomings of the model. Before the advent of the search engine, web directories were the only information-finding tool on the web. They were manually built and could not compete economically with the efficiency of search engines. This led to their virtual extinction, with the effect that the "information discovery" need of users is no longer served by any major information provider. Furthermore, none of the dominant information-finding models accounts for the person of the user in any meaningful way controllable by (or even visible to) the user. This work proposes a method to combine a search engine, a web directory and a personal information management agent into an intelligent Web Exploration Engine in a way which bridges the gaps between these seemingly unrelated tools. Our hybrid, for which we have developed a proof-of-concept prototype [Kalinov et al., 2010b], allows users both to locate specific data and to discover new information. Information discovery is served by a web directory which is built with the assistance of a dynamic hierarchical classifier we developed [Kalinov et al., 2010a]. The category structure produced by this classifier is also the basis of a large number of nested search engines, allowing information locating both in general (similar to a "standard" search engine) and in a variety of contexts selectable by the user.
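The idea of nested search engines over a category structure can be illustrated with a minimal sketch. It is not the prototype described in this work: the `Category` class, its methods and the sample documents are all hypothetical, and naive keyword matching stands in for a real search engine. It only shows how the same query can be run over the whole directory or within a context chosen by the user.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node of a hypothetical directory tree (all names are illustrative)."""
    name: str
    documents: list[str] = field(default_factory=list)
    children: list["Category"] = field(default_factory=list)

    def iter_documents(self):
        """Yield every document filed in this category or any subcategory."""
        yield from self.documents
        for child in self.children:
            yield from child.iter_documents()

    def find(self, path):
        """Follow a list of category names down the tree and return that node."""
        node = self
        for name in path:
            # Assumes the path exists; a missing name raises StopIteration.
            node = next(c for c in node.children if c.name == name)
        return node

    def search(self, query):
        """Naive keyword match restricted to this category's subtree."""
        terms = query.lower().split()
        return [d for d in self.iter_documents()
                if all(t in d.lower() for t in terms)]


root = Category("Top", children=[
    Category("Science", documents=["Python snake biology"], children=[
        Category("Computing", documents=["Python programming tutorial"]),
    ]),
    Category("Travel", documents=["Python spotting tours in Australia"]),
])

# General search: the root category covers the whole directory.
general = root.search("python")

# Contextual search: the same query, limited to one user-selected category.
scoped = root.find(["Science", "Computing"]).search("python")
```

In this sketch every category acts as its own small search engine over its subtree, so narrowing the context is just a matter of choosing a node lower in the hierarchy.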