Object Based Internet Search
Today's Internet search engines compute their centralized index by crawling
web content. This approach has two major problems. First, large and relevant
parts of Internet content are not reachable by crawling and thus remain
inaccessible to search engines. Second, bandwidth and its growth impose harsh
limits on how current a central index can be and on the share of the rapidly
growing body of available information that it can cover.
The obvious solution is a distributed approach to information retrieval that
makes better use of the available bandwidth in order to achieve higher index
currency and improved coverage, including deep web content. Forward knowledge,
such as keyword indices, has to be stored closer to the searchable information
sources than it is in the current central index approach. Furthermore, it has
to be updated in a more bandwidth-efficient manner than the change detection
heuristics and "brute force" crawling methods used today allow.
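
As an illustration only, the following minimal Python sketch shows one way a
keyword index could live next to the content and report only the postings that
changed when a document is updated, rather than having the whole document
fetched again. The class name ProviderIndex, its methods, and the shape of the
returned delta are assumptions made for this example; they are not taken from
the prototype or the white paper.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric keywords."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ProviderIndex:
    """Hypothetical keyword index kept on the content provider's side."""

    def __init__(self):
        self.postings = defaultdict(set)  # keyword -> ids of documents containing it
        self.doc_terms = {}               # doc id -> keywords currently indexed

    def update(self, doc_id, text):
        """Index the new text and return only the postings that changed."""
        new_terms = tokenize(text)
        old_terms = self.doc_terms.get(doc_id, set())
        added = new_terms - old_terms
        removed = old_terms - new_terms
        for term in added:
            self.postings[term].add(doc_id)
        for term in removed:
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]
        self.doc_terms[doc_id] = new_terms
        # The delta is what would travel over the network in this sketch.
        return {"doc": doc_id, "added": sorted(added), "removed": sorted(removed)}

    def search(self, query):
        """Return ids of documents containing every keyword of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        results = [self.postings.get(term, set()) for term in terms]
        return set.intersection(*results)


index = ProviderIndex()
index.update("item/17", "red bicycle in stock")
delta = index.update("item/17", "red bicycle sold out")
print(delta)
```

In this sketch the second update yields a delta that names only the keywords
"out" and "sold" as added and "in" and "stock" as removed, so the unchanged
keywords cause no traffic. How such deltas would be transported, aggregated,
or ranked is left open here.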
Collected metrics show that the amount of publicly accessible information on
the Internet grows much faster than the available Internet backbone bandwidth.
In particular, the amount of application-generated content grows especially
fast. Crawling-based technologies for global Internet search will not be able
to scale with these growth rates. Furthermore, much of the
application-generated content is hidden in the "deep web", out of reach of
crawlers.
The solution lies in reversing the paradigm of Internet search. Content
providers will have to contribute to the searchability of their information
space, thus making search more bandwidth-efficient and making deep web content
accessible to search.
A prototype showing how this can be done has been brought online
together with a white paper explaining the most important concepts. It combines
best-of-breed techniques from the field of information retrieval with an
architectural approach to designing searchable online applications.
(Cooperation: Interactive Objects Freiburg)
Contact: Axel Uhl