Internet search techniques: using word count, links and directory structure as internet search tools
Abstract
As the Web grows, it becomes increasingly important to develop ways of maximising the efficiency of the search process and of indexing its contents with minimal human intervention. Current popular search engines that use a centralised index approach are evaluated. Using a number of search terms and metrics that measure similarity between sets of results, it was found that there is very little commonality between the results of the same search performed on different search engines. A semi-automated system for searching the Web, the Internet Search Agent (ISA), is presented; it employs a method for indexing based upon the idea of "fingerprint types". These fingerprint types are based upon the text and links contained in the web pages being indexed. Three examples of fingerprint type are developed: the first concentrates upon the textual content of the indexed files, and the other two augment this with the use of links to and from these files. The three fingerprint types are compared by examining the results returned as a search progresses, in terms of the number of results and measures of their content relative to the effort expended. The ISA model allows the searcher to be presented with results in context and potentially allows for distributed searching to be implemented.
Citation
Moghaddam, M.M. (2005) 'Internet search techniques: using word count, links and directory structure as internet search tools'. PhD thesis. University of Luton.
Publisher
University of Bedfordshire
Type
Thesis or dissertation
Language
en
Description
A thesis submitted for the degree of Doctor of Philosophy of the University of Luton
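The abstract reports that the same search run on different search engines returns sets of results with very little in common, but it does not name the similarity metrics used. As an illustration only, the sketch below assumes one common choice, the Jaccard coefficient (size of the intersection divided by size of the union), applied to the sets of URLs returned by each engine. The engine names, URLs and function names are hypothetical and are not taken from the thesis.

```python
from itertools import combinations


def jaccard(results_a, results_b):
    """Jaccard similarity of two collections of URLs: |A & B| / |A | B|."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # assumption: two empty result sets count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def pairwise_overlap(results_by_engine):
    """Map each unordered pair of engine names to the similarity of their results."""
    return {
        (name_a, name_b): jaccard(results_by_engine[name_a], results_by_engine[name_b])
        for name_a, name_b in combinations(sorted(results_by_engine), 2)
    }


# Hypothetical result lists for one search term (illustrative data only).
sample_results = {
    "engine_a": ["http://example.org/1", "http://example.org/2",
                 "http://example.org/3", "http://example.org/4"],
    "engine_b": ["http://example.org/3", "http://example.org/4",
                 "http://example.org/5", "http://example.org/6"],
    "engine_c": ["http://example.org/7", "http://example.org/8",
                 "http://example.org/1", "http://example.org/9"],
}

if __name__ == "__main__":
    for pair, score in pairwise_overlap(sample_results).items():
        print(pair, round(score, 3))
```

A score near 0 for every pair would correspond to the low commonality the abstract describes. A rank-aware measure could be substituted if the order of results matters.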
Related items
Showing items related by title, author, creator and subject.
-
Knowledge modeling in prior art search
Graf, Erik; Frommholz, Ingo; Lalmas, Mounia; Van Rijsbergen, Keith (Springer, 2010)
This study explores the benefits of integrating knowledge representations in prior art patent retrieval. Key to the introduced approach is the utilization of human judgment available in the form of classifications assigned to patent documents. The paper first outlines in detail how a methodology for the extraction of knowledge from such a hierarchical classification system can be established. Further, potential ways of integrating this knowledge with existing Information Retrieval paradigms in a scalable and flexible manner are investigated. Finally, based on these integration strategies, the effectiveness in terms of recall and precision is evaluated in the context of a prior art search task for European patents. As a result of this evaluation it can be established that in general the proposed knowledge expansion techniques are particularly beneficial to recall and, with respect to optimizing field retrieval settings, further result in significant precision gains.
-
Novel moving target search algorithms for computer gaming
Loh, Peter K. K.; Prakash, Edmond C. (2012-05-11)
-
Probabilistic search with agile UAVs
Waharte, Sonia; Symington, Andrew; Trigoni, Niki (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2010)
Through their ability to rapidly acquire aerial imagery, Unmanned Aerial Vehicles (UAVs) have the potential to aid target search tasks. Many of the core algorithms which are used to plan search tasks use occupancy grid-based representations and are often based on two main assumptions. Firstly, the altitude of the UAV is constant. Secondly, the onboard sensors can measure the entire state of an entire grid cell. Although these assumptions are sufficient for fixed-wing, high speed UAVs, we do not believe that they are appropriate for small, lightweight, low speed and agile UAVs such as quadrotors. These platforms have the ability to change altitude and their low speed means that multiple measurements may easily overlap multiple cells for substantial periods of time. In this paper we extend a framework for probabilistic search based on decision making to incorporate multiple observations of grid cells and changes in UAV altitude. We account for observation areas that completely and partially cover multiple grid cells. We show the resultant impact on a number of simulation examples.