IBM - something in the works? - General Search Engine Marketing Issues forum at WebmasterWorld

"...HyperText Markup Language (HTML) has been the standard format for delivering information on the World Wide Web (WWW). However, HTML has only a limited set of tags for specifying document structures, and these tags are mainly for the purposes of browser presentation. Automated information processing on these documents for data exchange and interoperability has been difficult. Extensible Markup Language (XML), which is a subset of Standard Generalized Markup Language (SGML), has been proposed to be the next standard format that allows user-defined tags for better describing nested document structures and associated semantics...."

"....While HTML documents serve very well for Web browsing, automated information processing on them could be difficult, because there are few semantics associated with the documents. For example, without human understanding or a sophisticated program, it is difficult to know what a number "1991" means in an HTML document; it could be a year, a quantity, or anything. Just as in a programming language, program semantics are defined by a standardized set of keywords. HTML has a limited set of keywords (i.e., tags) and they are mainly for presentation purposes, not for semantics associated with document contents...."

"...there is clearly a need for a search engine that understands document structures and allows a user to ask structured queries. Current search engines either flatten out the structure of a document (i.e., remove nested structures), or have limited, predefined structures (such as paragraphs and sentences, according to some predefined punctuation marks), and thus are not capable of evaluating general ad hoc structured queries. Structured documents also enable comparisons among numeric values, for example, to get the references published after year 1991 from a structured paper (which is not possible with an inverted file based search engine)..."
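To make the "references published after year 1991" example concrete, here is a rough sketch of that kind of structured query. It is only an illustration, not the method described in the patent: the sample document, the tag names (`paper`, `reference`, `year`, `author`) and the helper `references_after` are all made up for this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured paper; the tag names are assumptions for illustration.
PAPER_XML = """
<paper>
  <title>Indexing Structured Documents</title>
  <references>
    <reference><author>Smith</author><year>1989</year></reference>
    <reference><author>Jones</author><year>1991</year></reference>
    <reference><author>Lee</author><year>1994</year></reference>
    <reference><author>Kim</author><year>1997</year></reference>
  </references>
</paper>
"""


def references_after(xml_text, year):
    """Return authors of references whose <year> is strictly greater than `year`."""
    root = ET.fromstring(xml_text)
    authors = []
    for ref in root.iter("reference"):
        # Because the number sits inside <year>, we know it is a year
        # and can compare it numerically rather than as a keyword.
        ref_year = int(ref.findtext("year"))
        if ref_year > year:
            authors.append(ref.findtext("author"))
    return authors


print(references_after(PAPER_XML, 1991))  # ['Lee', 'Kim']
```

A plain keyword index could tell you which documents contain the token "1991", but not which references fall after it, which seems to be the gap the excerpt is pointing at.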

"...A successful search engine for a large repository of structured documents relies on good indexing schemes. Therefore, there is a need in the art for designing indexes that support structured queries and execute the queries without resorting to the structured documents..."

Method and apparatus for creating an index in a database system [164.195.100.11]
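For anyone wondering what an index that answers structured queries "without resorting to the structured documents" might look like, here is a toy sketch. It is a guess at the general idea, not the scheme claimed in the patent: the path-keyed index, the function names `build_index` and `docs_with_value_after`, and the sample documents are all assumptions.

```python
import bisect
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(docs):
    """Map each element path to a sorted list of (numeric value, doc id)."""
    index = defaultdict(list)

    def walk(elem, path, doc_id):
        path = f"{path}/{elem.tag}"
        text = (elem.text or "").strip()
        if text.isdigit():
            index[path].append((int(text), doc_id))
        for child in elem:
            walk(child, path, doc_id)

    for doc_id, xml_text in docs.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    for postings in index.values():
        postings.sort()
    return index


def docs_with_value_after(index, path, threshold):
    """Answer 'value at path > threshold' from the index alone."""
    postings = index.get(path, [])
    # (threshold + 1,) sorts before every (threshold + 1, doc_id) entry and
    # after every entry with a smaller value, so this finds the first hit.
    start = bisect.bisect_left(postings, (threshold + 1,))
    return sorted({doc_id for _, doc_id in postings[start:]})


# Hypothetical repository of structured papers.
DOCS = {
    "paper-a": "<paper><references><reference><year>1989</year></reference>"
               "<reference><year>1994</year></reference></references></paper>",
    "paper-b": "<paper><references><reference><year>1990</year></reference>"
               "</references></paper>",
    "paper-c": "<paper><references><reference><year>1997</year></reference>"
               "</references></paper>",
}

INDEX = build_index(DOCS)
print(docs_with_value_after(INDEX, "/paper/references/reference/year", 1991))
# ['paper-a', 'paper-c']
```

Once the index is built, the query only touches the sorted postings for one path, so the original XML never has to be re-parsed at query time.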

"...It is often difficult for a user to locate desirable information resources, or web pages and locating a pertinent resource can consume a substantial amount of time. Locating an information resource is typically done by keyword searching. Keyword searching is accomplished when a user provides a keyword and instructs the client via a server to search for information resources having the keyword or information resources linked to the keyword. Typically, the user receives voluminous information from the Internet when a keyword search is performed. A single retrieval can provide links to a considerable quantity of web sites. Next, the user must sort through the received information for desirable data...."

"...Electronic information transferred between data processing networks is usually presented in "hypertext", a metaphor for presenting information in a manner in which text, images, sounds, and actions become linked together in a complex non-sequential "web" of associations. The web of associates permit a user to "browse" or "navigate" through related topics, regardless of the presented order of the topics. Such links are often established by both the author of a hypertext document, and by the user depending on the intent of the hypertext document...."

"...For example, traveling among links to the word "iron" in an article displayed within a graphical user interface, in a data processing system, might lead the user to the periodic table of the chemical elements (i.e., linked by the word "iron"), or to a reference to the utilization of iron in weapons in Europe in the Dark Ages..."

Method for extracting hyperlinks from a display document and automatically retrieving and displaying multiple subordinate documents of the display document [164.195.100.11]
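The first half of that title, pulling the hyperlinks out of a displayed document, can be sketched with the Python standard library. This is only an illustration of link extraction in general, not the patented method, and it leaves out the retrieval and display of the subordinate documents. The class `LinkExtractor`, the helper `extract_links`, the sample page and the example.com URLs are all made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones so they could be fetched.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    return parser.links


# Hypothetical page echoing the "iron" example from the excerpt.
PAGE = (
    '<p>The element <a href="/periodic-table#Fe">iron</a> was used in '
    '<a href="http://example.com/weapons">weapons</a>.</p>'
)

print(extract_links(PAGE, "http://example.com/article"))
# ['http://example.com/periodic-table#Fe', 'http://example.com/weapons']
```

Each URL in the returned list would be a candidate "subordinate document" to fetch ahead of time, if I am reading the title correctly.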


msgraph

Brett_Tabke
