Forum: Search Engine Spider and User Agent Identification
Category: The Search Engine World
Moderator: incrediBILL & Ocean10000
Previous Moderator: volatilegx (founding moderator: littleman)
Founded: Nov 2, 1999
Spiders are small, independent programs that go out and download websites. They take the website data (the same data that is viewed in a browser) and use it for various purposes. Our theme here is mainly search engine promotion, so we are mostly concerned with search engine spiders.
Every thread must be approved by a moderator before it is published. Please see the guidelines below for reasons why posts may not be approved. We try to make pre-moderation decisions in a timely manner, but because we are a volunteer staff and are not always available, a decision can take as long as 12-24 hours.
The moderators often edit post titles and may not always send a note to explain. Title edits are made to attract more clicks to your thread, to clarify differences between similar topics, and to help similar discussions appear as clearly non-duplicate to the search engines.
Topics about spiders, spider IPs, spider design, and spider care & feeding are also welcome.
Additionally, some spiders disguise themselves with the default user agents of various programming libraries [webmasterworld.com] or with common browser user agents, so the scope of the forum has expanded to include generic user agent identification and elimination as part of the spider identification process.
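As a rough illustration of spotting library default user agents, the sketch below flags a User-Agent string that starts with the default prefix of a common HTTP library or command-line tool. The prefix list and the function name `is_library_default_ua` are hypothetical choices made for this example. The list is not exhaustive, and version numbers are ignored by matching only the part before the "/".

```python
# Hypothetical, non-exhaustive list of default User-Agent prefixes sent by
# common HTTP libraries and command-line tools when the caller does not set
# its own User-Agent. Matching is done on the lowercased prefix only, so any
# version number after the "/" is ignored.
LIBRARY_UA_PREFIXES = (
    "python-requests/",
    "python-urllib/",
    "curl/",
    "wget/",
    "libwww-perl/",
    "go-http-client/",
    "java/",
)


def is_library_default_ua(user_agent: str) -> bool:
    """Return True if the User-Agent begins with a known library default."""
    ua = user_agent.strip().lower()
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return ua.startswith(LIBRARY_UA_PREFIXES)


if __name__ == "__main__":
    for ua in (
        "python-requests/2.31.0",
        "curl/8.4.0",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    ):
        print(ua, "->", is_library_default_ua(ua))
```

This check only catches clients that leave the library default in place. A spider that sends a common browser user agent passes it unnoticed, which is why user agent checks are usually only one part of spider identification.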
The WebmasterWorld Terms of Service [webmasterworld.com] remain in full effect in this forum.
IP addresses tend to change ownership over time. Unless an IP address is expressly owned by a search engine such as Google or Yahoo, it must therefore be obfuscated in the D block (the last octet) of the address.
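The sketch below shows how the D block of an IPv4 address can be obfuscated mechanically before posting. The function name `mask_d_block` and the default placeholder `"xxx"` are assumptions made for this illustration. The placeholder is a parameter so it can be set to whatever masking format the forum requires. Only IPv4 is handled, and reverse DNS names are not covered.

```python
import ipaddress


def mask_d_block(ip: str, mask: str = "xxx") -> str:
    """Replace the last (D) octet of an IPv4 address with a placeholder.

    The default placeholder "xxx" is a hypothetical choice; pass the
    format the forum actually requires via the `mask` argument.
    """
    # Parse first so malformed input raises ValueError
    # (ipaddress.AddressValueError is a subclass) instead of being masked.
    addr = ipaddress.IPv4Address(ip.strip())
    a, b, c, _d = str(addr).split(".")
    return f"{a}.{b}.{c}.{mask}"


if __name__ == "__main__":
    print(mask_d_block("203.0.113.45"))
    print(mask_d_block("198.51.100.7", mask="*"))
```

Validating the address before masking means a typo such as `300.1.1.1` is rejected outright, so a malformed address is not posted as though it had been masked.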
Any IP address or reverse DNS information not expressly belonging to a search engine should be masked as follows: