
Search Engine Spider and User Agent Identification Forum

    
How to identify and track down spiders
Brett_Tabke




msg:405985
 6:44 pm on Jun 2, 2000 (gmt 0)

I get about one question every day about how to track down spider owners. Here is my method:

1) Find the IP address.
2) Do a trace route back to the host. On Windows, go to Start > Run and type tracert ipaddress, where ipaddress is the IP address of the spider.
3) Let the trace finish. Working back from the end of the trace, find the last host you can identify. This is often the tricky step: deciding which was the last real host. Start at the bottom and work up. Usually a host will show 2-3 boxes in a row, and you can work out the real host name from them by guessing.
4) Take the host name and try finding it in the browser with some standard incantations such as www.host.com or www.host.net. Often that may be all you need.
5) Look up the host on an InterNIC whois. Often that can lead you straight to the owner/domain.
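The five steps above can be partly scripted. Here is a minimal Python sketch, not Brett's own method. It assumes a Unix traceroute or Windows tracert is on the PATH, approximates the step 3 guess as "the last hop that resolved to a host name", naively takes the last two labels of the host name as the domain (wrong for names like .co.uk), and assumes whois.internic.net still answers plain WHOIS queries on TCP port 43. All function names are hypothetical.

```python
import os
import re
import socket
import subprocess
from typing import List, Optional

# A hop line starts with its hop number (this skips the "traceroute to" / "Tracing route" header).
HOP_LINE = re.compile(r"^\s*\d+\s")
# A resolved hop: a dotted host name followed by its IP in (...) (Unix) or [...] (Windows).
NAMED_HOP = re.compile(
    r"([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)\s+[(\[](\d{1,3}(?:\.\d{1,3}){3})[)\]]"
)
BARE_IP = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def run_trace(ip: str) -> str:
    """Step 2: run tracert on Windows, traceroute elsewhere; return the text output."""
    cmd = ["tracert", ip] if os.name == "nt" else ["traceroute", ip]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)
    return result.stdout


def last_named_hop(trace_output: str) -> Optional[str]:
    """Step 3 (approximation): the last hop that resolved to a name, or None.

    Hops shown only as bare IPs or '* * *' timeouts are skipped, which is a
    rough stand-in for 'start at the bottom and work up' to the last real host.
    """
    last = None
    for line in trace_output.splitlines():
        if not HOP_LINE.match(line):
            continue
        m = NAMED_HOP.search(line)
        if m and not BARE_IP.fullmatch(m.group(1)):
            last = m.group(1).lower()
    return last


def registered_domain(hostname: str) -> str:
    """Naive guess: keep the last two labels of the host name."""
    return ".".join(hostname.lower().split(".")[-2:])


def candidate_urls(hostname: str) -> List[str]:
    """Step 4: the 'www.host.com or .net' incantations to try in a browser."""
    name = hostname.lower().split(".")[-2]  # hostnames from last_named_hop always contain a dot
    return [f"http://www.{name}.com/", f"http://www.{name}.net/"]


def whois_query(domain: str, server: str = "whois.internic.net") -> str:
    """Step 5: plain WHOIS lookup (RFC 3912): send the query plus CRLF, read until close."""
    with socket.create_connection((server, 43), timeout=15) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Typical use: host = last_named_hop(run_trace("192.0.2.28")), then try candidate_urls(host) in a browser and read whois_query(registered_domain(host)). The address 192.0.2.28 is a documentation placeholder, not a real spider. When the trace dies early (as in the next post), last_named_hop simply returns the last name it saw, or None.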

You can identify about 50% of them with this system. Most often you'll run into 'joe user' running a spider, and with those it is hard to know just who or what it was. If the spider was abusive, keep your logs and contact the admin of the host.

Most of the better ISPs will take a moment to look into it; it may be someone who is routinely abusive, and they need more information to identify them.
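For the "keep your logs" advice, here is a short sketch that pulls one IP's requests out of a web server access log, so you have concrete evidence to send to the host's admin. It assumes Apache/NCSA combined log format (plain common format also parses); the function name and sample data are hypothetical.

```python
import re
from collections import Counter

# Apache/NCSA "combined" format (assumed); referer and user agent are optional
# so plain "common" format lines still parse.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+(?: "[^"]*" "(?P<agent>[^"]*)")?'
)


def evidence_for_ip(lines, ip):
    """Collect every log line from `ip` plus a summary, or None if it never appears."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("ip") == ip:
            hits.append(m)
    if not hits:
        return None
    return {
        "requests": len(hits),
        "first_seen": hits[0].group("time"),  # file order, usually chronological
        "last_seen": hits[-1].group("time"),
        "user_agents": Counter(m.group("agent") or "-" for m in hits),
        "statuses": Counter(m.group("status") for m in hits),
        "raw_lines": [m.string.rstrip("\n") for m in hits],  # verbatim copies for the ISP
    }
```

Usage: with open("access.log") as f: report = evidence_for_ip(f, "203.0.113.5"), where the file name and address are placeholders. The raw lines are kept unaltered because an admin will usually want the original log entries, not a summary.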

Anyone else with tips/tricks or comments on id'ing spiders?

 

scott




msg:405986
 2:25 pm on Jun 7, 2000 (gmt 0)

I don't know if it's the network I'm on or what, but I tried tracing this one and it times out on the second hop. It didn't turn up in a whois lookup either. I'm really curious, though, because it's the first spider to crawl my ENTIRE site, start to finish. It got every BL page and all. Maybe someone else has an idea about it:
209.167.50.28

Scott

VAL@Amsterdam




msg:405987
 4:22 pm on Jun 7, 2000 (gmt 0)

Cool, thanks for this great tip!

Air_




msg:405988
 1:11 am on Jun 8, 2000 (gmt 0)

Scott,

that IP belongs to http://www.seventwentyfour.com/
it looks like they have a link rot service ....

fantomaster




msg:405989
 2:35 am on Jun 8, 2000 (gmt 0)

it looks like they have a link rot service

Link rot's the word! LOL

scott




msg:405990
 1:07 pm on Jun 8, 2000 (gmt 0)

Disappointed but relieved, I guess. That was driving me nuts. I did get an email from them this morning saying there was a broken link on one of the BL pages. I forget which one now; the email is on my PC at home. I can send you the broken link tonight if you like. Thanks for solving that one, BTW!

© Webmaster World 1996-2014 all rights reserved