Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Who.is Bot
iamzippy




msg:4427799
 10:59 am on Mar 11, 2012 (gmt 0)

174.36.196.nnn - - [11/Mar/2012:00:17:03 +0100] "GET / HTTP/1.1" 301 ... www... "-" "Who.is Bot" "-"
174.36.196.nnn - - [11/Mar/2012:00:17:04 +0100] "GET / HTTP/1.1" 200 ... ... "-" "Who.is Bot" "-"

United States Dallas Softlayer Technologies Inc
AS36351, 174.36.0.0/15
Host: 174.36.196.nnn-static.reverse.softlayer.com

Second visit in 6 months from this IP. Ignores robots, follows redirects.

Kinda dull, eh?
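
For anyone who wants to pull hits like these out of their own logs, here is a minimal Python sketch. It assumes the standard Apache "combined" log format; the lines quoted above carry extra, elided fields, so the pattern would need adjusting for that server. The sample line and its byte count are made up for illustration, and `parse_line` is a hypothetical helper name.

```python
import re

# Assumes the standard Apache "combined" log format; the lines quoted in the
# post have extra, elided fields, so treat this pattern as illustrative only.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    return rec

# Hypothetical sample line (192.0.2.10 is a documentation address).
sample = ('192.0.2.10 - - [11/Mar/2012:00:17:03 +0100] '
          '"GET / HTTP/1.1" 301 235 "-" "Who.is Bot"')
record = parse_line(sample)
```

Filtering on `record["agent"]` and `record["status"]` is then enough to list every visit by a given user agent and see whether a 301 was followed up.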

 

incrediBILL




msg:4427800
 11:08 am on Mar 11, 2012 (gmt 0)

Most spider hunters block all of Softlayer and would never notice.

Rather clever bot name as it's also their domain name "who.is"
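
Blocking a whole hosting range comes down to a CIDR membership test. A rough sketch with Python's standard `ipaddress` module follows; the only range used is the 174.36.0.0/15 quoted in the first post, and treating that as all of Softlayer is an assumption, since a host normally announces several ranges. `BLOCKED_NETWORKS` and `is_blocked` are hypothetical names.

```python
import ipaddress

# 174.36.0.0/15 is the range quoted in the first post. Treating it as the
# whole of Softlayer is an assumption; a real blocklist would hold more ranges.
BLOCKED_NETWORKS = [ipaddress.ip_network("174.36.0.0/15")]

def is_blocked(ip_text):
    """True if the address falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # not a parseable IP address
    return any(addr in net for net in BLOCKED_NETWORKS)
```

In practice the same check usually lives in the server or firewall configuration rather than in application code; the sketch only shows the logic.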

iamzippy




msg:4427827
 12:23 pm on Mar 11, 2012 (gmt 0)

Softlayer is PNG here too. This is the first visit from an IP in that range since I blocked it last October. It got routed to a nothing page by a home-brewed WordPress plugin.

From their whois:
Comment: Our motto: Innovate or Die.

Pity it's not a call for a vote.

lucy24




msg:4427925
 9:53 pm on Mar 11, 2012 (gmt 0)

:: insert classic Jack Benny line here ::

Robots that follow redirects tend to make me uneasy. Unless they're www or directory-slash redirects, which don't really count. At least for robotic purposes.

iamzippy




msg:4427940
 10:29 pm on Mar 11, 2012 (gmt 0)

I entirely agree, Lucy24. In this case it was a www redirect. The jury is out for now.

Pray tell, what's that Jack Benny line?

incrediBILL




msg:4427956
 11:51 pm on Mar 11, 2012 (gmt 0)

Having written a robot, I can tell you a redirect is a redirect; they don't know one 301 or 302 from the other.

It's when they start chasing meta redirects and JavaScript redirects that you should get nervous, and I did that too :)

lucy24




msg:4427962
 12:19 am on Mar 12, 2012 (gmt 0)

they don't know one 301 or 302 from the other

But it can be included in the programming, can't it? "IF you're redirected from your target URL to the identical URL plus or minus www THEN follow the redirect, ELSE report back to me."

Search-engine robots definitely distinguish between www redirects and "real" redirects. You can see them hippity-hopping in the logs, making two consecutive requests for the same page: a 301 followed by a 200.
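
The IF/THEN rule above could be sketched roughly like this in Python. It is only an illustration of the idea, not how any particular robot works. It assumes the `Location` value is an absolute URL, and `is_www_only_redirect` and `decide` are hypothetical names.

```python
from urllib.parse import urlsplit

def strip_www(host):
    """Lowercase the host and drop one leading 'www.' label."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host

def is_www_only_redirect(requested, location):
    """True if the two URLs are identical apart from a leading 'www.'."""
    a, b = urlsplit(requested), urlsplit(location)
    same_host = strip_www(a.hostname or "") == strip_www(b.hostname or "")
    # An empty path and "/" are treated as the same resource.
    same_rest = (a.scheme, a.port, a.path or "/", a.query) == \
                (b.scheme, b.port, b.path or "/", b.query)
    return same_host and same_rest

def decide(requested, location):
    """Hypothetical policy hook: follow trivial redirects, report the rest."""
    return "follow" if is_www_only_redirect(requested, location) else "report"
```

A directory-slash redirect would need one more comparison (path plus a trailing "/"), which is left out here to keep the sketch short.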

what's that Jack Benny line?

"Your money or your life!"
"... I'm thinking! I'm thinking!"

incrediBILL




msg:4427986
 2:16 am on Mar 12, 2012 (gmt 0)

You can see them hippity-hopping in the logs, making two consecutive requests for the same page: a 301 followed by a 200.


Um, that's no different from anything else following the same redirect, or any other redirect. Nothing special.

keyplyr




msg:4427995
 4:21 am on Mar 12, 2012 (gmt 0)


Most spider hunters block all of Softlayer and would never notice.

'nuff said

lucy24




msg:4428014
 5:28 am on Mar 12, 2012 (gmt 0)

I went off to investigate a few random logs and instead found another goofy UA from a blocked IP (China).

Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 4.0) Opera 7.0 [en]

Uh... Make up your mind, willya? Come to think of it, didn't someone post about this variant recently?
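
One cheap way to flag strings like that one is to count how many different MSIE versions a single UA claims. The Python sketch below is a heuristic only: Opera of that era did send "MSIE" tokens while identifying as Opera, so the Opera part is not treated as suspicious here, and some genuine but mangled browser strings may also trip the check. The function names are hypothetical.

```python
import re

MSIE_TOKEN = re.compile(r"MSIE (\d+(?:\.\d+)?)")

def msie_versions(user_agent):
    """Return the distinct MSIE versions claimed, in order of appearance."""
    seen = []
    for version in MSIE_TOKEN.findall(user_agent):
        if version not in seen:
            seen.append(version)
    return seen

def looks_self_contradictory(user_agent):
    """Heuristic: more than one distinct MSIE version is suspicious."""
    return len(msie_versions(user_agent)) > 1

goofy = ("Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 4.0) "
         "Opera 7.0 [en]")
```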

Anyway, quick eyeballing suggests that what I get the most of is not 301-plus-200-- except Yandex, which can't get it into its head that it's with www-- but 301-plus-403.

The ones that intrigue me are the ones that ask for robots.txt at the wrong address, get redirected, and never come back. Makes it seem as if they never really wanted it in the first place, doesn't it?
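
The eyeballing described above (301-plus-200, 301-plus-403, or a 301 and then nothing) can be roughed out in Python as follows. It assumes the log has already been reduced to time-ordered `(ip, path, status)` tuples, matches follow-ups by IP alone, and uses made-up hits with documentation addresses; `classify_redirect_followups` is a hypothetical name.

```python
def classify_redirect_followups(hits):
    """hits: time-ordered (ip, path, status) tuples.

    For each 301, report the status of the same client's next request,
    or None if that client never shows up again.
    """
    results = []
    for i, (ip, path, status) in enumerate(hits):
        if status != 301:
            continue
        next_status = next(
            (s for (other_ip, _p, s) in hits[i + 1:] if other_ip == ip),
            None,
        )
        results.append((ip, path, next_status))
    return results

# Hypothetical hits using documentation addresses.
hits = [
    ("192.0.2.1", "/", 301),
    ("192.0.2.1", "/", 200),
    ("192.0.2.2", "/robots.txt", 301),
    ("192.0.2.3", "/page", 301),
    ("192.0.2.3", "/page", 403),
]
followups = classify_redirect_followups(hits)
```

A `None` in the third slot is the "never come back" case; on a real log you would probably also want a time window so that a visit days later does not count as a follow-up.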

incrediBILL




msg:4428016
 5:31 am on Mar 12, 2012 (gmt 0)

I went off to investigate a few random logs and instead found another goofy UA from a blocked IP (China).


... and that relates to who.is how?

I'm lost on how it ties into this thread, unless I missed the point, new thread perhaps?

wilderness




msg:4428073
 11:03 am on Mar 12, 2012 (gmt 0)

I'm lost on how it ties into this thread, unless I missed the point, new thread perhaps?


Bill, you provided an answer in your first reply, which caused a controversy in another thread.

After the answer was provided, the subsequent replies were all rambling; lucy just added a little more rambling ;)

lucy24




msg:4428170
 4:19 pm on Mar 12, 2012 (gmt 0)

and that relates to who.is how?

I went off to check on the redirect question and got redirected ;)

I think everyone's now up to speed on who who.is is. Except that the name keeps making me think it's "whois.deliriumtremens.com" (substituting for Unprintable Name in the middle).

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved