Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
MIIxpc?
mark_roach




msg:400657
 9:32 pm on Mar 13, 2001 (gmt 0)

I don't currently block anyone. However, after blowing my bandwidth limit last month, I intend to start doing so now.

Does anyone know why I shouldn't add these 3 to the list?

JennyBot
MIIxpc
teoma_agent3

 

Machiavelli




msg:400658
 10:09 pm on Mar 13, 2001 (gmt 0)

User-agent: Googlebot
Disallow: /

mivox




msg:400659
 10:34 pm on Mar 13, 2001 (gmt 0)

Why would you want to ban Google from your entire site?

Air




msg:400660
 10:01 pm on Mar 15, 2001 (gmt 0)

>MIIxpc

I'm not absolutely sure, but there is some indication that this may be altavista.nl or altavista.de. Can anyone confirm?

oLeon




msg:400661
 3:51 pm on Mar 16, 2001 (gmt 0)

Air,
I notice that spider, too. I don't know where it comes from, but I would guess it spiders the live searches of other engines.

volatilegx




msg:400662
 11:26 pm on Mar 27, 2001 (gmt 0)

Isn't this for the robots.txt file? Am I missing something? Most of these robots would never check the robots.txt file, right?

Dan
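
As a rough illustration of the robots.txt question above, the sketch below uses Python's standard `urllib.robotparser` to show how a well-behaved robot would read a record naming the three agents from this thread. The robots.txt contents and the helper name `allowed` are assumptions made for the example. It only models robots that actually fetch and honor robots.txt; an agent that never requests the file is unaffected and would have to be blocked at the server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record that shuts out the three agents
# named in the first post. Everything else stays allowed.
ROBOTS_TXT = """\
User-agent: JennyBot
User-agent: MIIxpc
User-agent: teoma_agent3
Disallow: /
"""


def allowed(agent, path, robots_txt=ROBOTS_TXT):
    """Return True if a compliant robot calling itself `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("MIIxpc/4.2", "JennyBot", "teoma_agent3", "Googlebot"):
        print(agent, allowed(agent, "/html/search.htm"))
```

Note that this is purely advisory: the check runs on the robot's side, so it does nothing to save bandwidth from agents that ignore the file.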

mark_roach




msg:400663
 1:42 pm on Mar 29, 2001 (gmt 0)

>MIIxpc

>I notice that spider, too. I don't know where it comes from, but I would guess it spiders the live searches of other engines.

oLeon, do you think that is what is going on here?

195.121.6.106 - - [28/Mar/2001:06:35:55 -0500] "GET / HTTP/1.1" 200 9693 "http://195.121.7.86/cgi-bin/zoeken/avsearch.cgi?pg=q&q=border+terrier&kl=XX&what=web&stq=10" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"

195.121.6.106 - - [28/Mar/2001:06:35:59 -0500] "GET /images/film.jpg HTTP/1.1" 200 5911 "http://www.champdogs.co.uk/html/home.html" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"

212.78.177.71 - - [28/Mar/2001:06:36:00 -0500] "GET /images/film.jpg HTTP/1.0" 200 5911 "-" "MIIxpc/4.2"

195.121.6.106 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search_menu.htm HTTP/1.1" 200 1815 "-" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"

195.121.6.106 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search.htm HTTP/1.1" 200 824 "http://www.champdogs.co.uk/html/master_menu.htm" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"

212.78.177.71 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search.htm HTTP/1.0" 200 824 "-" "MIIxpc/4.2"

212.78.177.70 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search_menu.htm HTTP/1.0" 200 1815 "-" "MIIxpc/4.2"

It followed the surfer right round my site, taking the identical pages including the graphics.
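
For anyone who wants to check their own logs for the same pattern, here is a minimal sketch in Python. It assumes the combined log format shown above and flags every MIIxpc request for a path that some other client fetched shortly before. The function names, the 60-minute window, and the prefix match on the user agent are assumptions for illustration, not a known property of the agent.

```python
import re
from datetime import datetime, timedelta

# Combined log format: ip, two ignored fields, [timestamp], "request",
# status, size, "referer", "user agent".
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    rec = match.groupdict()
    rec["time"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return rec


def find_echoes(lines, agent_prefix="MIIxpc", window=timedelta(minutes=60)):
    """Return (path, visitor_ip, echo_ip) for each request by `agent_prefix`
    that repeats a path another client fetched within `window` before it."""
    last_seen = {}  # path -> (time, ip) of the latest request by another agent
    echoes = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        if rec["agent"].startswith(agent_prefix):
            prior = last_seen.get(rec["path"])
            if prior is not None and rec["time"] - prior[0] <= window:
                echoes.append((rec["path"], prior[1], rec["ip"]))
        else:
            last_seen[rec["path"]] = (rec["time"], rec["ip"])
    return echoes


# Two of the lines quoted above, used as sample input.
SAMPLE_LOG = [
    '195.121.6.106 - - [28/Mar/2001:06:35:59 -0500] "GET /images/film.jpg HTTP/1.1" 200 5911 "http://www.champdogs.co.uk/html/home.html" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"',
    '212.78.177.71 - - [28/Mar/2001:06:36:00 -0500] "GET /images/film.jpg HTTP/1.0" 200 5911 "-" "MIIxpc/4.2"',
]

if __name__ == "__main__":
    for path, visitor_ip, echo_ip in find_echoes(SAMPLE_LOG):
        print(f"{echo_ip} repeated {path} after {visitor_ip}")
```

The script expects the lines in chronological order, as they appear in a normal access log.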

sjoerd




msg:400664
 10:19 pm on Apr 19, 2001 (gmt 0)

It's an accelerator run by Mirrorimage.net. Just resolve the IP and then do a whois on the domain xpc-mii.net and you'll end up at mirrorimage.net.
They offer some kind of shared proxy cache. Every time you are visited by a web surfer who has enabled a proxy that uses this shared proxy cache, you'll find this thing hitting your site some 30 minutes later...
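
The "resolve the IP" step can be sketched in Python with the standard `socket` module. The helper names are assumptions for illustration, and `parent_domain` naively keeps the last two labels, which is fine for a name under xpc-mii.net but wrong for suffixes such as .co.uk. The whois lookup itself is left to a whois client, since the standard library does not include one.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the reverse-DNS hostname for `ip`, or None if there is none.

    `resolver` can be swapped out so the function works without a network.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return None
    return hostname


def parent_domain(hostname, labels=2):
    """Naively keep the last `labels` dot-separated parts of a hostname."""
    parts = hostname.rstrip(".").split(".")
    return ".".join(parts[-labels:])


if __name__ == "__main__":
    # IP taken from the log lines quoted earlier in the thread; what it
    # resolves to today, if anything, is not known.
    host = reverse_lookup("212.78.177.71")
    if host is None:
        print("no reverse DNS entry")
    else:
        print(host, "->", parent_domain(host))
```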

max_b




msg:400665
 11:59 pm on May 31, 2001 (gmt 0)

Is it supposed to crawl behind a .htaccess fence?

On my site it also follows the behaviour of another user.

skirril




msg:400666
 12:03 am on Jun 5, 2001 (gmt 0)

To my knowledge, it is very hard to crawl behind such a fence, because it's the server doing the blocking. Unless, of course, the fence had holes in it.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved