
Lehigh University

wilderness

10:53 pm on Mar 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



128.180.121.221 - - [19/Mar/2005:13:02:37 -0800] "GET / HTTP/1.1" 200 9503 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"

Anybody getting small pokes from Lehigh?
Couple pages yesterday, couple today.
No robots. No images.
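The "pages only, no robots.txt, no images" pattern can be pulled out of an access log with a short script. Below is a minimal Python sketch. It assumes Apache's combined log format, as in the line quoted above. The image extension list and the name `page_only_clients` are illustrative choices and do not come from the thread.

```python
import re
from collections import defaultdict

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed list of image extensions; adjust for your own site.
IMAGE_EXTS = (".gif", ".jpg", ".jpeg", ".png", ".ico")


def page_only_clients(lines):
    """Return hosts that fetched pages but never robots.txt and never an image."""
    stats = defaultdict(lambda: {"pages": 0, "robots": False, "images": False})
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        host = m.group("host")
        path = m.group("path").split("?", 1)[0].lower()  # drop query string
        s = stats[host]
        if path == "/robots.txt":
            s["robots"] = True
        elif path.endswith(IMAGE_EXTS):
            s["images"] = True
        else:
            s["pages"] += 1
    return sorted(
        h for h, s in stats.items()
        if s["pages"] > 0 and not s["robots"] and not s["images"]
    )
```

Fed the single Lehigh line above, this returns `['128.180.121.221']`. A real browser would normally drop out of the list because it also loads images, and a well-behaved robot drops out because it asks for robots.txt.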

jmccormac

9:45 am on Mar 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yep Lehigh became a major problem here trying to download about 90K pages. I deepsixed the whole Lehigh net allocation on the IP level. No UA - it appeared to be a scraper and was bouncing off two cse.lehigh.edu webservers/proxies.

Regards...jmcc
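Blocking a whole net allocation comes down to a network-membership test on the client address. jmcc doesn't name the range. The 128.180.0.0/16 used below is only the range implied by the `^128\.180\.` pattern quoted later in the thread, so treat it as an assumption and check whois for the actual allocation. Here is a minimal Python sketch using the standard `ipaddress` module. `BLOCKED_NETS` and `is_blocked` are hypothetical names.

```python
import ipaddress

# Hypothetical blocklist. 128.180.0.0/16 is inferred from the ^128\.180\. rule
# in this thread; confirm the real allocation with a whois lookup.
BLOCKED_NETS = [ipaddress.ip_network("128.180.0.0/16")]


def is_blocked(addr: str) -> bool:
    """True if addr falls inside any blocked network (IPv6 never matches a v4 net)."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED_NETS)
```

Both Lehigh hosts seen in this thread (128.180.121.221 and .222) test as blocked, and the Verizon address does not. Matching on a parsed address instead of a string prefix also stays correct when a range does not fall on an octet boundary, such as a /20.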

gordongecko

10:52 am on Mar 23, 2005 (gmt 0)

10+ Year Member



Yup,

Lehigh University poking around here as well - a few pages here and there - no images.

IP: 128.180.121.221
UA: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)
*Note* Accept_Language is blank
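The blank Accept-Language can be turned into a simple header check. It is a heuristic only: interactive browsers normally send that header, but its absence proves nothing by itself. Below is a minimal Python sketch that assumes request headers arrive as a dict. The function name and the "starts with Mozilla/" test are illustrative assumptions.

```python
def looks_like_fake_browser(headers: dict) -> bool:
    """Heuristic: a browser-style UA ("Mozilla/...") with no Accept-Language header."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    ua = h.get("user-agent", "")
    lang = h.get("accept-language", "").strip()
    return ua.startswith("Mozilla/") and not lang
```

With the UA above and an empty Accept-Language, the request gets flagged. A flag like this is better used to queue an address for a closer look than to ban it outright.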

Not once have I seen these "pokes" from Uni computing centers come to any good. They often come back full-on and rip entire sites - usually looking for keywords, density, etc.

We've got several ip ranges banned from other University Computing Centers.

For us it's better to err on the side of caution:
RewriteCond %{REMOTE_ADDR} ^128\.180\.
RewriteRule .* - [F]

GG

wilderness

6:21 pm on Mar 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Not once have I seen these "pokes" from Uni computing centers come to any good

gordon,
I guess there are exceptions to everything :)

I get some incoming traffic from Cornell, however I have links going to their Making of America section.
I also get some occasional standard traffic from Rutgers. (Have some folks there I email with.)
The University of Kentucky as well.
All, without crawls.

On the other hand, I've had many universities come in on crawls.

Thanks to both of you for the Lehigh insight. I added them in before they started a crawl.

Don

IrishWonder

11:12 am on Mar 25, 2005 (gmt 0)

10+ Year Member



got one today

wume2.cse.lehigh.edu - - [25/Mar/2005:09:45:10 +0200] "GET /directory/file.html HTTP/1.1" 200 21190 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

128.180.121.222
wume2.cse.lehigh.edu
Lehigh University
183 Computing Center, Building 8B
Bethlehem
PA
18015
United States

Why does it say it's Googlebot though? That's a bit suspicious...

After digging through the log some more I found another one also posing as Googlebot:

pool-68-236-42-186.phil.east.verizon.net - - [22/Mar/2005:17:56:18 +0200] "GET /directory/file.html HTTP/1.1" 200 21069 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

68.236.42.186
Verizon Internet Services
1880 Campus Commons Dr
Reston
VA
20191
United States

Both only requested one file.
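You can't confirm a Googlebot claim from the UA string. A common check is the reverse-then-forward DNS test that Google recommends for verifying Googlebot. The PTR name for the IP must end in googlebot.com or google.com, and that name must resolve back to the same IP. Below is a minimal Python sketch. The resolver functions are passed in as parameters so the logic can be tested without network access. The function name and the suffix tuple are assumptions made for illustration.

```python
import socket

# Domains accepted for a genuine Googlebot PTR record (assumed list).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for a client claiming to be Googlebot."""
    try:
        host = reverse(ip)[0]  # gethostbyaddr -> (hostname, aliases, addresses)
    except OSError:
        return False  # no PTR record at all
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False  # PTR points somewhere else
    try:
        addrs = forward(host)[2]  # gethostbyname_ex -> (name, aliases, addresses)
    except OSError:
        return False  # PTR name does not resolve forward
    return ip in addrs  # forward lookup must confirm the same IP
```

For 128.180.121.222 the PTR is wume2.cse.lehigh.edu, so the check fails at the suffix test. Requiring the forward lookup as well stops someone who controls their own reverse DNS from simply naming a host "something.googlebot.com".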

volatilegx

2:07 pm on Mar 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've also seen both of those. It's somebody spoofing Googlebot's user agent in hopes of uncovering cloaked pages.

jmccormac

2:26 pm on Mar 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Looking for cloaked pages does not give a bunch of muppets the right to hammer my site (which does not have cloaked pages) or anyone else's site. I hope the people in Lehigh university realise that sysadmins in the real world will shoot first and ask questions later.

Regards...jmcc

Staffa

3:00 pm on Mar 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



sysadmins in the real world will shoot first

and even faster when they impersonate someone else:

128.180.121.222----Googlebot/2.1 (+http://www.googlebot.com/bot.html)