Freshbot and Robots.txt

DavidT

3:28 pm on Mar 12, 2003 (gmt 0)

10+ Year Member



Does Freshbot usually disregard robots.txt when it comes to images?

I've been getting this all day:
64.68.86.59 - - [12/Mar/2003:06:23:08 -0800] "GET /robots.txt HTTP/1.0" 200 2566 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.86.59 - - [12/Mar/2003:06:23:08 -0800] "GET /Assets/images/picture.jpg HTTP/1.0" 403 239 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

I could swear my robots.txt is correctly configured. I've never had any problem with search engine robots before in this area. The file says:
User-agent: *
Disallow: /htdocs/Assets/

Surely this includes sub-folders?

Also, today I had:
35.11.210.144 - - [11/Mar/2003:17:27:36 -0800] "HEAD / HTTP/1.1" 200 0 "-" "dumbBot"

and

216.75.194.54 - - [12/Mar/2003:06:14:39 -0800] "HEAD / HTTP/1.1" 200 0 "-" "GornKer Crawler"

Does anyone know them?

weesnich

5:04 pm on Mar 12, 2003 (gmt 0)

10+ Year Member



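If you quoted everything correctly, Googlebot did not violate your robots.txt.
Disallowed is:
/htdocs/Assets/
Googlebot requested something from
/Assets/...
This is not a subfolder of /htdocs/Assets/.
Robots.txt paths are matched against the URL path the crawler requests, not against the directory layout on your server's disk. If /htdocs is your document root, the rule has to read Disallow: /Assets/.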

> This includes sub-folders surely?
Yes.
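This reasoning can be checked with Python's standard-library `urllib.robotparser`, which treats each Disallow value as a prefix of the requested URL path. A minimal sketch, assuming the robots.txt contents quoted in the first post; the `Disallow: /Assets/` variant is an illustration of a rule that would cover the logged request, not something from the thread, and robotparser is only an approximation of how any particular crawler parses the file.

```python
from urllib.robotparser import RobotFileParser

# The rules as quoted in the first post.
rules = """\
User-agent: *
Disallow: /htdocs/Assets/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Disallow values are prefixes of the URL path, not the filesystem path,
# so a request for /Assets/... never matches /htdocs/Assets/.
print(parser.can_fetch("Googlebot", "/Assets/images/picture.jpg"))         # True
print(parser.can_fetch("Googlebot", "/htdocs/Assets/images/picture.jpg"))  # False

# Illustrative fix: a rule written against the URL path blocks the folder
# and everything beneath it, sub-folders included.
fixed = RobotFileParser()
fixed.parse(["User-agent: *", "Disallow: /Assets/"])
print(fixed.can_fetch("Googlebot", "/Assets/images/picture.jpg"))          # False
```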

wilderness

5:11 pm on Mar 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



dumbBot comes from a Michigan State University IP.
I tried some Google searches and could only find references to Java bots.
I have an inquiry in at MSU.

I have a portion of Google's range denied for image infractions in protected folders. At the time it looked as though Google went haywire, and then the image spidering ceased.

"216.75.194.54" belongs to navisite dot com.
I'm going to deny their entire range.
Thanks for the heads-up.
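Denying an "entire range" comes down to a network-membership test on the client address. A minimal sketch using Python's standard-library `ipaddress` module; the `216.75.194.0/24` block is a hypothetical choice (just the /24 around the logged address), because the thread does not say how large the provider's allocation is, and `is_blocked` is a hypothetical helper name. In practice the same rule would normally live in the web server's access configuration rather than in a script.

```python
import ipaddress

# Hypothetical block list: only the /24 containing the logged address.
# Check a WHOIS lookup for the real allocation before choosing a range.
BLOCKED = [ipaddress.ip_network("216.75.194.0/24")]

def is_blocked(addr: str) -> bool:
    """Return True if addr falls inside any network in BLOCKED."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED)

print(is_blocked("216.75.194.54"))  # True
print(is_blocked("64.68.86.59"))    # False
```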

photoace

8:09 pm on Mar 13, 2003 (gmt 0)

10+ Year Member



I had the same bot today. My logs show:
oleszkie.user.msu.edu - - [13/Mar/2003:11:52:24 -0800] "HEAD / HTTP/1.1" 200 0 "-" "dumbBot"

It looks like a computer science student at Michigan State. I did a search on their server under "oleszkie".

It hasn't done anything else yet, but it didn't request robots.txt.
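Whether a bot ever asked for robots.txt can be checked mechanically from the access log. A rough sketch, assuming Apache's combined log format as in the lines quoted in this thread; `robots_txt_report` and `LOG_RE` are hypothetical names. It only covers the lines you feed it, so a bot that fetched robots.txt on an earlier or later visit will still show False here.

```python
import re
from collections import defaultdict

# Combined log format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def robots_txt_report(lines):
    """Map each user agent to whether it requested /robots.txt in these lines."""
    fetched = defaultdict(bool)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        fetched[m.group("agent")] |= m.group("path") == "/robots.txt"
    return dict(fetched)

# Two lines taken from this thread.
sample = [
    '64.68.86.59 - - [12/Mar/2003:06:23:08 -0800] "GET /robots.txt HTTP/1.0" 200 2566 "-" '
    '"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    'oleszkie.user.msu.edu - - [13/Mar/2003:11:52:24 -0800] "HEAD / HTTP/1.1" 200 0 "-" "dumbBot"',
]
print(robots_txt_report(sample))
# {'Googlebot/2.1 (+http://www.googlebot.com/bot.html)': True, 'dumbBot': False}
```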

leoo24

5:59 pm on Mar 17, 2003 (gmt 0)

10+ Year Member



I've also had dumbBot at my site today; it revisited about 3 times.
35.11.210.144 (oleszkie.user.msu.edu)
It seems harmless. Does anyone think I should ban his IP,
or has he just got himself a little spidering project?

A whois on their site did show he is a student:
Oleszkiewicz, Jonathan James
oleszkie@msu.edu
[msu.edu...]
Student
Masters
Cmptr Sci

While I'm here: I also had 'dlc 1.19' in the browser logs. Does anyone know it? All I could find was the same entry in other people's logs.

pendanticist

2:19 pm on Mar 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey wilderness,

Have you heard anything from MSU yet?

oleszkie.user.msu.edu - - [18/Mar/2003:23:57:40 -0800] "GET /robots.txt HTTP/1.1" 200 188 "-" "dumbBot"

I got my visit last night and, as you can see, it did ask for robots.txt.

Pendanticist.

wilderness

6:58 pm on Mar 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Pendanticist
No reply from my friend thus far.
She said she'd see what she could do. She doesn't work in the data center.