Orbiter

         

wilderness

2:45 am on Nov 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



[webmasterworld.com...]

They've changed the UA:
69.168.43.89 - - [10/Nov/2005:15:45:38 -0800] "GET /robots.txt HTTP/1.1" 403 - "-" "Orbiter (+http://www.dailyorbit.com/bot.htm)"
69.168.43.89 - - [10/Nov/2005:15:45:38 -0800] "GET / HTTP/1.1" 403 - "-" "Orbiter (+http://www.dailyorbit.com/bot.htm)"

GaryK

4:34 pm on Nov 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the update. :)

Is it still well behaved?

wilderness

5:12 pm on Nov 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is it still well behaved?

I'm not sure how well it can behave while getting 403s ;)

So far the bot has not tried to crawl numerous pages repeatedly after being served a 403, as some other bots do.

Don

GaryK

6:28 pm on Nov 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oops, I did it again. I missed the 403 status from your log snippets. At least I'm consistent. ;)

Dijkgraaf

9:10 pm on Nov 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why is it giving them a 403 on robots.txt though?

wilderness

11:38 pm on Nov 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why is it giving them a 403 on robots.txt though?

You mean other than that I don't like 'em? :)

Because I have either the UA or the IP set up in either a Deny from or a SetEnvIf, as opposed to a Rewrite.

I don't recall the name for SetEnvIf or what module it's part of. I do know how to use it though ;)

Perhaps somebody else will provide that.

Don

keyplyr

12:19 am on Nov 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't recall the name for SetEnvIf or what module it's part of.

"set environment variable if" and "mod_setenvif"

(I love it when I know something)
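
For anyone following along, here is a minimal sketch of the kind of block described above, combining mod_setenvif with Deny from. It is not wilderness's actual htaccess. The variable name bad_bot is made up for illustration, the UA string and IP are taken from the log lines at the top of the thread, and the syntax assumes the Order/Allow/Deny style of access control in Apache 1.3/2.x.

```apache
# Hypothetical example: "bad_bot" is an arbitrary variable name.
# Flag requests whose User-Agent contains "Orbiter" (case-insensitive).
SetEnvIfNoCase User-Agent "Orbiter" bad_bot
# Flag requests coming from the IP seen in the log snippet.
SetEnvIf Remote_Addr "^69\.168\.43\.89$" bad_bot

# Allow everyone, then deny any request carrying the flag (it gets a 403).
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

Nothing here exempts robots.txt, so a flagged client gets a 403 on that file too, which is consistent with the log lines at the top of the thread.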

Dijkgraaf

1:24 am on Nov 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You should change your deny rule so that it does allow them to fetch robots.txt. That way, if they actually obey robots.txt, they will not request any other resources, and so you will see less of them.
I believe there are some examples of how to do that floating around in some threads somewhere.
Ah, here is one:
[webmasterworld.com...]
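
One way this could look, as a rough sketch (not necessarily the method in the linked thread): keep the deny rule but add a Files section that lets every client fetch robots.txt. The variable name bad_bot is hypothetical, and the Order/Allow/Deny syntax again assumes Apache 1.3/2.x-style access control.

```apache
# Hypothetical example: "bad_bot" is an arbitrary variable name.
SetEnvIfNoCase User-Agent "Orbiter" bad_bot

# Deny flagged requests everywhere by default.
Order Allow,Deny
Allow from all
Deny from env=bad_bot

# <Files> sections are applied after the surrounding rules,
# so every client, flagged or not, can still read robots.txt.
<Files "robots.txt">
    Order Allow,Deny
    Allow from all
</Files>
```

This only reduces traffic if robots.txt actually disallows the bot and the bot honors it. Otherwise it keeps getting 403s on everything else, as before.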

wilderness

1:49 am on Nov 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You should change your deny rule

I really appreciate your thoughtfulness.

However, my htaccess was implemented more than five years ago.
The size of my htaccess would (I'm sure) make yours look like a pebble in a pile of boulders.

It currently holds 1417 lines, and if Jim hadn't been kind enough to help me grasp methods to clean up and condense the code some two years ago, it would be approaching 3000 lines.

For the majority of bots, I couldn't care less about presenting my websites' pages or methods in a courteous manner; my desire is just to stop them from reaping data.

robots.txt is a good thing for honorable and current bots; however, there are many that change their UA often.
Not to mention the new bots that appear almost daily.

There are colocators appearing like flies on horse crap, and they deserve (at least IMO) no courtesy at all.