
Blocking everything except target country and good bots.


Wayder

10:20 pm on May 15, 2023 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi,

I have a new site that is dedicated to the UK & NI market. It doesn’t have much going on so now is the time to decide my allow/block policy.

I am pretty fed up with the endless list of bots going through my site; when I check them out, their pages say you can BUY the information they have scraped for [insert_purpose_here]. I am also fed up with continuously monitoring large logs to see if anything else is misbehaving, sooo….

I am thinking of allowing a few selected bots (Google, Bing, DuckDuckGo etc.) using a reverse lookup and stopping everything else that does not have a UK/NI IP. I can then easily refine the policy manually using the logs. I will still have to check by hand, but it should be a less arduous process with a much smaller log file.
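Roughly what I have in mind for the bot side, as a sketch in Python — the hostname suffixes are only examples, each engine publishes its own verification details:

import socket

# Reverse DNS suffixes I would trust -- examples only; check each
# engine's own documentation for the suffixes it actually uses.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")

def is_verified_bot(ip):
    # Forward-confirmed reverse DNS: IP -> hostname -> back to the same IP.
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    if not host.endswith(TRUSTED_SUFFIXES):
        return False
    try:
        # The claimed hostname must resolve back to the original IP.
        return ip in {info[4][0] for info in socket.getaddrinfo(host, None)}
    except OSError:
        return False

Anything that fails that check and doesn't come from a UK/NI IP range would get the 403.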

Nice and simple.

Can you think of any major downsides to doing it this way?
Thank you.

lucy24

2:17 am on May 16, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You're talking about a whitelist: deny everyone by default, and then allow the ones that fit your criteria. Sure, some sites do it that way. You can also go a long way by blocking based on headers, since most (not all) robots don't bother to dress up as humans.
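The header check can be as simple as something like this, wherever you have access to the request headers — which headers you insist on is entirely up to you, this is only a sketch:

def looks_like_a_browser(headers):
    # Most scripted bots skip headers that every mainstream browser sends.
    required = ("Accept", "Accept-Language", "Accept-Encoding")
    if not all(h in headers for h in required):
        return False
    ua = headers.get("User-Agent", "").lower()
    # A blank or library-default user agent is an easy giveaway.
    return bool(ua) and not ua.startswith(("python-requests", "curl", "wget"))

Most people end up doing this in the server config rather than in application code, but the idea is the same.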

You'll want to have some way to identify unintentionally blocked humans. It could be something as simple as attaching an image or css file to your 403 page; human browsers will request it, while almost no robots will. And make sure you have a nice, kind 403 page so you don't make those unsuspecting humans unhappy.
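If you later want to pull those unintentionally blocked humans out of the logs, something along these lines will do it; the beacon filename and the combined log format are just my assumptions here:

import re

BEACON = "/img/403-check.png"   # whatever file you attach to the 403 page
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (\S+)[^"]*" (\d{3})')

blocked, fetched_beacon = set(), set()
with open("access.log") as log:
    for line in log:
        m = LOG_LINE.match(line)
        if not m:
            continue
        ip, path, status = m.groups()
        if status == "403":
            blocked.add(ip)
        if path == BEACON:
            fetched_beacon.add(ip)

# An IP that got a 403 *and* came back for the beacon is probably a person.
for ip in sorted(blocked & fetched_beacon):
    print(ip)

Run that every so often and you have a short list of addresses worth a second look.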

Wayder

11:55 am on May 16, 2023 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hello Lucy,

I think the critical issues for me are the unintentionally blocked humans and making unsuspecting humans unhappy. Good points and I will work on them.

I have often wondered if all this scraping helps. I have a three-page website which has been there for about 12 years with no changes or blocking. It is on page 1 or 2 for 50+ phrases and regularly comes above the UK government sites that carry almost the same content. Go figure. If only I knew what I was doing.

Thank you for your help.

tangor

9:41 pm on May 16, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've been running a whitelist for years ... but that only goes so far. The bad bots still make the effort and some of them get through (hence you will *still* have to read the logs!)

Wayder

11:37 am on May 17, 2023 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi Tangor,

Agreed. It will make things a lot easier by filtering out all the 403s first, though I think a leisurely look through those every now and again may be interesting as well.

Thanks