lets try this for a month or three...


Brett_Tabke - 11:50 pm on Nov 23, 2005 (gmt 0)


hello from dfw...argh - what a zoo on the busiest travel day of the year. yeow!

> A simple solution I found to slowing and stopping
> unauthorized bots was to just limit the number of
> pages they can download within a certain amount
> of time, and blocking them automatically

and knowing your pattern of usage - you use the site more than a lot of the bots. You would look more like a bot to an algo than many of the bots.

This ain't your local phentermine five and dime site! We have members who legitimately view thousands of pages a day. One former moderator regularly hit 4k page views a day (a great mod at that).
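To make that concrete, here is a rough sketch of the kind of throttle the quote describes. It is not the code running on this site; the class name, the 1,000-pages-a-day limit, and the request timings are all assumptions picked for illustration.

```python
from collections import defaultdict, deque


class PageThrottle:
    """Block a client that requests more than max_pages within window_seconds."""

    def __init__(self, max_pages, window_seconds):
        self.max_pages = max_pages
        self.window_seconds = window_seconds
        self.hits = defaultdict(deque)  # client id -> timestamps of recent requests
        self.blocked = set()

    def allow(self, client, now):
        if client in self.blocked:
            return False
        recent = self.hits[client]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        recent.append(now)
        if len(recent) > self.max_pages:
            self.blocked.add(client)  # automatic block, as in the quoted suggestion
            return False
        return True


# Hypothetical limit: 1,000 pages per rolling day.
throttle = PageThrottle(max_pages=1000, window_seconds=86400)

# A heavy human reader: 4,000 views spread evenly over one day.
human_ok = all(throttle.allow("heavy_member", now=i * 21.6) for i in range(4000))

# A patient scraper: 900 pages spread evenly over one day.
bot_ok = all(throttle.allow("slow_bot", now=i * 96.0) for i in range(900))

print(human_ok, bot_ok)  # False True
```

With these made-up numbers the member gets blocked and the slow bot does not, which is the problem with tuning a pure page-count limit on a site with very active readers.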

> Don't have time to read through all this thread right now

Then don't participate if you don't have the decency to read the recent responses. I understand when it is a huge thread, but this one is doable and there is a lot of good info up there (eg: all your concerns were addressed).

> As a newb I must use the search function

And if you read back a bit, you'll see I addressed that with a solution. I agree, the best thing a newbie can do is READ.

I am pretty surprised how fast we fell out of the indexes, but I am sure it is because someone threw in a removal request, as some mentioned above... I thought we had a lot more time before I needed to roll out the new engine.

> Are you saying that google possessed a more complete record than Yahoo?

ya, but it was all supplemental ;-)

> problems we faced with rogue bots and scrapers

5 to 1 over the humans. If I hadn't banned 1k IPs, banned 200 agent names, and required login for about 70k users a day on ISP X - we would be looking at 50 to 1.

Moral of the story - be careful if you have a lot of content, and that content is easily indexable.
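For anyone wondering what those three measures look like in code, a minimal sketch follows. The real ban lists are not given in this thread, so every network, agent string, and function name below is a hypothetical placeholder (the address ranges are reserved documentation ranges).

```python
import ipaddress

# Hypothetical entries for illustration only.
BANNED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BANNED_AGENT_SUBSTRINGS = ["examplescraper", "badbot"]
# Stands in for the "ISP X" ranges where anonymous visitors must log in.
LOGIN_REQUIRED_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]


def classify_request(ip, user_agent, logged_in):
    """Return 'deny', 'login', or 'allow' for a single request."""
    addr = ipaddress.ip_address(ip)
    # 1. Banned IP ranges are refused outright.
    if any(addr in net for net in BANNED_NETWORKS):
        return "deny"
    # 2. Banned agent names are matched case-insensitively as substrings.
    agent = user_agent.lower()
    if any(name in agent for name in BANNED_AGENT_SUBSTRINGS):
        return "deny"
    # 3. Anonymous visitors from the flagged ISP are sent to the login page.
    if not logged_in and any(addr in net for net in LOGIN_REQUIRED_NETWORKS):
        return "login"
    return "allow"


print(classify_request("198.51.100.9", "Mozilla/5.0", logged_in=False))  # login
```

Agent names are trivial to fake, so a check like this only catches bots that identify themselves honestly; the IP and login rules do the heavier lifting.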

> We all know that if you really wanted to just ban rogue spiders it would be easy enough to force a login and cookies on everyone while still allowing the major search engines to crawl based on their IP addresses and then to throttle the crawl.

You hit the nail on the head OIL - I can't require cookies (eg: login) and allow the se bots in - or it would be classified as cloaking outright (which we have flirted with here for several years because of this very problem). I've heard that rarely a week goes by where we don't get accused of something by someone and reported to the engines...

> What has me curious at this point tho is how fast would WebmasterWorld be fully indexed again if the ban was lifted and the 180 days at G were past. So say 7 months from now Brett lifts the ban - how fast does Google pick up those 2 million pages?

....alms - alms - links for the poor... lol

> simply cloaking for Google, MSN, and Yahoo by IP would do the trick.

ya, and do the trick at getting us removed because of cloaking. Sheesh, there are those with their shorts in a wad that I hide session ids now.

> you did it just because you were po'd that google wouldn't give you a pr8.

lol. Although it was/is disappointing that we would not get a pr8, that is the last thing on my brain here...

> which demands that the bots stay away, but does NOT instruct them to remove the pages from their index.

Actually - Google interprets it to mean that they can still crawl, and use for index purposes, but not display results. So, even if you ban the bot via robots.txt, you will still get crawled by google. (in our case, the required login will put Gbot at the login page over and over)
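For reference, this is roughly what a blanket robots.txt ban says to a crawler that chooses to honor it, checked with Python's standard `urllib.robotparser`. The file contents are the generic form of a full ban rather than a copy of the live file, and the page path is a placeholder. Note the file only answers "may I fetch this URL?" and carries no instruction about what stays in an index; how a given engine treats that is as described above, not something this snippet demonstrates.

```python
from urllib.robotparser import RobotFileParser

# Generic blanket ban (assumed shape, not copied from the live site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks before fetching; the path here is a placeholder.
may_fetch = parser.can_fetch("Googlebot", "http://www.webmasterworld.com/some/page.htm")
print(may_fetch)  # False
```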

> The site feels faster to me, so kudos.

Thanks - it definitely is...

> Upside:

I think they are good for the moment. I understand those that want to peruse the archives and currently can't. I am surprised how fast it fell out of the index...

> Man, brett, so you're really going to hand off all that traffic to your competitors?

I have never valued search engine traffic for community building.

Communities are built right here in the one on one responses - not from some random search. Yes, it is the evil point of entry we all must deal with, but you don't stick around here because we are in index X - you stick around to be involved with the members - to get answers to fresh questions and to answer others' questions.

Like I said - it is an experiment. Some are good - and some are bad, but so far - I like the idea of not being beholden to engines for traffic. Codependency - takes two to tango...

Life without the engines. hmmmm Can it be done?


Thread source:: http://www.webmasterworld.com/foo/9593.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com