Foo Forum

lets try this for a month or three...
last recourse against rogue bots

 1:21 am on Nov 19, 2005 (gmt 0)


required login the real story here...
MSN and yahoo bots were blocked in October. This does everyone else.



 5:41 pm on Nov 24, 2005 (gmt 0)

Pulling the plug on the SEs prior to launching an alternative site search is my only complaint. The number of repeat questions by competent members will likely increase, the number by the ask-before-looking-crowd is already maxed. Brett says internal site search will be available soon - an inconvenient wait but not critical. Should the wait become indefinite or the result be substandard we can say something then.

I don't know the backend code, server capability, etc. Brett does so I will take his word on what is necessary, possible, or not. I have appreciated his efforts (and those of everyone else involved) for several years and believe that on the past record he deserves, at the least, the benefit of any doubt.

Ban The Bots!
Nasty little creepy crawlies.


 6:03 pm on Nov 24, 2005 (gmt 0)

I'll add that Brett's said publicly more than once that he dreamed of the day he could ban all bots and WebmasterWorld would stand on its own. I guess I just didn't believe he'd actually do it. Ballsy move - we'll be watching closely to see how WebmasterWorld fares going forward. All the complaining and #*$! folks aside, I know none of us wish ill upon WebmasterWorld.


 6:26 pm on Nov 24, 2005 (gmt 0)

it was all your fault todd


 6:28 pm on Nov 24, 2005 (gmt 0)

>>he dreamed of the day he could ban all bots and WebmasterWorld would stand on its own

yeah and then he fell over ;)


 7:17 pm on Nov 24, 2005 (gmt 0)

Forgive me if this is a noob response.

Couldn't you let bots in for one day a week only, in the middle of the night, and ban them the rest of the time? This, combined with bandwidth-throttling, and a bad bot repellent script, would help.

Unlike most other websites, Bots will come back to WebmasterWorld time and again due to all the links pointing to it. So let them Monday in the early AM only, to get the latest pages.

And bar them from all save the latest pages using mod_rewrite.

Greedy pigs behave one way, polite visitors another. Must be some way 'round this.

Interesting problem. Wish I had it :)


 7:30 pm on Nov 24, 2005 (gmt 0)

Removing google search ability before having something else in place is the worst. And from earlier casual posts, that is not even close to being done since the direction isn't set. It should be already in place for supporters at a minimum. Now no one can complain about all the new posts we'll have because people can't do any research on their own. We'll need to have the same old posts and questions over and over again. It makes it hard to pay to support a site that doesn't support its members but says it's all about the members. I hope it all works out for the best!

Maybe this is to get us all so frustrated that we'll welcome some kind of advertising to sponsor the search function instead of complaining.


 10:19 pm on Nov 24, 2005 (gmt 0)

Just for fun I'm going to share an old crackpot theory I had about WW's search.

The thought was that the poor functionality of WW's site search was intentional and that encouraging users to use Google's 'site:' search had the side effect of increasing WW's rankings.

I theorized that Brett and co. knew some superstar SEO secret that the more 'site:' specific searches G sees for a particular domain, the higher it might rank it.

To me (I'm not an SEO pro) it would be logical to conclude that seeing tens of thousands of 'site:' queries per day for a particular domain would be an excellent indicator of a site's popularity/authority.

Now I wonder if all those searches had perhaps a negative effect: increasing the frequency and depth of indexing by the bots. It would seem to me that Google's (and Yahoo's?) algo might do this based on the same logic "hey, this site is really popular, we better index it like crazy to maintain accurate and thorough results".

Obviously, this wasn't WW's intention but I'd be interested to hear what people think. Could the number of 'site:' searches seen by the SE's possibly affect ranking or indexing frequency/depth?


 10:28 pm on Nov 24, 2005 (gmt 0)

can you use a program like Fluid Dynamics Search Engine?
i use it on my site, works great for us


 10:38 pm on Nov 24, 2005 (gmt 0)

IMHO there is no reason that ww cannot use YPN / adsense selectively on some forums, or on a % of pages, say 25%, to pay for the immense bandwidth charges.

Turning away traffic and Google would, I believe, lead to the ultimate downfall of WW.

How do you think new webmasters will find you...? I found ww by search and feel lucky now

and there is a crowd joining the webmaster band every day, given what adsense / ypn does to your pocket

Please do not take anything personally, this is well-meant advice.


 10:47 pm on Nov 24, 2005 (gmt 0)

Hey Brett,
Ever thought of burning the whole site to DVD and selling it? Then I could copy to my hard drive and search locally for all the old threads. No server load to you and you could make some bucks...

Just a thought, maybe WW members could then get an update DVD once a quarter as an incentive for people to subscribe.


 10:50 pm on Nov 24, 2005 (gmt 0)

In support of those who think it's arrogant in the extreme to remove the only reasonable way to search the content provided by the good citizens of this community.

'doh I didn't think that would happen' oh yea homer?

If webmaster world of all places can't find solutions to the load/bandwidth issues, then may your god help the internet.

I conclude it's been done purely and simply to stop paying supporters searching for the unsubscribe instructions.

but otoh maybe I too missed the point



 11:22 pm on Nov 24, 2005 (gmt 0)

I remember a couple of years back when the ODP was basically brought to its knees by over-aggressive scrapers and crawlers. Not sure how it got fixed, but it did.

We're not seeing the picture that Brett sees.. I think we should trust his judgement on this one. Sure, the missing site search is a PITA but I'm sure we'll all cope. Although personally I would monetise the site more and then invest in the biggest servers I could lay my hands on.

You know though, I've always found that as a site GROWS then the average VALUE of the visitor decreases (either in terms of revenue or other value.) Look at the threads about the last Google update if you want to see what the signal/noise problem is like. So perhaps this (temporary) change will help to refocus things a little.

As a side note, keep an eye on the Alexa stats. For a busy site like this they'll be fairly accurate. Sure, there's a dip today but then I understand it's this "Thanksgiving" holiday the Americans like so much.

And a final thought.. this is perhaps an experiment that none of us would want to try ourselves - it'll be interesting to see how it pans out.


 12:08 am on Nov 25, 2005 (gmt 0)

Being without a search facility for webmasterworld has made me realise just how much I use it to find answers to questions. So first and foremost thanks for creating such a great resource!

Bots are a pain - as are spam email. I wish someone had the answer on how to effectively deal with them.

I look forward to the new search but think that we should give Brett a break - it sounds like he's been putting in the hours trying to find a solution to misbehaving bots.

I also think that WebmasterWorld.com could do some advertising, whether it's affiliate links or adwords. I'm not overly fond of advertising - but let's face it, most other sites we visit have advertising.


 1:34 am on Nov 25, 2005 (gmt 0)

One of the silver linings here is that for members who post valuable info, then reproduce it on their sites, it's no longer dupe content.

Searching for some stuff on WW is now more of a redirect thru blogs and sites that reference many of the topics.


One thing to remember is that everything has a life cycle, and when we look at that in an online perspective, we see that it isn't always as long as we think. With the logarithmic increase in blogs, particularly by those who are contributors to some of the more valuable content, a google search will take us to other resources without the noise.

I hope you know what you are doing Brett, because without a site search soon, the value of WW as a community will be measured against the value of WW as a resource. I personally care more about the resources than the community, but that's just me. I send a lot of people in my niche here for answers to questions. I frequently use Google to quickly find the url, then I post that for them to go to. Don't underestimate the resource/reference side of WW.


 2:22 am on Nov 25, 2005 (gmt 0)

The ODP had all functions (public, editor, forum, etc) all on one box, and bots killed it.

It was fixed by then having 3 load-balanced public servers, 1 editor server, 1 forum server, etc.


 6:43 am on Nov 25, 2005 (gmt 0)

Seeing as this is the Foo forum, Brett's timing for killing these spiders is interesting. November is the eighth canonical month (Ecclesiastical year; astronomical also, as it happens, starting 21 March) and Brett announced his decision to kill them on this site just as that month finished.

I've been steadily adding new spiders each month to my robots.pm file since getting AWStats in May, and had already noticed that there were far more new spiders in these last 4 weeks than in any other period since I started. Synchronicity of 8, huh?


 8:06 am on Nov 25, 2005 (gmt 0)

I don't understand the first post:

required login the real story here...

What story?

I read through the posts, and what I get out of this, the site owner is blocking all bots because they consume too much bandwidth? But why block even Google? I am confused, where is the story?

I am curious to know the story because I run a very high bandwidth consuming forum (no monetization yet), and I would like to know the motivations for this act.


 9:00 am on Nov 25, 2005 (gmt 0)

Come on Brett you have kept us all waiting now, tell us the real reason why you have banned all bots..?


 9:27 am on Nov 25, 2005 (gmt 0)

If I were Brett I would want to monetize this site, we all want to make our own sites profitable so why not Brett.

Remove content from the web, make the search available to members only. There could be a landing page to direct searchers for WebmasterWorld to the site (like the pubcon page)

This is the greatest resource for those involved in websites, and as such, very valuable both to us and Brett.

I honestly feel it is time that people start to accept that everything on the 'net is not free.

I do not suggest the above is what BT has in mind, nor is it in any way critical.


 11:21 am on Nov 25, 2005 (gmt 0)

I'd be cool with that... The reason I pay a subscription to WebmasterWorld is because I see it as a useful resource that I want to be able to access. Giving Brett some $$$ will keep this site going... (I also think the threads I've read in the Supporters forum are some of the best info & advice I've read on the web...)

Up until now I've taken a step back because I know that whatever info I need is in WebmasterWorld and I could use Google to find it. But without a search function I will have to spend more time reading the forums. Time I really don't have.


 12:14 pm on Nov 25, 2005 (gmt 0)

Regarding monetizing, here's a quote from Jenstar (I don't know which post I got this from a while back, because I can't search for it anymore...)

"I have seen many sites try to switch to a
subscription model in order to make the site
either pay for itself or generate revenue, and I
have recently seen several die a painful death
while trying to implement it. Why? Because unless
it is information you cannot find anywhere else,
people can find another site with the same or
similar information for free. If it comes up
unexpectedly, your current visitor base can
become upset. Or worse, one of the people who
refused to pay the subscription fee goes and
starts a competitive non-subscription site. Some
sites can pull it off, but many can't."

--Jenstar (moderator, WebmasterWorld)


 1:51 pm on Nov 25, 2005 (gmt 0)

Getting back to "rogue bots" as Brett's stated reason for having to shut down search, I keep coming back to the question:

"If rogue robots are such a problem, then how do the big sites deal with it?"

Somehow they all must, or they'd all be out of business, right? So how come WebmasterWorld can't?


 1:55 pm on Nov 25, 2005 (gmt 0)

Hello Play_Back. How did you find Webmaster World?


 2:01 pm on Nov 25, 2005 (gmt 0)

Hi lawman,

> Hello Play_Back. How did you find Webmaster World?

Probably through a Google search. And you?


 2:05 pm on Nov 25, 2005 (gmt 0)

I found it via google btw ... maybe Brett does not want any more problem makers like me finding the place :)


[edited by: DaveN at 2:19 pm (utc) on Nov. 25, 2005]


 2:14 pm on Nov 25, 2005 (gmt 0)

Probably through a Google search. And you?

One of the things that scares me with the current situation. I also found WebmasterWorld with a Google search. After the site popped up a number of times for webmaster related queries I decided to join and eventually become a paying member.

By not being visible anymore in the major search engines, the number of new users may decrease. For some forums--like the AdSense forum--this might lead to increased quality in the short term, but I am not sure how the community will evolve if WW remains unfindable from the search engines for a longer period of time. A good thread is often not started by a senior who knows it all, but by a newbie who happens to ask the good questions. A community is an organic entity and by cutting one of its sources for new blood, it can change into something unwanted which is difficult to reverse.


 2:24 pm on Nov 25, 2005 (gmt 0)


Brett said it was a server load issue not one of bandwidth.

I can relate to that.

I just hope that his new search system doesn't cause the same issue for him.


 2:25 pm on Nov 25, 2005 (gmt 0)

For some topics, maybe WebmasterWorld doesn't need the same 20 questions asked again and again every single day, when each question has already been answered in several hundred previous repetitive threads?


 2:32 pm on Nov 25, 2005 (gmt 0)

> For some topics, maybe WebmasterWorld doesn't need the same
> 20 questions asked again and again every single day, when
> each question has already been answered in several hundred
> previous repetitive threads?

Without a search function this is just going to spiral upwards. One of the things that did irk me was the elitist way there wasn't a set search function. Newbies to search wouldn't be able to work out how to use Google to search WebmasterWorld and so ended up asking repetitive questions.


 2:50 pm on Nov 25, 2005 (gmt 0)

Hey Brett,
Ever thought of burning the whole site to DVD and selling it

I'd be interested in this. Would be hard to stop it becoming a plug-and-play clone or scraper site. Seems it would need encryption of some type and raster display on decryption, allowing only [ code ] blocks to be selected for copy & paste. Major undertaking.


 3:25 pm on Nov 25, 2005 (gmt 0)

<don't worry, be happy>

> combined with bandwidth-throttling,

I think it would only hurt members (probably me worst of all). Most of the bots are not aggressive at speed, and fall somewhere in the middle of the regular members' usage patterns. It is the constant number of them and consistent spidering. Hence, the reappearance of session ids and tagged pages on most pages of the site. This has helped weed out about another 30 bots that were smart enough to support cookies and be set up with a u/p login. The content is flowing back to unindexable...

> bad bot repellent script

And what would that look like? It wouldn't look like:

- page view throttling. (as mentioned, many members regularly hit 1-2k page views a day and can view 5-10 pages a minute at times). However, above that threshold is doable...or as a mod put it 'doneable'.
- bandwidth throttling. Mod_throttle was tested on the old system and only clogs up the system for other visitors (it is also pretty processor intensive - it is very noticeable to all when you flip it on).
- agent name parsing - ya, it's laughable.
- cookie requirements (eg: login). I think you would be surprised at the number of bots that support cookies and can be quickly setup with a login and password.
- ip banning - takes excessive hand monitoring (which is what we've been doing for 7 years). The major problem is when you get 3-4k ips in your htaccess, it tends to slow the whole system down.
- intelligent combo of all of the above? yep, that is basically where we are at right now.
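
For readers who want a concrete picture of the page view threshold mentioned in the list above, here is a minimal sketch. It is not the code running on this site; the class name, the 20-views-per-minute limit, and the in-memory storage are all assumptions chosen for illustration. The limit sits deliberately above the 5-10 pages a minute that busy members reach, so only clients beyond that are refused.

```python
import time
from collections import defaultdict, deque


class PageViewThrottle:
    """Sliding-window page view counter per client (hypothetical sketch)."""

    def __init__(self, max_views=20, window_seconds=60.0):
        # Assumed limits: set well above normal heavy-member usage.
        self.max_views = max_views
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> recent view timestamps

    def allow(self, client_id, now=None):
        """Return True if this page view is within the client's budget."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Forget views that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_views:
            return False  # over the threshold: refuse or slow this client
        hits.append(now)
        return True
```

A caller would key `allow()` on whatever identifies the client (IP, session id, or login). As the list above notes, no single key is reliable on its own, which is why this could only ever be one part of a combined approach.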

> before having something else in place

agreed, the speed with which we fell out of the index caught me by surprise. I was expecting 30 days. In the interim a project came up that demanded attention this week. oops.

In the meantime, we could use some help from members [webmasterworld.com]. Many of those are from people who found us via an engine...

> we better index it like crazy

The major problem is that all this content is dynamic and to get "if modified since" support, is difficult. We will have to move to non-parsed header scripts and generate our own if mod since headers. That would slow down spidering enormously.
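
To illustrate what generating your own "if modified since" handling involves, here is a minimal sketch of the decision a dynamic script would have to make. The function name is hypothetical, and it assumes the script can cheaply look up when a thread last changed, which is the hard part on a dynamic site. It shows only the header logic, not a non-parsed-header script itself.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for a page last changed at `last_modified`.

    `last_modified` must be a timezone-aware datetime; `if_modified_since`
    is the raw request header value, or None if the client sent none.
    """
    # HTTP dates have one-second resolution, so drop microseconds.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        if since is not None and last_modified <= since:
            return 304, headers  # Not Modified: send headers only, no body
    return 200, headers
```

A well-behaved crawler that sends `If-Modified-Since` would then receive a bodiless 304 for unchanged threads instead of a freshly rendered page.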

> can you use a program like Fluid Dynamics Search Engine?

I am a major fan of Zoltan's work over on XAV. What a script. As most know here, it was what we were using up to about 200k pages, when it slowly faded because it was so slow and so much of a system killer. Great program - I highly recommend it on smaller sites.

> immense bandwidth charges.

Currently, we have no bandwidth charges other than the base fees. I don't expect that to change any time soon.

> Ever thought of burning the whole site
> to DVD and selling it?

Yes, thought of it a lot. There are a lot of issues involved - many of which outweigh the benefits at this time.

> incentive for people to subscribe

Your word of mouth recommendation is the best reason there is for people to subscribe and support the site.

> Not sure how it got fixed, but it did.

They moved from dynamic to static content. They moved from a mid range box to 3 high end boxes with round robin dns. They optimized their scripts/programs. They were more aggressive in offering downloads of the db. They banned scripts designed to raid them. They started changing key bytes in the html, which crippled older scripts' ability to parse the html - thus they would have to be rewritten. Lastly, there were so many sites running copies of the odp that Google got aggressive at pr0'ing them. That decreased the value of an odp clone site to about zero.

> I hope you know what you are doing Brett,

hehe. ya right. Life is best when it is lived on a Wing-n-Prayer baby!

> I think we should trust his judgement on this one.

Oh, I'm not saying it was a perfect decision. Oilman mentioned we'd been considering requiring cookies for ages, but this was not long preplanned. It was an emotional reaction to all the bot attacks, scraper sites, blog leeches, open Chinese proxy sites, and hyper aggressive big search engine crawlers. It was a throw-hands-in-the-air, I've-had-it moment. The site is here for the members to be involved and engaged in - robots do not make an online industry community - people do.

> announced his decision to kill them on
> this site just as that month finished.

Hey, I remember Nostradamus mentioning something like this. ;-) (I think this is how rumors start. lol!)

> tell us the real reason

I give up trying to disabuse people of any notion to the contrary. I've laid it all out for you here. It doesn't matter what I say Mick, there will be those that think just the opposite. I've said since day one that rogue bots are the #1 issue we face here as a community site. I don't talk about it too much in public because it is hard to discuss when the very problem is reading the page. It is just like google talking about spam issues too much. Once they do, then someone will design a system to game that feature within a few hours. After we had put up the required login here - I found 10 new bots that had been given cookie support and were crawling the site as logged in registered members.

In the meantime, we could use some help from members [webmasterworld.com]. Many of those are from people who found us via an engine...

> put up more ads.

Sorry, not at this time. We like it direct advertising free as it is now. I don't discount the possibility, but we have no plans at current to put advertising on the site.

Yes, we do give exhibitors and sponsors of PubCon page views here and that is in their agreement with us. We do that to promote the conference and let people know who is going to be at the show. It supports us and the members, MORE than it does the exhibitors.

> then how do the big sites deal with it?

They have multiple servers. Of which, most are static and don't require constant syncing. Other big sites such as auction houses, email sites, and other massively dynamic sites have custom software that keeps their servers in sync. Basically what they have is "read from any server, but must write to ALL servers" code. That code is about half done here. Yes, we will be investing in infrastructure and fleshing out that setup in the next year. Gasp - moving to a full real db (sql) is in our not-too-distant future.
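
The "read from any server, but must write to ALL servers" idea can be sketched roughly as follows. This is an illustration only, not the half-done code mentioned above: the `ReplicatedStore` class is hypothetical, and plain dictionaries stand in for the servers. A real version would also need to handle a server being down in the middle of a write, which this sketch ignores.

```python
import random


class ReplicatedStore:
    """Read from any replica, write to every replica (hypothetical sketch)."""

    def __init__(self, replicas):
        if not replicas:
            raise ValueError("at least one replica is required")
        # Each replica is a dict standing in for one server's data store.
        self.replicas = list(replicas)

    def write(self, key, value):
        # A write only counts once every replica has it, so that any
        # replica can answer later reads with the same data.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads can go to any single replica, which spreads the load.
        return random.choice(self.replicas)[key]
```

The trade-off is that writes cost as much as the number of servers, while reads, which are the bulk of the traffic on a forum, get cheaper with every server added.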

> doesn't need the same 20 questions asked again and again every single day

And a search function does not always do a lot to help that. The content is all so similar here that on many issues, the major search engines were of little help. How many duplicate "duplicate content" or "supplemental results" questions has the google forum seen? A search engine didn't help there.

> I just hope that his new search system
> doesn't cause the same issue for him.

It will for a while. There will be problems with that at the start. Moving the search engine to a new server is the answer, but that will take a few weeks.

So, patience please...

In the meantime, we could use some help from members [webmasterworld.com].

</don't worry, be happy>

...wow those are pretty black helicopters outside.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved