Forum Library, Charter, Moderators: incrediBILL & lawman

Foo Forum

This 223 message thread spans 8 pages: 1 2 3 4 5 [6] 7 8
lets try this for a month or three...
last recourse against rogue bots
Brett_Tabke




msg:329347
 1:21 am on Nov 19, 2005 (gmt 0)

[webmasterworld.com...]

Required login is the real story here...
The MSN and Yahoo bots were blocked in October. This blocks everyone else.

 

Robin_reala




msg:329497
 1:29 pm on Nov 24, 2005 (gmt 0)

no one is looking forward to the inevitable compatibility problems

Judging by the quality of support in forum83 that should really be an issue :)

claus




msg:329498
 1:41 pm on Nov 24, 2005 (gmt 0)

You know, I've been saying this for years, literally - even several times in these forums: A good webmaster will want loyal revisiting users, not bots or random traffic.

So, if this site can work without SE's all the more power and respect to my fellow members, to Brett and to the mods throughout the years for that. Well done!

It's not the first site to achieve that status, but it's certainly one of the largest.

--
An internal site search is *badly needed* though, and a very good one at that. But that's apparently in the works, so I'll just have to be patient (like any good doctor will tell you to be).

lasko




msg:329499
 1:53 pm on Nov 24, 2005 (gmt 0)


That's a bold move, Brett. I hope one day I can work on a web site that doesn't have to think about search engines :)

Just imagine forgetting search engines altogether: we could bang out sites left, right and center without the worry.

I must say I'm going to miss the Google search, especially in the PHP section, looking up previous posts and syntax tips.

Looking forward to helping test the new search function.

I only wish I used my Add Bookmark button more, never mind.

Hands up all those who would love to close the door on search engines!

I remember finding WW in Google for the very first time many years ago. When I searched Google for PHP answers etc., I got WW and Co, and 99% of the time the answers.

WW is like a family on the web: you can go about your business each day knowing that the support or community is right there when you need it. That's why WW doesn't require Google or Yahoo any longer!

Good Luck Brett on the new search function!

rj87uk




msg:329500
 2:01 pm on Nov 24, 2005 (gmt 0)

My view is he's doing what needs to be done. 'On yer sel, Brett.'

Play_Bach




msg:329501
 2:02 pm on Nov 24, 2005 (gmt 0)

> > if bandwidth is a problem

> It's not - system load is.

> Sooner or later, you are going to kick the neighbors out of the house and build a fence.

OK, well my question still stands. How do eBay, Amazon, craigslist, Yahoo! or any of the other big portals deal with this bot problem? Somehow they are all able to be spidered by Google (et al) and also provide site search - anybody know?

Thanks

DaveN




msg:329502
 2:08 pm on Nov 24, 2005 (gmt 0)

System load: is this an inherent problem with BestBBS?

[edited by: DaveN at 2:13 pm (utc) on Nov. 24, 2005]

notsleepy




msg:329503
 2:10 pm on Nov 24, 2005 (gmt 0)

>It's not - system load is.

Uhhhh. Load balancing? Dells are cheap.

lasko




msg:329504
 2:13 pm on Nov 24, 2005 (gmt 0)

How do eBay, Amazon, craigslist, Yahoo! or any of the other big portals deal with this bot problem? Somehow they are all able to be spidered by Google (et al) and also provide site search - anybody know?

It's called a multi-million dollar investment in load balancing, servers, bandwidth etc.

WW is a free forum with donations from regular supporters and I guess only 1 server that has to handle a huge load.

DaveN




msg:329505
 2:15 pm on Nov 24, 2005 (gmt 0)

But if it is server load... how are we going to handle a site search?

Surely that will bring the server to its knees.

DaveN

oddsod




msg:329506
 2:21 pm on Nov 24, 2005 (gmt 0)

Hands up all those who would love to close the door on search engines!

Exactly! If you're playing high risk games with SEs you want them around 'cause that's how you get your buzz. Everyone else would like to give up the addiction but won't admit it in public and certainly won't go the whole hog and ban all bots.

Lawnboyronmiller




msg:329507
 2:21 pm on Nov 24, 2005 (gmt 0)

Yeah, just set up a server farm. I have a server farm of about 10 servers. You can set up a simple one, and just use round-robin DNS.

If anything, just Disallow the forum directories such as /forum30/ (forums 1-XX) and you would probably cut down robot load by 99% while still keeping your homepage indexed.
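A minimal sketch of the Disallow idea above, assuming a hypothetical robots.txt and the placeholder host www.example.com. The forum numbers are illustrative only and are not the site's real layout. It uses Python's standard urllib.robotparser to show how a well-behaved crawler would read such rules; rogue bots that ignore robots.txt are not affected by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two forum directories, leave everything else open.
# The directory names are placeholders, not the real site structure.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum1/
Disallow: /forum30/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if a crawler that honors robots.txt may fetch the path."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/", "/forum30/1234.htm", "/forum10/1.htm"):
        print(path, allowed(path))
```

Rules are matched by path prefix, so each forum directory would need its own Disallow line, and the homepage stays crawlable because no rule covers it.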

If you're gone for 6 months, that's pretty bad... and self-inflicted is even worse. One thing about PubCons is you see old friends you've made and you see fresh faces at each one...

Play_Bach




msg:329508
 2:26 pm on Nov 24, 2005 (gmt 0)

> It's called a multi-million dollar investment in load balancing, servers, bandwidth etc.

But I thought Brett said the problem was "rogue bots," right? So how do the big portals deal with it? Does anybody know?

Thanks.

Brett_Tabke




msg:329509
 3:14 pm on Nov 24, 2005 (gmt 0)


> yeah, just setup a server farm.
> i have a server farm of
> about 10 servers. You can set
> up a simple one, and just round-robin dns..

Load balancing is easy on the send - syncing writes is very difficult and requires software specific to your app to deal with. That software is started, but not complete here. I'd guess 50 files a minute are updated. Probably 8-10k files a day are changed. Somehow, the software has to sync those files across all the servers simultaneously.

> how are we going to handle a site search?

Good point - yet to be seen. I am working under the theory that we'll do as we did before and put it on sew [searchengineworld.com] - or another server altogether. Also, plan B is aspseek, which isn't too bad load wise via a sql server on the same box (although, I think the results are pretty poor).

But ya know - of all the people that would have issue, or voice support for the action - yours is completely baffling Dave. You talk often about being so "black hat", eschewing scraper sites, loathing the engines, and ripping them of traffic, that you should be the first to support this!?

> So, if this site can work without SE's
> all the more power and respect to my fellow
> members, to Brett and to the mods throughout
> the years for that. Well done!

Claus - thanks!

> server load is

there is also the issue of ripped content... that is another independent story though that I don't think is prudent to discuss in public.

> System load: is this an inherent problem with BestBBS?

I believe it is the most system friendly forum on the web.
/. was last heard to be running about .75 of our page views and uniques, but on 8 load-balanced servers. I.e., BestBBS is 10-15 times more efficient than the /. code. Which is pretty good considering I wrote the software to handle 1,000 members and 10,000 page views a day (multiply by 100 to get into the ballpark of where we were last week). Whoever freakin' believed we would have 600k files in just threads? yeow...

[edited by: Brett_Tabke at 3:20 pm (utc) on Nov. 24, 2005]

reseller




msg:329510
 3:14 pm on Nov 24, 2005 (gmt 0)

Brett!

I see a revolution in what you have done :-)

Let's assume that everything goes as successfully as you have thought and planned, and WebmasterWorld grows and flourishes without being listed in search engines. I do hope that's exactly what's gonna happen.

Have you ever thought about what that means for the SEO industry?

In fact you are showing the owners of big sites that they don't need SEO and search engines to survive!

Leaving those famous 26 steps to the webmasters of mini-sites, small and medium sites.

And I can already see 100s of SEO specialists starting to polish their resumes :-)

Play_Bach




msg:329511
 3:24 pm on Nov 24, 2005 (gmt 0)

> In fact you are showing the owners of big sites that they don't need SEO and search engines to survive!

Pretty early to be making that claim...

DaveN




msg:329512
 3:29 pm on Nov 24, 2005 (gmt 0)

Brett said: But ya know - of all the people that would have issue, or voice support for the action - yours is completely baffling Dave. You talk often about being so "black hat", eschewing scraper sites, loathing the engines, and ripping them of traffic, that you should be the first to support this!?

lol, Maybe I'm getting soft in my old age :)

DaveN

incrediBILL




msg:329513
 3:39 pm on Nov 24, 2005 (gmt 0)

syncing writes is very difficult and requires software specific to your app to deal with.

OK, I could explain to you the concept of time-invariant data models and how you could easily sync writes seamlessly across multiple servers, as each server would always have its correct snapshot "in time" as they all update, but it only works if your data architecture is designed properly.

Been there, done that, I'm not cheap ;)

ogletree




msg:329514
 4:21 pm on Nov 24, 2005 (gmt 0)

If all the non-SEO forums are causing all the traffic, maybe you can just allow a few SEO forums to be indexed. I think that would cut down useless traffic quite a bit. I do understand the bad bot problem. A site this popular and mainstream can get away with all kinds of stuff that looks black hat but is not, because of the reasons behind it. Cloaking is not bad and is done by lots of big sites, including Google itself. I'm talking about allowing the top 3 spiders in by IP, and everyone else has to log in. It is not to rank better, it is to cut expenses. Yeah, somebody will report you, but with stuff like that G will look at it and know why. Especially if it is done publicly and becomes newsworthy. Even if they do ban you, how is that different from you banning them? Both have the same result. At least this way there is a chance of getting new quality visitors.
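The "top 3 spiders in by IP, everyone else logs in" idea could look roughly like the sketch below. It is only an illustration: the networks are reserved documentation ranges (RFC 5737) standing in for crawler addresses, and the function name is hypothetical. Real crawler ranges would have to come from the search engines themselves.

```python
import ipaddress

# Placeholder networks from the RFC 5737 documentation ranges.
# They stand in for the address ranges of three trusted crawlers.
ALLOWED_CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def may_view_without_login(remote_ip, logged_in):
    """Members always pass; anonymous requests pass only from allowlisted networks."""
    if logged_in:
        return True
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in ALLOWED_CRAWLER_NETWORKS)


if __name__ == "__main__":
    print(may_view_without_login("192.0.2.10", logged_in=False))
    print(may_view_without_login("10.0.0.1", logged_in=False))
```

Checking the address rather than the User-Agent header matters here, since a rogue bot can claim any user agent it likes.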

Of course I came to WebmasterWorld from the advice of a friend.

DaveN




msg:329515
 4:29 pm on Nov 24, 2005 (gmt 0)

Here is a thought... now the SEs can't get in here, the content in here just went up through the roof. I mean, it's original content that the SEs CAN'T see; it's just begging to be lifted and dropped onto a scraper site...

hey Brett .. look back to my old evil blackhat self ;)

DaveN

Kirby




msg:329516
 4:35 pm on Nov 24, 2005 (gmt 0)

Whatever you do Brett, I hope it works. I can tell you that it is somewhat baffling to me after talking with notsleepy, seth, oil and jatar_k outside the bar one night about a search function. What I kept hearing was that a search function was soooo difficult to pull off and with the exception of the Supporters forum, at least we could use Google, so this really sucks.

With regard to new members, I found this via word of mouth 3 years ago. With blogs, WW gets even more play, so that is probably a wash.

It's not like this place will fall apart in 60-90 days, but I hope you are testing this with a goal in mind. Not sure though, since you and DaveN seem to be on different pages and I would have expected your mods to be in this with you from the start, or at least willing to give it a chance. Making decisions of this magnitude without your mods is foolish and arrogant, especially when you want this to outlive you. Without good mods it doesn't stand a chance, and without search, you will wear out your mods. Losing your mods from this community, both past and present, is not something you can afford.

I wonder if some people are just wanked that their profile will no longer pass pr? So it comes back to their inability to game google through webmasterworld. interesting... this must be what G feels like during an update. lol

This is the only thing that really irked me in this thread. We have good reasons to be wanked, so that was a cheap shot. Like many others, I haven't put anything in a profile. Up until last week at PubCon, and outside of my supporters registration, my ID was completely separate from my online nick.

Best of luck with this and Happy Thanksgiving.

DaveN




msg:329517
 4:41 pm on Nov 24, 2005 (gmt 0)

Kirby, I used Google all the time to find stuff in WebmasterWorld... so that's why I'm wanked.

The first I knew this was happening is when a member MSN IMed me ..lol

DaveN

Kirby




msg:329518
 4:47 pm on Nov 24, 2005 (gmt 0)

Dave, I figured as much, based on the conversations I had with other mods about search.

I used Google to find Suzy's CSS tutorial. It was bookmarked in my laptop someone stole. This is more than a community to me. It is a resource. It's like someone just took away my library card.

Stefan




msg:329519
 4:50 pm on Nov 24, 2005 (gmt 0)

I've been following this thread since it started and have become increasingly baffled. I understand entirely that WW can probably get away with this (try a search for "webmasterworld" in G), but "search engine addiction"? What on earth are the alternatives? Shall I spend billions putting billboards up all over the planet that have the site URLs on them (actually, I just need to cover the areas our tourists come from - call it half a billion)? To consider this approach further, beyond just how it would affect me - will all of the sites on the net also be putting up billboards, and will we soon run out of dry land on which to place them? If we eradicate SEs, we have to at least put up signs advertising the ODP. Of course, the users could just start guessing URLs and typing those in.

Best of luck to all those who decide to hide their sites from now on, but I'm going to leave mine out there where everyone can find them. (And if that's an addiction, what the heck - I also like a few beers every day, so this won't be the first one I'm dealing with.)

Play_Bach




msg:329520
 4:57 pm on Nov 24, 2005 (gmt 0)

Having a good site search is an asset - Google certainly provided that. I thought the reason Brett shut it down was because of "rogue bots" - which would seem to be a security issue, right? Somehow, I'm not following how WebmasterWorld with all its programmer talent couldn't be just as secure as eBay, Amazon or Yahoo! - perhaps "rogue bots" isn't the whole picture here...

jetboy




msg:329521
 5:19 pm on Nov 24, 2005 (gmt 0)

Having just spent the best part of an hour trying to track down one of my own old posts without Google - hey, I've forgotten more than I know, and WebmasterWorld often helps to jog my memory - I've got to agree with Kirby's library card analogy. It hurts even more when it's your own books you're looking for!

AlexK




msg:329522
 5:29 pm on Nov 24, 2005 (gmt 0)

I hope that your actions cause the SEs some pause for thought.

Brett_Tabke #150:
> if bandwidth is a problem
It's not - system load is.

Brett_Tabke #71:
slurp was so aggressive that it was too much load

Same problem on my site... Yahoo's reply says "didn't find an actual problem ... (use) crawl-delay directive". Of course, Yahoo is not alone in this - the G Mozilla-bot has actually brought other websites down due to an over-aggressive GET frequency. Very satisfying to think that your actions with this site may actually get this message home to them.
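For reference, a small sketch of what the crawl-delay directive mentioned in that reply might look like, and how a crawler could read it. The robots.txt content and the 10 second value are assumptions for illustration; Crawl-delay is a non-standard extension, so it only helps with bots that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking one crawler to wait between requests.
# The 10 second value is an arbitrary example.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay in seconds requested for the named crawler, or None if no delay is set.
slurp_delay = parser.crawl_delay("Slurp")
other_delay = parser.crawl_delay("OtherBot")

if __name__ == "__main__":
    print("Slurp delay:", slurp_delay)
    print("OtherBot delay:", other_delay)
```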

Brett_Tabke #67:
New site search engine is in alpha ... Not in any real big hurry for it

Brett_Tabke #69:
Less than 1 in 1k users use site search

You need to re-think your attitude to the importance of in-house search now. There was little need for it before, because it was covered by the SEs. This place is not just a community, it is also a resource.

It is now a resource whose history cannot be accessed.

Powdork




msg:329523
 5:29 pm on Nov 24, 2005 (gmt 0)

Kind of funny.
There isn't any whining about how bad Google's results are in this thread. You don't know what you got until it's gone.

tigertom




msg:329524
 5:35 pm on Nov 24, 2005 (gmt 0)

Off topic: I'm in the UK. You guys know what the word "#*$!" means over here, right? And I think the Brits invented it. The word, I mean.

Later: Heheh, WebmasterWorld does too now, it seems.

DaveN




msg:329525
 5:37 pm on Nov 24, 2005 (gmt 0)

I'm in the UK too. #*$! is about the only bad word that's not on the filter lol

DaveN




msg:329526
 5:38 pm on Nov 24, 2005 (gmt 0)

oops i guess not any more lol #*$!

iamlost




msg:329527
 5:41 pm on Nov 24, 2005 (gmt 0)

Pulling the plug on the SEs prior to launching an alternative site search is my only complaint. The number of repeat questions by competent members will likely increase, the number by the ask-before-looking-crowd is already maxed. Brett says internal site search will be available soon - an inconvenient wait but not critical. Should the wait become indefinite or the result be substandard we can say something then.

I don't know the backend code, server capability, etc. Brett does so I will take his word on what is necessary, possible, or not. I have appreciated his efforts (and those of everyone else involved) for several years and believe that on the past record he deserves, at the least, the benefit of any doubt.

Ban The Bots!
Nasty little creepy crawlies.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved