| 8:12 pm on Nov 25, 2005 (gmt 0)|
Anyway, I was wondering whether it would be possible to set up a mirror site where posting isn't allowed and redirect all known and detected robots over there. Would that work? Let the bots chew on a slow mirrored site and keep the real site relatively bot free. When a user logs into the mirror site, redirect them over to the real site. At least Google will have something to crawl.
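A rough sketch of the routing rule described above, in Python. The host names and the bot token list are hypothetical placeholders, and matching on the User-Agent string is only an assumption for illustration: it is trivially spoofed, so a real setup would presumably also need IP-based detection.

```python
# Hypothetical host names and bot tokens, for illustration only.
MAIN_HOST = "www.example.com"
MIRROR_HOST = "mirror.example.com"
KNOWN_BOT_TOKENS = ("googlebot", "slurp", "msnbot")


def is_known_bot(user_agent: str) -> bool:
    """Return True if the User-Agent contains one of the listed bot tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)


def choose_host(user_agent: str, current_host: str, logged_in: bool) -> str:
    """Pick the host that should serve this request."""
    if logged_in:
        # Logged-in users always end up on the real site.
        return MAIN_HOST
    if is_known_bot(user_agent):
        # Known robots are sent to the read-only mirror.
        return MIRROR_HOST
    # Everyone else stays where they are.
    return current_host
```

If `choose_host` returns a host other than the current one, the server would answer with a redirect to the same path on that host.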
I was also thinking one of those Google search appliances would be a good replacement for the site search, but looking at it and the number of pages here it would cost a fortune. Brett, know anybody over there that would give you a good discount on that? Maybe as part of some beta test or something?
[edited by: Brett_Tabke at 9:51 pm (utc) on Nov. 27, 2005]
| 8:13 pm on Nov 25, 2005 (gmt 0)|
You bet - one of the most interesting I can remember (because it is live, and totally relevant to my site - that just has to also echo out to others).
|Seems like there is a great deal of interest in the topic |
That is the human element that gets us all coming back here, Brett. Give me human over spiders every time.
|It was a, throw hands in the air I've had it moment |
Another echo of a problem we all face [webmasterworld.com] - the link is the way I've managed to fix it on my site (and how I found WebmasterWorld in the first place), and it is so useful to know the problems faced here, and how you are managing it.
|rogue bots are the #1 issue we face |
On the issues of Search - since G is so good at this, do they have a solution that could be installed on your server? Just a thought.
Great minds, huh?
|I was also thinking one of those Google search appliances... |
[edited by: AlexK at 8:16 pm (utc) on Nov. 25, 2005]
| 8:15 pm on Nov 25, 2005 (gmt 0)|
> So if I had a copy on my cpu, I could then use Google desktop to search it?
I suspect this comment is made with a mouthful of facetious irony, but the answer is certainly "yes."
In fact, this is what I do with a few news sites from a certain Third World country. Each news site individually is terribly biased (pro/anti-government), but by aggregating the stories into a local searchable database, I can get a clearer, "bigger picture."
| 8:16 pm on Nov 25, 2005 (gmt 0)|
What I don't get is, why did you block Google?
Can't you just block all bots, except Google?
| 8:30 pm on Nov 25, 2005 (gmt 0)|
> Can't you just block all bots, except Google?
I believe the explanation is that the site requires a login now so it would require cloaking to allow Googlebot in - and the cloaking would lead to a ban anyway.
| 8:46 pm on Nov 25, 2005 (gmt 0)|
> The major problem is when you get 3000-4000 ips in your htaccess
Also, why are these in htaccess and not in the server config (httpd.conf) directly? That's much faster.
| 8:48 pm on Nov 25, 2005 (gmt 0)|
Would it really be cloaking? Cloaking content and feeding a SE different content because it is a SE is bad. But all it is really doing is giving the SE exactly what the human gets after they log in.
So instead of a full-blown cloaking script, you would just need to edit the login script to not require logins from certain IPs. Can that truly be considered cloaking?
Just a thought.
| 8:51 pm on Nov 25, 2005 (gmt 0)|
This Sucks! No site search on any engine AS WELL as no search box on webmasterworld.com
This doesn't seem right at all...
Why not go purely cloak and only serve pages to known IPs? Because cloaking is bad?
I don't get it, why be scared of "purely cloaking" if you're willing to give up all search engine traffic anyway?
| 8:58 pm on Nov 25, 2005 (gmt 0)|
After being a member with many names over the past seven years I have to say that this is unbelievable.
There has been so much focus on search on this site over the years I cannot believe two things:
--The action taken to remedy the problem you are having.
--There is no search box on ww.com?
For the first time I cannot find what I am looking for and I am going to be forced to get the information elsewhere.
Figure it out man!
| 8:59 pm on Nov 25, 2005 (gmt 0)|
|I believe the explanation is that the site requires a login now so it would require cloaking to allow Googlebot in - and the cloaking would lead to a ban anyway. |
If that's the case, why not enable guests to browse the forums, just not respond? Wouldn't this solve the problem?
| 9:16 pm on Nov 25, 2005 (gmt 0)|
What if you required users to have cookies, but didn't require them to log in? Could you require everyone but Google to have a cookie to view the site?
This wouldn't be cloaking because as long as the user's browser supported cookies they could see the site, but you would allow Google to also see the site without cookies. This would stop most non-authorized spiders but also allow the search engines in.
Seems requiring the ability to set a cookie would accomplish the same thing as requiring a login.
| 9:34 pm on Nov 25, 2005 (gmt 0)|
What about requiring unknown IP addresses who are not logged in (or even those with < xx posts) to enter a human verifier/CAPTCHA after viewing 10 or 20 pages? I would imagine a lot of the SE traffic is looking for answers to specific problems, so this probably would not even affect too many people.
The human verifier page could have something saying "register so you don't have to do this anymore."
| 9:34 pm on Nov 25, 2005 (gmt 0)|
|(most were proxy servers) |
and reading betwixt the lines..
Isn't it nice to know your "friends" at least put on the carnival masks before attempting the burglary :)
warm and cosy feeling ..
actually..given the expression on the "kittens"( where'd they go?..404's aint what they used to be ;) I am slightly surprised that you didn't smack a few bots back into their boxes and break a few spider legs and lairs whilst you were at it ..
showed restraint ..
whilst i think of it, and there is some agreement behind the scenes in stickies ..particularly with relation to "update" threads etc .."post" POST MODERATION mightn't be a bad idea either ..even if there lies the path to censorship ..cut out an awful lot of the "can't be bothered to read the TOS" or "I can't read 500 posts" etc ..WHY NOT!
[edited by: Leosghost at 9:53 pm (utc) on Nov. 25, 2005]
| 9:40 pm on Nov 25, 2005 (gmt 0)|
> Figure it out man!
Funny, but useless.
"Where's my waitress?"
| 10:06 pm on Nov 25, 2005 (gmt 0)|
Holy cow... the backlash begins.
Alexa's trend for WW since the Google listings were sliced shows how important good Google rankings are currently.
| 10:06 pm on Nov 25, 2005 (gmt 0)|
'scuse the "speeling" in the foregoing..I was worried about running out of "edit window"..
Point is ..if as a side effect of the current config the quality of the threads gets back to what it was ..( endless cut and pastes of what "Cutts" / "gg" said are really not good for this place ) ..after all it does say professional in the description ..and whilst all of us were "noobs" somewhere ..at some time ..of late there is a tendency of many to post just to see their own posts ..and of some real new visitors to think that some of them ( the multi-cut and pasters and DC watchers ) actually know what they are talking about ..
If one wants slash dot type "Warhol fame" posts ..this site is not what one opens ..
mini rant ended..needed saying ..semi on topic IMO
Alexa ..rotfalol ..only for drive by black hat planning are alexa relevant ..
| 10:19 pm on Nov 25, 2005 (gmt 0)|
You can use those graphical 'words' that are warped and angled for login. Having four letters shouldn't be a big deal for us users. That will stop all bots. Then make special access for Google bots and maybe Yahoo and MSN.
| 10:29 pm on Nov 25, 2005 (gmt 0)|
As you have no search facility for the forum software that you use, I spent about 10 minutes today trying the roundabout way to search for something on Webmaster World through Google and I thought Google was broken. Now I know what happened....
Can you use something like vBulletin and limit Search to Supporters Only?
| 10:32 pm on Nov 25, 2005 (gmt 0)|
"rogue bots are the #1 issue we face"
I'm seeing it too on a much smaller message board. I've seen some people over in the phpBB forums wondering "why all these large increases in visitors online?" Sometimes tenfold the number of actual registered visitors on much smaller boards. It has been posted in their forums but nobody had an answer for it.
How else can it be explained? Man I really feel for Brett because I've been dealing with what almost looks to be the same problem. I was on a shared server and they shut it down twice. I've recently upgraded but the strange IPs are still hammering away.
But then again, I'm kinda lost with a lot of this stuff
| 10:56 pm on Nov 25, 2005 (gmt 0)|
I don't understand. Writing a bot that accepts cookies is a piece of cake, so to speak. So it's the registration/login that's used to separate humans from bots. But if I were a new visitor coming from G, I couldn't be bothered to register just to see some content I have never seen before and don't know the quality of. I'd just move on to the next result. The most I could be bothered with is a captcha but even that is going to drive first-timers away.
My two cents: Drop any cookie requirements for the initial visit but present a captcha check once the hit rate coming from a particular IP exceeds a certain value. Ask again every 5 minutes for as long as the hit rate exceeds that limit. If the captcha check fails, blacklist the IP. If the captcha check succeeds, use a session cookie to identify the user agent. If the session cookie is rejected by the UA, do another captcha check along with a message saying that you require cookies.
I don't know what the hit rate limit should be exactly, but it could be that it needs to be so low that busy members might trigger it incidentally. To deal with that, members other than "New Users" should be white listed.
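For what it's worth, the scheme above could be sketched roughly like this in Python. The window length and hit limit are made-up numbers (the post itself says the right limit is unknown), the class and method names are hypothetical, and the session-cookie step is left out; only the hit counting, the 5-minute recheck, the blacklist, and the member whitelist are shown.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60     # assumed measuring window, not from the post
MAX_HITS = 30           # assumed hit limit per window, not from the post
RECHECK_SECONDS = 300   # "ask again every 5 minutes"


class HitRateGate:
    """Decide per request whether to allow, show a captcha, or block."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.hits = defaultdict(deque)  # ip -> timestamps of recent hits
        self.passed_at = {}             # ip -> time of last solved captcha
        self.blacklist = set()

    def decide(self, ip, established_member=False):
        if ip in self.blacklist:
            return "block"
        if established_member:
            # Members other than "New Users" are white listed.
            return "allow"
        now = self.clock()
        window = self.hits[ip]
        window.append(now)
        # Drop hits that have fallen out of the measuring window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) <= MAX_HITS:
            return "allow"
        last_pass = self.passed_at.get(ip)
        if last_pass is not None and now - last_pass < RECHECK_SECONDS:
            # Over the limit, but a captcha was solved recently.
            return "allow"
        return "captcha"

    def record_captcha(self, ip, solved):
        """Store the captcha outcome: a pass buys 5 minutes, a fail blacklists."""
        if solved:
            self.passed_at[ip] = self.clock()
        else:
            self.blacklist.add(ip)
```

The state here lives in memory in a single process; a busy site would presumably need shared storage and some expiry for idle IPs.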
| 11:08 pm on Nov 25, 2005 (gmt 0)|
|But if I were a new visitor coming from G, I couldn't be bothered to register just to see some contents I have never seen before and don't know the quality of. |
Fair enough, but that could help weed out the useless "me too" posts and raise the quality bar.
[edited by: engine at 11:31 pm (utc) on Nov. 25, 2005]
| 11:11 pm on Nov 25, 2005 (gmt 0)|
This issue sounds more like a DDoS attack:
- first, contact the FBI (you'll need to explain to them that there are damages >$50k)
- There is (very) expensive router/firewall hardware / very smart software out there to detect this kind of "unnatural" behaviour.
- The IPs doing this are either hacked servers (OK to ban them right away) or trojans on enduser XPs etc. (dangerous to ban them as you might ban an AOL proxy completely)
| 11:17 pm on Nov 25, 2005 (gmt 0)|
This is what does not make sense....
And Brett if you could...please answer this one!
Why do you not opt to cloak the site?
If the answer is because it is "bad"....explain why!
If the reason you don't want to cloak is because you are afraid of search engines frowning on you, then I am COMPLETELY LOST..I mean, here you are not cloaking, but who really cares if you're not even in the search engines..
<joking>But yeah at least you're not cloaking</joking>
I have read through many suggestions...some may work some may not!
You say you tried everything but have you tried to cloak the whole site?
| 11:31 pm on Nov 25, 2005 (gmt 0)|
> If you required users to have cookies,
> but didn't require them to login. Could
> you require everyone but google to have
> a cookie to view the site?
That exact question has been asked of the SEs for years, and they say universally that it would be cloaking and against the major SE guidelines, and you would be subject to removal.
| 11:39 pm on Nov 25, 2005 (gmt 0)|
|If the reason you don't want to cloak is because you are afraid of search engines frowning on you, then I am COMPLETELY LOST..I mean, here you are not cloaking, but who really cares if you're not even in the search engines. |
Not my place to answer this, but it seems obvious to me - this state of affairs might not be permanent. Why poison the domain? We've all seen tales of woe from people here who have domains that are so poisoned that they're beyond redemption/recovery.
And for those folks who want the wavy letter captcha things - man, most of the time when I run into those, I can't tell what the heck they're supposed to be spelling. Please don't put me through that.
And great intro to part two of this thread, Brett. It really puts things in perspective.
Added: He's playing by the rules, man - rather hard to find fault with that, isn't it?
[edited by: Stefan at 11:51 pm (utc) on Nov. 25, 2005]
| 11:42 pm on Nov 25, 2005 (gmt 0)|
>>use Google desktop to search it?
If you have Google Desktop installed, at least you can use it to find the posts that you have read, and want to locate again.
| 11:44 pm on Nov 25, 2005 (gmt 0)|
>>subject to removal
and here we are...
sorry couldn't resist :)
| 11:44 pm on Nov 25, 2005 (gmt 0)|
|I am going to be forced to get the information elsewhere |
If it even exists elsewhere...
Not having WW on Google is like driving a car on a dark foggy night.
| 11:53 pm on Nov 25, 2005 (gmt 0)|
Brett -- this is heartbreaking. I feel like my puppy died. Is there any way we can help?
- donations for new server equip
- custom software
- distributed computing resources (Akamai or SETI@Home style)
This is going to sound naive, but I'd like to think we could 'open source' a solution given that we have a lot of talented webmasters, distributed servers, the will, etc.
To the issue with syncing, maybe WW's own server(s) (How many exactly are there Brett?) would host fresh content, and a distributed network of member servers (I'll donate some cycles) form an Akamai-style distributed WW. Maybe P2P syncing could reduce the load on WW to update the mirrors? And since only 'changes' would need to be synced, it would be limited to new posts.
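To make the "only changes need to be synced" idea a bit more concrete, here is a minimal Python sketch. All class and method names are hypothetical, posts are assumed to carry increasing integer IDs, and edits to existing posts are ignored; it only shows a mirror pulling whatever is newer than the last post it has seen.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    post_id: int
    body: str


@dataclass
class Origin:
    """Stands in for the main server that hosts fresh content."""
    posts: list = field(default_factory=list)

    def add(self, body):
        post = Post(len(self.posts) + 1, body)
        self.posts.append(post)
        return post

    def changes_since(self, last_id):
        # Only posts newer than the mirror's high-water mark are returned.
        return [p for p in self.posts if p.post_id > last_id]


@dataclass
class Mirror:
    """Stands in for one member-donated mirror server."""
    posts: dict = field(default_factory=dict)
    last_id: int = 0

    def sync(self, origin):
        new_posts = origin.changes_since(self.last_id)
        for post in new_posts:
            self.posts[post.post_id] = post
            self.last_id = max(self.last_id, post.post_id)
        return len(new_posts)  # number of posts transferred this round
```

A mirror that syncs often transfers only a handful of posts each time, which is where the reduced load on the origin would come from.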