Lord Majestic

msg:307575 | 10:08 pm on Dec 1, 2005 (gmt 0) |
| Eh? The data the google bot sees is EXACTLY the same data the user would see. You're just requiring the user to login. The data behind the login page is still the same. No one is being deceived. |
| The USER who clicks on Google's result is being deceived - instead of seeing a page with relevant data he will see a login page. I feel seriously pissed off when I click on some links in Google News only to be greeted with a subscribers-only page - it may be acceptable for Google News since they have trusted feeds, but it's not for the Google search engine. Granted, registration is free, but that's not what matters - what matters is that there is no way computers can distinguish between good Brett cloaking for good reasons and bad Bill-The-Spammer who cloaks for bad reasons, thus anybody who cloaks should be penalised because machines simply can't see the difference.
|
Brett_Tabke

msg:307576 | 10:12 pm on Dec 1, 2005 (gmt 0) |
2by4 - also noticed that folks who run with bot names are most often bots trying to slip by filters, and thus end up autobanned by the cron job at the end of the day. e.g. if you have an IP that has been banned and you are running as a bot name - that's why...
Actual IP parsing is left as an exercise to the reader:
#####################
#!/usr/bin/perl
#use CGI::Carp qw (fatalsToBrowser);
print "Content-type: text/plain\n\n"; # if needed...

$ip  = $ENV{'REMOTE_ADDR'};
$hta = ".htaccess";

&GetHtaccess;
# print "s $success $ip" if $debug;
&BurnIP($ip);
&PutHtaccess;

# Append the IP to the first "deny from" line, or add a new one.
sub BurnIP {
    $z = shift;
    foreach $t (@htaccess) {
        if ($t =~ /deny from/gi && !$done) {
            $t .= " $z";
            $done++;
            push(@out, $t);
        } else {
            push(@out, $t);
        }
    }
    if (!$done) {
        push(@out, qq|\ndeny from $z\n|);
    }
    undef @htaccess;
    @htaccess = @out;
    undef @out;
}

# Write the updated lines back to .htaccess.
sub PutHtaccess {
    open(FILE2, ">$hta");
    foreach $t (@htaccess) {
        print FILE2 "$t\n";
    }
    close(FILE2);
}

# Read .htaccess into @htaccess; returns 0 if the file does not exist.
sub GetHtaccess {
    return(0) if !-e "$hta";
    open(FILE3, "$hta");
    @htaccess = <FILE3>;
    chomp @htaccess;
    close(FILE3);
    1;
}
#####################
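The IP parsing that the script above leaves to the reader could look something like the following. This is only a sketch, written in Python rather than the Perl of the original, and the names `parse_ip` and `burn_ip` are made up for illustration. It assumes the goal is simply to refuse anything that is not a well-formed address and to avoid listing the same IP twice; the original post does not say how its own parsing works.

```python
import ipaddress


def parse_ip(raw):
    """Return a normalized IP string, or None if raw is not a valid address."""
    try:
        return str(ipaddress.ip_address((raw or "").strip()))
    except ValueError:
        return None


def burn_ip(lines, raw_ip):
    """Return a copy of the .htaccess lines with raw_ip added to a deny rule.

    Hypothetical counterpart to BurnIP: the IP is appended to the first
    "deny from" line, or a new "deny from" line is added if none exists.
    Invalid or already-listed addresses leave the lines unchanged.
    """
    ip = parse_ip(raw_ip)
    if ip is None:
        return list(lines)  # never write unparsed input into .htaccess

    deny_lines = [l for l in lines if "deny from" in l.lower()]
    if any(ip in l.split() for l in deny_lines):
        return list(lines)  # already banned

    out, done = [], False
    for line in lines:
        if not done and "deny from" in line.lower():
            line = f"{line} {ip}"
            done = True
        out.append(line)
    if not done:
        out.append(f"deny from {ip}")
    return out
```

Validating before writing matters here because the value ends up in a server config file; a malformed entry could break the whole site rather than just fail to ban one visitor.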
|
2by4

msg:307577 | 10:52 pm on Dec 1, 2005 (gmt 0) |
LOL, should have known better brett, thanks for the sample code. On the bright side, I get to play with w3m and vi, always fun. takes me back to the good old days I guess. By the way, if you haven't tried w3m lately, check it out, it's pretty cool.
|
BeeDeeDubbleU

msg:307578 | 7:13 am on Dec 2, 2005 (gmt 0) |
Can we disable the site search and post a notice at the top about this to stop infrequent visitors posting about this?
|
kaled

msg:307579 | 11:55 am on Dec 2, 2005 (gmt 0) |
Having thought about it a little more, I think Lord Majestic has a point. However, I think a compromise/alternative approach might be possible.
If not logged in:
1) Disable all outward links from every page.
2) If cookies are enabled, insert a login form at the top of the page, otherwise insert a "cookies required" message at the top of the page.
By disabling all the outward links, robots would be totally screwed. By displaying the indexed content (albeit with a login form at the top of the page) those that enter the site directly from a search engine will not be overly annoyed/disappointed.
Kaled.
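The two rules above boil down to a small decision per request. A rough sketch in Python, where `page_policy` and the banner names are invented for illustration and say nothing about how the forum software actually works:

```python
def page_policy(logged_in, cookies_enabled):
    """Decide what a visitor gets on a content page under the proposal.

    Returns a dict with:
      links_enabled - whether outward links on the page are live
      banner        - what to insert at the top of the page, if anything
    """
    if logged_in:
        # Members see the page as normal.
        return {"links_enabled": True, "banner": None}

    # Anonymous visitors (and robots) still see the indexed content,
    # but every outward link is disabled.
    banner = "login_form" if cookies_enabled else "cookies_required"
    return {"links_enabled": False, "banner": banner}
```

The point of the design is that the content itself is never withheld, so a visitor arriving from a search result still sees what was indexed.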
|
SebastianX

msg:307580 | 1:31 pm on Dec 2, 2005 (gmt 0) |
Brett, could you allow bots and simple RSS readers like Feedreader to request the RSS feed again? I wouldn't mind if you've to move it to another location to make this happen ;) TIA
|
Play_Bach

msg:307581 | 2:52 pm on Dec 2, 2005 (gmt 0) |
> Can we disable the site search What site search are you referring to?
|
BeeDeeDubbleU

msg:307582 | 3:06 pm on Dec 2, 2005 (gmt 0) |
The one on the menu at the top of this and every page.
|
HelenDev

msg:307583 | 3:11 pm on Dec 2, 2005 (gmt 0) |
Re the site search, I apologise for my probable ignorance in the matter, but how come it's not possible to just have a normal, erm, site search? i.e. not a Google search or whatever? I don't know anything about the workings of this site but I presume it's driven by some sort of database, so could a search function not be written for it? Is it because the site is too big and it would take too long to do a search?
|
Lord Majestic

msg:307584 | 3:15 pm on Dec 2, 2005 (gmt 0) |
Thanks kaled ;) | By disabling all the outward links, robots would be totally screwed. |
| It seems to me that this will defeat the point - if robots can't find links then they won't index much, thus there will be no bots and hence no results in Google to get traffic from. This is fine if you don't want bots, but the way I understand the situation here (with alleged cloaking of robots.txt and possibly other content depending on user-agent), the point is to make people register to view content, yet some bots are allowed in using cloaking. It's either having the cake or eating it.
|
kaled

msg:307585 | 4:34 pm on Dec 2, 2005 (gmt 0) |
Perhaps I was a little too brief.... You would disable links if not logged in (thereby defeating unwanted robots) but enable the links if logged in and/or on a white-listed IP address. Thus, Googlebot et al could be allowed in. This is still cloaking, but the user would see the same content as indicated by the search, however links would be disabled and a login form would be displayed at the top of the page. To disable links, simply set href="#" (to go to top of page I think). You could also use javascript to focus the first item on the form. Kaled.
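A minimal sketch of that idea in Python, for what it's worth. The whitelist range below is a documentation-only placeholder (192.0.2.0/24), not a real crawler range, and `is_whitelisted` and `render_links` are hypothetical names. The regex only handles double-quoted `href` attributes, which is enough to show the principle but not a full HTML rewriter.

```python
import ipaddress
import re

# Placeholder range for illustration only; real crawler ranges would have
# to be maintained separately and are not given in this thread.
WHITELIST = [ipaddress.ip_network("192.0.2.0/24")]

HREF = re.compile(r'href="[^"]*"')


def is_whitelisted(ip):
    """True if ip falls inside one of the white-listed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in WHITELIST)


def render_links(html, logged_in, ip):
    """Leave links alone for members and white-listed IPs; otherwise
    point every href at "#" so the page content stays but links go nowhere."""
    if logged_in or is_whitelisted(ip):
        return html
    return HREF.sub('href="#"', html)
```

For example, `render_links('<a href="/forum21/1.htm">x</a>', False, "203.0.113.5")` gives back `<a href="#">x</a>`, while the same call from an address inside the placeholder range returns the HTML untouched.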
|
effisk

msg:307586 | 4:48 pm on Dec 2, 2005 (gmt 0) |
WW is the only forum of this importance without an RSS feed. Is this something that will be implemented in the near future? Another thing: I mentioned PunBB as being a very light and efficient BB system, but I forgot to mention the best system: MesDiscussions. I have no other words, it simply is the best. Only thing is, I'm not sure they have support in English (it's a French system). cheers
|
Leosghost

msg:307587 | 4:57 pm on Dec 2, 2005 (gmt 0) |
you're joking ..! there is no comparison ..that is typical graphic laden smiley ridden MSQL weirdness which would collapse at the volume and usage of this place .. it's not cos TF1 have used it that it's the best or big enough or versatile enough for this community.. Plus it can be "taken down" too easily for most usages ..France has some of the best coders in the world ..they did not work on "MesDiscussions" support is french only ..24 hour delay except holidays ..France has lots of holidays ..form mail problem submit system.. BTW ..you seriously think that TF1 has the specific bot problems of here ..most of the posters to TF1 fora can barely read let alone run a bot!
|
Lord Majestic

msg:307588 | 5:14 pm on Dec 2, 2005 (gmt 0) |
| This is still cloaking, but the user would see the same content as indicated by the search, however links would be disabled and a login form would be displayed at the top of the page. |
| Some people will certainly be confused seeing no links - but I suppose in this case it could actually be a reasonable compromise. The only thing I don't like is producing different content depending on whether requests come from particular search engine IPs - all good search engines should have an unknown range of IPs to catch such things, and this brings up the ultimate problem with cloaking - how can a machine determine if it's cloaking with the best intentions or just plain black hat spamming? They can't do that, so the only reasonable course of action is to ban all cloakers - note that geo-IP delivery is not cloaking.
|
effisk

msg:307589 | 5:35 pm on Dec 2, 2005 (gmt 0) |
Leosghost, I was not aware TF1 used this system at any stage. MesDiscussions is used for some of the largest online communities, including hardware.fr (about 350,000 members and not far from 30 million messages). The "smiley" thing is not an issue, it can be deactivated. The support however can become an issue. And by the way, I suggest those who doubt systems such as phpbb for large communities have a look at the "Gaia Online" forums. You'll be surprised... I think they're not far from 1,000,000 new posts each month.
|
kaled

msg:307590 | 5:39 pm on Dec 2, 2005 (gmt 0) |
Ok, with a little more thought, it can be done without cloaking at all (well, almost).... 1) Use entirely cgi links. 2) Unless logged in or on an IP address white-list, all links will lead to a login page. 3) The login page must include a noindex robots meta (to avoid duplicate content penalty!) Kaled.
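Roughly, the three points above could be sketched like this in Python. The `/login?next=` URL scheme and the names `resolve_link` and `LOGIN_PAGE` are assumptions made up for the example; nothing in the thread says how the site's CGI links are actually built.

```python
from urllib.parse import quote


def resolve_link(target, logged_in, whitelisted):
    """Return the URL a CGI link handler would send the visitor to.

    Members and white-listed IPs go straight to the target; everyone
    else lands on the login page, with the target kept for afterwards.
    """
    if logged_in or whitelisted:
        return target
    return "/login?next=" + quote(target, safe="")


# Hypothetical login page. The robots meta tag keeps it out of the index,
# so many links resolving to the same login page are not indexed as
# duplicate content.
LOGIN_PAGE = """<html><head>
<meta name="robots" content="noindex">
<title>Login</title>
</head><body>
<form method="post" action="/login">...</form>
</body></html>"""
```

With this, `resolve_link("/forum21/1.htm", False, False)` comes out as `/login?next=%2Fforum21%2F1.htm`.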
|
Lord Majestic

msg:307591 | 5:51 pm on Dec 2, 2005 (gmt 0) |
| 2) Unless logged in or on an IP address white-list, all links will lead to a login page. |
| This is based on the assumption of knowing all IPs of given search engines - this perhaps will work now, but with the explosion of spam sites it seems to me that using cloaking is exposing yourself to a serious penalty. This also does not address the issue of users who came from a search engine being misled - they expect to see the content they searched for on the first page after the click, but instead they will have to register etc. When I come across these things I just close the window and search harder.
|
kaled

msg:307592 | 7:49 pm on Dec 2, 2005 (gmt 0) |
Scratch my last suggestion. Mixing Google with redirects is probably a really bad idea anyway. Kaled.
|
Lord Majestic

msg:307593 | 7:56 pm on Dec 2, 2005 (gmt 0) |
IMO the best most honest policy is this: you either close access to search engines or you allow them (non-abusive bots only of course) to roam free - doing anything else is likely to push site beyond line in sand that separates good content sites and spammy junkies.
|
2by4

msg:307594 | 10:16 pm on Dec 2, 2005 (gmt 0) |
ip unbound... I will be good, I will be good.... effisk, those french forums, I was going to post a code sample from their page that certainly does nothing to support your claim that they are the best, but leosghost already covered the question. With phpbb forums, it's a different idea than these, they are db driven, bbbs are flat file driven. Different animals. Punbb does look interesting, but hasn't been stress tested yet as far as I know on a major forum site. But it's probably more or less just phpbb lite, with some extras and some subtractions, definitely as I noted the best output css/html of any forum software I've yet looked at. And very quick. But these forums aren't going to migrate to any generic solution, so there's really no point in bringing that up.
|
Key_Master

msg:307595 | 8:41 am on Dec 3, 2005 (gmt 0) |
Hello Brett, it's been a long time. I have 155 messages in my inbox that I can't read. lol :) You can ban the majority of the bots using Apache. Won't disclose how here though. Sticky me if interested.
|
Brett_Tabke

msg:307596 | 3:54 pm on Dec 3, 2005 (gmt 0) |
thanks tm... your thread is still kicking...
|
JoaoJose

msg:307597 | 1:03 am on Dec 4, 2005 (gmt 0) |
In the meantime the absence of a search function for WW is making my life pretty difficult. Can never get what I want on other websites...I guess that's why WW was always on Google's top spots for my queries.
|
lammert

msg:307598 | 9:57 am on Dec 4, 2005 (gmt 0) |
Those totally addicted to the Google site search functionality to search WebmasterWorld might take a look at this thread [webmasterworld.com] where Receptional gives a workable alternative using the Microsoft search engine, which still contains a reasonable amount of WW pages.
|
webjourneyman

msg:307599 | 2:53 pm on Dec 4, 2005 (gmt 0) |
I found webmasterworld on google and continued to use it because I could search for answers to particular problems with site: search on google. If it had not been for this feature I would have continued using #*$!. You should at least allow for search if user is a paying member.
|
ken_b

msg:307600 | 3:32 pm on Dec 4, 2005 (gmt 0) |
I'm curious how this has affected the number of human visits.
|
walkman

msg:307601 | 3:38 pm on Dec 4, 2005 (gmt 0) |
>> I'm curious how this has affected the number of human visits. not sure how to translate it in numbers but the Alexa ranking is now at about 500, from a top 300 or so. Sure there's a drop, but it's holding up pretty good IMO.
|
motorhaven

msg:307602 | 4:39 am on Dec 5, 2005 (gmt 0) |
If a 50% drop in tool bar visits is good, then I guess so. Look at the traffic details, there's a huge drop.
|
JAB Creations

msg:307603 | 7:38 am on Dec 6, 2005 (gmt 0) |
I spent ALL night trying to read and catch up on the only 25 pages worth of posts... (cuz I know Brett wasn't thrilled about posts without full reads) BUT when you read all that and then see another 24 pages worth in a new thread(!)...gotta post what is on my mind before I crash tonight... Brett, how about selling your blacklists? If you have every bad bot in the Delta Quadrant coming at you I would think it would be a trustworthy and effective blacklist that you could make a profit out of (and of course make operations a bit cheaper). I *LOVE* access logs and will probably always love them, I'm sure there are plenty of others who spend countless hours tracking like I do. Maybe you could hire someone to cover the work for you (unless you do like dealing with the issue though probably not)...either way sell the lists and make a profit and buy some new super servers or something? ;) I fully agree with requiring cookies to serve content, it must be done to keep this or any site under siege operationally sound.
|
BReflection

msg:307604 | 5:15 pm on Dec 6, 2005 (gmt 0) |
I can't count how many times I was searching Google for a webmaster related issue and ended up coming to WebmasterWorld. WW is rarely the one to break a story. Usually stuff is posted on the front page days and weeks after the rest of the web gets it (I've noticed there are peaks and troughs of spurts and dryness concerning stories posted). So basically you come to read the valuable comments posted by members. That will continue. But searching for other people who have started threads on a topic I am interested in will now, unfortunately, have to happen elsewhere (and sorry, I don't prefer to use MSN search...and the web is about choices - you can't choose for me.) Absolutely nothing has changed in the abstract. Anyone who is willing to accept a cookie can come to WebmasterWorld and rip the entire site. It's just an added requirement - like before, when everyone who had an internet connection could come and rip the entire site.
|
jdancing

msg:307605 | 6:41 pm on Dec 6, 2005 (gmt 0) |
Offer tastefully done sponsorships for each WebmasterWorld sub-forum. Then use the extra ~$20K/mo. from those sponsorships to pay someone to migrate WebmasterWorld from flat files to a database driven forum (like vBulletin) with search built in, and use the money left over to pay for more server power as needed. Problem solved. I’d rather see a few non-obtrusive sponsorships than not have a search function to find the answers I need quickly at WebmasterWorld. I fear unless WebmasterWorld gets search, the redundant posts and lack of utility will cause this site to start to die on the vine.
|