
WebmasterWorld Community Center Forum

    
You should protect profile links from bots
A very simple two-click sequence will do it
Everyman




msg:501968
 5:35 pm on May 12, 2001 (gmt 0)

I like the idea that you can either fill in your profile or stay anonymous. However, I'd like to recommend one further step -- a simple way to keep the profiles from getting spidered.

The initial click on a profile link would be picked up by your CGI program, which would return a little box with another "Click for profile". This is easy to do by using the initial link in a hidden variable in the new little box, and changing the usual "Submit" on the form button to "Click for profile."

This way you need two clicks to get to the profile. Bots are too dumb to handle this without special programming. I use this on my site when some domain seems to be a bot. The title of the box I return is called "Robot Roadblock" and I explain, with an apology, that you may not be a bot after all, but I'm trying to save my bandwidth, so please click again to continue, and aren't we all lucky that bots are too dumb to click again?
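A minimal sketch of the two-click sequence described above, assuming a Python handler rather than whatever CGI language the site actually uses. The script path `/cgi-bin/profile.cgi`, the field names `member` and `target`, and the function names are hypothetical, chosen only for illustration.

```python
from html import escape


def roadblock_page(target: str) -> str:
    """Build the interstitial box; the requested link rides along in a hidden field."""
    return (
        "<html><head><title>Robot Roadblock</title></head><body>\n"
        "<p>You may not be a bot after all, but please click again to continue.</p>\n"
        '<form method="post" action="/cgi-bin/profile.cgi">\n'  # hypothetical path
        f'<input type="hidden" name="target" value="{escape(target, quote=True)}">\n'
        '<input type="submit" value="Click for profile">\n'
        "</form></body></html>\n"
    )


def handle(method: str, params: dict) -> str:
    """Return the roadblock on the first click and the profile on the second."""
    # First click: an ordinary link, so it arrives as a GET.
    if method != "POST":
        return roadblock_page(params.get("member", ""))
    # Second click: the form button, so it arrives as a POST with the hidden field.
    member = escape(params.get("target", ""))
    return f"<html><body>Profile for {member}</body></html>\n"
```

A crawler that only follows links receives the roadblock page and never submits the form, which is the whole point of the second click.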

With this little gizmo installed, you'd keep the profiles out of the bots, and out of Google's cache, and give forum participants more confidence that they can change their profile at any time -- such as going from full ID back to anonymous -- and do so under conditions where this change might actually be useful and effective.

 

Brett_Tabke




msg:501969
 7:56 pm on May 12, 2001 (gmt 0)

Interesting topic, and one that we've had quite a few discussions about handling. Nice to hear some member input on it.

The current setup protects members' email addresses and lets them disclose as much or as little information about themselves as they like. That is as far as I am willing to take it, since it is completely in the control of the members themselves.

We've had several requests to allow members' profiles to be indexed. After much discussion, I have put in place a system where a robots noindex meta tag is present on the profiles of members with fewer than 100 posts. After that, we remove the noindex tag, because it does help the link popularity of the sites a profile links to. If a member is willing to contribute that much, we will go to those ends to support them.
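The threshold rule in this reply could be sketched roughly as follows. This is an illustration only, not the board's actual code; the function and constant names are assumptions.

```python
NOINDEX_THRESHOLD = 100  # post count quoted in the reply above


def robots_meta(post_count: int) -> str:
    """Return the meta tag to emit in a profile page's head, or '' for none."""
    if post_count < NOINDEX_THRESHOLD:
        # Profiles of members below the threshold ask robots not to index them.
        return '<meta name="robots" content="noindex">'
    # At or above the threshold the tag is dropped and the profile may be indexed.
    return ""
```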

theperlyking




msg:501970
 8:20 pm on May 12, 2001 (gmt 0)

I've put some dodgy spam protection in my email address (and switched off notification) so hopefully I will be ok. To be honest, I treat WmW like a lot of places and use a hotmail account rather than an email address where spam would distress me :)

Everyman




msg:501971
 9:16 pm on May 12, 2001 (gmt 0)

Thank you, Brett, for your reply. It occurred to me after I posted that some participants would want their profiles indexed. Your 100-post threshold is probably a reasonable compromise. This is the first I've heard of this policy.

One point about it being under the member's complete control: This was true prior to October, before Google went "deep," and it's true for under-100 members, assuming that the noindex meta is respected by all bots, but consider the hypothetical case of an over-100 member who has a full profile ID.

Google comes along and caches this profile. Have you ever tried to get Google to purge a cached file that has been removed or modified on the original site? -- it's difficult to get their attention! If the member wants to vacate the profile for some reason (bad guys are looking for him?), it's probably going to stay cached at Google for anywhere from 30 to 60 more days, despite the member's best efforts.

And that member will be contacting you, because Google will want to verify the situation with the actual website administrator. I went through this last August with Google; I'm assuming that things haven't improved since then.

I don't know how many bots respect the noindex meta, but I'm not confident that many do. I think you need to carry another byte in the user preferences -- a Y/N flag for whether you want your profile protected from spidering. And if the answer is "Y" then BOTH the noindex meta and the two click sequence should be used.

But it's your call. I think this forum is the best I've seen, and based on your track record, whatever you decide is probably the best solution in any case.

Everyman




msg:501972
 4:10 pm on May 13, 2001 (gmt 0)

Well, I nulled out my profile after discovering that it's too easy to find a link to it on Google. It's not getting indexed, but just with Google picking up the link from anywhere on WmW, you get my Everyman handle in conjunction with webmasterworld.com, because that's how the URL is formed. Do this on Google:

site:www.webmasterworld.com everyman

And you're one click away from reading the profile. Top ranking, too!

Now I just have a one-pix of me in there. Your "border=1" doesn't frame my best side, but most won't notice.

Any policy on this?

mivox




msg:501973
 7:51 pm on May 13, 2001 (gmt 0)

Oh hey, I think it's a rather fetching picture! ;)

Everyman




msg:501974
 11:21 pm on May 13, 2001 (gmt 0)

>Oh hey, I think it's a rather fetching picture!

Thanks. It wasn't easy, but you can hardly even tell that I've lost most of my hair!

By the way, is it within policy parameters to generate this transparent one-pix with a CGI program? You don't test for a graphics extension, and I know it would work. I can do some USER_AGENT and USER_FROM logging this way, and get an idea of which bots don't respect the noindex meta.

And if that's okay, then is it okay to generate my portrait (one-pix or any size) with a CGI program and at the same time plant a cookie? Unlike Google, I'll have it expire early -- in 2037 instead of 2038. (Just kidding; I don't really want to plant a cookie. But I think the bot monitoring would be useful.)

mivox




msg:501975
 12:10 am on May 14, 2001 (gmt 0)

That's an awfully clever idea... of course the policy interpretation is up to Brett, but I'd personally love to see the list of bots that ignore the meta tags! (at least until you exceed 100 posts ;) )

Everyman




msg:501976
 4:11 am on May 14, 2001 (gmt 0)

It won't catch many bots, because they ignore the [img src=...] anchor. But about 30 minutes after I put this same GIF on another forum, where they don't have the noindex meta, I got this in my log:

inktomi3-not.server.ntl.com - - [13/May/2001:15:44:39 -0400] "GET /portrait.gif HTTP/1.0" 200 43
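The entry above follows Apache's Common Log Format: host, identity, user, timestamp, request line, status, and bytes sent. A small sketch for pulling those fields apart when watching for bots; the function name and the regular expression are my own, not part of any Apache tooling.

```python
import re

# host ident user [time] "request" status size
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line: str) -> dict:
    """Split one Common Log Format line into named fields."""
    match = CLF.match(line)
    if match is None:
        raise ValueError("not a Common Log Format line")
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Apache writes "-" when no body bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


entry = parse_clf(
    'inktomi3-not.server.ntl.com - - [13/May/2001:15:44:39 -0400] '
    '"GET /portrait.gif HTTP/1.0" 200 43'
)
```

Note that this format carries no user agent, so the hostname is the only clue in the line itself as to who made the request.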

toolman




msg:501977
 4:18 am on May 14, 2001 (gmt 0)

>>>>inktomi3-not.server.ntl.com

That's from an ISP in the UK. Probably a surfer.

Everyman




msg:501978
 5:09 am on May 14, 2001 (gmt 0)

Guess we need a forum for webbug buffs.

Technically, according to webbug critic and privacy expert Richard Smith, it's not a webbug unless the one-pix is from an off-site domain. Therefore, this use of the one-pixel GIF under discussion is indeed a webbug. When I use it on my home page, it's not a webbug.

The fun thing about webbugs is when you have some information you've collected from the user somehow, and you need to get this information back to your site quietly. You can use the PATH_INFO and QUERY_STRING to transfer information very effectively if your one-pix is provided by a CGI program. And they work in HTML-enabled e-mail also. A programmatic "Submit" in JavaScript would accomplish much the same thing for getting information back quietly, but I think webbugs are cute, while JavaScript is just clumsy.

Another thing: while there is plenty of time for a CGI program to catch all the environment variables and do logging, and then spit out the 43 bytes for a one-pixel transparent GIF to keep Apache and the remote browser happy, the HTTP_REFERER variable is almost worthless in this case.
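For illustration, here is roughly what such a program has to do: record a few environment variables, then answer with a 43-byte transparent GIF. This is a sketch in Python under my own assumptions, not the poster's script. The variable list uses the standard CGI names (`HTTP_USER_AGENT`, `HTTP_FROM`, and so on), and the function names are hypothetical.

```python
# A 1x1 transparent GIF, spelled out section by section.
PIXEL = bytes.fromhex(
    "474946383961"          # "GIF89a" signature
    "01000100800000"        # 1x1 logical screen, 2-entry global color table
    "000000ffffff"          # color table: black, white
    "21f9040100000000"      # graphic control extension: color 0 is transparent
    "2c000000000100010000"  # image descriptor for the single pixel
    "0202440100"            # LZW-compressed pixel data
    "3b"                    # trailer
)

LOGGED = (
    "REMOTE_ADDR", "HTTP_USER_AGENT", "HTTP_FROM",
    "HTTP_REFERER", "PATH_INFO", "QUERY_STRING",
)


def log_line(environ) -> str:
    """Format the interesting CGI variables as one log line ('-' when absent)."""
    return " ".join(f"{name}={environ.get(name, '-')}" for name in LOGGED)


def response() -> bytes:
    """CGI output: headers, a blank line, then the image bytes."""
    header = f"Content-Type: image/gif\r\nContent-Length: {len(PIXEL)}\r\n\r\n"
    return header.encode("ascii") + PIXEL
```

In a real CGI script you would append `log_line(os.environ)` to a file and write `response()` to `sys.stdout.buffer`.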

About 999 times out of a thousand, the referer for this profile GIF would be:

www.webXXXXXXworld.com/profiles.cgi_actionviewmembereveryman

In other words, it's referred by the profile page itself. Once in a while you get something more interesting, but it's very rare. I've been log watching for years on this. Too bad, because I'd be curious about how many profile viewers come in locally, as opposed to how many come in from a link provided by bots.

I've been using this technique to rotate some cartoons and boxes with each one of my no-cache refreshes, to keep the home page more interesting. If you use a "CGI-pretend GIF" in this context, as a trigger to rearrange your "no-cache meta" home page, you also need one more trick to guarantee that there's no way your "pretend GIF" gets cached anywhere. Because if it's cached anywhere, it won't trip the next time for that user. Even a no-cache meta on a page doesn't prevent the GIFs themselves from getting cached.

That simple trick is this:

Instead of calling [img src="blah.blah/cgi-bin/blah.cgi"] you should call [img src="blah.blah/cgi-bin/blah.cgi/12345"], in which 12345 is the process ID or some other number that changes every time. This number ends up in PATH_INFO, where you just ignore it. The thing about this is that there isn't a cache system on earth, at least in my experience, that doesn't think it has to GET the new image. The number on the end makes it look like a new URL. If you're rearranging the page anyway, it's easy to slap a new number on the end of this URL.
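A rough sketch of the page-generating side of this trick, assuming Python. The helper names are hypothetical, and the token here is the process ID plus a counter so that it also changes between calls within one process.

```python
import itertools
import os

_counter = itertools.count()


def busted_src(base: str) -> str:
    """Append a changing token so every generated URL looks new to a cache."""
    token = f"{os.getpid()}-{next(_counter)}"
    return f"{base}/{token}"


def img_tag(base: str) -> str:
    """Build the image tag for the page being generated."""
    return f'<img src="{busted_src(base)}" alt="">'


# The CGI program behind the URL sees the token in PATH_INFO and ignores it.
tag = img_tag("blah.blah/cgi-bin/blah.cgi")
```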

Now go out there, webmasters, and get your hands dirty!

Froggyman




msg:501979
 5:40 am on May 14, 2001 (gmt 0)

Funny what a potential bomb a little piece of information like an e-mail address can be. My e-mail listed here was used to track my site down, and multiple cyber attacks were waged against it. When this proved futile, they focused on other sites I was affiliated with. At least one was not so lucky. Fortunately the attack was anticipated and backup copies were made by the webmaster in charge.

I'm still receiving e-mail bombs, at least one per week. Hint: e-mail viruses would be a topic worthy of its own forum.

PS- This is a bit of the story so many have asked about.

Everyman




msg:501980
 1:31 am on May 15, 2001 (gmt 0)

I got bored with my one-pixel GIF, and felt bad about using a webbug. Instead I made a transparent 8-color, 287-byte GIF that you can use in an anchor on your pages to link to Google. Might help your PageRank! Click on my profile, then right-click on this GIF to download.

mivox




msg:501981
 1:54 am on May 15, 2001 (gmt 0)

::guffaw:: *snif*

Jeez, I actually *like* Google as a search engine too! I use them almost all the time. (Not the tool bar, just google.com) I do really like them...

...but I can't help sniggering and guffawing at almost anything that pokes fun at the large, successful and/or bloated... be they corporate or government, 'good guys' or bad.

I think a sense of healthy skepticism and humor is an important thing when dealing with the big guys of the world.

toolman




msg:501982
 2:15 am on May 15, 2001 (gmt 0)

>>>My e-mail listed here was used to track my site down and multiple cyber attacks were waged against it

You're talking about gem100, aren't you, Froggyman?

I wonder if there's a script that will write a 'deny from' line to an .htaccess file when a given ip reaches a threshold limit?
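One way such a script might look, as a sketch only: count requests per IP and append a `Deny from` line for each address at or over the limit. The function names and the idea of feeding it a plain list of IPs are assumptions; `Deny from` is the older Apache access-control syntax, and Apache 2.4 uses `Require` directives instead.

```python
from collections import Counter


def deny_lines(ips, threshold: int) -> list[str]:
    """Return one 'Deny from' line per IP seen at least `threshold` times."""
    counts = Counter(ips)
    return [f"Deny from {ip}" for ip, hits in sorted(counts.items()) if hits >= threshold]


def append_denies(htaccess_path: str, lines: list[str]) -> None:
    """Append only the lines that are not already present in the file."""
    try:
        with open(htaccess_path) as fh:
            content = fh.read()
    except FileNotFoundError:
        content = ""
    existing = set(content.splitlines())
    new = [line for line in lines if line not in existing]
    if not new:
        return
    with open(htaccess_path, "a") as fh:
        # Make sure the first appended line starts on a line of its own.
        if content and not content.endswith("\n"):
            fh.write("\n")
        fh.write("\n".join(new) + "\n")
```

A script like this would typically run from cron against recent log entries; blocking automatically can also lock out shared proxies, so the threshold deserves some care.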

Brett_Tabke




msg:501983
 2:37 am on May 15, 2001 (gmt 0)

Please try to stay within the spirit of the board and don't do the 1pix graphic counter thing. The graphic profile was put there for the benefit of the members. Like most things in life, it can be abused. If so, we will be forced to remove this feature that is growing and will continue to grow in popularity.

Everyman




msg:501984
 3:45 am on May 15, 2001 (gmt 0)

Okay, I'm not using a one-pix anymore. I can "count" just as well with a larger GIF.

Disabling the picture option would be a rather extreme reaction to a few webbugs. Disabling the profile entirely might make more sense, as no doubt there are those with profiles who will, at some point in our googled future, wish they had never filled them out.

(It's too late for me anyway; my FBI and CIA files from the late 1960s were so complete that everything about me in Google's cache is rather like a big stack of footnotes.)

All you have to do is change your "border=1" to perhaps a "border=3". At that point even a one-pix is so noticeable that it defeats any purpose in using one, as the link to a noticeable GIF frequently betrays the person who placed the GIF.

You should also check for a ".gif" or ".jpg" extension to discourage images generated by CGI programs.
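The extension check suggested here could be as simple as the sketch below. The function name and the use of Python are assumptions for illustration.

```python
from urllib.parse import urlsplit

ALLOWED_EXTENSIONS = (".gif", ".jpg")


def looks_like_static_image(url: str) -> bool:
    """True when the URL's path ends in one of the allowed image extensions."""
    path = urlsplit(url).path.lower()
    return path.endswith(ALLOWED_EXTENSIONS)
```

This only discourages generated images. A server can still map a URL ending in ".gif" to a program, so the check is a speed bump rather than a guarantee.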

As to the "spirit of the board," it seems to me that there are many "spirits" on this board (as well as other boards, too), and it's the very fact of multiple spirits that makes it interesting and informative.

Marcia




msg:501985
 7:02 am on May 26, 2001 (gmt 0)

>a larger GIF

I noticed. Nice graphic, Everyman. ;)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved