I contacted my ISP about making the changes mentioned by jdMorgan on June 8. I'm free to do it, but they don't support it. The Apache changes mentioned are beyond my skill set.
He mentions another way to do it, using a script for doing auth. I'm not sure exactly what that means, but I contacted the guy who wrote my password program, which is a script. He says changing his code is not related to this. His program just manages the login process. He thinks making changes to the .htaccess file could accomplish this. Is the .htaccess file a solution? If so, what changes to the .htaccess file would be needed to allow search engines to index a password protected site?
Thanks,
David
If you're not keen on modifying .htaccess, you (or the guy who wrote your password program) could modify your login scripts so that if a request for a password-protected page comes from one of Google's IP blocks, you simply allow the request without asking for login information.
Regarding modifying .htaccess: I can paste some code into .htaccess, and maybe change something to make it work for my URL. I did that for changing my URL to the "no www" format. But I can't write code for allowing the right search engines in. It sounds like my goal can be accomplished with the right code in .htaccess (and no other site modifications). Is this true? Are you aware of some stock code that could be simply modified?
Is this a new technique, or has it been around a while? Would it need to be updated with current IP/hostname/useragents from time to time?
Thanks!
Also, this is a 'project,' not a simple cut-and-paste job that can be 'quick-fixed' on a forum. I'd suggest you look into hiring someone to do it if you're not comfortable making the modifications to the auth script and config files yourself. The guy who wrote your authentication/authorization script would be a prime candidate, and adding this feature would improve his product, so he should have an interest in doing this. He might even consider sharing the cost of the work with you (if approached in just the right way...) :)
An alternative would be someone who uses and understands your password script, and might like to compete with the original author by providing a more search-engine-compatible version of it.
Jim
1) I did pitch the guy who wrote my password script on modifying it, paying him, and how it would help his program. As I mentioned earlier, he says his script is NOT related to allowing search engines in. He says his program just manages the login process. Is he perhaps wrong that it is not related to search engines?
2) Is the .htaccess file itself a script?
3) Does something beyond changing the .htaccess file need to be done?
4) Are there ready-made password protection programs that provide the functionality of allowing search engines in? That is, without additional programming? When I found this one at CGI Resources, I don't remember seeing that as a common feature of the programs.
I wonder why the search engines don't provide the option to supply them a password, probably agree to some legal terms, and just index your site. Is it because they can't serve ads easily to password protected pages?
You're basically asking how to work on a Sirius or XM satellite radio in a 1967 Ford Mustang -- The radio didn't come with the car, and all we know about here is the car, and in general, how generic car radios work.
Therefore, we cannot answer your questions or provide a solution, because everything depends on how your specific script works, and very few people who might read here can be expected to know anything about it.
If you use the built-in Apache authentication and authorization, then it's a relatively simple matter to tell Apache to bypass the login requirement for certain user-agents and/or IP address ranges. However, you give up having a custom login page, and possibly several other features that came with the custom login script (I wouldn't know what they are, or how they affect your site, so I'm speaking very generally, here).
The .htaccess file is a text file which is parsed in turn by each Apache module, which then interprets only the directives that it understands, and does something as a result. It is not, therefore, strictly a script, but more like a text data file that is interpreted by the various Apache modules.
The trick you are looking for is how to bypass the login requirements for search engine user-agents and/or IP address ranges. The reason I referred you to the script author is that he might at least have some idea of a reasonable approach to doing this in a way that works well with your script.
Basically, you'd want the script to act as if the client presented a valid username/password, as long as the client request arrived from a known search engine IP address range and/or if the user-agent is that of a recognized search engine. For maximum security, you'd want to check both, and both the IP address and user-agent lists would need to be maintained over time. The client IP address is available to scripts in the REMOTE_ADDR server variable, and the client user-agent name is available in the HTTP_USER_AGENT server variable for each request to your server.
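To make that concrete, here is a rough sketch of the check in Python. I don't know what language your script is written in or how it is organized, so treat this purely as an illustration: the function names (is_trusted_crawler, needs_login) and the list names are made up, the address ranges are reserved documentation ranges rather than real crawler addresses, and the user-agent tokens are just examples. It assumes the script can read the request's server variables as a dictionary.

```python
import ipaddress

# PLACEHOLDERS ONLY: these are reserved documentation ranges, not real
# search engine addresses. Real ranges must be looked up and kept current.
TRUSTED_CRAWLER_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Example user-agent substrings (compared in lower case). Also needs upkeep.
TRUSTED_CRAWLER_TOKENS = ["googlebot", "bingbot"]


def is_trusted_crawler(environ):
    """Return True only if BOTH the client IP and the user-agent match."""
    remote_addr = environ.get("REMOTE_ADDR", "")
    user_agent = environ.get("HTTP_USER_AGENT", "").lower()

    try:
        client_ip = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Missing or malformed address: never bypass the login.
        return False

    ip_ok = any(client_ip in network for network in TRUSTED_CRAWLER_NETWORKS)
    ua_ok = any(token in user_agent for token in TRUSTED_CRAWLER_TOKENS)
    return ip_ok and ua_ok


def needs_login(environ, has_valid_session):
    """Hypothetical hook: decide whether to show the login page."""
    return not (has_valid_session or is_trusted_crawler(environ))
```

The login script would call something like needs_login() at the point where it currently decides to show the login page. Requiring both matches means a visitor who merely fakes a search engine user-agent still gets the login prompt.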
If you can't get help from the author, I'd suggest that you seek answers in a forum dedicated to the specific script that you are using. Only after understanding how your script works can you formulate any plan for letting the search engines access your site without going through the login process.
I think you answered your own question about why search engines can't/won't provide a mechanism to log in to sites when you said, "legal terms" -- Lawyers are expensive. There are simply too many sites and too many opportunities for something to go wrong. Therefore, in a cost/benefit analysis, there are high technical support, legal, and administrative costs, and little or no benefit for them. So, they let us deal with the problem and bear the cost.
I suggest that you look around for forums and/or consultants which/who deal specifically with the script you are using, and proceed from there.
Jim