ROBOTS.txt---Yes to Google, but No to Public view

   
2:31 am on Jan 11, 2010 (gmt 0)

5+ Year Member



Hi. I've got a robots.txt file that, of course, Google's crawler is allowed to fetch. However (and I admit I'm a relative newbie), is there a way to keep the public (i.e., competitors) from viewing that robots.txt file? Thanks if you have any tricks or solutions for this problem.
7:55 am on Jan 11, 2010 (gmt 0)

5+ Year Member



If you are using PHP, you can configure the server to run robots.txt as a PHP script, after which you can add the following logic:
1. Check the user agent.
2. If the user agent is Googlebot, MSN, or another bot you want to allow, serve robots.txt as usual, but do step 3 first.
3. Do a reverse DNS lookup on the requesting IP, then a forward lookup on the resulting hostname, to confirm the bot really is who it claims to be; the hostname should end in googlebot.com, msn.com, or yahoo.net.
e.g., call gethostbyaddr() and check the hostname string, then call gethostbyname() on that hostname and compare the result with the original IP.
4. Once both checks pass, you can show the robots.txt accordingly.
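
The steps above can be sketched roughly as follows. This is a minimal Python sketch rather than PHP (Python's socket.gethostbyaddr and socket.gethostbyname play the same role as the PHP functions of those names), and it has not been tested against live crawlers. The names is_verified_bot and robots_response, the user-agent tokens, and the hostname suffixes are all assumptions for illustration; check each search engine's own documentation for the hostnames its crawlers really use.

```python
import socket

# Assumed hostname suffixes; verify them against each search engine's documentation.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com", ".crawl.yahoo.net")

# Assumed user-agent tokens for the bots we are willing to serve (step 1 and 2).
BOT_TOKENS = ("googlebot", "msnbot", "slurp")


def is_verified_bot(ip, reverse_lookup=None, forward_lookup=None,
                    suffixes=TRUSTED_SUFFIXES):
    """Step 3: reverse DNS, suffix check, then forward DNS back to the same IP."""
    # The lookups are injectable so the logic can be exercised without a network.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or socket.gethostbyname
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
    except OSError:
        return False  # no reverse DNS record: treat as unverified
    if not host.endswith(suffixes):
        return False  # hostname is not under a trusted crawler domain
    try:
        # gethostbyname returns a single IPv4 address; IPv6 would need getaddrinfo.
        return forward_lookup(host) == ip
    except OSError:
        return False


def robots_response(user_agent, ip, real_rules, public_rules, **lookups):
    """Steps 1, 2 and 4: pick which robots.txt body to send."""
    claims_bot = any(token in user_agent.lower() for token in BOT_TOKENS)
    if claims_bot and is_verified_bot(ip, **lookups):
        return real_rules
    return public_rules
```

Checking the hostname suffix with a leading dot matters: a spoofer can register something like fakegooglebot.com, which would pass a plain "ends with googlebot.com" test. Keep in mind that serving crawlers different content from what visitors see carries its own risks, so treat this as an illustration of the technique only.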

I think that should work, though I haven't tested it myself.

Hope that helps.

9:08 am on Jan 11, 2010 (gmt 0)

5+ Year Member



The problem is that I don't know PHP. While I hope your post helps others, I was wondering, respectfully, whether there might be a simpler solution that I could handle. Thanks for your understanding and any additional thoughts.
11:24 am on Jan 11, 2010 (gmt 0)

tangor (WebmasterWorld Senior Member)



Not sure why you'd want to restrict access to robots.txt. If you are running a whitelist robots.txt, i.e., allowing the bots you want and disallowing all the rest, the only thing a competitor could see is what they already have in their OWN robots.txt... after all, they want Google, Bing, and Yahoo, too!
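
For anyone unsure what a whitelist robots.txt looks like, here is a small sketch. The rules are held in a Python string and checked with the standard library's urllib.robotparser so the effect is visible. The user-agent tokens (Googlebot, bingbot, Slurp), the test path, and the made-up SomeOtherBot are assumptions for illustration; use the tokens each search engine documents for its own crawler.

```python
from urllib.robotparser import RobotFileParser

# A whitelist-style robots.txt: named bots may crawl everything,
# every other bot is disallowed from the whole site.
WHITELIST_ROBOTS = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(WHITELIST_ROBOTS.splitlines())

# A whitelisted crawler is allowed; an unlisted one falls through to "*".
print(parser.can_fetch("Googlebot", "/page.html"))
print(parser.can_fetch("SomeOtherBot", "/page.html"))
```

A file like this tells a competitor only which mainstream crawlers are allowed, which is the point being made above. It is also purely advisory: well-behaved bots honor it, but nothing in robots.txt actually blocks access.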

On the other hand, if you are allowing an obscure, unknown SE to spider your site and do not want to reveal that info... I don't have any simple suggestions. Robots.txt, by definition, should be available to any and all requests.