
Forum Moderators: goodroi


ROBOTS.txt---Yes to Google, but No to Public view

     
2:31 am on Jan 11, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 4, 2005
posts: 66
votes: 0


Hi. I've got a robots.txt file that, of course, allows Google's crawler to crawl it. However (and I admit I'm a relative newbie), is there a way to keep the public (competitors) from viewing that robots.txt file? Thanks in advance for any tricks or solutions to this problem.
7:55 am on Jan 11, 2010 (gmt 0)

Junior Member

5+ Year Member

joined:Nov 26, 2008
posts:42
votes: 0


If you are using PHP, you can configure the server to process robots.txt as a PHP script, after which you can add the following logic:
1. Check the user agent.
2. If the user agent claims to be Googlebot, MSN, or another bot you want to allow, verify it with step 3 before serving robots.txt.
3. Do a reverse DNS lookup on the requesting IP, then a forward lookup on the resulting hostname, to confirm the bot really is who it says it is; the hostname should end in googlebot.com, msn.com or yahoo.net.
Ex.: call gethostbyaddr() on the IP and check the returned hostname, then call gethostbyname() on that hostname and compare the result with the original IP.
4. Once both checks have passed, serve robots.txt to that visitor.

I think that should work, though I haven't tested it myself.

Hope that helps.
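
The four steps above can be sketched roughly as follows. This is written in Python rather than PHP purely for illustration, and it is untested against live crawlers. The function names, the user-agent tokens and the exact suffix list are assumptions (the suffixes are simply the ones named in this post), and the two lookup parameters exist only so the logic can be tried without real DNS.

```python
import socket

# Hostname suffixes taken from the post above; treat them as an assumption
# and check each search engine's current documentation before relying on them.
ALLOWED_SUFFIXES = (".googlebot.com", ".msn.com", ".yahoo.net")

# Hypothetical user-agent tokens for step 1.
CRAWLER_TOKENS = ("googlebot", "msnbot", "slurp")


def claims_to_be_crawler(user_agent):
    """Step 1: cheap check of the (easily faked) User-Agent header."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Steps 2-3: reverse DNS lookup, suffix check, then forward confirmation."""
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
    except OSError:
        return False  # no reverse DNS record, so the claim cannot be verified
    if not host.endswith(ALLOWED_SUFFIXES):
        return False
    try:
        addresses = forward_lookup(host)
    except OSError:
        return False
    # The hostname must resolve back to the IP that made the request.
    return ip in addresses


def may_see_robots(user_agent, ip, **lookups):
    """Step 4: serve robots.txt only when both checks pass."""
    return claims_to_be_crawler(user_agent) and is_verified_crawler(ip, **lookups)
```

The forward lookup matters because anyone who controls the reverse DNS for their own IP range can make it return a name ending in googlebot.com; only the search engine can make that name resolve back to the same IP.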

9:08 am on Jan 11, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 4, 2005
posts: 66
votes: 0


The problem is that I don't know PHP. While I hope your post helps others, I was wondering, respectfully, whether there might be a simpler solution that I could handle. Thanks for your understanding and any additional thoughts.
11:24 am on Jan 11, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:6160
votes: 284


Not sure why you'd want to restrict access to robots.txt. If you are running a white-list robots.txt, i.e., allowing the bots you want and disallowing all the rest, the only thing a competitor could see is what they already have in their OWN robots.txt. After all, they want Google, Bing, and Yahoo, too!
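
For anyone unsure what a white-list robots.txt looks like, here is a minimal sketch. The file contents and the bot names in it are only an illustration, not anyone's actual file, and Python's standard urllib.robotparser is used just to show how a crawler would read the rules.

```python
from urllib.robotparser import RobotFileParser

# Illustrative white list: an empty Disallow means "nothing is off limits".
WHITELIST_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(WHITELIST_ROBOTS_TXT)
googlebot_allowed = parser.can_fetch("Googlebot", "/some-page.html")
unknown_bot_allowed = parser.can_fetch("SomeOtherBot", "/some-page.html")
```

With rules like these, a competitor reading the file learns only which mainstream crawlers are welcome.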

On the other hand, if you are allowing an obscure, unknown search engine to spider your site and do not want to reveal that info, I don't have any simple suggestions. Robots.txt, by definition, should be available to any and all requests.

 
