
Sitemaps, Meta Data, and robots.txt Forum

    
robots.txt: Yes to Google, but No to Public View
donna130




msg:4058058
 2:31 am on Jan 11, 2010 (gmt 0)

Hi. I've got a robots.txt file that, of course, allows Google's crawler to crawl it. However (and I admit I'm a relative newbie), is there a way to keep the public (competitors) from viewing that robots.txt file? Thanks if you have some tricks or solutions to this problem.

 

herculano




msg:4058166
 7:55 am on Jan 11, 2010 (gmt 0)

If you are using PHP, you can configure the server to run robots.txt as a PHP script, after which you can add the following logic:
1. Check the user agent.
2. If the user agent is Googlebot, msnbot, or another bot you want to allow, serve robots.txt as usual, but do step 3 first.
3. Verify the requesting IP: do a reverse DNS lookup on it, and if the bot really is who it claims to be, the hostname should end in googlebot.com, msn.com, or yahoo.net. Then do a forward lookup on that hostname and confirm it resolves back to the same IP.
e.g., call gethostbyaddr() and check the returned hostname, then call gethostbyname() on that hostname and compare the IPs.
4. Once both checks pass, you can show the robots.txt accordingly.
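
For anyone who wants to see the four steps above as code, here is a minimal sketch. It is written in Python rather than PHP, and it is untested against real crawlers. The hostname suffixes come straight from the list above; the user-agent tokens, the function names `is_verified_bot` and `robots_body`, and the idea of sending a separate `public_rules` text to everyone else are assumptions made for illustration. The lookups are passed in as parameters so the logic can be tried without touching DNS.

```python
import socket

# Hostname suffixes taken from the steps above. Treat them as an assumption
# and check each search engine's own documentation before relying on them.
TRUSTED_SUFFIXES = (".googlebot.com", ".msn.com", ".yahoo.net")

# Substrings for the cheap user-agent pre-check (illustrative only).
BOT_TOKENS = ("googlebot", "msnbot", "slurp")


def is_verified_bot(user_agent, ip,
                    reverse_lookup=socket.gethostbyaddr,
                    forward_lookup=socket.gethostbyname_ex):
    """Return True only if the user agent and both DNS checks agree."""
    # Steps 1-2: the user agent must claim to be a bot we care about.
    if not any(token in user_agent.lower() for token in BOT_TOKENS):
        return False
    try:
        # Step 3a: reverse lookup, IP -> hostname.
        hostname = reverse_lookup(ip)[0]
        if not hostname.lower().endswith(TRUSTED_SUFFIXES):
            return False
        # Step 3b: forward lookup, hostname -> list of IPs.
        addresses = forward_lookup(hostname)[2]
    except OSError:
        # A failed lookup is treated as "not verified".
        return False
    # The hostname must resolve back to the IP that made the request.
    return ip in addresses


def robots_body(user_agent, ip, real_rules, public_rules, **lookups):
    """Step 4: choose which robots.txt text to send back."""
    if is_verified_bot(user_agent, ip, **lookups):
        return real_rules
    return public_rules
```

The user-agent check alone is easy to fake, which is why the sketch only trusts a request when the reverse and forward lookups both match.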

I think that should work, though I haven't tested it myself.

Hope that helps.

donna130




msg:4058200
 9:08 am on Jan 11, 2010 (gmt 0)

The problem is that I don't know PHP. While I hope your post helps others, I was wondering, respectfully, whether there is a simpler solution that I might be able to handle. Thanks for your understanding and any additional thoughts.

tangor




msg:4058235
 11:24 am on Jan 11, 2010 (gmt 0)

Not sure why you'd want to restrict access to robots.txt. If you are running a whitelist robots.txt, i.e., allowing the bots you want and disallowing all the rest, the only thing a competitor could see is what they already have in their OWN robots.txt. After all, they want Google, Bing, and Yahoo, too!
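
To make the whitelist idea concrete, here is a small sketch that holds one such robots.txt in a string and checks it with Python's standard `urllib.robotparser`. The three bot names and the example.com URL are illustrative assumptions, not a recommended list, and the helper name `allowed` is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# A whitelist-style robots.txt: an empty Disallow means "nothing is
# disallowed" for the named bot, and the final block shuts out the rest.
WHITELIST_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /
"""


def allowed(agent, url="https://www.example.com/some-page.html"):
    """Report whether the whitelist file lets this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(WHITELIST_ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for name in ("Googlebot", "bingbot", "SomeObscureBot"):
        print(name, allowed(name))
```

A file like this reveals little: it names only the big crawlers that nearly every site welcomes.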

On the other hand, if you are allowing an obscure, unknown SE to spider and do not want to reveal that info... I don't have any simple suggestions. Robots.txt, by definition, should be available to any and all requests.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved