phranque

msg:4487919 | 9:17 am on Aug 24, 2012 (gmt 0) |
welcome to WebmasterWorld, webseosolution! one option you have is to return an X-Robots-Tag: noindex HTTP response header with those files and allow the urls to be crawled. | should be only available for paid members only |
| why are those files directly web-accessible?
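As a rough illustration of the header suggested above, here is a minimal Python WSGI sketch that attaches X-Robots-Tag: noindex to a response. The app name paid_file_app and the response body are hypothetical, and this assumes the files are served through an application layer rather than directly by the web server.

```python
# Minimal WSGI sketch: serve a response that crawlers may fetch but should not index.
# `paid_file_app` and the body text are illustrative assumptions, not from the thread.

def paid_file_app(environ, start_response):
    headers = [
        ("Content-Type", "text/plain"),
        # Asks supporting crawlers not to index this URL. The crawler must be
        # allowed to fetch the URL, otherwise it never sees the header.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [b"members-only content\n"]

# To try it locally, one option is the standard library's
# wsgiref.simple_server.make_server("", 8000, paid_file_app).serve_forever()
```

Note that the header only affects indexing; it does nothing to restrict who can download the file.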
|
webseosolution

msg:4487928 | 10:26 am on Aug 24, 2012 (gmt 0) |
Thank you for your valued response. Those files were publicly accessible earlier, so they were crawled then. I have since made them paid-only, which is why they should no longer be available. But some of the URLs use https:// and are still being crawled by Googlebot. WSS
|
webseosolution

msg:4487971 | 1:08 pm on Aug 24, 2012 (gmt 0) |
Hello,
Let me explain my question in depth. Let's say the page at my URL [xyz.com...] has the following content in its page source:
<meta name="robots" content="all" />
<meta name="robots" content="index, follow" />
<meta name="googlebot" content="index, follow" />
<meta name="msnbot" content="index, follow" />
Now I am setting the following rule in my .htaccess file:
<files admin.php>
Header set X-Robots-Tag "noindex, nofollow"
</files>
And my admin.php is NOT restricted in robots.txt.
Question: How will a bot behave in the above case?
Thanks & Regards, WSS
|
not2easy

msg:4488017 | 3:19 pm on Aug 24, 2012 (gmt 0) |
<meta name="robots" content="all" />
<meta name="robots" content="index, follow" />
<meta name="googlebot" content="index, follow" />
<meta name="msnbot" content="index, follow" />
Now I am setting the following rule in my .htaccess file:
<files admin.php>
Header set X-Robots-Tag "noindex, nofollow"
</files>
And my admin.php is NOT restricted in robots.txt.
Question: How will a bot behave in the above case? |
| The bot will crawl but not index or follow the URL: xyz.com/admin.php
Why do you have so many separate robots meta tags? This one does the same as all of them together: <meta name="robots" content="all" />
You should think about password-protecting the files you want to keep private, and make sure they are not in your sitemaps.
|
phranque

msg:4488137 | 10:31 pm on Aug 24, 2012 (gmt 0) |
i believe most well-behaved robots will use the exclusion that is most specific to the user agent and if there is more than one relevant exclusion the bot will use the most restrictive. i would assume in general that means "noindex, nofollow" for bots that support X-Robots-Tag. however googlebot should see the more specific "googlebot" exclusion and go with "index, follow"
|
Elsmarc

msg:4488235 | 12:32 pm on Aug 25, 2012 (gmt 0) |
What will happen if, for example, I have a directory which is excluded in robots.txt but files in that directory and sub-directories are in the sitemap?
|
phranque

msg:4488273 | 3:52 pm on Aug 25, 2012 (gmt 0) |
a sitemap is simply a url discovery mechanism. regardless of how a crawler has discovered a url, the robots.txt file is consulted to see if the url matches an excluded pattern before requesting the url. GWT should report this as a crawl "error". i wouldn't recommend putting excluded urls in a sitemap.xml file.
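The check described above can be sketched with Python's standard urllib.robotparser. The robots.txt rules, the example.com URLs, and the helper name fetchable_urls are all made up for illustration; real crawlers may apply additional matching rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes one directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical URLs discovered through a sitemap.
SITEMAP_URLS = [
    "http://example.com/index.html",
    "http://example.com/private/report.pdf",
]

def fetchable_urls(robots_txt, urls, user_agent="*"):
    """Return only the URLs that robots.txt allows user_agent to request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]

allowed = fetchable_urls(ROBOTS_TXT, SITEMAP_URLS)
print(allowed)
```

Being listed in the sitemap does not change the outcome: the excluded URL is filtered out before any request would be made.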
|
webseosolution

msg:4489669 | 5:41 am on Aug 30, 2012 (gmt 0) |
Hello All, Thank you for your valued responses. I just did that and am waiting to see the result from Google. Thanks, WSS
|