Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Google Index robots.txt
Google describes the result for pages blocked by robots.txt
webseosolution




msg:4487909
 7:24 am on Aug 24, 2012 (gmt 0)

Hello,

I blocked some of my URLs through robots.txt a few months ago. But when I check them now, Google shows a message like:

A description for this result is not available because of this site's robots.txt

And the URLs are still reachable through Google search. What is the best way to stop them from being indexed?
Those URLs hold important data in PDF or PDA that should be available to paid members only.

Please let me know how I can remove them from Google's index.

Removing each URL through Google Webmaster Tools would take too long, as there are more than 500 of them.

Is there any other way apart from the Google Webmaster Tools removal tool?

Thanks & Regards,
WSS

 

phranque




msg:4487919
 9:17 am on Aug 24, 2012 (gmt 0)

welcome to WebmasterWorld, webseosolution!

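one option you have is to return an X-Robots-Tag: noindex HTTP response header with those files and allow the urls to be crawled, since a crawler that is blocked by robots.txt never requests the url and so never sees the header.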

should be only available for paid members only

why are those files directly web-accessible?
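
As a rough illustration of the header approach above, here is a minimal .htaccess sketch. It assumes Apache with mod_headers enabled and that the files in question end in .pdf; the pattern is a guess and would need adjusting to the real file types.

```apache
# Sketch only: send "noindex" with every PDF response.
# Assumes mod_headers is loaded; the \.pdf$ pattern is an assumption.
<IfModule mod_headers.c>
    <FilesMatch "\.pdf$">
        Header set X-Robots-Tag "noindex"
    </FilesMatch>
</IfModule>
```

For this to have any effect, the matching URLs must not be disallowed in robots.txt. Otherwise the crawler never fetches them and cannot see the header.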

webseosolution




msg:4487928
 10:26 am on Aug 24, 2012 (gmt 0)

Thank you for your valued response.

Those files were publicly accessible earlier, so they were crawled back then.

I have since made them paid content, that's why.

But some of the URLs use https://, and even those are being crawled by Googlebot.



WSS

webseosolution




msg:4487971
 1:08 pm on Aug 24, 2012 (gmt 0)

Hello,

Let me explain my question in depth.

Let's say the page at my URL [xyz.com...] has the following content in its page source:

<meta name="robots" content="all" />
<meta name="robots" content="index, follow" />
<meta name="googlebot" content="index, follow" />
<meta name="msnbot" content="index, follow" />

Now I am setting the following rule in my .htaccess file:

<files admin.php>
Header set X-Robots-Tag "noindex, nofollow"
</files>

And my admin.php is NOT restricted in robots.txt.

Question: How will a bot behave in the above case?

Thanks & Regards,
WSS

not2easy




msg:4488017
 3:19 pm on Aug 24, 2012 (gmt 0)

<meta name="robots" content="all" />
<meta name="robots" content="index, follow" />
<meta name="googlebot" content="index, follow" />
<meta name="msnbot" content="index, follow" />

Now I am setting the following rule in my .htaccess file:

<files admin.php>
Header set X-Robots-Tag "noindex, nofollow"
</files>

And my admin.php is NOT restricted in robots.txt.

Question: How will a bot behave in the above case?

The bot will crawl xyz.com/admin.php, but it will not index the page or follow the links on it.

Why do you have so many separate robots meta tags? This one does the same as all of them together: <meta name="robots" content="all" />

You should think about password protecting the files you want to keep private and make sure they are not in your sitemaps.
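
One way to do the password protection suggested above is HTTP Basic authentication. This is a minimal sketch for an .htaccess file placed in the protected directory. It assumes Apache allows AuthConfig overrides there; the realm name and the password file path are hypothetical placeholders.

```apache
# Sketch only: require a login for everything in this directory.
# "Members only" and /path/to/.htpasswd are placeholders, not real values.
# The password file can be created with Apache's htpasswd utility.
AuthType Basic
AuthName "Members only"
AuthUserFile /path/to/.htpasswd
Require valid-user
```

A crawler requesting these URLs then receives a 401 response instead of the file. Unlike robots.txt or noindex, this also keeps the content away from anyone who already has the direct link.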

phranque




msg:4488137
 10:31 pm on Aug 24, 2012 (gmt 0)

i believe most well-behaved robots will use the exclusion that is most specific to the user agent and if there is more than one relevant exclusion the bot will use the most restrictive.

i would assume in general that means "noindex, nofollow" for bots that support X-Robots-Tag.

however googlebot should see the more specific "googlebot" exclusion and go with "index, follow"

Elsmarc




msg:4488235
 12:32 pm on Aug 25, 2012 (gmt 0)

What will happen if, for example, I have a directory which is excluded in robots.txt but files in that directory and sub-directories are in the sitemap?

phranque




msg:4488273
 3:52 pm on Aug 25, 2012 (gmt 0)

a sitemap is simply a url discovery mechanism.

regardless of how a crawler has discovered a url, the robots.txt file is consulted to see if the url matches an excluded pattern before requesting the url.

GWT should report this as a crawl "error".

i wouldn't recommend putting excluded urls in a sitemap.xml file.

webseosolution




msg:4489669
 5:41 am on Aug 30, 2012 (gmt 0)

Hello All,

Thank you for your valued response.

I just did that and am waiting to see the result from Google.

Thanks,
WSS

