Forum Moderators: goodroi

Google Index robots.txt

Google describes the result for the pages blocked by robots.txt

     

webseosolution

7:24 am on Aug 24, 2012 (gmt 0)



Hello,

I blocked some of my URLs through robots.txt a few months ago. But when I check them in Google now, the results show this message:

A description for this result is not available because of this site's robots.txt

The URLs are still reachable through Google search. What is the best way to stop them from being indexed?
Those URLs hold important data in PDF or PDA that should be available to paid members only.

Please let me know how I can remove them from Google's index.

If I use Google Webmaster Tools to remove each URL one by one, it will take too long, as there are more than 500 of them.

Is there any other way apart from the Google Webmaster Tools removal tool?

Thanks & Regards,
WSS

phranque

9:17 am on Aug 24, 2012 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld, webseosolution!

one option you have is to return an X-Robots-Tag: noindex HTTP response header with those files and allow the urls to be crawled, since a bot can only see the header if robots.txt lets it request the file.
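
for example, a minimal .htaccess sketch along these lines could send that header with every pdf. it assumes apache with mod_headers enabled, and the \.pdf$ pattern is only a guess at which files need to be covered:

```apache
# Only applies when mod_headers is loaded.
<IfModule mod_headers.c>
  # Assumed pattern: match any file whose name ends in .pdf
  <FilesMatch "\.pdf$">
    # Ask search engines not to index these files
    Header set X-Robots-Tag "noindex"
  </FilesMatch>
</IfModule>
```

the Disallow lines for those urls would have to come out of robots.txt first, otherwise the bot never requests the files and never sees the header.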

should be available to paid members only

why are those files directly web-accessible?

webseosolution

10:26 am on Aug 24, 2012 (gmt 0)



Thank you for your valued response.

Those files were publicly accessible earlier, so they were crawled back then.

I have since made them paid content, that's why.

But some of the URLs are on https://, and they are still being crawled by Googlebot.



WSS

webseosolution

1:08 pm on Aug 24, 2012 (gmt 0)



Hello,

Let me explain my question in depth.

Let's say the page at my URL [xyz.com...] has the following content in its page source:

<meta name="robots" content="all" />
<meta name="robots" content="index, follow" />
<meta name="googlebot" content="index, follow" />
<meta name="msnbot" content="index, follow" />

Now I am setting the following rule in my .htaccess file:

<files admin.php>
Header set X-Robots-Tag "noindex, nofollow"
</files>

And my admin.php is NOT restricted in robots.txt.

Question: How will the bot behave in the case described above?

Thanks & Regards,
WSS

not2easy

3:19 pm on Aug 24, 2012 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



<meta name="robots" content="all" />
<meta name="robots" content="index, follow" />
<meta name="googlebot" content="index, follow" />
<meta name="msnbot" content="index, follow" />

Now I am setting the following rule in my .htaccess file:

<files admin.php>
Header set X-Robots-Tag "noindex, nofollow"
</files>

And my admin.php is NOT restricted in robots.txt.

Question: How will the bot behave in the case described above?

The bot will crawl xyz.com/admin.php, but it will not index that URL or follow the links on it.

Why do you have so many separate robots meta tags? This one does the same as all of them together: <meta name="robots" content="all" />

You should think about password protecting the files you want to keep private and make sure they are not in your sitemaps.
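
A minimal sketch of what that password protection could look like in .htaccess, assuming Apache with HTTP Basic authentication. The realm name and the /path/to/.htpasswd location are placeholders, not values from this thread:

```apache
# Ask for a username and password before serving anything in this directory.
AuthType Basic
AuthName "Members only"
# Placeholder path: point this at a password file kept outside the web root.
AuthUserFile /path/to/.htpasswd
# Any user listed in the password file may log in.
Require valid-user
```

The password file itself can be created with Apache's htpasswd utility. A bot that cannot log in gets a 401 response instead of the file.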

phranque

10:31 pm on Aug 24, 2012 (gmt 0)




i believe most well-behaved robots will use the exclusion that is most specific to the user agent and if there is more than one relevant exclusion the bot will use the most restrictive.

i would assume in general that means "noindex, nofollow" for bots that support X-Robots-Tag.

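that should include googlebot: as far as i know google combines the restrictive directives from every source that applies to it (the X-Robots-Tag header plus the "robots" and "googlebot" meta tags), so the "index, follow" in the "googlebot" meta tag would not override the header.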

Elsmarc

12:32 pm on Aug 25, 2012 (gmt 0)

10+ Year Member



What will happen if, for example, I have a directory which is excluded in robots.txt but files in that directory and sub-directories are in the sitemap?

phranque

3:52 pm on Aug 25, 2012 (gmt 0)




a sitemap is simply a url discovery mechanism.

regardless of how a crawler has discovered a url, the robots.txt file is consulted to see if the url matches an excluded pattern before requesting the url.

GWT should report this as a crawl "error".

i wouldn't recommend putting excluded urls in a sitemap.xml file.

webseosolution

5:41 am on Aug 30, 2012 (gmt 0)



Hello All,

Thank you for your valued response.

I just did that and am waiting to see the result from Google.

Thanks,
WSS
 
