
Google ignoring robots.txt and No index meta tags

   
11:54 pm on Jul 17, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A few months ago, I received an unnatural link warning from Webmaster Tools for one of my better sites.
It has many freely given links from .edu sites and also plenty of other, probably lower-quality links.

I have never bought, begged or traded any links to this site, so I assume I was penalised for linking to it from other niche-related sites I run.
Google can tell these sites belong to me as they all run the same AdSense code.
I sent them a reinclusion request for this site explaining the situation and that I have now nofollowed all links (6) from my other sites.

They sent me a reply stating I was still in breach of their guidelines.
I sent some more reinclusion requests and got the same response.

This ticked me off, so I decided to remove my AdSense ($60 a day) from the site and block all Googlebots in my robots.txt. I also put noindex, nofollow in my meta tags.
By rights, Google should no longer be indexing my site, as far as I know.
However, 2 months later they are still showing my site for some main keyword searches with the snippet "A description for this result is not available because of this site's robots.txt".

To me, this is another example of Google playing by their own rules and screw everyone else.
12:57 am on Jul 18, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Sometimes you can answer a post by its subject line alone, without even reading the question. This is one of those times :(

If google can't crawl a page, it can't see the "noindex" tag.

That's assuming it really said "noindex" as in the subject line, rather than "nofollow" as in the body of the post.
1:30 am on Jul 18, 2013 (gmt 0)

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Leave the NOINDEX on each page, and get rid of the block in robots.txt, so Google can see the NOINDEX.

(You do realize that this will also take you out of Bing and Yahoo)

NOINDEX is the surest way to stay out of Google. But you'll also stay out of the other search engines.

If you use robots.txt, you could still be indexed (only without a meta description). You're just telling Google not to crawl your pages, but they could still discover them other ways, for example via links to you.

NOINDEX deals with indexing. robots.txt deals with crawling. They're not quite the same thing.
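
To make the crawling-versus-indexing split concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content, the example.com URL and the helper name may_crawl are made up for illustration, and this parser only approximates how a well-behaved crawler reads the file. The point is simply that a crawler that is told not to fetch a page never gets to read the NOINDEX tag inside it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks Googlebot from the whole site,
# similar to what the original poster describes.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"  # placeholder URL
    for agent in ("Googlebot", "bingbot"):
        if may_crawl(ROBOTS_TXT, agent, page):
            print(agent, "may fetch the page, so it can see a NOINDEX tag")
        else:
            print(agent, "is blocked, so any NOINDEX tag on the page goes unread")
```

With the block in place, the URL can still end up listed if other sites link to it, because nothing the crawler is allowed to read says "do not index".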
3:15 am on Jul 18, 2013 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



Leave the NOINDEX on each page, and get rid of the block in robots.txt, so Google can see the NOINDEX.

(You do realize that this will also take you out of Bing and Yahoo)

Good point, netmeg!

To remove pages from Google's index only, use meta name="googlebot" instead of meta name="robots":
Directing a robots meta tag specifically at Googlebot
To provide instruction for all search engines, set the meta name to "ROBOTS". To provide instruction for only Googlebot, set the meta name to "GOOGLEBOT".
[googlewebmastercentral.blogspot.co.uk...]
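
As a rough illustration of the difference between the two meta names, the Python sketch below reads the meta tags from a page and reports whether a given crawler would find a noindex directive addressed to it. The class and function names (RobotsMetaParser, is_noindexed_for) and the sample HTML are assumptions for the example, and real crawlers may apply more rules than this simple check does.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the content of every <meta name="..."> tag, keyed by name."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lowercased meta name -> set of lowercased tokens

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if not name:
            return
        tokens = {t.strip().lower() for t in attr.get("content", "").split(",")}
        tokens.discard("")
        self.directives.setdefault(name, set()).update(tokens)


def is_noindexed_for(html: str, crawler: str) -> bool:
    """True if the page carries noindex for all robots or for this crawler."""
    parser = RobotsMetaParser()
    parser.feed(html)
    general = parser.directives.get("robots", set())
    specific = parser.directives.get(crawler.lower(), set())
    return "noindex" in (general | specific)


GOOGLE_ONLY = '<html><head><meta name="googlebot" content="noindex"></head></html>'
EVERYONE = '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head></html>'

if __name__ == "__main__":
    for label, page in (("googlebot tag", GOOGLE_ONLY), ("robots tag", EVERYONE)):
        for crawler in ("googlebot", "bingbot"):
            print(label, crawler, is_noindexed_for(page, crawler))
```

With the googlebot-only tag, only Googlebot is told to drop the page; with the robots tag, every crawler that honours the tag is.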
3:21 am on Jul 18, 2013 (gmt 0)

5+ Year Member



It takes a long time for google to process noindex. I added noindex to a bunch of my pages over a month ago and they still are in the index. I have no robots block either.
3:27 am on Jul 18, 2013 (gmt 0)

WebmasterWorld Senior Member ken_b is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



aakk9999
Directing a robots meta tag specifically at Googlebot

Thanks for that, it answers a question I've been wondering about the last couple days.

.
3:32 am on Jul 18, 2013 (gmt 0)

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Here's a related thread that covers this topic over and over and over, until I think it finally made sense to most participants of the thread. It covers many aspects of the question, and I highly recommend it....

Pages are indexed even after blocking in robots.txt
http://www.webmasterworld.com/google/4490125.htm [webmasterworld.com]
3:51 am on Jul 18, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the replies.
That probably explains it.
I have <meta name="googlebot" content="noindex"> and Bing still crawls and indexes the site just fine,
but I guess I have to remove the Googlebot block from robots.txt.
Cheers
3:54 am on Jul 18, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



It takes a long time for google to process noindex. I added noindex to a bunch of my pages over a month ago and they still are in the index.

You may want to go into gwt and remove them explicitly-- especially if you're dealing with whole directories that can easily be block-removed.

Here's a related thread that covers this topic over and over and over, until I think it finally made sense to most participants of the thread.

For a given definition of "made sense", at least. "I don't like it, I don't understand it, but I accept it as fact."
5:48 am on Jul 18, 2013 (gmt 0)



I agree with dethfire. It takes Google ages to process these noindex and nofollow tags. I did this to a user-generated part of my site and 3 months later Google still has those pages in the index.

Also, there is plenty of evidence that even though Google says they won't follow a "nofollowed" link, they still do but just don't use it in the ranking algo. Even Wikipedia mentions this effect on its nofollow page. Of course it is hard to prove this, as there may be sites out there that you don't know about that have a followed link to this particular page. However, some of these experiments blocked everything but Googlebot in the Apache config when the test pages were published, and Googlebot still crawled those pages.

I personally haven't done these experiments, has anyone here done any like this?
7:08 am on Jul 18, 2013 (gmt 0)

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



For a given definition of "made sense", at least. "I don't like it, I don't understand it, but I accept it as fact."

lucy24 - And here I thought you'd come around after your first post above. ;)

I highly recommend that thread to anyone with lingering questions about the Google crawling or indexing process.

It takes a long time for google to process noindex.

dethfire - How often does Googlebot visit the pages you noindexed? That would affect implementation speed.

Also, regarding how fast noindex is implemented, getcooking makes a very interesting comment on this crawl allocation discussion about Google's crawling behavior on noindexed pages (more particularly about removing noindex rather than first implementing it, but there might be related behavior at the head end)...

Crawl allocation and duplicate content
http://www.webmasterworld.com/google/4593402.htm [webmasterworld.com]

I track all googlebot activity on my site. On my noindexed pages, google slows down the crawls over time. So, once it picks up the noindex tag and removes the page from the index it starts to spider that page less and less frequently (once a day, then once a week, then once a month, etc). It will still eat some of the crawl budget but Google seems to be good at reducing how much effort it puts into those pages. It's probably also why once you noindex a page it can be a long time before you can get it reindexed. That's been my experience anyway.
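
For anyone who wants to check how often Googlebot revisits their own noindexed pages, a minimal Python sketch along these lines could count hits per URL per day. It assumes an Apache-style "combined" access log, and the function name googlebot_hits_per_day and the sample log line are invented for the example. It also trusts the user-agent string, which can be spoofed, so treat the counts as a rough guide only.

```python
import re
from collections import Counter

# Assumed format: Apache "combined" log, e.g.
# IP - - [17/Jul/2013:23:54:00 +0000] "GET /path HTTP/1.1" 200 123 "ref" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_per_day(lines):
    """Count requests per (path, day) whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "googlebot" not in match.group("agent").lower():
            continue
        day = match.group("when").split(":", 1)[0]  # e.g. "17/Jul/2013"
        hits[(match.group("path"), day)] += 1
    return hits


SAMPLE = [
    '66.249.66.1 - - [17/Jul/2013:23:54:00 +0000] "GET /page.html HTTP/1.1" '
    '200 1234 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [17/Jul/2013:23:59:10 +0000] "GET /page.html HTTP/1.1" '
    '200 1234 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.5 - - [18/Jul/2013:00:01:00 +0000] "GET /page.html HTTP/1.1" '
    '200 1234 "-" "Mozilla/5.0 (Windows NT 6.1) Firefox/22.0"',
]

if __name__ == "__main__":
    for (path, day), count in sorted(googlebot_hits_per_day(SAMPLE).items()):
        print(day, path, count)
```

Run over a few weeks of logs, the per-day counts for a noindexed URL should show whether visits are tailing off the way getcooking describes.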
12:29 pm on Jul 18, 2013 (gmt 0)

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



My noindex stuff tends to get dropped right away most of the time, but if it doesn't, I go take it out in GWT.

I think you used to be able to remove your entire site in GWT, but I haven't checked recently to see if that's still possible.
5:34 pm on Jul 18, 2013 (gmt 0)

WebmasterWorld Senior Member planet13 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Alright, get the flamethrowers out now, but I gotta ask:

Is that site making you more money now that you have gotten rid of adsense and are trying to get it unlisted from google?

I mean, what is the point of this except to tilt at windmills?
6:11 pm on Jul 18, 2013 (gmt 0)

5+ Year Member



I have 175k pages I need to noindex. I'm not doing that by hand :D
8:06 pm on Jul 18, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is that site making you more money now that you have gotten rid of adsense and are trying to get it unlisted from google?

Nah.
It's making Bugga all with alt ads and lost 60% of traffic but I feel strongly that first of all, I was innocent of their accusations and that my site is the best on the subject and google serps look silly (imo) without my site.
I am a patient man and I can wait till Google come cap in hand and beg me forgiveness.
;)
10:12 pm on Jul 18, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



till Google come cap in hand and beg me forgiveness

Look! Up in the sky! It's a flying pig!
 
