NoIndex not NoIndexing. SO FRUSTRATING!

     

Lenny2

4:42 pm on Nov 10, 2011 (gmt 0)



Ladies and Gents,

I have an application on a server... it's the main website. Then I have a mirror application on another server... it's our testing grounds. Back in the heyday it didn't matter; we kept both active. We had the setup on two servers so that we could bounce users from one server (in Texas) to the other (in Washington) when traffic got heavy, because at the time the application would slow down under a lot of traffic. Anyway, it never really worked... round-robining, as I guess it's called, is extremely complicated, and good people couldn't/didn't feel like figuring it out, so it was less expensive for us to just fix the application so that it didn't slow down with lots of traffic.

Anyway, we kept the other server and the mirror site... as a testing ground. We noindexed all the pages. My problem is this: when I search for bck01.site-name.com, I'm still finding some results, even though we noindexed/nofollowed everything a year ago.

As we were SLAMMED down by the Panda, I'm wondering if perhaps Google is ignoring the noindex and treating this bck01.site-name.com as duplicate content and penalizing us for it.

Here is the code: <META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">

Does anybody have any thoughts on why Google would ignore the noindex and index the pages anyway? Does anybody have any insight as to whether or not Google would treat the mirror site as duplicate content? Are there better ways to run a testing server/site for big site changes?

ackk

4:55 pm on Nov 10, 2011 (gmt 0)

5+ Year Member



We've had tons of pages noindexed for 6-7 months. Many of them still appear in Google's index.

The only thing you may want to check is to make sure that the noindexed pages are accessible. If the crawler can't access the page in the first place, it won't know that the noindex tag has been added.
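A rough way to run that accessibility check offline is sketched below in Python. It only uses the standard library's `urllib.robotparser`, which is a generic robots.txt parser and not Google's own matcher, so treat the answer as a hint. The helper name `can_crawl` and the sample rules are made up for illustration; the hostname is the one from the first post.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper: it parses robots.txt content you paste in,
    so no network request is made.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules (an assumption, not taken from anyone's real site).
blocking_rules = """User-agent: *
Disallow: /
"""

page = "http://bck01.site-name.com/page.html"
print(can_crawl(blocking_rules, page))  # crawler is shut out, so it cannot read the meta tag
print(can_crawl("", page))              # empty rules: nothing is disallowed
```

If the first kind of result comes back for your test server, the crawler never gets far enough to read the noindex tag.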

netmeg

5:02 pm on Nov 10, 2011 (gmt 0)

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yeah, what does your robots.txt look like?

Lenny2

5:30 pm on Nov 10, 2011 (gmt 0)



@ackk and @netmeg; thanks for the feedback. I don't have a robots.txt on site... do you think we should add one?

potentialgeek

7:58 pm on Nov 10, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



See Google employee JohnMu's reply in this thread:

Creating a legitimate, no follow , 2nd mirror site with no penalty to our main site
[google.com...]

londrum

8:33 pm on Nov 10, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



there was a conversation about this ages ago to do with baidu, i think, and one of the things that came out of it was that noindex does not do what we think it does.

most search engines will remove the pages from their index if you noindex them, so we have come to believe that's what it does. but technically, it isn't.

it just tells them that they can't crawl it. anything already in their index can stay there. and if they can get the info through a third-party, then that's okay too.

if a third party links to your page, then they can grab the URL and title, or whatever, from that, and there's nothing you can do about it.

Lenny2

10:03 pm on Nov 10, 2011 (gmt 0)



@potentialgeek thanks for the link.
@londrum thanks for the clarification on what noindex actually does.

In my case I can't 301 the pages, because we use the site for testing... I think I will look into rel="canonical"; that should do the trick. I actually didn't realize you could point a rel="canonical" at a completely different domain. Good to know!

zerillos

10:59 pm on Nov 10, 2011 (gmt 0)

5+ Year Member Top Contributors Of The Month



I see you use all caps in your code. This is something I was wondering about. Is robots noindex case sensitive? I'm starting to suspect it is, but I don't have any real proof yet.
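For what it's worth, HTML tag and attribute names are case-insensitive, and as far as I know Google's documentation describes the robots meta name and content values as case-insensitive too. The Python sketch below shows how a parser would typically normalize the tag before reading it. It is only an illustration of that normalization, not a claim about how Googlebot is implemented, and the names `RobotsMetaParser` and `robots_directives` are made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags, ignoring case."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names;
        # attribute values are lowercased here by hand.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of lowercase robots directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


upper = '<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">'
lower = '<meta name="robots" content="noindex, nofollow">'
print(robots_directives(upper) == robots_directives(lower))
```

Under that reading, the upper-case tag from the first post and a lower-case one carry the same two directives.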

BenFox

10:42 am on Nov 11, 2011 (gmt 0)

5+ Year Member



I'm assuming that there are too many URLs for you to manually remove them using WMT?

scooterdude

2:09 pm on Nov 11, 2011 (gmt 0)



Check the cache dates. It's not impossible that the pages were cached before you noindexed them, and if the site is not heavily linked to, it's possible the crawler hasn't been back to recrawl and therefore deindex them.

Andem

2:20 pm on Nov 11, 2011 (gmt 0)

10+ Year Member Top Contributors Of The Month



+1 on what londrum says and plus my $0.02 :)

A couple of months ago, we tried noindexing and blocking pages via robots.txt. Unfortunately, Google was still listing these pages in their SERPs but without a snippet below the title.

The only way to get them out of the index was via webmaster tools and even now, there are errors about these pages being blocked by robots.txt *sigh*

londrum

2:26 pm on Nov 11, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



it makes you wonder why people are told to noindex low quality pages, as a way to beat panda. surely it shouldn't have any effect, if google can keep the pages in the index?

netmeg

3:38 pm on Nov 11, 2011 (gmt 0)

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Personally, I have never had a situation (over hundreds of thousands of URLs in aggregate) where a NOINDEXed URL showed up in the index... UNLESS I or someone else had made a mistake and blocked crawling with robots.txt. If you do that, then G can't even get in to *see* the NOINDEX.

I'm not saying it can't happen, but I've never seen it happen without some logical explanation for it.

pageoneresults

4:04 pm on Nov 11, 2011 (gmt 0)

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



A couple of months ago, we tried noindexing and blocking pages via robots.txt. Unfortunately, Google was still listing these pages in their SERPs but without a snippet below the title.


What you describe with the URI only listings is the default robots.txt behavior. The META (or X-Robots-Tag) NoIndex is at the document level. If you've Disallowed the bot from accessing the documents that contain the NoIndex directive, it will never see it, that's why your pages are still showing in the index with a URI only listing.

Remove the robots.txt directives and let the document level NoIndex do its thing. It works just as it says on the tin. I've been using it for years and I've never, ever, seen any of those documents appear in the index - ever.
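The interaction described above can be sketched as a small Python check: robots.txt is consulted first, and the document-level NoIndex (META or X-Robots-Tag) only counts if the bot is allowed in. This is a simplified model under stated assumptions: the function name `noindex_status` and its return strings are invented, the standard library parser stands in for a real crawler, and user-agent-specific X-Robots-Tag values are ignored.

```python
from urllib.robotparser import RobotFileParser


def noindex_status(robots_txt: str, url: str, x_robots_tag: str = "",
                   meta_content: str = "", user_agent: str = "Googlebot") -> str:
    """Classify a URL as 'blocked', 'noindex-seen' or 'indexable'.

    Hypothetical helper modelling the order of checks only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        # The document is never fetched, so its NoIndex is never read.
        return "blocked"

    # Merge header and meta directives, normalizing case and whitespace.
    combined = x_robots_tag + "," + meta_content
    directives = {t.strip().lower() for t in combined.split(",") if t.strip()}
    return "noindex-seen" if "noindex" in directives else "indexable"


url = "http://bck01.site-name.com/page.html"
disallow_all = "User-agent: *\nDisallow: /"

print(noindex_status(disallow_all, url, meta_content="NOINDEX,NOFOLLOW"))
print(noindex_status("", url, meta_content="NOINDEX,NOFOLLOW"))
print(noindex_status("", url, x_robots_tag="noindex"))
```

The first call is the Disallow-plus-NoIndex combination that leaves URI-only listings; the other two are the setups where the directive can actually be read.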

londrum

5:48 pm on Nov 11, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



i managed to find that thread from ages ago
[webmasterworld.com]

the reply from skrenta is the interesting one

Andem

11:32 pm on Nov 14, 2011 (gmt 0)

10+ Year Member Top Contributors Of The Month



If you've Disallowed the bot from accessing the documents that contain the NoIndex directive, it will never see it, that's why your pages are still showing in the index with a URI only listing.


Sorry for being unclear. I first tried NOINDEX, and then tried the robots.txt route. Neither worked like I thought it should. The WMT URL/directory removal worked (past tense). After 3-4 weeks, the results with just URLs or titles without snippets are showing back up. Note that the pages are over 10 years old and have several decent backlinks.

Now, even after noindex, robots.txt blocking directories and a directory removal via webmaster tools, I have given up and decided to send a 404 Not Found header to all requests. Now I have a truckload of complaints about the 404s mixed with robots.txt blockage in webmaster tools. *sigh*

g1smd

1:46 am on Nov 15, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The meta robots noindex should fix the problem, but it does take quite a while (sometimes more than 6 months).
 
