Google SEO News and Discussion Forum

    
Working with 'Close this block', robots.txt and noindex
DirigoDev




msg:4521187
 6:09 pm on Nov 21, 2012 (gmt 0)

I have a site whose robots.txt is set to disallow everything. The site is a portal used by our customers. The URL is linked from 3rd party websites. The URL has a listing in Google SERPs in the following structure:

Close this block - KW1 KW2
subdomain.domain.com

KW1 and KW2 are relevant and factual. The domain is correct. Where does "Close this block -" come from? It is not in my code anywhere. My site does not have a Title because it was set to noindex. Could this be the issue?

I know that Google won't crawl or index the content of pages blocked by robots.txt, but that they still index the URLs if they find them on other pages on the web.

Anyone know how to fix this?

 

tedster




msg:4521604
 3:06 pm on Nov 22, 2012 (gmt 0)

"Close this block" is not a directive to search engines, as far as I know. Instead, it is a third party application used by, among others, Microsoft to help site builders.

If you don't want a URL crawled you still need a robots.txt disallow rule.
If you don't want a URL indexed you still need a robots meta tag noindex rule.

It does get just a bit more complicated because if you disallow crawling then the meta tag never gets crawled or seen, so some Google SERPs (usually longer tail) might show a link with an explanation that the actual URL was restricted by robots.txt.
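
To make the crawl vs. index distinction concrete, here is a minimal Python sketch of how a well-behaved crawler would treat the page. It is only an illustration, not Google's actual logic: the robots.txt bodies, the page HTML, and the helper names (RobotsMetaFinder, crawler_sees_noindex) are all hypothetical, and the URL is the placeholder from the first post.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies: one blocks everything, one allows everything.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Placeholder URL from the thread and a hypothetical page carrying a noindex tag.
URL = "http://subdomain.domain.com/"
PAGE_HTML = (
    '<html><head><meta name="robots" content="noindex"></head>'
    "<body>Portal login</body></html>"
)


class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives |= {d.strip().lower() for d in content.split(",") if d.strip()}


def crawler_sees_noindex(robots_txt, url, html, agent="Googlebot"):
    """Return True only if the crawler may fetch the page AND the page says noindex."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # Crawling is disallowed, so the HTML (and its meta tag) is never read.
        return False
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


print(crawler_sees_noindex(BLOCK_ALL, URL, PAGE_HTML))  # False: tag never seen
print(crawler_sees_noindex(ALLOW_ALL, URL, PAGE_HTML))  # True: tag can be honored
```

With the blanket disallow in place the noindex tag is invisible to the crawler, which is why the URL can still show up in the index based on external links alone.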

DirigoDev




msg:4521607
 3:18 pm on Nov 22, 2012 (gmt 0)

Then why is Google displaying "close this block" in the anchor text in the SERPs? Send me a message and I'll attach a photo.

klark0




msg:4521635
 5:20 pm on Nov 22, 2012 (gmt 0)

Maybe "close this block" is the anchor text, or maybe the alt text, used for the link on those affiliates' pages you mentioned.

To deindex the page, remove the robots.txt block and add a noindex meta tag to the page itself.

DirigoDev




msg:4521656
 6:34 pm on Nov 22, 2012 (gmt 0)

You're right. This could be the case. Since there is no cache I can't tell where the link comes from... I'll drop the robots and add noindex to the individual page. Sort of sucks that Google does not follow the wishes of the robots file.

klark0




msg:4521670
 8:32 pm on Nov 22, 2012 (gmt 0)

They actually do. Robots.txt tells them not to crawl. It doesn't say not to index. As a result, if they find out about the URL from links on other sites they may index it without crawling it.

Or if a URL was previously allowed in robots.txt and then subsequently blocked, it may still remain indexed.

aristotle




msg:4521698
 12:21 am on Nov 23, 2012 (gmt 0)

I think it might be best to leave everything as it is now. Because if you allow Google to crawl the page in order to see a noindex tag, you also allow Google to see all the content on the page. And even though the page will no longer be indexed, it is still part of the site, and the Google algorithm might therefore take its content into account when making its overall assessment of the site's purpose. This might adversely affect the Google rankings and traffic of the other pages on the site. Here is an old thread that discusses this possibility:
[webmasterworld.com ]

klark0




msg:4521699
 12:34 am on Nov 23, 2012 (gmt 0)

^By that measure, having a bunch of pages indexed with no crawling, no titles, weird titles, and/or no description might also adversely affect the Google rankings and traffic of other pages.

If you're that worried about it, you should probably be removing the page altogether and returning a 404 or 410.

aristotle




msg:4521701
 1:20 am on Nov 23, 2012 (gmt 0)

^By that measure, having a bunch of pages indexed with no crawling, no titles, weird titles, and/or no description might also adversely affect the Google rankings and traffic of other pages.

If you're that worried about it, you should probably be removing the page altogether and returning a 404 or 410.


Well if I understood the OP correctly, the page in question is used by existing customers, so returning a 404 or 410 would prevent them from accessing it.

As for "no titles, weird titles, and/or no description", I'm not sure what this means. Because if the page is blocked by robots.txt, Google won't have any information about the title and description other than clues provided by links from other pages, which it already has anyway.

As for the effect on Google rankings and traffic for other pages on the site, this depends on how well the content of the blocked page would "mesh" with the content of the rest of the site. On my own sites, I've blocked a few pages because their content clearly wouldn't mesh very well with the rest of the site. Several of these blocked pages on one site have even been indexed by Google because of external links, but this does no harm to anything, so it doesn't matter.

ZydoSEO




msg:4521711
 3:12 am on Nov 23, 2012 (gmt 0)

If you don't want the page showing in the SERPs, allow Google to crawl it and use the meta robots NOINDEX. Having one page on your site that is flagged NOINDEX is not going to adversely affect rankings of other pages on your site.

aristotle




msg:4521842
 10:27 am on Nov 23, 2012 (gmt 0)

Having one page on your site that is flagged NOINDEX is not going to adversely affect rankings of other pages on your site.


How do you know this? Do you have a reference? Has a google employee told you?

There is an old thread about this subject [webmasterworld.com ] which offers good reasons to think that it could have an adverse effect. Unless someone at Google explicitly says that it can't, then I don't see how anyone can rule out the possibility.

tedster




msg:4521948
 5:29 pm on Nov 23, 2012 (gmt 0)

One reason I lean toward that statement being true is that I've done it several times and saw the noindexing of a single page help traffic. Another is that the strategy was recommended by Googlers on their own forums, talking to people coping with Panda. Granted, it was recommended as a sort of stopgap measure, but it still was recommended.

ZydoSEO




msg:4521951
 5:47 pm on Nov 23, 2012 (gmt 0)

Funny you reference a thread that you started, but okay. And even if someone at Google says that it can't (or can), that doesn't mean it is true either. They don't always tell the truth (Cutts... PR sculpting works... then a year after they "supposedly" changed how PR is handled when links on a page are nofollowed, he finally announced PR sculpting no longer worked and hadn't been working for over a year).

Anyway, let me rephrase...

Based on my experience working with sites for large brands like LendingTree, large lead gen sites with 10s of thousands or 100s of thousands of pages, medium and large ecomm sites, small lead gen sites, small business websites, and even personal blogs (basically, the entire gamut)... I have never once seen one shred of evidence that would support the "hypothesis" that noindexing one or even dozens of pages on a site has adversely affected that site's rankings in any way.

tedster




msg:4521955
 5:52 pm on Nov 23, 2012 (gmt 0)

even if someone at Google says that it can't (or can) doesn't mean that is true either. They don't always tell the truth

They don't always KNOW "the truth" - at least not the truth in practice that we see. They know the truth of what they intend - but their point of view biases them, and the complexity of the algorithm at this point makes the truth of very general statements turn into a big fuzz ball.

Sgt_Kickaxe




msg:4521958
 6:18 pm on Nov 23, 2012 (gmt 0)

Embrace the nofollow meta tag on any page where you have questionable or paid content. I prefer the meta tag because the rel="nofollow" tag on the link doesn't stop the page from being one you don't want linking to your site. Let Google work out the rest; the page will rank just fine assuming it has enough incoming links/social mentions etc.

Theoretically you could employ a nofollow meta tag sitewide on your site and rely solely on incoming links to support it. I don't see a valid reason to do this but it's possible, I tried it by accident once...

aristotle




msg:4521967
 7:17 pm on Nov 23, 2012 (gmt 0)

I have never once seen one shred of evidence that would support the "hypothesis" that noindexing one or even dozens of pages on a site has adversely affected that site's rankings in any way.


I don't see how this proves anything one way or the other. In fact, if the noindexing of some pages has no effect, then it seems to indicate that the Google algorithm continued to take account of the content of those pages even after they were noindexed, which supports my argument.

But let me give a clear-cut example. Suppose you have an established site about growing roses. Then you add a noindexed (but crawlable) page full of porn words and you also sell links to porn sites on it, and you also link to it internally. Do you really think that the Google algorithm will totally disregard that page simply because of the noindex tag?

jimbeetle




msg:4521995
 9:21 pm on Nov 23, 2012 (gmt 0)

If Google correctly follows the noindex -- and we have no evidence or reason to suspect it does not -- then the page is not in its index.

DirigoDev




msg:4522018
 10:34 pm on Nov 23, 2012 (gmt 0)

The page in question is a portal only used by authorized libraries (e.g. under subscription). We have another page used by consumers. Having the library page indexed in Google with a funky title is not what we want because it may lead consumers to the library site. The site looks and operates similar to the consumer portal. It could create confusion.

So I'm going to drop the robots disallow all and just put a meta noindex/nofollow on all pages in front of authentication. This will fix the issue.

I guess I had forgotten that Google puts pages in the index if they're found on other websites. And this link is found on a huge number of library sites. I still think this is a bit funky because my robots.txt shows intent that I don't want the page/site indexed.

DirigoDev




msg:4522029
 10:46 pm on Nov 23, 2012 (gmt 0)

Oh, I don't care about SEO on this subdomain. Primarily because the sales process is offline.

But this thread has me thinking. If I have a link pointing to my site from thousands of library sites (Europe, Asia, Middle East, and North America - some schools as well) and if I can dictate the anchor text I might have missed some very important SEO pay-dirt.

I'm now considering moving the login for new library/school clients to my consumer sales site, under a /libraries/ or /education/ directory.

ZydoSEO




msg:4522031
 10:48 pm on Nov 23, 2012 (gmt 0)

I don't see how this proves anything one way or the other.


I've NOINDEXed many a page on many sites and never seen adverse effects on rankings of other pages as a result.

Where is your repeatable proof that NOINDEXing a page actually DOES lead to negative effects on rankings of other pages? Simply "hypothesizing" that it does cause issues doesn't make it so.

Proof is in the puddin'. I've done it repeatedly on many types & sized sites w/ no adverse effects. You (and the thread you referenced) simply "theorized" that it might cause issues without any example (much less repeatable examples) where it has caused issues.

Hopefully, you are aware that the <meta name="robots" content="noindex"> element only prevents Google from indexing the "content" and showing that URL in their SERPs. It does NOT prevent Google from leaving that page in their link graph, following that NOINDEXed page's outbound links, noting link text used on those links in the link graph, flowing PageRank out on that page's outbound links (because they are considered FOLLOWed links), etc. And this, you'll be pleased to hear, I have heard many times from different Googlers including Cutts.

Cutts alludes to it in the following video, though I've also heard him talk about this in other interviews, videos, and in person at Pubcon/SMX on many occasions.

[youtube.com...]


Hopefully the mods will leave the link.

A <meta name="robots" content="noindex"> is logically equivalent to <meta name="robots" content="noindex, follow">. It is not logically equivalent to <meta name="robots" content="noindex, nofollow">.
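
A small Python sketch of that equivalence, assuming the simple model described in this post: a directive that is absent falls back to its permissive default. The function name parse_robots_meta is hypothetical, and the sketch only handles the index/follow pair discussed here.

```python
def parse_robots_meta(content):
    """Resolve a robots meta 'content' value into explicit index/follow flags.

    Assumption for this sketch: anything not explicitly forbidden is allowed,
    so a missing directive behaves like 'index' or 'follow'.
    """
    tokens = {token.strip().lower() for token in content.split(",") if token.strip()}
    return {
        "index": "noindex" not in tokens,
        "follow": "nofollow" not in tokens,
    }


# "noindex" on its own leaves links followed, same as "noindex, follow".
print(parse_robots_meta("noindex") == parse_robots_meta("noindex, follow"))    # True
# Adding nofollow changes the outcome, so the two are not equivalent.
print(parse_robots_meta("noindex") == parse_robots_meta("noindex, nofollow"))  # False
```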

In fact, if the noindexing of some pages has no effect, then it seems to indicate that the Google algorithm continued to take account of the content of those pages even after they were noindexed, which supports my argument.


The fact that those outbound links on the NOINDEXed page are actually FOLLOWed links even though the page's content is not indexed is a much more plausible explanation IMO as to why rankings of other pages on a site don't change when you NOINDEX one of your pages.

It's not likely the "content" of the NOINDEXed page that was helping surrounding pages rank as much as it was the site's architecture. Specifically, its internal linking structures (like top navigation, left navigations, breadcrumbs, contextual links, etc.) that appear on the NOINDEXed page likely influence how other pages on the site rank FAR more than does the content of the NOINDEXed page. And the credit other pages are given for inbound links from the NOINDEXed page is not affected at all by NOINDEXing that page. The outbound links still remain followed.

But we can agree to disagree. I'm just saying I have never, in all the times that I've used a <meta name="robots" content="noindex"> seen any evidence of what you're hypothesizing.

jimbeetle




msg:4522047
 11:49 pm on Nov 23, 2012 (gmt 0)

My comment above didn't make much sense as a standalone. When I said that noindexed pages wouldn't be in the index I was referring to aristotle's concerns that the content of noindexed pages could somehow be used against a site.

aristotle




msg:4522191
 2:09 pm on Nov 24, 2012 (gmt 0)

Well this thread's original question was whether to remove a block from the robots.txt file and allow Google to crawl the page and see its content for the first time. That's different from the question of adding a noindex tag to a page that Google has already seen, which was brought up later. In any case, I believe that the content of any one page, including both its text and links, can affect the rankings of other pages on the same site. And I also believe that this might be true even if a page has a noindex tag. That's what I've consistently said throughout this thread, (although some appear to have mis-interpreted it,) and I haven't seen anything to cause me to change my opinion.

aristotle




msg:4522196
 2:20 pm on Nov 24, 2012 (gmt 0)

By the way, earlier in this thread I tried to clarify the issue by giving a clear-cut example as follows:

Suppose you have an established site about growing roses. Then you add a noindexed (but crawlable) page full of porn words and you also sell links to porn sites on it, and you also link to it internally. Do you really think that the Google algorithm will totally disregard that page simply because of the noindex tag?

I haven't seen any responses to this, so would like to ask again: Does anyone really believe that adding this page, even with a noindex tag, wouldn't hurt the rankings of the other pages on the site?

jimbeetle




msg:4522213
 4:55 pm on Nov 24, 2012 (gmt 0)

Do you really think that the Google algorithm will totally disregard that page simply because of the noindex tag?

Yes, because if Google respects the noindex meta -- and again, we have no reason or evidence to suggest that it doesn't -- then that's what it is supposed to do, the page is not in its index.

Do you have any evidence that Google does not fully support the noindex directive?

aristotle




msg:4522214
 5:08 pm on Nov 24, 2012 (gmt 0)

jimbeetle
Yes I agree that Google would follow the metatag directive and exclude that page from its index. But my intended question is, would the presence of that page (even though it won't be indexed) affect Google's view of the rest of the site and hurt the rankings of the other pages? To my mind, that's the real issue which some of us have been discussing.

ZydoSEO




msg:4522241
 9:16 pm on Nov 24, 2012 (gmt 0)

By the way, earlier in this thread I tried to clarify the issue by giving a clear-cut example as follows:

Suppose you have an established site about growing roses. Then you add a noindexed (but crawlable) page full of porn words and you also sell links to porn sites on it, and you also link to it internally. Do you really think that the Google algorithm will totally disregard that page simply because of the noindex tag?

I haven't seen any responses to this, so would like to ask again: Does anyone really believe that adding this page, even with a noindex tag, wouldn't hurt the rankings of the other pages on the site?


It's a bad analogy IMO and nothing like what this thread was talking about (adding a <meta name="robots" content="noindex"> element to a "normal" page... not one with tons of paid links to porn sites). Your analogy talks about a page that would ALREADY be violating Google's Webmaster Guidelines. Actually, if such a page existed and was indexed and Google penalized you for it, adding a <meta name="robots" content="nofollow, noindex"> to that page might even be enough to get the penalty lifted.

Regardless of whether a page with tons of paid, FOLLOWed porn links has a <meta name="robots" content="noindex"> element or not, having a page with FOLLOWed, paid links to bad neighborhoods is likely going to cause issues with rankings.

In your analogy, adding a <meta name="robots" content="noindex"> to cause Google to NOT index the content and URL does NOT get rid of the paid, FOLLOWed links which are a violation of Google's Webmaster Guidelines. Those links to the porn sites will remain in Google's link graph for your site, and can still cause you to be penalized.

However... If you have a site about growing roses... and you add a brand new page with paid links like you're suggesting (even to porn sites) and have a <meta name="robots" content="noindex, NOFOLLOW"> element in the <head> of that page from the start of its existence, I doubt very seriously even such a page would negatively affect rankings of other pages on the site and/or Google's view of the site in general. The content of that page would be NOINDEXed so it would not affect your site. All of the paid, porn links on that page would now be NOFOLLOWed, so you would not be violating their guidelines.

Google has said repeatedly they have absolutely no problem with sites selling links... IF... they NOFOLLOW the paid links preventing them from flowing PageRank. And as long as the page had a <meta name="robots" content="nofollow"> or <meta name="robots" content="noindex, nofollow"> element, every link on that page (paid or unpaid) would be considered NOFOLLOW and therefore not cause issues.
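
As a rough illustration of the page-level vs. link-level nofollow point, here is a Python sketch that lists which links on a page would still count as followed. It models only the behavior described in this post, not Google's real pipeline; the class and function names and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Records each link plus whether the page or the link itself says nofollow."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False
        self.links = []  # list of (href, link_has_rel_nofollow)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            tokens = {t.strip().lower() for t in content.split(",")}
            if "nofollow" in tokens:
                self.page_nofollow = True
        elif tag == "a" and attributes.get("href"):
            rel_values = (attributes.get("rel") or "").lower().split()
            self.links.append((attributes["href"], "nofollow" in rel_values))


def followed_links(html):
    """Return the hrefs that are still followed under the model in this post."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    if collector.page_nofollow:
        # A robots meta nofollow covers every link on the page, paid or unpaid.
        return []
    return [href for href, link_nofollow in collector.links if not link_nofollow]


# Hypothetical page using per-link rel="nofollow" on the paid link only.
PER_LINK = (
    '<html><head><meta name="robots" content="noindex"></head><body>'
    '<a href="https://example.com/paid" rel="nofollow">paid</a>'
    '<a href="https://example.com/roses">roses</a></body></html>'
)
# Same links, but the page carries a robots meta nofollow instead.
PAGE_LEVEL = PER_LINK.replace('content="noindex"', 'content="noindex, nofollow"')

print(followed_links(PER_LINK))    # ['https://example.com/roses']
print(followed_links(PAGE_LEVEL))  # []
```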

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved