| 8:13 pm on Mar 17, 2011 (gmt 0)|
Hmmm, I did have the privilege of reviewing the document in question. There is a META Refresh in the process too. Google will typically treat those as 301s. So, there's a bit more to this particular instance than most others would be dealing with.
Disallow: > META Refresh > noindex, nofollow
A bit confusing, yes? Googlebot SHOULD have only gotten the Disallow, which is what happened. Now it shows your standard URI-only entry.
| 8:16 pm on Mar 17, 2011 (gmt 0)|
Although this particular example is an affiliate redirect script and there are no external links to it at all, I just sussed out that the title is made up of two parts: the anchor text of the affiliate link and the title of the referring page (in this case the site name). So that's logical - thanks.
So am I right in thinking, use robots.txt to disallow directories and META NOINDEX tags to make sure specific pages don't get indexed?
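That's the usual split. As a sketch (the directory and page names here are made up for illustration):

```
# robots.txt - keep crawlers out of an entire directory
User-agent: *
Disallow: /affiliate-redirects/
```

```html
<!-- On a specific page that may be crawled but should not be indexed -->
<meta name="robots" content="noindex">
```

Keep in mind the two don't combine on the same URL: if a page is disallowed in robots.txt, the bot never fetches it, so it never sees the meta tag.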
| 8:27 pm on Mar 17, 2011 (gmt 0)|
| 8:47 pm on Mar 17, 2011 (gmt 0)|
Personally I feel noindex is a poor solution. If you have links that you don't want Google to index, why are they on your site? If it's a structural issue - fix it.
| 8:59 pm on Mar 17, 2011 (gmt 0)|
Did anyone who followed the link aakk9999 posted make it to this thread?
Why Google Might "Ignore" a robots.txt Disallow Rule [webmasterworld.com]
Definitely worth the read.
| 9:10 pm on Mar 17, 2011 (gmt 0)|
Good reminder. One big point - if your robots.txt has a User-agent: Googlebot section, then you need to include ALL the rules you expect googlebot to follow in that section. The rules in a User-agent: * section will not have any effect.
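A sketch of the gotcha (the paths are hypothetical):

```
# Googlebot reads ONLY this section and ignores User-agent: *
User-agent: Googlebot
Disallow: /private/
Disallow: /temp/        # must be repeated here, or Googlebot will crawl it

# All other compliant bots use this section
User-agent: *
Disallow: /temp/
```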
| 9:11 pm on Mar 17, 2011 (gmt 0)|
I came across another site with the exact same problem only last month.
The information in that thread is as true now as it was in 2006. :)
There are also definitive answers from both GoogleGuy and Vanessa Fox.
Those were the days!
| 9:22 pm on Mar 17, 2011 (gmt 0)|
|There are also definitive answers from both GoogleGuy and Vanessa Fox. |
Yeah, remember when they used to remember us over here and at least pretended like they cared?
You're right g1smd, those were the days...
| 9:32 pm on Mar 17, 2011 (gmt 0)|
|Did anyone who followed the link aakk9999 posted make it to this thread?
Why Google Might "Ignore" a robots.txt Disallow Rule [webmasterworld.com]
Definitely worth the read. |
Thanks TheMadScientist - all is now clear :-)
| 9:45 pm on Mar 17, 2011 (gmt 0)|
Here's one reason why. We are a manufacturer and an ecommerce site, and we sell 1000s of products and have been doing so since the 90s. Some of those products exist in different sizes, and have different reviews, links etc., and stand on their own as separate URLs. But it is silly to get Google to index all the variations... So we set one of them as a master and have it indexed, and set the variations to noindex. For legacy reasons, the pros outweigh the cons, and noindex gives us a reasonable solution.
| 10:00 pm on Mar 17, 2011 (gmt 0)|
Google (and Bing) have themselves said to use noindex if you intend to add more to the page eventually (suggesting that noindex pages don't count in the SERPs), and to use a 404/410 if the page will not be coming back. It probably has to do with page age or something: if you delete a page, you lose that history. A 410 is a bit faster, and it tells Googlebot the page is gone for good, so there's no need to recheck it.
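For what it's worth, on Apache a 410 can be served with a one-line mod_alias rule (the path here is hypothetical):

```
# .htaccess - answer "410 Gone" for a permanently removed page
Redirect gone /discontinued-widget.html
```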
On my noindex pages all the internal navigation is removed
| 1:11 am on Mar 18, 2011 (gmt 0)|
@walkman, why did you remove internal navigation? The page is noindexed, right - does it matter?
| 1:25 am on Mar 18, 2011 (gmt 0)|
Noindex still means that the links on the page are followed and they will circulate PageRank (unless you use "noindex,nofollow"). For that reason, I usually don't change anything about internal navigation on a noindexed page.
So I'm curious about your thinking here, too, walkman.
| 2:18 am on Mar 18, 2011 (gmt 0)|
@tedster, I saw a success story this morning, they are using 'noindex, follow' instead of just 'noindex'. Which one do you prefer after Panda update?
| 2:30 am on Mar 18, 2011 (gmt 0)|
"Follow" is the default action, so stating it or not stating it makes no difference.
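In meta tag terms, the first two of these are equivalent, and only the third changes anything:

```html
<meta name="robots" content="noindex">           <!-- follow is implied -->
<meta name="robots" content="noindex, follow">   <!-- same thing, spelled out -->
<meta name="robots" content="noindex, nofollow"> <!-- also stops the links being followed -->
```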
| 3:04 am on Mar 18, 2011 (gmt 0)|
|@walkman, why did you remove internal navigation? It is noindex right, does it matter? |
Drains PR, and my pages are not static, in the sense that I have to update them quite often. That takes time I could only dedicate to a smaller site. So I'm not taking any chances - I removed maybe too much, but when I come back I can make it up with the rest. Once that happens I can add the pages back, one by one, after updating them of course. They are not deleted in my CMS, just 'up in the air,' and out of navigation, search and everything until I check a box again.
| 3:06 am on Mar 18, 2011 (gmt 0)|
Thanks, I think I see your point of view, now. There are now no internal links pointing to those URLs either - right?
| 5:17 am on Mar 18, 2011 (gmt 0)|
|On my noindex pages all the internal navigation is removed |
The links to the pages drain PR equally from internal AND external links, but the links on them back to your indexed pages don't drain PR from anywhere, except the outbound links ... I really don't get it?
I can see not linking to them, but what you do by not linking back to yourself from them is increase the PR passed by every outbound link ... The outbound links pass more PR when you remove the internal links ... What am I missing?
| 5:24 am on Mar 18, 2011 (gmt 0)|
Home > category > page
Home > alphabet > page
If I have 100 links on the category pages it is much worse than having 30. How much flows back, I don't know...
| 5:34 am on Mar 18, 2011 (gmt 0)|
You increase the external PR flowed by every link on your site, and decrease the internal PR you keep ... Example:
If page A has 100 links on it and 10 of those are external (90 internal links + 10 external links), when you remove 70 links to your site you are left with 20 internal and 10 external. You inadvertently cause the external links to flow more PR by removing the links to your own pages.
You don't increase the PR of the page those links are on. You change the amount 'awarded' (flowed through) to each link. Yes, the amount each internal link passes goes up, but so does the amount the external links pass. By having more internal links you 'dampen' the amount flowed through the external links. By removing internal links you 'dampen' the value you keep internally overall.
You started off with 90 overall links to your site on the page. You ended up with 20 overall links to your site on the page. There are a constant 10 external links. Who lost link weight by removing the links, you or the external pages?
10 points / 100 links = 0.1 passed by each link, internal and external.
10 points / 30 links = 0.33 passed by each link, internal and external.
90 x 0.1 = 9 points internal.
10 x 0.1 = 1 point external.
20 x 0.33 = 6.6 points internal.
10 x 0.33 = 3.3 points external.
You are taking link weight away from yourself and sending it to the competition.
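The arithmetic above can be sketched in a few lines of Python. This is a toy model - real PageRank is iterative and damped - but the per-link split works the same way:

```python
def pr_split(page_pr, internal_links, external_links):
    """Divide a page's passable PR evenly across all of its links."""
    per_link = page_pr / (internal_links + external_links)
    return internal_links * per_link, external_links * per_link

# 90 internal + 10 external links, 10 points of PR to pass
before = pr_split(10, 90, 10)   # roughly (9.0, 1.0)

# Remove 70 internal links: 20 internal + 10 external remain
after = pr_split(10, 20, 10)    # roughly (6.67, 3.33)

# The external share more than triples while the total passed stays the same
print(before, after)
```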
| 5:47 am on Mar 18, 2011 (gmt 0)|
I understand your point, and maybe as PR keeps going in circles I end up with the same amount. But cleaner pages are worth it.
I have a few external links on each page and they are all on the last page. It's hard to get direct links to my 'product pages' so everything flows from the homepage and 3 other sections.
| 5:52 am on Mar 18, 2011 (gmt 0)|
Got it, so it might be something that works for your specific situation, but I don't think I'd recommend it for everyone ... Just wanted to make sure I wasn't missing something silly, because you really made me think about it for a few minutes. ;)
Of course, now I am starting to wonder how many people get into a situation with a 'loss of rankings' or 'penalty' of some type and do something similar, which could (theoretically) cause them to lower their own rankings more and increase the rankings of others more?
I think it could be more than I would have guessed yesterday ... lol
| 7:59 am on Mar 18, 2011 (gmt 0)|
|You are taking link weight away from yourself and sending it to the competition. |
Only if you are linking TO your competition. And even then, this is only the PR calculation, just part of the total algorithm. Google's algo definitely does something else that helps sites that link out.
Two years ago I tested two sites, launched at the same time. I kept everything as parallel as possible, except that one never linked out and the other linked out at least twice from every article. The one with external links was soon ranking better. Both were direct sales sites (not affiliates).
What Google does in cases like this, I can only guess, so I won't. But it seems clear to me they do something, both from my own testing and from some cryptic comments made by Matt Cutts from time to time on this topic of external linking.
| 10:21 pm on Mar 23, 2011 (gmt 0)|
|In order to even see the noindex meta tag, googlebot must crawl the page. It may crawl less frequently after it verifies the noindex a few times, but it must continue to crawl. |
That's accepted, but can a page get indexed in the SERPs with a noindex tag, regardless of any keywords it contains?
This is very important - it goes to the whole reason the tag exists.
| 10:38 pm on Mar 23, 2011 (gmt 0)|
Another trick for sites with profile pages that are often unpopulated is to place the links behind some logged-in-only code. If the links only show up when logged in, they don't show up for the average user or for search engines. You can safely noindex pages that are for members only.
| 10:49 pm on Mar 23, 2011 (gmt 0)|
|If the links only show up when logged in they don't show up for the average user or search engine. |
| 10:53 pm on Mar 23, 2011 (gmt 0)|
|That's accepted, but can a page get indexed in the SERPs with a noindex tag? |
I've never seen it happen and I've been using it for years now. When someone does claim that they've found noindex documents in the SERPs, there is ALWAYS a reason why. Most of the time it is because the document is disallowed via robots.txt so the noindex is not being seen. I've not seen any other instances that I can remember that weren't due to robots.txt.
noindex does just as it says on the tin, the document WILL NOT appear in the index. You can perform site: searches and you'll see that they will not get returned, no matter how advanced you get with the queries. If you do find noindex documents in the SERPs, then something is technically wrong.
It is one of the few protocols that all the SEs adhere to. It is referred to as the REP (Robots Exclusion Protocol) and there are only three choices...
Anything else is someone's misinterpretation of the protocol.
|Personally I feel noindex is a poor solution. If you have links that you don't want Google to index, why are they on your site? If it's a structural issue - fix it. |
Use of noindex is to prevent documents from appearing in the index. Use of nofollow prevents the links in that document from being followed. I use noindex to take the cruft out of the equation. For example, I don't want upper-level category pages indexed in most instances. I want them to get crawled and the bot to follow the links, but I don't want the document indexed; there are more valuable documents further down the breadcrumb, which is what I want indexed - the money stuff. I like to conserve equity wherever I can. There is no need for the intermediary pages to be on the front line.
| 11:03 pm on Mar 23, 2011 (gmt 0)|
|If you do find noindex documents in the SERPs, then something is technically wrong. |
Thank You for clarifying pageoneresults.
My last all-in-one question.
noindex documents are only meant to keep themselves out of the index; everything they contain can still be crawled, and they can still pass link juice and any other META info.
If my understanding above is correct, the noindex tag is very powerful.
| 11:15 pm on Mar 23, 2011 (gmt 0)|
|noindex documents are only meant to keep themselves out of the index; everything they contain can still be crawled, and they can still pass link juice and any other META info. |
Not sure I fully understand the question.
noindex documents are just that, they are not indexed, they are invisible to anyone searching for them. Unlike robots.txt entries which can easily be found via a site: search.
Google states that they will download and crawl the noindex document but it WILL NOT appear in their index. And yes, noindex documents pass value; you want them to. For example, if you are using noindex on intermediary catalog pages, you want the bot to crawl and give credit for everything that is there. From my perspective, it's like telling the bot "Hey, you can crawl this to maintain the integrity of the linking architecture and the document semantics, but I don't want you to index it."
I think of it this way, if you have a site with 100k documents, there's a good chance that a large percentage of those are intermediary documents that may not be worth indexing. Especially if you are allocated a certain amount of crawl budget and pages indexed. You want to take whatever equity comes your way and direct it to the documents that deserve it the most.
The use of noindex, nofollow adds another level of protection to the document. My experience shows me that noindex, nofollow effectively blocks the page from obtaining and/or passing value; it is removed from the equation. We do this with login pages and other documents that have no value from an indexing perspective but are a must for the user.
| 11:36 pm on Mar 23, 2011 (gmt 0)|
Thank You again pageoneresults.
Now I strongly believe:
noindex is only meant to stop a document from getting indexed in the SERPs.
noindex does add a level of protection to a document, as stated in the example above, but it definitely differs from nofollow (not to be confused).
Apparently... I have observed a ranking site getting de-ranked in the SERPs due to heavy use of the noindex tag on their non-required pages - everything except their main node/thread/content/article/topic pages.
Not available in the SERPs now... ?
| 12:08 am on Mar 24, 2011 (gmt 0)|
|I have observed a ranking site getting de-ranked in the SERPs due to heavy use of the noindex tag on their non-required pages - everything except their main node/thread/content/article/topic pages. |
If you're saying that the use of noindex caused that site to get deranked overall, that would be a concern. I've always been an avid fan of noindex for documents that I don't want users landing on after performing a search. Also, I don't want anyone scraping site: searches and finding my entire site indexed.
I've seen no ill effects of using noindex and/or noindex, nofollow where appropriate. When dealing with larger sites, there are quite a few intermediary documents that I feel don't need to be indexed. If a user landed on one of them, they'd have to click once or twice more to find what they were looking for. My thinking is to just remove that hodgepodge from the equation and provide the bot a direct indexing path to the primary content. All those directory style listings that are paginated get the noindex treatment. It's the click after that which counts. The final destination.
|Not available in SERPs now? |
It's possible that they overcooked things and removed "too much" from the equation, who knows...
| This 67 message thread spans 3 pages |