
Forum Moderators: Robert Charlton & goodroi


Duplicate pages and noindex

     
12:28 am on Jun 26, 2005 (gmt 0)

New User

10+ Year Member

joined:Aug 10, 2002
posts:33
votes: 0


On a website, I have 2 links, and clicking either one displays the same page. If I put noindex on one of them, will Google still index the page?
10:53 pm on June 26, 2005 (gmt 0)

New User

10+ Year Member

joined:Aug 10, 2002
posts:33
votes: 0


Hmm, no answer? I'm sure some people have had this problem before. Here's an example:
An article is written by 2 authors, so it may have two URLs (and therefore 2 links), like:

abc.com?id=123&author=au1
abc.com?id=123&author=au2

It will display the same page under different URLs. That's the problem!

This can also happen when a product belongs to 2 different categories! I'm sure some people here have had this before.

1:26 am on June 27, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 1, 2004
posts:1987
votes: 0


What... no helpful people here? They are all hungover from N.O.

I don't know about putting noindex on a 'click', but if you put the noindex meta tag on one of the dup pages, that would do it, IMHO.

1:30 am on June 27, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


<a href="oneofthelinks" rel="nofollow">adopted by G,Y,M</a>

nofollow seems more for external links, though.
An alternative would be robots.txt:

user-agent: *
disallow: one of the URLs

8:03 am on June 27, 2005 (gmt 0)

Full Member

joined:Jan 12, 2004
posts:334
votes: 0


Don't forget:

User-agent: *
Disallow: /one of the URLs

...the slash before the path.

I don't know if robots.txt paths are case-sensitive or not. If they are, what's above is correct.

Reid, do you know if this definitely works?
<a href="oneofthelinks" rel="nofollow">adopted by G,Y,M</a>

Joe, I don't think the noindex tag works for G. I was about to post something on this. I have the 'noindex, nofollow' tag on one of my links pages for the G-bot, and it still visits the page! Apparently it doesn't obey it.

10:28 am on June 27, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 18, 2002
posts:154
votes: 0


The NOINDEX value of the robots META tag will not prevent crawling. A crawler must fetch the page to parse its META tags. NOINDEX means 'do not deliver this page in the SERPs'.

As for the initial question: I'd go for NOINDEX,FOLLOW on one page and INDEX,FOLLOW (= ALL = the default value = obsolete) on the other.

12:03 pm on June 27, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 1, 2004
posts:1987
votes: 0


The noindex meta seems to work for me. I just checked, and no pages with noindex show up in my site: command results.

But what do I know - nil.

1:50 pm on June 27, 2005 (gmt 0)

Full Member

joined:Jan 12, 2004
posts:334
votes: 0


>>>>Joe, I don't think the noindex tag works for G. I was about to post something on this. I have the 'noindex, nofollow' tag on one of my links pages for the G-bot, and it still visits the page! Apparently it doesn't obey it. <<<<<<<

I found out (DUHHHHH) that the bot has to visit the page first in order to see the meta tag! So I assume that after it crawls the <head>, it sees the "noindex" or whatever and obeys it.

3:53 pm on June 27, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


Reid, do you know if this definitely works?
<a href="oneofthelinks" rel="nofollow">adopted by G,Y,M</a>

yes - the bot will disregard a link with the nofollow attribute, but it is designed for external links. Not sure how it would affect ranking if you start using it on internal links, though.

we are talking about 2 links to the same page from one page right?
I would just use robots.txt

user-agent: *
disallow: /?id=123&author=au2

BTW, this is a perfect example of where a Google sitemap could prove very useful.
The problem here is that one page has 2 URLs, so Google could index the same page under both URLs and cause dupe content. A Google sitemap could straighten out this mess by submitting only one of the 2 URLs in the sitemap.xml file. Doesn't do much for Yahoo and MSN, though.
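To make that concrete, a minimal sitemap.xml that submits only one of the two example URLs from this thread could look like this (this uses the sitemaps.org schema; the domain and parameters are just the thread's placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- only one of the two duplicate URLs is submitted -->
  <url>
    <loc>http://abc.com/?id=123&amp;author=au1</loc>
  </url>
</urlset>
```

Note the &amp; entity: the URL in a sitemap must be XML-escaped.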

11:28 pm on June 27, 2005 (gmt 0)

New User

10+ Year Member

joined:Aug 10, 2002
posts:33
votes: 0



>>we are talking about 2 links to the same page from one page right?

Yes. And therefore 2 different links to TWO different URLs for the same page! The main question is whether Google still indexes such a page after 'noindex' is placed properly.

5:28 am on June 28, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


If you put a noindex META tag on the page, then neither URL would get indexed. That's why I was saying to disallow one of the URLs in robots.txt, so the page would be indexed under the other URL.

This is basically how Google indexes:

1. googlebot finds the URL from a link on one of your pages (or from another website) and adds a URL-only listing to the index
2. another googlebot crawls the URL-only listing and adds the title, description, cache etc.

If you do nothing, it will list both URLs as 'pages' and crawl each URL separately, causing a duplicate listing.

If you put a META noindex on the page, it will attempt to crawl both URLs but leave them both URL-only. It will treat the 2 URLs as 2 separate 'pages' with a noindex meta tag on each of them.

Block one URL in robots.txt, and it will list the page under the other URL and remove the disallowed URL from the index. If someone links to the disallowed URL, it may appear in the index as URL-only, but it won't get crawled, and it may even get removed again, because it is disallowed in robots.txt. It can keep cycling through getting listed (from the other site's link) and getting removed (by robots.txt).

5:51 am on June 28, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


the difference between robots.txt and META robots directives:

robots.txt disallows the URL before it is even requested (do not request this URL)

the robots META tag is in the header, read after the URL has been requested. It tells robots NOINDEX (do not add this page to the index - leave it URL-only in Google) and NOFOLLOW (do not follow the links on this page)

Google is funny that way: 'knowing about' a URL does not make it 'indexed'.
I'm not sure whether they 'know about' 8 million URLs or actually have 8 million URLs 'indexed'.
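For reference, the robots META tag being discussed goes in the page's <head>; a minimal example (the title and body here are placeholders):

```html
<!DOCTYPE html>
<html>
<head>
  <title>Article 123</title>
  <!-- compliant robots: do not index this page and do not follow its links -->
  <meta name="robots" content="noindex,nofollow">
</head>
<body>
  ...article content...
</body>
</html>
```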

6:51 am on June 28, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:July 18, 2002
posts:154
votes: 0


Since the page is dynamic, I'd put in a NOINDEX meta tag for one version, and no robots tag for the other version, depending on the input parameters. This works with every crawler.

If you search for "steering and supporting search engine crawling" you'll find more info on robots.txt, robots meta tags, rel=nofollow ... in my tutorial.
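That conditional approach can be sketched like this - a hypothetical Python helper, where the parameter names (id, author) and the choice of au1 as the 'canonical' author are assumptions based on the example URLs earlier in the thread:

```python
from urllib.parse import parse_qs

# Assumption: author au1 is the "canonical" version that should be indexed.
CANONICAL_AUTHOR = "au1"

def robots_meta(query_string):
    """Return the tag to write into <head>, or '' for the canonical URL."""
    params = parse_qs(query_string)
    author = params.get("author", [CANONICAL_AUTHOR])[0]
    if author != CANONICAL_AUTHOR:
        # duplicate URL: block indexing but still let the bot follow links
        return '<meta name="robots" content="noindex,follow">'
    return ""  # index,follow is the default, so no tag is needed
```

The canonical version gets no robots tag at all, so it keeps the default index,follow behavior.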

2:30 am on June 29, 2005 (gmt 0)

New User

10+ Year Member

joined:Aug 10, 2002
posts:33
votes: 0


Thank you for the information. So to sum it up, here's what I will do:

- I will NOT use noindex on any dynamic page.
- Because I have many different pages that are all of this type, I'll create a folder called xyz that will contain all the URLs that I don't want search engines to index; so in robots.txt, I will add this line:

disallow: /xyz/

Am I safe now?
Thank you,
Best regards,
John
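John's /xyz/ rule can be sanity-checked with Python's standard-library robots.txt parser (abc.com and the paths are placeholders; whether a given crawler actually obeys the rule is of course up to the crawler):

```python
from urllib import robotparser

# Parse the proposed robots.txt rule.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /xyz/",
])

# Anything under /xyz/ is disallowed; everything else stays fetchable.
print(rp.can_fetch("Googlebot", "http://abc.com/xyz/article123"))  # False
print(rp.can_fetch("Googlebot", "http://abc.com/article123"))      # True
```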

1:43 pm on July 1, 2005 (gmt 0)

Full Member

joined:Jan 12, 2004
posts:334
votes: 0


Reid, do you know if this definitely works?
<a href="oneofthelinks" rel="nofollow">adopted by G,Y,M</a>

yes - the bot will disregard a link with the nofollow attribute but it is designed for external links - not sure how it would affect ranking if you start using it on internal links though.

Thanks Reid. Reason I was asking is to maybe do this with some of the sites to which I link that G may not like.

I'm confused by this:

>>we are talking about 2 links to the same page from one page right? <<<

Yes. And therefore 2 different links to TWO different URLs for the same page! The main question is whether Google still indexes such a page after 'noindex' is placed properly.

What's wrong with one page having more than one link to the same page? This is with INTERNAL links. Many sites, if not most, have a few links on one page that point to the same other page on their site. For example, on a product page you may have a hyperlinked "back to whatever.com home page" at the top, and you may also have it at the bottom of the product's page.

1:45 pm on July 1, 2005 (gmt 0)

Full Member

joined:Jan 12, 2004
posts:334
votes: 0


if you put a noindex Meta tag on the page then neither URL would get indexed. That's why I was saying to disallow one of the URL's in robots.txt and the page would be indexed under the other URL.

How can putting a robots meta tag on one page affect some other page? Please explain.
Thanks.

11:04 am on July 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


This is the topic right?

abc.com?id=123&author=au1
abc.com?id=123&author=au2

It will display same page but different urls. That's the problem!

so let's call the page article123.
Either URL above will serve up this page, because article 123 was written by both author 1 and author 2.

There is a concern that Google will list the same article (123) under both URLs - that's 2 pages with the exact same content.
Now if that content has a robots META tag (noindex) in the header, then neither URL would get indexed, because both URLs point to the same content containing the robots tag.
But if there were no robots tag, then both URLs would be indexed.
We only want one of the URLs indexed (because there's only one page), so if we disallow one of the 2 URLs in robots.txt and put no robots tag on the page, then it will be indexed: one page, one URL.
The URL that's not disallowed will be indexed; the URL that is disallowed will not be.
So: one page, 2 URLs - one indexed and one disallowed.

11:32 am on July 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


the other solution would be to write a robots meta (noindex) into the script.
when id=123&author=au1 is called, a noindex gets written into the header
when id=123&author=au2 is called, it doesn't get written in.

this would be a better solution, because if any bot does find the (noindex) URL, it won't index it.

disallowing in robots.txt doesn't keep bots from finding it through an inbound link, but should it get indexed, robots.txt should cause it to be removed again (get rid of the IBL, of course).

John's solution also looks good, folder /xyz/

3:16 pm on July 2, 2005 (gmt 0)

Preferred Member from US 

joined:June 2, 2003
posts:376
votes: 0


...