Forum Moderators: goodroi

duplicate content indexing

     
11:54 pm on Apr 17, 2007 (gmt 0)

Full Member

10+ Year Member

joined:May 16, 2004
posts:218
votes: 0


Can't figure out how to implement this:
[webmasterworld.com...]

How do you achieve this using the robots.txt file? Right now we're stumped. We can't add a "noindex" instruction for "#" in robots.txt, because the "#" symbol is treated as the start of a comment.

Let me know if you need to see the complete code we're trying to use.

[edited by: encyclo at 1:53 pm (utc) on April 18, 2007]
[edit reason] fixed link [/edit]

9:30 am on Apr 18, 2007 (gmt 0)

Full Member

5+ Year Member

joined:Dec 3, 2006
posts:257
votes: 0


What do you mean by "#"? (sorry, no time to read 193 messages over 7 pages ;) )

If it is # as in [webmasterworld.com...], "#someanchor" is never seen by the web server, only by the browser, which strips it off before querying the server.
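
To illustrate the point above, here is a minimal Python sketch of what ends up in the request. The URL and the helper name request_target are made up for the example, and real browsers do more than this; it only shows that the part after # is not included in what gets sent to the server.

```python
from urllib.parse import urlsplit

def request_target(url: str) -> str:
    """Return the path (plus query, if any) that would be requested.

    Hypothetical helper: the fragment (the part after '#') is left out,
    because it is handled by the browser and not sent to the server.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target

# Made-up URL for illustration only.
print(request_target("http://www.example.com/page.html#someanchor"))
# prints: /page.html
```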

2:38 pm on Apr 18, 2007 (gmt 0)

Full Member

10+ Year Member

joined:May 16, 2004
posts:218
votes: 0


We're trying this:
User-agent: *
Disallow: *#*
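
A rough Python sketch of why that rule can't work, assuming the parser treats # as the start of a comment (as mentioned in the first post). The function name strip_robots_comment is made up for illustration and is not taken from any particular crawler.

```python
def strip_robots_comment(line: str) -> str:
    """Drop everything from the first '#' onward, then trim whitespace.

    Assumption: this mimics how a robots.txt parser discards comments
    before it interprets a line.
    """
    return line.split("#", 1)[0].strip()

# The rule from the post, after comment stripping.
print(strip_robots_comment("Disallow: *#*"))
# prints: Disallow: *
```

So under that assumption, what is left of the line is "Disallow: *", which is not the rule that was intended.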
12:07 pm on Apr 19, 2007 (gmt 0)

Full Member

5+ Year Member

joined:Dec 3, 2006
posts:257
votes: 0


It won't change anything.
Have you seen URLs with # in your logs?
3:33 pm on Apr 19, 2007 (gmt 0)

Full Member

10+ Year Member

joined:May 16, 2004
posts:218
votes: 0


Yes, here's an example of a URL that gets created when I click on a link to be scrolled down the page:
www.domain.com/articles/article-name/#view

Do you suggest I change # to something else? If so, what?

5:40 pm on Apr 19, 2007 (gmt 0)

Full Member

5+ Year Member

joined:Dec 3, 2006
posts:257
votes: 0


Yes, here's an example of a URL that gets created when I click on a link to be scrolled down the page:
www.domain.com/articles/article-name/#view

It just means that the browser will request www.domain.com/articles/article-name/ from the server, and then scroll the page to the anchor named "view", if it finds one.
The web server never sees a request with '#'. And if it does, because of a malformed request, it replies with a 404 (since it has no page named "#view").

Do you suggest I change # to something else? If so, what?

You don't have a choice. # is used to navigate to anchors.
10:11 pm on Apr 19, 2007 (gmt 0)

Full Member

10+ Year Member

joined:May 16, 2004
posts:218
votes: 0


Tx for the clarification.

I'm concerned that this will cause the search engines to index the non-"#" page and the "#" page separately, and to discount the backlinks to one of them (probably the "#" version), instead of seeing them as the same page and adding the backlinks together. That's what the earlier WebmasterWorld URL reference was about: duplicate content.

2:20 am on Apr 20, 2007 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi

joined:June 21, 2004
posts:3124
votes: 113


Just spoke with someone at Google, and they told me that using # in the URL is not an issue and will not cause duplicate content.
2:49 am on Apr 20, 2007 (gmt 0)

Full Member

10+ Year Member

joined:May 16, 2004
posts:218
votes: 0


It pays to know the right people. Tx Greg!
2:53 am on Apr 20, 2007 (gmt 0)

Full Member

10+ Year Member

joined:May 16, 2004
posts:218
votes: 0


So, the second part of the question: what would happen if people linked to the "/#" version as well as the "/" version of the page? Would Google recognise both as links to "/", or would they be diluted?
6:34 pm on May 7, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd

joined:July 3, 2002
posts:18903
votes: 0


Everything after, and including the #, is stripped out by Google. They don't index it. They see it all as one URL.
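
As a small Python sketch of that idea (not Google's actual code, just the same kind of normalisation using the standard library), dropping the fragment makes the two forms of the example URL from this thread compare equal. The helper name without_fragment is made up for illustration.

```python
from urllib.parse import urldefrag

def without_fragment(url: str) -> str:
    """Return the URL with the '#' and everything after it removed."""
    return urldefrag(url).url

with_anchor = without_fragment("http://www.domain.com/articles/article-name/#view")
plain = without_fragment("http://www.domain.com/articles/article-name/")

print(with_anchor)           # http://www.domain.com/articles/article-name/
print(with_anchor == plain)  # True
```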