
Forum Moderators: goodroi

duplicate content indexing

   
11:54 pm on Apr 17, 2007 (gmt 0)

10+ Year Member



Can't figure out how to implement this:
[webmasterworld.com...]

How do you achieve this using the robots.txt file? Right now we're stumped. We can't add a "noindex" instruction for "#" in robots.txt, because "#" is treated as the beginning of a comment.
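To illustrate why this fails, here is a minimal sketch of how a robots.txt reader handles "#". The robots.txt format (RFC 9309, and the original robots exclusion convention before it) treats everything from a "#" to the end of the line as a comment, so it is discarded before the rule is read. The function names below are hypothetical, this is not any particular crawler's parser, and the "Disallow: *#*" input is only an example pattern.

```python
def strip_robots_comment(line: str) -> str:
    """Drop everything from the first '#' onward (robots.txt comment
    syntax), then trim surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def parse_rules(robots_txt: str) -> list[tuple[str, str]]:
    """Return (field, value) pairs for each non-empty line left over
    after comments are removed. Hypothetical helper for illustration."""
    rules = []
    for raw in robots_txt.splitlines():
        line = strip_robots_comment(raw)
        if not line or ":" not in line:
            continue  # blank or comment-only line
        field, value = line.split(":", 1)
        rules.append((field.strip().lower(), value.strip()))
    return rules


example = "User-agent: *\nDisallow: *#*\n"
print(parse_rules(example))
# [('user-agent', '*'), ('disallow', '*')]
```

The value the crawler actually ends up with is just "*": the "#" and everything after it never reach the rule, so there is no way to write a robots.txt pattern that contains a literal "#".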

Let me know if you need to see the complete code we're trying to use.

[edited by: encyclo at 1:53 pm (utc) on April 18, 2007]
[edit reason] fixed link [/edit]

9:30 am on Apr 18, 2007 (gmt 0)

5+ Year Member



What do you mean by "#"? (sorry, no time to read 193 messages over 7 pages ;) )

If you mean # as in [webmasterworld.com...], the "#someanchor" part is never seen by the webserver, only by the browser, which strips it off before sending the request to the server.
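A minimal sketch of that behaviour, using Python's standard urllib.parse to split a URL the way a browser conceptually does before building its HTTP request. The example.com URL is hypothetical, and this models the idea rather than any specific browser's code: the fragment stays on the client side and only the path (plus any query string) goes into the request line.

```python
from urllib.parse import urlsplit


def request_line(url: str) -> str:
    """Build the HTTP/1.1 request line a browser would send for `url`.
    The '#fragment' is kept client-side and never included."""
    parts = urlsplit(url)
    target = parts.path or "/"  # an empty path is requested as "/"
    if parts.query:
        target += "?" + parts.query
    return f"GET {target} HTTP/1.1"


url = "http://www.example.com/page.html#someanchor"
print(urlsplit(url).fragment)  # someanchor
print(request_line(url))       # GET /page.html HTTP/1.1
```

The fragment is still available to the browser (here via urlsplit(url).fragment), which is what lets it scroll to the anchor once the page arrives.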

2:38 pm on Apr 18, 2007 (gmt 0)

10+ Year Member



We're trying this:
User-agent: *
Disallow: *#*

12:07 pm on Apr 19, 2007 (gmt 0)

5+ Year Member



It won't change anything.
Have you seen URLs with # in your logs?

3:33 pm on Apr 19, 2007 (gmt 0)

10+ Year Member



Yes, here's an example of a URL that gets created when I click on a link to be scrolled down the page:
www.domain.com/articles/article-name/#view

Do you suggest I change # to something else? If so, what?

5:40 pm on Apr 19, 2007 (gmt 0)

5+ Year Member



Yes, here's an example of a URL that gets created when I click on a link to be scrolled down the page:
www.domain.com/articles/article-name/#view

It just means that the browser will request www.domain.com/articles/article-name/ from the server, and scroll the page to the anchor named "view", if it finds one.
The webserver never sees a request with '#'. And if it ever did, because of a malformed request, it would normally reply with a 404 (since it has no page named "#view").

Do you suggest I change # to something else? If so, what?

You don't have a choice. # is used to navigate to anchors.

10:11 pm on Apr 19, 2007 (gmt 0)

10+ Year Member



Tx for the clarification.

I'm concerned that this will cause the SEs to index the "#" and non-"#" versions of the page separately, and to discount the backlinks to one of them (probably the "#" version) instead of treating them as the same page and adding the backlinks together. That's what the WebmasterWorld thread I linked earlier was about: duplicate content.

2:20 am on Apr 20, 2007 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Just spoke with someone at Google, and they told me that using # in the URL is not an issue and will not cause duplicate content.

2:49 am on Apr 20, 2007 (gmt 0)

10+ Year Member



It pays to know the right people. Tx Greg!

2:53 am on Apr 20, 2007 (gmt 0)

10+ Year Member



So, the second part of the question: what would happen if folks linked to the "/#" version as well as the "/" version of the page? Would G recognise both as being "/", or would the links be diluted?

6:34 pm on May 7, 2007 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Everything after, and including the #, is stripped out by Google. They don't index it. They see it all as one URL.
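Google hasn't published how it does this, so the following is only a minimal sketch of the idea described above, assuming inbound links are compared as full URL strings once the fragment is dropped. It uses Python's urllib.parse.urldefrag; the consolidate helper and the example.com links are hypothetical.

```python
from collections import Counter
from urllib.parse import urldefrag


def consolidate(link_targets: list[str]) -> Counter:
    """Count inbound links per URL after dropping any '#fragment',
    so '/page/' and '/page/#view' are tallied as the same URL."""
    return Counter(urldefrag(target).url for target in link_targets)


links = [
    "http://www.example.com/articles/article-name/",
    "http://www.example.com/articles/article-name/#view",
    "http://www.example.com/articles/article-name/#comments",
]
print(consolidate(links))
# Counter({'http://www.example.com/articles/article-name/': 3})
```

Under that assumption, links to "/" and "/#view" all land on one entry, so they would add up rather than being split across two URLs.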
 
