Forum Moderators: goodroi

duplicate content indexing

     
11:54 pm on Apr 17, 2007 (gmt 0)

Full Member

10+ Year Member

joined:May 16, 2004
posts:218
votes: 0


Can't figure out how to implement this:
[webmasterworld.com...]

How do you achieve this using the robots.txt file? Right now we're stumped. We can't add a "noindex" instruction for "#" in robots.txt, because the "#" symbol is treated as the start of a comment.

Let me know if you need to see the complete code we're trying to use.

[edited by: encyclo at 1:53 pm (utc) on April 18, 2007]
[edit reason] fixed link [/edit]

9:30 am on Apr 18, 2007 (gmt 0)

Full Member

10+ Year Member

joined:Dec 3, 2006
posts:257
votes: 0


What do you mean by "#"? (sorry, no time to read 193 messages over 7 pages ;) )

If it is # as in [webmasterworld.com...], "#someanchor" is never seen by the web server, only by the browser, which strips it out before querying the server.
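
To illustrate the split, here is a minimal Python sketch using the standard library's `urllib.parse.urldefrag`. The URL and the helper name `split_fragment` are made up for the example; the point is only that the part before "#" is what gets requested, and the part after it stays with the browser.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_fragment(url: str) -> Tuple[str, str]:
    """Return (URL without the fragment, fragment without the '#')."""
    requested, fragment = urldefrag(url)
    return requested, fragment


# Hypothetical URL, for illustration only.
requested, fragment = split_fragment("http://www.example.com/page.html#someanchor")
print(requested)  # http://www.example.com/page.html
print(fragment)   # someanchor
```

The first value is what would appear in the server logs; the second is only used client-side to scroll the page.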

2:38 pm on Apr 18, 2007 (gmt 0)

Full Member

10+ Year Member

joined:May 16, 2004
posts:218
votes: 0


We're trying this:
User-agent: *
Disallow: *#*
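
A rough sketch of why this rule cannot target "#" URLs, assuming the parser treats "#" as the start of a comment. The helper `strip_robots_comment` is hypothetical and is not any crawler's actual code; it just mimics dropping everything from the first "#" onward.

```python
def strip_robots_comment(line: str) -> str:
    """Drop everything from the first '#' onward, then trim whitespace."""
    return line.split("#", 1)[0].strip()


print(strip_robots_comment("Disallow: *#*"))  # Disallow: *
```

Whatever a given crawler does with the remaining rule, the "#" itself is gone before any matching happens.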
12:07 pm on Apr 19, 2007 (gmt 0)

Full Member

10+ Year Member

joined:Dec 3, 2006
posts:257
votes: 0


It won't change anything.
Have you seen URLs with # in your logs?
3:33 pm on Apr 19, 2007 (gmt 0)

Full Member

10+ Year Member

joined:May 16, 2004
posts:218
votes: 0


Yes, here's an example of a URL that gets created when I click on a link to be scrolled down the page:
www.domain.com/articles/article-name/#view

Do you suggest I change # to something else? If so, what?

5:40 pm on Apr 19, 2007 (gmt 0)

Full Member

10+ Year Member

joined:Dec 3, 2006
posts:257
votes: 0


Yes, here's an example of a URL that gets created when I click on a link to be scrolled down the page:
www.domain.com/articles/article-name/#view

It just means that the browser will request www.domain.com/articles/article-name/ from the server, and scroll the page to the anchor named "view", if it finds one.
The web server never sees a request with '#'. And if it does, because of a malformed request, it replies with a 404 (since it has no page named "#view").

Do you suggest I change # to something else? If so, what?

You don't have a choice. # is used to navigate to anchors.
10:11 pm on Apr 19, 2007 (gmt 0)

Full Member

10+ Year Member

joined:May 16, 2004
posts:218
votes: 0


Tx for the clarification.

I'm concerned that this will cause the search engines to index both the non-"#" page and the "#" page separately, and to discount the backlinks to one of them (probably the "#" version), instead of seeing them as the same page and adding the backlinks together. That's what the previous WebmasterWorld URL reference was about: duplicate content.

2:20 am on Apr 20, 2007 (gmt 0)

Administrator from US 

goodroi (WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member)

joined:June 21, 2004
posts:3156
votes: 130


Just spoke with someone at Google, and they told me that using # in the URL is not an issue and will not cause duplicate content.
2:49 am on Apr 20, 2007 (gmt 0)

Full Member

10+ Year Member

joined:May 16, 2004
posts:218
votes: 0


It pays to know the right people. Tx Greg!
2:53 am on Apr 20, 2007 (gmt 0)

Full Member

10+ Year Member

joined:May 16, 2004
posts:218
votes: 0


So the second part of the question - what would happen if folk linked to the "/#" as well as the "/" version of the page? Would G be able to recognise these as both being for "/" or would they be diluted?
6:34 pm on May 7, 2007 (gmt 0)

Senior Member

g1smd (WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member)

joined:July 3, 2002
posts:18903
votes: 0


Everything after, and including the #, is stripped out by Google. They don't index it. They see it all as one URL.
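
As a sketch of that behaviour (not Google's actual implementation, just an illustration in Python), links to the "/" and "/#view" versions collapse into one URL once the fragment is discarded. The `http://` scheme and the helper name `count_links_by_page` are assumptions for the example.

```python
from collections import Counter
from urllib.parse import urldefrag


def count_links_by_page(link_targets):
    """Count links per URL after discarding everything from '#' onward."""
    return Counter(urldefrag(target).url for target in link_targets)


# Two hypothetical backlinks, one to each version of the page.
links = [
    "http://www.domain.com/articles/article-name/",
    "http://www.domain.com/articles/article-name/#view",
]
print(count_links_by_page(links))
```

Both links end up counted against the same key, which is the "adding together" being asked about.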
 
