Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
duplicate content indexing
coburn

10+ Year Member



 
Msg#: 3314096 posted 11:54 pm on Apr 17, 2007 (gmt 0)

Can't figure out how to implement this:
[webmasterworld.com...]

How do you achieve this using the robots.txt file? Right now we're stumped. We can't add a "noindex" instruction for "#" in robots.txt, because the "#" symbol is treated as the start of a comment.

Let me know if you need to see the complete code we're trying to use.

[edited by: encyclo at 1:53 pm (utc) on April 18, 2007]
[edit reason] fixed link [/edit]

 

Achernar

5+ Year Member



 
Msg#: 3314096 posted 9:30 am on Apr 18, 2007 (gmt 0)

What do you mean by "#"? (sorry, no time to read 193 messages over 7 pages ;) )

If it is # as in [webmasterworld.com...], then "#someanchor" is never seen by the webserver, only by the browser, which strips it out before querying the server.
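To make the point above concrete, here is a minimal Python sketch that splits a URL into the part a client sends to the server and the fragment it keeps to itself. The URL and the helper name request_target are made up for illustration; only urlsplit comes from the standard library.

```python
from urllib.parse import urlsplit

def request_target(url):
    """Return (what goes to the server, what stays in the browser)."""
    parts = urlsplit(url)
    # The request line carries the path and, if present, the query string.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The fragment ("#someanchor") is never part of the request.
    return target, parts.fragment

# Hypothetical URL, not one taken from the thread.
sent, kept = request_target("http://www.example.com/page.html#someanchor")
print("sent to server:", sent)
print("kept by browser:", kept)
```

Whatever follows the # ends up in the second value only, which is why it cannot show up in the server's access log.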

coburn

10+ Year Member



 
Msg#: 3314096 posted 2:38 pm on Apr 18, 2007 (gmt 0)

We're trying this:
User-agent: *
Disallow: *#*
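As a rough sketch of why that rule cannot target "#": robots.txt parsers treat everything from the first # to the end of the line as a comment. The two helper functions below are hypothetical and only imitate that comment rule; they are not any crawler's actual parser.

```python
def strip_robots_comment(line):
    """Drop everything from the first '#' onward, then trim whitespace."""
    return line.split("#", 1)[0].strip()

def parse_rule(line):
    """Return (field, value) for a robots.txt line, or None if it has no rule."""
    content = strip_robots_comment(line)
    if ":" not in content:
        return None
    field, value = content.split(":", 1)
    return field.strip().lower(), value.strip()

print(parse_rule("User-agent: *"))
print(parse_rule("Disallow: *#*"))  # the "#*" part is discarded as a comment
```

Under this reading the rule collapses to "Disallow: *". A crawler that supports wildcards may treat that as matching every URL on the site, so the line is risky as well as ineffective.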

Achernar

5+ Year Member



 
Msg#: 3314096 posted 12:07 pm on Apr 19, 2007 (gmt 0)

It won't change anything.
Have you seen URLs with # in your logs?

coburn

10+ Year Member



 
Msg#: 3314096 posted 3:33 pm on Apr 19, 2007 (gmt 0)

Yes, here's an example of a URL that gets created when I click on a link to be scrolled down the page:
www.domain.com/articles/article-name/#view

Do you suggest I change # to something else? If so, what?

Achernar

5+ Year Member



 
Msg#: 3314096 posted 5:40 pm on Apr 19, 2007 (gmt 0)

"Yes, here's an example of a URL that gets created when I click on a link to be scrolled down the page:
www.domain.com/articles/article-name/#view"

It just means that the browser requests www.domain.com/articles/article-name/ from the server, then scrolls the page to the anchor named "view" if it finds one.
The webserver never sees a request containing '#'. If a malformed request does include it, the server looks for a page literally named "#view", finds none, and replies with a 404.

"Do you suggest I change # to something else? If so, what?"

You don't have a choice. # is what browsers use to navigate to anchors.
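One way to check this yourself is to run a throwaway local server and look at the path it receives. The sketch below assumes Python's urllib client as a stand-in for a browser (like a browser, it drops the fragment before sending the request); the handler and function names are invented for the demo.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_paths = []  # request targets exactly as the server received them

class RecordingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        seen_paths.append(self.path)
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def fetch_and_record(path_with_fragment):
    """Request the given path from a local test server; return what it saw."""
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        url = f"http://127.0.0.1:{port}{path_with_fragment}"
        # Empty ProxyHandler: talk to localhost directly, ignoring proxy settings.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            response.read()
    finally:
        server.shutdown()
        server.server_close()
    return seen_paths[-1]

print(fetch_and_record("/articles/article-name/#view"))
```

The server side records only the part before the #, which matches what you would find in a real access log.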

coburn

10+ Year Member



 
Msg#: 3314096 posted 10:11 pm on Apr 19, 2007 (gmt 0)

Tx for the clarification.

I'm concerned that this will cause the search engines to index the non-"#" page and the "#" page separately, and to discount the backlinks to one of them (probably the "#" version), instead of treating them as the same page and adding the backlinks together. That's what the earlier WebmasterWorld URL reference was about: duplicate content.

goodroi

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 3314096 posted 2:20 am on Apr 20, 2007 (gmt 0)

Just spoke with someone at Google, and they told me that using # in the URL is not an issue and will not cause duplicate content.

coburn

10+ Year Member



 
Msg#: 3314096 posted 2:49 am on Apr 20, 2007 (gmt 0)

It pays to know the right people. Tx Greg!

coburn

10+ Year Member



 
Msg#: 3314096 posted 2:53 am on Apr 20, 2007 (gmt 0)

So, the second part of the question: what would happen if people linked to the "/#" version as well as the "/" version of the page? Would Google recognise both as links to "/", or would the link value be diluted?

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3314096 posted 6:34 pm on May 7, 2007 (gmt 0)

Everything from the # onward is stripped out by Google. They don't index it; they see it all as one URL.
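The idea can be illustrated with a short Python sketch that drops the fragment from each link target before counting links per page. This only shows the principle; it is not a description of how Google implements it, and the "http://" scheme and the "#comments" link are assumptions added for the example.

```python
from collections import Counter
from urllib.parse import urldefrag

def count_links_by_page(link_targets):
    """Count inbound links per page after removing any #fragment."""
    return Counter(urldefrag(target).url for target in link_targets)

# Hypothetical inbound links, modelled on the URL quoted earlier in the thread.
links = [
    "http://www.domain.com/articles/article-name/",
    "http://www.domain.com/articles/article-name/#view",
    "http://www.domain.com/articles/article-name/#comments",
]

for page, total in count_links_by_page(links).items():
    print(page, total)
```

All three links land on the same key, so nothing is split between a "#" version and a plain version of the page.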


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved