How Google reads URL with "#"

     
5:08 am on Feb 1, 2006 (gmt 0)

Full Member

10+ Year Member

joined:May 4, 2005
posts:306
votes: 0


Does anyone have an idea? I'm afraid it can cause duplicate content because, for example, -www.domain.com/page1.htm and -www.domain.com/page1.htm#part2 will basically point to the same web page.

Would it also cause some sort of URL unfriendliness?

Thanks.

7:03 am on Feb 1, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Feb 15, 2005
posts:380
votes: 0


I'd say #anchor is not a part of URL actually; the browser doesn't send it to the server with the HTTP request.

Google doesn't treat /page.html#anchor as a different URL from /page.html. Keywords after the # mark might matter a little, but in Google's link database everything after the # is stripped.

I'm sure about this after using the site: command to check one of my sites that uses #anchor links extensively. The site is old and completely crawled. Links with # show up in the source of my pages in the Google cache, but the site: command shows only the plain URLs.
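
To illustrate the stripping described above, here is a minimal sketch in Python. It is not Google's code, just the standard library's fragment removal applied to the example URLs from the first post. The http:// scheme and the helper name canonical_url are assumptions added for illustration.

```python
from urllib.parse import urldefrag


def canonical_url(url: str) -> str:
    """Return the URL with any #fragment removed (hypothetical helper)."""
    stripped, _fragment = urldefrag(url)
    return stripped


# Example links from the thread; the http:// scheme is assumed.
links = [
    "http://www.domain.com/page1.htm",
    "http://www.domain.com/page1.htm#part2",
]

# A crawler that drops fragments ends up with a single page to index.
unique_pages = {canonical_url(link) for link in links}
print(unique_pages)
```

Both links collapse to the same entry, which is consistent with the site: results described above showing only the plain URL.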

8:19 pm on Feb 1, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 31, 2005
posts:1108
votes: 0


Google does not have a problem with the # in URLs; it correctly ignores them.

However, some badly written bots do have problems with them and actually send requests that include them. These are usually hostile bots (email harvesters, site copiers) that you don't want visiting anyway.
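
If you want to spot such bots in your logs, a rough sketch like the following could work. It assumes each log entry exposes the raw request line (method, target, protocol version); the function name target_has_fragment is made up for this example.

```python
def target_has_fragment(request_line: str) -> bool:
    """Return True if the request target contains a literal '#'.

    A well-behaved client strips the fragment before sending the
    request, so a '#' here suggests a badly written bot.
    """
    parts = request_line.split()
    if len(parts) != 3:
        # Not a "METHOD target VERSION" line; do not guess.
        return False
    _method, target, _version = parts
    return "#" in target


print(target_has_fragment("GET /page.html#anchor HTTP/1.1"))
print(target_has_fragment("GET /page.html HTTP/1.1"))
```

This only flags requests. What you do with the offending visitors is up to you.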

8:28 pm on Feb 1, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 24, 2005
posts:965
votes: 0


I'd say #anchor is not a part of URL actually

Agreed, the #anchor is not part of the URL, but it is a component of the URI.

12:50 am on Feb 2, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


I linked to a page using ONLY /the.page.html#sectionA and /the.page.html#sectionB and Google indexed the page, and correctly listed it as /the.page.html with no problems at all.

1:00 am on Feb 2, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jomaxx is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 6, 2002
posts:4768
votes: 0


I've never had a problem with Googlebot not understanding this, but Google's AdSense ("Mediapartners") spider sometimes converts the "#" to a hex code and appends that to my page names -- resulting in a couple of dozen 404s every day.

Hard to believe they can't detect and fix this bug, but it's been going on for at least a year, and it happened as recently as yesterday.
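
A small Python sketch of why that turns into a 404, assuming the hex code is the percent-encoded form %23 (0x23 is the character code for "#"). The URLs reuse the example domain from the first post, and recover_page_path is a hypothetical helper, not anything Google or a web server provides.

```python
from urllib.parse import urlsplit

# A literal '#' starts the fragment, which never reaches the server.
good = urlsplit("http://www.domain.com/page1.htm#part2")
# An encoded '#' (%23) is just more path, so the server is asked
# for a page name that does not exist.
bad = urlsplit("http://www.domain.com/page1.htm%23part2")

print(good.path, repr(good.fragment))
print(bad.path, repr(bad.fragment))


def recover_page_path(requested_path: str) -> str:
    """Cut the path at the first '%23' to get the real page name back."""
    head, _sep, _rest = requested_path.partition("%23")
    return head


print(recover_page_path(bad.path))
```

The last function shows one way the intended page name could be recovered from such a request, for example before issuing a redirect. Whether that is worth doing for a spider bug is a judgment call.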

1:31 am on Feb 2, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Send in a report using the feedback forms.