How Google reads URL with "#"

     
5:08 am on Feb 1, 2006 (gmt 0)

10+ Year Member



Does anyone have an idea? I'm afraid it can cause duplicate content because, for example, www.domain.com/page1.htm and www.domain.com/page1.htm#part2 basically point to the same webpage.

Would it also cause some sort of URL unfriendliness?

Thanks.

7:03 am on Feb 1, 2006 (gmt 0)

10+ Year Member



I'd say the #anchor is not actually part of the URL; the browser doesn't send it to the server with the HTTP request.

Google doesn't treat /page.html#anchor as a different URL from /page.html. It's possible that keywords after the # mark matter a little, but in Google's link database everything after the # is stripped.

I'm sure about this after using the site: command to check one of my sites, which uses #anchor links extensively. The site is old and completely crawled. Links with # show up in the source of my page in Google's cache, but the site: command shows only the plain URLs.
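To illustrate the stripping described above, here is a minimal Python sketch of how a crawler might collapse fragment links into one page URL. It only shows the general idea using the standard library's urldefrag; it is not Google's actual code, and the helper name canonical and the #part1 link are assumptions made up for the example.

```python
from urllib.parse import urldefrag


def canonical(url: str) -> str:
    """Return the URL with any #fragment removed (hypothetical helper)."""
    return urldefrag(url).url


# Example links; "#part1" is an illustrative assumption.
links = [
    "http://www.domain.com/page1.htm",
    "http://www.domain.com/page1.htm#part1",
    "http://www.domain.com/page1.htm#part2",
]

# All three links collapse to a single page once the fragment is dropped.
unique_pages = {canonical(link) for link in links}
print(unique_pages)
```

A crawler that normalizes links this way would only ever see one page, so the # links by themselves would not create duplicates.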

8:19 pm on Feb 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google does not have a problem with the # in URLs; it correctly ignores it.

Some badly written bots, however, do have problems with it and actually send requests with the # part included. But these are usually hostile bots (email harvesters, site copiers) that you don't want visiting anyway.

8:28 pm on Feb 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd say #anchor is not a part of URL actually

Agreed, the #anchor is not part of the URL, but it is a component of the URI.

12:50 am on Feb 2, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I linked to a page using ONLY /the.page.html#sectionA and /the.page.html#sectionB and Google indexed the page, and correctly listed it as /the.page.html with no problems at all.

1:00 am on Feb 2, 2006 (gmt 0)

WebmasterWorld Senior Member jomaxx is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I've never had a problem with Googlebot not understanding this, but Google's AdSense ("Mediapartners") spider sometimes converts the "#" to a hex code and appends that to my page names, resulting in a couple of dozen 404s every day.

It's hard to believe they can't detect and fix this bug, but it has been going on for at least a year, and I saw it as recently as yesterday.
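For anyone wondering why an encoded "#" ends up as a 404: the percent-encoded form of "#" is %23, and once it is encoded it is no longer a fragment separator but part of the path the server is asked for. Below is a rough Python sketch, assuming %23 is the hex code in question. The repair_path helper is hypothetical, just one way a site could map such requests back to the real page.

```python
from urllib.parse import quote, urlsplit

# Percent-encoding "#" gives the hex code that ends up in the page name.
encoded_hash = quote("#")

good = urlsplit("http://www.domain.com/page1.htm#part2")
bad = urlsplit("http://www.domain.com/page1.htm%23part2")

# good: the fragment is separate, so the path is just the page.
# bad: no fragment at all; "%23part2" is part of the requested path.
print(encoded_hash, good.path, good.fragment, bad.path, bad.fragment)


def repair_path(path: str) -> str:
    """Hypothetical helper: cut an encoded fragment ("%23...") off a request path."""
    head, _, _ = path.partition("%23")
    return head


print(repair_path(bad.path))
```

A server-side rewrite rule doing the same thing as repair_path could redirect these requests to the real page instead of returning a 404.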

1:31 am on Feb 2, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Send in a report using the feedback forms.
 
