http://www.webmasterworld.com
Moderators: Receptional Andy & Robert Charlton & lawman & tedster

Google Search News

  
How Google reads URL with "#"
jcmiras


#:766668
 5:08 am on Feb. 1, 2006 (utc 0)

Does anyone have an idea? I'm afraid it can cause duplicate content because, for example, -www.domain.com/page1.htm and -www.domain.com/page1.htm#part2 will basically point to the same webpage.

Would it also cause some sort of URL unfriendliness?

Thanks.

Wizard


#:766669
 7:03 am on Feb. 1, 2006 (utc 0)

I'd say #anchor is not actually part of the URL; the browser doesn't send it to the server with the HTTP request.

Google doesn't treat /page.html#anchor as a different URL from /page.html. It might be possible that keywords after the # mark matter a little, but in Google's links database everything after the # is stripped.

I'm sure about this after using the site: command to check one of my sites, which uses #anchor links extensively. The site is old and completely crawled. Links with # show up in the source of my page in Google's cache, but the site: command shows only the plain URLs.
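
To illustrate the stripping described above, here is a minimal Python sketch of how a crawler might collapse #anchor variants into one page. It only shows the general idea and is not a description of Google's actual code; the `canonical` helper and the `http://` scheme on the example URLs are assumptions added for the illustration.

```python
from urllib.parse import urldefrag

def canonical(url):
    """Return the URL with any #fragment removed (hypothetical helper)."""
    return urldefrag(url).url

# Example links taken from the question, with an assumed http:// scheme.
links = [
    "http://www.domain.com/page1.htm",
    "http://www.domain.com/page1.htm#part2",
]

# Both links collapse to the same page once the fragment is dropped.
unique_pages = sorted({canonical(link) for link in links})
print(unique_pages)
```

With the fragment removed, the two links count as a single page, which matches what the site: command shows.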

Dijkgraaf


#:766670
 8:19 pm on Feb. 1, 2006 (utc 0)

Google does not have a problem with the # in URLs; it correctly ignores them.

Some other badly written bots do however have problems with them and actually send requests with them. But these are usually the ones which are hostile bots (email harvesters, site copiers) that you don't want visiting anyway.

mrMister


#:766671
 8:28 pm on Feb. 1, 2006 (utc 0)

I'd say #anchor is not actually part of the URL

Agreed, the #anchor is not part of the URL, but it is a component of the URI.
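
A small Python sketch of that distinction, under the assumption that a well-behaved client builds its HTTP request from the path and query only. The `request_target` function is a hypothetical helper written for this example, and the `http://` scheme is assumed.

```python
from urllib.parse import urlsplit

def request_target(uri):
    """Build the path (plus query) a client would request; the fragment is left out."""
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target

parts = urlsplit("http://www.domain.com/page1.htm#part2")
print(parts.fragment)  # the fragment is parsed as its own URI component
print(request_target("http://www.domain.com/page1.htm#part2"))
```

The parser exposes the fragment as a separate component, while the request target built from the same URI never contains it.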

g1smd


#:766672
 12:50 am on Feb. 2, 2006 (utc 0)

I linked to a page using ONLY /the.page.html#sectionA and /the.page.html#sectionB and Google indexed the page, and correctly listed it as /the.page.html with no problems at all.

jomaxx


#:766673
 1:00 am on Feb. 2, 2006 (utc 0)

I've never had a problem with Googlebot not understanding this, but Google's AdSense ("Mediapartners") spider sometimes converts the "#" to a hex code and appends that to my page names -- resulting in a couple of dozen 404s every day.

Hard to believe they can't detect and fix this bug, but it's been going on for at least a year, as recently as yesterday.
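
As a possible server-side workaround, here is a hedged Python sketch. It assumes the "hex code" is `%23` (the percent-encoding of "#"), which the post does not confirm. The `strip_encoded_fragment` helper is hypothetical: it returns the page path a server could redirect to instead of answering with a 404.

```python
def strip_encoded_fragment(raw_path):
    """Return the path before an encoded '#' (%23), or None if there is none.

    Assumption: the spider requests paths such as /page1.htm%23part2.
    """
    index = raw_path.find("%23")
    if index == -1:
        return None  # nothing to fix; serve the request normally
    return raw_path[:index] or "/"

print(strip_encoded_fragment("/page1.htm%23part2"))
print(strip_encoded_fragment("/page1.htm"))
```

This would misfire on a site that legitimately uses an encoded "#" inside its paths, so it only fits sites where that never happens.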

g1smd


#:766674
 1:31 am on Feb. 2, 2006 (utc 0)

Send in a report using the feedback forms.

 

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld ® and PubCon ® are a Registered Trademarks of WebmasterWorld Inc.
© WebmasterWorld Inc. / SearchEngineWorld 1996-2008 all rights reserved