The site I'm advising on uses URLs of the form:
www.example.com/category#subcategory/item
I suspected this might cause issues, as Google will treat anything after the # as an anchor point on (and therefore part of) www.example.com/category.
I found a [webmasterworld.com...] thread from 2006 that confirms this.
Is this still the situation?
If Google (or any other bot) has trouble navigating and understanding your site, it will not index as much of it.
Is there a reason you are not using slashes to divide categories and subcategories?
The initial problem was that the URL wasn't changing when Flash links were clicked. Refreshing the whole page slowed the site down, so SWFAddress has been used to change the URL instead. That is what inserts the # into the path instead of a /.
None of the choices here (Flash, Windows, or the method of execution) are mine. I'm just trying to advise on the best URL format and structure.
Everything I've seen indicates that Google will see any content after a # as a named anchor belonging to the file preceding the #. Is this correct?
IIRC, anything after a "#" doesn't even get sent to the server. Its only use is telling your browser which part of the page to show, so there's no need to send it as part of the request.
I've not tested it, but I definitely read that somewhere when I was trying to do something fiddly with an AJAX application a few years ago.
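A quick way to see the split is Python's standard urllib.parse: urlsplit separates the fragment from the rest of the URL, and a user agent builds the HTTP request line only from the path and query (RFC 3986 defines the fragment as client-side only). The request_target helper below is a hypothetical, simplified model of that rule, not any browser's actual code.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path (plus query, if any) a user agent would put in the
    HTTP request line for `url`. Everything after '#' is dropped."""
    parts = urlsplit(url)
    target = parts.path or "/"  # an empty path is requested as "/"
    if parts.query:
        target += "?" + parts.query
    return target


url = "http://www.example.com/category#subcategory/item"
print(request_target(url))     # /category
print(urlsplit(url).fragment)  # subcategory/item (stays in the browser)
```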
Google suggested using #! to denote an AJAX page state for indexing purposes, but I haven't heard anything about that since the thread here. And if I remember correctly, the W3C states that information after the # symbol should be ignored by robots...
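For reference, under Google's #! proposal (the AJAX crawling scheme, which Google later deprecated) the crawler still did not send the fragment; it rewrote the hash-bang state into an _escaped_fragment_ query parameter that the server could see. The sketch below models that mapping. The helper name is hypothetical, and the set of characters left unescaped (safe="/=") is an assumption, not the scheme's exact escaping table.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escaped_fragment_url(url: str) -> str:
    """Map a #! URL to the ?_escaped_fragment_= URL a crawler would request.
    URLs without '#!' are returned unchanged (the scheme does not apply)."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url
    # Assumption: keep '/' and '=' readable; '&', '#', '%', spaces etc. get percent-encoded.
    state = quote(parts.fragment[1:], safe="/=")
    param = "_escaped_fragment_=" + state
    # Append to an existing query string, or start a new one.
    query = f"{parts.query}&{param}" if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(escaped_fragment_url("http://www.example.com/category#!subcategory/item"))
# http://www.example.com/category?_escaped_fragment_=subcategory/item
```

Note that the site's current URLs use a bare #, not #!, so this mapping would not apply to them as they stand.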
I'll let you do the research on the preceding, or maybe someone else can chime in with the links if they have them handy, but the short of it is:
It sounds like either a bad plan or a really, really bad plan, depending on how much of the content is Flash and how many URLs are affected by the # symbol...
You can't even get most compliant user agents to send the information after the # to the server, and even if you can, the server does not (in most cases I've heard of or tested) handle it as part of the URL, so you can't even redirect those URLs.
I tried to redirect URLs with a # symbol for about an hour and a half one night and couldn't even detect it in the URL using Safari as the browser, and mod_rewrite is one of my 'things'. So I would definitely advise against using # rather than /. I'm not sure how Windows handles them, but the short version again, IMO: it's a BAD idea.
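The same point can be checked end to end with nothing but Python's standard library: a throwaway local server echoes back the request target it received. Here urllib stands in for a browser; like browsers, it drops the fragment before sending, so the server (and therefore any rewrite rule running on it) only ever sees /category. The handler and helper names are hypothetical, for illustration only.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener


class EchoPathHandler(BaseHTTPRequestHandler):
    """Replies with the request target exactly as the server received it."""

    def do_GET(self):
        body = self.path.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def path_seen_by_server(path_and_fragment: str) -> str:
    """Request `path_and_fragment` from a local server and return what it saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        opener = build_opener(ProxyHandler({}))  # bypass any configured proxy
        with opener.open(f"http://127.0.0.1:{port}{path_and_fragment}") as resp:
            return resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


print(path_seen_by_server("/category#subcategory/item"))  # /category
```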
I wondered if the rel=canonical tag would be of use (I know this isn't what it was intended for) and I've set up a few tests using that.
If I understand correctly, it boils down to this:
Google and other UAs expect the # to be a location on a page whose content they have already indexed. They expect no further content that they don't already know of to be associated with the #, so that content will not be indexed.
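A hypothetical, simplified model of why that matters: a crawler that ignores fragments collapses every Flash "page" onto the one document before the #. This illustrates the idea only and is not a description of Google's actual indexing pipeline; the example URLs beyond the one in the opening post are made up.

```python
from urllib.parse import urldefrag


def indexable_urls(crawled):
    """Collapse crawled URLs the way a fragment-ignoring crawler would:
    everything after '#' is dropped, so only the base documents remain."""
    seen = []
    for url in crawled:
        base = urldefrag(url).url  # URL with the fragment removed
        if base not in seen:
            seen.append(base)
    return seen


flash_states = [
    "http://www.example.com/category#subcategory/item",
    "http://www.example.com/category#subcategory/other-item",
    "http://www.example.com/category#another-subcategory",
]
print(indexable_urls(flash_states))  # ['http://www.example.com/category']
```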
Even worse than I thought then!
As for whether the rel=canonical tag would be of use: IMO, not a bit.
I think you should tell the person that you would like to follow web standards on the site: # is a named-anchor indicator that can be used to change an AJAX page state (and possibly a Flash state, via Flash), but using it as a substitute for a / is not one of its uses... Like downhiller80 said, but in other words: why not use some other character? Almost any other character on the keyboard, except maybe a couple, would probably be better. The characters - _ = + ? ~ . , are all options.
If your category and subcategory IDs are numbers, you could even use a letter; I have a site with pages like /100x200x300/ when I want it to show a comparison between those three IDs. I don't recall why I chose not to use a comma, but I know I had a good reason at the time, hmm...
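Where the Flash side can be changed, the suggestion above amounts to emitting real paths. A minimal sketch of both ideas, assuming the fragment already holds the subcategory/item part as in the opening post; to_path_url and parse_id_list are hypothetical names, and the server would still need routes (or rewrite rules) that serve content at the new paths.

```python
from urllib.parse import urlsplit, urlunsplit


def to_path_url(hash_url: str) -> str:
    """Rewrite /category#subcategory/item as /category/subcategory/item,
    so every state gets its own crawlable, server-visible path."""
    parts = urlsplit(hash_url)
    path = parts.path.rstrip("/")
    if parts.fragment:
        path += "/" + parts.fragment.strip("/")
    return urlunsplit((parts.scheme, parts.netloc, path or "/", parts.query, ""))


def parse_id_list(segment: str, sep: str = "x") -> list[int]:
    """Split a path segment such as /100x200x300/ into integer IDs."""
    return [int(part) for part in segment.strip("/").split(sep)]


print(to_path_url("http://www.example.com/category#subcategory/item"))
# http://www.example.com/category/subcategory/item
print(parse_id_list("/100x200x300/"))  # [100, 200, 300]
```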