Forum Moderators: Robert Charlton & goodroi


How # in urls affects page indexing

         

FranticFish

12:19 pm on Jan 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm consulting on a site build where the developer is using # in urls instead of /, for example:

www.example.com/category#subcategory/item

I suspected this might cause issues, as Google will treat anything after the # as an anchor point on www.example.com/category (and therefore as part of that page).
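A quick way to see the concern, sketched with Python's standard urllib.parse (the http:// scheme and the second deep link are assumptions added for illustration): once the fragment is split off, every #-style deep link resolves to the same document URL, and that document URL is the only thing a crawler can actually fetch.

```python
from urllib.parse import urldefrag

# Hypothetical deep links in the style the developer's setup produces.
urls = [
    "http://www.example.com/category#subcategory/item",
    "http://www.example.com/category#other-subcategory/another-item",
    "http://www.example.com/category",
]

for url in urls:
    # urldefrag splits "document#fragment" into its two halves.
    document, fragment = urldefrag(url)
    print(document, "| fragment:", repr(fragment))

# Every variant collapses to one fetchable document.
documents = {urldefrag(u).url for u in urls}
print(documents)
```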

I found [webmasterworld.com...] from 2006 which confirms that.

Is this still the situation?

gn_wendy

4:05 pm on Jan 27, 2010 (gmt 0)

10+ Year Member



Whenever you diverge from "best practices" and accepted "standards", G' and other engines have trouble applying their algorithms.

If G' (or any other bot) has trouble navigating and understanding your site, it will not index as much of it...

Is there a reason you are not using slashes to divide categories and subcategories?

FranticFish

7:13 pm on Jan 27, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The site is on a Windows server in ASP.net and Flash, with a CSS / HTML version 'behind' what humans see.

The initial problem was that the url wasn't changing when Flash links were clicked. Refreshing the whole page slowed the site down, so swfaddress has been used to change the url. That is what is inserting the # into the file path instead of /.

The choices of Flash, Windows and method of execution aren't mine. I'm just trying to advise on the best url format and structure.

Everything I've seen indicates that Google will see any content after a # as a named anchor belonging to the file preceding the #. Is this correct?

downhiller80

4:19 am on Jan 28, 2010 (gmt 0)

10+ Year Member



I'm not 100% on this, but I don't think it'll even work (though presumably the developer has tried it...)

IIRC, anything after a "#" doesn't even get sent to the server. Its only use is to tell your browser which part of the page to show, so there's no need to send it as part of the request.
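A minimal illustration using Python's urllib.request. It is not a browser, but it follows the same rule RFC 3986 describes: the fragment is separated from the URI before the resource is retrieved. Constructing a Request sends nothing over the network; its selector attribute is the request target that would go on the GET line.

```python
from urllib.parse import urlsplit
from urllib.request import Request

url = "http://www.example.com/category#subcategory/item"

# urllib builds the HTTP request target from everything except the fragment.
req = Request(url)
print(req.selector)  # what would appear in "GET <target> HTTP/1.1"

# The fragment is still known on the client side; it is simply never transmitted.
print(urlsplit(url).fragment)
```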

I've not tested it, but I definitely read that somewhere when I was trying to do something fiddly with an ajax application a few years ago.

TheMadScientist

4:48 am on Jan 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What downhiller80 said, with the exception that the information after the # is available to JavaScript (and possibly to Flash, which is not read or indexed by search engines), so you can change the content on the page using the # symbol on an AJAX site. And as downhiller80 said, information after the # symbol is NOT sent to the server, except by Safari and possibly Chrome (I couldn't find the info when I looked, and didn't feel like looking much more than I did at the time).
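For the AJAX side, here is a rough Python stand-in for what client-side code does with the fragment; in a real page this would be JavaScript reading location.hash, or SWFAddress inside Flash. The function name state_from_url and the "/"-separated state format are hypothetical, chosen to match the example URL in this thread.

```python
from typing import List
from urllib.parse import urlsplit


def state_from_url(url: str) -> List[str]:
    """Split a deep link's fragment into hypothetical page-state parts.

    In a browser this value only ever exists on the client; the server
    that rendered the page never receives it.
    """
    fragment = urlsplit(url).fragment
    # Drop empty pieces so "#/a//b/" and "#a/b" give the same state.
    return [part for part in fragment.split("/") if part]


print(state_from_url("http://www.example.com/category#subcategory/item"))
```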

Google suggested using #! to denote an AJAX page state for indexing purposes, but I haven't heard anything about that since the thread here, and if I remember correctly the W3C states that information after the # symbol should be ignored by robots...
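As a sketch of Google's #! proposal: a crawler that supported it would turn the "pretty" #! URL into an "ugly" URL carrying the state in an _escaped_fragment_ query parameter, which the server can see and answer. Google later deprecated the scheme (in 2015). The helper below is hypothetical, and escaping is approximated with urllib.parse.quote rather than the proposal's exact character table. Note that a plain # URL like the developer's, without the !, was never covered.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_escaped_fragment(url: str) -> str:
    """Map a #! URL to the _escaped_fragment_ URL a crawler would request.

    Escaping is approximated with quote(); the original proposal escaped a
    narrower set of characters.
    """
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # plain '#' fragments are not part of the scheme
    state = quote(parts.fragment[1:], safe="/")
    param = "_escaped_fragment_=" + state
    # Append to an existing query string with '&', otherwise start one.
    query = f"{parts.query}&{param}" if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(to_escaped_fragment("http://www.example.com/category#!subcategory/item"))
```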

I'll let you do the research on the preceding or maybe someone else can chime in with the links if they have them handy, but the short of it is:

Sounds like either a bad plan or a really really bad plan, depending on how much content is Flash and how many URLs are affected by the # symbol...

You can't even get most compliant user-agents to send the information after the # to the server, and if you can the server does not handle them (in most cases I've heard of or tested) as part of the URL, so you can't even redirect them.

I tried to redirect URLs with a # symbol for about an hour and a half one night, and could even detect it in the URL using Safari as the browser, and mod_rewrite is one of my 'things', so I would definitely advise against using them rather than a /. I'm not sure how Windows handles them, but the short version again is, IMO: it's a BAD idea.
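To see why a server-side rule has nothing to work with, here is a small Python sketch with the standard http.server module standing in for IIS or Apache (mod_rewrite itself is not shown). The server echoes back the request target it actually received; the REDIRECTS map is a hypothetical rewrite rule keyed on the # form of the URL.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical rewrite rule keyed on the '#' form of the URL.
REDIRECTS = {"/category#subcategory/item": "/category/subcategory/item"}


class EchoPathHandler(BaseHTTPRequestHandler):
    """Replies with the request target exactly as the server received it."""

    def do_GET(self):
        body = self.path.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def path_seen_by_server(path: str) -> str:
    # Port 0 lets the OS pick any free port.
    server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        # An empty ProxyHandler bypasses proxy settings so we hit localhost directly.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}{path}", timeout=5) as resp:
            return resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


seen = path_seen_by_server("/category#subcategory/item")
print(seen)               # the server only ever sees the part before '#'
print(seen in REDIRECTS)  # so the '#'-keyed rule never matches
```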

FranticFish

9:22 am on Jan 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for all the replies.

I wondered if the rel=canonical tag would be of use (I know this isn't what it was intended for) and I've set up a few tests using that.

If I understand correctly, it boils down to this:

Google and other UAs expect the # to be a location on a page whose content they have already indexed. They expect no further content that they don't already know of to be associated with the #, so it will not be indexed.

Even worse than I thought then!

downhiller80

12:10 pm on Jan 28, 2010 (gmt 0)

10+ Year Member



Does he have a good reason for using #? I doubt it. He may have good reason for not wanting to use /, in which case pick something else. "." maybe?

TheMadScientist

5:20 pm on Jan 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"I wondered if the rel=canonical tag would be of use (I know this isn't what it was intended for) and I've set up a few tests using that."

IMO, not a bit.

I think you should tell the person you would like to follow web standards on the site: # is a named-anchor indicator which can be used to change an AJAX page state (and possibly a Flash state, via Flash), but substituting it for a / is not one of its uses... Like downhiller80 said, but in other words: why not use some other character? Almost any other character on the keyboard would probably be better, except maybe a couple... But - _ = + ? ~ . , are all options.

downhiller80

5:35 pm on Jan 28, 2010 (gmt 0)

10+ Year Member



Although "?" is possible, for his sanity I'd advise against that one too, could get pretty unpleasant as it also has a special "job" in a URL.

If your category/subcategories are numbers you could even use a letter - I have a site that has pages like /100x200x300/ when I want it to show a comparison between those three IDs. I don't recall why I chose not to use a comma, but I know I had a good reason at the time, hmm...
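Parsing that kind of path on the server is straightforward, because every character of it reaches the server. A sketch assuming numeric IDs joined by a lowercase x; the pattern and the ids_from_path name are hypothetical.

```python
import re
from typing import List, Optional

# Hypothetical pattern for paths like /100x200x300/ : numeric IDs joined by 'x'.
COMPARE_PATH = re.compile(r"/(\d+(?:x\d+)*)/")


def ids_from_path(path: str) -> Optional[List[int]]:
    """Return the IDs in a comparison path, or None if it doesn't match."""
    match = COMPARE_PATH.fullmatch(path)
    if match is None:
        return None
    return [int(part) for part in match.group(1).split("x")]


print(ids_from_path("/100x200x300/"))
print(ids_from_path("/100,200/"))
```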

TheMadScientist

5:41 pm on Jan 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yeah, I threw it in there because I was thinking that if they wanted a URL portion not treated as part of the file path, it might be useful. Information after a ? is passed to the server but is not treated as part of the file path by the server; instead it is treated as information passed to the location requested. So I was thinking it might function like the # does, except that SEs index and treat information after the ? differently than they do the #... It could actually be the 'most correct' option, now that I'm thinking about it a bit more? Of course, if all the information is in Flash, then it really doesn't matter right now, because it's not going to be found by SEs anyway, so you can use just about whatever you feel like...
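Comparing the three separators with Python's urllib.parse.urlsplit makes the difference concrete (the URLs are hypothetical variants of the one in this thread): with / the extra segments are part of the path, with ? they land in the query string, which is still sent to the server, and with # they end up in the fragment, which is not. The sent_to_server helper is a hypothetical name that models what a standards-following client puts on the request line.

```python
from urllib.parse import urlsplit


def sent_to_server(url: str) -> str:
    """Return the request target a standards-following client would send."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target  # the fragment is deliberately left out


candidates = [
    "http://www.example.com/category/subcategory/item",
    "http://www.example.com/category?subcategory/item",
    "http://www.example.com/category#subcategory/item",
]

for url in candidates:
    parts = urlsplit(url)
    print(url)
    print(f"  path={parts.path!r} query={parts.query!r} "
          f"fragment={parts.fragment!r} sent={sent_to_server(url)!r}")
```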