
Webmaster General Forum

    
url anomalies
numbers, symbols or letters tacked on to the end of urls
dougwilson




msg:4510966
 7:13 pm on Oct 22, 2012 (gmt 0)

Recently I've been getting a few strange anomalies with my own URLs. I'm not sure how to explain it, except that sometimes when I go to one of my pages I see something extra tacked on to the end.

Today I got this: domain/folder/page.html#.UIWXAHYknB4

I might add this is a page with a feed reader on it.

 

phranque




msg:4511167
 8:05 am on Oct 23, 2012 (gmt 0)

everything after the hash mark (#) is considered a fragment identifier.

therefore the url of the document is http://example.com/domain/folder/page.html and .UIWXAHYknB4 refers to (typically) an id attribute of an HTML element in that document.
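As a small illustration of that split, here is a sketch using Python's standard urllib.parse module. The URL is just the example from this thread, and the variable names are made up for the illustration.

```python
from urllib.parse import urldefrag

# Example URL from this thread: everything after "#" is the fragment.
url = "http://example.com/domain/folder/page.html#.UIWXAHYknB4"

# urldefrag() separates the document URL from the fragment identifier.
document_url, fragment = urldefrag(url)

print(document_url)  # the part that identifies the document
print(fragment)      # the part the browser keeps to itself
```

The fragment is handled by the browser and is normally not part of the request line sent to the server.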

lucy24




msg:4511179
 8:36 am on Oct 23, 2012 (gmt 0)

But why would it show up in a request? Some browsers (Safari?) include it in referers, but hash marks have no business on the Internet. Unless they're in java-thingie. Um. The one whose name I always forget that throws /#/ into the middle of each URL just to be difficult.

Or are you talking about pages of yours that you're revisiting? If you went to #.et cetera on the previous visit, your browser may helpfully fill it in again. That would be the mundane explanation.

But #. ? With a . ? Really? Don't like the looks of that in any case. Isn't a leading dot prohibited in anchors of any kind?

phranque




msg:4511215
 9:47 am on Oct 23, 2012 (gmt 0)

"sometimes when I go to one of my pages I see something extra"

"But why would it show up in a request?"

that description is not specific enough to assume the anchor was included in a request.
i assumed this was all happening in the browser in which case it could just be a bug in the markup.

"Isn't a leading dot prohibited in anchors of any kind?"

yup.

http://www.w3.org/TR/html401/types.html#type-name
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

phranque




msg:4511217
 10:11 am on Oct 23, 2012 (gmt 0)

but not a problem for HTML5.

http://www.w3.org/TR/html-markup/global-attributes.html#common.attrs.id
Any string, with the following restrictions:
* must be at least one character long
* must not contain any space characters

NOTE: Previous versions of HTML placed greater restrictions on the content of ID values (for example, they did not permit ID values to begin with a number).



XHTML specifies an XML ID type for id attributes which starts with a "NameStartChar" defined here.
http://www.w3.org/TR/REC-xml/#sec-common-syn
NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]



i did a quick test and although a '.' is not a legal start character for a fragment identifier in XHTML i didn't see any validation complaints and it worked properly as a fragment identifier in the browser.
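To make the difference between the HTML 4.01 and HTML5 rules quoted above concrete, here is a rough Python sketch. The function names are my own (not from any spec or library), and it only models the wording quoted in this thread, not the XML NameStartChar production.

```python
import re

# HTML 4.01: must begin with a letter, then letters, digits,
# hyphens, underscores, colons, and periods.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")

# HTML5 "space characters": space, tab, line feed, form feed, carriage return.
HTML5_SPACE_CHARS = " \t\n\f\r"


def valid_html4_id(value: str) -> bool:
    """Check a value against the HTML 4.01 ID/NAME token rule."""
    return HTML4_ID.fullmatch(value) is not None


def valid_html5_id(value: str) -> bool:
    """Check a value against the HTML5 rule: non-empty, no space characters."""
    return len(value) > 0 and not any(ch in HTML5_SPACE_CHARS for ch in value)


candidate = ".UIWXAHYknB4"
print(valid_html4_id(candidate))  # leading dot fails the HTML 4.01 rule
print(valid_html5_id(candidate))  # but passes the HTML5 rule
```

So the value from the original post would be rejected as an id under HTML 4.01 but accepted under HTML5.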

dougwilson




msg:4511287
 2:04 pm on Oct 23, 2012 (gmt 0)

good answers, I'm reading/learning about fragment identifiers now.

I can say that "#.UIWXAHYknB4" is coming from the AddThis script I recently put in a fixed div at the bottom of the page. Before this, I always served bookmark images from my own server or simply wrote "Bookmark" as text instead of using an image.

So that's the why 'now' answer.

I'm reading here:

jenitennison (dot) com/blog/node/154

Oh, using <!DOCTYPE html>. CSS called with SSI as doctype.txt - just FYI

g1smd




msg:4511418
 7:48 pm on Oct 23, 2012 (gmt 0)

Is this a Joomla site? There are several extensions and add-ons that can cause this type of behaviour.

I guess there might be other types of site where this might occur.

dougwilson




msg:4511433
 8:31 pm on Oct 23, 2012 (gmt 0)

No, html static pages.

Note: Doesn't occur on home page (com$)
only when html$

lucy24




msg:4511485
 12:21 am on Oct 24, 2012 (gmt 0)

Hm, now that's interesting. When you say "home page" does that also include other directory index pages? That is, anything ending in / rather than .html?

So it's going by what the address bar says, rather than by the "real" content. In the case of your home page that's www.example.com/index.html but the last part is the result of a behind-the-scenes rewrite.

Do you have code in place to redirect requests for
www.example.com/index.html
when it's written out like that? Anything different in the browser's address bar? Same for other index.html pages, if any.

dougwilson




msg:4511527
 2:14 am on Oct 24, 2012 (gmt 0)

Public_html/index.html has no / just com

only other index.html are in folders and so end folder/page/

Those show the /#.UIdGAXYknB4 same as .html

"...the result of a behind-the-scenes rewrite.
Do you have code in place to redirect requests for
www.example.com/index.html..."

I don't think I've placed anything in .htaccess. I think it's set up that way on server. Now index.php will show unless rewrite is present but not index.html.

This is the script from add-this. The data-track part is the part that's different from the old script I used.

<div id="addThis">
<!-- AddThis Button BEGIN -->
<div class="addthis_toolbox addthis_default_style addthis_32x32_style">
<a class="addthis_button_preferred_1"></a>
<a class="addthis_button_preferred_2"></a>
<a class="addthis_button_preferred_3"></a>
<a class="addthis_button_reddit"></a>
<a class="addthis_button_compact"></a>
<a class="addthis_counter addthis_bubble_style"></a>
</div>
<script type="text/javascript">var addthis_config = {"data_track_addressbar":true};</script>
<script type="text/javascript" src="http://s7.addthis.com/js/300/addthis_widget.js#pubid=username"></script>
<!-- AddThis Button END -->
</div>


From reading a little I gather this is acting like an app? Needed to get stats at Add-This?

If I change this "{"data_track_addressbar":true}" to false the #.numbers go away in url.

On page code is linked (?) somehow to share button in toolbar. So if someone shares page it has anchor? Identifier? Also, the numbers change with each refresh.

You can tell I know nothing about this and am just putting whatever I notice up for people to look at. I think it's just Add-This's way of knowing what to write. Like any stat script. I've just never seen url's (mine) like this before.

lucy24




msg:4511545
 4:53 am on Oct 24, 2012 (gmt 0)

If I change this "{"data_track_addressbar":true}" to false the #.numbers go away in url.

Ah ha. If you do this, does the information still get tracked the way you want? I can't imagine why it would default to the messy version. It should all be happening quietly and invisibly behind the scenes.




Oh yes and: When I said "index.html" I meant "index.whatever-extension-you-really-use".

"Now index.php will show unless rewrite is present but not index.html"

?
Here's what should happen:

request for
directory/
address bar says directory/ while server secretly serves content from directory/index.whatever-extension-you-use

request for
directory/index.extension-you-really-use
should be redirected to directory/ alone, and then continue as above. With humans, this would all happen automatically.

request for
directory/index.extension-you-don't-use
should be either redirected to directory/ alone, or get a 404 at the gate. Your choice, depending on circumstances. But it should absolutely not serve content from /index.some-other-extension
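The three cases above can be sketched as a small decision function. This is only a model of the logic in Python, not server configuration: the function name, the default index filename, and the choice of 404 (rather than a redirect) for the unused extension are all assumptions made for the illustration.

```python
from typing import Optional, Tuple


def handle_request(path: str, real_index: str = "index.html") -> Tuple[int, Optional[str]]:
    """Return (status, target) for a requested path.

    200 -> target is the file whose content is served (address bar unchanged)
    301 -> target is the URL the visitor is redirected to
    404 -> no target
    """
    directory, _, filename = path.rpartition("/")
    directory += "/"

    if filename == "":
        # Request for directory/: quietly serve the real index file.
        return (200, directory + real_index)
    if filename == real_index:
        # Request for directory/index.<real extension>: redirect to directory/.
        return (301, directory)
    if filename.startswith("index."):
        # Request for an index file with an extension you don't use:
        # here a 404, though a redirect to directory/ is the other option.
        return (404, None)
    # Any other page is served as requested.
    return (200, path)


print(handle_request("/directory/"))
print(handle_request("/directory/index.html"))
print(handle_request("/directory/index.php"))
```

The point of the model is that only one URL per directory ever answers with a 200, and the index filename never shows up in the address bar.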

dougwilson




msg:4511739
 3:16 pm on Oct 24, 2012 (gmt 0)

"does the information still get tracked the way you want?"

Don't know, just now sent question(s) to Add-This support


request for
directory/
If index.html then domain/folder/

I have rule for WP blogs so - If index.php then domain/folder/

The only place index.php pops up is /forum/ (SMF 2.0)

I don't like it (forum), I've just neglected addressing it

I'll let you know what Add-This says

dougwilson




msg:4512466
 11:26 pm on Oct 25, 2012 (gmt 0)

Okay, from preliminary reading the word is:

SEs drop everything from the # on, so .htm#numbers is treated as .htm

# and after for tracking add-this analytics

Bookmarking (Me, my page) includes # and gets tracked

Remove true, add false = no track, no analytics

Don't know what user experience is. Probably most people don't even notice it

And I gotta say the code is fast. Of course mine is <here><body><html> but still - Mundo CDN's? Stats are also good since it tracks shares etc.

Okay thanks for getting me started in the right direction

Any experimenting with my pages and code is welcomed. See how they run. Thanks, Doug

g1smd




msg:4512469
 11:34 pm on Oct 25, 2012 (gmt 0)

"Of course mine is <here><body><html>"

What does this actually mean?

dougwilson




msg:4512520
 3:10 am on Oct 26, 2012 (gmt 0)

Sorry, <script></script></body></html>

Alex_TJ




msg:4515353
 11:50 am on Nov 3, 2012 (gmt 0)

Doug you'll also get Analytics with that option set to false. Your Analytics will show all the usual sharing data.
That hash tag is just used for tracking URL copying and pasting, when a visitor doesn't use the addthis services but just copies the URL from their browser.
I've turned that feature off as we use hashes for another reason. What kind of percentage are you seeing for visitors who share the URL in this way, compared to clicking the usual fb, twitter, etc. icons?

dougwilson




msg:4515404
 5:42 pm on Nov 3, 2012 (gmt 0)

Thanks, that's good to know

"What kind of percentage are you seeing..."

Not sure, where do I look? I see Services By Category > others

But not sure what that means ... I'll look around

Alex_TJ




msg:4515448
 9:22 pm on Nov 3, 2012 (gmt 0)

I don't have it activated so I can't be certain, but it should be something like 'copied and pasted' under the services list.
Just curious if you're seeing a large percentage of your visitors sharing content by copying the URL compared to just using the addthis sharing buttons.

dougwilson




msg:4515613
 3:39 pm on Nov 4, 2012 (gmt 0)

If and when I find out (sharing content by copying the URL compared to just using the AddThis sharing buttons), I'll post here.

I will find out... Thanks for alerting me to the option

dougwilson




msg:4518303
 12:01 pm on Nov 12, 2012 (gmt 0)

From Add-This:

"#AHb4gs1hwck" is a semi-random value which identifies each page view. When a user clicks on an URL like this we'll know that they were the recipient of an address bar share and we'll count a share and a click for your site. This tag contains the time that the page was viewed by the sharer so we can properly attribute the share. If that recipient subsequently shares your page to someone else, we'll be able to measure it separately as a "re-share", taking into account the various generations of your viral sharing.

Sgt_Kickaxe




msg:4519788
 8:43 pm on Nov 16, 2012 (gmt 0)

Embrace canonical and noindex tags. It's not just services like AddThis that alter your URLs; your own CMS is likely doing it too. For example, in WordPress, add a pair of periods to the end of an article URL and the URL still returns the article on most sites. Not good.

dougwilson




msg:4520725
 1:08 pm on Nov 20, 2012 (gmt 0)

"...still returns the article on most sites. Not good."

Why is that?

dougwilson




msg:4520726
 1:08 pm on Nov 20, 2012 (gmt 0)

I should have said, Why is that no good?

phranque




msg:4520760
 2:38 pm on Nov 20, 2012 (gmt 0)

"Why is that no good?"


2 or more URLs returning the same content - only one of them may be canonical.

dougwilson




msg:4520785
 4:05 pm on Nov 20, 2012 (gmt 0)

Does a misspelled URL count as a URL?

I understand duplicate content. I understand what folks say about canonical.

I'd like simple typos or Case O's to resolve if possible.

Doesn't mean "...html++++++++++?Admin/login" gets through.

But if /cOokie(dot)html makes it to /cookie(dot)html I'm okay with it.

phranque




msg:4520810
 5:21 pm on Nov 20, 2012 (gmt 0)

everything after the hostname in a url is case-sensitive.

http://www.example.com/cOokie.html and http://www.example.com/cookie.html are different urls and may both be indexed separately.

if you serve a 200 OK and the same content for a typoed URL - that is non-canonical.
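A quick way to see the case-sensitivity point is to compare URLs piece by piece. This is a minimal Python sketch using the standard urllib.parse module; the helper name same_url is made up for the illustration, and it ignores the fragment since that part is not sent to the server.

```python
from urllib.parse import urlsplit


def same_url(a: str, b: str) -> bool:
    """Compare two URLs: scheme and hostname ignore case, path and query do not."""
    pa, pb = urlsplit(a), urlsplit(b)
    # urlsplit() lowercases the scheme, and .hostname is returned in lower case,
    # while path and query are kept exactly as written.
    return (pa.scheme, pa.hostname, pa.port, pa.path, pa.query) == (
        pb.scheme, pb.hostname, pb.port, pb.path, pb.query
    )


# Hostname case does not matter...
print(same_url("http://WWW.Example.com/cookie.html",
               "http://www.example.com/cookie.html"))
# ...but path case does.
print(same_url("http://www.example.com/cOokie.html",
               "http://www.example.com/cookie.html"))
```

If both of the last two URLs answer with a 200 and the same content, they are two distinct URLs for one page, which is the duplicate situation described above.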

dougwilson




msg:4520876
 8:38 pm on Nov 20, 2012 (gmt 0)

It'll just have to be that way (non-canonical) cause I've got

CheckSpelling On

and every now and then a user makes it through to the page.

(info) if I add a couple of periods to the end of my WP post. Like domain/wp/somepost/.. I get taken to domain/wp/

lucy24




msg:4520912
 11:43 pm on Nov 20, 2012 (gmt 0)

CheckSpelling On

Funny you should say that, because I took a big detour into mod_speling and then discovered I haven't got it installed (I used to but they seem to have removed it) and didn't want to say anything without personal testing.

Apache docs do say that mod_speling uses a redirect, so you don't have the Duplicate Content problem. An even safer option is to say

CheckCaseOnly On

so you don't have to worry about what happens when you've got a sensitive file named imdex.html.

"I get taken to domain/wp/"

This is probably OK so long as your address bar changes. So it's an explicit redirect rather than a rewrite.

dougwilson




msg:4521098
 2:11 pm on Nov 21, 2012 (gmt 0)

I've got:

RewriteEngine On
CheckSpelling On
CheckCaseOnly On
RewriteBase /
ServerSignature Off
Options -Indexes +Includes +ExecCGI +FollowSymlinks
DefaultLanguage en-US
AddDefaultCharset UTF-8

Don't know much but I've gotten better at testing things. No weirdness from the above...

"...they seem to have removed it" NO! Not the They!

"...OK so long as your address bar changes" It does.
