
Google SEO News and Discussion Forum

    
Link with a simple unwanted trailing '/'
Can do so much damage..!
Web_Savvy




msg:3197566
 8:59 pm on Dec 22, 2006 (gmt 0)

Some dynamic sections of our site are structured like this:

www.ourdomain.com/info.php

This URL produces a page that has some content and some internal links. Some of the internal links
are intended to be as follows:

www.ourdomain.com/link-1.ext (.htm/.php - whatever)
www.ourdomain.com/link-2.ext
www.ourdomain.com/link-3.ext
......

While looking closely through the logs recently, I found that Googlebot was requesting
(and crawling, with a 200 response) URLs like this:

www.ourdomain.com/info.php/link-1.ext
www.ourdomain.com/info.php/link-2.ext
www.ourdomain.com/info.php/link-3.ext

This caught my eye, because we're certainly not publishing/using such URLs anywhere on the site.
Heck, I didn't even know what output such URLs would produce so I tried a few in the browser.

To my horror, I found that they ALL produced a (more or less) duplicate copy of the page at:

www.ourdomain.com/info.php

I'm sure that, if left unchecked, this would get a site into a horrible 'duplicate content' mess.
(Well, actually, this site of ours is already in this mess, for different reasons, but that's
beside the point here.)

I was quite sure that Googlebot was not finding such URLs via our internal links, so I
investigated further and found that one webmaster had recently 'kindly' linked to our URL
this way:

www.ourdomain.com/info.php/

I found that this URL (with the extra trailing /) produced our info.php page, with its relative
links resolving to malformed URLs in the format:

www.ourdomain.com/info.php/link-1.ext etc.

Done, I think: the root cause of the current problem is found.
But just think about it: one little unwanted trailing / in an external inbound link has the
potential to cause a major disaster!

Now how to 'fix' this? I think the 'fix' would have to be (at least) a three-part fix:

1. Strengthen our scripts to do a strict validation of all arguments, to look for such unrequired
parameters and to deal with them in a consistent manner.
(I think this should be a good standard practice for all webmasters/developers, whether they've landed
in trouble with Google or not ;-) I know that this advice has been freely and frequently given out
here before, but there's nothing like 'self-discovery' to make one a true believer ;-))

2. Make all the internal links absolute, always - or use a base href (the <base> element in the page head) on all pages?

3. Request the other webmaster to 'correct' the link.

I don't quite know yet if .htaccess can also be deployed to help protect against such 'accidents'.
Perhaps tedster, g1smd, jdMorgan and other experts here would throw some light on all this.
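
On the .htaccess question, here is a minimal sketch of one server-level option. It is not something proposed in this thread. It assumes Apache 2.0.30 or later and that the host allows the directive in .htaccess (AllowOverride FileInfo). With path info switched off, a request such as /info.php/ or /info.php/link-1.ext should get a 404 instead of a 200 duplicate of info.php. How this behaves with a particular PHP setup (module versus CGI/FastCGI) is an assumption worth testing before relying on it.

```apache
# Sketch only: assumes Apache 2.0.30+ and AllowOverride FileInfo for this directory.
# Refuse requests that carry extra path info after a real file,
# e.g. /info.php/ or /info.php/link-1.ext, so they return 404 rather than 200.
AcceptPathInfo Off
```

Note that a 404 stops the duplicate pages from being served but does not pass the stray inbound link on to the real URL; a 301 redirect does that instead.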

 

proboscis




msg:3197730
 12:48 am on Dec 23, 2006 (gmt 0)

Yes, I think you can use this, it also corrects problems with people linking to you with a double slash.

Options +FollowSymLinks
RewriteEngine on
# Remove multiple slashes anywhere in URL
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . http://www.example.com%1/%2 [R=301,L]
#
# Remove trailing slash if filetype present in URL
RewriteRule ^(.+\.[^/]+)/$ http://www.example.com/$1 [R=301,L]
# Remove extra URL-path info if filetype present in URL
RewriteRule ^([^.]+\.[^/]+)/ http://www.example.com/$1 [R=301,L]

jdMorgan




msg:3197734
 1:00 am on Dec 23, 2006 (gmt 0)

The second and third rules above are similar in design and function. You could remove the second rule and keep the third without changing the effect.

Be aware that neither the second nor the third rule will work properly if your URL-paths contain periods anywhere except preceding the filetype. For example, a URL such as example.com/my.files/index.php will break either rule.

Jim
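
To illustrate the caveat: in the third rule, [^.]+\. stops at the first period in the path, so for my.files/index.php it treats my.files as the "file" and redirects to /my.files. One possible workaround, sketched here as an assumption rather than anything posted in this thread, is to name the page extensions explicitly. The extension list (php, htm, html) and the use of PCRE syntax (Apache 2.x) are assumptions; adjust them to the site.

```apache
# Sketch only: assumes mod_rewrite with PCRE (Apache 2.x) and that
# pages on the site end in .php, .htm or .html.
Options +FollowSymLinks
RewriteEngine on
# Redirect only when a known page extension is immediately followed by a slash,
# so periods in directory names (my.files/index.php) do not trigger the rule.
RewriteRule ^(.*?\.(?:php|html?))/ http://www.example.com/$1 [R=301,L]
```

With this pattern, info.php/link-1.ext would redirect to /info.php and my.files/index.php/ to /my.files/index.php, while my.files/index.php itself is left alone. A directory whose name ends in one of the listed extensions would still be caught.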

theBear




msg:3197764
 1:54 am on Dec 23, 2006 (gmt 0)

Oh, what a tangled web we weave.

OptiRex




msg:3197766
 2:01 am on Dec 23, 2006 (gmt 0)

That'll larn ya...!

Web_Savvy




msg:3197902
 5:54 am on Dec 23, 2006 (gmt 0)

Thanks for all the input, proboscis and jdMorgan.

Will certainly try this (also) and see how it goes.

Web_Savvy




msg:3197905
 5:59 am on Dec 23, 2006 (gmt 0)

> Oh, what a tangled web we weave.

When we first try to receive (?)
i.e. receive traffic

Web_Savvy




msg:3197906
 6:12 am on Dec 23, 2006 (gmt 0)

Well,

He who 'steals' my traffic steals trash;
'tis something, nothing, 'twas mine, 'tis his
and has been a slave to thousands.

[But] He who links to me incorrectly robs me
of that which not enriches him,
and makes me poor indeed.

proboscis




msg:3197965
 8:47 am on Dec 23, 2006 (gmt 0)

> Thanks for all the input, proboscis and jdMorgan.

Thank jdMorgan; if I remember correctly, he is the one who helped me figure that out in the first place - and nice poem, hehe

Web_Savvy




msg:3198002
 9:42 am on Dec 23, 2006 (gmt 0)

> Thank jdMorgan

I already did, above :-)
But I got the order of the 'condition/rule' reversed ;-)

> and nice poem hehe

Obviously not mine, a paraphrased version of
the great man's original work.

BTW, this code works like a charm:

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . http://www.example.com%1/%2 [R=301,L]
RewriteRule ^([^.]+\.[^/]+)/ http://www.example.com/$1 [R=301,L]

(With the second rule removed, per jdMorgan's recommendation.)

theBear




msg:3198122
 1:46 pm on Dec 23, 2006 (gmt 0)

Web_Savvy,

receive huh?

I was thinking in other terms, but then I do think differently than the average bear.

Motive and opportunity do play in the wobbly web game.

Web_Savvy




msg:3198141
 2:16 pm on Dec 23, 2006 (gmt 0)

theBear:

> Motive and opportunity do play in the wobbly web game.

and how! :-)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved