
Google SEO News and Discussion Forum

    
Dupe Content or Simple Anomaly? - extra directory in url
MrStitch




msg:3573839
 2:58 pm on Feb 13, 2008 (gmt 0)

Quite a while ago, the hot buzz was dupe content, and how it was possible to get your site knocked down through shady linking (or just plain mistakes by webmasters).

In my case, it happened because a higher-profile site linked to my site as http://example.com instead of http://www.example.com. I fixed the problem and jumped back up in the SERPs.

Today, while going through some links at Yahoo with link:www....., I found that some of my priority pages had weird links from within my own site.

Example of link location going to the home page: www.example.com/~mysi/product1.htm

If I key that into the url, it gives me the product page (which should be just www.example.com/product1.htm).

To me, this looks just like the dupe content fiasco when you keyed in the url with or without the WWW, and still got the same result.

I'm not completely knocked out of the SERPs for my key terms, but page three is no picnic either. It really should be better than that.

Should I fix it? How do I fix it? What causes it?

[edited by: tedster at 4:46 pm (utc) on Feb. 13, 2008]
[edit reason] switch to example.com [/edit]

 

tedster




msg:3573977
 4:48 pm on Feb 13, 2008 (gmt 0)

If I key that into the url, it gives me the product page

I don't understand why - your server configuration should not resolve urls with "extra" directories inserted. Are you using some form of url rewriting? If so, it needs some tweaking so the url cannot be "hacked" and still resolve with a 200 status.

MrStitch




msg:3574009
 5:13 pm on Feb 13, 2008 (gmt 0)

I do have some rewrite rules that were handed to me; they were supposed to fix the www vs. non-www issue.

The code:

RewriteEngine On

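# 301 client requests for .../index.html to the bare directory URL (testing THE_REQUEST means the server's own internal index lookup is not redirected, so there is no loop)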
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.html\ HTTP/
RewriteRule index\.html$ http://www.example.com/%1 [R=301,L]

# Same as above, for .../index.htm
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.htm\ HTTP/
RewriteRule index\.htm$ http://www.example.com/%1 [R=301,L]

# 301 the bare domain (example.com) to the www host; NC because host names are case-insensitive
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=permanent,L]

[edited by: tedster at 5:52 pm (utc) on Feb. 13, 2008]

tedster




msg:3574166
 8:11 pm on Feb 13, 2008 (gmt 0)

I can't see any reason those rules would allow the insertion of an "extra" directory to resolve to the same content.

Bottom line, your server should respond with a 404 error for that url.
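A minimal sketch of one way to get that 404, assuming Apache 2.2 or later with mod_rewrite, and assuming the unwanted prefix is always exactly /~mysi/ (taken from the example URL; adjust if it differs). It would sit in the same .htaccess as the rules above:

```apache
# Already present in the existing rules; repeating it is harmless.
RewriteEngine On

# Sketch only: answer 404 Not Found for any URL starting with /~mysi/.
# %{REQUEST_URI} is the path as the client requested it, so this matches
# even if the server maps /~mysi/ to the site's directory internally.
RewriteCond %{REQUEST_URI} ^/~mysi/
# "-" means no substitution; a non-3xx R= status stops rewriting and
# sends that status instead of a redirect.
RewriteRule ^ - [R=404,L]
```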

MrStitch




msg:3574169
 8:16 pm on Feb 13, 2008 (gmt 0)

Then, what if I just do a 301 from the oddball url to the actual one? That wouldn't do any harm, would it?

tedster




msg:3574172
 8:20 pm on Feb 13, 2008 (gmt 0)

It won't prevent other instances, but it sounds like a good patch for a one-off occurrence, and it would also preserve the backlink juice for you.
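A minimal sketch of that one-off 301, assuming Apache with mod_rewrite and assuming the stray segment is always exactly /~mysi/ (from the example URL). It reuses the www.example.com target from the existing rules and is meant to go near the top of that .htaccess, so it runs before the other redirects:

```apache
RewriteEngine On

# Sketch only: 301 any request starting with /~mysi/ to the same path
# without that segment, e.g. /~mysi/product1.htm -> /product1.htm.
# The path is read from %{REQUEST_URI} rather than the RewriteRule pattern
# because, in .htaccess, the pattern only sees the path relative to the
# directory the file lives in.
RewriteCond %{REQUEST_URI} ^/~mysi/(.*)$
# %1 is the text captured by the RewriteCond above.
RewriteRule ^ http://www.example.com/%1 [R=301,L]
```

Unlike the 404 route, this keeps any links pointing at the odd URLs working, which is the backlink point made above.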

MrStitch




msg:3574175
 8:24 pm on Feb 13, 2008 (gmt 0)

Thanks bud, I'll run with it, and see what happens over the next couple of months.

WiseWebDude




msg:3574220
 9:09 pm on Feb 13, 2008 (gmt 0)

Or add this to your robots.txt file:

User-agent: *
Disallow: /*~mysi/

Google and Yahoo respect wildcards; MSN doesn't yet, but this would still stop it in Google and Yahoo. Just make sure that isn't a REAL folder, LOL.

MrStitch




msg:3574311
 10:41 pm on Feb 13, 2008 (gmt 0)

This might sound strange, but now that you say folder....

I know absolutely for sure that there is NO folder called ~mysi or any variation thereof.

However, I believe that the mysit (it IS supposed to be missing the last letter) is actually a login name. The host chops off the last letter of the domain and makes that your login for certain things.

Think there's a chance that somehow the programming caused a nightmare wiring mess, resulting in odd URLs tied to a username? Yeah.... that sounds crazy.
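The login-name guess fits how Apache's mod_userdir works, if the host uses it (an assumption; the thread doesn't confirm it). With UserDir enabled, http://host/~username/ is served from that user's own web directory, so when the site itself lives there, every page answers under both / and /~username/. UserDir is only allowed in the main server or virtual host config, not in .htaccess, so on shared hosting this is something to ask the host about. A sketch of what they might add, using the ~mysi name from the example URL:

```apache
# Sketch only: goes in the server config or a <VirtualHost>, not .htaccess.
# Stops mod_userdir from serving /~mysi/ URLs for this login name,
# so those URLs fall through to a normal 404.
<IfModule mod_userdir.c>
    UserDir disabled mysi
</IfModule>
```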

So, if I add that line to robots.txt, it would just tell the robots to completely ignore anything under that directory... even though the directory isn't there in the first place?

How about I do both: a 301 AND robots.txt?


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved