
Google SEO News and Discussion Forum

Dupe Content or Simple Anomaly? - extra directory in url

 2:58 pm on Feb 13, 2008 (gmt 0)

Quite a while ago, the hot buzz was dupe content, and how it was possible to get your site knocked down through shady linking (or just plain mistakes by webmasters).

In my case, it happened because a higher-profile site linked to my site as http://example.com instead of http://www.example.com. I fixed the problem, and I jumped up in the serps again.

Today, while going through some links at Yahoo with link:www....., I found that some of my priority pages had weird links from within my site.

Example of link location going to the home page: www.example.com/~mysi/product1.htm

If I key that into the url, it gives me the product page (which should be just www.example.com/product1.htm).

To me, this looks just like the dupe content fiasco when you keyed in the url with or without the WWW, and still got the same result.

I'm not completely knocked out of the serps for my key terms, but page three is no picnic either. It really should be better than that.

Should I fix it? How do I fix it? What causes it?

[edited by: tedster at 4:46 pm (utc) on Feb. 13, 2008]
[edit reason] switch to example.com [/edit]



 4:48 pm on Feb 13, 2008 (gmt 0)

If I key that into the url, it gives me the product page

I don't understand why. Your server configuration should not resolve URLs with "extra" directories inserted. Are you using some form of URL rewriting? If so, it needs some tweaking so the URL cannot be "hacked" and still resolve with a 200 status.


 5:13 pm on Feb 13, 2008 (gmt 0)

I do have some rewriting that was handed to me, which was supposed to fix the www and non-www issues.

The code:

RewriteEngine On

# Redirect any .../index.html request to its directory (e.g. /dir/index.html -> /dir/)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.html\ HTTP/
RewriteRule index\.html$ http://www.example.com/%1 [R=301,L]

# Same for .../index.htm
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.htm\ HTTP/
RewriteRule index\.htm$ http://www.example.com/%1 [R=301,L]

# Redirect the bare domain to www (hostnames are case-insensitive, hence NC)
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

[edited by: tedster at 5:52 pm (utc) on Feb. 13, 2008]


 8:11 pm on Feb 13, 2008 (gmt 0)

I can't see any reason those rules would allow the insertion of an "extra" directory to resolve to the same content.

Bottom line, your server should respond with a 404 error for that url.
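A minimal .htaccess sketch of that advice, assuming the stray prefix is always /~mysi/ and the server runs Apache 2.4 with mod_rewrite (where the R flag accepts a non-redirect status such as 404). It checks the raw request line, in the same style as the rules posted above, and returns a 404 instead of serving the product page:

```apache
RewriteEngine On

# Assumption: /~mysi/ is the only stray prefix. Matching the raw request line
# means an internal mapping of the path cannot hide the prefix from this rule.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /~mysi/
# "-" means no substitution; with a non-3xx R= code Apache returns that status.
RewriteRule .* - [R=404,L]
```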


 8:16 pm on Feb 13, 2008 (gmt 0)

Then, what if I just do a 301 from the oddball url to the actual one? That wouldn't do any harm, would it?


 8:20 pm on Feb 13, 2008 (gmt 0)

It won't prevent other instances, but it sounds like a good patch for a one-off occurrence, and it would also preserve the backlink juice for you.
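A minimal sketch of that one-off 301, assuming the stray prefix is always /~mysi/ and the canonical host is www.example.com. It follows the THE_REQUEST style of the rules already in use, so it keys on the URL the visitor or crawler actually asked for:

```apache
RewriteEngine On

# Assumption: /~mysi/ is the only stray prefix. Capture the rest of the path,
# stopping before any query string or the space that precedes the protocol.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /~mysi/([^\ ?]*)
# Permanently redirect to the same path without the prefix. mod_rewrite appends
# the original query string itself; NE keeps already-encoded characters from
# being escaped a second time.
RewriteRule .* http://www.example.com/%1 [R=301,NE,L]
```

Placing it above the non-www rule means a request for example.com/~mysi/product1.htm lands on www.example.com/product1.htm in a single hop instead of two.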


 8:24 pm on Feb 13, 2008 (gmt 0)

Thanks bud, I'll run with it, and see what happens over the next couple of months.


 9:09 pm on Feb 13, 2008 (gmt 0)

Or, in your robots.txt file, add this under your User-agent: * record:

Disallow: /*~mysi/

Google and Yahoo support wildcards; MSN still doesn't, but that would stop it in Google and Yahoo anyway. Just make sure that isn't a REAL folder, LOL.


 10:41 pm on Feb 13, 2008 (gmt 0)

This might sound strange, but now that you say folder....

I know absolutely for sure that there is NO folder called ~mysi or any variation thereof.

However, I believe that the mysit (it IS supposed to be missing the last letter) is actually a login name. The host chops off the last letter of the domain and makes that your login for certain things.

Think there's a chance that the programming somehow caused a nightmare wiring mess, resulting in odd URLs tied to a username? Yeah.... that sounds crazy.
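One possible explanation, offered as an assumption rather than a diagnosis: on Apache, URLs of the form /~username/ are usually handled by mod_userdir, which maps them to that account's public directory. If the host uses the login described above as the account name and that account's public directory is the site's document root, every page would also answer under the /~ prefix. Only the host can change this, because UserDir is allowed only in the server config or a virtual host, not in .htaccess. On a server where it is enabled, turning it off looks roughly like this:

```apache
# Server config or <VirtualHost> only; this directive is not allowed in .htaccess.
<IfModule mod_userdir.c>
    # Stop /~username/ URLs from mapping to users' public directories.
    UserDir disabled
</IfModule>
```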

So, if I add that line to robots.txt, it would just tell the robots to completely ignore anything under that directory... even though the directory isn't there in the first place?

How about I do both: the 301 AND robots.txt?

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved