
Dupe Content or Simple Anomaly? - extra directory in url


MrStitch

2:58 pm on Feb 13, 2008 (gmt 0)

5+ Year Member



Quite a while ago, the hot buzz was dupe content, and how it was possible to get your site knocked down through shady linking (or just plain mistakes by webmasters).

In my case, it happened because a higher profile site linked to my site as http://example.com instead of http://www.example.com. I fixed the problem, and I jumped up in the serps again.

Today, while going through some links at yahoo with link:www....., I found that I had priority pages with weird links from within my site.

Example of link location going to the home page: www.example.com/~mysi/product1.htm

If I key that into the url, it gives me the product page (which should be just www.example.com/product1.htm).

To me, this looks just like the dupe content fiasco when you keyed in the url with or without the WWW, and still got the same result.

I'm not completely knocked out of the serps for my key terms, but page three is no picnic either. It really should be better than that.

Should I fix it? How do I fix it? What causes it?

[edited by: tedster at 4:46 pm (utc) on Feb. 13, 2008]
[edit reason] switch to example.com [/edit]

tedster

4:48 pm on Feb 13, 2008 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



If I key that into the url, it gives me the product page

I don't understand why - your server configuration should not resolve urls with "extra" directories inserted. Are you using some form of url rewriting? If so, it needs some tweaking so the url cannot be "hacked" and still resolve with a 200 status.
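For example, a sloppy catch-all like this (purely hypothetical - I'm not saying this is what you have) would serve the product page no matter what extra directory gets typed in front of the filename:

RewriteEngine On
# Hypothetical rule of the kind that causes the symptom: it throws
# away every directory in front of the filename, so /~mysi/product1.htm
# and /anything/product1.htm both come back as /product1.htm with a 200
RewriteRule ^.+/([^/]+\.html?)$ /$1 [L]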

MrStitch

5:13 pm on Feb 13, 2008 (gmt 0)

5+ Year Member



I do have some rewriting that was handed to me; it was supposed to fix the www and non-www issues.

The code -

RewriteEngine On

# redirect /path/index.html requests to the bare directory url
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.html\ HTTP/
RewriteRule index\.html$ http://www.example.com/%1 [R=301,L]

# same thing for /path/index.htm
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.htm\ HTTP/
RewriteRule index\.htm$ http://www.example.com/%1 [R=301,L]

# redirect non-www requests to the www hostname
RewriteCond %{HTTP_HOST} ^example\.com
RewriteRule (.*) http://www.example.com/$1 [R=permanent,L]

[edited by: tedster at 5:52 pm (utc) on Feb. 13, 2008]

tedster

8:11 pm on Feb 13, 2008 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I can't see any reason those rules would allow the insertion of an "extra" directory to resolve to the same content.

Bottom line, your server should respond with a 404 error for that url.
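If the tilde segment is never a legitimate directory on your site, one way to shut it down (just a sketch - adjust it to your setup) is to refuse any url that starts with one. mod_rewrite doesn't have a direct 404 flag, but a 410 Gone via the [G] flag does the job for the engines just as well:

# Sketch only - assumes "~mysi" (or any ~name) is never a real
# directory on the site. Any request whose path starts with a
# tilde segment gets a 410 Gone instead of quietly serving the
# same content with a 200 status.
RewriteCond %{REQUEST_URI} ^/~[^/]+/
RewriteRule .* - [G,L]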

MrStitch

8:16 pm on Feb 13, 2008 (gmt 0)

5+ Year Member



Then, what if I just do a 301 from the oddball url to the actual one? That wouldn't do any harm, would it?

tedster

8:20 pm on Feb 13, 2008 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



It won't prevent other instances, but it sounds like a good patch for a one-off occurrence, and it would also preserve the backlink juice for you.
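If you do go that route, something along these lines (a sketch using the same example names) placed with your existing rules would fold the oddball urls back into the real ones and pass the link credit along:

# Sketch: strips a leading "~name/" segment and 301s to the clean url,
# keeping the rest of the path, e.g. /~mysi/product1.htm ->
# http://www.example.com/product1.htm
RewriteCond %{REQUEST_URI} ^/~[^/]+/(.*)$
RewriteRule .* http://www.example.com/%1 [R=301,L]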

MrStitch

8:24 pm on Feb 13, 2008 (gmt 0)

5+ Year Member



Thanks bud, I'll run with it, and see what happens over the next couple of months.

WiseWebDude

9:09 pm on Feb 13, 2008 (gmt 0)

5+ Year Member



Or, in your robots.txt file add:

Disallow: /*~mysi/

Google and Yahoo respect wildcards, but MSN still doesn't, so that would at least stop it in Google and Yahoo. Just make sure that isn't a REAL folder, LOL.
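If you don't already have a robots.txt, the whole file would just be something like this (the * in the path is a wildcard extension that Google and Yahoo honor, not part of the original robots.txt standard):

User-agent: *
Disallow: /*~mysi/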

MrStitch

10:41 pm on Feb 13, 2008 (gmt 0)

5+ Year Member



This might sound strange, but now that you say folder....

I know absolutely for sure that there is NO folder called ~mysi or any variation thereof.

However, I believe that the mysi (it IS supposed to be missing the last letter) is actually a login name. The host chops off the last letter of the domain and makes that your login for certain things.

Think there's a chance that somehow the programming caused a nightmare wiring mess, resulting in odd urls tied to a username? Yeah.... that sounds crazy.

So, if I add that line in the robots.txt, that would just tell the robots to completely ignore anything related to said directory... even though the directory isn't there in the first place?

How about I do both - 301 AND robots.txt?

 
