Forum Moderators: Robert Charlton & goodroi

Yet Another Google Canonical Problem in IIS 6.0

new way to trigger dup. content penalty

         

Shurik

6:09 pm on Oct 27, 2005 (gmt 0)

10+ Year Member



After losing one site to a dup. content and 302 hijack penalty last year, I became very careful about how my server replies to unexpected requests.

With my new site I thought I had anticipated every possible avenue of abuse. Well, I was wrong. One thing I missed was a “/” after an aspx extension. Apparently IIS 6.0 has a bug (or a feature) that answers “200 OK” for requests like www.mysite.com/page.aspx/ where the last slash gets ignored and page.aspx gets returned instead of the default file from the directory.

Guess what: some scraper linked to a whole bunch of my pages with a trailing slash, and both Google and Yahoo (but not MSN) happily picked them up and indexed the duplicate pages.

Needless to say, I’m out of Yahoo and Google. I don’t know whether the trailing slashes are the cause of the demotion or a consequence of some other penalty. This is getting really, really tiresome…
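
For anyone wanting to guard against the page.aspx/ case, here is a minimal sketch of the check, assuming the fix is to 301 any request that carries extra path text after the .aspx name back to the clean URL. It is in Python purely for illustration (on IIS the same logic would need to live in something like an ISAPI filter or the application code), and the function name is hypothetical:

```python
import re
from typing import Optional

# Matches a path that has a ".aspx" name followed by extra path text,
# e.g. "/page.aspx/" or "/page.aspx/anything".
_EXTRA_PATH = re.compile(r"^(?P<page>.*?\.aspx)(?P<extra>/.*)$", re.IGNORECASE)


def canonical_redirect_target(path: str) -> Optional[str]:
    """Return the clean path to 301-redirect to, or None if the path is already clean."""
    match = _EXTRA_PATH.match(path)
    if match is None:
        return None
    return match.group("page")


if __name__ == "__main__":
    for requested in ("/page.aspx", "/page.aspx/", "/folder/"):
        print(requested, "->", canonical_redirect_target(requested))
```

A request that gets a non-None answer would be redirected (or rejected with a 404) instead of being served with 200 OK.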

Shurik

9:49 pm on Oct 28, 2005 (gmt 0)

10+ Year Member



I guess IIS 6.0 is not very popular in this community. Googlers, if you're reading this forum, please take notice of the IIS feature described above.

g1smd

11:06 pm on Oct 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is related to the trailing / problem in Apache that I raised a few months ago. I have no idea if IIS might have the same issue.

Beware of a 301 redirect from non-www to www where the defaultsitename is domain.com and where you are linking to a folder, and where you forget to add the trailing / to the URL in the link.

If you forget the trailing / then your link to www.domain.com/folder will first be redirected to domain.com/folder/ {without www!} before arriving at the required www.domain.com/folder/ page.

The intermediate step, at domain.com/folder/, will kill your listings. Luckily, this effect is very easy to see if you use Xenu LinkSleuth to check your site: when you generate the sitemap, it reports double the number of pages that you actually have, with half of the pages having a title of "301 Moved".
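
The double hop described above can also be spotted programmatically once you have the list of URLs a request passed through. A rough sketch, assuming the chain has already been collected by whatever link checker or HTTP client you use; the function name is made up for the example:

```python
from typing import List
from urllib.parse import urlsplit


def off_host_hops(chain: List[str], canonical_host: str) -> List[str]:
    """Return the intermediate URLs in a redirect chain that are not on canonical_host.

    `chain` lists every URL visited in order: the link as written first,
    the page finally served last. Only the hops in between are inspected.
    """
    return [url for url in chain[1:-1] if urlsplit(url).hostname != canonical_host]


if __name__ == "__main__":
    chain = [
        "http://www.domain.com/folder",   # link written without the trailing /
        "http://domain.com/folder/",      # intermediate hop that drops the www
        "http://www.domain.com/folder/",  # page finally served
    ]
    print(off_host_hops(chain, "www.domain.com"))
```

Any URL it returns is an intermediate step on the wrong hostname, which is the case to fix by linking with the trailing / in the first place.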

theBear

11:17 pm on Oct 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The problem is one of many, one that should have zero impact but can and probably does.

It is getting to the point where you will have to have a fully programmable filter handling all inbound requests and basically rejecting anything not in a valid page list.
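
In its simplest form such a filter is just a lookup against the list of pages you actually publish. A minimal sketch, with a made-up page list and function name, assuming "rejecting" means answering 404:

```python
from typing import FrozenSet

# Hypothetical list of the only paths this site is meant to serve.
VALID_PAGES: FrozenSet[str] = frozenset({"/", "/page.aspx", "/folder/"})


def status_for(path: str, valid_pages: FrozenSet[str] = VALID_PAGES) -> int:
    """Return 200 for an exact match in the valid page list, otherwise 404."""
    return 200 if path in valid_pages else 404


if __name__ == "__main__":
    for requested in ("/page.aspx", "/page.aspx/", "/Page.aspx"):
        print(requested, status_for(requested))
```

The match is deliberately exact (case and trailing slash included), so every variant a scraper invents falls through to the 404.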

g1smd

11:20 pm on Oct 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It has always been possible to link to any page with extra parameters added to the real URL, parameters that the server is never going to process, and then see the fake URL indexed just days later - and retained for many months after the link is removed too...
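
One defence is to work out what the URL should have been and redirect to it whenever the request differs. A rough sketch, assuming you know which parameter names your pages really use; the function and the parameter names in the example are hypothetical:

```python
from typing import FrozenSet
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_unknown_params(url: str, allowed: FrozenSet[str]) -> str:
    """Return the URL with every query parameter not named in `allowed` removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in allowed
    ]
    # Rebuild the URL without the fragment and with only the kept parameters.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    requested = "http://www.domain.com/page.aspx?id=7&fake=1"
    print(strip_unknown_params(requested, frozenset({"id"})))
```

If the result is not identical to the requested URL, the server can answer with a 301 to the cleaned version rather than 200 OK.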

girish

12:49 pm on Nov 15, 2005 (gmt 0)

10+ Year Member



g1smd - "The intermediate step, at domain.com/folder/ will kill your listings. Luckily, this effect is very easy to see if you use Xenu LinkSleuth to check your site: it shows up as reporting double the number of pages (when you generate the sitemap) that you actually have, with half of the pages having a title of "301 Moved".

How exactly do we do this test?

Example: I loaded Xenu, chose File / Check URL, typed in "domain.com" (without quotation marks) and left the CHECK EXTERNAL LINKS box checked.

Xenu returns:
[domain.com...] -status ok
[domain.com...] -status ok

Is this the test you are referring to?

g1smd

8:30 pm on Nov 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Use Xenu to check all of your internal links, and then get Xenu to generate a report. In the report (near the end) is a simply linked sitemap list. Inspect that.

>> Xenu returns:
>> http://domain.com/ -status ok
>> http://www.domain.com/ -status ok

That isn't correct. One of those (usually www) should issue 200 OK and the other (usually non-www) should issue a 301 redirect to the www version for all pages of the site.
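
The rule can be written down as a small check. This is only a sketch: it assumes you have already fetched both hostnames without following redirects and noted the status code and Location header for each, and the names are invented for the example:

```python
from typing import Optional, Tuple

# (HTTP status code, value of the Location header or None)
Response = Tuple[int, Optional[str]]


def hosts_configured_correctly(www: Response, bare: Response, www_url: str) -> bool:
    """True when the www URL answers 200 and the non-www URL answers 301 pointing at it."""
    www_status, _ = www
    bare_status, bare_location = bare
    return www_status == 200 and bare_status == 301 and bare_location == www_url


if __name__ == "__main__":
    # Both hostnames answering 200 OK is the misconfigured case.
    print(hosts_configured_correctly((200, None), (200, None), "http://www.domain.com/"))
    # www answers 200 and non-www redirects to it: the wanted setup.
    print(hosts_configured_correctly(
        (200, None), (301, "http://www.domain.com/"), "http://www.domain.com/"))
```

Swap the two roles if non-www is the version you want indexed.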