Duplicate content problem with ISAPI filter?

         

Marcia

8:05 am on Jul 22, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There's a concern about whether duplicates will be picked up and cause problems when using an ISAPI filter, since more than one URL is actually being generated for the same page. It looks something like this:

www.example.com/shopforWidgets.asp
www.example.com/st/asp/en/shopforWidgets.htm

That looks like it could be a duplicate content issue. Is there any way to deal with the problem?

defanjos

6:15 pm on Jul 22, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't quite understand the question, but are you concerned that a URL in the format site.com/page.asp?id=456 will be shown as a duplicate of site.com/whatever/456?

I think if you don't link in any way to site.com/page.asp?id=456, you will be fine.

But I could be wrong.

Marcia

11:06 pm on Jul 22, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There's another URL being created for the same page - neither one has a question mark in it.

pageoneresults

11:09 pm on Jul 22, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi Marcia, this would be seen as duplicate content. You'll need to block robots from one of the above URIs, preferably the longer one. Just Disallow: the /st/ path and that will keep the spiders out of there. Ultimately you should implement a META robots tag with robots-terms of none on the final pages in the /st/ path.
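[Editor's note: a minimal sketch of the exclusion described above, using the /st/ path from the example URLs earlier in the thread:]

```
# robots.txt - keep spiders out of the rewritten path
User-agent: *
Disallow: /st/
```

The belt-and-suspenders META tag would go in the head of each page in that path: <meta name="robots" content="none">, where none is shorthand for noindex, nofollow.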

P.S. Why are there two paths to the same content? Translation?

Marcia

8:22 am on Jul 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No idea how this is happening, guys - this is a completely new thing to me, how it's being done. It's just that whatever "filter" is being applied is creating them.

I've just started looking into it; there are over 2K pages in the index, and the vast majority of them are Supplemental Results.

Also, depending on how deep you go into the navigation, each time you click to go back to the homepage you get a completely different URL - all for the same identical page, but with different department and section IDs in the filepath - loads of them.

It looks like big trouble to me; it's very tempting to put in a robots.txt exclusion for everything that isn't in the root directory until it can be sorted.
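[Editor's note: to gauge the scale of a problem like this, here's a quick hypothetical sketch - a canonicalizer that collapses the rewritten URLs back to one form so you can count how many distinct pages the indexed URLs really represent. The /st/asp/en/ pattern comes from the example URLs above; the prefix handling is an assumption, not Marcia's actual filter logic:]

```python
import re

def canonicalize(url: str) -> str:
    """Collapse rewritten URL variants to one canonical form.

    Assumes the /st/<engine>/<lang>/ prefix and .htm suffix come
    from the rewrite filter, as in the example URLs in this thread.
    """
    m = re.match(r"^(?P<host>[^/]+)/st/[^/]+/[^/]+/(?P<page>[^/]+)\.htm$", url)
    if m:
        # www.example.com/st/asp/en/shopforWidgets.htm
        #   -> www.example.com/shopforWidgets.asp
        return f"{m.group('host')}/{m.group('page')}.asp"
    return url

urls = [
    "www.example.com/shopforWidgets.asp",
    "www.example.com/st/asp/en/shopforWidgets.htm",
    "www.example.com/st/asp/fr/shopforWidgets.htm",
]
# All three variants collapse to a single canonical page
print(len({canonicalize(u) for u in urls}))  # prints 1
```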

defanjos

2:20 pm on Jul 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Marcia,

Not saying this is related to your problem at all, but keep an eye on it. The pages that the ISAPI filter serves have to use complete paths for the links, images, CSS, JS, etc.

For example, the links should look like <a href="http://www.site.com/">Home</a> and not <a href="default.asp">Home</a>

pageoneresults

3:21 pm on Jul 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Marcia, defanjos brings up one of the most important parts of working with ISAPI_Rewrite filters: Absolute URI Paths. This is a must.

When we first started working with ISAPI filters a couple of years ago, we learned from experience. Our first test of working with the filters produced some unexpected results. Since we used Relative URI Paths in the .ini file, existing URIs were being appended with the rewritten URI. To make a long story short, it was a mess for about 30 days while we corrected the issue.
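[Editor's note: for reference, a rough sketch of the kind of httpd.ini rule involved - the pattern is ISAPI_Rewrite 2-style syntax and the paths come from the example URLs earlier in the thread, not from Marcia's actual configuration:]

```
[ISAPI_Rewrite]

# Serve /st/asp/en/shopforWidgets.htm from /shopforWidgets.asp
# [I] = case-insensitive match, [L] = stop processing further rules
RewriteRule ^/st/asp/en/([^/]+)\.htm$ /$1.asp [I,L]
```

Note the rewrite target starts with / - a relative target here is exactly the kind of thing that produces the appended-URI mess described above.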

In addition to the above, we Disallow: all of the file.asp pages that we don't want to have a spider crawling. We also include a META Robots Tag with the robots-terms of none on those .asp pages to keep the bots from indexing and displaying links to those pages.

You should not have two different URI paths to the same content. If you do, one of those paths needs to be disallowed so that you avoid the issues you mention above.

P.S. Those supplemental results (SRs) are not good. From what I've seen, when they appear in an instance like this (with a rewrite), something is not right somewhere. Note, there could be other issues causing the SRs to appear.

It sounds like whoever wrote the expressions for the .ini file might have some problems in there. If you go to the root of the web, there should be a file called http.parse.errors. Open that file in Notepad and see if there are any logged errors.

defanjos

3:49 pm on Jul 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



pageoneresults,

I learned that the hard way also.
Now I have a warning at the top of all my ISAPI filter pages, so I don't forget to use absolute paths.

e.g.

'****** WARNING - NO files and links CAN use VIRTUAL paths - all need to be ABSOLUTE ***

Marcia

1:37 pm on Jul 30, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>You should not have two different URI paths to the same content.

It's far worse than two - it's multiple paths to the same page. And the supplemental results are a concern; I've seen that with another site out there that got hit for duplicate content. I'll get the info about the direct paths to them. Thanks!