www.example.com/shopforWidgets.asp
www.example.com/st/asp/en/shopforWidgets.htm
That looks like it could be a duplicate content issue. Is there any way to deal with the problem?
P.S. Why are there two paths to the same content? Translation?
I've just started looking into it; there are over 2K pages in the index, and the vast majority of them are Supplemental Results.
Also, depending on how deep you go into the navigation, each time you click back to the homepage you get a completely different URL. They all serve the identical page, but with different department and section IDs in the file path, and there are loads of them.
It looks like big trouble to me; it's very tempting to put in a robots.txt exclusion for everything that isn't in the root directory until it can be sorted.
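A rough sketch of that stopgap, assuming (hypothetically) that all the duplicate paths sit under /st/ like the rewritten URL quoted at the top of the thread. It uses Python's standard urllib.robotparser to confirm that the root .asp page stays crawlable while the /st/ copy is blocked. Disallow rules are plain prefix matches; wildcard patterns such as /*.htm are an extension some engines support, but this parser does not.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes every duplicate path lives under /st/,
# as in the rewritten URL quoted at the top of the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /st/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "http://www.example.com/shopforWidgets.asp",
    "http://www.example.com/st/asp/en/shopforWidgets.htm",
):
    # can_fetch() applies the prefix rules for the given user agent.
    verdict = "crawlable" if parser.can_fetch("*", url) else "blocked"
    print(verdict, url)
```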
Not saying this is related to your problem at all, but keep an eye on it. The pages that the ISAPI filter serves must use complete (absolute) paths for links, images, CSS, JS, etc.
For example, the links should look like <a href="http://www.site.com/">Home</a> and not <a href="default.asp">Home</a>.
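To see why relative links cause trouble behind a rewrite, here is how a browser or spider resolves them, sketched with Python's urllib.parse.urljoin. The relative default.asp link is resolved against the rewritten directory rather than the site root, which mints a brand-new URL; whether the server then returns content at that path depends on the filter's rules.

```python
from urllib.parse import urljoin

# The rewritten URL the visitor (or spider) actually requested.
page_url = "http://www.example.com/st/asp/en/shopforWidgets.htm"

# A relative link resolves against the rewritten URL's directory,
# producing a path the site never intended to expose.
print(urljoin(page_url, "default.asp"))
# -> http://www.example.com/st/asp/en/default.asp

# An absolute link resolves to the same place from any page.
print(urljoin(page_url, "http://www.example.com/"))
# -> http://www.example.com/
```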
When we first started working with ISAPI filters a couple of years ago, we learned from experience. Our first test with the filters produced some unexpected results: because we used relative URI paths in the .ini file, the rewritten URI was being appended to existing URIs. To make a long story short, it was a mess for about 30 days while we corrected the issue.
In addition to the above, we use Disallow: in robots.txt for all of the file.asp pages that we don't want spiders crawling. We also include a robots META tag with content="none" on those .asp pages to keep the bots from indexing them and displaying links to them.
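A small check for the META tag half of this, sketched with Python's html.parser: it reports whether a page carries a robots META tag whose content includes none (shorthand for noindex, nofollow) or noindex. The class and function names are hypothetical, and in practice you would feed it the HTML fetched from each .asp page.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

def is_excluded_from_index(html):
    """True if the page asks robots not to index it ("none" or "noindex")."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "none" in finder.directives or "noindex" in finder.directives

print(is_excluded_from_index('<head><meta name="ROBOTS" content="NONE"></head>'))
```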
You should not have two different URI paths to the same content. If you do, one of those paths needs to be disallowed so that you avoid the issues you mention above.
P.S. Those supplemental results (SRs) are not good. From what I've seen, when they appear in an instance like this (with a rewrite), something is not right somewhere. Note that other issues could also be causing the SRs to appear.
It sounds like whoever wrote the expressions for the .ini file might have made some mistakes in there. In the root of the website there should be a file named http.parse.errors. Open that file in Notepad and see if there are any logged errors.
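The exact syntax of the .ini rules depends on the filter product, so the following is only a sketch that uses Python's re module as a stand-in for the filter's regex engine. The rule and the rewrite() helper are hypothetical: the rule maps the friendly /st/asp/en/... URL to the root .asp script, and because the pattern is anchored with ^ and $, a doubled-up path (the kind produced when a rewritten URI gets appended to an existing one) matches nothing instead of serving the page again.

```python
import re

# Hypothetical rule: map the "friendly" URL onto the real .asp script.
# ^ and $ anchor the pattern to the whole path, so search() cannot
# match a fragment buried inside a longer, malformed path.
RULE = re.compile(r"^/st/asp/en/(\w+)\.htm$")

def rewrite(path):
    """Return the internal path for a friendly URL, or None if no rule matches."""
    match = RULE.search(path)
    return "/" + match.group(1) + ".asp" if match else None

# Each friendly URL should map to exactly one script; nothing else should match.
for path in ("/st/asp/en/shopforWidgets.htm",
             "/st/asp/en/st/asp/en/shopforWidgets.htm",
             "/shopforWidgets.asp"):
    print(path, "->", rewrite(path))
```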
It's far worse than two: there are multiple paths to the same page. And I am concerned about supplemental results; I've seen the same thing with another site out there that got hit for duplicate content. I'll get the info about the direct paths to them. Thanks!
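One way to gauge how many URLs serve the same page is to fetch them and group them by a hash of the response body. This sketch skips the fetching and takes the bodies as a dict; the department/section paths in the sample are made up. It only catches byte-for-byte duplicates, so pages that differ by a session ID or timestamp would need normalizing first.

```python
import hashlib
from collections import defaultdict

def group_duplicates(pages):
    """Group URLs whose bodies are byte-for-byte identical.

    pages: dict mapping URL -> page body (str).
    Returns a list of sorted URL lists, one per body shared by 2+ URLs.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Keep only bodies reachable from more than one URL (the duplicates).
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

# Hypothetical sample: the department/section IDs in these paths are made up.
pages = {
    "http://www.example.com/": "<html>home</html>",
    "http://www.example.com/st/dept12/sec3/default.htm": "<html>home</html>",
    "http://www.example.com/st/dept7/sec1/default.htm": "<html>home</html>",
    "http://www.example.com/shopforWidgets.asp": "<html>widgets</html>",
}

for urls in group_duplicates(pages):
    print(len(urls), "URLs serve the same page:", urls)
```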