All the pages except contact, about and main page are supplemental.
That is the problem. The file /thispage.html on the Microsoft server can be accessed through multiple URLs:
and that is duplicate content.
The non-www and www problem is solved with a simple 301 redirect (easy to do on Apache) and the CASE problem just does not occur with Apache.
To access www.domain.com/THISpage.html on an Apache server, you would need a file on the server called /THISpage.html to actually exist too.
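For anyone who wants to see the non-www to www logic spelled out, here is a minimal sketch in Python rather than actual Apache config. The function name `redirect_target` and the constant `CANONICAL_HOST` are made up for illustration, and it assumes the www form is the one you want indexed.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www form is the preferred (canonical) host name.
CANONICAL_HOST = "www.domain.com"

def redirect_target(url):
    """Return the URL a 301 should point to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = parts.netloc.lower()  # host names are case-insensitive
    if host == CANONICAL_HOST:
        return None  # already on the canonical host
    if "www." + host == CANONICAL_HOST:
        # Same path, query and fragment; only the host changes.
        return urlunsplit((parts.scheme, CANONICAL_HOST, parts.path,
                           parts.query, parts.fragment))
    return None  # some other host; not handled in this sketch

print(redirect_target("http://domain.com/thispage.html"))
print(redirect_target("http://www.domain.com/thispage.html"))
```

The path is deliberately left untouched here; the case question is a separate issue from the host question.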
Today I have noticed that my article (article A) is back in Google -- as a normal URL, not as a supplemental.
- When I use the site: operator (site:www.example.com), it's there.
- When I search for keywords, it's also there (even though it has dropped one position to position #5 when I search for the title -- before it went supplemental it had a position #4).
It looks like everything is back to normal; both article A and B are still in the index. Right now, from where I am, the SERPs seem stable.
Has anybody else noticed a normalisation?
[edited by: OutdoorMan at 3:46 pm (utc) on Jan. 6, 2007]
So, when a Google spider follows a link to
a file on a web server,
and the link is: example.com/INDEX.HTM
CASE 1: the web server is Microsoft IIS6
The server accepts the request, then compiles and returns the existing web page, because it treats the URL as case-insensitive.
CASE 2: the web server is Apache on Unix/Linux
What happens? Does it return a 404, or does it do some kind of redirect?
If there is a redirect, does it perform it for every request for any page on that domain name?
Does it therefore perform the redirect for internal navigation, i.e. going from /index.htm to /contact_us.htm?
If necessary, perhaps this can be replicated in IIS6 via asp.net
Apache: A request for www.domain.com/page.html will pull the file page.html from the server if it exists, otherwise will respond with a 404 error.
A request for www.domain.com/PAGE.HTML will pull the file PAGE.HTML from the server if it exists, otherwise will respond with a 404 error.
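To make the difference concrete, here is a rough simulation in Python. It is not how either server is implemented; the two function names are hypothetical, and it assumes Apache is sitting on a case-sensitive file system with no URL-correcting module enabled.

```python
def apache_style_lookup(files, path):
    """Case-sensitive match: the exact file name must exist."""
    return 200 if path in files else 404

def iis_style_lookup(files, path):
    """Case-insensitive match: any capitalisation of the name is served."""
    lowered = {name.lower() for name in files}
    return 200 if path.lower() in lowered else 404

# Only one file actually exists on the (imaginary) server.
files = {"/page.html"}

for path in ("/page.html", "/PAGE.HTML"):
    print(path, apache_style_lookup(files, path), iis_style_lookup(files, path))
# prints:
# /page.html 200 200
# /PAGE.HTML 404 200
```

So in this sketch there is no redirect involved on the Apache side at all: the wrong-case URL simply is not found, which is why it cannot end up indexed as a second copy.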
In the last couple of days we've seen one of our English-language sites [ 1 of 3 site/page/template designs ], which wasn't producing SERP results, start kicking. Prior to that, 3 of our 4 foreign-language sites [ 4 sites on the same template/page design ] did the same. They are all large sites of 90k-plus pages.
The only thing that we appear to have done differently to you is to balance our IBLs between the main/front page and relevant inner/deep pages. But on the foreign-language sites we have very few IBLs, and they were the first to produce results (7-9 weeks ago).
But the behaviour of the recovery is a little perplexing, and I'd encourage patience. We finished the fixes around 15 August. Five months on from the fix, there appears to be no rhyme or reason to the different timelines of each site's results being released back into the SERPs (except that it starts at fortnightly intervals on a Thursday!).
There is a potential issue with the quality of navigation and theming between pages, which might indicate why our English sites are taking longer.
I'm wondering if any other members could explain their recovery process and if you would encourage patience for internetheaven ( difficult as it is ).
>and that is duplicate content
Actually, this is technically not duplicate content. What it seems to me to be is part of an ongoing war between Google and Microsoft. I am not too sure how Microsoft will respond to this, but I do know it will be putting a few smiles on a few weirdo techy guys' faces.
URLs in general are case-sensitive (with the exception of machine names).
W3C reference [w3.org]
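A small Python sketch of what that rule means when comparing two URLs: the scheme and machine name are compared without regard to case, everything else exactly. The function name `same_url_ignoring_host_case` is just something I made up for the example.

```python
from urllib.parse import urlsplit

def same_url_ignoring_host_case(a, b):
    """Compare two URLs: scheme and host ignore case, path and query do not."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (
        pa.scheme.lower() == pb.scheme.lower()
        and pa.netloc.lower() == pb.netloc.lower()
        and pa.path == pb.path      # case-sensitive
        and pa.query == pb.query    # case-sensitive
    )

# Same resource: only the machine name differs in case.
print(same_url_ignoring_host_case("http://WWW.Example.com/Page.html",
                                  "http://www.example.com/Page.html"))  # True

# Different URLs as far as a crawler is concerned: the path differs in case.
print(same_url_ignoring_host_case("http://www.example.com/Page.html",
                                  "http://www.example.com/page.html"))  # False
```

Whether a given server then happens to return the same file for both paths is a separate matter, and that is where the duplicate content question comes from.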
My earlier mentioned (and now very mysterious) article A is no longer in G's index. This morning I discovered that it's gone from the index (again!), both when I search for keywords and when I use the site: operator :(
I really don't know what is going on or why the article keeps appearing and disappearing at this time, but I'd better stop reporting on this issue until Google has made up its mind (I don't want to post confusing reports).
[edited by: OutdoorMan at 1:18 pm (utc) on Jan. 7, 2007]
1. Identifying the possible problems that put you into the supp. pages and fixing them.
2. Renaming the pages and redirecting all requests for the old pages to the new pages.
3. Submitting a new sitemap to Google.
I got put into the supplemental pages and I see I made a stupid mistake by forgetting to keep my meta descriptions unique. I've made the changes along with a few others, and was thinking this would be the fastest way to get re-indexed, rather than waiting for the pages to get moved back to the regular index?
But still the W3C uses uppercase letters in its URLs?
Many do. In the case of the W3, change the case to lower and watch what happens. They force the correct case.
Wiki uses mixed case in their URIs too. I just noticed that they have a flaw in their implementation. If there are underscores separating words, you can change the case on the second/third words and they do not correct it. They do if you change the case on the first word. That could present problems.
I typically have followed the "all lower case" mantra. I'm a Windows person and haven't really given this too much thought until it was just brought up. You don't expect anyone to link to you in a mixed case format. But, if you're on a Windows machine and haven't taken this into consideration, there may be issues to contend with.
This kind of ties in with competitor sabotage. What's to stop someone from building a site full of links that link to your site using mixed case? :(
Regarding case sensitivity, I can implement a code fix for pages. Basically, my code works for any page that doesn't have session variables or other variables after the "?".
it works for
it also works but does pick up the bits after the "?"
I'll do some more experimenting, but can I ask: is it a general consensus that case insensitivity in MS IIS is a serious duplicate content problem?
my code fix is bound to have a detrimental impact on server performance, so
Thanks for your help
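Not the poster's code, but here is one way the "lowercase the page, leave the bits after the ? alone" idea could look, sketched in Python rather than ASP.NET. The function name `lowercase_redirect` is hypothetical, and it assumes every real file on the site is meant to be reached by an all-lowercase path.

```python
from urllib.parse import urlsplit, urlunsplit

def lowercase_redirect(url):
    """Return the lowercase-path URL to 301 to, or None if the path is already lowercase."""
    parts = urlsplit(url)
    lowered = parts.path.lower()
    if lowered == parts.path:
        return None  # nothing to fix, serve the page normally
    # Only the path is changed; the query string is passed through untouched,
    # since parameter names and values may be case-sensitive to the application.
    return urlunsplit((parts.scheme, parts.netloc, lowered,
                       parts.query, parts.fragment))

print(lowercase_redirect("http://example.com/INDEX.HTM?Id=5&Name=Bob"))
print(lowercase_redirect("http://example.com/index.htm?Id=5"))
```

The check is a single string comparison per request, and the redirect only fires for wrong-case requests, so correctly cased internal links would never trigger it.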
However, when I do an inurl: search in Google I get the exact opposite results. All of the pages that are in their regular index are listed, and whatever is in supplemental is not listed.
Could someone explain why I get these results?