We had a bunch of pages come out of the supplementals a few weeks back, only to see them go back into the supplemental results. From watching them, it appears that the pages that came out had recent cache dates; now those pages have old cache dates again (a few months old) and are back in the supplementals.
People change for something good but I don't know why Big B is giving all kinds of junk results these days. I guess something is not right at their end.
>> I work only with microsoft servers and i know that filepaths are not case sensitive <<
That is the problem. The file /thispage.html on the Microsoft server can be accessed through multiple URLs:
and that is duplicate content.
The non-www and www problem is solved with a simple 301 redirect (easy to do on Apache) and the CASE problem just does not occur with Apache.
To access www.domain.com/THISpage.html on an Apache server, you would need a file on the server called /THISpage.html to actually exist too.
Update/feedback on my mentioned supplemental issue (earlier in this thread - page 2):
Today I have noticed that my article (article A) is back in Google -- as a normal URL, not as a supplemental.
- When I use the site: operator (site:www.example.com), it's there.
- When I search for keywords, it's also there (even though it has dropped one position to position #5 when I search for the title -- before it went supplemental it had a position #4).
It looks like everything is back to normal; both article A and B are still in the index. Right now, from where I am, the SERPs seem stable.
Has anybody else noticed a normalisation?
[edited by: OutdoorMan at 3:46 pm (utc) on Jan. 6, 2007]
So, when a Google spider follows a link to a file on a web server,
the link is: example.com/INDEX.HTM
CASE 1: web server is microsoft IIS6
The server accepts the request and returns the existing web page, because it treats the URL as case-insensitive.
CASE 2: web server is apache/unix/linux
What happens? Does it return a 404, or does it do some kind of redirect?
Does it perform the redirect, if there is one, for every request for any page on that domain name?
Does it therefore perform the redirect for internal navigation, i.e. going from /index.htm to /contact_us.htm?
If necessary, perhaps this can be replicated in IIS6 via ASP.NET.
IIS: A request for www.domain.com/page.html or for www.domain.com/PAGE.HTML or any other capitalisation of [(p¦P)(a¦A)(g¦G)(e¦E).(h¦H)(t¦T)(m¦M)(l¦L)] will return the single file on the server that matches any capitalisation version of [(p¦P)(a¦A)(g¦G)(e¦E).(h¦H)(t¦T)(m¦M)(l¦L)] with a status code of "200 OK". The single file on the server, with 8 LETTERS in the filename, can therefore be indexed under 256 different URLs (two choices for each of the 8 letters, so 2^8).
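To put a number on that, here is a minimal Python sketch that enumerates every capitalisation of a filename. The function name case_variants is made up for illustration; it only assumes that letters can be upper or lower case and that the dot stays as it is.

```python
from itertools import product

def case_variants(filename):
    """Yield every upper/lower-case spelling of filename (hypothetical helper)."""
    # Each letter contributes two choices; dots and digits contribute one.
    choices = [
        (ch.lower(), ch.upper()) if ch.isalpha() else (ch,)
        for ch in filename
    ]
    for combo in product(*choices):
        yield "".join(combo)

variants = list(case_variants("page.html"))
print(len(variants))  # 8 letters, two choices each: 2**8 = 256
```

On a case-insensitive server each of those spellings is a distinct URL that returns the same file.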
Apache: A request for www.domain.com/page.html will pull the file page.html from the server if it exists, otherwise will respond with a 404 error.
A request for www.domain.com/PAGE.HTML will pull the file PAGE.HTML from the server if it exists, otherwise will respond with a 404 error.
In Windows create a file called test.txt and then try to create one called TEST.TXT. You will get an error message: "file already exists".
In Linux/Unix, etc, you can create both files. They are treated as being separate names. You need to use the exact capitalisation to access the correct one.
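The difference between the two lookups can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not how either server is implemented; both function names are invented for the example, and `files` stands in for the files that exist on disk.

```python
def lookup_case_sensitive(files, path):
    """Apache on Linux/Unix as described above: exact name or 404."""
    return 200 if path in files else 404

def lookup_case_insensitive(files, path):
    """IIS on Windows as described above: any capitalisation matches."""
    lowered = {name.lower() for name in files}
    return 200 if path.lower() in lowered else 404

files = {"/page.html"}  # the only file that exists on the server
for url in ("/page.html", "/PAGE.HTML", "/Page.Html"):
    print(url, lookup_case_sensitive(files, url), lookup_case_insensitive(files, url))
```

Only the first URL gets a 200 from the case-sensitive lookup, while all three get a 200 from the case-insensitive one.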
There is a module for Apache called mod_speling that can be used to correct incorrectly capitalised URLs. It will either redirect, if only one possible file is found, or present all possible correct URLs, if more than one is found.
Can you confirm that the mod_speling "redirect" returns a 301 status code?
If it does not, then that would be a problem.
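Rather than guess, one way to check is to request the wrongly capitalised URL yourself without following redirects and look at the status line. A rough Python sketch using the standard http.client module follows; the names describe and check, the host www.example.com and the path are placeholders of mine, not anything from this thread.

```python
import http.client

def describe(status):
    """Label a status code in the terms used in this thread (hypothetical helper)."""
    if status in (301, 308):
        return "permanent redirect"
    if status in (302, 303, 307):
        return "temporary redirect"
    if status == 300:
        return "multiple choices"
    if status == 200:
        return "served directly (possible duplicate URL)"
    if status == 404:
        return "not found"
    return "other"

def check(host, path, timeout=10):
    """Send a HEAD request without following redirects; return (status, Location)."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

# Example call (placeholder host and path, substitute your own):
# status, location = check("www.example.com", "/PAGE.HTML")
# print(status, describe(status), location)
```

http.client never follows redirects on its own, so whatever status the server sends for the miscapitalised URL is what you see.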
In the last couple of days, one of our English-language sites [1 of 3 site/page/template designs] which wasn't producing SERP results has started kicking. Prior to that, 3 of our 4 foreign-language sites [4 same template/page/sites] did the same. They are all large sites of 90k-plus pages.
The only thing that we appear to have done differently to you is to balance our IBLs between the main/front page and relevant inner/deep pages. But on the foreign-language sites we have very few IBLs, and they were the first to produce results (7-9 weeks ago).
But the behaviour of the recovery is a little perplexing, and I'd encourage patience. We finished the fixes around 15 August. Five months on from the fix, there appears to be no rhyme or reason to the different timelines of each site's results being released back into the SERPs (except that it starts at fortnightly intervals on a Thursday!).
There is a potential issue with the quality of navigation and theming between pages, which might indicate why our English sites are taking longer.
I'm wondering if any other members could explain their recovery process and if you would encourage patience for internetheaven ( difficult as it is ).
>and that is duplicate content
Actually this technically is not duplicate. What it seems to be, to me, is part of an ongoing war between Google and Microsoft. I am not too sure how Microsoft will respond to this, but I do know it will be putting a few smiles on a few weirdo techy guys' faces.
More like a war between Microsoft and the W3C -- plus the entire rest of the tech world. Technically, those definitely ARE different urls and if they resolve to the same resource, then that IS duplicate content. Ignore this fact at your own risk. I've spent a lot of time helping companies clean up after this kind of mess grew out of hand.
|URLs in general are case-sensitive (with the exception of machine names). |
W3C reference [w3.org]
|URLs in general are case-sensitive (with the exception of machine names). |
But still the W3C uses uppercase letters in its URLs? ;)
Sorry for this... But I'd better correct my earlier statement (yesterday's statement) -- in case someone has been monitoring the issue of my article that suddenly went supplemental.
My earlier mentioned (and now very mysterious) article A is no longer in G's index. This morning I have discovered that it's gone from the index (again!), both when I search for keywords and when I use the site: operator :(
I really don't know what is going on or why the article keeps appearing and disappearing at this time, but I'd better stop reporting on this issue until Google has made up its mind (I don't want to post confusing reports).
[edited by: OutdoorMan at 1:18 pm (utc) on Jan. 7, 2007]
|I really don't know what is going on or why the article keeps appearing and disappearing at this time, ... |
What's going on may be that the folks at the Googleplex are testing different algos and filters. Let's say it's a continuous tweaking process until further notice ;-)
Would it be possible to deal with supplemental results by:
1. Identifying the possible problems that put you into the supp. pages and fixing them.
2. Renaming the pages and redirecting all requests for the old pages to the new pages.
3. Submitting a new sitemap to Google.
I got put into the supplemental pages and see I made a stupid mistake by forgetting to keep my meta descriptions unique. I've made the changes along with a few others and was thinking this would be the fastest way to get re-indexed, rather than waiting for the pages to get moved back to the regular index?
|What's going on may be that the folks at the Googleplex are testing different algos and filters. Let's say it's a continuous tweaking process until further notice ;-) |
Reseller > I really hope so -- otherwise I don't know what I might have done wrong and how to correct it.
|But still the W3C uses uppercase letters in its URLs? |
Many do. In the case of the W3, change the case to lower and watch what happens. They force the correct case.
Wiki uses mixed case in their URIs too. I just noticed that they have a flaw in their implementation. If there are underscores separating words, you can change the case on the second/third words and they do not correct it. They do if you change the case on the first word. That could present problems.
I typically have followed the "all lower case" mantra. I'm a Windows person and haven't really given this too much thought until it was just brought up. You don't expect anyone to link to you in a mixed case format. But, if you're on a Windows machine and haven't taken this into consideration, there may be issues to contend with.
This kind of ties in with competitor sabotage. What's to stop someone from building a site full of links that link to your site using mixed case? :(
regarding case sensitivity, i can implement a code fix for pages, basically, my code works for any page that doesn't have session variables or other variables after "?"
it works for
it also works but does pick up the bits after the "?"
I'll do some more experimenting, but can i ask, is it a general consensus that case insensitivity in MS IIS is a serious duplicate content problem?
my code fix is bound to have a detrimental impact on server performance, so
Thanks for your help
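For what it's worth, the logic of such a fix can be sketched like this in Python. The actual IIS/ASP.NET code will look different; canonical_target is a name invented here, and the choice to lower-case only the path while leaving everything after the "?" untouched is an assumption about what is wanted.

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_target(url):
    """Return the lower-case URL to 301 to, or None if url is already canonical."""
    parts = urlsplit(url)
    # Only the path is lower-cased; the query string is passed through as-is
    # because parameter values may legitimately be case-sensitive.
    lowered = parts.path.lower()
    if lowered == parts.path:
        return None  # already canonical, serve the page normally
    return urlunsplit((parts.scheme, parts.netloc, lowered, parts.query, parts.fragment))

print(canonical_target("http://www.example.com/INDEX.HTM"))
print(canonical_target("http://www.example.com/Contact_Us.htm?Id=AbC"))
print(canonical_target("http://www.example.com/index.htm"))
```

A request that is already lower case costs one string comparison, and only the wrongly capitalised ones get the extra round trip of a redirect, so the performance hit may be smaller than feared.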
One of my sites seems to be going out of a 1 Year supplemental listing on the following dcs :
When I do site: search in Google I get a listing of all our pages that are in supplemental, I do not see any url's that are not in supplemental.
However, when I do an inurl: search in Google I get the exact opposite results. All of the pages that are in their regular index are listed, and whatever is in supplemental is not listed.
Could someone explain why I get this type of result?