| 10:10 pm on Aug 16, 2004 (gmt 0)|
steveb, you're 100% on the money as usual.
What I don't understand, however, is what has changed? I'm not the only one reporting this behavior - I've spoken to 15 other people who have had the same thing happen.
Something big changed last week. I think Google's getting ready to make a big announcement when they IPO.
FWIW - Even with the homepage drop, G traffic to this site was up something like 20% last week.
| 10:22 pm on Aug 16, 2004 (gmt 0)|
I think we first saw a change when the threads began appearing about URL only listings a few months ago. Now, my guess is they simply did the duplicate equivalent of an algo update... they added some factor in detecting duplicates (or perhaps removed some type of element that protected benign duplicates).
| 11:32 pm on Aug 16, 2004 (gmt 0)|
|Hey bakedjake, I think I've found your problem. There's no penalties or anything like that; it's because you're splitting your internal linkage between two different root pages. Suppose your domain is yourdomain.com. Do the search [site:yourdomain.com inurl:https]. See those 912 https results? Click on the cached page for the first one. Now mouseover the link in the top left. Instead of pointing to http://www.yourdomain.com/, notice that it points to http://www.yourdomain.com/index.asp. All the https cached pages that I checked link to the index.asp version of your home page, and there were enough of your internal pages doing that that you convinced us that http://www.yourdomain.com/index.asp was the canonical root page for your domain. That's why when you do the search [yourdomain] you see http://www.yourdomain.com/index.asp as the first result.|
So your hundreds of internal https pages pointing to a different location is clouding the water some. If you fix them to all point to the same location (http://www.yourdomain.com/) then we'll crawl all those pages in a little while, and you should be fine shortly afterwards.
I guess the take-home message for non-bakedjake members is to double-check your internal linkage. I'd pick a canonical root page like http://www.yourdomain.com/ and just stick with that by making sure any internal pages point there instead of to other versions of your root page.
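That double-checking can be automated. Here is a minimal sketch of the idea (not a Google tool; the domain and the list of non-canonical variants are hypothetical examples you would replace with your own):

```python
from html.parser import HTMLParser

# The one canonical home page URL, plus the variants that should
# all be rewritten to it (hypothetical values for illustration).
CANONICAL = "http://www.yourdomain.com/"
VARIANTS = {
    "http://www.yourdomain.com/index.asp",
    "https://www.yourdomain.com/index.asp",
    "http://yourdomain.com/",
    "/index.asp",
}

class LinkChecker(HTMLParser):
    """Collect href values that point at a non-canonical home page URL."""
    def __init__(self):
        super().__init__()
        self.offenders = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value in VARIANTS:
                    self.offenders.append(value)

def find_noncanonical_links(html_text):
    """Return the list of home-page links that don't use the canonical form."""
    checker = LinkChecker()
    checker.feed(html_text)
    return checker.offenders

# Example: a page that links to the home page two different ways.
page = '<a href="/index.asp">Home</a> <a href="http://www.yourdomain.com/">Home</a>'
print(find_noncanonical_links(page))  # → ['/index.asp']
```

Run something like this over your internal pages (including the https ones) and fix every link it flags to point at the single canonical URL.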
| 11:44 pm on Aug 16, 2004 (gmt 0)|
Hey GG, long time no see! Thanks for chiming in.
Anticipating the problem was in fact the duplicate linking (thanks Marcia, who originally suggested the idea to me), I've gone ahead and implemented step one (see msg21) of my plan. So it should be corrected soon, and I'll let everyone know.
But GG, what changed last week/over the weekend? :) Whatever it was, you sure hit a lot of people with it.
New duplicate content filter?
By my count, there are more links pointing to
www.domain.com/index.asp, unless you count HTTPS links. Did you change the parsing/weight/something of https pages? ;-)
[edited by: bakedjake at 12:00 am (utc) on Aug. 17, 2004]
| 11:59 pm on Aug 16, 2004 (gmt 0)|
Google has been going after duplicate content in a big way for months now; we were the first to spot a duplicate filter and report findings and examples of it in action. Maybe they are just now getting around to applying it to links as well as content.
| 10:05 pm on Aug 17, 2004 (gmt 0)|
Question on this topic - would it make a difference if you are linking to:
[mydomain.com...] or [mydomain.com...]
With the extra / being the only difference there.
| 10:49 pm on Aug 17, 2004 (gmt 0)|
No different. But if you notice, whenever you type an address without the trailing /, the server actually redirects it to the one with the trailing /.
A trailing / usually tells the server that the request is for a directory, so it should serve the default index file or list the directory's contents (depending on the server configuration).
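To keep internal links consistent on that point, the two forms can be normalized before comparison. A small sketch, under the assumption (a heuristic, not a rule) that a last path segment without a dot is a directory:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_trailing_slash(url):
    """Append a trailing slash to directory-style paths so that
    mydomain.com/widgets and mydomain.com/widgets/ compare equal."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Heuristic: treat a last segment with no dot as a directory.
    last = path.rsplit("/", 1)[-1]
    if last and "." not in last:
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))

print(normalize_trailing_slash("http://www.mydomain.com/widgets"))
# → http://www.mydomain.com/widgets/
```

File-style URLs such as /index.asp pass through unchanged, so this only smooths over the directory case the posts above discuss.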
| 11:28 pm on Aug 17, 2004 (gmt 0)|
Thanks AthlonInside - that was what I was thinking - just wanted to confirm. Can't be too safe these days ;)
| 7:31 pm on Aug 18, 2004 (gmt 0)|
bakedjake, the same thing happened to me last month and I freaked out. If you have all your PR, just wait - or, in fact, what I did in desperation was use the submit tool again. I know it's stupid, as I have 500 external links coming into that site and it should be picked up daily, but by luck or not, a day later I was back in the index.
You never know - a competitor could have used the site removal tool!
| 7:45 pm on Aug 18, 2004 (gmt 0)|
I didn't see that GG had spoken. I am not worthy, given the analysis above. Shame on me.
| 9:27 pm on Aug 19, 2004 (gmt 0)|
Is it worth sending an email to Google letting them know if this has happened to you?
| 9:44 pm on Aug 19, 2004 (gmt 0)|
No - now that we know the problem, just fix it.
| 1:52 am on Aug 20, 2004 (gmt 0)|
For all who are curious and have the same problem, the homepage just popped back in.
Roughly 4 days from correcting the links to the homepage coming back.
| 2:06 am on Aug 20, 2004 (gmt 0)|
bakedjake, so all that's needed to see if one has been a victim of the wrongly-linked home page thing is to type
www.mydomain.com into Google and see what comes up?
A "no information..." result means the page is MIA?
I've read about it but I'm not too familiar with this.
| 2:19 am on Aug 20, 2004 (gmt 0)|
That is correct, and indicative of the problem that I had as well as some others I talked to. But I could envision it happening to any page that can be accessed two different ways and serves the same content, such as a directory:
The advice and simple solution to the problem is to make sure you are consistent with your page references.
| 2:36 am on Aug 20, 2004 (gmt 0)|
bakedjake, glad that we picked your preferred canonical page again.
| 3:22 am on Aug 20, 2004 (gmt 0)|
I believe that the PR0 problem for the index page is not a Google bug.
I believe it is a strategic error and a lack of common sense in their new algorithm.
Because in *any reasonable* algorithm, the index (main entry) page should always inherit PR from the page with the *highest* PR on the site. In other words, the PR of a site's index page should always be the highest on the site.
This is very easy to implement, and if they did not do it, they lost their common sense and made the biggest error in Google's history. And it seems that they really did make this error.
Common sense tells us that the index page represents the site. If we compare the site to a person or company, and a page on the site to a "product", we may say that people usually measure the authority (PR) of a person by the authority (PR) of their greatest product. They often prefer to search for the company or author (i.e. the index page) rather than for a particular product, because they believe that if a company made one good product, its other products may be good too. And in the general case they are right.
Because this is actually consumer demand, it is in Google's interest to quickly fix this problem and make the index page as findable as the highest-PR page on the site.
I would like to stress that the index page *requires special treatment*, because without it the index page *in general* has a much lower rank than other site pages. For example, when I link to other sites, I try to link to the page with specific information relevant to the topic of my page. And (surprise!) as a rule it is not an index page. Others seem to do the same. When I now try to find the new part of my site in Google, I first find numerous download sites that point to me, then (you guessed right!) my download page, and only after them all my index page. It is like suggesting people come in through the window instead of the door.
As for what happened recently, I believe that Google virtually blocked PR from spreading to other pages of the same site. Their error was that they seemingly did not understand that the index page should be an exception. You may play with and adjust spreading coefficients, but not for the index page. It should always inherit the highest PR.
| 10:04 am on Aug 20, 2004 (gmt 0)|
"In other words PR of the index page of the site should always be the highest one in the site."
There is no reason for this to be the case, or assumed.
The problem here is mostly due to sloppy webmastering, and of all the things Google can be blamed for, that isn't one. Maybe they should be somewhat better at mindreading the sloppiness of some websites, but they are at best a junior partner in the blame.
| 1:29 pm on Aug 20, 2004 (gmt 0)|
>>splitting your internal linkage between two different root pages
| 2:22 am on Aug 21, 2004 (gmt 0)|
|"In other words PR of the index page of the site should always be the highest one in the site." |
There is no reason for this to be the case, or assumed. The problem here is mostly due to sloppy webmastering...
I meant that the index (main entry) page should always inherit PR from the page with the *highest* PR on the site.
Because common sense tells us that the authority (PR) of the site as a whole (the index page) should be as high as the authority (PR) of its best page.
For example, the authority of Shakespeare should be no less than the authority of his best play, say Hamlet.
It should not depend on the webmaster's skill, because it is a matter of consumer demand and not of author/company self-promotion. People often prefer to remember and search by companies and authors rather than by products.
With Shakespeare everything happens to be OK, but the problem still seems to be here.
For example, when I searched for Adobe, I saw the Acrobat page first. Again Google suggests we go through the window instead of the door.
But they have probably already fixed something. Now when I search for my product, I still see it after all the download pages that point to it.
But now I see my index page above my download page, whereas previously I saw my download page only. I changed nothing.
| 3:40 am on Aug 21, 2004 (gmt 0)|
> Their error was...
It doesn't look like it was an error. More like an unintended side effect. And I'll stop there.
Why would Google spend so much time and so much of their money paying expensive PhDs AND risk an entire index, or half an index, JUST to fix the sloppiness of webmasters? Seriously.
A bug is a bug is a bug...
| 10:04 am on Aug 21, 2004 (gmt 0)|
Can some veteran give specific instructions on linking... in a nutshell? I found it too confusing!
What if somebody links to me as mydomain.com/index.htm, while my intention, for Google's understanding, is "www.mydomain.com/"?
Or say somebody links to me as www.mydomain.com and I internally link my homepage as www.mydomain.com/index.htm?
| 10:23 am on Aug 21, 2004 (gmt 0)|
I've had this problem - a disappearing site after I put some navigation pointing to index.html and other links pointing directly to the domain. Oddly, these are old sites (98/99) and this problem only started to rear its ugly head over the past few months.
In fact, I'm waiting for a site to return from the Google Grave as I type since it went missing last week due to a similar mistake of mine. Oops! (I blame Dreamweaver Guv')
On a broader issue - no other search engine seems to have this problem, so why is Google tripping on this?
| 9:13 pm on Aug 21, 2004 (gmt 0)|
I have a site with 2 different domain extensions (for different countries/languages) which until recently had different content on the front page (i.e. different languages). A month ago I changed it to the same content, and for the past 2 weeks one of the domains has had a PR0 while the other still has its original PR.
It looks like the same 'error' that happened to the other sites mentioned in this thread: Google thought it was duplicate content and therefore devalued one of the domains. The only difference here is that the domain is still in the Google index (cache).
Internal pages still have their usual PR.
Is this the same bug? And, more important for me: I've now changed the content back to the original - how long would it take to get the PR back? It would really help me if it could get its original PR back by the end of this month. (So if GoogleGuy could help me out here.....)
| 1:55 am on Aug 22, 2004 (gmt 0)|
I have found hundreds of sites that Google has fully indexed as www.domain.com?source=looksmart while it only has the true domain partially indexed (no title or description). I have also found hundreds of business.com tracking URLs indexed. Several of my pages have been fully indexed under tracking URLs since the end of May, while Google shows fully indexed versions with tracking URLs and dynamic strings from other PPC engines.
So while everyone is busy changing links or making sure they don't happen to have one stray link, the fact is that unless you have some magic way of controlling every link on the internet, Google may completely screw up the way it indexes pages. I have seen things like this in the past, but they always seemed to be fixed a few days or weeks later. It's been about three months since I started seeing this, and it doesn't look like any progress is being made at all.
I was hoping they would fix it, but I am seriously doubting it now. I just wonder why they would fully index a ppc tracking url and not the actual page?
Hey Google, what's your problem? Nobody can control every link on the internet and why in the world would you index results from other SE's and PPC engines?
The tracking URLs I have indexed have ZERO backlinks; the real URLs have hundreds, including from the GOOGLE directory!
You would think that they would at least get urls in their own directory right...
| 3:43 am on Aug 23, 2004 (gmt 0)|
Answer to robertito62 reply to my message #50
|It doesn't look it was an error. More like an unintended side effect. And I stop here. |
I agree; forgive my poor English. Your term is better.
|Why would Google spend so much time, so much of their money paying expensive PhDs AND risk an entire index or half an index JUST to fix sloppiness of webmasters? Seriously. |
Because otherwise they risk losing customers.
A PR0 index page (while other pages on the site have high PR) produces results that are not what customers expect and wish for. So their customers may begin to use other search engines more.
| 3:13 pm on Aug 23, 2004 (gmt 0)|
|PR0 for index page (while other site pages have large PR) produce not what the customers expect and wish. So their customers may begin to use other search engines more. |
Vadim, off-topic, 95% of the customers of Google don't know what PageRank is, let alone actually care about it. Look at how many people use MSN "because it is the default".
Vadim, on-topic, PR0 of the index page is not what was happening. The index page still returned PR.
|More like an unintended side effect. |
That's exactly what it was. And it looks like G changed something on Thursday. I thought our homepage came back because of the changes we made, but in talking to other webmasters with the problem I'm starting to think something was changed on their end as well.
| 4:50 pm on Aug 23, 2004 (gmt 0)|
At about midnight Saturday night, my index page disappeared from the Google index. All my other pages still appear listed. My home page had been number one at G for a fairly high traffic search term for many years, so it was a shock when it just disappeared. I hadn't made any significant changes to the site recently, nor had I done anything I can think of to trigger this. It just appears to be a change at G.
I've read this thread and others here, but I'm still a novice in this area. Is there any action I can take at this point to get my index page listed again? Should I submit it? Should I email G? Would they even respond?
Thanks in advance.
| 2:43 am on Aug 24, 2004 (gmt 0)|
|Vadim, off-topic, 95% of the customers of Google don't know what PageRank is, let alone actually care about it. Look at how many people use MSN "because it is the default". |
Vadim, on-topic, PR0 of the index page is not what was happening. The index page still returned PR.
Sorry, I implied that the dropping of the index page from the Google index and PR are related. I am probably wrong, but it does not matter for the main idea.
I of course understand that people do not care about PR. They also do not care about a webmaster's skill or the other reasons why they cannot find the main entrance to the site (the index page). They blame Google.
I meant that people simply would like to find the index page as easily as the most popular or top-positioned page on the site, because the index page is like the main entrance, like a representative of a company or author.
Since it is a consumer demand, it should be satisfied independently of the webmaster's skill, because the best content authors are not always the best webmasters.
For example, Internet Explorer is the browser most forgiving of HTML errors, and I believe that is because Microsoft did the research and found out (surprise!) that people do not care about webmasters' errors but prefer the browser that can fix those errors where possible. If the content is good, of course.
| 3:07 am on Aug 24, 2004 (gmt 0)|
|Is there any action I can take at this point to get my index page listed again? Should I submit it? Should I email G? Would they even respond? |
1. Google responds, but mostly with standard letters that suggest reading the advice for webmasters on Google's site. That advice is indeed worth reading.
2. The only reliable approach seems to be to get as many good (high-PR) relevant links to the index page as possible. And fill it with good content (though I should admit the index page is often not the most appropriate place for a large amount of content, from the visitor's point of view). And make some relevant outbound links (though again, the index page is often not the most convenient place for them).
This is trivial, this is hard, but it seems this is the only reliable way. Because Google actually advises the same.
| 4:38 am on Aug 24, 2004 (gmt 0)|
In a nutshell: Pick one canonical URL for your site, usually www.example.com/ or example.com/, and stick with it. For internal links and requested incoming links, use that one root URL. Internally, you can use
<a href="/">, <a href="http://www.example.com/"> or <a href="http://example.com/"> -- the first form being a root-relative path and the second two being fully-qualified "canonical" URLs -- but always be consistent with the domain name if you use the canonical form. If your server receives a request for the "wrong" domain name, it should invoke a 301 Moved Permanently redirect to the "correct" domain name. This can be implemented using mod_rewrite on Apache, or with the "control panel" or ISAPI Rewrite on IIS.
The key is to be absolutely consistent in your internal linking, and to try to get those who link to you to use the "correct" domain name as well. Implementing the 301 redirect will help in this respect if the Webmaster adding a link to your site verifies his/her work.
While we have recently had some reports of a specific search engine mishandling 301 redirects, the 301 is still "the right thing to do," and we'll have to leave it to that search provider to bring their redirect handling into conformance with RFC 2616 [w3.org].
Personally, I believe it is a mistake for search engines to try too hard to compensate for incorrectly-configured sites, at least insofar as it results in problems for correctly-configured sites.
Anyway, in a nutshell, always link to any given resource by one URL and one URL only; consistency is what matters.
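On Apache, the mod_rewrite approach mentioned above might look something like this (a sketch assuming mod_rewrite is enabled and that www.example.com is the preferred host; substitute your own domain and home page filename):

```apache
# .htaccess: 301-redirect any non-canonical host to the canonical one
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Also collapse direct /index.html requests onto the root URL
RewriteCond %{THE_REQUEST} \s/index\.html[\s?] [NC]
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]
```

The THE_REQUEST condition on the second rule keeps internally rewritten requests (the DirectoryIndex serving / as index.html) from looping; only an explicit browser request for /index.html gets redirected.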