Then in August, without any changes to the site, it all just fell apart. Pages fell out of the index and visitors dried up. Our previously fine site map was suddenly way too large (did Google change the rules?) and spidering all but stopped.
We noticed some of our competitors ranking way up with huge link counts made of 'self-referencing' links. How can this be? We also noticed dynamic URLs being indexed, which I read in the Google FAQ shouldn't happen, e.g.
www.whatever.foo?this=that&that=theother
We have now adjusted our site and site map to fit, but spidering is patchy and very few pages are cached. This seems to fit with other people's comments on these forums about the 'sandbox' or 'strange bot activity', either too much or none.
Has Google imploded, did a rule change backfire, or is this a big shake-up to compete with the new MSN? What are the rules? How can an honest site compete?
I think I found your problem... do you have Apache rewrite enabled?
Did Google spider both /page.php and /?this=that&that=theother and declare them dupes?
If this is it, you owe me 40 cents, cash ;)
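If duplicate query-string URLs are the culprit, one common fix is to 301-redirect the parameterized duplicate to the canonical page, so only one version stays in the index. A rough .htaccess sketch, assuming mod_rewrite is enabled; /page.php and the parameter names are just placeholders from the example above, not anything from the poster's actual site:

```apache
RewriteEngine On
# Match requests to / carrying the duplicate query string ...
RewriteCond %{QUERY_STRING} ^this=([^&]+)&that=([^&]+)$
# ... and permanently redirect them to the canonical page;
# the trailing "?" drops the original query string.
RewriteRule ^$ /page.php? [R=301,L]
```

A 301 tells spiders the duplicate address is gone for good, so they should consolidate on the canonical URL rather than flagging dupes.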
While I am very happy that you have all contributed, please don't take it the wrong way when I ask what a debate on TOS has to do with my question ;-)
walkman - thanks for your help. I need to clarify: the address given is an example of the sort of self-referencing links that currently appear to be boosting our competitors' link counts. Our link count is a pathetic 17, BUT all 17 are valid white-hat inbound links.
Why are some sites able to boost their link counts this way? Or is this something Google used to allow but is now phasing out? Is this perhaps why sites are slipping, because the link count drops slowly, without warning, from where it was with a gazillion self-references?
Is the way Google handles links at the heart of the sandbox issue too? Old URLs = old link rules (self-references OK); new URLs = new rules (strict inbound links).
Perhaps the 'being moved to the sandbox' effect is when google catches up with the site and applies the new rules to it.
How many internal links show up comes down to a combination of (usually) home page PR and the internal linking structure.
With only 17 inbound links your PR may be too low, or your site design may not pass it properly to your internal pages.
We have side nav bars on every page with a fixed set of links to the main 40 sub-category index pages and to our home page (which is PR3). That means we have about 4,000 links to our home page and to each of the 40 sub-category index pages.
We also have a site-map tree that ultimately links to every page on the site within a maximum of 3 clicks.
Sometime in June 2004, the pages that drew the most traffic via Google searches disappeared from the index, and eventually about 90% of the remaining pages followed. This nearly coincided with a hosting IP and MS IIS software upgrade, so I pursued the possibility that my problem was identical to someone else's who also hosts on MS IIS: a UTF-8 encoding problem that Google could not interpret when querying that type of host. My sitemap was encoded in UTF-8 at the time, which I later changed. I could never verify either way what was going on with my particular site, because my host refused to let me see the server logs. So I recently moved to a host that will.

Moving hosts did not actually solve the problem either, but as I say, resubmitting the URL to Google seems to have got the site showing in the index again, and it now lists a lot of dynamic URLs that it never had before, from a PHP bulletin board I run there. I suspect Googlebot hung up on that encoding and decided not to list the whole site because of it? That is speculation, and I wonder if anyone can verify that this is what Googlebot will do when it has too many dynamic pages to index. Also worth noting: other engines like Yahoo and AltaVista had no problem keeping up with the site, and simply ignored the bulletin board entirely.
Right now, my site is indexed, but not 100% the way it is actually structured. Because I moved hosts, some pages that were case-sensitive no longer show up. At one point, through all the struggling to figure out what was going on, I mistakenly posted the site URL in a newsgroup that Google indexes, but without the 'www', i.e. [mysite.com...], and Google placed this URL in its index at about the same time the rest of my site disappeared, but did not reindex it as such. Currently the cached pages of the site all carry a date of Dec 31, 1969, and have for the last month or so since I resubmitted the URL to Googlebot.
Any speculations or comments on what's going on are welcome.
During the move to the new host, a couple of pages wound up with links to each other that ignored case sensitivity. The old MS IIS host didn't care about case and worked fine, but the new host is Linux, and there it matters. So, as an example, I had a page named:
product3.html
which had to be changed to Product3.html to avoid a lot of unnecessary alteration of code across the site.
In actuality, there are about 3-4 pages in total that now start with a capital letter instead of lower case, and the new server returns a 404 when the old names are requested. As a solution, I made a custom 404 page, which this host lets you do. In it, I listed the new page names alongside the old ones to help searchers get where they need to go. Going by the server logs, though, MANY visitors still cannot make the connection to the new pages and leave when they see the 404 page come up. This is not directly related to the point I want to make, but while I am here: if I further customize the error page to redirect to, say, the site map, would Google frown on that redirect?
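For what it's worth, rather than routing people through a custom 404 page, a permanent redirect from each old lower-case name to its new name is usually cleaner: browsers follow it silently, and search engines generally treat a 301 as "the page moved", whereas a redirect served from a 404 can look like an error to a spider. A sketch for an Apache/Linux host, using the standard mod_alias Redirect directive; only product3.html comes from the post above, the rest is assumption:

```apache
# Map each renamed page's old lower-case URL to its new
# capitalized name with a permanent (301) redirect, so both
# visitors and spiders land on the page instead of a 404.
Redirect 301 /product3.html /Product3.html
# ...repeat one line per renamed page (about 3-4 in total).
```

This goes in the site's .htaccess (or the virtual host config) and needs no changes to the pages themselves.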
Last night I submitted one of the NEW page URLs, with the corrected capital letter, to Google. Before that, Google had no record of the corrected page name and was not crawling the site either, but it still listed the old lower-case page. Submitting the corrected URL seems to have made no difference so far. I can't tell if Google has visited lately, because my server log just reset and I didn't have it auto-save at the end of the month. I guess my remaining questions are: why is Google still listing the pages with the lower case, and, probably more important, what is the date they all show:
Cached Copy as retrieved on Dec 31, 1969 23:59:59 GMT.
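On that date: Dec 31, 1969 23:59:59 GMT is one second before the Unix epoch (Jan 1, 1970 00:00:00 GMT), which is what you get when software formats a missing or -1 timestamp. So it almost certainly means Google has no real cache date stored for those pages, not that it cached them in 1969. A quick check (Python here is just my choice for illustration, not anything Google uses):

```python
from datetime import datetime, timezone

# A timestamp of -1 (one second before the Unix epoch) formats
# as exactly the cache date shown on those pages.
cached = datetime.fromtimestamp(-1, tz=timezone.utc)
print(cached.strftime("%b %d, %Y %H:%M:%S GMT"))  # Dec 31, 1969 23:59:59 GMT
```

In other words, the "1969" cache stamp is a placeholder for "no data", consistent with Google listing the URLs without having properly recrawled them.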