|What should be Gbot's response to a 304?|
When my site was down for four days as a result of the Manchester telecoms fire (UK) I lost 3 index pages and several minor pages from the index. (All index pages show toolbar PR 5).
After five or six days I decided to check my logs for Googlebot - all seemed well. One page in particular seemed to be read at least once a day - nevertheless, all remained absent from the index.
I left it another day or so and then decided to study the logs more closely. All I could see for Googlebot was the response code 304. Not being familiar with HTTP codes, I had to look it up - 304 = Not Modified (the response to an If-Modified-Since request when the page has not changed).
Since no pages had changed, the code was correct. However, Google's response to the 304 seems to be "do nothing". This is normally fine, but if the page is currently missing from the index, "do something" is required.
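For anyone else not familiar with the mechanism, here is a rough Python sketch of how a web server typically chooses between 200 and 304. The function name and arguments are made up for illustration; real servers (and whatever Googlebot does with the answer) are more involved than this.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def status_for_request(last_modified: datetime,
                       if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's copy is still current, otherwise 200.

    last_modified     -- the file's modification time (timezone-aware, UTC)
    if_modified_since -- raw If-Modified-Since header value, or None if absent
    """
    if if_modified_since is None:
        return 200  # unconditional request: always send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: behave as if the header were absent
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop the microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # not modified since the client's copy: send no body
    return 200
```

The point for this thread: as long as the file's date stamp is older than the date the bot sends, the server keeps answering 304 and never sends the page content again.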
So to test the theory, I touched all the missing index files. Within a few hours, my logs showed a 200 (OK) code for the most read page. That page is now back in the index after less than 48 hours.
So, if pages go missing from the index for any reason, touch them (or upload them again) and cross your fingers.
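If you have shell or script access rather than a control panel, something like the following Python sketch does the same job as "touch". The function name is just for illustration, and the file names you pass in are your own; it only changes the date stamp, not the content.

```python
import os
from pathlib import Path


def touch_pages(paths):
    """Set the modification time of each existing file to now.

    Returns a dict mapping each touched path to its new mtime.
    Files that do not exist are skipped rather than created.
    """
    updated = {}
    for name in paths:
        page = Path(name)
        if not page.is_file():
            continue  # do not create empty files by accident
        os.utime(page, None)  # None means "set atime and mtime to now"
        updated[str(page)] = page.stat().st_mtime
    return updated
```

After this, the server should report a newer Last-Modified date, so the next If-Modified-Since request for those pages should get a 200 instead of a 304.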
but 304 is normal .. why worry about it .. sorry, not getting your point.
I believe your site would reappear even if you didn't change the page. Code 304 is very common and it does save bandwidth for both Google and us. It is possible that the changes you made improved the look of your page to Google, and that it was not just the fact that you received a code 200 instead of 304. Just my 2 cents.
Doesn't sound like a bug to me.
> So to test the theory, I touched all the missing index files. Within a few hours, my logs showed a 200 (OK) code for the most read page. That page is now back in the index after less than 48 hours.
From your description there was no control sample for comparison. I'm afraid that the results of your experiment don't imply the results you infer.
|but 304 is normal .. why worry about it .. sorry, not getting your point |
The point I'm trying to make is that if a page is already missing from Google's index, then an "If-Modified-Since" request may itself be wrong. But let's assume that Google does have a reference copy of the missing page stored somewhere (not just a date for the missing page). In that case it still appears to behave wrongly by not reincluding the page in the index.
OK, coincidence is a possibility here, but, as a programmer, I can definitely say that this looks like a bug. Hopefully, GG will read this and send out a memo to investigate and, if it is a bug, fix it. But in the meantime, people should be aware that this bug MAY exist, since there is an easy workaround - touch the files (or upload them again) to change the date stamp.
|From your description there was no control sample for comparison. I'm afraid that the results of your experiment don't imply the results you infer |
Yes, absolutely, every experiment requires a control - in this case, the control experiment is ongoing. In particular, my links page is hit (almost) every day but is still missing. Other pages are hit less frequently. I shall experiment with all of these.
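For anyone wanting to run the same comparison on their own logs, here is a small Python sketch that tallies Googlebot hits by page and response code. It assumes an Apache-style access log where the request appears as "GET /page HTTP/1.x" followed by the status code; the regular expression and function name are my own, so adjust them if your log format differs.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it,
# e.g.  "GET /links.html HTTP/1.0" 304
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def googlebot_statuses(lines):
    """Count (path, status) pairs for log lines that mention Googlebot."""
    counts = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue  # only interested in the bot's requests
        match = LOG_LINE.search(line)
        if match:
            counts[(match.group("path"), int(match.group("status")))] += 1
    return counts
```

Pass it an open log file (or any list of lines). A touched page should start showing 200s, while an untouched control page such as a links page should keep showing 304s.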
Kaled: Make some minor changes to your pages and GoogleBot will reload them.
Are you saying that you used the Unix "touch" command to cause the file's modified date to change, making it look to the bot as tho' you had altered the file?
Thank you Kaled for your interesting observation. Your original post was quite clear.
|Are you saying that you used the Unix "touch" command to cause the file's modified date to change, making it look to the bot as tho' you had altered the file? |
To be strictly accurate, I used the touch option from the Plesk Control Panel (File Manager). I thought this was the best option for an initial test rather than uploading new files.
One thing I didn't make clear though is that most (if not all) missing pages are now indexed again but only as URLs, with no titles, snippets, etc. - just as if they were robot-excluded.
From my logs, the two other missing index pages were picked up (HTTP 200) at ~1.00 am 9th April. I suspect they'll soon be back in the Google index.
As I predicted, the two missing index pages from my site have now been reincluded by Google. My links.html page is still missing - also as predicted.
This may not be proof of a Googlebug, but it certainly supports the theory.