After five or six days I decided to check my logs for Googlebot - all seemed well. One page in particular seemed to be read at least once a day - nevertheless, all remained absent from the index.
I left it another day or so and then decided to study the logs more closely. All I could see for Googlebot was the response code 304. Not being familiar with HTTP codes, I had to look it up - 304 = Not Modified (the response to a request carrying an If-Modified-Since header when the page has not changed).
Since no pages had changed, the code was correct. However, Google's response to the 304 seems to be to do nothing. This is fine in general, but if the page is currently missing from the index, some action is required.
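To make the 304 behaviour concrete, here is a minimal Python sketch of the decision a web server makes when a crawler sends If-Modified-Since. It is only an illustration of the general HTTP rule, not the code of any particular server or of Googlebot; the function name conditional_status and the dates are made up for the example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the file is unchanged since the client's copy, else 200.

    if_modified_since: raw header value sent by the client, or None.
    last_modified: timezone-aware datetime of the file's date stamp.
    (Hypothetical helper, for illustration only.)
    """
    if not if_modified_since:
        return 200  # unconditional request: always send the page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it and send the page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat bare dates as UTC
    # HTTP dates have one-second resolution, so drop sub-second precision.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


# Made-up date stamps: the crawler's copy versus a file touched a week later.
crawled = datetime(2003, 4, 1, 12, 0, 0, tzinfo=timezone.utc)
touched = datetime(2003, 4, 8, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(crawled, usegmt=True)

print(conditional_status(header, crawled))  # file untouched since the crawl
print(conditional_status(header, touched))  # file touched after the crawl
```

With an untouched file the server keeps answering 304 and never sends the page body; once the date stamp is newer than the date in the header, the same request gets a 200 and the full page.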
So to test the theory, I touched all the missing index files. Within a few hours, my logs showed a 200 (OK) code for the most read page. That page is now back in the index after less than 48 hours.
So, if pages go missing from the index for any reason, touch them (or upload them again) and cross your fingers.
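For anyone without a control panel or shell access to "touch", the same effect can be sketched in Python with os.utime, which only changes the date stamp and leaves the content alone. The helper name touch_files and the file names in the usage note are hypothetical.

```python
import os
import time
from pathlib import Path


def touch_files(paths, when=None):
    """Set the access and modification times of each existing file.

    `when` is a Unix timestamp; it defaults to the current time.
    Paths that are not regular files are skipped. Returns the files touched.
    (Hypothetical helper, for illustration only.)
    """
    stamp = time.time() if when is None else when
    touched = []
    for path in map(Path, paths):
        if path.is_file():
            os.utime(path, (stamp, stamp))  # (atime, mtime)
            touched.append(path)
    return touched
```

A call such as touch_files(["index.html", "links.html"]) would update the date stamps of those two files if they exist in the current directory. Whether the server then reports the new date depends on it deriving Last-Modified from the file's date stamp, which is the usual case for static files but is an assumption here.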
Kaled.
From your description there was no control sample for comparison. I'm afraid that the results of your experiment don't imply the results you infer.
but 304 is normal .. why worry about it .. sorry, not getting your point
The point I'm trying to make is that if a page is already missing from Google's index, then an "If-Modified-Since" request may itself be wrong. But even assuming that Google does have a reference copy of the missing page stored somewhere (not just a date for the missing page), it still appears to behave wrongly by not reincluding the page in the index.
OK, coincidence is a possibility here, but, as a programmer, I can definitely say that this looks like a bug. Hopefully, GG will read this and send out a memo to investigate and, if it is a bug, fix it. But in the meantime, people should be aware that this bug MAY exist, since there is an easy workaround - touch the files (or upload them again) to change the date stamp.
From your description there was no control sample for comparison. I'm afraid that the results of your experiment don't imply the results you infer
Yes, absolutely, every experiment requires a control - in this case, the control experiment is ongoing. In particular, my links page is hit (almost) every day but is still missing. Other pages are hit less frequently. I shall experiment with all of these.
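To keep track of the control pages, something like the following Python sketch could tally the response codes Googlebot received for each page. It assumes the logs are in the Apache "combined" format and that the user agent contains the string "Googlebot"; both are assumptions, and the function name googlebot_statuses is made up for the example.

```python
import re
from collections import Counter, defaultdict

# Assumed Apache "combined" log format; adjust the pattern for other layouts.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_statuses(lines):
    """Count response codes per requested path for Googlebot requests.

    Lines that do not match the assumed format are ignored.
    (Hypothetical helper, for illustration only.)
    """
    tally = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            tally[match.group("path")][match.group("status")] += 1
    return tally
```

Feeding it the lines of an access log gives, for each path, a count of 200s and 304s, so a page that only ever shows 304 (such as an untouched control page) stands out from one that was fetched in full after being touched.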
Kaled.
Are you saying that you used the Unix "touch" command to change the files' modified dates, making it look to the bot as though you had altered the files?
To be strictly accurate, I used the touch option from the Plesk Control Panel (File Manager). I thought this was the best option for an initial test rather than uploading new files.
One thing I didn't make clear though is that most (if not all) missing pages are now indexed again, but only as URLs with no titles, snippets, etc. - just as if they were robot-excluded.
From my logs, the two other missing index pages were picked up (HTTP 200) at around 1:00 am on 9th April. I suspect they'll soon be back in the Google index.
Kaled.