
Google News Archive Forum

    
What should be Gbot's response to a 304?
kaled




msg:168121
 10:56 pm on Apr 9, 2004 (gmt 0)

When my site was down for four days as a result of the Manchester telecoms fire (UK) I lost 3 index pages and several minor pages from the index. (All index pages show toolbar PR 5).

After five or six days I decided to check my logs for Googlebot - all seemed well. One page in particular seemed to be read at least once a day - nevertheless, all remained absent from the index.

I left it another day or so and then decided to study the logs more closely. All I could see for Googlebot was the response code 304. Not being familiar with HTTP codes I had to look it up - 304 = Not Modified (the response to a request carrying an If-Modified-Since header when the page has not changed since that date).
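For anyone wanting to run the same check, here is a rough Python sketch that tallies the status codes Googlebot received per page. It assumes the Apache "combined" log format, and the sample lines are made up for illustration; they are not taken from the logs discussed in this thread.

```python
import re
from collections import Counter

# Assumes Apache "combined" log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_status_counts(lines):
    """Count (path, status) pairs for requests whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "googlebot" in match.group("agent").lower():
            counts[(match.group("path"), int(match.group("status")))] += 1
    return counts

# Made-up sample lines, for illustration only.
sample = [
    '192.0.2.1 - - [08/Apr/2004:10:00:00 +0000] "GET /index.html HTTP/1.0" '
    '304 - "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.1 - - [09/Apr/2004:01:00:00 +0000] "GET /index.html HTTP/1.0" '
    '200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '198.51.100.7 - - [09/Apr/2004:02:00:00 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0"',
]

for (path, status), n in sorted(googlebot_status_counts(sample).items()):
    print(path, status, n)
```

A page that shows only 304 entries for Googlebot over several days is the pattern described above.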

Since no pages had changed, the code was correct. However, Google's response to the 304 seems to be to do nothing. That is fine when the page is already indexed, but if the page is currently missing from the index, some action is required.
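To make the server side of this concrete, here is a minimal Python sketch of how a web server might choose between 200 and 304 for a static file. The function name and arguments are hypothetical, and real servers also consider other headers (ETag, for example), so treat it as an illustration rather than how any particular server works.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def status_for_conditional_get(last_modified, if_modified_since=None):
    """Return 304 if the file has not changed since the supplied date, else 200.

    last_modified: timezone-aware datetime of the file's modification time.
    if_modified_since: raw If-Modified-Since header value, or None if absent.
    """
    if if_modified_since is None:
        return 200  # Unconditional request: always send the full page.
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # Unparseable date: ignore the header.
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200

# Illustrative dates only.
header = "Sat, 03 Apr 2004 12:00:00 GMT"
unchanged = datetime(2004, 4, 1, 9, 0, tzinfo=timezone.utc)
touched = datetime(2004, 4, 8, 9, 0, tzinfo=timezone.utc)
print(status_for_conditional_get(unchanged, header))
print(status_for_conditional_get(touched, header))
```

The first call models a file older than the date the bot sent, and the second models the same file after its date stamp has been updated.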

So to test the theory, I touched all the missing index files. Within a few hours, my logs showed a 200 (OK) code for the most read page. That page is now back in the index after less than 48 hours.

So, if pages go missing from the index for any reason, touch them (or upload them again) and cross your fingers.
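If you have script access rather than a control panel, touching files could look something like the Python sketch below. The helper name and the file names are made up, and the demo runs in a temporary directory so nothing real is modified; it only changes the date stamp, not the content.

```python
import os
import tempfile
import time
from pathlib import Path

def touch_files(paths, when=None):
    """Set access and modification times of existing files to `when` (default: now).

    Returns the list of files that were actually touched; missing paths are skipped.
    """
    stamp = time.time() if when is None else when
    touched = []
    for path in map(Path, paths):
        if path.is_file():
            os.utime(path, (stamp, stamp))
            touched.append(path)
    return touched

# Demo in a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as site:
    page = Path(site) / "index.html"
    page.write_text("<html></html>")
    os.utime(page, (0, 0))  # Pretend the file is very old.
    done = touch_files([page, Path(site) / "missing.html"])
    print([p.name for p in done], page.stat().st_mtime > 0)
```

Whether the web server then reports the new date depends on it deriving Last-Modified from the file's modification time, which is an assumption here.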

Kaled.

 

seofreak




msg:168122
 2:43 am on Apr 10, 2004 (gmt 0)

but 304 is normal .. why worry about it .. sorry, not getting your point.

MarkHutch




msg:168123
 2:50 am on Apr 10, 2004 (gmt 0)

I believe your site would reappear even if you didn't change the page. Code 304 is very common and it does save bandwidth for both Google and us. It is possible that the changes you made improved the looks of your page to Google, and that it was not just the fact that you received a code 200 instead of 304. Just my 2 cents.

John_Creed




msg:168124
 2:54 am on Apr 10, 2004 (gmt 0)

Doesn't sound like a bug to me.

ciml




msg:168125
 10:13 am on Apr 10, 2004 (gmt 0)

> So to test the theory, I touched all the missing index files. Within a few hours, my logs showed a 200 (OK) code for the most read page. That page is now back in the index after less than 48 hours.

From your description there was no control sample for comparison. I'm afraid that the results of your experiment don't imply the results you infer.

kaled




msg:168126
 10:34 am on Apr 10, 2004 (gmt 0)

> but 304 is normal .. why worry about it .. sorry, not getting your point

The point I'm trying to make is that if a page is already missing from Google's index, then an "if-modified-since" request may itself be wrong. But let's assume that Google does have a reference copy of the missing page stored somewhere (not just a date for the missing page) then it still appears to behave wrongly by not reincluding the page in the index.
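As a sketch of the behaviour argued for here, a crawler could send If-Modified-Since only when it actually holds an indexed copy of the page. Everything in this Python snippet is hypothetical (the function, the `index` mapping, the bot name); it illustrates the idea and says nothing about how Googlebot is really implemented.

```python
def build_request_headers(url, index):
    """Build request headers for fetching `url`.

    `index` maps URL -> Last-Modified string of the copy held in the index.
    Hypothetical crawler logic, not Google's actual behaviour.
    """
    headers = {"User-Agent": "ExampleBot/0.1"}
    last_modified = index.get(url)
    if last_modified is not None:
        # A copy is indexed, so a 304 can safely mean "keep what we have".
        headers["If-Modified-Since"] = last_modified
    # Otherwise the GET is unconditional, so the server must return the page.
    return headers

# Illustrative data: one page indexed, one page missing from the index.
index = {"http://example.com/": "Sat, 03 Apr 2004 12:00:00 GMT"}
print(build_request_headers("http://example.com/", index))
print(build_request_headers("http://example.com/links.html", index))
```

With this logic a page that has dropped out of the index would always be fetched in full, regardless of its date stamp.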

OK, coincidence is a possibility here, but, as a programmer, I can definitely say that this looks like a bug. Hopefully, GG will read this and send out a memo to investigate and, if it is a bug, fix it. But in the meantime, people should be aware that this bug MAY exist, since there is an easy solution - touch the files (or upload them again) to change the date stamp.

> From your description there was no control sample for comparison. I'm afraid that the results of your experiment don't imply the results you infer

Yes, absolutely, every experiment requires a control - in this case, the control experiment is ongoing. In particular, my links page is hit (almost) every day but is still missing. Other pages are hit less frequently. I shall experiment with all of these.

Kaled.

giggle




msg:168127
 10:45 am on Apr 10, 2004 (gmt 0)

Kaled: Make some minor changes to your pages and GoogleBot will reload them.

rharri




msg:168128
 12:05 pm on Apr 10, 2004 (gmt 0)

Kaled,
Are you saying that you used the Unix "touch" command to change the files' modified date, making it look to the bot as tho' you had altered the files?

Bob

HarryM




msg:168129
 1:32 pm on Apr 10, 2004 (gmt 0)

Thank you Kaled for your interesting observation. Your original post was quite clear.

kaled




msg:168130
 2:09 pm on Apr 10, 2004 (gmt 0)

> Are you saying that you used the Unix "touch" command to change the files' modified date, making it look to the bot as tho' you had altered the files?

To be strictly accurate, I used the touch option from the Plesk Control Panel (File Manager). I thought this was the best option for an initial test rather than uploading new files.

One thing I didn't make clear though is that most (if not all) missing pages are now indexed again, but only as URLs with no titles, snippets, etc. - just as if they were robot-excluded.

From my logs, the two other missing index pages were picked up (HTTP 200) at ~1:00 am on 9th April. I suspect they'll soon be back in the Google index.

Kaled.

kaled




msg:168131
 9:46 am on Apr 13, 2004 (gmt 0)

As I predicted, the two missing index pages from my site have now been reincluded by Google. My links.html page is still missing - also as predicted.

This may not be proof of a Googlebug, but it certainly supports the theory.

Kaled.
