
Google SEO News and Discussion Forum

    
Google pestering home page
Downloading several times a day
Phil_Payne




msg:3076139
 2:19 pm on Sep 8, 2006 (gmt 0)

Small site - 50 or so pages. Low update frequency, nice unique keywords.

Google XML sitemaps submitted, up to date and clean - priority, lastmod, changefreq all specified correctly for each page. HTML sitemap derived from Xenu output.

The most stable page on the site is the home page. It changes three or four times a year. But Google spiders it up to seven times a day. Each request is a GET that returns a 200 and, judging by the byte count in the log, the whole page is usually downloaded.

Nothing I am doing anywhere - or ever have done anywhere - would imply that the home page is updated frequently. It never has been, and the Google sitemap has changefreq set to "monthly". There are about thirteen pages on the site - listed in the Google and HTML sitemaps and linked organically - that Google has never crawled.

What is the Googlebot looking for on the home page? Surely if it's checking for changes I should be seeing 304s.

 

Brett_Tabke




msg:3076171
 2:47 pm on Sep 8, 2006 (gmt 0)

it will clear itself out in about 30 days as gbot learns your update frequency.

Are you producing *anything* that changes from page view to page view?

auto updating advertising code?
random headlines?
changing links?
different menu or flash.

any aspect of the page that changes at all? Even so much as one character change?

Take an accurate byte count of the source code (view from browser) and then compare that to a page view 2-3 hrs later. Any changes at all? Anything you are overlooking that would make the byte count or the character positions/code even a little bit different?
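
One way to run that comparison is to save two copies of the page source a few hours apart and compare their size and hash. What follows is only a minimal Python sketch, not something from the thread: the names `fingerprint` and `first_difference` are made up for illustration, and the two snapshots are hypothetical stand-ins for files saved from the browser.

```python
import hashlib
from typing import Optional, Tuple


def fingerprint(page: bytes) -> Tuple[int, str]:
    """Return (byte count, SHA-256 hex digest) for one saved copy of the page."""
    return len(page), hashlib.sha256(page).hexdigest()


def first_difference(old: bytes, new: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(old, new)):
        if a != b:
            return offset
    # No mismatch in the shared prefix: one copy may simply be longer.
    if len(old) != len(new):
        return min(len(old), len(new))
    return None


if __name__ == "__main__":
    # Hypothetical snapshots; in practice read two files saved hours apart.
    morning = b"<html><body>Welcome</body></html>"
    afternoon = b"<html><body>Welcome!</body></html>"
    print(fingerprint(morning), fingerprint(afternoon))
    print(first_difference(morning, afternoon))
```

If the fingerprints match, the page really is byte-for-byte stable between views; if they differ, the offset points at the first character that changed.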

Phil_Payne




msg:3076197
 3:11 pm on Sep 8, 2006 (gmt 0)

> it will clear itself out in about 30 days as gbot learns your update frequency.

If that were true it would have cleared itself up just after Big Daddy. This is permanent activity.

The page is HTML 4.01 written using the Crimson editor. Static HTML barely begins to describe it - fossilized HTML would be closer.

motorhaven




msg:3076555
 6:07 pm on Sep 8, 2006 (gmt 0)

Is your server properly handling dates in the header and set up to use if-modified-since requests?

Phil_Payne




msg:3076712
 8:17 pm on Sep 8, 2006 (gmt 0)

> Is your server properly handling dates in the header and set up to use if-modified-since requests?

Boring IIS 5. Google accesses the XML sitemap using If-Modified-Since and gets 304s, just like it should.

IIS log extract:

2006-09-06 14:18:59 66.249.65.18 GET 200 /index.html
2006-09-06 14:21:27 66.249.65.18 GET 200 /index.html
2006-09-06 15:18:31 66.249.65.18 GET 200 /index.html
2006-09-07 04:04:07 66.249.66.34 GET 304 /sitemap.xml
2006-09-07 04:27:07 66.249.66.34 GET 200 /index.html
2006-09-07 05:08:47 66.249.66.34 GET 200 /index.html
2006-09-07 05:13:08 66.249.66.34 GET 200 /index.html
2006-09-07 05:51:26 66.249.66.34 GET 200 /index.html
2006-09-07 06:10:28 66.249.66.34 GET 200 /index.html
2006-09-07 06:48:00 66.249.66.34 GET 200 /index.html
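
To make the pattern in that extract easier to see, the lines can be tallied per day, status code and path. This is a small Python sketch that assumes the six-field layout shown above (date, time, client IP, method, status, path); real IIS logs may use a different field order depending on configuration, and the `tally` helper is an illustrative name.

```python
from collections import Counter

# The extract exactly as posted above.
LOG_LINES = """\
2006-09-06 14:18:59 66.249.65.18 GET 200 /index.html
2006-09-06 14:21:27 66.249.65.18 GET 200 /index.html
2006-09-06 15:18:31 66.249.65.18 GET 200 /index.html
2006-09-07 04:04:07 66.249.66.34 GET 304 /sitemap.xml
2006-09-07 04:27:07 66.249.66.34 GET 200 /index.html
2006-09-07 05:08:47 66.249.66.34 GET 200 /index.html
2006-09-07 05:13:08 66.249.66.34 GET 200 /index.html
2006-09-07 05:51:26 66.249.66.34 GET 200 /index.html
2006-09-07 06:10:28 66.249.66.34 GET 200 /index.html
2006-09-07 06:48:00 66.249.66.34 GET 200 /index.html
""".splitlines()


def tally(lines):
    """Count requests per (date, status, path)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        date, _time, _ip, _method, status, path = fields
        counts[(date, int(status), path)] += 1
    return counts


if __name__ == "__main__":
    for (date, status, path), n in sorted(tally(LOG_LINES).items()):
        print(date, status, path, n)
```

On this extract the only 304 is for /sitemap.xml; every /index.html request is a full 200.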

jomaxx




msg:3076740
 8:48 pm on Sep 8, 2006 (gmt 0)

Keep it in perspective: you're talking about "up to" 7 pageviews per day. That amounts to a tiny fraction of a penny.

Phil_Payne




msg:3076750
 9:09 pm on Sep 8, 2006 (gmt 0)

> .. perspective ..

I'm not complaining, just trying to understand.

Results of the GSiteCrawler Server-Test
Tested at 9/8/2006 9:05:15 PM / from 82.3.81.13:

URL=http://www.mysite.com
Result code: 200 (OK / OK)
Server: Microsoft-IIS/5.0
Content-Location: [mysite.com...]
Date: Fri, 08 Sep 2006 20:58:12 GMT
Content-Type: text/html
Accept-Ranges: bytes
Last-Modified: Wed, 06 Sep 2006 12:20:34 GMT
ETag: "6a5577daaed1c61:c3a"
Content-Length: 4163

So "Last-Modified" is being returned correctly.
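
Returning Last-Modified is only half of the exchange: a 304 can only come back when a later request echoes the validators in If-Modified-Since or If-None-Match. As a rough Python sketch of how one might replay that conditional request by hand, using the header values reported above: the URL is the placeholder from the post, `conditional_headers` and `check` are illustrative names, and none of this claims to reproduce what Googlebot actually sends.

```python
import urllib.error
import urllib.request


def conditional_headers(last_modified=None, etag=None):
    """Build request headers that echo a previous response's validators."""
    headers = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    if etag:
        headers["If-None-Match"] = etag
    return headers


def check(url, last_modified=None, etag=None):
    """Return the status code of a conditional GET (304 means 'unchanged')."""
    request = urllib.request.Request(
        url, headers=conditional_headers(last_modified, etag)
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 304 and other non-2xx responses.
        return err.code


# Example call (placeholder URL, validator values quoted in the post):
# check("http://www.mysite.com/",
#       last_modified="Wed, 06 Sep 2006 12:20:34 GMT",
#       etag='"6a5577daaed1c61:c3a"')
```

A 304 from such a request would suggest the server handles conditional GETs for the home page; a 200 in the log by itself does not show whether the crawler sent the validators at all.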

g1smd




msg:3076762
 9:18 pm on Sep 8, 2006 (gmt 0)

It should be requesting http://www.domain.com/ and not http://www.domain.com/index.html I think.

I suspect that fact may turn out to be important.

ronburk




msg:3077593
 10:12 pm on Sep 9, 2006 (gmt 0)

Interesting.

> It should be requesting [domain.com...] and not [domain.com...] I think.

What made you think it was requesting /index.html? Looked like the request was for the (illegal, but universally accepted) URL of [something.com,...] and the server used the Content-Location header to inform the User-Agent where the actual resource resides.

------

Although you describe this as fossilized HTML, the server says it was modified recently -- which is correct?

g1smd




msg:3077596
 10:16 pm on Sep 9, 2006 (gmt 0)

What made me think that? The fact that I have seen several hundreds of sites with that exact same problem in recent months.

Does the call for http://www.domain.com respond with a 302 or a 301 response? That short URL was my other, much more unlikely, guess.
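
The 301-versus-302 question can be checked with a single request that does not follow redirects. Here is a minimal Python sketch under stated assumptions: the host is the placeholder from the post, and `describe` and `probe` are illustrative helper names, not part of any tool mentioned in this thread.

```python
import http.client


def describe(status, location=None):
    """Summarise what the status code on the bare URL means."""
    if status == 301:
        return f"permanent redirect to {location}"
    if status == 302:
        return f"temporary redirect to {location}"
    if status == 200:
        return "served directly, no redirect"
    return f"other status {status}"


def probe(host, path="/"):
    """Send one HEAD request; http.client does not follow redirects."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return describe(response.status, response.getheader("Location"))
    finally:
        conn.close()


# Example call (placeholder host): probe("www.domain.com")
```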

Phil_Payne




msg:3079818
 8:39 am on Sep 12, 2006 (gmt 0)

Results of the GSiteCrawler Server-Test
Tested at 9/12/2006 8:36:31 AM / from 82.2.113.108:

URL=http://www.mysite.com
Result code: 200 (OK / OK)
Server: Microsoft-IIS/5.0
Content-Location: [mysite.com...]
Date: Tue, 12 Sep 2006 08:29:22 GMT
Content-Type: text/html
Accept-Ranges: bytes
Last-Modified: Wed, 06 Sep 2006 12:20:34 GMT
ETag: "6a5577daaed1c61:c3a"
Content-Length: 4163

What I don't understand is:

a) Why this is "bad"? I have the same "problem" on many other sites that are performing very well.

b) Why is Google downloading the index.html page repetitively - when it almost never changes - and NOT downloading the pages that do change?


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved