Home / Forums Index / Hardware and OS Related Technologies / Website Technology Issues

Website Technology Issues Forum

    
How to tell search engines NOT to update a webpage
Another strange question: how to tell a search engine not to update a page when it crawls it
Eric

5+ Year Member



 
Msg#: 3605946 posted 1:19 am on Mar 20, 2008 (gmt 0)

Hi there

I always have strange questions:

Example: www.sample.com/testpage.htm has already been crawled and cached by search engines, and their robots regularly come back to check whether it has been updated.

Here is the situation:

testpage.htm is regenerated regularly from 3 data sources. If one day one of the sources has a temporary problem, I still generate the page from the other two so that normal visitors are served.
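A minimal Python sketch of the assembly step as described, assuming each data source can be wrapped in a function that either returns its content or raises an exception when it is temporarily unavailable. `assemble_page`, the source names and the fetch functions are hypothetical, not part of the original setup. It returns the partial body together with the names of the missing sources, so the caller knows when the page is half-baked:

```python
from typing import Callable, Dict, List, Tuple


def assemble_page(sources: Dict[str, Callable[[], str]]) -> Tuple[str, List[str]]:
    """Build the page body from every source that responds; report the ones that failed."""
    sections: List[str] = []
    missing: List[str] = []
    for name, fetch in sources.items():
        try:
            sections.append(f'<div class="{name}">{fetch()}</div>')
        except Exception:
            # A temporary failure in one source should not take the whole page down.
            missing.append(name)
    return "\n".join(sections), missing


def failing_source() -> str:
    # Hypothetical stand-in for a source that is temporarily down.
    raise ConnectionError("source C is temporarily unavailable")


body, missing = assemble_page({
    "source_a": lambda: "data from A",
    "source_b": lambda: "data from B",
    "source_c": failing_source,
})
print(missing)  # ['source_c']
```

A non-empty `missing` list is the signal that the page being served is incomplete.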

Here is the question: when a search engine robot comes, if I do nothing, it will crawl the page and update its copy. What can I do to tell the search engine that this page is temporarily half-baked and that it should not update its database?

Any ideas? Thanks.

 

robsoles

5+ Year Member



 
Msg#: 3605946 posted 11:31 am on Mar 22, 2008 (gmt 0)

Hi Eric,

If I were doing this, I would add a couple of <meta> tags to the <head> section of the incomplete page, and simply leave them out when the page is complete and I don't mind clients' caches being updated:

<meta name="robots" content="noarchive" />
<meta http-equiv="Pragma" content="no-cache" />
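A sketch of that suggestion in Python, assuming the page is produced by a script that already knows whether all sources responded (the `render_page` function and its `complete` flag are hypothetical). The two tags are emitted only for the incomplete page:

```python
# The two tags suggested above, added to <head> only while the page is incomplete.
INCOMPLETE_PAGE_META = (
    '<meta name="robots" content="noarchive" />\n'
    '<meta http-equiv="Pragma" content="no-cache" />\n'
)


def render_page(title: str, body: str, complete: bool) -> str:
    """Render the page, adding the meta tags only when some source data is missing."""
    head_extra = "" if complete else INCOMPLETE_PAGE_META
    return (
        "<html>\n<head>\n"
        f"<title>{title}</title>\n"
        f"{head_extra}"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )


print(render_page("Test page", "partial data", complete=False))
```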

If I could make /testpage.htm dynamic (for example via a custom error document in IIS or an AddHandler directive in Apache), I would use server-level signalling (HTTP response headers) to tell browsers not to cache the material.
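One way to sketch the server-level version with Python's standard `http.server`, assuming the script can set a flag when the page is incomplete. `PAGE_COMPLETE`, `PAGE_HTML`, `cache_headers`, `TestPageHandler` and `serve` are hypothetical names. When the page is incomplete it sends `Cache-Control: no-cache` for HTTP/1.1 clients and `Pragma: no-cache` for HTTP/1.0 ones; when it is complete, no extra headers are added:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Tuple

PAGE_COMPLETE = False  # hypothetical flag set by whatever assembles the page
PAGE_HTML = b"<html><body>partial data</body></html>"


def cache_headers(complete: bool) -> List[Tuple[str, str]]:
    """Extra response headers: none for a complete page, no-cache for an incomplete one."""
    if complete:
        return []
    return [
        ("Cache-Control", "no-cache"),  # HTTP/1.1 caches and browsers
        ("Pragma", "no-cache"),         # HTTP/1.0 clients and proxies
    ]


class TestPageHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/testpage.htm":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE_HTML)))
        for name, value in cache_headers(PAGE_COMPLETE):
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(PAGE_HTML)


def serve(port: int = 8000) -> None:
    """Run a local test server until interrupted."""
    HTTPServer(("127.0.0.1", port), TestPageHandler).serve_forever()
```

Calling `serve()` starts a local test server; requesting http://127.0.0.1:8000/testpage.htm then shows the extra headers while `PAGE_COMPLETE` is False.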

How dynamic is your process for assembling /testpage.htm?

Eric

5+ Year Member



 
Msg#: 3605946 posted 9:51 pm on Mar 23, 2008 (gmt 0)

Suppose I use "no-cache" or "noarchive", or, if the page is dynamic, send the HTTP header "Pragma: no-cache" or "Cache-Control: no-cache", and so on. The question is: will the search engine delete the page from its database and wait until the problem is fixed?

By the way, if the page is dynamic, what HTTP status code would you recommend? 304 Not Modified if the request includes an If-Modified-Since header (HTTP_IF_MODIFIED_SINCE), or 503 Service Unavailable with Retry-After? Or something else?
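For the conditional-request part, a minimal sketch of choosing between 200 and 304, assuming the script records when the page content last changed (`choose_status`, `last_modified_header` and the example timestamp are hypothetical). It uses only the standard `email.utils` date helpers and covers only the 200/304 case; a 503 with Retry-After would be a separate branch:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def last_modified_header(last_modified: datetime) -> str:
    """Format an aware UTC datetime as an HTTP date for the Last-Modified header."""
    return format_datetime(last_modified, usegmt=True)


def choose_status(if_modified_since: Optional[str], last_modified: datetime) -> int:
    """Return 304 if the client's copy is still current, otherwise 200.

    last_modified must be timezone-aware (UTC).
    """
    if not if_modified_since:
        return 200  # unconditional request: send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore it and send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


page_changed = datetime(2008, 3, 23, 12, 0, tzinfo=timezone.utc)
print(choose_status(last_modified_header(page_changed), page_changed))  # 304
print(choose_status("Sun, 23 Mar 2008 11:00:00 GMT", page_changed))  # 200
```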

Thanks

robsoles

5+ Year Member



 
Msg#: 3605946 posted 11:44 pm on Mar 23, 2008 (gmt 0)

No, these signals won't make it delete the page, and if you want the page to do as well as it can in the SERPs, the last thing you want is for it to be removed from their databases completely. Don't deliberately set your server up to send an error code as the status: '200 OK', or '304 Not Modified' (if the request asks for it), are the only 'healthy' responses to send; anything else is likely to count against the 'importance' of the page.

To get it deleted from their database, you can mark the page 'noindex' in the 'robots' meta tag, but you will still have to go to Google Webmaster Tools and request manual removal from their index anyway. A removal tends to stick for 6 months: even if you request removal, wait until the page is gone and then immediately request re-inclusion, it was still 6 months the last time I heard from anyone. Disallowing it in robots.txt is slightly more substantial, but you may still have to request removal if the page is already indexed.
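To check that a robots.txt rule blocks only the intended page, Python's standard `urllib.robotparser` can evaluate it locally. The rules below are a hypothetical robots.txt for www.sample.com; this only tests crawl permission and says nothing about removal of a page that is already indexed:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for www.sample.com that blocks only the changing page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /testpage.htm
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/testpage.htm", "/index.htm"):
    url = "http://www.sample.com" + path
    print(path, "allowed" if parser.can_fetch("Googlebot", url) else "disallowed")
# /testpage.htm disallowed
# /index.htm allowed
```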

I think you should put <meta name="robots" content="noarchive" /> in the <head> section permanently; it is pointless letting SEs cache this constantly changing page anyway. Using the directive on the incomplete page alone doesn't seem unreasonable to me, though.

If you are running such a page and you've found that an older page is losing positions in the SERPs for keywords it has always done well for, it might be worth your while to mark the constantly changing page 'noindex' as well (maybe even disallow it in robots.txt). It depends on whether you particularly want or need it indexed by them.

What is the change frequency of this page Eric?

Thinking a little more about it, why can't you just use 'the last known usable update' from each of your three sources, so that you never serve an incomplete or broken version of the page?
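A minimal sketch of that idea, assuming each source is wrapped in a fetch function that raises on failure (`LastGoodCache` and `build_page` are hypothetical names). Each successful result is remembered, and a failing source falls back to its last known usable update, so a section is only missing if that source has never succeeded:

```python
from typing import Callable, Dict, List, Optional


class LastGoodCache:
    """Remembers the last successful result from each data source."""

    def __init__(self) -> None:
        self._last_good: Dict[str, str] = {}

    def fetch(self, name: str, fetcher: Callable[[], str]) -> Optional[str]:
        try:
            value = fetcher()
        except Exception:
            # Source is down: fall back to its last known usable update, if any.
            return self._last_good.get(name)
        self._last_good[name] = value
        return value


def build_page(cache: LastGoodCache, sources: Dict[str, Callable[[], str]]) -> List[str]:
    """Collect one section per source, using cached data for any source that fails."""
    sections: List[str] = []
    for name, fetcher in sources.items():
        value = cache.fetch(name, fetcher)
        if value is not None:  # None only if the source has never succeeded
            sections.append(value)
    return sections
```

On each regeneration, the same `LastGoodCache` instance is reused, so a one-day outage in one source simply repeats that source's previous data instead of producing a half-baked page.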


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved