
Website Technology Issues Forum

    
Custom 404 page cached in search engines
We just re-launched our portal
Susanne

10+ Year Member



 
Msg#: 4341 posted 5:02 pm on Nov 18, 2005 (gmt 0)

We just re-launched our portal with completely new file names. Our custom 404 page is served whenever an old file is requested. Now I notice that MSN, for example, is caching our 404 page! If this continues we will end up with thousands of pages in the search engine indexes, and half of them will be identical 404 pages...

Please help me prevent that; we must have done something wrong when setting it up. Users are being redirected to the 404 page. Is that the correct way to do it?

How does a typical crawler react to a 404 page? And how does it react to a 500 page? Would these two responses normally cause the file to be removed from the index?

Very grateful for any help you can give. Have a nice day!

 

encyclo

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4341 posted 5:07 pm on Nov 18, 2005 (gmt 0)

Is the server really returning a 404 not found error, or is it actually returning a 302 found or other header? If the server is returning a 302 or 200 then the spider may well cache the page.

Try the server header check tool [webmasterworld.com] for a non-existent page on your site to see what is happening.
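
For anyone without access to the header check tool, the same check can be sketched in a few lines of Python. This is only a rough illustration: the function names and the User-Agent string are made up for the example, and it reports the status a client ends up with after any redirects have been followed, so a 302 that leads to a 200 page shows up as 200.

```python
import urllib.error
import urllib.request


def final_status(url, timeout=10):
    """Return the HTTP status a client ends up with for `url`.

    urllib follows redirects on its own, so a 302 pointing at a page
    that answers 200 is reported here as 200.
    """
    req = urllib.request.Request(
        url,
        method="GET",
        headers={"User-Agent": "header-check-sketch"},  # hypothetical UA string
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as exceptions; the code is the status.
        return err.code


def spider_may_cache(status):
    """A 2xx final status looks like a normal page, so a spider may cache it."""
    return 200 <= status < 300
```

Calling `final_status()` with the address of a page that does not exist on your site should give 404; if it gives 200, the error page is being served as if it were ordinary content.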

Susanne

10+ Year Member



 
Msg#: 4341 posted 5:51 pm on Nov 18, 2005 (gmt 0)

Many thanks for such a quick reply!
This is the response from the header check:

HTTP/1.1 404 Not Found
Content-Length: 6833
Content-Type: text/html
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Fri, 18 Nov 2005 17:45:23 GMT
Connection: keep-alive

So everything looks ok to me. Now, what is the next step in trying to solve this problem?
Thanks again, I am very grateful.

Susanne

10+ Year Member



 
Msg#: 4341 posted 7:13 pm on Nov 20, 2005 (gmt 0)

Bump!

I just hope that someone else here has an idea why search engines cache our custom 404 page. Ciao!

pageoneresults

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4341 posted 7:20 pm on Nov 20, 2005 (gmt 0)

Did you run the server header check on the 404 page itself, or on a non-existent page? The non-existent page is probably returning a 404 and the custom 404 page is probably returning a 200. This is the case about 8 times out of 10.
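
The difference between the two checks is easier to see when each hop is requested separately instead of letting the client follow redirects. The Python sketch below does that; the function names are invented for the example and the URLs in the usage note are placeholders, not the real site.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_once(url, timeout=10):
    """Make one GET request and return (status, Location header or None)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace(url, fetch=fetch_once, max_hops=5):
    """Follow redirects by hand and return a list of (url, status) per hop."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be a relative path
        else:
            break
    return hops


def ends_on_200(hops):
    """True when the chain finishes on a 200, i.e. the error page looks like content."""
    return hops[-1][1] == 200
```

If `trace("http://www.example.com/old-page.html")` returns something like a 302 hop followed by a 200 hop on the error page's own address, the missing page is being redirected to a page that answers 200, which is the situation described above.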

Susanne

10+ Year Member



 
Msg#: 4341 posted 7:48 pm on Nov 20, 2005 (gmt 0)

pageoneresults, you are absolutely right! Thanks a lot. So all we need to do is to block robots from indexing the page, just like one does with normal pages?

<META NAME="msnbot" CONTENT="noarchive"> for MSN
<META NAME="ROBOTS" CONTENT="NOARCHIVE"> For all other robots

and/or

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> Another way to prevent indexing

Plus a robots.txt file of course.

Right?
Many thanks!

Susanne

10+ Year Member



 
Msg#: 4341 posted 10:13 am on Nov 27, 2005 (gmt 0)

I am back... It seems the search engines ignore the META tags mentioned in my previous post. Instead of seeing a decreasing number of pages in Google, an allinurl query brings back several hundred new pages each day.

If the custom 404 page brings back a 200 in the header check tool, how can we make sure it gives a 404? Users trying to view one of our old pages are redirected to the custom 404 page. Are there any other ways to serve that page than using redirects?

I cannot understand what we are doing wrong, and I am not knowledgeable about the technical details of the portal. Do any of you have any ideas? Thanks for any advice.
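
On the question of serving the page without a redirect: the usual alternative is for the server to send the custom error page's HTML in the response to the original request, with a 404 status line, so the address never changes. The sketch below shows that behaviour with Python's built-in HTTP server. It is an illustration only and not IIS/ASP.NET configuration; the page table, the error body, and the port are all assumptions made for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical content: the pages that exist, and the custom error page body.
KNOWN_PAGES = {"/": b"<html><body>Home</body></html>"}
NOT_FOUND_BODY = b"<html><body><h1>Page not found</h1></body></html>"


def respond(path):
    """Return (status, body) for a request path.

    Unknown paths get the custom error page together with a 404 status,
    at the requested address, with no redirect involved.
    """
    if path in KNOWN_PAGES:
        return 200, KNOWN_PAGES[path]
    return 404, NOT_FOUND_BODY


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = respond(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the sketch locally (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

With `serve()` running, a header check on any made-up path should show `404 Not Found` while a browser still displays the friendly error page.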


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved