
How can I remove my pages from Google's cache?

     
6:04 pm on Feb 8, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:July 3, 2003
posts:144
votes: 1


I have deleted the actual pages from the website.
I have installed a robots.txt with Disallow All.

But three months later, Google still has my pages cached.

How can I remove my pages from the cache and my site from Google?
6:43 pm on Feb 8, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 14, 2008
posts:2910
votes: 62


Disallow All

Is this what you have in your actual robots.txt or the 'English Version' of what you have?

How can I remove my pages from the cache and my site from Google?

Personally, in this situation I would be inclined to let the pages be spidered (remove the disallow from the robots.txt) and replace the content with a custom 404 (or 410) page with <meta name="robots" content="noindex,nofollow,noarchive"> in the <head>.
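If your server runs PHP, the error page itself only needs to be a few lines - here's a rough sketch (the file name, the wording and whether you send 404 or 410 are all up to you):

<?php
// Hypothetical custom error document (wire it up in your hosting control
// panel or .htaccess). The status header must be sent before any output.
header("HTTP/1.1 404 Not Found"); // or "HTTP/1.1 410 Gone"
?>
<html>
<head>
<meta name="robots" content="noindex,nofollow,noarchive">
<title>Page removed</title>
</head>
<body>
<p>This page has been removed.</p>
</body>
</html>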
6:51 pm on Feb 8, 2010 (gmt 0)

Preferred Member

5+ Year Member

joined:Nov 2, 2006
posts:410
votes: 0


There is a URL removal tool in your Google Webmaster Tools account. That should remove the page completely from the index.
10:19 pm on Feb 8, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:July 3, 2003
posts:144
votes: 1



Is this what you have in your actual robots.txt or the 'English Version' of what you have?


I have this in robots.txt:


User-agent: *
Disallow: /
10:26 pm on Feb 8, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 14, 2008
posts:2910
votes: 62


Cool, that's what it should be... Just checking.

Leaving it as-is or serving a 404 will work with the removal tool if you decide to go that route. With or without the removal tool, I would personally use a 404 (or more likely a 410) and let the URLs be spidered, so there's actually a more recent version of the content (none) at each location than what they have cached. If you use the removal tool the pages should be gone for 6 months for sure, so it may simply come down to personal preference, but IMO if you leave the URLs disallowed they may revert to the old cache when the 6 months is up, so personally I would want to tell them each page is Intentionally Removed (410, Gone).

It takes a bit of knowledge of mod_rewrite or a scripting language to serve a Gone error, but it's my preference for pages I have removed the content from, especially over a 404. I guess you could do it by serving a custom 404 page in PHP and then setting the header to 410 Gone on the actual error page, but I haven't tried it, so double-check with a Server Header Check tool (like the one in the control panel here) if you try to serve a 410 this way...
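For anyone who wants to try the PHP route, the only change on the error page itself would be something like this (untested sketch, so do verify the status line with a header checker):

<?php
// Override the default status with 410 Gone - this has to run before
// any HTML output is sent to the browser.
header("HTTP/1.1 410 Gone");
?>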
10:36 pm on Feb 8, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:July 3, 2003
posts:144
votes: 1


Dammit, this is so complicated.

Why can't they just drop pages that aren't there anymore?
10:40 pm on Feb 8, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 14, 2008
posts:2910
votes: 62


LMAO... Yeah, you'd think it would be simple.

Why is it so complicated? It's Google's way of saying thanks for letting us spider your site and have access to your content in the first place... if you want us to let go of it, you'll have to work for it. LOL.

If you remove the robots.txt block and set up a custom 404 page with the meta tag below, they won't return the cache or the pages in the results any more, but you will have to wait for them to spider the pages for it to take effect. It is the simplest, most straightforward way IMO.

Also, for future readers: although you should be able to serve a 410 Gone as outlined above, be aware that if any of your pages ever go genuinely missing (a real 404), you will be serving a 410 for them instead. So it's not something to use unless you know for sure you want all missing (404) pages dropped and not spidered for a longer period of time.

All you really need is to get the following tag in front of the SEs, and all references to the page it's on will be dropped from the results. Personally I almost always serve noarchive on my pages, even when I allow the pages to be indexed and returned in the results.

<meta name="robots" content="noindex,nofollow,noarchive">
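(And if you want pages indexed and returned but never cached, noarchive on its own should be enough: <meta name="robots" content="noarchive">.)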
11:13 pm on Feb 8, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:July 3, 2003
posts:144
votes: 1


I have something like 2,000 pages cached, so I'd need to re-create them all with the 404 redirect code...

and that's time consuming.
11:32 pm on Feb 8, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


If those pages are already gone from your server, then the minute you allow googlebot in to spider, it should get a 404 response without you needing to place a custom message or do any redirect at all.

Then you can use Webmaster Tools to remove the pages - or the entire site, either one.
11:38 pm on Feb 8, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 14, 2008
posts:2910
votes: 62


Yeah, what tedster said. Being able to set a custom 404 error page is usually standard in most hosting accounts, so by creating a single page you can serve a nice site-specific 404 to the visitors (real people) who request a non-existent URL. IMO it's a good way to do things, and I use one on almost all, if not all, the sites I work on.

I usually include links to the 'important' pages or directories visitors might be looking for; it's a single page and can usually be set from within your hosting account.

Do make sure you run a header check when using one, to ensure it actually serves a 404. IMO the issue here may have been prolonged by disallowing the content rather than serving a 404 page, even without the robots meta tag - and it definitely has been prolonged by not serving a 404 page with the meta tag I posted previously, because as soon as compliant bots get the noindex,nofollow,noarchive tag on a page (URL) and that URL is processed, the page is dropped from the results.
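If you don't have a header checker handy, a few lines of PHP will show you the status line your error page returns (the URL below is just a placeholder, and this assumes allow_url_fopen is enabled on your server):

<?php
// Quick header check: prints the status line the server sends back,
// e.g. "HTTP/1.1 404 Not Found" or "HTTP/1.1 410 Gone".
$headers = get_headers("http://www.example.com/some-removed-page");
echo $headers ? $headers[0] : "request failed";
?>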
1:35 am on Feb 9, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 15, 2003
posts:2395
votes: 0


If you had not already removed the pages from the server, you could use this meta code:

<meta name="googlebot" content="noarchive">
9:38 am on Feb 9, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:July 3, 2003
posts:144
votes: 1


OK, I just went to Webmaster Tools and submitted a "Remove Whole Site" request.

Will that take care of the cache too?
10:20 am on Feb 9, 2010 (gmt 0)

New User

5+ Year Member

joined:Jan 17, 2010
posts:1
votes: 0


If you do a site removal request, Google will remove everything, including your homepage. It might be quite a while before Google decides to crawl your site again - months, maybe. Is that what you want?

You can do a manual removal request for a cached URL, but it is better to use the noarchive meta tag and allow Google to crawl the page.
1:08 pm on Feb 9, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 1, 2004
posts:3181
votes: 0


Content removed with this tool will be excluded from the Google index for a minimum of 90 days. You can use the URL removal request tool to reinclude your content at any time during the 90-day period.

 
