Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
How can I remove my pages from Google's cache?
Jessica
msg:4076454 - 6:04 pm on Feb 8, 2010 (gmt 0)

I have deleted the actual pages from the website.
I have installed a robots.txt with Disallow All.

But three months later, Google still has my pages cached.

How can I remove my pages from the cache and my site from Google?


TheMadScientist
msg:4076473 - 6:43 pm on Feb 8, 2010 (gmt 0)

Disallow All

Is this what you have in your actual robots.txt or the 'English Version' of what you have?

How can I remove my pages from cache and my site from google?

Personally, in this situation I would be inclined to let the pages be spidered (remove the disallow from the robots.txt) and replace the content with a custom 404 (or 410) page with <meta name="robots" content="noindex,nofollow,noarchive"> in the <head>.
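For concreteness, a removal page along those lines might look like the following minimal sketch (the filename and wording are hypothetical; what matters is the robots meta tag in the <head> and that the server sends it with a 404 or 410 status):

```html
<!-- Hypothetical removal page, e.g. removed.html, served for
     deleted URLs with a 404 or 410 status code -->
<!DOCTYPE html>
<html>
<head>
  <title>Page removed</title>
  <meta name="robots" content="noindex,nofollow,noarchive">
</head>
<body>
  <p>This page has been permanently removed.</p>
</body>
</html>
```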

arieng
msg:4076488 - 6:51 pm on Feb 8, 2010 (gmt 0)

There is a URL removal tool in your Google Webmaster Tools account. That should remove the page completely from the index.

Jessica
msg:4076639 - 10:19 pm on Feb 8, 2010 (gmt 0)

Is this what you have in your actual robots.txt or the 'English Version' of what you have?


I have this in robots.txt:


User-agent: *
Disallow: /
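As a sanity check, that exact two-line robots.txt can be fed to Python's stdlib parser to confirm it blocks every compliant crawler from every URL (the domain below is just a placeholder):

```python
from urllib.robotparser import RobotFileParser

# Parse the same two-line robots.txt shown above
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# No user agent may fetch any path on the site
blocked = not rp.can_fetch("Googlebot", "https://example.com/any-page.html")
print(blocked)  # True
```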

TheMadScientist
msg:4076643 - 10:26 pm on Feb 8, 2010 (gmt 0)

Cool, that's what it should be... Just checking.

Leaving it or serving a 404 will work with the removal tool if you decide to go that route. Either way, I would probably serve a 404 (or, more likely, a 410) and let the URLs be spidered, so there is actually a more recent version of the content (none) at each location than what Google has cached. If you use the removal tool the pages should stay gone for 6 months for sure, so it may simply come down to personal preference, but IMO if you keep the URLs disallowed they may revert to the old cache when the 6 months are up, so personally I would want to tell them each page is Intentionally Removed (410, Gone).

It takes a bit of knowledge of mod_rewrite or a scripting language to serve a Gone error, but it's my preference for pages I have removed the content from, especially over a 404. You could probably do it by serving a custom 404 page in PHP and then setting the header to 410 Gone on the actual error page, but I haven't tried it, so double-check with a Server Header Check tool (like the one in the control panel here) if you try to serve a 410 this way...
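For the mod_rewrite route, a 410 can be served without any scripting at all; a minimal sketch, assuming Apache with mod_rewrite enabled and a hypothetical removed path:

```apache
# .htaccess sketch: answer 410 Gone for a removed section of the site
RewriteEngine On
RewriteRule ^old-section/ - [G]
```

The [G] flag makes Apache return 410 Gone for matching URLs; it's still worth running the result through a server header check afterwards.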

Jessica
msg:4076653 - 10:36 pm on Feb 8, 2010 (gmt 0)

Dammit, this is so complicated.

Why can't they just drop pages that aren't there anymore?

TheMadScientist
msg:4076654 - 10:40 pm on Feb 8, 2010 (gmt 0)

LMAO... Yeah, you'd think it would be simple.

Why is it so complicated? Google's way of saying thanks for letting us spider your site and have access to your content in the first place... If you want us to let go of it you'll have to work for it. LOL.

If you remove the robots.txt block and set up a custom 404 page with the meta tag below, they won't return the cache or the pages in the results any more, but you will have to wait for them to spider the pages for it to take effect. It is the simplest, most straightforward way IMO.

Also, for future readers: although you should be able to serve a 410 Gone as outlined above, be aware that if any of your pages ever really do go missing (404), you will be serving a 410 instead. So it's not something to use unless you know for sure you want all missing (404) pages dropped and not spidered for a longer period of time.

All you really need is to get the following to the SEs, and all references to the page it's on will be dropped from the results. Personally I almost always serve noarchive on my pages, even when I allow the pages to be indexed and returned in the results.

<meta name="robots" content="noindex,nofollow,noarchive">

Jessica
msg:4076672 - 11:13 pm on Feb 8, 2010 (gmt 0)

I have like 2000 pages cached, so I'd need to re-create them all with the 404 redirect code...

and that's... time-consuming.

tedster
msg:4076677 - 11:32 pm on Feb 8, 2010 (gmt 0)

If those pages are already gone from your server, then the minute you allow Googlebot in to spider, it should get a 404 response without you needing to place a custom message or do any redirect at all.

Then you can use Webmaster Tools to remove the pages, or the entire site; either one.

TheMadScientist
msg:4076680 - 11:38 pm on Feb 8, 2010 (gmt 0)

Yeah, what tedster said. Being able to set a custom 404 error page is standard in most hosting accounts, so by creating a single page you can serve a nice site-specific 404 page for the visitors (real people) who request a non-existent URL. IMO it's a good way to do things, and I use one on almost all, if not all, sites I work on.

I usually include links to 'important pages' or directories visitors might be looking for. It's a single page and can usually be set from within your hosting account.

Do make sure you run a header check when using one, to ensure it serves a 404 properly... IMO the issue may have been prolonged by disallowing the content rather than serving a 404 page, even without the robots meta tag. It has definitely been prolonged by not serving a 404 page with the meta tag I posted previously, because as soon as compliant bots fetch the noindex,nofollow,noarchive tag on a page (URL) and that URL is processed, the page is dropped from the results.
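That header check can be rehearsed end-to-end with a toy stdlib-only Python server (all names here are hypothetical): it serves a custom 404 page carrying the meta tag from earlier in the thread, and the client confirms the status line really says 404 rather than 200.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical custom 404 page, carrying the robots meta tag from the thread
NOT_FOUND_PAGE = (
    b"<html><head>"
    b'<meta name="robots" content="noindex,nofollow,noarchive">'
    b"</head><body>Sorry, that page does not exist.</body></html>"
)

class Custom404Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the friendly page, but with a real 404 status line
        self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(NOT_FOUND_PAGE)))
        self.end_headers()
        self.wfile.write(NOT_FOUND_PAGE)

    def log_message(self, *args):
        pass  # silence per-request logging for the demo

server = HTTPServer(("127.0.0.1", 0), Custom404Handler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The header check: make sure the pretty page really returns 404, not 200
status, body = None, b""
try:
    urllib.request.urlopen(f"http://127.0.0.1:{server.server_address[1]}/gone")
except urllib.error.HTTPError as e:
    status, body = e.code, e.read()

print(status)  # 404
```

A misconfigured custom error page that returns 200 is exactly what this catches; the same check, pointed at a 410 setup, should print 410 instead.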

claus
msg:4076707 - 1:35 am on Feb 9, 2010 (gmt 0)

If you had not already removed the pages from the server, you could use this meta tag:

<meta name="googlebot" content="noarchive">

Jessica
msg:4076833 - 9:38 am on Feb 9, 2010 (gmt 0)

OK, I just went to Webmaster Tools and submitted a "Remove Whole Site" request.

Will that take care of the cache too?

Hotclutch
msg:4076861 - 10:20 am on Feb 9, 2010 (gmt 0)

If you do a site removal request, Google will remove everything, including your homepage, and it might be quite a while before Google decides to crawl your site again. Months, maybe. Is that what you want?

You can do a manual removal request for a cached URL, but it is better to serve the noarchive meta tag and allow Google to crawl the page.

BillyS
msg:4076926 - 1:08 pm on Feb 9, 2010 (gmt 0)

Content removed with this tool will be excluded from the Google index for a minimum of 90 days. You can use the URL removal request tool to reinclude your content at any time during the 90-day period.



All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved