Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

How much time does Google take to remove content from its database

content, Google, removal, robots.txt, deleted, outdated

         

abhishekkaushik

1:13 pm on Apr 28, 2007 (gmt 0)

10+ Year Member



Does anyone have an idea how long the Google content removal tool takes to process a request? Some of my outdated pages, which I have deleted, are still in Google's database, and I don't want them to appear in search results anymore. I used Google's content removal tool two weeks back. To make sure of the removal, I also added a robots.txt file to my web space and listed the 20+ deleted URLs. After some time, Google said only one URL was blocked by the robots.txt file.

Even after two weeks my pages are still listed in Google's "site:example.com" search results. I have no idea what will happen next. Has anyone gone through this? Please help.

tedster

7:15 pm on May 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



After some time, Google said only one URL was blocked by the robots.txt file.

Two weeks is too long -- it's always been mere days for me and others report the same. The message you got is what they saw -- so I'd suggest you check the syntax of your robots.txt file for an error.
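For reference, a minimal robots.txt along the lines tedster suggests checking might look like this (the paths are examples; note that the filename is "robots.txt", plural, and each Disallow line takes a single path beginning with "/"):

```
# Filename must be exactly "robots.txt" (not "robot.txt"),
# served from the site root, e.g. http://example.com/robots.txt
User-agent: *
Disallow: /deleted-page-1.html
Disallow: /deleted-page-2.html
Disallow: /old-section/
```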

[edited by: tedster at 5:00 pm (utc) on May 5, 2007]

abhishekkaushik

3:30 pm on May 5, 2007 (gmt 0)

10+ Year Member



Thanks, tedster, for your reply. You say I must wait longer, but at this rate of progress I hope not much longer. Checking the syntax of the robots.txt file is a good idea; I will definitely try it.

jdMorgan

3:58 pm on May 5, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The way I read the Removal Tool instructions is this:

If you want to remove a page from Google, make sure that that page returns a 404-Not Found when requested, and make sure that Google is allowed to fetch that page's URL -- The URL for the page to be removed must not be Disallowed in robots.txt until after the page has been successfully removed (and if the page is really gone, then there's no need to Disallow it in robots.txt anyway).

If the page is Disallowed in robots.txt, Googlebot won't fetch it. And if Googlebot doesn't fetch it, it can't see the 404, and the page won't be removed.
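One way to check this locally is with Python's standard-library robots.txt parser, which approximates (but does not exactly match) Googlebot's robots.txt handling. The rules and URLs below are examples, not the poster's actual files:

```python
# Sketch: verify locally whether a robots.txt file blocks a given URL.
# The stdlib parser approximates Googlebot's robots.txt handling;
# the rules and example.com URLs here are hypothetical.
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /old-page.html
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Blocked: Googlebot may not fetch it, so it cannot see a 404 there.
print(rp.can_fetch("Googlebot", "http://example.com/old-page.html"))   # False

# Not blocked: Googlebot may fetch it and observe the status code.
print(rp.can_fetch("Googlebot", "http://example.com/live-page.html"))  # True
```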

Jim

jd01

4:24 pm on May 5, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If that doesn't work, you might try serving a 410 Gone response ('The resource you have requested has been removed; please remove all references to it.') for those locations via .htaccess (if you are on Apache and can use mod_rewrite).

RewriteEngine on
# Return "410 Gone" for each removed URL (the paths are examples)
RewriteRule ^the-path/to-the/page\.html$ - [G]
RewriteRule ^the-path/to-the/page2\.html$ - [G]
RewriteRule ^a-path/to-another/page\.html$ - [G]
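If mod_rewrite is not available, the same 410 response can be sent with mod_alias's Redirect directive (again assuming Apache; the paths are the same example placeholders):

```
Redirect gone /the-path/to-the/page.html
Redirect gone /the-path/to-the/page2.html
Redirect gone /a-path/to-another/page.html
```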

Justin

It does say on the WMT page that a robots.txt block should be effective, but I know others have also said pages are still listed, so I would follow jdMorgan's advice and try one of the other exclusion methods, which allow Googlebot to request the page and receive an error.

tedster

5:18 pm on May 5, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The URL for the page to be removed must not be Disallowed in robots.txt until after the page has been successfully removed

That's not how I read it. Even before the URL removal tool was added to the Google Webmaster Tools area, there was a robots.txt method in the public tool - I used it for a legal issue not too long ago, and it worked in three days.

The instructions that I see now do include a robots.txt method -- see this quote.

If you want to remove a page, image, directory, or your entire site, you must do one of the following before proceeding:

  • Ensure requests for the page return an HTTP status code of either 404 or 410.
  • Ensure that the pages you want to remove have been blocked using a robots.txt file.
  • Ensure that the pages you want to remove have been blocked using a meta noindex tag.

[google.com...]
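For the third option in that list, the tag belongs in the head of each page to be removed (a sketch; it only works while the page still exists and Googlebot is allowed to fetch it):

```html
<!-- In the <head> of the page to be removed; Googlebot must be
     able to fetch the page in order to see this tag. -->
<meta name="robots" content="noindex">
```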

-------

There is a change that no one here has mentioned so far. There is now a REINCLUDE method you can use from within your Webmaster Tools account BEFORE 180 DAYS HAVE EXPIRED. Previously, once you asked for a URL removal, you were stuck with that situation for six months.

When you use the URL removal request tool to remove content from the Google index, your content is removed for a minimum of 180 days. However, you can reinclude your content at any time during the 180-day period by following these simple steps...

...Pending reinclusion requests are usually processed within 3-5 business days.

[google.com...]

-------

abhishekkaushik, you wrote: "You say I must wait longer, but at this rate of progress I hope not much longer."

I definitely do not think you should wait - I think Google has already done everything it is going to do from your original request. There appears to be an error with all except one of the URLs in your request - I think you should find that error if you can, and then make another request.

jdMorgan

5:34 pm on May 5, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, it may be possible to ask Google to fetch a Disallowed page, but why mess with complications? The robots.txt method and the 404 method should (by current Web standards) be mutually exclusive, so simple is probably better.

Jim