I want to remove one of our domains from Google because G has indexed pages (supplemental) that are almost identical to pages on our new site on another domain. I fear this duplication may be hurting the new site's ranking. We'll probably reuse that domain name within the next year for a new product line.
We've had robots.txt set to disallow crawling of everything on that site since last spring or summer, but it's still in the G index.
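For reference, a blanket-disallow robots.txt like the one described would look something like this (a minimal sketch; the actual file on the site is assumed, not quoted):

```
# robots.txt at the domain root -- tells all well-behaved bots
# not to crawl any path on this site
User-agent: *
Disallow: /
```

Note that this blocks crawling, not indexing, which is part of why URLs that are already indexed (or that have inbound links) can linger in the results.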
The G removal tool says, "Enter the URL of your page. We will accept your request only if the page no longer exists on the web."
What does "no longer exists on the web" mean? That domain really does exist on the web, that is, you can go to it. Does the statement mean that it won't remove the site if the domain is still in the DNS?
Neither robots.txt nor the removal tool is a reliable, safe way to deal with the problem, particularly if links to the pages in question exist elsewhere.
If you want to use the robots removal option, then you instead feed Google the URL of a valid robots.txt file for it to process.
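Before feeding the removal tool a robots.txt URL, it's worth sanity-checking that the file actually parses and disallows what you think it does. A minimal sketch using Python's standard-library urllib.robotparser (the rules shown are an assumption, not the poster's actual file):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical blanket-disallow rules, as they might appear in robots.txt
robots_txt = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Googlebot should be blocked from every URL on the site
print(rp.can_fetch("Googlebot", "http://example.com/old-page.html"))  # → False
```

RobotFileParser can also fetch a live file via set_url() and read(), so the same check works against the deployed robots.txt rather than a local string.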
In any case, these processes do not properly deal with results that are already tagged as Supplemental in Google's SERP.
I don't understand why G just doesn't obey the robots.txt and sitemaps files. It would certainly make their life easier and keep the index up to date if they did.
The pages have been restricted in robots.txt for almost a year, yet Google refuses to omit them.
I'll submit a sitemap to them with nothing in it and see what happens.
Okay, I submitted the robots.txt to be re-spidered by G.
I was just at G Sitemaps and registered a sitemap for that URL with zero URLs.
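For anyone trying the same experiment: a sitemap with zero URLs is just the container element with no url entries. Something like this should validate against the sitemap protocol (the file name and location are assumptions):

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
</urlset>
```

Whether Google treats an empty sitemap as a removal signal is exactly what this test will show; the protocol itself only defines what a sitemap lists, not what its omissions mean.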
Let's see if these work.
My guess is that they won't remove pages already indexed, cached, or in the supplementals. Why? In the past, I submitted a Sitemap that omitted old pages; months later they still existed somewhere in the indexes.
This option also allows you to create a dedicated robots.txt-format file at some address other than the standard root robots.txt -- note Google's wording: "Your robots.txt file need not be in the root directory".
I never tried that, but I can imagine how someone might want it in some situations. However, in your situation you never want bots to spider your dev server at all, so you probably do want to use the standard root robots.txt for the URL removal.
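To illustrate the non-root option mentioned above: the removal tool will reportedly accept a robots.txt-format file living anywhere on the site, so a one-off removal file could be kept separate from the live root rules. A sketch (the path and URLs are hypothetical):

```
# Hypothetical file at http://example.com/removals/removal-rules.txt,
# submitted to the removal tool instead of the root robots.txt
User-agent: Googlebot
Disallow: /old-catalog/
Disallow: /duplicate-page.html
```

But as noted, for a site you never want crawled at all, a blanket Disallow in the standard root robots.txt is the simpler route.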