Forum Moderators: open
One thing is: I used different spellings for the same URLs (index.htm vs. index.HTM). The pages are identical, but Google seems to treat them as different pages. I have now made all links lower case, and I put the wrong spellings in my robots.txt file. Google doesn't spider the excluded pages any more, so the robots.txt syntax seems to be OK. But the duplicate pages still linger in the cache and are probably regarded as duplicate content (all the pages do have PageRank, though).
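The approach above relies on robots.txt path matching being case-sensitive, so `Disallow: /index.HTM` blocks only that exact spelling. That can be sanity-checked with Python's standard robots.txt parser; the host and paths below are made up for illustration:

```python
# Check that a robots.txt Disallow blocks only the exact (wrong-case)
# spelling, leaving the canonical lower-case URL crawlable.
from urllib import robotparser

robots_txt = """\
User-agent: *
Disallow: /index.HTM
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# Path matching is case-sensitive: only the exact spelling is blocked.
print(rp.can_fetch("Googlebot", "http://example.com/index.HTM"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/index.htm"))  # True
```

Note that this only tells you the rule blocks fetching; as discussed below, it says nothing about whether the already-indexed URL gets removed.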
Any idea how I can get the old pages deleted without putting the real pages at risk?
If they were seen as duplicates, then they would just be merged; one of the two URLs would inherit the backlinks and PageRank of the other.
Now that you've unlinked the URLs, they should eventually disappear, but this year unlinked URLs haven't dropped out with the same regularity we were used to previously.
To have them removed, I would suggest removing the /robots.txt exclusion and using a robots meta tag to exclude them instead (e.g. you could use XSSI or similar to insert it conditionally). Then, when the robot visits, it should remove them.
in that case I'll just wait - I thought it was my fault they'd keep the old pages.
What's XSSI (extended server-side includes)?
But if I used the meta tag, I would have to know which spelling the bot used to access the page (e.g. "index.cfm" or "InDex.cfm"). (Which isn't hard to code, but I'm not sure how reliable that info is.)
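The idea of checking the requested spelling server-side can be sketched in a few lines. The thread suggests XSSI for this; here is the same logic as a hypothetical Python helper (the function name and the assumption that the server hands you the path exactly as requested are mine, not from the thread):

```python
# Hypothetical server-side check: emit a noindex robots meta tag
# whenever the requested spelling differs from the canonical
# all-lower-case spelling, so only the duplicates get dropped.
def robots_meta(requested_path: str) -> str:
    canonical = requested_path.lower()
    if requested_path != canonical:
        # Wrong-case duplicate: ask robots to remove it from the index.
        return '<meta name="robots" content="noindex">'
    # Canonical spelling: index as normal.
    return '<meta name="robots" content="index,follow">'

print(robots_meta("/InDex.cfm"))  # noindex variant
print(robots_meta("/index.cfm"))  # index,follow
```

In a CGI-style setup the requested path would typically come from something like the REQUEST_URI environment variable; how faithfully that preserves the bot's original casing is exactly the reliability question raised above.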
BTW, does Googlebot spider pages that no longer have any links to them anyway?
> does Googlebot spider pages that no longer have any links to them anyway?
Normally I think not, but it can happen. For example, some pages that have had no links pointing to them since early this year are still being crawled.
If the URL is /robots.txt excluded, then Google won't request the URL, and so it will never see the META tag robots exclusion.
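This ordering problem can be made concrete with the standard-library parser: while a URL is disallowed, a polite robot never fetches it, so a noindex meta tag on the page is invisible; drop the exclusion and the tag becomes reachable. Host and paths are again made up:

```python
# While a URL is disallowed in robots.txt, a polite crawler never
# fetches it, so any noindex meta tag in its HTML goes unseen.
from urllib import robotparser

def crawler_would_fetch(robots_txt: str, url: str) -> bool:
    """True if a polite robot may fetch the URL (and so could
    read a robots meta tag in the returned HTML)."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch("Googlebot", url)

blocked = "User-agent: *\nDisallow: /index.HTM\n"
allowed = "User-agent: *\nDisallow:\n"  # empty Disallow = allow all

# While excluded, the meta tag can never be read:
print(crawler_would_fetch(blocked, "http://example.com/index.HTM"))  # False
# Remove the exclusion and the robot can fetch the page and act on noindex:
print(crawler_would_fetch(allowed, "http://example.com/index.HTM"))  # True
```

Hence the advice above: lift the /robots.txt exclusion first, then let the meta tag do the removal.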
A /robots.txt exclusion does not ask an engine to remove a URL from its index.
The point is, with all the reasons not to keep a disallowed page in their index, I can't think of one reason why a search engine would want to. And if they did, how would they know when that page no longer exists? A page could stay indexed indefinitely, because the SE would never know when it stopped existing.
I'm not arguing against you, just offering my reasoning. I don't have any hard facts or prior experience to base my thoughts on, but maybe you do. And maybe I misunderstood you in the first place ;)
Unless things have changed recently, Google will list a /robots.txt excluded URL without fetching it (so no title or snippet). This can allow members-only URLs to be listed if people link to them.
Sensitive information that the site doesn't want crawled cannot be protected by /robots.txt, which is merely a mechanism to request polite robots not to fetch those URLs.