
Forum Moderators: Robert Charlton & andy langton & goodroi


How to remove junk pages from google index?

6:13 am on Aug 19, 2006 (gmt 0)

Full Member

10+ Year Member

joined:May 6, 2006
votes: 0

I recently changed the structure of my site.

I changed directories and page names.

Google has many old pages indexed.

I want those junk pages to be removed.

What should I do?

I have already added those directories and pages to my robots.txt file.

What else can I do?

Or should I just wait another week or two?

4:13 pm on Aug 19, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
votes: 0

If the urls are now 404, and you've added them to robots.txt, those urls still may show as supplemental in site: search for many, many months -- and I would suggest letting Google handle them however they want to. Trying to force things can sometimes work against you.
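For anyone who wants to confirm which of the old urls the robots.txt rules actually cover, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt rules, the example.com addresses and the `blocked_urls` helper are all hypothetical stand-ins, since the thread never gives the real directory or page names.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the urls that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Hypothetical rules and urls, for illustration only.
ROBOTS = """\
User-agent: *
Disallow: /old-dir/
Disallow: /old-page.html
"""

OLD_URLS = [
    "http://www.example.com/old-dir/page1.html",
    "http://www.example.com/old-page.html",
    "http://www.example.com/new-dir/page1.html",
]

blocked = blocked_urls(ROBOTS, OLD_URLS)
for url in blocked:
    print("disallowed:", url)
```

Note that this only reports what the rules say. It does not tell you what Google has indexed or how soon the listings will drop out.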

If you are still seeing traffic coming from Google to those old urls, you might consider what they were looking for and actually serve them some relevant content. But I suggest not worrying about seeing a "perfect" site: search result.

And going forward from here, do remember that old saying -- Cool URIs Don't Change [w3.org]. So I'd say take some time right now to consider your new url naming scheme and whether it can handle future growth and change for you without going through another upheaval.

8:46 pm on Aug 20, 2006 (gmt 0)

New User

10+ Year Member

joined:Aug 14, 2006
votes: 0

I had the same problem ... I took over responsibility for a site that was a mess. I rue the day that I re-organised it to be more logical and structured!

In Sitemaps, Google was reporting 404 errors on 29 pages that didn't exist anymore. I was struggling to think why Google was trying to crawl those legacy pages, so I used their "URL Removal" tool.

I've got a few of our obsolete pages listed there. It says:

2006-02-03 01:54:07 GMT :
removal of ......html

But it doesn't stop Google trying to crawl those pages!

Just to check that I haven't got any links lurking in our code, I downloaded the entire site to my PC and did a text search for those URLs ... nothing.
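The same kind of check can be scripted rather than done by hand. This is a rough sketch, assuming the site has already been downloaded to a local folder. The `find_references` helper, the list of file extensions and the example paths are assumptions for illustration, not details from the post.

```python
from pathlib import Path


def find_references(root, needles, suffixes=(".html", ".htm", ".php", ".js", ".css")):
    """Scan text files under root and return (file, line number, needle) for each match."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and file types unlikely to contain links.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for needle in needles:
                if needle in line:
                    hits.append((str(path), lineno, needle))
    return hits


# Example call with hypothetical names:
# for hit in find_references("site-copy", ["/old-dir/page1.html"]):
#     print(hit)
```

An empty result only shows that the downloaded copy has no such links. Links from other sites, or urls Google already knows from earlier crawls, would not show up in this search.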

However, now our complete cache in Google has gone missing!

My conclusion is that although we have no cache showing on Google results, they do have an old cache lying in some murky corner which they are using to try to index our current site.

8:52 pm on Aug 20, 2006 (gmt 0)

Full Member

10+ Year Member

joined:Jan 10, 2003
votes: 0