Forum Moderators: Robert Charlton & goodroi

Removing Deleted Pages From Google’s Index

         

austtr

5:36 am on Nov 7, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi … seeking some opinions about the best way to remove deleted pages from Google’s index.

An old .htm format site has been upgraded to a new Joomla 3 .php format site. The old .htm files have been totally removed from the server. Gone… they no longer exist.

However, even though there is a new sitemap.xml, and a .htaccess handling all the 301 redirects, Google is still showing the old .htm pages in its index along with the new .php pages. Everything I read says this could carry on for quite some time.

What is the best way to get the old .htm pages removed from the index? As I understand it there are three options.

1) Go to GSC and use the “Remove Outdated Content” tool (https://www.google.com/webmasters/tools/removals?pli=1)
2) Tell Google the .htm pages are gone with a 410 statement in the .htaccess
3) Do nothing. Just wait for Google to drop the old .htm entries from the index in its own good time.

Appreciate any and all suggestions.

lucy24

6:23 am on Nov 7, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Tell Google the .htm pages are gone with a 410 statement in the .htaccess"

Yikes. But they're not gone, are they? I thought you meant a one-for-one oldpage.htm >> newpage.php as implied by the reference to redirecting. Then the last thing you'd want to do is claim a page is gone when it has simply moved. What happens to all that tasty link juice the old URLs have been accumulating?

Some people would at this point be saying that you can prevent all this trouble by going extensionless. (Not me, because extensionless URLs give me the fantods.) Or, for that matter, by keeping the old .htm URLs and quietly rewriting to the .php equivalent.

"quite some time"

Well, for a given definition of "some". When I moved sites a couple of years ago, the pages Google liked best were re-indexed at the new URL within days, without doing a change-of-address in WMT. And that's a teeny little site that nobody ever visits. I don't know how long it took for the whole site; I stopped checking after a while.

"even though there is a new sitemap.xml"

Sitemaps are inclusive, not exclusive. You're telling the search engine "don't overlook suchandsuch", not "crawl only suchandsuch".
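
Since a sitemap only adds to what gets crawled, it is still worth confirming that the new one does not list any of the old .htm URLs. A minimal Python sketch of that check, assuming a standard sitemaps.org-format file; the function name and the example.com URLs are made up for illustration:

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlsplit

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def old_urls_in_sitemap(sitemap_xml: str, old_suffix: str = ".htm") -> List[str]:
    """Return every <loc> entry whose path still ends with the old suffix."""
    root = ET.fromstring(sitemap_xml)
    leftovers = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        # Compare the path only, so query strings don't hide a match.
        if urlsplit(url).path.lower().endswith(old_suffix):
            leftovers.append(url)
    return leftovers


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/about.php</loc></url>"
        "<url><loc>https://example.com/about.htm</loc></url>"
        "</urlset>"
    )
    print(old_urls_in_sitemap(sample))
```

An empty list means the sitemap only names new URLs; anything it returns is an old URL being re-advertised to the crawler.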

Robert Charlton

9:09 am on Nov 7, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If I read the OP correctly, the 301s have been done, and that should take care of the situation eventually....
"However, even though there is a new sitemap.xml, and a .htaccess handling all the 301 redirects, Google is still showing the old .htm pages in its index along with the new .php pages. Everything I read says this could carry on for quite some time."

Several questions...

- how long ago did you apply the redirects?
- what is returned when you enter the old urls?
- have you checked the redirects with a server header checker?
- what are you listing on your sitemap?

To double-check various things...

Your sitemap should show the canonical form of your new urls only.

When you type in your old urls and hit enter, you should see the new ones.

When you check the old urls in a server header checker, you should see the 301 redirects, and the new urls should return a 200 OK response.

When you check the new urls in a server header checker, you should get 200 OK responses.
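
If you would rather script those header checks than paste URLs into an online checker, here is a rough Python sketch. It is only an illustration under some assumptions: the function names are made up, the server is assumed to answer HEAD requests sensibly, and the URLs you pass in are your own old/new pairs. The fetcher is passed in as a parameter so the logic can be tried without touching the network.

```python
import http.client
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

Fetch = Callable[[str], Tuple[int, Optional[str]]]
REDIRECT_CODES = (301, 302, 303, 307, 308)


def head_fetch(url: str) -> Tuple[int, Optional[str]]:
    """Send one HEAD request and return (status, Location) without following it."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace(url: str, fetch: Fetch = head_fetch, max_hops: int = 5) -> List[Tuple[str, int]]:
    """Follow redirects by hand and return [(url, status), ...] for each request."""
    hops = []
    for _ in range(max_hops + 1):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # Location may be relative
    return hops


def check_redirect(old_url: str, new_url: str, fetch: Fetch = head_fetch) -> List[str]:
    """Return a list of problems; an empty list means the redirect looks clean."""
    hops = trace(old_url, fetch)
    problems = []
    if hops[0][1] != 301:
        problems.append(f"{old_url} returned {hops[0][1]}, expected 301")
    if len(hops) > 2:
        problems.append(f"{len(hops) - 1} redirect hops, expected 1")
    final_url, final_status = hops[-1]
    if final_url != new_url:
        problems.append(f"ended at {final_url}, expected {new_url}")
    if final_status != 200:
        problems.append(f"final response was {final_status}, expected 200")
    return problems
```

Calling check_redirect() with an old .htm URL and its new counterpart should give an empty list when the old URL answers 301, lands on the new URL in a single hop, and the new URL answers 200.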

Regarding the question of Google dropping the old .htm pages, here's a good reference thread that involves redirects that include a domain change. An extension change without a domain change would be quicker, but the old extensions might show in the serps for a while....

Domain 301 Redirect - How Long to Change the Index?
Sept 2012
https://www.webmasterworld.com/google/4499653.htm

I would consider what g1smd says here gospel...
"New one should appear in days.

Old one can take months to drop out. This is not a problem."

Do check your page code to make sure that all your nav links have been updated.

Note that what's in Google may hang around for a while. If everything else is OK, that shouldn't be a problem.

Note also that Google will periodically come back to check for the old urls for years. As long as your redirects are working properly, that should not be a concern.

Do not use the Remove Outdated Content option in the Google Search Console. Do not use 410s.

If you're sure your redirects are working, and that there are no extra hops in the redirects, just do nothing....
"3) Do nothing. Just wait for Google to drop the old .htm entries from the index in its own good time."

Actually, you might also want to check your backlinks, and ask some of the sites you know that link to you to change the old urls if they can.

tangor

10:33 am on Nov 7, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The major search engines will revisit previously found urls for years to come, so that should not be a great concern as long as the redirects (which should stay forever) are in place.

Use 410 only for DELETED GONE FOREVER content.

Like lucy24 I have a slight horror of extensionless. :)

The OP's premise indicates the content changed from static pages to a CMS, which means the content is NOT GONE. Verify all 301 redirects (keep them forever) and just give it time.

Just know that all search engines are like bulldogs. Once they bite a URL they never let go... FOREVER. (would hate to pay their support and hardware bills!)

Meanwhile, get used to 404 appearing in logs and reports. Won't hurt you, just complicates things.

Robert Charlton

7:25 pm on Nov 7, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



To phrase this in yet another way... one distinction that should be re-emphasized in this particular discussion...

...while the OP's question is about "Removing Deleted Pages From Google’s Index"...

...the situation is actually about "Removing Redirected Pages From Google's index".

When you redirect old-url to new-url with a 301, old-url is replaced by new-url. As tedster notes in the thread referenced above...
"If the 301 redirect is working properly, then that's all any user agent will see - including googlebot. It doesn't matter if the original file is still sitting around somewhere, no one can access it."

Using 410s is the preferable way of telling Google to remove a page from the index when the page has simply been removed and has no replacement. In this case, the old pages/urls have been redirected, not removed, so 410s don't apply.
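
To make the 301-versus-410 distinction concrete, here is a small Python sketch that writes out mod_alias lines for a list of old paths: a 301 where a replacement exists, and a 410 only where the page is truly gone. The function name, the example paths and the www.example.com origin are all made up for illustration, and it assumes the redirects are done with plain Redirect directives rather than mod_rewrite.

```python
from typing import List, Mapping, Optional


def htaccess_lines(
    pages: Mapping[str, Optional[str]],
    origin: str = "https://www.example.com",  # hypothetical site origin
) -> List[str]:
    """Map each old path to a mod_alias directive.

    pages maps an old URL-path to its new URL-path, or to None when the
    page was deleted outright and has no replacement.
    """
    lines = []
    for old_path, new_path in pages.items():
        if new_path is None:
            # Deleted for good: answer 410 Gone, no target URL.
            lines.append(f"Redirect gone {old_path}")
        else:
            # Moved: permanent redirect to the new location.
            lines.append(f"Redirect 301 {old_path} {origin}{new_path}")
    return lines


if __name__ == "__main__":
    example = {"/about.htm": "/about.php", "/promo-2009.htm": None}
    print("\n".join(htaccess_lines(example)))
```

In the OP's situation every old page has a new equivalent, so the output would be nothing but Redirect 301 lines.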

"Meanwhile, get used to 404 appearing in logs and reports. Won't hurt you, just complicates things."
Under what circumstances/searches are you seeing 404s? While Google queries for the old urls might show those old urls in Google, clicking on these results in the serps should take you to the new destination urls.

Additionally, in this situation, if the redirect is working, requests for the old urls should not be returning 404s.

austtr

2:02 am on Nov 8, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks everybody......

I think I'll keep it simple.... just go with the 301 redirects and leave Google to update the index in its own good time.

Rasputin

4:48 pm on Nov 8, 2015 (gmt 0)

10+ Year Member



austtr, can I ask a question... although joomla of course uses php, the .php file suffix is not standard in joomla and takes a bit of effort to set up, usually by using an external joomla component to control urls or perhaps through htaccess?
Have you found an easy way to do it in more recent joomla versions?
Cheers