
Removing a large set of pages from Google's index

   
11:38 am on Mar 11, 2014 (gmt 0)

10+ Year Member



I changed the publishing platform of a fairly large blog from Movable Type to WordPress.

Movable Type paginated using query strings:
mywebsite.com/index.php?page=23

WordPress, however, paginates with path segments:
mywebsite.com/page/23/

Because of this, Google crawled thousands of pages that combine both schemes:
mywebsite.com/page/23/?page=1
mywebsite.com/page/23/?page=23

To get rid of these pages, I created a rule in .htaccess that returns a 404 for every URL with a "page" parameter in the query string.

I can see a lot of crawl errors for these URLs in Webmaster Tools. Now I want to get these pages removed from Google's index.
So what should I do: mark them as fixed so Google crawls them again and eventually drops them, or just ignore the errors and let them drop out on their own?
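
For readers following along, a 404 rule like the one described above might look roughly like this. It is only a sketch, assuming Apache with mod_rewrite enabled and an .htaccess file in the site root; the poster's actual rule is not shown in the thread.

```apache
RewriteEngine On

# Assumption: these lines sit above the "# BEGIN WordPress" block so they run first.
# Match any request whose query string contains a page= parameter.
RewriteCond %{QUERY_STRING} (^|&)page= [NC]
# "-" means no substitution; R=404 answers with 404 Not Found and stops rewriting.
RewriteRule ^ - [R=404]
```

Clean URLs such as mywebsite.com/page/23/ carry no query string, so the condition should not match them and they would fall through to WordPress as usual.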
12:51 pm on Mar 11, 2014 (gmt 0)

phranque (WebmasterWorld Administrator)



I would use mod_rewrite to 301 redirect all requests with a page= parameter in the query string to the canonical WordPress URL.
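
For reference, a minimal .htaccess sketch of that kind of redirect, assuming Apache with mod_rewrite enabled, a blog installed at the site root, and a page= value that is always numeric (none of which the thread confirms). It sends the old Movable Type style URL to the WordPress path and drops the query string:

```apache
RewriteEngine On

# Assumption: these lines sit above the "# BEGIN WordPress" block so they run first.
# Capture the number from a page= parameter anywhere in the query string.
RewriteCond %{QUERY_STRING} (^|&)page=([0-9]+)(&|$)
# Match the site root or index.php; %2 is the number captured by the condition.
# The trailing "?" in the target discards the original query string.
RewriteRule ^(index\.php)?$ /page/%2/? [R=301,L]
```

With this in place, a request for mywebsite.com/index.php?page=23 should get a 301 to mywebsite.com/page/23/.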
1:01 pm on Mar 11, 2014 (gmt 0)

10+ Year Member



You mean redirecting mywebsite.com/?page=12 to mywebsite.com/page/12/ ?

Well, I did that for a year with .htaccess, but a site: search on Google still returned results with the query string in the URL.

So I resorted to a 404.
2:31 pm on Mar 11, 2014 (gmt 0)

WebmasterWorld Senior Member Top Contributors Of The Month



I think phranque means redirect:

mywebsite.com/page/23/?page=23

to

mywebsite.com/page/23/

And you can't remove redirects after a year, or 5, or 10, and hope any major search engine will just stop spidering the old URLs. URLs are often reverted or reused after a 301 is put in place, so the engines keep checking periodically to make sure the redirect is still there.
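
The /page/23/?page=23 to /page/23/ redirect at the top of this post could be sketched in .htaccess as follows. This assumes Apache with mod_rewrite and a blog at the site root, and the exact patterns are illustrative rather than taken from anyone's live config:

```apache
RewriteEngine On

# Assumption: these lines sit above the "# BEGIN WordPress" block so they run first.
# Only act when the query string carries a numeric page= parameter.
RewriteCond %{QUERY_STRING} (^|&)page=[0-9]+(&|$)
# Keep the page number from the path ($1) and ignore the one in the query string.
# The trailing "?" in the target strips the query string from the redirect.
RewriteRule ^page/([0-9]+)/?$ /page/$1/? [R=301,L]
```

Note that this drops the whole query string, so any other parameters on those URLs would be lost as well.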
2:32 pm on Mar 11, 2014 (gmt 0)

5+ Year Member



Removing pages from Google takes a very long time. I've had thousands of pages set to noindex, and two months later they are still in the index.
1:41 pm on Mar 12, 2014 (gmt 0)



It is difficult to remove pages from Google's index. Googlebot remembers every URL and never forgets.
 
