Just a few questions to help me understand the big picture; it speeds up the help. By any chance, do any of your folders consist entirely of pages you want to return a 410 for? Or, to put it another way: do the folders that contain files being dropped also contain files you want to keep?
Have you noindexed the pages you want to disappear, to help remove them from the SERPs and avoid a lot of 404s? If you currently have entire folders that will not exist after the change, have you looked into removing those directories in GWT? Do you have plenty of time to make these changes, or is this needed before Fri?
It always comes down to whether or not you can create a pattern, or set of patterns, using Perl-compatible regular expressions that is sufficient to include or exclude the set of URLs you need to 410, redirect, or rewrite to the WP script.
|Or should I say of the folders that contain files that are being dropped, do they also contain files you want to keep? |
No, I do not want to keep any of the files within these folders. The entire folder needs to go. However, there are also some other pages in the root that need to go.
|Have you noindexed the pages you want to disappear to help remove them from the serps and avoid a lot of 404s? |
Yes, I'm in the process of adding the following code to the head section of each page:
<meta name="robots" content="noindex,follow">
Is this correct?
|If you currently have entire folders that will not exist after the change, have you looked into removing those directories in GWT? |
From what I've read so far, the general consensus is that removing multiple directories in GWT should only be done in an extreme emergency. Is that so?
|Do you have plenty of time to make these changes or this is needed before Fri? |
I was hoping to get all this done by the 15th of this month. Possible?
* In what order should these tasks be done, and how much of a gap should I leave between them?
- Adding noindex to 50+ pages
- Deleting the pages and adding 410 to .htaccess for these pages
- Removing the folders in GWT
- The actual WP move.
You can do the GWT part first. This gives you a cushion of 90 days to make the permanent changes.
If you delete the pages, there's obviously no point in adding a noindex meta tag or any other content to them. Note that once you've added the code to return a 410, it makes absolutely no difference whether the pages physically exist or not.
So I'd say:
#1 GWT removal (as above)
#2 code for 410
#3 physical move, including both the WP conversion and physically deleting pages (as always, keep a backup somewhere!)
Removing directories in GWT means they come out of the index right away, so in that respect you might consider whether that is really what you want to do. Their steps for submitting a request are not terribly intuitive. They will tell you not to disallow crawling in robots.txt, although that may seem like the obvious way to make things go away. It is not difficult or extreme, but it requires following their process.
I migrated a small site to WordPress in 2009 and 301-redirected each page to its equivalent WP page. I thought I had done everything I was supposed to, yet they are still requesting example.com/oldpage.html and seem surprised to get a 404. It will happen; don't be alarmed, it is what they do. (That site is doing fine.)
Any ideas for the 410 coding for folders and some static pages in the root? I've tried to search for help on the code but can't find anything specific.
I recently had 450+ pages being replaced. I had noindexed them for a sufficient time, so I replaced the entire contents of each page with:
<?php header("HTTP/1.1 410 Gone"); ?>
(This assumes the server is set up to parse .html files as PHP, because the pages were saved under the same filenames as the originals and simply uploaded to replace them all.) Best practice? Maybe not, but it returns a 410 and I did not see any adverse effects. It did the job I needed done without further embiggening my .htaccess file. GWT reported the pages as removed, and thanks to the noindex plus the 410, they have not caused 404s.
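If the server does not already hand .htm/.html files to PHP, that is usually switched on from .htaccess. This is a hedged sketch only: the right directive and handler name depend on how your host runs PHP (mod_php, CGI, FastCGI/PHP-FPM), so treat the handler name below as an assumption to confirm with your host.

```apache
# Assumption: Apache with mod_php, where "application/x-httpd-php" is a valid handler.
# Asks PHP to process .htm and .html files, so a stub page such as
# <?php header("HTTP/1.1 410 Gone"); ?> is executed instead of sent as plain text.
AddHandler application/x-httpd-php .htm .html
```

On PHP-FPM or suPHP setups the equivalent line looks different, which is one reason the pure mod_rewrite approach is often simpler.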
|Any ideas for the 410 coding for folders and some static pages in the root? I've tried to search for help on the code but can't find anything specific. |
Well, you're not going to find code that spells out your identical filenames ;) The basic pattern is
RewriteRule ^(dir1|dir2|otherdir) - [G]
with an opening anchor but no closing anchor. Also leave off the closing directory slash unless there is a risk of ambiguity. Search engines will ask for directories without the final slash, even if you've never used that URL anywhere. (I didn't know this until I moved sites a few months ago and kept a close watch on redirects. I think it's because they have no way of knowing whether it's a physical directory.)
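A minimal .htaccess sketch of that basic pattern, assuming three hypothetical top-level directories named olddir1, olddir2 and olddir3:

```apache
RewriteEngine On

# Return "410 Gone" for anything under these retired top-level directories.
# In .htaccess the leading slash is stripped before matching, so ^olddir1
# matches /olddir1, /olddir1/ and /olddir1/page.htm alike.
# The [G] flag sends 410 and stops processing; "-" means no substitution.
RewriteRule ^(olddir1|olddir2|olddir3) - [G]
```

Because there is no closing anchor, this would also catch a name like /olddir10. If such a collision is possible, ^(olddir1|olddir2|olddir3)(/|$) still covers the slashless request while excluding longer names.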
You said at the outset that you're moving to WordPress. That means you'll be using mod_rewrite. Check your existing htaccess and make sure you don't have any redirects using mod_alias (any directive whose name starts with Redirect, such as Redirect or RedirectMatch). If there are any, they'll need to be converted to mod_rewrite syntax.
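For example, here is a hypothetical mod_alias redirect and one possible mod_rewrite equivalent. The page names and domain are placeholders. The two modules run independently, so their order in the file does not control which one acts first; keeping everything in mod_rewrite avoids that problem.

```apache
# Before (mod_alias), shown commented out:
# Redirect 301 /oldpage.html http://www.example.com/new-page/

# After (mod_rewrite): the same permanent redirect as a RewriteRule.
# The dot is escaped so it matches a literal "." only.
RewriteEngine On
RewriteRule ^oldpage\.html$ http://www.example.com/new-page/ [R=301,L]
```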
|Well, you're not going to find code that spells out your identical filenames ;) The basic pattern is RewriteRule ^(dir1|dir2|otherdir) - [G] |
:) This is good enough, thanks. But what about individual .htm file names? How does one add those?
It depends on exactly how many URLs are involved and where they're located. If the isolated pages are at the root, and there are only a few of them, you might shove it all into one pattern:
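A sketch of such a single rule, assuming three hypothetical root-level pages named oldpage1.htm, oldpage2.htm and contact-old.htm:

```apache
RewriteEngine On

# 410 Gone for a handful of retired pages in the site root.
# The closing anchor ($) limits the match to exactly these filenames,
# and \. matches a literal dot rather than any character.
RewriteRule ^(oldpage1|oldpage2|contact-old)\.htm$ - [G]
```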
But if the isolated pages are inside various other directories, it will probably be easier to make separate rules. These are simple, conditionless rules:
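A sketch of that approach, with one rule per page; all paths here are hypothetical placeholders:

```apache
RewriteEngine On

# One conditionless rule per retired page, each anchored at both ends
# so it matches only that exact URL path.
RewriteRule ^products/widget-old\.htm$ - [G]
RewriteRule ^articles/2008/news\.htm$ - [G]
RewriteRule ^help/faq-old\.htm$ - [G]
```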
The important thing is to put all your new rules before the parts of htaccess that are specific to WordPress. Opening anchors may or may not be necessary for proper rule execution*; they definitely help the server because it can see instantly if a request will match the pattern.
* When I moved sites, I initially came to some grief because I hadn't realized just how many different subdirectories I have with rats/ in the name :(
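A sketch of the overall layout, assuming the standard block that WordPress writes into .htaccess and a hypothetical retired directory named rats. The 410 rule comes first so it is tested before WordPress's catch-all rule sends every request that isn't a real file or directory to index.php. The anchors illustrate the footnote above.

```apache
RewriteEngine On

# Site-specific rules: keep these ABOVE the WordPress block.
# Anchored: matches /rats and anything under /rats/, but not /ratsnest/.
# Without the ^, the pattern would also match /brats/ or /pets/rats/.
RewriteRule ^rats(/|$) - [G]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

WordPress may rewrite the lines between its BEGIN/END markers when permalinks are saved, which is another reason to keep your own rules outside that block.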
Thanks, lucy24 and not2easy. I'll get to work now!
A quick update -
All the pages that I don't want to carry forward to the new website have a noindex tag (added a week ago). GWT shows the status as Removed for all the pages and folders I submitted under Remove URLs.
Is this the right time to return a 410, move to WP & delete old pages?
Yes, the new WP pages should be in place and your robots.txt edited for the new sitemaps.
Just so you know:
When you use "Remove URLs/Directories" in GWT, they will come back looking for them in case you changed your mind; they aren't gone "forever" until Google crawls the 410s (and maybe not even then?).
When a URL returns 410 Gone several times in a row, Google assigns a lower crawl priority to it. They will still come looking for the page almost forever but it might only be at a rate of once or twice per year.
Thanks, g1smd, that's why I mentioned it: I wouldn't want the OP to stop serving the 410s added previously until they have shown up in GWT. URLs removed through a URL removal request are only a temporary fix. I found that out the hard way.
Everything in place. Sitting back and admiring the 'new' and improved website. Couldn't have done it without all the help from members here. Thank you once again.
(Waiting for the dust to settle.)
Dust settling can take some time; I am only now beginning to see new activity on a new directory that replaced some long-standing pages over 5 months ago. They can move with all deliberate speed.