Apache Web Server Forum

    
How to add multiple webpage URLs to .htaccess
410 (gone Forever)
contentmaster




msg:4669271
 3:46 pm on May 7, 2014 (gmt 0)

I am migrating from an html website to WordPress. Several sections on the website are no longer relevant. Hence, we've left those pages as is and created all the other pages in WP.

We are now close to migrating the website to WP and want to know how to return a 410 for all the webpages that we don't plan to carry forward to WP and that will eventually be deleted.

FYI: Many webpages are located in the root folder & some others in separate folders. Hence, we're dealing with webpages like these -

http://www.example.net/some-random-page.htm
http://www.example.net/random-folder/some-random-page.htm

Can someone help me with the code I need to add to the .htaccess file and the format for these pages and folders? (My understanding of coding is quite limited)

This is the code we're starting with -

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

Any help will be highly appreciated.

[edited by: Ocean10000 at 4:45 pm (utc) on May 7, 2014]
[edit reason] Fixed URL Displayed [/edit]

 

not2easy




msg:4669289
 4:27 pm on May 7, 2014 (gmt 0)

Just a few questions to help understand the big picture, since that speeds up the help. By any chance, are all the files in any of your folders pages you wish to return a 410 for? Or should I say of the folders that contain files that are being dropped, do they also contain files you want to keep?

Have you noindexed the pages you want to disappear to help remove them from the serps and avoid a lot of 404s? If you currently have entire folders that will not exist after the change, have you looked into removing those directories in GWT? Do you have plenty of time to make these changes or this is needed before Fri?

phranque




msg:4669387
 7:29 am on May 8, 2014 (gmt 0)

it always comes down to whether or not you can create a pattern, or set of patterns, using Perl-compatible regular expressions that is sufficient to include or exclude the set of urls you need to 410, redirect, or rewrite to the WP script.

contentmaster




msg:4669402
 8:18 am on May 8, 2014 (gmt 0)

Or should I say of the folders that contain files that are being dropped, do they also contain files you want to keep?
No, I do not want to keep any of the files within these folders. The entire folder needs to go. However, there are some other pages in the root that need to go.

Have you noindexed the pages you want to disappear to help remove them from the serps and avoid a lot of 404s?
Yes, I'm in the process of adding the following code in the head tag -

<head>
<title></title>
<meta name="robots" content="noindex,follow">
</head>
Is this correct?

If you currently have entire folders that will not exist after the change, have you looked into removing those directories in GWT?
From what I've read so far, the general consensus is that removing multiple directories in GWT should only be done in times of extreme emergency. Is that so?

Do you have plenty of time to make these changes or this is needed before Fri?
I was hoping to get all this done by the 15th of this month. Possible?

* In what order should these tasks be done, and how much of a time gap should be left between them?

- Adding noindex to 50+ pages
- Deleting the pages and adding 410 to .htaccess for these pages
- Removing the folders in GWT
- The actual WP move.

lucy24




msg:4669572
 7:44 pm on May 8, 2014 (gmt 0)

You can do the gwt part first. This gives you a cushion of 90 days to make the permanent changes.

If you delete the pages, there's obviously no point to adding <noindex> or any other content to them. Note that once you've coded to return a 410, it makes absolutely no difference whether the pages physically exist or not.

So I'd say:
#1 gwt
#2 code for 410
#3 physical move, including both wp conversion and physically deleting pages --but, as always, keep a backup somewhere!

not2easy




msg:4669595
 8:48 pm on May 8, 2014 (gmt 0)

Removing directories in GWT means they come out of the index right away, so in that respect you might consider whether that is what you want to do. They have a set of steps for submitting your request that is not terribly intuitive. They will tell you not to disallow crawling in robots.txt, although that may seem like what you want to do to make things go away. It is not difficult or extreme, but it requires following their process.

I migrated a small site to WordPress in 2009 and 301 redirected each page to its equivalent WP page. I thought I did all I was supposed to do, and they are still looking for example.com/oldpage.html and seem surprised to still get a 404. It will happen, don't be alarmed, it is what they do. (That site is doing fine).

contentmaster




msg:4669738
 2:22 pm on May 9, 2014 (gmt 0)

Any ideas for the 410 coding for folders and some static pages in the root? I've tried to search for help on the code but can't find anything specific.

not2easy




msg:4669761
 2:48 pm on May 9, 2014 (gmt 0)

I recently had 450+ pages being replaced and I had noindexed them for sufficient time so I replaced the entire contents of the pages with:
<?php header("HTTP/1.1 410 Gone"); ?>

(This assumes your server is set up to handle PHP in files with an HTML extension, because the replacement pages were saved under the same page names as the originals and simply uploaded over them.) Best practice? Maybe not, but it returns a 410 and I did not see any adverse effects. It did the job I needed to get done without further embiggening my .htaccess file. As I saw them reported in GWT they were deleted, and thanks to the noindex and the 410, they have not caused 404s.

lucy24




msg:4669827
 8:17 pm on May 9, 2014 (gmt 0)

Any ideas for the 410 coding for folders and some static pages in the root? I've tried to search for help on the code but can't find anything specific.

Well, you're not going to find code that spells out your identical filenames ;) The basic pattern is

RewriteRule ^(dir1|dir2|otherdir) - [G]

with opening anchor but no closing anchor. Also leave off the closing directory slash unless there is a risk of ambiguity. Search engines will ask for directories without final slash, even if you've never used the URL anywhere. (I didn't know this until I moved sites a few months ago and put a close watch on redirects. I think it's because they have no way of knowing whether it's a physical directory.)
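
To illustrate the point about ambiguity, here is a minimal sketch rather than a drop-in rule. The names dir1 and dir2 are the same placeholders used in the pattern above. Without a boundary after the name, ^(dir1|dir2) also matches unrelated URLs such as /dir10/page.htm or /dir1-notes.htm. Adding (/|$) limits the match to the directory itself, with or without its trailing slash, plus everything beneath it.

```apache
# Placeholder names: substitute the real folder names.
# Matches /dir1, /dir1/ and /dir1/anything,
# but not /dir10/ or /dir1-notes.htm
RewriteRule ^(dir1|dir2)(/|$) - [G]
```

If none of the remaining URLs start with the same characters as a retired folder, the shorter form without the boundary should behave the same.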

<topic drift>
You said at the outset that you're moving to WordPress. That means you'll be using mod_rewrite. Check your existing htaccess and make sure you don't have any redirects using mod_alias (directives literally named Redirect or RedirectMatch). If there are any, they'll need to be converted to mod_rewrite syntax.
</topic drift>
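
As a rough sketch of that conversion, assume a hypothetical old redirect from /old-page.htm to /new-page/ (neither URL comes from this thread; both are made up for illustration):

```apache
# Before (mod_alias), hypothetical example:
#   Redirect 301 /old-page.htm http://www.example.net/new-page/

# After (mod_rewrite). In .htaccess the pattern has no leading slash.
RewriteRule ^old-page\.htm$ http://www.example.net/new-page/ [R=301,L]
```

Redirect matches by path prefix, while this rule matches only the exact file name, so each converted line is worth checking against the URLs it is supposed to catch.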

contentmaster




msg:4670002
 4:36 pm on May 10, 2014 (gmt 0)

Well, you're not going to find code that spells out your identical filenames ;) The basic pattern is RewriteRule ^(dir1|dir2|otherdir) - [G]


:) This is good enough. Thanks but what about individual .htm file names? How does one add those?

lucy24




msg:4670083
 8:05 pm on May 10, 2014 (gmt 0)

It depends on exactly how many URLs are involved and where they're located. If the isolated pages are at the root, and there are only a few of them, you might shove it all into one pattern:

RewriteRule ^(dir1|dir2|dir3|(page1|page2)\.html) - [G]

But if the isolated pages are inside various other directories, it will probably be easier to make separate rules. These are simple, conditionless rules:

RewriteRule ^dir4/(page1|page2)\.html - [G]

RewriteRule ^dir5/(page3|page4)\.html - [G]

The important thing is to put all your new rules before the parts of htaccess that are specific to WordPress. Opening anchors may or may not be necessary for proper rule execution*; they definitely help the server because it can see instantly if a request will match the pattern.


* When I moved sites, I initially came to some grief because I hadn't realized just how many different subdirectories I have with rats/ in the name :(
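
Putting the pieces of this thread together, the finished file might look roughly like the sketch below. It is an illustration under stated assumptions, not a tested configuration: dir1, dir2, dir4, page1 and page2 are placeholder names, the .htm extension is taken from the example URLs in the first post, and the WordPress block is copied unchanged from that post.

```apache
# Placeholder names throughout: substitute the real folders and pages.
RewriteEngine On

# Whole folders that are going away: 410 Gone for the folder
# and everything under it.
RewriteRule ^(dir1|dir2)(/|$) - [G]

# Individual retired pages in the root.
RewriteRule ^(page1|page2)\.htm$ - [G]

# Retired pages inside a folder that otherwise stays.
RewriteRule ^dir4/(page1|page2)\.htm$ - [G]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

The 410 rules sit outside the BEGIN/END markers on purpose: WordPress can regenerate whatever is between those markers, so custom rules placed inside may be lost.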

contentmaster




msg:4670362
 3:47 am on May 12, 2014 (gmt 0)

Thanks lucy24, not2easy. I get to work now!

contentmaster




msg:4674326
 9:08 am on May 25, 2014 (gmt 0)

A quick update -
All the pages that I don't want to carry forward to the new website have a noindex code (code inserted a week back). GWT shows the status as Removed for all the pages / folders I added to Remove URL.

Is this the right time to return a 410, move to WP & delete old pages?

not2easy




msg:4674356
 3:32 pm on May 25, 2014 (gmt 0)

Yes, the new WP pages should be in place and your robots.txt edited for the new sitemaps.

Just so you know:
When you use "Remove URLs/Directories" in GWT, Google will still come back looking for them in case you changed your mind. They aren't gone "forever" until Google crawls the 410s (and maybe not even then?).

g1smd




msg:4674377
 7:52 pm on May 25, 2014 (gmt 0)

When a URL returns 410 Gone several times in a row, Google assigns a lower crawl priority to it. They will still come looking for the page almost forever but it might only be at a rate of once or twice per year.

not2easy




msg:4674386
 9:09 pm on May 25, 2014 (gmt 0)

Thanks, g1smd, that's why I mentioned it - wouldn't want the OP to stop serving the 410s added previously until they have shown up in GWT. The URLs removed by requesting URL removal are only a temporary fix. I found out the hard way.

contentmaster




msg:4674939
 3:32 pm on May 27, 2014 (gmt 0)

Everything in place. Sitting back and admiring the 'new' and improved website. Couldn't have done it without all the help from members here. Thank you once again.

(Waiting for the dust to settle.)

not2easy




msg:4674949
 3:51 pm on May 27, 2014 (gmt 0)

Dust settling can take some time. I am just beginning to see new activity on a new directory that replaced some long-term pages over 5 months ago. They can move with all deliberate speed.
