Forum Moderators: Robert Charlton & goodroi


Are 50,000 404 URLs vs 1,500 Working URLs a Major Problem?

         

Azri

8:11 am on Jan 19, 2015 (gmt 0)

10+ Year Member



We are moving one of our sites to a SharePoint CMS. While the domain will remain the same, all the inner URLs are going to change.

Usually we would implement 301 redirects, at least for important URLs with a lot of traffic or important backlinks, but for various reasons the IT department might not be able to perform the 301 redirects.

This means that, in practice, the upgraded site will have around 1,500 working URLs (status 200) and around 50,000 404 URLs.

Assuming there will be no redirects, do you think this 33-to-1 ratio in favor of the 404 URLs will be a major SEO problem?
Will this just be a fresh (tabula rasa) start from zero, or could the site get penalized or degraded, making it very hard to recover?

Is it enough to have an excellent 404 page?

Thanks in advance
Azri,

bwnbwn

6:03 pm on Jan 19, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would say yes, it is a major problem. What I don't understand is why the URLs can't be redirected. Something doesn't sound right, and I wouldn't settle for "it won't work", because I can't think of a CMS that can't be redirected.

With 50k URLs going to a 404 page and only 1,500 working, your traffic is gonna take a nice big ole dive.

You said it is an upgrade; I would consider the move a downgrade.

Azri

6:23 pm on Jan 19, 2015 (gmt 0)

10+ Year Member



Well, a huge number of the current 50k URLs are actually duplicate URLs, which are constantly being created by our very old current CMS. The number of real pages is much smaller, probably around 1,500-1,700.

Anyway, I don't know why the IT department is arguing about difficulties with implementing redirects; they are not SEO guys, and perhaps they want to avoid dealing with the maintenance of the redirects. I don't believe there is any real technical obstacle to performing redirects.

I hope I can convince them to at least redirect the most important pages.

Still, besides dropping way down and more or less starting from the beginning in terms of search engine visibility, do you think the site might also get tagged with "negative" credit from Google, or do you just need to climb the mountain again and gradually rebuild rankings?

Azri

aakk9999

6:59 pm on Jan 19, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



bwnbwn is right that it is a major problem if you are not redirecting, as you will lose link equity. If many of the URLs are duplicates and do not get traffic, then these can be left as 404s (with a very good custom 404 page).

You do not need to redirect everything; redirecting the important URLs may be enough - which is what you said you had done in the past.

With regard to your developers: I believe SharePoint is a .NET environment. As such, there is a web.config file, and the important redirects could go there or in an external config file referenced from web.config. I believe this is executed before the .NET application runs, so it is configuration, not development.

So you can put all the redirects in a separate config file (for example, you can name it redirectRules.config), which is just a text file located in the root where web.config is. Changes made in that file will not take effect until the application recycles or restarts.

So in web.config you would have:


<system.webServer>
  <rewrite>
    <rules configSource="redirectRules.config" />
  </rewrite>
</system.webServer>


and in redirectRules.config you would have redirects, just for example:


<rules>
  <rule name="rule1" stopProcessing="true">
    <match url="^your-old-url" ignoreCase="true" />
    <conditions>
      <add input="{QUERY_STRING}" pattern="^some-query-string$" />
    </conditions>
    <action type="Redirect" redirectType="Permanent" url="http://www.example.com/your-new-url" appendQueryString="false" />
  </rule>
</rules>


Of course, your redirects can use patterns, etc., just like in .htaccess on Apache.

The only thing to be careful of is the size of the config files, since web.config + redirectRules.config must not be larger than 250 KB (I think). So if you can get the redirects actioned by patterns rather than one by one, that is probably better, as it saves on file size.
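For example, a single pattern rule can fold a whole family of old URLs into their new equivalents (the paths here are just placeholders, not your real structure):


<rules>
  <rule name="old-section-redirect" stopProcessing="true">
    <match url="^old-section/(.*)$" ignoreCase="true" />
    <action type="Redirect" redirectType="Permanent" url="http://www.example.com/new-section/{R:1}" appendQueryString="false" />
  </rule>
</rules>


Here {R:1} carries the captured part of the old path into the new URL, so thousands of URLs can be handled by a few lines of config.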

There is more info here: [ruslany.net...]

Martin Ice Web

7:22 pm on Jan 19, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would get rid of the 50k 404s before switching. Make them 410, or put a canonical tag on the original site. Otherwise Google is going to crawl these old 404s for a long time. Once you have switched to the new system it will be very difficult to make the old pages 410.
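If you are on IIS with the URL Rewrite module (as discussed above for the redirects), the 410 can also be done in config rather than code. A sketch - the path pattern here is just a placeholder for whatever covers your duplicate URLs:


<rule name="gone-duplicates" stopProcessing="true">
  <match url="^old-cms-pages/.*" ignoreCase="true" />
  <action type="CustomResponse" statusCode="410" statusReason="Gone" statusDescription="This page has been permanently removed." />
</rule>


The CustomResponse action returns the status code directly, so no actual 410 handler page has to exist.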
IT department employees who are not familiar with web development should not maintain a webserver.

bwnbwn

7:46 pm on Jan 19, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, duplicate URLs make a BIG difference. Adding that to the original post would have gotten a different answer. If you can't do as Martin Ice Web suggested, then, as aakk9999 suggested, redirect all the dupes to the good URLs.
If it is .NET (I haven't looked), setting up the redirects is easy; all you will need is a good file mapping the old URLs to the current ones.
You can create a map file, then set up the map-file redirect with one rule - just a couple of lines of redirect code in the web.config file. I know because I just got through redirecting 300k.
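Roughly, the map-file setup looks like this (file and rule names here are just examples, not from my actual site). In web.config, inside <system.webServer><rewrite>:


<rewriteMaps configSource="redirectMap.config" />
<rules>
  <rule name="map-redirects" stopProcessing="true">
    <match url=".*" />
    <conditions>
      <add input="{RedirectMap:{REQUEST_URI}}" pattern="(.+)" />
    </conditions>
    <action type="Redirect" redirectType="Permanent" url="{C:1}" appendQueryString="false" />
  </rule>
</rules>


and in redirectMap.config, one line per old/new pair:


<rewriteMaps>
  <rewriteMap name="RedirectMap">
    <add key="/old-page-1" value="/new-page-1" />
    <add key="/old-page-2" value="/new-page-2" />
  </rewriteMap>
</rewriteMaps>


The rule only fires when the requested URI has an entry in the map, and {C:1} is the mapped new URL.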

Azri

7:02 am on Jan 22, 2015 (gmt 0)

10+ Year Member



Martin Ice Web,

Interesting insight!

Does Google really "forget" 410 pages faster than 404 pages?


Should I also serve 410 for not-found pages on the new site going forward?

lucy24

8:59 am on Jan 22, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Does Google really "forget" 410 pages faster than 404 pages?

Google never forgets. But the Googlebot stops crawling --that is, requesting-- URLs a lot sooner if the request meets a 410 instead of a 404. (Source: Direct personal observation.) This is appropriate, since a 410 requires deliberate action on the webmaster's part, while a 404 can happen by accident.

There's no difference in indexing, since either way they've received no content so there's nothing to index. I don't know if anyone has studied whether there's any difference in how fast an URL would be removed from the index if it used to exist and now returns either a 404 or a 410.

I also don't know whether Google sets a whole-site crawl budget, independent of how many URLs they know about for the site. If they do, then a 410 is again to your advantage, because less of the crawl budget will be spent on nonexistent pages.