Cleaning URL database

         

andreicomoti

2:59 pm on Nov 25, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



Hello SEOs,

I have a couple of questions for you.

So I have a large website with over 1.5M URLs. Some of these are up and running, but 70% of them are previous versions of current URLs that now either redirect to the latest version or return 404. I also have redirect chains in this database, like:

Version A -> Version B -> Version C -> Version D -> Version E

I have read that redirect chains can affect SEO if, for example, Version A has some backlinks to it. This I will solve with a tool that will make all versions point to E directly, eliminating the redirect chain.

My questions are:

1. If I want to clean old URLs out of this database, what is the best approach, and what should I consider before deleting them?
2. Does Google still crawl and visit permanently redirected URLs? What is the risk if I delete all of them? I should mention that these URLs were created a few years ago, maybe 5 years or more.
3. Is there any risk if I delete the old 404 URLs? Should I check anything else here?

My concerns are with backlinks and losing traffic from mentions on other websites.

I have a Lite account on Ahrefs, but it limits me to 200 URLs, so it would take a long time to check nearly 700K URLs for backlinks manually. Do you know how I can check them in batches of at least 100K?

Thanks in advance!

NickMNS

4:24 pm on Nov 25, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Regarding redirects.
Instead of A -> B -> C -> D -> E
you can simply use A -> E, B -> E, C -> E, D -> E, which eliminates the chain.
Then the question becomes whether there is any value in keeping these backlinks. My guess would be no.
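
A minimal sketch of that flattening step, assuming the redirects can be exported as a plain mapping from old URL to redirect target. The function name and the example paths are made up for illustration, and a redirect loop is reported as `None` because it has no final destination:

```python
def flatten_redirects(redirects):
    """Map every redirecting URL straight to its final destination."""
    flattened = {}
    for start in redirects:
        seen = {start}
        target = redirects[start]
        # Follow the chain until we reach a URL that no longer redirects.
        while target in redirects:
            if target in seen:
                # Redirect loop: there is no final destination to point at.
                target = None
                break
            seen.add(target)
            target = redirects[target]
        flattened[start] = target
    return flattened


# Hypothetical chain: A -> B -> C -> D -> E
chain = {"/a": "/b", "/b": "/c", "/c": "/d", "/d": "/e"}
print(flatten_redirects(chain))
```

Every key in the result can then be written out as a single direct redirect rule.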

1. If I want to clean old URLs out of this database, what is the best approach, and what should I consider before deleting them?

My question to you is: why would you want to delete the page and its content? If the content has value and people are linking to it, then you should probably keep it. On the other hand, if the page has no value, then delete it, URL and all. If the page has been updated for some legitimate reason, the redirect is probably warranted.

2. Does Google still crawl and visit permanently redirected URLs? What is the risk if I delete all of them? I should mention that these URLs were created a few years ago, maybe 5 years or more.
Google has a long memory and will continue to attempt to crawl all pages it knows about, even if a page was deleted a long time ago. The crawl frequency drops with time. Ultimately you shouldn't really care what Google crawls: if a page needs to be deleted or redirected, you delete or redirect it regardless of Google. Otherwise you will be managing URLs that provide no value to you, your users or Google. Those URLs also consume your crawl budget, so they come at an indirect cost.

3. Is there any risk if I delete the old 404 URLs? Should I check anything else here?

If you are returning a 404 HTTP status code, then as far as Google is concerned the URL is already deleted. If, on the other hand, you are returning a status 200 but showing a "404" message, then you have a problem. The solution to that problem is simple: delete the URL, and your server should then return 404.
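
A rough way to spot that second case (a 200 response that only looks like a 404) is to compare the status code with the page text. This is only a sketch: the phrases in the pattern are assumptions about what an error template might say, the function name is made up, and a real page that happens to mention "404" would be flagged by mistake.

```python
import re

# Hypothetical phrases an error template might contain; adjust to your site.
NOT_FOUND_PATTERN = re.compile(
    r"\b(404|page not found|no longer exists)\b", re.IGNORECASE
)


def classify_response(status_code, body):
    """Label a response given its HTTP status code and its body text."""
    if status_code in (404, 410):
        return "gone"
    if 300 <= status_code < 400:
        return "redirect"
    if status_code == 200:
        # A 200 whose body reads like an error page is a "soft 404".
        return "soft-404" if NOT_FOUND_PATTERN.search(body) else "ok"
    return "other"


print(classify_response(200, "<h1>404 - Page not found</h1>"))
print(classify_response(404, ""))
```

Anything labelled `soft-404` is a candidate for being switched to a real 404 (or 410) response.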

I have a Lite account on Ahrefs,

As far as I am concerned, you are wasting your time and money. Ahrefs has no idea how Google values the backlinks it reports: many of those could be ignored by Google, might never have been discovered by Google, or could even be hurting you, and there could be other backlinks that Ahrefs hasn't found. You are dealing with incomplete information, and basing your decisions on incomplete information can lead to the wrong decisions.

If I were you, I would focus my efforts more on the user and less on Google. Win over the user and Google will follow.

andreicomoti

4:49 pm on Nov 25, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



Thank you Nick!

Regarding this part:

"My question to you is: why would you want to delete the page and its content? If the content has value and people are linking to it, then you should probably keep it. On the other hand, if the page has no value, then delete it, URL and all. If the page has been updated for some legitimate reason, the redirect is probably warranted."

I was not quite clear. I want to delete old URLs that have been redirected to the new ones. They cannot be accessed by users, and Google says that they are not in the index. Some return 404, others 301 or 302. They occupy space and put stress on my server without any advantage, so I simply want to get rid of them when I migrate to the new platform. I was wondering whether this is dangerous from an SEO POV.

I already made lists of each type of old URL that redirects to the new ones, and I am currently analyzing the HTTP status codes. Next I will analyze the backlinks to see if any quality backlinks point to them. If they have backlinks, the redirect will pass link juice, but if I 404 them, I will lose the link juice, right?

Other dangers? :)
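
The status-code and backlink analysis described in this post could be sketched roughly as follows. It assumes each old URL has already been reduced to a `(url, status_code, backlink_count)` record, with the backlink count coming from whatever backlink tool is available. The function name and the sample data are made up for illustration.

```python
from collections import Counter


def summarize_old_urls(records):
    """Tally status codes and list redirects that still have backlinks.

    Each record is a (url, status_code, backlink_count) tuple.
    """
    status_counts = Counter(status for _url, status, _links in records)
    redirects_with_links = [
        url
        for url, status, links in records
        if status in (301, 302) and links > 0
    ]
    return status_counts, redirects_with_links


# Made-up sample data, for illustration only.
sample = [
    ("/old-1", 301, 4),
    ("/old-2", 301, 0),
    ("/old-3", 302, 1),
    ("/old-4", 404, 2),
]
counts, keep = summarize_old_urls(sample)
print(counts)
print(keep)
```

The second list holds the redirects that would stop passing anything along if they were removed. Whether those backlinks are actually worth keeping is a separate judgment the code cannot make.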

NickMNS

5:37 pm on Nov 25, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I was wondering whether this is dangerous from an SEO POV.

I don't think you need to worry.

If they have backlinks, the redirect will pass link juice, but if I 404 them, I will lose the link juice, right?

True, a redirect passes some link juice and a 404 doesn't. The problem you have is knowing whether the links are truly valuable or not. Not all links are equal, so you may be devoting time and resources to keeping URLs alive for nothing. If a page has no value to the user, then simply kill it.

lucy24

6:36 pm on Nov 25, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is there any risk if I delete the old 404 URLs?
The question makes no sense. If the URL returns a 404, then it is impossible for anyone--whether human or robot--to know if the page associated with the URL physically exists on the server. (The same applies to 301s and, for that matter, to almost any response other than 200 or 304.) In general, 404 means the server looked for the file and couldn't find it, but here it sounds as if you are returning the 404 manually. And if so, you should instead return a 410 (Gone). It’s more accurate, and will also make G stop requesting the URL faster.

tangor

7:54 am on Nov 26, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I keep it a bit more simple:

If I don't want it any longer, why would I let others "think" it was there long enough to get a 404?

When I delete, I delete.

When I MOVE content to another URL, I 301.

If I want g (or others) to leave me alone after a delete, I 410.

Don't get me wrong, link juice is important, but the web, saturated as it is becoming, is nowhere near as good at "juice" as it once was.

Work for yourself and your users first. Everything else comes second. :)
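
The rules suggested in this thread (301 for moved content, 410 for pages removed on purpose, plain 404 for everything unknown) can be written down as a small lookup. This is a sketch under assumptions: the `moved` mapping and `removed` set are hypothetical stand-ins for however the site records that information, and the function is not tied to any particular server or framework.

```python
def response_for(path, moved, removed):
    """Return (status_code, location) for a requested path."""
    if path in moved:
        # Content lives at a new URL: permanent redirect.
        return 301, moved[path]
    if path in removed:
        # Deliberately deleted: tell crawlers it is gone for good.
        return 410, None
    # Never existed, or nothing is known about it.
    return 404, None


# Hypothetical data for illustration.
moved = {"/old-page": "/new-page"}
removed = {"/retired-page"}

print(response_for("/old-page", moved, removed))
print(response_for("/retired-page", moved, removed))
print(response_for("/typo", moved, removed))
```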