Forum Moderators: Robert Charlton & goodroi


404s and Google - Facts and Fiction


mcskoufis

6:48 pm on Oct 12, 2006 (gmt 0)

10+ Year Member



Back in February my most successful website (in many respects) had its database corrupted, and 5 years of content were lost. It was a very unfortunate sequence of events that led to this, despite the measures set up by the server engineers for backing up (which is not what I want to discuss here).

8 months on and the site:

Never lost its rankings - The site ranks in the top 6 for a generic and highly competitive keyword (as it has for the past 3-4 years), it ranks for possibly every other keyword involving the generic one, and traffic to it did not decrease.

Never lost its loyal visitors - Because it went down for a few days, people thought we had decided to scrap it and started sending messages of support when it was up again, even though it had literally started from scratch.

New URLs quickly replaced the old ones in Google's SERPs - It took less than 2 months, if I remember right. The site had been running PostNuke, which was replaced by Drupal - and for those who know the two systems, Drupal has a totally different architecture and structure.

Now, one of my other sites has had problems with Google rankings since last November, including canonical issues, duplicate content penalties and so on...

It was about that time that one of my SEO advisors (I can't afford one to work for me) advised me to change the URL pattern for any new pages created from then on: instead of http://example.com/widgets/blue-widget.htm, use http://example.com/blue-widget.htm, so that pages sit closer to the root.

After doing this update, even the few quality rankings we had in Google were lost, and as of last night Google still has some of the year-old URLs in its index. The first site I mention here never had this problem, even though its entire architecture and URL naming changed within 3 days... (the only difference is that the first is authoritative in its field while the other is not).

Back in July I started redirecting the old pages to the new ones (for the hurt site) using .htaccess and 301 redirects, and they still exist in Google's index.
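For anyone wondering, the rules are the usual mod_alias kind - roughly like this, with paths invented for illustration:

```apache
# One rule per moved page: old path on the left, new absolute URL on the right.
Redirect 301 /widgets/blue-widget.htm http://example.com/blue-widget.htm

# Or a pattern for a whole retired section, if the mapping is regular.
RedirectMatch 301 ^/widgets/(.*)$ http://example.com/$1
```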

I am very frustrated... I think I am in a sort of never-ending sandbox... I continuously see hits for "bluewidgets" instead of "blue widgets", and the only thing we rank for is misspelled words like "blue wigdets", which leads me to believe we are still in a sandbox.

At some point in March I made some changes to the URLs (saved internally in Drupal's database) and mistakenly changed about 2,500 URLs to some very different ones. In terms of Google traffic nothing changed, and we still have some very old URLs in its index... There are still 900 pages not found in Google's index today (according to Sitemaps).

Now, because of the change to http://example.com/blue-widget.htm, it is impossible to apply noindex rules to a specific folder. Nor is it feasible (time-wise) to enter all 900 URLs in my robots.txt or .htaccess.

I did a re-inclusion request explaining the problem to Google (back in March) but nothing has changed since. All those pages are in the supplemental index, but I think this still affects my rankings today.

We are no. 1 for only a single competitive keyword; no other significant rankings for the site. MSN and Yahoo are fine with it and send us traffic even for our generic target keyword.

Any ideas on how to get this issue rectified? Have you had the same situation happen to you, and if so, what happened? What did you do to get out of the so-called "sandbox", which some argue no longer exists?

Thanks in advance...

g1smd

9:18 pm on Oct 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The redirected URLs will continue to show up as Supplemental results for one year after the redirect was put in place. They are not harming anything. If they return 301 then you have done all you can do. Google will do the rest after a year. In the meantime, if that old URL does appear in the SERPs your redirect gets the visitor to the correct place anyway, so you don't lose out at all in this situation.

As for having content in folders or not, it isn't the directory level that is important, it is the number of clicks away from the root index page that is important. Using folders can be very beneficial. You can have an index page for each section of the site, and if you link everything up with breadcrumb navigation, spiders will have an easy job getting all over your site.

Additionally, any PR that internal pages gain from external links is easily spread to other nearby pages deep in the folder-structure, as well as back to the parent pages, and back to the root of the site.

mcskoufis

10:34 pm on Oct 12, 2006 (gmt 0)

10+ Year Member



Ok..

I am closely following your suggestions, especially the "duplicate content - get it right or perish" thread (which is what I was just reading thoroughly)...

[webmasterworld.com...]

It is just that now I have totally messed up the URLs of the site. I have mistakenly changed them twice over the last year, and it is humanly impossible to 301 each one of them by hand. I have over 900 404s in my Sitemaps index stats.
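The only way I can imagine doing it would be to script the redirects instead of typing them - something like this sketch, which turns a tab-separated old-path/new-URL mapping into Redirect 301 lines for .htaccess (the mapping file and its format are hypothetical, and I'd still have to recover the old-to-new pairs somehow):

```python
# Sketch: generate one "Redirect 301" line per old/new URL pair, so the
# .htaccess does not have to be written by hand for 900+ URLs.
def redirect_lines(mapping_text: str) -> list[str]:
    """mapping_text holds one pair per line: old-path <TAB> new-URL."""
    lines = []
    for row in mapping_text.strip().splitlines():
        old_path, new_url = row.split("\t")
        lines.append(f"Redirect 301 {old_path} {new_url}")
    return lines

# Invented example pair for illustration.
mapping = "/widgets/blue-widget.htm\thttp://example.com/blue-widget.htm"
for line in redirect_lines(mapping):
    print(line)  # -> Redirect 301 /widgets/blue-widget.htm http://example.com/blue-widget.htm
```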

Do you think it would be a good idea to add a noindex to the entire domain, and remove it once everything is out of the index?

I think that trying and trying to fix things just makes them worse... The URLs were changed at a time when I had spent an entire week reading through this site (especially your advice on the duplicate issues), then trying to fix things in the Drupal code and hand-editing database tables... After several days of this, when SEO is your interest rather than your profession, these kinds of errors are a natural consequence.

g1smd

10:50 pm on Oct 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If something returns a 404 then it cannot be a duplicate.

If Google lists pages as 404 in WebMaster tools then they already know there is nothing there.

There is nothing for you to do. They know. You know they know. They don't know that you know, but they don't need to know whether you know, or don't know, in order to carry on indexing the site. :-)

Are the new URLs being picked up and indexed? That is the only important thing to know.

mcskoufis

11:22 pm on Oct 12, 2006 (gmt 0)

10+ Year Member



Google hasn't indexed any new pages for the past 7 days... PR went from 3 to 5, but strange things happen... site: shows 100 one day, 14,000 the next, and then 100 again...

But if Google has the 404 page cached in its data centres and goes to fetch a new URL which is identical to the cached page but has a different URL, doesn't this cause problems?

mcskoufis

1:35 am on Oct 13, 2006 (gmt 0)

10+ Year Member



All my main sections are duplicates... I've fixed everything that I can think of right now... Sometimes looking into the nitty-gritty details continuously for days makes you lose the bigger picture...

Thanks for your help!

Alex70

7:34 am on Oct 13, 2006 (gmt 0)

10+ Year Member



g1smd
you've almost killed me with this: "There is nothing for you to do. They know. You know they know. They don't know that you know, but they don't need to know whether you know, or don't know, in order to carry on indexing the site. :-)"

Now I understand who George W. Bush's speechwriter is. ;-)) Fortunately it's Friday, because after this I may need something strong tonight!

Simsi

11:12 am on Oct 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"If they return 301 then you have done all you can do"

g1smd - probably a silly question, but how do you check the response code?

g1smd

12:40 pm on Oct 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You could use WebBug, but make sure you always use the HTTP/1.1 option if you do.

Get a Web Browser like Mozilla, Firefox, or SeaMonkey, and install the Live HTTP Headers extension (livehttpheaders.mozdev.org).

While doing that, also get the ShowIP extension (L4X.org) as well.
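Whichever tool you use, the thing to look at is the status line at the top of each response in the redirect chain. A tiny sketch of pulling the codes out of captured headers, such as the text Live HTTP Headers shows you (the sample response below is invented):

```python
# Sketch: extract HTTP status codes from raw response headers,
# e.g. as copied out of Live HTTP Headers or WebBug.
def status_codes(raw_headers: str) -> list[int]:
    codes = []
    for line in raw_headers.splitlines():
        if line.startswith("HTTP/"):
            # A status line looks like: HTTP/1.1 301 Moved Permanently
            codes.append(int(line.split()[1]))
    return codes

# Invented capture: a 301 redirect followed by the final 200 page.
example = """HTTP/1.1 301 Moved Permanently
Location: http://example.com/blue-widget.htm

HTTP/1.1 200 OK
Content-Type: text/html
"""
print(status_codes(example))  # -> [301, 200]
```

A chain of [301, 200] is what you want to see for a moved page; a 302 or a 200 on the old URL means the redirect is not doing what you think.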

Simsi

3:22 pm on Oct 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks. I'll try out WebBug.