This 233 message thread spans 8 pages.
A real Google conundrum
Established site losing all its Google traffic
I have a well-established site that over the past few days has lost almost all its Google referrals. I think I know what's wrong but have no idea how to fix it.
First some background. The site is a well-established, deep-information site with many, many thousands of pages and a PR 6 on the home page. While we have attempted to get some links to us, most of the hundreds of links to us are spontaneous from a variety of professionals who find our content useful. Therefore, we're not at all dependent on the "latest" SEO tricks - totally white hat.
Up until this week, we got >15000 Google referrals a day. We are not dependent on ranking for "blue widgets" or any other identifiable term - our referrals come from thousands of different keywords a day, which reflects the diversity of our content. Therefore, only a massive drop in the SERPs across the board can cause a >90% drop in referrals, as we are seeing.
We still are in the index with the same number of pages and our backlinks don't seem changed. We still have the same PR showing throughout the site (for whatever that's worth since if there are changes, they probably wouldn't show immediately anyway).
Here's the kicker: Another site we own, let's call it widgetville.com, is showing up ahead of our real site, widgetville.org, in the SERPs when you search Google for "Widgetville". The higher widgetville.com site is shown without title or description. Widgetville.com has been 301 redirected to widgetville.org. Widgetville.com does have a backlink or two out in the world, but not the hundreds that the real site, widgetville.org, has so I don't understand the higher ranking.
If you search for "a bunch of widget words that you find on the front page", three other web sites who quote our mission statement appear on the page and our page doesn't. However, if you click on the link to show "omitted" results, we are listed as the omitted page.
In a way, it seems almost like our home page has been hijacked by our own non-functioning site. And it also seems to be like the whole canonical root problem that trips up some site owners except in our case it is between two domains, not a problem of Google getting confused between widgetville.com and widgetville.com/index.html.
We've had this problem before - a year ago - and I queried Google about the problem. I was told that it was a problem on their end, not mine, and they would fix it. The widgetville.com listing was removed and within weeks, my traffic grew from a trickle to where I started hiring people to deal with the blossoming new customer base. Now all of that is threatened.
So, anyone want to take a crack at explaining this or giving advice on how to handle it?
I have taken one step to see what happens. I've removed the 301 redirect from widgetville.com and put a simple sentence on the page that says to click on the link to widgetville.org. I did this to disassociate widgetville.com from widgetville.org in case Google was somehow seeing duplicate content from the 301. Not sure how that would happen exactly, since a 301 is the preferred method of dealing with pages that are no longer valid, but this whole thing throws me for a loop.
|Google sees [ourdomain.org...] and [ourdomain.org...] as the same. So we were being penalized for duplicate content on almost all our pages. |
Umm... It is probably/likely to be the case. However, personally, I wouldn't rush to the street chanting "eureka, eureka..." on that observation alone. Seriously, this could be caused by a whole lot of things, so if I were you, I would continue to explore further just to err on the side of caution. Maybe it is just me....
I do hope a 301 redirect cures your traffic blues... :)
You'd think this problem would have been sorted by now, no? There is always a fresh reminder of how far we haven't come sometimes!
I really think too much value is being given to this one. I agree that it helps and is worth doing but...
We have domains that are wild carded:
all show the same content and the entire site can be surfed at one of these subs. We show VERY high for thousands of competitive terms. I wouldn't relax and assume your problems are solved from such an easy fix.
Google is smart enough to recognize dup content more effectively than this.
otc - you are very lucky if your site hasn't been divided into smaller parts yet. Lately, when people have done what you are doing, Google tends to assign certain pages on the site to each of those wildcarded domains and splits up the associations so that each of those gets a part of the site. For example, if you have a total of 100 pages, your www.domain.com would get 20 of the pages on a site: command, 20 pages would be associated with domain.com, etc.
I removed some old content with the URL removal tool in November.
The URLs were removed from the index, and a potential duplicate content issue was resolved, but meanwhile when I do a site: command, I still see the inflated number that must include those deleted URLs.
Question about duplicate content: Should we wait for Google to see the 301s so it can properly assign rank and such to the correct page, or should we immediately use the robots.txt and Google Exclusion (which is very effective!) to exclude any listings in Google that are not indicating the correct URLs.
We are in the same boat as many here after losing 95% of our G visits effective Feb 3. We found many places to use 301s to fix duplicate content that Google has indexed and we now use absolute links thinking it might make the spider see things more clearly.
>> I still see the inflated number that must include those deleted URLs. <<
Did you only remove the URLs that you didn't want indexed? Or, did you also set up the 301 redirect too? Without a 301 redirect, Google will simply suppress the URLs from the SERPs for 6 months then reinclude them; the URLs stay in their index but aren't shown in search results until then. If you set up the redirect, then they should drop them all.
Sidenote: If you attempt removal with a redirect in place, Google drops both entries from the SERPs.
Just a quick note to insist that the single most important factor for downturns during updates is usually dupe content. Lots of things attributed to weird algo changes are as straightforward as URL and/or DNS mess.
Do keep your URLs on a very tight leash. There is absolutely no reason to doubt the effect of just having one URL for each page. And don't trust that Google or any other SE will somehow miraculously identify the right URL. They try, and they often fail.
301? Weeks or Months? Heh...
It's to the point where I'm going to start swapping 301 with 404.
domainname.com to www.domainname.com? 404
www.site-name.com to www.sitename.com? 404
domain.com/folder to domain.com/folder/? 404
Soooo tired of seeing four variations of the same destination...
Sean please clarify what you mean. The 301's are not working to clean up your indexed content? How long have you had them redirecting?
>> domainname.com to www.domainname.com? 404 <<
No. No. You'll kill your site. The 301 will work fine. What I suggest you do is make a list of all the pages that appear in Google that are now redirects, make a "site map" type page listing them all, and put that page on another unrelated site. It may be that Google is no longer crawling those URLs and so simply hasn't seen the redirect yet. This will do it.
>> www.site-name.com to www.sitename.com? 404 <<
Same again. The 301 will fix it.
>> domain.com/folder to domain.com/folder/? 404 <<
Your whole site will go 404 if you do that. The true location of your content really is /folder/ not /folder. Make sure that all your internal links point to /folder/ - that is VERY very important, and here is why:
A friend of mine has a 150 page site. A Google site:domain.com listing showed about 180 pages, but with more than 50% without title and description. The listings showed duplicates, and less than half of the site was actually indexed (been online 2 years). There was no redirect in place for www and non-www content. There were more non-www pages with title and description listed, than there were www pages with title and description (and non-www was the version the site owner wanted to use).
We set up a redirect from www to non-www, and Google soon showed a lot more pages indexed, but again there were many variations:
There were some dead links to clean up here and there, which we did. I then ran the Xenu LinkSleuth over the site to check all the links and to make a sitemap page. I was horrified to find the sitemap had 300 entries in it - twice what was expected - and that half of the pages were called "301 Page Moved".
Here is what was happening: The base URL for the site is set up as www.domain.com but the owner wants to use just domain.com now. Most of the internal links to folders did not have a trailing / on the URL. So, when the bot follows a link to domain.com/folder the server does an automatic redirect to www.domain.com/folder/ (which is seen as a "page" for the sitemap) and only then does the 301 redirect kick in and redirect onwards to domain.com/folder/. We had no server access to change the "default name" to just domain.com so we have to find another way to avoid that problem. The answer was simple: make sure that every link to a folder ends in a trailing / every time.
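To make that double hop easier to picture, here is a rough Python model of the behaviour described above. It is only a sketch, not what the server actually runs: the names `SERVER_NAME`, `PREFERRED_HOST`, `FOLDERS`, `next_hop` and `redirect_chain` are made up for illustration, and the rule order is assumed from the description (automatic directory redirect first, then the www to non-www 301).

```python
from urllib.parse import urlsplit

SERVER_NAME = "www.domain.com"   # the server's default name, as described in the post
PREFERRED_HOST = "domain.com"    # the host the owner wants indexed
FOLDERS = {"/folder"}            # hypothetical directory paths, written without the slash


def next_hop(url):
    """Return the URL this modelled server would redirect to, or None if it serves the page."""
    parts = urlsplit(url)
    # Automatic directory redirect: it is built from the server's default name.
    if parts.path in FOLDERS:
        return f"http://{SERVER_NAME}{parts.path}/"
    # The site's own 301 from www to non-www.
    if parts.netloc != PREFERRED_HOST:
        return f"http://{PREFERRED_HOST}{parts.path}"
    return None


def redirect_chain(url):
    """Follow the modelled redirects and return every URL visited, in order."""
    chain = [url]
    while (target := next_hop(chain[-1])) is not None:
        chain.append(target)
    return chain


# A link without the trailing slash takes two hops; with the slash it takes none.
print(redirect_chain("http://domain.com/folder"))
print(redirect_chain("http://domain.com/folder/"))
```

In this model the slash-less link passes through the www URL on its way to the final page, which is the extra "page" that turned up in the sitemap.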
Once all the internal links had the trailing / added, this site map problem was fixed, and 3 days later Google had indexed all of the non-www pages properly. Three days after that, all of the listed www pages are without title and description. Three weeks later most of the www pages have now been dropped from the index, there are a few dozen left to do. Adding a "sitemap page" to another site, one that shows all of the URLs that need to be delisted has helped Google to see that there is a redirect in place.
Install the 301 redirect.
Add a trailing / to all links to folders.
Make a list of all pages that need to be delisted.
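For the trailing / step, something like the following Python sketch could help find the offending links in a page. It is a rough illustration only: the names `FolderLinkFinder` and `find_missing_slashes` are invented here, and the rule "a last path segment with no dot in it is a folder" is an assumption that will not hold on every site.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class FolderLinkFinder(HTMLParser):
    """Collect href values that look like folder links without a trailing slash."""

    def __init__(self):
        super().__init__()
        self.missing_slash = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        last_segment = urlparse(href).path.rsplit("/", 1)[-1]
        # Assumption: a non-empty last segment with no dot is a folder, not a file.
        if last_segment and "." not in last_segment:
            self.missing_slash.append(href)


def find_missing_slashes(html):
    """Return the hrefs in `html` that point at a folder but lack the trailing slash."""
    finder = FolderLinkFinder()
    finder.feed(html)
    return finder.missing_slash


sample = (
    '<a href="http://domain.com/folder">bad</a>'
    '<a href="http://domain.com/folder/">good</a>'
    '<a href="/folder/page.html">file</a>'
)
print(find_missing_slashes(sample))
```

Only the first link in the sample is reported; the second already ends in a slash and the third points at a file.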
>> How long have you had them redirecting?
I think I'm getting ready to celebrate the six-month anniversary since the drop-off and the ensuing crash-course in defensive webmastering. But internal links have always been absolutely addressed, and I've always been 100% consistent on things like the trailing slash. There was a time when I used site-name.com instead of sitename.com, but I think that was over 18 months ago (time flies when you are having fun with this stuff).
>> No. No. You'll kill your site.
Heh, it is already dead in terms of Google, I am trying to shock it back to life. About the only sites linking to the incorrect URLs are a few thousand scrapers. If I have to choose between guaranteed scraper traffic and a long-shot at Google traffic, I'll take the latter.
>> What I suggest you do is make a list of all the pages that appear in Google that are now redirects, make a "site map" type page listing them all, and put that page on another unrelated site.
And here I thought 1000+ other sites were already doing that for me...
Sean that's a real ugly problem and a vicious circle too. Those scrapers are in fact voting against you, so when you say that your URL is one thing and 1,000 other pages say it's something else, then it's a really hard job getting it corrected and it will take ages.
This is a very good example of why it is very important to keep these things in tight control - problems like that can easily escalate, as others use wrong URLs they've copied from elsewhere.
I'd suggest that you do one more thing than the 301. Put the <base href=""> tag in the head section of all your pages (even though you have absolute linking already). I'm not promising a silver bullet, as you obviously have a long way to go, but it will perhaps give a little more weight to your own votes (and it might just be snake oil, but I would give it a try anyway, as it can't harm if you have a good "link discipline" already).
|Once all the internal links had the trailing / added, this site map problem was fixed, and 3 days later Google had indexed all of the non-www pages properly. Three days after that, all of the listed www pages are without title and description. Three weeks later most of the www pages have now been dropped from the index, there are a few dozen left to do. |
Did your friend experience a significant drop in traffic during the time that Google was listing pages under various URLs? If so, has traffic begun to recover now that most of the www pages have been dropped from the index?
I've run into the www/non-www duplication problem myself; thanks to a couple of helpful members in the Supporters Forum, I installed a 301 redirect in my .htaccess file at the end of March. 759 of my 4,300 or so pages are still showing www versions in the Google index, but the www versions are now without descriptions, so you've given me hope that they may disappear sooner rather than later. :-)
IMHO, this is a problem that Google really needs to fix at its end. E-commerce vendors and SEOs may have the economic incentive and technical wherewithal to deal with this kind of thing, but a lot of great information resources will be unavailable to users if Google expects authors, professors, researchers, etc. to be aware of the kinds of technical issues that get discussed here at Webmaster World.
If you directly link to those 759 pages using a sitemap-type page, but installed on a separate site, and just for a few days to weeks, you'll likely fix the problem sooner rather than later.
As the site is informational in nature, the owner doesn't actually monitor traffic in any meaningful way. There is no advertising or products on the site.
Brett and oddsod -
You suggest that Google works these things out on its own - is that because you think the spider can address accidental duplicate pages in other ways?
Without 301s how can the spider know the "correct page" to index?
About 2 years ago we had the non-www syndrome going. Since then we have had a fix in place (htaccess mod rewrite): any URL without the www and/or trailing / gets a 301 redirect to the proper www.widgets.com/. Google never seemed to have a problem with this. Since the recent updates it has been giving both styles of URL in its results descriptions. Later the home page went to straight link. Looking through log files, the non-www URLs have ALWAYS returned a 301 redirect to googlebot and it immediately crawled the correct page. There is an exception when index.html is added to the URL.
What we are seeing is old deep-content URLs showing up in the SERPs. These URLs have always been under a 301 redirect from our old structure and design. What Google has cached is from back in September, October and November, but it shows our new design, which those pages never had; they have only ever been redirected by a 301. It is as if Google treated the 301 like a 302, which is hard to believe would happen. I decided to check the headers of the page and everything looks just fine.
The only other problem I see is that for a few of our main directory pages we see 2-3 different URLs: www.widgets.com/dir, www.widgets.com/dir/ and www.widgets.com/dir/index.php. Across our whole site we only use www.widgets.com/dir/ style linking, so it could be coming from external sources. Usually one shows up in the SERPs normally; the others are usually supplemental. If anyone knows of a fix or a mod rewrite (I am mod rewrite stupid) for this, it would be appreciated.
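Not a mod rewrite rule, but the mapping being asked for can be sketched in a few lines of Python, which may help when checking what any rewrite rule should end up doing. Treat it as an illustration under assumptions: `canonical_url`, `CANONICAL_HOST` and `INDEX_FILES` are hypothetical names, the list of index files is a guess, and "no dot in the last segment means directory" is a heuristic.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.widgets.com"          # the host the site links to everywhere
INDEX_FILES = ("index.php", "index.html")   # assumed default documents


def canonical_url(url):
    """Map /dir, /dir/ and /dir/index.php on any host alias to one www.../dir/ URL."""
    scheme, _host, path, query, _fragment = urlsplit(url)
    head, _, last = path.rpartition("/")
    if last in INDEX_FILES:
        path = head + "/"          # drop the explicit index file
    elif last and "." not in last:
        path = path + "/"          # assumed to be a directory: add the slash
    elif not path:
        path = "/"                 # bare host name
    return urlunsplit((scheme, CANONICAL_HOST, path, query, ""))


for variant in (
    "http://www.widgets.com/dir",
    "http://www.widgets.com/dir/",
    "http://www.widgets.com/dir/index.php",
    "http://widgets.com/dir",
):
    print(variant, "->", canonical_url(variant))
```

All four variants come out as the same www.widgets.com/dir/ address, which is the one URL the redirects should leave reachable with a 200.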
|If anyone knows of a fix or a mod rewrite (I am mod rewrite stupid) for this would be appreciated. |
If you go to the Apache forum on WW Jim is awesome. He da man when it comes to .htaccess (a few others too but don't just cut and paste stuff into .htaccess)
Here's a real google conundrum
When I first launched my site I made some changes and deleted a few file names. I was getting no traffic anyway so it wasn't a big deal. These files were indexed by Google but soon disappeared from the SERPs.
Over the months I was getting the odd 404 for these files so I wrote an .htaccess file and 301 redirected them to existing files.
Last month these files definitely did NOT appear in site:mysite - they were non-existent.
This month they are indexed and appearing in site: with the old title, the old cache, everything.
What, did Google sniff out my .htaccess file and reinstate these files? These 2 files are pointing (301) at my homepage and another main-content page and I think they have done some damage too.
I know how to nuke them (but my host is having problems and I have no FTP until they fix it so I'm like a sitting duck now)
Getting listed by Google is quite simple: just sign up for Google ads and they will then crawl your site. We have done it many times in the past and never used the ads. After they had crawled our sites we were back on top once again.
Reid, thanks for the help.
What I think happened is that Google screwed up way back when and was either treating 301 redirects like 302s and indexing them that way... or keeping an index of 301 URLs/pages, relating them to the content the 301 was redirecting to, and reverting/replacing the URLs/pages in the SERPs with the old ones. With the 302 problem Google has been working on, 301s might be caught in the mix as well, and/or recent algo changes might be causing some penalties.
ALL of the 301 redirect pages I am seeing in our SERPs for site:www.widgets.com are from September/October/November of last year, and I just ran across one from back on April 21st 2004. None of these pages have ever seen our newest design except through a 301. Yet they all have caches of our newest design. Years ago, when we changed directory structure and moved to php, we used 301 redirects for every page to point to the new versions. For whatever reason or changes in Google, they may have been "creating" duplicate content, and recent changes had a big effect. If this theory is correct then there would be widespread Google-created damage/penalties/dropped pages across many sites, provided there is enough duplicate content on a site to do enough damage.
If Google did fix the problem with the 302/301 redirects (and this may just be an effect of the fix) it will take some time to get crawled and cleaned out of their index.
If they fixed the problem, any relation between the old redirect pages and the new may have got cut off, dropping one out of the SERPs for being a duplicate. Google would have to recrawl everything to sort it all out again. - Just a theory though.
Google's Hook,Line & Sinker:
First: They hook us by giving us access or ask us to submit our websites to Google!
Second: They give us a Guidelines line, and what we should do if we have problems with Google by responding with a "canned response"!
Third: With the ever changing (update indexes, algo) they sink us!
thanks for your msg.
It is similar here. Those messy old 301 URLs off my main site are cached in Google since Sept./2004.
In msg #20 of this thread I described how I tried to get rid of these messy long gone 301 URLs, which should not be there, if Google would have read the 301 definition correctly and implemented accordingly.
I spoke too soon. As others also noted, I saw my removal requests denied days later, because I deactivated the 410 rewrite rules too soon -- have it now activated again and tried the removal again.
From analyzing the logs:
seconds after the removal request, a bot (UA = "googlebot-urlconsole") comes by and checks.
If you just have a 301 sitting there, your removal request gets rejected immediately, because the bot follows the 301 and thinks that the new page is the old one. This is an incorrect interpretation of a 301, but I repeat myself and Google seems not to listen anyway.
Well, just clearly saying "301 - moved PERMANENTLY" is apparently not enough.
So I sent a 410 and thought that should be enough. NOT!
They sent another bot hours later to check again, same IP address but UA = "Java/1.4.2" this time.
If you think they may have got it now -- NOT!
Their Java-bot now comes every 6 hours to check again ... and again ...
This "removal" action is still running, and I will keep you informed about how long this will continue until they believe that a half-year-long 301 "moved permanently" URL has moved indeed and could now finally be scratched from the index ...
romeo - I think it needs to find a 404.
BTW - 2 days now and my virtual host still hasn't gotten it together - i'm still a sitting duck in google and can't even look at logs or stats of any kind.
A 404 means the resource cannot be found and no other resource has been defined.
A 410 should be read as gone - remove everything that has to do with this URL.
A 410 should be the better choice. Then again, everything nowadays seems to cause problems.
Hi, I do hope this is relevant to this thread.
I have just run site:mydomain.com
Google shows 27,000 results
I have 11,000 pages on my site
All the results are www; there are no results without www.
Where are the other 16,000 pages coming from, and is this the factor that saw me lose 70% of my traffic on 23 March? And finally, have you guys suffered something similar, where Google has dropped your rankings?
Thanks. Yes, a 404 should do it, and removing the non-www ^example.com alias in the httpd.conf as well as the 301 redirect would make these old URLs simply inaccessible.
Google surely would finally understand this, but users following old bookmarks or typing short addresses may get annoyed and lost.
I have implemented that 301 stuff to help and aid human users to comfortably access my site, and not as an incentive for search engines to bloat their indexes.
I decided to send a 410 instead of a 404, because there is a handy rewrite action [G] doing this, and I have not found a way to generate a customized 404 out of a .htaccess rewrite rule (perhaps I have not looked in the docs deeply enough).
A 410 [definitely gone] should do, as it has an even 'stronger' meaning than a 404 [not found]. The strongest medicine one can get should do it here ...
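For readers who are not on Apache, the same idea (answer 410 Gone for a fixed list of retired URLs and serve everything else normally) can be sketched as a tiny Python WSGI app. This is only an illustration of the status codes, not the rewrite rule used above; `RETIRED_PATHS`, `status_line` and `app` are made-up names and the retired path is hypothetical.

```python
from http import HTTPStatus

RETIRED_PATHS = {"/old-page.html"}  # hypothetical URLs that should be reported as gone


def status_line(status):
    """Format an HTTPStatus member as a WSGI status string, e.g. '410 Gone'."""
    return f"{status.value} {status.phrase}"


def app(environ, start_response):
    """Minimal WSGI app: 410 Gone for retired paths, 200 OK for everything else."""
    path = environ.get("PATH_INFO", "/")
    if path in RETIRED_PATHS:
        status, body = HTTPStatus.GONE, b"This page has been removed.\n"
    else:
        status, body = HTTPStatus.OK, b"Hello\n"
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response(status_line(status), headers)
    return [body]


# The two codes discussed above, as the standard library names them.
print(status_line(HTTPStatus.NOT_FOUND))
print(status_line(HTTPStatus.GONE))
```

The app can be served locally with `wsgiref.simple_server.make_server` from the standard library if you want to point a header checker at it.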
You have mail and for the rest of the folks out here.
Please check your redirect rules using a header checker.
Google is your buddy in finding one.
An incorrect set of redirects can result in you 302 jacking your own site.
Also watch out which form of the errordocument statement you use on your Apache servers.
Also never ever send errors to your home page, you can 302 duplicate content your home page this way.
Treat errordocument handling exactly like rewrite and redirects, verify them using a header checker.
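If you would rather not rely on a third-party header checker, a few lines of Python do the same job. This is a minimal sketch: `fetch_status` and `describe` are hypothetical helper names, and the wording of the messages is mine. The standard library's `http.client` does not follow redirects on its own, so you see exactly what the server sends for the first request.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url):
    """Send one HEAD request and return (status, Location header) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe(status, location=None):
    """Turn a status code into a short note about what a crawler is being told."""
    if status == 301:
        return f"301 permanent redirect to {location}"
    if status in (302, 303, 307):
        return f"{status} temporary redirect to {location} (check this is intended)"
    if status in (404, 410):
        return f"{status} error status, as an error page should return"
    return str(status)


# Example use (needs network access, so it is left commented out):
# print(describe(*fetch_status("http://example.com/no-such-page")))
```

Run it against a URL that should not exist: if the answer is a 302 to your home page rather than a 404 or 410, that is the error-page problem described above.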
I screwed myself in the past using a 404 that redirected to my homepage.
It used to be OK like that for years, but at some point last summer it seemed to start causing issues.
Meanwhile, I fixed that last fall, but right now I'm having March 23rd traffic drop issues almost identical to the September 23rd problem myself and others had.
My traffic is still dwindling lower and lower since March 23rd. It had a few inner pages hanging on, but they are fading from the SERPs now too. Some of them are competing with the network of a busy spammer and losing, and others just don't rank, period.
I'm hoping they will come back overnight like they did last December, but I'm not holding my breath.
|Meanwhile, I fixed that last fall, but right now I'm having March 23rd traffic drop issues almost identical to the September 23rd problem myself and others had. |
My traffic is still dwindling lower and lower since March 23rd. It had a few inner pages hanging on, but they are fading from the SERPs now too. Some of them are competing with the network of a busy spammer and losing, and others just don't rank, period.
I had the same problem on March 23 (a 75% drop in Google referrals), and I added a www-to-non-www redirect to my .htaccess file at the end of March. If it's any consolation, my rankings for many of the keyphrases that I track are returning, and I'm starting to see a tiny increase in Google referrals. So maybe there's light at the end of the tunnel--though anyone who's old enough to remember Robert McNamara probably won't be reassured by that metaphor. :-)
About two years ago, I was having Sunday brunch and Robert McNamara was at the next table, eating alone, drinking a glass of wine. Didn't seem like the light was shining on him ...
Following GG's advice of [webmasterworld.com...] I sent Google a message about our problems in case that thread helps explain what happened to us. It feels like our PR has been slammed across the board for reasons unknown. Now we're prey to all kinds of canonical confusion (hence the defunct .com ranking above the main .org) etc. We've also cleaned up all the 301 issues we can think of and hopefully traffic will come back.
I'm glad yours did, but we haven't seen any improvement yet.
Wow, my pre-coffee grammar was awful in that post above.
I also have a re-direct, in my case from non-www. to the www. version, it's been there since fall.
I did have a dmoz listing that pointed to the non-www domain for years, but I had them change it on February 18th of this year. The Google directory doesn't show the change though.
Just a week or so prior to losing most of my traffic on March 23rd, I moved my deliberately mid-nineties code into the 21st century. I used CSS for layout and text attributes, whereas before I'd only lightly used it for text.
I thought a basic style old school site with valid HTML was a good way to go (KISS), but I moved to CSS to make the site look better for visitors, and to reduce some minor tables related code clutter (it was already fairly clean).
I've since moved back to the more vanilla html version this week, although my gut tells me this drop and the September 23 drop have nothing to do with my on-page site factors, and more to do with Google, since others dropped around the same time.