| 2:38 pm on Oct 23, 2012 (gmt 0)|
I had/have this same issue with Disqus and the errors continue to rise and fall.
Still showing many in serps too.
My crawl errors are up to 168,000 right now - it's crazy. Many of those are brand new discoveries of pages that were removed over a YEAR ago?
My programmer and I have worked on this tirelessly and I do not understand it. I'm dealing with a WordPress blog on the Thesis theme. I feel like the crawl errors DO affect ranking.
| 4:24 pm on Oct 23, 2012 (gmt 0)|
I suspect that you have so many of these 404s on your site that Googlebot is spending ALL of its time crawling them and not spending any time crawling your content. If I am right, the number of documents indexed (WMT -> Health -> Index Status) would have fallen.
I would suggest putting in rules to redirect any of these 404s as best you can to appropriate content (not to your home page). If you redirect these 404 urls then *perhaps* googlebot will figure the situation out faster.
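# Use one or the other (mod_alias or mod_rewrite), not both.
# The lazy (.*?) strips ALL of the trailing digits, not just the last one.
RedirectMatch 301 ^/(.*?)[0-9]+$ http://example.com/$1
RewriteRule ^(.*?)[0-9]+$ http://example.com/$1 [R=301,L]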
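A pattern of this shape is easy to sanity-check offline before it goes live. Here is a rough Python sketch that compares a greedy capture with a lazy one; the names GREEDY, LAZY and redirect_target are made up for illustration, and it only assumes that Python's regex engine treats this simple pattern the same way Apache's PCRE does.

```python
import re

# Two variants of "a path that ends in digits, digits dropped from the target".
# Hypothetical names for this sketch only; adjust the patterns to your own URLs.
GREEDY = re.compile(r"^(.*)[0-9]+$")   # (.*) grabs as much as it can
LAZY = re.compile(r"^(.*?)[0-9]+$")    # (.*?) grabs as little as it can


def redirect_target(pattern, path):
    """Return the captured part the rule would redirect to, or None if no match."""
    match = pattern.match(path)
    return match.group(1) if match else None


for path in ["/some-post/1346898779000", "/some-post/", "/9723104987123"]:
    print(path, "| greedy:", redirect_target(GREEDY, path),
          "| lazy:", redirect_target(LAZY, path))
```

The greedy form only drops the final digit, so each redirect lands on another URL that still ends in digits and Googlebot has to follow a long chain. The lazy form drops the whole run of digits in one hop.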
| 4:37 pm on Oct 23, 2012 (gmt 0)|
Thank you for your reply. Well, as I said, I fixed all the errors by redirecting them properly to their respective URLs. So technically, there are no errors on my site at the moment. The Index Status is fine, but the Crawl Stats are significantly down - which is to be expected. The 'time spent downloading a page' has gone up.
What I notice is that there is a drop in 'Search Queries' - which clearly indicates that our website has been pushed down in rankings. I saw a spike in the search queries for two days; but it's fallen down again.
What's worrying me is that GWT drops the reported errors at 1000/day, as I mark them as 'fixed'. But if the 404 errors aren't causing the traffic drop, what else should I really look at? I've been researching for over a month now and not going anywhere.
These days, my routine is to log in to GWT every evening and mark the top 1000 errors (generated by Disqus) as 'fixed'. I'm not sure what else I can do.
| 4:49 pm on Oct 23, 2012 (gmt 0)|
The fact that 'time spent downloading a page' has gone up is worrying to me. The redirects you put in place should be very quick to crawl compared to entire pages that your site needs to render. With the situation that you describe, I would expect Googlebot to be doing a lot more crawling and doing most of it quickly.
I suspect that your site is experiencing some performance problem that could be contributing. How does your site perform for visitors right now? (WMT -> Labs -> Site Performance might be able to give you some idea).
| 4:54 pm on Oct 23, 2012 (gmt 0)|
Well, it says your website is 'slow' (updated on August 10). We've had the same setup for a long time without making any changes. We have a CDN and caching in place, so I'm not sure whether I should really be looking at improving it.
I thought that Googlebot was taking more time because it's encountering a large number of errors. I think we have the relevant plugins in place and the site performs 'nice' in PageSpeed tests. Am I missing anything obvious?
Addendum: Just did a Google PageSpeed test and got 85 (out of 100) for front page.
| 5:36 pm on Oct 23, 2012 (gmt 0)|
Run some tests on how long the redirects take to serve and how long it takes to serve your pages (without images). I use curl on the command line for that on my linux box:
time curl -s http://example.com/ > /dev/null
time curl -s http://example.com/9723104987123 > /dev/null
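If curl isn't handy, roughly the same measurement can be sketched in Python with the standard library. Like plain curl, it does not follow redirects, so a 301 is timed on its own. The helper names (fetch_once, time_fetch) and the run count are assumptions for illustration, not anything from a particular tool.

```python
import time
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None stops urllib from following the redirect; the 3xx
    # response is then raised as an HTTPError, which fetch_once catches.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_once(url):
    """Fetch url without following redirects and return the HTTP status code."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=30) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def time_fetch(url, runs=5, fetch=fetch_once):
    """Return (last status code, median seconds) over `runs` fetches of url."""
    timings = []
    status = None
    for _ in range(runs):
        start = time.perf_counter()
        status = fetch(url)
        timings.append(time.perf_counter() - start)
    timings.sort()
    return status, timings[len(timings) // 2]


# Example use (hits the network, so it is left commented out here):
# print(time_fetch("http://example.com/"))
# print(time_fetch("http://example.com/9723104987123"))
```

Taking the median of a few runs smooths out one-off network hiccups. If the redirect URL is not clearly faster than the full page, the redirect handling itself is worth a look.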
| 6:46 am on Oct 24, 2012 (gmt 0)|
I really appreciate all your inputs. I just performed the tests and found that the site is loading really fast.
For a problem URL that redirects to the correct URL, the results were -
time curl -s http://www.example.com/our-correct-url-ends-here/1346898779000/1347950695000
The site is faster than 56% of all tested websites.
The redirects aren't an issue, it seems. I've seen a few other webmasters also reporting a similar drop in traffic after generating a large number of 404s. However, Google continues to deny it. Is there anything else I can try?
[edited by: tedster at 11:23 am (utc) on Oct 24, 2012]
[edit reason] switch to example.com [/edit]
| 6:52 am on Oct 24, 2012 (gmt 0)|
A sudden and massive rise in 404s did cause a traffic drop on my site for a couple of months before everything went back to normal.
In my case, I had an automated translation plugin and I removed it without doing a proper 301 redirect.
| 6:57 am on Oct 24, 2012 (gmt 0)|
Several months? Darn! That's going to hurt a lot! Could you share the numbers please? How many errors did your site generate and what did you do to recover from the drop?
How many months did it take your site to recover? Also are you sure that it was sudden rise in 404s that caused the drop?
PS: Sorry for too many questions.
| 2:08 pm on Oct 24, 2012 (gmt 0)|
I can't really remember exactly how many months, but I'm sure it was at least 2-3 months. Our traffic just kept on dropping and, not sure if it was Panda related, our lost traffic never came back. It could also be due to the massive loss of keywords since the translated pages are no longer available.
The number of errors hit more than a hundred thousand. It was nasty!
| 2:52 pm on Oct 24, 2012 (gmt 0)|
Thank you, raymondcc. For our website, the 404s are all because of -
1. Disqus plugin. I think 70-80% of all 404s are because of this.
2. Deleted tags. I deleted about 25k tags which were totally useless but were indexed by Google.
I've redirected the deleted tags to the homepage, and those generated by Disqus have been redirected to the correct URLs.
Could you share the fixes that you applied to your site so that I can do something similar?
| 4:26 pm on Oct 24, 2012 (gmt 0)|
Simple question: Do you have a proper 404.php file located in the folder of the theme you're using?
See also - [codex.wordpress.org...] the section - Help Your Server Find the 404 Page
| 5:23 pm on Oct 24, 2012 (gmt 0)|
Hello Zivush: Yes, the headers reflect proper 404s. I've tested them with an online HTTP status checking tool. I also checked by fetching the URL as Googlebot. Whenever a page genuinely does not exist, the site returns a 404 status page.
While we continue to have a proper 404.php, I've 301 redirected the problem URLs to their correct URLs (at least all those faulty URLs generated by Disqus). Those 404s generated by the removal of unwanted tags have been redirected to the homepage. For all other pages, there's a proper 404 error page.
| 5:48 am on Dec 16, 2012 (gmt 0)|
I'll post a quick update -
1. We're down to ~16k errors being reported in GWT. 99% of these have already been fixed, but Google's taking its own sweet time to acknowledge that they are gone.
2. We've had a few drops in the error count reported daily. These drops appeared at random intervals. Nov 26: 3, Nov. 28: 4k and a big drop of about 9.5k a few days ago which got us down to 16k errors from initial 99k (in October).
3. Traffic & search queries are still down and so is the crawl rate. I've let Google decide how fast to crawl in GWT settings.
4. 'Time Spent Downloading' is still high, because Google crawls the error URLs and finds them as 'fixed' through 301 redirects. I'm expecting another big drop before we get our error count to acceptable levels (2k - 3k) or lower. Our site has ~450k pages indexed, so I think that should be just ok.
5. Google began discovering the error counts on 26 August, and finally lowered our rankings on 4 Sept. During that period the error count went up from 2k to over 17k and then to 36k.
6. I'm still not sure if I should only wait expecting Google to restore our traffic or should continue looking for problems on the site. I'm continuing with creation of content on our website though, but still worried.
7. There seems to be no clear answer on whether broken links hurt rankings. Google and their 'top posters' say that they don't. But several webmasters confidently say that they do.
I'm continuing with a hope that once the error count goes below 2-3k, Google will start crawling our site more and restore our SERPs for most of the quality content we've on our site that used to rank high.
| 6:08 am on Dec 16, 2012 (gmt 0)|
I am in the same boat. My errors were 168,000 - now down to 41,000.
I just keep hoping that there is some "level" that these errors drop down to that gets you out of the "doghouse".
| 6:11 am on Dec 16, 2012 (gmt 0)|
@Frosh_Angel: Could you tell me whether the traffic and crawl rate drop on your site are almost in sync with the rise of the errors?
| 6:16 am on Dec 16, 2012 (gmt 0)|
| 6:24 am on Dec 16, 2012 (gmt 0)|
I see. All the folks on Google Webmaster Forums have been unanimously saying that those errors don't affect your ranks and SERPs. But the evidence suggests that they are interlinked.
I've read several webmasters saying that they got their rankings back after all the errors were gone.
I've begun to believe that Google's algorithm team doesn't let those active GWT forum Googlers attend all their meetings. Hah!
I've no other option than finding solace in believing that Googlebot found a ton of broken links on the site and thought "this site is crappy, let me not crawl and rank them while they address the issues on their site".
I wonder how long it will take Google to restore rankings once the error count is below the mark?
PS: Did you see big drops in error counts? Did you do anything special to reduce the error count? I fixed all of them in one go, because 99% followed a pattern.
| 6:34 am on Dec 16, 2012 (gmt 0)|
I was hit by Panda initially in April 2011 and didn't have GWT then - so I can't say for sure that those two are linked. However... I never recovered from Panda and was hit by Panda 20 as well. It's my hope that these crawl errors are part of the puzzle, as I have really worked hard to figure out what else it could be.
| 6:35 am on Dec 16, 2012 (gmt 0)|
Forgot to answer question....
I see huge drops in error counts. Sometimes 10,000 a day. However.... this ONLY started happening when I started marking all errors FIXED.
| 7:32 am on Dec 16, 2012 (gmt 0)|
Right. No one (except the Google guys who work on the algos) can really connect the two. But 'user experience' and 'overall quality' of the website are two important factors in Google deciding a website's ranking in searches. I believe the sudden rise of broken links on the site led Googlebot to think that the site offers a bad user experience, and so it decided to rank the website lower.
Of course, this is just a guess. I've checked and checked and re-checked whether we're doing anything that's against Google TOS and could not find anything that requires attention.
The 'experts' told me to 'remove' the content generated by users that is 'thin'. Given that our site has over 50k discussions, it's totally impossible for me to visit every discussion and remove the content that 'I think' is low value.
Now, you're experiencing a similar trend: a rise in crawl errors and a drop in traffic+crawl rate. Our website never got hit by Panda or Penguin because we never were a content farm or implemented any black-hat SEO stuff.
The only thing I'm interested in knowing now is how long will Google take to restore the rankings when the error count drops below the acceptable level. Some of the webmasters of smaller websites have said that the rankings & traffic restored almost immediately. I'd have loved to have more opinions/experiences on this.
Right now I'm going to just wait and watch what happens when the crawl errors drop below 3k-5k and whether that affects our traffic - I 'think' it will take at least a week or two to 'restore'.
| 4:08 pm on Dec 16, 2012 (gmt 0)|
I had a script running on my site that would allow users to search for jobs. I had no idea the script was caching each and every search and then Google was indexing these. By the time I found out - my site looked like a web spammer and I had tossed tens of thousands of thin content pages into the SERPs. So I fixed it, redesigned the site, and went from HTML to WordPress.
Panda hits. I decided (stupidly) to use the script again, since users really liked it, after talking to the developer of the script. He told me how to empty the search cache and make sure Google didn't index the searches. Well....he was WRONG.
Again.... I had an issue with thin content: cached searches in the SERPs. No matter what we tried, Google was indexing them. This was over a year ago. Google is STILL finding these pages in THEIR cache - thousands and thousands of them - and calling them crawl errors.
Even after a year I am dealing with these pages - even though they've been long gone.... maybe even over a year.
Only after I started marking them as FIXED did they start going down.
I don't know if any of this helps you BigK - but just thought I'd share.
| 4:27 pm on Dec 16, 2012 (gmt 0)|
|All the folks on Google Webmaster Forums have been unanimously saying that those errors don't affect your ranks and SERPs. But the evidence suggests that they are interlinked. |
Some of those people are just plain wrong about a lot of stuff. If you think about it, why does WMT give you info on the crawl errors and all that if it doesn't matter to Google? I think it's very logical to assume they at least *can* count against you.
As for removing thin content, which creates 404s unavoidably, I've been wondering about that too. I have under 100 404s because of this on one of my sites, and I don't know if it's a problem or not. But I don't mark them "fixed" because they're not - they're meant to be 404s. I think I just have to wait it out.
| 4:37 pm on Dec 16, 2012 (gmt 0)|
Are those errors hurting your rankings/traffic?
| 5:17 pm on Dec 16, 2012 (gmt 0)|
Curious, what does GWT tell you about your sitemap, if you have one?
submitted = 1000 pages
Indexed = 900 pages (10% fewer indexed than exist, for example)
Sometimes the above is a sign that a spammer has been busy finding pages that resolve to incorrect URLs on your site and is linking to those, which causes the wrong version of your pages to be indexed. WordPress? Check for URLs ending in periods and shortened URLs.
examples for: example.com/a-perfectly-valid-url/
Check header responses. Bad = automatic redirect creations with infinite variations possible. Worse = showing the right content on a malformed URL without rel=canonical at a minimum.
Test to see how your site reacts to these types of malformed URLs. If you see content from the correct version on the wrong URL, or a redirect from malformed to correct, there is a chance your rankings may be impacted.
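To make that test repeatable, here is a rough Python sketch that builds the two kinds of malformed URL mentioned above (a trailing period and a shortened slug) and sorts a response into the bad / worse buckets. The function names, the halfway truncation and the bucket labels are assumptions for illustration only; fetching the URLs and reading the status code and rel=canonical is left to whatever tool you prefer.

```python
from urllib.parse import urlsplit, urlunsplit


def malformed_variants(url):
    """Return deliberately wrong versions of a valid URL, for testing only."""
    parts = urlsplit(url)
    slug = parts.path.rstrip("/")
    paths = [
        slug + ".",                      # URL ending in a period
        slug[: max(2, len(slug) // 2)],  # shortened URL (cut roughly in half)
    ]
    return [urlunsplit((parts.scheme, parts.netloc, p, "", "")) for p in paths]


def verdict(status, canonical=None, correct_url=None):
    """Classify how the site answered a malformed URL (labels are assumed)."""
    if status in (404, 410):
        return "fine: the malformed URL is rejected"
    if status in (301, 302, 303, 307, 308):
        return "bad: automatic redirect, endless variations possible"
    if status == 200 and canonical is not None and canonical == correct_url:
        return "tolerable: wrong URL served, but rel=canonical points home"
    if status == 200:
        return "worse: content served on a malformed URL without rel=canonical"
    return "check manually"


for variant in malformed_variants("http://example.com/a-perfectly-valid-url/"):
    print(variant)
```

Feed each printed URL to a header checker, then pass the status code (and the canonical URL, if the page has one) to verdict to see which bucket it falls into.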
| 5:53 pm on Dec 16, 2012 (gmt 0)|
@ Sgt_Kickaxe How do you check header responses?
| 6:03 pm on Dec 16, 2012 (gmt 0)|
Can anyone suggest a good 301 redirect tool, preferably automated, for WordPress?
I recently changed my permalink structure, and am using "404 Redirected" by Weberz. It reports that 4671 404 URLs need to be processed, but I see no mechanism to process them.
I'm sure this issue is why my blog traffic has died off.
| 6:18 pm on Dec 16, 2012 (gmt 0)|
@Frost_Angel: Thanks. It does help me in my analysis and my ongoing observation of the situation.
@dilberry: I might be completely wrong and they might be completely right. It's just that I'm believing what I see in the stats - that my traffic dropped at the same time when the error count shot up. I'm not sure if that's a pure coincidence with something else that decided to rank our site lower. I do see several low quality websites outranking our pure, original content.
I fixed the problem by redirecting all the error URLs to the correct URLs. I don't know if that was the right thing to do, but had I not done that, I'd still be sitting with ~90k errors, making Google think our site is still crappy and not improving our crawl rate.
I'll only know whether my solution worked if our traffic restores after the error count drops. I'll keep everyone updated.
| 6:27 pm on Dec 16, 2012 (gmt 0)|
|I fixed the problem by redirecting all the error URLs to correct URLs. |
That's a key step when you change Permalinks for old pages. Then it's good to change any content area links to the new Permalink too, rather than send them through 301 handling. Depending on how you changed your permalinks, plus how many content area links you've got, that can be time consuming. But it is a good thing to do and helps your site circulate PageRank internally in a more efficient way.
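If you already have a list of old and new Permalinks, the content area links can be updated in bulk instead of by hand. A minimal Python sketch, assuming the post HTML is available as a string and the mapping is a plain dict; update_internal_links and the sample URLs are hypothetical, and a real WordPress site would need this run against the stored post content (after a backup).

```python
import re

# Matches href="..." or href='...' and captures the quote style and the URL.
HREF = re.compile(r"""href=(["'])(.*?)\1""")


def update_internal_links(html, redirects):
    """Rewrite href values that appear in `redirects` (old URL -> new URL)."""
    def swap(match):
        quote, url = match.group(1), match.group(2)
        # Links that are not in the mapping are left exactly as they were.
        return "href=" + quote + redirects.get(url, url) + quote
    return HREF.sub(swap, html)


sample = '<a href="/2012/10/old-post/">old</a> <a href="/about/">about</a>'
print(update_internal_links(sample, {"/2012/10/old-post/": "/old-post/"}))
```

Only exact matches are rewritten, so relative and absolute forms of the same link each need their own entry in the mapping.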