To add my experience: I made a mistake with my canonical tags, which caused Webmaster Tools to report lots of 404 broken links (about 3% of the total pages). The links were not actually broken.
I fixed it about 6 weeks ago, but have not seen a recovery yet.
Sorry... regarding the OP's title for this post, I have to ask: how can one "accidentally" link to thousands of 404s? And worse, those pages were on site?
Just a reminder that we can each shoot ourselves in the proverbial foot.
Fix it. Go from there. Don't worry about it (the damage is already done); just create new content that does not link to 404 pages and go from there. There is no magic fix, and do recall that G, B, and Y have long memories.
In my experience, second-order ranking factors do not update immediately, even on Caffeine, and generally last 30 or 90 days*.
Such factors are discrete (as opposed to continuous), and are often perceived as a "penalty," though they are no such thing. It's pretty hard to tie down these effects, as they are a quantification of an abstract. (Think "good quality" versus "poor quality" and compare to "how much quality does this item possess?")
If you have been re-graded with a lower quality score, and have fixed the issue, patience will be required until it works through.
I disagree with some posters that your situation as described is "unlikely" to have caused you problems. Having a "main" link as a 301, and "thousands" of internal 404s, is frankly awful. If I were grading such a site, I would mark it down. Which is a good thing, because fixing it should bring a measurable result.
*When set, it is re-evaluated after a pre-determined period. If it was set 29 days ago, and you make the change today, it may be re-evaluated tomorrow. This is one reason why people get confused about variable "recovery times".
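Shaddows' footnote can be made concrete with a toy model. This is purely a hypothetical sketch of the idea (nobody outside Google knows the real mechanism): the recheck date is counted from when the flag was set, not from when you ship the fix, which is why "recovery times" look so variable.

```python
from datetime import date, timedelta

def next_reevaluation(flag_set_on, period_days=30):
    # Hypothetical model: the quality flag is rechecked on a fixed
    # cycle counted from the day it was set, not the day you fix it.
    return flag_set_on + timedelta(days=period_days)

# Flag set 29 days ago; you fix the site today, and the recheck
# could still land tomorrow, regardless of when the fix went in.
today = date(2010, 8, 3)
flagged = today - timedelta(days=29)
print(next_reevaluation(flagged))  # 2010-08-04
```

Two sites fixing the same problem on the same day can thus "recover" anywhere from one day to a full cycle apart.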
I have now tried Xenu and LinkScan; both are struggling to crawl our site in full.
Both programs report TimedOut errors, but when I visit the page it loads fine. Does anyone have any experience with these programs? Does this mean Gbot is being served these errors? I doubt it, because they are not reported in WMT.
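One way to see what status a given user-agent is actually served is to request the page yourself with a crawler-like User-Agent string and a timeout, and read the status code back. A minimal Python sketch; the throwaway local server here just stands in for your own site so the example runs anywhere, and the Googlebot UA string is illustrative:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def fetch_status(url, user_agent="Mozilla/5.0 (compatible; Googlebot/2.1)",
                 timeout=10):
    """Return the HTTP status the server hands this user-agent,
    or None on a timeout / connection failure."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code          # 4xx/5xx still carry a status code
    except (urllib.error.URLError, OSError):
        return None            # timed out or could not connect

# Throwaway local server so the sketch is self-contained.
class _Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200 if self.path == "/product" else 404)
        self.end_headers()
    def log_message(self, *args):
        pass  # keep test output quiet

server = HTTPServer(("127.0.0.1", 0), _Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_address[1]}"
print(fetch_status(f"{base}/product"))  # 200
print(fetch_status(f"{base}/missing"))  # 404
```

If the browser gets a 200 but this script times out, the server may be treating bot-like traffic differently (rate limiting, UA filtering), which is worth ruling out before blaming the crawler software.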
I have exactly this problem on a site with 1 million indexed pages. Slow degradation since the Mayday update; we are now down to 200,000 indexed pages.
We also had a number of internal links pointing to 404s, plus many other major duplication errors.
These are now fixed, but I don't see any recovery in the index or in traffic from G.
Welcome to the forums, firstconversion.
A couple of things I'm wondering about in your case. Are those "pages in the index" numbers coming from the site: operator or from WMT? Also, how has your Google traffic been through this period?
[edited by: tedster at 5:18 pm (utc) on Aug 3, 2010]
Great insight, Shaddows!
I think that on top of Google's internal "clock" there is also a crawling factor that plays into this: if a site is pushed back, it is very often crawled slower. So if you made a bad sitewide change while it was crawled normally, it would take Google longer to realize that the change was reversed. Or, in other words, it will take them longer to realize that the reverse change is also sitewide because they would need a certain significant amount of pages crawled before considering the change being sitewide. This is why I think large sites are affected more by any sitewide changes.
I hope that was a rhetorical question, because there is nothing easier than adding a single character to a script that runs a database-driven site and, voila, you've got yourself either thousands of 404s or thousands of dupe content pages, depending on the situation. Been there, done that :( Or were you essentially saying, "How can one be so careless?" Well, you didn't have to rub it in...
|Sorry... regarding OP's header for the post I have to ask how can one "accidentally" link to thousands of 404s? And worse, those pages were on site? |
|I have now tried Xenu and LinkScan both are struggling to crawl our site in full. |
|Both programs show TimedOut errors, but when I visit the page it loads fine. |
Are you tripping a DoS setting on your server?
Thanks for all the great feedback; very useful thread.
BillyS - Yeah, I've checked the DoS settings; it doesn't look like that's it! I'm going to give LinkScan another shot now; by my workings it should take 63 hours if it runs at the current rate. Ouch!
We recently discovered (in GWT) that one of our scripts was being indexed using a negative pointer - essentially a value that was not supposed to be allowed. This in turn had created thousands of broken pages in the index. I ended up doing a 301 redirect to our 404 page whenever a negative value was passed in the query string.
Now it shows up as thousands of 404 errors in GWT. Will G eventually drop them from the error reports? I hate seeing them there every day. :-)
Try reducing the number of simultaneous threads on Xenu. Big sites will still choke Xenu, though.
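The "63 hours at the current rate" figure above is just pages divided by throughput, and dropping the thread count scales it directly. A quick back-of-envelope sketch; the numbers below are illustrative assumptions, not measurements from anyone's site:

```python
def crawl_hours(pages, threads, seconds_per_request):
    # Each thread fetches one page every `seconds_per_request` seconds,
    # so total throughput is threads / seconds_per_request pages per second.
    return pages * seconds_per_request / threads / 3600.0

# Illustrative: a million-page site, 5 crawler threads, ~1.1 s per fetch.
print(round(crawl_hours(1_000_000, 5, 1.1)))  # ~61 hours
```

Halving the threads doubles the wall-clock time, which is the trade-off between a polite crawl and a finished one.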
I find the fact that the main pages are no longer at the top of the site listing interesting. Not sure what that means, but in my experience it's atypical.
Maximillianos - by 301'ing to a 404 page, aren't you still internally linking, via a redirect, to your 404 pages?
You are right C41lum - It is probably best not to 301 them, but just have the script return a 404. I've corrected it.
I think I was 301'ing it because I had trouble getting the script to return a valid 404 in perl, but I think I have it working properly now. Let's hope! =)
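Maximillianos' script was in Perl; as a language-neutral sketch, here is the same idea in Python/WSGI: validate the query-string value and send a genuine 404 status (not a 301) when it is out of range. The parameter name `id` and the response bodies are made up for illustration:

```python
from urllib.parse import parse_qs

def app(environ, start_response):
    """Serve a real 404 status (not a 301 redirect) for invalid ids."""
    qs = parse_qs(environ.get("QUERY_STRING", ""))
    raw = qs.get("id", [""])[0]
    if not raw.isdigit() or int(raw) < 1:  # rejects negatives, zero, junk
        start_response("404 Not Found", [("Content-Type", "text/html")])
        return [b"<h1>Page not found</h1>"]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [("<h1>Product %s</h1>" % raw).encode()]

# Quick check without a web server: call the app directly.
def status_for(query):
    seen = []
    app({"QUERY_STRING": query}, lambda status, headers: seen.append(status))
    return seen[0]

print(status_for("id=42"))  # 200 OK
print(status_for("id=-5"))  # 404 Not Found
```

The point is that the error status is sent on the bad URL itself, so crawlers see a clean 404 rather than a 301 hop that keeps the broken URL alive as an internal redirect target.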
Yeah, let's hope... let me know if you see any changes; it might be helpful.
Hi Tedster (longtime lurker, first time poster here)
The numbers are coming from the site: operator. It is just our product pages that are affected - from 10k visits a day down to 500, and at the same time the site: command for our product pages shows a 90% drop.
Correlation, not causation, but worrying. I'm having a devil of a time trying to figure this out. We have made a number of fundamental fixes and cleaned up a lot of 404s/503s, but seen no recovery in either traffic or site: index numbers.
All other areas are fine (tags/categories, etc.) and reporting site: numbers and actual visits as per normal.
WMT numbers look OK, but are now down also. I'm not worried about the site: numbers; I'd just like to figure out what went wrong to get the traffic back :)
Hi firstconversion, glad you have decided to get involved.
The site: command should return the pages G has listed for you in the index. If the number has dropped, I would check for duplicate issues, then check on-page content for thinness.
Have you tried doing a site:yourdomain.com/file search to see what results that throws up? A misspelled search can also throw up different results.
I use the GWMT URLs-seen count plus the site: command to keep a general eye on my indexing. Normally it doesn't correlate to traffic that well, but I can safely say that 0 pages indexed = 0 traffic ;)
Duplicates certainly were an issue and are now fixed. Just waiting for indexing and traffic to come back. Shaddows' explanation above gives some comfort, but without referenceable material on that score I am hesitant to pass that message on to my colleagues.
Of course, there is always the possibility that there is some duplicate content we have missed.
I track each part of my site index daily e.g. site:yourdomain.com/profiles (and some of my competitors) so this is how I know I have lost only my product pages and not other pages
|Just waiting for indexing and traffic to come back. |
Shaddows' comments are very helpful here with regard to Google indexing. Look also at the comments from pontifex in this post in the site: operator discussion....
Google SITE: Comand, what does it show?
The two together, I feel, present a very good view of why some results on Google seem to go in "waves", to use the word that pontifex uses.
I think I need to check my Webmaster Tools account every day and get these issues fixed to avoid a ranking drop. Thanks.
|I have exactly this problem on a site with 1 million indexed pages... |
I gotta ask this: what kind of a site has one million pages?
I mean, I don't think wikipedia has a million pages, does it? And wikipedia tries to cover EVERYTHING.
According to the site: operator, wikipedia.org has 137 million URLs indexed. There are many sites that clear the 1 million URL mark. News sites that have been around for a while can be way over that mark, as well as websites for some corporations, etc. It's really not that uncommon.