Forum Moderators: open
Had a problem with Googlebot on the 27th/28th of April. It tried to index my site's Snitz forums (which use dynamic URLs).
It started out OK, but then it dropped one of the parameters from the querystring. That caused the ASP page to crash with an ODBC error, which is recorded as a 500 (Internal Server Error) in my logs.
Only problem is, Googlebot kept trying, requesting several topic pages from my forums more than 20,000 times each and getting an internal server error every time.
I had 275,000 Internal Server Errors in one day alone because of this, which hit my webserver pretty hard.
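For reference, the failure mode above is what a parameter guard avoids: if the page checks the querystring before querying the database, a dropped parameter becomes a 404 for the crawler instead of a 500 crash. A minimal sketch in Python as a stand-in for the ASP page (the function and parameter names here are hypothetical, not the actual Snitz code):

```python
# Hedged sketch: return a client-facing status instead of letting a
# missing querystring parameter crash the database lookup into a 500.
# "TOPIC_ID" and handle_topic_request are illustrative names only.

def handle_topic_request(querystring: dict) -> int:
    """Return the HTTP status we would serve for a topic-page request."""
    topic_id = querystring.get("TOPIC_ID")
    # Googlebot dropped this parameter; without the guard, the SQL query
    # fails with an ODBC error and the server logs a 500 on every retry.
    if topic_id is None or not topic_id.isdigit():
        return 404  # tell the crawler the URL is bad, so it can give up
    # ... run the database query and render the topic page ...
    return 200
```

Serving a clean 404 (or 400) also matters for the retry behaviour described in this thread: crawlers treat a 500 as a transient condition worth retrying, while a 404 is a definitive answer.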
GoogleGuy - can you help? I assume this is a problem with Google's latest algorithm and Snitz Forums (which are quite popular).
I don't want to exclude Google (or for Google to exclude me!) - I'd like it to be able to index the forums. Happy to supply any details, logs etc if required to diagnose problem.
Thanks!
It would certainly appear to be an algo change. Googlebot also tried over 30,000 times in one day to access 5 SQL-generated pages that were missing from the site by mistake, each request simply producing a 404 error that would normally be crawled once and then ignored. As soon as I noticed and fixed the error, Googlebot carried on with the rest of the site as normal. The worrying thing is that this latest error can't be fixed easily at all without rewriting the whole cart, and Googlebot is 'stuck' retrying that one URL and not attempting anything else.
If you can help GoogleGuy, it would be much appreciated
One of the most common causes of 404 problems on Apache servers is incorrect syntax in the ErrorDocument directive [httpd.apache.org]. If you point 404's or 500's to a full URL instead of a local path, the status code returned will be 302 (Found / Moved Temporarily), not 404 (Not Found). See the warning (concerning 401's but applicable to all error documents) at the bottom of the cited ErrorDocument documentation.
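As a sketch of the difference (per the Apache ErrorDocument documentation; the hostname and filenames here are placeholders):

```
# Wrong: a full URL makes Apache issue an external redirect,
# so the client sees "302 Found" instead of the real error status.
ErrorDocument 404 http://www.example.com/errors/404.html

# Right: a local path lets Apache serve the error page internally
# while still returning the true 404 or 500 status code.
ErrorDocument 404 /errors/404.html
ErrorDocument 500 /errors/500.html
```
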
Jim
Thanks for the "HTTP/1.1 302 Found" message! Yes, I redirect to a full URL for error pages using .htaccess:
ErrorDocument 302 FULL-URL
ErrorDocument 401 FULL-URL
ErrorDocument 404 FULL-URL
ErrorDocument 500 FULL-URL
ErrorDocument 509 FULL-URL
I've never had any trouble with the Googlebot because of it.
Well, I was one for about 25 years, now I'm doin' whatever it takes to pay the bills! :)
See, in the old days, I had to design the hardware, write the boot code (machine code, and usually on paper), toggle it into the front panel in binary, and hit the run switch. The division between "hardware" and "software" was not so sharp.
Jesse,
I think you've been really lucky! Of course, Google probably has a few "extra" routines in their 'bot code to handle simple and common problems with 404s, and that's why you haven't had any trouble. I would recommend fixing it, though - they'll spend less time doing fix-up and more time spidering.
Jim
It reminds me a lot of the big server crash we had a while back while testing a new e-commerce application before deploying it.
Our Tomcat server caused our Apache server to go into a wild loop... and Catalina (a log file inside Tomcat) went totally berserk and caused our whole server to crash with a big thump. Had to remove that file and manually reboot our server at the data center... We couldn't remote into it anymore...
When we deleted that file, it was over 22 gigs! Just a log file, imagine...
What a day! I'll never forget that one...
..this is because the server is doing more in-depth analysis about the best way to crawl your site.. ..Googlebot will be much better at crawling your website after that..
It's heuristic? Will all our phones ring at once one day?
Adaptive bot tech, using our own sites to continually learn! Given a few years, any type of SEO would be impossible.
Jolly good.