Redirects, tell the bot page doesn't exist

Hey Folks,

My first post here, so please be gentle. :)

I am having an ongoing issue with excessive CPU usage on my sites (shared hosting). I have one primary domain and a subdomain. Both sites have WordPress blogs, and one site also has a Coppermine gallery photo section. Both sites also have a bunch of static HTML pages.

One of the things we found is bingbot and/or msnbot crawling a bunch of pages that don't exist. URLs like these:

GET /Bio/alaska/faq/stock/stock/alaska/journal/eagles/stock/thumbnails-79-Banff-National-Park-photos.html
GET /Bio/copyright/faq/stock/alaska/stock/index-17.html
GET /Bio/alaska/stock/stock/stock/alaska/portfolio/landscapes/stock/thumbnails-17-Small-Mammals-Photos.html

Literally thousands of them. The directory 'Bio' doesn't exist; it is now 'bio' and has just one URL, index.html. But somewhere along the line Bing started trying to crawl these crazy non-existent URLs. No other engine is crawling them, and nothing seems to link to them. Bing Webmaster Tools isn't showing a bunch of 404 errors, only a few, and none with this kind of URL.

The problem is that each of these requests generates a 404 page, which WordPress builds dynamically. Here's what I did:

In the .htaccess file, I added:

RedirectMatch 301 ^/Bio/ http://www.skolaiimages.com/bio/index.html

So now every one of those bad URLs just goes to the correct, static bio page. Is there a "better" way to configure this? Rather than serving thousands of redirects, could a script or rule simply say that 'Bio/anything' doesn't exist?
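
For reference, the "doesn't exist" version of that rule might look something like the sketch below. It assumes mod_alias is available (the RedirectMatch line above already relies on it) and has not been tested on this site. RedirectMatch accepts a non-3xx status with no target URL, so 410 (Gone) tells the bot the pages are permanently removed, and Apache answers by itself without WordPress having to build a 404 page.

```apache
# Sketch only: answer every /Bio/... request with 410 Gone.
# The pattern is case-sensitive, so the real /bio/index.html keeps working.
# No target URL is given because 410 is not a redirect status.
RedirectMatch 410 ^/Bio/
```

Using 404 in place of 410 should also work if a plain "not found" is preferred.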

Another option is to reconfigure WordPress so it isn't pointed at the root directory. I only made this change recently, so it wouldn't be too big a deal to switch it back, have a static HTML page as the home page again, and have everything WordPress operate within its own directory (/journal/). That should mean any 404 from the above set of URLs is not generated dynamically but calls a static page, correct?
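
A minimal sketch of what that static 404 setup could look like in the root .htaccess, assuming WordPress and its rewrite rules have been moved into /journal/ and no longer catch requests at the root. The file name /404.html is hypothetical; any static page would do.

```apache
# Sketch only: serve a plain static file for missing URLs at the site root.
# /404.html is a placeholder name, not a file that exists on the site yet.
# A local path (not a full http:// URL) keeps the response status at 404.
ErrorDocument 404 /404.html
```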

A third option is to drop WP Super Cache and switch to W3 Total Cache, which allows caching of 404 pages (Super Cache does not).

My access logs show as many as 5000 hits by bing/msn to these bad urls. Is it likely that this is causing the CPU problems?

I've slowed the crawl rate down via Bing Webmaster Tools, and also via this

User-Agent: *
Crawl-delay: 30

in the robots.txt file, but those didn't seem to change anything. I just made the 301 redirect for Bio today, so don't know yet whether the resource usage has slowed at all.

I apologize for the long and convoluted introductory post. I've been having such a hard time with this, and am in WAY over my head on this. Any and all help is much appreciated.

Thanks so much.

Cheers

Carl
