On a side note, if I ever get a chance to meet Slurp and/or MSNbot in a dark Internet alley, I'm going to tear their freakin' little eight legs off one by one. And I'm going to wait about an hour between each leg removal. That should give them some time to think about their actions and teach their peers a thing or two.
And Ask? I've banned you period from all websites. Your bot has an IQ of < minimum. Clueless!
User-agent: Teoma
Disallow: /
I'm cleaning up and de-spamitizing a site that's a horrid mess, totally redoing pages with cleaner code and 301-redirecting /bad_green_widgets/fuzzy.shtml to nifty cleaned-up ones in /nice-green-widgets/fuzzy.html (with mod_rewrite and/or RedirectMatch).
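For anyone following along, here is a rough sketch of what that 301 could look like in the site-root .htaccess. It assumes the old and new pages differ only in folder name and extension, as in the fuzzy.shtml example; the exact patterns are illustrative, not the poster's actual rules. Use one variant, not both.

```apache
# Variant 1: mod_alias. The pattern is matched against the full URL-path,
# leading slash included.
RedirectMatch 301 ^/bad_green_widgets/(.+)\.shtml$ /nice-green-widgets/$1.html

# Variant 2: mod_rewrite. In .htaccess (per-directory) context the leading
# slash is stripped before the pattern is tested.
# RewriteEngine On
# RewriteRule ^bad_green_widgets/(.+)\.shtml$ /nice-green-widgets/$1.html [R=301,L]
```

A relative target like /nice-green-widgets/... gets the current scheme and hostname added by the server; spell out the full URL if you want to force one canonical hostname at the same time.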
BUT, the whole site was a dup-content usability mess and in supplemental (didn't even deserve that). Figuring it would take forever, I've been thinking of just putting up the /nice-green-widgets/ folder(s) anew and doing a 410 and page removal on /bad_green_widgets/.
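The 410 alternative might look something like this, again only as a sketch that assumes everything under the old folder is meant to go away:

```apache
# mod_alias: answer 410 Gone for any URL-path under the old folder.
# With the "gone" status there is no target URL.
Redirect gone /bad_green_widgets

# mod_rewrite equivalent (use instead of the line above, not in addition):
# RewriteEngine On
# RewriteRule ^bad_green_widgets/ - [G]
```

Unlike the 301, this passes nothing along to the new pages; it only tells robots that the old URLs are gone for good.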
Brand new pages have been picked up right away, when linked to from the PR2 homepage, but the rest of the site just NEVER gets crawled (and I don't blame them).
But why not "/nice-green-fuzzy/widgets" ? Dump those extraneous future-liability page extensions while you have the chance, so that the "next time" when the site goes to --say-- PHP, the URLs won't have to change again.
PR1, the problem is that it's hard to learn to "design" good, sustainable URLs and URL-structures, and most folks don't get really good at it until they've suffered the consequences of being really bad at it (ask me how I know). ;)
There are all sorts of considerations that must go into it: Readability, marketing appeal, SEO factors, organization for robots.txt, cache-control, and access policies, rewritability (ease of creating regex patterns to match URL-subsets and minimizing the required number of rules), partitioning among staff members having different access levels for maintenance, etc.
Too few 'get it' that a URL should last forever, and that if it doesn't, then the 301 to redirect it will likely need to last forever. Many people treat their Web sites like a monthly magazine rack, rather than as a library. That's why we get so many threads here about "Just redesigned all my pages and changed all the URLs, so why did my site tank in the SERPs?"
Oh, and I'm not so sure Ask is still using Teoma -- Results today look like the way to rank for a keyword is to have only that single keyword as your <Title>. It doesn't seem to take much more than that!
Jim
They don't have to change when you change technology. Behold the power of:
# Allow .html .htm .cfm .pl .asp .jsp .aspx .jspx extensions to be PHP scripted files:
AddType application/x-httpd-php .html .htm .cfm .pl .asp .jsp .aspx .jspx
in your .htaccess file.
Of course they don't, but why continue to haul around the ".html" extension at all?
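One way to drop the extension from the public URL while leaving the files on disk alone is an internal rewrite. This is a minimal sketch, assuming the pages are stored as .html files under the document root and that the rules live in the site-root .htaccess; on some shared hosts %{DOCUMENT_ROOT} does not point at the real directory, so test before relying on it.

```apache
RewriteEngine On
# Only rewrite when the extensionless request has a matching .html file on disk.
# $1 here refers back to the group captured by the RewriteRule pattern below.
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
# Internal rewrite (no R flag): the URL in the visitor's address bar stays as-is.
RewriteRule ^([^.]+)$ /$1.html [L]
```

With this in place a request for /nice-green-fuzzy/widgets would be served from /nice-green-fuzzy/widgets.html, and a later move to PHP would only mean changing the right-hand side of the rule, not the URLs.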
What's supposed to be a link to /widgets-folder/ is consistently linked to as /widgets-folder, without the forward slash at the end, by people who don't know the difference between a page and a folder.
You would end up with all these:
/green-widgets
/green-widgets/small
/green-widgets/small/fuzzy