Forum Moderators: Robert Charlton & goodroi
I was wondering whether there will be any problem when Google crawls the following pages.
Example:
www.example.com/bk12345/widgets/widgetname.html
I mean the number part of the URL. Does it affect the URL in any way in search engines? If so, is there a way to rectify it? Please feel free to post your answers.
thanks
kiran
[edited by: Robert_Charlton at 6:02 pm (utc) on Aug. 4, 2009]
[edit reason] made example url less specific [/edit]
In other words, if the apparent directory or file name is only there to add keywords to the URL (a practice I've heard called "keyword fluffing"), then make sure those text strings are required to be exactly right, and that any error gets a 404.
I know, tedster. I recall one or two posts where you highlighted the problem. Many people refer to it, but no one has come up with an acceptable term, so I had to call it the above. I had three sites hammered by G* as a result. .../write-what-you-like-here-3547.html will always pull page ID 3547 from the database and present it to the user or any bot, hence a 200 response. Some of the subsequent problems are duplicate pages and URLs, infinite URI loops, directory URLs infinitely appending on top of other directory URLs, etc. This of course tends to happen when you have database-pulled content, mostly with CMS-type dynamic URLs.
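The routing mistake described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the `PAGES` table, `lookup_page` function, and the ID 3547 are made up for the example, not any real CMS's code): the handler extracts only the trailing numeric ID, so every slug variation serves the same page with a 200.

```python
import re

# Hypothetical page table: ID -> (canonical slug, content)
PAGES = {3547: ("write-what-you-like-here", "Widget article body")}

def lookup_page(path):
    """Naive CMS routing: only the trailing numeric ID is used to
    query the database, so ANY slug text before it returns the same
    page with a 200 -- an unbounded supply of duplicate URLs."""
    m = re.match(r"^/(?P<slug>[a-z0-9-]+)-(?P<id>\d+)\.html$", path)
    if not m:
        return 404, None
    page = PAGES.get(int(m.group("id")))
    if page is None:
        return 404, None
    return 200, page[1]  # the slug is ignored entirely

# Both return 200 with identical content -- duplicates in the index
print(lookup_page("/write-what-you-like-here-3547.html"))
print(lookup_page("/totally-different-keywords-3547.html"))
```

Since the slug never reaches the query, crawlers that discover (or guess) variant slugs can inflate the indexed page count far beyond the real page count, which is exactly the 50k-pages-becomes-2-million scenario described below.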
Some people blame their web servers, such as Apache, but that's wrong; the ones likely to blame are the webmasters themselves, mixing up the order of their rewrite rules. This technique can be beneficial as long as the URLs themselves are not ridiculously long, a mistyped URL produces a 404 (as you pointed out, tedster), no other URL has the same content, and a few other precautions are taken.
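The 404-on-mistyped-URL precaution amounts to validating the slug against the one stored for that ID. Here is a minimal sketch of that fix, under the same made-up names as a plain-Python illustration (`PAGES`, `lookup_page_strict`, and ID 3547 are hypothetical): the ID still drives the lookup, but a wrong slug gets a 301 to the canonical URL instead of a duplicate 200, and an unknown ID gets a 404.

```python
import re

# Hypothetical page table: ID -> (canonical slug, content)
PAGES = {3547: ("write-what-you-like-here", "Widget article body")}

def lookup_page_strict(path):
    """Canonicalising version: the numeric ID still drives the database
    lookup, but the slug must match the stored one exactly. A wrong
    slug 301-redirects to the canonical URL; an unknown ID is a 404."""
    m = re.match(r"^/(?P<slug>[a-z0-9-]+)-(?P<id>\d+)\.html$", path)
    if not m:
        return 404, None
    page_id = int(m.group("id"))
    page = PAGES.get(page_id)
    if page is None:
        return 404, None
    canonical_slug, body = page
    if m.group("slug") != canonical_slug:
        # One canonical 200 per page; every variant collapses into it
        return 301, f"/{canonical_slug}-{page_id}.html"
    return 200, body
```

A 301 (rather than a hard 404) is the gentler choice for slugs that are merely stale, since it consolidates any links pointing at old variants; either way, only one URL per page ever answers 200.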
G* knows about this problem, and in some situations large authority sites get a hint in WMT or even emails, I've heard; others get hit with moderate to severe penalties. Yahoo totally bans most of them, and that's because they could not manage to find a solution. Imagine you have a site with 2,000 true pages, but by using the keyword-urilation technique wrongly you end up with 2, 3, 4 million pages. I had a site from which Yahoo indexed 2 million pages when it only had fewer than 50k. That site is still banned today, even though I reversed what I did and told them a few times.
"Keyword URIlating" - that phrase is not too likely to go viral, is it. If I come up with a brainstorm for an alternate term, I'll let you know.
This problem is actually bigger than we think it is, and I am surprised it does not get addressed enough, considering what it can do to SEs' index databases and to sites' page and trust rank. G* engineers now, to a considerable extent, know how to deal with it and rank the intended true and original pages accordingly, but I am sure some pages are discredited due to parallel and unintended page duplication. Bing, for example, is no stranger to the problem, and I do think they deal with it better, despite having a worse pitfall when it comes to URL decoding and encoding: they still index some pages with tags in them (example: .../something-is-said-here<br>-and-the-rest-of-it-<br>.html). I don't know where they get the tags from!
For the record, keyword urilating for this purpose means WRONGLY IMPLEMENTING THE INCLUSION OF KEYWORDS IN URLs, leading to massive page duplication, whether that was intentional or due to coding malpractice!