Forum Moderators: Robert Charlton & goodroi
I was wondering whether there will be any problem when Google crawls the following pages.
Example:
www.example.com/bk12345/widgets/widgetname.html
I mean the number part of the URL. Does it affect the URL in any way in search engines? If so, is there a way to rectify it? Please feel free to post your answers.
thanks
kiran
[edited by: Robert_Charlton at 6:02 pm (utc) on Aug. 4, 2009]
[edit reason] made example url less specific [/edit]
In other words, if the apparent directory or file name is only there to add keywords to the URL (a practice I've heard called "keyword fluffing"), then make sure those text strings are required to be exactly right, and that any error gets a 404.
I know, tedster. I recall one or two posts where you highlighted the problem. Many people refer to it, but no one has come up with an acceptable term, so I had to call it the above. I had three sites hammered by G* as a result. .../write-what-you-like-here-3547.html will always pull page ID 3547 from the database and present it to the user or any bot, hence a 200 response. Some of the subsequent problems are duplicate pages and URLs, infinite URI loops, directory URLs infinitely appending on top of other directory URLs, etc. This of course tends to happen when you have database-pulled content, mostly with CMS-type dynamic URLs.
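The routing mistake described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the `PAGES` table, `lookup_page` function, and the ID 3547 are made up for the example, not any real CMS's code): the handler extracts only the trailing numeric ID, so every slug variation serves the same page with a 200.

```python
import re

# Hypothetical page table: ID -> (canonical slug, content)
PAGES = {3547: ("write-what-you-like-here", "Widget article body")}

def lookup_page(path):
    """Naive CMS routing: only the trailing numeric ID is used to
    query the database, so ANY slug text before it returns the same
    page with a 200 -- an unbounded supply of duplicate URLs."""
    m = re.match(r"^/(?P<slug>[a-z0-9-]+)-(?P<id>\d+)\.html$", path)
    if not m:
        return 404, None
    page = PAGES.get(int(m.group("id")))
    if page is None:
        return 404, None
    return 200, page[1]  # the slug is ignored entirely

# Both return 200 with identical content -- duplicates in the index
print(lookup_page("/write-what-you-like-here-3547.html"))
print(lookup_page("/totally-different-keywords-3547.html"))
```

Since the slug never reaches the query, crawlers that discover (or guess) variant slugs can inflate the indexed page count far beyond the real page count, which is exactly the 50k-pages-becomes-2-million scenario described below.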
Some people blame their web servers, such as Apache, but that's wrong; the ones likely to blame are the webmasters themselves, mixing up the order of their rewrite rules. This technique can be beneficial as long as the URLs themselves are not ridiculously long, a mistyped URL produces a 404 (as you pointed out, tedster), no other URL has the same content, and a few other precautions are taken.
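The 404-on-mistyped-URL precaution amounts to validating the slug against the one stored for that ID. Here is a minimal sketch of that fix, under the same made-up names as a plain-Python illustration (`PAGES`, `lookup_page_strict`, and ID 3547 are hypothetical): the ID still drives the lookup, but a wrong slug gets a 301 to the canonical URL instead of a duplicate 200, and an unknown ID gets a 404.

```python
import re

# Hypothetical page table: ID -> (canonical slug, content)
PAGES = {3547: ("write-what-you-like-here", "Widget article body")}

def lookup_page_strict(path):
    """Canonicalising version: the numeric ID still drives the database
    lookup, but the slug must match the stored one exactly. A wrong
    slug 301-redirects to the canonical URL; an unknown ID is a 404."""
    m = re.match(r"^/(?P<slug>[a-z0-9-]+)-(?P<id>\d+)\.html$", path)
    if not m:
        return 404, None
    page_id = int(m.group("id"))
    page = PAGES.get(page_id)
    if page is None:
        return 404, None
    canonical_slug, body = page
    if m.group("slug") != canonical_slug:
        # One canonical 200 per page; every variant collapses into it
        return 301, f"/{canonical_slug}-{page_id}.html"
    return 200, body
```

A 301 (rather than a hard 404) is the gentler choice for slugs that are merely stale, since it consolidates any links pointing at old variants; either way, only one URL per page ever answers 200.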
G* knows about this problem, and in some situations large authority sites get a hint in WMT or even emails, I've heard; others get hit with moderate to severe penalties. Yahoo totally bans most of them, and that's because they could not manage to find a solution. Imagine you have a site with 2,000 true pages, but by using the keyword-urilation technique wrongly you end up with 2, 3, 4 million pages. I had a site from which Yahoo indexed 2 million pages when it only had fewer than 50k. That site is still banned today, even though I reversed what I did and told them a few times.
"Keyword URIlating" - that phrase is not too likely to go viral, is it. If I come up with a brainstorm for an alternate term, I'll let you know.
This problem is actually bigger than we think it is, and I am surprised it does not get addressed enough, considering what it can do to SEs' index databases and to sites' page and trust rank. G* engineers now, to a considerable extent, know how to deal with it and rank the intended true and original pages accordingly, but I am sure some pages are discredited due to parallel and unintended page duplication. Bing, for example, is no stranger to the problem, and I do think they deal with it better, despite having a worse pitfall when it comes to URL decoding and encoding: they still index some pages with tags in them (example: .../something-is-said-here<br>-and-the-rest-of-it-<br>.html). I don't know where they get the tags from!
For the record, keyword urilating for this purpose means WRONGLY IMPLEMENTING THE INCLUSION OF KEYWORDS IN URLs, leading to massive page duplication, whether that was intentional or due to coding malpractice!