ID # in URL not good for SEO?

Using ID# in URL

Brindara

2:08 am on Jun 28, 2005 (gmt 0)

I remember reading somewhere that it is not good for SEO to use an ID number in URLs. Does anyone remember what was said about this and why?
Thanks.

coffeebean

2:21 am on Jun 28, 2005 (gmt 0)

GoogleGuy said:

I've been aching for a long time to mention somewhere official that sites shouldn't use "&id=" as a parameter if they want maximal Googlebot crawlage, for example. So many sites use "&id=" with session IDs that Googlebot usually avoids urls with that parameter...

[webmasterworld.com...]

physics

7:43 pm on Jun 29, 2005 (gmt 0)

Hi Brindara, welcome to WebmasterWorld.com! As mentioned above, the problem is that dynamic sites often use this parameter to keep track of session ID information, which causes all sorts of problems for spiders. For example, Googlebot might spider your site and get http://www.example.com/widget.html?id=234 one day and http://www.example.com/widget.html?id=948 the next. It would then have two copies of the same page under different URLs and might consider them duplicate content. It also makes it harder for the bot to determine whether it has been to a URL before. If you use a CMS or dynamic site, try to find one that doesn't put the session ID in the URL: use cookies instead, and/or find a CMS or program that doesn't keep sessions for known spider user agents.
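To make the duplicate-URL problem concrete, here is a minimal Python sketch of how a crawler-side tool might collapse URLs that differ only in a session parameter. The list of parameter names and the function name are assumptions for illustration; nothing here is known to be what Googlebot actually does. It also shows the side effect discussed in this thread: a genuine product number passed as id= would be stripped too.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of parameter names treated as session identifiers.
SESSION_PARAMS = {"id", "sid", "sessionid", "phpsessid"}


def strip_session_params(url, session_params=SESSION_PARAMS):
    """Return the URL with any query parameter named in session_params removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in session_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# The two URLs from the example above collapse to the same address.
a = strip_session_params("http://www.example.com/widget.html?id=234")
b = strip_session_params("http://www.example.com/widget.html?id=948")
print(a == b)
```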

txbakers

9:56 pm on Jun 29, 2005 (gmt 0)

I use dynamic urls and use the ID= parameter but not for session ID. It is a product or page id.

I'm top ranked in google for my chosen words so I can't complain.

That said, not every screen in my web program uses this. The basic "default" page and informational pages such as "contact" or "Faq", etc. are pretty static.

I do not want google to crawl those "inner" pages, I want them to find my default "outer" screens.

I think of it as a store. I want people to see my awning, I want them to look in the window, I want them to come inside and browse. I don't want them in the store room, the cellar, or anywhere else behind the scenes.

physics

10:16 pm on Jun 29, 2005 (gmt 0)

I guess it depends on the type of site, but for an ecom site I do everything I can to make sure all the pages get crawled. If there are 100,000 products then I want at least 100,000 pages in the SEs. I think of having all of the product pages indexed as a free shopping engine data feed, not people looking in the cellar ;) To be on the safe side, I think people who want all pages indexed should choose another query string like

?product=4958749

rather than

?id=4958749

to denote which product is being displayed, just in case a spider decides to ignore the ?id string.
On the other hand, if you don't want your product pages indexed, it's probably better to put a noindex tag on them or ban that directory in robots.txt rather than count on a bot getting confused about the query string (in case the bot wises up in the future) ;)

txbakers

1:59 am on Jun 30, 2005 (gmt 0)

If you have 100,000 pages with a dynamic URL, those 100,000 pages can't be crawled.

There exists in the web server ONE page called "products.php".

When a user clicks on a link to find another page it appends the product number: products.php?prod=45675

So that product page doesn't really exist and can't be spidered.

Dijkgraaf

2:57 am on Jun 30, 2005 (gmt 0)

I have several thousand dynamic pages with id= as one of the parameters in the URL. The new Mozilla/5.0 googlebot has quite happily been spidering them, and a search in Google will bring these pages up (although marked as Supplemental Result).
I will however be looking at renaming that parameter, and also reducing the number of parameters in that URL as it currently has 4 parameters which I think is too many (I didn't write the PHP application that is doing it).
Quite a few bots won't spider a URL with parameters, and others will only spider URLs that have a single parameter, so I'll see if I can reduce it to one parameter.

physics

3:43 am on Jun 30, 2005 (gmt 0)

txbakers, do a google search for

site:ebay.com inurl:forum.jsp

You can clearly see here that having a single page such as forum.jsp or product.php with a dynamic parameter appended to it does not render a site uncrawlable. The issue with id= is that so many sites use it for a session ID that spiders may decide to ignore it, in which case, yes, the site won't get fully indexed. So, back to the original point of the thread: it's better to use something other than ?id=, such as ?prod= or ?forum=
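As a rough illustration of the renaming suggested above, the following Python sketch rewrites a link so that the id parameter becomes product. The function name and defaults are hypothetical, and the application itself would also have to read the new parameter name for the rewritten links to work.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def rename_query_param(url, old="id", new="product"):
    """Return the URL with the query parameter `old` renamed to `new`."""
    parts = urlsplit(url)
    pairs = [
        (new if key == old else key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # SplitResult is a named tuple, so _replace swaps in the new query string.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


print(rename_query_param("http://www.example.com/products.php?id=4958749"))
```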

txbakers

4:41 am on Jun 30, 2005 (gmt 0)

I just ran my site with the same type of google search and was very, very relieved that it found no matches.

Perhaps in my application those dynamic pages don't exist without a parameter. I'm counting on them not existing without the parameter and a whole bunch of Session variables behind it.

physics

5:16 am on Jul 1, 2005 (gmt 0)

txbakers, glad to hear it ... but if you're counting on it you should probably look into putting noindex meta tags like


<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

into those pages you don't want crawled or use robots.txt so you can be sure Google won't pick them up.
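For anyone who wants to check that a page really carries the tag, here is a small Python sketch that reads the robots meta directives out of a page's HTML using the standard library parser. The class and function names are made up for this example, and it only shows how the tag can be read once the page has been fetched, not how any particular search engine processes it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head><body></body></html>'
print(sorted(robots_directives(page)))
```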

decaff

6:17 am on Jul 2, 2005 (gmt 0)

I've worked with one set of 4 sites for the last 2 years with a /?cart_id= followed by a long auto-generated numerical string (for tracking customers through the shopping cart) on the back of most of the important URL strings. These pages are indexed fine and, for the most part, performing well in Google (in conjunction with my studious work and attention).

Dijkgraaf

11:08 pm on Jul 2, 2005 (gmt 0)

Well actually
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> won't stop the bot from crawling that page, as it needs to crawl the page to actually read those META tags. What it will do is stop the bot from indexing the contents or following any links found on that page.
If you put the page in the robots.txt file, that tells the bot never to fetch it. However, if it finds any links to those pages, it may list the URLs without any heading or contents. These usually won't show up in normal searches, only if you search for URLs.
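The robots.txt side of this can be tried out with Python's standard urllib.robotparser module. In the sketch below, the /store/ directory, the rules, and the ExampleBot user agent are all made-up values; it only shows how a well-behaved bot would decide whether it may fetch a URL, and says nothing about whether a blocked URL might still be listed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents banning one directory for all bots.
rules = """\
User-agent: *
Disallow: /store/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)


def may_fetch(url, agent="ExampleBot"):
    """Return True if the rules above allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


print(may_fetch("http://www.example.com/store/products.php?prod=45675"))
print(may_fetch("http://www.example.com/contact.html"))
```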

physics

11:27 pm on Jul 2, 2005 (gmt 0)

Right, Dijkgraaf ... I meant to say 'don't want indexed' instead of 'don't want crawled'. Obviously robots.txt is the better way, since it will keep the pages from getting crawled by the major search engine bots and thus save server resources.