Google's Crawler and the "id=" parameter

Will my content be crawled?

     

fmfguy

8:54 pm on Apr 27, 2006 (gmt 0)

5+ Year Member



I am starting a directory site that will have profiles of counselors and other content that is pulled via PHP from a MySQL database. If the counselor pages have URLs that look like this:
( http://example.com/family/search_details.php?cid=33&PHPSESSID=ea39cc8670394d805cc9071f431fb605 )

and other content looks like this:

http://example.com/family/articledetails.php?id=4

will Google even crawl all of this?

I was hoping to gain an edge by having thousands of counselor profiles and other content, but if the site is designed this way, will Google even crawl it efficiently?

What's the best way to get around this issue? mod_rewrite? Please advise. Thanks!

[edited by: tedster at 8:56 pm (utc) on April 27, 2006]

tedster

9:00 pm on Apr 27, 2006 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Hello fmfguy, and welcome to the forums.

A couple things. From Google's Technical Guidelines for Webmasters [google.com]

Allow search bots to crawl your sites without session IDs or
arguments that track their path through the site. These techniques
are useful for tracking individual user behavior, but the access
pattern of bots is entirely different. Using these techniques may
result in incomplete indexing of your site, as bots may not be
able to eliminate URLs that look different but actually point to
the same page.

Essentially, by including the PHPSESSID in the URL, you create a situation where a potentially infinite number of URLs are possible for a single bit of content. As the bot starts trying these (if it even does), the pages actually active in the index will start to erode. Track this kind of data with a cookie if you must, but best practice is to keep it out of the URL.
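
As a rough sketch of keeping the session ID in a cookie rather than in the URL, the two PHP session settings below can be set from .htaccess. This assumes PHP runs as an Apache module (mod_php), which is what makes php_flag available there; on a CGI or FastCGI setup the same two settings would go in php.ini instead.

```apache
# Assumption: Apache with mod_php; php_flag is not honored otherwise.

# Stop PHP from appending the session ID to links in the page output.
php_flag session.use_trans_sid off

# Accept session IDs from cookies only, never from the query string.
php_flag session.use_only_cookies on
```

The trade-off is that visitors who refuse cookies will not keep a session, but a bot that ignores cookies then sees one stable URL per page.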

Also, if the parameter "id" is only being used to point to a certain counselor, I would advise changing it to almost anything that doesn't have "id" in it. However, that kind of URL "may be" OK for you in this situation. The big problems come with "&id", where id is the second parameter and it looks like a session ID, with all the troubles I mentioned above or, even worse, no crawling at all.

There are many indexed pages in Google with a single id parameter.

fmfguy

12:21 am on Apr 28, 2006 (gmt 0)

5+ Year Member



Thanks for the welcome, tedster. Hopefully I can add my own insights and knowledge to this impressive forum.

What can be done about the problems you have outlined without redoing the way the entire site is designed? Would a mod_rewrite solve the problem? I want Google to be able to crawl every ounce of content.

catch2948

1:53 am on Apr 28, 2006 (gmt 0)

10+ Year Member



If I were you, I would use a simple Rewrite statement in .htaccess to change the dynamic urls to static ... Google definitely doesn't like the "id=" part at all ...
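
A minimal sketch of what such rules might look like, assuming Apache with mod_rewrite enabled, an .htaccess file placed in the /family/ directory, and a server configuration (AllowOverride) that permits rewrite directives there. The "article/" and "counselor/" path prefixes are made up for illustration; the script names and parameters are taken from the URLs in the first post.

```apache
# Hypothetical /family/.htaccess; the static-looking paths are illustrative.
RewriteEngine On
RewriteBase /family/

# /family/article/4  ->  /family/articledetails.php?id=4
RewriteRule ^article/([0-9]+)/?$ articledetails.php?id=$1 [L]

# /family/counselor/33  ->  /family/search_details.php?cid=33
RewriteRule ^counselor/([0-9]+)/?$ search_details.php?cid=$1 [L]
```

Rules like these only map incoming requests onto the existing scripts. The links the site prints would still need to be changed to the new form so that the bot finds the static-looking URLs in the first place.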

tedster

3:02 pm on May 1, 2006 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



As a clarification, Google says they don't like "&id" as a parameter -- that is, using an id parameter beyond a single one.

patc

3:37 pm on May 1, 2006 (gmt 0)

10+ Year Member



They definitely do crawl it, but it might end up in the supplemental index.

I know this for sure after finding out how a hacker managed to compromise my system (bad security flaw allowing SQL injection at the end of my id= statement, ugh, what an idiot I am!).

They searched on google for filetype:.asp and "id=4737" or some random number... just happened I had an article that matched it, and they went from there.

fmfguy

1:52 pm on May 2, 2006 (gmt 0)

5+ Year Member



Does someone have an example of a site where this .htaccess mod_rewrite is being used? Does it slow load times significantly or not? I am going to do it regardless, but I am looking to educate myself :-)

teaperson

2:25 pm on May 2, 2006 (gmt 0)

10+ Year Member



It doesn't slow down response significantly.
 
