Google's Crawler and the "id=" parameter
Will my content be crawled?
I am starting a directory site that will have profiles of counselors and other content that is pulled via PHP from a MySQL database. If the counselor pages have URLs that look like this:
( http://example.com/family/search_details.php?cid=33&PHPSESSID=ea39cc8670394d805cc9071f431fb605 )
and other content looks like this:
will Google even crawl all of this?
I was hoping to gain an edge by having thousands of counselor profiles and other content, but if the site is designed this way, will Google even crawl it efficiently?
What's the best way to get around this issue? mod_rewrite? Please advise. Thanks!
[edited by: tedster at 8:56 pm (utc) on April 27, 2006]
Hello fmfguy, and welcome to the forums.
A couple things. From Google's Technical Guidelines for Webmasters [google.com]
"Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page."
Essentially, by including the session ID in the URL, you create a situation where a potentially infinite number of URLs are possible for a single piece of content -- and as the bot starts trying these (if it even does), the pages actually active in the index will start to erode. Track this kind of data with a cookie if you must, but best practice is to keep it out of the URL.
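As a rough sketch of keeping the session ID out of the URL: assuming PHP runs as an Apache module (so that php_flag directives are honored in .htaccess), the two settings below ask PHP to carry the session in a cookie only and to stop appending PHPSESSID to links. Under a CGI or FastCGI setup the same settings would belong in php.ini instead.

```apache
# .htaccess -- assumes mod_php; php_flag is not available under CGI/FastCGI.

# Do not rewrite links to carry PHPSESSID in the query string.
php_flag session.use_trans_sid off

# Accept the session ID from a cookie only, never from the URL.
php_flag session.use_only_cookies on
```

With these in place, a visitor without cookies (such as a crawler) simply sees the plain URL, e.g. search_details.php?cid=33, with no session argument attached.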
Also, if the parameter "id" is only being used to point to a certain counselor, I would advise changing it to almost anything that doesn't have "id" in it. However, that kind of URL "may be" OK for you in this situation. The big problems come with "&id", where id is the second parameter and looks like a session ID, with all the troubles I mentioned above or even worse -- no crawling at all.
There are many indexed pages in Google with a single id parameter.
Thanks for the welcome, tedster. Hopefully I can add my own insights and knowledge to this impressive forum.
What can be done about the problems you have outlined without redoing the way the entire site is designed? Would a mod_rewrite solve the problem? I want Google to be able to crawl every ounce of content.
If I were you, I would use a simple Rewrite statement in .htaccess to change the dynamic urls to static ... Google definitely doesn't like the "id=" part at all ...
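A minimal sketch of such a rewrite, for illustration only: it assumes mod_rewrite is enabled, that .htaccess overrides are allowed, and that the file sits in the site's document root. The static-looking path /family/counselor/33 is a hypothetical choice, not something from the original site; the script name and the cid parameter come from the URL in the first post.

```apache
# .htaccess in the document root -- assumes mod_rewrite is available.
RewriteEngine On

# Hypothetical public URL:  /family/counselor/33
# Served internally by:     /family/search_details.php?cid=33
# ([0-9]+) captures the numeric counselor id and passes it as cid.
RewriteRule ^family/counselor/([0-9]+)/?$ /family/search_details.php?cid=$1 [L]
```

The rule only maps incoming requests onto the existing script, so the PHP code itself can stay as it is. The links the site prints would still need to be changed to the new form, otherwise crawlers will keep finding the dynamic URLs.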
As a clarification, Google says they don't like "&id" as a parameter -- that is, using an id parameter beyond a single one.
They definitely do crawl it, but it might end up in the supplemental index.
I know this for sure after finding out how a hacker managed to compromise my system (bad security flaw allowing SQL injection at the end of my id= statement, ugh, what an idiot I am!).
They searched on Google for filetype:.asp and "id=4737" or some random number... it just happened that I had an article that matched it, and they went from there.
Does someone have an example of a site where this .htaccess mod_rewrite is being used? Does it slow load times significantly or not? I am going to do it regardless, but am looking to educate myself :-)
It doesn't slow down response significantly.