
Google's Crawler and the "id=" parameter

Will my content be crawled?

     
8:54 pm on Apr 27, 2006 (gmt 0)

New User

5+ Year Member

joined:Apr 27, 2006
posts:9
votes: 0


I am starting a directory site that will have profiles of counselors and other content that is pulled via PHP from a MySQL database. If the counselor pages have URLs that look like this:
( http://example.com/family/search_details.php?cid=33&PHPSESSID=ea39cc8670394d805cc9071f431fb605 )

and other content looks like this:

http://example.com/family/articledetails.php?id=4

will Google even crawl all of this?

I was hoping to gain an edge by having thousands of counselor profiles and other content, but if the site is designed this way, will Google crawl it efficiently?

What's the best way to get around this issue? mod_rewrite? Please advise. Thanks!

[edited by: tedster at 8:56 pm (utc) on April 27, 2006]

9:00 pm on Apr 27, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


Hello fmfguy, and welcome to the forums.

A couple of things. From Google's Technical Guidelines for Webmasters [google.com]:

Allow search bots to crawl your sites without session IDs or
arguments that track their path through the site. These techniques
are useful for tracking individual user behavior, but the access
pattern of bots is entirely different. Using these techniques may
result in incomplete indexing of your site, as bots may not be
able to eliminate URLs that look different but actually point to
the same page.

Essentially, by including the PHPSESSID in the URL, you create a situation where a potentially infinite number of URLs can point to a single piece of content. As the bot starts trying these (if it even does), the pages that are actually active in the index will start to erode. Track this kind of data with a cookie if you must, but best practice is to keep it out of the URL.
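To make the cookie suggestion concrete, here is a minimal sketch of keeping PHPSESSID out of URLs from .htaccess. It assumes PHP is running as an Apache module (mod_php) and that the host allows PHP settings to be overridden in .htaccess; neither is stated in the thread. Under CGI or FastCGI these lines typically cause a 500 error, and the same two settings would go in php.ini instead.

```apache
# Assumption: PHP runs as an Apache module, so php_flag is available here.

# Stop PHP from appending PHPSESSID to links and forms automatically.
php_flag session.use_trans_sid off

# Accept session IDs from cookies only, never from the query string.
php_flag session.use_only_cookies on
```

Visitors who refuse cookies will lose their session with this setup. Bots do not need a session, so they simply get the clean URL.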

Also, if the parameter "id" is only being used to point to a certain counselor, I would advise changing it to almost anything that doesn't have "id" in it. However, that kind of URL may be OK for you in this situation. The big problems come with "&id", where id is the second or a later parameter and looks like a session ID, with all the troubles I mentioned above or even worse: no crawling at all.

There are many indexed pages in Google with a single id parameter.

12:21 am on Apr 28, 2006 (gmt 0)

New User

5+ Year Member

joined:Apr 27, 2006
posts:9
votes: 0


Thanks for the welcome, tedster. Hopefully I can add my own insights and knowledge to this impressive forum.

What can be done about the problems you have outlined without redoing the way the entire site is designed? Would mod_rewrite solve the problem? I want Google to be able to crawl every ounce of content.

1:53 am on Apr 28, 2006 (gmt 0)

Full Member

10+ Year Member

joined:Apr 25, 2003
posts:204
votes: 0


If I were you, I would use a simple RewriteRule in .htaccess so the dynamic URLs can be served under static-looking ones ... Google definitely doesn't like the "id=" part at all ...
3:02 pm on May 1, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


As a clarification, Google says they don't like "&id" as a parameter -- that is, using an id parameter beyond a single one.
3:37 pm on May 1, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:June 4, 2005
posts:70
votes: 0


They definitely do crawl it, but it might end up in the supplemental index.

I know this for sure after finding out how a hacker managed to compromise my system (bad security flaw allowing SQL injection at the end of my id= statement, ugh, what an idiot I am!).

They searched on Google for filetype:.asp and "id=4737" or some random number... it just happened that I had an article that matched it, and they went from there.

1:52 pm on May 2, 2006 (gmt 0)

New User

5+ Year Member

joined:Apr 27, 2006
posts:9
votes: 0


Does someone have an example of a site where this .htaccess mod_rewrite approach is being used? Does it slow load times significantly or not? I am going to do it regardless, but am looking to educate myself :-)
2:25 pm on May 2, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 2, 2004
posts:71
votes: 0


It doesn't slow down response significantly.
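For the example requested above, a rough sketch of what such rules could look like for the two script URLs from the first post. The static-looking paths (/family/counselor/33 and /family/article/4) are hypothetical, chosen only for illustration. The sketch also assumes the .htaccess file sits in the site's document root and that mod_rewrite is enabled on the server.

```apache
# Assumption: this file lives in the document root and mod_rewrite is loaded.
RewriteEngine On

# /family/counselor/33  ->  /family/search_details.php?cid=33
# (the "counselor/" path segment is a made-up naming scheme)
RewriteRule ^family/counselor/([0-9]+)/?$ /family/search_details.php?cid=$1 [L]

# /family/article/4  ->  /family/articledetails.php?id=4
# (the "article/" path segment is likewise hypothetical)
RewriteRule ^family/article/([0-9]+)/?$ /family/articledetails.php?id=$1 [L]
```

The rules only map incoming requests onto the existing scripts. The links the PHP pages print still have to be changed to the new form, otherwise the bot keeps finding the old dynamic URLs. The [0-9]+ pattern lets only digits through on the rewritten URLs, but the scripts remain reachable directly, so this is no substitute for validating the id inside the script.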