
Google SEO News and Discussion Forum

    
Google's Crawler and the "id=" parameter
Will my content be crawled?
fmfguy




msg:774255
 8:54 pm on Apr 27, 2006 (gmt 0)

I am starting a directory site that will have profiles of counselors and other content that is pulled via PHP from a MySQL database. If the counselor pages have URLs that look like this:
( http://example.com/family/search_details.php?cid=33&PHPSESSID=ea39cc8670394d805cc9071f431fb605 )

and other content looks like this:

http://example.com/family/articledetails.php?id=4

will Google even crawl all of this?

I was hoping to gain an edge by having thousands of counselor profiles and other content, but if the site is designed in this way, will google even crawl it efficiently?

What's the best way to get around this issue? mod_rewrite? Please advise. Thanks!

[edited by: tedster at 8:56 pm (utc) on April 27, 2006]

 

tedster




msg:774256
 9:00 pm on Apr 27, 2006 (gmt 0)

Hello fmfguy, and welcome to the forums.

A couple of things. From Google's Technical Guidelines for Webmasters [google.com]:

Allow search bots to crawl your sites without session IDs or
arguments that track their path through the site. These techniques
are useful for tracking individual user behavior, but the access
pattern of bots is entirely different. Using these techniques may
result in incomplete indexing of your site, as bots may not be
able to eliminate URLs that look different but actually point to
the same page.

Essentially, by including the PHPSESSID in the URL, you create a situation where a potentially infinite number of URLs are possible for a single piece of content. As the bot starts trying these (if it even does), the pages actually active in the index will start to erode. Track this kind of data with a cookie if you must, but best practice is to keep it out of the URL.
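
To make the duplicate-URL problem concrete, here is a minimal Python sketch (not anything Google publishes) that strips session-style parameters so two URLs for the same page collapse into one. The SESSION_PARAMS list and the second session value are assumptions made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed list of session-style parameter names; adjust for your own site.
SESSION_PARAMS = {"phpsessid", "sid", "sessionid"}


def strip_session_params(url):
    """Return the URL with session-style query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Two visits to the same counselor page, each with a different session ID.
# The second session value is invented for this example.
urls = [
    "http://example.com/family/search_details.php?cid=33&PHPSESSID=ea39cc8670394d805cc9071f431fb605",
    "http://example.com/family/search_details.php?cid=33&PHPSESSID=0123456789abcdef0123456789abcdef",
]

# Both URLs reduce to the same canonical form.
canonical = {strip_session_params(u) for u in urls}
print(canonical)
```

A crawler that sees the raw URLs has no reliable way to know they are the same page, which is why keeping the session ID in a cookie avoids the problem at the source.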

Also, if the parameter "id" is only being used to point to a certain counselor, I would advise changing it to almost anything that doesn't have "id" in it. However, that kind of URL may be OK for you in this situation. The big problems come with "&id", where id is the second parameter and looks like a session ID, with all the troubles I mentioned above, or even worse: no crawling at all.

There are many indexed pages in Google with a single id parameter.

fmfguy




msg:774257
 12:21 am on Apr 28, 2006 (gmt 0)

thanks for the welcome tedster, hopefully i can add my own insights and knowledge to this impressive forum.

What can be done about the problems you have outlined without redoing the way the entire site is designed? Would a mod_rewrite rule solve the problem? I want Google to be able to crawl every ounce of content.

catch2948




msg:774258
 1:53 am on Apr 28, 2006 (gmt 0)

If I were you, I would use a simple Rewrite statement in .htaccess to change the dynamic urls to static ... Google definitely doesn't like the "id=" part at all ...
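
As a rough illustration of what such a rewrite does, here is the mapping sketched in Python rather than in Apache's own rule syntax, so it can be run anywhere. The static scheme /family/counselor/<number> is a hypothetical choice; only the search_details.php?cid= target comes from the URLs posted above.

```python
import re

# Hypothetical static URL scheme: /family/counselor/<number>
STATIC_PATTERN = re.compile(r"^/family/counselor/(\d+)/?$")


def rewrite(path):
    """Map a static-looking path to the dynamic script path, or None if no match."""
    match = STATIC_PATTERN.match(path)
    if match is None:
        return None
    # The captured number becomes the cid query parameter.
    return "/family/search_details.php?cid=" + match.group(1)


print(rewrite("/family/counselor/33"))
```

A rewrite rule in .htaccess works on the same idea: a regular expression matches the static-looking path, and the captured number is substituted into the real script URL. The visitor and the crawler only ever see the static form.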

tedster




msg:774259
 3:02 pm on May 1, 2006 (gmt 0)

As a clarification, Google says they don't like "&id" as a parameter; that is, an id parameter used in addition to another parameter, rather than as the only one.

patc




msg:774260
 3:37 pm on May 1, 2006 (gmt 0)

They definitely do crawl it, but it might end up in the supplemental index.

I know this for sure after finding out how a hacker managed to compromise my system (bad security flaw allowing SQL injection at the end of my id= statement, ugh, what an idiot I am!).

They searched on google for filetype:.asp and "id=4737" or some random number... just happened I had an article that matched it, and they went from there.
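
For anyone worried about the same hole, here is a minimal sketch of the usual defense: validate the id and pass it to the database as a bound parameter instead of pasting it into the SQL string. It uses Python with an in-memory SQLite database purely for illustration; the articles table, its columns, and the fetch_article helper are hypothetical, not my actual setup.

```python
import re
import sqlite3


def fetch_article(conn, raw_id):
    """Look up an article title by id, rejecting anything that is not a short plain integer."""
    # Accept only 1 to 9 ASCII digits; anything else is treated as not found.
    if re.fullmatch(r"[0-9]{1,9}", raw_id) is None:
        return None
    # The ? placeholder sends the value separately from the SQL text,
    # so it can never be interpreted as SQL.
    row = conn.execute(
        "SELECT title FROM articles WHERE id = ?", (int(raw_id),)
    ).fetchone()
    return row[0] if row else None


# Hypothetical table and row, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO articles (id, title) VALUES (?, ?)", (4737, "Example article"))

print(fetch_article(conn, "4737"))         # a normal request
print(fetch_article(conn, "4737 OR 1=1"))  # an injection attempt is rejected
```

The same pattern (validate, then bind) is available in most database libraries, including those commonly used from ASP and PHP.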

fmfguy




msg:774261
 1:52 pm on May 2, 2006 (gmt 0)

Does someone have an example of a site where this .htaccess mod_rewrite approach is being used? Does it slow load times significantly? I am going to do it regardless, but am looking to educate myself :-)

teaperson




msg:774262
 2:25 pm on May 2, 2006 (gmt 0)

It doesn't slow down response significantly.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved