and other content looks like this:
http://example.com/family/articledetails.php?id=4
Will Google even crawl all of this?
I was hoping to gain an edge by having thousands of counselor profiles and other content, but if the site is designed this way, will Google even crawl it efficiently?
What's the best way to get around this issue? mod_rewrite? Please advise. Thanks!
[edited by: tedster at 8:56 pm (utc) on April 27, 2006]
A couple of things. From Google's Technical Guidelines for Webmasters [google.com]:
Allow search bots to crawl your sites without session IDs or
arguments that track their path through the site. These techniques
are useful for tracking individual user behavior, but the access
pattern of bots is entirely different. Using these techniques may
result in incomplete indexing of your site, as bots may not be
able to eliminate URLs that look different but actually point to
the same page.
Essentially, by including the SESSID in the URL, you create a situation where a potentially infinite number of URLs can exist for a single piece of content. As the bot starts trying these (if it even does), the pages actually active in the index will start to erode. Track this kind of data with a cookie if you must, but best practice is to keep it out of the URL.
Also, if the parameter "id" is only being used to point to a certain counselor, I would advise changing it to almost anything that doesn't have "id" in it. That said, that kind of URL may be OK for you in this situation. The big problems come with "&id", where id is the second parameter and looks like a session ID, with all the troubles I mentioned above or even worse: no crawling at all.
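To make the duplicate-URL point concrete, here is a rough Python sketch. The URLs with SESSID values and the list of parameter names are made up for illustration, not taken from the site in question. It only shows how many session-tagged URLs collapse to one address once the session parameter is dropped, which is the de-duplication a bot may not be able to do on its own.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session-style parameter names; adjust for the real site.
SESSION_PARAMS = {"sessid", "phpsessid", "sid"}


def strip_session_params(url):
    """Return the URL with any session-style query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Three addresses a crawler might see for the same article (made-up session values).
urls = [
    "http://example.com/family/articledetails.php?id=4&SESSID=abc123",
    "http://example.com/family/articledetails.php?id=4&SESSID=def456",
    "http://example.com/family/articledetails.php?id=4",
]

# All of them reduce to a single session-free URL.
canonical = {strip_session_params(u) for u in urls}
print(canonical)
```

If the session value lives in a cookie instead, the site only ever emits the session-free form, so there is nothing for the bot to untangle.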
There are many indexed pages in Google with a single id parameter.
What can then be done about the problems you have outlined without redoing the way the entire site is designed? Would mod_rewrite solve the problem? I want Google to be able to crawl every ounce of content.
I know this for sure after finding out how a hacker managed to compromise my system (a bad security flaw allowing SQL injection at the end of my id= statement. Ugh, what an idiot I am!).
They searched on Google for filetype:.asp and "id=4737" or some random number. It just so happened I had an article that matched it, and they went from there.
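For anyone wondering how to close that kind of hole, here is a minimal sketch in Python with sqlite3. The site above was ASP, so this is not the original code; the table name, column names, and the fetch_article helper are all hypothetical. The idea is to validate the id= value as an integer and pass it to the database as a bound parameter, rather than pasting it onto the end of the SQL string.

```python
import sqlite3

# In-memory database with a made-up articles table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO articles (id, title) VALUES (4737, 'Sample article')")


def fetch_article(conn, raw_id):
    """Look up an article by the id= value taken from the query string."""
    try:
        # Reject anything that is not a plain integer.
        article_id = int(raw_id)
    except (TypeError, ValueError):
        return None
    # The ? placeholder keeps the value out of the SQL text itself.
    return conn.execute(
        "SELECT id, title FROM articles WHERE id = ?", (article_id,)
    ).fetchone()


print(fetch_article(conn, "4737"))
print(fetch_article(conn, "4737; DROP TABLE articles"))
```

The second call returns None because the tampered value fails the integer check before it ever reaches the database.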