Forum Moderators: open
I have noticed that these four dynamic pages have one thing in common: they have only two parameters, for example domain.com/result.asp?lang=uk&wt=hem
All of the other pages have more than two parameters. Do you know whether Google only indexes URLs with two parameters?
Thank you for your help.
Elena
It's understandable why they aren't too crazy about some of these links...
domain.com/result.asp?lang=uk&wt=hem
domain.com/result.asp?lang=fr&wt=hem
domain.com/result.asp?lang=de&wt=hem
domain.com/result.asp?lang=ir&wt=hem
domain.com/result.asp?lang=da&wt=hem ....
/added
Try the site search; there have been a few threads about query strings and Google.
In their Google Information for Webmasters [google.com] they say:
If you decide to use dynamic pages (i.e., the URL contains a '?' character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them small.
Much has been written here about the issue, look at this Google Site Search [google.com] for more than you really want to know :)
It helps to keep the parameters short and the number of them small.
Actually, your best bet is to get rid of them altogether. I've seen some say two and others say one. I know for a fact that there are no issues with a single parameter, provided you account for the other areas that can hamper spidering.
Parsed URLs outperform URLs with parameters, in most instances. There are other determining factors that come into play. If you are working with a dynamic site, parsing the URLs is at the top of the list of priorities.
In the last 12 months, I've converted a few sites from dynamic URLs to parsed URLs (appear static). It was a learning experience and one that I find priceless. In fact, I'm still learning this very minute as more and more of those pages are getting indexed and sticking.
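For illustration, here is one minimal way such a dynamic-to-"parsed" rewrite mapping might look, sketched in Python (the parameter names and target layout are taken from the example URLs in this thread, not from any actual implementation discussed here):

```python
from urllib.parse import urlsplit, parse_qsl

def to_static_path(url, param_order=("lang", "wt")):
    """Rewrite a query-string URL into a path-style ("parsed") URL.

    E.g. /result.asp?lang=uk&wt=hem -> /result/uk/hem
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    stem = parts.path.rsplit(".", 1)[0]  # drop the .asp extension
    # Append parameter values as path segments, in a fixed order
    segments = [params[name] for name in param_order if name in params]
    return "/".join([stem] + segments)

print(to_static_path("/result.asp?lang=uk&wt=hem"))  # -> /result/uk/hem
```

On a live server, the reverse mapping (path-style URL back to the real query string) is typically handled by the web server's rewrite layer, e.g. Apache's mod_rewrite, so spiders only ever see the static-looking form.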
I've had a couple of pages that Google wasn't crawling, and I couldn't figure out why. It turns out they have 3 URL variables.
A follow up on this. I changed my code to remove the third variable from the URLs. I posted the changes yesterday morning and last night Googlebot crawled ~250 pages that it had never, in over 2 years, crawled before -- 2 url variables vs. 3 being the only significant difference. So it seems from my experience that a max of 2 URL variables is a pretty hard rule. But unless Google tells us, we are all just making educated guesses.
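For anyone wanting to audit their own URLs against this apparent limit, a quick sketch (Python here just for illustration; the 2-variable cutoff is this poster's observation, not anything Google has confirmed):

```python
from urllib.parse import urlsplit, parse_qsl

def count_params(url):
    """Count the number of query-string variables in a URL."""
    return len(parse_qsl(urlsplit(url).query))

urls = [
    "/result.asp?lang=uk&wt=hem",
    "/result.asp?lang=uk&wt=hem&page=2",
]
for url in urls:
    n = count_params(url)
    flag = "OK" if n <= 2 else "over the apparent 2-variable limit"
    print(f"{n} params  {flag}  {url}")
```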
Again, with something like this, I really wish Google would provide specific info (I still have questions about the length of a URL). Google goes on and on about the search results being the main thing, yet these pages were not available in their search results because of what seems to be an arbitrary decision to cut off URL variables at 2.
BTW, I work for a non-profit medical college. My employer nor I have any financial interest in higher rankings. We don't charge for access to our sites. The sites I work on are providing updates to medical professionals on research and treatment for a variety of health issues. We also provide formal continuing medical education (CME). Having these pages available on Google would have no doubt helped a number of doctors (however small) find information that would have been useful in treating their patients. It's a little frustrating to find out after months of trying different things that the reason these pages weren't available to Google's users was something like this. I don't really have a problem with Google ignoring URLs with more than 2 variables, or any similar restrictions they may have, I just wish they would let us know. This has nothing to do with gaming the system, it has everything to do with making relevant pages available to their users. Why keep it secret?
I just wish they would let us know. This has nothing to do with gaming the system, it has everything to do with making relevant pages available to their users. Why keep it secret?
It's not really been a secret. Those who are familiar with the search engine marketing industry know that urls with query strings have been a problem for SE's for many years.
Those same people also know when Google, Fast and others started indexing dynamic content; it's been a while now.
Google and others clearly state in their Webmaster submission guidelines that they can index certain dynamic content. They also offer some tips on how to make that content more indexable.
Our industry is changing so rapidly that if you are away for more than 30 days, you've got some major catching up to do. If you've been away for 6 months, take a one week vacation and spend about 12 hours a day here reading.
The best way to deal with the issues of indexing URLs with query strings is to eliminate all of the query strings. Not just one or two, but all of them. We are fortunate that Google goes as far as it does with query strings; other SEs are not so forgiving.
It's not really been a secret. Those who are familiar with the search engine marketing industry know that urls with query strings have been a problem for SE's for many years.
If it's not been a secret, can you point me to the thread on Webmaster World (besides this one) that discusses 2 URL parameters being the max Google will crawl? It may exist, but I can't find it. It seems that if this specific issue were such common knowledge, it would have been discussed more. I found discussions that talk generally about keeping the number of URL parameters down, but no specifics.
Those same people also know when Google, Fast and others started indexing dynamic content; it's been a while now. Google and others clearly state in their Webmaster submission guidelines that they can index certain dynamic content. They also offer some tips on how to make that content more indexable.
Clearly state? Where does Google clearly state anything about how a URL should be constructed? The only thing I can find states "If you decide to use dynamic pages (i.e., the URL contains a '?' character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them small." That's pretty vague to me. It doesn't even mention what Google's spider does but uses a general "not every search engine spider".
Our industry is changing so rapidly that if you are away for more than 30 days, you've got some major catching up to do. If you've been away for 6 months, take a one-week vacation and spend about 12 hours a day here reading.
Thanks for implying that ignorance is my problem, but I don't think that's the case (at least in this situation). With something as fundamental as this, I could either "take a one-week vacation and spend about 12 hours a day here reading" and hope to find the info I need, or Google could clearly state that they don't crawl URLs with 3 or more parameters (if that's truly the case). Frankly, I'd prefer the latter, wouldn't you? And why shouldn't they clearly state issues like this? I'm not talking about whether a page gets a higher ranking or not; those things Google should keep secret. I'm talking about whether a page gets listed at all!
The best way to deal with the issues of indexing URLs with query strings is to eliminate all of the query strings. Not just one or two, but all of them. We are fortunate that Google goes as far as it does with query strings; other SEs are not so forgiving.
Why would you do that? Has Google clearly stated something I'm not aware of? It seems they crawl dynamic URLs just fine if you keep the parameters to 2 or less.
Of course I don't even know this to be true - there could be other issues affecting why my sites are only allowed 2 parameters. Even you don't know. You may believe it to be true, but that belief is based on your interpretation of your experience, not true knowledge. Only Google knows for sure. And it seems, for whatever reason, they ain't tellin'.
The general consensus is that Google does not have problems with query strings. In fact, there was a topic not long ago that stated the exact opposite of what I posted above. Basically that topic stated that you need not worry about query strings and Google.
Thing is, I've tested with both. I didn't have the chance to go through the process of removing one query string from three or four to see the results. Based on your findings, it looks like reducing it to two query-string variables solved your indexability issues. I'm not too certain that would apply to everyone; I think it depends more on the structure of the query string. Since this is not my area of expertise, I cannot give any definitive answers.
If you have the ability to remove all query strings, why not do so? If you find that removing one out of three does the trick, that's great. But there are other SEs out there that are not as technologically advanced as Google, and I'm more concerned with indexing across the board. G is of course the predominant SE. There is another factor here when it comes to dynamic content: it appears that PageRank is an important factor in how much of your content gets indexed.
On a side note. From Google...
Reasons your site may not be included.
Your pages are dynamically generated. We are able to index dynamically generated pages. However, because our web crawler can easily overwhelm and crash sites serving dynamic content, we limit the amount of dynamic pages we index.
I wonder if that last line should read...
We limit the number of queries in the string that we index.
P.S. Again, my apologies for the dryness of my above reply. ;)
We are building a new site that uses dynamic URLs, and it is important that we understand this. These posts offer some specifics about two versus three parameters. Some posts give warnings not to use 'em at all. Other threads suggest that Google will accept dynamic URLs only if you have a high enough PR. I agree with those above that the available guidance is conflicting. The official source does not give any clear information.
We are not trying to cheat. Why does this have to be a gamble? Google - what is your advice to sites contemplating dynamic urls?
If you have the ability to remove all query strings, why not do so?
It's something we are considering. If I were starting a site from scratch, I probably would.
If you find that removing one out of three does the trick, that's great. But, there are other SE's out there that are not as technologically advanced as Google.
I've spent a lot of time optimizing for other SEs, with good results on rankings but not much effect on traffic from those SEs. The Gaggle (G, Y!, AOL, Netscape) accounts for >85% of our SE traffic, MSN is ~3%, and the rest is divided up among the dozens of other SEs. The best ROI I have on my time is by focusing on Google.
eliminate all of the query strings
Why would you want to do that? Because, per RFC 2396 (http://www.faqs.org/rfcs/rfc2396.html), the query component is a string of information to be interpreted by the resource the URI identifies. It is not meant to help identify the resource; identification is done entirely without regard to the query string.
Since SEs want to index resources, i.e. documents on the web, not information to be interpreted by those documents, it is sensible for them not to spider and index all those resources that are not sufficiently identified by the main components of their URIs.
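This separation of components can be seen directly with Python's standard library, which splits a URI along the lines RFC 2396 defines (a small illustration using an example URL from earlier in the thread):

```python
from urllib.parse import urlsplit

# Per RFC 2396, the query is a separate component from the path;
# the path identifies the resource, the query is data for it to interpret.
parts = urlsplit("http://domain.com/result.asp?lang=uk&wt=hem")
print(parts.path)   # /result.asp     <- identifies the resource
print(parts.query)  # lang=uk&wt=hem  <- information for the resource
```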
Now you might want to argue that the one PHP script that creates the actual pages served to clients is the resource identified by the URI. This is one way to look at the problem. Consequently you can expect SEs to index that resource only. That's not what you want.
Andreas