Google not listing dynamic pages with long strings?


evera

9:14 pm on Dec 22, 2002 (gmt 0)



Most of the content of our site is generated dynamically with queries. Google has included only four of these dynamic pages in its database.

I have noticed that these four dynamic pages have one thing in common: they have only two parameters, for example, domain.com/result.asp?lang=uk&wt=hem

And all of the others have more than two parameters. Do you know if Google only indexes queries with two parameters?

Thank you for your help.

Elena

Krapulator

9:28 pm on Dec 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Evera,

Welcome!

That seems to match my experience. If a Google listing is important to you, you may want to look at ways of keeping the query strings short and passing the variables another way.
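"Passing the variables another way" usually means moving the values into the URL path itself. Here is a minimal sketch of that mapping, using the thread's example URL and Python's standard library; the helper name and the segment order are assumptions made for illustration, not anything from the thread:

```python
from urllib.parse import urlsplit, parse_qsl

def to_path_style(url, order=("lang", "wt")):
    """Rewrite result.asp?lang=uk&wt=hem as /result/uk/hem.
    'order' pins the segment order; the keys come from the thread's example."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    stem = parts.path.rsplit(".", 1)[0]  # "/result.asp" -> "/result"
    return stem + "/" + "/".join(params[k] for k in order if k in params)

print(to_path_style("http://domain.com/result.asp?lang=uk&wt=hem"))
# -> /result/uk/hem
```

The server then needs a rewrite rule or handler to map the path form back to the real script, but from a spider's point of view the query string is gone.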

brotherhood of LAN

9:32 pm on Dec 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Two sounds OK. I don't know if I can get anyone to confirm this, but it could be dependent on PageRank; if a URL had three queries and a supposed PR of 10, I think G would just get on with it.

It's understandable why they aren't too crazy about some links...

domain.com/result.asp?lang=uk&wt=hem
domain.com/result.asp?lang=fr&wt=hem
domain.com/result.asp?lang=de&wt=hem
domain.com/result.asp?lang=ir&wt=hem
domain.com/result.asp?lang=da&wt=hem ....

/added
Try the site search; there have been a few threads about queries and Google.

defanjos

9:37 pm on Dec 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Evera,

I have noticed the same thing on my sites as well: two or fewer parameters --> OK, more than two --> not OK.

I've had to do many redesigns because of it.

Watty

1:41 pm on Jan 28, 2003 (gmt 0)



Yep, I agree: only two params for Google.

atadams

10:50 pm on Jan 28, 2003 (gmt 0)

10+ Year Member



If this is true then Google should let us webmasters know. They need to be forthcoming with info like this. This has nothing to do with gaming the system.

I've had a couple of pages that I couldn't figure out why Google wasn't crawling. Turns out they have 3 URL variables.

Mohamed_E

11:52 pm on Jan 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> If this is true then Google should let us webmasters know.

In their Google Information for Webmasters [google.com] they say:

If you decide to use dynamic pages (i.e., the URL contains a '?' character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them small.

Much has been written here about the issue; look at this Google Site Search [google.com] for more than you really want to know. :)

EasyCall

12:15 am on Jan 29, 2003 (gmt 0)

10+ Year Member



This brings up another issue with dynamic pages that I encountered today. I requested a link trade with a site, and when they posted my link, it is actually listed as [theirsite.com...], which generates my site. But will Google count this as a backlink? If not, I'm not interested in passing along PageRank with nothing coming back in return.

atadams

12:26 am on Jan 29, 2003 (gmt 0)

10+ Year Member



Sorry Mohamed_E, I'm looking for something more specific than "It helps to keep the parameters short and the number of them small." Something like "we don't crawl URLs with more than 2 variables" from Google would be nice. With things like this, a little more specificity from the source is needed, IMO. Why should we have to guess?

pageoneresults

1:42 am on Jan 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It helps to keep the parameters short and the number of them small.

Actually, your best bet is to get rid of them altogether. I've seen some say two, and others say one. I know for a fact that there are no issues with one parameter, taking into account other areas that may hamper spidering.

Parsed URLs outperform URLs with parameters, in most instances. There are other determining factors that come into play. If you are working with a dynamic site, parsing the URLs is at the top of the list of priorities.

In the last 12 months, I've converted a few sites from dynamic URLs to parsed URLs (appear static). It was a learning experience and one that I find priceless. In fact, I'm still learning this very minute as more and more of those pages are getting indexed and sticking.
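For reference, the usual mechanism behind "parsed" URLs on an Apache host is mod_rewrite; the fragment below is only a sketch of the idea, assuming the parameter names from the thread's example URL (an ASP/IIS site of that era would use an ISAPI rewrite filter to the same effect):

```apache
# .htaccess sketch: internally serve /result/uk/hem from the real
# dynamic script, so spiders only ever see the static-looking path.
RewriteEngine On
RewriteRule ^result/([a-z]{2})/([a-z]+)/?$ /result.asp?lang=$1&wt=$2 [L]
```

The browser and the spider both request the clean path; only the server knows a query string is involved.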

atadams

4:29 pm on Jan 31, 2003 (gmt 0)

10+ Year Member



I've had a couple of pages that I couldn't figure out why Google wasn't crawling. Turns out they have 3 URL variables.

A follow-up on this. I changed my code to remove the third variable from the URLs. I posted the changes yesterday morning, and last night Googlebot crawled ~250 pages that it had never, in over two years, crawled before -- two URL variables vs. three being the only significant difference. So it seems from my experience that a max of two URL variables is a pretty hard rule. But unless Google tells us, we are all just making educated guesses.
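The "count the variables" check applied here by hand is easy to script. This is a hypothetical audit helper built on Python's standard library, with the two-parameter cutoff hard-coded as the thread's unconfirmed guess:

```python
from urllib.parse import urlsplit, parse_qsl

SUSPECTED_MAX = 2  # the limit this thread converged on; never confirmed by Google

def crawlable(url, limit=SUSPECTED_MAX):
    """Flag URLs whose query string carries more variables than the suspected cap."""
    return len(parse_qsl(urlsplit(url).query)) <= limit

print(crawlable("http://domain.com/result.asp?lang=uk&wt=hem"))         # True
print(crawlable("http://domain.com/result.asp?lang=uk&wt=hem&page=2"))  # False
```

Run over a site's link list, this would surface every URL past the suspected limit before waiting a month on the next crawl.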

Again, with something like this, I really wish Google would provide specific info (I still have questions about the length of a URL). Google goes on and on about the search results being the main thing, yet these pages were not available in their search results because of what seems to be an arbitrary decision to cut off URL variables at 2.

BTW, I work for a non-profit medical college. Neither my employer nor I has any financial interest in higher rankings. We don't charge for access to our sites. The sites I work on are providing updates to medical professionals on research and treatment for a variety of health issues. We also provide formal continuing medical education (CME). Having these pages available on Google would have no doubt helped a number of doctors (however small) find information that would have been useful in treating their patients. It's a little frustrating to find out after months of trying different things that the reason these pages weren't available to Google's users was something like this. I don't really have a problem with Google ignoring URLs with more than 2 variables, or any similar restrictions they may have, I just wish they would let us know. This has nothing to do with gaming the system, it has everything to do with making relevant pages available to their users. Why keep it secret?

pageoneresults

5:15 pm on Jan 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I just wish they would let us know. This has nothing to do with gaming the system, it has everything to do with making relevant pages available to their users. Why keep it secret?

It's not really been a secret. Those who are familiar with the search engine marketing industry know that URLs with query strings have been a problem for SEs for many years.

Those same people also know when Google, FAST, and others started indexing dynamic content; it's been a while now.

Google and others clearly state in their webmaster submission guidelines that they can index certain dynamic content. They also offer some tips on how to make that content more indexable.

Our industry is changing so rapidly that if you are away for more than 30 days, you've got some major catching up to do. If you've been away for 6 months, take a one week vacation and spend about 12 hours a day here reading.

The best way to deal with the issues of indexing URLs with query strings is to eliminate all of the query strings. Not just one or two, but all of them. We are fortunate that Google goes as far as it does with query strings; other SEs are not so forgiving.

atadams

6:32 pm on Jan 31, 2003 (gmt 0)

10+ Year Member



It's not really been a secret. Those who are familiar with the search engine marketing industry know that URLs with query strings have been a problem for SEs for many years.

If it's not been a secret, can you point me to the thread on WebmasterWorld (besides this one) that discusses two URL parameters being the max Google will crawl? It may exist, but I can't find it. It seems if this specific issue were such common knowledge, it would have been discussed more. I found discussions that talk generally about keeping the number of URL parameters down, but no specifics.

Those same people also know when Google, FAST, and others started indexing dynamic content; it's been a while now.

Google and others clearly state in their Webmaster submission guidelines that they can index certain dynamic content. They also offer some tips on how to make that content more indexable.

Clearly state? Where does Google clearly state anything about how a URL should be constructed? The only thing I can find states "If you decide to use dynamic pages (i.e., the URL contains a '?' character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them small." That's pretty vague to me. It doesn't even mention what Google's spider does but uses a general "not every search engine spider".

Our industry is changing so rapidly that if you are away for more than 30 days, you've got some major catching up to do. If you've been away for 6 months, take a one-week vacation and spend about 12 hours a day here reading.

Thanks for implying that ignorance is my problem, but I don't think that's the case (at least in this situation). With something as fundamental as this, I could either "take a one week vacation and spend about 12 hours a day here reading" and hope to find the info I need, or Google could clearly state that they don't crawl URLs with 3 or more parameters (if that's truly the case). Frankly, I'd prefer the latter, wouldn't you? And why shouldn't they clearly state issues like this? I'm not talking about whether a page gets a higher ranking or not; those things Google should keep secret. I'm talking about whether a page gets listed at all!

The best way to deal with the issues of indexing urls with query strings, is to eliminate all of the query strings. Not just one or two, but all of them. We are fortunate that Google goes as far as it does with query strings, other SE's are not so forgiving.

Why would you do that? Has Google clearly stated something I'm not aware of? It seems they crawl dynamic URLs just fine if you keep the parameters to two or fewer.

Of course I don't even know this to be true - there could be other issues affecting why my sites are only allowed 2 parameters. Even you don't know. You may believe it to be true, but that belief is based on your interpretation of your experience, not true knowledge. Only Google knows for sure. And it seems, for whatever reason, they ain't tellin'.

pageoneresults

7:19 pm on Jan 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oops, sorry. I guess I deserved that after reading my reply again.

The general consensus is that Google does not have problems with query strings. In fact, there was a topic not long ago that stated the exact opposite of what I posted above. Basically that topic stated that you need not worry about query strings and Google.

Thing is, I've tested with both. I didn't have the chance to go through the process of removing one query string from three or four to see the results. Based on your findings, it looks like trimming down to two query strings solved your indexability issues. I'm not too certain that would apply to everyone; I think it depends more on the structure of the query string. Since this is not my area of expertise, I cannot give any definitive answers.

If you have the ability to remove all query strings, why not do so? If you find that removing one out of three does the trick, that's great. But there are other SEs out there that are not as technologically advanced as Google, and I'm more concerned with indexing across the board. G is of course the predominant SE. There is another factor here when it comes to dynamic content: it appears that PageRank is an important factor in how much of your content gets indexed.

On a side note, from Google...

Reasons your site may not be included.
Your pages are dynamically generated. We are able to index dynamically generated pages. However, because our web crawler can easily overwhelm and crash sites serving dynamic content, we limit the amount of dynamic pages we index.

I wonder if that last line should read...

We limit the number of queries in the string that we index.

P.S. Again, my apologies for the dryness of my above reply. ;)

Skier

7:33 pm on Jan 31, 2003 (gmt 0)

10+ Year Member



At last we are getting some more information on this. I have been searching this site and elsewhere for weeks trying to get a clear answer to the perils of using dynamic urls.

We are building a new site that uses dynamic URLs, and it is important that we understand. These posts offer some specifics about two versus three parameters. Some posts give warnings not to use 'em at all. Other threads suggest that Google will accept dynamic URLs only if you have a high enough PR. I agree with those above that the guidance available is conflicting; the official source does not give any clear information.

We are not trying to cheat. Why does this have to be a gamble? Google, what is your advice to sites contemplating dynamic URLs?

atadams

7:35 pm on Jan 31, 2003 (gmt 0)

10+ Year Member



Again, my apologies for the dryness of my above reply.

No prob. I probably need to take a few deep breaths anyway. The whole "tweak...wait a month...tweak" cycle is a little frustrating sometimes. :)

pageoneresults

7:44 pm on Jan 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The whole "tweak...wait a month...tweak" cycle is a little frustrating sometimes.

Wait until Freshbot latches on to that content. Then it will be...

"Tweak... wait 24-48 hours... tweak again." Oh, the sleepless nights and the countless hours of frustration (sometimes).

Thanks for your understanding.

atadams

8:02 pm on Jan 31, 2003 (gmt 0)

10+ Year Member



If you have the ability to remove all query strings, why not do so?

It's something we are considering. If I were starting a site from scratch, I probably would.

If you find that removing one out of three does the trick, that's great. But, there are other SE's out there that are not as technologically advanced as Google.

I've spent a lot of time optimizing for other SEs, with good results on rankings but not much effect on traffic from those SEs. The Gaggle (G, Y!, AOL, Netscape) accounts for >85% of our SE traffic, MSN is ~3%, and the rest is divided among the dozens of other SEs. The best ROI on my time comes from focusing on Google.

atadams

8:04 pm on Jan 31, 2003 (gmt 0)

10+ Year Member



Wait until Freshbot latches on to that content.

Actually, it was the Freshbot that crawled those pages last night. No telling when they might appear in the index.

andreasfriedrich

8:07 pm on Jan 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



eliminate all of the query strings

Why would you want to do that? Per RFC 2396 (http://www.faqs.org/rfcs/rfc2396.html), the query component is a string of information to be interpreted by the resource the URI identifies; it is not meant to help identify the resource. Identification is done entirely without regard to the query string.

Since SEs want to index resources, i.e., documents on the web, not information to be interpreted by those documents, it is sensible for them not to spider and index all those resources that are not sufficiently identified by the main components of their URIs.

Now you might want to argue that the one PHP script that creates the actual pages served to clients is the resource identified by the URI. That is one way to look at the problem; consequently, you could expect SEs to index that resource only. That's not what you want.

Andreas
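Andreas's distinction between the identifying components and the query shows up directly in Python's standard URL parser, which keeps the query separate from the scheme, host, and path; a small demonstration using the thread's example URL:

```python
from urllib.parse import urlsplit

# The query is carried alongside, not inside, the components
# that identify the resource (scheme, authority, path).
parts = urlsplit("http://domain.com/result.asp?lang=uk&wt=hem")
print(parts.scheme)  # http
print(parts.netloc)  # domain.com
print(parts.path)    # /result.asp
print(parts.query)   # lang=uk&wt=hem
```

From a spider's perspective, everything up to the `?` names a document; everything after it is input to that document.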

Susanne

11:52 am on Feb 1, 2003 (gmt 0)

10+ Year Member



Would you guys please, please go to this thread and give me some more ideas:
[webmasterworld.com...]
Marcia had a very good point, but I think there's something else wrong with our portal. Thanks.