Dynamic URLs and Snippets

Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Dynamic URLs and Snippets

nfinland

1:54 pm on Nov 15, 2005 (gmt 0)

I have a blog that still use dynamic pages (index.php?itemid=64). Google did index them in the last update, but they are shown in the index without snippets (just the URL). I guess this means they aren´t fully indexed?

And do you have any experience if G needs several crawles before it indexes dynamic pages fully?

What I´m asking is:

should I wait or try to change the dynamic pages to static (item-64.html -type URLs) or wait for another crawl and update?

webdoctor

2:10 pm on Nov 16, 2005 (gmt 0)

Remember that only the webmaster knows if a page is dynamic - from the point of view of the user / the browser, they make a request and get content.

Googlebot and friends may make an educated guess whether a page is being dynamically generated; URIs which have "?id=XYZ" in them are a bit of a giveaway, but there are lots of ways to disguise this, mod_rewrite being one of them.

If your pages are shown in the index, IMHO it means Google indends to list them but hasn't got round to fetching data for the snippets; in Google's datacentres the snippet info comes from a different index to that for the search results themselves.

My $0.02? Wait a couple of weeks and see what happens before you redesign your site ;-)

Erku

2:18 pm on Nov 16, 2005 (gmt 0)

Would you say that static pages are still more preferable than dynamic pages?

webdoctor

3:23 pm on Nov 16, 2005 (gmt 0)

Would you say that static pages are still more preferable than dynamic pages?

...err, maybe I didn't make myself clear.

You can't really tell whether a page is static or dynamic.

I have servers which serve pages that appear to be static - www.example.com/keyword.html - but actually it's mod_rewrite in the background, and there's a content management system too, so after mod_rewrite's done its magic, the server gets a request for www.example.com/cms.php?search=keyword and the CMS kicks in and serves that page from a MySQL database.

As a user, or a Googlebot, you cannot tell that this is going on

If you're designing from the ground up, then doing things this way is probably a good idea, because it's always best to abstract the technologies in use from the end users. In fact, the W3C recommend doing without extensions at all - they suggest using URIs such as www.example.com/keyword

No-one can really tell whether Google prefers pages that appear to be static or pages that appear to be dynamic, but if I had the choice I'd rather serve my users content that is as easy to use as possible - and I'd say

www.example.com/keyword.html

beats

www.example.com/fetchpage.aspx?mode=search&num=100&query=doitrealquick&valueofpi=4
any day.

Which would you click on?

nfinland

3:45 pm on Nov 16, 2005 (gmt 0)

Thanks for the info.

P.S.

Google and other SE´s might have problems when indexing dynamic pages and they prefer static ones:

Google Guidelines:

If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few.

Serving static pages that actually are dynamic is another thing. I can do this with my blogging software, but I have noticed that Google somehow picks up both the dynamic and the static (via .htacces) page. This could trigger some duplicate content filter - maybe?

Erku

6:18 pm on Nov 16, 2005 (gmt 0)

I mean, it took me about 48 hours for the first update.

Now it shows yesterday's results, but nothing for today. So how often does it update?

Do we have live stats?

Thank you.

theBear

7:42 pm on Nov 16, 2005 (gmt 0)

Ok folks here is a way to tell if the page is dynamic.

If there is a query string that url can be considered dynamic.

Does this get all dynamic pages, no but it gets a lot of them.

Then it could look at the file extension.

.cgi .pl .php .jsp, .asp ... all reek of being dynamic correct, ok they go on the pile with the query string pages as being dynamic.

Does this get them all nope but it sure gets one huge pile of them.

There are other methods that involve probing but the above is enough to start the ball rolling.

Erku,

Update frequency appears to be dependent on page PR (those pesky IBL).

Google may get some pages once a day and some once a month.

webdude

8:04 pm on Nov 16, 2005 (gmt 0)

I am still not sure what all the hallabaloo is about not wanting to use or trying to hide dynamic pages. All of my sites are dynamic. The only problem I have ever had with G, Y, MSN, etc etc... is when I try to pass unique userreference information for tracking users. I have many sites that run either forums or shopping baskets that are 100% dynamic. Several of these sites rank #1 for very competative terms. So what is the big deal? Microsoft's site is 100% dynamic, yet I don't see them having a problem ranking. Yahoo also is mostly dynamic content. In fact, most of the major search engines out there are dynamic and of these, they rank very well in G. Some even better then G itself. And most of them do not even try to hide the fact that they are dynamic. Type in software in G and the first 5 results are sites that are dynamic including the first link which is --- you guessed it... Google's news search which is dynamic...

[news.google.com...]

Google Images - dynamic
Google Groups - dynamic
Google News - dynamic
Froogle - Dynamic
etc etc adnaseum

Maybe it isn't so much the fact that the pages are dynamic, but rather what the quality and uniqueness of the content is.

Something to think about anyway.

theBear

8:32 pm on Nov 16, 2005 (gmt 0)

webdude it is a combination of factors the most important one of which is how fast your site gets botted.

But Google is able to tell which is which to some degree.

The rate that the bots spider pages on your site has a large impact on how your site shows up.

Using the same script under a rewrite rule set will result in a different fully indexed rate which is faster than the dynamic version.

This has been repeatedly mentioned on this site, but I'll adde yet another example.

I made a pageset static and the static stuff was fully indexed in less than a week where as the dynamic pages took over 5 weeks to get less than 50% of them indexed.

There was less than a 50% similarity between any two pages in the set just based upon word occurances let alone relative word placement. So duplicate content should be a non issue.

In addition the pages when generated via the scripts all eventually became fully indexed. It just took what seemed forever,

webdude

8:41 pm on Nov 16, 2005 (gmt 0)

I have never had that problem. The content, title, meta description, menus, format... almost all aspects of my sites are dynamic. On one of my forums, it takes G less then 3 days to have the new pages up. The site gets crawled every other day. I think that the crux of the problem is people who use dynamic data to load into template pages that have the same title, meta description, menu, format, etc. I also think that these type pages will downgrade because of dupe content. When laying out dynamic pages, you need to also pay attention to this stuff. G does not like 10,000 pages with the same title and description, but using a database, the bot will take, and eagerly eat, pages that have different content, titles and descriptions. It is all on how you use your code.

theBear

8:56 pm on Nov 16, 2005 (gmt 0)

If you indeed pump out the same title etc you'll find you have problems, however none of the pages generated by the script I was talking about had the same title etc and in fact the real content portion was just plain not even in the ball park of being the same.

Every site has a different rate at which it gets loving attention by the bots.

So the only real test is to do it both ways on the same site.

It is all in the IBLs.

webdude

9:08 pm on Nov 16, 2005 (gmt 0)

theBear,

you posted...

Using the same script under a rewrite rule set will result in a different fully indexed rate which is faster than the dynamic version.

Are you saying that the bots crawl a rewrite rule set faster?

Or are you saying that the page gets indexed faster under a rewrite rule set?

nfinland

9:15 pm on Nov 16, 2005 (gmt 0)

Hmmm...

I think you misunderstood?

I said I have a blog that by default generates dynamic pages like this:
index.php?itemid=64

When I make them search engine friendly they look like this:
item-64.html

Now I have two pages that are identical and if Google indexes both pages there might (?) be a duplicate content penalty.

theBear

9:22 pm on Nov 16, 2005 (gmt 0)

nfinland, that is a certainty the only question is what will google do in that case.

webdude, I am saying that other things being the same Google seems to like the "static" version for spidering and indexing better than the one with the query string in the url.

Others on the forum have commented on this many times.

Untill I just did a page set myself I was willing to allow them that it was possible. I've seen enough now to say they are probably correct. However maybe Google was having a good bot day on the site.

It should make zero difference, however could have, should have, did have, and will have can be and frequently are different.

webdude

9:24 pm on Nov 16, 2005 (gmt 0)

Okay, I'll buy that :-)