Forum Moderators: open
This way we are sure that each page earns a pagerank and can pass it on.
However, it is very cumbersome and adding properties takes a long time. We are developing a database which would make the whole thing easier, but we are concerned that if we replace the static content with database pages:
1) they may not all get crawled because Google will not try every possible query. Perhaps a solution to this is to create an HTML link to each database page we want crawled?
2) the database pages may be seen as transitory and not earn real pagerank (this takes 2 months to propagate properly). Can these pages earn real pagerank? Can they pass it on?
Thanks for your thoughts.
Next, make sure all your old URLs still work, if necessary by setting up redirects for each and every one of your old files. The best solution would probably be to maintain your old namespace while moving from static to dynamic content.
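To make the redirect idea concrete, here is a minimal sketch in Python of a 301 redirect map from old static files to their new dynamic equivalents. The paths are hypothetical, and in practice this logic would live in your server config (e.g. a rewrite rule) rather than application code:

```python
# Hypothetical mapping of old static URLs to their new dynamic equivalents.
REDIRECTS = {
    "/properties/ny.html": "/Page.asp?Page=1",
    "/properties/ca.html": "/Page.asp?Page=2",
}

def redirect_for(old_path):
    """Return (status, new_path) for an old URL, or (404, None) if unmapped."""
    new_path = REDIRECTS.get(old_path)
    if new_path is None:
        return (404, None)
    # 301 (Moved Permanently) tells crawlers to transfer the old URL's
    # standing to the new one, rather than treating it as a new page.
    return (301, new_path)
```

The key point is the permanent (301) status: a temporary redirect would not tell the bot that the new URL has replaced the old one.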
René.
That's not the case anymore - just keep the querystrings short and make it so that there's no "superfluous" data in the strings.
To make a map, just make a master/detail list and the bot will crawl the master list - you should have one of those anyway.
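As a sketch of that master/detail idea: generate one plain HTML list page that links to every database record, so the crawler can reach each detail page through an ordinary link. The record data and URL pattern here are made up for illustration:

```python
# Illustrative records; in reality these would come from the database.
records = [
    {"id": 1, "name": "Westchester Colonial"},
    {"id": 2, "name": "Albany Ranch"},
]

def master_list_html(records):
    """Build a static HTML list linking to every detail page."""
    links = "\n".join(
        f'<li><a href="/Page.asp?Page={r["id"]}">{r["name"]}</a></li>'
        for r in records
    )
    return f"<ul>\n{links}\n</ul>"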
If you aren't much of a database developer, this project still sounds easy - 1 table with a dozen or so fields in it for all of the site information. You could probably get someone to build it for you for under $300 (translated: 30 hours of redundant data entry).
G.
/Page.asp?Page=1
These typically show one backlink from a PR5 and carry PR4 themselves. Some of these are linked from PR4s and show no backlinks, yet are still in the index. I think you get into trouble where your pages look like this:
/Page.asp?State=NY&County=Westchester&Property=25
Gbot seems to have more trouble with multiple variables or avoids them altogether. IMHO, that is precisely where you want to make the URLs look like this:
/Page/NY/Westchester/25
or whatever.
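A real site would do this mapping with a server rewrite rule (e.g. Apache's mod_rewrite); the following Python sketch just shows the translation a rule like that performs, turning the "directory-style" URL back into the query variables the script expects. The path segments and variable names mirror the hypothetical example above:

```python
def parse_pretty_url(path):
    """Map '/Page/NY/Westchester/25' back to the script's query variables.

    Returns None for paths that don't fit the expected shape, so the
    server can fall through to normal handling.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "Page":
        return None
    return {"State": parts[1], "County": parts[2], "Property": parts[3]}
```

To the bot, `/Page/NY/Westchester/25` looks like an ordinary static subdirectory path, while the script still receives the same three variables it would have gotten from the querystring.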
Another site I'm involved with is database-driven and has absolutely no trouble getting pages spidered. That site uses 3 or 4 variables per page, but we rewrite the URLs so that the variables look like subdirectories. So I'd have to conclude that databases whose URLs look "normal" are the way to go.
Sticky me if you want a list of DB to static HTML/Web programs.
I think the lingering problems many folks are having getting dynamic pages indexed come down to the (resulting) page source being just too damned complicated. On all the sites where I've heard complaints, it is inevitably a page with so much muck that the bot gets confused.
Things like:
Too many languages/scripts: Many sites dump a date (or another variable) from the database via ASP (or PHP, or whatever), then parse it into a readable format client-side via Javascript or some other snippet they picked up on a web site. Why? You're using ASP (or PHP, or whatever) - keep it all the same. You're confusing the robot by saying "Do you gusta play al futball?" rather than just saying "Do you like to play soccer?"
Not making a dynamic title to go along with the dynamic content. Googlebot hits a few pages and sees lots of words on the page, but the titles on all of them are "My Site: The Best Site on the Web!". To the bot, they are therefore all about the same thing despite the content, so there's no need to index more than one page.
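The fix for the title problem is trivial once you're generating pages server-side anyway: build the `<title>` from the record being displayed. A minimal sketch, with field names that are purely illustrative:

```python
def page_title(record, site_name="My Site"):
    """Build a page-specific <title> from the displayed record,
    instead of one sitewide title repeated on every page."""
    return f'{record["name"]} in {record["county"]}, {record["state"]} - {site_name}'
```

Every page now announces its own subject to the bot, instead of all pages claiming to be about the same thing.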
In the end: If you're going to make a data driven site - make a data driven site and process everything on the server side and spit it out as HTML. The only javascript should be in forms for setting focus and validation (stuff the bots don't even look at anyway).
And, as mentioned - there can't be stuff in the URL that doesn't affect the content of the page. In other words, don't use session IDs, click tracking codes, or other things like that.
Everyone seems to be making "work" out of nothing. Do it clean and uniformly in the first place and you won't be worried about "tricks" to get the bot in later.
G.