Google News Archive Forum

Tips for Helping Google Index Dynamic Pages
dvduval
msg:213349
2:15 am on Nov 8, 2002 (gmt 0)

Four important concepts for making sure dynamic pages are indexed:

1. All pages have a unique and descriptive title.
2. The pages don't have an excessively long URL. Once you go above about 50 characters, the chance of the page being indexed declines rapidly; over 90, almost never.
3. The page is not buried deep in a directory structure such as domain.com/i-am-great/why/reasons/press-releases/he-is-great.php. Better: domain.com/he-is-great.php
4. There are several links to the page. In other words, you should have good site navigation.

Any other tips you would add?

 

Chris_R
msg:213350
2:24 am on Nov 8, 2002 (gmt 0)

This sort of goes along with what you were saying, but every &, ?, and = added in the URL seems to decrease the chances of it being indexed.

seindal
msg:213351
9:39 am on Nov 8, 2002 (gmt 0)

Wouldn't the best way be to camouflage the pages as static by putting the arguments in the file name? That way the SEs won't realise that the pages are dynamic.

It will probably also give URLs that are easier for the normal user to understand.

René.

xbase234
msg:213352
5:17 pm on Nov 8, 2002 (gmt 0)

Build a site map that allows a user to click through every page.

It worked well for my 20,000-page site.

edit_g
msg:213353
5:25 pm on Nov 8, 2002 (gmt 0)

Yup - I can confirm the sitemap works like a charm.

respree
msg:213354
5:55 pm on Nov 8, 2002 (gmt 0)

How the heck do you build a site map with 20,000 links? Is this only for the spider to see?

rogerd
msg:213355
6:02 pm on Nov 8, 2002 (gmt 0)

Welcome to WebmasterWorld, Respree... I was wondering about the 20K links, too... certainly a multi-page, hierarchical site map would be needed for that many links?

nipear
msg:213356
6:09 pm on Nov 8, 2002 (gmt 0)

If you have several variables in your page URL, like:
page.asp?var1=34&var2=x&var3=yes

then try making a site map with only one variable in the URL. In the May '02 update our three-variable pages were dropped from the index (which was most of our site), so I built a site map with one-variable URLs. It was a little difficult and some info on each page was missing, but it did get the pages back into the index.

Also make sure you have NO REDIRECTS in your site navigation. Many off-the-shelf programs will redirect to a different page if variables are missing from the URL. For example, if one category of your site has no sub-categories, it might redirect to the content page after looking for sub-categories; if there are sub-categories, the page is shown with the sub-categories listed.

Home -> category -> sub-category list -> content
Home -> category -> redirect -> content
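
[Editor's sketch] A minimal PHP sketch of the kind of one-variable site map nipear describes, using the period-typical mysql_* functions; the database, table and column names here are illustrative assumptions, not from the thread:

<?php
// sitemap.php - emit one plain link per item, a single variable per URL
$db = mysql_connect('localhost', 'user', 'pass');
mysql_select_db('shop', $db);

$result = mysql_query('SELECT id, name FROM products ORDER BY name');
while ($row = mysql_fetch_assoc($result)) {
    // page.asp?var1=34&var2=x&var3=yes becomes page.php?id=34
    echo '<a href="page.php?id=' . (int) $row['id'] . '">'
       . htmlspecialchars($row['name']) . "</a><br>\n";
}
?>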

edit_g
msg:213357
12:39 pm on Nov 9, 2002 (gmt 0)

With ColdFusion you can get this to auto-generate by including query results on a dynamically generated page without any ?'s or &'s. I don't know about .asp, .net or .php, but you can probably do something similar.

So your sitemap would look like this:

widgets/fuzzybluewidgets/links to each individual widget
/fuzzygreenwidgets/links to each individual widget
/nonfuzzybluewidgets/links to each individual widget
/nonfuzzygreenwidgets/links to each individual widget

All you would have to set up would be the widgets page and the sub-category page one down from that.
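
[Editor's sketch] For the PHP case edit_g leaves open, a rough sketch of the same auto-generation: one ordered query drives both the sub-category headings and the individual widget links (table and column names are assumptions):

<?php
$db = mysql_connect('localhost', 'user', 'pass');
mysql_select_db('shop', $db);

$result = mysql_query('SELECT subcat, id, name FROM widgets ORDER BY subcat, name');
$current = '';
while ($row = mysql_fetch_assoc($result)) {
    if ($row['subcat'] != $current) { // starting a new sub-category
        $current = $row['subcat'];
        echo '<h3>' . htmlspecialchars($current) . "</h3>\n";
    }
    // one link to each individual widget
    echo '<a href="widget.php?id=' . (int) $row['id'] . '">'
       . htmlspecialchars($row['name']) . "</a><br>\n";
}
?>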

GilbertZ
msg:213358
1:57 am on Nov 10, 2002 (gmt 0)

I'm very interested in this as well. We used to use Infopop, mainly for its HTML output for the search engines... We switched to vBulletin for many reasons, but with 10-20 times as many messages, little if any of the content is indexed. I tried a hack that vB'ers created to make archives available, involving 404 pages, but Google didn't bite at all.

I read that index.php?threadid=nnn should work for Google, but no love...

I just spent the morning creating a hack that would allow me to output all the content so that each post would show up as:

domain.com/1/index.php

where 1 = postid... but after all that, I got stopped at post 35,000 or so, because it looks like Linux won't allow more than 35,000 subdirectories.

The rewrite trick isn't working for me, which I suppose is just as well, as I read somewhere that mod_rewrite sends a certain header out and that Google can detect it...

This is an extremely annoying problem.

hiker_jjw
msg:213359
4:58 am on Nov 10, 2002 (gmt 0)

Also, look into Apache mod_rewrite if you are using an Apache server.

[httpd.apache.org...]

GilbertZ
msg:213360
6:04 am on Nov 10, 2002 (gmt 0)

Yeah, well, the line right at the top of the page sums it up:

"The great thing about mod_rewrite is it gives you all the configurability and flexibility of Sendmail. The downside to mod_rewrite is that it gives you all the configurability and flexibility of Sendmail."

I found that if the rewrite rules didn't stop the httpd daemon from loading, they still didn't do what I expected them to... I had similar experiences with Sendmail :)

WebRankInfo
msg:213361
9:21 am on Nov 10, 2002 (gmt 0)

domain.com/1/index.php

where 1=postid...but after all that, I got stopped at post 35,000 or so because it looks like Linux won't allow more than 35,000 subdirectories..


Have you tried making URLs like domain.com/1_index.php, to avoid creating thousands of directories?
I don't know if it would work better...
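
[Editor's sketch] A hedged .htaccess sketch of that suggestion, assuming an Apache server with mod_rewrite and that the underlying script can take the post id as a query variable (index.php?postid= here is illustrative):

Options +FollowSymLinks
RewriteEngine On
# serve the static-looking domain.com/1_index.php from the dynamic script,
# with no subdirectories created at all
RewriteRule ^([0-9]+)_index\.php$ index.php?postid=$1 [L]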

Grumpus
msg:213362
12:22 pm on Nov 10, 2002 (gmt 0)

GilbertZ - Stickymail me the URL of the PHP site in question. Usually I can spot something in two minutes on the front page of the site that's keeping Google from crawling it. It's usually just a silly coding error. Google right now LOVES ASP- and PHP-driven sites - if they are done right.

G.

GilbertZ
msg:213363
6:35 pm on Nov 10, 2002 (gmt 0)

Thanks G! That was a very nice offer... I sent you a sticky!

BTW, I just read an article interviewing Brett. Geeez! You wrote this forum code yourself? Awesome job, Brett!

You should resell it! I'm very happy with vBulletin and have no plans to switch, but this is certainly marketable code...

relgoog
msg:213364
11:19 pm on Nov 11, 2002 (gmt 0)

The key that I have found is to use clever coding...
Many of the sites I have seen make good use of PHP, .htaccess and $QUERY_STRING.

The key to this is having structured URLs, for example:

domain.com/12343223/keyword4/keyword3/keyword2/keyword.html

Of course, Google sees this as a structured site with a keyword-rich URL.

The script could do a $key = explode("/", $QUERY_STRING); (following me here, PHP guys?) - this would then mean that $key[1] == 12343223, which can be used to output your content accordingly.

This isn't the best explanation, but utilising Apache's 404 error handling (and forcing the header to "HTTP/1.0 200 OK") tricks Google into fetching the big URLs (without the query strings).

You can use this as a search function (PHP.net utilises this for something such as http://www.php.net/explode, although they then header-redirect).

This isn't the best explanation in the world, but I've just come back from the pub :-p
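
[Editor's sketch] A minimal PHP sketch of the 404 trick relgoog describes, assuming .htaccess contains "ErrorDocument 404 /handler.php"; handler.php and show_post() are illustrative names, not from the thread:

<?php
// handler.php - Apache runs this for requests that match no real file.
// The originally requested URL is still available in $_SERVER['REQUEST_URI'],
// e.g. /12343223/keyword4/keyword3/keyword2/keyword.html
$key = explode('/', trim($_SERVER['REQUEST_URI'], '/'));
// $key[0] == '12343223'

header('HTTP/1.0 200 OK'); // overwrite the 404 status so Google indexes the page

show_post((int) $key[0]); // hypothetical function that outputs the content
?>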

GilbertZ
msg:213365
12:28 am on Nov 12, 2002 (gmt 0)

I've done the 404 thing but Google isn't biting... perhaps GG can confirm, but it looks like Google detects the 404 error code and therefore doesn't index it?

relgoog
msg:213366
11:28 am on Nov 12, 2002 (gmt 0)

Did you force the header as 200 OK? If you force it as 200 OK then Google has no way of telling ( header("HTTP/1.0 200 OK"); ).
If you haven't, then Google will just 404tastic you and reduce your ranking for poor linkage.

GilbertZ
msg:213367
7:29 pm on Nov 12, 2002 (gmt 0)

I used a standard hack... I'll have to check the code...

BTW, is there a way to tell from the browser or some other method what the header is?

Grumpus also mentioned to me in a sticky that I have some no-cache headers, which is just vBulletin stuff - could that also have an effect on Google?

relgoog
msg:213368
7:33 pm on Nov 12, 2002 (gmt 0)

If you have something like header("Pragma: no-cache") I believe Google would not index it.

jatar_k
msg:213369
7:33 pm on Nov 12, 2002 (gmt 0)

You can use this to check headers

Server Header Checker [searchengineworld.com]
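
[Editor's sketch] If you want to check from a script rather than the online tool, a small PHP sketch that makes a raw HEAD request and prints whatever headers come back (the host and path are just examples):

<?php
$fp = fsockopen('www.example.com', 80, $errno, $errstr, 10);
if (!$fp) die($errstr);
fputs($fp, "HEAD /index.php?threadid=123 HTTP/1.0\r\nHost: www.example.com\r\n\r\n");
while (!feof($fp)) {
    echo fgets($fp, 128); // status line and headers, e.g. "HTTP/1.1 200 OK"
}
fclose($fp);
?>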

Philosopher
msg:213370
7:51 pm on Nov 12, 2002 (gmt 0)

I don't believe the Pragma: no-cache header will stop Google from indexing you at all.

I know at least one of my clients has it in the header of every single page (some 1,000 pages) and all of them are indexed.

dvduval
msg:213371
8:04 pm on Nov 12, 2002 (gmt 0)

What can one do to change or manipulate the server header?
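
[Editor's sketch] From PHP, at least, you can set or override response headers with header(), as long as it is called before any output is sent; a short sketch (the values shown are only examples):

<?php
header('HTTP/1.0 200 OK');       // override the status line itself
header('Cache-Control: public'); // replace a script's default cache header
header('Last-Modified: ' . gmdate('D, d M Y H:i:s') . ' GMT'); // add a header
// ...page output follows...
?>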

Digimon
msg:213372
8:09 pm on Nov 12, 2002 (gmt 0)

Could anybody tell me where I can find complete explanations of the different ways to solve the problem of dynamic URLs in SEO (mod_rewrite, 404, etc.)?
Thanks a lot.

GilbertZ
msg:213373
9:00 pm on Nov 12, 2002 (gmt 0)

Turns out that I have one domain pointing to another... although in most cases it doesn't do this, for some reason the hack was sending a redirect to the domain in httpd.conf, so there was no direct backlink to those pages, only to the redirect. Could this have caused the problem?

gilli
msg:213374
12:57 am on Nov 13, 2002 (gmt 0)

Purely personal take on all this, based on observations and things I think just make sense.

Rule 1:
Keep your URL variables in a structure such that any particular URL will always (or as often as possible) return the same content. When Google returns to the page it expects to see similar, or at least on-topic, content - not an error or a 404. So "once only" URL strings are bad.

Rule 2:
Each page should have a unique URL, so session identifiers are very bad. If Google visits the links:

something.php?sessionId=123,
something.php?sessionId=465 and
something.php?sessionId=789

and the same content is presented for each request, I *think* it will get fed up with your site pretty quickly.

Fuzzy stuff:
Short and descriptive is probably good for users, but I don't think Google cares too much (I think going overboard with length will probably have a negative impact, but I don't have proof). I have a site that uses URL variables in the form something.php?x=123,456,789,123 (kind of like Vignette does/did). This is meaningless to users, but Google doesn't seem to mind it at all; this site does, however, conform to rules 1 & 2.

There are ways of "hiding" URL variables. One way I have used is something.php/page/contact_us. This works by tricking the user agent into thinking that contact_us is a folder, while the web server knows that something.php is the file being requested. You then grab the extra path (the PATH_INFO environment variable) and strip the variable name/value pairs off the end of it (in this case page=contact_us). How you do this depends on your web server and scripting language (do a search on "search engine safe urls php", or whatever your language is); there's a sketch of the idea below. Anyway, I think this is rapidly becoming redundant and probably not worth worrying about. Personally I don't worry about hiding the "?" but concentrate on rules 1 & 2, and things seem to be fine.

Another quick thing about that "hiding" method: the user agent thinks it's looking at the contact_us folder, so you must use absolute paths for all link and image references, which can be a pain, especially if you are trying to implement this on an existing site.
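
[Editor's sketch] A minimal PHP sketch of that hiding method, assuming Apache delivers the trailing /page/contact_us in $_SERVER['PATH_INFO'] (the variable names are illustrative):

<?php
// something.php - requested as something.php/page/contact_us
$pairs = explode('/', trim($_SERVER['PATH_INFO'], '/'));
// $pairs == array('page', 'contact_us')

$vars = array();
for ($i = 0; $i + 1 < count($pairs); $i += 2) {
    $vars[$pairs[$i]] = $pairs[$i + 1]; // rebuild the name/value pairs
}
// $vars['page'] == 'contact_us'
?>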

Anyway, hope that's clear. While it's not gospel, on the sites I've been watching for the last six months it seems to work well. Maybe Brett_T or GG can confirm?

relgoog
msg:213375
8:26 am on Nov 13, 2002 (gmt 0)

Good point about absolute URLs; I always neglect to mention that issue.

GilbertZ
msg:213376
5:58 pm on Nov 13, 2002 (gmt 0)

How about a <base href=""> rather than absolute URLs?

relgoog
msg:213377
6:29 pm on Nov 13, 2002 (gmt 0)

Yeah, <base> would work... but why pass Go without collecting £200? Best to use absolute URLs, IMHO.

dspeake
msg:213378
7:56 pm on Dec 2, 2002 (gmt 0)

I get around this on my site by using cleverly coded .htaccess files. For example, I have PHP files that operate like news.php?id=54 and tune.php?id=89, but my URLs are, for example, [djism.com...] which makes it look like there is an individual page for each item.

This is done by the following (hopefully fool-proof) code:

Options +FollowSymLinks
RewriteEngine on
RewriteRule ^news/(.*)\.php$ news.php?id=$1 [L]

In the same way, I redirect images to GD to be thumbnailed:

RewriteRule ^images/news/small_(.*)$ [gdfile].php?src_img=images/news/$1 [L]

I would not advise using the '404' method, as it could potentially return '200 OK' when a genuine 404 has been encountered. A good example of 404 usage is on the PHP.net site, where it will redirect to the closest page or, if it can't find one, send the URL to the search page.

HTH
