Google is now better at spidering dynamic sites.

Site is finally spidered after 5 years.

     

lgn

2:29 am on Apr 30, 2003 (gmt 0)

Full Member

joined:June 18, 2002
posts:343
votes: 0


In the past, Google took one look at our URL string and barfed. Only our home page was spidered for the last five years.

It appears that Google's algorithm has improved: it spidered our site this month. All of my 300 or so pages now have PageRank. Wow.

So it appears that Google has improved its algorithm for spidering dynamic pages.

It all goes to show that if you stick your head in the sand, eventually somebody will come by and dig you out :)

6:35 am on Apr 30, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


We're getting better on dynamic pages every month thanks to better analysis. I think we crawl dynamic pages better than any general search engine at this point.
6:57 am on Apr 30, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 4, 2002
posts:666
votes: 0


So is it good enough to spider URLs like this yet?

domain.com/page?_pageid=54,1,54_36340:54_94476&_dad=gprtl&_schema=GPRTL

On second thoughts... don't answer that ;-)

7:28 am on Apr 30, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 7, 2003
posts:90
votes: 0


Well, from what I've seen on the sites we're running, Google does a good job with www.site.com?name=eric, but not so good with www.site.com?name=eric&last=lastname, unless the site has a high PR. What do you think, GoogleGuy?
8:32 am on Apr 30, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 5, 2001
posts:2466
votes: 0


More than two variables tends not to be indexed, but I do feel that Google is starting to set the limit on the number of dynamically generated product pages higher. At one time we struggled to get from product.asp?id=1 past product.asp?id=125; it seems higher now. GG, any comment? (And thanks for the pen.)

DaveN

P.S. GG, the 4-year-old technology thing: if it works, why fix it? ;)

12:06 pm on Apr 30, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 2, 2002
posts:104
votes: 0


lgn, could you please provide an example of what your strings look like? I think it's quite interesting to see what's now spiderable by Google.
11:40 pm on Apr 30, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


Krapulator, that url looks like the punctuation monster barfed on your urls. :)

In general, it's still a good idea to keep the number of parameters short. But we are getting better over time. :)

12:49 am on May 1, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 25, 2003
posts:171
votes: 0


I'd be interested in seeing an example also. We're patiently awaiting indexing of our new dynamic site, so I'd like to see what you got in with (edited to take out your personal info of course).

Thank you GoogleGuy for my first big laugh of the day!

1:21 am on May 1, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 20, 2002
posts:139
votes: 0


In another thread, it was mentioned that pubmed is now being indexed. Here's one of the many pages I've seen included from their site:

[ncbi.nlm.nih.gov...]

If I'm counting correctly, that's 5 variables. Prior to this month, the most I had seen indexed, at least from my own sites, was 2 variables. One day last week, I did see freshbot crawling my dynamic pages up to 5 variables. Definitely had never seen this before then. Those pages never did make it in though.

lgn

2:14 am on May 1, 2003 (gmt 0)

Full Member

joined:June 18, 2002
posts:343
votes: 0


My url looks like this:

www.widgets.com/cgi-bin/myscript.pl?page=purple_widgets.html&cart_id=32348346.433

2:33 am on May 1, 2003 (gmt 0)

Full Member

10+ Year Member

joined:Mar 4, 2003
posts:309
votes: 0


URL rewriting is not only a good spider solution (it saves ALL the hassle), but it also looks clean to the client if you do it properly.

Most importantly, my URLs will NEVER change, and I don't have to 'expose' my technology with file extensions.

However, using "/something/" may cause problems with indexing(?). Does that mean that, to avoid all this, we may have to make them all "/something/index.htm"?

1:40 pm on May 1, 2003 (gmt 0)

Administrator

WebmasterWorld Administrator rogerd is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 2, 2000
posts:9687
votes: 1


that url looks like the punctuation monster barfed on your urls.

ROFL! I knew there was a reason I start my day at WebmasterWorld!

I second the notion that logical, brief, static-looking URLs are handier for your visitors. While many will bookmark or e-mail a URL without ever paying attention to it, occasionally a user may have to type it in, repair it after a line wrap, etc.

One thing that I've found a bit weird is that on a site where I've rewritten the messy query string URLs Google has stuck with the old dynamic ones in its index. I haven't wanted to ban the old URLs in the robots.txt file for fear that I'd lose all the listings. Indeed, Matt implied at Pubcon that higher PR duplicate content on the same site would push out the rest. That was my assumption when I changed all the home page linkage, navigation linkage, etc. After a couple of updates, though, the dynamic URLs are hanging in there. I recently found a few links to the old URLs in a corner of the site that doesn't get much attention and fixed them, but from a PR standpoint I don't see how that could have been the problem. The moral is don't expect static URL rewriting to solve all of your Google problems.

1:47 pm on May 1, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 10, 2003
posts:51
votes: 0


GoogleGuy,

Any chance you will be spidering and indexing Miva merchant pages any time soon :)

4:01 pm on May 1, 2003 (gmt 0)

Full Member

10+ Year Member

joined:Mar 4, 2003
posts:309
votes: 0


The moral is don't expect static URL rewriting to solve all of your Google problems.

Absolutely. In fact, depending on how you do it, you may see different results. I found that one of the biggest problems was actually making a dynamic site 'STATIC' in the real sense. A lot of people forget that rewriting also includes RETHINKING your SELECTs and URLs.

Having a list of items ordered like this:

/widget/10
/widget/11

is no good if the site orders/selects them by newest product,
i.e. when you add a new product, 10 becomes 11, etc.

I would imagine this kind of flux would cause just as many indexing problems when used in a URL rewrite as it would in a querystring.
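To make that concrete, here is a rough Python sketch. The `Product` class, the catalogue contents and both helper functions are made up for illustration; the only point is the difference between numbering a URL by list position and numbering it by a permanent key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    product_id: int  # hypothetical permanent key, assigned once and never reused
    name: str


def positional_urls(products_newest_first):
    """Fragile: the number in the URL is the item's position in the listing."""
    return {
        p.name: f"/widget/{pos}"
        for pos, p in enumerate(products_newest_first, start=1)
    }


def stable_urls(products):
    """Stable: the number in the URL is the product's permanent key."""
    return {p.name: f"/widget/{p.product_id}" for p in products}


# Catalogue listed newest first.
catalogue = [Product(11, "blue widget"), Product(10, "purple widget")]
before_pos = positional_urls(catalogue)
before_stable = stable_urls(catalogue)

# A new product arrives and goes to the top of the listing.
catalogue.insert(0, Product(12, "green widget"))
after_pos = positional_urls(catalogue)
after_stable = stable_urls(catalogue)

# The positional URL for the same page has shifted; the keyed one has not.
print(before_pos["purple widget"], "->", after_pos["purple widget"])
print(before_stable["purple widget"], "->", after_stable["purple widget"])
```

With the positional scheme, every page a spider indexed before the insert now sits at a different address; with the keyed scheme, nothing moves.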

6:40 pm on May 1, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 17, 2002
posts:124
votes: 0


What about when Google indexes an affiliate who passes an affiliate ID in the URL of the link?

You don't want Google to serve that affiliate ID on the link back to your own site.

How can you prevent that?

If a visitor clicks on the link at the affiliate site, then it is completely valid to have the ID in the URL.

So how can we prevent Google from keeping the affiliate ID on the links it grabs from our affiliates?

12:14 pm on May 4, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:1001
votes: 0


Can't give you anything like a final answer, crosenblum, but I'd guess a good tactic is to list the URLs that affiliates are to use so that the affiliate ID is always at the end, like session IDs. It makes no difference what order the parameters come in, as long as each value stays paired with its variable name. I.e., if your software produces a URL like me.com?pid=48&afilid=5647&viewthing=12&lang=ENG, change the software to cite the URL as me.com?pid=48&viewthing=12&lang=ENG&afilid=5647. I bet Google is smart enough that when it comes across a really long number tacked onto the end of a URL without too many other variables, it guesses that it may be some sort of session ID or other dynamic indicator that is not immediately relevant to content, tries the URL without this extra variable, and indexes it like that. I've come across many URLs indexed in Google that are shorter than any actual URL linked on the site. Try [google.com...] and you'll see how it shows results for URLs that don't have all the variables. I'm not sure how these get listed; maybe webmasters are trying to get mileage out of their links on MSN (this doesn't work on msn.com, just the local MSNs).
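For what it's worth, the reordering can be sketched in a few lines of Python with the standard `urllib.parse` module. The parameter name `afilid` comes from the example URL above; the function names, and the `http://` scheme and slash added to make the URL parseable, are my own assumptions. Whether Google really drops a trailing ID is still only a guess, so this shows the mechanics and nothing more.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

AFFILIATE_PARAM = "afilid"  # name taken from the example URL in this post


def affiliate_last(url, param=AFFILIATE_PARAM):
    """Return the URL with the affiliate parameter moved to the end of the query."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    content = [(k, v) for k, v in pairs if k != param]
    affiliate = [(k, v) for k, v in pairs if k == param]
    return urlunsplit(parts._replace(query=urlencode(content + affiliate)))


def without_affiliate(url, param=AFFILIATE_PARAM):
    """Return the URL with the affiliate parameter removed entirely."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    content = [(k, v) for k, v in pairs if k != param]
    return urlunsplit(parts._replace(query=urlencode(content)))


link = "http://me.com/?pid=48&afilid=5647&viewthing=12&lang=ENG"
print(affiliate_last(link))
print(without_affiliate(link))
```

The second function is the form you would want indexed; your own pages could link to that form internally while affiliates keep using the first.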

When it comes to making short, HTML-like URLs on dynamic content management systems with techniques like Apache mod_rewrite, it's also important not to change those URLs which simply present your page content in a different form, or which are likely to lead into an endless loop of different dynamic URLs. Some sites have been clobbered by Googlebot following infinite progressions of URLs, and have gone down. Keep those URLs, with all their variables, in the normal ?a=whatever&b=foo&c=bar format.

I think the other search engines in the last week have been keeping up with Google when it comes to dynamic URLs (is Google creaking?). I listed some software on a lot of shareware sites a couple of weeks ago, and FAST now finds it about a hundred times, Lycos even more, and Google has only a few listings.

1:13 pm on May 4, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 24, 2002
posts:1130
votes: 0


> How can we prevent google from adding the affiliate id on the links it grabs from our affiliates?

When a user clicks the submit button of a 'GET' form, the browser adds a question mark and the parameters & values to the URL in the form's action attribute. For a 'POST' form the parameters and values are forwarded in a different way. Also, if no form is used, a string with the parameters & values can be composed and used in a link. It is possible to make a 'POST' form with a question mark and some of the parameters & values added to the original URL, like

<form method="post" name="form1" action="http://www.mydomain.com/product.asp?param=aaa&lang=en">

I'm not sure if Google will regard the action as a link to your page. But if so, it will be without the affiliate ID. When a user clicks on the submit button (which can be disguised with an image), the affiliate ID is also forwarded. So you will need some extra code in your dynamic page to get the ID.

I've only experience with this in an intranet application, so maybe you need some testing before you start using this.

I disagree with mincklerstraat about Google leaving out some part at the end of the URL. The example in msg16 doesn't help. The URLs displayed in the SERP are not incomplete.

4:24 pm on May 4, 2003 (gmt 0)

New User

10+ Year Member

joined:Mar 10, 2003
posts:24
votes: 0


lgn, have you done any redesign work on your site recently? The reason I am asking is that I was having problems getting my dynamic pages (2 & 3 parameters) listed on Google too. Then I noticed why.

I can't remember the reason I did it, but I saved one of my dynamic pages in Internet Explorer, then opened it in Dreamweaver MX. What I found was that there was NO content in my dynamic table. After further investigation I noticed that I had some stray <td> tags in there.

After editing my PHP code so that the saved pages displayed properly in Dreamweaver, Google listed my pages. Unfortunately this was halfway through the last dance, so only a few have been listed, but Google traffic is now better than ever.

Maybe this was just a coincidence, but it could be worth checking just in case. My site has been live since last August, with dynamic content since December, so it could be that Google has only just got round to listing them.

I have also noticed that my dynamic pages are now listed on Fast/Alltheweb too; I didn't think Fast indexed dynamic pages. Again, I'm quite new to this so I may be wrong on that too.

Love the site, picked up loads of good quality tips. Keep up the good work.

5:31 pm on May 4, 2003 (gmt 0)

New User

10+ Year Member

joined:Mar 4, 2003
posts:20
votes: 0


Unfortunately Googlebot also appears to have started to mutate URLs, for example:

someurl.com/something.php?1234

now becomes:

someurl.com/something.php?1234=
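If your script reads the bare value as a key, one way to make it indifferent to this mutation is to parse the query string tolerantly. A small Python sketch using the standard `urllib.parse` module follows; the `query_keys` helper and the `http://` prefix are my own additions for illustration, and your PHP script would need its own equivalent.

```python
from urllib.parse import parse_qsl, urlsplit


def query_keys(url):
    """Return the query-string keys, keeping keys that have an empty or missing value."""
    query = urlsplit(url).query
    return [key for key, _ in parse_qsl(query, keep_blank_values=True)]


original = "http://someurl.com/something.php?1234"
mutated = "http://someurl.com/something.php?1234="

# Both forms yield the same single key, so the page can treat them alike.
print(query_keys(original))
print(query_keys(mutated))
```

Note that `keep_blank_values=True` matters here: without it, `parse_qsl` drops a key that has no value, and both URLs would come back empty.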

5:58 am on May 5, 2003 (gmt 0)

New User

10+ Year Member

joined:May 5, 2003
posts:2
votes: 0


Sweet, just stumbled on this site today, I'll hopefully learn a lot here!

I've noticed that Google has yet to go past my index page too, but I'm not sure why. We submitted not too long ago, so I don't know if it's just a matter of time or if something is wrong on our side... Is there some typical length of time before Google will spider the entire site, even though they have already visited the index page?

Also, is there anything you can put into a robots.txt file to encourage spidering certain pages? Or is robots.txt just used for excluding files/directories?

My site is designed such that you never see index.php or any .php or any special chars like ? or = in the URL. Every URL looks like a directory, such as "/join/", "/tour/", "/ad/1000012/", "/photo/12345/", etc. Is this the best design to allow spidering? And does it matter if I leave the "/" at the end of each URL? My code allows it with or without the slash; would one way or the other make any difference to Googlebot?

Thanks for any advice.
Rob

6:08 am on May 5, 2003 (gmt 0)

Full Member

10+ Year Member

joined:Apr 1, 2003
posts:298
votes: 0


Welcome to Webmaster World!

Yeah, I would say it's a matter of time. But I would also build a site navigation page: a page linked from the home page that has all the links to my other pages.

That way Google WILL follow the link from the home page to the navigation page and then follow those links. It's best to make the text of those links match the <title> of the pages they point to.

I have 30 static pages that I built a navigation page for. Then on my static pages (no ? in the URLs) I have links to my dynamics, and my dynamics have links to more dynamics.

My site is newer as well. So far my statics have been crawled, just not my dynamics (1000 pages).

The reason for a navigation page for me was that the links to my pages on the home page were in JavaScript, and Google doesn't read JavaScript. So that's why Google wasn't picking up my statics. Hope that helps....

10:45 am on May 5, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 16, 2003
posts:45
votes: 0


I have a doubt about dynamic links and the way they are followed by the spiders or at least how deep.

Imagine someone puts a link like this:

<a href=http://www.google.com/search?q=some-popular-keywords>

I'm sure the spider won't follow all the SERPs...

10:49 am on May 5, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 14, 2003
posts:55
votes: 0

I've got a sinking feeling that the SERPs on www-sj.google.com are the new update. I've just checked my keywords on google.co.uk and google.com, and the SERPs are exactly the same as on sj.
11:44 am on May 5, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 24, 2002
posts:1130
votes: 0


Imagine someone puts a link like this:

<a href=http://www.google.com/search?q=some-popular-keywords>

I'm sure the spider won't follow all the SERPs...

They will not follow all the SERPs, but...
Try this [google.com] and you will see a SERP of 19 SERPs.

4:49 pm on May 5, 2003 (gmt 0)

New User

10+ Year Member

joined:May 5, 2003
posts:2
votes: 0


Thanks! Good to know about matching the page titles to the text of the links, I will make sure they match on our site map.
4:17 pm on May 8, 2003 (gmt 0)

New User

10+ Year Member

joined:May 8, 2003
posts:2
votes: 0


Making static web pages within dynamic sites is the way forward; it kills those nasty query strings completely.
8:46 pm on May 14, 2003 (gmt 0)

New User

10+ Year Member

joined:May 2, 2003
posts:16
votes: 0


[domain.com...]

Will Google index this URL?

8:52 pm on May 14, 2003 (gmt 0)

Full Member

10+ Year Member

joined:Feb 25, 2003
posts:323
votes: 0


Google indexed several of my pages with 3 query-string parameters, BUT there are way more, and I am still struggling to do the mod_rewrite thing. Has anybody bookmarked any good threads on this topic? I tried a search and didn't really get far. I need to know whether the .htaccess file can be FTP'd into the directory without any additional work done by the sysadmin ...

thanks

And my apologies if I veered off topic

9:12 pm on May 14, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 8, 2002
posts:2335
votes: 0


I just noticed that I have a calendar program on a site I manage that lists a bunch of events. Some recurring regularly, some not. Weekly events, monthly events, yearly events. The url is dynamic but very clean. If you click on next month or previous month, the url is /path/to/url/calender.cgi?month=7&year=2003. Those next and previous links could go on forever. Google hasn't really touched it because I have a sessionid on it. But I'm wondering what G would do in another scenario.

Now if the URL had only ?date=72003, =82003, etcetera, the content would be different every month. For example, birthdays would look like this:

XYZ's 20th birthday. XYZ's 21st birthday. Etcetera. It isn't a big change, but it is there. In my calendar, I tested this url:

/path/to/url/calender.cgi?month=7&year=220031 and the page showed up.

So I'm wondering, would "G" be smart enough to know not to crawl it forever?

If they are, I wonder how they would do that?
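One thing I could do on my side, rather than rely on the bot, is to have the script refuse months outside a sensible window, so the next/previous chain ends somewhere. A rough Python sketch of that check follows; the window sizes, the function names and the idea of returning a 404 outside the window are all my own assumptions, and the real calendar script would need its own version.

```python
from datetime import date

# Hypothetical limits: how far forward and back the calendar may be browsed.
MAX_MONTHS_AHEAD = 24
MAX_MONTHS_BACK = 12


def month_index(year, month):
    """Map a (year, month) pair onto a single running month count."""
    return year * 12 + (month - 1)


def is_crawlable(year, month, today):
    """True if the requested month is valid and lies within the allowed window."""
    if not 1 <= month <= 12:
        return False
    offset = month_index(year, month) - month_index(today.year, today.month)
    return -MAX_MONTHS_BACK <= offset <= MAX_MONTHS_AHEAD


today = date(2003, 5, 14)
print(is_crawlable(2003, 7, today))    # the normal "next month" style request
print(is_crawlable(220031, 7, today))  # the runaway year tested above
```

Outside the window the script could return a 404 and leave out the next or previous link, which bounds the number of pages any spider can reach.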

2:47 am on May 13, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 26, 2000
posts:2176
votes: 0


>>would "G" be smart enough to know not to crawl it forever?

Never rely on a bot to be smart. It is pretty easy to trap Googlebot, especially if you've used mod_rewrite to remove all the items that help Google identify a dynamic page.