asp / dhtml / shtml

how do spiders see these?

         

allybongo

11:01 am on Jan 8, 2003 (gmt 0)

10+ Year Member



I have a client who has come to me for advice on building a search-engine-friendly website. He would prefer to build it in ASP. I have tried to steer him towards plain HTML instead, but if he insists on using ASP, how will this affect the spiders?

Also, while I'm on the subject, do spiders have problems with DHTML and SHTML?

Thanks in advance :)

Dreamquick

11:21 am on Jan 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



At the end of the day, all scripting languages (ASP, PHP, SSI etc) work just as well as traditional HTML pages, assuming that all they are really writing out at the end of the process is HTML - spiders generally don't care what type of page gives them HTML...

The only gotchas relate to how the site is coded rather than what it is coded with. For example, passing lots of query string data, or not allowing users to browse without a session cookie, are easy ways to stop search engines from indexing pages.

That said, most problems with dynamic sites not being crawlable by SEs can be solved with a little thought and a little more code...

- Tony

Visit Thailand

11:24 am on Jan 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I converted a site over from .htm to .shtml a while back and have had no problems; it has maintained very good positions in the SEs.

I would recommend that anyone building a new site use .shtml from the start; the ease and benefits of using SSI are great.

allybongo

11:55 am on Jan 8, 2003 (gmt 0)

10+ Year Member



Thanks for that guys, that's exactly what I wanted to hear :)

Cheers!

Grumpus

12:18 pm on Jan 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Couple of tips for ASP and getting it spidered.

In addition to making sure your ASP spits out clean HTML code that the spiders can digest, the query string is really the trick.

1) Don't put any tracking data into the string. In other words, session IDs, referer data, etc. are bad. Put only the variables that are needed to generate a unique page into the URL.

2) Be consistent. "...page.asp?ID1=2&ID2=1" will generate the same page as "...page.asp?ID2=1&ID1=2", but Google will look at them as separate pages. It will then start to hate both of them because they are the same. Make sure your query strings maintain a consistent order. I still have a few pages on my site that I find from time to time where I have that problem, and I kick myself with each Google update.

3) Keep it Short and Sequential: Google hates things like "...page.asp?ID=1625te53632271ths6". It may not be the length so much as the "Oh, that's a RANDOM number and not a sequence. That means it's a RANDOM page and not a real page". I have NEVER been able to get Google to index my pages that are generated using product UPC, ISBN, or Amazon ASIN numbers. They are long numbers and non-sequential. Not sure what kills it, but if I generate a page using my own sequential database numbers, or even by calling the product by its name (which is LONGER than the number, but at least it's identifiable), then it crawls it fine.

4) Make good use of the Server.URLEncode and Server.HTMLEncode functions before passing and parsing those strings. Google's pretty good at it, but FAST and ALLTHEWEB sometimes have problems with "page.asp?dude=Joe%20Blow". When you use URLEncode, it changes it to "page.asp?dude=Joe+Blow" and it works fine. (AOL, though, often converts the ? and = to their ASCII codes. I haven't figured out what causes this and it doesn't happen all the time, but when it DOES do it, you can watch your 404s go through the friggin' roof. I HATE that!)
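To make tips 1, 2 and 4 concrete, here is a rough sketch of the idea, written in Python rather than ASP purely so it is easy to run and test. The function name canonical_url and the TRACKING_PARAMS list are made up for illustration; swap in whatever parameter names your own site actually uses. It drops tracking parameters, puts the remaining ones in a fixed order, and re-encodes spaces as "+".

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of tracking parameters (an assumption for this sketch);
# replace with the ones your own site actually appends.
TRACKING_PARAMS = {"sessionid", "sid", "referer", "ref"}


def canonical_url(url):
    """Return url with tracking parameters removed and the rest sorted by name."""
    parts = urlsplit(url)
    # parse_qsl decodes the query string into (name, value) pairs.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Tip 1: keep only the variables needed to generate the page.
    kept = [(k, v) for k, v in pairs if k.lower() not in TRACKING_PARAMS]
    # Tip 2: always emit parameters in the same order.
    kept.sort(key=lambda pair: pair[0])
    # Tip 4: urlencode writes spaces as "+", so "Joe Blow" becomes "Joe+Blow".
    query = urlencode(kept)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


if __name__ == "__main__":
    print(canonical_url("http://example.com/page.asp?ID2=1&ID1=2"))
    print(canonical_url("http://example.com/page.asp?dude=Joe%20Blow&sessionid=abc123"))
```

With this, both "page.asp?ID1=2&ID2=1" and "page.asp?ID2=1&ID1=2" come out as the same URL, so every internal link to a page can be run through one function before it is written into the HTML.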

I guess that's it for the major stuff. The only other thing to deal with (if the site is big) is getting googlebot to index the important stuff first and leave the older, less important stuff until later. I've got about 3 million pages now, but only 3000-4000 are really hot topics (it's movies and soundtracks so some are hot, some are not). For months, googlebot would go through and pick up 40K - 50K pages - but it picked whichever ones it felt like. Now I've got it better where it's getting the hot stuff first and then filling the rest with whatever it happens to like. Let me know if you ever get to that point and I'll post a thread on how to do that. (Or at least how I THINK I managed to do that). ;)

G.