Impact of URL re-writing?

Question about impact of URL re-writing on Spiders

         

hav

2:47 pm on Dec 21, 2000 (gmt 0)



We're looking at moving to servlets and JSP, but I fear that URL re-writing will be required because of things like session-scoped beans.

I was wondering about the implications of URL re-writing from the perspective of spiders and SEs: how can we avoid the appearance of SE spamming?

NFFC

12:15 pm on Dec 31, 2000 (gmt 0)




Hi hav, welcome to WebmasterWorld.

As you may have gathered from the deafening silence surrounding your post, the combination of search engines and session-scoped beans strikes fear into the heart of even the most accomplished SEOs.

There are good reasons for this. As I understand it, the use of JSP and servlets requires an enormous amount of session information to be passed from page to page in the URL string, and it also involves the use of "special" characters in the URL such as ? and @. Put simply, this is about as SE-unfriendly as it gets.

For an example of this, view this URL [move.com] (all 217 characters of it) from move.com. It may be useful to run a search at AltaVista and try to locate the site by what should be their targeted keywords. If you can find them you have more patience than I, <sarcastic> although in fairness they are 9th for a search for "move.com".</sarcastic> It may also be instructive to note the amount of screen real estate they have purchased, at who knows what cost.

IMHO URL re-writing will only paper over the cracks and will not give you the freedom with page design etc. that is needed to compete for the most competitive keywords. You need some static, hand-crafted pages, which I believe the set-up you are considering disallows.

If it were me, I would be seriously looking at an ASP solution, combined with URL re-writing and "true" static HTML pages.

grnidone

8:43 pm on Jan 3, 2001 (gmt 0)



My $0.02.

One of the websites I worked on could *not* get ranked to save my life until we rewrote the URLs to get rid of the ? % ? characters. We couldn't get in the top 300; the spiders just weren't getting past the first page.

However, there is hope. Apache has a workaround to take out those characters, and so does Vignette StoryServer.

Apache rewrite is here [engelschall.com]

I haven't been able to find an online version of the Vignette StoryServer stuff...but it is in the documentation that comes with it.
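To make the idea concrete, here is a minimal Python sketch of one way such a rewrite could look: the query-string parameters are folded into path segments so the link no longer contains ? or & characters. The function name `to_path_style` and the key/value path layout are assumptions for illustration only, not what Apache or StoryServer actually do, and the server would still need something (a rewrite rule, for instance) to map the path back to parameters.

```python
from urllib.parse import urlsplit, parse_qsl, quote


def to_path_style(url):
    """Fold query parameters into path segments (hypothetical layout).

    /catalog.cfm?cat=3&id=7  becomes  /catalog.cfm/cat/3/id/7
    Parameters with blank values are dropped (parse_qsl default).
    """
    parts = urlsplit(url)
    segments = []
    for key, value in parse_qsl(parts.query):
        # Percent-encode each piece so a "/" inside a value cannot
        # be mistaken for a segment separator.
        segments.append(quote(key, safe=""))
        segments.append(quote(value, safe=""))
    path = parts.path.rstrip("/")
    if segments:
        path = path + "/" + "/".join(segments)
    # Rebuild the URL with the new path and an empty query string.
    return parts._replace(path=path, query="").geturl()


if __name__ == "__main__":
    print(to_path_style("/catalog.cfm?cat=3&id=7"))
```

URLs without a query string pass through unchanged, so the function can be applied to every link on a page.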

-G

Fusioneer

12:15 am on Jan 10, 2001 (gmt 0)




I've cross-posted this in the other thread also.

Personally I have had mixed results indexing dynamic pages, although I have seen Google, AV, and FAST all with .cfm, .asp, and "?" URLs in their databases.

I have written a cloaking script in Cold Fusion and did some experiments with generating doorway pages out of a database with randomized content so they would not be recognized as such.

The URLs were encoded to hide the fact they were dynamic pages, e.g.

[domain.com...]

For any of you Cold Fusion junkies (there's a lot of Apache in this forum I've noticed), CF exposes the path portion of the URL in a string called CGI.PATH_INFO.

Convert this into a forward-slash-delimited list, define a start point, strip out unnecessary info, and you can pass as many variables as you want through the URL with slashes.

Before you get too excited...it does not seem to work with all engines. AV now seems to recognize that "info.cfm" is the document rather than the index.html, and requests it WITHOUT the appended slashed variables.
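The steps above can be sketched roughly as follows. This is Python rather than Cold Fusion, and the function name `parse_path_vars`, the `info.cfm` start marker, and the alternating key/value layout are all assumptions made for illustration; the actual script may have laid the variables out differently.

```python
from urllib.parse import unquote


def parse_path_vars(path_info, marker="info.cfm"):
    """Recover variables from a slash-delimited path (hypothetical layout).

    Everything after `marker` is read as alternating key/value segments.
    A dangling final segment (such as a fake index.html) is ignored.
    """
    # Treat the path as a forward-slash-delimited list, skipping empties.
    segments = [s for s in path_info.split("/") if s]
    if marker not in segments:
        return {}
    # Define the start point: the first segment after the template name.
    start = segments.index(marker) + 1
    tail = [unquote(s) for s in segments[start:]]
    # Pair keys (even positions) with values (odd positions).
    return dict(zip(tail[0::2], tail[1::2]))


if __name__ == "__main__":
    print(parse_path_vars("/info.cfm/cat/3/id/7/index.html"))
```

Because unpaired trailing segments are dropped, a decorative file name at the end of the path does not disturb the variables in front of it.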

I have a possible workaround but am working on other things at the moment ;)

Comments?

hav

4:38 pm on Jan 11, 2001 (gmt 0)



Thanks for the replies folks -

I guess it's not really a big deal for us to simply watch for a spider (e.g. by checking the User-Agent header) and then decide whether or not to re-write the URLs that appear in a page. (I know this won't catch all spiders, but it seems to do pretty well for the majors.)
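Roughly what I have in mind, sketched in Python rather than as a servlet so it stays short. The token list and the names `is_spider` and `encode_url` are made up for illustration and the list is certainly incomplete; the `;jsessionid=` form is the path-parameter style that servlet containers use for URL re-writing.

```python
# Illustrative, incomplete list of substrings seen in crawler User-Agent
# headers (an assumption; a real list would need regular upkeep).
SPIDER_TOKENS = ("googlebot", "scooter", "slurp", "fast-webcrawler")


def is_spider(user_agent):
    """Guess whether a request comes from a spider, by User-Agent only."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in SPIDER_TOKENS)


def encode_url(url, session_id, user_agent):
    """Append the session id to a link unless the visitor looks like a spider."""
    if is_spider(user_agent):
        return url  # spiders get the clean URL
    # Insert the session id before the query string, if there is one.
    path, sep, query = url.partition("?")
    return f"{path};jsessionid={session_id}{sep}{query}"


if __name__ == "__main__":
    print(encode_url("/catalog.jsp?cat=3", "ABC123", "Mozilla/4.0 (compatible; MSIE 5.5)"))
    print(encode_url("/catalog.jsp?cat=3", "ABC123", "Googlebot/2.1"))
```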

My real concern is that, by doing this, a new session will be "started" for each page that a spider hits (perhaps hundreds in a very short period) and each such session would consume (at least minimal) resources that will remain unavailable until each such one-page session times out.

I guess we could have two sets of beans (one for spiders - one for people) such that the beans used for spiders are assigned page - rather than session - scope. I guess I was just hoping maybe someone else using servlets and jsp (et al) might have already spent some time in this and might be willing to share some experience.
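A toy model of that idea, again in Python and not tied to any real servlet container: requests flagged as spiders get throwaway page-scoped state, so nothing is left in the session table waiting to time out. `SessionStore`, `handle_request`, and the `looks_like_spider` predicate are hypothetical names for this sketch.

```python
import time
import uuid


class SessionStore:
    """Minimal in-memory session table with an idle timeout (toy model)."""

    def __init__(self, timeout_seconds=1800):
        self.timeout = timeout_seconds
        self.sessions = {}  # session id -> (last access time, data dict)

    def get_or_create(self, session_id, now=None):
        now = time.time() if now is None else now
        entry = self.sessions.get(session_id)
        if entry is None or now - entry[0] > self.timeout:
            # Unknown or expired: discard any stale entry, start afresh.
            self.sessions.pop(session_id, None)
            session_id = uuid.uuid4().hex
            data = {}
        else:
            data = entry[1]
        self.sessions[session_id] = (now, data)
        return session_id, data


def handle_request(store, session_id, user_agent, looks_like_spider):
    """Give spiders page-scoped state; give everyone else a session."""
    if looks_like_spider(user_agent):
        # Page scope: a fresh dict that is discarded after the response,
        # so the session table is never touched.
        return None, {}
    return store.get_or_create(session_id)
```

With this split, a spider hitting hundreds of pages in a short burst leaves the session table empty, while ordinary visitors still get a session that persists across their requests.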

Anyway - thanks again for the comments - this is a great resource for sure!