
Paid Inclusion Engines and Topics Forum

Symbols in URLS: %?,
grnidone




msg:20443
 4:13 pm on Sep 18, 2000 (gmt 0)

I know that search engines such as Alta will stop at a % and ?.

What about commas? I talked to Danny Sullivan about commas in urls and he said they were not a problem, but I am starting to wonder.

What do you guys think?

-G

 

seth_wilde




msg:20444
 5:42 pm on Sep 18, 2000 (gmt 0)

I've never heard of commas being a problem. Of course, to my knowledge it's not common to even have commas in a URL. They targeted question marks because they are very common in dynamic sites with millions of pages, and they were afraid that their spiders would get trapped in a never-ending loop. Last I knew, it was still possible to get URLs with "?" indexed (if you submitted them directly); spiders just wouldn't follow links containing "?". Has anybody experimented with this lately?
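
To make the policy described above concrete, here is a minimal Python sketch. It only illustrates the behaviour as reported in this thread, not anything known about AltaVista's actual crawler; the function name should_fetch and the BLOCKED_SYMBOLS tuple are made up for the example.

```python
# Hypothetical crawl policy, as described in the post above:
# symbols that make a spider refuse to follow a discovered link.
BLOCKED_SYMBOLS = ("?",)


def should_fetch(url, directly_submitted=False):
    """Return True if this (assumed) spider would fetch the URL."""
    if directly_submitted:
        # Pages submitted by hand are fetched even if they look dynamic.
        return True
    # Links found while crawling are skipped when they contain a blocked
    # symbol, so the spider cannot wander into an endless set of
    # script-generated pages.
    return not any(symbol in url for symbol in BLOCKED_SYMBOLS)
```

Under this sketch a URL containing only commas passes the check, while a link containing "?" is fetched only when directly_submitted is True.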

JamesR




msg:20445
 6:53 pm on Sep 18, 2000 (gmt 0)

I have been submitting LookSmart directory pages and have seen them get listed on AltaVista.

Brett_Tabke




msg:20446
 5:51 pm on Sep 25, 2000 (gmt 0)

I wouldn't trust it to survive. I'd think if you can reroll your CGIs to accept a comma, why not just educate yourself on Apache ModRewrite? Once you get the basics, you can twist a URL any way you want. You can then do things like:
foo.com/bar/zippy-form-valueone-formvalue-two.htm

Then ModRewrite can strip it all down and toss it to the proper script. The SE never knows the difference; it looks like a standard HTML file to them.
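
A rough sketch of that mapping in Python, in case it helps. The URL layout (script name first, then key-value pairs separated by hyphens), the /cgi-bin/ target, and the function name rewrite are all assumptions made for illustration; on Apache the same job would be done by a rewrite rule rather than by Python code.

```python
import re
from urllib.parse import urlencode

# Assumed layout: /bar/<script>-<key>-<value>-<key>-<value>.htm
STATIC_URL = re.compile(
    r"^/bar/(?P<script>[a-z0-9]+)(?P<pairs>(?:-[a-z0-9]+-[a-z0-9]+)*)\.htm$"
)


def rewrite(path):
    """Map a static-looking path to a script URL, or None if it doesn't fit."""
    match = STATIC_URL.match(path)
    if match is None:
        return None
    # "-form-valueone-formvalue-two" -> ["form", "valueone", "formvalue", "two"]
    parts = match.group("pairs").split("-")[1:]
    # Pair up alternating keys and values.
    pairs = list(zip(parts[0::2], parts[1::2]))
    target = "/cgi-bin/%s.cgi" % match.group("script")  # hypothetical script location
    query = urlencode(pairs)
    return target + "?" + query if query else target
```

With these assumptions, rewrite("/bar/zippy-form-valueone-formvalue-two.htm") gives "/cgi-bin/zippy.cgi?form=valueone&formvalue=two", and the visitor or spider only ever sees the .htm address.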

grnidone




msg:20447
 12:51 pm on Sep 26, 2000 (gmt 0)

>I wouldn't trust it to survive. I'd think if you can reroll your CGIs to accept a comma, why not just educate yourself on Apache ModRewrite?

Yeah. That would work, but we are using Vinette (sp?) Storyserver. We currently have a ModRewrite-like thing in place, but we can't get rid of the commas.

-G

Brett_Tabke




msg:20448
 1:07 pm on Sep 26, 2000 (gmt 0)

OK, is it on Apache? You can call the story server via SSI, the same way I am using it here to present the posts. Everything here is of course dynamically generated but lying under pretty-looking htm URLs.

For example, this post is located at:
[webmasterworld.com]

Which actually calls the cgi at:
[webmasterworld.com]

via a simple SSI:
<!--#include virtual="/discussion.cgi?forum=13&discussion=100" -->
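
For anyone wanting to try this, here is a small Python sketch that builds the include line for a given forum and discussion. The function name ssi_stub is hypothetical, and the parameter names simply mirror the query string above. Apache SSI directives are written inside HTML comments, which is the form produced here.

```python
from urllib.parse import urlencode


def ssi_stub(forum, discussion, script="/discussion.cgi"):
    """Build the SSI include line for one static-looking page.

    The script path and parameter names mirror the example above;
    the function itself is only an illustration.
    """
    query = urlencode([("forum", forum), ("discussion", discussion)])
    return '<!--#include virtual="%s?%s" -->' % (script, query)
```

Calling ssi_stub(13, 100) returns the include line for the discussion quoted above; saving that line in an .htm file that the server parses for SSI is what keeps the dynamic script hidden behind a static-looking URL.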

georged




msg:20449
 1:18 pm on Sep 26, 2000 (gmt 0)

Commas are not a problem for indexing, e.g.
[altavista.com...]
Look at the ninth spot.
No problem getting indexed. It's just a real pain not being able to name the pages properly. I'm thinking about this because a client is using Vignette Storyserver as well.

Brett_Tabke




msg:20450
 7:18 am on Oct 3, 2000 (gmt 0)

I wonder how well they rank, George? There are so few of them in Alta with commas that it is hard to deduce what kind of rankings they are getting across the board.

georged




msg:20451
 10:13 am on Oct 3, 2000 (gmt 0)

I see them when I'm searching for sports people on AV and they seem to do OK. Never seen them at number one, though. These searches typically have fewer than 20,000 returns and the pages in question aren't optimised.
I suspect the sites that the pages come from have high link popularity, as they are generally sports news sites or official sites of clubs. I would also suspect that if you optimised them they'd do as well as any other page, since they can be indexed.
Probably the reasons why we don't see more of them are that they don't get submitted due to the size of the sites they're from, that they don't get optimised because these sites are just grinding out hundreds of these pages, and that they are not up very long (or not linked to for very long).
I wonder if this is the case with grnidone's site, high page turn-over etc? It is with my client's site and I'm trying to get them to introduce some absolutely static no-change pages, so I can test-drive my 'oh, hell, just optimise it and see' strategy. :)


Ted




msg:20452
 3:04 pm on Oct 3, 2000 (gmt 0)

I have seen AV index ? as well as % for a while now.

search [altavista.com]

This is a local Swedish search engine that has 52 search result pages indexed by AV.

The URLs look like this:
[4en.net...]

They even index their own URLs, including the ?

search [altavista.com]

Check the number 200 listing: the add URL page at AV is indexed. Gives a good link to that adult site, I suppose.

Another example is koll.se, also a Swedish search engine. They have over 2000 pages indexed, including a ? and/or % in the URL.
Just type "host:koll.se" and see the result.

What do you all say?

[edit]shortened the urls[/edit]

henki




msg:20453
 3:29 pm on Oct 3, 2000 (gmt 0)

Funny to notice that e.g. altavista.se is not accepting question marks and also states so when you submit such a page.

But AltaVista.com is accepting them.

rencke




msg:20454
 3:37 pm on Oct 3, 2000 (gmt 0)

It certainly seems that AV has started to index dynamic pages, all right. Think, think, think....

AV must have choked on their coffee when Fast announced last Feb that they had a bigger index and sent their spiders out day and night in order to beat Fast to the billion pages they had promised for new year.

A quick way to accumulate lots of pages fast would be not to stop at "?" but to keep going. There is always the risk of getting stuck with lots of duplicates, but that risk is great even for static pages and besides, there are supposed to be 500 billion pages hidden in databases. So why not harvest what you can from them?

I wonder if other SEs are going the same way? The business is turning into a numbers game and everyone wants to have a billion pages. Google says they already have that.

Life will become a lot easier for sites with dynamic content if this is the new trend.

seth_wilde




msg:20455
 7:33 pm on Oct 3, 2000 (gmt 0)

AV has been indexing URLs with "?" marks for months; the only catch is that the page has to be directly submitted. This still allows them to protect their spiders from never-ending loops but at the same time allows them to index quality dynamic pages.

It's still not clear if there is any kind of penalty for these types of pages, but overall they make up a very small percentage of top 60 results (less than half a percent).

mark roach




msg:20456
 11:31 pm on Oct 4, 2000 (gmt 0)

There does seem to be a trend towards indexing dynamic sites recently. Excite crawled a load of pages with ? in the URLs last week and yesterday Google did the same. AV has never crawled any of my pages though :(

seth_wilde




msg:20457
 11:33 pm on Oct 4, 2000 (gmt 0)

Mark-

Are you directly submitting the dynamic pages?

mark roach




msg:20458
 12:08 am on Oct 5, 2000 (gmt 0)

No

seth_wilde




msg:20459
 1:34 am on Oct 5, 2000 (gmt 0)

Try directly submitting to the add url page. You should see a much better success rate this way.

uksitesubmit




msg:20460
 9:03 am on Oct 7, 2000 (gmt 0)

>AV has never crawled any of my pages though
I think that AV does crawl all the pages of a site; they just don't list them.

Brett_Tabke wrote earlier..
>The same way I am using it here to present the posts. Everything here is of course dynamically generated but lying under pretty-looking htm URLs.
I think you know exactly what you are at and where you're going :)
I am working on the same thing at the moment but with more options!!
Please let me know more, as I am sceptical about one thing. If you want, I'll show you it.

