Symbols in URLs: %, ? and commas

     

grnidone

4:13 pm on Sep 18, 2000 (gmt 0)



I know that search engines such as AltaVista will stop at a % or a ?.

What about commas? I talked to Danny Sullivan about commas in urls and he said they were not a problem, but I am starting to wonder.

What do you guys think?

-G

seth_wilde

5:42 pm on Sep 18, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've never heard of commas being a problem. Of course, to my knowledge it's not common to even have commas in a URL. They targeted question marks because they are very common in dynamic sites with millions of pages, and they were afraid that their spiders would get trapped in a never-ending loop. Last I knew, it was still possible to get URLs with "?" indexed (if you submitted them directly); spiders just wouldn't follow links containing "?". Has anybody experimented with this lately?

JamesR

6:53 pm on Sep 18, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have been submitting LookSmart directory pages and have seen them get listed on AltaVista.

Brett_Tabke

5:51 pm on Sep 25, 2000 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I wouldn't trust it to survive. If you can reroll your CGIs to accept a comma, why not just educate yourself on Apache mod_rewrite? Once you get the basics, you can twist a URL any way you want. You can then do things like:
foo.com/bar/zippy-form-valueone-formvalue-two.htm

Then mod_rewrite can strip it all down and toss it to the proper script. The SE never knows the difference; it looks like a standard HTML file to them.
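To make the idea concrete, here is a minimal sketch in Python of the kind of mapping a rewrite rule performs. In Apache the mapping would live in a mod_rewrite rule, not in Python; this only illustrates the translation. The URL layout (`/bar/<script>-<value1>-<value2>.htm`), the `/cgi-bin/` location and the parameter names `one` and `two` are all assumptions made up for the example.

```python
import re
from typing import Optional

# Hypothetical layout: /bar/<script>-<value1>-<value2>.htm
STATIC_URL = re.compile(
    r"^/bar/(?P<script>[a-z]+)-(?P<one>[^-/]+)-(?P<two>[^-/]+)\.htm$"
)


def rewrite(path: str) -> Optional[str]:
    """Map a static-looking path to the CGI URL that would really serve it."""
    match = STATIC_URL.match(path)
    if match is None:
        return None  # not a rewritten URL; let the server handle it normally
    # Hypothetical script location and parameter names.
    return "/cgi-bin/{script}.cgi?one={one}&two={two}".format(**match.groupdict())


print(rewrite("/bar/zippy-red-large.htm"))
```

The spider only ever requests the .htm path; the "?" exists only on the server side.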

grnidone

12:51 pm on Sep 26, 2000 (gmt 0)



>I wouldn't trust it to survive. I'd think if you can reroll your cgi's to accept a comma, why not just educate yourself on apache ModRewrite.

Yeah. That would work, but we are using Vignette StoryServer. We currently have a mod_rewrite-like thing in place, but we can't get rid of the commas.

-G

Brett_Tabke

1:07 pm on Sep 26, 2000 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



OK, is it on Apache? You can call the story server via SSI, the same way I am using it here to present the posts. Everything here is of course dynamically generated, but lying under pretty-looking .htm URLs.

Such as this post is located at:
[webmasterworld.com]

Which actually calls the cgi at:
[webmasterworld.com]

via a simple SSI:
<!--#include virtual="/discussion.cgi?forum=13&discussion=100" -->
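If there are many discussions, the one-line pages can be generated in bulk. A rough sketch in Python, assuming each static page contains nothing but the include directive; the function name `ssi_stub` is made up for illustration, and Apache would still need to be configured to parse the .htm files for SSI.

```python
def ssi_stub(forum: int, discussion: int) -> str:
    """Return the one-line body of a static page that pulls in the CGI output."""
    return (
        '<!--#include virtual="/discussion.cgi?forum={}&discussion={}" -->'
        .format(forum, discussion)
    )


# Reproduces the directive quoted above for forum 13, discussion 100.
print(ssi_stub(13, 100))
```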

georged

1:18 pm on Sep 26, 2000 (gmt 0)

10+ Year Member



Commas are not a problem for indexing.
e.g.
[altavista.com...]
Look at the ninth spot.
No problem getting indexed. It's just a real pain not being able to name the pages properly. I'm thinking about this because a client is using Vignette StoryServer as well.
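For what it's worth, a comma is a legal character in the path part of a URL, so a standard parser does not treat it as a delimiter the way it treats "?". A quick check in Python, using a made-up comma-style URL (the host and numbers are placeholders, not a real page); this says nothing about how any particular spider behaves.

```python
from urllib.parse import urlsplit

# Hypothetical comma-style URL; the numbers are invented for illustration.
parts = urlsplit("http://example.com/story/0,1234,5678,00.html")

print(parts.path)   # the commas stay in the path untouched
print(parts.query)  # empty: no "?" in the URL, so no query string
```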

Brett_Tabke

7:18 am on Oct 3, 2000 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I wonder how well they rank, George? There are so few of them in AltaVista with commas that it is hard to deduce what kind of rankings they are getting across the board.

georged

10:13 am on Oct 3, 2000 (gmt 0)

10+ Year Member



I see them when I'm searching for sports people on AV and they seem to do OK. Never seen them at number one, though. These searches typically have fewer than 20,000 returns and the pages in question aren't optimised.
I suspect the sites that the pages come from have high link popularity, as they are generally sports news sites or official sites of clubs. I would also suspect that if you optimised them they'd do as well as any other page, since they can be indexed.
Probably the reason why we don't see more of them is that they don't get submitted due to the size of the sites they're from, they don't get optimised because these sites are just grinding out hundreds of these pages, and also because they are not up very long (or not linked to for very long).
I wonder if this is the case with grnidone's site: high page turnover, etc.? It is with my client's site, and I'm trying to get them to introduce some absolutely static no-change pages so I can test-drive my 'oh, hell, just optimise it and see' strategy. :)

Ted

3:04 pm on Oct 3, 2000 (gmt 0)

10+ Year Member



I have seen AV index ? as well as % for a while now.

search [altavista.com]

This is a local Swedish search engine that has 52 search result pages indexed by AV.

The URLs look like this:
[4en.net...]

They even index their own URLs, including the ?

search [altavista.com]

Check the number 200 listing: the add URL page at AV is indexed. Gives a good link to that adult site, I suppose.

Another example is koll.se, also a Swedish search engine. They have over 2000 pages indexed that include a ? and/or % in the URL.
Just type "host:koll.se" and see the result.

What do you all say?

[edit]shortened the urls[/edit]

henki

3:29 pm on Oct 3, 2000 (gmt 0)

10+ Year Member



Funny to notice that e.g. altavista.se is not accepting question marks and also states so when you submit such a page.

But AltaVista.com is accepting them.

rencke

3:37 pm on Oct 3, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It certainly seems that AV has started to index dynamic pages, all right. Think, think, think....

AV must have choked on their coffee when Fast announced last Feb that they had a bigger index and sent their spiders out day and night in order to beat Fast to the billion pages they had promised for new year.

A quick way to accumulate lots of pages fast would be not to stop at "?" but to keep going. There is always the risk of getting stuck with lots of duplicates, but that risk is great even for static pages, and besides, there are supposed to be 500 billion pages hidden in databases. So why not harvest what you can from them?

I wonder if other SEs are going the same way? The business is turning into a numbers game and everyone wants to have a billion pages. Google says they already have that.

Life will become a lot easier for sites with dynamic content if this is the new trend.

seth_wilde

7:33 pm on Oct 3, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



AV has been indexing URLs with "?" marks for months; the only catch is that the page has to be directly submitted. This still allows them to protect their spiders from never-ending loops, but at the same time allows them to index quality dynamic pages.
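As a toy illustration of that policy (not AV's actual code, just the rule as described here), a sketch in Python; the function names and example URLs are made up.

```python
def should_index(url: str, directly_submitted: bool) -> bool:
    """Dynamic URLs (containing '?') are indexed only when submitted by hand."""
    is_dynamic = "?" in url
    return directly_submitted or not is_dynamic


def links_to_follow(links: list) -> list:
    """The spider skips any link containing '?', so it cannot wander
    endlessly through machine-generated pages."""
    return [link for link in links if "?" not in link]


found = ["http://example.com/a.htm", "http://example.com/page.cgi?id=1"]
print(links_to_follow(found))
print(should_index("http://example.com/page.cgi?id=1", directly_submitted=True))
```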

It's still not clear if there is any kind of penalty for these types of pages, but overall they make up a very small percentage of top 60 results (less than half a percent).

mark roach

11:31 pm on Oct 4, 2000 (gmt 0)

10+ Year Member



There does seem to be a trend towards indexing dynamic sites recently. Excite crawled a load of pages with ? in the URLs last week, and yesterday Google did the same. AV has never crawled any of my pages though :(

seth_wilde

11:33 pm on Oct 4, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Mark-

Are you directly submitting the dynamic pages?

mark roach

12:08 am on Oct 5, 2000 (gmt 0)

10+ Year Member



No

seth_wilde

1:34 am on Oct 5, 2000 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Try directly submitting to the add url page. You should see a much better success rate this way.

uksitesubmit

9:03 am on Oct 7, 2000 (gmt 0)

10+ Year Member



>Av has never crawled any of my pages though
I think that AV does crawl all the pages of a site; they just don't list them.

Brett_Tabke wrote earlier..
>The same way I am using it here to present the posts. Everything here is of course dynamically generated but laying under pretty looking htm urls.
I think you know exactly what you are at and where you're going :)
I am working on the same thing at the moment, but with more options!!
Please let me know more, as I am sceptical about one thing. If you want, I'll show you it.