sim spider

how good is it - known problems

         

Matthew McNally

2:45 pm on Apr 11, 2003 (gmt 0)

10+ Year Member



I have a set of Invision forums, and I have worked very hard to make all the important links search engine friendly.

Using the sim spider - I can see the list of links on the forum home page, which take the format

http://us.mysite.com/forums/show.php/act/SF/f/10

this links to the forum page for a specific topic (this is a car owners forum - and it links to a forum for a specific model)

All the links in this forum are also search engine friendly, e.g.

http://us.mysite.com/forums/show.php/act/ST/f/10/t/304

Checking the page through sim spider shows a load of links on the forum index to all the individual forums.

But none of the links on the individual forums (the links to each individual post) are shown.

Is this a problem with sim spider? Does it have known limitations?

The links are there - but sim spider doesn't show them.

This is difficult to explain without giving a URL so you can try - would it be against the rules if I told you what to replace mysite with in those URLS (but without posting a real URL)?

[edited by: engine at 12:20 pm (utc) on Feb. 23, 2004]
[edit reason] de-linked [/edit]

jdMorgan

7:53 pm on Apr 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How are the links which don't show different from those that do show? Are they all html <a href> links?

The only bug I know of in simspider is that it does not include title attribute text.

Jim

Matthew McNally

3:53 pm on Apr 12, 2003 (gmt 0)

10+ Year Member



sample link

<a href='http://us.mysite.com/forums/show.php/act/ST/f/1/t/727' class='linkthru' title='This topic was started: Mar 24 2003, 11:57 AM'>Import Revolution Pictures.....what A Day&#33;</a>

looks clean to me!

really worried that my hard work will be in vain - if google works like sim spider, it is going to miss all the content in my forums 8(
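For anyone who wants to double-check a link like the sample above without relying on sim spider, here is a minimal Python sketch that pulls the href values out of a chunk of HTML using the standard library's html.parser. The names LinkCollector, extract_links and SAMPLE are made up for this illustration, and this is not how sim spider or Googlebot actually parse pages. It only shows that a single-quoted href with a title attribute is ordinary, parseable HTML.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag (hypothetical helper, not a real spider)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the list of href values found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# The sample link from the post above, using the placeholder host
SAMPLE = (
    "<a href='http://us.mysite.com/forums/show.php/act/ST/f/1/t/727' "
    "class='linkthru' title='This topic was started: Mar 24 2003, 11:57 AM'>"
    "Import Revolution Pictures.....what A Day&#33;</a>"
)

if __name__ == "__main__":
    for link in extract_links(SAMPLE):
        print(link)
```

If the link shows up here, the markup itself is unlikely to be the reason a spider simulator skips it.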

Birdman

4:50 pm on Apr 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is odd. I'm trying the same thing and seeing the same results with simspider [searchengineworld.com]. I don't understand why it sees the links from the home page but not the links from the individual forums.

I think a better solution is to get the urls to resemble the urls of this forum, which would require mod_rewrite [httpd.apache.org]

Matthew McNally

7:24 pm on Apr 12, 2003 (gmt 0)

10+ Year Member



just placed a bunch of links in a <noscript> section - sim spider didn't see those either.

Birdman - it's us.mysite.com - where mysite = what I stickymailed - links are right below the <body> tag

Clark

7:36 pm on Apr 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Can you sticky me the url?

Clark

6:08 am on Apr 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the sticky.

I'm going to respond to this in public in case someone else can learn from it too. If there's anything private you want to ask, I can reply by sticky.

Make sure when you run the tests that you are logged out (i.e. looking at the page as a guest). That is how the sim spider will see it.

I noticed two possible issues google may have with your setup. One is that the links to the individual messages still have the s=(sessionhash) on them. The second issue is that even though the links to the individual forums are in a spider friendly format, when you actually click on a link, it seems you are forwarded to another url...I don't know how that is happening, but you have some issue there. Hope this helps.
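To make the session hash point concrete, here is a small Python sketch that drops the s= parameter from a URL and leaves everything else alone. It assumes the session hash travels in a query parameter literally named s, as described above. The function name strip_session_id and the hash value abc123 are placeholders invented for this example, and this is not how the forum software itself builds its links.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url, param="s"):
    """Return url with the given query parameter removed (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every key/value pair except the session parameter
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    # abc123 is a made-up session hash, used only for illustration
    print(strip_session_id(
        "http://us.mysite.com/forums/index.php?s=abc123&act=ST&f=10&t=304"
    ))
```

The idea is that every visitor (and every spider) ends up with the same URL for the same topic, whenever they arrive.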

Matthew McNally

4:54 pm on Apr 13, 2003 (gmt 0)

10+ Year Member



Thanks for the response Clark - more than happy to keep it all in public - sharing == good!

8)

Being a guest / logged in makes no difference to the HTML presented by the forums software - can you explain in a bit more detail why you think this is relevant?

With regards to the links, then yes, the actual links themselves are very search engine unfriendly: they include several query strings, including the search engine's worst enemy - session ids! 8)

The mod I applied to the forum gets around this by formatting the links in the manner I illustrated above.

These links are passed to a file called show.php, which changes the ST/F/t/ type links into the unfriendly URLs that Invision normally uses.

I have posted this file below - it is included in the Invision "package", and I have also posted the full "credits" - so hopefully, I'm not breaking any rules by posting it (I'm sure a mod will delete it if I am - and I'll understand)


<?php

/*

Script Name: show.php
Script Author: Matt Mecham
Package: Invision Board
Date: 16th September 2002
-------------------------
What is this?
-------------------------
It's a little "add-on" that simply allows for a neater / easier to read
URL to an invision board URL.
Example: http://www.domain.com/forums/show.php/act/ST/f/3/t/45
Resolves to: http://www.domain.com/forums/index.php?act=ST&f=3&t=45

It's not used in Invision Board itself, but might come in handy in your
own projects (such as a search engine friendly menu, etc).

Probably only works with PHP 4.1+

*/

$base_url = 'http://us.mysite.com/forums/index.php'; //Edit this to suit

$redirect = "";

if ( $_SERVER['PATH_INFO'] != "" )
{
    // $c toggles between 0 (next bit is a key) and 1 (next bit is a value)
    $c = 0;

    foreach( explode( "/", $_SERVER['PATH_INFO'] ) as $bit)
    {
        if ($bit != "")
        {
            if ($c == 0)
            {
                $c++;
                $redirect .= $bit.'=';
            }
            else
            {
                $c = 0;
                $redirect .= $bit.'&';
            }
        }
    }
}

// Send the browser (or spider) on to the real index.php URL
header("Location: $base_url?".$redirect);

exit();

?>

this is why the page you receive is not the one referred to in the link (it is exactly the same / right page for your request - it just has a different URL)
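To show what the show.php script above does to a path, here is a rough Python port of just the translation step. It is only a sketch for illustration: the function names path_info_to_query and redirect_target are made up, and BASE_URL is the same placeholder used in the script. Note that PHP's header("Location: ...") sends a 302 redirect unless another status has been set, so the spider is asked to make a second request to the index.php URL.

```python
# Same placeholder base URL as $base_url in show.php
BASE_URL = "http://us.mysite.com/forums/index.php"


def path_info_to_query(path_info):
    """Turn '/key/value/key/value' into 'key=value&key=value&' like show.php."""
    query = ""
    expecting_key = True  # plays the role of $c == 0 in the PHP version
    for bit in path_info.split("/"):
        if bit == "":
            continue  # skip the empty piece before the leading slash
        if expecting_key:
            query += bit + "="
        else:
            query += bit + "&"
        expecting_key = not expecting_key
    return query


def redirect_target(path_info):
    """Build the URL that show.php would put in its Location header."""
    return BASE_URL + "?" + path_info_to_query(path_info)


if __name__ == "__main__":
    print(redirect_target("/act/ST/f/10/t/304"))
```

Like the PHP original, this leaves a trailing & on the end of the query string, which is harmless but shows up in the final URL.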

Sim spider picks up one set of these links (the main forum page), but misses the individual forums / topics

would love to be able to do this with mod_rewrite, but that would definitely be beyond me at the moment

[edited by: engine at 12:21 pm (utc) on Feb. 23, 2004]
[edit reason] de-linked [/edit]

PaulPaul

5:29 pm on Apr 13, 2003 (gmt 0)

10+ Year Member



There is a browser called Lynx. This browser was recommended on this board, by our favorite Google employee. I would take his suggestion and use it to scan your site.

If you can't see it in Lynx, chances are googlebot isn't seeing it.

Matthew McNally

5:54 pm on Apr 13, 2003 (gmt 0)

10+ Year Member



PaulPaul

have downloaded and surfed the site using Lynx.

All my links were shown, including those that sim spider doesn't see.

phew.

can I relax now, knowing that Google will be able to spider my forums?

PaulPaul

6:12 pm on Apr 13, 2003 (gmt 0)

10+ Year Member



That is a good sign, but take the warnings about session IDs seriously. Googlebot does not use cookies or session IDs. It can come back 2 days later, trying to follow the same route, long after the session ID has expired.

Try passing all the variables in the url, and reduce or better yet eliminate the need for session variables. You will notice much better crawling results.

Matthew McNally

6:22 pm on Apr 13, 2003 (gmt 0)

10+ Year Member



I'm not feeding google any session ids.

The links I'm feeding google take the form http://us.mysite.com/forums/show.php/act/ST/f/10/t/304 and this always resolves to the same page.

Only difference is the show.php script I posted translates this to

http://us.mysite.com/forums/index.php?act=ST&f=10&t=304&

[edited by: engine at 12:22 pm (utc) on Feb. 23, 2004]
[edit reason] de-linked [/edit]

Clark

9:18 pm on Apr 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RE: log in as guest, the closer you get to looking at a site like a search engine will, the better. But lynx more than takes care of that.

The big problem I see is that Google will have an easier time seeing the link if you don't make the url "neater" than if you make the url "neater" but then do a redirect. These days google is ok with passing one or two variables to a script. Just kill that hack and you should be fine as long as the forum links don't contain sessionids.

To tell the truth, I don't know why all those forums put sessions in as a default. It's just for those pesky few people in the world who are a pain about not having cookies in their site. I wouldn't even want those PITA's on my site anyways. They'll always complain about something ;) (if any of you PITAs are reading this, I was just kidding. Just a joke. If none of you are reading this, then I was serious.)

P.S. I would fix this ASAP. If google didn't start the deepcrawl already, it'll start very soon.

Matthew McNally

12:05 am on Apr 14, 2003 (gmt 0)

10+ Year Member



I have looked at all these links in lynx - and the ones that count are search engine friendly and work perfectly.

so - I am pretty happy.

would post screenies - but the URL that I need to show to "prove" what I am saying is in the title bar, and it's too late for me to start editing the true URL out of the image.

> P.S. I would fix this ASAP. If google didn't start the deepcrawl already, it'll start very soon.

PS - why do you think I was worrying!

8) 8) 8)

Clark

5:57 am on Apr 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



:)

TheOtherOne

11:54 pm on Apr 15, 2003 (gmt 0)

10+ Year Member



Just one last comment... I've talked with other people who have applied this same mod to invision board and confirmed that google successfully indexed their forums.