urls with session_id

how to avoid multiple spidering?

     
9:28 am on Feb 21, 2002 (gmt 0) | Preferred Member (joined Feb 17, 2001, posts: 409)

We started using session IDs recently, and today I see in the logs that the FAST spider does not handle them well.

It repeatedly spiders the same page with different session IDs:

66.77.73.149 - - [20/Feb/2002:04:01:59 +0100] "GET /index.php3?PHPSESSID=2ab18fdd6ca1320f644422b44daffa2d HTTP/1.0" 200 59126 "-" "FAST-WebCrawler/3.3 (crawler@fast.no; [fast.no...] "-"
66.77.73.149 - - [20/Feb/2002:04:02:24 +0100] "GET /index.php3?PHPSESSID=768e7265ce0735e14ff41532e05ea0e9 HTTP/1.0" 200 59126 "-" "FAST-WebCrawler/3.3 (crawler@fast.no; [fast.no...] "-"

Is there a simple solution to avoid this? We get many thousands of hits on these spurious URLs from the fastbot...

11:34 pm on Feb 21, 2002 (gmt 0) | Senior Member heini (joined Jan 31, 2001, posts: 4404)

Starec, I've been trying to get you some help, without much success so far.
Perhaps you should try mailing Fast; they have been pretty responsive.

12:20 am on Feb 22, 2002 (gmt 0) | Senior Member (joined June 26, 2000, posts: 2176)

>>Is there a simple solution to avoid this?

Don't write session IDs in the URL. :)

A solution where the session ID is stored in a cookie works much better.
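
A minimal sketch of that cookie-only setup, assuming a reasonably recent PHP 4 (session.use_only_cookies is not available in older releases, and depending on your build these settings may have to go in php.ini or .htaccess rather than ini_set):

<?php
// Keep the session ID out of URLs entirely: issue and read it via cookie only.
ini_set('session.use_only_cookies', 1); // ignore SIDs arriving in the query string
ini_set('session.use_trans_sid', 0);    // never rewrite links to append PHPSESSID
session_start();
?>

The tradeoff is that visitors who refuse cookies will not keep a session at all.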

The other option would be to check IP addresses before writing the session IDs to the URL. If a search engine IP is detected, the server simply serves the page without the ID. All IPs not on your list get the unique ID.
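
A rough sketch of that IP-list approach (the prefix below is just the FAST address from the log lines above, not a maintained spider list; assumes PHP 4.1+ for $_SERVER):

<?php
// Start a session only for visitors whose address is not on the spider list.
$spider_ips = array('66.77.73.');   // placeholder: FAST crawler prefix from the logs

$is_spider = false;
foreach ($spider_ips as $prefix) {
    if (strpos($_SERVER['REMOTE_ADDR'], $prefix) === 0) {
        $is_spider = true;
        break;
    }
}

if (!$is_spider) {
    // Normal visitors get a session; PHP only falls back to a URL SID
    // if the browser refuses the session cookie.
    session_start();
}
// Spiders get the same page with no session at all, so their URLs stay clean.
?>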

1:53 pm on Feb 22, 2002 (gmt 0) | Preferred Member (joined Feb 17, 2001, posts: 409)

heini, Webguerilla, thanks a lot.
Webguerilla, SIDs are passed in the URL only when the visitor (like the fastbot) does not accept cookies.
Never thought I would have to cloak URLs one day :)

3:35 pm on Feb 22, 2002 (gmt 0) | Senior Member (joined June 26, 2000, posts: 2176)

Don't think of it as cloaking. I can't imagine any search engine (even the anti-cloaking zealot, Google) having any problem with that approach. You are delivering the same page to both spider and human. You would be doing them a big favor.

You could probably even handle it just by checking the user agent (UA).
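
For example, a quick sketch of that UA check (the crawler substrings are illustrative, taken from the log lines above plus a couple of other well-known bots):

<?php
// Skip the session when the user-agent string looks like a known crawler.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (!preg_match('/FAST-WebCrawler|Googlebot|Slurp/i', $ua)) {
    session_start();   // humans and unknown agents get normal session handling
}
// Recognised crawlers are served the identical page, just without a SID.
?>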