
Forum Moderators: open


urls with session_id

how to avoid multiple spidering?


starec

9:28 am on Feb 21, 2002 (gmt 0)

10+ Year Member



We started using session_ids recently and today I see in the logs that the FAST spider does not handle it well.

It repeatedly spiders the same page under different session IDs:

66.77.73.149 - - [20/Feb/2002:04:01:59 +0100] "GET /index.php3?PHPSESSID=2ab18fdd6ca1320f644422b44daffa2d HTTP/1.0" 200 59126 "-" "FAST-WebCrawler/3.3 (crawler@fast.no; [fast.no...] "-"
66.77.73.149 - - [20/Feb/2002:04:02:24 +0100] "GET /index.php3?PHPSESSID=768e7265ce0735e14ff41532e05ea0e9 HTTP/1.0" 200 59126 "-" "FAST-WebCrawler/3.3 (crawler@fast.no; [fast.no...] "-"

Is there a simple solution for avoiding this? We get many thousands of hits on these spurious URLs from the Fast bot...

heini

11:34 pm on Feb 21, 2002 (gmt 0)

WebmasterWorld Senior Member heini is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Starec, I've been trying to get you some help, without much success so far.
Perhaps you should try and mail Fast, they have been pretty responsive.

WebGuerrilla

12:20 am on Feb 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>Is there a simple solution for avoiding this?

Don't write session IDs in the URL. :)

A solution where the session ID is stored in a cookie works much better.

The other option would be to check IP addresses before writing the session IDs to the URL. If a search engine IP is detected, the server simply serves the page without the ID. All IPs not on your list get the unique ID.
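A minimal PHP sketch of that IP-check approach (the function name and prefix list are illustrative, not from the thread; the `66.77.73.` prefix is taken from the log lines above):

```php
<?php
// Hypothetical sketch: compare REMOTE_ADDR against a list of known
// spider IP prefixes before writing a session ID into the URL.

function is_spider_ip($ip, $prefixes) {
    foreach ($prefixes as $prefix) {
        // Prefix match: "66.77.73." covers the whole 66.77.73.* block.
        if (strncmp($ip, $prefix, strlen($prefix)) == 0) {
            return true;
        }
    }
    return false;
}

// Example prefixes (assumption -- maintain your own list of spider IPs).
$spider_prefixes = array('66.77.73.');

$ip = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';

if (is_spider_ip($ip, $spider_prefixes)) {
    // Spiders get plain URLs -- same page, no session ID.
    $sid_suffix = '';
} else {
    session_start();
    // SID expands to "PHPSESSID=..." only when the cookie was rejected.
    $sid_suffix = (SID != '') ? '?' . SID : '';
}

$link = '/index.php3' . $sid_suffix;
```

The drawback, as with all IP-based detection, is keeping the prefix list current as the engines add crawler addresses.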

starec

1:53 pm on Feb 22, 2002 (gmt 0)

10+ Year Member



heini, WebGuerrilla, thanks a lot.
WebGuerrilla, session IDs are passed in the URL only when the client (like the Fast bot) does not accept cookies.
Never thought I'd have to cloak URLs one day :)

WebGuerrilla

3:35 pm on Feb 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




Don't think of it as cloaking. I can't imagine any search engine (even the anti-cloaking zealot, Google) having any problem with that approach. You are delivering the same page to both spider and human. You would be doing them a big favor.

You could probably even handle it just by checking the user agent (UA).
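The UA variant might look like this in PHP (a sketch, not tested against 2002-era PHP; the bot substrings are assumptions — extend the list for your own logs):

```php
<?php
// Hypothetical sketch: skip session handling entirely when the
// user agent looks like a known crawler, so spiders never see
// PHPSESSID in links.

function is_crawler($ua) {
    // Case-insensitive substring match against known spider UAs.
    $bots = array('FAST-WebCrawler', 'Googlebot', 'Slurp');
    foreach ($bots as $bot) {
        if (stristr($ua, $bot)) {
            return true;
        }
    }
    return false;
}

$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (is_crawler($ua)) {
    // Crawlers get plain URLs.
    $link = '/index.php3';
} else {
    session_start();
    // SID is non-empty only when the session cookie was not accepted.
    $link = (SID != '') ? '/index.php3?' . SID : '/index.php3';
}
```

UA strings are trivially spoofed, but since the page content is identical either way, that doesn't matter here.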

 
