
Deprecated - Altavista, Alltheweb.com Forum

    
urls with session_id
how to avoid multiple spidering?
starec




msg:222673
 9:28 am on Feb 21, 2002 (gmt 0)

We started using session IDs recently, and today I see in the logs that the FAST spider does not handle them well.

It repeatedly spiders the same page with different session IDs:

66.77.73.149 - - [20/Feb/2002:04:01:59 +0100] "GET /index.php3?PHPSESSID=2ab18fdd6ca1320f644422b44daffa2d HTTP/1.0" 200 59126 "-" "FAST-WebCrawler/3.3 (crawler@fast.no; [fast.no...] "-"
66.77.73.149 - - [20/Feb/2002:04:02:24 +0100] "GET /index.php3?PHPSESSID=768e7265ce0735e14ff41532e05ea0e9 HTTP/1.0" 200 59126 "-" "FAST-WebCrawler/3.3 (crawler@fast.no; [fast.no...] "-"

Is there some simple solution how to avoid this? We get many thousands of hits for these spurious URLs from the fastbot...
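
For anyone wanting to measure the scale of the problem first, here is a minimal sketch that counts how many distinct session IDs a crawler has used per path. It assumes log lines in the format shown above; the function name `session_ids_per_path` and its parameters are made up for illustration.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

# Matches the request part of an access-log line, e.g. "GET /path?query HTTP/1.0"
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def session_ids_per_path(log_lines, agent_substring="FAST-WebCrawler", param="PHPSESSID"):
    """Map each requested path to the set of distinct session IDs seen for it.

    Only lines containing agent_substring are counted. Both defaults are
    taken from the log excerpt in this thread; adjust them for other crawlers.
    """
    seen = defaultdict(set)
    for line in log_lines:
        if agent_substring not in line:
            continue
        match = REQUEST_RE.search(line)
        if not match:
            continue
        parts = urlsplit(match.group(1))
        for sid in parse_qs(parts.query).get(param, []):
            seen[parts.path].add(sid)
    return dict(seen)
```

A path that shows up with many session IDs is being re-fetched as if each URL were a different page.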

 

heini




msg:222674
 11:34 pm on Feb 21, 2002 (gmt 0)

Starec, I've been trying to get you some help, without much success so far.
Perhaps you should try emailing Fast; they have been pretty responsive.

WebGuerrilla




msg:222675
 12:20 am on Feb 22, 2002 (gmt 0)

>>Is there some simple solution how to avoid this?

Don't write session IDs in the URL. :)

A solution where the session ID is stored in a cookie works much better.

The other option would be to check IP addresses before writing the session IDs to the URL. If a search engine IP is detected, the server simply serves the page without the ID. All IPs not on your list get the unique ID.
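
A rough sketch of the IP check, in Python purely for illustration (the site in question runs PHP, so the real version would live wherever the links are generated). `CRAWLER_IPS` and `link_for_client` are hypothetical names, and the single address in the list is just the one from the log excerpt earlier in the thread; a real list would have to be built and maintained from your own logs.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical allow-list of crawler addresses. The one entry here is the
# address that appears in the log excerpt earlier in this thread.
CRAWLER_IPS = {"66.77.73.149"}


def link_for_client(url, session_id, client_ip, has_cookie, param="PHPSESSID"):
    """Return url, appending the session ID only when it is actually needed.

    The ID is left out when the client already carries the session in a
    cookie, or when the request comes from a listed crawler address.
    """
    if has_cookie or client_ip in CRAWLER_IPS:
        return url
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The page content is the same either way; only the links differ, so the crawler sees one stable URL per page.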

starec




msg:222676
 1:53 pm on Feb 22, 2002 (gmt 0)

heini, WebGuerrilla, thanks a lot.
WebGuerrilla, SIDs are passed in URLs only when the client (like fastbot) does not accept cookies.
Never thought I would have to cloak the URLs one day :)

WebGuerrilla




msg:222677
 3:35 pm on Feb 22, 2002 (gmt 0)


Don't think of it as cloaking. I can't imagine any search engine (even the anti-cloaking zealot, Google) having any problem with that approach. You are delivering the same page to both spider and human. You would be doing them a big favor.

You could probably even handle it using just the UA (the User-Agent string).
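
Something along these lines, sketched in Python for illustration only. The token list and both function names are assumptions; the only token included is the one visible in the log excerpt earlier in the thread.

```python
# Lower-cased substrings that identify crawlers by User-Agent.
# Hypothetical list; "fast-webcrawler" comes from the log excerpt in this thread.
CRAWLER_AGENT_TOKENS = ("fast-webcrawler",)


def is_known_crawler(user_agent):
    """True if the User-Agent header contains one of the listed crawler tokens."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_AGENT_TOKENS)


def should_put_sid_in_url(user_agent, has_cookie):
    """Fall back to a URL session ID only for cookieless clients that are not crawlers."""
    return not has_cookie and not is_known_crawler(user_agent)
```

Keep in mind the UA is easy to fake, but since the spider gets the same page as everyone else, a wrong guess costs nothing more than a missing session ID.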

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved