Forum Library, Charter, Moderators: brotherhood of lan & mack

New To Web Development Forum

    
Spider Link Following
using cgi script with external links
DavidT




msg:958010
 3:28 pm on Jan 11, 2003 (gmt 0)

I'm using a cgi script to track when someone clicks on one of my outbound links. The link code looks like this:
<a href="cgi-bin/tj-e.cgi?http://www.othersite.com/">Other Site</a>

Can search engine robots follow links like this or does the '?' scare them off?

Actually, I'm hoping they can follow; I'm under the impression that Google likes sites to link out.

 

Dreamquick




msg:958011
 2:20 pm on Jan 12, 2003 (gmt 0)

Two things immediately spring to mind:

  1. Has /cgi-bin/ been excluded via robots.txt? If this isn't your site, someone could create a link-out script and then block spiders from the cgi-bin directory, either for a genuine reason (e.g. it affects their tracking stats) or in an attempt to weasel their way out of giving a "real" link.
  2. Are the URLs overly long? Lots of data in a query string will often deter a search engine from crawling a URL, mostly for fear of hitting infinitely dynamic pages. That said, there are examples of people who say that Google did crawl similar links, so you might be safe...
    [webmasterworld.com...]
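
The first check can be tried offline with Python's standard-library robots.txt parser. This is only a rough sketch: the robots.txt text, the www.example.com host, and the is_blocked helper are made up for illustration, and it says nothing about how any particular search engine applies the rules.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots_txt disallows user_agent from fetching url.

    robots_txt is the raw text of a robots.txt file (hypothetical here);
    nothing is requested over the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Assumed robots.txt that excludes the cgi-bin directory for all bots.
    robots = "User-agent: *\nDisallow: /cgi-bin/\n"
    tracked = "http://www.example.com/cgi-bin/tj-e.cgi?http://www.othersite.com/"
    print(is_blocked(robots, tracked))
    print(is_blocked(robots, "http://www.example.com/index.html"))
```

If the tracking script's URL comes back as blocked, a robot that obeys robots.txt should not request it, whatever it does with the URL text itself.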

-Tony

Brett_Tabke




msg:958012
 11:19 am on Jan 14, 2003 (gmt 0)

No, not all bots will follow the link, but the most important one (Google) will. Even if the link is routed through cgi-bin, it will get read by the sheer fact that it is in the URL. A bot doesn't necessarily have to "click" the link to "follow" the link (unless you encode the link).

Dreamquick




msg:958013
 11:58 am on Jan 14, 2003 (gmt 0)

Brett,

It doesn't necessarily have to "click" the link to "follow" the link (unless you encode the link).

David was talking about using the cgi-bin directory for links, specifically using a cgi script to link out to others. Now, although the link-through looks obvious enough, it is realistically the same as using an encoded value.

Now unless I'm missing something...

Although a spider may think that "www.example.com/cgi-bin/linkout?http://test.example.com" would link to "test.example.com", it can *never* be sure without requesting that script, because it's dealing with a dynamic server-side script without any idea of what the code behind that script actually does.
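
To make the distinction concrete, here is a minimal sketch of how a spider could guess a target from the URL text alone, without requesting the script. The guess_target helper is hypothetical and reflects no real crawler's logic; as noted above, the result is only a guess, since the script may redirect somewhere else entirely.

```python
from typing import Optional
from urllib.parse import parse_qsl, unquote, urlsplit


def guess_target(link: str) -> Optional[str]:
    """Guess the URL embedded in a link-out script's query string.

    Only the text of the link is inspected; the script is never requested,
    so the guess can be wrong.
    """
    query = urlsplit(link).query
    if not query:
        return None
    # Try the whole query string first (linkout?http://...), then each
    # parameter value (cgi-bin?redirect=http://...).
    candidates = [query] + [value for _, value in parse_qsl(query)]
    for candidate in candidates:
        candidate = unquote(candidate)
        if candidate.startswith(("http://", "https://")):
            return candidate
    return None


if __name__ == "__main__":
    print(guess_target("http://www.example.com/cgi-bin/linkout?http://test.example.com"))
    print(guess_target("http://foo.com/cgi-bin?redirect=http://bar.com"))
    # An opaque identifier (hypothetical "id" parameter) gives nothing to guess from.
    print(guess_target("http://foo.com/cgi-bin/out?id=42"))
```

The last call shows the "encoded value" case: when the query string carries only an opaque identifier, the target cannot be read from the URL at all.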

- Tony

Brett_Tabke




msg:958014
 11:59 am on Jan 14, 2003 (gmt 0)

OK, clarification: even if cgi-bin is blocked by robots.txt, Google will read the following link to bar.com:

<a href="http://foo.com/cgi-bin?redirect=http://bar.com">Foo and Bar</a>

Dreamquick




msg:958015
 2:01 pm on Jan 14, 2003 (gmt 0)

Even if cgi-bin is blocked by robots.txt, Google will read the following link to bar.com

I agree it will read/note the link and potentially include it in the index, but are you saying it will access the redirecting script (which it thinks points to "bar.com"), violating robots.txt in the process?
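
The difference between noting a link and fetching it can be sketched as follows. This is an illustration under stated assumptions, not a description of what Google actually does: the plan_crawl helper, the ExampleBot user agent, and the robots.txt text are all hypothetical, and every link is assumed to sit on the same host as the robots.txt being checked.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def plan_crawl(page_url, html, robots_txt, user_agent="ExampleBot"):
    """Return (noted, fetchable) lists of absolute link URLs.

    Every link found in the HTML is noted; only links that robots_txt
    allows for user_agent are marked as fetchable.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    collector = LinkCollector()
    collector.feed(html)
    noted = [urljoin(page_url, href) for href in collector.hrefs]
    fetchable = [url for url in noted if robots.can_fetch(user_agent, url)]
    return noted, fetchable


if __name__ == "__main__":
    page = (
        '<a href="http://foo.com/cgi-bin?redirect=http://bar.com">Foo and Bar</a>'
        '<a href="/about.html">About</a>'
    )
    # Assumed robots.txt for foo.com; "/cgi-bin" without a trailing slash
    # so that it also matches the path of the example link.
    robots = "User-agent: *\nDisallow: /cgi-bin\n"
    noted, fetchable = plan_crawl("http://foo.com/index.html", page, robots)
    print(noted)
    print(fetchable)
```

Under these assumptions the redirect link appears in the noted list, so its text is available to read, but it is absent from the fetchable list, so a robot honoring robots.txt would not request the script.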

- Tony


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved