
Forum Moderators: brotherhood of lan & mack


Spider Link Following

using cgi script with external links

   
3:28 pm on Jan 11, 2003 (gmt 0)

10+ Year Member



I'm using a cgi script to track when someone clicks on one of my outbound links. The link code looks like this:
<a href="cgi-bin/tj-e.cgi?http://www.othersite.com/">
Other Site</a>

Can search engine robots follow links like this or does the '?' scare them off?

Actually, I'm hoping they can follow them; I'm under the impression that Google likes sites to link out.

2:20 pm on Jan 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Two things immediately spring to mind:

  1. Has /cgi-bin/ been excluded via robots.txt? If this isn't your site, someone could create a link-out script and then block spiders from the cgi-bin directory, either for a genuine reason (e.g. it affects their tracking stats) or in an attempt to weasel their way out of giving a "real" link.
  2. Are the URLs overly long? Lots of data in a query string will often deter a search engine from crawling a URL, mostly for fear of hitting infinitely dynamic pages. That said, there are examples of people who say that Google did crawl similar links, so you might be safe...
    [webmasterworld.com...]
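
For the first point, a rough way to check is to run the site's robots.txt rules against the tracked URL. This is only a sketch using Python's standard urllib.robotparser; the robots.txt content, the www.example.com host and the is_crawlable helper are assumptions made up for illustration, not anything taken from the site in question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the script directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical tracked outbound link in the style described above.
tracked = "http://www.example.com/cgi-bin/tj-e.cgi?http://www.othersite.com/"
plain = "http://www.example.com/index.html"

print(is_crawlable(ROBOTS_TXT, tracked))
print(is_crawlable(ROBOTS_TXT, plain))
```

A well-behaved spider that honours these rules would not request the tracking script at all, whatever the query string contains.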

-Tony

11:19 am on Jan 14, 2003 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



No, not all bots will follow the link, but the most important one(s) will (Google). Even if the link is parsed off by cgi-bin, it will get read by the sheer fact that it is in the URL. It doesn't necessarily have to "click" the link to "follow" the link (unless you encode the link).

11:58 am on Jan 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Brett,

It doesn't necessarily have to "click" the link to "follow" the link (unless you encode the link).

David was talking about using the cgi-bin directory for links, specifically using a CGI script to link out to others. Although the link-through looks obvious enough, it is realistically the same as using an encoded value.

Now unless I'm missing something...

Although a spider may think that "www.example.com/cgi-bin/linkout?http://test.example.com" would link to "test.example.com", it can *never* be sure without requesting that script, because it's dealing with a dynamic server-side script and has no idea what the code behind that script actually does.

- Tony

11:59 am on Jan 14, 2003 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Ok, clarification: even if cgi-bin is blocked by robots.txt, Google will read the following link to bar.com

<a href="http://foo.com/cgi-bin?redirect=http://bar.com">Foo and Bar</a>
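
To illustrate what "reading" the link without requesting it could look like, here is a minimal sketch that pulls an embedded destination out of an href using only string parsing. The embedded_target function and its heuristics are assumptions for illustration; nothing in this thread shows how Google actually does it, and the result is only a guess at where the script redirects.

```python
from typing import Optional
from urllib.parse import parse_qs, unquote, urlsplit

URL_PREFIXES = ("http://", "https://")


def embedded_target(href: str) -> Optional[str]:
    """Guess the destination URL embedded in a redirect-style href.

    Nothing is requested; the href is only parsed as text.
    """
    query = urlsplit(href).query
    if not query:
        return None

    # Case 1: the whole query string is the URL, e.g. tj-e.cgi?http://...
    candidate = unquote(query)
    if candidate.startswith(URL_PREFIXES):
        return candidate

    # Case 2: the URL is the value of a named parameter, e.g. ?redirect=http://...
    for values in parse_qs(query).values():
        for value in values:
            if value.startswith(URL_PREFIXES):
                return value
    return None


print(embedded_target("cgi-bin/tj-e.cgi?http://www.othersite.com/"))
print(embedded_target("http://foo.com/cgi-bin?redirect=http://bar.com"))
```

Note that this only works while the destination is visible in plain text; an encoded or ID-based link (say, ?id=42) gives a parser like this nothing to read.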

2:01 pm on Jan 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Even if cgi-bin is blocked by robots.txt, Google will read the following link to bar.com

I agree it will read/note the link and potentially include it in the index, but are you saying it will access the redirecting script (which it thinks points to "bar.com"), violating robots.txt in the process?

- Tony

 
