aol - does it spider sites itself

does it?

         

mr_dredd

12:50 pm on Feb 18, 2001 (gmt 0)



I read somewhere that AOL runs its own spider on sites it gets from DMOZ. Does anyone know if this is true, or is it just Inktomi results?

:) any help appreciated

Brett_Tabke

6:19 pm on Feb 18, 2001 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Results are coming from Ink, Goto, and the ODP. Ink was spidering the ODP at one point, but I don't know if that is the case now.

What you may be referring to is AOL's proxy cache spider that retrieves pages from the internet for AOL users. When an AOL user clicks a link, the browser requests the page from the AOL proxy cache server. If the page isn't in the cache, then AOL sends out a spider to download the page, stuff it in the cache, and finally return it to the user. In your website logs, you will often see things like "spider-xyz.aol.com" and that's what they are up to.
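The fetch-on-miss flow described above can be sketched roughly as follows. This is only an illustration of the general idea, not AOL's actual implementation; `ProxyCache` and `fetch_from_origin` are hypothetical names made up for the example.

```python
from typing import Callable, Dict


class ProxyCache:
    """Toy fetch-on-miss cache. All names here are hypothetical."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        # fetch_from_origin plays the role of the "spider" that downloads a page.
        self._fetch = fetch_from_origin
        self._store: Dict[str, str] = {}

    def get(self, url: str) -> str:
        if url in self._store:
            # Cache hit: the origin site sees no request at all.
            return self._store[url]
        # Cache miss: fetch the page, stuff it in the cache, return it.
        body = self._fetch(url)
        self._store[url] = body
        return body
```

With a cache like this, only the first request for a given URL reaches the origin server, which is why a single spider hit in the logs may stand in for many actual page views.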

mr_dredd

9:14 pm on Feb 18, 2001 (gmt 0)



thanks very much for the clarification brett!!

skibum

4:28 am on Feb 23, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So AOL stores its search results in a proxy cache server for users of AOL? Does that mean that when an AOL user requests a page that has been cached, it will not show in the site's server logs if the page is served up from the AOL cache?

AOL could then blend click through data with on-page criteria for ranking purposes. Just trying to understand what all is involved in ranking in AOL and how what Brett mentioned would affect referral data in the log files.

skirril

3:05 am on Mar 1, 2001 (gmt 0)

10+ Year Member



All hits I get from aol either are cache-something.proxy.aol.com (in case of a GET request) or spider-something.proxy.aol.com (in case of a POST request).
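A minimal sketch of sorting log hostnames along the lines just described, assuming the access log already contains reverse-DNS hostnames. The function name `classify_aol_host` and the label strings are made up for illustration, and the prefixes are taken only from what is reported in this thread.

```python
def classify_aol_host(hostname: str) -> str:
    """Label a reverse-DNS hostname from an access log.

    Returns "cache", "spider", "other-aol", or "not-aol".
    """
    host = hostname.strip().lower().rstrip(".")
    if not (host == "aol.com" or host.endswith(".aol.com")):
        return "not-aol"
    # The leftmost label carries the cache-/spider- prefix seen in the logs.
    first_label = host.split(".", 1)[0]
    if first_label.startswith("cache-"):
        return "cache"
    if first_label.startswith("spider-"):
        return "spider"
    return "other-aol"
```

For example, `classify_aol_host("spider-xyz.aol.com")` returns `"spider"`.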

It would be logical that pages already in the AOL cache would not generate requests at your site (unless they need a POST). The same goes for pages in the Google cache, which is one reason why web statistics can never be accurate.

To me, it looks like the aol cache is configured to have a fairly short timeout/rapid aging of sites. I have seen requests from the aol proxy for the same page within 30-35mins from each other.
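One way to check this kind of observation against your own logs is to measure the gap between consecutive proxy requests for the same page. The sketch below assumes the log lines have already been parsed into `(timestamp, path)` pairs; `refetch_gaps` is a hypothetical helper, not part of any log-analysis package.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def refetch_gaps(hits: Iterable[Tuple[datetime, str]]) -> Dict[str, List[timedelta]]:
    """Map each URL path to the gaps between consecutive requests for it."""
    times_by_path: Dict[str, List[datetime]] = {}
    for when, path in hits:
        times_by_path.setdefault(path, []).append(when)
    gaps: Dict[str, List[timedelta]] = {}
    for path, times in times_by_path.items():
        times.sort()
        # Pair each request with the next one for the same path.
        gaps[path] = [later - earlier for earlier, later in zip(times, times[1:])]
    return gaps
```

Consistently short gaps for the same page would be in line with a short cache timeout, though they could also come from different cache servers fetching independently.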

WebGuerrilla

4:13 am on Mar 1, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



AOL did at one time (not sure if they still do or not) spider the page associated with the ODP listing and add the additional content to their database. If I remember correctly, this was an Inktomi spider that handled this process.

If you go there and do some searches you will quite often find ODP results being returned that do not contain any of your search words in the title/description. That's where they are coming from.

JohnNovaNYC

5:12 am on Mar 1, 2001 (gmt 0)



WebG, this is true. My site was accepted into ODP on 2/7 and then it began to show up in the AOL database after the 22nd of February.

skibum

8:15 am on Mar 1, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



WebGuerrilla - I absolutely concur with what you said, and believe that AOL definitely spiders the page in addition to utilizing the ODP title/description when determining rankings.

The caching process and click-through data do not seem to me to be very important in determining rankings for AOL. (Whereas YAHOO! seems to place sites based on keywords in the category string, title, and description, and then bump the rankings up or down based on search queries and click-throughs.) Any idea if AOL uses their caching (sp?) or click-throughs to adjust rankings? I don't think they do, but just wondering if anyone has observed anything different.

Aaron

8:42 pm on Mar 25, 2001 (gmt 0)

10+ Year Member



Interesting, so this is what the spider-aol stuff I have been watching is.

I also got accepted through ODP first, and given my site was very new, the material keeps changing, and the current AOL listing seems to have more recent content than what I initially submitted to ODP.

Robert Charlton

6:16 am on Mar 27, 2001 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



>>If you go there and do some searches you will quite often find ODP results being returned that do not contain any of your search words in the title/description<<

A site I've optimized is just being listed in the engines after its first appearance in ODP around 3/8. Prompted by this thread, I just checked it on AOL, and either AOL is supporting stemming, or they have spidered (part of?) the site.

I remember hearing way back that they spidered the home page. If this is the case, it puts an extra load on what you include in the home page. Anyone have any thoughts as to how deep they might be going?

NFFC

7:00 am on Mar 27, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Anyone have any thought as to how deep they might be going?

From my end I only see an influence from data on the home page and if this is updated [re-spidered?] it is done so very infrequently.

When submitting to ODP I work on the basis that the site will be ranked at AOL on the basis of the ODP listing and the contents of the index page *at the time of submission*.

WebGuerrilla

5:26 pm on Mar 27, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That is the same approach I take. I haven't seen anything that would indicate they re-spider the home page very often, so it is something that you need to address before you submit to ODP.

I'm also fairly confident that the home page is the only page getting indexed.

olias

11:03 am on Apr 2, 2001 (gmt 0)

10+ Year Member



I also believe that AOL must spider ODP sites based on some of the searches that have found me from AOL.
I have a site with a few deep linked pages (please don't hold that against me!), interestingly all of my ODP pages were just hit within a minute by what looks like a standard AOL user:-
152.163.188.162 - 152.163.188.231
Mozilla/4.0+(compatible;+MSIE+5.0;+AOL+5.0;+Windows+98;+DigExt)
And there is no referer on any of these.. sorry if I'm overlooking something obvious but it looked like spidering to me!
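A rough sketch of flagging hits like these in a log: the address range is the one quoted in this post, while the function name `looks_like_proxy_fetch` and the treatment of `"-"` as an empty referer (a common access-log convention) are assumptions for illustration.

```python
import ipaddress

# Range quoted in the post above.
RANGE_START = ipaddress.IPv4Address("152.163.188.162")
RANGE_END = ipaddress.IPv4Address("152.163.188.231")


def looks_like_proxy_fetch(client_ip: str, referer: str) -> bool:
    """True if the hit comes from the quoted range and carries no referer."""
    try:
        ip = ipaddress.IPv4Address(client_ip)
    except ValueError:
        # Not a valid IPv4 address (e.g. a hostname or a malformed field).
        return False
    no_referer = referer.strip() in ("", "-")
    return RANGE_START <= ip <= RANGE_END and no_referer
```

A match only says the request fits the pattern described here; it cannot tell whether a person or an automated fetch was behind it.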

Dave

Brett_Tabke

12:14 pm on Apr 2, 2001 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Dave, that is the proxy cache spider (see second post above).