do jaguars eat pdfs?

lucy24

1:25 am on Sep 18, 2012 (gmt 0)


IP: 85.31.219.0/25 within 85.31.192.0/19 (France, Jaguar. I assume this is jaguar-network dot com and not the car)
UA: HttpComponents/1.1

I wouldn't normally notice a robot I've only met twice,* but this one's got an oddity: it only eats PDFs. First saw it in May when it ate all three PDFs from an extremely obscure directory (took it a month to find them). Just showed up again to eat a brand-new PDF that's only been indexed a few days.
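For anyone who wants to check their own logs for the same visitor, here is a minimal sketch that flags requests matching the range and user agent reported above and asking for a PDF. The function name `is_pdf_eater` and the sample paths are made up for illustration; only the CIDR ranges and the UA string come from the post, and exact UA matching is an assumption (the robot may send variants).

```python
import ipaddress

# Values reported in the post above.
JAGUAR_NET = ipaddress.ip_network("85.31.219.0/25")
JAGUAR_PARENT = ipaddress.ip_network("85.31.192.0/19")
JAGUAR_UA = "HttpComponents/1.1"


def is_pdf_eater(ip: str, user_agent: str, path: str) -> bool:
    """Return True if the request comes from the reported /25,
    carries the reported UA, and asks for a .pdf file.

    Hypothetical helper: exact UA matching is an assumption."""
    in_range = ipaddress.ip_address(ip) in JAGUAR_NET
    ua_match = user_agent.strip() == JAGUAR_UA
    # Ignore any query string before checking the extension.
    wants_pdf = path.split("?", 1)[0].lower().endswith(".pdf")
    return in_range and ua_match and wants_pdf


# Sanity check: the /25 really sits inside the /19 quoted in the post.
NESTED = JAGUAR_NET.subnet_of(JAGUAR_PARENT)
```

Feed it the IP, UA, and request path from each parsed log line; a /25 covers only addresses .0 through .127, so hits from higher addresses in the same /24 will not match.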

What do you suppose it does with them?

And, for that matter, how does it find them? Does it hit up the search engines every 24 hours, or possibly every five minutes, for a list of newly indexed PDFs?


* Found one recently that had been slipping under the radar almost daily for several months.

wilderness

1:38 am on Sep 18, 2012 (gmt 0)


FWIW, Google indexes PDFs on a separate crawl. Furthermore, when crawling PDFs, Google is not robots.txt compliant.

I store my PDFs (used to have many) in image folders, which robots.txt asks all bots to stay out of, and Google wouldn't even slow down when entering such a directory for a PDF.

Thus I'm assuming your Jaguar is simply following the leader.
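As a point of comparison, this is a rough sketch of what a robots.txt-compliant client would decide for a PDF kept in a disallowed image folder, using Python's standard `urllib.robotparser`. The `/images/` path, the `example.com` URLs, and the `SomeBot` name are placeholders, not details from this thread, and the sketch says nothing about what Google or the Jaguar robot actually do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every bot is asked to skip the image folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if a compliant client with this UA may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant bot should skip the PDF in the disallowed folder
# but may still fetch pages outside it.
pdf_ok = allowed("SomeBot", "http://example.com/images/manual.pdf")
page_ok = allowed("SomeBot", "http://example.com/index.html")
```

Any request in the logs for a URL where this check is false came from a client that either never read robots.txt or chose to ignore it.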

lucy24

5:30 am on Sep 18, 2012 (gmt 0)


"FWIW, Google indexes PDFs on a separate crawl."

Yes, they use a special PDFbot that wears fewer clothes than the usual Googlebot, presumably because PDFs tend to be heavy. In fact, Google indexed this particular PDF before it finished the three fat pages of which the PDF is only a snippet.

keyplyr

7:43 am on Sep 18, 2012 (gmt 0)


The Apache HttpComponents™ project is responsible for creating and maintaining a toolset of low level Java components focused on HTTP and associated protocols. This project functions under the Apache Software Foundation (http://www.apache.org), and is part of a larger community of developers and users.