Forum Moderators: mack
I am a newbie at spider programming, though I've spent the past year parsing external web pages for info crawling with
regexps, PHP, custom DBs...
I want to extend the services offered by our company with
a web spider. I've been taking a look at Snoopy and similar classes, and it looks interesting, but I also need the keyword indexing side: strategies, options, phrases, word density, etc...
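On the word-density point, here's a minimal sketch (in Python rather than PHP, just for illustration) of how you might count keyword frequencies once you've stripped a page down to plain text. The tokenizing regexp is an assumption; a real indexer would also handle stopwords and stemming:

```python
from collections import Counter
import re

def word_density(text, top_n=10):
    """Return (word, count, density) for the top_n words in a text."""
    words = re.findall(r"[a-z']+", text.lower())  # crude tokenizer (assumption)
    total = len(words)
    counts = Counter(words)
    return [(w, c, c / total) for w, c in counts.most_common(top_n)]

sample = "spider spider crawl index spider crawl"
for word, count, density in word_density(sample, top_n=3):
    print(word, count, round(density, 2))
```

Feed it the text extracted from each crawled page and you have the raw numbers that density/keyword strategies are built on.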
(I hate these kinds of messages from lazy people saying
"Does anyone know how I can do what I should be searching for instead of requesting?", so I won't do that again)
So: can anyone point me to the first steps on this stuff?
----
Additional questions:
1. Can I spoof or accept a cookie set by JavaScript and resend it when spidering cookie-protected pages?
2. Any ideas on how to spider PDF files, apart from fetching the file, converting it with pdf2html, and parsing the result?
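On question 1: a spider never executes JavaScript, so a JS-set cookie never arrives via a `Set-Cookie` header; the usual workaround is to replicate the value the script would compute and send it by hand. A sketch using Python's standard library (the cookie name, value, and URL are hypothetical placeholders):

```python
import urllib.request

# A cookie set by document.cookie in JavaScript is invisible to the spider,
# so we hand-craft the Cookie header ourselves and resend it on each request.
# "session_id=abc123" and the URL are hypothetical placeholders.
req = urllib.request.Request(
    "http://example.com/protected-page",
    headers={"Cookie": "session_id=abc123"},
)

# response = urllib.request.urlopen(req)  # uncomment to actually fetch
```

If the server first hands out the cookie via a normal `Set-Cookie` response header (and only the *reading* happens in JS), a standard cookie jar will capture and resend it automatically.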
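On question 2: an alternative to the pdf2html route is `pdftotext` from poppler-utils, which dumps the PDF straight to plain text so there's no HTML to parse afterwards. A sketch, assuming `pdftotext` is installed on the box:

```python
import subprocess

# poppler-utils command; "-layout" tries to preserve column layout (assumption:
# pdftotext is installed and on PATH).
PDFTOTEXT = ["pdftotext", "-layout"]

def pdf_to_text(pdf_path):
    """Dump a PDF to plain text via pdftotext; "-" sends output to stdout."""
    cmd = PDFTOTEXT + [pdf_path, "-"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# usage (needs a real PDF and poppler installed):
# text = pdf_to_text("whitepaper.pdf")
```

The extracted text can then go through the same keyword/density pipeline as ordinary HTML pages.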
P.S.: I'll compile the related info for a future post.