|FAST's inability to deep crawl|
Apparently does not deep crawl PHP-driven pages
| 2:26 am on Feb 8, 2003 (gmt 0)|
The one downside I've seen to using dynamically driven content on my site (powered by PHP) is that search engines like FAST and Inktomi adamantly refuse to deep crawl my site. They basically skim off the top and then leave. Wimps. :D
Google has deep crawled my PHP pages almost since the beginning without so much as a ho hum, and recently I noticed that Teoma/Ask.com has quite unexpectedly started deep crawling my site too. I don't think it's a time issue, since my site has been online for over a year. That's why I question the idea of FAST being an engine that could overtake Google, since they are apparently incapable of deep crawling anything other than static content. Am I wrong?
| 5:33 am on Feb 8, 2003 (gmt 0)|
I'm not sure if it's just PHP pages. I've submitted all four of my sites (admittedly small, averaging just over 20 pages each) to FAST/ATW in the past year... and only the index pages got indexed. Going by the logs, the spider never went past /robots.txt and /index.html. These are all plain vanilla HTML pages, no scripts, nada. If I want a specific page in ATW, I need to submit it manually... so I do. Everything I've submitted has gotten in, usually in about a month. As far as I can tell, this is perfectly normal for FAST... But maybe it's just me.
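For anyone who wants to run the same log check, here is a minimal sketch. It assumes the server writes Apache-style Combined Log Format; the function name `paths_fetched_by`, the sample lines and the "ExampleBot" user agent are all made up for illustration, so substitute whatever agent string shows up in your own logs.

```python
import re

# Combined Log Format (assumed):
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def paths_fetched_by(log_lines, agent_substring):
    """Return the distinct paths requested by agents whose name contains agent_substring."""
    seen = []
    needle = agent_substring.lower()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group("agent").lower():
            path = match.group("path")
            if path not in seen:
                seen.append(path)
    return seen


# Hypothetical sample data; "ExampleBot" stands in for a real crawler name.
sample_log = [
    '10.0.0.1 - - [08/Feb/2003:05:33:00 +0000] "GET /robots.txt HTTP/1.0" 200 68 "-" "ExampleBot/1.0"',
    '10.0.0.1 - - [08/Feb/2003:05:33:02 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "ExampleBot/1.0"',
    '10.0.0.2 - - [08/Feb/2003:05:40:00 +0000] "GET /about.html HTTP/1.1" 200 2048 "-" "Mozilla/4.0"',
]

crawled = paths_fetched_by(sample_log, "examplebot")
```

If the list only ever contains /robots.txt and the index page, the spider is not following links any deeper on that visit.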
| 4:55 pm on Feb 8, 2003 (gmt 0)|
Bluestreak, I can't say whether the FAST spiders have trouble indexing dynamic content or whether it's more of a conceptual issue.
I'm still not sure where FAST is going with their index, be it for their own engine, ATW, or for their portal partners.
On the one hand, they have the second largest index worldwide. On the other, they have issued statements to the effect that they do not want to index everything on the web.
| 5:11 pm on Feb 8, 2003 (gmt 0)|
It's most certainly not the fact that you are using PHP or any other technology to produce dynamic content. I'm using either PHP or Perl/mod_perl for all my sites and they all get completely indexed.
I'd be surprised if any SE spider were to look for the X-Powered-By: PHP/4.x.x response header field and were to exclude resources because of its presence.
Most likely this is an issue with query strings in your URLs.
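To see which of your URLs carry a query string, something along these lines would do. It's only a sketch: the helper names and the example.com URLs are made up, and how any particular spider actually treats such URLs is a guess on my part, not something the engines have documented here.

```python
from urllib.parse import urlsplit, parse_qsl


def looks_dynamic(url):
    """True if the URL carries a query string (the part after '?')."""
    return bool(urlsplit(url).query)


def query_parameters(url):
    """Return the query string as a list of (name, value) pairs."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


# Hypothetical URLs for illustration only.
urls = [
    "http://www.example.com/articles/42.html",
    "http://www.example.com/article.php?id=42&sid=abc123",
]

for url in urls:
    label = "query string" if looks_dynamic(url) else "no query string"
    print(f"{url}: {label} {query_parameters(url)}")
```

A .php page without a query string is reported the same way as a static page here, which matches the point above that the file extension itself is unlikely to be what a spider objects to.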
| 2:39 am on Feb 9, 2003 (gmt 0)|
Thanks for the responses. Since the engine does seem to be able to index PHP, I wonder why it has not deep crawled my site to this day. Perhaps it's because I never paid a fee to submit a listing, or some such thing. I know Inktomi requires a fee. They have skimmed my pages over the months but never deep crawled either, and that could be the reason.