I read the page provided [alltheweb.com] and it does say it's not 100% accurate.
I can only guess that it's a filter looking for the most common words, phrases and filenames that have been defined by FAST as "offensive." If those words appear on your site it would explain the missing pages when the filter is on.
I doubt we'll ever get to see the list. However, you could build your own by finding the missing pages and compiling the likely trigger words into a spreadsheet or database.
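A rough way to start such a list, as a minimal sketch: it assumes you have already saved the result URLs for one query with the filter off and on, plus the text of each missing page. The function names and the `min_pages` threshold are made up for illustration and have nothing to do with how FAST actually works.

```python
import re
from collections import Counter

def missing_pages(unfiltered, filtered):
    """URLs that show up with the filter off but not with it on (order kept)."""
    filtered_set = set(filtered)
    return [url for url in unfiltered if url not in filtered_set]

def candidate_words(page_texts, min_pages=2):
    """Count in how many missing pages each word occurs; keep recurring ones.

    min_pages is an arbitrary cut-off chosen for this sketch.
    """
    counts = Counter()
    for text in page_texts:
        # set() so that a word is counted once per page, not once per occurrence
        counts.update(set(re.findall(r"[a-zäöüß]+", text.lower())))
    return [(word, n) for word, n in counts.most_common() if n >= min_pages]
```

Words that recur across many missing pages but rarely on the pages that survive the filter are the ones worth putting in the spreadsheet.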
If I search with words that could be marked as "offensive" and with the filter switched on, the result count shown at the top of the SERPs is lower than for the same search without this option. But I still get results!
Surely the filter checks combinations.
But I have an idea: my problem may be the mixture of English and German. There are some English keywords and also English books on the pages, but most of the text is in German.
Could it be that the filter algorithm depends on the density of possibly offensive words? And that the filter counts only English words when calculating it, because it deals only with English content?
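To make that guess concrete, here is a small sketch of the two ways such a density could be computed. Everything in it is an assumption for illustration: the flagged word list, the English vocabulary set and both function names are hypothetical, and nothing here is known about the real filter.

```python
import re

def _words(text):
    # Lowercase tokens; umlauts and ß included so German words stay whole
    return re.findall(r"[a-zäöüß]+", text.lower())

def density_all_words(text, flagged):
    """Share of flagged words among all words on the page."""
    words = _words(text)
    if not words:
        return 0.0
    return sum(w in flagged for w in words) / len(words)

def density_english_only(text, flagged, english_vocab):
    """Share of flagged words among the English words only.

    This models the guess that German words are simply ignored.
    """
    english = [w for w in _words(text) if w in english_vocab]
    if not english:
        return 0.0
    return sum(w in flagged for w in english) / len(english)

# Hypothetical page: two English words, five German ones
page = "adult books und viele deutsche wörter hier"
print(density_all_words(page, {"adult"}))                         # 1/7
print(density_english_only(page, {"adult"}, {"adult", "books"}))  # 0.5
```

If the second calculation were the one in use, a mostly German page with a handful of English keywords would look far "denser" than it really is, which would fit what I am seeing.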
h_b_k, where do you check? ATW (AlltheWeb) checks only for English words. T-Online, however, filters FAST results for German words. I'm not sure whether T-Online, for example, would use both filters for mixed English/German content; I wouldn't suppose so. It would take some testing, cross-checking results between T-Online and ATW with the filters on and off.
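The cross-check could be organised roughly like this, assuming the result URLs for one query have been collected by hand for each engine with the filter on and off. The `classify` function and the layout of the `results` dictionary are hypothetical, just one way to lay the data out.

```python
def classify(results):
    """Report, per URL, which engines hide it when their filter is on.

    results maps (engine, filter_on) -> list of URLs for the same query.
    Both the filter-on and filter-off lists must be present for each engine.
    """
    all_urls = set().union(*results.values())
    report = {}
    for url in sorted(all_urls):
        report[url] = sorted(
            engine
            for (engine, filter_on), urls in results.items()
            if filter_on
            and url not in urls
            and url in results[(engine, False)]
        )
    return report

# Made-up data for illustration only
results = {
    ("atw", False): ["a", "b"],
    ("atw", True): ["a"],
    ("t-online", False): ["a", "b", "c"],
    ("t-online", True): ["a", "c"],
}
print(classify(results))
```

A URL hidden by both engines points to a shared filter; a URL hidden by only one points to a filter specific to that engine.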
I have checked both alltheweb.com and the t-online.de search.
With alltheweb.com I can toggle the filter option off and on. When I use t-online.de with my test phrases, I get the filtered SERPs.
ADDITION: On the t-online SERP there is also an option to toggle the family filter ("Familienfilter ausschalten"/"Familienfilter einschalten"). After toggling the filter to "off", I see the same results as with the ATW filter off. Does this mean that t-online results are pre-filtered with English phrases?
Looking at the other results in the t-online.de SERPs, my content should not have any problem with the German family filter. But it has no chance because of the English filter.