Msg#: 4382821 posted 10:59 pm on Nov 2, 2011 (gmt 0)
Sheesh! Out of curiosity I just had a look at gluetext dot com, typed in two good keywords for one of my sites, and their search engine (!?) began processing in real time. Where they get the original URLs from I don't know (maybe one of the big three, because there is no way they can crawl the entire internet in under a minute), but they obviously did find my site and promptly started to rip through it (or at least attempted to), in real time, while I was waiting at their site for the results to show. Fortunately, I have a good system in place to prevent this; otherwise they would have gotten away with everything. As it is, they got nothing.
UA : Mozilla/4.76 [en] (Win98; U)
22.214.171.124 no referrer no robots.txt
It goes without saying, the whole range is now blocked!
Msg#: 4382821 posted 7:10 pm on Nov 3, 2011 (gmt 0)
Thank you keyplyr, I was just curious if their behaviour would be different when hitting a site that has blocked them (yours) and one that had not yet (mine).
After I entered my keywords I waited a moment while their progress thingy was running. Then I thought it was taking its time (in internet terms) to show the first results, and on a hunch I went to look at what was happening on my site in the meantime and saw them trying to grab page after page. Whoa, I immediately closed the browser tab with their website and their visit ended.
During that time they attempted to access 58 pages in about 1 min 46 sec.
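For anyone who wants to put a number on that: 58 pages in 106 seconds works out to roughly one request every 1.8 seconds, or about 33 pages per minute. Below is a minimal sketch of how a burst like that could be flagged from timestamps alone. It is not the system I mentioned above; the names `pages_per_minute` and `SlidingWindowCounter`, and the limit of 10 hits per 30 seconds, are made up purely for illustration.

```python
from collections import deque


def pages_per_minute(pages: int, seconds: float) -> float:
    """Convert a page count over a time span into a per-minute rate."""
    return pages * 60.0 / seconds


class SlidingWindowCounter:
    """Counts one client's requests inside a moving time window.

    Hypothetical helper: the threshold values are assumptions,
    not figures taken from any real blocking setup.
    """

    def __init__(self, max_hits: int, window_seconds: float):
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self.hits = deque()  # request timestamps, oldest first

    def hit(self, timestamp: float) -> bool:
        """Record a request; return True once the limit is exceeded."""
        self.hits.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while self.hits and timestamp - self.hits[0] >= self.window_seconds:
            self.hits.popleft()
        return len(self.hits) > self.max_hits


# Rate observed in this thread: 58 pages in 1 min 46 sec (106 seconds).
rate = pages_per_minute(58, 106)

# Replay 58 evenly spaced requests over 106 seconds against an
# assumed limit of 10 hits per 30 seconds.
counter = SlidingWindowCounter(max_hits=10, window_seconds=30)
flags = [counter.hit(i * 106 / 57) for i in range(58)]
```

With those assumed limits the replayed visit trips the counter on its eleventh request, under 20 seconds in. A real setup would keep one counter per IP or per range and would need limits tuned so that ordinary visitors and legitimate crawlers are not caught.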