Forum Moderators: bakedjake
Looks impressive so far indeed. I'm really curious about any increase/decrease in relevance, once there's a significant number of sites indexed.
A few things to note, most of which you probably know already:
At this time of day: submitted, spidered and listed within 3 mins ;)
On the robots.txt issue - I agree - a good bot should obey the rules (more work). The downside with robots.txt is that it's down to the spider to enforce the rules - it would be nice to have an Apache module that sent 403s based on the robots.txt rules.
Me, I don't bother with robots.txt - if there is a bad bot, mod_rewrite sorts it.
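For anyone wondering what server-side enforcement of robots.txt could look like, here is a minimal sketch in Python rather than an actual Apache module. It uses the standard library's urllib.robotparser to decide between a 200 and a 403 for a given user agent and path. The helper names (build_parser, status_for), the "BadBot" agent and the sample rules are made up for illustration, and a real setup would also have to cope with bots that lie about their user agent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample rules, not taken from any real site.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text once so every request can be checked against it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def status_for(parser: RobotFileParser, user_agent: str, path: str) -> int:
    """Return 403 when the rules disallow this agent on this path, else 200."""
    return 200 if parser.can_fetch(user_agent, path) else 403


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for agent, path in [
        ("BadBot/1.0", "/index.html"),
        ("Mozilla/5.0", "/index.html"),
        ("Mozilla/5.0", "/cgi-bin/links.cfg"),
    ]:
        print(agent, path, status_for(parser, agent, path))
```

The check only trusts the User-Agent header, which is the same limitation a mod_rewrite rule keyed on the user agent has.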
Also of interest to all on this topic is [webmasterworld.com...] - an interesting argument on the price of a search engine, in particular Google, with Matt here demonstrating what one determined and skilled individual can do. Maybe that lowers the price further?
Sure got our minds off Google for a few days huh? :)
One concern -- cgi scripts are not being filtered out of the SERPs. For example, search on links.cfg and you'll see what I mean. I'd say that's a target for some serious abuse.
(edited by: MarkHutch at 6:55 pm (utc) on Mar. 26, 2002)
1) Did the d/base reset again? A load of sites I put in yesterday seem to have been dumped.
2) Intermittently, I see the "Last 5" only returning 4 results. I'm clicking fairly fast, so I don't think it's a blank line coming through.
3) Is it my imagination, or is there loads of German content in there? I've seen more German language lines in the SERPs from Gigablast than anywhere else I remember outside of a dedicated German language engine.
It appears that he did. I believe he posted about it just previous to your post.
The force respider option is gone as well. I can only guess that the last five searches is a study tool for Matt at this point. I can't imagine leaving it in place.
Matt is using a black list he got free from squidguard. I am wondering how many others are using this and what the criteria are for being placed on it. I found sites of ours and others we know on these lists for no apparent reason. There were even some IPs on our server that have yet to be developed. They don't even have an index page and they're on this black list. Oh btw, if you have an adult site you're trying to get listed, this is the most likely reason it isn't, from what we are reading on these lists.
Matt: you might want to rethink using this data alone to decide what you do and don't want in your database.
Imagine increasing efficiency 3 times per day. Wow, that will cut 2 months down to 20 days, I guess (if the hardware is not the limitation, I mean).
(edited by: MarkHutch at 12:55 am (utc) on Mar. 28, 2002)