
Forum Moderators: Ocean10000


Whither Semalt and the like?

reducing list of active crawlers like semalt in htaccess

     
11:03 pm on Apr 9, 2018 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:July 9, 2017
posts:47
votes: 5


I have not seen much of the crawlers like semalt and 1-99seo in the last half year. (Even probes like Jorgee have not shown up lately.) Since my htaccess file has grown to 6.8 kB and I would like to reduce it, I am wondering if it would be safe to delete semalt etc from my list of agents to block. What do others think of this?
11:54 pm on Apr 9, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr: Top Contributor of All Time, 10+ Year Member

joined:Sept 26, 2001
posts:12913
votes: 893


Why do you want to reduce your htaccess file? Unless you have something causing problems (endless loops, nested redirects, etc) this file makes little impact on your server's response time. There are numerous other factors that play a much more significant role regarding site speed.

My htaccess file is over 200 kb and Google says my site is very fast.

Yes it is always a good idea to re-examine your blocking rules. If you diligently watch your raw server access log files and have determined some UAs have stopped being troublesome, give it a try :)

Ironically, the couple times I've removed malicious UA blocks, they show up shortly thereafter.

Note: semalt is alive and well. I see it all the time.
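For context, the kind of user-agent blocking rule being pruned here typically looks like this in mod_rewrite (a minimal sketch; the bot names below are illustrative placeholders, not anyone's actual list):

```apache
# Refuse requests whose User-Agent matches listed pests
# (names below are illustrative placeholders)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilScraper) [NC]
RewriteRule .* - [F,L]
```

Removing a bot from the list means deleting or commenting out its alternative in the `RewriteCond` pattern, which is why people prune cautiously.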
1:11 am on Apr 10, 2018 (gmt 0)

Junior Member from CA 


joined:July 9, 2017
posts:47
votes: 5


Thanks for your comments, keyplyr, very interesting. I had been under the impression that a small and well-crafted htaccess file was good for server response times, but guess I needn't worry about the size part after all. There have been no problems with it.

Your point about removing a block only to have the culprit show up again soon afterwards is well made, so I will leave the list as is, except to make additions, of course.

Thanks again for your comments. It seems my worries were needless. (The best kind to have, I suppose.)
3:30 am on Apr 10, 2018 (gmt 0)

Preferred Member from CA 


joined:Feb 7, 2017
posts: 572
votes: 58


I, too, have been told by my host provider that a smaller htaccess is faster. I don't know. I have always added comments to my htaccess (I put comments on the line above) noting the last time I saw the bot. I wish htaccess allowed a comment at the end of a line, but it does not. If it has been a couple of years, I comment the line out and watch for the bot to reappear in my raw access log. If it reappears, I remove the comment, activating the line again. I keep all versions of htaccess, so I can easily roll back.

I wrote a shell script to remove all comments before I upload the new htaccess. This cuts down the size considerably, but is less readable. I read the original, so this is not an issue.
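The comment-stripping step can be sketched with sed (a minimal sketch; the function name and filenames are assumptions, not the poster's actual script):

```shell
# A sketch of the comment-stripping idea described above (the helper
# name and filenames are hypothetical, not the poster's actual script).
# Full-line comments and blank lines are removed; directives pass through.
strip_comments() {
  sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' "$1"
}

# Typical use: keep an annotated master copy locally, upload the stripped one.
# strip_comments htaccess-annotated.txt > .htaccess
```

Note this only removes whole-line comments, which matches how htaccess comments work anyway.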

I have not commented out Semalt, 1-99seo, or the buttons attacks. They might return.
3:49 am on Apr 10, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr: Top Contributor of All Time, 10+ Year Member

joined:Sept 26, 2001
posts:12913
votes: 893


I, too, have been told by my host provider that a smaller htaccess is faster.
There was a time, think dial-up, or at least before high-speed broadband, when web servers were slow, HTTP/1.0 was the best we had, and browsers took forever to render and load a webpage.

At that time every little bit was under the microscope, and opinions were formed then that some still hold today. In my experience, asking a host provider about the mechanics of a webpage is a futile endeavor. That is not what they do. I talk to them as little as possible.

Comments in a served file are just dead weight, another throwback to yesteryear; there is no reason for comments to be in the htaccess file at all. Keep a separate file for all that info: UAs, ranges, pests, change dates, etc.
6:38 pm on Apr 10, 2018 (gmt 0)

Junior Member

10+ Year Member

joined:June 25, 2005
posts:195
votes: 2


delete semalt etc from my list of agents to block

Semalt was not in the User-Agent, but in the Referer - [webmasterworld.com...]

One month ago I saw
http://example.com.seocheckupx.net
as Referer. Same bot.
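Because these pests announce themselves in the Referer header rather than the User-Agent, the corresponding block tests `%{HTTP_REFERER}` instead (a minimal sketch; the domain list is illustrative):

```apache
# Refuse requests carrying a referrer-spam domain
# (domains below are illustrative)
RewriteEngine On
RewriteCond %{HTTP_REFERER} (semalt\.|1-99seo\.|seocheckupx\.) [NC]
RewriteRule .* - [F,L]
```

Matching on the bare domain fragment catches the subdomain trick shown above, where the spammer prefixes your own domain to theirs.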
6:44 pm on Apr 10, 2018 (gmt 0)

Preferred Member from CA 


joined:Feb 7, 2017
posts: 572
votes: 58


That Semalt bot last year, along with 1-free-share-buttons and 1-99seo, dragged me through much of South America for a couple of months. Huge numbers of IP addresses and host providers. Nasty. Yes, they were in the referrer column, not the UA.
7:05 pm on Apr 10, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24: Top Contributor of All Time, 5+ Year Member

joined:Apr 9, 2011
posts:15813
votes: 848


Oh, my, I haven't thought about semalt in years. They used to be on my list of bad referers--I still have a few--but when I changed over to header-based access controls, I looked closely and found that all robots with the semalt referer also break many other access-control rules, so the referer block is no longer needed.

On the internet as it currently exists, the main delaying factor in htaccess is not its size but its mere existence--or rather, the possibility of its existence. On every request for every file (not just pages), the server first has to check all the way up the filepath to see if it meets an htaccess in any directory along the way. Even if there wasn't one a millisecond ago when it had a request for image34.jpg, that's not to say there might not be a new one now that it's looking for image35.jpg. (And if there was one, it might have changed.)

A further delaying factor, though admittedly a tiny one, is recompilation. The config file is read just once, at server startup--or at Config Reload, if you have the option of doing that separately--and all of its Regular Expressions are compiled once and for all time; in htaccess, everything is read and compiled afresh on every single request. But you would have to have hundreds or thousands of hideously complex expressions before this really makes a difference. At that point, you're probably looking into your own server or VPS anyway.
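For those who do control the main server config, the per-directory lookup described above can be switched off entirely, keeping the rules in the vhost instead (a minimal sketch; the directory path is hypothetical):

```apache
# In the main server config (not htaccess): disable per-directory
# htaccess lookups so the server stops checking for the file at all.
<Directory "/var/www/example">
    AllowOverride None
</Directory>
```

With `AllowOverride None`, Apache skips the filepath walk on every request, which is the usual recommendation when you are not on shared hosting.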

Edit: I checked. The "semalt" element isn't present in logs from the last year or so (the ones I can check without going to extra trouble). Good riddance to bad rubbish.
7:54 pm on Apr 10, 2018 (gmt 0)

Junior Member from CA 


joined:July 9, 2017
posts:47
votes: 5


Semalt was not in the User-Agent, but in the Referer
Sorry, yes, wrong list. My bad.

All the comments have been informative and encouraging. I have already economized my comment lines somewhat in htaccess, but as the file size seems not "large" in practice (now 6.4 kB), I have left them in for the time being (for my own benefit, being too lazy to keep separate commented and uncommented versions).

The real job is to stay on top of the visiting bad guys. Trying to give them as little information as possible, I have switched all the Fails to R="404" so as to avoid "403" wherever possible. Still working on this. (I am no whiz at these regular expressions.) I certainly appreciate all the comments in the various threads here, and the knowledge and experience behind them. Thanks to all of you.
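Returning 404 instead of 403 can be done with mod_rewrite's R flag, since a status outside the 3xx redirect range drops the substitution and ends rule processing (a minimal sketch; the referrer pattern is illustrative):

```apache
# Answer blocked referrers with 404 rather than 403,
# revealing less about the blocking rules (pattern illustrative)
RewriteEngine On
RewriteCond %{HTTP_REFERER} semalt\. [NC]
RewriteRule .* - [R=404,L]
```

The same `[R=404,L]` substitution works for User-Agent or IP conditions as well.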
 
