So blocking Semrush: block bot or IP?

born2run

3:49 pm on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hey, so I've decided to block Semrush for now. I know it's an SEO tool site, but I don't care much right now.

Should I just block their User-Agent string, or block their IPs? If IPs, can anyone here please let me know the IP range for their bots? Thanks in advance!

TorontoBoy

4:30 pm on Mar 7, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



My personal notes say that Semrush requires a UA ban as well as an IP ban. They use Advanced Hosters.

[webmasterworld.com...]

# Advanced Hosters 46.229.160.0 - 46.229.175.255 semrushbot
deny from 46.229.168.0/24
# ADVANCED HOSTERS 192.243.48.0 - 192.243.63.255 Semrush bot no ua
deny from 192.243.51.0/24 192.243.53.0/24 192.243.55.0/24
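To pair with the IP denies above, here is a minimal sketch of the UA-side ban, using Apache 2.2 `Order`/`Deny` syntax to match the `deny from` lines already shown (the `bad_bot` variable name is just an example):

```apache
# Flag any request whose User-Agent mentions SemrushBot (case-insensitive)
SetEnvIfNoCase User-Agent "SemrushBot" bad_bot
Order Allow,Deny
Allow from all
# Deny flagged requests; combine with the IP-based "deny from" lines above
Deny from env=bad_bot
```

On Apache 2.4 the equivalent would use `Require not env bad_bot` inside a `RequireAll` block instead.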

keyplyr

8:25 pm on Mar 7, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you publish ads, Semrush can be a highly beneficial agent to allow. Do the research.

born2run

5:15 am on Mar 8, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ok keyplyr, I hear you. I've removed the block for Semrush. It's been hammering my site, but I guess it's OK for now.

keyplyr

5:47 am on Mar 8, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hammering? You mean actively crawling?

That's what bots do. It doesn't hurt your server. Probably 3/4 of server requests are from bots.

Semrush does many things. Some of the data it collects feeds its SEO products, giving customers comparative metrics for advertising. This can benefit you by attracting more advertisers to bid for the ad slots on your site.

TorontoBoy

2:10 pm on Mar 8, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Bot usefulness depends on your individual site. To each his/her own.

TorontoBoy

4:40 pm on Mar 8, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



@blend27, do I sense a tinge of anger, or at least frustration at being overwhelmed by bots?

blend27

5:31 pm on Mar 8, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No, no anger, and definitely not overwhelmed, not for a while now. Actually, being able to say that is a blessing!

I load IIS log files from several sites into a DB for later analysis. That's 200MB a day less of useless data (several sites' worth) that would otherwise just waste time and space in the DB.

I run sites that are ALLOW ALL (these are the sites that gather data and...) and sites that are locked down (web.config, .htaccess, hosting ranges, country IP tables, and programmatic checks: headers, rDNS, web service APIs and such). It is a looooooooot of FUN!

But then again, every Developer goes crazy in their own way :)

guggi2000

2:24 pm on Aug 28, 2018 (gmt 0)

10+ Year Member Top Contributors Of The Month



I am not happy about blocking the SEMrush bot, but how do you tell them to ignore the jsessionid parameter:

www.example.com/;jsessionid=A
www.example.com/;jsessionid=B

They think it's 2 pages and create endless sessions.

lucy24

6:04 pm on Aug 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Does the sessionID parameter have meaning for anyone? In general, you deal with unwanted parameters by redirecting requests to the parameter-less form of the URL. If necessary, add a User-Agent condition if only certain requests are to be redirected. They'll get the message eventually. Exact format of the redirect will, of course, depend on your server type.
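For an Apache setup, a minimal mod_rewrite sketch along these lines (the SemrushBot condition and the 301 status are assumptions; adjust to your situation):

```apache
RewriteEngine On
# Only redirect the bot's requests; drop this condition to redirect everyone
RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC]
# Strip a trailing ";jsessionid=..." path parameter and 301 to the clean URL
RewriteRule ^(.*);jsessionid=[^;/]*$ /$1 [R=301,L]
```

Note that jsessionid here is a semicolon path parameter, not a query string, which is why the rule matches it in the RewriteRule pattern rather than in %{QUERY_STRING}.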

:: detour to logs, because I've been ignoring them for yoincks and honestly don't know what they've been up to ::

Oh, how odd. On one site they've never (calendar year 2018) requested anything but robots.txt. (All sites share an exclusion list, so it's not that I've inadvertently denied them.) On other sites they go through long spells--up to several weeks--where they request nothing but robots.txt, and then they jump in and do major crawls again. No hanky-panky; it's the same IP and identical UA all around. Is it possible they're sometimes just checking for uptime, in which case a successful robots.txt fetch is as useful as anything else?