
LinuxGetUrl/2.0

WebOpz

2:32 pm on Apr 23, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



Does anyone have any details on LinuxGetUrl/2.0 usualprogrammers @ umich.edu? It started to show up a few weeks ago. At first it only made a few GET requests for the index page and robots.txt, but now the traffic is growing every day and I'm finding it a bit rude. Any insights are appreciated...

TIA


[edited by: not2easy at 3:00 pm (utc) on Apr 23, 2021]
[edit reason] delinked UA string [/edit]

lammert

2:44 pm on Apr 23, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



From some other reports I have found, this bot is run by students of the University of Michigan. It seems to focus on music or podcasts.

not2easy

3:06 pm on Apr 23, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Hi WebOpz, and welcome to WebmasterWorld [webmasterworld.com]

Sorry for the edits, but out of respect for privacy we don't want to feed stray bots. The UA string was edited, as mentioned in the Charter [webmasterworld.com].

WebOpz

3:17 pm on Apr 23, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



@lammert - Looks like it's pretty useful, requesting only the index page and robots.txt hundreds of times a second. Do people even test their code? LMAO!

WebOpz

3:17 pm on Apr 23, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



@not2easy Sorry about that. I will remember that in the future. =)

WebOpz

3:20 pm on Apr 23, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



I just blocked 'em. Should I redirect them to a page with 😢🎶 on it?

WebOpz

4:54 pm on Apr 26, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



They have escalated their scraping via Google Cloud resources. They completely ignore robots.txt and scrape aggressively.

not2easy

6:35 pm on Apr 26, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If you have blocked them, they cannot scrape anything. How are you blocking them?

BTW - do not redirect them to anything. It is not appreciated by either party and will not prevent their further attempts.
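If you do want to see what an outright block looks like, here is a minimal sketch, assuming Apache 2.4+ syntax (the IPs below are documentation placeholders, not the bot's actual addresses). It returns a 403 rather than sending them anywhere:

# Allow everyone except two specific IPs, which get a 403 Forbidden.
# 203.0.113.x is a reserved documentation range - substitute your own.
<RequireAll>
    Require all granted
    Require not ip 203.0.113.5
    Require not ip 203.0.113.6
</RequireAll>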

WebOpz

6:50 pm on Apr 26, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



I did block the first 2 IPs I saw when they originally appeared. Now they are across 20+ IPs on Google Cloud. FYI, I was joking about the redirect. I'd like to redirect them to my fist. =)

not2easy

6:59 pm on Apr 26, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Blocking by single IPs is not very efficient. If you look up the IP to determine who it belongs to, you can block the entire CIDR range - BUT in the case of a single UA across various IPs, you could always block by UA instead. There are many discussions about how to do that around here. One recent example is here: [webmasterworld.com...]

IMPORTANT - don't just copy and paste; you might not be happy with what happens. You can see the method with examples, but before pasting anything into your .htaccess file, be sure you understand what you're doing. Ask questions if you're not sure. ;)
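For illustration only, a rough sketch of both approaches (this assumes Apache 2.4+ with mod_rewrite enabled, and the CIDR range is just a placeholder - look up the actual allocation before blocking anything):

RewriteEngine On

# Block any request whose User-Agent contains "LinuxGetUrl",
# case-insensitive; [F] returns 403 Forbidden.
RewriteCond %{HTTP_USER_AGENT} LinuxGetUrl [NC]
RewriteRule .* - [F]

# Or block an entire CIDR range (Apache 2.4+ authorization syntax).
# 203.0.113.0/24 is a documentation placeholder, not the bot's real range.
<RequireAll>
    Require all granted
    Require not ip 203.0.113.0/24
</RequireAll>

Again - understand each line before you use it, and test on a dev copy first.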

WebOpz

4:50 pm on Apr 28, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



@not2easy Sound advice, thanks! I did end up finding out this was a CS class attempt at writing a search engine. Unfortunately for me, the code was buggy, didn't respect robots.txt, and had me assuming the worst. Anyhow, thanks again!