Please Name Your robot

         

lucy24

7:11 pm on Jun 13, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



46.188.16.250 - - [13/Jun/2017:06:38:52 -0700] "GET /robots.txt HTTP/1.1" 200 793 "-" "Mozilla/5.0 (compatible; Please Name Your robot; +http://192.168.1.33:23481/yioop/bot.php)"
...
46.188.16.250 - - [13/Jun/2017:09:58:15 -0700] "GET /dir/subdir/ HTTP/1.1" 403 1766 "-" "Mozilla/5.0 (compatible; Please Name Your robot; +http://192.168.1.33:23481/yioop/bot.php)"

Uh... Yeah.

Peter_S

7:55 pm on Jun 13, 2017 (gmt 0)

5+ Year Member Top Contributors Of The Month



lol

keyplyr

11:37 pm on Jun 13, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



yioopbot - search engine software found at: yioop.com

Their bot info page is equally economical: yioop.com/bot


[fix typo]

[edited by: keyplyr at 7:33 am (utc) on Jun 23, 2017]

lucy24

1:28 am on Jun 14, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A few years back there was a robot called the yioopbot but I kinda think it's unrelated. It definitely didn't live at 192.168, since that's

:: detour to check ::

Yah, that's what I thought. Private Use Network, meaning that we can't get there from here.

Here's the old one. It was longer ago than I thought.
173.13.143.78 - - [15/Nov/2011:10:43:06 -0800] "GET /robots.txt HTTP/1.1" 206 809 "-" "Mozilla/5.0 (compatible; YioopBot +http://www.yioop.com/bot.php)" 
Interestingly, the domain still exists, but bot.php is now a 404. Well, maybe it always was; who can remember that far back?
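
For anyone who would rather not take the detour by hand, the private-range check above can be sketched in a few lines of Python. This is only an illustration: the function name `ua_info_host_is_private` and the regular expression for pulling the `+http://...` URL out of the user-agent string are assumptions made for this example, not part of any crawler or log tool.

```python
import ipaddress
import re
from urllib.parse import urlsplit


def ua_info_host_is_private(user_agent):
    """Return True if the +URL in a UA string points at a private-use IP literal.

    Hypothetical helper: the "+http://..." convention is common in bot UA
    strings, but nothing guarantees a given bot follows it.
    """
    match = re.search(r"\+(https?://[^\s;)]+)", user_agent)
    if not match:
        return False  # no info URL in the UA string at all
    host = urlsplit(match.group(1)).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        return False  # host is a DNS name, not an IP literal


new_ua = ("Mozilla/5.0 (compatible; Please Name Your robot; "
          "+http://192.168.1.33:23481/yioop/bot.php)")
old_ua = "Mozilla/5.0 (compatible; YioopBot +http://www.yioop.com/bot.php)"

print(ua_info_host_is_private(new_ua))
print(ua_info_host_is_private(old_ua))
```

The standard-library `ipaddress` module treats 192.168.0.0/16 as private, so the 2017 string is flagged, while the 2011 string, which names a real host, is not.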

keyplyr

3:24 am on Jun 14, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



who can remember that far back?
Eat more fish.

cpollett

4:03 am on Jun 23, 2017 (gmt 0)

10+ Year Member



The Yioop search engine software can be downloaded at seekquarry.com. By default, if you crawl without configuring anything, it will identify itself as "Please Name Your Robot", so someone downloaded the software and didn't configure anything before crawling. The last time YioopBot (the crawler at Yioop.com that makes use of Yioop software) was used for a major crawl (a billion pages) was around 2015. I did some small 30-million-ish page crawls at the start of the new year (2017) of just Canadian sites at Findcan.ca (identified as FindcanBot), but nothing since January. I am currently working on some other projects (I am writing a PHP IMAP server and realizing IMAP is overly complicated) and waiting for SSD prices to come down a bit before trying any major crawls. I also want to revamp some of the indexing internals. My guess is I'll try another larger crawl around December 2017 or early January 2018.

keyplyr

5:18 am on Jun 23, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for the info cpollett.

cpollett

6:59 am on Jun 23, 2017 (gmt 0)

10+ Year Member



No problem. Lucy above pointed out the bot description was rather terse. I think it got reset during my last DB upgrade. Since it wasn't crawling, and I don't normally look at that page, I hadn't noticed. It is back at yioop.com/bot now. Yioop is based in San Jose, not Russia. The software has been used by lots of other sites though. Unless the code was altered, it should obey the robots.txt as described on the bot page.

[edited by: keyplyr at 7:30 am (utc) on Jun 23, 2017]
[edit reason] please, no active links [/edit]

lucy24

5:57 pm on Jul 25, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I do believe this is the same kind of thing, so it's not worth starting a fresh thread.

68.74.116.*** - - [23/Jul/2017:15:22:54 -0700] "GET /robots.txt HTTP/1.0" 200 841 "-" "VenusCrawler/Nutch-1.12 (crawler@mycompany.com)"

"mycompany.com"? Oh, you betcha.

[edited by: keyplyr at 9:26 pm (utc) on Jul 25, 2017]
[edit reason] obscured private IP address [/edit]

keyplyr

2:56 am on Jul 26, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Most every bot available on GitHub has that feature. You can usually change any or all UA string attributes and the other header fields as well. That's how spoofing is accomplished.

lucy24

5:45 pm on Jul 26, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, yes, of course. I'm just laughing at the ones that missed a line.
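
The "missed a line" cases in this thread can be flagged with a simple substring check against the user-agent field. The sketch below is a rough illustration only: `PLACEHOLDER_MARKERS` and `looks_unconfigured` are hypothetical names, and the marker list holds just the two leftovers quoted above, so it is nowhere near a complete catalogue of default crawler strings.

```python
# Placeholder fragments taken from the two log lines quoted in this thread.
# Assumption: any UA containing one of these was left at its shipped default.
PLACEHOLDER_MARKERS = (
    "please name your robot",  # Yioop default name
    "mycompany.com",           # sample contact address left in the Nutch UA
)


def looks_unconfigured(user_agent):
    """Return True if the UA string still contains a known placeholder."""
    ua = user_agent.lower()
    return any(marker in ua for marker in PLACEHOLDER_MARKERS)


samples = [
    "Mozilla/5.0 (compatible; Please Name Your robot; "
    "+http://192.168.1.33:23481/yioop/bot.php)",
    "VenusCrawler/Nutch-1.12 (crawler@mycompany.com)",
    "Mozilla/5.0 (compatible; YioopBot +http://www.yioop.com/bot.php)",
]

for ua in samples:
    print(looks_unconfigured(ua), ua)
```

Matching is case-insensitive because the software's documented default ("Please Name Your Robot") and the string seen in the log ("Please Name Your robot") differ in capitalization.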