

Gowikibot

         

lucy24

11:19 pm on Sep 27, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Previous discussion here [webmasterworld.com]

UA (unchanged): Mozilla/5.0 (compatible; Gowikibot/1.0; +http://www.gowikibot.com)
IP: 216.160.239.abc (different numbers from before, but still Qwest)
robots.txt: yes ... and nothing else
Protocol: both HTTP and HTTPS
It was the robots.txt-and-nothing-else that caught my attention: two HTTP requests to the personal site, two HTTPS requests to the personal site, and two 301s plus two 200s to the primary site, which is still HTTP. Since I don't log headers on redirects*, I can't be sure, but I strongly suspect they default to www.example.com, while this site happens to be without-www.

Robots.txt: No, possibly on an earlier visit
Or, then again, possibly on a (much, much) later visit. Heh.

That was two days ago and they've yet to ask for anything else.


* robots.txt requests are exempt from HTTPS redirecting, which also means they're exempt from domain-name canonicalization, since it's the same RewriteRule. Sites that are not yet HTTPS don't have the robots.txt exemption.
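A setup like the one described in the footnote could look roughly like this in Apache (a hypothetical .htaccess sketch, not the poster's actual rule; the domain name and exact conditions are assumptions):

```apache
# Hypothetical sketch: one RewriteRule handles both the HTTPS redirect
# and the www/non-www canonicalization, with robots.txt exempted from it.
RewriteEngine On

# Skip the redirect entirely for robots.txt requests
RewriteCond %{REQUEST_URI} !^/robots\.txt$
# Redirect anything not already on HTTPS at the canonical (without-www) host
RewriteCond %{HTTPS} !=on [OR]
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]
```

Because the robots.txt exemption sits on the same rule, a crawler requesting http://www.example.com/robots.txt would get a 200 rather than a 301, exactly the pattern described above.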

keyplyr

1:08 am on Sep 28, 2018 (gmt 0)




I've been allowing it. It seems quite common for SEs to update robots.txt permissions for sites in their index.

Gowiki

4:25 pm on Oct 3, 2018 (gmt 0)

5+ Year Member



Hello - Gowiki here.

In response to lucy24, the crawler which handles the robots.txt file is separate from our other crawlers and will therefore only download the robots.txt file. We attempt to find the robots.txt file by requesting it on both port 80 and 443 (in case 80 has no redirect to 443) for each subdomain of a domain. For instance, if your domain is example.com, we will request the robots.txt file at http://www.example.com, https://www.example.com, http://example.com, and https://example.com. When successful, we will request the file again after a few days to update our records in case changes were made to the file. When unsuccessful after a few tries, we will delay the new request for a few weeks. Please note that we have politeness rules in place so that there is a delay between each request.
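The four-URL probe Gowiki describes can be sketched as follows (a minimal illustration of the description above; the function name and the exact ordering are assumptions, not Gowiki's code):

```python
def robots_candidates(domain):
    """Build the robots.txt URLs described above for a bare domain:
    both schemes (ports 80 and 443), with and without the www subdomain.

    Gowiki also mentions politeness delays between requests and a
    multi-week back-off after repeated failures; those are omitted here.
    """
    hosts = ["www." + domain, domain]
    schemes = ["http", "https"]
    return [f"{scheme}://{host}/robots.txt"
            for host in hosts for scheme in schemes]

print(robots_candidates("example.com"))
```

For example.com this yields the four URLs listed in the post, in the same order: http, then https, for www.example.com and then example.com.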

We are still in development and are continuing to improve our crawlers. Thank you for allowing us to crawl your site.

keyplyr

6:36 pm on Oct 3, 2018 (gmt 0)




Hi Gowiki and welcome to WebmasterWorld [webmasterworld.com]

And thanks for making a comprehensive Gowikibot info page available at the URL in your UA string: http://www.gowikibot.com