Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

yes, them again

 7:39 am on Dec 25, 2013 (gmt 0)

Can someone point to a recent explanation of who Thunderstone are and what they want? They're not getting it from me; I'm just curious.

- - [24/Dec/2013:07:56:35 -0800] "GET /robots.txt HTTP/1.1" 301 513 "-" "Mozilla/4.0 (compatible; http://search.thunderstone.com/texis/websearch/about.html)"
- - [24/Dec/2013:07:56:35 -0800] "GET / HTTP/1.1" 301 492 "-" "Mozilla/4.0 (compatible; http://search.thunderstone.com/texis/websearch/about.html)"
- - [24/Dec/2013:07:56:35 -0800] "GET / HTTP/1.1" 200 870 "-" "Mozilla/4.0 (compatible; http://search.thunderstone.com/texis/websearch/about.html)"

Possibly they thought they could hide among the thousands of lines of checklink hits (brand-new site, not yet visible to the public) in the same day's logs.

robots.txt currently says, in full,

User-Agent: W3C-checklink

User-Agent: *
Disallow: /

It goes beyond "Which part of {asterisk} did you not understand?"* They seem to go out of their way to look for roboted-out files and under-the-radar sites. Compare this thread [webmasterworld.com] from early 2011. I've never seen them on my "real" site, only on assorted backwaters.
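For anyone who wants to confirm what that robots.txt says to a generic crawler, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `is_allowed` is made up for illustration, and Python's parser is only a stand-in: nothing here claims to reproduce how Thunderstone's software reads (or ignores) robots.txt.

```python
from urllib.robotparser import RobotFileParser

# robots.txt copied as quoted in the post above.
ROBOTS_TXT = """\
User-Agent: W3C-checklink

User-Agent: *
Disallow: /
"""

# User-agent string copied from the log lines in the first post.
THUNDERSTONE_UA = (
    "Mozilla/4.0 (compatible; "
    "http://search.thunderstone.com/texis/websearch/about.html)"
)


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if this parser thinks user_agent may fetch path.

    Hypothetical helper: it only wraps RobotFileParser.parse() and
    RobotFileParser.can_fetch() from the standard library.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/", "/index.html"):
        print(path, is_allowed(ROBOTS_TXT, THUNDERSTONE_UA, path))
```

Under this parser the catch-all `Disallow: /` applies to the user-agent shown in the logs, so a compliant robot identifying itself that way would have no business requesting the front page.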

Cursory Forums search suggests they've been at it-- whatever "it" is-- since 2001**. Their current home is

* Or, for that matter, "Which part of 301 did you not understand?" Note the pattern of redirects. My host's logs can be a bit hiccupy, so it's not even certain that they asked for robots.txt before asking for the front page-- currently the host's "coming soon" default, so neener-neener. What is certain is that they never bothered to follow the redirect.
** I had no idea there was such a thing as a lapsed or inactive member. That's how long ago 2001 was.
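The redirect pattern mentioned in the first footnote can be pulled out of the log lines mechanically. The following is a rough sketch, assuming combined-log-style lines exactly as posted above (with the leading host field already missing); the regular expression and the names `parse_hits` and `robots_was_read` are illustrative, not anything the host or Thunderstone provides.

```python
import re

# Matches the timestamp, request line, status, size, referer and user-agent
# of a combined-log-style line. The leading host/ident fields are ignored
# because they are absent from the lines as posted.
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\]\s+'
    r'"(?P<method>\S+)\s+(?P<path>\S+)\s+(?P<proto>[^"]+)"\s+'
    r'(?P<status>\d{3})\s+(?P<size>\d+|-)\s+'
    r'"(?P<referer>[^"]*)"\s+"(?P<agent>[^"]*)"'
)

UA = ('"Mozilla/4.0 (compatible; '
      'http://search.thunderstone.com/texis/websearch/about.html)"')

# The three log lines from the first post.
SAMPLE = [
    '- - [24/Dec/2013:07:56:35 -0800] "GET /robots.txt HTTP/1.1" 301 513 "-" ' + UA,
    '- - [24/Dec/2013:07:56:35 -0800] "GET / HTTP/1.1" 301 492 "-" ' + UA,
    '- - [24/Dec/2013:07:56:35 -0800] "GET / HTTP/1.1" 200 870 "-" ' + UA,
]


def parse_hits(lines):
    """Return one dict per parsable line, with the status as an int."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            hit = match.groupdict()
            hit["status"] = int(hit["status"])
            hits.append(hit)
    return hits


def robots_was_read(hits):
    """True only if some /robots.txt request actually came back 200."""
    return any(h["path"] == "/robots.txt" and h["status"] == 200 for h in hits)


if __name__ == "__main__":
    hits = parse_hits(SAMPLE)
    for h in hits:
        print(h["status"], h["path"])
    print("robots.txt read:", robots_was_read(hits))
```

On these three lines the only robots.txt request is the one answered with a 301, so by this check the file's contents were never actually retrieved.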



 4:22 pm on Jan 1, 2014 (gmt 0)

Not sure how I missed this.

In brief, a 3rd-party harvester.
From the main page of their site:
Thunderstone Software LLC is an independent R&D company that has been providing high-performance state-of-the-art solutions to intelligent information retrieval and management problems for over 33 years. Our flagship product, Texis™, is the most comprehensive text retrieval and publishing software available. In one package Texis provides every full-text, SQL, multimedia management, and dynamic publishing operation needed for an enterprise search application.
end of quote

Pretty straightforward; not sure why you need an explanation.

More than a decade ago, T-h-u-n-d-e-r-s-t-o-n-e ran primarily from a Road Runner IP. Perhaps, even though 3rd parties were using the software, the org's own server was the one doing the crawling at that time.
Today, it seems the user's IP is the active server for the software.

BTW, more than a decade ago T-h-u-n-d-e-r-s-t-o-n-e failed to honor robots.txt and showed no comprehension of the protocols, and it is likely that disregard continues.


 6:28 pm on Jan 1, 2014 (gmt 0)

I use a Thunderstone product as my site search. Great stuff.

