
Search Engine Spider and User Agent Identification Forum

Singingfish extractor not obeying robots.txt
sound files hosted at different domain than website

 5:42 pm on Jun 9, 2003 (gmt 0)

We host a small number of sound files for a friend in a directory excluded by robots.txt. The website that links to them, however, is on a different, unprotected domain, e.g.:


Singingfish.com's spider sees the original website and declares it open season on the sound files. The extractor then blindly hits those sound files without checking the robots.txt file of the hosting domain.
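
For reference, a well-behaved fetcher would consult the robots.txt of the host that actually serves the file, not the site that links to it. Below is a minimal sketch in Python using the standard library's `urllib.robotparser`. The host name `media.example.com`, the `/sounds/` path, and the `ExampleExtractor` user agent are all made up for illustration, as are the helper names; none of this is Singingfish's actual code.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(file_url):
    """Return the robots.txt URL of the host that serves file_url."""
    parts = urlsplit(file_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def may_fetch(user_agent, file_url, robots_lines):
    """Check file_url against the given robots.txt lines for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, file_url)


# Hypothetical robots.txt of the HOSTING domain (not the linking site).
HOST_ROBOTS = [
    "User-agent: *",
    "Disallow: /sounds/",
]

# Hypothetical media URL discovered on some other, unprotected website.
SOUND_URL = "http://media.example.com/sounds/clip.rm"

if __name__ == "__main__":
    # The robots.txt to consult is derived from the media URL itself.
    print(robots_url_for(SOUND_URL))
    print(may_fetch("ExampleExtractor", SOUND_URL, HOST_ROBOTS))
```

In a real crawler the rules would be downloaded from the URL returned by `robots_url_for` (for example with `RobotFileParser.set_url` and `read`) rather than passed in as a list; they are inlined here so the sketch runs without network access.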

I sent them an email. In the meantime, I am banning their extractor:


ADDED: Got a reply to my email. They do not seem concerned that their extractor ignores robots.txt. Instead, they said they would add my domain to their exclusion list and run a script to remove our files from their db. Not what I was hoping for...



 3:28 pm on Jun 11, 2003 (gmt 0)

Adding your domain to an exclusion list seems a pretty inefficient way of sorting things out. Surely getting the spider to read robots.txt is the way to do things, but hey, who are we to ask? ;)

As an aside, did you consider blocking the spider via .htaccess?



 12:51 am on Jun 14, 2003 (gmt 0)

The extractor uses a generic Real Media UA, so I blocked them by IP instead.

I replied to their email, explaining the situation and even pointing them to this thread. Unfortunately I got no response.
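
When the user agent is too generic to match on, as with the extractor above, the decision has to be made on the client address instead. A minimal sketch of that check in Python with the standard `ipaddress` module follows. The networks listed are reserved documentation ranges used as placeholders, since the extractor's real addresses are not given in this thread, and `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# Placeholder ranges (reserved for documentation); the extractor's
# real addresses are not stated in the thread.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_blocked(client_ip):
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.45", "198.51.100.17", "203.0.113.9"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

Blocking a whole network rather than a single address is a trade-off: it survives the crawler moving between machines, but it can also shut out unrelated visitors in the same range.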


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved