Singingfish extractor not obeying robots.txt

sound files hosted at different domain than website

     
5:42 pm on June 9, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 18, 2002
posts:131
votes: 0


We host a small number of sound files for a friend in a directory that our robots.txt excludes. The website that links to them, however, is on a different, unprotected domain, e.g.:

www.foomusic.com
www.mydomain.com/protected/soundfile.ra

Singingfish.com's spider sees the original website and declares it open season on the sound files. The extractor then blindly hits those sound files without checking the robots.txt file of the hosting domain.
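For illustration, here is a minimal sketch of the check a well-behaved extractor would make: look up the robots.txt rules for the host that actually serves the file, not the host of the page that linked to it. It uses Python's standard `urllib.robotparser`. The robots.txt contents, the `http://` scheme, the `example-extractor` user-agent name, and the helper names are all assumptions made up for this example; nothing here reflects how Singingfish's software is really written.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Assumed robots.txt of the hosting domain (www.mydomain.com); illustrative only.
HOSTING_ROBOTS_TXT = """\
User-agent: *
Disallow: /protected/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(url, user_agent, robots_by_host):
    """Check the rules of the host serving `url`, not the referring site."""
    host = urlsplit(url).netloc.lower()
    parser = robots_by_host.get(host)
    if parser is None:
        # This sketch treats a host with no known robots.txt as unrestricted.
        return True
    return parser.can_fetch(user_agent, url)


# Only the hosting domain publishes rules in this example.
robots_by_host = {"www.mydomain.com": build_parser(HOSTING_ROBOTS_TXT)}

sound_file = "http://www.mydomain.com/protected/soundfile.ra"
print(may_fetch(sound_file, "example-extractor", robots_by_host))
```

The point of the design is that the lookup key is the host in the media URL itself, so a link found on www.foomusic.com does not grant access to a directory that www.mydomain.com has excluded.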

I sent them an email. In the meantime, I am banning their extractor:

extractor.singingfish.com
63.251.169.234

ADDED: Got a reply to my email. They do not seem concerned that their extractor ignores robots.txt. Instead, they said they would add my domain to their exclusion list and run a script to remove our files from their db. Not what I was hoping for...

3:28 pm on June 11, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 22, 2002
posts:453
votes: 0


Adding your domain to an exclusion list seems a pretty inefficient way of sorting things out. Surely getting the spider to read robots.txt is the way to do things, but hey, who are we to ask? ;)

As an aside, did you consider blocking the spider via .htaccess?

R.

12:51 am on June 14, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 18, 2002
posts:131
votes: 0


The extractor sends a generic Real Media user-agent string, so there is nothing distinctive to match on. I blocked it by IP instead.
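As a rough sketch of that decision, the check below matches on the remote address only and never looks at the user-agent header. The address is the one quoted earlier in the thread; the function name and the idea of doing this in application code (instead of in the web server's own access rules) are assumptions for the sake of the example.

```python
import ipaddress

# Address of extractor.singingfish.com as reported earlier in this thread.
BLOCKED_ADDRESSES = {ipaddress.ip_address("63.251.169.234")}


def is_blocked(remote_addr):
    """Return True if the client address is on the block list."""
    try:
        return ipaddress.ip_address(remote_addr) in BLOCKED_ADDRESSES
    except ValueError:
        # Malformed addresses are not on the list; handle them elsewhere.
        return False


print(is_blocked("63.251.169.234"))
```

Matching on the address has the drawback that it stops working if the extractor moves to another IP, but it avoids blocking every legitimate player that sends the same generic user-agent.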

I replied to their email, explaining the situation and even pointing them to this thread. Unfortunately, I got no response.