Forum Moderators: goodroi

Discrepancy between Webmaster Tools robots.txt analysis and reality

   
5:34 pm on Jun 9, 2008 (gmt 0)

10+ Year Member



I have been trying - and failing - for months to keep Google's image bot from displaying a number of images on a site. Before I give up and deal with it like any other scraper, server-side, can anyone tell me what I might be doing wrong?

Here are the relevant parts of the file:

User-agent: *
Disallow: /*.jpg$
Disallow: /*.gif$
Disallow: /uploads/
Disallow: /uploads/textareas/
Disallow: /uploads/textareas/image/

I'm trying to block http://www.example.com/uploads/textareas/image/image.jpg

Webmaster Tools confirms
Blocked by line 12: Disallow: /uploads/textareas/image/

But the image shows in my 'top queries' and I can see it in Image Search in Google!

Originally I just had the top-level folder in the robots.txt file. Then I added the subfolders and the wildcard matches. I even blocked Googlebot-Image from the whole site. Nothing has made any difference.

Just today I've re-added

User-agent: Googlebot-Image
Disallow: /

to the file as a last resort. Perhaps I need to keep it out of the pages the images are called from as well as the folders they reside in?

Anyone had/having this problem?
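
For anyone who wants to check rules like these locally, here is a minimal Python sketch. It assumes the wildcard semantics Google documents for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end of the URL path). The helper names `rule_to_regex` and `matching_rules` are made up for illustration, and this is not how Webmaster Tools itself is implemented.

```python
import re

def rule_to_regex(rule):
    """Translate a Disallow path with Google-style wildcards into a regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL path; everything else is literal.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def matching_rules(path, rules):
    """Return every rule whose pattern matches from the start of the path."""
    return [r for r in rules if rule_to_regex(r).match(path)]

# The Disallow lines quoted in the post above.
RULES = [
    "/*.jpg$",
    "/*.gif$",
    "/uploads/",
    "/uploads/textareas/",
    "/uploads/textareas/image/",
]

print(matching_rules("/uploads/textareas/image/image.jpg", RULES))
```

Under those assumptions, every rule except the `.gif` one matches the image path, which is consistent with Webmaster Tools reporting the URL as blocked.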

6:03 pm on Jun 9, 2008 (gmt 0)

5+ Year Member



I blocked the folders that contained images and a gallery. It took about 6 weeks, then they stopped coming.

8:13 am on Jun 10, 2008 (gmt 0)

10+ Year Member



I've had the latest incarnation of the script online for 4 weeks, so perhaps I need to wait longer.

But Google tells me that it downloads the file on a near-daily basis and has done so for months.

11:29 pm on Jun 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I am also having a lot of problems...

Google keeps indexing pages which are blocked by robots.txt.

/top-du-top/?c=0&an=0&mo=8&au=3 - Blocked by line 46: Disallow: /*?*

I already did a big cleanup last month, but it never stops: I now have 300 pages indexed that should not be there.

Are there any tools to make sure that my robots.txt is correct?

Are we the only ones having this nightmare?

2:32 pm on Jul 11, 2008 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



*** Are there any tools to make sure that my robots.txt is correct? ***

There is a whole section of Google's Webmaster Tools dedicated to helping you check your robots.txt file.

It can take a long time for changes to follow through in the SERPs.

You don't need the final * in your rule.

Disallow: /*?

already says "disallow any URL that starts with anything and then contains a question mark".

Because a rule matches from the start of the URL and does not have to match all of it, this disallows URLs that end with the question mark as well as URLs that have something, anything, after the question mark.
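
To see that the two forms behave the same, here is a small Python sketch. It assumes `*` means "any run of characters" and that rules match from the start of the URL path; the `blocks` helper is hypothetical, and it deliberately ignores the `$` end anchor since neither rule uses it.

```python
import re

def blocks(rule, path):
    """True if a Disallow rule matches the path from its start.

    Assumption: '*' matches any run of characters; all other
    characters in the rule are literal. '$' anchors are not handled.
    """
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(pattern, path) is not None

paths = [
    "/top-du-top/?c=0&an=0&mo=8&au=3",  # something after the question mark
    "/page?",                           # ends with the question mark
    "/page",                            # no question mark at all
]

for path in paths:
    # Both rules give the same answer for every path.
    print(path, blocks("/*?", path), blocks("/*?*", path))
```

Both rules block the first two paths and leave the third alone, so the trailing `*` adds nothing.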

10:05 am on Jul 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks G1smd,

You are right - I had spotted the error after going through my robots.txt - and the change has been made.

 
