Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Discrepancy between Webmaster Tools robots.txt analysis and reality
fishfinger

5+ Year Member



 
Msg#: 3670596 posted 5:34 pm on Jun 9, 2008 (gmt 0)

I have been trying - and failing - for months to keep Google's image bot from displaying a number of images on a site. Before I give up and deal with it like any other scraper, server-side, can anyone tell me what I might be doing wrong?

Here are the relevant parts of the file:

User-agent: *
Disallow: /*.jpg$
Disallow: /*.gif$
Disallow: /uploads/
Disallow: /uploads/textareas/
Disallow: /uploads/textareas/image/

I'm trying to block http://www.example.com/uploads/textareas/image/image.jpg

Webmaster Tools confirms
Blocked by line 12: Disallow: /uploads/textareas/image/

But the image shows in my 'top queries' and I can see it in Image Search in Google!

Originally I just had the top-level folder in the robots.txt file. Then I added the subfolders and the wildcard matches. I even blocked Googlebot-Image from the whole site. Nothing has made any difference.

Just today I've re-added

User-agent: Googlebot-Image
Disallow: /

to the file as a last resort. Perhaps I need to keep it out of the pages the images are called from as well as the folders they reside in?

Anyone had/having this problem?
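
For anyone wanting to check the file outside Webmaster Tools, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes the two groups quoted above make up the whole file. The wildcard lines are left out because the standard-library parser only does plain prefix matching, and `ExampleBot` and the home page URL are made-up names for illustration. It only answers "is this crawler permitted to fetch this URL?" and says nothing about how long an already-indexed image stays in the results.

```python
from urllib.robotparser import RobotFileParser

# The two groups quoted in the post. The /*.jpg$ and /*.gif$ lines are
# omitted: urllib.robotparser treats rule paths as literal prefixes.
ROBOTS_TXT = """\
User-agent: *
Disallow: /uploads/
Disallow: /uploads/textareas/
Disallow: /uploads/textareas/image/

User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

image_url = "http://www.example.com/uploads/textareas/image/image.jpg"
home_url = "http://www.example.com/"  # hypothetical page, for comparison

# Googlebot-Image has its own group, so "Disallow: /" applies to it.
image_bot_image = parser.can_fetch("Googlebot-Image", image_url)
image_bot_home = parser.can_fetch("Googlebot-Image", home_url)

# Any other crawler ("ExampleBot" is a made-up name) falls back to "*".
other_bot_image = parser.can_fetch("ExampleBot", image_url)
other_bot_home = parser.can_fetch("ExampleBot", home_url)

print("Googlebot-Image may fetch image:", image_bot_image)
print("Googlebot-Image may fetch home: ", image_bot_home)
print("ExampleBot may fetch image:     ", other_bot_image)
print("ExampleBot may fetch home:      ", other_bot_home)
```

If the first line prints `False`, the file itself is doing what was intended, and the remaining question is how long the search engine takes to act on it.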

 

jeffposaka

5+ Year Member



 
Msg#: 3670596 posted 6:03 pm on Jun 9, 2008 (gmt 0)

I blocked the folders that contained images and a gallery. It took about 6 weeks, then they stopped coming.

fishfinger

5+ Year Member



 
Msg#: 3670596 posted 8:13 am on Jun 10, 2008 (gmt 0)

I've had the latest incarnation of the script online for 4 weeks, so perhaps I need to wait longer.

But Google tells me that it downloads the file on a near-daily basis and has done so for months.

tomda

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3670596 posted 11:29 pm on Jun 26, 2008 (gmt 0)

I am also having a lot of problems...

Google keeps indexing pages which are blocked by robots.txt.

/top-du-top/?c=0&an=0&mo=8&au=3- Blocked by line 46: Disallow: /*?*

I already did a big clean-up last month, but it never stops. I now have 300 pages indexed that should not be there.

Are there any tools to make sure that my robots.txt is correct?

Are we the only ones having this nightmare?

g1smd

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3670596 posted 2:32 pm on Jul 11, 2008 (gmt 0)

*** Are there any tools to make sure that my robots.txt is correct? ***

There is a whole section of Google's Webmaster Tools dedicated to helping you check your robots.txt file.

It can take a long time for changes to follow through in the SERPs.

You don't need the final * in your rule.

Disallow: /*?

already says "disallow URLs that start with anything, and that start is then followed by a question mark".

The rule disallows anything that ends with a question mark, as well as URLs that have something, anything, after the question mark.
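
To make that concrete, here is a rough Python sketch of the wildcard matching described above. It assumes the usual reading of the syntax: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and rules match from the start of the path. `rule_to_regex` and `is_blocked` are hypothetical helper names, not part of any Google tool, so treat this as an illustration rather than a replacement for the Webmaster Tools checker.

```python
import re


def rule_to_regex(rule_path):
    """Translate a robots.txt path with '*' and trailing '$' into a regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and everything else is literal text.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces and join them with ".*" for each '*'.
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(rule_path, url_path):
    """True if url_path (path plus query string) matches the rule."""
    # re.match anchors at the start, mirroring prefix matching of rules.
    return rule_to_regex(rule_path).match(url_path) is not None


# Paths to try; the first is the one quoted earlier in the thread (read
# here without the trailing "-"), the others are made up.
paths = [
    "/top-du-top/?c=0&an=0&mo=8&au=3",
    "/top-du-top/?",
    "/top-du-top/",
    "/index.html",
]

for path in paths:
    short = is_blocked("/*?", path)
    long_ = is_blocked("/*?*", path)
    print(f"{path!r}: '/*?' -> {short}, '/*?*' -> {long_}")
```

Both rules give the same answer for every path, which is why the final `*` adds nothing. The same helper can be pointed at a rule such as `/*.jpg$` to see which image URLs it would catch.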

tomda

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3670596 posted 10:05 am on Jul 14, 2008 (gmt 0)

Thanks G1smd,

You are right. I spotted the error after going through my robots.txt, and the change has been made.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved