User-agent: Googlebot
Disallow: /*?*
User-Agent: Googlebot
Disallow: /photos/*.html$
I have also done a re-write in the class file within gallery2 so that photos/index.php now resolves to photos/.
I've never needed to use robots.txt before. Have I done this correctly, and will it have the desired effect?
Any advice greatly appreciated.
You only need one "User-agent: Googlebot" line there.
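As a rough sketch, the two records from the first post merged into a single one would look something like this. Note that the `*` and `$` wildcards are pattern-matching extensions that Googlebot honours; they were not part of the original robots.txt convention, so other crawlers may ignore them:

```
# One record for Googlebot, holding both rules from the original post
User-agent: Googlebot
# Block any URL containing a query string
Disallow: /*?*
# Block any URL under /photos/ that ends in .html
Disallow: /photos/*.html$
```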
If you have another user-agent you want to control, then there must be a blank line after the last Disallow in each User-agent's record, including the last one in the file:
User-agent: abc
Disallow: /xyz
Disallow: /123
Disallow: /doremi

User-agent: def
Disallow: /xyz
Disallow: /124
URLs like "photo.jpeg.html" are rather nonsensical. If these are JPEG files, they should be called "photo.jpeg" or "photo.jpg", and if they are HTML pages, then "photo.html".
Jim
I agree that adding .html to the end of every photo page is a strange approach, but this plug-in is used by gallery2 because the alternative is a URL which was deemed to be "unfriendly" to search engines. I think making each photo an HTML page is equally problematic if, as in my case, you are working with a great many photos and don't want to create meta tags for every one.
What I'm trying to achieve with regard to indexing is to have just the first two levels of the gallery indexed - mysite.com/photos/ and mysite.com/photos/pictures-of-pigs/ - because the front page of the site and the first page of each album have all the meta tags and each is unique. My hope is to stop Googlebot indexing any of the photo pages, which present as www.mysite.com/photos/picture.jpg.html
I also want to stop Google from indexing all duplicate versions of album pages. For example, I want mysite.com/photos/ to be indexed but mysite.com/photos/?g2_page1 not to be indexed, as both URLs resolve to the same page. The first page of every album can be reached via both mysite.com/photos/albumname/ and mysite.com/photos/albumname/?g2_page1
So I think this robots.txt should take care of all the photo pages that are missing tags and most of the duplicates. The only issue I have is mysite.com/photos/index.php: I have redirected it, but if you type www.mysite.com/photos/index.php into the browser the page still comes up with that address showing, so clearly I still have to find a way to redirect that properly.
My site is quite new, about 6 months old, so I don't know if it's really been penalised or if I'm just bouncing around at the moment. I have seen the posts here about the minus 30 penalty, and whether that is what has happened or not, I thought I had better sort my site out :)
One last idiot question if you don't mind: is asking Google not to index the duplicates enough to make Google happy, or do I need to 301 all of the URLs containing a ? to their mod_rewrite version? If so, how would you redirect mysite.com/photos/example/?g2_page=1 to mysite.com/photos/example/ when there is no album called example sitting under the photos directory on my site? Sorry, probably a really dumb question.
Thanks again for your help