Forum Moderators: open
If I have 3 dynamic pages listed on Google, such as...
widgets.php?size=big
widgets.php?size=medium
widgets.php?size=small
How can I stop Google listing just the "small" entry?
Can I just use a meta noindex type thing when I dynamically create the small page?
If I do it this way, will the other pages remain on Google?
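For what it's worth, the per-page directive being discussed is the standard robots meta tag, placed in the <head> of the generated page:

```html
<!-- Emit this in the <head> only when the page should stay out of the index -->
<meta name="robots" content="noindex,follow">
```

"noindex" asks for the page to be kept out of the listings, while "follow" still lets the bot follow the links on it — so the other widgets.php pages it links to are unaffected.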
Many thanks
But it won't hurt your ranking at all when they get your files, so stop worrying unless they are getting some sensitive information indexed.
Does Google have a limit on the number of pages it will eventually pick up?
Is there a definite way to stop them crawling certain pages?
If the meta noindex thing doesn't work, what's the point of it?!
Can I list a dynamic page such as...
widgets.php?size=small
in robots.txt, and if I can, will that not stop ALL of the widgets.php pages I have links to?
Lots of questions there!
User-agent: Googlebot
Disallow: /*size=*$
NickW tried something similar but with a slightly different format. Googlebot ignored it and indexed the pages anyway.
User-agent: Googlebot
Disallow: /index.php?*$
He listed User-agent: * above the one for Googlebot, which may or may not be the reason it didn't work. Don't know. According to Google, though, if you don't want their bot crawling any dynamic pages, they say to use this format:
User-agent: Googlebot
Disallow: /*?
See FAQ #12: [google.com...]
Since you want some pages indexed, you can try what I did. Googlebot was ignoring my noindex, nofollow meta tags. Once I used the above in robots.txt, he started behaving himself. Last week, I experimented again with meta tags on some other pages, without using robots.txt entries. So far, he seems to be obeying them. Perhaps he's had his attitude adjusted. I'd recommend using both robots.txt and meta tags, then keep an eye on your logs to see if he goes wandering off.
I'd rather steer clear of the robots.txt method if possible, as I want the dynamic pages to change their index/noindex state each time Google updates, depending on the content being read in from another source at that point in time.
So I need the index/noindex state to be as dynamic as the page itself (if you see what I mean!).
I don't think that would be possible with a robots.txt file.
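A minimal sketch of that idea — the helper name (load_widget_data) is made up; substitute whatever widgets.php actually uses to fetch its content:

```php
<?php
// Sketch only: decide index/noindex per request, based on whether
// the external data source currently has content for this size.
$size = isset($_GET['size']) ? $_GET['size'] : 'big';
$rows = load_widget_data($size); // hypothetical data lookup, not a real API

// Nothing worth indexing right now? Ask robots to skip this page.
$robots = empty($rows) ? 'noindex,follow' : 'index,follow';
?>
<html>
<head>
<meta name="robots" content="<?php echo $robots; ?>">
</head>
```

Since the tag is generated on every request, the page's index/noindex state tracks the data automatically — no robots.txt changes needed.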
Has anyone else got any experience with the meta noindex tags who can let us know whether they work or not?
Thanks
Meta tags seem to work on some people's sites and not others. If that's what you want to use, go for it. If he doesn't follow the tags, you can always contact Google, tell them their bot is misbehaving, and hope they change their software.
2) My use of the "noindex" directive was to avoid wasting bandwidth. If G-bot hasn't got that right, it's just plain 'ignant. They're wasting their own resources.
<PLEA>G? WHY?! You need the bandwidth, right?</PLEA>
Am I right in thinking that I could add an extra variable to the URL link and then use that part in the robots.txt file as a filter?
i.e.
widgets.php?size=big
widgets.php?size=small&stop=1
then in robots.txt :-
User-agent: Googlebot
Disallow: /*stop=1
Would this stop any URL with &stop=1 from being indexed,
and allow all the others through?
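As a sanity check, Googlebot's wildcard handling can be roughly approximated like this — a sketch, not Google's actual code; it assumes '*' matches any run of characters and that a rule matches as a prefix:

```php
<?php
// Rough approximation of Googlebot's wildcard Disallow matching:
// '*' matches any run of characters, and a rule matches as a prefix.
function blocked($path, $pattern = '/*stop=1')
{
    $regex = '#^' . str_replace('\*', '.*', preg_quote($pattern, '#')) . '#';
    return preg_match($regex, $path) === 1;
}

var_dump(blocked('/widgets.php?size=small&stop=1')); // bool(true)
var_dump(blocked('/widgets.php?size=big'));          // bool(false)
```

Under those assumptions, yes: any URL containing stop=1 after the leading slash would be blocked, and URLs without it would be crawled as usual.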
Can I make the robots.txt file in Notepad? I read something about needing to be careful about how you create the file.
thanks again...
So I'd rather drop some "less" important ones, in the hope of getting more of the important pages indexed.
My guess is that it won't work as you expect. Since the number of pages crawled and indexed seems to be a function of PR, I would rather get more inbound links with good anchor text.
IMHO, you should never lock Googlebot out of something that could be important in the future.
A short-term measure would be to make the "more important" pages more prominent to Googlebot, e.g. placing more links to them, listing them on a dedicated sitemap page linked from several pages, wrapping the links to them in h1 tags, etc.
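A dedicated sitemap page along those lines might look like this (the URLs are just the examples from this thread):

```html
<!-- sitemap.html: plain crawlable links to the pages you most want indexed -->
<h1><a href="/widgets.php?size=big">Big widgets</a></h1>
<h1><a href="/widgets.php?size=medium">Medium widgets</a></h1>
```

Plain anchor links like these are easy for the bot to follow, and linking the sitemap itself from several pages passes it more PR to redistribute.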