I would put a meta tag on the page excluding bots
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
or search Google's webmaster info page [google.com] for solutions.
But from what I've read, that does not stop the page being pulled by the spider; it only stops the indexing of the page.
I really do not want any bot accessing these pages, as the bill will run up.
My site is quite large (50K static pages), and I get a lot of visits by bots each day.
Can anyone else confirm this: meta tags only stop indexing, not the bot pulling the page?
Has anyone got another way?
I have files and folders listed as disallowed in the robots.txt file, and I added the robots meta tag (noindex, follow) to the files individually... still the files get listed in Google. I don't seem to be able to stop Google listing all the .swf files either.
Has Googlebot gone mad!
Does Google follow JavaScript?
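For reference, the kind of robots.txt entry being described would look something like this (the directory and file names are placeholders, not from the original post):

```
# Hypothetical robots.txt in the site root; paths are examples only.
User-agent: *
Disallow: /paid/
Disallow: /listings/
Disallow: /movie.swf
```

Note that, as discussed above, this only asks well-behaved bots not to fetch those paths; it doesn't stop Google listing a disallowed URL it has learned about from inbound links.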
What I'm thinking of doing is having a button that needs to be clicked to show the portion of the page that I pay for.
Questions I'm not sure on:
1. Will Gbot follow the link on the page?
2. Is this cloaking?
I can't afford the Gbot hits on these parts of the page, or being banned as a cloaker.
Any help on this would be greatly appreciated.
robots.txt should be the way to go.
If you really want to be very sure you can rewrite your .htaccess and disallow the google IP range from the directories with the pages you don't want to get crawled.
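As a sketch of that .htaccess idea: matching Googlebot's user-agent is easier to maintain than its IP ranges, which change without notice. The directory and environment-variable name below are assumptions, not from the original post:

```apache
# Hypothetical .htaccess inside the directory you want bots kept out of.
# Flag requests whose user-agent contains "Googlebot"...
SetEnvIfNoCase User-Agent "Googlebot" deny_bot
# ...then refuse those requests while allowing everyone else.
Order Allow,Deny
Allow from all
Deny from env=deny_bot
```

This saves the bandwidth entirely, since the bot gets a 403 instead of the page body.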
|<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">|
How can you tell just GoogleBot to avoid certain pages?
It wouldn't hurt to do both, I suppose:
robots.txt and the JavaScript on the page to show the content.
>>How can you tell just GoogleBot to avoid certain pages?
In addition to excluding, make sure there are no links pointing to them.
As well, you can put them in a user registration area. This way people have to sign up.
Another way is to make a script with a form and let people type their email address before going to the file. By doing that you can target the user later with a polite request.
Inbounds direct to the pages could be a problem, and not one that you can easily fix. I'd look for a script that asks people to type in a 3-letter word from a graphic. Email address requests may put people off.
I've seen the software in use on news sites, but am not sure where to get it. Implementing it should be quite easy.
Food for thought, thanks. I've got some ideas now.
|How can you tell just GoogleBot to avoid certain pages? |
Google does support its own specific robots tag...
|Googlebot obeys the noindex, nofollow, and noarchive Robots META Tag. If you place the tag in the head of your HTML/XHTML document, you can cause Google to not index, not follow, and/or not archive particular documents on your site. |
<meta name="googlebot" content="robots-terms">
The robots term of noindex will produce the following effect: Googlebot will retrieve the document, but it will not index the document.
The robots term of nofollow will produce the following effect: Googlebot will not follow any links that are present on the page to other documents.
The robots term of noarchive will produce the following effect: Google maintains a cache of all the documents that we fetch, to permit our users to access the content that we indexed (in the event that the original host of the content is inaccessible, or the content has changed). If you do not wish us to archive a document from your site, you can place this tag in the head of the document, and Google will not provide an archive copy for the document.
Further information on this specific robots tag can be found here with additional instructions.
How can I prevent Googlebot from following links from a particular page or archiving a copy of a page? [google.com]
Thanks pageone, that's what I understood about the meta tags: noindex does not stop Google asking for the page.
You're right, Vimes. That's why you need the entry in robots.txt -- to ensure the page doesn't get pulled by gbot.
BUT that doesn't stop Google listing it as a URL-only result in the SERPs (indicative of a page that Google knows is important due to inbound links, but hasn't, for any reason, crawled yet).
The easy way to do this that I know of, is to use the <META> robots tag. But then, gbot has to be able to crawl the page to find this. Which won't happen if you use robots.txt.
There's the catch.
As a few ugly solutions, you should try the user-interaction technique (enter 3 chars, etc.), or keep the robots.txt entry and occasionally use Google's auto-removal tool to wipe their index of the url-only listings.
No need for complicated user interaction.
Why not use a simple form submit action without JavaScript?
Gbot doesn't seem to follow these on my sites.
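A minimal sketch of that form idea (the action path is a placeholder): crawlers generally don't submit forms, so the paid content sits behind a POST rather than a crawlable `<a>` link.

```html
<!-- Bots follow links, not form submits, so the expensive page
     is reachable only via POST. "/paid-section" is hypothetical. -->
<form method="post" action="/paid-section">
  <input type="submit" value="Show listings">
</form>
```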
|Small Website Guy|
document.write("first half of link HTML"); document.write("second half of link HTML");
That's how I place all of my email addresses on my websites, and so far I don't seem to be getting any junk mail (except from people in Africa with vast fortunes who need my help to liberate their fortune, for which they promise to give me 30% -- I think these people are manually gathering email addresses).
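The split-link trick above can be sketched as a small helper; the address only ever exists as concatenated pieces, so harvesters scanning the raw HTML source never see it whole. The function name and address below are placeholders, not from the original post:

```javascript
// Build a mailto link from pieces so the complete address never
// appears in the page source. Address here is a placeholder.
function buildMailLink(user, domain) {
  var addr = user + "@" + domain;
  return '<a href="mailto:' + addr + '">' + addr + '</a>';
}

// Guard so the snippet also runs outside a browser:
if (typeof document !== "undefined") {
  document.write(buildMailLink("webmaster", "example.com"));
}
```

Note this only defeats harvesters that read the static HTML; a bot that executes JavaScript would still see the assembled link.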