|Made In Sheffield|
Nobody's talking about using Java for links! (bang, bang again, see post 27 :-)
The 100k limit, AFAIK, applies only to the page size stored in the cache. I believe a number of sources have confirmed that content beyond 100k, while not cached, is still indexed.
Although it's obviously a huge usability blunder to have a page that size, and very bad practice.
Oh, of course, an effective way to hide a link would be to incorporate it in a Java-applet. But it just might be overkill for a link.
A piece written in Flash could also do the trick.
<added>Made_In_Sheffield watch that head of yours, it might start to hurt ;)</added>
|Made In Sheffield|
Agreed, Java would work. Flash I'm not so sure about - I don't use it, but I believe Google is starting to index it? Can anyone confirm?
P.S. It's a soft wall :)
You mentioned that the best way to hide links from Google would be to block cgi-bin through robots.txt and then use a script which gives URLs of the format
where the id represents a URL
I have 2 questions
1) How do I implement this? Could you give me links to such scripts, as I'm unable to find any?
2) What happens in general when Google sees a link but cannot follow it? Do such links contribute to PR leak?
Suppose I have 10 links on my page, of which 4 are hidden via such a script. Would Google distribute the PR across all 10 links or only across the 6 visible ones?
Imaster, there are a lot of such scripts. I recently gave some advice on installing one in another thread (http://www.webmasterworld.com/forum13/3142.htm) - the thread might not be relevant to you but the product works off the cgi-bin and counts clicks on links that it redirects to. You'll find it here: [aardvarkind.com...]
I did not pick this one, but it seems to do the job just as well as others I've seen. You just have to look at "-Shortened URLs" in the "readme.txt" file for guidance on using IDs.
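To show the general shape of such a script: below is a minimal sketch of an id-to-URL redirect, written in Python. Everything here (the mapping, the names, the URLs) is hypothetical - this is not the aardvarkind product, just an illustration of the idea being discussed.

```python
#!/usr/bin/env python3
# Minimal sketch of an id-to-URL redirect script. All names and URLs
# are hypothetical; a real script would also count the clicks.
import os
from urllib.parse import parse_qs

# Opaque ids mapped to destinations. A real script would keep this
# mapping in a file or database.
LINKS = {
    "1": "http://www.example.com/",
    "2": "http://www.example.org/page.html",
}

def redirect_for(link_id):
    """Build the CGI response for one id: a 302 redirect or a 404."""
    url = LINKS.get(link_id)
    if url is None:
        return "Status: 404 Not Found\r\n\r\n"
    # Browsers follow the 302; a bot obeying a robots.txt Disallow on
    # /cgi-bin/ never requests the script at all, so the link is hidden.
    return "Status: 302 Found\r\nLocation: %s\r\n\r\n" % url

if __name__ == "__main__":
    qs = parse_qs(os.environ.get("QUERY_STRING", ""))
    link_id = qs.get("id", [""])[0]
    print(redirect_for(link_id), end="")
```

You would then link to something like /cgi-bin/redirect.py?id=1 and disallow /cgi-bin/ in robots.txt.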
2) PR leak.
First, there's an understanding that a "PR leak" is not actually a leak: some PR is passed on from your page to the pages you link to, but your own page does not lose any PR in doing so. I've seen this statement in a few threads; here are two examples:
A link like that would be a link to an internal part of your own site - a part which is access restricted. I would assume that this makes it quite impossible to decide where the 4/10 of the PR should go. In order to "leak" it must leak to "somewhere" - at least that is the way I understand it.
So, your 4 links will not leak. That is: they will not pass any PR on, as there is nothing to pass it on to. Your six remaining links will each pass on 1/6 of the total amount passed on. The total amount "passed on" will be the same; it will only benefit 6 pages instead of 10. As 1/6 is greater than 1/10, this is good for the pages that do get some.
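The fractions above can be sanity-checked with a toy calculation. This assumes, as the post does, that only followable links divide the amount passed on - the real PageRank details are not public, so treat this as an illustration only.

```python
# Toy check of the share-per-link arithmetic. The absolute amount of
# PR a page passes on is unknown, so we normalise it to 1.0 and only
# compare the fraction each followed link receives.
def share_per_link(total_passed, followable_links):
    return total_passed / followable_links

all_visible = share_per_link(1.0, 10)  # all 10 links followable
four_hidden = share_per_link(1.0, 6)   # 4 of the 10 hidden by the script

# The six remaining links each get a bigger slice: 1/6 > 1/10.
assert four_hidden > all_visible
```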
<edit>typos, clarified text</edit>
Made_In_Sheffield: Careful with that head of yours, you might break the wall! ;)
Everybody: let me remind you all that in post 23 I explained the difference between Java and JS. Please call things by their proper names before I join Made_In_Sheffield's self-destructive behaviour (and my walls won't last long, so I'll have to break my neighbours'!)
claus: You said that a link can be hidden in a Java applet, but I'm not sure - is linking outside the applet's home domain permitted under Java's security policies?
Imaster: I'll give you a quick definition of PR leak:
A page gives PR to the pages it links to without losing any of its own. If the other page links back, it gives PR back to the first page. If your first page has more links, each one carries less PR, so the pages that link back return less. The page never loses PR by linking out, but it may end up gaining less than before: this is what is called PR leak.
|Google definitely has followed the java links on my site. I have text links on the bottoms of the pages now, but Google indexed the pages before the text links were there. |
The weird thing is, I have a client's site that uses the same type of Java scripting, and the Goo didn't make it past the first page the first time it got crawled.
GoogleBot might have visited the pages in question, if you visited the page and had the Google ToolBar turned on to send data to Google.
That's how Google finds new sites/pages...
|Made In Sheffield|
I disagree; I don't think Google finds pages in this way.
If a page is not linked to by another page, then it is not part of the web as Google sees it, and by my understanding it will therefore not be added to the index. I think the only way Google finds pages is by following links from other sites. I'm pretty sure that submitting your site to Google does no good unless you have a link from another site (which kind of makes it a pointless exercise, apart from making you feel better). Anyone want to correct me on any of that?
I do however think Alexa does find sites in this way.
I agree with mr. sheffield wednesday -;
No links in = no google recognition.
Maybe JS links can be followed, in fact I'm fairly sure they can, the point is what weight is given. IMHO none, many linkers like it that way -;))
Well, someone visited our test page, which was only online for about 10 minutes. Unfortunately I did not keep the 404, but it was not linked from anything and nothing on it was clicked, so no referrers. Toolbar? Perhaps.
Mind you that is off topic - Sorry
|Made In Sheffield|
Nobody's saying Google can't find a page without links. What I am saying is that they won't add it to the index until they find some links.
This is because of the way the PageRank algorithm works.
Why do you think pages get dropped from the index once you take them out of your site navigation but leave the pages there? Because they are no longer linked.
It takes a while, but it does happen in time.
Without looking in your log file and seeing what the user agent was, you can't blame that on Google. Do you have the Alexa toolbar installed?
This question about hiding links from Google was discussed some months ago and Googleguy made his comments clear about this topic:
Googleguy comments about hidden links [webmasterworld.com]
His comments are on page 1, fifth comment.
Personally, I use a PHP redirect script hidden from the bot using robots.txt. Google is gracious enough not to visit pages disallowed by robots.txt.
It's great for hiding repeated links to terms and conditions, privacy statements or adverts.
Read what Googleguy said, hope this helps.
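For reference, the robots.txt side of this approach looks something like the following. The path is illustrative - use whatever directory your redirect script actually lives in:

```
User-agent: *
Disallow: /cgi-bin/
```

Any well-behaved bot that honours robots.txt will then never request the script, so the links it redirects are effectively invisible to it.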
A guy said:
"4. Use a redirect script and use robots.txt to disallow your script. (RECOMMENDED!) googlebot won't even touch your script."
then GoogleGuy replied:
"Yah, robots.txt or meta tags are your friends.. Those should both work fine."
If you just want your users to find the links, you could just put them on another page and setup a prominent link to that page.
Thanks GrinninGordon, Claus, and everyone for your help. I will check it up and post my observations soon :)
My understanding (though I could be wrong) is that you will benefit more from a backlink on a PR5 page with 10 other links than from a similar PR5 page with 100 links. I do NOT recall reading anything authoritative that said something along the lines of "The more links you have on your page, the lower its PR will be."
The code is:
To whom is it passing PR?
I would like a confirmation on this observation. Can this really be true?
Think of it this way. Forget HTML, consider the page to be simply text. Finding absolute urls on the page is easy, and if the search engine spiders wish to do so they can follow those urls whether or not they are links.
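The point about plain text can be illustrated with a crude pattern match. The regex below is a deliberate simplification for illustration - it is not any real spider's parser - but it shows how absolute URLs surface whether or not they sit inside an anchor tag:

```python
import re

# Crude absolute-URL matcher: just "find http(s) runs up to a quote,
# angle bracket, or whitespace". A simplification, not a real parser.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

page = """<p>Visit <a href="http://www.example.com/">our site</a>.</p>
<script>var u = 'http://www.example.org/not-a-link';</script>"""

# Both URLs surface, including the one that is not inside an <a> tag.
found = URL_RE.findall(page)
```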
Hope this helps,
I think Patrick_Taylor's Flash example would do the trick, though. I wonder how long that will last (since Google and Macromedia already have some kind of Flash spidering arrangement going on).
If links can be hidden from the search engines, I'm going to start exchanging links with any site, including link farms. I don't recall anyone that wanted to trade links ever asking if the link would be picked up by Google and the other search engines.
There are a lot of cgi links pointing to my site. These are not read by spiders (drat, drat and double drat).
Could you put all your links in an Access DB (or any DB) and have them pulled out by some client-side method that Google cannot follow? If you just have them in a text file, Google can look at it - though I guess you could disallow the text file. Google can't parse some scripts, but it can look at the text and glean URLs from it.
|If links can be hidden from the search engines, I'm going to start exchanging links with any site, including link farms. I don't recall anyone that wanted to trade links ever asking if the link would be picked up by Google and the other search engines. |
That won't help you with link farms, because they have bots that check whether you're linking back to them, and those bots won't see the link. They usually specifically require an <a href="URL">DESC</a> link.
I guess it's all guesswork until we test it. But since people are reporting Googlebot fetching their .js files, Google must have some agenda with them...
i.e. the url http://www.xyz.com/dir/s.asp?l=111 actually leads to abc.com and if we check the backlinks for abc.com, the link http://www.xyz.com/dir/s.asp?l=111 does appear.
Examples used above are fictitious ;)
<a href="#" onclick="window.location='http://www.domain.com';return false;">link</a>
I have tested the above successfully; it isn't detected by Google.