In connection with spam reporting and the development of algorithmic solutions, a couple of areas not touched upon, and which occupy a great deal of WebmasterWorld bandwidth, are 1) cross-linking and 2) duplicate content.
I'm wondering if you would be willing to entertain questions here, or perhaps in another thread, regarding those two rather perplexing topics. In particular, are they a hot topic within Google as far as spam techniques are concerned, and are algorithmic changes being developed to address these areas?
No one seems to know, for instance, what constitutes "excessive cross-linking", leaving hidden links aside.
As far as I know, the GoogleBar from mozdev.org doesn't show PageRank (or an estimate of PageRank). Would Google engineers help out by including PageRank support for the mozdev GoogleBar?
[edited by: WebGuerrilla at 5:28 pm (utc) on June 12, 2003]
[edit reason] stay ontopic please [/edit]
....must....optimize....for...the phrase: "cheapest haircuts near googleplex"
Seriously, GG thanks for your responses.
About spam reports: if someone detects a spam "methodology", how should one flag it? For instance, my instinct would be to put up an (orphan) page on one of my sites with clear examples and links to the offenders, etc., rather than trying to fit it all into the little spam report form, and just say "spam methodology report - see link for details - webmasterworld id:X". However, I would be afraid that the harried spam report staff would just ignore it or, worse, send the spambot to the special page, and the next thing I know I would be graybarred.
How much human interaction is involved with the spam reports? Are they read by human eyes first, or filtered automatically into "send the spambot" and "needs human review"?
[edited by: PatrickDeese at 5:59 pm (utc) on June 12, 2003]
As I mentioned at the beginning of this thread, any follow-up questions/responses that take place in this thread need to be directly related to the questions GoogleGuy chose to answer.
I understand that there are many members who don't care for those ground rules, but they are necessary if we want to keep the door open for future Q & A type threads.
Now you know people without <img /> will see <h1>Google</h1> while those with it will see <img />
Question 2: What about other similar things - dynamic menus depending on onmouseover/out with display: none? Is Google also protected from being misled by this?
Question 3: Is Google also taking data from tags like title?
As WebGuerrilla pointed out in post 95 - please keep on topic - if we lose GoogleGuy (even if some posts are vague), WebmasterWorld loses a great resource!
Please don't criticize GoogleGuy so much - he has given so much to WebmasterWorld already :)
For example - google yields almost exactly 10,000 results when searching for [highlight spoilers] while all the web has over 50,000 results.
I am of course among those that think it is better to ignore than penalize.
If you do it the current way, are you able to reinclude sites that use this as an integral part of their site?
Also: [font color=#FFFF11]It seems unfair to be able to ban a page on most boards by doing something like this[/font]
Last Also: Now all someone has to do to fool Google's invisible text filter is do the search that I did and copy their methods for hiding spoilers.
Seems like google will spend as much time trying to make invisible text filters as good SERPs. I hate to see 100 posts a month with "My competitor is using invisible text - and google can't catch them."
There's been a lot of discussion about hidden text so I don't feel I'm off topic.
Would the new algo penalise a site that used the same font colour as the page background if this text was within a table with a different background colour?
I ask this question as a coloured side navigation bar in say blue with white text links within a white page with the main content in black or blue would look nice and be pleasing to the eye (also I have some sites like this).
Moving on, I also have some sites where the background colour of the table is set by using a background image. I can imagine that these sites could be spidered and penalised, as it could appear that hidden text is in use.
I have started recoding the sites (none of which had hidden text and some don't look quite as good as before), for example some had a background image the width of the table with a feature on the left to make it look more arty, these will be replaced with a table background colour, also white text/links on blue would be changed for black text/links on blue.
On other sites I've taken out the background colour from the body tag, white background still shows and I have white links in a blue table in the navigation and black text on a white background in the main page content.
Am I wasting my time or being paranoid?
It would be very useful to know exactly what constitutes hidden text as I learnt a long time ago that trying to trick search engines in this way would only give short term results and I like to build sites that last but my customers like them to look good.
I'm sure that if we were all a little clearer on what constitutes hidden text, the web building community would be able to avoid potential pitfalls.
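To make the white-links-in-a-blue-table case above concrete, here is a minimal sketch of one way a text/background check could work. It is purely illustrative and is not Google's filter, which nobody here has seen. It uses the WCAG contrast-ratio formula; the function names, the `threshold` value, and the assumption that text is compared against the nearest enclosing background colour (falling back to white) are all mine.

```python
def _channel(c8: int) -> float:
    # Linearise one 8-bit sRGB channel (WCAG relative-luminance formula).
    c = c8 / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_color: str) -> float:
    # Relative luminance of a colour written as "#RRGGBB".
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(fg: str, bg: str) -> float:
    # WCAG contrast ratio: 1.0 means identical colours, 21.0 is black on white.
    a, b = luminance(fg), luminance(bg)
    hi, lo = max(a, b), min(a, b)
    return (hi + 0.05) / (lo + 0.05)


def looks_hidden(text_color: str, backgrounds: list, threshold: float = 1.2) -> bool:
    # `backgrounds` lists background colours from the innermost container
    # outwards; None means "not set on this element" (hypothetical model).
    # The threshold of 1.2 is an arbitrary illustrative value.
    nearest = next((b for b in backgrounds if b is not None), "#FFFFFF")
    return contrast_ratio(text_color, nearest) < threshold


# White links inside a blue table cell on a white page: not hidden.
print(looks_hidden("#FFFFFF", ["#0000FF", "#FFFFFF"]))
# White text with no table background on a white page: hidden.
print(looks_hidden("#FFFFFF", [None, "#FFFFFF"]))
```

Under this model the blue navigation bar with white links passes, because the text is compared with the table's own colour rather than the page's. A table whose colour comes only from a background image has no colour value to compare against, which is exactly the ambiguous case raised above.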
Please - the moderators are busting a gut to keep this thread to discussions about the answers, and me, just a humble user, I'm scrolling past message after message that should be a new thread.
Let's start new threads for new questions and keep this one as a concise and focused summary of the questions that were answered...
I'm involved in the Flash community and one of the things that is very hard for Flash developers is to get good rankings for sites that use Flash Based Content.
Now I read recently at [sitepoint.com...] that you should cloak, that is have a database of your content and test to see if GoogleBot (for example) is visiting and serve him content you have in your file. The author of that article does state you should ONLY serve the actual content.
But how is Google going to like this? And on that topic, what plans does Google have for indexing Flash-based content? (E.g. AltaVista and Atomz both apparently can index Flash-based content.)
GG opened the subject of hidden text and I for one would like to know what constitutes hidden text. I'm also sure that this information would be of value to a lot of us on this forum.
<despair>Maybe I should keep my questions to myself, I thought it was a valid point, after all what is white on white text, time will tell.</despair>
[edited by: Marcia at 11:00 pm (utc) on June 13, 2003]
[edit reason] Let's stay on topic, please. [/edit]
Don't use unauthorized computer programs to submit pages, check rankings, etc. Such programs consume computing resources and violate our terms of service. Google does not recommend the use of products such as WebPosition Gold™ that send automatic or programmatic queries to Google.
I was wondering if this [touchgraph.com] falls in the category of unauthorized computer programs?
Well, I think it's an eminently sensible question, Symbios, and my suggestion was that you started a new thread with this topic, in the hope of disentangling it from an essentially closed set of topics, and of making sure that the thread got noticed.
I'm sorry you felt that my post offered the least value of those you mentioned, and you have a fair point - I should have left it to the moderators. However, I wasn't writing to dole out rhetoric, merely out of a desire to see this thread neat, compact and self-contained.
That said, I'm delighted to hear that you share a domicile on the South Coast of the UK, and others here really won't know what they're missing. But you play your cards much closer to your chest than I do, as your profile is as empty as my bank account...
My thoughts are that your site is fine. The definition of Hidden Text is not "White", it's "Hidden", and you seem to have no desire to conceal, merely to present clearly.
Which is what good website design is all about, after all.
Works adequately on the editor side. As for people having trouble using the directory at dmoz.org, from Google's point of view the fact their mirror of it is 100% reliable is a plus for them.
>and their independance is now completly gone as they are controled by AOL.
I see zero evidence that AOL is trying to manipulate the ODP for their own end. More likely is that AOL barely notices or cares about the ODP.
I'm not positive that I'm a huge fan of the theming arguments that people have made--some of the most useful links I've seen are from "off-topic" sites--but I would definitely agree that it helps users to link to useful, relevant, related sites.
I've never thought there was any theming in Google's algo. If you read "The Anatomy of a Large-Scale Hypertextual Web Search Engine" it explains how they use two inverted indexes on the front end to get a list of document IDs for further investigation.
One is the "fancy" index and the other is the "plain" index. The first one indexes words in URLs, anchor text, and titles. The second one indexes all words.
If they have enough hits after checking the "fancy" index, then they don't even have to go on and check the larger "plain" index. Lots of searches are satisfied without even needing to go on to the plain index. Each of the "hit lists" derived from each of these two indexes uses a different set of metrics. For example, fancy words consider font, capitalization, etc. Plain words consider position in the document and proximity.
This approach explains why, to my mind at least, the keywords that are "fancy" tend toward the top of the SERPs. There's nothing "theming" about it; it's a hit counter. A one or a zero!
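For anyone who wants to see the two-index idea from this post in code, here is a rough sketch. It follows the description given above (a "fancy" index over URL, anchor and title words, a "plain" index over all words, and an early exit when the fancy index yields enough hits); it is not the paper's actual implementation. The document fields, the `enough` cutoff and every function name are assumptions for illustration, and the per-hit weighting (font, capitalization, position, proximity) is left out entirely.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lower-case and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_indexes(docs):
    # docs maps doc_id -> dict with optional "url", "title", "anchor", "body".
    fancy = defaultdict(set)   # words seen in URLs, anchor text and titles
    plain = defaultdict(set)   # every word in the document
    for doc_id, doc in docs.items():
        for field in ("url", "title", "anchor"):
            for word in tokenize(doc.get(field, "")):
                fancy[word].add(doc_id)
        for field in ("url", "title", "anchor", "body"):
            for word in tokenize(doc.get(field, "")):
                plain[word].add(doc_id)
    return fancy, plain


def _lookup(index, words):
    # Documents containing every query word according to this index.
    postings = [index.get(w, set()) for w in words]
    return set.intersection(*postings) if postings else set()


def search(query, fancy, plain, enough=2):
    # Try the small fancy index first; fall back to the plain index only
    # when it returns fewer than `enough` documents.
    words = tokenize(query)
    hits = _lookup(fancy, words)
    if len(hits) >= enough:
        return sorted(hits), "fancy"
    return sorted(_lookup(plain, words)), "plain"


docs = {
    1: {"title": "Cheap haircuts", "body": "Walk in any time."},
    2: {"title": "Salon guide", "anchor": "cheap haircuts", "body": "Prices vary."},
    3: {"title": "Travel blog", "body": "I found cheap haircuts near the office."},
}
fancy, plain = build_indexes(docs)
print(search("cheap haircuts", fancy, plain))
print(search("office", fancy, plain, enough=1))
```

In the first query, documents 1 and 2 satisfy the search from the fancy index alone, so document 3, which only mentions the phrase in its body text, is never consulted. That is the effect described above: pages with the keywords in titles and anchor text surface first without any notion of "theme".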
Google respects DMOZ because it's the best human edited index on the net.
The value of a DMOZ link for Google is dependent on the PR of the cat and the number of links. The greatest value in being included is that it will ensure that you will be found by every SE, not just Google. If you're in the directory, you will eventually be in G, Ink, Fast, AV, Overture, and your Uncle Bob's homegrown search engine.
Q: Are the ODP and Google sites naturally occurring PR10 (PR11 for Google?) or is there some manual intervention for larger sites to ensure they have suitable PR?
A: It's all natural.
I'm still working over the "DMOZ/ODP PR is natural" statement in my mind.
I'll buy that it's natural to the calculation, but the circumstances surrounding its backlinks are unnatural.
While other mirrors of the ODP aren't indexed and don't count as backlinks [google.com], every page of Google's Google Directory mirror links to the ODP home page (dmoz.org) and also links to dmoz.org/about.html and these links do count as hugely valuable backlinks. Just a sidenote, the "about" page has even more backlinks (30,000 more [google.com]) and a PR10 compared to the home page's PR9.
So in this sense, it is not natural, whether you want to make the case that the other mirrors should count or Google's mirror should not. Based on their own standards, no mirror should be indexed. It seems to me that Google sought to boost the importance of the ODP in this manner while maintaining the integrity of the PR system.
Congratulations on the introduction of the slide show, but it is not so helpful, as it takes so much time to download results.
Thanks for the answers.
This thread seems to deal extensively with spam-related issues. There are many more questions for Google, as we know that a lot of major changes are happening at Google related to issues like spam (which you have already answered), backlinks, PageRank algorithms, site submission and inclusion of web pages (for both new and old websites) in Google search and image search, multiple domains, etc.
Would you like to answer them in another thread?