| 4:59 pm on Jun 12, 2003 (gmt 0)|
Thank you Googleguy for enhancing our enlightenment here.
In connection with spam reporting and the development of algorithmic solutions, a couple of areas not touched upon, and which occupy a great deal of WebmasterWorld bandwidth, are 1) cross-linking and 2) duplicate content.
I'm wondering if you would be willing to entertain questions here, or perhaps in another thread, regarding those two rather perplexing topics. In particular, are they a hot topic within Google as far as spam techniques are concerned, and are algorithmic changes being developed to address these areas?
No one seems to know, for instance, what constitutes "excessive cross-linking", leaving hidden links aside.
| 5:27 pm on Jun 12, 2003 (gmt 0)|
As far as I know, the GoogleBar from mozdev.org doesn't show PageRank (or an estimate of PageRank). Would Google engineers help out in including PageRank support for the mozdev GoogleBar?
[edited by: WebGuerrilla at 5:28 pm (utc) on June 12, 2003]
[edit reason] stay ontopic please [/edit]
| 5:34 pm on Jun 12, 2003 (gmt 0)|
Thanks, Google Guy.
I authored a page which has four words in semi-hidden text (#c0c0c0 on a #c0cccc background). I was concerned about alternative spellings for color wheel and wanted to include them all. The page hasn't been penalized by Google, but I wonder if this could end up being a problem.
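For anyone wanting to put a number on how "semi-hidden" a colour pair like this is, here is a minimal Python sketch using the WCAG 2.x contrast-ratio formula. This is only an assumption about how low contrast could be measured; nothing in this thread says Google's hidden text detection works this way, and the function names are made up for illustration.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour):
    """Relative luminance of a colour written as '#rrggbb'."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground, background):
    """WCAG contrast ratio: 1 for identical colours, 21 for black on white."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


# The colour pair mentioned in the post above.
print(round(contrast_ratio("#c0c0c0", "#c0cccc"), 2))
```

For the pair in the post the ratio comes out at roughly 1.1, on a scale that runs from 1 (identical colours) to 21 (black on white), so a human reader would barely see the text.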
| 5:38 pm on Jun 12, 2003 (gmt 0)|
....must....optimize....for...the phrase: "cheapest haircuts near googleplex"
Seriously, GG thanks for your responses.
About spam reports. If someone detects a spam "methodology", how should one flag it? For instance, my instinct would be to put up an (orphan) page on one of my sites with clear examples and links to the offenders, etc., rather than trying to fit it all into the little spam report form, and just say "spam methodology report - see link for details - webmasterworld id:X". However, I would be afraid that the harried spam report staff would just ignore it, or worse, send the spambot to the special page, and next thing I know I would be graybarred.
How much human interaction is involved with the spam reports? Are they read by human eyes first, or filtered automatically into "send the spambot" and "needs human review"?
[edited by: PatrickDeese at 5:59 pm (utc) on June 12, 2003]
| 5:38 pm on Jun 12, 2003 (gmt 0)|
>>I'm wondering if you would be willing to entertain questions here, or perhaps in another thread
As I mentioned at the beginning of this thread, any follow-up questions/responses that take place in this thread need to be directly related to the questions GoogleGuy chose to answer.
I understand that there are many members who don't care for those ground rules, but they are necessary if we want to keep the door open for future Q & A type threads.
| 5:45 pm on Jun 12, 2003 (gmt 0)|
Thanks for taking the time to answer all those questions, GoogleGuy.
| 6:10 pm on Jun 12, 2003 (gmt 0)|
Question 1: How can I be sure that my page won't get screwed for using accessibility techniques like placing an image above the text (so people browsing with a mobile [or, hell, even with a refrigerator these days] will see my text instead of the image)? Let's say you do:
<h1>Google<img src='google.gif' alt='google' /></h1>
Now people without <img /> support will see <h1>Google</h1>, while those with it will see the <img />.
Question 2: What about other similar things, such as dynamic menus that depend on onmouseover/out with display: none? Is Google also protected from being misled by this?
Question 3: Is Google also taking data from tags like title?
| 6:20 pm on Jun 12, 2003 (gmt 0)|
Did anyone else notice how the question of whether GoogleGuy has ever optimised someone's site was skipped in favour of the guy he buried for hidden text?
| 7:01 pm on Jun 12, 2003 (gmt 0)|
As a newish member of Webmasterworld - but a long time reader....
As WebGuerrilla pointed out in post 95, please keep on topic. If we lose GoogleGuy (even if some posts are vague), WebmasterWorld loses a great resource!
Please don't criticize GoogleGuy so much - he has given so much to WebmasterWorld already :)
| 8:01 pm on Jun 12, 2003 (gmt 0)|
Thanks for the feedback!
|In an ideal world, webmasters would only worry about making great sites for users, and Google would follow that to find the best sites that users loved, and score those useful sites highly. |
Sounds like Nirvana. How many more levels til we get there?
| 8:04 pm on Jun 12, 2003 (gmt 0)|
I have mentioned before innocent people being banned for hidden text. Spoilers are one thing that comes to mind.
For example - google yields almost exactly 10,000 results when searching for [highlight spoilers] while all the web has over 50,000 results.
I am of course among those that think it is better to ignore than penalize.
If you do it the current way, are you able to reinclude sites that use this as an integral part of their site?
Also: [font color=#FFFF11]It seems unfair to be able to ban a page on most boards by doing something like this[/font]
Last Also: Now all someone has to do to fool Google's invisible text filter is do the search that I did and copy their methods for hiding spoilers.
Seems like google will spend as much time trying to make invisible text filters as good SERPs. I hate to see 100 posts a month with "My competitor is using invisible text - and google can't catch them."
| 9:30 pm on Jun 12, 2003 (gmt 0)|
GG, thanks for your comments.
There's been a lot of discussion about hidden text so I don't feel I'm off topic.
Would the new algo penalise a site that used the same font colour as the page background if this text was within a table with a different colour?
I ask this question as a coloured side navigation bar in say blue with white text links within a white page with the main content in black or blue would look nice and be pleasing to the eye (also I have some sites like this).
Moving on, I also have some sites where the background colour of the table is set by using a background image. I can imagine that these sites could be spidered and penalised, as it could appear that hidden text is in use.
I have started recoding the sites (none of which had hidden text and some don't look quite as good as before), for example some had a background image the width of the table with a feature on the left to make it look more arty, these will be replaced with a table background colour, also white text/links on blue would be changed for black text/links on blue.
On other sites I've taken out the background colour from the body tag, white background still shows and I have white links in a blue table in the navigation and black text on a white background in the main page content.
Am I wasting my time or being paranoid?
It would be very useful to know exactly what constitutes hidden text as I learnt a long time ago that trying to trick search engines in this way would only give short term results and I like to build sites that last but my customers like them to look good.
I'm sure that if we were all a little clearer on what constitutes hidden text, the web building community would be able to avoid potential pitfalls.
| 10:20 pm on Jun 12, 2003 (gmt 0)|
>It would be very useful to know exactly...
Please - the moderators are busting a gut to keep this thread to discussions about the answers, and me, just a humble user, I'm scrolling past message after message that should be a new thread.
Let's start new threads for new questions and keep this one as a concise and focused summary of the questions that were answered...
| 10:28 pm on Jun 12, 2003 (gmt 0)|
GoogleGuy // cloaking //
I'm involved in the Flash community and one of the things that is very hard for Flash developers is to get good rankings for sites that use Flash Based Content.
Now I read recently at [sitepoint.com...] that you should cloak, that is, have a database of your content, test to see if GoogleBot (for example) is visiting, and serve it the content you have on file. The author of that article does state you should ONLY serve the actual content.
But how is Google going to like this? And on that topic, what plans does Google have for indexing Flash-based content? (E.g. AltaVista and Atomz both apparently can index Flash-based content.)
| 10:34 pm on Jun 12, 2003 (gmt 0)|
<A: Let me answer a more interesting question: have you ever taken action on a relative/friend's site that had spam? And the answer to that is yes. :) Our hidden text detection recently found hidden text on the page of someone I knew from college. That page got the same treatment as any other page. When the white-on-white text was removed, the page came back just fine and everyone was happy. The take-home message is that the spam guidelines apply uniformly.>
GG opened the subject of hidden text and I for one would like to know what is constituted as hidden text, I'm also sure that this information would be of value to a lot of us on this forum.
<despair>Maybe I should keep my questions to myself, I thought it was a valid point, after all what is white on white text, time will tell.</despair>
[edited by: Marcia at 11:00 pm (utc) on June 13, 2003]
[edit reason] Let's stay on topic, please. [/edit]
| 10:43 pm on Jun 12, 2003 (gmt 0)|
|Don't use unauthorized computer programs to submit pages, check rankings, etc. Such programs consume computing resources and violate our terms of service. Google does not recommend the use of products such as WebPosition Gold™ that send automatic or programmatic queries to Google. |
I was wondering if this [touchgraph.com] falls in the category of unauthorized computer programs?
| 11:21 pm on Jun 12, 2003 (gmt 0)|
GG, why does Google continue to respect ODP?
Surely the DMOZ is by now seriously compromised?
Their systems seem to be technically flawed to the extent of being close to unusable, and their independence is now completely gone as they are controlled by AOL.
| 11:26 pm on Jun 12, 2003 (gmt 0)|
>>GG opened the subject of hidden text and I for one would like to know what is constituted as hidden text, I'm also sure that this information would be of value to a lot of us on this forum. <<
Well, I think it's an eminently sensible question, Symbios, and my suggestion was that you started a new thread with this topic, in the hope of disentangling it from an essentially closed set of topics, and of making sure that the thread got noticed.
I'm sorry you felt that my post offered the least value of those you mentioned, and you have a fair point - I should have left it to the moderators. However, I wasn't writing to dole out rhetoric, merely out of a desire to see this thread neat, compact and self-contained.
That said, I'm delighted to hear that you share a domicile on the South Coast of the UK, and others here really won't know what they're missing. But you play your cards much closer to your chest than I do, as your profile is as empty as my bank account...
My thoughts are that your site is fine. The definition of Hidden Text is not "White", it's "Hidden", and you seem to have no desire to conceal, merely to present clearly.
Which is what good website design is all about, after all.
| 11:33 pm on Jun 12, 2003 (gmt 0)|
>Their systems seem to be technically flawed to the extent of being close to unusable,
Works adequately on the editor side. As for people having trouble using the directory at dmoz.org, from Google's point of view the fact their mirror of it is 100% reliable is a plus for them.
>and their independence is now completely gone as they are controlled by AOL.
I see zero evidence that AOL is trying to manipulate the ODP for their own end. More likely is that AOL barely notices or cares about the ODP.
| 12:41 am on Jun 13, 2003 (gmt 0)|
|I'm not positive that I'm a huge fan of the theming arguments that people have made--some of the most useful links I've seen are from "off-topic" sites--but I would definitely agree that it helps users to link to useful, relevant, related sites. |
I've never thought there was any theming in Google's algo. If you read "The Anatomy of a Large-Scale Hypertextual Web Search Engine" it explains how they use two inverted indexes on the front end to get a list of document IDs for further investigation.
One is the "fancy" index and the other is the "plain" index. The first one indexes words in URLs, anchor text, and titles. The second one indexes all words.
If they have enough hits after checking the "fancy" index, then they don't even have to go on and check the larger "plain" index. Lots of searches are satisfied without even needing to go on to the plain index. Each of the "hit lists" derived from each of these two indexes uses a different set of metrics. For example, fancy words consider font, capitalization, etc. Plain words consider position in the document and proximity.
This approach explains why, to my mind at least, the keywords that are "fancy" tend toward the top of the SERPs. There's nothing "theming" about it; it's a hit counter. A one or a zero!
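The two-index lookup described above can be sketched in a few lines of Python. This is a toy illustration of the idea as summarized in this post, not Google's code: the sample documents, the `enough` threshold, and the function names are all assumptions, and the real system scores hits rather than just collecting document IDs.

```python
from collections import defaultdict


def build_index(docs, field_names):
    """Map each lowercase word to the set of doc IDs containing it in the given fields."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for name in field_names:
            for word in fields.get(name, "").lower().split():
                index[word].add(doc_id)
    return index


def lookup(query, fancy, plain, enough=2):
    """Try the small 'fancy' index first; fall back to 'plain' only if hits are too few."""
    words = query.lower().split()

    def match_all(index):
        sets = [index.get(w, set()) for w in words]
        return set.intersection(*sets) if sets else set()

    hits = match_all(fancy)
    if len(hits) >= enough:  # hypothetical threshold for "enough hits"
        return sorted(hits), "fancy"
    return sorted(hits | match_all(plain)), "plain"


# Invented sample documents: title and anchor text are "fancy", body is "plain" only.
DOCS = {
    1: {"title": "Color wheel basics", "anchor": "color wheel", "body": "how to mix paint"},
    2: {"title": "Paint shop", "anchor": "paint", "body": "we sell a color wheel and brushes"},
    3: {"title": "Color wheel chart", "anchor": "", "body": "printable chart"},
}
fancy_index = build_index(DOCS, ["title", "anchor"])
plain_index = build_index(DOCS, ["title", "anchor", "body"])

print(lookup("color wheel", fancy_index, plain_index))
print(lookup("paint", fancy_index, plain_index))
```

In this toy data the first query is satisfied from the fancy index alone, while the second finds too few fancy hits and falls through to the plain index.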
| 12:49 am on Jun 13, 2003 (gmt 0)|
>>why does Google continue to respect ODP?
dmoz isn't perfect. there are some link farmers in there. yet these sites often have high PR. how much value is given to dmoz links?
| 1:03 am on Jun 13, 2003 (gmt 0)|
We're off topic on what WebGuerrilla asked for but...
Google respects DMOZ because it's the best human edited index on the net.
The value of a DMOZ link for Google is dependent on the PR of the cat and the number of links. The greatest value in being included is that it will ensure that you will be found by every SE, not just Google. If you're in the directory, you will eventually be in G, Ink, Fast, AV, Overture, and your Uncle Bob's homegrown search engine.
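The point about category PR and number of links follows from the PageRank formula in the original Brin and Page paper, where a page passes d * PR / C to each page it links to (d is the damping factor, C the number of outbound links on the page). A rough Python sketch, assuming that published formula still applies; the toolbar's 0-10 figure is not the raw value used here, and the numbers below are invented:

```python
def link_share(page_rank, outbound_links, damping=0.85):
    """PageRank passed along one link, per the published formula d * PR / C."""
    if outbound_links <= 0:
        raise ValueError("page must have at least one outbound link")
    return damping * page_rank / outbound_links


# Same (made-up) raw PageRank, different numbers of listings in the category.
print(link_share(10.0, 20))   # small category
print(link_share(10.0, 200))  # crowded category
```

With the same category PR, a listing in the crowded category receives a tenth of what a listing in the small one does.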
| 1:53 am on Jun 13, 2003 (gmt 0)|
|Q: Are the ODP and Google sites naturally occurring PR10 (PR11 for Google?) or is there some manual intervention for larger sites to ensure they have suitable PR? |
A: It's all natural.
I'm still working over the "DMOZ/ODP PR is natural" statement in my mind.
I'll buy that it's natural to the calculation, but the circumstances surrounding its backlinks are unnatural.
While other mirrors of the ODP aren't indexed and don't count as backlinks [google.com], every page of Google's Google Directory mirror links to the ODP home page (dmoz.org) and also links to dmoz.org/about.html and these links do count as hugely valuable backlinks. Just a sidenote, the "about" page has even more backlinks (30,000 more [google.com]) and a PR10 compared to the home page's PR9.
So in this sense, it is not natural, whether you want to make the case that the other mirrors should count or Google's mirror should not. Based on their own standards, no mirror should be indexed. It seems to me that google sought to boost the importance of the ODP in this manner while maintaining the integrity of the PR system.
| 2:11 am on Jun 13, 2003 (gmt 0)|
Fair argument, Dolemite. I guess Google couldn't bear to PR0 their own directory. ;)
| 2:41 am on Jun 13, 2003 (gmt 0)|
Actually, to be fair, Google Directory changes DMoz data in a way that makes it more useful (at least to me). Ordered by PR (approximately) rather than alphabetically (which can be gamed by webmasters or cat editor) and links to subcats are all listed together, instead of in the little groups cat editors like to stick them in (maybe because they want to set off their pet subcat or something -- I have seen this happen). Plus, Google bolds categories with many entries below them. I find it much more useful than dmoz straight up.
| 5:47 am on Jun 13, 2003 (gmt 0)|
Hey, I just wanted to thank everyone for being polite 'n' all. It definitely makes me want to do the Q&A thing again in a while. After the next update, I'd also like to get people's feedback (the constructive kind :) to make sure we're doing everything we can to give the best search results.
| 5:58 am on Jun 13, 2003 (gmt 0)|
Congratulations on the introduction of the slide show, but it is not so helpful, as it takes so much time to download results.
Thanks for the answers.
This thread seems to deal extensively with spam-related issues. There are many more questions for Google, as we know that a great many major changes are happening at Google related to issues like spam (which you have already answered), backlinks, PageRank algorithms, site submission and inclusion of web pages (for both new and old websites) in Google search and image search, multiple domains, etc.
Would you like to answer them in another thread?
| 6:00 am on Jun 13, 2003 (gmt 0)|
|After the next update, I'd also like to get people's feedback (the constructive kind :) to make sure we're doing everything we can to give the best search results. - GoogleGuy |
So there is going to be a next update :)
| 6:00 am on Jun 13, 2003 (gmt 0)|
So there will be at least one more "update" in the sense we understand it. Good stuff, thank you, GG.
| 6:12 am on Jun 13, 2003 (gmt 0)|
Happy to try to help, Beachboy. :)
Okay, it's late and I'm gonna surf around the rest of the web for a while before heading to bed..
| 6:28 am on Jun 13, 2003 (gmt 0)|
Thanks Googleguy, don't get lost in the gap though.