Forum Moderators: Robert Charlton & goodroi
Recently, in one of Matt's videos he also commented that the matter was complex.
When I looked through these forums [ unless I missed something ] I could see nothing that described the elements in a high-level format that could be broken down and translated into a framework for easy management.
Does anyone believe they have mastered the comprehensive management of dupe content on Google into a format that can be shared on these forums?
Site started May 2006.
Sep 06 - 103 pages indexed, 70 supplemental.
Oct 06 - 106 pages indexed, 103 supplemental.
Ripped out a whole section of javascript drop-down navigation links which took up about 1/4 of the html on each page (and identical on every page), so it may have been seen by G as duplicate content.
Now: - 104 indexed, 70 supplemental, and going down every day.
Are the Supplemental Results for pages that are live, or are they for URLs that redirect or are 404?
Just counting "how many" URLs have a "Supplemental" status is a futile exercise, as the only Supplemental results that ever need any sort of fixing are those that represent live active pages on the site.
.
Tip: Put javascript into an external file, and call it with:
<script type="text/javascript" src="/jscode/the.script.js"></script>
http://www.example.com./
If you put a . at the end of the URL, the site returns the same result as www.example.com without the . at the end. It even works on Google's own site - [google.com....]
Would this be considered duplicate content of the homepage? How does Googlebot handle this?
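The dotted form is just the fully-qualified (root-anchored) spelling of the same DNS name, which is why both resolve identically. If you want to compare or canonicalize hosts yourself, a minimal sketch in JavaScript (the function name is mine, not anything Google publishes):

```javascript
// Normalize a hostname for comparison: lowercase it and strip the
// optional trailing dot ("www.example.com." is the root-anchored
// spelling of the same DNS name as "www.example.com").
function normalizeHost(host) {
  let h = host.toLowerCase();
  if (h.endsWith(".")) h = h.slice(0, -1);
  return h;
}
```

A server-side redirect from the dotted form to your canonical host would use the same comparison before issuing the 301.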
So I guess I see it as a good thing that these pages are now coming out of the supplemental results.
Page impressions are up 30% since these pages have started to come out of the supplemental results.
[edited by: tedster at 2:52 am (utc) on Oct. 14, 2006]
[edit reason] use example.com [/edit]
What is different about each page? What makes it different to the other pages? Put those facts in the title.
.
A SERP that says:
mysite.com - Sitemap
mysite.com - Sitemap
mysite.com - Sitemap
mysite.com - Sitemap
mysite.com - Sitemap
mysite.com - Sitemap
mysite.com - Sitemap
mysite.com - Sitemap
mysite.com - Sitemap
mysite.com - Sitemap
mysite.com - Sitemap
mysite.com - Sitemap
is totally useless to a potential visitor.
.
On the other hand, titles like:
mysite.com - Sitemap - Widgets
mysite.com - Sitemap - Gadgets
mysite.com - Sitemap - Gizmos
mysite.com - Sitemap - Doodads
are much better (but usually with the generic "mysite.com" information last on the line, not first).
[edited by: tedster at 4:24 pm (utc) on Oct. 14, 2006]
[edit reason] use example.com [/edit]
That's why URLs with session IDs spread in the SERPs, why non-www and www spread in the SERPs, but extra redundant parameters quickly fade away. That is, a URL that is only generated from the outside is not trusted as much as one that is also generated from within the site.
You have more than one page for your site map?
We have around 2,000 sitemap pages restricted to approximately 40 links per page. The idea was to drive link strength and make indexing faster with smaller pages [ around 9k ].
The meta titles/descriptions are like this:
Sitemap 1 - Widget 1
About Us - Template link 1 , template link 2
Sitemap 2 - Widget 2
About Us - Template link 1 , template link 2
There is no meta description, so this is filled with text taken from the links on the site template.
There is no advantage to visitors to view these pages, except possibly from a navigation point of view.
From our perspective the only thing that concerns me is the lack of speed with which Google is indexing the sites, and I wondered if duplicated content, or content that is deemed too similar on sitemap pages, might inhibit the overall indexing process for a site.
[edited by: Whitey at 3:13 am (utc) on Oct. 15, 2006]
We have sites where there are subheadlines (intro to a story) that are pretty good descriptions of what the page is all about. How safe is it to use them as descriptions?
Or should I get them to write descriptions for Google separately?
To anyone who truly recovered from duplicate penalties... How long did it take for your site to recover from those penalties AFTER you applied the fixes?
Did getting some fresh links (in the process) help?
On one of my test sites I am doing a Disallow: * to check if all the duplicates can be removed from the index faster than it would normally take....
Yes, that type of Supplemental Result is shown in the search results for one year after the page is deleted. This is so that people who looked at that information some time before can still find that URL again, and then either view the Google cache of the now-gone page, or visit some other part of your site instead.
>> Tried to also disallow the non-existing urls via the robots.txt with no luck whatsoever. <<
No. Don't do that. Google needs to "see" the 404 status in order to start the 'removal clock' ticking. The "noindex" tag doesn't help much in this case. The "noindex" removes URLs from the normal index, but seems to have little effect on Supplemental Results.
I use the page content, always over 700 words, and the first 100 words are my meta description.
The titles can be quite similar at times but this is somewhat necessary:
How to Find Widgets in Blue.
Classic Widgets in Red
etc.
Perfect Meta Titles and Descriptions
Have a look at Pageoneresults' post over here - it looks pretty much bang on: [webmasterworld.com...]
Meta Title
Page title elements are normally 3-9 words (60-80 characters) maximum in length, no fluff, straight and to the point. This is what shows up in most search engine results as a link back to your page.
Meta Description
The meta Description Tag usually consists of 25 to 30 words or less using no more than 160 to 180 characters total (including spaces). The meta description also shows up in many search engine results as a summary of your site.
Hope that helps.
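If you're pulling the description from page copy (as mentioned a few posts up), the length guidance above can be sketched as a small helper - a rough illustration in JavaScript, with the ~160-character ceiling taken from the guideline quoted here, function name my own:

```javascript
// Build a meta description from the opening words of the page copy,
// cutting at a whole word so the result stays under maxChars
// (~160 characters per the guideline above).
function buildMetaDescription(text, maxChars = 160) {
  const words = text.trim().split(/\s+/);
  let out = "";
  for (const w of words) {
    const next = out ? out + " " + w : w;
    if (next.length > maxChars) break; // stop before exceeding the limit
    out = next;
  }
  return out;
}
```

The same word-by-word approach works for titles with a smaller ceiling (e.g. 60-80 characters).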
I've heard of folks using content extracted from the page to populate the meta description, to make it relevant.
[edited by: Whitey at 8:34 am (utc) on Oct. 18, 2006]
For my part, I think Google is smart enough to know that example.com and example.com/index.html are the same. I base this on the presumption that the two are very unlikely to ever differ.
So for my two cents, the /index.html issue is a possible but not very likely source of duplicate content.
(I did redirect my http://example.com links to www though using cpanel.. but I'll try to keep it on one thing at a time.)
It was my "terrible" assumption. But I am smiling again - at least on duplicate content issues [ maybe not filters overall ].
[edited by: Whitey at 3:28 am (utc) on Oct. 19, 2006]
This issue has happened to me - index.html specifically on one site, and has not recovered after 2 months.
Whitey - thanks for the heads up on a great post about metas and titles and uniqueness :P
RewriteCond %{THE_REQUEST} ^.*\/index\.html?
RewriteRule ^(.*)index\.html?$ http://www.example.com/$1 [R=301,L]
I can't read mod-rewrite rules so I hope it is enough. At any rate, you guys may have saved my internet life .. but we'll never know for sure.
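For anyone else who can't read mod_rewrite, here are the same two directives with comments (assuming www.example.com stands in for your own canonical host):

```apache
# Condition: only fire when the client literally requested .../index.html
# (THE_REQUEST is the raw request line, e.g. "GET /index.html HTTP/1.1").
# This prevents redirect loops when Apache internally maps "/" to index.html.
RewriteCond %{THE_REQUEST} ^.*\/index\.html?
# Rule: capture everything before "index.html" (or "index.htm") and
# 301-redirect to the bare directory URL.
# [R=301,L] = permanent redirect, and stop processing further rules.
RewriteRule ^(.*)index\.html?$ http://www.example.com/$1 [R=301,L]
```

So a request for /foo/index.html gets a permanent redirect to /foo/, which is exactly the canonicalization being discussed.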
When I look at HTTP headers using an online tool, I see my server is sending a 301 status (perfect), but it also sends a typical Apache error page:
<HTML><HEAD>
<TITLE>301 Moved Permanently</TITLE>
</HEAD><BODY>
<H1>Moved Permanently</H1>
The document has moved <A HREF="http://www.domain.com/">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.36 Server at domain.com Port 80</ADDRESS>
</BODY></HTML>
Maybe it's a silly question but, is this ok? Will Google index that page? Should I personalize that page and add a noindex metatag?
<head>
<style>
a:link{font:8pt/11pt verdana; color:red}
a:visited{font:8pt/11pt verdana; color:#4e4e4e}
</style>
<meta HTTP-EQUIV="Content-Type" Content="text-html; charset=Windows-1252">
<title>Cannot find server</title>
</head>
<SCRIPT>
function doNetDetect() {
saOC.NETDetectNextNavigate();
document.execCommand('refresh');
}
function initPage()
{
document.body.insertAdjacentHTML("afterBegin","<object id=saOC CLASSID='clsid:B45FF030-4447-11D2-85DE-00C04FA35C89' HEIGHT=0 width=0></object>");
}
</SCRIPT>
<body bgcolor="white" onload="initPage()">
<table width="400" cellpadding="3" cellspacing="5">
<tr>
<td id="tableProps" valign="top" align="left"><img id="pagerrorImg" SRC="pagerror.gif"
width="25" height="33"></td>
<td id="tableProps2" align="left" valign="middle" width="360"><h1 id="textSection1"
style="COLOR: black; FONT: 13pt/15pt verdana"><span id="errorText">The page cannot be displayed</span></h1>
</td>
</tr>
<tr>
<td id="tablePropsWidth" width="400" colspan="2"><font
style="COLOR: black; FONT: 8pt/11pt verdana">The page you are looking for is currently
unavailable. The Web site might be experiencing technical difficulties, or you may need to
adjust your browser settings.</font></td>
</tr>
<tr>
<td id="tablePropsWidth" width="400" colspan="2"><font id="LID1"
style="COLOR: black; FONT: 8pt/11pt verdana"><hr color="#C0C0C0" noshade>
<p id="LID2">Please try the following:</p><ul>
<li id="instructionsText1">Click the
<a href="javascript:location.reload()" target="_self">
<img border=0 src="refresh.gif" width="13" height="16"
alt="refresh.gif (82 bytes)" align="middle"></a> <a href="javascript:location.reload()" target="_self">Refresh</a> button, or try again later.<br>
</li>
<li id="instructionsText2">If you typed the page address in the Address bar, make sure that
it is spelled correctly.<br>
</li>
<li id="instructionsText3">To check your connection settings, click the <b>Tools</b> menu, and then click
<b>Internet Options</b>. On the <b>Connections</b> tab, click <b>Settings</b>.
The settings should match those provided by your local area network (LAN) administrator or Internet service provider (ISP). </li>
<li ID="list4">If your Network Administrator has enabled it, Microsoft Windows
can examine your network and automatically discover network connection settings.<BR>
If you would like Windows to try and discover them,
<br>click <a href="javascript:doNetDetect()" title="Detect Settings"><img border=0 src="search.gif" width="16" height="16" alt="Detect Settings" align="center"> Detect Network Settings</a>
</li>
<li id="instructionsText5">
Some sites require 128-bit connection security. Click the <b>Help</b> menu and then click <b> About Internet Explorer </b> to determine what strength security you have installed.
</li>
<li id="instructionsText4">
If you are trying to reach a secure site, make sure your Security settings can support it. Click the <B>Tools</b> menu, and then click <b>Internet Options</b>. On the Advanced tab, scroll to the Security section and check settings for SSL 2.0, SSL 3.0, TLS 1.0, PCT 1.0.
</li>
<li id="list3">Click the <a href="javascript:history.back(1)"><img valign=bottom border=0 src="back.gif"> Back</a> button to try another link. </li>
</ul>
<p><br>
</p>
<h2 id="IEText" style="font:8pt/11pt verdana; color:black">Cannot find server or DNS Error<BR> Internet Explorer
</h2>
</font></td>
</tr>
</table>
</body>
</html>