Forum Moderators: Robert Charlton & goodroi
The site's name had been used before, but had not been used for about two years. Best I can tell, it was clean before me. There are no legacy links out there that I can find, so the site pretty much started up fresh. The site was picked up by DMOZ after about two months' time. This led to the normal links appearing. Other than these, I pretty much ignored the linking part of the 26 steps. I was too busy writing content to bother with links.
Google picked up the site quickly, and as of this morning it has 840 pages in its index. The site does well in Yahoo and ASK, and to a lesser degree in MSN. I think the problem with MSN right now is the site's structure, which is section -> category -> article. Through the first couple of days of this month, I got about 80% of my referrals from Yahoo and 10% from ASK; MSN and the rest make up the remainder. Google lags behind engines like Overture.
Articles range from 500 to 1,200 words each; the definitions range from 50 to 500 words. In the beginning, Google led the pack on referrals. I ran an advertising campaign through AdWords and GoClick (which still has some money in the account) in September, then stopped in November.
In December / January alone I added about 300 pages to the site - mostly definitions. In January / February, I did some heavy on-site optimization: H2 tags, bold, internal linking to other articles and definitions.
The site was #1 for its own name from July until around August, when it dipped to around #5. In December, it fell off the face of Google's earth and has remained there since. This morning it ranked #227 for the site's own name. The other interesting observation is that I used to sign my articles with my name. At one time I was #1 for my own name; right now, I am at 100+. (I rank #1 in MSN, Yahoo, and ASK for my name - it is not that common.)
I've reported most of this to Google, but to no avail. At this point, I am wondering what I can do. The site is all white hat, but the site's name has to do with a very competitive area. More recently the site has been appearing in blogs supporting RSS feeds. I do not signature spam; I only submit to feeds and directories. I rarely exchange links.
The site is a PR4, with most other pages at PR3. It has 140 backlinks - increasing with each update. I think that due to the rankings I have in Yahoo for some pretty popular searches, I am starting to get a lot of links from other sites (scrapers...) in addition to reputable sites. Yahoo shows close to 1,000 links.
At this point, I have some options:
Nuke the site from Google (remove it altogether) and let it start all over. This option would not hurt my traffic one bit.
Continue to wait it out.
I am afraid that the site is under some kind of penalty. It really is white hat, but maybe it was a casualty of some filter - I really have no idea. If I continue to wait it out, the penalty may never be lifted. So why not just nuke it now? By nuke it, I mean using the removal tool to remove the entire site from their index.
Any help is deeply appreciated.
w*w.example.com/index2.php?....
shows as 'file format unrecognized' in the SERPs. When you click the 'similar pages' link, your homepage is #1, followed by other related sites.
I sure wouldn't want my homepage related to an "unrecognized file format"
You should consider putting an entry in robots.txt (note that a Disallow rule needs a User-agent line to apply to):
User-agent: *
Disallow: /index2.php
Funny - it seems to have no trouble with /index.php, though.
A more serious problem is the SERP w*w.example.com/%22
this translates as w*w.example.com/"
I think this is directly related to the invalid base href tag on your homepage.
<base href="ht*p://w*w.example.com/" />
this url should be the absolute canonical url of the page example.com/index.html (or whatever the case)
You could just remove that base tag; my guess is that you installed it at the suggestion that it may help prevent 302 problems.
So Google seems to confuse
/
/index2.php
/"
I would fix that base href and then see what it does with that index2 (or you could just disallow it if it doesn't need to be in there).
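If you keep the tag, the safest form is to point base at the page's own full, canonical URI. A sketch (the exact filename here is my assumption - use whatever your CMS actually serves):

```html
<head>
  <!-- base must be the absolute, canonical URI of THIS document,
       not just the site root; index.php is a placeholder -->
  <base href="http://www.example.com/index.php" />
</head>
```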
Also:
I would normally recommend submitting your robots.txt to the Google removal tool to clean up some unwanted SERPs, but NOT in this case: because Google is confused about the URL of your homepage, it may produce unwanted results (removal of your home page).
So fix the base href first; after your homepage has a solid position (no confusion on Google's part), you may want to do that as well.
I took a look at your site; all looks good except for two problems.
. . .
I think this is directly related to the invalid base href tag on your homepage.
I took a look at Billy's main page as well and stickied a few suggestions.
I guess my question on the above issue would be why he needs to use BASE at all - unless it's included here purely to guard against 302 page-jacking?
I think more fundamental issues for him are
1) validating the site, (I know, "nobody validates")
2) changing a few metas: populate description and keywords, lose the others
But most importantly: the multiple H1s on the page. I'd remove all but one and then sub-section the rest into H2 . . . Also, make the H1 text relevant to the site itself, not the first sub-topic on the page, i.e. [mutual fund name].
He's obviously done a lot of work on this site and I think a few of these small coding things may be a fundamental problem. Just my 2 cents. Comments?
You would be barmy to NOT run the site through a validator, even if it were to make sure there were no "show-stopper" major errors in the code.
Having looked at the code for over a thousand sites, I can tell you that 99% of all the HTML errors that I have ever seen are actually just the same 30 errors repeated over and over again. Learn how to fix those 30 problems and you can fix 99% of the whole of the web.
w*w.example.com/index2.php?....
shows as 'file format unrecognized' in the SERPs. When you click the 'similar pages' link, your homepage is #1, followed by other related sites.
It may be that you aren't serving the page with the "text/html" MIME type that is required. Use an HTTP header checker (such as WebBug, or an online offering) to check this out.
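If you'd rather script the check, here is a minimal Python sketch (the function name is my own) that decides whether a Content-Type header value counts as HTML. A feed served as XML would fail it, which matches the 'file format unrecognized' symptom:

```python
def is_html_content_type(header_value):
    """Return True if a Content-Type header value is text/html.

    Ignores parameters like "; charset=utf-8" and is case-insensitive,
    per the usual HTTP header rules.
    """
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == "text/html"

print(is_html_content_type("text/html; charset=utf-8"))  # True
print(is_html_content_type("application/rss+xml"))       # False
```

Fetch the headers with any HTTP client and pass the Content-Type value through a check like this.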
I took a look at this too. This reference is actually to an RSS feed from the site. Not sure why Google cannot read the file. I will be looking into this.
A more serious problem is the SERP w*w.example.com/%22
I have no idea how this got in there. According to the documentation with the CMS I use, the base href is set correctly. The above SERP returns a 410, so I'm not sure why Google is keeping it.
validating the site, (I know, "nobody validates")
The site returns three errors; it used to return something like 20. It's built using Mambo, and I was able to hack some of the code and clean most of them up. The problem is splitting my time between hacking code and writing content. They keep promising that the next version will validate, so I spend more time writing.
2) changing a few metas: populate desc and kw,
Again, not to make excuses, but Mambo has a known shortcoming with meta tags for Section and Category pages (including the home page). Unless I set global metas (the same tags appear on all pages) for description and keywords, it doesn't show any. Again, this is on the hack list.
I'd remove all but 1 and then sub-section the rest into H2
Something I had already hacked, using H1, but I can change it to H2. This one I can fix.
Again thanks to all that took a look for me.
A more serious problem is the SERP w*w.example.com/%22
I have no idea how this got in there. According to the documentation with the CMS I use, the base href is set correctly. The above SERP returns a 410, so I'm not sure why Google is keeping it.
It doesn't matter that your CMS documentation says it is OK.
In December, it fell off the face of Google's earth and has remained there since.
I have no idea how this got in there.
This is how this got in there
<base href="ht*tp://w*w.example.com/" />
translation
This page is located at ht*p://w*w.example.com/"
One thing you need to consider is that bots are a lot dumber than browsers; maybe whoever wrote the CMS documentation (on a program notorious for generating invalid code, esp. in headers) wasn't thinking about bots.
For a dumb bot, the base location of the homepage is usually ht*p://w*w.example.com/index.html
the server automatically translates / to index.html and google knows that, but you are telling it that the location of / is actually /"
google asks for / and gets /"
google asks for /" and gets 404
google lists / as a supplemental
google lists /" as homepage
site falls off the face of Google's earth.
now you are forcing a 410 on /"?
What you should do is give the full URL of the page in the base tag, because that is what the tag means.
Although I would agree that descriptions, H tags etc all play a worthwhile role in ranking, this base tag problem will overshadow any other SEO attempts to get this page ranked.
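To see how a dumb bot could end up with /" as the base, here's a hypothetical Python sketch of a naive tag parser choking on the XHTML self-closing syntax. The parsing logic is purely my own illustration of the theory above, not anyone's actual crawler code:

```python
tag = '<base href="http://www.example.com/" />'

# Naive extraction: grab everything after href= up to the next space.
# The XHTML " />" ending means the token keeps the closing quote.
raw = tag.split("href=", 1)[1].split()[0]

# Buggy cleanup that only strips leading quotes leaves the trailing
# one attached, so the base URI becomes the phantom /" page.
base = raw.lstrip('"')
print(base)  # http://www.example.com/"
```

Under this (assumed) failure mode, every request for / would get filed under /", matching the supplemental-result behavior described above.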
You seem to have a lot of experience with this issue, and I am having a similar problem these days. I'd like to understand your advice to BillyS, but some things are not clear to me.
So, he had this base declaration :
<base href="http://www.example.com/" />
You said it is invalid, though I don't understand why or where - or maybe something got cut or edited in your post.
Also, you said this :
the server automatically translates / to index.html and google knows that, but you are telling it that the location of / is actually /"
Where is that double quote coming from?
My problem is that I got my site indexed without the "www", but the links to my site use "www". I was thinking of using the "base" tag to try to fix this.
What would be your advice on this?
Thanks in advance.
Maybe the user-agent is misinterpreting the XHTML closing slash as part of the root and is seeing home as /"
OR:
the base href itself is a very low-level parameter.
The base element: path information
[w3.org...]
href = uri [CT]
This attribute specifies an absolute URI that acts as the base URI for resolving relative URIs.

And further down the page:
User agents must calculate the base URI according to the following precedences (highest priority to lowest):
1. The base URI is set by the BASE element.
2. The base URI is given by meta data discovered during a protocol interaction, such as an HTTP header (see [RFC2616]).
3. By default, the base URI is that of the current document. Not all HTML documents have a base URI (e.g., a valid HTML document may appear in an email and may not be designated by a URI). Such HTML documents are considered erroneous if they contain relative URIs and rely on a default base URI.
So if Google is indexing /" as a page, it must be that base tag. Like I said, either remove it or at least specify the full URI of the document.
When present, the BASE element must appear in the HEAD section of an HTML document, before any element that refers to an external source. The path information specified by the BASE element only affects URIs in the document where the element appears.
I really do appreciate the time you took and the feedback you gave me on my site. I also have a lot of respect for g1smd, who gave me some good advice and help too.
Yes, my site is XHTML Transitional, and the site is based on Mambo. I took a look at many other sites (that ranked well) and they all used the same format as I did. Quite frankly, this syntax is used to make this PHP-based site convert the URIs correctly (using mod_rewrite).
In fact, based on the feedback, I re-hacked some of the code to eliminate multiple H1s for article titles (multiple on one page) and converted those to links instead. It looks good (for the reader) and should eliminate any downgrade or penalty for the on-page SEO.
Quite frankly, the site continues to do well in Yahoo. Google now shows 868 pages in the index: 844 it recognizes as not "similar to" others, and only a handful as potential duplicates (URI-only results).
I actually think I know where the reference to www.example.com/" might have come from - a typo from me. The way the site used to work is that if someone typed in an incorrect URI, the site would return a 302 pointing to the home page. Unfortunately, I use the Google Toolbar, which then told Google that this page actually exists. It spidered the site and found my home page again at www.example.com/" even though there is no legitimate link anywhere on the web - only my typo! This created a nightmare in Yahoo.
I changed this process to return a 410 (gone), hoping that if Google or Yahoo tried to find any page that did not really exist, it would not point to my home page anymore.
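The logic of that change can be sketched in a few lines of Python (the function and the set of known paths are my own illustration, not the actual Mambo code):

```python
def status_for(path, known_paths):
    """Pick an HTTP status for a requested path.

    Returning 410 (Gone) for unknown URIs tells crawlers the page is
    permanently dead, instead of a 302 that bounces them to the home
    page and creates phantom entries like /%22 in the index.
    """
    if path in known_paths:
        return 200
    return 410

# Hypothetical set of real pages on the site.
known = {"/", "/index.php", "/articles/funds.html"}
print(status_for("/index.php", known))  # 200
print(status_for('/"', known))          # 410
```

A 404 would also work for truly unknown paths; 410 is the stronger signal that the URI should be dropped, which is the behavior wanted here.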