
Forum Moderators: Robert Charlton & goodroi


Problems with Google...

Reaching the point where I might nuke my site from Google...

     
7:13 pm on May 4, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 1, 2004
posts:3181
votes: 0


I would deeply appreciate any help in solving my problem, because I am at the point where I might just nuke my site from Google and start all over... I started this site in June 2004 and followed the advice in the 26 steps, with one exception (more on that later). Since then, I have written over 300 articles and 300 more definitions that support the articles. Right now the site has about 850 pages.

The site's name had been used before, but had not been used for about two years. Best I can tell, it was clean before me. There are no legacy links out there that I can find, so the site pretty much started up fresh. The site was picked up by Dmoz after about two months. This led to the normal links appearing. Other than these, I pretty much ignored the linking part of the 26 steps. I was too busy writing content to bother with links.

Google picked up the site quickly, and as of this morning it has 840 pages in its index. The site does well in Yahoo and ASK and, to a lesser degree, in MSN. I think the problem with MSN right now is the site's structure, which is section -> category -> article. Through the first couple of days this month, I get about 80% of the referrals from Yahoo and 10% from ASK. MSN and the rest make up the remainder. Google lags behind engines like Overture.

Articles range from 500 to 1,200 words each. The definitions range from 50 to 500 words. In the beginning, Google led the pack on referrals. I ran an advertising campaign through AdWords and GoClick (the account still has some money in it) in September, then stopped in November.

In December / January alone I added about 300 pages to the site - mostly definitions. In January / February, I did some heavy on-site optimization: H2 tags, bold text, and internal linking to other articles and definitions.

The site was #1 for its own name from July until around August, when it dipped to around #5. In December, it fell off the face of Google's earth and has remained there since. This morning it ranked #227 for the site's own name. The other interesting observation is that I used to sign my articles with my name. At one time I was #1 for my own name; right now, I am at 100+. (I rank #1 in MSN, Yahoo, and ASK for my name; it is not that common.)

I've reported most of this to Google, but to no avail. At this point, I am wondering what I can do. The site is all white hat, but the site's name has to do with a very competitive area. More recently the site appears in Blogs supporting RSS feeds. I do not signature spam, only submit to feeds and directories. I rarely exchange links.

The site is a PR4, with most other pages PR3. It has 140 backlinks - increasing with each update. I think that, due to the rankings I have in Yahoo for some pretty popular searches, I am starting to get a lot of links from other sites (scrapers...) in addition to reputable sites. Yahoo shows close to 1,000 links.

At this point, I have some options:

Nuke the site from Google (remove it altogether) and let it start all over. This option will not hurt my traffic one bit.

Continue to wait it out.

I am afraid that the site is under some kind of penalty. It really is white hat, but maybe it was a casualty of some filter - I really have no idea. If I continue to wait it out, the penalty may never be lifted. So why not just nuke it now? By nuke it, I mean using the removal tool to remove the entire site from their index.

Any help is deeply appreciated.

8:20 pm on May 8, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


Hi BillyS
I took a look at your site; everything looks good except for two problems.

w*w.example.com/index2.php?....
shows as 'file format unrecognized' in the SERPs. When you click the 'similar pages' link, your homepage is #1, followed by other related sites.

I sure wouldn't want my homepage related to an "unrecognized file format".
You should consider adding a robots.txt entry:
Disallow: /index2.php

Funny, it seems to have no trouble with /index.php though.
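The Disallow rule suggested above can be sanity-checked with Python's standard-library robots.txt parser. A small sketch, using the thread's placeholder domain; note that matching is by path prefix, so query-string variants of /index2.php are blocked too:

```python
from urllib.robotparser import RobotFileParser

# The proposed robots.txt rule, fed to the parser line by line.
rules = [
    "User-agent: *",
    "Disallow: /index2.php",
]
rp = RobotFileParser()
rp.parse(rules)

# /index2.php (and its query-string variants) is blocked...
print(rp.can_fetch("*", "http://www.example.com/index2.php?id=5"))  # False
# ...while /index.php remains crawlable, since it is not a prefix match.
print(rp.can_fetch("*", "http://www.example.com/index.php"))        # True
```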

A more serious problem is the SERP w*w.example.com/%22
which translates as w*w.example.com/"

I think this is directly related to the invalid base href tag on your homepage.
<base href="ht*p://w*w.example.com/" />
This URL should be the absolute canonical URL of the page, e.g. example.com/index.html (or whatever the case may be).
You could also just remove that tag; my guess is that you installed it at the suggestion that it may help prevent 302 problems.

So Google seems to confuse:
/
/index2.php
/"

I would fix that base href and then see what Google does with that index2 page (or you could just disallow it if it doesn't need to be in there).
Also:
I would normally recommend submitting your robots.txt to the Google removal tool to clean up some unwanted SERPs, but NOT in this case: because Google is confused about the URL of your homepage, it may produce unwanted results (removal of your home page).
So fix the base href first; after your homepage has a solid position (no confusion on Google's part), you may want to do that as well.
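A sketch of the fix being suggested: the base element pointing at the page's own absolute, canonical URL rather than the bare domain root. The filename and domain here are placeholders:

```html
<head>
  <!-- base must appear before any element that refers to an external source -->
  <base href="http://www.example.com/index.html" />
  <title>Example page</title>
</head>
```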

8:40 pm on May 8, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


>> I sure wouldn't want my homepage related to an "unrecognized file format" <<

It may be that you aren't serving the page with the "text/html" MIME type that is required. Use an HTTP header checker (such as WebBug, or an online offering) to check this out.
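For anyone without a header-checker tool handy, the check g1smd describes amounts to reading one response header. A minimal sketch; the sample header block is invented for illustration:

```python
# Extract the Content-Type media type from a raw HTTP header block --
# the same check tools like WebBug perform against a live server.
def content_type(raw_headers: str) -> str:
    """Return the media type from raw response headers, or '' if absent."""
    for line in raw_headers.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-type":
            return value.split(";")[0].strip().lower()  # drop charset etc.
    return ""

sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=iso-8859-1\r\n"
    "Content-Length: 512"
)
print(content_type(sample))  # text/html
```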

10:33 pm on May 8, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 31, 2004
posts:43
votes: 0


Reid:

I took a look at your site, all looks good except 2 problems.
. . .
I think this is directly related to the invalid base href tag on your homepage.

I took a look at Billy's main page as well and stickied a few suggestions.

I guess my question on the above issue would be: why does he need to use BASE at all? Unless it's included purely to guard against 302 page-jacking?

I think the more fundamental issues for him are
1) validating the site (I know, "nobody validates")
2) changing a few metas: populate description and keywords, lose the others

But most importantly: the multiple H1s on the page. I'd remove all but one and then sub-section the rest into H2s... Also, make the H1 text relevant to the site itself, not the first sub-topic on the page (i.e. [mutual fund name]).

He's obviously done a lot of work on this site, and I think a few of these small coding things may be a fundamental problem. Just my 2 cents. Comments?

11:34 pm on May 8, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


>> 1) validating the site, (I know, "nobody validates") <<

You would be barmy NOT to run the site through a validator, even if only to make sure there were no "show-stopper" major errors in the code.

Having looked at the code for over a thousand sites, I can tell you that 99% of all the HTML errors that I have ever seen are actually just the same 30 errors repeated over and over again. Learn how to fix those 30 problems and you can fix 99% of the whole of the web.

1:51 am on May 9, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 1, 2004
posts:3181
votes: 0


Again, thanks for all the comments:

w*w.example.com/index2.php?....
shows as 'file format unrecognized' in the SERPs. When you click the 'similar pages' link, your homepage is #1, followed by other related sites.

It may be that you aren't serving the page with the "text/html" MIME type that is required. Use a HTTP header checker (such as WebBug, or an online offering) to check this out.

I took a look at this too. This reference is actually to an RSS feed from the site. I'm not sure why Google cannot read the file. I will be looking into this.

A more serious problem is the SERP w*w.example.com/%22

I have no idea how this got in there. According to the documentation with the CMS I use, the base href is set correctly. The above SERP returns a 410, so I'm not sure why Google is keeping it.

validating the site, (I know, "nobody validates")

The site returns three errors; it used to return about 20. It's built using Mambo, and I was able to hack some of the code and clean most of them up. The problem is splitting my time between hacking code and writing content. They keep promising that the next version will validate, so I spend more time writing.

2) changing a few metas: populate desc and kw,

Again, not to make excuses, but Mambo has a known shortcoming with meta tags for Section and Category pages (including the home page). Unless I set global meta tags (the same tags appearing on all pages) for description and keywords, it doesn't show any. Again, this is on the hack list.

I'd remove all but 1 and then sub-section the rest into H2

Something I had already hacked, using H1, but I can change it to H2. This one I can fix.

Again thanks to all that took a look for me.

7:41 am on May 9, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


A more serious problem is the SERP w*w.example.com/%22

I have no idea how this got in there. According to the documentation with the CMS I use, the base href is set correctly. The above SERP returns a 410, so I'm not sure why Google is keeping it.

It doesn't matter that your CMS documentation says it is OK.

In December, it fell off the face of Google's earth and has remained there since.

I have no idea how this got in there.

This is how it got in there:

<base href="ht*p://w*w.example.com/" />

Translation:
This page is located at ht*p://w*w.example.com/"

One thing you need to consider is that bots are a lot dumber than browsers. Maybe whoever wrote the CMS documentation (for a program notorious for generating invalid code, especially in headers) wasn't thinking about bots.

For a dumb bot, the base location of the homepage is usually ht*p://w*w.example.com/index.html

The server automatically translates / to index.html and Google knows that, but you are telling it that the location of / is actually /"

Google asks for / and gets /"
Google asks for /" and gets a 404
Google lists / as a supplemental
Google lists /" as the homepage
The site falls off the face of Google's earth.

Now you are forcing a 410 on /"?
What you should do is give the full URL of the page in the base tag, because that is what the tag means.

Although I would agree that descriptions, H tags, etc. all play a worthwhile role in ranking, this base tag problem will overshadow any other SEO attempts to get this page ranked.
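Reid's resolution chain can be reproduced with Python's standard URL routines: if a naive parser keeps the stray trailing quote in the base URI, the page's own address resolves to /" and percent-encodes to the /%22 seen in the SERPs. The domain is the thread's placeholder:

```python
from urllib.parse import urljoin, quote

good_base = "http://www.example.com/"   # correctly parsed base URI
bad_base = 'http://www.example.com/"'   # base mis-parsed with the trailing quote

# An empty relative reference resolves to the base URI itself, so the
# mis-parsed base becomes the page's "own" URL.
assert urljoin(good_base, "") == "http://www.example.com/"
phantom = urljoin(bad_base, "")
# Percent-encoding the quote yields exactly the phantom SERP entry.
print(quote(phantom, safe=":/"))  # http://www.example.com/%22
```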

7:59 am on May 9, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


/index2.php?... is not an unknown file format (MIME type).
Where is Google getting this idea of an 'unknown file format'?

" is the only unknown MIME type I see here.

9:50 am on May 9, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Good point. Google could think that the file extension is just "

It is expecting .php or .html, etc., and " is not valid.

10:34 am on May 9, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 30, 2002
posts:404
votes: 0


BillyS - do you have the "vanity" versions of this domain as well? I assume that the name is the .com - just wondering if you also own the .net and .org. And if you do, or own any other "similar" domains, do you do any relative linking within your site, or is it all direct URLs?
Second question: when looking through your pages listed on a "site:" command, do you see any supplemental results?
11:35 am on May 9, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2001
posts:2044
votes: 0


DaveAtIFG mentioned internal linking. With what sounds like many pages each holding a single definition, could the site be seen as trying to artificially "big" itself up?
7:20 pm on May 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 31, 2005
posts:1651
votes: 0


Reid :

You seem to have a lot of experience with this issue, and I am having a similar problem these days. I'd like to understand your advice to BillyS, but some things are not clear to me.
So, he had this base declaration:
<base href="http://www.example.com/" />
You said it is invalid, though I don't understand why or where; maybe something got cut or edited in your post.

Also, you said this :

the server automatically translates / to index.html and google knows that, but you are telling it that the location of / is actually /"

Where is that double quote coming from?

My problem is that I got my site indexed without the "www", but the links to my site use "www". I was thinking of using the "base" tag to try to fix this.
What would be your advice on this?
Thanks in advance.
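[For readers with the same www / non-www split: the usual fix in setups of this era was a server-side 301 redirect rather than the base tag. A hedged sketch, assuming Apache with mod_rewrite available in an .htaccess file; the domain is a placeholder:

```apache
RewriteEngine On
# Permanently (301) redirect the bare domain to the www host so search
# engines consolidate both spellings onto one canonical hostname.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```
]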

8:02 pm on May 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 1, 2004
posts:3181
votes: 0


I'm in the same boat. When I looked at many sites, this syntax:

<base href="http://www.example.com/" />

seems to be correct.

8:05 pm on May 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


It is only correct in an XHTML document.

The final / is not allowed in an HTML document.
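The two syntaxes side by side (the URL is a placeholder):

```html
<!-- HTML 4.01: no trailing slash on empty elements -->
<base href="http://www.example.com/index.html">

<!-- XHTML: the self-closing slash is required -->
<base href="http://www.example.com/index.html" />
```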

9:26 pm on May 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


Yes, and the document is XHTML TRANSITIONAL.

Maybe the user-agent is misinterpreting the XHTML closing slash together with the root path, and is seeing home as /"

OR:

the base href itself is a very low-level parameter.

The base element: path information
[w3.org...]

href = uri [CT]
This attribute specifies an absolute URI that acts as the base URI for resolving relative URIs.

and further down the page

User agents must calculate the base URI according to the following precedences (highest priority to lowest):

1. The base URI is set by the BASE element.
2. The base URI is given by meta data discovered during a protocol interaction, such as an HTTP header (see [RFC2616]).
3. By default, the base URI is that of the current document. Not all HTML documents have a base URI (e.g., a valid HTML document may appear in an email and may not be designated by a URI). Such HTML documents are considered erroneous if they contain relative URIs and rely on a default base URI.

So if Google is indexing /" as a page, it must be that tag. Like I said, either remove it or at least specify the full URI of the document.

10:06 pm on May 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


I never noticed this before.
I'm referencing an external stylesheet before the base tag.
It is 'incorrect', but it seems to work.
Maybe not in all browsers, though.

When present, the BASE element must appear in the HEAD section of an HTML document, before any element that refers to an external source. The path information specified by the BASE element only affects URIs in the document where the element appears.
11:22 pm on May 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 1, 2004
posts:3181
votes: 0


Reid -

I really do appreciate the time you took and the feedback you gave me on my site. I also have a lot of respect for g1smd, who gave me some good advice and help too.

Yes, my site is XHTML Transitional, and it is based on Mambo. I took a look at many other sites (that ranked well) and they all used the same format as I did. Quite frankly, this syntax is used to make this PHP-based site convert the URIs correctly (using mod_rewrite).

In fact, based on the feedback, I re-hacked some of the code to eliminate multiple H1s for article titles (multiple on one page) and converted those to links instead. It looks good (for the reader) and should eliminate any downgrade or penalty for the on-page SEO.

Quite frankly, the site continues to do well in Yahoo. Google now shows 868 pages in the index: 844 it recognizes as not "similar to" others, and only a handful as potential duplicates (URL-only results).

I actually think I know where the reference to www.example.com/" might have come from - a typo from me. The way the site used to work is that if someone typed in an incorrect URI, the site would return a 302 pointing to the home page. Unfortunately, I use the Google toolbar, which then tells Google that this page actually exists. It spiders the site and finds my home page again at www.example.com/" even though there is no legitimate link anywhere on the web - only my typo! This created a nightmare in Yahoo.

I changed this process to return a 410 (Gone), hoping that if Google or Yahoo tried to find any page that did not really exist, it would not be pointed to my home page anymore.
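The 302-versus-410 change described above boils down to one routing decision, sketched here in Python; the page list is hypothetical:

```python
# Instead of 302-redirecting unknown URLs to the home page (which tells
# crawlers the phantom URL exists), answer 410 Gone for anything unknown.
KNOWN_PAGES = {"/", "/index.php", "/articles/example-article.html"}

def status_for(path: str) -> int:
    """HTTP status a site following this scheme might send for a path."""
    if path in KNOWN_PAGES:
        return 200   # serve the page normally
    return 410       # Gone: do not redirect to the home page

print(status_for("/"))   # 200
print(status_for('/"'))  # 410
```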

11:37 pm on May 18, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 1, 2004
posts:3181
votes: 0


I never thought to look after this recent index refresh - but I just looked, and that page is no longer showing up in Google's index. Hmm, not sure if that's good or bad, but the 410 must have caused them to remove the page from their index.
1:01 am on May 19, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


That's good that it's gone now - maybe that will fix your Google woes.

Your domain may have been split between / and /"

Who knows which one Google was counting as the homepage.

So the base probably works - I thought base was deprecated in XHTML.
