Google Sitemap Common Words in Site Content

Showing garbage/spammy content not found anywhere in my site

         

Jetgirl

4:17 pm on Apr 25, 2006 (gmt 0)

10+ Year Member



I have two very small sites. I have added XML-formatted sitemaps to Google Sitemaps, and one of them is showing unexpected results under Stats / Page Analysis / Common Words / In your site's content. It is a technical site, and some of the reported "content" is expected, but the rest is just garbage/spam (e.g. íàõóé, èäè, òû, along with some terms definitely not found on my site).

As the site is relatively small, I do all the coding by hand, and I KNOW that I don't have anything like that in my code. Any help in tracking down how this is happening (so that I can fix it!?) would be greatly appreciated.

-Jetgirl

Jetgirl

5:39 pm on Apr 25, 2006 (gmt 0)

10+ Year Member



Update: I just found a result when doing a "site:www.example.com example" search in Google that has some of the suspect content in the search result description. However, the page does not contain the spammy content. How could the results have content that is not on my pages?

gellydonut

7:19 pm on Apr 25, 2006 (gmt 0)

10+ Year Member



This sounds like it could be a problem with character encoding. It affects not only search engines but also alternative browsers, and it could make your site unreadable to a small percentage of users. David Baron offers a nice explanation here: [dbaron.org...]

Make sure you declare your encoding (UTF-8 is the best way to go, in my opinion) in the head of the document:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
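To make the encoding idea concrete, here is a minimal Python sketch. It assumes (and this is only a guess, not something confirmed in this thread) that strings like "èäè" and "òû" are Windows-1251 (Cyrillic) bytes that were decoded as Latin-1 somewhere along the way. The helper name `undo_latin1_misread` is made up for illustration.

```python
# Hypothetical helper: reverse one common kind of mojibake, where bytes that
# were written as Windows-1251 (Cyrillic) got decoded as Latin-1 (ISO-8859-1).
# The choice of cp1251 is an assumption about the garbled words in this thread.
def undo_latin1_misread(garbled: str, real_encoding: str = "cp1251") -> str:
    # Latin-1 maps every character U+0000..U+00FF back to a single byte,
    # so encoding with it recovers the original raw bytes.
    raw_bytes = garbled.encode("latin-1")
    # Decode those bytes with the encoding we suspect they were written in.
    return raw_bytes.decode(real_encoding)


samples = ["èäè", "òû"]
decoded = [undo_latin1_misread(s) for s in samples]
print(decoded)  # ['иди', 'ты']
```

If that guess holds, the garbled words are ordinary Russian text shown with the wrong encoding, which would suggest they come from some other source rather than being damaged copies of the site's own English content.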

Jetgirl

7:47 pm on Apr 25, 2006 (gmt 0)

10+ Year Member



gellydonut -

Thanks for your reply. I am already declaring my encoding with the following meta tag:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />

The only difference between your suggestion and my tag is the capitalization of "UTF" (plus the space after the semicolon), and I'm guessing that the capitalization doesn't mean anything.
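Charset labels are matched case-insensitively, so the guess about capitalization is a safe one. As a rough illustration only (browsers and crawlers do not use Python for this), the sketch below parses both `content` values with the standard library and checks that they name the same codec. The function name `charset_from_content_type` is hypothetical.

```python
import codecs
from email.message import Message


# Hypothetical helper: extract the charset from a Content-Type value and
# normalize it to Python's canonical codec name.
def charset_from_content_type(value: str) -> str:
    msg = Message()
    msg["Content-Type"] = value
    label = msg.get_content_charset()  # lowercased label, or None if absent
    if label is None:
        raise ValueError("no charset parameter in: " + value)
    # codecs.lookup resolves aliases and ignores case differences.
    return codecs.lookup(label).name


suggested = charset_from_content_type("text/html; charset=UTF-8")
mine = charset_from_content_type("text/html;charset=utf-8")
print(suggested == mine)  # True
```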

The real problem seems to be phantom text on VERY few pages of my site. I have looked at the pages and there is no hidden text (white on white) that I can see, but I will be talking with the host for my site tonight to make sure they don't have a virus that could be adding spam content to their users' pages on the back end. That's the only thing I can think of right now that would do this.
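One way to check for that kind of back-end addition would be to compare the HTML file as I wrote it with the HTML the server actually returns for the same page. The sketch below is only a rough idea under that assumption: the function names are made up, the two sample strings stand in for a real local file and a real server response, and fetching the live page is left out.

```python
import re


# Hypothetical helper: collect every word in a page, including text that is
# hidden with CSS, by stripping tags and lowercasing what is left.
def page_words(html: str) -> set:
    text = re.sub(r"<[^>]*>", " ", html)
    return set(re.findall(r"\w+", text.lower()))


# Hypothetical helper: words present in the served copy but not in the
# hand-written source file.
def words_added_by_server(local_html: str, served_html: str) -> set:
    return page_words(served_html) - page_words(local_html)


# Stand-in data; a real check would read the local file and download the page.
local_copy = "<html><body><p>Sitemap help for small sites</p></body></html>"
served_copy = (
    "<html><body><p>Sitemap help for small sites</p>"
    "<div>cheap pills</div></body></html>"
)
print(sorted(words_added_by_server(local_copy, served_copy)))  # ['cheap', 'pills']
```

An empty result would not rule out every cause, but any unexpected words would be something concrete to show the host.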