

When the sitemap is on the map


lucy24

4:41 pm on Jun 4, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We all know that g### indexes everything it can get its hands on, including robots.txt and the sitemap. There have been threads about it. But this is a new one on me.

Every month or so I download the Keywords table* to look for anything wonky or out of place. (That's the precaution they themselves suggest: "How did 'Cialis' get to be #3 on the list?") This one has a pair of whoppers. One: high on the list is my own domain name, which occurs nowhere in any text. Two: its sole source is "/sitemap.xml" ... which also shows up under some other keywords. (I think the table gives the top 10 pages containing each word, so it would include any directory name that's also a keyword.)

What the bleepity bleepity? Anyone else seeing this?

Wait, there's one more piece, though I'm ### if I can see a connection. I recently redid my sitemap, making it much, much smaller. No point in listing every single page on the site; they're all linked anyway. The new version just lists the directories and main transfer points, in case the front page gets lost or the SSIs fail to load. The number of occurrences of my domain name is way too high for this new slimmed-down sitemap, but it corresponds exactly to the number of indexed pages on the old one.
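One possible reason the count tracks the old sitemap (not confirmed anywhere in this thread): the sitemaps.org protocol requires every <loc> entry to be a full absolute URL, so the domain name appears once per listed page. If Google is still counting an earlier copy of the file, the old page count would show up as the keyword count. A minimal sketch, assuming a standard urlset sitemap; the function name and the example.com URLs are hypothetical:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def count_domain_mentions(sitemap_xml: str, domain: str) -> int:
    """Count <loc> URLs whose host is `domain` or `www.` + `domain` (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    count = 0
    # Each <url> element holds one <loc> with an absolute URL.
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        host = urlparse((loc.text or "").strip()).hostname or ""
        if host in (domain, "www." + domain):
            count += 1
    return count


# Illustrative sitemap using a placeholder domain.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/rats/</loc></url>
  <url><loc>http://www.example.com/fonts/</loc></url>
</urlset>"""

print(count_domain_mentions(sample, "example.com"))  # 3
```

Running this against a saved copy of the old sitemap and against the new one would show whether either count matches the number in the Keywords table.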

If anyone can shed light I would be grateful. Otherwise it just gets filed under Ours Not To Reason Why.**


* The current list goes: "rats, [name], Nunavut, cat, [name], father, fonts..." I will go out on a limb and say that nobody else in the world has this precise keyword sequence ;)
** "thumpity, thumpity, someone had blundered".

aristotle

9:06 pm on Jun 5, 2012 (gmt 0)




I'm not sure I understand what you're concerned about. I believe that the words in this list are just the words that Google associates with your site. Maybe the domain name was lifted from the source code.

lucy24

9:58 pm on Jun 5, 2012 (gmt 0)




The domain name doesn't occur in the source code. It occurs absolutely nowhere except in the sitemap, which is not human-viewable. (Unless, of course, they go snooping and ask for it by name, like robots.txt.) This is the first time the sitemap has ever shown up in the keywords list. That's what put me into "What the ###?" mode.

For those who have never downloaded a keywords list: It comes as a comma-delimited wad of information. It's in some database format or other; I just run a quick RegEx to convert it into an html table. Top 200 words with total number of occurrences of each, and the top 10 pages that use the word. There are a few stopwords but not as many as you'd think, and they get confused by curly quotes.
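For anyone who wants to do the same conversion, here is a rough sketch. It swaps the RegEx for Python's csv module, because quoted fields can contain commas that a simple pattern would split on. The column layout (a header row, then keyword, count, URL) is an assumption, since the real export may differ. The function name is hypothetical and the sample rows use invented numbers:

```python
import csv
import html
import io


def keywords_csv_to_html(csv_text: str) -> str:
    """Turn a comma-delimited export into a plain HTML table.

    Assumes (hypothetically) that the first row is a header row.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return "<table></table>"
    header, body = rows[0], rows[1:]
    out = ["<table>"]
    # Escape every cell so stray <, > or & in keywords can't break the markup.
    out.append("<tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in header) + "</tr>")
    for row in body:
        out.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>")
    out.append("</table>")
    return "\n".join(out)


# Made-up sample; the quoted field shows why a naive comma split fails.
sample = 'Keyword,Occurrences,Top URL\nrats,412,/rats/\n"cat, domestic",97,/cats/\n'
print(keywords_csv_to_html(sample))
```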