|Google search and the dash "-" character|
What effect does a hyphen have on Google caching and search
| 11:53 pm on Apr 9, 2007 (gmt 0)|
We have a website that dynamically lists about 38,000 different part numbers...which is a recent addition to its capabilities.
However, as Google makes its way through the site to cache all the new content, we are noticing that even when a dynamic page is listed under the site:www.example.com search, searching Google for that part number does not list our site at all. Eventually we found out that the part numbers containing "-" have the problem about 90% of the time, whereas those without the "-" are fine. Why? The part number with the hyphens is in the page header, meta tags, and as text on the page.
Hmm... So, after reading up some things I found in the forum, I tried doing a search on those part-numbers but exchanged the - for a space...bingo...that works. However, that is not a lot of good for us as we need people to be able to search with the dashes.
One thing...I had the URLs in the natural form for a php search which contained the part-number with hyphens, but now have them rewritten with Apache to look like an html page. Those cached pages I am talking about are with the old URL still...would that make any difference?
Has anyone had this problem before?
| 3:37 am on Apr 10, 2007 (gmt 0)|
I have a similar issue with Bible Verses. The ideal delimiter would be a colon, which of course is not a valid url. Thus, I have used dashes in my urls, and colons in the titles of those pages.
Overall, I have found Google searches to be fairly consistent regardless of the delimiter. Most results are the same whether the numbers are separated by colon, space or dash.
However, reversing the numbers or running them together will yield entirely different results.
In my case the pages are all static urls. Is it possible the php is substituting a delimiter such as a hyphen with some other character? Just a wild guess. Good luck...I will be interested in further input as this could easily become a concern for me as well.
| 10:40 pm on Apr 10, 2007 (gmt 0)|
Thanks for your input, and congrats on rendering the Word!
I wonder though, because when that part number including that character is rendered on the html page, then copied and pasted into a Google search...we still have the problem... :-S
You know, another thing I noticed was for a separate problem I had... Some of our part numbers have a "/" in them, which of course didn't work when I tried to use them in the rewritten URLs. So, I did a test search with Google, and found that when I search it with the "/", it came up with a cached page that had a "-" in place of the "/"...and no reference to a "/".... Seems like Google sees them as an interchangeable character...and a hyphen as a space, but not as a hyphen...that is the part I find weird...
| 1:10 am on Apr 11, 2007 (gmt 0)|
After rereading your first post, I am indeed quite puzzled as to why your searches are not recognizing the hyphens. All it would need to do is recognize them as spaces, and everything else seems to point to hyphens and spaces being relatively interchangeable.
It seems you have covered your bases quite well by including the part numbers in text and tags. A couple more wild guesses to cover the only differences I can find in our formats:
Would it be possible to use the part numbers in the title as well (or did you do this also)?
Would you also list the part numbers without the hyphens? Seems counter-intuitive but could be seo friendly to have a variety.
Otherwise you've used the numbers more thoroughly than I have, so I will also be interested to see better solutions than these.
| 1:24 am on Apr 11, 2007 (gmt 0)|
|So, I did a test search with Google, and found that when I search it with the "/", it came up with a cached page that had a "-" in place of the "/"...and no reference to a "/".... Seems like Google sees them as an interchangeable character...and a hyphen as a space, but not as a hyphen...that is the part I find weird... |
Google sees them both as delimiters. I think I remember that Google once upon a time did not differentiate between hyphens, slashes, commas, semi-colons, and colons... but I just did a quick test search between two-words hyphenated and two words just spaced, and I am seeing a difference now, which surprises me. Haven't tested that in a while.
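To make the "delimiter" idea concrete, here is a toy sketch in Python. It is not how Google actually tokenizes anything; the delimiter set is simply the one mentioned in this thread, and `tokenize` and `same_tokens` are made-up names for illustration. It shows why a hyphen, slash, or space between the same pieces could match the same page, while running the pieces together or reversing them would not.

```python
import re

# Hypothetical delimiter set taken from this thread: hyphen, slash, comma,
# semicolon, colon, and whitespace. An assumption, not Google's real rules.
DELIMITERS = r"[-/,;:\s]+"

def tokenize(text):
    """Lowercase the text and split it on any run of delimiter characters."""
    return [tok for tok in re.split(DELIMITERS, text.lower()) if tok]

def same_tokens(a, b):
    """True when both strings break into the same tokens in the same order."""
    return tokenize(a) == tokenize(b)

print(same_tokens("AB-123", "AB 123"))   # True: hyphen and space both split
print(same_tokens("AB-123", "AB/123"))   # True: slash splits the same way
print(same_tokens("AB-123", "AB123"))    # False: run together is one token
```

Under this model a search for a hyphenated part number behaves like a search for its pieces in order, which is consistent with the space-for-hyphen swap working in the first post.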
| 4:00 pm on Apr 11, 2007 (gmt 0)|
It might be because Google uses the - (hyphen) as an operator - (minus) to return results "without the word", and as a mathematical operator.
EG results -word
Would give you results without (the word) "word" in them.
Of course in searching for Bible verses, I would much rather have results with the Word included. =)
It would be interesting to know if Google treats the en dash (–) and em dash (—) differently than the - (hyphen, minus) character.
BTW have not tested the difference in results-word & results -word, but in using Google as a calculator, 45-3765, will give you the mathematical result.
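The difference between `results -word` and `results-word` can be sketched with a toy query parser in Python. This is only an illustration of the distinction, assuming a minus sign counts as an exclusion operator only when it starts a whitespace-separated term; `parse_query` is a hypothetical helper, not anything Google publishes.

```python
def parse_query(query):
    """Split a query into (required, excluded) terms.

    Assumption for this sketch: a term that begins with "-" is an
    exclusion, while a hyphen inside a term is left untouched.
    """
    required, excluded = [], []
    for term in query.split():
        if term.startswith("-") and len(term) > 1:
            excluded.append(term[1:])   # drop the leading minus
        else:
            required.append(term)
    return required, excluded

print(parse_query("results -word"))   # (['results'], ['word'])
print(parse_query("results-word"))    # (['results-word'], [])
```

In this model a part number such as AB-123 is never read as an exclusion, because the hyphen sits inside the term rather than in front of it.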
| 7:07 pm on Apr 11, 2007 (gmt 0)|
That is cool guys, thanks for the input :)
Justin, I think that you could be onto something there. My boss and I were doing some more research this morning as I noticed an internal problem with part numbers containing the plus symbol "+" as some of our part numbers contain it. This plays havoc with retrieving it from a search for the php code.
However, when we search these part numbers on Google, it comes up with pages from some other sites...and when we look at the cached version of these pages, google seemingly has ignored the actual plus symbol and interprets the search to merely contain all the other characters...
Hmmm...so, I actually have a section down the bottom of my page where I put dynamic keywords pertaining to that part or brand. So, what I have done now is use the str_replace() function in PHP to take any part number with a space, hyphen, plus symbol, or decimal and spit out different combinations of it at the bottom of the page as a keyword. That is, one with all the characters strung together and no symbols, one with spaces instead of symbols, one with every one of those symbols converted to a "+" sign, one with all symbols as dashes... Oh, and then they are also placed into the meta keywords along with the rest of the relevant dynamic info.
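For anyone wanting to try the same idea, here is a rough sketch of the variant generation. It is written in Python with `re.sub` rather than the PHP str_replace() mentioned above, and the function name `part_number_variants` and the sample part number are made up for illustration. The separator set (space, hyphen, plus, decimal point) follows the description in the post.

```python
import re

# Separators inside a part number, per the post: space, hyphen, plus sign,
# and decimal point. Adjust to taste; this set is an assumption.
SEPARATORS = r"[ \-+.]"

def part_number_variants(part_number):
    """Return the original part number plus separator-swapped variants."""
    candidates = [
        part_number,                            # as listed in the catalog
        re.sub(SEPARATORS, "", part_number),    # symbols removed, run together
        re.sub(SEPARATORS, " ", part_number),   # symbols replaced by spaces
        re.sub(SEPARATORS, "+", part_number),   # symbols replaced by "+"
        re.sub(SEPARATORS, "-", part_number),   # symbols replaced by dashes
    ]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(candidates))

# "AB-123+4.5" is an invented part number for demonstration only.
for variant in part_number_variants("AB-123+4.5"):
    print(variant)
```

The resulting list could then be joined into the keyword block at the bottom of the page and into the meta keywords. Removing duplicates avoids printing the same string twice when the original already uses only dashes.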
I guess the only way to really find out is to sit and wait the next 2-3 weeks that it seems to take Google to spider us and see what happens...
Oh, and in answer to an earlier post, yes...I do have the original version of the part number dynamically in my page header. Thanks, it is a good idea!
| 7:53 pm on Apr 11, 2007 (gmt 0)|
I like to use dots between words.
| 1:59 pm on Apr 13, 2007 (gmt 0)|
One of our main keywords is hyphenated. In Google Webmaster Tools, the searches for what I have to assume were originally [keyword1 key-word2] show up as [keyword1 "key word2"], quotes included. That would seem to be a hint for how it treats such phrases.
However, the two searches (hyphen vs. quotes) when actually performed do not return identical results. For us, the difference is between #10 and #11, but another site is #9 for one and #5 for the other.
I've never been quite sure how to interpret that, but I wonder if using punctuation that matches the query is a slight bonus, despite it being treated as a phrase.
| 3:26 pm on Apr 17, 2007 (gmt 0)|
Hi all and thanks for your thoughts and findings!
This problem is still a mystery for me, as sometimes the included dashes work, and we even come up #1 in the search...and then for some other part # with dashes, the search still doesn't work at all... So far, it just seems weird and random.
I am still hoping that the keywords with the symbols stripped and replaced will actually work though.
| 9:34 pm on Apr 23, 2007 (gmt 0)|
Just as a follow-up... Google has now spidered several of those pages which I had put the variations of the part number on (without dashes, with spaces...bla bla).
So far so good! It seems to work! We now come up for those searches that contain the hyphens, and even the original text on the page that contains hyphens is highlighted on the cached pages after a search.