Non Alphanumeric Characters and Google

Which ones count, and when?

         

ciml

2:45 pm on Jun 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We've known for a while that "-" in a page counts as a space in the query, while "_" in a page matches only "_" in the query. There must be other interesting characters. I'll start with "#".

A#, B#...G# count literally, while H# does not. This is thought to be due to musical notation. J# didn't count in the past but it does now, presumably because C# and J# are well known in programming circles.
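To make the rules so far concrete, here is a minimal Python sketch of a tokenizer that behaves the way described above. It is only a model of the observed behaviour, not Google's actual code; the function name `tokenize` and the `SHARP_PREFIXES` list are made up for illustration.

```python
import re

# Assumption: the letters whose "#" is kept, as reported in this thread
# (note names A-G, plus J for J#). The real list is not public.
SHARP_PREFIXES = set("abcdefg") | {"j"}

def tokenize(text):
    """Split text into lowercase tokens using the rules observed in this thread."""
    tokens = []
    # "_" belongs to \w, so it stays inside a token; "-" does not, so it splits.
    for match in re.finditer(r"\w+#?", text.lower()):
        token = match.group()
        if token.endswith("#") and token[:-1] not in SHARP_PREFIXES:
            token = token[:-1]  # drop a "#" that is not on the list
        tokens.append(token)
    return tokens

print(tokenize("CD-ROM"))      # hyphen acts like a space
print(tokenize("my_page"))     # underscore is kept
print(tokenize("C# and H#"))   # "#" kept after C, dropped after H
```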

Oddly, X# doesn't seem to bring up any results. I can think of no reason for that. If this thread ranks for X# in a while, then we know it's because there were no pages for x#, but that doesn't explain how it got on the list.

What other characters should we know about?

takagi

4:41 pm on Jun 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google sometimes handles non-alphanumerical characters like '-' or ',' in a special way.

If you search for the number of pages indexed in Google according to their home page, you can leave out the commas or not: "Searching 3,083,324,652 web pages" [google.com] and "Searching 3083324652 web pages" [google.com] both give the same results (please try it, and you can see that WebmasterWorld beats Google's home page for this string although they have a PR11 for that page).
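One way to picture why both queries give the same results is a normalization step that drops commas between digit groups before matching. The Python sketch below is a guess at such a step, not something Google has documented; `strip_digit_group_commas` is a hypothetical name.

```python
import re

def strip_digit_group_commas(query):
    """Remove commas used as thousands separators, e.g. 3,083,324,652."""
    # A comma is dropped only when a digit precedes it and exactly
    # three digits (not four or more) follow it.
    return re.sub(r"(?<=\d),(?=\d{3}(?!\d))", "", query)

with_commas = strip_digit_group_commas("Searching 3,083,324,652 web pages")
without_commas = strip_digit_group_commas("Searching 3083324652 web pages")
print(with_commas == without_commas)
```

Commas that are not thousands separators, as in "a, b" or "1,2,3", are left alone by this sketch.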

Also a search for a word with a hyphen like CD-ROM, week-end, Coca-Cola, Toys-R-Us, will sometimes show SERPs with words highlighted that are written without this hyphen.

I didn't know the '#' in A# was handled as a letter. I presume ciml's question was restricted to the Latin-1 character set (ISO 8859-1). In that case I don't know of any other special cases.

takagi

6:30 pm on Jun 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just did some tests with the ampersand (&) and it looks like Google treats it like a normal letter. A search for just this symbol gives 509,000,000 pages, which is more than some of the real letters (e.g. the 's' has 403,000,000).

If a keyword contains an ampersand, then it should be entered in the search query (Q&A, AT&T, C&A). If it is usually part of a multiword search string, it helps to add it (track & field, Barnes & Noble), but sometimes you need to add a '+' (Marks +& Spencer) to prevent it from being ignored. In such a case Google will warn you:

"&" is a very common word and was not included in your search

Although the vertical bar (the '|' character) can be used in place of the OR operator (which must be written in capitals), the ampersand is not treated as an AND, for the simple reason that AND is not (yet) supported by Google.

skipfactor

7:12 pm on Jun 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was searching for boats the other day, and the dealers don't like to list prices. Google treats the "$" as a space in both "sailboats $" and "sailboats$".

Stick a number after the "$", & Google sees it as one word. Stick a number in front of the "$" & it's ignored again.
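The "$" behaviour described in this post can be modelled with one regular expression. This is only a Python sketch of what was observed here, with a made-up function name (`tokenize_dollar`); how Google really does it is unknown.

```python
import re

def tokenize_dollar(query):
    """Keep "$" only when digits follow it directly; otherwise it separates words."""
    # First alternative: "$" glued to a number is one token.
    # Second alternative: ordinary words; a lone or trailing "$" is skipped.
    return re.findall(r"\$\d+|\w+", query.lower())

print(tokenize_dollar("sailboats $"))      # "$" acts as a space
print(tokenize_dollar("sailboats$"))       # same result
print(tokenize_dollar("sailboats $5000"))  # "$5000" is one word
print(tokenize_dollar("5000$"))            # trailing "$" is ignored
```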

Probably old news to you guys, but it's odd that Google returns a blank SERP for a "$" search: there's none of the typical 'Your search - $ - did not match any documents. No pages were found containing "$".' Just a mostly whitespace page & the "images" etc. tabs disappear.

added: this must mean that Google rejects the query altogether?

takagi

4:33 pm on Jun 15, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Today I will write about the asterisk (*).

A search for this character results in a page with the following warning

No standard web pages containing all your search terms were found.

Suggestions:

- Try different keywords.

Also, you can try Google Answers for expert help with your search.

Also some headlines from Google News and a "Try Google News: Search news for *" option. So that is better than the SERP for '$' (see previous message).

In some other search engines the asterisk can be used as a wildcard. For a long time I believed that Google didn't support any wildcards. And that is also the official information from Google [google.com].

.. Google does not use "stemming" or support "wildcard" searches.

But it is wrong. In a quoted string, the asterisk stands for any single word.

A search for "make my day" [google.com] shows a SERP with pages containing that exact phrase (in total 66,800 indexed pages have this string).

A search for "make * day" [google.com] shows a SERP with a string like "Make My Day" but also "Make-Up Day", "Make ur Day", "Make his Day" and "Make Every Day" (in total 342,000 pages). So one word between 'make' and 'day', not zero, not more than one. Well, in some special cases there are two words in between 'make' and 'day' like in: "Make Someone's Day", "Make Valentine's Day" or "make Father's Day" (yes, this Sunday is Father's day). So somehow those 2 words are seen as one.

Unfortunately the wildcard cannot be used to find words starting with a certain string. If you hope to find both "make my day" and "make mother's day" with the following search "make m* day" [google.com] then you will be disappointed. It will only return pages with strings like "make m:y day".

In a search string you can also put more than one asterisk. Searching for "make * * day" [google.com] will result in a list of pages with strings like "Make A Difference Day" and "Make progress every day".
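For those who like to see it spelled out, the wildcard behaviour above can be imitated in Python with a regular expression. This is a rough sketch under my own assumptions, not Google's implementation: a word may contain an apostrophe (which would explain "Someone's" counting as one word), a hyphen separates words, and only a standalone * is treated as a wildcard. The names `phrase_to_regex` and `matches` are hypothetical.

```python
import re

def phrase_to_regex(phrase):
    """Compile a quoted phrase in which each standalone * matches exactly one word."""
    word = r"[\w']+"   # assumption: an apostrophe stays inside a word
    gap = r"[^\w']+"   # spaces, hyphens, or other punctuation between words
    parts = [word if term == "*" else re.escape(term) for term in phrase.split()]
    # The lookarounds stop the phrase from matching inside longer words.
    return re.compile(r"(?<![\w'])" + gap.join(parts) + r"(?![\w'])", re.IGNORECASE)

def matches(phrase, text):
    """Return True if the text contains the phrase, with * filled by one word."""
    return phrase_to_regex(phrase).search(text) is not None

print(matches("make * day", "Make-Up Day"))             # one word in between
print(matches("make * day", "Make Someone's Day"))      # apostrophe word counts as one
print(matches("make * day", "Make every single day"))   # two words: no match
print(matches("make * * day", "Make A Difference Day")) # two wildcards
```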

That's it. Let's call it a day.