
Google News Archive Forum

google.co.il is totally broken
Data on .il sites is missing

 1:18 pm on May 27, 2003 (gmt 0)

The last "dance" has left the Israeli/Hebrew sites totally broken in Google SERPs.

The search is for the name of a major news site in Israel. Five of the nine datacenters return the URL of the site without the title, and this is just an example of searching for a Latin string.

Searching for a Hebrew string in google.co.il returns a list of URLs without any other information, such as the title of the pages or (God forbid) the search word in the context of the page.

This isn't the first time this has happened.

Back in November it also happened.

If I remember correctly, the last dance began on May 5th. I think I read somewhere that it ended on the 20th. Today is the 27th.

If Google is not going to fix the problem sometime soon, Israeli users are going to lose confidence in Google.

Tomorrow I am going to teach a search techniques course and I really don't know what to tell my students.

People have claimed that Google is the "Knowledge Operating System" of the Internet.

I don't know about that...

[edited by: heini at 1:45 pm (utc) on May 27, 2003]
[edit reason] removed urls per TOS / thanks! [/edit]



 5:42 am on Jun 2, 2003 (gmt 0)

Hi hanan_cohen! I found your blog posting via scripting.com, so I poked around google.co.il and couldn't find anything wrong--the searches that I tried worked. It could be that the site you are talking about was down when we crawled; then we could only show a link and not a snippet. Could you fill out a spam report (just because that's easy for me to access) and mention your name/nickname? I'm curious to see what search isn't working for you.


is the right url to use.. thanks again.


 5:12 pm on Jun 2, 2003 (gmt 0)

Aha. I see what you're talking about. I tried a query or two like "test" and saw results with descriptions, but I think you're talking about missing snippets?

I'll check on what we're doing to resolve this. Thanks for mentioning it!


 5:34 pm on Jun 2, 2003 (gmt 0)

Thanks for looking into it.

Try the queries here:

Click the links near Search text =

(Dear moderators, please don't edit out this URL. I have no other ways of communicating with GoogleGuy).

[edited by: Marcia at 10:27 pm (utc) on June 2, 2003]


 6:21 pm on Jun 2, 2003 (gmt 0)

And another strange thing. The problem only occurs for web pages! Searching for a Hebrew or Arabic word on all file types got me correct search results.

It seems that some index algorithms choke on Hebrew/Arabic web pages.

There might be an explanation for this. Both Arabic and Hebrew are written and displayed from right-to-left and have two standards for publishing on the web.

One is the "Visual" standard where the text is stored from left-to-right and displayed, using a special font, from right-to-left.

The other is the "Logical" standard where the text is stored from right-to-left and displays from right-to-left.

You will see that when you search for a Hebrew or an Arabic word from the main search page, and then click the advanced search link, the string will appear TWICE in the field "with at least one of the words": once in the order you typed it and once reversed. Google does that in order to find pages in both standards, Visual and Logical.

Text in files other than web pages is stored only from right-to-left (Logical) and this is the reason why I think the problem is with the reversing mechanism.
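
To illustrate the reversing idea described above: in visual order the characters of a word are stored in the order they appear on screen from left to right, so a word ends up reversed relative to logical (reading) order. The Python sketch below simply reverses the query character by character to produce both forms. The function name `query_variants` is hypothetical, and the naive reversal is an assumption made for illustration; nothing in this thread confirms how Google actually builds the second string.

```python
def query_variants(query: str) -> list[str]:
    """Return the query as typed plus its character-reversed form.

    Assumption: a page in the "Visual" standard stores a word in the
    reverse character order of a page in the "Logical" standard, so
    searching for both forms can match both kinds of page.
    """
    reversed_query = query[::-1]
    # A palindrome reverses to itself, so do not list it twice.
    if reversed_query == query:
        return [query]
    return [query, reversed_query]


# The two strings that would end up in "with at least one of the words".
print(query_variants("שתיל"))
```

A real implementation would have to handle multi-word queries, digits, and Latin text embedded in Hebrew, which do not reverse the same way; this sketch ignores all of that.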

Hope that helps.


 6:44 pm on Jun 2, 2003 (gmt 0)

I GOT IT. I know exactly where the problem is. The problem is not with the index, it's with the display mechanism. (Maybe it has to do with the index too.)

The display mechanism of web pages in Hebrew or Arabic is not sure what standard the page is in so it doesn't display any information on the page. No page title, no snippet, no cache link.

If the display mechanism is sure what standard the page is in, as with non-web-page files (DOC, XLS, PDF etc.), it knows how to display the information for the page. If it doesn't know, it looks to see if the site is listed in DMOZ and, if it is, it displays the information from DMOZ.

The problem can be in two places.

The first place is the index itself, where the display standard of the page is stored. If the information on the display standard is missing, the display mechanism cannot know how to display the information about a page and only the URL of the page is displayed.

The second place for the problem can be somewhere in the display mechanism itself, which doesn't know what to do with the display standard it gets from the index.

If the problem is with the index, the problem will be hard to solve, because the display standard information will have to be collected again for all the pages in the index.

If it's with the display mechanism, it's "only" a software bug that can be easily solved, since it already worked.
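
The fallback behavior guessed at above can be written down as a small Python sketch. Everything here is hypothetical: the `IndexedPage` fields, the `text_order` values, and the `render_result` function are names invented for illustration, and the logic models only the guess made in this post, not Google's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexedPage:
    """Hypothetical record for one page as stored in the index."""
    url: str
    title: Optional[str] = None
    snippet: Optional[str] = None
    text_order: Optional[str] = None        # "visual", "logical", or None if unknown
    dmoz_description: Optional[str] = None  # set only if the site is listed in DMOZ


def render_result(page: IndexedPage) -> str:
    """Build the text of one search result using the guessed fallback chain."""
    # Case 1: the display standard is known, so title and snippet can be shown.
    if page.text_order in ("visual", "logical") and page.title:
        parts = [page.title, page.snippet, page.url]
        return "\n".join(p for p in parts if p)
    # Case 2: standard unknown, but a DMOZ listing exists: use its description.
    if page.dmoz_description:
        return f"{page.dmoz_description}\n{page.url}"
    # Case 3: nothing usable, so only the bare URL is displayed.
    return page.url


broken = IndexedPage(url="http://example.org/page", title="Some title")
print(render_result(broken))  # the display standard is missing, so only the URL prints
```

In this model, a missing `text_order` in the index and a display routine that mishandles a present `text_order` produce the same URL-only result, which is why the two causes are hard to tell apart from the outside.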

And maybe I don't know what I am talking about...


 7:43 pm on Jun 2, 2003 (gmt 0)

The same problem also occurs with the other RTL languages, Persian and Urdu.


 8:28 pm on Jun 2, 2003 (gmt 0)

Me think Hanan_Cohen should work for Google!

What say you GoogleGuy?


 11:12 pm on Jun 2, 2003 (gmt 0)

If only all our reports were so thorough! :) I agree with what hanan_cohen was saying--it's not a problem in the index, but rather with the generation of snippets (the display mechanism). That's a good sign; it might take a few days to nail it down and get a fix pushed out, but it shouldn't require any crawling/indexing changes, because we have the correct documents. Thanks again, hanan_cohen. I'll drop by again if I find out more info from talking to people around here.


 12:08 am on Jun 3, 2003 (gmt 0)

Heh, if only you could get your PR notched up by one or two points, or moved a page or two up the SERPs, for sending in a useful report...


 8:30 am on Jun 4, 2003 (gmt 0)

Hebrew SERPs are getting better but are strangely inconsistent.

I search for the name of the organization I work for, שתיל, and get nicely formatted search results. Since there are many pages, I click the "more pages from this site" link and get a SERP with some good results and some links with only the URL.

(This posting is also a test of the multilingual support of this forum.)


 9:31 am on Jun 4, 2003 (gmt 0)

I don't know about anybody else, but I get complete vertigo when I see anything on a page going from right to left. Half the time I think the page is back-to-front or upside-down. :) Interesting to note that Google also suffers from language disorientation.


 12:40 pm on Jun 4, 2003 (gmt 0)

I thought that you might want to know that the descriptions started to reappear today for one of our Hebrew sites in google.co.il.


 2:48 pm on Jun 4, 2003 (gmt 0)

I'm glad that things are coming back. I heard that snippets on .co.il would come back in the time frame of a few days. Hopefully snippets there should be completely back to normal soon. hanan_cohen, sometimes just urls show up if we saw that url during a crawl but didn't have the time or resources to fetch that url. So sometimes it can be normal behavior to see only the url. Keep us posted of how things look, and I'll let you know if I hear any more news around here.



 3:14 pm on Jun 4, 2003 (gmt 0)

Hi hanan_cohen and thanks for bringing this up. This issue has been going on for a while. I brought it up in this thread:
but it was more than a week after this started to happen. I thought it had something to do with the latest update, because everything else seems to be a little crazy with the Google results these days, so I decided to just wait and see.

... one thing I agree on, though: the longer Google takes, the more Hebrew surfers will lose their confidence in it.


 3:25 pm on Jun 4, 2003 (gmt 0)

Hi GoogleGuy,

Actually the results have just significantly improved right after my previous post (probably while I was still writing it.)

I am glad to see that you care about your Israeli audience :)


 6:52 pm on Jun 4, 2003 (gmt 0)

We do. :) Keep me posted if you see problems in the future. :)


 10:47 pm on Jun 4, 2003 (gmt 0)

I am glad to see that you care about your Israeli audience :)

Wrong. The problem was/is with all Right-to-Left languages, Arabic included. According to the Cambridge Encyclopedia of Language, 181,000,000 people speak Arabic. I think that not more than 6,000,000 people speak Hebrew.

Do the math.

Hanan Cohen
***Love and Peace***


 10:04 am on Jun 5, 2003 (gmt 0)

I disagree.

Which audience is more important - a million Web surfers who use the Web 3-4 hours a day, or one hundred million who use the Web for 15 minutes a week?

Google is the only major search engine to provide a Hebrew interface and a co.il domain. AV and ATW do provide Hebrew results but they haven't bothered to translate their sites into Hebrew. Other majors don't have Hebrew support whatsoever. That only shows the difference between GG and the rest. The GG folks are much better at understanding the meaning of a globalized Web and they are far ahead of everyone else.

I believe GG is the number 2 search site in Israel right now (after Walla!) and it is on its way to become number 1. Way to go GG!


 12:54 am on Jun 6, 2003 (gmt 0)

Hey, let me know if snippets still aren't shown after this, but things should have been completely back to normal earlier today. There are some good reasons why it was a little hard to push this fix, but nonetheless almost all servers should have a new binary that handles this correctly now.

If anyone does still see any problems, please let me know.


 7:06 am on Jun 6, 2003 (gmt 0)

Thanks, and sorry, but it's not working yet. Here is a screenshot.
The file name contains the date and time for California.

Search string : שתיל

The search string is the name of a website I run and I know the URLs intimately. Sometimes I get the title and snippet and sometimes I don't.


 9:15 am on Jun 6, 2003 (gmt 0)

Very strange. I see snippets for some results on this search, so it may be more than just the display mechanism. Let me look into this search with some engineers here.


 11:14 am on Jun 6, 2003 (gmt 0)

This might not be just a Hebrew, Arabic, and Urdu problem.
I have had it with German pages too, since March 20. (As of March 20, only the index page is indexed and has snippets; all 30 subpages have no snippets and are not indexed.)

hope it helps


 1:07 pm on Jun 8, 2003 (gmt 0)

While we're at it... perhaps this is not the best place to ask, so GG guy, if you're still here, please direct me as to where I can send the following problem with Google AdWords in Hebrew:

When writing ads in Hebrew, where it says "Headline (maximum 25 characters)", it really only allows 13 Hebrew characters. The description lines only allow 18 Hebrew characters (instead of 35 characters in English).

The rates for ads in Hebrew are the same rates as the English rates. Why do we get only half the text space?
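
Nobody in this thread explains the reduced limits. One hypothesis, which is purely an assumption and not confirmed here, is that the form counts bytes rather than characters: in UTF-8 each Hebrew letter takes two bytes while a space takes one. The Python sketch below shows that a 13-character Hebrew headline containing one space comes to exactly 25 bytes, and an 18-character line containing one space comes to exactly 35 bytes. The helper name `char_and_byte_length` and the placeholder strings are made up for the example.

```python
def char_and_byte_length(text: str) -> tuple[int, int]:
    """Return (number of characters, number of UTF-8 bytes) for the text."""
    return len(text), len(text.encode("utf-8"))


# Placeholder headline: 12 Hebrew letters plus one space = 13 characters.
headline = "א" * 6 + " " + "א" * 6
print(char_and_byte_length(headline))      # (13, 25)

# Placeholder description line: 17 Hebrew letters plus one space = 18 characters.
description = "א" * 9 + " " + "א" * 8
print(char_and_byte_length(description))   # (18, 35)
```

The match with the reported numbers may be a coincidence. If the form used a single-byte Hebrew encoding instead of UTF-8, this explanation would not hold.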


 3:40 pm on Jun 8, 2003 (gmt 0)

... and another thing, check out the following page:


it starts nicely with Dr. Eric E. Schmidt, then Sergey Brin, Larry Page... then the bio of Omid Kordestani reads something like:

"Omid Kordestani has more than 12 years of experience in...
Kordestani got an MBA from Stanford"

Then comes Wayne Rosing which reads something like:

"Wayne's got

The translator (who I bet was drunk at the time) has left when he got to McCaffrey's bio, so it remains in English, and afterwards the translator returns to unfinish the job.

GG - please add to your "to do" list - "translate management.html into Hebrew"


 7:15 pm on Jun 8, 2003 (gmt 0)

I'll pass that on, rbester. :)

amazed, you can also see a url with no snippets if we saw the url but didn't get a chance to crawl it.


 10:11 pm on Jun 8, 2003 (gmt 0)


WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved