
Google News Archive Forum

Google has not indexed our Spanish pages - is it a character/filename problem?
Is it because of the ñ?
Tropical Island

 1:39 pm on Aug 24, 2002 (gmt 0)

We have been monitoring the forums for the last few months and have made changes that have moved us up - many thanks.

Our site has a large Spanish section, and when I look at our main entry page and index page there is no PageRank. All the pages that show oursite/español.htm have no PR (the bar is greyed out), yet all the other pages have one. Is Google unable to read the ñ? Please help. We were last spidered on Aug. 4, so I assume we'll be revisited soon.



 1:52 pm on Aug 24, 2002 (gmt 0)

I have a similar problem.

Files indexed as /filename.htm show a grey bar while the update is in progress.

But URLs ending with / are a bit different. Sometimes the bar shows grey at first, but when I remove the / and press Enter again, it shows the PR in 8 out of 10 cases.


 2:04 pm on Aug 24, 2002 (gmt 0)

I suggest avoiding all special characters and capitals in filenames. They will cause problems with many search engines.


 2:51 pm on Aug 24, 2002 (gmt 0)

>I suggest avoiding all special caracters and capitals in filenames

For domain names and all file names, use ASCII characters only.
No extended characters whatsoever.

This is a major problem with the whole structure of the Internet. It's easy to imagine how aggravating this is for everyone using non-Latin alphabets, which is in fact the majority of the world's population. Take Asia and Eastern and Central Europe alone.

But even Spanish, German and the Scandinavian languages have a lot of extended characters, and those are a no-go for the web.
You just have to use the nearest ASCII equivalent; in this case that would be plain espanol.
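If you have many pages to convert, the "nearest ASCII equivalent" step can be scripted. Below is a minimal Python sketch (the function name `to_ascii_filename` is made up for illustration) that decomposes accented letters with Unicode NFKD normalization, drops the leftover accent marks, lowercases the result, and turns anything outside a-z, 0-9, `.`, `_` and `-` (such as a space) into a hyphen. One caveat: letters with no ASCII decomposition, such as ß or ø, are simply dropped rather than transliterated, so check the output by hand.

```python
import re
import unicodedata


def to_ascii_filename(name: str) -> str:
    """Return a lowercase, ASCII-only version of a file name.

    NFKD splits "ñ" into "n" plus a combining tilde; the ASCII encode
    step with errors="ignore" then drops the combining mark.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii").lower()
    # Replace each run of characters outside the conservative set with one hyphen.
    return re.sub(r"[^a-z0-9._-]+", "-", ascii_only).strip("-")


if __name__ == "__main__":
    for original in ["español.htm", "MyPage.htm", "site map.htm"]:
        print(original, "->", to_ascii_filename(original))
```

With these inputs the sketch yields espanol.htm, mypage.htm and site-map.htm.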

Tropical Island

 2:53 pm on Aug 24, 2002 (gmt 0)

Thank you Macguru.

As the only pages that do not show PR are the ones with the ñ, I can only assume that this is the problem. I also have some pages with capitals, like "MyPage.htm", and these do not seem to be a problem. However, just to be on the safe side, I'm going to change them all before Google's next visit.

Again thanks.


 3:02 pm on Aug 24, 2002 (gmt 0)

Good decision to avoid capitals too. They will hurt you with Fast, which is very important for the Spanish-speaking market. I read somewhere here that Fast will return a 404 for your pages if you use capitals.


 5:20 pm on Aug 24, 2002 (gmt 0)

Google indexes 'sameroot.com/index.html' and 'sameroot.com/inDex.html'
as two different files, so you will get two different listings!

(Notice the capital D in the second index.)
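This is consistent with how URLs are defined: under RFC 3986 the scheme and host name are case-insensitive, but the path is case-sensitive, so /index.html and /inDex.html are different URLs unless the server decides otherwise. A minimal Python sketch that applies only the case normalization the RFC allows (the function name `normalize_case` is hypothetical, the `http://` prefix is added for the example, and it assumes the netloc holds just a host name with no user:password@ part):

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(url: str) -> str:
    """Lowercase only the case-insensitive parts of a URL.

    Scheme and host are case-insensitive, so they are lowercased.
    The path (plus query and fragment) is case-sensitive and is kept as written.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # assumes no user:password@ in the netloc
        parts.path,
        parts.query,
        parts.fragment,
    ))


print(normalize_case("HTTP://SameRoot.com/index.html"))   # http://sameroot.com/index.html
print(normalize_case("http://sameroot.com/inDex.html"))   # http://sameroot.com/inDex.html
```

Even after normalization the two URLs still differ, which is why a crawler can list them separately.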


 5:35 pm on Aug 24, 2002 (gmt 0)

>>as two different files, so you will get two different listings!

ikbenhet1, I hope you're not using this trick for duplicate content, because any site using it risks losing both listings someday.

Not? Fine! But is it worth losing the traffic from Fast? Your decision.

Also, what about linkage? Can you link "Foo.htm" as well as you can link "foo.htm"?


 5:44 pm on Aug 24, 2002 (gmt 0)

I really didn't do this on purpose; I just saw it in the listings: the same site listed 5 times with differences in the URL (different capitals).

Added: I never get visitors from Fast; I'm just the king in Google.

Tropical Island

 5:55 pm on Aug 24, 2002 (gmt 0)

I have just replaced all the caps in my URLs with lowercase letters. Can I assume that, since the spelling is exactly the same and the browser doesn't care, this will not affect my current listings in Google?

In other words, if I went from "mysite/MyPage.htm" to "mysite/mypage.htm", should it end up on exactly the same page?


 9:49 am on Aug 25, 2002 (gmt 0)

Yes, that is the idea.

I have converted some sites that used capitals in their names. Since it was generally only one of many much-needed steps to improve rankings, I could not measure the importance of this isolated step.

As ikbenhet1 mentioned, Google stores some filenames in URLs with capitals. I guess that when the bot visits again, those pages may return 404s. But if your site is fully crawlable, it should find the new ones. (This could cause a month or two of traffic loss on those pages. I am not sure about this.)
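One way to keep those old capitalized URLs from turning into 404s, not raised in this thread, is a permanent (301) redirect from any mixed-case path to its lowercase version, so both visitors and spiders land on the new file. On Apache this is usually done with mod_rewrite and an `int:tolower` RewriteMap; as a server-neutral illustration, here is a minimal Python WSGI sketch. The name `lowercase_redirect` is hypothetical, and it assumes that every real file on the site now has an all-lowercase name and that the app is mounted at the site root.

```python
def lowercase_redirect(app):
    """Wrap a WSGI application so that any request path containing capitals
    gets a 301 redirect to the same path in lowercase.
    Lowercase paths are passed through to the wrapped app unchanged."""

    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "")
        lowered = path.lower()
        if path != lowered:
            location = lowered
            query = environ.get("QUERY_STRING", "")
            if query:
                # Keep the query string as-is; only the path is lowercased.
                location += "?" + query
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Type", "text/plain")],
            )
            return [b"Moved permanently\n"]
        return app(environ, start_response)

    return wrapper
```

To try it locally you could wrap your site's WSGI app and serve it with the standard library's `wsgiref.simple_server.make_server`. The design choice is a 301 rather than a 302, because a permanent redirect tells search engines the lowercase URL is the one to keep.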

I believe changing your filenames will have no ill effects on Google, and I am sure it can improve your traffic from Fast. It won't change your directory listings, since they probably indexed only the home page. (Hey, that rings a bell [webmasterworld.com] if you have sufficient content in Spanish...)

It's the best long term option.

Tropical Island

 2:14 am on Aug 26, 2002 (gmt 0)

Again, thanks for the help, Macguru.

I read the thread on foreign languages, especially the part about being able to answer the queries. One of our sites is in 4 languages (English, Dutch, Spanish and German), and Google has given us good placement. Fortunately my wife is Dutch and speaks and writes all 4 languages fluently, which has helped immensely in promoting our tourist business.

I spent all day eliminating the ñ's and caps in our 100+ page site, and I can offer a word of warning to anyone else out there with the same problem. If you use an HTML editor to rename the pages with caps (we use FP 2000), it will ask whether you want to update all the related links. Because this is not a spelling change, it will NOT change the links. You must go through page by page and fix them one at a time.

On a smaller site that we host on a Linux server, when we changed the URLs from caps to lowercase, none of the links worked until we changed all the links too. This did not happen on the Windows server with the bigger site. In other words, if you change "mysite/MyPage.htm" to "mysite/mypage.htm", the Linux server will not follow the old link, even though the spelling is the same.
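Because Windows/IIS forgives the case mismatch and Linux/Apache does not, a quick way to find the links that will break before you upload is to look for relative links that only resolve when case is ignored. A rough Python sketch, assuming a local copy of the site in one folder; `find_case_mismatches` is a hypothetical name, and the regular-expression href extraction is deliberately crude (fine for hand-written pages, not a full HTML parser):

```python
import os
import re
from pathlib import Path

# Crude href extractor: stops at quotes, '#' or '?'.
HREF_RE = re.compile(r"""href\s*=\s*["']([^"'#?]+)""", re.IGNORECASE)


def find_case_mismatches(site_root):
    """Return (page, href, actual_file) for relative links that only match
    a file when case is ignored, i.e. links that work on IIS but 404 on Linux."""
    root = Path(site_root)
    files = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    by_lower = {f.lower(): f for f in files}
    problems = []
    for page in sorted(root.rglob("*.htm*")):
        page_dir = page.parent.relative_to(root).as_posix()
        text = page.read_text(encoding="utf-8", errors="replace")
        for href in HREF_RE.findall(text):
            if "://" in href or href.startswith(("/", "mailto:")):
                continue  # this sketch only checks relative links
            target = os.path.normpath(os.path.join(page_dir, href)).replace(os.sep, "/")
            if target not in files and target.lower() in by_lower:
                problems.append(
                    (page.relative_to(root).as_posix(), href, by_lower[target.lower()])
                )
    return problems
```

For example, a page containing `<a href="MyPage.htm">` next to a file actually named mypage.htm would be reported as ("index.htm", "MyPage.htm", "mypage.htm").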

Still have one 50+ page site to do tomorrow when I can see straighter.


 11:25 am on Aug 26, 2002 (gmt 0)

I don't know about FrontPage, but in Dreamweaver you can change links sitewide, whatever the case, when you check the right option. It must be somewhere in FrontPage too. Doing it by hand is too tedious a job and can lead to omissions or errors.

When I replace files renamed from capitalized to lowercase names on a Linux server (the only kind I use), I always delete the old pages before I upload the new ones.

Tropical Island

 12:03 pm on Aug 26, 2002 (gmt 0)

I think the problem with FP is that all I did was rename the page without the caps. When you perform this operation, a prompt asks whether you want FP to update the relevant links. I said yes, but it left all the caps in the links. I can only assume that it didn't see this as a change, since the spelling was the same. Maybe I should just use "Save As" and delete the capitalized page, as you suggest.

Fortunately my Linux site is relatively small, and making the changes was no problem.


 12:16 pm on Aug 26, 2002 (gmt 0)

I think there is some confusion here. (My English could still be improved, sorry.)

I am sure there is an option in FrontPage to also change your on-page links to lowercase when you rename the files. We have very knowledgeable members willing to help in the WYSIWYG and Text Code Editors forum. If most of the job is not done yet, I suggest you browse that forum a bit. It must be a simple "preference" checkbox somewhere.

I delete old files on the Linux server before uploading the new ones.

<edit>replaced "browser forum" with "WYSIWYG and Text Code Editors forum"</edit>

[edited by: Macguru at 1:49 pm (utc) on Aug. 26, 2002]


 12:19 pm on Aug 26, 2002 (gmt 0)

>Fast will return 404 on your pages

I know a site in Fast which is indexed just fine with capitals in the filenames... 100+ pages.
The same addresses do not exist without caps on the site, though.

The site also does fine in Google and AV with these filenames.

If it were my site, I would have done it differently to be on the safe side.

Btw, this site also has many spaces in its filenames, as in
"file name.html"... also no problem for Fast, Google or AV.


 12:42 pm on Aug 26, 2002 (gmt 0)

Ouch! Damian, now I am hurt. (And probably not the only one.)

I just checked Fast and Lycos, and they handle caps in URLs perfectly well. I can't find the thread here where the glitch was mentioned. Does this make the caps-in-filenames problem mentioned everywhere on the net an old story?

Can anyone expand on this?

Tropical Island, I hope you saved the old version of your site, because the filename change can have a temporary negative impact on your traffic. Please wait until we know more before doing anything.


 1:06 pm on Aug 26, 2002 (gmt 0)

Like Damian said: I'd use no caps and no spaces in any case, just to stay on the safe side. Why risk anything in this regard?
It's great if it doesn't cause trouble with the majors, but even losing traffic from some smaller engines would be bad enough.
Extended characters should be avoided at all costs anyway.
Caps: on Apache servers, caps in filenames don't work anyhow, I believe? Never tried...


 1:09 pm on Aug 26, 2002 (gmt 0)

I have seen problems in Google and Inktomi where capitalized filenames were converted to lowercase. I am an NT guy, so it does not affect my pages, but it does affect Unix and other systems. I would be careful using caps in filenames if you are on a Unix box. I have 2 incoming links from a site where only 1 page links to me: 1 in caps and the other in lowercase. I know the site owner, and there is only 1 page, with the uppercase name. Google somehow got a lowercase version, even though it says that version has no incoming links except links from the site itself. It's not really a problem, since he is on NT.

Netscape 4.x versions like to choke on filenames with spaces. I am not sure about newer versions, but the old ones hate them.
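Old browsers aside, a raw space is not a legal character in a URL at all; a link to "file name.html" has to be sent as file%20name.html, and every tool in the chain has to get that escaping right. The same percent-encoding is what happens to non-ASCII names like español.htm, whose ñ goes over the wire as its UTF-8 bytes. A small Python sketch using the standard library's `urllib.parse.quote` and `unquote` (the wrapper name `href_for` is hypothetical):

```python
from urllib.parse import quote, unquote


def href_for(filename: str) -> str:
    """Percent-encode a file name for use in an href.

    quote() leaves '/' and unreserved characters (letters, digits, '-', '.',
    '_', '~') alone, turns a space into %20, and encodes other characters
    as UTF-8 bytes.
    """
    return quote(filename)


print(href_for("file name.html"))   # file%20name.html
print(href_for("español.htm"))      # espa%C3%B1ol.htm
print(unquote("file%20name.html"))  # file name.html
```

Plain lowercase ASCII names with no spaces sidestep this escaping entirely, which is the point several posters are making.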


 2:29 pm on Aug 26, 2002 (gmt 0)

IIS is case-insensitive when finding filenames on the server; Apache is case-sensitive. However, there are lots of good reasons to always use lowercase filenames on the web server, because mixed case messes up search engines and log files. On Apache, people have a hard time matching capitalization when typing in a URL. On IIS, people linking to your site may sometimes write "index.htm" and sometimes "Index.htm", and Google and logging programs treat those as two different files. This is something I learned along the way.

So always use lowercase filenames, regardless of your web server.
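To illustrate the log-file side: on a case-insensitive server, "/index.htm" and "/Index.htm" are one file but show up as two lines in a naive report. A minimal Python sketch of a hit counter that folds case only when the server itself ignores case; it assumes the request paths have already been extracted from your log lines, and `hits_by_page` is a hypothetical name:

```python
from collections import Counter
from typing import Iterable


def hits_by_page(paths: Iterable[str], case_insensitive_server: bool = True) -> Counter:
    """Count hits per requested path.

    On a case-insensitive server such as IIS, "/index.htm" and "/Index.htm"
    are the same file, so paths are folded to lowercase before counting.
    On a case-sensitive server they are different resources and are
    counted separately.
    """
    if case_insensitive_server:
        return Counter(p.lower() for p in paths)
    return Counter(paths)


requests = ["/index.htm", "/Index.htm", "/about.htm", "/index.htm"]
print(hits_by_page(requests).most_common())
```

With all-lowercase filenames there is nothing to fold in the first place, which is the simpler fix.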


 5:54 pm on Aug 26, 2002 (gmt 0)

Capitals should cause no problems at all, except where users are expected to type in an address.

The universally acceptable characters for filenames are:

A-Z
a-z (may or may not be treated the same as above)
_ (underscore)
. (if used only as the filename.filetype separator)

Most systems can cope with the '-' character as well.
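The list above can be turned into a simple check to run over a site before uploading. A Python sketch under stated assumptions: the name `is_safe_filename` is hypothetical, digits 0-9 are allowed as an addition not in the list above, '-' is allowed per the last line, and by default capitals are rejected in line with the lowercase advice elsewhere in this thread.

```python
import re

# Base name: letters, digits (our addition), underscore, hyphen.
# Then exactly one dot, used only as the filename.filetype separator.
_LOWER_ONLY = re.compile(r"[a-z0-9_-]+\.[a-z0-9]+")
_MIXED_CASE = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")


def is_safe_filename(name: str, allow_upper: bool = False) -> bool:
    """Return True if name uses only the conservative character set."""
    pattern = _MIXED_CASE if allow_upper else _LOWER_ONLY
    return pattern.fullmatch(name) is not None


for candidate in ["mypage.htm", "MyPage.htm", "español.htm", "site map.htm"]:
    print(candidate, is_safe_filename(candidate))
```

Names with a second dot (archive.tar.gz) are rejected too, matching the "separator only" rule.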


 1:47 am on Aug 27, 2002 (gmt 0)

It seems they should treat everything as lowercase, as the searches do. Along the same lines, I was having a problem getting pages spidered that had arguments in the URL. It appears that arguments with an empty value cause Googlebot not to crawl the page. A good example is affiliate tracking.

Google apparently doesn't like the following URL:


Will it like this URL?


Tropical Island

 12:21 pm on Aug 27, 2002 (gmt 0)

First of all, I would again like to thank everyone for the help, even though it may initially have been a little inaccurate. It forced me to clean up 2 sites that needed it, especially the ñ problem. The 3rd site I'm going to leave alone, except for correcting "site map.htm" to "sitemap.htm". That was just a dumb error. It's only been up for one month, so hopefully changing it now won't cause too many problems.

I look forward to my Spanish Home Page and Index page getting spidered next go-round.


 8:47 pm on Aug 27, 2002 (gmt 0)

I have noticed the capitalisation problem when people edit links. A webmaster changes a link from
www.domain.com/example/index.htm to
and they break the link, resulting in a 404 error and lost traffic. Not clever. These days I try to make ALL my filenames lower case with_no_gaps-either; it is just safer.


 11:56 am on Sep 1, 2002 (gmt 0)

I do a lot of batch rename jobs and have found html rename! to be a good tool. It does things like project-wide renames:
jpg to jpeg
htm to html
caps to small

It's smart to make a backup of your site before you start.
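If you would rather script the caps-to-small step yourself, here is a minimal Python sketch. It is not how html rename! works, just a hypothetical stand-in (`lowercase_filenames` is a made-up name). It only renames files (directory names and the links inside your pages are left untouched), defaults to a dry run that just reports what it would do, and refuses to overwrite when a separate lowercase file already exists, which can happen on a case-sensitive Linux disk. The backup advice above still applies.

```python
import os
import uuid
from pathlib import Path


def lowercase_filenames(site_root, dry_run=True):
    """Rename every file under site_root to an all-lowercase file name.

    Returns the list of (old, new) relative paths. With dry_run=True
    (the default) nothing is renamed, only reported.
    """
    root = Path(site_root)
    renames = []
    # sorted() builds the full list first, so renaming does not disturb the walk.
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.name == path.name.lower():
            continue
        target = path.with_name(path.name.lower())
        # On a case-sensitive file system a different "mypage.htm" may already
        # exist next to "MyPage.htm"; refuse rather than overwrite it.
        if target.exists() and not os.path.samefile(path, target):
            raise FileExistsError(f"{target} already exists")
        renames.append((path.relative_to(root).as_posix(),
                        target.relative_to(root).as_posix()))
        if not dry_run:
            # Go through a temporary name: some tools and case-insensitive
            # file systems handle a case-only rename badly in a single step.
            temp = path.with_name(f"{path.name}.{uuid.uuid4().hex}.tmp")
            path.rename(temp)
            temp.rename(target)
    return renames
```

Run it once with the default dry run, check the list, then call it again with `dry_run=False`.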


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved