Opened 8 years ago

Last modified 8 years ago

#10947 new Bug report

In filenames, FileZilla replaces German umlauts with a plain letter plus a diaeresis mark

Reported by: Hubert Schölnast
Owned by:
Priority: critical
Component: FileZilla Client
Keywords: changes filenames
Cc:
Component version: 3.20.1
Operating system type: OS X
Operating system version: El Capitan, Version 10.11.6

Description

My local computer is a Mac (OS X El Capitan, Version 10.11.6)
My remote server has Ubuntu 16.04
The FileZilla version is 3.20.1

I use FileZilla to connect my Mac to the server. I start with empty directories on both computers.

I also enable:
View -> Directory Comparison -> Enable
(This does not affect the problem, but it makes it easier to see.)

I use ssh to log in to my remote server, and there I enter this command (using a German keyboard layout):

touch ä

Then I switch to FileZilla and click the Refresh button. On the left side (local directory) I still see an empty directory. On the right side (remote computer) I can see the file named »ä« with a size of 0 bytes. It has a yellow background, which means that there is no matching file on the other side (which is true).

Then I use FileZilla to download this file. A new file named »ä« appears on the left side. This seems OK so far, but when I click the Refresh button again, something weird happens:
On the left side there is a file named »ä« that does not match any remote file, so it has a yellow background. The file named »ä« on the remote computer also has no partner on the other side, so it has a yellow background too.

Now I upload the local file »ä« to the remote computer. The result: two files on the remote computer! One named »ä« (this one matches the local file), the other named »ä«, which has no matching partner.

I go back to my ssh console of my remote computer and type

ls -als

it displays:

total 8
4 drwxr-xr-x 2 root root 4096 Aug 20 09:34 .
4 drwxr-xr-x 3 root root 4096 Aug 20 08:45 ..
0 -rw-r--r-- 1 root root    0 Aug 20 09:23 ä
0 -rw-r--r-- 1 root root    0 Aug 20 09:23 ä

I type

touch ä
ls -als

and then I see this:

total 8
4 drwxr-xr-x 2 root root 4096 Aug 20 09:34 .
4 drwxr-xr-x 3 root root 4096 Aug 20 08:45 ..
0 -rw-r--r-- 1 root root    0 Aug 20 09:23 ä
0 -rw-r--r-- 1 root root    0 Aug 20 09:40 ä

There are two independent files, one named »ä«, the other named »ä«. The names look the same, but they are not. They are different names!
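The difference does not show in `ls` output. It becomes visible when each name is printed code point by code point. The following is a minimal sketch that assumes Python 3 is available on the Ubuntu server. `describe_name` is a hypothetical helper written for this illustration and is not part of FileZilla or of any tool mentioned in this ticket. It uses `str.format` rather than f-strings, so it also runs on the Python 3.5 that ships with Ubuntu 16.04.

```python
import os
import sys
import unicodedata


def describe_name(name: str) -> str:
    """Return each code point of a file name together with its Unicode name."""
    parts = []
    for ch in name:
        # '<unnamed>' covers code points without a name (e.g. surrogate escapes).
        parts.append("U+{:04X} {}".format(ord(ch), unicodedata.name(ch, "<unnamed>")))
    return ", ".join(parts)


if __name__ == "__main__":
    # Directory to inspect (hypothetical usage); defaults to the current directory.
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for entry in sorted(os.listdir(directory)):
        # Print the raw UTF-8 bytes of the name and its decoded code points.
        raw = entry.encode("utf-8", "surrogateescape")
        print(repr(raw), describe_name(entry))
```

In the directory above, one name should appear as the single code point U+00E4 (bytes `b'\xc3\xa4'`). The other should appear as U+0061 followed by U+0308 (bytes `b'a\xcc\x88'`).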

Here is the problem:

The original file is named »ä«, which is the Unicode character 'LATIN SMALL LETTER A WITH DIAERESIS' (U+00E4). But when FileZilla downloads a file with this letter in its name to my Mac, the file ends up with a name in which this one character has been turned into two characters:

Unicode Character 'LATIN SMALL LETTER A' (U+0061)
followed by
Unicode Character 'COMBINING DIAERESIS' (U+0308)

Most fonts (not all!) render (U+0061) followed by (U+0308) so that it looks exactly like the single character (U+00E4). But (U+0061) followed by (U+0308) (two characters) is NOT identical to (U+00E4) (one character).
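Python's standard `unicodedata` module can illustrate the two spellings. The single-character spelling is the NFC (composed) normalization form, and the two-character spelling is the NFD (decomposed) form. They are canonically equivalent, so normalizing one into the other's form makes them equal. As raw strings, however, they differ. This is why the Ubuntu server, which compares file names byte for byte, keeps them as two separate files. This is a small illustrative sketch, not code taken from FileZilla:

```python
import unicodedata

composed = "\u00e4"     # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS (NFC form)
decomposed = "a\u0308"  # U+0061 followed by U+0308 COMBINING DIAERESIS (NFD form)

# As raw strings, and as file names on the server, the two are different.
print(composed == decomposed)            # False
print(len(composed), len(decomposed))    # 1 2
print(composed.encode("utf-8"), decomposed.encode("utf-8"))  # b'\xc3\xa4' b'a\xcc\x88'

# Converting one into the other's normalization form makes them equal.
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```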

I can't say in which version of FileZilla this problem first appeared, but older versions of FileZilla did not have this bug.

When I download a file and then upload it again, I don't want to create a duplicate file with a new name. I want the file name to stay the same!

Change History (4)

comment:1 by Tim Kosse, 8 years ago

FileZilla passes the combined character to the system's open() syscall to create the local file. It's OS X itself that's at that point doing the decomposition into base character and combining diacritic mark.

comment:2 by Tim Kosse, 8 years ago

Related to #10794

comment:3 by Hubert Schölnast, 8 years ago

Yes, you are right; I found that out in the meantime as well. I have already submitted a bug report to Apple using this form: www.apple.com/feedback/macosx.html
Could you please also send a bug report to Apple?

comment:4 by Tim Kosse, 8 years ago

This behavior of OS X is intended functionality. HFS+ has been designed to store filenames in NFD.

While normalization forms are generally a good thing, whoever came up with the idea to have multiple normalization forms in the Unicode standard deserves to be hanged, drawn and quartered, then precomposed again and repeated thrice over for each normalization form.
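Given that HFS+ stores names in NFD while the Linux server stores whatever bytes it receives, one possible mitigation on the client side is to compare names in a normalization-insensitive way. The idea is to normalize both sides to the same form before pairing local and remote entries. The sketch below illustrates this under that assumption. `nfc` and `match_listings` are hypothetical helpers written for this example. They are not FileZilla's directory-comparison code, and they do not describe what FileZilla actually does.

```python
import unicodedata


def nfc(name: str) -> str:
    """Normalize a file name to NFC (composed) form."""
    return unicodedata.normalize("NFC", name)


def match_listings(local_names, remote_names):
    """Pair local and remote names that are equal after NFC normalization.

    Returns (matched, local_only, remote_only). Matched pairs keep the
    original spellings, so an upload could reuse the existing remote name
    instead of creating a second, differently encoded file.
    """
    remote_by_key = {}
    remote_only = []
    for name in remote_names:
        key = nfc(name)
        if key in remote_by_key:
            # Two remote names that differ only in normalization (as on the
            # Ubuntu server in this ticket) cannot both be matched.
            remote_only.append(name)
        else:
            remote_by_key[key] = name

    matched, local_only = [], []
    for name in local_names:
        remote = remote_by_key.pop(nfc(name), None)
        if remote is None:
            local_only.append(name)
        else:
            matched.append((name, remote))
    remote_only.extend(remote_by_key.values())
    return matched, local_only, remote_only


# Local NFD name (as stored by HFS+) versus remote NFC name (typed on Linux).
print(match_listings(["a\u0308"], ["\u00e4"]))
```

NFC is an arbitrary choice here. NFD would work equally well, as long as both sides are normalized to the same form before comparing.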
