Opened 7 years ago

Last modified 6 years ago

#8036 reopened Bug report

somehow show invalid UTF-8 encoded filenames

Reported by: MIchael Seume
Owned by:
Priority: normal
Component: FileZilla Client
Keywords: ASCII, UTF-8, character set
Cc:
Component version:
Operating system type:
Operating system version:

Description

Files and folders with special characters in their names are not displayed if they were uploaded with the ASCII character set selected and the directory is then listed with the UTF-8 character set.

Test:

  1. Create a file with the name ctestäöü.txt
  2. Set the character set in "Charset" - "Use custom charset" to "ASCII" (or use any other non-UTF-8 FTP client).
  3. Upload this file to your UTF-8 FTP server.
  4. Switch the character set in "Charset" to "Autodetect" or "Force UTF-8".
  5. The file is no longer displayed!
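
The effect can be reproduced without a server. The following is a minimal Python sketch, assuming the non-UTF-8 client sends the name in a single-byte encoding such as ISO 8859-1; the ticket does not say which encoding was actually used on the wire, and `is_valid_utf8` is a hypothetical helper, not FileZilla code.

```python
# Assumption: the uploading client encodes the name as ISO 8859-1 (Latin-1).
name = "ctest\u00e4\u00f6\u00fc.txt"   # ctestäöü.txt
raw = name.encode("latin-1")           # bytes as the server would store them


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(raw)                                   # b'ctest\xe4\xf6\xfc.txt'
print(is_valid_utf8(raw))                    # False
print(is_valid_utf8(name.encode("utf-8")))   # True
```

The single-byte form of the name is rejected by a strict UTF-8 decoder, which is consistent with the file disappearing from the listing once the client switches to UTF-8.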

Change History (9)

comment:1 Changed 7 years ago by MIchael Seume

The file will no longer be displayed :-)

comment:2 Changed 7 years ago by Alexander Schuch

My guess is that the created "byte-stream of characters" (filename) is not valid UTF-8 anymore. Maybe FileZilla should show a warning in such a case, but basically, nothing "sane" can be done.

comment:3 Changed 7 years ago by Alexander Schuch

Summary: Unvisible folder and files in result of mixing ascii and utf-8 access → invalid UTF-8 encoding results in missing files

comment:4 Changed 7 years ago by Alexander Schuch

Operating system version: MAX OS X 10.7.3 → Mac OS X 10.7.3, 10.8

comment:5 Changed 7 years ago by Tim Kosse

Resolution: rejected
Status: new → closed

Looks like your server is broken if it accepts and stores filenames that aren't valid UTF-8. Unfortunately there is nothing we can do here.

comment:6 Changed 7 years ago by Alexander Schuch

Operating system type: OS X
Operating system version: Mac OS X 10.7.3, 10.8
Priority: blocker → low
Resolution: rejected
Status: closed → reopened
Summary: invalid UTF-8 encoding results in missing files → somehow show invalid UTF-8 encoded filenames
Type: Bug report → Feature request

Let's assume there is an FTP server which is configured to use the C locale. This means that all filenames which do not contain an embedded zero byte (\0) are allowed.

Further assume that different users with different FTP clients access that server.

Now assume that one user's FTP client uses the C locale as well. That user uploads a file with a filename that is perfectly valid in the C locale, but invalid when interpreted as UTF-8.

Finally, assume FileZilla is used by a second user. FileZilla, for one reason or another, uses UTF-8 as its character encoding. This user then browses to the directory where the first user uploaded files whose names are valid in the C locale but unfortunately are not valid UTF-8.

FileZilla seems to be unable to show such a file in the file list: it fails to decode the name and therefore cannot create a representation of the filename.

The user of FileZilla should NOT see an empty directory, as that is simply wrong. The problem should somehow be shown to the user.

1) If a filename cannot be properly interpreted using the selected character encoding, FileZilla could show a file with a name of "?" which is not selectable. The user then knows that there is something more, and that the directory is not empty.

2) A warning icon could be shown somewhere telling the user that such a problem happened.

3) A popup could be shown as well.

I changed this from bug report to feature request.
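
Options 1 and 2 above can be sketched as follows. This is a minimal Python illustration only: `build_listing` is a hypothetical helper, and it says nothing about how FileZilla's own listing code is structured.

```python
from typing import List, Tuple


def build_listing(raw_names: List[bytes], encoding: str = "utf-8") -> Tuple[List[str], int]:
    """Decode raw names; return (display names, number of undecodable names)."""
    shown = []
    failed = 0
    for raw in raw_names:
        try:
            shown.append(raw.decode(encoding))
        except UnicodeDecodeError:
            failed += 1
            shown.append("?")   # option 1: placeholder entry instead of dropping the file
    return shown, failed


names, failed = build_listing([b"readme.txt", b"ctest\xe4\xf6\xfc.txt"])
print(names)    # ['readme.txt', '?']
if failed:
    # option 2: surface the problem to the user
    print(f"Warning: {failed} filename(s) could not be decoded")
```

The point of the sketch is only that a failed decode still produces an entry, so the directory never looks empty when it is not.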

comment:7 Changed 7 years ago by MIchael Seume

Have a look at the UTF-8 decoder. A single byte in the range f0-ff is recognised as the start of a UTF-8 sequence. The ASCII character that follows breaks the sequence with an error, and no character is generated. Because the filename is valid in an 8-bit character set, you can in this case simply convert it to Unicode by copying it byte by byte: the German character Ä is c4 in the 8-bit set (ISO 8859-1) and U+00C4 in Unicode.
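
The byte-by-byte copy suggested in this comment is what an ISO 8859-1 decoder does: each byte maps to the code point with the same value. As a side note, in standard UTF-8 the lead bytes are 0xC2-0xDF (two-byte sequences), 0xE0-0xEF (three-byte) and 0xF0-0xF4 (four-byte), so the Ä example (0xC4) starts a two-byte sequence. A minimal Python sketch of the fallback, assuming the non-UTF-8 names really are ISO 8859-1 (`decode_with_fallback` is a hypothetical name):

```python
def decode_with_fallback(raw: bytes) -> str:
    """Try strict UTF-8 first; on failure, copy each byte to the code point of the same value."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO 8859-1 decoding is exactly the byte-by-byte copy: 0xC4 -> U+00C4
        return raw.decode("latin-1")


legacy = decode_with_fallback(b"\xc4.txt")                      # invalid UTF-8, falls back
modern = decode_with_fallback("\u00c4.txt".encode("utf-8"))    # valid UTF-8, decoded normally
print(legacy == modern == "\u00c4.txt")   # True
print(hex(ord(legacy[0])))                # 0xc4
```

This is a heuristic: a single-byte name whose bytes happen to form valid UTF-8 would still be decoded as UTF-8, so the fallback cannot recover the intended characters in every case.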

comment:8 Changed 7 years ago by Tim Kosse

Resolution: rejected
Status: reopenedclosed

According to the FTP specifications, FTP uses either 7-bit ASCII or UTF-8. Use of any other encoding requires explicit negotiation through unspecified means.

In other words, the specs don't care what happens if another encoding is used. Why should I add an insane amount of workarounds for something not covered by the specs?

Just switch to a server that _enforces_ UTF-8 and be done with it.
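
For context, RFC 2640 has servers advertise UTF-8 support through a `UTF8` line in the reply to the FEAT command. The sketch below checks a FEAT reply for that feature; `advertises_utf8` is a hypothetical helper and the sample reply is made up for illustration.

```python
def advertises_utf8(feat_reply: str) -> bool:
    """Return True if a multi-line FEAT reply lists the UTF8 feature."""
    for line in feat_reply.splitlines():
        tokens = line.split()
        # Feature lines are indented by one space; the first token is the feature name.
        if line.startswith(" ") and tokens and tokens[0].upper() == "UTF8":
            return True
    return False


sample = "211-Features:\n MDTM\n UTF8\n SIZE\n211 End"   # made-up reply
print(advertises_utf8(sample))                             # True
print(advertises_utf8("211-Features:\n MDTM\n211 End"))   # False
```

Advertising `UTF8` only says the server supports UTF-8 pathnames. It does not show that the server rejects names in other encodings, which is the situation this ticket is about.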

comment:13 Changed 6 years ago by MIchael Seume

Priority: low → normal
Resolution: rejected
Status: closed → reopened
Type: Feature request → Bug report

For professional use, you need to be able to see all files on the server. The only way to make sure that all files are shown is to switch FileZilla to ANSI instead of UTF-8. Only in ANSI mode does FileZilla show all files. Special characters are not displayed correctly, but all files are visible.

I assume that the UTF-8 encoder and decoder are the problem. For file system access, standard UTF-8 handling is correct. For strings that do not conform to the standard, however, the standard UTF-8 decoder is problematic: bytes in the range 0xf0-0xff that are not part of a valid UTF-8 sequence abort the decoding. Please use an alternative decoder that preserves these bytes instead of aborting the decoding process.
Please consider that I don't have access to other people's server configurations, and I cannot change other people's client software.
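
One existing example of a decoder that preserves undecodable bytes instead of aborting is Python's `surrogateescape` error handler. The sketch below only illustrates the idea requested here. It does not describe FileZilla's decoder, and the sample bytes assume an ISO 8859-1 upload.

```python
raw = b"ctest\xe4\xf6\xfc.txt"   # assumed ISO 8859-1 name, not valid UTF-8

# Undecodable bytes become lone surrogates (U+DC80..U+DCFF) instead of raising.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly,
# so the name can still be sent back to the server, e.g. for a download.
assert name.encode("utf-8", errors="surrogateescape") == raw

# For display, substitute U+FFFD for the bytes that could not be decoded.
display = raw.decode("utf-8", errors="replace")
print(ascii(display))
```

With this approach the file stays visible and addressable, even though the special characters are not shown correctly.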
