Opened 11 years ago
Last modified 11 years ago
#9118 new Bug report
utf8 and non-utf8 characters corrupt filezilla cache?
Reported by: | Arkadiusz Miskiewicz | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | FileZilla Client |
Keywords: | utf-8 | Cc: | |
Component version: | | Operating system type: | Linux |
Operating system version: | PLD Linux Th | | |
Description
Test case for filezilla 3.7.3
Go to:
ftp://arm-filezilla@ftp.arm.beep.pl/
(password is: filezilla-project.org;
the access will be there for a few weeks for testing)
The structure is shown below (listed with the lftp unix cmdline client; it uses OPTS UTF8 ON).
Some entries have proper utf8 encoding, like "test_polskich_znaków", while others have cp1250 encoding ("RZ 47 Jelenia G�ra").
According to https://wiki.filezilla-project.org/Character_Set filezilla won't support invalid encoding. Well, that's not a great solution (it could always indicate that there are some such files, like lftp does, so the user would be aware of the problem), but this bug isn't about that.
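To make the encoding difference concrete, here is a minimal Python sketch (not FileZilla or lftp code) of how a client could tell the two kinds of names apart and still show the invalid ones with a replacement character, which is roughly what the lftp output in this ticket looks like. The helper name `classify_name` is hypothetical, and the cp1250 bytes are reconstructed from the assumed intended name "RZ 47 Jelenia Góra".

```python
from typing import Tuple


def classify_name(raw: bytes) -> Tuple[str, bool]:
    """Return (display_name, is_valid_utf8) for a raw file name from a listing."""
    try:
        # Strict decoding fails on any byte sequence that is not valid UTF-8.
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        # Keep the entry visible: invalid bytes become U+FFFD instead of the
        # whole entry being dropped.
        return raw.decode("utf-8", errors="replace"), False


# A name that was stored as UTF-8 on the server.
good = "test_polskich_znaków".encode("utf-8")
# A name that was stored as cp1250 (assumed original spelling).
bad = "RZ 47 Jelenia Góra".encode("cp1250")

for raw in (good, bad):
    name, ok = classify_name(raw)
    print(("valid  " if ok else "INVALID"), name)
```

In cp1250 the letter "ó" is the single byte 0xF3, which cannot stand alone in UTF-8, so the second name fails strict decoding.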
If I enter "test_polskich_znaków"
( http://imageshack.us/a/img6/9378/0aoc.jpg )
and then "gfgdfg", the directory list in filezilla (sometimes) gets corrupted ( http://imageshack.us/a/img543/2641/op5b.jpg ). filezilla starts thinking that there is a second, weird/corrupted directory there along with the properly encoded one. Why? No idea, but it shouldn't do that.
lftp arm-filezilla@ftp.arm.beep.pl:/> ls -alR
drwxr-xr-x 2 0 999 42 Dec 10 11:20 .
drwxr-xr-x 2 0 999 42 Dec 10 11:20 ..
drwx---r-x 3 10089 999 42 Dec 10 11:21 filezilla.arm.beep.pl

./filezilla.arm.beep.pl:
drwx---r-x 3 10089 999 42 Dec 10 11:21 .
drwxr-xr-x 2 0 999 42 Dec 10 11:20 ..
drwxr-xr-x 6 29634 nogroup 110 Dec 10 11:15 test_polskich_znaków

./filezilla.arm.beep.pl/test_polskich_znaków:
drwxr-xr-x 6 29634 nogroup 110 Dec 10 11:15 .
drwxrwxrwx 1 0 0 22 Dec 10 11:20 ..
drwxr-xr-x 4 29634 nogroup 60 Nov 19 10:35 RZ 47 Jelenia G�ra
drwxr-xr-x 2 29634 nogroup 34 Dec 10 11:21 RZ 47 Wa�brzych
drwxr-xr-x 2 29634 nogroup 34 Dec 10 11:22 RZ47 ZIELONA G�RA
drwxr-xr-x 2 29634 nogroup 10 Dec 10 11:09 gfgdfg

./filezilla.arm.beep.pl/test_polskich_znaków/RZ 47 Jelenia G�ra:
drwxr-xr-x 4 29634 nogroup 60 Nov 19 10:35 .
drwxr-xr-x 6 29634 nogroup 110 Dec 10 11:15 ..
drwxr-xr-x 2 29634 nogroup 34 Dec 10 11:21 2 sweter -50%
drwxr-xr-x 2 29634 nogroup 34 Dec 10 11:21 SF JELENIA G�RA

./filezilla.arm.beep.pl/test_polskich_znaków/RZ 47 Jelenia G�ra/2 sweter -50%:
drwxr-xr-x 2 29634 nogroup 34 Dec 10 11:21 .
drwxr-xr-x 4 29634 nogroup 60 Nov 19 10:35 ..
-rw-r--r-- 1 29634 nogroup 0 Dec 10 11:21 testfile1.txt

./filezilla.arm.beep.pl/test_polskich_znaków/RZ 47 Jelenia G�ra/SF JELENIA G�RA:
drwxr-xr-x 2 29634 nogroup 34 Dec 10 11:21 .
drwxr-xr-x 4 29634 nogroup 60 Nov 19 10:35 ..
-rw-r--r-- 1 29634 nogroup 0 Dec 10 11:21 testfile2.txt

./filezilla.arm.beep.pl/test_polskich_znaków/RZ 47 Wa�brzych:
drwxr-xr-x 2 29634 nogroup 34 Dec 10 11:21 .
drwxr-xr-x 6 29634 nogroup 110 Dec 10 11:15 ..
-rw-r--r-- 1 29634 nogroup 0 Dec 10 11:21 testfile3.txt

./filezilla.arm.beep.pl/test_polskich_znaków/RZ47 ZIELONA G�RA:
drwxr-xr-x 2 29634 nogroup 34 Dec 10 11:22 .
drwxr-xr-x 6 29634 nogroup 110 Dec 10 11:15 ..
-rw-r--r-- 1 29634 nogroup 0 Dec 10 11:22 testfile4.txt

./filezilla.arm.beep.pl/test_polskich_znaków/gfgdfg:
drwxr-xr-x 2 29634 nogroup 10 Dec 10 11:09 .
drwxr-xr-x 6 29634 nogroup 110 Dec 10 11:15 ..
Change History (4)
comment:1 by , 11 years ago
comment:2 by , 11 years ago
Hmm,
"Status: Invalid character sequence received, disabling UTF-8. Select UTF-8 option in site manager to force UTF-8."
is likely the cause of the whole problem. Looks bad on filezilla's side.
https://wiki.filezilla-project.org/Character_Set says:
"If an RFC 2640 compliant client sends OPTS UTF-8 ON, it has to use UTF-8 regardless whether OPTS UTF-8 ON succeeds or not."
and filezilla did send it:
Command: OPTS UTF8 ON
Response: 200 OK, UTF-8 enabled
So it looks like filezilla is violating the rule above and that's the reason for the problem. Setting "force utf8" makes the problem go away.
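A small Python illustration of why switching UTF-8 off in the middle of a session could plausibly produce a second, "corrupted" directory entry: the same bytes from the server decode to two different strings. This is only a sketch of the suspected mechanism, not FileZilla code. The helper `decoded_names` is hypothetical, and ISO-8859-2 is an assumed fallback charset chosen purely for illustration; the ticket does not say which charset filezilla falls back to.

```python
def decoded_names(raw: bytes, fallback: str = "iso-8859-2") -> set:
    """Decode the same raw name once as UTF-8 and once with a fallback charset."""
    as_utf8 = raw.decode("utf-8")          # interpretation while UTF-8 is enabled
    as_fallback = raw.decode(fallback)     # interpretation after UTF-8 was disabled
    return {as_utf8, as_fallback}


# Bytes of a correctly UTF-8 encoded directory name, as the server sends them.
raw = "test_polskich_znaków".encode("utf-8")
names = decoded_names(raw)

# A cache keyed by decoded names now sees two "different" directories.
print(len(names))
for name in sorted(names):
    print(repr(name))
```

If the client keeps entries cached under the first interpretation and adds entries under the second, both would show up in the same listing.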
comment:3 by , 11 years ago
The solution could be: if charset "autodetect" is used and OPTS UTF8 ON was sent, then never automatically switch utf8 off.
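The proposed rule can be sketched as a small decision function. This is a hypothetical Python model, not FileZilla's actual implementation: the function name, the mode strings ("autodetect", "force_utf8", "custom") and the parameters are all assumptions made for illustration.

```python
def keep_utf8(charset_mode: str, opts_utf8_on_sent: bool,
              invalid_sequence_seen: bool) -> bool:
    """Decide whether the client should keep treating names as UTF-8."""
    if charset_mode == "force_utf8":
        # The user explicitly asked for UTF-8: never switch it off.
        return True
    if charset_mode == "custom":
        # The user picked another charset: UTF-8 is not used at all.
        return False
    if charset_mode != "autodetect":
        raise ValueError("unknown charset mode: %r" % charset_mode)
    if opts_utf8_on_sent:
        # Proposed change: once OPTS UTF8 ON was sent, stay on UTF-8 even if
        # an invalid sequence shows up in a listing.
        return True
    # Without OPTS UTF8 ON, an invalid sequence may still disable UTF-8.
    return not invalid_sequence_seen


# The situation from this ticket: autodetect, OPTS UTF8 ON sent,
# invalid (cp1250) names received.
print(keep_utf8("autodetect", True, True))
```

Under this rule the session in this ticket would stay on UTF-8, matching the behaviour seen with "force utf8".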
comment:4 by , 11 years ago
Keywords: | utf-8 added |
---|
Log from filezilla (filezilla was run in a LANG=en_US.UTF-8 env this time; previously LANG=pl_PL.UTF-8)
Note filezilla's:
vs lftp's:
Is filezilla somehow incorrectly interpreting a character in the CWD response?
And the same from lftp client: