UTF-16 line endings get corrupted by uploading and downloading
|Reported by:||Jens Mühlenhoff||Owned by:|
|Keywords:||utf-16 unicode text corruption||Cc:|
|Component version:||Operating system type:||Windows|
|Operating system version:||Windows 7 Professional x64|
I uploaded a UTF-16 text file with the following content:
FF FE 48 00 65 00 6C 00 6C 00 6F 00 0D 00 0A 00 57 00 6F 00 72 00 6C 00 64 00
Note the correct BOM for UTF-16 little endian and that every second byte in the text is set to zero.
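To double-check the dump above, the bytes can be decoded in a few lines of Python. This is only a verification sketch; the variable names are illustrative and not part of the original report.

```python
import codecs

# Hex dump of the uploaded file, copied from the report.
original = bytes.fromhex(
    "FF FE 48 00 65 00 6C 00 6C 00 6F 00 0D 00 0A 00 "
    "57 00 6F 00 72 00 6C 00 64 00"
)

# The first two bytes are the UTF-16 little-endian byte order mark.
has_le_bom = original.startswith(codecs.BOM_UTF16_LE)

# The "utf-16" codec consumes the BOM and decodes the rest accordingly.
text = original.decode("utf-16")

print(has_le_bom)      # True
print(repr(text))      # 'Hello\r\nWorld'
print(len(original))   # 26
```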
After downloading it again the content changes like this:
FF FE 48 00 65 00 6C 00 6C 00 6F 00 0D 00 0D 0A 00 57 00 6F 00 72 00 6C 00 64 00
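FileZilla transfers the file in text (ASCII) mode, so the line-ending conversion works byte by byte as if the file used 8-bit ASCII line endings, and that corrupts the UTF-16 line ending. In this case only 0A 00 was corrupted to 0D 0A 00, because of the Windows -> Unix -> Windows round trip: the upload leaves the bytes unchanged because the file contains no adjacent 0D 0A pair, and the download expands the lone 0A byte to 0D 0A.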
The file is now 1 byte longer (27 bytes instead of 26; an odd byte count is already invalid for a UTF-16 file) and the content is broken.
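The corruption can be reproduced with a simplified model of a text-mode round trip. The two functions below are hypothetical stand-ins, not FileZilla's or any server's actual code; they assume the upload turns each CR LF byte pair into LF and the download turns each LF byte back into CR LF.

```python
def upload_text_mode(data: bytes) -> bytes:
    # Assumed model: the Unix side stores CR LF (0D 0A) as a single LF (0A).
    return data.replace(b"\r\n", b"\n")


def download_text_mode(data: bytes) -> bytes:
    # Assumed model: every LF (0A) is expanded back to CR LF (0D 0A).
    return data.replace(b"\n", b"\r\n")


original = bytes.fromhex(
    "FF FE 48 00 65 00 6C 00 6C 00 6F 00 0D 00 0A 00 "
    "57 00 6F 00 72 00 6C 00 64 00"
)

# UTF-16 has no adjacent 0D 0A pair, so the upload changes nothing,
# but the download still expands the lone 0A byte.
corrupted = download_text_mode(upload_text_mode(original))
print(corrupted.hex(" ").upper())
print(len(original), len(corrupted))  # 26 27

try:
    corrupted.decode("utf-16")
except UnicodeDecodeError as exc:
    print("no longer valid UTF-16:", exc.reason)

# The same round trip is lossless for an 8-bit ASCII file.
ascii_file = b"Hello\r\nWorld"
ascii_round_trip = download_text_mode(upload_text_mode(ascii_file))
```

Under these assumptions the model yields exactly the 27-byte sequence shown in the report, while the ASCII file survives unchanged.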
Uploading in binary mode avoids this problem. I therefore suggest that FileZilla detect UTF-16 files, using the BOM or line endings in 0D 00 0A 00 format, and use binary mode for these files by default to avoid corruption.
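A minimal sketch of the suggested detection might look like the following. The function names and the `"ascii"`/`"binary"` labels are hypothetical and do not correspond to FileZilla's code. The big-endian BOM and the 00 0D 00 0A pattern are an extension beyond what this report asks for, added on the assumption that big-endian files would suffer from the same problem.

```python
import codecs


def looks_like_utf16(head: bytes) -> bool:
    """Heuristic check on the first bytes of a file (hypothetical helper)."""
    # A BOM is the strongest hint. FF FE also starts the UTF-32 LE BOM,
    # but such files should not go through text mode either.
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return True
    # Without a BOM, fall back to a UTF-16 encoded CR LF:
    # 0D 00 0A 00 (little endian) or 00 0D 00 0A (big endian, assumed extension).
    return b"\r\x00\n\x00" in head or b"\x00\r\x00\n" in head


def choose_transfer_mode(head: bytes, default: str = "ascii") -> str:
    # Force binary mode for anything that looks like UTF-16;
    # otherwise keep whatever the existing rules would have picked.
    return "binary" if looks_like_utf16(head) else default


utf16_file = "Hello\r\nWorld".encode("utf-16")  # includes a BOM
ascii_file = b"Hello\r\nWorld"

print(choose_transfer_mode(utf16_file))  # binary
print(choose_transfer_mode(ascii_file))  # ascii
```

The pattern search does not check 2-byte alignment, so it can produce false positives on arbitrary binary data. That errs on the side of binary mode, which never alters the bytes.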