Opened 4 years ago

Closed 4 years ago

#12102 closed Bug report (rejected)

SFTP uploads to Cerberus Server of 1GB+ fail

Reported by: Rapidity Owned by:
Priority: normal Component: FileZilla Client
Keywords: SFTP fileshare Cerberus Cc:
Component version: Pro 3.46.3 Operating system type: Windows
Operating system version: Client 7/10 Server 2012r2/2019

Description

Our clients cannot upload large files to our Cerberus server v11 - anything 1 Byte over 1GB from a fileshare fails. Connection seems to reset and FZPro thinks the file is already there and asks to overwrite etc. If the file is on the computer local to FZPro the transfer works.

As far as I can tell this issue started with v3.46.1 – 3.46.0 worked OK. Issue happens with older versions of Cerberus too – tried a rolledback version of our server using v10 and the same issue.

A quick fix is to turn FileZilla's default connection timeout of 20 seconds off but our client does not see that as an acceptable solution (their users are prevented from making changes in the client - they are a financial institution).

Logs attached of a failed transfer (filezilla20sectimeout.log), a successful transfer (filezilla20sectimeout) and the corresponding Cerberus server log. File used was a 4GB Windows 10 installer – tested the successful upload by downloading it and doing an install from it.

Filezilla version is Filezilla Pro 3.46.3 and Cerberus version is 11.0.8.0 on a fully patched Windows Server 2019 AWS instance.

Attachments (3)

filezillanotimeout.log (11.6 KB ) - added by Rapidity 4 years ago.
cerberus.server.1.log (13.8 KB ) - added by Rapidity 4 years ago.
filezilla20sectimeout.log (14.2 KB ) - added by Rapidity 4 years ago.

Download all attachments as: .zip

Change History (4)

by Rapidity, 4 years ago

Attachment: filezillanotimeout.log added

by Rapidity, 4 years ago

Attachment: cerberus.server.1.log added

by Rapidity, 4 years ago

Attachment: filezilla20sectimeout.log added

comment:1 by Tim Kosse, 4 years ago

Resolution: rejected
Status: newclosed

Let's analyze the log, first the client:

2020-02-04 11:00:43 38624 2 Command: put "U:\software\microsoft\Windows_101903.iso" "Windows_101903.iso"
2020-02-04 11:00:43 38624 2 Command: local:U:\software\microsoft\Windows_101903.iso => remote:/Windows_101903.iso
2020-02-04 11:02:03 38624 2 Trace: Initiating key re-exchange (too much data sent)

From the client's perspective, the transfer starts normally, and eventually it initiates a key re-exchange after 1 GiB has been sent. It is a blocking command to which a reply is needed before the client can do anything else.

2020-02-04 11:04:29 38624 2 Trace: Doing ECDH key exchange with curve nistp256 and hash SHA-256 (unaccelerated)
2020-02-04 11:04:29 38624 2 Trace: Doing ECDH key exchange with curve nistp256 and hash SHA-256 (unaccelerated)

It took the server over 2 minutes to reply to the key exchange request.

2020-02-04 11:05:06 38624 2 Trace: Initialised AES-256 SDCTR (AES-NI accelerated) outbound encryption
2020-02-04 11:05:06 38624 2 Trace: Initialised HMAC-SHA-256 (unaccelerated) outbound MAC algorithm
2020-02-04 11:05:17 38624 2 Trace: Initialised AES-256 SDCTR (AES-NI accelerated) inbound encryption
2020-02-04 11:05:17 38624 2 Trace: Initialised HMAC-SHA-256 (unaccelerated) inbound MAC algorithm
2020-02-04 11:05:17 38624 2 Trace: Initiating key re-exchange (too much data sent)

Very quickly the client sends enough data to require another key exchange.

2020-02-04 11:07:30 38624 2 Trace: Doing ECDH key exchange with curve nistp256 and hash SHA-256 (unaccelerated)
2020-02-04 11:07:30 38624 2 Trace: Initialised AES-256 SDCTR (AES-NI accelerated) outbound encryption
2020-02-04 11:07:30 38624 2 Trace: Initialised HMAC-SHA-256 (unaccelerated) outbound MAC algorithm
2020-02-04 11:07:30 38624 2 Trace: Initialised AES-256 SDCTR (AES-NI accelerated) inbound encryption
2020-02-04 11:07:30 38624 2 Trace: Initialised HMAC-SHA-256 (unaccelerated) inbound MAC algorithm
2020-02-04 11:07:30 38624 2 Trace: CSftpFileTransferOpData::ParseResponse() in state 4
2020-02-04 11:07:30 38624 2 Trace: CControlSocket::ResetOperation(0)
2020-02-04 11:07:30 38624 2 Trace: CSftpFileTransferOpData::Reset(0) in state 4
2020-02-04 11:07:30 38624 2 Status: File transfer successful, transferred 4,271,374,336 bytes in 407 seconds

Eventually the transfer succeeds though.

Now the server:

[2020-02-04 11:00:43]:CONNECT [ 6] - [217.138.8.66]:rapbackup - Channel Open: 'session', Sender Channel: 256, Init Window Size: 2147483647, Max Packet Size: 16384

That is a bizarrely large window.

[...]
[2020-02-04 11:00:43]:COMMAND [ 6] - [217.138.8.66]:rapbackup - File open command received for '/Windows_101903.iso' requesting WRITE,CREATE,TRUNC
[2020-02-04 11:00:43]: SYSTEM [ 6] - [217.138.8.66]:rapbackup - File handle to '/Windows_101903.iso' opened
[2020-02-04 11:02:03]: SYSTEM [ 6] - [217.138.8.66]:rapbackup - Upload started for file 'D:\FTProot\SFTP\rapbackup\Windows_101903.iso'

This delay is very strange, why isn't this basically instant? This requires further investigation by the server administrator or the server vendor.

[2020-02-04 11:04:29]: INFO [ 6] - [217.138.8.66]:rapbackup - Increasing sender channel window by '2146185588' bytes

The bizarre window becomes grotesque.

[2020-02-04 11:04:29]:CONNECT [ 6] - [217.138.8.66]:rapbackup - Key Re-Exchange Started
[2020-02-04 11:04:29]:CONNECT [ 6] - [217.138.8.66]:rapbackup - Algorithm negotiation complete: Proceeding with key exchange
[2020-02-04 11:04:29]:CONNECT [ 6] - [217.138.8.66]:rapbackup - Kex: 'ecdh-sha2-nistp256' Host Key: 'ssh-rsa' C2S : 'aes256-ctr, hmac-sha2-256, none' S2C : 'aes256-ctr, hmac-sha2-256, none'
[2020-02-04 11:07:30]:CONNECT [ 6] - [217.138.8.66]:rapbackup - Key Re-Exchange Started
[2020-02-04 11:07:30]:CONNECT [ 6] - [217.138.8.66]:rapbackup - Algorithm negotiation complete: Proceeding with key exchange
[2020-02-04 11:07:30]:CONNECT [ 6] - [217.138.8.66]:rapbackup - Kex: 'ecdh-sha2-nistp256' Host Key: 'ssh-rsa' C2S : 'aes256-ctr, hmac-sha2-256, none' S2C : 'aes256-ctr, hmac-sha2-256, none'
[2020-02-04 11:07:30]: SYSTEM [ 6] - [217.138.8.66]:rapbackup - Successfully stored file at 'D:\FTProot\SFTP\rapbackup\Windows_101903.iso' (4271374336 B received)
[2020-02-04 11:07:30]: SYSTEM [ 6] - [217.138.8.66]:rapbackup - Handle '/Windows_101903.iso' closed

Eventually the transfer succeeds.

Observations:

  • Based on the time between the first kex and initializing the next, the network is very fast, looks like 1GBit/s.
  • The overall duration of the transfer suggests that the server itself can only handle about 83MBit/s, unknown whether it is bound by CPU or storage limitations.
  • The server's receive window is huge, gets nearly as big as it can possibly be. While it has enough _virtual memory_ to buffer this much data, unknown whether it has enough actual RAM or needs to swap.

Conclusion: Too big of a window. The window size should not grow larger than than the bandwidth delay product. To illustrate, with this large of a window, at a network latency of as high as 200ms, a window of 4GB would require a network link capable of 160GBit/s to saturate

Solution: The server needs to advertise a much smaller window more suitable for its capabilities, additionally it probably should not buffer so much data to begin with and only take a packet from the network if its buyffers are mostly empty.

Note: See TracTickets for help on using tickets.