Opened 4 months ago

Last modified 3 days ago

#13186 new Bug report

Stable way to fetch sources for packaging

Reported by: Robin Candau
Owned by:
Priority: normal
Component: FileZilla Client
Keywords:
Cc: Robin Candau
Component version:
Operating system type: Linux
Operating system version:

Description (last modified by Robin Candau)

Hi,

I'm the Arch Linux package maintainer for the filezilla & libfilezilla packages [1][2].

Recently, the website's download link for sources became unstable. The related ticket [3] was closed with the explanation that this is intentional, in order to prevent potential infrastructure abuse.

While I totally understand the desire to avoid such potential abuses, this causes problems for downstream distributions packaging & redistributing filezilla / libfilezilla.

Indeed, due to its (now) unstable nature, the website download link cannot be used to fetch sources in our packaging workflow anymore (both to fetch valid sources and for reproducibility reasons).
I looked into switching to the SVN sources, but it seems tags are not created there anymore [4][5], so we can't really use that either (unless we pin specific revisions, which isn't really desirable)...

As such, distributions have started mirroring filezilla / libfilezilla tarballs on their side (see openSUSE [6], Alpine [7] & Gentoo [8] for instance).
Of course, this technically works, but it represents a significant extra burden on our packaging workflow and automation.

Can a solution be discussed? If a stable download link is not desirable on your side (with regard to potential infrastructure abuse), would you consider creating tags again on the SVN repo so distributions can use that as a source for their packages?

Since the download page for filezilla [9] "highly recommends to use the package management system of distributions", I think it's fair to ask for a way to facilitate packaging for such distributions.

I remain available to discuss possible solutions or to provide any additional information that may be needed!

[1] https://archlinux.org/packages/extra/x86_64/filezilla/
[2] https://archlinux.org/packages/extra/x86_64/libfilezilla/
[3] https://trac.filezilla-project.org/ticket/13159#no4
[4] https://svn.filezilla-project.org/filezilla/FileZilla3/tags/?sortby=date
[5] https://svn.filezilla-project.org/filezilla/libfilezilla/tags/?sortby=date#dirlist
[6] https://build.opensuse.org/projects/openSUSE:Factory/packages/filezilla/files/filezilla.spec#L=29
[7] https://gitlab.alpinelinux.org/alpine/aports/-/blob/master/community/filezilla/APKBUILD#L24
[8] https://gitweb.gentoo.org/repo/gentoo.git/tree/net-ftp/filezilla/filezilla-3.68.1.ebuild#n15
[9] https://filezilla-project.org/download.php?type=client

Change History (15)

comment:1 by Robin Candau, 4 months ago

Description: modified (diff)

comment:2 by Tim Kosse, 4 months ago

Status: new → moreinfo

the website download link cannot be used to fetch sources in our packaging workflow anymore

Please clarify: Is this about the workflow of a) creating a new package, or version of it, or b) working with already created packages?

In the second case, distributions can simply place the sources right next to the built packages.

As such, distributions have started mirroring filezilla / libfilezilla tarballs on their side

Don't distributions in general already have to preserve a copy of the source tarballs for every single package?

On the technical side, not keeping a copy of the sources prevents rebuilding a package if upstream is unavailable. Thus the continuous availability of each individual download site for the hundreds if not thousands of critical packages turns each of them into a single point of failure. To illustrate with a hypothetical example: Imagine an unavoidable ABI change in glibc [1] forcing a rebuild of just about everything. In this situation, how would you deal with, for example, x.org [2] no longer being available? Drop every package depending on X11?

Then there are also the legal aspects of software distributed under the GPL and similar licenses: How can one distribute packages of GPL software without being able to make the sources available, should upstream become unavailable?

[1] time_t on 32bit systems comes to mind.
[2] Or upstream OpenSSL, GTK, Qt, libpng, zlib, .... Endless possibilities

Last edited 4 months ago by Tim Kosse (previous) (diff)

comment:3 by Robin Candau, 4 months ago

Status: moreinfo → new

I feel like I wasn't clear enough about the actual issue, sorry about that. Let me try to re-phrase it:

We do indeed preserve a copy of source tarballs for our packages (at least for GPL and similar licenses, due to the legal aspects). But before being preserved, those source tarballs have to be downloaded / fetched, obviously.

For numerous distributions, this is handled as part of their packaging workflow. Indeed, their build scripts fetch sources directly from the upstream repo / tarballs (and then preserve them next to the built package when releasing said package to their package repositories).

However, the unstable part intentionally added to filezilla's download URL (https://dl3.cdn.filezilla-project.org/client/FileZilla_3.68.1_x86_64-linux-gnu.tar.xz?h=pdb4gbqtw08yYaxFABrHYg&x=1732651086) now makes it impossible for distributions to fetch sources from it in their build scripts (as the latter part is unstable and unpredictable).
As such, fetching the filezilla sources has now become an extra manual step that has to be performed beforehand, outside of distributions' usual packaging workflow (where this part is normally handled automatically).
For instance, as you can see from the links I provided in my ticket, openSUSE, Alpine & Gentoo (among others) had to stop fetching the source tarball from filezilla's website in their build scripts (due to the above) and now rely on a local copy of the sources that they have to manually download and upload on their side beforehand.

To sum up, the issue is not the mirroring / preserving of sources, but the initial downloading of said sources, which has become an extra manual task for distributions' package maintainers (due to the download link now being unstable, making it unusable in build scripts).
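
To illustrate with a rough sketch (the URL and checksum below are placeholders, not actual values from our recipes), the automated "fetch sources" step in a build recipe essentially boils down to downloading from a fixed URL declared in the recipe and verifying the result against a pinned checksum:

    # Minimal sketch of the automated source-fetching step in a build recipe.
    # SOURCE_URL and EXPECTED_SHA256 are placeholders for illustration only.
    import hashlib
    import urllib.request

    SOURCE_URL = "https://example.org/FileZilla_3.68.1_src.tar.xz"  # must be fixed / predictable
    EXPECTED_SHA256 = "0" * 64                                      # pinned in the recipe

    def fetch_and_verify(url: str, expected_sha256: str, dest: str) -> None:
        # Download the tarball from the URL declared in the recipe...
        urllib.request.urlretrieve(url, dest)
        # ...and verify it against the checksum pinned in the recipe.
        with open(dest, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest != expected_sha256:
            raise RuntimeError(f"checksum mismatch: got {digest}")

    if __name__ == "__main__":
        fetch_and_verify(SOURCE_URL, EXPECTED_SHA256, "filezilla-src.tar.xz")

With the tokenized link above, there is simply no fixed value that can be written into the recipe's source URL, so this step can no longer run unattended.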

I hope this makes things clearer, sorry if my initial wording was confusing.

Last edited 4 months ago by Robin Candau (previous) (diff)

comment:4 by Robin Candau, 4 months ago

I will probably start manually downloading filezilla sources and uploading them somewhere on our side for our filezilla packages to fetch from, like other distributions have already done.

I still hope we can discuss this further and hopefully find a solution / compromise that suits everyone though.
Again, as FileZilla highly recommends that users use their distribution's package management system, it feels fair to facilitate distributions' packaging by not requiring such an additional manual step from packagers.

Thanks for your consideration.
I remain available :)

comment:5 by Andreas Rönnquist, 4 months ago

As the maintainer of filezilla in Debian, I completely agree with Robin here. We have a mechanism in Debian that can automatically scan repositories for new upstream versions and simplify the work for a maintainer when packaging a new upstream version. As Robin mentioned, this has been working just fine, but it of course cannot be used any longer.
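
Conceptually (this is just a rough illustration, not Debian's actual tooling, and the page URL below is a placeholder), such a check only needs a stable, scannable location listing the released versions:

    # Rough sketch of an automated "new upstream version" check.
    # DOWNLOAD_PAGE is a placeholder; the real check points at a stable upstream page.
    import re
    import urllib.request

    DOWNLOAD_PAGE = "https://example.org/downloads.html"
    PACKAGED_VERSION = "3.68.1"

    def latest_upstream_version():
        html = urllib.request.urlopen(DOWNLOAD_PAGE).read().decode("utf-8", "replace")
        versions = re.findall(r"FileZilla_(\d+(?:\.\d+)*)_src\.tar\.\w+", html)
        if not versions:
            return None
        # Pick the highest version by comparing numeric components.
        return max(versions, key=lambda v: tuple(map(int, v.split("."))))

    if __name__ == "__main__":
        latest = latest_upstream_version()
        if latest and latest != PACKAGED_VERSION:
            print(f"new upstream version available: {latest}")

The same stability requirement applies to the actual download step afterwards.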

I too would very much appreciate it if a solution to this could be found.

comment:6 by Kai Pastor, 11 days ago

This issue prevents creating a vcpkg port of libfilezilla.

vcpkg provides portable, CI-tested build recipes, but no binary packages for download. So there is no legal requirement to store and provide source tarballs.

However, vcpkg does use source asset caching.

The port could use svn - but that wouldn't reduce the load on filezilla's resources. In fact, it would prevent asset caching, so the load would increase.

comment:7 by Tim Kosse, 11 days ago

vcpkg provides portable, CI-tested build recipes,

Easy. Just place the source tarball right next to the recipe.

comment:8 by Robin Candau, 11 days ago

vcpkg provides portable, CI-tested build recipes,

Easy. Just place the source tarball right next to the recipe.

Correct me if I'm wrong, but the point here for vcpkg is actually to not provide source tarballs with the recipe (as in, users download them locally as part of the recipe execution process), as is the case for the AUR for instance (where placing source tarballs next to the recipe is explicitly forbidden, to avoid the legal intricacies involved in redistributing source code or binaries).

In any case, may I ask if there's any plan to address this or not?
Multiple downstream redistributors are affected by this issue, such as Gentoo, openSUSE, Alpine (as illustrated in my initial ticket), Debian, vcpkg and Arch Linux (represented respectively by Andreas, Kai and myself in this issue), yet we haven't gotten any clear stance from upstream so far.

Last edited 11 days ago by Robin Candau (previous) (diff)

comment:9 by Robin Candau, 11 days ago

Easy. Just place the source tarball right next to the recipe.

Just to make things clear, what you describe as "easy" is actually the whole problem this ticket is about.
Having to manually download and store source tarballs on our side for our respective filezilla package recipes to fetch from is not easy (and breaks our automation).

comment:10 by Kai Pastor, 11 days ago (in reply to comment:7)

Robin is 100% on point.

comment:11 by Tim Kosse, 10 days ago

I believe there are three separate aspects being conflated here:

  1. Maintainers of package managers wanting to be informed about new software versions
  2. Maintainers of package managers wanting to obtain the new versions
  3. Users of the package managers downloading the software

Let's start with the last, as this is where the contention actually stems from. There are two types of package managers: Those that host the files themselves, and those that only host metadata.

The first type is completely unproblematic, good job. The problem is with the second type, which always fetches files from upstream. They are offering their services without actually wanting to bear the costs of providing such service in terms of traffic, server hosting and maintenance costs. In short: 'Leeching'.

vcpkg, alongside winget, another of Microsoft's package managers, are perfect examples of this. Utilizing upstream resources to provide a service, without contributing to the costs. "Asset caching" is pointless if every user has their own separate cache.

Public package managers are just the tip of the iceberg though; there are numerous proprietary systems engaging in the very same behavior.

My stance on this is clear: If you provide a service to distribute and/or deploy software, you host the software. As a result, for the time being I won't provide stable download links that allow leeching.

As for the other two aspects, there are a couple of options:

  • A static file on the website stating the latest available versions
  • email notifications on new releases
  • I could have the source tarballs uploaded elsewhere if someone would grant me access to an SFTP server to dump the files to

Which would you prefer?
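
To make the first option concrete, a plain-text file at a fixed URL could be consumed like this (the path and format below are made up for illustration; nothing like this exists today, and the details would be up for discussion):

    # Hypothetical consumer of a static "latest versions" file.
    # VERSIONS_FILE is a made-up URL; the actual location and format would need to be agreed on.
    import urllib.request

    VERSIONS_FILE = "https://filezilla-project.org/versions.txt"  # hypothetical

    def latest_versions():
        # Assumed format: one "product version" pair per line, e.g.
        #   filezilla 3.68.1
        #   libfilezilla 0.49.0
        text = urllib.request.urlopen(VERSIONS_FILE).read().decode("utf-8")
        return dict(line.split() for line in text.splitlines() if line.strip())

    if __name__ == "__main__":
        print(latest_versions())

This would cover being informed about new versions; obtaining the tarballs themselves would still need one of the other options.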

comment:12 by Robin Candau, 10 days ago

I believe there are three separate aspects being conflated here:

I believe the actual problem still isn't clear... Let me try to clarify once again.

  1. Maintainers of package managers wanting to be informed about new software versions

As far as I can tell, this part was never discussed or raised as a problem in this ticket. To be clear, this is not what this ticket is about.

  2. Maintainers of package managers wanting to obtain the new versions
  3. Users of the package managers downloading the software

This is the issue this ticket is about (initially point 2, but also point 3 by extension).

Let's start with the last, as this is where the contention actually stems from. There are two types of package managers: Those that host the files themselves, and those that only host metadata.

The first type is completely unproblematic, good job. The problem is with the second type, which always fetches files from upstream.

Both types of package managers actually involve fetching files from upstream. In fact, they work exactly the same way, technically speaking. The only difference between the first and the second type in that regard is that, in the first type, a package maintainer does the fetching once beforehand (and also compiles the software) so users don't have to do it themselves (as opposed to the second type, where users do all of this themselves locally).

Fetching files from upstream, whether it is done by package maintainers in type 1 or by users in type 2, is automated as part of the process of packaging a new release. Our packaging tooling automatically fetches files from upstream according to the source URL declared in our build recipes.

Since the URL to download filezilla & libfilezilla sources became "unstable", it is of course no longer possible for our tooling to download them (as the random part of the download URL cannot be "fixed" / predicted, obviously).
Therefore, the downloading of upstream sources, which is required in any case to initially get them on our side, has now become an additional manual step. This breaks our automated packaging workflow for filezilla-related packages, making them far more painful to maintain.

They are offering their services without actually wanting to bear the costs of providing such service in terms of traffic, server hosting and maintenance costs. In short: 'Leeching'.

vcpkg, alongside winget, another of Microsoft's package managers, are perfect examples of this. Utilizing upstream resources to provide a service, without contributing to the costs. "Asset caching" is pointless if every user has their own separate cache.

Public package managers are just the tip of the iceberg though; there are numerous proprietary systems engaging in the very same behavior.

I understand your concerns about type 2 package managers in that regard. I just want to point out that the "protection" you implemented to restrain type 2 in that context has collateral damage for type 1 as well, as type 1 package managers also need to fetch files from upstream for each new version in order to store and build them (which used to be automated as part of our packaging workflow before the download URL became "unstable").

My stance on this is clear: If you provide a service to distribute and/or deploy software, you host the software.

Sure, but the software sources still need to be downloaded before they can be hosted... Tarballs for new releases do not magically appear on our side... We have to download them first, which is now more difficult than it used to be. This is what this ticket is about. The initial download of upstream sources used to be automated on our side and has now become an extra manual step for us ("us" being the package maintainers of the filezilla-related packages at Debian, Alpine, Gentoo, Arch Linux, openSUSE... which, apart from Gentoo maybe, all use type 1 package managers).

As for the other two aspects, there are a couple of options:

  • A static file on the website stating the latest available versions
  • email notifications on new releases
  • I could have the source tarballs uploaded elsewhere if someone would grant me access to an SFTP server to dump the files to

Which would you prefer?

I doubt that having downstream distributors grant you access to their infrastructure so you can upload tarballs there is a desirable solution.

If you really want to maintain and use your own infrastructure to host and expose your software, yet don't want people to rely on it to fetch the sources, an actual solution could be to mirror the sources / source tarballs to a public service that doesn't rely on your infrastructure (e.g. GitHub / GitLab or something), for people to point to instead?

As a result, for the time being I won't provide stable download links that allow leeching.

Now that you know this also creates issues for distributions offering packages through type 1 package managers (since they actually work exactly the same way as type 2, technically speaking), is there any chance your stance will change?

While I totally understand your concerns regarding your infrastructure, would you consider providing a mirror for source tarballs that does not rely on your infrastructure (or relies on a different, less critical one maybe)? As far as I can tell, that could be a fair compromise that would satisfy all parties?

Last edited 10 days ago by Robin Candau (previous) (diff)

comment:13 by Robin Candau, 10 days ago

Basically, I'm just trying to highlight the fact that making sources more difficult to download makes them more difficult to download for everyone. Not only for "bots" or users of "type 2" package managers, but also for package maintainers packaging and redistributing FileZilla via "type 1" package managers.

While what you implemented *might* mitigate the concerns you have regarding your infrastructure, it also impacts downstream FileZilla redistributors by making maintenance of the FileZilla packages far more painful than it used to be.

My goals with this ticket are:

1 - Make sure that you're aware of the side effect described above for us (FileZilla package maintainers).
2 - Ask whether you're willing to either consider a rollback or provide an alternative way for us to fetch sources from a stable endpoint.

As I said in the ticket description, since the download page on the FileZilla website [1] "highly recommends to use the package management system of distributions" to install it, I think it's fair to raise the implications this has for the FileZilla packaging process of those distributions.

[1] https://filezilla-project.org/download.php?type=client

comment:14 by Robin Candau, 10 days ago

My stance on this is clear: If you provide a service to distribute and/or deploy software, you host the software.

I'm really sorry to insist and repeat myself, but I just want to clarify the above, as I feel this is where the misunderstanding comes from:
The software has to be downloaded before it can be hosted... When a new FileZilla release is made and a new source tarball is created for it, we still have to download it initially in order to be able to host and redistribute it, right?

For downstream distributions, this process of fetching new upstream release tarballs is automated as part of their packaging workflow / tooling / scripts. In case this isn't obvious, we are not manually downloading and uploading each tarball of each new release for the thousands of packages we maintain in our repositories.

Well, FileZilla is now one of the pieces of software for which we unfortunately have to do all of this manually... Indeed, due to the now "unstable" nature of the download URL for the source tarball, this process can't be automated anymore, as it's impossible for automated scripts / tooling to "predict" the random part of the link (which is the whole point, right...).

That means distributions now have to manually download FileZilla source tarballs and manually upload them on their side in order to host and redistribute them. And that is true for each FileZilla-related package they offer through their repositories (meaning at least filezilla and libfilezilla), multiplied by the number of versions of those packages they offer across their different repositories (for instance, for those who simultaneously maintain multiple releases of their distribution).

I'm aware this might not sound as complex as it actually is but, in the context of managing a Linux distribution (meaning with numerous individuals involved across different roles, each with different processes, levels of access and permissions over the various infrastructure components and package repositories), it actually is quite cumbersome...

I just hope we can discuss this and hopefully find a compromise that suits all parties.

Last edited 10 days ago by Robin Candau (previous) (diff)

comment:15 by Andreas Rönnquist, 3 days ago

I fully agree with what Robin says in this issue, and with this message I would just like to add that what he mentioned at the start of the thread about creating tags again in the SVN repository might at least be a good solution that, I guess, would solve the problem for most downstreams (if not all).

(I would of course love to see a migration of the project to git, but that is not the topic of this ticket).
