[ Friday, 6 March 2009, wiktorw ]
On distribution of Linux programs
In response to a painful article (in Polish, sorry), I’d like to touch on the topic of a handy and easy way to install programs under Linux. The first and most serious accusation against the status quo is that a binary package cannot be installed on just any Linux distribution, which would make it possible, for example, to send a file to a friend for a similar deployment, or to cleanly uninstall a program together with all its dependencies. The other accusation states that “Linux has become a developers’ system, and not a system for the people (the masses?)”.
Let me begin with the more emotional part, that is the second accusation.
Money is the problem (or the lack of it)
In the world of Free Software, the basic assumption is that freedom must not be compromised. To really understand this, you have to read the GNU GPL, or at least the short summary of the four freedoms given by the creator of GNU, Richard Matthew Stallman.
In my view, this license first of all protects the users’ rights, so that the source code, and the software itself, always remains available. But it also protects the rights of the programs’ creators, so that no Bad Company (TM) can use their work without giving back its changes. One thing matters most here: the free availability of the source code.
Not that people using ready-compiled programs would care anyway, because for them usefulness is the only criterion: whether the program works correctly and does what they need. Well, but once you have entered the world of free and open software, you have to speak its language. Or do you? There is still MacOS and Windows: I will go there with my programs, and I’m going to get more satisfaction. OK, leave now, don’t read any further and don’t comment; this place deserves some peace.
The world of Open Source stands on good will. A programmer creates something and gives out the source code, no more and no less. Sometimes, especially if the project is growing and has more than one developer, binary packages come out. And this is where the complaint about bad developers comes in: they supposedly write their programs poorly, making them dependent on particular distributions, and prepare their packages poorly, because the software doesn’t install smoothly everywhere and sometimes doesn’t even work.
Well, that’s life, I’d say. It is not my job, nor my adversary’s, to evangelize dozens upon dozens of developers in various projects and to put the new and only truth into their heads: please do care about other users’ convenience. This won’t work, and even if it could, it wouldn’t be a fast process. Software comes to life thanks to all kinds of impulses; many times it is egoism (sic!). For example, if no software exists that can do the task I want, and I can write it myself, I start writing. Sometimes programs are created for fun, just as in Linus Torvalds’ book “Just for Fun”. Sometimes such a program is released to the world under some open license: maybe somebody else can make use of it, or help develop it?
If this causes my adversary to say that “Linux has become a developers’, not a people’s, system because of the software developers”, then this only shows that more people are actually getting interested in it. It also shows an increase in usability, up to the point where “people” start being interested. Funny, isn’t it: taken literally, the accusation means that developers aren’t human, but I don’t suppose my adversary actually meant this.
The Code Reuse Problem
Every system has its pains, summed up by some widely known buzzword. In Windows there is “DLL Hell”, in RPM-based distributions I’ve heard of “RPM Hell”, and Linux-based systems commonly suffer from “Dependency Hell”. It all boils down, however, to one question: should everybody write everything from scratch (beware, another buzzword coming: “Not Invented Here Syndrome”), or should ready-made solutions be used? And if so, then how?
Let me now describe the few solutions to this problem that I know of, on Windows and Linux. This is my background, after all: I started programming back in the days of DOS and Windows 3.1, I lasted until about Windows XP, and now I prefer Linux.
Windows delivers many (but not too many, it must be said) libraries made by Microsoft. On the other hand, many programs use their own private libraries, e.g. as compiled object files or private DLLs. This is where the ‘smooth’ installation of programs under Windows comes from, along with the handy default installation path and the (at least theoretically) easy uninstallation. Let me rephrase that: this platform mostly uses compiled-in libraries or embedded source code; this is how most programming environments do it there. Using dynamically linked libraries (DLLs) is The Advanced Topic: one must know how to do it and be able to distribute them along with the program. One must also use a good installer suite (buying e.g. InstallShield), be able to decide whether the DLLs should go into C:\WINDOWS or into the program’s directory, and so on.
Under Windows, programs refer to linked libraries by name (e.g. “USER32” or “COMCTL32”) together with the symbols (function names) they use, and the system searches for the DLLs in only a few places. These include: the application or module directory, the current directory, the Windows system directory, and the directories listed in the PATH variable (recommended reading: the LoadLibrary and LoadLibraryEx functions). If the system loader cannot find a library when starting the program, the program does not start.
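The search described above can be pictured as a simple ordered lookup. The following is only a minimal sketch in Python, not the way the Windows loader is implemented: the function name find_library, the directory layout and the file names are all hypothetical, and the real search order has more steps and depends on system settings.

```python
import os
import tempfile

def find_library(name, search_dirs):
    """Return the first path where `name` exists, trying directories in order."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None  # nothing found: a real loader would refuse to start the program

# Hypothetical layout: an application directory and a system directory.
root = tempfile.mkdtemp()
app_dir = os.path.join(root, "app")
system_dir = os.path.join(root, "system")
os.makedirs(app_dir)
os.makedirs(system_dir)

# The same (hypothetical) library name exists in both places.
for d in (app_dir, system_dir):
    open(os.path.join(d, "example.dll"), "w").close()
open(os.path.join(system_dir, "other.dll"), "w").close()

order = [app_dir, system_dir]  # application directory is tried first
private_hit = find_library("example.dll", order)
system_hit = find_library("other.dll", order)
missing = find_library("missing.dll", order)
```

Because the application directory comes first in the list, the private copy of example.dll shadows the system-wide one, which is exactly why private DLLs make installation ‘smooth’ and updating hard.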
Linux/POSIX/UNIX systems follow a different philosophy: the system assumes a small number of shared directories for executables and binary library files. That’s why the package manager is so important; e.g. dpkg or rpm let you check exactly, per package, which files it installed and where, and which package owns a given file.
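The two questions a package manager answers here can be modelled with a toy database. This is a sketch for illustration only: the package and file names are hypothetical, and real package managers keep far richer metadata. The comments name the dpkg and rpm queries that answer the same questions on a real system.

```python
# Toy package database: package name -> files it installed (all names hypothetical).
installed = {
    "editor": ["/usr/bin/editor", "/usr/lib/libeditor.so.1"],
    "player": ["/usr/bin/player"],
}

def files_of(package):
    """List the files a package installed (what `dpkg -L` or `rpm -ql` reports)."""
    return installed.get(package, [])

def owner_of(path):
    """Find which package owns a file (what `dpkg -S` or `rpm -qf` reports)."""
    for package, files in installed.items():
        if path in files:
            return package
    return None  # the file was not installed by any known package

owner = owner_of("/usr/lib/libeditor.so.1")
orphan = owner_of("/opt/unknown/file")
```

A file for which owner_of returns nothing is precisely the kind of file that a clean uninstallation cannot account for.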
Shared libraries: the lesser of two evils
There are a few implications of using shared libraries, shown here ordered by importance:
- Reusing the same code in multiple programs
- Disk space savings
- Memory usage and application startup time savings
- Easy and effective way of updating libraries
The first reason is quite obvious. The second one, given modern multi-gigabyte disks, is of almost no importance, especially in desktop use.
The third reason gives some food for thought, and it is quite important. If every application loaded its own copies of its dependencies into memory, except perhaps for the system libraries, memory usage would grow with every program started. Additionally, every extra disk read during an application’s cold start is sure to slow the process down.
The last reason is the most important. No program is free of bugs, and those written in C/C++ are especially prone to typical memory management errors. Garbage-collected languages fare better in this respect, but a bug can be committed in any language, for example one resulting from an [over-]optimistic approach to implementing an algorithm. In any case, no modern, self-respecting system platform can overlook the issue of upgrades. Whether it is a new version or a security patch, every application and every library needs to be updated at some point.
Having a single copy of a library in a given major version makes it easier to switch over to a new minor version with upgraded functionality. Installing a new major version of the library should not pose a problem at all, because in most cases it means a new file name, so the two can coexist. Linux-based systems are simpler in this area, because the libraries have versioned names, e.g. libc-2.6.1.so, pointed to by the symbolic link libc.so.6. So it is enough for a program to request a library in a given major version (libc.so.6); the specific version actually used depends only on the decisions of the system’s developers and on how up to date the system is.
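The symbolic link scheme can be tried out safely in a temporary directory. This is a minimal sketch assuming a POSIX system where symbolic links can be created; the helper names install_minor and resolve are hypothetical, the files are empty stand-ins rather than real libraries, and the second version number is chosen only for illustration.

```python
import os
import tempfile

lib_dir = tempfile.mkdtemp()  # stand-in for a system library directory

def install_minor(version):
    """Create a versioned library file and repoint the major-version link at it."""
    real_name = f"libc-{version}.so"
    open(os.path.join(lib_dir, real_name), "w").close()
    link = os.path.join(lib_dir, "libc.so.6")
    if os.path.islink(link):
        os.remove(link)
    os.symlink(real_name, link)

def resolve(link_name):
    """Return the name of the file that the link currently points to."""
    return os.path.basename(os.path.realpath(os.path.join(lib_dir, link_name)))

install_minor("2.6.1")
before = resolve("libc.so.6")   # what a program asking for libc.so.6 gets now
install_minor("2.7")            # a minor upgrade: only the link target changes
after = resolve("libc.so.6")
```

The program keeps asking for libc.so.6 throughout; which file it actually receives is decided entirely by where the link points after the upgrade.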
It is worth noting that binary compatibility is mostly an issue for binary packages, which contain programs/libraries compiled into the processor’s native code and using shared libraries. It can be an issue because it relies on the system linker and on the rules for matching libraries and the symbols they contain. Programs written in e.g. interpreted languages don’t experience this problem nearly as much. In most of them, the library API is at most extended in newer versions, rather than fundamentally changed.
On a side note, from my own experience I’d like to add that programs written in Java and deployed as .class or .jar files assume the worst: every program is a separate application that is not supposed to share libraries and carries along everything it needs. It is quite a justified assumption, because each virtual machine instance runs only one Java program at a time. The exception is when shared libraries are pointed to in the CLASSPATH variable, or when using Gentoo, where the Java team works miracles to get the situation civilized (that is, to manage it like a normal package).
Now comes another question: how do you update programs that use private libraries with known security problems? And what if the problem is known to affect the particular version compiled into a given program? How can one manage to update programs like that?
The answer is obvious: one must be able to find all private copies of the libraries and update them (and hope that the programs using them will still work), or find out which program is affected by the security bug and install a newer version of it (provided one exists). One can also delete the program from the system altogether, although if it is needed, that is no option. So anyone interested only in the ‘whether it works’ criterion will update neither the programs nor the private libraries installed along with them. And when the system gets infected by spyware or owned (controlled by somebody or something else), he will reinstall it afresh, with all the programs he needs. (Note: irony mode off here, though I used it in disguise.)
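Finding the private copies is the part that can at least be automated. Here is a minimal sketch in Python, assuming the copies are separate files with recognizable names; the function name find_private_copies, the directory layout and the library names are hypothetical.

```python
import fnmatch
import os
import tempfile

def find_private_copies(root, pattern):
    """Walk `root` and return sorted paths of files whose name matches `pattern`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if fnmatch.fnmatch(filename, pattern):
                hits.append(os.path.join(dirpath, filename))
    return sorted(hits)

# Hypothetical layout: two programs, each bundling its own copy of a library.
root = tempfile.mkdtemp()
for program in ("editor", "player"):
    os.makedirs(os.path.join(root, program, "lib"))
open(os.path.join(root, "editor", "lib", "libexample.so.1"), "w").close()
open(os.path.join(root, "player", "lib", "libexample.so.1"), "w").close()
open(os.path.join(root, "player", "lib", "libother.so.2"), "w").close()

copies = find_private_copies(root, "libexample.so*")
```

A search like this only finds copies that exist as separate files. A library compiled directly into an executable leaves no file name to match, so in that case only a newer build of the program itself helps.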
Source code level compatibility
Let’s come back to Linux-based systems, just to remind you that I have already written an article about binary compatibility. I only want to recall the most important principle: in the free software world, source code level compatibility is what matters most, because this is the level programmers work at. If somebody wants to release a package for a given distro, then, well, sometimes the developers help out, and sometimes the distro creators must do it themselves. The bigger the project, the better it copes with this task, preparing packages itself (in numerous formats) or cooperating with a few distros. The rest of the world gets binaries, e.g. as tar.gz, or the source code as a last resort. Most projects have less manpower, though, and this, I think, is what irritated my adversary.
One should not exaggerate, though: the binary files of a package (provided it doesn’t touch system configuration) can be archived together, copied from one system to another, and still run. Provided, that is, that the two systems are compatible at the level of native processor code and dependencies. For example, the binary OpenOffice downloaded from the official project page always worked for me on Gentoo and on Debian-based systems, after selectively unpacking the suite and integrating it with the system by hand (a task the package manager should do, but such are the joys of “version ricing”). By the way, once packages in the native distro format appeared I preferred them, because e.g. the fonts looked better. That, too, is a problem typical of programs using compiled-in or private copies of the freetype or pango libraries.
Compatibility, portability and the fulfillment of the above dependencies are ensured at the compilation stage. If the source code uses autotools, for example, then to describe the project and its dependencies it is enough to create configure.in and Makefile.in. Then, before compiling, one runs ./configure, which generates the needed Makefile, and builds with make. One must remember, however, that it is the configure stage that gives the most flexibility, by accepting various parameters. For example, the http://www.sunfreeware.com site, which provides binaries for Solaris, shows how to install programs and libraries relatively easily, regardless of the versions available in the system. Most of the executables from its packages are placed in /usr/local/bin, and some have their own directories, like /usr/local/apache2. All that thanks to ./configure options like --prefix (!)
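The effect of the prefix can be sketched as follows. This is only an illustration in Python: the helper names build_commands and installed_path are hypothetical, the final "make install" step is the customary one and is assumed here rather than taken from the text above, and the executable name httpd is just an example.

```python
import posixpath

def build_commands(prefix):
    """Return the customary autotools command sequence for a given install prefix."""
    return [
        ["./configure", f"--prefix={prefix}"],  # choose where everything will land
        ["make"],                               # compile
        ["make", "install"],                    # copy files under the prefix (assumed step)
    ]

def installed_path(prefix, subdir, name):
    """Where a file would typically end up under the chosen prefix."""
    return posixpath.join(prefix, subdir, name)

commands = build_commands("/usr/local/apache2")
httpd = installed_path("/usr/local/apache2", "bin", "httpd")
shared = installed_path("/usr/local", "bin", "httpd")
```

With one prefix the program lives in its own directory tree and can be removed by deleting that tree; with the other it shares /usr/local/bin with everything else. The source code is the same in both cases, only the configure parameter differs.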
But who cares about package compilation alternatives at all? Isn’t it the developers who are supposed to hand out, to the people eagerly waiting for their programs, ready, bug-free binary versions, always in the freshest releases? Maybe. Please don’t forget to tell me when that moment comes, OK?
For now, the programmers do their own job, driven by various impulses (they have every right to them); they are busy doing other things too, and sometimes have a family or even a day job. The only support they need for their FLOSS projects is autotools: so far good enough for the GNU project, Linux and others, and still maintained and developed over the last 17 years.
I could agree that the way software is built and compiled could be changed. It could be adjusted to The Modern Way (TM) and modern needs, so that by executing a single command one could generate ready binary packages in various formats, for many different architectures, with dependencies specified as loosely as possible, so that the user could install everything and run it with no major problems.
At the end of this article, I’d only like to point out that such a handy distribution model has one serious issue. One must control the source a package is downloaded from, exactly as under Windows. The popularity of “full” packages, combined with a lack of good will, makes it possible to create malicious builds infected with malware. Getting programs through the distro pipeline has one basic upside: it is a natural sieve, keeping suspicious software away from users’ computers. Official repositories with trusted sources are the basis for the stability of many servers and for the peace of mind of their admins.
Maybe, nevertheless, it is worth writing to the developers and distributors, going to conferences and talking with them face to face about a better way to distribute software, one that is handier for programmers as well as for ‘the people’. Let this optimistic conclusion be the punchline for today, and a greeting for those who read this article ’till The End.
This article was originally written by Wiktor Wandachowicz
Translated-by : el es