Patrick Koetter, Ralf Hildebrandt, The Book of Postfix

Postfix is fast becoming a popular alternative to sendmail. Although it can be complex to configure, it’s easier to use Postfix with additional filtering applications, such as spam and virus filters, than with some other mail transfer agents. I spoke to Patrick Koetter and Ralf Hildebrandt about The Book of Postfix, the complexities of configuring Postfix, spam, and email security.

How does Postfix compare to sendmail and qmail?

Ralf Hildebrandt (RH): Unlike sendmail, Postfix was built with security in mind.

Unlike qmail, Postfix was built with real-life systems in mind, systems that have to adapt to the hardships of today’s Internet. qmail is effectively unmaintained.

Patrick Koetter (PK): That’s a tough question, because I am not one of those postmasters who have spent half their life working with Eric Allman’s Sendmail, nor did I spend much time enlarging my knowledge of qmail, so I can’t give you a detailed answer that really tackles specific features or functionality.

Let me give it a different spin and see if this answers it:

When I set out to run my first mail server of my own, I looked at Sendmail, qmail and Postfix.

Sendmail, to me, was too complicated to configure: my knowledge of the M4 macro language was small, but my fear of losing e-mail, or even of configuring my server as an open relay, was large, so I dropped it. The ongoing stream of CERT advisories about this or that Sendmail exploit didn’t make it a hard choice.

Then I took a look at qmail, but wasn’t really sure I wanted it, because it is more or less a series of patches if you want today’s range of features. But I gave it a try anyway, and ended up asking some questions on the mailing list because the documentation didn’t answer what I was looking for.

To cut it short: I was under the impression you had to enter the “Church of qmail” before anyone would take the time to answer a qmail novice’s questions. It might have changed since then, but back then I left and never looked back, because all I wanted was to run an MTA.

Finally I took a look at Postfix and was very surprised by the amount of documentation that was available. I also immediately fell in love with the configuration syntax, which seemed so simple and clear to me. For a while I thought this must be a very feature-limited MTA, but the more I read, the more I understood that it did almost the same things, but was simply easier to configure.

I finally decided to stick with Postfix after I had joined the Postfix mailing list and found that people really cared about my questions, pointed me to documentation to read again, or gave me advice on how to do this or that more efficiently.

Of course, as the Postfix community grew larger, the odd character turned up who would rather lecture someone seeking help, but the overall impression remains the same.

Postfix is well maintained, its security record is unbeaten so far, and the community is what I wish every community supporting a piece of software could be. The modular architecture Wietse Venema has chosen makes it easy to expand Postfix’s capabilities. It’s a system that can grow very well. I haven’t seen another piece of software that does the complex job of being an MTA that well.

Postfix seems a little complex to install - there are quite a few configuration files, some of which seem to contain arcane magic to get things working. Is this a downside to the application?

PK: That’s the provoking question, isn’t it? ;)

To me, Postfix is as simple or as complex as the process of mail transport itself. That’s why we added so many theory chapters to the book that explain the e-mail handling process before we set out to explain, in the follow-up chapter, how Postfix does it. If you understand the process, it’s pretty straightforward to configure Postfix to deal with it.

But basically all you need is three files: main.cf, master.cf and the aliases file. Wait! You could even remove the main.cf file and Postfix would still work, with reasonable defaults, on that specific server.

The main.cf file carries all parameters that are applied globally. If you need options that are specific to a particular daemon and should override the global options from main.cf, you add them in master.cf in the context of that daemon. That’s the basic idea of configuring Postfix.
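As a rough illustration of that split (a minimal sketch, not taken from the book; the content_filter destination is a hypothetical local filter):

    # main.cf - global parameters
    myhostname = mail.example.com
    mydestination = example.com, localhost

    # master.cf - the same smtpd service, with one per-daemon override
    # (here: hand incoming mail to a content filter listening on port 10024)
    smtp      inet  n       -       n       -       -       smtpd
      -o content_filter=smtp:[127.0.0.1]:10024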

Then there are a lot of tables in the /etc/postfix directory, which you usually don’t need unless you set out to configure a specific feature that isn’t part of the basic functionality.

Sure, the number of tables might frighten a novice, but they are there for the sole purpose of supporting novices, and even advanced users, because each holds documentation about what that table is for and how you would add entries to it if you wanted to use it.

The rest is complexity added by additional software, for example Cyrus SASL, which is a royal pain for beginners.

Of course your mileage will vary when you set out to configure a full-blown MTA that incorporates anti-spam measures, anti-virus checking, SMTP authentication and Transport Layer Security, and where Postfix looks up recipient names and other information from an LDAP server that also drives an IMAP server.

But when you begin it boils down to the two configuration files and an aliases file.

As for the “arcane magic”, I don’t know exactly what you’re referring to, so I’ll speculate based on my own experience.

I struggled with smtpd_*_restrictions for quite a while until I realized: “It’s the mail transport process that makes it so hard to understand.” Once you’ve understood how an SMTP dialogue should be processed, it suddenly seems very simple. At least, that is what happened to me. I recall sitting for hours in front of these restrictions, Ralf tearing his hair out and looking at me as if I were from another planet.

The quote we used in the restrictions chapter alludes to that day, and it also contains the answer I came up with: “To know what to restrict you need to know what what is.” I looked the “what” parts up in the RFCs, understood what smtpd_*_restrictions were all about, and saved Ralf from going mad ;)

But that’s specific to smtpd_*_restrictions. For all other parameters and options it pays to read the RFCs as well, but you can get very far by reading the excellent documentation Wietse has written _and_ by looking at the names he chose for the parameters. Most of the time they speak for themselves and tell you what they will do. I think Wietse has done a great job of thinking up catchy, self-explanatory parameter names.
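To make the restrictions discussion concrete, here is a small, generic example of the kind of policy people build in main.cf (a sketch only, not a recommended production policy; the RBL host is purely illustrative):

    smtpd_recipient_restrictions =
        permit_mynetworks,
        permit_sasl_authenticated,
        reject_unauth_destination,
        reject_unknown_sender_domain,
        reject_rbl_client sbl-xbl.spamhaus.org

    # read top to bottom for each recipient in the SMTP dialogue:
    # trusted clients and authenticated users may relay, everyone else
    # may only send to domains we are responsible for, and obvious junk
    # is rejected outright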

RH: Postfix works with the default main.cf and master.cf. If you have advanced requirements, the configuration can get elaborate. But configuration files like the ones I created, and also offer at http://www.stahl.bau.tu-bs.de/~hildeb/postfix/, have evolved over several years of use (and of abuse of the Internet by spammers) - I never thought “that’s the way to do it”; it was rather trial and error.

Postfix seems to work exceptionally well as a mail transport agent - i.e. one that operates as an intermediate relay or relayhost (I’ve just set up a Postfix relay that filters spam and viruses, but ultimately delivers to a sendmail host, for example). Is this because of the flexible external interface Postfix uses?

RH: It also works excellently as a mailbox host :) Over the years, Wietse added features for content filtering and the ability to specify maps that tell the system which recipient addresses should be accepted and sent on further inwards.

That makes it easy to say, “Instead of throwing away our old Product-X server, we simply wedge Postfix in between.”

But there’s no special preference for the “intermediate relay” role. It’s a universal MTA. We use it everywhere: for the server handling the mailboxes, and for our list exploder.

Do you have a preferred deployment platform for postfix?

PK: Basically I go for any platform that suits the needs. As for Linux, I prefer distributions that don’t patch Postfix, but that’s only because I support many people with SMTP AUTH issues on the Postfix mailing list, and some maintainers have taken to doing this or that differently, which makes configuring SMTP AUTH even harder.

Personally I’d go for Red Hat Linux, because I know it best and produce good results faster than on other platforms. But I wouldn’t hesitate a second to go for something else if it suited the scenario better. That’s another side of Postfix I like very much: it runs on many, many systems.

RH: Debian GNU/Linux with Kernel 2.6.x. Patrick begs to differ on the Debian thing. Anyway, it works on any Unixoid OS. I ran it on Solaris and HP-UX back in the old days.

You cover the performance aspects of Postfix. Is it particularly taxing on hardware?

PK: That’s a question that turns up regularly on the Postfix mailing list. Read the archives… ;)

But seriously, you can run Postfix for a single domain on almost any old hardware you have lying around. If your OS works with the hardware, Postfix will probably get along with it as well.

The more domains you add and the more mail you put through, the likelier it is, of course, that you will hit the limits. But those limits usually aren’t imposed by Postfix; they are imposed by the I/O performance of your hardware.

Think of it this way: mail transport is about writing, moving and copying little files in the filesystem of your computer. The MTA receives a mail from a client and writes it to a mail queue, where it waits for further processing. A scheduler determines the next job for the file and the message is moved to another queue. There it might wait another while until it gets picked up again to be delivered to another, maybe remote, destination. If the remote server is unreachable at the moment, it is written back to the filesystem in yet another queue, and so on and so on, until it can finally be removed after successful delivery.
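You can watch this happening on a running system; a quick sketch (paths are the usual defaults, so yours may differ):

    # the queues live as directories under the Postfix spool
    ls /var/spool/postfix
    # ... active  deferred  hold  incoming ...

    # list what is currently waiting in the queue, with queue IDs and reasons
    postqueue -p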

The calculation to decide what to do with the mail doesn’t take a lot of time, but writing, moving and copying the file takes a lot longer. That’s due to the limitations of the hardware. Hard discs nowadays really can store a lot of e-mail, but access speed hasn’t grown at the same rate. Still, you need to stick with them, because storing the message on a temporary device would lose the mail if the system were suddenly turned off.

So the basic rule is to get fast discs, arrays and controllers when you need to handle _a lot_ of email. Regular hardware does it quite well for private users.

Another slowdown you should be prepared for is when you integrate anti-spam and anti-virus measures. They not only read and write the files, they also examine the content, which often requires unpacking attached archives. This will temporarily eat some of your CPU. But that’s something current hardware can deal with as well.

For hard facts you will need to find somebody who is willing to come up with a real-world and well-documented test scenario. So far, various people have posted “measurement data”, but none of them would really explain their setup and how they tested. I also don’t know of a thorough comparison of Sendmail, qmail and Postfix.

Most of the “comparisons” I’ve heard weren’t able to get rid of the odor of “because you wanted it to be better”.

Such tests are not what Postfix is about and, as far as I can say without asking him, not what Wietse Venema is about either. I vividly recall him posting “Stop speculating, start measuring!” to someone who came up with a performance problem. I like that attitude a lot, because comparisons should be about facts, not belief.

I enjoyed the in-depth coverage on using certificate based security for authenticating communication between clients and servers. Do you see this as a vital step in the deployment process?

PK: Vital or not depends on your requirements and your in-house policy. Personally I like certificate-based relaying a lot and I think it should be used more widely, because you could track spam down a lot better and would gain more secure mail transport at the same time, but certificate-based relaying simply lacks the critical mass of servers and clients supporting it.

As long as you don’t have a critical mass of servers and clients using it, there will always be a relay that does without it and can be tricked into relaying spam one way or another, and you lose track of the sender.

It also takes more work to configure, and especially to maintain, certificate-based relaying, because you need to maintain the list of certificates: remove the ones that have expired, add others, hand out new ones, and so on…
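For orientation, the server side of certificate-based relaying in Postfix looks roughly like this (a sketch from memory, so treat the file names and the exact parameter set as approximate; the fingerprint map is what you end up maintaining by hand or with your own tooling):

    # main.cf - offer TLS and ask clients for a certificate
    smtpd_tls_cert_file = /etc/postfix/certs/server.pem
    smtpd_tls_key_file  = /etc/postfix/certs/server-key.pem
    smtpd_tls_CAfile    = /etc/postfix/certs/ca.pem
    smtpd_tls_ask_ccert = yes

    # map of client certificate fingerprints that are allowed to relay
    relay_clientcerts = hash:/etc/postfix/relay_clientcerts

    smtpd_recipient_restrictions =
        permit_mynetworks,
        permit_tls_clientcerts,
        reject_unauth_destination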

I think it’s a “good thing to do [TM]” if you use it in your company, have many mobile users, but most of all (!) have all clients and servers under your control. Then you can automate some of the work that needs to be done, and all of that together can pay for the security and simplicity you gain on your network.

But I doubt any private user would be willing to pay the additional maintenance cost, not to mention the cost of the certificate infrastructure needed to maintain the certificates themselves.

Wasn’t it Yahoo that had some certificate-based anti-spam measure in mind? So many attempts to fix the effects of spam… I think what we really need is a redesign of SMTP to cope with the current challenges. But that’s another topic, and I am certainly not the one to ask how it should be done. ;)

Is it better to use files or MySQL for the control tables in Postfix?

RH: “He said Jehovah!”

Performance-wise, MySQL just sucks. The latency for queries is much higher than when asking a file-based map. But then, with MySQL maps, any changes to the map become effective immediately, without the daemons that use the map having to exit and restart. If your maps change often AND you get a lot of mail: MySQL. In all other cases: file-based maps.

And: keep it simple! If you don’t NEED MySQL, why use it?
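In practice the difference is just the map type in main.cf; this sketch uses the classic mysql map file format of that era (illustrative values, so check MYSQL_README for the exact fields your Postfix version expects):

    # main.cf - either an indexed file map, built with "postmap /etc/postfix/virtual";
    # lookups are local and fast, changes need a rebuild...
    virtual_alias_maps = hash:/etc/postfix/virtual

    #  ...or a live MySQL lookup, where changes show up immediately:
    # virtual_alias_maps = mysql:/etc/postfix/mysql-virtual.cf

    # /etc/postfix/mysql-virtual.cf
    hosts        = localhost
    user         = postfix
    password     = secret
    dbname       = mail
    table        = virtual
    select_field = destination
    where_field  = address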

PK: I don’t think there’s a better or worse, because either way you lose something and you gain something, but what you lose and gain aren’t the same things:

From a performance point of view you lose a lot of time when you use SQL or LDAP databases, because of their higher lookup latency, so you might want to stick with the files.

But then, if you host many domains, you win a lot when you maintain the data in a database. You can delegate many administrative tasks to the end users, who access such a database through some web frontend. So there’s the argument for databases.

If you need both performance and maintainability, you can build a chain from databases and files. The editing is done in the database, and a job on your machine checks the database on a regular basis and builds (new) files from it when the data has changed. This way you get the best of both worlds, for the price of a small delay after changes have been made in the database.
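A minimal sketch of such a chain, with hypothetical table and file names, run from cron however often your delay budget allows:

    #!/bin/sh
    # dump address -> destination pairs from the database into a flat map file
    mysql -N -e 'SELECT address, destination FROM virtual' mail \
        > /etc/postfix/virtual

    # rebuild the indexed .db file that Postfix actually reads
    postmap hash:/etc/postfix/virtual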

IMAP or POP?

PK: An old couple sits in the kitchen at home.

She: “Let’s go to the movies.”
He: “But we have been to the movies just recently…”
She: “Yes, but they show movies in colour AND with sound now!”

Definitely IMAP ;)

RH: Depends on your needs. Let the user decide: go for courier-imap (which also does pop), so the user can choose.

Is there a simple solution to the spam problem?

RH: Mind control? Orbital lasers? No, but Postfix’s restrictions and the possibility of delegating policy decisions to external programs can help.

PK: No, unfortunately not. There are too many reasons why Spam works and a working solution would have to be technical, political and business oriented at the same time.

First of all it works because the SMTP protocol as designed has little to no means to prove that a message was really sent by the sender given in the e-mail. Anybody can claim to be anybody. As long as this design problem persists it will cost a fortune to track spammers down.

Even if you know where the spam came from, the spammer might have withdrawn to a country that doesn’t mind spammers and will protect them from being pursued under foreign law.

The world simply lacks anti-spam laws that all countries agree on. You are typically forced to end your chase for a spammer the moment you cross another country’s borders, because you are not entitled to pursue the suspect there.

Still, even if you were entitled to do so, it costs a fortune to track a spammer down, and even then it might take ages to recover any money for the damage they have done. Is your company willing to pay that much just to nail one spammer when another two emerge the moment that one goes behind bars?

And then spam works because it is so cheap. You can buy a hundred thousand addresses for 250 bucks or even less, and IIRC Yahoo found that a third of their mail users read spam and VISIT the pages it promotes.

If you want to make it go away, you must make it expensive for those who send or endorse spam. If you ruin the business model, no one will send spam. That’s business… ;)

To sum up my position: the problem is global, and we don’t have the right tools to attack the cause. Currently all we can do is diminish the effects by using as many anti-spam features as we can think of.

Do either of you have a favourite comic book hero?

PK: The “Tasmanian Devil” is my all-time favourite. I even have a little plastic figure sitting in front of me under my monitor, which has become a kind of talisman. It reminds me to smile about myself on days when I’d rather go out and kill somebody else for not being the way I would want them to be ;)

RH: Calvin (of Calvin and Hobbes)
or
Too much Coffee Man!

Author Bios
Ralf Hildebrandt and Patrick Koetter are active and well-known figures in the Postfix community. Hildebrandt is a systems engineer for T-NetPro, a German telecommunications company, and Koetter runs his own company consulting and developing corporate communication for customers in Europe and Africa. Both have spoken about Postfix at industry conferences and contribute regularly to a number of open source mailing lists.

Cristian Darie, Mihai Bucica, Beginning PHP 5 and MySQL E-Commerce

PHP and MySQL are common solutions in many web development situations. However, when using them for e-commerce sites, some different techniques should be employed to get the best out of the platform. I talked to Cristian Darie and Mihai Bucica about their new book, which uses an interesting approach to demonstrating the required techniques: the book builds an entire T-shirt ordering shop.

Could you give me, in a nutshell, the main focus of the book?

When writing “Beginning PHP 5 and MySQL E-Commerce”, we had two big goals of equal importance in mind. The first goal was to teach the reader how to approach the development of a data-driven web application with PHP and MySQL. We met this goal by taking a case-study approach, and we did our best to mix new theory and practice of incremental complexity in each chapter.

The second goal was to provide the knowledge necessary to build a fully functional e-commerce website. We did our best to simulate development in a real-world environment, where you start with an initial set of requirements and a low budget, and along the way (perhaps after the company expands), new requirements show up and need to be addressed.

You can check out the website that we build in the book at http://web.cristiandarie.ro:8080/tshirtshop/.

Why use PHP and MySQL for e-commerce? Do you think it’s easier to develop e-commerce sites with open source tools like PHP and MySQL?

Generally speaking, the best technology is the one you know.

PHP and MySQL is an excellent technology mix for building data-driven websites of small and medium complexity. The technologies are stable and reliable, and the performance is good.

However, we don’t actually advocate using any particular technology, because we live in the real world, where each technology has its strengths and weaknesses, and each project has its own particularities that can lead to choosing one technology over another. For example, if the client already has an infrastructure built on Microsoft technologies, it would probably be a bit hard to convince him or her to use PHP.

As many already know, for developers who prefer (or must use) ASP.NET and SQL Server, Cristian co-authored a book for them as well - “Beginning ASP.NET 1.1 E-Commerce: From Novice to Professional”, with the ASP.NET 2.0 edition coming out later this year.

You de-mystify some of the tricks of the e-commerce trade - like rankings and recommendations; do you think these tricks have a significant impact on the usability of your site?

The impact of these kinds of tricks is important not only from the usability point of view, but also because your competitors already have these features implemented. If you don’t want them to steal your customers (or sell more than you do), read the nine-page chapter about implementing product recommendations and add that feature to your own website as well.
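The core of a simple “customers who bought this also bought” recommendation is a single query; this sketch assumes a hypothetical order_detail(order_id, product_id) table rather than the book’s actual schema:

    -- products most often ordered together with product 42
    SELECT od2.product_id, COUNT(*) AS cnt
    FROM order_detail od1
    JOIN order_detail od2
      ON od2.order_id = od1.order_id
     AND od2.product_id <> od1.product_id
    WHERE od1.product_id = 42
    GROUP BY od2.product_id
    ORDER BY cnt DESC
    LIMIT 5;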

You don’t use transactional database techniques in your book - is this something that you would recommend for heavy-use sites?

Yes. “Beginning PHP 5 and MySQL E-Commerce” is addressed to beginning to intermediate programmers, building small to medium e-commerce websites - as are the vast majority of e-commerce websites nowadays. The architecture we’re offering is appropriate for this kind of website, and it doesn’t require using database transactions. For a complex, heavy-traffic website, a more advanced solution would be recommended, and we may write another book to cover that scenario.

The book shows in detail the implementation of an e-commerce website - do you know of anybody using this code for their own site?

Although the book is quite new, we’ve received lots of feedback from readers, some of them showing us their customized solutions based on the code shown in this book. Some of these solutions are about to be launched to production.

Credit card transactions always seemed to be the bane of e-commerce, especially for open source technology. Is it true that this has become easier recently?

Yes. Because many more e-commerce websites are built with open source technologies than there used to be, payment gateways have started providing APIs, documentation and examples for PHP, just as they do for .NET and Java. This makes the life of the developer much easier.

Larger e-commerce applications may require more extensive deployment environments - are the techniques you cover here suitable for deployment in a multi-server environment?

PHP has its own limitations that make it inappropriate for extremely complex applications, but for the vast majority of cases PHP is just fine. The techniques we cover in the book aren’t meant to be used in multi-server environments; for those kinds of environments PHP may not be your best choice, but then again, it all depends on the particularities of the system.

Obviously PHP and MySQL provide the inner workings to an e-commerce site. Do you think the website design is as important as the implementation?

Of course, the website design is critical, because it reflects the “face” of your business. As we’ve mentioned in the book, it just doesn’t matter what rocket science was used to build the site, if the site is boring, hard to find, or easy to forget. Always make sure you have a good web designer to complement the programmers’ skills.

What do you do to relax?

Well, we’re both doing a good job at being 24 years old…

Author Bios

Mihai Bucica

Mihai Bucica started programming and competing in programming contests (winning many of them) at age twelve. With a bachelor’s degree in computer science from the Automatic Control and Computers Faculty of the Politehnica University of Bucharest, Romania, Bucica works as an outsourcing project manager for Galaxy Soft SRL. Even after working with a multitude of languages and technologies, Bucica’s programming language of choice remains C++, and he loves the LGPL world.

Cristian Darie

Cristian Darie, currently the technical lead for the Better Business Bureau Romania, is an experienced programmer specializing in open source and Microsoft technologies, and relational database management systems. In the last 5 years he has designed, deployed, and optimized many data-oriented software applications while working as a consultant for a wide variety of companies. Cristian co-authored several programming books for Apress, Wrox, and Packt Publishing, including Beginning ASP .NET 2.0 E-Commerce, Beginning PHP 5 and MySQL E-Commerce, Building Websites With The ASP.NET Community Starter Kit, and The Programmer’s Guide to SQL. Cristian can be contacted through his personal website, www.CristianDarie.ro.

Garrett Rooney, Practical Subversion

Subversion is having what can only be described as a subversive effect on the version control environment. CVS has long been the standard amongst programmers, but it has its faults, and Subversion (read Sub-version) addresses those faults, both known and perceived, of CVS. I talked to Garrett Rooney about his book Practical Subversion, his contributions to the Subversion code, and where Subversion fits into your administration and development environments.

I see from the book you are a strong believer in version control - can you summarize the main benefits of version control?

I like to think of version control as a way of communicating information between developers.

When you commit a change to a source tree you can think of it as an automated way of telling every other developer how they can fix the same problem in their source tree. The benefits go further though, since in addition to keeping everyone on the team up to date with the latest fixes, you’re also recording all of the history. This means that later on, when you want to figure out how a piece of code got the way it is, you can look at the series of changes (and hopefully the justification for the changes, if you’ve been good about writing log messages) that led to the current situation. Looking at that history is often the best way to understand why the code got the way it is, which means you’re less likely to make the same mistake twice when making new changes.

So version control is really a way to help you communicate, both with other people working on your project right now and with those working on it in the future.

There’s been a lot of discussion online about the benefits of Subversion compared to the previous preferred environment of CVS. How much better is Subversion?

I recently had to start using CVS again, after a rather long period of time where I’d only used either Subversion or Perforce, a commercial version control system. It never ceases to amaze me, whenever I go back to CVS, how irritating it is to use.

Let’s start with the basics. Lots of things in CVS are slow.

Specifically, lots of operations that I like to do fairly often (’cvs diff’ is the big one here) need to contact the repository in order to work; this means going out over a network, which means it’s pretty slow. In Subversion the equivalent command is lightning quick, since your working copy keeps a cached copy of each file, so it doesn’t have to contact the server in order to show you the difference between the file you started with and the new version you created.

There are other parts of CVS that are also quite slow when compared to Subversion. In CVS the act of tagging or branching your source tree requires you to make a small change to each and every file in the tree. This takes a lot of time for a large tree, and a noticeable amount of disk space. In Subversion the equivalent operation takes a constant, and very small, amount of time and disk space.
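In practice that constant-time operation is just a cheap server-side copy; for example (the repository URL here is made up):

    # a Subversion tag or branch is a copy, recorded in O(1) time and space
    svn copy http://svn.example.com/repos/trunk \
             http://svn.example.com/repos/tags/release-1.0 \
             -m "Tag the 1.0 release"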

The other big improvement is the fact that in Subversion changes are committed to the source tree in an atomic fashion. Either the entire change makes it in or none of it does. In CVS you can get into a situation where you updated your working copy in the middle of a commit, resulting in you getting only half of the changes, and thus a broken source tree. In Subversion this doesn’t happen.

The same mechanism means that it’s much easier to talk about changes in Subversion than in CVS. In CVS, if you have a change to five separate files, in order to talk about it you need to talk about the individual change to each file: “I committed revision 1.4 of foo.c, 1.19 of bar.c, …” This means that if someone wants to look at the change you made to each file they have to go look at each individual file to do it. In Subversion you just say “I committed revision 105”, and anyone who wants to look at the diff can just say something like “svn diff -r104:105” to see the difference between revision 104 and revision 105 of the entire tree. This is also quite useful when merging changes between branches, something that’s quite difficult in CVS.

Finally, the user interface provided by the Subversion client is simply nicer than the one provided by CVS. It’s more consistent, and generally easier to use. Enough things are similar to CVS that a CVS user can easily get up to speed, but the commands generally make sense to a new user, as compared to those of CVS which can be rather confusing.

How does Subversion compare with version controls other than CVS, BitKeeper for example has been in the news a lot recently. How about commercial products, like Visual SourceSafe or ClearCase?

I’ve personally never used BitKeeper, largely because of its license. While BK was available under a “free as in beer” license for use in developing open source software the license prohibited users from working on competing products, like Subversion. As a result I’ve never really had a chance to try it out.

I do think that BitKeeper has some interesting ideas though, and the other distributed version control systems (Arch, Darcs, Bazaar-NG, etc) are all on my radar. I don’t know if I’m convinced of their advantages over centralized systems like Subversion, but there is interesting work being done here. Personally, of the three distributed systems I just mentioned I’m most interested in Bazaar-NG (http://bazaar-ng.org/).

As for the commercial products out there, I’ve had personal experience with Perforce and SourceSafe. I wasn’t impressed with SourceSafe at all, and I really can’t think of a situation where I’d use it willingly. Perforce on the other hand is a very nice system. Its branching and merging support is superior to what Subversion provides at the moment (although the Subversion team has plans to close that gap in the future). That said, Perforce is expensive, and unless you really need specific features that can only be found there I wouldn’t see much reason to go with it.

You sound like you’ve had a lot of personal experience of where the right source control mechanism has saved your life. Any true tales that might help highlight the benefits of version control?

Personally, my most memorable experiences where version control would have been a lifesaver are those from before I started making use of it on a daily basis.

I know that back in college there were several times when I was working late at night on some project, usually due within a few hours, and I managed to screw things up badly. It’s remarkable how easy it is to go from a version of a program that’s mostly working to one that’s totally screwed up, all while trying to fix that last bug. It’s especially bad when your attempt to fix that last bug introduces more problems, and you can no longer remember exactly what you changed.

With a version control system, you never really need to be in that situation. At the absolute worst, you can always roll back to the version of the code you had at your last commit. It’s impossible to get stuck in that situation where you can’t figure out what you changed because the system will remember for you.

Now that all of my non-trivial projects (and most of my trivial ones, honestly) make use of version control, I just don’t find myself in those kinds of situations anymore.

Existing developers will almost certainly need to migrate to Subversion - how easy is this?

It’s easy to underestimate the problems that come with migrating from one version control system to another. Technically, a conversion is usually pretty straightforward. There are various programs available to migrate your data (cvs2svn for CVS repositories, p42svn for Perforce, and others), and in many cases it can be tempting to just run the conversion, toss your users some documentation, and away you go.

Unfortunately, it isn’t that simple. Version control becomes part of a developer’s day-to-day workflow, and changing something like that has consequences. There needs to be careful planning and, most importantly, you need to have buy-in from the people involved.

Another Subversion developer, Brian Fitzpatrick, will actually be giving a talk about this very subject at OSCON this year, and I’m looking forward to hearing what he has to say.

http://conferences.oreillynet.com/cs/os2005/view/e_sess/6750

Some versioning systems have problems with anything other than text. What file types does Subversion support?

Subversion, by default, treats all files as binary. Over the network, and within the repository, files are always treated as binary blobs of data. Binary diff algorithms are used to efficiently store changes to files, and in a very real sense text and binary files are treated identically.

Optionally, there are various ways you can tell Subversion to treat particular files as something other than binary.

If you want end-of-line conversion to be performed on a file, for example so that it shows up as a DOS-style file when checked out on Windows but a Unix-style file when checked out on a Unix machine, all you have to do is set the svn:eol-style property on the file.

Similarly, if you want keyword substitution to be performed on a file, so that words like $Revision$ or $Date$ are replaced with the revision and date the file was last changed, you can set the svn:keywords property to indicate that.
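Setting either property is a one-line command followed by a commit; the file names here are just examples:

    # native line endings: LF on Unix, CRLF on Windows working copies
    svn propset svn:eol-style native README.txt

    # expand $Revision$ and $Date$ keywords in this source file
    svn propset svn:keywords "Revision Date" foo.c

    svn commit -m "Enable eol conversion and keyword expansion"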

The key fact to keep in mind is that in Subversion these are optional features that are turned off by default. By their very nature they require that Subversion make changes to your files, which can be catastrophic in some cases (changing all the \r\n’s in a binary file to \n’s isn’t likely to work very well), so you need to ask for this behavior if you want it. In systems like CVS these kinds of features are turned on by default, which has resulted in countless hours of pain for CVS users over the years.

From reading your book, it’s obvious that Subversion seems a little bit more application friendly than CVS, integrating with Apache, emacs and others with a little more grace than CVS and RCS. Is that really the case?

Well, let’s be fair, CVS and RCS have quite good integration with various tools, ranging from Emacs to Eclipse. That said, it hasn’t been easy to get to that point. In many cases tools that want to integrate with CVS have to jump through hoops to call out to the command line client and parse the resulting output, which can be fragile. In cases where that isn’t possible many projects have reimplemented CVS so that it could be more easily integrated.

In Subversion many of these problems are alleviated by the fact that the core functionality is implemented as a collection of software libraries. If you want to make use of Subversion in your own code, all you need to do is link against the Subversion libraries and you can provide exactly the same functionality as the official Subversion client. If you’re not working in C or C++ there are probably bindings for the Subversion libraries written in your language of choice, so you can even do this without having to learn a lot about the C-level libraries.

Additionally, Subversion’s ability to integrate with Apache has provided a number of abilities, ranging from WebDAV integration to the ability to use an SQL or LDAP database for storing usernames and passwords, that otherwise would have been incredibly difficult to implement. By working within the Apache framework we get all of that for free.

Subversion includes autoversioning support for DAV volumes, could you explain how you could use that to your advantage?

DAV autoversioning is most useful when you need to allow non-technical users, who would be uncomfortable making use of normal Subversion clients, to work with your versioned resources. This could mean source code, but most commonly it involves graphics files or Word documents and other things like that. Your users simply use a DAV client (which is often built into their operating system) to access the files, and the version control happens transparently, without them even knowing about it. When they save their changes to a file it is automatically committed to the repository. This is a very powerful tool, and can give you some of the advantages of version control without costly training for your users.
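On the server side, autoversioning is a single extra directive in the mod_dav_svn configuration; the paths here are illustrative:

    <Location /repos>
      DAV svn
      SVNPath /var/svn/repos
      # turn a plain DAV save from Explorer/Finder into an automatic commit
      SVNAutoversioning on
    </Location>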

Most people use version control for their development projects, but I’ve also found it useful for recording configuration file changes. Is that something you would advocate?

Absolutely! I personally keep much of my home directory under Subversion’s control, allowing me to version my editor’s config files, .bashrc, et cetera. All the same benefits you can get from using version control with software development are just as applicable for configuration files.

How important do you think it is for high quality tools like Subversion to be Open Source?

I’m a big fan of using open source licensing and development models for infrastructure level software (operating systems, compilers, development tools, etc). Well, honestly I’m a big fan of using open source licenses and development models for most kinds of software, but I think it’s particularly appropriate for software at the infrastructure level, where the primary use case is building larger systems.

It’s difficult to imagine a company being able to make money developing a new operating system in this day and age, or a version control system, or a C runtime library. These are largely commoditized parts of the software ecosystem, and as a result I think it makes sense for the various people who benefit from having them available to share the cost for producing them, and the best way we currently have to do that is via open source.

Additionally, it’s difficult to overestimate the value of having high-quality systems out there for people to learn from. I’ve learned a great deal by reading open source code, and even more by participating in open source projects.

Finally though, I just like the fact that if I find a problem in a piece of open source software like Subversion I can actually do something about it. I’ve worked with closed source third party software in the past, and I’ve found that I tend to spend a lot of time digging through inadequate documentation and beating my head against the wall while trying to work around bugs. With an open source product you can at least make an attempt to figure out what the actual problem is.

Contributing to Subversion in your spare time doesn’t seem like a very relaxing way to spend your free time. Is there something less computer-based that you like to do?

I don’t know, there’s something fun about working on open source projects. It’s awfully nice to have the freedom to do things the “right” way, as opposed to the “get it done right now” way, which happens far too often in the commercial software world.

That said, I do try to get off of the computer from time to time. I see a lot of movies, read a lot, and lately I’ve been picking up photography. I also just moved to Silicon Valley, so I’m making a concerted effort to explore the area.

Anything else in the pipeline you’d like to tell us about?

Well, Subversion 1.2 is on its way out the door any day now, and that’ll bring with it some great new features, primarily support for locking of files, something that many users had been requesting.

As for my other projects, I’m going to be giving a talk at O’Reilly’s OSCON again in August. This year I’ll be speaking about the issues regarding backwards compatibility in open source software. I’ve also been spending a lot of time on the Lucene4c project (http://incubator.apache.org/lucene4c/), trying to provide a C level API to access the Apache Lucene search engine.

Garrett Rooney Bio

Garrett Rooney works for Ask Jeeves, in Los Gatos CA, on Bloglines.com. Rooney attended Rensselaer Polytechnic Institute, where he managed to complete 3 years of a mechanical engineering degree before coming to his senses and realizing he wanted to get a job where someone would pay him to play with computers. Since then, Rooney completed a computer science degree at RPI and has spent far too much time working on a wide variety of open source projects, most notably Subversion.

Computerworld Blogs

Computerworld have set up a dedicated blogging area on their site at Computerworld blogs

There are a few of us there; all dedicated to blogging on different news stories in a range of different areas and topics. You can read my blog at the dedicated Martin MC Brown Computerworld blog.

Alternatively, you can subscribe to my dedicated RSS feed.

You can see that we’ve been populating it over the last week or so; there are already blog posts from me, and others, about a variety of topics.

Please feel free to read and comment, either there or here, and let me know how I’m getting on.

FOSS Anniversaries

In the last LinuxWorld article I wrote for the magazine, I talked about FOSS anniversaries, mostly because a number of important projects have reached double figures in age, and yet most people let it pass them by.

Talk to young programmers and developers today and you’d be fooled into thinking that free/open source software (FOSS) was a relatively new invention. Those crusty old folk among us (myself included, born in that prehistoric era of the early ’70s) know that it goes back a little further than that.

Many of us become dewy-eyed about our memories of Linux when it first came out - or of the first Red Hat release. In fact, many of the FOSS projects that we take for granted today are a heck of a lot older than people realize.

And my final request:

To try and redress the balance I’m starting a FOSS anniversaries project. Initially it’s going to be held on my personal blog at http://mcslp.com - click on the FOSS Anniversaries link to go to the page. If I get enough interest, I’ll consider improving on it and moving it elsewhere. Until then, if you’ve got some additions or corrections, use the contact form to let me know.

Here is the FOSS Anniversaries page, which is on this site. If you want me to update anything, use the Contact page.

Session Tracking With Apache

My new piece on how to track user sessions on your website with Apache is available on ServerWatch.com. Here’s an excerpt:

Using HTTP logs to track the users who visit your site isn’t always as useful as you think it’s going to be. While metrics like the total number of page hits - and, within that, page hits over time or from a specific IP address - are easy to identify, they don’t always tell you how people are viewing your site or answer the specific questions the marketing department may pose.

This article looks at how to track progress through a site using an Apache module and provides answers to some of the more complex marketing-led questions that may be posed.
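As a flavour of what’s involved: one common way to get per-visitor session data into Apache’s logs is mod_usertrack, which stamps each visitor with a cookie you can then log (this is a generic sketch, not necessarily the module or configuration the article itself uses):

    # httpd.conf - tag every visitor with a session cookie and log it
    LoadModule usertrack_module modules/mod_usertrack.so

    CookieTracking on
    CookieName      session-id
    CookieExpires   "2 weeks"

    # %{cookie}n logs the note set by mod_usertrack for each request
    LogFormat "%h %l %u %t \"%r\" %>s %b %{cookie}n" session
    CustomLog logs/session_log session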

Read on for the rest of the article.

Kyle Rankin, Knoppix Hacks

Knoppix is not just another Linux distribution. Unlike many Linux alternatives, Knoppix doesn’t need to be installed; everything runs from a CD (known as a ‘Live CD’ distribution). While Live CDs aren’t unique to Knoppix, it is the way the Knoppix CD is packaged that makes the difference. Knoppix includes intelligent hardware detection – it can automatically identify nearly everything on your machine and then make the best of it – and the CD includes a wide selection of programs, from typical Linux applications through to repair utilities and tools.

I talked to Kyle Rankin, author of Knoppix Hacks, about how the book idea was formed, how he chose the contents, and some of the things you can do with Knoppix.

OK - I can’t make up my mind whether I’ve fallen in love with Knoppix or with the Knoppix Hacks book. What led to the production of this book?

A friend of mine who works at O’Reilly heard that they were looking for someone to do a Knoppix book for them. Not long before, he had seen me use Knoppix at an installfest to resize someone’s Windows partition and set up Debian in a relatively short amount of time. He approached me with the news and encouraged me to send them a book proposal. I had never written a book before, but I personally used Knoppix a lot, especially as a recovery tool. I thought a Hacks book applied to Knoppix was a great idea, so I started jotting down ideas and submitted a formal proposal for the book, which was accepted. Add months of furious writing and Knoppix Hacks was born. I started the book liking Knoppix and finished the book absolutely loving it.

What impressed me most is the range and usefulness of the hacks - I immediately felt like trying them out, even if I didn’t want to image my partition. How did you pick the hacks that made it into the book?

Thanks. When writing the book, I realized that you could organize the ways people use Knoppix into a few general categories: desktop use, a Linux installer, a systems administrator tool, a rescue CD, and a platform to create your own live CD. We had a discussion about whether to make the book mostly focused on more advanced topics like system recovery, sysadmin hacks, and remastering, but decided that since Knoppix is used by all sorts of people at many different skill levels, it made more sense to represent all of the different types of use in different chapters. In particular, when I wrote the Linux and Windows repair chapters, I tried to think of all of the different recovery scenarios that I have found myself in, and how I used Knoppix to fix them. My goal was to create a list of common recovery steps that a sysadmin in a jam would reach for before anything else. Along the way I discovered some really clever recovery techniques you could use Knoppix for that I hadn’t known about previously (like Windows registry hacking).

Knoppix is obviously a practical way to do a great many things; can it also be used as a general desktop OS?

Knoppix was actually originally created just to be a portable Linux distribution for Klaus Knopper to take with him to different computers. From the very beginning it was intended first and foremost to be a desktop OS. The excellent hardware detection makes it much easier to take the CD from computer to computer, and there are a number of scripts in place that allow you to keep your settings no matter what computer you are in front of.
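Those settings are saved and restored by scripts on the CD plus cheat codes typed at the boot prompt; a couple of examples as I remember them from the Knoppix 3.x era (treat the exact option names as approximate and check the cheat-code list on your own CD):

    boot: knoppix home=scan myconfig=scan   # find and reuse a saved home/config
    boot: knoppix home=/dev/sda1 lang=us    # persistent home on a specific USB key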

What do you do about user storage? Can I use a USB key, for example?

Yes, you can use basically any writable media you might have (that Knoppix can detect) to store user files, including floppy drives, hard drives in the system, and USB keys. There are a few different scripts included with Knoppix that automate the process of storing data on writable media, so it’s really just a matter of a few clicks to save settings. Then you just use a cheat code when you boot Knoppix to tell it to restore your settings the next time you boot.

Staying on the topic of alternative storage media, is it possible to use Knoppix on DVD, USB key or smaller storage media, like Compact Flash?

Knoppix can be remastered and used on a DVD, and in fact there are a few Knoppix variants that have done just this. Klaus Knopper has announced his intention to start shipping a formal DVD version of Knoppix as soon as this summer. Knoppix is pretty large, so the process of stripping it down to smaller media such as a USB key or flash drive can be difficult. Luckily there are already a number of other distributions, such as Feather Linux, that make it easy to set up and use on a USB key.

Is there any reason why I shouldn’t simply write my Knoppix image to my hard disk and never use the CD ever again?

A number of people have installed Knoppix to a hard drive as a permanent solution over the years, and in fact there is a nice GUI that automates the process. However, Knoppix was designed to be run from CD-ROM, and Klaus mixes packages from a number of different Debian repositories. This can make upgrading in the future quite a headache, so I generally recommend that people immediately dist-upgrade to Debian Sid if they install Knoppix (and I include a hack in the book that talks about how to do this). Alternatively, there are other distributions, like Ubuntu and Kanotix, that make Debian easy to install and that are also much easier to upgrade.

Some of the tools represent what can only be classed as an administrator’s dream. Image partitioning, copying and repair tools are all on the Knoppix CD. Could you tell us a little more about these hacks and how they can be exploited?

It’s actually pretty amazing how many different administrator tools Knoppix includes. Some of the things that really surprised me were the complete Apache and BIND servers included on the CD, so in a pinch you could set up a number of different emergency servers. A friend of mine actually used this idea when a webserver of his was hacked into. He needed to be able to serve the pages while not having the server itself up and running, so he booted Knoppix and served the web pages directly from its Apache server. It’s especially interesting to introduce Knoppix to a systems administrator who is mostly used to proprietary (and often expensive) Windows admin tools. You can use dd or partimage to image disks locally or over the network, you can graphically resize partitions on the fly with QTParted, you can scan systems for viruses and rootkits, perform forensic scanning, wipe hard drives, plus a number of other things, all from this single free CD. Also, Knoppix makes for a great sanity check when you suspect hardware is bad: you can test the RAM, and other hardware, straight from the CD.
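As one concrete flavour of the disk-imaging trick, here is a common Knoppix-era recipe for pushing a partition image over the network with dd and netcat (device names and addresses are illustrative; start the listener first):

    # on the machine that will receive the image
    nc -l -p 9000 > hda1.img.gz

    # on the machine booted from Knoppix
    dd if=/dev/hda1 | gzip -c | nc 192.168.1.10 9000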

The Knoppix idea seems so obvious - does it surprise you that it’s a relatively recent invention?

Over time there have been a number of different rescue floppies and CDs like tomsrtbt and the old LinuxCare bootable business card, but what continues to surprise me with Knoppix is just how incredibly flexible and useful the CD is. You can use it to demo Linux to a newcomer, fix a broken Windows system, and scan a Linux server for rootkits all from the same CD. There are hundreds of different Live CDs out there, many based on Knoppix, but I’ve found that I keep coming back to Knoppix for day-to-day use just because of how flexible it is.

I’m hoping you have a Knoppix Hacks Volume 2 in the works?

Well, I have actually recently finished a Knoppix Pocket Reference for O’Reilly that should be out in July. As the name indicates, it is much more of a reference and, even though it is small, it covers a lot of ground that Knoppix Hacks didn’t, while containing a lot of the sort of Knoppix tips you’d want to carry around in your pocket. As far as a second edition of Knoppix Hacks goes, Knoppix continues to add interesting functionality (for instance, I can think of a number of really powerful hacks you can do just with the new UnionFS system in 3.8), so a second edition is a possibility, but nothing is officially planned.

Obviously MacGyver is a favourite, but is there anything else you like to watch when relaxing?

What is this ‘relaxing’ you speak of? Actually when I’m not working or writing, I like to watch the Daily Show and have been a long time fan of the Simpsons. I’ve noticed my wife and I have been watching more movies these days than TV shows (probably Netflix has something to do with that). Any remaining free time I have seems to be absorbed by IRC.

Kyle Rankin Bio

Kyle is a system administrator for The Green Sheet, Inc., the current president of the North Bay Linux Users Group, and the author of Knoppix Hacks. Kyle has been using Linux in one form or another since early 1998. In his free time he does pretty much the same thing he does at work–works with Linux.

David Sklar, Essential PHP Tools: Modules, Extensions, and Accelerators

PHP is a popular web development/deployment platform, and you can get even more out of it by using the extensions and tools available on the web to extend PHP’s capabilities. I talked to David Sklar, author of Essential PHP Tools, about his new book and PHP development.

Why do you use PHP?

It’s proven itself to be a flexible and capable solution for building lots of web applications. I’m a big fan of the "use the right tool for the job" philosophy. PHP isn’t the right tool for every job, but when you need to build a dynamic web app, it’s hard to beat.

Could you tell me what guided your thoughts on the solutions you feature in the book?

They’re solutions to problems I’ve needed to solve. Code reuse is a wonderful thing and PEAR makes it easy. It’s a frustrating waste of time to write code that does boring stuff like populating form fields with appropriately escaped user input when you’re redisplaying a form because of an error. HTML_QuickForm does it for you. The Auth module transparently accommodates many different kinds of data stores for authentication information. One project might require a database, another an LDAP server. With PEAR Auth, the only difference between the two would be one or two lines of configuration for Auth.
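As a rough sketch of what that looks like with PEAR Auth (the DSN, backend and login callback are hypothetical; swapping the database for LDAP is essentially a change to the container name and its options):

    <?php
    require_once 'Auth.php';

    // storage options for the 'DB' container; an 'LDAP' container would
    // take a different options array but the rest of the code stays the same
    $options = array('dsn' => 'mysql://user:secret@localhost/site');

    // 'show_login_form' is a function you define to print the login form
    $auth = new Auth('DB', $options, 'show_login_form');
    $auth->start();

    if ($auth->checkAuth()) {
        echo 'Welcome, ' . htmlspecialchars($auth->getUsername());
    }
    ?>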

Do you think PHP provides a richer environment for Web publishing than, say, Perl or Python?

I don’t know much about Python, so I can’t compare it with PHP. I know a moderate amount about Perl, so I can (moderately) compare that with PHP. (And if those caveats aren’t enough, I’ll also add that "environment" is a loaded term — I suppose it could encompass not just the functions and libraries in a language, but IDEs, debugging and deployment tools, and so on.)

The big difference for me, when it comes to web development, between PHP and Perl is that the PHP interpreter assumes that a given program is going to be generating a web page (unless you tell it otherwise), while the Perl interpreter assumes (again, unless you tell it otherwise) that a given program is going to read a bunch of stuff from standard in, mess with it, and print it to standard out.

In PHP, you don’t have to do anything special to access form, cookie, and URL variables — they’re in the auto-global arrays $_POST, $_COOKIE, and $_GET. Similarly, HTTP headers are in $_SERVER. The PHP interpreter emits a Content-Type: text/html header unless you tell it to do something else. In Perl, you have to go through some rigamarole (admittedly, just a little bit of rigamarole) to do that web-centric setup.

(The flip side of this, of course, is that if you want to write a program in PHP to munge files, you have to do more work than in Perl.)
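A tiny example of that web-centric default behaviour (nothing here is from the book; it just shows the auto-globals and the implicit header at work):

    <?php
    // form/URL input is already parsed into $_POST and $_GET
    $name = isset($_POST['name']) ? $_POST['name'] : 'stranger';

    // a Content-Type: text/html header is sent automatically;
    // uncomment the next line only if you want something else
    // header('Content-Type: text/plain');

    echo 'Hello, ' . htmlspecialchars($name);
    ?>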

Perl is a great programming language and you can use it to solve web programming problems quite capably. So is PHP.

You seem to be a fan of Web Services, do you see them as simply a useful tool, or a more serious way of providing services over the web?

Like many things, promise_of("Web Services") > current_usefulness("Web Services"). A lot of the neat stuff about SOAP – automatically generating WSDL from classes, and encoding and decoding complex data types – is more difficult in PHP because of PHP’s loosey-goosey type system. Nevertheless, I think SOAP can be great in situations where you need custom data types and you have sharp separations between the folks who implement and maintain the functionality being exposed by SOAP and the folks who use those functions. When you have control over both ends of the conversation, or don’t need to encapsulate such complicated relationships in your data structures, XMLRPC or just a homegrown RESTful interface is fine.

Security is vital part of web programming, particularly when working with forms and other data. Any tips?

htmlspecialchars(): encode external input with htmlspecialchars() or htmlentities() before printing it to avoid cross-site scripting and cross-site request forgery attacks. Not doing this is probably the most widely committed PHP (and web application development) security error.

Similarly, encode external input before putting it into your database. PEAR DB’s placeholders do this for you automatically, which is a great convenience. Each database extension has its own function for doing this, and there’s the generic addslashes() function as well.
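A minimal illustration of both habits, output escaping and placeholder-quoted queries (the DSN and table are hypothetical):

    <?php
    require_once 'DB.php';

    // escape external input before printing it back into HTML
    $q = isset($_GET['q']) ? $_GET['q'] : '';
    echo 'You searched for: ' . htmlspecialchars($q);

    // let PEAR DB's placeholders quote the value before it reaches SQL
    $db = DB::connect('mysql://user:secret@localhost/shop');
    $db->query('SELECT * FROM products WHERE name = ?', array($q));
    ?>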

In the larger security scheme of things, I would also encourage developers to think of security as a process, not as an end state. The place you want to get to is not that your application is "secure," but that it is "secure enough." The specific definition of "secure enough" depends on how much time and money you have, what kind of data your application is dealing with, and what the consequences are if something goes wrong.

There are, certainly, some security-related practices that are so easy to implement and so catastrophic if you don’t (like escaping external input before printing it or putting it into the database) that you should always do them. But thinking about security means evaluating tradeoffs.

You cover a number of different code caching solutions, how much time can you really save using these systems?

The benchmarks in the book indicate about a 280% speedup. The specific speedup you get varies with your application’s behavior, so I’d advise anyone considering code caches to test them with the actual application you’re going to use. It’s a really easy way to get a performance boost, though, since you don’t have to edit any of your code: just install the code cache, restart your web server, and you’re all done.

Do you have a favourite PHP tool?

That’s a tough question. My favorite PHP function is strtotime() but I don’t know if that qualifies as a tool. I like the XDebug extension a lot. I do most of my coding in XEmacs but I’ve started to play around with IDEs like the Zend Studio and Komodo, so one of those might become my favorite tool sometime soon.

Your preferred platform for PHP deployment?

Apache 1.3 running on Linux. It’s stable, flexible, and you can’t beat the price tag.

Any thoughts on PHP5 you’d like to share with our readers?

If you’ve never used PHP before, now is the time to start! With PHP5, you get all of the great things about PHP 4 — comprehensive function library, incredibly easy deployment of web applications, connectivity to lots of different database programs. Plus, you get all of the goodies that the new version brings: robust Object Oriented programming support, revamped XML processing that makes it a snap to parse simple XML documents and gives you the full DOM API when you need to do XML heavy lifting, and bells and whistles like exceptions, iterators, and interfaces.

What advice would you give to anybody considering PHP as their development platform?

Make a personal or hobby project your first PHP application, something like keeping track of your books or CDs, a personal URL bookmark database, or league statistics for your kids’ soccer games. Your first app isn’t going to be perfect. It will have security problems, it won’t be as fast as it could be, the database schema won’t be optimized and so on. But that’s fine. Just get a feel for what PHP can do. Make your second project the one that matters for your job or whomever else is counting on you.

What made you start up PX?

It was definitely a case of scratching one’s own itch. When I started it, there weren’t a lot of places to look for code that someone else had written in PHP to solve a certain problem. The site gets very steady usage — it’s nice to see folks continuing to turn to it for solutions.

It’s nice to see another IT-savvy cook, do you have a particular culinary speciality?

I’m flattered that you called me an "IT-savvy cook" instead of a "cooking-savvy programmer"! I recently got a slow cooker, so I’ve been trying lots of new things in that. I also like baking and making desserts: even if something goes wrong so that the results are not cosmetically perfect, they still taste good.

David Sklar Bio

David Sklar is an independent consultant specializing in technical training, software development, and strategic planning. He is the author of Learning PHP 5 (O’Reilly), Essential PHP Tools (Apress), and PHP Cookbook (O’Reilly).

After discovering PHP as a solution to his web programming needs in 1996, he created the PX (http://px.sklar.com), which enables PHP users to exchange programs. Since then, he has continued to rely on PHP for personal and professional projects.

David is an instructor at the New School University and has spoken at many conferences, including the O’Reilly Open Source Conference, the EGovOS Open Source/Open Standards Conference, and the International PHP Conference.

When away from the computer, David eats mini-donuts, plays records, and likes to cook. He lives in New York City and has a degree in Computer Science from Yale University.

Tony Mobily, Hardening Apache

Hardening Apache
It is the administration task we love to hate: securing a website. Apache forms the backbone of most websites, so it makes sense to start there. In Hardening Apache, Tony Mobily does just that, starting with the basics of creating a secure Apache installation and moving on to more in-depth techniques for securing Apache installations from attack. Let’s see what Tony has to say when I talk to him about his new book and how to approach security, Apache and otherwise.

One of the key elements I get from your book is the back-to-basics approach. For example, I know a lot of companies with extensive login systems that leave their server room doors wide open. Do you think it’s best to work from the inside out or the outside in when setting up security?

I believe that you always need to get the right person for the job. For example, if you need to re-tile your bathroom, you don’t call a wood worker. It’s the same with computer security; "physical" security (e.g. preventing people from breaking in) and "logical" security (preventing crackers and script kiddies from using your servers and resources) are very different things which require very different skills and training.

In this field - in fact, in any field - improvisation is just not an option.

If a company asked me to secure their physical network, I would redirect them to Steve, a friend of mine who does just that. Steve tells me amazing stories of sniffing packets by just placing a device next to the cable, for example, or other stories which would fit nicely into a James Bond movie rather than real life.

Even "logical" security branches out! I wouldn’t be able to audit the source code of a complex program, for example.
The problem is that even though improvisation shouldn’t be an option, it still happens. When a manager installs updates on a Unix system, or (worse) a service pack on a Windows machine, he is improvising and putting his systems at risk - full stop.

To go back to the question, security is a problem that has to be faced as a whole. To connect to the example I made earlier, a good physical design will prevent problems such as random people getting too close to a network cable and sniffing packets, or people accessing the servers’ consoles. On the other hand, a good logical design will mean that every piece of information is encrypted, so even if intruders do manage to access the cable, they won’t be able to do anything with the collected information.

Apache 1.3.x or Apache 2.x?

For me, there is no doubt: Apache 2.x.
It’s not just a matter of wanting to use the latest piece of software at any cost.

The problem with security is that often you are tied to the Apache version you are using. For example, if you use Apache 1.3.x for long enough on a complex web site, eventually you will be using a number of modules which are only available on Apache 1.3.x. In this common situation, upgrading to Apache 2.x can be really hard and might even require redesigning some parts of your web sites in order to use different technologies. The longer you leave it, the harder it will be to actually upgrade.

The problem is that eventually you will have to upgrade, because the 1.3.x branch of Apache will no longer be supported and patched. It might not be soon, but it will happen. A lazy system administrator, at that point, will find himself (or herself) with an unpatchable system and, what’s worse, he or she won’t be able to upgrade without majorly disrupting the hosted web sites.

You make good use of the warnings and notifications made by sites like CVE and ApacheWeek. Are these sites that Apache administrators should be checking regularly?

Yes, absolutely.
Checking sites like ApacheWeek is both necessary and boring. I think there is also fear - sometimes you are just about to go on holiday or go home, and you discover that your production server has a security hole as big as a crater, and you urgently need to recompile the whole thing!
These sites are crucial to make sure that system administrators don’t live in their own "little world", and realize that software is not just something they install on their computers that then simply works; software changes, evolves, improves, stumbles across problems, and so on.

You use a lot of sample exploits to demonstrate weaknesses. Is it worth creating a tool-kit for checking these exploits against your site?

Writing such a tool-kit is a good idea in theory. In practice, however, there isn’t really much point because you know that if you upgrade your Apache server when you need to, then the security problems will be fixed.

What would you say was the weakest part, security-wise, of most websites?

That’s a hard question! It took me a while to work out the most sensible answer: the weakest part is the lack of maintenance and upgrades.
The problem is that keeping a system updated is hard work. If you manage 40, 50, or 150 Unix systems, then keeping up with all of them does require a whole lot of skills, because at that point the shell is just not good enough. You need to use something like CFEngine to configure them, and other automated tools to keep an eye on their security.

Here is an example: I have my own server, where I host my personal web site, my friends’ email, their small sites and so on.
I receive my email from LogWatch every day.

Today, it read:

**Unmatched Entries**
Illegal user patrick from 161.53.202.3
Illegal user patrick from 161.53.202.3
Illegal user rolo from 161.53.202.3
[…]
Illegal user john from 161.53.202.3
Illegal user test from 161.53.202.3
Illegal user merc from 151.31.36.81

Normally, I would run whois, find out who manages those networks, and report these attempts. Well, today I simply didn’t have the time. I am writing this answer on a train to London. Tomorrow I will be in Brunei, and in three days I will be back to Perth. My Internet connection is expensive and erratic. So here I am, Mr. 161.53.202.3 tried to attack me and he won’t be reported. And that’s only one person (me) with only one server!

Do you advocate the use of ‘cracker’ tools for testing?

That’s another tough one.

Well, I don’t advocate the /use/ of such tools. However, I am strongly against making these tools illegal.

Crackers and script kiddies, at the end of the day, are our friends (!). If you compare the Internet to a living organism, they are like those nasty (biological) viruses which occasionally knock you down and give you a tremendous sore throat, but are necessary to keep your body alert and your antibodies "trained". Also, if you catch a cold you can’t blame it completely on the virus - you’ve got to wonder whether your body is healthy.

I believe it’s the same with the Internet: crackers will randomly try and get into your system (literally!). You have to make sure your defences are strong enough and well organized, so that when that happens you are prepared.

Some big companies won’t accept that. They will try to make tools such as Nessus illegal. Why? Maybe because they think that if such tools are not available anymore, then crackers will simply disappear. Or who knows, maybe they would like to sell testing tools to certified companies for a lot of money…

You cover quite a few security modules. Which would you pick as the best, and why?

The best and most useful module in my opinion is mod_dosevasive written by Jonathan A. Zdziarski. I believe Jonathan deserves a monument dedicated to him, also because he wrote DSPAM (which saves my life on a daily basis).

I believe that it should be part of the default Apache installation - in fact, I wonder whether the Apache group would ever consider including it.

Tell us what your ‘Apache in Jail’ chapter is all about.

Well, jailing can be extremely complicated, but at the same time it is a very powerful tool against crackers.

Thanks to the system call "chroot()", you can tell a program what its root directory is when it runs. For example, you could run Apache making it believe that the root directory ("/") is "/cage/apache". This means that Apache will not be able to see anything outside "/cage/apache" - which is why we say that it’s "jailed". If a cracker does manage to use a buffer overflow exploit against your server and get Apache to execute arbitrary commands, there will be nothing in /bin or /sbin to be executed, because /cage/apache/bin and /cage/apache/sbin will be nearly empty!
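A minimal C sketch of that mechanism follows; it is not Apache’s actual startup code, and the jail path, user/group ID and worker binary are hypothetical. It has to be started as root, because only root may call chroot().

/* Sketch only: change the process's idea of "/" and then drop root,
 * so everything outside the jail directory becomes invisible. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical jail directory, used only for illustration. */
    if (chroot("/cage/apache") != 0 || chdir("/") != 0) {
        perror("chroot");
        return EXIT_FAILURE;
    }

    /* Drop root privileges: a jailed process that keeps running as root
     * can often break back out of the jail. 48 is a placeholder UID/GID. */
    if (setgid(48) != 0 || setuid(48) != 0) {
        perror("setuid");
        return EXIT_FAILURE;
    }

    /* From here on, "/" means /cage/apache on the real filesystem. */
    execl("/bin/httpd-worker", "httpd-worker", (char *)NULL);
    perror("execl");
    return EXIT_FAILURE;
}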

In my book, I tried to explain how to "jail" Apache step by step, by trying to make the readers aware of why and how everything was done. This deep understanding is necessary, because it is really quite tricky to use more complex software and third party modules on a jailed Apache.

You have some unusual outside interests. How did you end up sharing your life between Apache security with Jazz and Ballet?

At the moment it looks like I have a broken knee and I haven’t danced in ages (2 months), which is very sad. Classical Ballet (I am hopeless at jazz) has become part of me. As you can imagine, I spend a lot of time sitting down in front of a computer. Dancing is my escape: I love classical music, and I love feeling fit. You see, when you are training at ballet, you are a sort of a super-human: you never get tired, you are very flexible, and you generally feel good (and as this is a serious interview, I won’t mention the fun of making six pirouettes suddenly in the middle of the footpath while having a stroll with friends).

It’s funny, because I never considered the two things (computers and dancing) to be in contrast.

Anything in the pipeline?

Well, right now I am working on "Free Software Magazine" (www.freesoftwaremagazine.com), a magazine which concentrates entirely on free software.
It has been amazingly challenging. The first issue (January 2005) required a huge effort from many people, but the result is really rewarding.

Tony Mobily Bio

Tony Mobily is the project coordinator of Free Software Magazine.

When he is not talking about himself in the third person, Tony Mobily, BSc, is an ordinary human being, enjoying his life in the best city in the world: Perth (Western Australia). He is a senior system administrator and security expert, and is knowledgeable in several internet technologies. He loves Linux, Apache, Perl, C, and Bash.

Tony has been in the publishing industry his whole life, starting from the Italian magazine Dev. (he is lucky enough to be bilingual) in 1996.

He is also trained in Classical Ballet (ISTD), and is fighting his way through learning hip hop and jazz. He also writes short and long stories, and keeps a blog at http://www.mobily.com.

Matthias Warkus, The Official GNOME 2 Developer’s Guide

The Official GNOME 2 Developer’s Guide
Install Linux and the chances are you’ll be given the choice between a GNOME or a KDE desktop. GNOME is the better known of the two, but if you want to develop applications that use the GNOME environment, where do you start? Well, a good place would be Matthias Warkus’ new book, The Official GNOME 2 Developer’s Guide. I talk to Matthias and ask him about the GNOME system and environment, along with one or two other topics.

Could you describe to us what GNOME is?

GNOME is one of the leading projects developing user-friendly free software. The GNOME community effort includes the GNOME Desktop & Developer Platform, probably the most advanced free desktop environment around, translations, documentation and many third-party applications.

What you actually see on a computer said to be "running GNOME" is a tightly integrated, no-frills desktop system, on par with any commercial offering.

What is the benefit of the GNOME system over more traditional window managers, like Motif?

Actually, neither GNOME nor Motif is a window manager, though both include one :)

The difference is so huge it’s hard to decide where to start. Not only is GNOME’s basic GUI technology (GTK+) much more advanced than the Motif toolkit (it can, for example, display right-to-left scripts such as Hebrew or CJK scripts such as Chinese), but the overall goal of the system is much more ambitious. What GNOME is trying to do is integrate all system components well, and not in the traditional Unix way of providing a default that works in 90% of all cases while everything else has to be fixed by hand; GNOME intends to integrate completely and "Just Work" in all supported environments.

You can witness this sort of integration in the new GNOME support for removable media. Whatever you insert or plug into the system, be it a CD, DVD, digital camera or USB stick, it will instantly be recognised and an appropriate window to access it will be opened.

GNOME seems to encompass a lot more than window dressing. Using your book I was able to create quite complex applications with some fairly advanced widgets with less than a hundred lines of code. Is this fairly common of the GNOME environment?

The GTK+ library stack, which sits at the core of GNOME, includes very powerful widgets, such as the file chooser, colour picker etc., but especially the text and tree/column view widgets based on the model-view-controller paradigm.
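As a rough illustration of how much those stock widgets do for you, here is a hedged C sketch using the GTK+ 2.4 file chooser; the dialogue title and the decision to pass no parent window are arbitrary.

/* Sketch: a complete "pick a file" dialogue in a handful of lines. */
#include <gtk/gtk.h>

static void
choose_file (GtkWindow *parent)
{
    GtkWidget *dialog = gtk_file_chooser_dialog_new ("Open File", parent,
        GTK_FILE_CHOOSER_ACTION_OPEN,
        GTK_STOCK_CANCEL, GTK_RESPONSE_CANCEL,
        GTK_STOCK_OPEN,   GTK_RESPONSE_ACCEPT,
        NULL);

    if (gtk_dialog_run (GTK_DIALOG (dialog)) == GTK_RESPONSE_ACCEPT) {
        char *filename =
            gtk_file_chooser_get_filename (GTK_FILE_CHOOSER (dialog));
        g_print ("selected: %s\n", filename);
        g_free (filename);
    }
    gtk_widget_destroy (dialog);
}

int
main (int argc, char **argv)
{
    gtk_init (&argc, &argv);
    choose_file (NULL);
    return 0;
}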

Other GNOME libraries bring even higher-level functionality. GNOME tries as much as possible to prevent programmers from reinventing the wheel.

You’ve managed to get a good balance in the book between the examples and the reference material. Do you have a favourite example from the book?

I suppose my favourite example would be the GdkPixbuf demo (pp. 132-136), a little thingy that lets you can change the scale and saturation of an image with two sliders. I think it’s less than 300 lines, half of which is comments and whitespace. Very neat example, and impressively small in size considering it’s pure C and not something higher-level such as Python or Java.

Could you tell us a bit more about gconf?

GConf is the solution to the recurring problem of where to store and how to sensibly process configuration values. It’s a self-documenting, typesafe database with a tree structure that is usually saved in XML. Applications connect handlers to GConf keys, and any changes to a key, whether from the app itself, another instance of it or an external configuration tool, will at once apply to all running instances. There is a GConf editor to centrally change all system settings. Default settings can be provided and mandatory settings can be enforced centrally, for all users. The new GConf editor includes special administrator functions to do this. This is essential for the large installations where GNOME is getting popular these days: There are organisations rolling out GNOME on several tens of thousands of desktop computers.
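Here is a hedged C sketch of that pattern, using the GConf 2 client API; the "/apps/myapp/font_size" key is hypothetical, and in a real GNOME application the GLib/GTK+ main loop would be running so that change notifications actually get delivered.

/* Sketch: watch a key, read it, and write it back through GConf. */
#include <gconf/gconf-client.h>

static void
font_size_changed (GConfClient *client, guint cnxn_id,
                   GConfEntry *entry, gpointer user_data)
{
    /* Called whenever the key changes, no matter whether this instance,
     * another instance, or an external tool such as gconf-editor did it. */
    GConfValue *value = gconf_entry_get_value (entry);
    if (value != NULL)
        g_print ("font_size is now %d\n", gconf_value_get_int (value));
}

int
main (int argc, char **argv)
{
    g_type_init ();

    GConfClient *client = gconf_client_get_default ();

    /* Watch a directory and register a handler for one key inside it. */
    gconf_client_add_dir (client, "/apps/myapp",
                          GCONF_CLIENT_PRELOAD_NONE, NULL);
    gconf_client_notify_add (client, "/apps/myapp/font_size",
                             font_size_changed, NULL, NULL, NULL);

    /* Reading and writing are typesafe, per-key operations. */
    gint size = gconf_client_get_int (client, "/apps/myapp/font_size", NULL);
    gconf_client_set_int (client, "/apps/myapp/font_size", size + 1, NULL);

    return 0;
}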

I suppose you could call GConf the Windows registry done right. People used to hate central configuration databases because the nightmare that is the Windows registry was the only one they knew. GConf is starting to change that. It’s really a good idea.

I’ll admit to being new to GnomeVFS. Is this something that could be adopted wider amongst the Linux community?

Actually, because GNOME is not Linux-only, it would need to be adopted across a broader set of platforms. Perhaps the people working on cross-desktop standards specifications at freedesktop.org will make this real one day, who knows?

Anyway, GNOME-VFS is a very nice interface to access files in a network-transparent and asynchronous way. I think its performance has improved a lot over the last months, too. When writing a GNOME application, there’s no reason to use the old libc file access functions anymore; using GNOME-VFS, your application will, at no extra cost, be able to process remote files as well as local files.
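A hedged C sketch of reading a remote file through GNOME-VFS follows; the URI is made up, and the same calls work for local paths, ftp://, smb:// and so on.

/* Sketch: open a URI, stream it to stdout, close it. */
#include <stdio.h>
#include <libgnomevfs/gnome-vfs.h>

int
main (void)
{
    GnomeVFSHandle *handle;
    GnomeVFSFileSize bytes_read;
    char buffer[1024];
    GnomeVFSResult result;

    gnome_vfs_init ();

    result = gnome_vfs_open (&handle, "http://www.example.org/notes.txt",
                             GNOME_VFS_OPEN_READ);
    if (result != GNOME_VFS_OK) {
        fprintf (stderr, "open failed: %s\n",
                 gnome_vfs_result_to_string (result));
        return 1;
    }

    /* gnome_vfs_read() returns GNOME_VFS_ERROR_EOF at end of file,
     * which ends the loop. */
    while (gnome_vfs_read (handle, buffer, sizeof (buffer) - 1,
                           &bytes_read) == GNOME_VFS_OK) {
        buffer[bytes_read] = '\0';
        fputs (buffer, stdout);
    }

    gnome_vfs_close (handle);
    return 0;
}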

Your book focuses on the C API. Are there any other alternatives?

GNOME has officially supported language bindings for C++ (making use of all C++ features in the canonical way, unlike, for example, Qt), and for Java, Perl and Python. Python in particular is popular as a RAD language; in combination with the Glade user interface builder, you can write productive GNOME applications in no time flat.

Unofficial language bindings exist for many languages, including exotic ones; there are bindings for (at least) C#, D, Eiffel, Erlang, Euphoria, Felix, Gauche, Guile, Haskell, JavaScript, Objective-Caml, Pascal, Pike, PHP, Ruby, Scheme, S-Lang, Smalltalk, Tcl, TOM and XBase, though the degree of support varies widely.

Programming GNOME in C# is becoming popular, and the Ruby bindings do also seem to have some success.

What do you think of KDE?

GNOME would not exist without KDE. Probably free software’s desktop ambitions wouldn’t be as visible as they are today. We all owe a great deal to the KDE project.

Being used to GNOME, most KDE applications look confusing to me. I like GNOME’s philosophy of keeping user interfaces as lean as possible. One example: In the default setting, KDE’s file manager presents so many toolbar, sidebar and status bar icons to me that I instinctively want it to just go away again.

I also like GNOME’s way of keeping the number of distinct user-visible components low by integrating new functionality into existing applications. Work is currently being done on a CD/DVD burning framework that will integrate audio CD burning into the audio player etc.; unlike KDE, we don’t think writing an all-singing, all-dancing Nero clone with four different configuration dialogues, several toolbars and theme-able icons is the way to go. Don’t get me wrong - I seriously love the job they did on the underlying functionality and I use K3B all the time. But I don’t really think the interface is appropriate.

Is there a future for both alternatives, or do you see some kind of merging in the future?

GNOME’s well-integrated, no-nonsense desktop, with the excellent Evolution groupware client and several administration and lockdown features, seems better suited than KDE to the large-scale free software desktop deployments we are seeing at the moment.

I don’t think that KDE will ever go away. Neither will GNOME, for that matter. Some years ago, many KDEers kept telling GNOME to just fold and merge with KDE; I don’t hear this anymore. With different views on how user interfaces should look and work and how functionality should be distributed across the system, there is a place in the world for both.

What’s your favourite cartoon character?

Hard question. I suppose that would be Piro from MegaTokyo, with Warren from Absurd Notions and Cerulean the Dragon from "Why the long face" a close second and third.

What are you working on at the moment?

I’m not really working on any GNOME-related things at the moment. My focus is on getting on with my studies; I’m in my fifth semester of philosophy, sociology and French, and my main activity is to learn ancient Greek, which is taking up much of my spare time.

I intend to review the original German edition of my book for an eventual revised and extended second edition, but it seems it’ll be hard to find a publisher, and I haven’t got the time at the moment anyway. I hope someday I’ll have more time to consecrate to working and writing on GNOME.

Matthias Warkus Bio

Matthias Warkus was born and raised in one of the most rural regions of Germany. He started using Linux out of sheer boredom at the age of sixteen. Shortly afterwards, he got involved with GNOME, first as a translator, later also doing promotional work and holding many GNOME-related talks in Germany. He considers himself to be better at writing than at coding, and thus went on to write The Official GNOME 2 Developer’s Guide. Currently, he is a student of philosophy. When he’s not struggling with lofty theories in class or discussing them with his friends in one of Marburg’s countless pubs, he enjoys reading, writing and playing the piano.
