Tom Jackiewicz, Deploying OpenLDAP

OpenLDAP is the directory server of choice if you want a completely free and open source solution to the directory server problem. Tom Jackiewicz is the author of Deploying OpenLDAP, a title that aims to dispel many of the myths and cover the mechanics of using OpenLDAP in your organization. I talked to him about his book, his job (managing OpenLDAP servers) and what he does when he isn't working on an LDAP problem.

Could you summarize the main benefits of LDAP as a directory solution?

There are many solutions to every problem. Some solutions are obviously better than others, and they are widely used for that reason. LDAP was just one solution for a directory implementation. Some people insist that Sony's Betamax was a better solution than VHS; unfortunately for them, it just didn't catch on. The main benefit of using LDAP as a directory solution is the same reason people use VHS now. There might be something better out there, but people haven't heard of it, so it gets no support and defeats the idea of having a centralized directory solution in place. Bigger and better things might exist, but if they stand alone and don't play well with others, they just don't fit into the overall goals of your environment.

If you deploy any of the LDAP implementations that exist today, you instantly have applications that can tie into your directory with ease. For this reason, what used to be a large-scale integration project becomes something that can actually be accomplished. I'm way into standards. I guess LDAP was simple enough for everyone to implement and just caught on. If LDAP existed in the same form it does today but another directory solution was more accepted, maybe I'd be making arguments against using LDAP.

Please read the rest of the interview at LinuxPlanet.

Finding alternatives in developing software

My latest article over at Free Software Magazine is available. This time, I'm looking at the role of free software in development, of both free and proprietary applications. I discuss the benefits of free software and the pitfalls of proprietary solutions. Here's an extract of the intro:

Developing software within the free software model can be achieved with all sorts of different tools, but choosing the right tools can make a big difference to the success of your project. Even if you are developing a proprietary solution, there are benefits to using free software tools to achieve it. But what free software tools are available? In this article I’m going to look at the development tools available, from languages and libraries to development environments, as well as examining the issues surrounding the use of free software tools by comparison to their proprietary equivalents.

You can read the full article.

Patrick Koetter, Ralf Hildebrandt, The Book of Postfix

Postfix is fast becoming a popular alternative to sendmail. Although it can be complex to configure, it's easier to use Postfix with additional filtering applications, such as spam and virus filters, than with some other mail transfer agents. I spoke to Patrick Koetter and Ralf Hildebrandt about The Book of Postfix, the complexities of configuring Postfix, spam, and email security.

How does Postfix compare to sendmail and qmail?

Ralf Hildebrandt (RH): As opposed to sendmail, Postfix was built with security in mind.

As opposed to qmail, Postfix was built with real-life systems in mind, systems that have to adapt to the hardships of the Internet today. qmail is effectively unmaintained.

Patrick Koetter (PK): That's a tough question, because I am not one of those postmasters who spent half their life working with Eric Allman's Sendmail, nor did I spend much time enlarging my knowledge of qmail, so I can't give you a detailed answer that tackles specific features or functionality.

Let me give it a different spin and see if that answers it:

When I set out to run my first mail server of my own, I looked at Sendmail, qmail and Postfix.

Sendmail was too complicated for me to configure: my knowledge of the M4 macro language was very limited, and my fear of losing e-mail, or even of configuring my server to be an open relay, was large, so I dropped it. The ongoing rally of CERT advisories about this or that Sendmail exploit back then didn't make it a hard choice.

Then I took a look at qmail, but wasn't really sure I wanted it, because it is more or less a series of patches if you want to use it with today's range of features. But I gave it a try anyway and ended up asking some questions on the mailing list, because the documentation would not answer what I was looking for.

To cut it short: I was under the impression you had to enter the "Church of qmail" before anyone would take the time to answer a qmail novice's question. It might have changed since then, but back then I left and never looked back, because all I wanted was to run an MTA.

Finally I took a look at Postfix and was very surprised by the amount of documentation that was available. I also immediately fell in love with the configuration syntax, which seemed so simple and clear to me. For a while I thought this must be a very feature-limited MTA, but the more I read, the more I understood that it did almost the same things, but was simply easier to configure.

I finally decided to stick with Postfix after I had joined the Postfix mailing list and found out that people really cared about my questions, pointed me to documentation to read again, or gave me advice on how to do this or that more efficiently.

Of course, as the Postfix community grew larger, the odd character turned up who would rather lecture someone seeking help, but the overall impression still remains the same.

Postfix is well maintained, its security record is unbeaten up to now, and the community is how I wish every community supporting a piece of software would be. The modular software architecture Wietse Venema has chosen makes it easy to expand Postfix's capabilities. It's a system that can grow very well. I haven't seen another piece of software that does the complex job of being an MTA that well.

Postfix seems a little complex to install - there are quite a few configuration files, some of which seem to contain arcane magic to get things working. Is this a downside to the application?

PK: That’s the provoking question, isn’t it? ;)

To me, Postfix is as simple or complex as the process of mail transport itself. That's why we added so many theory chapters to the book that explain the e-mail handling process before explaining how Postfix does it in the follow-up chapter. If you understand the process, it's pretty straightforward to configure Postfix to deal with it.

But basically all you need is three files: main.cf, master.cf and the aliases file. Wait! You could even remove the main.cf file and Postfix would work with reasonable defaults on this specific server.

The main.cf file carries all parameters that are applied globally. If you need options that are specific to a special daemon and should override global options from main.cf, you add them in master.cf in the context of that special daemon. That’s the basic idea of configuring Postfix.
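As a minimal sketch of that split (the hostname, network and filter destination below are placeholders made up for the example, not values from the book), a global parameter lives in main.cf, and a single daemon listed in master.cf can override it with a -o option:

    # main.cf -- parameters that apply to every Postfix daemon
    myhostname = mail.example.com
    mynetworks = 127.0.0.0/8
    content_filter = filter:[127.0.0.1]:10024

    # master.cf -- the listener that receives filtered mail back overrides
    # the global setting so that mail is not sent through the filter twice
    127.0.0.1:10025 inet  n  -  n  -  -  smtpd
      -o content_filter=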

Then there are a lot of tables in the /etc/postfix directory, which you usually don't need unless you set out to configure a specific feature that isn't part of the basic functionality.

Sure, the number of tables might frighten a novice, but they are there for the sole purpose of supporting novices and even advanced users, because they hold the documentation about what each table is for and how you would add entries to it if you wanted to use it.

The rest is complexity added by additional software, for example Cyrus SASL, which is a royal pain for beginners.

Of course, your mileage will vary when you set out to configure a full-blown MTA that incorporates anti-spam measures, anti-virus checking, SMTP authentication and Transport Layer Security, where Postfix looks up recipient names and other information from an LDAP server that also drives an IMAP server.

But when you begin it boils down to the two configuration files and an aliases file.

As for the "arcane magic", I don't exactly understand what you are referring to, so I'll do some speculation based on my own experiences.

I struggled with smtpd_*_restrictions for quite a while until I realized: "It's the mail transport process that makes it so hard to understand." Once you've understood how an SMTP dialog should be processed, it suddenly seems very simple. This is at least what happened to me. I recall hours sitting in front of these restrictions, Ralf ripping hair out of his head and looking at me as if I was from another planet.

The quote we used in the restrictions chapter alludes to that day and it also contains the answer I came up with: “To know what to restrict you need to know what what is.” I looked the “what” parts up in the RFCs, understood what smtpd_*_restrictions were all about and saved Ralf from going mad ;)

But that's specific to smtpd_*_restrictions. For all other parameters and options it pays to read the RFCs as well, but you can get very far by reading the excellent documentation Wietse has written _and_ by looking at the mere names he used for the parameters. Most of the time they speak for themselves and tell you what they will do. I think Wietse has done a great job of thinking up catchy, self-explanatory parameter names.
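As a rough illustration of how such a restriction list maps onto one stage of the SMTP dialog (the policy shown is an assumption made up for this example, not a recommendation from the book), a main.cf entry might look like this:

    # main.cf -- evaluated top to bottom when the client sends RCPT TO
    smtpd_recipient_restrictions =
        permit_mynetworks,
        permit_sasl_authenticated,
        reject_unauth_destination,
        reject_rbl_client relays.example.org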

RH: Postfix works with the default main.cf and master.cf. If you have advanced requirements, the configuration can get elaborate. But configuration files like the ones I created and offer on http://www.stahl.bau.tu-bs.de/~hildeb/postfix/ have evolved over several years of use (and abuse of the Internet by spammers). I never thought "That's the way to do it"; it was rather trial and error.

Postfix seems to work exceptionally well as a mail transport agent - i.e. one that operates as an intermediate relay or relayhost (I’ve just set up a Postfix relay that filters spam and viruses, but ultimately delivers to a sendmail host, for example). Is this because of the flexible external interface Postfix uses?

RH: It also works excellently as a mailbox host :) Over the years, Wietse added features for content filtering and the ability to specify maps that tell the system which recipient addresses should be accepted and sent on further inwards.

That makes it easy to say, "Instead of throwing away our old Product-X server, we simply wedge Postfix in between."

But there's no special preference for it as an "intermediate relay". It's a universal MTA. We use it everywhere, including for the server handling the mailboxes and for our list exploder.

Do you have a preferred deployment platform for Postfix?

PK: Basically I go for any platform that suits the needs. As for Linux, I prefer distributions that don't patch Postfix, but that's only because I support many people with SMTP AUTH issues on the Postfix mailing list, and some maintainers have taken to doing this or that differently, which makes configuring SMTP AUTH even harder.

Personally I'd go for Red Hat Linux because I know it best and produce good results faster than on other platforms. But then I wouldn't hesitate a second to go for something else if it suits the scenario better. That's another side of Postfix I like very much: it runs on many, many systems.

RH: Debian GNU/Linux with Kernel 2.6.x. Patrick begs to differ on the Debian thing. Anyway, it works on any Unixoid OS. I ran it on Solaris and HP-UX back in the old days.

You cover the performance aspects of Postfix. Is it particularly taxing on hardware?

PK: That’s a question that turns up regularly on the Postfix mailing list. Read the archives… ;)

But seriously, you can run Postfix for a single domain on almost any old hardware that's lying around. If your OS works with the hardware, Postfix will probably go along with it as well.

The more domains you add and the more mail you put through, the likelier it is, of course, that you will hit the limits. But those limits usually aren't limits imposed by Postfix; they come from the I/O performance of your hardware.

Think of it this way: mail transport is about writing, moving and copying little files in the filesystem of your computer. The MTA receives a mail from a client and writes it to a mail queue, where it waits for further processing. A scheduler determines the next job for the file, and the message is moved to another queue. There it might wait another while until it gets picked up again to be delivered to another, maybe remote, destination. If the remote server is unreachable at the moment, it will be written back to the filesystem again, to another queue, and so on and so on until it can finally be removed after successful delivery.

The calculation to decide what to do with the mail doesn't take a lot of time, but writing, moving and copying the file takes a lot longer. That's due to the limitations of hardware. Hard discs nowadays really can store a lot of e-mail, but access speed hasn't grown at the same rate. Still, you need to stick with them, because storing the message on a temporary device would lose the mail if the system was suddenly turned off.

So the basic rule is to get fast discs, arrays and controllers when you need to handle _a lot_ of email. Regular hardware does it quite well for private users.

Another slowdown you should be prepared for is when you integrate anti-spam and anti-virus measures. They not only have to read and write the files, they also examine the content, which often requires unpacking attached archives. This will temporarily eat some of your CPU. But that's something current hardware can deal with as well.

For hard facts you will need to find somebody who is willing to come up with a real-world, well-documented test scenario. So far a few people have posted "measurement data", but none of them would really tell about their setup and how they tested. I also don't know of a sophisticated comparison of Sendmail, qmail and Postfix.

Most of the "comparisons" I've heard weren't able to get rid of the odor of "because you wanted it to be better".

Such tests are not what Postfix is about and, as far as I can say without asking him, not what Wietse Venema is about either. I vividly recall him posting "Stop speculating, start measuring!" to someone who came up with a performance problem. I like that attitude a lot, because comparisons should be about facts, not belief.

I enjoyed the in-depth coverage on using certificate based security for authenticating communication between clients and servers. Do you see this as a vital step in the deployment process?

PK: Whether it's vital or not depends on your requirements and your in-house policy. Personally I do like certificate-based relaying a lot and I think it should be used more widely, because you could really track spam down a lot better and gain a more secure mail transport at the same time, but certificate-based relaying simply lacks the critical mass of servers and clients supporting it.

As long as you don't have that critical mass of servers and clients using it, there will always be a relay that does without and that can be tricked into relaying spam one way or another, and you lose track of the sender.

It also takes more work to configure, and especially to maintain, certificate-based relaying, because you need to maintain the list of certificates. You need to remove the ones that have expired, add others, hand out new ones, this and that…

I think it's a "good thing to do [TM]" if you use it in your company, have many mobile users, but most of all (!) have all clients and servers under your control. Then you can automate some of the work that needs to be done, and all that together can pay off in the security and simplicity you get on your network.

But I doubt any private user would be willing to pay the additional fee for maintenance, not to mention the certificate infrastructure needed to maintain the certificates themselves.

Was it Yahoo who had some certificate-based anti-spam measure in mind? So many attempts to fix the effects of spam… I think what we really need is a redesign of SMTP to cope with the current challenges. But that's another topic, and I am certainly not the one to ask how it should be done. ;)

Is it better to use files or MySQL for the control tables in Postfix?

RH: “He said Jehova!”

Performance-wise, MySQL just sucks. The latency for queries is way higher than when asking a file-based map. But then, with MySQL maps, any changes to the map become effective immediately, without the daemons that use the map having to exit and restart. If your maps change often AND you get a lot of mail: MySQL. In all other cases: file-based maps.

And: keep it simple! If you don't NEED MySQL, why use it?
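In Postfix terms the difference is just the lookup-table type named in the configuration; a hypothetical sketch (the file names and the alias table are placeholders):

    # file-based map: fast lookups, but you must run postmap after editing
    virtual_alias_maps = hash:/etc/postfix/virtual

    # MySQL-backed map: changes in the database take effect immediately
    virtual_alias_maps = mysql:/etc/postfix/mysql-virtual.cf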

PK: I don't think there's a better or worse, because either way you lose or gain something, and what you lose and gain aren't the same things:

From a performance point of view, you lose a lot of time when you use SQL or LDAP databases because of their higher lookup latency, so you might want to stick with the files.

But then, if you host many domains, you win a lot when you maintain the data in a database. You can delegate many administrative tasks to the end user, who accesses such a database through some web frontend. So there's the pro for databases.

If you need both performance and maintainability, you can build a chain from databases and files. The editing is done in the database, and a job on your computer checks the database on a regular basis and builds (new) files from it when the data has changed. This way you get the best of both worlds, for the price of a little delay after changes have been made in the database.
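One possible sketch of such a chain, run from cron, might look like this (the database, table and column names are assumptions made up for the example):

    #!/bin/sh
    # export the alias data from MySQL into a flat file...
    mysql --batch --skip-column-names maildb \
          -e 'SELECT address, goto FROM virtual_aliases' \
          > /etc/postfix/virtual.new
    mv /etc/postfix/virtual.new /etc/postfix/virtual
    # ...and rebuild the indexed map that Postfix actually reads
    postmap hash:/etc/postfix/virtual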

IMAP or POP?

PK: An old couple sits in the kitchen at home.

She: “Let’s go to the movies.”
He: “But we have been to the movies just recently…”
She: “Yes, but they show movies in colour AND with sound now!”

Definitely IMAP ;)

RH: Depends on your needs. Let the user decide: go for courier-imap (which also does pop), so the user can choose.

Is there a simple solution to the spam problem?

RH: Mind control? Orbital lasers? No, but Postfix’s restrictions and the possibility of delegating policy decisions to external programs can help.

PK: No, unfortunately not. There are too many reasons why spam works, and a working solution would have to be technical, political and business-oriented at the same time.

First of all, it works because the SMTP protocol as designed has little to no means of proving that a message was really sent by the sender given in the e-mail. Anybody can claim to be anybody. As long as this design problem persists, it will cost a fortune to track spammers down.

Even if you know where the spam came from, the spammer might have withdrawn to a country that doesn't mind spammers and will protect them from being pursued under foreign law.

The world simply lacks anti-spam laws that all countries agree on. You are typically forced to end your chase for a spammer the moment you cross another country's borders, because you are not entitled to pursue the suspect there.

Still, even if you were entitled to do so, it costs a fortune to track a spammer down, and even then it might take ages to get some money for the damage they have done. Is your company willing to pay that much just to nail one spammer down when another two emerge the moment that one goes behind bars?

And then spam works because it is so cheap. You can buy a hundred thousand addresses for 250 bucks or even less, and IIRC Yahoo found out that a third of their mail users read spam and VISIT the pages it promotes.

If you want to make it go away, you must make it expensive for those who send or endorse spam. If you ruin the business concept, no one will send spam. That's business… ;)

To sum up my position: the problem is global and we don't have the right tools to hinder the cause. Currently all we can do is diminish the effect, by using as many anti-spam features as we can think of.

Do either of you have a favourite comic book hero?

PK: The Tasmanian Devil is my all-time favourite. I even have a little plastic figure sitting in front of me under my monitor, which has become some kind of talisman. It reminds me to smile about myself on days when I'd rather go out and kill somebody else for not being the way I would want them to be ;)

RH: Calvin (of Calvin and Hobbes)
or
Too much Coffee Man!

Author Bios
Ralf Hildebrandt and Patrick Koetter are active and well-known figures in the Postfix community. Hildebrandt is a systems engineer for T-NetPro, a German telecommunications company, and Koetter runs his own company consulting and developing corporate communication for customers in Europe and Africa. Both have spoken about Postfix at industry conferences and contribute regularly to a number of open source mailing lists.

Cristian Darie, Mihai Bucica, Beginning PHP 5 and MySQL E-Commerce

PHP and MySQL are common solutions in many web development situations. However, when using them for e-commerce sites, some different techniques should be employed to get the best out of the platforms. I talked to Cristian Darie and Mihai Bucica about their new book, which uses an interesting approach to demonstrating the required techniques: the book builds an entire T-shirt ordering shop.

Could you give me, in a nutshell, the main focus of the book?

When writing “Beginning PHP 5 and MySQL E-Commerce”, we had two big goals of equal importance in mind. The first goal was to teach the reader how to approach the development of a data-driven web application with PHP and MySQL. We met this goal by taking a case-study approach, and we did our best to mix new theory and practice of incremental complexity in each chapter.

The second goal was to provide the knowledge necessary to build a fully functional e-commerce website. We did our best to simulate development in a real world environment, where you start with an initial set of requirements and on a low budget, and along the way (eventually after the company expands), new requirements show up and need to be addressed.

You can check out the website that we build in the book at http://web.cristiandarie.ro:8080/tshirtshop/.

Why use PHP and MySQL for e-commerce? Do you think it’s easier to develop e-commerce sites with open source tools like PHP and MySQL?

Generally speaking, the best technology is the one you know.

PHP and MySQL is an excellent technology mix for building data-driven websites of small and medium complexity. The technologies are stable and reliable, and the performance is good.

However, we actually don't advocate using any particular technology, because we live in the real world, where each technology has its strengths and weaknesses, and each project has its own particularities that can lead to choosing one technology over another. For example, if the client already has an infrastructure built on Microsoft technologies, it would probably be a bit hard to convince him or her to use PHP.

As many already know, for developers who prefer (or must use) ASP.NET and SQL Server, Cristian co-authored a book for them as well - "Beginning ASP.NET 1.1 E-Commerce: From Novice to Professional", with the ASP.NET 2.0 edition coming out later this year.

You de-mystify some of the tricks of the e-commerce trade - like rankings and recommendations; do you think these tricks have a significant impact on the usability of your site?

The impact of these kinds of tricks is very important, not only from the usability point of view, but also because your competitors already have these features implemented. If you don't want them to steal your customers (or sell more than you do), read the nine-page chapter about implementing product recommendations, and add that feature to your own website as well.

You don’t use transactional database techniques in your book - is this something that you would recommend for heavy-use sites?

Yes. “Beginning PHP 5 and MySQL E-Commerce” is addressed to beginning to intermediate programmers, building small to medium e-commerce websites - as are the vast majority of e-commerce websites nowadays. The architecture we’re offering is appropriate for this kind of website, and it doesn’t require using database transactions. For a complex, heavy-traffic website, a more advanced solution would be recommended, and we may write another book to cover that scenario.

The book shows in detail the implementation of an e-commerce website - do you know of anybody using this code for their own site?

Although the book is quite new, we’ve received lots of feedback from readers, some of them showing us their customized solutions based on the code shown in this book. Some of these solutions are about to be launched to production.

Credit card transactions always seemed to be the bane of e-commerce, especially for open source technology. Is it true that this has become easier recently?

Yes. Because many more e-commerce websites are built with open source technologies than there used to be, the payment gateways have started providing APIs, documentation and examples for PHP, just as they do for .NET and Java. This makes the life of the developer much easier.

Larger e-commerce applications may require more extensive deployment environments - are the techniques you cover here suitable for deployment in a multi-server environment?

PHP has its own limitations that make it inappropriate for extremely complex applications, but for the vast majority of cases PHP is just fine. The techniques we cover in the book aren't meant to be used in multi-server environments; for those kinds of environments PHP may not be your best choice, but then again, it all depends on the particularities of the system.

Obviously PHP and MySQL provide the inner workings to an e-commerce site. Do you think the website design is as important as the implementation?

Of course, the website design is critical, because it reflects the “face” of your business. As we’ve mentioned in the book, it just doesn’t matter what rocket science was used to build the site, if the site is boring, hard to find, or easy to forget. Always make sure you have a good web designer to complement the programmers’ skills.

What do you do to relax?

Well, we’re both doing a good job at being 24 years old…

Author Bios

Mihai Bucica

Mihai Bucica started programming and competing in programming contests (winning many of them), all at age twelve. With a bachelor's degree in computer science from the Automatic Control and Computers Faculty of the Politehnica University of Bucharest, Romania, Bucica works as an Outsourcing Project Manager for Galaxy Soft SRL. Even after working with a multitude of languages and technologies, Bucica's programming language of choice remains C++, and he loves the LGPL world.

Cristian Darie

Cristian Darie, currently the technical lead for the Better Business Bureau Romania, is an experienced programmer specializing in open source and Microsoft technologies, and relational database management systems. In the last 5 years he has designed, deployed, and optimized many data-oriented software applications while working as a consultant for a wide variety of companies. Cristian co-authored several programming books for Apress, Wrox, and Packt Publishing, including Beginning ASP .NET 2.0 E-Commerce, Beginning PHP 5 and MySQL E-Commerce, Building Websites With The ASP.NET Community Starter Kit, and The Programmer’s Guide to SQL. Cristian can be contacted through his personal website, www.CristianDarie.ro.

Garrett Rooney, Practical Subversion

Subversion is having what can only be described as a subversive effect on the versioning software environment. CVS has long been the standard amongst programmers, but it has its faults, and Subversion (read Sub-version) addresses both the known and perceived faults of CVS. I talked to Garrett Rooney about his book Practical Subversion, his contributions to the Subversion code, and where Subversion fits into the scheme of your administration and development environments.

I see from the book you are a strong believer in version control - can you summarize the main benefits of version control?

I like to think of version control as a way of communicating information between developers.

When you commit a change to a source tree you can think of it as an automated way of telling every other developer how they can fix the same problem in their source tree. The benefits go further though, since in addition to keeping everyone on the team up to date with the latest fixes, you're also recording all of the history. This means that later on, when you want to figure out how a piece of code got the way it is, you can look at the series of changes (and hopefully the justification for the changes, if you've been good about writing log messages) that led to the current situation. Looking at that history is often the best way to understand why the code got the way it is, which means you're less likely to make the same mistake twice when making new changes.

So version control is really a way to help you communicate, both with other people working on your project right now and with those working on it in the future.

There’s been a lot of discussion online about the benefits of Subversion compared to the previous preferred environment of CVS. How much better is Subversion?

I recently had to start using CVS again, after a rather long period of time where I’d only used either Subversion or Perforce, a commercial version control system. It never ceases to amaze me, whenever I go back to CVS, how irritating it is to use.

Let’s start with the basics. Lots of things in CVS are slow.

Specifically, lots of operations that I like to do fairly often ('cvs diff' is the big one here) need to contact the repository in order to work. This means going out over a network, which means it's pretty slow. In Subversion the equivalent command is lightning quick, since your working copy keeps a cached copy of each file, so it doesn't have to contact the server in order to show you the difference between the file you started with and the new version you created.

There are other parts of CVS that are also quite slow when compared to Subversion. In CVS the act of tagging or branching your source tree requires you to make a small change to each and every file in the tree. This takes a lot of time for a large tree, and a noticeable amount of disk space. In Subversion the equivalent operation takes a constant, and very small, amount of time and disk space.

The other big improvement is the fact that in Subversion changes are committed to the source tree in an atomic fashion. Either the entire change makes it in or none of it does. In CVS you can get into a situation where you updated your working copy in the middle of a commit, resulting in you getting only half of the changes, and thus a broken source tree. In Subversion this doesn’t happen.

The same mechanism means that it's much easier to talk about changes in Subversion than in CVS. In CVS, if you have a change to five separate files, in order to talk about it you need to talk about the individual change to each file: "I committed revision 1.4 of foo.c, 1.19 of bar.c, …" This means that if someone wants to look at the change you made to each file they have to go look at each individual file to do it. In Subversion you just say "I committed revision 105", and anyone who wants to look at the diff can just say something like "svn diff -r104:105" to see the difference between revision 104 and revision 105 of the entire tree. This is also quite useful when merging changes between branches, something that's quite difficult in CVS.

Finally, the user interface provided by the Subversion client is simply nicer than the one provided by CVS. It’s more consistent, and generally easier to use. Enough things are similar to CVS that a CVS user can easily get up to speed, but the commands generally make sense to a new user, as compared to those of CVS which can be rather confusing.

How does Subversion compare with version controls other than CVS, BitKeeper for example has been in the news a lot recently. How about commercial products, like Visual SourceSafe or ClearCase?

I've personally never used BitKeeper, largely because of its license. While BK was available under a "free as in beer" license for use in developing open source software, the license prohibited users from working on competing products, like Subversion. As a result I've never really had a chance to try it out.

I do think that BitKeeper has some interesting ideas though, and the other distributed version control systems (Arch, Darcs, Bazaar-NG, etc) are all on my radar. I don’t know if I’m convinced of their advantages over centralized systems like Subversion, but there is interesting work being done here. Personally, of the three distributed systems I just mentioned I’m most interested in Bazaar-NG (http://bazaar-ng.org/).

As for the commercial products out there, I’ve had personal experience with Perforce and SourceSafe. I wasn’t impressed with SourceSafe at all, and I really can’t think of a situation where I’d use it willingly. Perforce on the other hand is a very nice system. Its branching and merging support is superior to what Subversion provides at the moment (although the Subversion team has plans to close that gap in the future). That said, Perforce is expensive, and unless you really need specific features that can only be found there I wouldn’t see much reason to go with it.

You sound like you’ve had a lot of personal experience of where the right source control mechanism has saved your life. Any true tales that might help highlight the benefits of version control?

Personally, my most memorable experiences where version control would have been a lifesaver are those from before I started making use of it on a daily basis.

I know that back in college there were several times when I was working late at night on some project, usually due within a few hours, and I managed to screw things up badly. It's remarkable how easy it is to go from a version of a program that's mostly working to one that's totally screwed up, all while trying to fix that last bug. It's especially bad when your attempt to fix that last bug introduces more problems, and you can no longer remember exactly what you changed.

With a version control system, you never really need to be in that situation. At the absolute worst, you can always roll back to the version of the code you had at your last commit. It’s impossible to get stuck in that situation where you can’t figure out what you changed because the system will remember for you.

Now that all of my non-trivial projects (and most of my trivial ones, honestly) make use of version control, I just don't find myself in those kinds of situations anymore.

Existing developers will almost certainly need to migrate to Subversion - how easy is this?

It's easy to underestimate the problems that come with migrating from one version control system to another. Technically, a conversion is usually pretty straightforward. There are various programs available to migrate your data (cvs2svn for CVS repositories, p42svn for Perforce, and others), and in many cases it can be tempting to just run the conversion, toss your users some documentation and away you go.

Unfortunately, it isn't that simple. Version control becomes part of a developer's day-to-day workflow, and changing something like that has consequences. There needs to be careful planning, and most importantly you need to have buy-in from the people involved.

Another Subversion developer, Brian Fitzpatrick, will actually be giving a talk about this very subject at OSCON this year, and I’m looking forward to hearing what he has to say.

http://conferences.oreillynet.com/cs/os2005/view/e_sess/6750

Some versioning systems have problems with anything other than text. What file types does Subversion support?

Subversion, by default, treats all files as binary. Over the network, and within the repository, files are always treated as binary blobs of data. Binary diff algorithms are used to efficiently store changes to files, and in a very real sense text and binary files are treated identically.

Optionally, there are various ways you can tell Subversion to treat particular files as something other than binary.

If you want end-of-line conversion to be performed on a file, for example so that it shows up as a DOS-style file when checked out on Windows but a Unix-style file when checked out on a Unix machine, all you have to do is set the svn:eol-style property on the file.

Similarly, if you want keyword substitution to be performed on a file, so that words like $Revision$ or $Date$ are replaced with the revision and date the file was last changed, you can set the svn:keywords property to indicate that.
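Setting either property is a one-line command in a working copy; a quick sketch (the file names here are just examples):

    # ask for native end-of-line conversion on a text file
    svn propset svn:eol-style native README.txt

    # expand $Revision$ and $Date$ keywords in a source file
    svn propset svn:keywords "Revision Date" main.c

    svn commit -m "Enable eol-style and keyword expansion"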

The key fact to keep in mind is that in Subversion these are optional features that are turned off by default. By their very nature they require that Subversion make changes to your files, which can be catastrophic in some cases (changing all the \r\n's in a binary file to \n's isn't likely to work very well), so you need to ask for this behavior if you want it. In systems like CVS these kinds of features are turned on by default, which has resulted in countless hours of pain for CVS users over the years.

From reading your book, it’s obvious that Subversion seems a little bit more application friendly than CVS, integrating with Apache, emacs and others with a little more grace than CVS and RCS. Is that really the case?

Well, let’s be fair, CVS and RCS have quite good integration with various tools, ranging from Emacs to Eclipse. That said, it hasn’t been easy to get to that point. In many cases tools that want to integrate with CVS have to jump through hoops to call out to the command line client and parse the resulting output, which can be fragile. In cases where that isn’t possible many projects have reimplemented CVS so that it could be more easily integrated.

In Subversion many of these problems are alleviated by the fact that the core functionality is implemented as a collection of software libraries. If you want to make use of Subversion in your own code, all you need to do is link against the Subversion libraries and you can provide exactly the same functionality as the official Subversion client. If you're not working in C or C++, there are probably bindings for the Subversion libraries written in your language of choice, so you can even do this without having to learn a lot about the C-level libraries.

Additionally, Subversion’s ability to integrate with Apache has provided a number of abilities, ranging from WebDAV integration to the ability to use an SQL or LDAP database for storing usernames and passwords, that otherwise would have been incredibly difficult to implement. By working within the Apache framework we get all of that for free.

Subversion includes autoversioning support for DAV volumes, could you explain how you could use that to your advantage?

DAV autoversioning is most useful when you need to allow non-technical users, who would be uncomfortable making use of normal Subversion clients, to work with your versioned resources. This could mean source code, but most commonly it involves graphics files or Word documents and other things like that. Your users simply use a DAV client (which is often built into their operating system) to access the files, and the version control happens transparently, without them even knowing about it. When they save their changes to the file, it is automatically committed to the repository. This is a very powerful tool, and can give you some of the advantages of version control without costly training for your users.
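On the server side, autoversioning is a mod_dav_svn switch in the Apache configuration; a minimal sketch (the location and repository path are placeholders):

    # httpd.conf -- publish a repository over DAV and version writes automatically
    <Location /repos>
        DAV svn
        SVNPath /var/svn/repos
        SVNAutoversioning on
    </Location>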

Most people use version control for their development projects, but I’ve also found it useful for recording configuration file changes. Is that something you would advocate?

Absolutely! I personally keep much of my home directory under Subversion’s control, allowing me to version my editor’s config files, .bashrc, et cetera. All the same benefits you can get from using version control with software development are just as applicable for configuration files.
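Getting started with that is just ordinary Subversion usage; as a rough sketch (the repository URL and file names are placeholders):

    # check an empty repository path out into a config directory
    svn checkout file:///var/svn/dotfiles/trunk ~/config
    cp ~/.bashrc ~/config/
    cd ~/config
    svn add .bashrc
    svn commit -m "Start tracking .bashrc"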

How important do you think it is for high quality tools like Subversion to be Open Source?

I’m a big fan of using open source licensing and development models for infrastructure level software (operating systems, compilers, development tools, etc). Well, honestly I’m a big fan of using open source licenses and development models for most kinds of software, but I think it’s particularly appropriate for software at the infrastructure level, where the primary use case is building larger systems.

It’s difficult to imagine a company being able to make money developing a new operating system in this day and age, or a version control system, or a C runtime library. These are largely commoditized parts of the software ecosystem, and as a result I think it makes sense for the various people who benefit from having them available to share the cost for producing them, and the best way we currently have to do that is via open source.

Additionally, it's difficult to overstate the value of having high-quality systems out there for people to learn from. I've learned a great deal by reading open source code, and even more by participating in open source projects.

Finally though, I just like the fact that if I find a problem in a piece of open source software like Subversion I can actually do something about it. I’ve worked with closed source third party software in the past, and I’ve found that I tend to spend a lot of time digging through inadequate documentation and beating my head against the wall while trying to work around bugs. With an open source product you can at least make an attempt to figure out what the actual problem is.

Contributing to Subversion in your spare time doesn't seem like a very relaxing way to spend your free time. Is there something less computer-based that you like to do?

I don't know, there's something fun about working on open source projects. It's awfully nice to have the freedom to do things the "right" way, as opposed to the "get it done right now" way, which happens far too often in the commercial software world.

That said, I do try to get off of the computer from time to time. I see a lot of movies, read a lot, and lately I’ve been picking up photography. I also just moved to Silicon Valley, so I’m making a concerted effort to explore the area.

Anything else in the pipeline you’d like to tell us about?

Well, Subversion 1.2 is on its way out the door any day now, and that’ll bring with it some great new features, primarily support for locking of files, something that many users had been requesting.

As for my other projects, I’m going to be giving a talk at O’Reilly’s OSCON again in August. This year I’ll be speaking about the issues regarding backwards compatibility in open source software. I’ve also been spending a lot of time on the Lucene4c project (http://incubator.apache.org/lucene4c/), trying to provide a C level API to access the Apache Lucene search engine.

Garrett Rooney Bio

Garrett Rooney works for Ask Jeeves, in Los Gatos CA, on Bloglines.com. Rooney attended Rensselaer Polytechnic Institute, where he managed to complete 3 years of a mechanical engineering degree before coming to his senses and realizing he wanted to get a job where someone would pay him to play with computers. Since then, Rooney completed a computer science degree at RPI and has spent far too much time working on a wide variety of open source projects, most notably Subversion.

Computerworld Blogs

Computerworld has set up a dedicated blogging area on its site at Computerworld blogs.

There are a few of us there; all dedicated to blogging on different news stories in a range of different areas and topics. You can read my blog at the dedicated Martin MC Brown Computerworld blog.

Alternatively, you can subscribe to my dedicated RSS feed.

You can see that we've been populating it over the last week or so; there are already blog posts from me, and others, about a variety of topics.

Please feel free to read and either comment there, or here and let me know how I’m getting on.

FOSS Anniversaries

In the last LinuxWorld article I wrote for the magazine, I talked about FOSS anniversaries, mostly because a number of important projects have moved into double figures, and yet most people let it pass them by.

Talk to young programmers and developers today and you’d be fooled into thinking that free/open source software (FOSS) was a relatively new invention. Those crusty old folk among us (myself included, born in that prehistoric era of the early ’70s) know that it goes back a little further than that.

Many of us become dewy-eyed about our memories of Linux when it first came out - or the first Red Hat release. In fact, many of the FOSS projects that we take for granted today are a heck of a lot older than people realize.

And my final request:

To try and redress the balance I’m starting a FOSS anniversaries project. Initially it’s going to be held on my personal blog at http://mcslp.com - click on the FOSS Anniversaries link to go to the page. If I get enough interest, I’ll consider improving on it and moving it elsewhere. Until then, if you’ve got some additions or corrections, use the contact form to let me know.

Here is the FOSS Anniversaries page, which is on this site. If you want me to update anything, use the Contact page.

Session Tracking With Apache

My new piece on how to track user sessions on your website with Apache is available on ServerWatch.com. Here’s an excerpt:

Using HTTP logs to track the users who visit your site isn't always as useful as you think it's going to be. While metrics, like the total number of page hits and, within that, page hits over time or from a specific IP address, are easy to identify, they don't always tell how people are viewing your site or answer specific questions the marketing department may pose.

This article looks at how to track progress through a site using an Apache module and provides answers to some of the more complex marketing-led questions that may be posed.

Read on for the rest of the article.

Kyle Rankin, Knoppix Hacks

Knoppix is not just another Linux distribution. Unlike many Linux alternatives, Knoppix doesn't need to be installed; everything runs from a CD (called a 'Live CD' distribution). While Live CDs aren't unique to Knoppix, it is the way the Knoppix CD is packaged that makes the difference. Knoppix includes intelligent hardware detection – it can automatically identify nearly everything on your machine and then make the best of it – and the CD includes a wide selection of programs, from typical Linux applications through to repair utilities and tools.

I talked to Kyle Rankin, author of Knoppix Hacks, about how the book idea was formed, how he chose the contents and some of the things you can do with Knoppix.

OK - I can't make up my mind whether I've fallen in love with Knoppix or the Knoppix Hacks book. What led to the production of this book?

A friend of mine who works at O'Reilly heard that they were looking for someone to do a Knoppix book for them. Not too long before, he had seen me use Knoppix at an installfest to resize someone's Windows partition and set up Debian in a relatively short amount of time. He approached me with the news and encouraged me to send them a book proposal. I had never written a book before, but I personally used Knoppix a lot, especially as a recovery tool. I thought a Hacks book applied to Knoppix was a great idea, so I started jotting down ideas and submitted a formal proposal for the book, which was accepted. Add months of furious writing and Knoppix Hacks was born. I started the book liking Knoppix and finished the book absolutely loving it.

What impressed me most is the range and usefulness of the hacks - I immediately felt like trying them out, even if I didn’t want to image my partition. How did you pick the hacks that made it into the book?

Thanks. When writing the book, I realized that you could organize the ways that people use Knoppix into a few general categories: desktop use, a Linux installer, a systems administrator tool, a rescue CD, and a platform to create your own live CD. We had a discussion about whether to make the book mostly focused on more advanced topics like system recovery, sysadmin hacks, and remastering, but decided that since Knoppix was used by all sorts of people at many different skill levels, it made more sense to represent all of the different types of use in different chapters. In particular, when I wrote the Linux and Windows repair chapters, I tried to think of all of the different recovery scenarios that I have found myself in, and how I used Knoppix to fix them. My goal was to create a list of common recovery steps that a sysadmin in a jam would reach for before anything else. Along the way I discovered some really clever recovery techniques you could use Knoppix for that I hadn't known about previously (like Windows registry hacking).

Knoppix is obviously a practical way to do a great many things; can it also be used as a general desktop OS?

Knoppix was actually originally created just to be a portable Linux distribution for Klaus Knopper to take with him to different computers. From the very beginning it was intended first and foremost to be a desktop OS. The excellent hardware detection makes it much easier to take the CD from computer to computer, and there are a number of scripts in place that allow you to keep your settings no matter what computer you are in front of.

What do you do about user storage? Can I use a USB key, for example?

Yes, you can use basically any writable media you might have (that Knoppix can detect) to store user files, including floppy drives, hard drives on the system, and USB keys. There are a few different scripts included with Knoppix that automate the process of storing data to writable media, so it's really just a matter of a few clicks to save settings. Then you just use a cheat code when you boot Knoppix to tell it to restore your settings the next time you boot.

Staying on the topic of alternative storage media, is it possible to use Knoppix on DVD, USB key or smaller storage media, like Compact Flash?

Knoppix can be remastered and used on a DVD, and in fact there are a few Knoppix variants that have done just this. Klaus Knopper has announced his intention to start shipping a formal DVD version of Knoppix as soon as this summer. Knoppix is pretty large, so the process of stripping it down to smaller media such as a USB key or flash drive can be difficult. Luckily there already are a number of other distributions, such as Feather Linux, that make it easy to set up and use on a USB key.

Is there any reason why I shouldn’t simply write my Knoppix image to my hard disk and never use the CD ever again?

A number of people have installed Knoppix to a hard drive as a permanent solution over the years, and in fact there is a nice GUI that automates the process. However, Knoppix was designed to be run from CD-ROM, and Klaus mixes packages from a number of different Debian repositories. This can make upgrading in the future quite a headache, so I generally recommend that people immediately dist-upgrade to Debian Sid if they install Knoppix (and I include a hack in the book that talks about how to do this). Alternatively, there are other distributions that make Debian easy to install, like Ubuntu and Kanotix, that are also much easier to upgrade.

Some of the tools represent what can only be classed as an administrator's dream. Image partitioning, copying and repair tools are all on the Knoppix CD. Could you tell us a little more about these hacks and how they can be exploited?

It's actually pretty amazing how many different administrator tools Knoppix includes. Some of the things that really surprised me were the complete Apache and BIND servers included on the CD, so in a pinch you can set up a number of different emergency servers. A friend of mine actually used this idea when a webserver of his was hacked into. He needed to be able to serve the pages while not having the compromised server actually up and running, so he booted Knoppix and served the web pages directly from its Apache server. It's especially interesting to introduce Knoppix to a systems administrator who is mostly used to proprietary (and often expensive) Windows admin tools. You can use dd or partimage to image disks locally or over the network, you can graphically resize partitions on the fly with QTParted, you can scan systems for viruses and rootkits, perform forensics scanning, wipe hard drives, plus a number of other things, all from this single free CD. Also, Knoppix makes for a great sanity check when you suspect hardware is bad: you can test the RAM as well as other hardware from the CD.
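For instance, imaging a disk from a Knoppix shell needs nothing beyond the tools already on the CD; a hypothetical sketch (the device name and remote host are placeholders):

    # copy a whole disk, compressed, to another machine over ssh
    dd if=/dev/hda bs=4M | gzip -c | ssh backup@server 'cat > hda-image.gz'

    # or image a single partition interactively
    partimage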

The Knoppix idea seems so obvious - does it surprise you that it’s a relatively recent invention?

Over time there have been a number of different rescue floppies and CDs like tomsrtbt and the old LinuxCare bootable business card, but what continues to surprise me with Knoppix is just how incredibly flexible and useful the CD is. You can use it to demo Linux to a newcomer, fix a broken Windows system, and scan a Linux server for rootkits all from the same CD. There are hundreds of different Live CDs out there, many based on Knoppix, but I’ve found that I keep coming back to Knoppix for day-to-day use just because of how flexible it is.

I’m hoping you have a Knoppix Hacks Volume 2 in the works?

Well, I have actually recently finished a Knoppix Pocket Reference for O'Reilly that should be out in July. As the name indicates, it is much more of a reference and, even though it is small, it covers a lot of ground that Knoppix Hacks didn't, while containing a lot of the sort of Knoppix tips you'd want to carry around in your pocket. As far as a second edition of Knoppix Hacks goes, Knoppix continues to add interesting functionality (for instance, I can think of a number of really powerful hacks you can do just with the new UnionFS system in 3.8), so a second edition is a possibility, but nothing is officially planned.

Obviously MacGyver is a favourite, but is there anything else you like to watch when relaxing?

What is this 'relaxing' you speak of? Actually, when I'm not working or writing, I like to watch The Daily Show, and I have been a long-time fan of The Simpsons. I've noticed my wife and I have been watching more movies these days than TV shows (Netflix probably has something to do with that). Any remaining free time I have seems to be absorbed by IRC.

Kyle Rankin Bio

Kyle is a system administrator for The Green Sheet, Inc., the current president of the North Bay Linux Users Group, and the author of Knoppix Hacks. Kyle has been using Linux in one form or another since early 1998. In his free time he does pretty much the same thing he does at work–works with Linux.

David Sklar, Essential PHP Tools: Modules, Extensions, and Accelerators

PHP is a popular web development/deployment platform, and you can get even more out of it by using the extensions and tools available on the web to extend PHP's capabilities. I talked to David Sklar, author of Essential PHP Tools, about his new book and PHP development.

Why do you use PHP?

It’s proven itself to be a flexible and capable solution for building lots of web applications. I’m a big fan of the "use the right tool for the job" philosophy. PHP isn’t the right tool for every job, but when you need to build a dynamic web app, it’s hard to beat.

Could you tell me what guided your thoughts on the solutions you feature in the book?

They're solutions to problems I've needed to solve. Code reuse is a wonderful thing and PEAR makes it easy. It's a frustrating waste of time to write code that does boring stuff like populating form fields with appropriately escaped user input when you're redisplaying a form because of an error. HTML_QuickForm does it for you. The Auth module transparently accommodates many different kinds of data stores for authentication information. One project might require a database, another an LDAP server. With PEAR Auth, the only difference between the two would be one or two lines of configuration for Auth.
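As a rough sketch of what that looks like (the DSN and the login-form function are invented for the example), switching data stores really is mostly a matter of the container name and its options:

    <?php
    require_once 'Auth.php';   // PEAR Auth

    // database-backed authentication; swapping 'DB' and the DSN for 'LDAP'
    // and LDAP options would switch the data store
    $options = array('dsn' => 'mysql://user:secret@localhost/webapp');
    $auth = new Auth('DB', $options, 'showLoginForm');

    $auth->start();
    if ($auth->checkAuth()) {
        echo 'Welcome, ' . htmlspecialchars($auth->getUsername());
    }

    function showLoginForm() {
        // print the login form here
    }
    ?>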

Do you think PHP provides a richer environment for Web publishing than, say, Perl or Python?

I don’t know much about Python, so I can’t compare it with PHP. I know a moderate amount about Perl, so I can (moderately) compare it with Perl. (And if those caveats aren’t enough, I’ll also add that "environment" is a loaded term — I suppose it could encompass not just the functions and libraries in a language, but IDEs, debugging and deployment tools, and so on.)

The big difference for me, when it comes to web development, between PHP and Perl is that the PHP interpreter assumes that a given program is going to be generating a web page (unless you tell it otherwise), while the Perl interpreter assumes (again, unless you tell it otherwise) that a given program is going to read a bunch of stuff from standard in, mess with it, and print it to standard out.

In PHP, you don’t have to do anything special to access form, cookie, and URL variables — they’re in the auto-global arrays $_POST, $_COOKIE, and $_GET. Similarly, HTTP headers are in $_SERVER. The PHP interpreter emits a Content-Type: text/html header unless you tell it to do something else. In Perl, you have to go through some rigamarole (admittedly, just a little bit of rigamarole) to do that web-centric set up.
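
A minimal sketch of what that web-centric default behaviour looks like in practice (the file name and parameter are just examples):

    <?php
    // greet.php -- form/URL input arrives in the auto-global arrays with no
    // extra setup, and output goes out as text/html unless you say otherwise.
    $name = isset($_GET['name']) ? $_GET['name'] : 'world';

    // Uncomment to override the default Content-Type header:
    // header('Content-Type: text/plain');

    echo 'Hello, ' . htmlspecialchars($name) . '!';
    ?>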

(The flip side of this, of course, is that if you want to write a program in PHP to munge files, you have to do more work than in Perl.)

Perl is a great programming language and you can use it to solve web programming problems quite capably. So is PHP.

You seem to be a fan of Web Services. Do you see them simply as a useful tool, or as a more serious way of providing services over the web?

Like many things, promise_of("Web Services") > current_usefulness("Web Services"). A lot of the neat stuff about SOAP – automatically generating WSDL from classes and encoding and decoding complex data types – is more difficult in PHP because of PHP’s loosey-goosey type system. Nevertheless, I think SOAP can be great in situations where you need custom data types and you have sharp separations between the folks who implement and maintain the functionality being exposed by SOAP and the folks who use those functions. When you have control over both ends of the conversation, or don’t need to encapsulate such complicated relationships in your data structures, XMLRPC or just a homegrown RESTful interface is fine.

Security is a vital part of web programming, particularly when working with forms and other data. Any tips?

htmlspecialchars(): encode external input with htmlspecialchars() or htmlentities() before printing it, to avoid cross-site scripting attacks. Not doing this is probably the most widely committed PHP (and web application development) security error.
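
For example (using a hypothetical 'comment' form parameter):

    <?php
    // Escape external input before printing it; markup in the input is
    // rendered harmlessly as text instead of being executed by the browser.
    $comment = isset($_GET['comment']) ? $_GET['comment'] : '';
    echo htmlspecialchars($comment);   // <script> becomes &lt;script&gt;
    ?>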

Similarly, encode external input before putting it into your database. PEAR DB’s placeholders do this for you automatically, which is a great convenience. Each database extension has its own function for doing this, and there’s the generic addslashes() function as well.
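
A short sketch of a placeholder query with PEAR DB; the connection details, table, and form field are hypothetical, and error checking is omitted:

    <?php
    require_once 'DB.php';

    // Hypothetical connection details, for illustration only.
    $db = DB::connect('mysql://user:secret@localhost/myapp');

    // The ? placeholder tells PEAR DB to quote and escape the value for you,
    // instead of interpolating raw external input into the SQL string.
    $result = $db->query('SELECT * FROM users WHERE username = ?',
                         array($_POST['username']));
    ?>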

In the larger security scheme of things, I would also encourage developers to think of security as a process, not as an end state. The place you want to get to is not that your application is "secure," but that it is "secure enough." The specific definition of "secure enough" depends on how much time and money you have, what kind of data your application is dealing with, and what the consequences are if something goes wrong.

There are, certainly, some security-related practices that are so easy to implement and so catastrophic if you don’t (like escaping external input before printing it or putting it into the database) that you should always do them. But thinking about security means evaluating tradeoffs.

You cover a number of different code caching solutions; how much time can you really save by using these systems?

The benchmarks in the book indicate about a 280% speedup. The specific speedup you get varies with your application’s behavior, so I’d advise anyone considering a code cache to test it with the actual application they’re going to run. It’s a really easy way to get a performance boost, though, since you don’t have to edit any of your code – just install the code cache, restart your web server, and you’re all done.
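
As an illustration of how little setup is involved, enabling an opcode cache such as APC (one cache of this kind, not necessarily one benchmarked in the book) usually amounts to a couple of php.ini lines and a web server restart; the exact extension name and settings depend on the cache you choose:

    ; php.ini -- illustrative settings for the APC opcode cache
    extension=apc.so   ; load the cache extension (name varies by cache)
    apc.enabled=1      ; then restart the web server so it takes effect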

Do you have a favourite PHP tool?

That’s a tough question. My favorite PHP function is strtotime(), but I don’t know if that qualifies as a tool. I like the XDebug extension a lot. I do most of my coding in XEmacs, but I’ve started to play around with IDEs like Zend Studio and Komodo, so one of those might become my favorite tool sometime soon.
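
For readers who haven’t run into it, strtotime() turns free-form English date descriptions into Unix timestamps, for example:

    <?php
    // strtotime() parses English date descriptions into Unix timestamps.
    echo date('Y-m-d', strtotime('next Thursday')), "\n";
    echo date('Y-m-d', strtotime('+2 weeks')), "\n";
    ?>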

Your preferred platform for PHP deployment?

Apache 1.3 running on Linux. It’s stable, flexible, and you can’t beat the price tag.

Any thoughts on PHP5 you’d like to share with our readers?

If you’ve never used PHP before, now is the time to start! With PHP 5, you get all of the great things about PHP 4 — a comprehensive function library, incredibly easy deployment of web applications, connectivity to lots of different database programs. Plus, you get all of the goodies that the new version brings: robust object-oriented programming support, revamped XML processing that makes it a snap to parse simple XML documents and gives you the full DOM API when you need to do XML heavy lifting, and bells and whistles like exceptions, iterators, and interfaces.
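
As a small taste of that revamped XML processing, SimpleXML in PHP 5 really does reduce parsing a simple document to a couple of lines (the XML below is invented for the example):

    <?php
    // PHP 5's SimpleXML: parse a document and walk it like a native object.
    $xml = simplexml_load_string(
        '<books><book><title>Essential PHP Tools</title></book></books>');
    foreach ($xml->book as $book) {
        echo $book->title, "\n";
    }
    ?>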

What advice would you give to anybody considering PHP as their development platform?

Make a personal or hobby project your first PHP application, something like keeping track of your books or CDs, a personal URL bookmark database, or league statistics for your kids’ soccer games. Your first app isn’t going to be perfect. It will have security problems, it won’t be as fast as it could be, the database schema won’t be optimized, and so on. But that’s fine. Just get a feel for what PHP can do. Make your second project the one that matters for your job or for whoever else is counting on you.

What made you start up PX?

It was definitely a case of scratching one’s own itch. When I started it, there weren’t a lot of places to look for code that someone else had written in PHP to solve a certain problem. The site gets very steady usage — it’s nice to see folks continuing to turn to it for solutions.

It’s nice to see another IT-savvy cook, do you have a particular culinary speciality?

I’m flattered that you called me an "IT-savvy cook" instead of a "cooking-savvy programmer"! I recently got a slow cooker, so I’ve been trying lots of new things in that. I also like baking and making desserts: even if something goes wrong so that the results are not cosmetically perfect, they still taste good.

David Sklar Bio

David Sklar is an independent consultant specializing in technical training, software development, and strategic planning. He is the author of Learning PHP 5 (O’Reilly), Essential PHP Tools (Apress), and PHP Cookbook (O’Reilly).

After discovering PHP as a solution to his web programming needs in 1996, he created the PX (http://px.sklar.com), which enables PHP users to exchange programs. Since then, he has continued to rely on PHP for personal and professional projects.

David is an instructor at the New School University and has spoken at many conferences, including the O’Reilly Open Source Conference, the EGovOS Open Source/Open Standards Conference, and the International PHP Conference.

When away from the computer, David eats mini-donuts, plays records, and likes to cook. He lives in New York City and has a degree in Computer Science from Yale University.

All the MCB Guru blogs that are fit to print