diff en/ch13-mq-collab.tex @ 649:5cd47f721686

Rename LaTeX input files to have numeric prefixes
author Bryan O'Sullivan <bos@serpentine.com>
date Thu, 29 Jan 2009 22:56:27 -0800
parents en/mq-collab.tex@97e929385442
children f72b7e6cbe90
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/en/ch13-mq-collab.tex	Thu Jan 29 22:56:27 2009 -0800
@@ -0,0 +1,393 @@
+\chapter{Advanced uses of Mercurial Queues}
+\label{chap:mq-collab}
+
+While it's easy to pick up straightforward uses of Mercurial Queues,
+use of a little discipline and some of MQ's less frequently used
+capabilities makes it possible to work in complicated development
+environments.
+
+In this chapter, I will use as an example a technique I have used to
+manage the development of an Infiniband device driver for the Linux
+kernel.  The driver in question is large (at least as drivers go),
+with 25,000 lines of code spread across 35 source files.  It is
+maintained by a small team of developers.
+
+While much of the material in this chapter is specific to Linux, the
+same principles apply to any code base for which you're not the
+primary owner, and upon which you need to do a lot of development.
+
+\section{The problem of many targets}
+
+The Linux kernel changes rapidly, and has never been internally
+stable; developers frequently make drastic changes between releases.
+This means that a version of the driver that works well with a
+particular released version of the kernel will not even \emph{compile}
+correctly against, typically, any other version.
+
+To maintain a driver, we have to keep a number of distinct versions of
+Linux in mind.
+\begin{itemize}
+\item One target is the main Linux kernel development tree.
+  Maintenance of the code is in this case partly shared by other
+  developers in the kernel community, who make ``drive-by''
+  modifications to the driver as they develop and refine kernel
+  subsystems.
+\item We also maintain a number of ``backports'' to older versions of
+  the Linux kernel, to support the needs of customers who are running
+  older Linux distributions that do not incorporate our drivers.  (To
+  \emph{backport} a piece of code is to modify it to work in an older
+  version of its target environment than the version it was developed
+  for.)
+\item Finally, we make software releases on a schedule that is
+  necessarily not aligned with those used by Linux distributors and
+  kernel developers, so that we can deliver new features to customers
+  without forcing them to upgrade their entire kernels or
+  distributions.
+\end{itemize}
+
+\subsection{Tempting approaches that don't work well}
+
+There are two ``standard'' ways to maintain a piece of software that
+has to target many different environments.
+
+The first is to maintain a number of branches, each intended for a
+single target.  The trouble with this approach is that you must
+maintain iron discipline in the flow of changes between repositories.
+A new feature or bug fix must start life in a ``pristine'' repository,
+then percolate out to every backport repository.  Backport changes are
+more limited in the branches they should propagate to; a backport
+change that is applied to a branch where it doesn't belong will
+probably stop the driver from compiling.
+
+The second is to maintain a single source tree filled with conditional
+statements that turn chunks of code on or off depending on the
+intended target.  Because these ``ifdefs'' are not allowed in the
+Linux kernel tree, a manual or automatic process must be followed to
+strip them out and yield a clean tree.  A code base maintained in this
+fashion rapidly becomes a rat's nest of conditional blocks that are
+difficult to understand and maintain.
+
+Neither of these approaches is well suited to a situation where you
+don't ``own'' the canonical copy of a source tree.  In the case of a
+Linux driver that is distributed with the standard kernel, Linus's
+tree contains the copy of the code that will be treated by the world
+as canonical.  The upstream version of ``my'' driver can be modified
+by people I don't know, without me even finding out about it until
+after the changes show up in Linus's tree.  
+
+These approaches have the added weakness of making it difficult to
+generate well-formed patches to submit upstream.
+
+In principle, Mercurial Queues seems like a good candidate to manage a
+development scenario such as the above.  While this is indeed the
+case, MQ contains a few added features that make the job more
+pleasant.
+
+\section{Conditionally applying patches with 
+  guards}
+
+Perhaps the best way to maintain sanity with so many targets is to be
+able to choose specific patches to apply for a given situation.  MQ
+provides a feature called ``guards'' (which originates with quilt's
+\texttt{guards} command) that does just this.  To start off, let's
+create a simple repository for experimenting in.
+\interaction{mq.guards.init}
+This gives us a tiny repository that contains two patches that don't
+have any dependencies on each other, because they touch different files.
+
+The idea behind conditional application is that you can ``tag'' a
+patch with a \emph{guard}, which is simply a text string of your
+choosing, then tell MQ to select specific guards to use when applying
+patches.  MQ will then either apply, or skip over, a guarded patch,
+depending on the guards that you have selected.
+
+A patch can have an arbitrary number of guards;
+each one is \emph{positive} (``apply this patch if this guard is
+selected'') or \emph{negative} (``skip this patch if this guard is
+selected'').  A patch with no guards is always applied.
+
+\section{Controlling the guards on a patch}
+
+The \hgxcmd{mq}{qguard} command lets you determine which guards should
+apply to a patch, or display the guards that are already in effect.
+Without any arguments, it displays the guards on the current topmost
+patch.
+\interaction{mq.guards.qguard}
+To set a positive guard on a patch, prefix the name of the guard with
+a ``\texttt{+}''.
+\interaction{mq.guards.qguard.pos}
+To set a negative guard on a patch, prefix the name of the guard with
+a ``\texttt{-}''.
+\interaction{mq.guards.qguard.neg}
+
+\begin{note}
+  The \hgxcmd{mq}{qguard} command \emph{sets} the guards on a patch; it
+  doesn't \emph{modify} them.  What this means is that if you run
+  \hgcmdargs{qguard}{+a +b} on a patch, then \hgcmdargs{qguard}{+c} on
+  the same patch, the \emph{only} guard that will be set on it
+  afterwards is \texttt{+c}.
+\end{note}
+
+Mercurial stores guards in the \sfilename{series} file; the form in
+which they are stored is easy both to understand and to edit by hand.
+(In other words, you don't have to use the \hgxcmd{mq}{qguard} command if
+you don't want to; it's okay to simply edit the \sfilename{series}
+file.)
+\interaction{mq.guards.series}
+
+\section{Selecting the guards to use}
+
+The \hgxcmd{mq}{qselect} command determines which guards are active at a
+given time.  The effect of this is to determine which patches MQ will
+apply the next time you run \hgxcmd{mq}{qpush}.  It has no other effect; in
+particular, it doesn't do anything to patches that are already
+applied.
+
+With no arguments, the \hgxcmd{mq}{qselect} command lists the guards
+currently in effect, one per line of output.  Each argument is treated
+as the name of a guard to apply.
+\interaction{mq.guards.qselect.foo}
+In case you're interested, the currently selected guards are stored in
+the \sfilename{guards} file.
+\interaction{mq.guards.qselect.cat}
+We can see the effect the selected guards have when we run
+\hgxcmd{mq}{qpush}.
+\interaction{mq.guards.qselect.qpush}
+
+A guard cannot start with a ``\texttt{+}'' or ``\texttt{-}''
+character.  The name of a guard must not contain white space, but most
+other characters are acceptable.  If you try to use a guard with an
+invalid name, MQ will complain:
+\interaction{mq.guards.qselect.error} 
+Changing the selected guards changes the patches that are applied.
+\interaction{mq.guards.qselect.quux} 
+You can see in the example below that negative guards take precedence
+over positive guards.
+\interaction{mq.guards.qselect.foobar}
+
+\section{MQ's rules for applying patches}
+
+The rules that MQ uses when deciding whether to apply a patch
+are as follows.
+\begin{itemize}
+\item A patch that has no guards is always applied.
+\item If the patch has any negative guard that matches any currently
+  selected guard, the patch is skipped.
+\item If the patch has any positive guard that matches any currently
+  selected guard, the patch is applied.
+\item If the patch has positive or negative guards, but none matches
+  any currently selected guard, the patch is skipped.
+\end{itemize}
+
+\section{Trimming the work environment}
+
+In working on the device driver I mentioned earlier, I don't apply the
+patches to a normal Linux kernel tree.  Instead, I use a repository
+that contains only a snapshot of the source files and headers that are
+relevant to Infiniband development.  This repository is~1\% the size
+of a kernel repository, so it's easier to work with.
+
+I then choose a ``base'' version on top of which the patches are
+applied.  This is a snapshot of the Linux kernel tree as of a revision
+of my choosing.  When I take the snapshot, I record the changeset ID
+from the kernel repository in the commit message.  Since the snapshot
+preserves the ``shape'' and content of the relevant parts of the
+kernel tree, I can apply my patches on top of either my tiny
+repository or a normal kernel tree.
+
+Normally, the base tree atop which the patches apply should be a
+snapshot of a very recent upstream tree.  This best facilitates the
+development of patches that can easily be submitted upstream with few
+or no modifications.
+
+\section{Dividing up the \sfilename{series} file}
+
+I categorise the patches in the \sfilename{series} file into a number
+of logical groups.  Each section of like patches begins with a block
+of comments that describes the purpose of the patches that follow.
+
+The sequence of patch groups that I maintain follows.  The ordering of
+these groups is important; I'll describe why after I introduce the
+groups.
+\begin{itemize}
+\item The ``accepted'' group.  Patches that the development team has
+  submitted to the maintainer of the Infiniband subsystem, and which
+  he has accepted, but which are not present in the snapshot that the
+  tiny repository is based on.  These are ``read only'' patches,
+  present only to transform the tree into a similar state as it is in
+  the upstream maintainer's repository.
+\item The ``rework'' group.  Patches that I have submitted, but that
+  the upstream maintainer has requested modifications to before he
+  will accept them.
+\item The ``pending'' group.  Patches that I have not yet submitted to
+  the upstream maintainer, but which we have finished working on.
+  These will be ``read only'' for a while.  If the upstream maintainer
+  accepts them upon submission, I'll move them to the end of the
+  ``accepted'' group.  If he requests that I modify any, I'll move
+  them to the beginning of the ``rework'' group.
+\item The ``in progress'' group.  Patches that are actively being
+  developed, and should not be submitted anywhere yet.
+\item The ``backport'' group.  Patches that adapt the source tree to
+  older versions of the kernel tree.
+\item The ``do not ship'' group.  Patches that for some reason should
+  never be submitted upstream.  For example, one such patch might
+  change embedded driver identification strings to make it easier to
+  distinguish, in the field, between an out-of-tree version of the
+  driver and a version shipped by a distribution vendor.
+\end{itemize}
+
+Now to return to the reasons for ordering groups of patches in this
+way.  We would like the lowest patches in the stack to be as stable as
+possible, so that we will not need to rework higher patches due to
+changes in context.  Putting patches that will never be changed first
+in the \sfilename{series} file serves this purpose.
+
+We would also like the patches that we know we'll need to modify to be
+applied on top of a source tree that resembles the upstream tree as
+closely as possible.  This is why we keep accepted patches around for
+a while.
+
+The ``backport'' and ``do not ship'' patches float at the end of the
+\sfilename{series} file.  The backport patches must be applied on top
+of all other patches, and the ``do not ship'' patches might as well
+stay out of harm's way.
+
+\section{Maintaining the patch series}
+
+In my work, I use a number of guards to control which patches are to
+be applied.
+
+\begin{itemize}
+\item ``Accepted'' patches are guarded with \texttt{accepted}.  I
+  enable this guard most of the time.  When I'm applying the patches
+  on top of a tree where the patches are already present, I can turn
+  this patch off, and the patches that follow it will apply cleanly.
+\item Patches that are ``finished'', but not yet submitted, have no
+  guards.  If I'm applying the patch stack to a copy of the upstream
+  tree, I don't need to enable any guards in order to get a reasonably
+  safe source tree.
+\item Those patches that need reworking before being resubmitted are
+  guarded with \texttt{rework}.
+\item For those patches that are still under development, I use
+  \texttt{devel}.
+\item A backport patch may have several guards, one for each version
+  of the kernel to which it applies.  For example, a patch that
+  backports a piece of code to~2.6.9 will have a~\texttt{2.6.9} guard.
+\end{itemize}
+This variety of guards gives me considerable flexibility in
+determining what kind of source tree I want to end up with.  For most
+situations, the selection of appropriate guards is automated during
+the build process, but I can manually tune the guards to use for less
+common circumstances.
+
+\subsection{The art of writing backport patches}
+
+Using MQ, writing a backport patch is a simple process.  All such a
+patch has to do is modify a piece of code that uses a kernel feature
+not present in the older version of the kernel, so that the driver
+continues to work correctly under that older version.
+
+A useful goal when writing a good backport patch is to make your code
+look as if it was written for the older version of the kernel you're
+targeting.  The less obtrusive the patch, the easier it will be to
+understand and maintain.  If you're writing a collection of backport
+patches to avoid the ``rat's nest'' effect of lots of
+\texttt{\#ifdef}s (hunks of source code that are only used
+conditionally) in your code, don't introduce version-dependent
+\texttt{\#ifdef}s into the patches.  Instead, write several patches,
+each of which makes unconditional changes, and control their
+application using guards.
+
+There are two reasons to divide backport patches into a distinct
+group, away from the ``regular'' patches whose effects they modify.
+The first is that intermingling the two makes it more difficult to use
+a tool like the \hgext{patchbomb} extension to automate the process of
+submitting the patches to an upstream maintainer.  The second is that
+a backport patch could perturb the context in which a subsequent
+regular patch is applied, making it impossible to apply the regular
+patch cleanly \emph{without} the earlier backport patch already being
+applied.
+
+\section{Useful tips for developing with MQ}
+
+\subsection{Organising patches in directories}
+
+If you're working on a substantial project with MQ, it's not difficult
+to accumulate a large number of patches.  For example, I have one
+patch repository that contains over 250 patches.
+
+If you can group these patches into separate logical categories, you
+can if you like store them in different directories; MQ has no
+problems with patch names that contain path separators.
+
+\subsection{Viewing the history of a patch}
+\label{mq-collab:tips:interdiff}
+
+If you're developing a set of patches over a long time, it's a good
+idea to maintain them in a repository, as discussed in
+section~\ref{sec:mq:repo}.  If you do so, you'll quickly discover that
+using the \hgcmd{diff} command to look at the history of changes to a
+patch is unworkable.  This is in part because you're looking at the
+second derivative of the real code (a diff of a diff), but also
+because MQ adds noise to the process by modifying time stamps and
+directory names when it updates a patch.
+
+However, you can use the \hgext{extdiff} extension, which is bundled
+with Mercurial, to turn a diff of two versions of a patch into
+something readable.  To do this, you will need a third-party package
+called \package{patchutils}~\cite{web:patchutils}.  This provides a
+command named \command{interdiff}, which shows the differences between
+two diffs as a diff.  Used on two versions of the same diff, it
+generates a diff that represents the diff from the first to the second
+version.
+
+You can enable the \hgext{extdiff} extension in the usual way, by
+adding a line to the \rcsection{extensions} section of your \hgrc.
+\begin{codesample2}
+  [extensions]
+  extdiff =
+\end{codesample2}
+The \command{interdiff} command expects to be passed the names of two
+files, but the \hgext{extdiff} extension passes the program it runs a
+pair of directories, each of which can contain an arbitrary number of
+files.  We thus need a small program that will run \command{interdiff}
+on each pair of files in these two directories.  This program is
+available as \sfilename{hg-interdiff} in the \dirname{examples}
+directory of the source code repository that accompanies this book.
+\excode{hg-interdiff}
+
+With the \sfilename{hg-interdiff} program in your shell's search path,
+you can run it as follows, from inside an MQ patch directory:
+\begin{codesample2}
+  hg extdiff -p hg-interdiff -r A:B my-change.patch
+\end{codesample2}
+Since you'll probably want to use this long-winded command a lot, you
+can get \hgext{hgext} to make it available as a normal Mercurial
+command, again by editing your \hgrc.
+\begin{codesample2}
+  [extdiff]
+  cmd.interdiff = hg-interdiff
+\end{codesample2}
+This directs \hgext{hgext} to make an \texttt{interdiff} command
+available, so you can now shorten the previous invocation of
+\hgxcmd{extdiff}{extdiff} to something a little more wieldy.
+\begin{codesample2}
+  hg interdiff -r A:B my-change.patch
+\end{codesample2}
+
+\begin{note}
+  The \command{interdiff} command works well only if the underlying
+  files against which versions of a patch are generated remain the
+  same.  If you create a patch, modify the underlying files, and then
+  regenerate the patch, \command{interdiff} may not produce useful
+  output.
+\end{note}
+
+The \hgext{extdiff} extension is useful for more than merely improving
+the presentation of MQ~patches.  To read more about it, go to
+section~\ref{sec:hgext:extdiff}.
+
+%%% Local Variables: 
+%%% mode: latex
+%%% TeX-master: "00book"
+%%% End: