comparison en/ch13-mq-collab.tex @ 649:5cd47f721686

Rename LaTeX input files to have numeric prefixes
author Bryan O'Sullivan <bos@serpentine.com>
date Thu, 29 Jan 2009 22:56:27 -0800
parents en/mq-collab.tex@97e929385442
children f72b7e6cbe90
648:bc14f94e726a 649:5cd47f721686
1 \chapter{Advanced uses of Mercurial Queues}
2 \label{chap:mq-collab}
3
4 While it's easy to pick up straightforward uses of Mercurial Queues,
5 a little discipline and some of MQ's less frequently used
6 capabilities make it possible to work in complicated development
7 environments.
8
9 In this chapter, I will use as an example a technique I have used to
10 manage the development of an Infiniband device driver for the Linux
11 kernel. The driver in question is large (at least as drivers go),
12 with 25,000 lines of code spread across 35 source files. It is
13 maintained by a small team of developers.
14
15 While much of the material in this chapter is specific to Linux, the
16 same principles apply to any code base for which you're not the
17 primary owner, and upon which you need to do a lot of development.
18
19 \section{The problem of many targets}
20
21 The Linux kernel changes rapidly, and has never been internally
22 stable; developers frequently make drastic changes between releases.
23 This means that a version of the driver that works well with a
24 particular released version of the kernel will typically not even
25 \emph{compile} against any other version.
26
27 To maintain a driver, we have to keep a number of distinct versions of
28 Linux in mind.
29 \begin{itemize}
30 \item One target is the main Linux kernel development tree.
31 Maintenance of the code is in this case partly shared by other
32 developers in the kernel community, who make ``drive-by''
33 modifications to the driver as they develop and refine kernel
34 subsystems.
35 \item We also maintain a number of ``backports'' to older versions of
36 the Linux kernel, to support the needs of customers who are running
37 older Linux distributions that do not incorporate our drivers. (To
38 \emph{backport} a piece of code is to modify it to work in an older
39 version of its target environment than the version it was developed
40 for.)
41 \item Finally, we make software releases on a schedule that is
42 necessarily not aligned with those used by Linux distributors and
43 kernel developers, so that we can deliver new features to customers
44 without forcing them to upgrade their entire kernels or
45 distributions.
46 \end{itemize}
47
48 \subsection{Tempting approaches that don't work well}
49
50 There are two ``standard'' ways to maintain a piece of software that
51 has to target many different environments.
52
53 The first is to maintain a number of branches, each intended for a
54 single target. The trouble with this approach is that you must
55 maintain iron discipline in the flow of changes between repositories.
56 A new feature or bug fix must start life in a ``pristine'' repository,
57 then percolate out to every backport repository. Backport changes are
58 more limited in the branches they should propagate to; a backport
59 change that is applied to a branch where it doesn't belong will
60 probably stop the driver from compiling.
61
62 The second is to maintain a single source tree filled with conditional
63 statements that turn chunks of code on or off depending on the
64 intended target. Because these ``ifdefs'' are not allowed in the
65 Linux kernel tree, a manual or automatic process must be followed to
66 strip them out and yield a clean tree. A code base maintained in this
67 fashion rapidly becomes a rat's nest of conditional blocks that are
68 difficult to understand and maintain.
69
70 Neither of these approaches is well suited to a situation where you
71 don't ``own'' the canonical copy of a source tree. In the case of a
72 Linux driver that is distributed with the standard kernel, Linus's
73 tree contains the copy of the code that will be treated by the world
74 as canonical. The upstream version of ``my'' driver can be modified
75 by people I don't know, without me even finding out about it until
76 after the changes show up in Linus's tree.
77
78 These approaches have the added weakness of making it difficult to
79 generate well-formed patches to submit upstream.
80
81 In principle, Mercurial Queues seems like a good candidate to manage a
82 development scenario such as the above. While this is indeed the
83 case, MQ contains a few added features that make the job more
84 pleasant.
85
86 \section{Conditionally applying patches with
87 guards}
88
89 Perhaps the best way to maintain sanity with so many targets is to be
90 able to choose specific patches to apply for a given situation. MQ
91 provides a feature called ``guards'' (which originates with quilt's
92 \texttt{guards} command) that does just this. To start off, let's
93 create a simple repository for experimenting in.
94 \interaction{mq.guards.init}
95 This gives us a tiny repository that contains two patches that don't
96 have any dependencies on each other, because they touch different files.
97
98 The idea behind conditional application is that you can ``tag'' a
99 patch with a \emph{guard}, which is simply a text string of your
100 choosing, then tell MQ to select specific guards to use when applying
101 patches. MQ will then either apply, or skip over, a guarded patch,
102 depending on the guards that you have selected.
103
104 A patch can have an arbitrary number of guards;
105 each one is \emph{positive} (``apply this patch if this guard is
106 selected'') or \emph{negative} (``skip this patch if this guard is
107 selected''). A patch with no guards is always applied.
108
109 \section{Controlling the guards on a patch}
110
111 The \hgxcmd{mq}{qguard} command lets you determine which guards should
112 apply to a patch, or display the guards that are already in effect.
113 Without any arguments, it displays the guards on the current topmost
114 patch.
115 \interaction{mq.guards.qguard}
116 To set a positive guard on a patch, prefix the name of the guard with
117 a ``\texttt{+}''.
118 \interaction{mq.guards.qguard.pos}
119 To set a negative guard on a patch, prefix the name of the guard with
120 a ``\texttt{-}''.
121 \interaction{mq.guards.qguard.neg}
122
123 \begin{note}
124 The \hgxcmd{mq}{qguard} command \emph{sets} the guards on a patch; it
125 doesn't \emph{modify} them. What this means is that if you run
126 \hgcmdargs{qguard}{+a +b} on a patch, then \hgcmdargs{qguard}{+c} on
127 the same patch, the \emph{only} guard that will be set on it
128 afterwards is \texttt{+c}.
129 \end{note}
130
131 Mercurial stores guards in the \sfilename{series} file; the form in
132 which they are stored is easy both to understand and to edit by hand.
133 (In other words, you don't have to use the \hgxcmd{mq}{qguard} command if
134 you don't want to; it's okay to simply edit the \sfilename{series}
135 file.)
136 \interaction{mq.guards.series}
137
138 \section{Selecting the guards to use}
139
140 The \hgxcmd{mq}{qselect} command determines which guards are active at a
141 given time. The effect of this is to determine which patches MQ will
142 apply the next time you run \hgxcmd{mq}{qpush}. It has no other effect; in
143 particular, it doesn't do anything to patches that are already
144 applied.
145
146 With no arguments, the \hgxcmd{mq}{qselect} command lists the guards
147 currently in effect, one per line of output. Given arguments, it
148 treats each one as the name of a guard to select.
149 \interaction{mq.guards.qselect.foo}
150 In case you're interested, the currently selected guards are stored in
151 the \sfilename{guards} file.
152 \interaction{mq.guards.qselect.cat}
153 We can see the effect the selected guards have when we run
154 \hgxcmd{mq}{qpush}.
155 \interaction{mq.guards.qselect.qpush}
156
157 A guard cannot start with a ``\texttt{+}'' or ``\texttt{-}''
158 character. The name of a guard must not contain white space, but most
159 other characters are acceptable. If you try to use a guard with an
160 invalid name, MQ will complain:
161 \interaction{mq.guards.qselect.error}
162 Changing the selected guards changes the patches that are applied.
163 \interaction{mq.guards.qselect.quux}
164 You can see in the example below that negative guards take precedence
165 over positive guards.
166 \interaction{mq.guards.qselect.foobar}
167
168 \section{MQ's rules for applying patches}
169
170 The rules that MQ uses when deciding whether to apply a patch
171 are as follows.
172 \begin{itemize}
173 \item A patch that has no guards is always applied.
174 \item If the patch has any negative guard that matches any currently
175 selected guard, the patch is skipped.
176 \item If the patch has any positive guard that matches any currently
177 selected guard, the patch is applied.
178 \item If the patch has positive guards, but none of them matches any
179 currently selected guard, the patch is skipped. (A patch that has only negative guards, none of them selected, is applied.)
180 \end{itemize}
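These rules can be sketched as a short Python function. This is an
illustration only, not MQ's actual code: the name
\texttt{should\_apply} and its arguments are made up for this example,
and the sketch assumes that a patch carrying only negative guards is
applied when none of them is selected.
\begin{codesample2}
def should_apply(patch_guards, selected):
    """Decide whether a patch would be applied.

    patch_guards: guards on the patch, e.g. ["+foo", "-bar"]
    selected: names of the currently selected guards, e.g. {"foo"}
    """
    if not patch_guards:
        return True   # an unguarded patch is always applied

    negative = [g[1:] for g in patch_guards if g.startswith("-")]
    positive = [g[1:] for g in patch_guards if g.startswith("+")]

    if any(name in selected for name in negative):
        return False  # a selected negative guard always wins
    if any(name in selected for name in positive):
        return True   # a selected positive guard lets the patch in
    # Nothing matched. Assumption: a patch with positive guards is
    # skipped, while one with only negative guards is applied.
    return not positive
\end{codesample2}
With only \texttt{foo} selected, \texttt{should\_apply(["+foo"], \{"foo"\})}
is true; once \texttt{bar} is selected as well, a patch guarded by
\texttt{+foo -bar} is skipped, because the negative guard is checked first.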
181
182 \section{Trimming the work environment}
183
184 In working on the device driver I mentioned earlier, I don't apply the
185 patches to a normal Linux kernel tree. Instead, I use a repository
186 that contains only a snapshot of the source files and headers that are
187 relevant to Infiniband development. This repository is~1\% the size
188 of a kernel repository, so it's easier to work with.
189
190 I then choose a ``base'' version on top of which the patches are
191 applied. This is a snapshot of the Linux kernel tree as of a revision
192 of my choosing. When I take the snapshot, I record the changeset ID
193 from the kernel repository in the commit message. Since the snapshot
194 preserves the ``shape'' and content of the relevant parts of the
195 kernel tree, I can apply my patches on top of either my tiny
196 repository or a normal kernel tree.
197
198 Normally, the base tree atop which the patches apply should be a
199 snapshot of a very recent upstream tree. This best facilitates the
200 development of patches that can easily be submitted upstream with few
201 or no modifications.
202
203 \section{Dividing up the \sfilename{series} file}
204
205 I categorise the patches in the \sfilename{series} file into a number
206 of logical groups. Each section of like patches begins with a block
207 of comments that describes the purpose of the patches that follow.
208
209 The sequence of patch groups that I maintain follows. The ordering of
210 these groups is important; I'll describe why after I introduce the
211 groups.
212 \begin{itemize}
213 \item The ``accepted'' group. Patches that the development team has
214 submitted to the maintainer of the Infiniband subsystem, and which
215 he has accepted, but which are not present in the snapshot that the
216 tiny repository is based on. These are ``read only'' patches,
217 present only to bring the tree into a state similar to that of
218 the upstream maintainer's repository.
219 \item The ``rework'' group. Patches that I have submitted, but that
220 the upstream maintainer has requested modifications to before he
221 will accept them.
222 \item The ``pending'' group. Patches that I have not yet submitted to
223 the upstream maintainer, but which we have finished working on.
224 These will be ``read only'' for a while. If the upstream maintainer
225 accepts them upon submission, I'll move them to the end of the
226 ``accepted'' group. If he requests that I modify any, I'll move
227 them to the beginning of the ``rework'' group.
228 \item The ``in progress'' group. Patches that are actively being
229 developed, and should not be submitted anywhere yet.
230 \item The ``backport'' group. Patches that adapt the source tree to
231 older versions of the kernel tree.
232 \item The ``do not ship'' group. Patches that for some reason should
233 never be submitted upstream. For example, one such patch might
234 change embedded driver identification strings to make it easier to
235 distinguish, in the field, between an out-of-tree version of the
236 driver and a version shipped by a distribution vendor.
237 \end{itemize}
238
239 Now to return to the reasons for ordering groups of patches in this
240 way. We would like the lowest patches in the stack to be as stable as
241 possible, so that we will not need to rework higher patches due to
242 changes in context. Putting patches that will never be changed first
243 in the \sfilename{series} file serves this purpose.
244
245 We would also like the patches that we know we'll need to modify to be
246 applied on top of a source tree that resembles the upstream tree as
247 closely as possible. This is why we keep accepted patches around for
248 a while.
249
250 The ``backport'' and ``do not ship'' patches float at the end of the
251 \sfilename{series} file. The backport patches must be applied on top
252 of all other patches, and the ``do not ship'' patches might as well
253 stay out of harm's way.
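
As a rough illustration of this layout, a \sfilename{series} file
organised along these lines might look like the fragment below. The
patch names are invented for this example; only the grouping and the
ordering come from the description above. Lines that begin with
``\texttt{\#}'' are comments.
\begin{codesample2}
# Accepted upstream, but not yet in our snapshot (read only)
fix-locking.patch
# Rework: submitted, changes requested by the maintainer
add-stats.patch
# Pending: finished, not yet submitted
tune-interrupts.patch
# In progress: do not submit yet
new-feature.patch
# Backports to older kernels
compat-2.6.9.patch
# Do not ship
version-strings.patch
\end{codesample2}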
254
255 \section{Maintaining the patch series}
256
257 In my work, I use a number of guards to control which patches are to
258 be applied.
259
260 \begin{itemize}
261 \item ``Accepted'' patches are guarded with \texttt{accepted}. I
262 enable this guard most of the time. When I'm applying the patches
263 on top of a tree where the patches are already present, I can turn
264 this guard off, and the patches that follow will apply cleanly.
265 \item Patches that are ``finished'', but not yet submitted, have no
266 guards. If I'm applying the patch stack to a copy of the upstream
267 tree, I don't need to enable any guards in order to get a reasonably
268 safe source tree.
269 \item Those patches that need reworking before being resubmitted are
270 guarded with \texttt{rework}.
271 \item For those patches that are still under development, I use
272 \texttt{devel}.
273 \item A backport patch may have several guards, one for each version
274 of the kernel to which it applies. For example, a patch that
275 backports a piece of code to~2.6.9 will have a~\texttt{2.6.9} guard.
276 \end{itemize}
277 This variety of guards gives me considerable flexibility in
278 determining what kind of source tree I want to end up with. For most
279 situations, the selection of appropriate guards is automated during
280 the build process, but I can manually tune the guards to use for less
281 common circumstances.
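
One way such automation could look is sketched below in Python. Treat
everything here as hypothetical: the target names, the mapping from
targets to guards, and the helper functions are assumptions made for
this example, not a description of the actual build process. The
sketch relies only on the \texttt{qpop}, \texttt{qselect} and
\texttt{qpush} commands with their \texttt{--all} and \texttt{--none}
options.
\begin{codesample2}
import subprocess

# Hypothetical mapping from build target to the guards to select.
GUARDS_BY_TARGET = {
    "upstream-tree": [],                  # patches already upstream
    "snapshot": ["accepted"],
    "backport-2.6.9": ["accepted", "2.6.9"],
}

def qselect_command(target):
    """Build the "hg qselect" command line for a target."""
    guards = GUARDS_BY_TARGET[target]
    if not guards:
        return ["hg", "qselect", "--none"]  # deselect every guard
    return ["hg", "qselect"] + guards

def reapply(target, run=subprocess.check_call):
    """Pop all patches, switch guards, then push what now applies."""
    run(["hg", "qpop", "--all"])
    run(qselect_command(target))
    run(["hg", "qpush", "--all"])
\end{codesample2}
Popping everything before changing the selection matters, because
selecting guards does nothing to patches that are already applied.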
282
283 \subsection{The art of writing backport patches}
284
285 Using MQ, writing a backport patch is a simple process. All such a
286 patch has to do is modify a piece of code that uses a kernel feature
287 not present in the older version of the kernel, so that the driver
288 continues to work correctly under that older version.
289
290 A useful goal when writing a good backport patch is to make your code
291 look as if it was written for the older version of the kernel you're
292 targeting. The less obtrusive the patch, the easier it will be to
293 understand and maintain. If you're writing a collection of backport
294 patches to avoid the ``rat's nest'' effect of lots of
295 \texttt{\#ifdef}s (hunks of source code that are only used
296 conditionally) in your code, don't introduce version-dependent
297 \texttt{\#ifdef}s into the patches. Instead, write several patches,
298 each of which makes unconditional changes, and control their
299 application using guards.
300
301 There are two reasons to divide backport patches into a distinct
302 group, away from the ``regular'' patches whose effects they modify.
303 The first is that intermingling the two makes it more difficult to use
304 a tool like the \hgext{patchbomb} extension to automate the process of
305 submitting the patches to an upstream maintainer. The second is that
306 a backport patch could perturb the context in which a subsequent
307 regular patch is applied, making it impossible to apply the regular
308 patch cleanly \emph{without} the earlier backport patch already being
309 applied.
310
311 \section{Useful tips for developing with MQ}
312
313 \subsection{Organising patches in directories}
314
315 If you're working on a substantial project with MQ, it's not difficult
316 to accumulate a large number of patches. For example, I have one
317 patch repository that contains over 250 patches.
318
319 If you can group these patches into separate logical categories, you
320 can, if you like, store them in different directories; MQ has no
321 problem with patch names that contain path separators.
322
323 \subsection{Viewing the history of a patch}
324 \label{mq-collab:tips:interdiff}
325
326 If you're developing a set of patches over a long time, it's a good
327 idea to maintain them in a repository, as discussed in
328 section~\ref{sec:mq:repo}. If you do so, you'll quickly discover that
329 using the \hgcmd{diff} command to look at the history of changes to a
330 patch is unworkable. This is in part because you're looking at the
331 second derivative of the real code (a diff of a diff), but also
332 because MQ adds noise to the process by modifying time stamps and
333 directory names when it updates a patch.
334
335 However, you can use the \hgext{extdiff} extension, which is bundled
336 with Mercurial, to turn a diff of two versions of a patch into
337 something readable. To do this, you will need a third-party package
338 called \package{patchutils}~\cite{web:patchutils}. This provides a
339 command named \command{interdiff}, which shows the differences between
340 two diffs as a diff. Used on two versions of the same diff, it
341 generates a diff that represents the diff from the first to the second
342 version.
343
344 You can enable the \hgext{extdiff} extension in the usual way, by
345 adding a line to the \rcsection{extensions} section of your \hgrc.
346 \begin{codesample2}
347 [extensions]
348 extdiff =
349 \end{codesample2}
350 The \command{interdiff} command expects to be passed the names of two
351 files, but the \hgext{extdiff} extension passes the program it runs a
352 pair of directories, each of which can contain an arbitrary number of
353 files. We thus need a small program that will run \command{interdiff}
354 on each pair of files in these two directories. This program is
355 available as \sfilename{hg-interdiff} in the \dirname{examples}
356 directory of the source code repository that accompanies this book.
357 \excode{hg-interdiff}
358
359 With the \sfilename{hg-interdiff} program in your shell's search path,
360 you can run it as follows, from inside an MQ patch directory:
361 \begin{codesample2}
362 hg extdiff -p hg-interdiff -r A:B my-change.patch
363 \end{codesample2}
364 Since you'll probably want to use this long-winded command a lot, you
365 can get \hgext{extdiff} to make it available as a normal Mercurial
366 command, again by editing your \hgrc.
367 \begin{codesample2}
368 [extdiff]
369 cmd.interdiff = hg-interdiff
370 \end{codesample2}
371 This directs \hgext{extdiff} to make an \texttt{interdiff} command
372 available, so you can now shorten the previous invocation of
373 \hgxcmd{extdiff}{extdiff} to something a little more wieldy.
374 \begin{codesample2}
375 hg interdiff -r A:B my-change.patch
376 \end{codesample2}
377
378 \begin{note}
379 The \command{interdiff} command works well only if the underlying
380 files against which versions of a patch are generated remain the
381 same. If you create a patch, modify the underlying files, and then
382 regenerate the patch, \command{interdiff} may not produce useful
383 output.
384 \end{note}
385
386 The \hgext{extdiff} extension is useful for more than merely improving
387 the presentation of MQ~patches. To read more about it, go to
388 section~\ref{sec:hgext:extdiff}.
389
390 %%% Local Variables:
391 %%% mode: latex
392 %%% TeX-master: "00book"
393 %%% End: