# HG changeset patch # User Bryan O'Sullivan # Date 1233298587 28800 # Node ID 5cd47f721686d00d16d7d46ddba237838b3e44a6 # Parent bc14f94e726a827326233a62b2d45a27a7d565b3 Rename LaTeX input files to have numeric prefixes diff -r bc14f94e726a -r 5cd47f721686 en/00book.tex --- a/en/00book.tex Thu Jan 29 22:47:34 2009 -0800 +++ b/en/00book.tex Thu Jan 29 22:56:27 2009 -0800 @@ -40,27 +40,27 @@ \pagenumbering{arabic} -\include{preface} -\include{intro} -\include{tour-basic} -\include{tour-merge} -\include{concepts} -\include{daily} -\include{collab} -\include{filenames} -\include{branch} -\include{undo} -\include{hook} -\include{template} -\include{mq} -\include{mq-collab} -\include{hgext} +\include{ch00-preface} +\include{ch01-intro} +\include{ch02-tour-basic} +\include{ch03-tour-merge} +\include{ch04-concepts} +\include{ch05-daily} +\include{ch06-collab} +\include{ch07-filenames} +\include{ch08-branch} +\include{ch09-undo} +\include{ch10-hook} +\include{ch11-template} +\include{ch12-mq} +\include{ch13-mq-collab} +\include{ch14-hgext} \appendix -\include{cmdref} -\include{mq-ref} -\include{srcinstall} -\include{license} +\include{appA-cmdref} +\include{appB-mq-ref} +\include{appC-srcinstall} +\include{appD-license} \addcontentsline{toc}{chapter}{Bibliography} \bibliographystyle{alpha} \bibliography{99book} diff -r bc14f94e726a -r 5cd47f721686 en/Makefile --- a/en/Makefile Thu Jan 29 22:47:34 2009 -0800 +++ b/en/Makefile Thu Jan 29 22:56:27 2009 -0800 @@ -4,26 +4,8 @@ 00book.tex \ 99book.bib \ 99defs.tex \ - build_id.tex \ - branch.tex \ - cmdref.tex \ - collab.tex \ - concepts.tex \ - daily.tex \ - filenames.tex \ - hg_id.tex \ - hgext.tex \ - hook.tex \ - intro.tex \ - mq.tex \ - mq-collab.tex \ - mq-ref.tex \ - preface.tex \ - srcinstall.tex \ - template.tex \ - tour-basic.tex \ - tour-merge.tex \ - undo.tex + app*.tex \ + ch*.tex image-sources := \ feature-branches.dot \ diff -r bc14f94e726a -r 5cd47f721686 en/appA-cmdref.tex --- /dev/null Thu Jan 01 00:00:00 1970 
+0000 +++ b/en/appA-cmdref.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,176 @@ +\chapter{Command reference} +\label{cmdref} + +\cmdref{add}{add files at the next commit} +\optref{add}{I}{include} +\optref{add}{X}{exclude} +\optref{add}{n}{dry-run} + +\cmdref{diff}{print changes in history or working directory} + +Show differences between revisions for the specified files or +directories, using the unified diff format. For a description of the +unified diff format, see section~\ref{sec:mq:patch}. + +By default, this command does not print diffs for files that Mercurial +considers to contain binary data. To control this behaviour, see the +\hgopt{diff}{-a} and \hgopt{diff}{--git} options. + +\subsection{Options} + +\loptref{diff}{nodates} + +Omit date and time information when printing diff headers. + +\optref{diff}{B}{ignore-blank-lines} + +Do not print changes that only insert or delete blank lines. A line +that contains only whitespace is not considered blank. + +\optref{diff}{I}{include} + +Include files and directories whose names match the given patterns. + +\optref{diff}{X}{exclude} + +Exclude files and directories whose names match the given patterns. + +\optref{diff}{a}{text} + +If this option is not specified, \hgcmd{diff} will refuse to print +diffs for files that it detects as binary. Specifying \hgopt{diff}{-a} +forces \hgcmd{diff} to treat all files as text, and generate diffs for +all of them. + +This option is useful for files that are ``mostly text'' but have a +few embedded NUL characters. If you use it on files that contain a +lot of binary data, its output will be incomprehensible. + +\optref{diff}{b}{ignore-space-change} + +Do not print a line if the only change to that line is in the amount +of white space it contains. + +\optref{diff}{g}{git} + +Print \command{git}-compatible diffs. XXX reference a format +description. + +\optref{diff}{p}{show-function} + +Display the name of the enclosing function in a hunk header, using a +simple heuristic. 
This functionality is enabled by default, so the +\hgopt{diff}{-p} option has no effect unless you change the value of +the \rcitem{diff}{showfunc} config item, as in the following example. +\interaction{cmdref.diff-p} + +\optref{diff}{r}{rev} + +Specify one or more revisions to compare. The \hgcmd{diff} command +accepts up to two \hgopt{diff}{-r} options to specify the revisions to +compare. What it compares depends on how many you supply: + +\begin{enumerate} +\setcounter{enumi}{-1} +\item With no \hgopt{diff}{-r} option, display the differences between + the parent revision of the working directory and the working directory. +\item With one, display the differences between the specified changeset + and the working directory. +\item With two, display the differences between the two specified changesets. +\end{enumerate} + +You can specify two revisions using either two \hgopt{diff}{-r} +options or revision range notation. For example, the two revision +specifications below are equivalent. +\begin{codesample2} + hg diff -r 10 -r 20 + hg diff -r10:20 +\end{codesample2} + +When you provide two revisions, Mercurial treats the order of those +revisions as significant. Thus, \hgcmdargs{diff}{-r10:20} will +produce a diff that will transform files from their contents as of +revision~10 to their contents as of revision~20, while +\hgcmdargs{diff}{-r20:10} means the opposite: the diff that will +transform files from their revision~20 contents to their revision~10 +contents. You cannot reverse the ordering in this way if you are +diffing against the working directory. + +\optref{diff}{w}{ignore-all-space} + +Ignore white space when comparing lines. + +\cmdref{version}{print version and copyright information} + +This command displays the version of Mercurial you are running, and +its copyright license. There are four kinds of version string that +you may see. +\begin{itemize} +\item The string ``\texttt{unknown}''. This version of Mercurial was + not built in a Mercurial repository, and cannot determine its own + version. +\item A short numeric string, such as ``\texttt{1.1}''.
This is a + build of a revision of Mercurial that was identified by a specific + tag in the repository where it was built. (This doesn't necessarily + mean that you're running an official release; someone else could + have added that tag to any revision in the repository where they + built Mercurial.) +\item A hexadecimal string, such as ``\texttt{875489e31abe}''. This + is a build of the given revision of Mercurial. +\item A hexadecimal string followed by a date, such as + ``\texttt{875489e31abe+20070205}''. This is a build of the given + revision of Mercurial, where the build repository contained some + local changes that had not been committed. +\end{itemize} + +\subsection{Tips and tricks} + +\subsubsection{Why do the results of \hgcmd{diff} and \hgcmd{status} + differ?} +\label{cmdref:diff-vs-status} + +When you run the \hgcmd{status} command, you'll see a list of files +that Mercurial will record changes for the next time you perform a +commit. If you run the \hgcmd{diff} command, you may notice that it +prints diffs for only a \emph{subset} of the files that \hgcmd{status} +listed. There are two possible reasons for this. + +The first is that \hgcmd{status} prints some kinds of modifications +that \hgcmd{diff} doesn't normally display. The \hgcmd{diff} command +normally outputs unified diffs, which don't have the ability to +represent some changes that Mercurial can track. Most notably, +traditional diffs can't represent a change in whether or not a file is +executable, but Mercurial records this information. + +If you use the \hgopt{diff}{--git} option to \hgcmd{diff}, it will +display \command{git}-compatible diffs that \emph{can} display this +extra information. + +The second possible reason that \hgcmd{diff} might be printing diffs +for a subset of the files displayed by \hgcmd{status} is that if you +invoke it without any arguments, \hgcmd{diff} prints diffs against the +first parent of the working directory. 
If you have run \hgcmd{merge} +to merge two changesets, but you haven't yet committed the results of +the merge, your working directory has two parents (use \hgcmd{parents} +to see them). While \hgcmd{status} prints modifications relative to +\emph{both} parents after an uncommitted merge, \hgcmd{diff} still +operates relative only to the first parent. You can get it to print +diffs relative to the second parent by specifying that parent with the +\hgopt{diff}{-r} option. There is no way to print diffs relative to +both parents. + +\subsubsection{Generating safe binary diffs} + +If you use the \hgopt{diff}{-a} option to force Mercurial to print +diffs of files that are either ``mostly text'' or contain lots of +binary data, those diffs cannot subsequently be applied by either +Mercurial's \hgcmd{import} command or the system's \command{patch} +command. + +If you want to generate a diff of a binary file that is safe to use as +input for \hgcmd{import}, use the \hgopt{diff}{--git} option when you +generate the patch. The system \command{patch} command cannot handle +binary patches at all. + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/appB-mq-ref.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/appB-mq-ref.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,349 @@ +\chapter{Mercurial Queues reference} +\label{chap:mqref} + +\section{MQ command reference} +\label{sec:mqref:cmdref} + +For an overview of the commands provided by MQ, use the command +\hgcmdargs{help}{mq}. + +\subsection{\hgxcmd{mq}{qapplied}---print applied patches} + +The \hgxcmd{mq}{qapplied} command prints the current stack of applied +patches. Patches are printed in oldest-to-newest order, so the last +patch in the list is the ``top'' patch. + +\subsection{\hgxcmd{mq}{qcommit}---commit changes in the queue repository} + +The \hgxcmd{mq}{qcommit} command commits any outstanding changes in the +\sdirname{.hg/patches} repository.
This command only works if the +\sdirname{.hg/patches} directory is a repository, i.e.~you created the +directory using \hgcmdargs{qinit}{\hgxopt{mq}{qinit}{-c}} or ran +\hgcmd{init} in the directory after running \hgxcmd{mq}{qinit}. + +This command is shorthand for \hgcmdargs{commit}{--cwd .hg/patches}. + +\subsection{\hgxcmd{mq}{qdelete}---delete a patch from the + \sfilename{series} file} + +The \hgxcmd{mq}{qdelete} command removes the entry for a patch from the +\sfilename{series} file in the \sdirname{.hg/patches} directory. It +does not pop the patch if the patch is already applied. By default, +it does not delete the patch file; use the \hgxopt{mq}{qdel}{-f} option to +do that. + +Options: +\begin{itemize} +\item[\hgxopt{mq}{qdel}{-f}] Delete the patch file. +\end{itemize} + +\subsection{\hgxcmd{mq}{qdiff}---print a diff of the topmost applied patch} + +The \hgxcmd{mq}{qdiff} command prints a diff of the topmost applied patch. +It is equivalent to \hgcmdargs{diff}{-r-2:-1}. + +\subsection{\hgxcmd{mq}{qfold}---merge (``fold'') several patches into one} + +The \hgxcmd{mq}{qfold} command merges multiple patches into the topmost +applied patch, so that the topmost applied patch makes the union of +all of the changes in the patches in question. + +The patches to fold must not be applied; \hgxcmd{mq}{qfold} will exit with +an error if any is. The order in which patches are folded is +significant; \hgcmdargs{qfold}{a b} means ``apply the current topmost +patch, followed by \texttt{a}, followed by \texttt{b}''. + +The comments from the folded patches are appended to the comments of +the destination patch, with each block of comments separated by three +asterisk (``\texttt{*}'') characters. Use the \hgxopt{mq}{qfold}{-e} +option to edit the commit message for the combined patch/changeset +after the folding has completed. + +Options: +\begin{itemize} +\item[\hgxopt{mq}{qfold}{-e}] Edit the commit message and patch description + for the newly folded patch. 
+\item[\hgxopt{mq}{qfold}{-l}] Use the contents of the given file as the new + commit message and patch description for the folded patch. +\item[\hgxopt{mq}{qfold}{-m}] Use the given text as the new commit message + and patch description for the folded patch. +\end{itemize} + +\subsection{\hgxcmd{mq}{qheader}---display the header/description of a patch} + +The \hgxcmd{mq}{qheader} command prints the header, or description, of a +patch. By default, it prints the header of the topmost applied patch. +Given an argument, it prints the header of the named patch. + +\subsection{\hgxcmd{mq}{qimport}---import a third-party patch into the queue} + +The \hgxcmd{mq}{qimport} command adds an entry for an external patch to the +\sfilename{series} file, and copies the patch into the +\sdirname{.hg/patches} directory. It adds the entry immediately after +the topmost applied patch, but does not push the patch. + +If the \sdirname{.hg/patches} directory is a repository, +\hgxcmd{mq}{qimport} automatically does an \hgcmd{add} of the imported +patch. + +\subsection{\hgxcmd{mq}{qinit}---prepare a repository to work with MQ} + +The \hgxcmd{mq}{qinit} command prepares a repository to work with MQ. It +creates a directory called \sdirname{.hg/patches}. + +Options: +\begin{itemize} +\item[\hgxopt{mq}{qinit}{-c}] Create \sdirname{.hg/patches} as a repository + in its own right. Also creates a \sfilename{.hgignore} file that + will ignore the \sfilename{status} file. +\end{itemize} + +When the \sdirname{.hg/patches} directory is a repository, the +\hgxcmd{mq}{qimport} and \hgxcmd{mq}{qnew} commands automatically \hgcmd{add} +new patches. + +\subsection{\hgxcmd{mq}{qnew}---create a new patch} + +The \hgxcmd{mq}{qnew} command creates a new patch. It takes one mandatory +argument, the name to use for the patch file. The new patch +is empty by default.
It is added to the \sfilename{series} +file after the current topmost applied patch, and is immediately +pushed on top of that patch. + +If \hgxcmd{mq}{qnew} finds modified files in the working directory, it will +refuse to create a new patch unless the \hgxopt{mq}{qnew}{-f} option is +used (see below). This behaviour allows you to \hgxcmd{mq}{qrefresh} your +topmost applied patch before you apply a new patch on top of it. + +Options: +\begin{itemize} +\item[\hgxopt{mq}{qnew}{-f}] Create a new patch even if the contents of the + working directory are modified. Any outstanding modifications are + added to the newly created patch, so after this command completes, + the working directory will no longer be modified. +\item[\hgxopt{mq}{qnew}{-m}] Use the given text as the commit message. + This text will be stored at the beginning of the patch file, before + the patch data. +\end{itemize} + +\subsection{\hgxcmd{mq}{qnext}---print the name of the next patch} + +The \hgxcmd{mq}{qnext} command prints the name of the next patch in +the \sfilename{series} file after the topmost applied patch. This +patch will become the topmost applied patch if you run \hgxcmd{mq}{qpush}. + +\subsection{\hgxcmd{mq}{qpop}---pop patches off the stack} + +The \hgxcmd{mq}{qpop} command removes applied patches from the top of the +stack of applied patches. By default, it removes only one patch. + +This command removes the changesets that represent the popped patches +from the repository, and updates the working directory to undo the +effects of the patches. + +This command takes an optional argument, which it uses as the name or +index of the patch to pop to. If given a name, it will pop patches +until the named patch is the topmost applied patch. If given a +number, \hgxcmd{mq}{qpop} treats the number as an index into the entries in +the series file, counting from zero (empty lines and lines containing +only comments do not count).
It pops patches until the patch +identified by the given index is the topmost applied patch. + +The \hgxcmd{mq}{qpop} command does not read or write patches or the +\sfilename{series} file. It is thus safe to \hgxcmd{mq}{qpop} a patch that +you have removed from the \sfilename{series} file, or a patch that you +have renamed or deleted entirely. In the latter two cases, use the +name of the patch as it was when you applied it. + +By default, the \hgxcmd{mq}{qpop} command will not pop any patches if the +working directory has been modified. You can override this behaviour +using the \hgxopt{mq}{qpop}{-f} option, which reverts all modifications in +the working directory. + +Options: +\begin{itemize} +\item[\hgxopt{mq}{qpop}{-a}] Pop all applied patches. This returns the + repository to its state before you applied any patches. +\item[\hgxopt{mq}{qpop}{-f}] Forcibly revert any modifications to the + working directory when popping. +\item[\hgxopt{mq}{qpop}{-n}] Pop a patch from the named queue. +\end{itemize} + +The \hgxcmd{mq}{qpop} command removes one line from the end of the +\sfilename{status} file for each patch that it pops. + +\subsection{\hgxcmd{mq}{qprev}---print the name of the previous patch} + +The \hgxcmd{mq}{qprev} command prints the name of the patch in the +\sfilename{series} file that comes before the topmost applied patch. +This will become the topmost applied patch if you run \hgxcmd{mq}{qpop}. + +\subsection{\hgxcmd{mq}{qpush}---push patches onto the stack} +\label{sec:mqref:cmd:qpush} + +The \hgxcmd{mq}{qpush} command adds patches onto the applied stack. By +default, it adds only one patch. + +This command creates a new changeset to represent each applied patch, +and updates the working directory to apply the effects of the patches. + +The default data used when creating a changeset are as follows: +\begin{itemize} +\item The commit date and time zone are the current date and time + zone. 
Because these data are used to compute the identity of a + changeset, if you \hgxcmd{mq}{qpop} a patch and + \hgxcmd{mq}{qpush} it again, the changeset that you push will have a + different identity from the changeset you popped. +\item The author is the same as the default used by the \hgcmd{commit} + command. +\item The commit message is any text from the patch file that comes + before the first diff header. If there is no such text, a default + commit message is used that identifies the name of the patch. +\end{itemize} +If a patch contains a Mercurial patch header (XXX add link), the +information in the patch header overrides these defaults. + +Options: +\begin{itemize} +\item[\hgxopt{mq}{qpush}{-a}] Push all unapplied patches from the + \sfilename{series} file until there are none left to push. +\item[\hgxopt{mq}{qpush}{-l}] Add the name of the patch to the end + of the commit message. +\item[\hgxopt{mq}{qpush}{-m}] If a patch fails to apply cleanly, use the + entry for the patch in another saved queue to compute the parameters + for a three-way merge, and perform a three-way merge using the + normal Mercurial merge machinery. Use the resolution of the merge + as the new patch content. +\item[\hgxopt{mq}{qpush}{-n}] Use the named queue if merging while pushing. +\end{itemize} + +The \hgxcmd{mq}{qpush} command reads, but does not modify, the +\sfilename{series} file. It appends one line to the \sfilename{status} +file for each patch that it pushes. + +\subsection{\hgxcmd{mq}{qrefresh}---update the topmost applied patch} + +The \hgxcmd{mq}{qrefresh} command updates the topmost applied patch. It +modifies the patch, removes the old changeset that represented the +patch, and creates a new changeset to represent the modified patch.
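As the \hgxcmd{mq}{qpush} description notes, the commit date is one of the inputs to a changeset's identity, which is why a popped and re-pushed (or refreshed) patch ends up as a changeset with a different ID. The Python below is a simplified, hypothetical model of that idea, assuming a scheme similar to Mercurial's: a SHA-1 hash over the two parent IDs (in sorted order) followed by a text blob built from the manifest ID, author, date, time zone, file list and commit message. The function name \texttt{changeset\_id} and the exact text layout are illustrative assumptions, so its output will not match real Mercurial node IDs.

```python
import hashlib

# Assumed 20-byte "null" parent ID, used here only for illustration.
NULL_ID = b"\0" * 20


def changeset_id(p1, p2, manifest_hex, user, when, tz, files, message):
    """Return a hex identity for a changeset (simplified, hypothetical model).

    Every argument feeds the hash, including the commit time, so
    re-creating an otherwise identical changeset later yields a new ID.
    """
    # Metadata text: one field per line, then a blank line and the message.
    text = "\n".join(
        [manifest_hex, user, "%d %d" % (when, tz)]
        + sorted(files)
        + ["", message]
    ).encode("utf-8")
    # Parents are hashed in sorted order, so their order does not matter.
    first, second = sorted([p1, p2])
    h = hashlib.sha1()
    h.update(first)
    h.update(second)
    h.update(text)
    return h.hexdigest()


if __name__ == "__main__":
    manifest = "ab" * 20
    user = "Jane <jane@example.com>"
    pushed = changeset_id(NULL_ID, NULL_ID, manifest, user,
                          1233298587, 28800, ["foo.c"], "fix foo")
    # The same patch pushed again a minute later: only the time differs.
    repushed = changeset_id(NULL_ID, NULL_ID, manifest, user,
                            1233298647, 28800, ["foo.c"], "fix foo")
    print(pushed != repushed)
```

Running the script prints \texttt{True}: nothing but the timestamp changed, yet the identity is different.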
+ +The \hgxcmd{mq}{qrefresh} command looks for the following modifications: +\begin{itemize} +\item Changes to the commit message, i.e.~the text before the first + diff header in the patch file, are reflected in the new changeset + that represents the patch. +\item Modifications to tracked files in the working directory are + added to the patch. +\item Changes to the files tracked using \hgcmd{add}, \hgcmd{copy}, + \hgcmd{remove}, or \hgcmd{rename}. Added files and copy and rename + destinations are added to the patch, while removed files and rename + sources are removed. +\end{itemize} + +Even if \hgxcmd{mq}{qrefresh} detects no changes, it still recreates the +changeset that represents the patch. This causes the identity of the +changeset to differ from the previous changeset that identified the +patch. + +Options: +\begin{itemize} +\item[\hgxopt{mq}{qrefresh}{-e}] Modify the commit and patch description, + using the preferred text editor. +\item[\hgxopt{mq}{qrefresh}{-m}] Modify the commit message and patch + description, using the given text. +\item[\hgxopt{mq}{qrefresh}{-l}] Modify the commit message and patch + description, using text from the given file. +\end{itemize} + +\subsection{\hgxcmd{mq}{qrename}---rename a patch} + +The \hgxcmd{mq}{qrename} command renames a patch, and changes the entry for +the patch in the \sfilename{series} file. + +With a single argument, \hgxcmd{mq}{qrename} renames the topmost applied +patch. With two arguments, it renames its first argument to its +second. + +\subsection{\hgxcmd{mq}{qrestore}---restore saved queue state} + +XXX No idea what this does. + +\subsection{\hgxcmd{mq}{qsave}---save current queue state} + +XXX Likewise. + +\subsection{\hgxcmd{mq}{qseries}---print the entire patch series} + +The \hgxcmd{mq}{qseries} command prints the entire patch series from the +\sfilename{series} file. It prints only patch names, not empty lines +or comments. It prints in order from first to be applied to last. 
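The filtering that \hgxcmd{mq}{qseries} performs follows from the \sfilename{series} file rules given in this appendix: one patch name per line, surrounding white space ignored, and ``\texttt{\#}'' starting a comment that runs to the end of the line. The same rules determine the zero-based index that \hgxcmd{mq}{qpop} accepts. A minimal Python sketch of that filtering follows. The function name \texttt{read\_series} is hypothetical, and the sketch models only those stated rules, not anything else MQ may do with the file.

```python
def read_series(text):
    """Return patch names from series-file text, in application order.

    Hypothetical helper following the stated rules: '#' starts a comment
    that runs to end of line, surrounding white space is ignored, and
    lines left empty after that are skipped.
    """
    names = []
    for line in text.splitlines():
        # Drop everything from the first '#' onward, then trim white space.
        line = line.split("#", 1)[0].strip()
        if line:
            names.append(line)
    return names


if __name__ == "__main__":
    series = """\
# fixes for the 1.1 release
fix-foo.patch
  add-bar.patch   # needs review

#disabled.patch
refactor.patch
"""
    patches = read_series(series)
    print(patches)
    # A numeric index given to qpop counts only these entries, from zero.
    print(patches.index("refactor.patch"))
```

Here the commented-out \texttt{disabled.patch} is skipped, which is how temporarily disabling a patch by editing the \sfilename{series} file works. The script prints the three remaining names, then \texttt{2} as the index of \texttt{refactor.patch}.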
+ +\subsection{\hgxcmd{mq}{qtop}---print the name of the current patch} + +The \hgxcmd{mq}{qtop} command prints the name of the topmost currently applied +patch. + +\subsection{\hgxcmd{mq}{qunapplied}---print patches not yet applied} + +The \hgxcmd{mq}{qunapplied} command prints the names of patches from the +\sfilename{series} file that are not yet applied. It prints them in +order from the next patch that will be pushed to the last. + +\subsection{\hgcmd{strip}---remove a revision and descendants} + +The \hgcmd{strip} command removes a revision, and all of its +descendants, from the repository, and updates the working +directory to the first parent of the removed revision. + +The \hgcmd{strip} command saves a backup of the removed changesets in +a bundle, so that they can be reapplied if removed in error. + +Options: +\begin{itemize} +\item[\hgopt{strip}{-b}] Save unrelated changesets that are intermixed + with the stripped changesets in the backup bundle. +\item[\hgopt{strip}{-f}] If a branch has multiple heads, remove all + heads. XXX This should be renamed, and use \texttt{-f} to strip revs + when there are pending changes. +\item[\hgopt{strip}{-n}] Do not save a backup bundle. +\end{itemize} + +\section{MQ file reference} + +\subsection{The \sfilename{series} file} + +The \sfilename{series} file contains a list of the names of all +patches that MQ can apply, one name per line. Leading and trailing +white space on each line is ignored. + +Lines may contain comments. A comment begins with the ``\texttt{\#}'' +character, and extends to the end of the line. Empty lines, and lines +that contain only comments, are ignored. + +You will often need to edit the \sfilename{series} file by hand, hence +the support for comments and empty lines noted above.
For example, +you can comment out a patch temporarily, and \hgxcmd{mq}{qpush} will skip +over that patch when applying patches. You can also change the order +in which patches are applied by reordering their entries in the +\sfilename{series} file. + +Placing the \sfilename{series} file under revision control is also +supported; it is a good idea to place all of the patches that it +refers to under revision control, as well. If you create a patch +directory using the \hgxopt{mq}{qinit}{-c} option to \hgxcmd{mq}{qinit}, this +will be done for you automatically. + +\subsection{The \sfilename{status} file} + +The \sfilename{status} file contains the names and changeset hashes of +all patches that MQ currently has applied. Unlike the +\sfilename{series} file, this file is not intended for editing. You +should not place this file under revision control, or modify it in any +way. It is used by MQ strictly for internal book-keeping. + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/appC-srcinstall.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/appC-srcinstall.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,53 @@ +\chapter{Installing Mercurial from source} +\label{chap:srcinstall} + +\section{On a Unix-like system} +\label{sec:srcinstall:unixlike} + +If you are using a Unix-like system that has a sufficiently recent +version of Python (2.3~or newer) available, it is easy to install +Mercurial from source. +\begin{enumerate} +\item Download a recent source tarball from + \url{http://www.selenic.com/mercurial/download}. +\item Unpack the tarball: + \begin{codesample4} + gzip -dc mercurial-\emph{version}.tar.gz | tar xf - + \end{codesample4} +\item Go into the source directory and run the installer script. This + will build Mercurial and install it in your home directory. 
+ \begin{codesample4} + cd mercurial-\emph{version} + python setup.py install --force --home=\$HOME + \end{codesample4} +\end{enumerate} +Once the install finishes, Mercurial will be in the \texttt{bin} +subdirectory of your home directory. Don't forget to make sure that +this directory is present in your shell's search path. + +You will probably need to set the \envar{PYTHONPATH} environment +variable so that the Mercurial executable can find the rest of the +Mercurial packages. For example, on my laptop, I have set it to +\texttt{/home/bos/lib/python}. The exact path that you will need to +use depends on how Python was built for your system, but should be +easy to figure out. If you're uncertain, look through the output of +the installer script above, and see where the contents of the +\texttt{mercurial} directory were installed to. + +\section{On Windows} + +Building and installing Mercurial on Windows requires a variety of +tools, a fair amount of technical knowledge, and considerable +patience. I very much \emph{do not recommend} this route if you are a +``casual user''. Unless you intend to hack on Mercurial, I strongly +suggest that you use a binary package instead. + +If you are intent on building Mercurial from source on Windows, follow +the ``hard way'' directions on the Mercurial wiki at +\url{http://www.selenic.com/mercurial/wiki/index.cgi/WindowsInstall}, +and expect the process to involve a lot of fiddly work. 
+ +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/appD-license.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/appD-license.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,138 @@ +\chapter{Open Publication License} +\label{cha:opl} + +Version 1.0, 8 June 1999 + +\section{Requirements on both unmodified and modified versions} + +The Open Publication works may be reproduced and distributed in whole +or in part, in any medium physical or electronic, provided that the +terms of this license are adhered to, and that this license or an +incorporation of it by reference (with any options elected by the +author(s) and/or publisher) is displayed in the reproduction. + +Proper form for an incorporation by reference is as follows: + +\begin{quote} + Copyright (c) \emph{year} by \emph{author's name or designee}. This + material may be distributed only subject to the terms and conditions + set forth in the Open Publication License, v\emph{x.y} or later (the + latest version is presently available at + \url{http://www.opencontent.org/openpub/}). +\end{quote} + +The reference must be immediately followed with any options elected by +the author(s) and/or publisher of the document (see +section~\ref{sec:opl:options}). + +Commercial redistribution of Open Publication-licensed material is +permitted. + +Any publication in standard (paper) book form shall require the +citation of the original publisher and author. The publisher and +author's names shall appear on all outer surfaces of the book. On all +outer surfaces of the book the original publisher's name shall be as +large as the title of the work and cited as possessive with respect to +the title. + +\section{Copyright} + +The copyright to each Open Publication is owned by its author(s) or +designee. + +\section{Scope of license} + +The following license terms apply to all Open Publication works, +unless otherwise explicitly stated in the document. 
+ +Mere aggregation of Open Publication works or a portion of an Open +Publication work with other works or programs on the same media shall +not cause this license to apply to those other works. The aggregate +work shall contain a notice specifying the inclusion of the Open +Publication material and appropriate copyright notice. + +\textbf{Severability}. If any part of this license is found to be +unenforceable in any jurisdiction, the remaining portions of the +license remain in force. + +\textbf{No warranty}. Open Publication works are licensed and provided +``as is'' without warranty of any kind, express or implied, including, +but not limited to, the implied warranties of merchantability and +fitness for a particular purpose or a warranty of non-infringement. + +\section{Requirements on modified works} + +All modified versions of documents covered by this license, including +translations, anthologies, compilations and partial documents, must +meet the following requirements: + +\begin{enumerate} +\item The modified version must be labeled as such. +\item The person making the modifications must be identified and the + modifications dated. +\item Acknowledgement of the original author and publisher if + applicable must be retained according to normal academic citation + practices. +\item The location of the original unmodified document must be + identified. +\item The original author's (or authors') name(s) may not be used to + assert or imply endorsement of the resulting document without the + original author's (or authors') permission. 
+\end{enumerate} + +\section{Good-practice recommendations} + +In addition to the requirements of this license, it is requested from +and strongly recommended of redistributors that: + +\begin{enumerate} +\item If you are distributing Open Publication works on hardcopy or + CD-ROM, you provide email notification to the authors of your intent + to redistribute at least thirty days before your manuscript or media + freeze, to give the authors time to provide updated documents. This + notification should describe modifications, if any, made to the + document. +\item All substantive modifications (including deletions) be either + clearly marked up in the document or else described in an attachment + to the document. +\item Finally, while it is not mandatory under this license, it is + considered good form to offer a free copy of any hardcopy and CD-ROM + expression of an Open Publication-licensed work to its author(s). +\end{enumerate} + +\section{License options} +\label{sec:opl:options} + +The author(s) and/or publisher of an Open Publication-licensed +document may elect certain options by appending language to the +reference to or copy of the license. These options are considered part +of the license instance and must be included with the license (or its +incorporation by reference) in derived works. + +\begin{enumerate}[A] +\item To prohibit distribution of substantively modified versions + without the explicit permission of the author(s). ``Substantive + modification'' is defined as a change to the semantic content of the + document, and excludes mere changes in format or typographical + corrections. + + To accomplish this, add the phrase ``Distribution of substantively + modified versions of this document is prohibited without the + explicit permission of the copyright holder.'' to the license + reference or copy. 
+ +\item To prohibit any publication of this work or derivative works in + whole or in part in standard (paper) book form for commercial + purposes is prohibited unless prior permission is obtained from the + copyright holder. + + To accomplish this, add the phrase ``Distribution of the work or + derivative of the work in any standard (paper) book form is + prohibited unless prior permission is obtained from the copyright + holder.'' to the license reference or copy. +\end{enumerate} + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/branch.tex --- a/en/branch.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,392 +0,0 @@ -\chapter{Managing releases and branchy development} -\label{chap:branch} - -Mercurial provides several mechanisms for you to manage a project that -is making progress on multiple fronts at once. To understand these -mechanisms, let's first take a brief look at a fairly normal software -project structure. - -Many software projects issue periodic ``major'' releases that contain -substantial new features. In parallel, they may issue ``minor'' -releases. These are usually identical to the major releases off which -they're based, but with a few bugs fixed. - -In this chapter, we'll start by talking about how to keep records of -project milestones such as releases. We'll then continue on to talk -about the flow of work between different phases of a project, and how -Mercurial can help you to isolate and manage this work. - -\section{Giving a persistent name to a revision} - -Once you decide that you'd like to call a particular revision a -``release'', it's a good idea to record the identity of that revision. -This will let you reproduce that release at a later date, for whatever -purpose you might need at the time (reproducing a bug, porting to a -new platform, etc). 
-\interaction{tag.init} - -Mercurial lets you give a permanent name to any revision using the -\hgcmd{tag} command. Not surprisingly, these names are called -``tags''. -\interaction{tag.tag} - -A tag is nothing more than a ``symbolic name'' for a revision. Tags -exist purely for your convenience, so that you have a handy permanent -way to refer to a revision; Mercurial doesn't interpret the tag names -you use in any way. Neither does Mercurial place any restrictions on -the name of a tag, beyond a few that are necessary to ensure that a -tag can be parsed unambiguously. A tag name cannot contain any of the -following characters: -\begin{itemize} -\item Colon (ASCII 58, ``\texttt{:}'') -\item Carriage return (ASCII 13, ``\Verb+\r+'') -\item Newline (ASCII 10, ``\Verb+\n+'') -\end{itemize} - -You can use the \hgcmd{tags} command to display the tags present in -your repository. In the output, each tagged revision is identified -first by its name, then by revision number, and finally by the unique -hash of the revision. -\interaction{tag.tags} -Notice that \texttt{tip} is listed in the output of \hgcmd{tags}. The -\texttt{tip} tag is a special ``floating'' tag, which always -identifies the newest revision in the repository. - -In the output of the \hgcmd{tags} command, tags are listed in reverse -order, by revision number. This usually means that recent tags are -listed before older tags. It also means that \texttt{tip} is always -going to be the first tag listed in the output of \hgcmd{tags}. - -When you run \hgcmd{log}, if it displays a revision that has tags -associated with it, it will print those tags. -\interaction{tag.log} - -Any time you need to provide a revision~ID to a Mercurial command, the -command will accept a tag name in its place. Internally, Mercurial -will translate your tag name into the corresponding revision~ID, then -use that. 
-\interaction{tag.log.v1.0} - -There's no limit on the number of tags you can have in a repository, -or on the number of tags that a single revision can have. As a -practical matter, it's not a great idea to have ``too many'' (a number -which will vary from project to project), simply because tags are -supposed to help you to find revisions. If you have lots of tags, the -ease of using them to identify revisions diminishes rapidly. - -For example, if your project has milestones as frequent as every few -days, it's perfectly reasonable to tag each one of those. But if you -have a continuous build system that makes sure every revision can be -built cleanly, you'd be introducing a lot of noise if you were to tag -every clean build. Instead, you could tag failed builds (on the -assumption that they're rare!), or simply not use tags to track -buildability. - -If you want to remove a tag that you no longer want, use -\hgcmdargs{tag}{--remove}. -\interaction{tag.remove} -You can also modify a tag at any time, so that it identifies a -different revision, by simply issuing a new \hgcmd{tag} command. -You'll have to use the \hgopt{tag}{-f} option to tell Mercurial that -you \emph{really} want to update the tag. -\interaction{tag.replace} -There will still be a permanent record of the previous identity of the -tag, but Mercurial will no longer use it. There's thus no penalty to -tagging the wrong revision; all you have to do is turn around and tag -the correct revision once you discover your error. - -Mercurial stores tags in a normal revision-controlled file in your -repository. If you've created any tags, you'll find them in a file -named \sfilename{.hgtags}. When you run the \hgcmd{tag} command, -Mercurial modifies this file, then automatically commits the change to -it. This means that every time you run \hgcmd{tag}, you'll see a -corresponding changeset in the output of \hgcmd{log}. 
-\interaction{tag.tip} - -\subsection{Handling tag conflicts during a merge} - -You won't often need to care about the \sfilename{.hgtags} file, but -it sometimes makes its presence known during a merge. The format of -the file is simple: it consists of a series of lines. Each line -starts with a changeset hash, followed by a space, followed by the -name of a tag. - -If you're resolving a conflict in the \sfilename{.hgtags} file during -a merge, there's one twist to modifying the \sfilename{.hgtags} file: -when Mercurial is parsing the tags in a repository, it \emph{never} -reads the working copy of the \sfilename{.hgtags} file. Instead, it -reads the \emph{most recently committed} revision of the file. - -An unfortunate consequence of this design is that you can't actually -verify that your merged \sfilename{.hgtags} file is correct until -\emph{after} you've committed a change. So if you find yourself -resolving a conflict on \sfilename{.hgtags} during a merge, be sure to -run \hgcmd{tags} after you commit. If it finds an error in the -\sfilename{.hgtags} file, it will report the location of the error, -which you can then fix and commit. You should then run \hgcmd{tags} -again, just to be sure that your fix is correct. - -\subsection{Tags and cloning} - -You may have noticed that the \hgcmd{clone} command has a -\hgopt{clone}{-r} option that lets you clone an exact copy of the -repository as of a particular changeset. The new clone will not -contain any project history that comes after the revision you -specified. This has an interaction with tags that can surprise the -unwary. - -Recall that a tag is stored as a revision to the \sfilename{.hgtags} -file, so that when you create a tag, the changeset in which it's -recorded necessarily refers to an older changeset. When you run -\hgcmdargs{clone}{-r foo} to clone a repository as of tag -\texttt{foo}, the new clone \emph{will not contain the history that - created the tag} that you used to clone the repository. 
The result -is that you'll get exactly the right subset of the project's history -in the new repository, but \emph{not} the tag you might have expected. - -\subsection{When permanent tags are too much} - -Since Mercurial's tags are revision controlled and carried around with -a project's history, everyone you work with will see the tags you -create. But giving names to revisions has uses beyond simply noting -that revision \texttt{4237e45506ee} is really \texttt{v2.0.2}. If -you're trying to track down a subtle bug, you might want a tag to -remind you of something like ``Anne saw the symptoms with this -revision''. - -For cases like this, what you might want to use are \emph{local} tags. -You can create a local tag with the \hgopt{tag}{-l} option to the -\hgcmd{tag} command. This will store the tag in a file called -\sfilename{.hg/localtags}. Unlike \sfilename{.hgtags}, -\sfilename{.hg/localtags} is not revision controlled. Any tags you -create using \hgopt{tag}{-l} remain strictly local to the repository -you're currently working in. - -\section{The flow of changes---big picture vs. little} - -To return to the outline I sketched at the beginning of a chapter, -let's think about a project that has multiple concurrent pieces of -work under development at once. - -There might be a push for a new ``main'' release; a new minor bugfix -release to the last main release; and an unexpected ``hot fix'' to an -old release that is now in maintenance mode. - -The usual way people refer to these different concurrent directions of -development is as ``branches''. However, we've already seen numerous -times that Mercurial treats \emph{all of history} as a series of -branches and merges. Really, what we have here is two ideas that are -peripherally related, but which happen to share a name. -\begin{itemize} -\item ``Big picture'' branches represent the sweep of a project's - evolution; people give them names, and talk about them in - conversation. 
-\item ``Little picture'' branches are artefacts of the day-to-day - activity of developing and merging changes. They expose the - narrative of how the code was developed. -\end{itemize} - -\section{Managing big-picture branches in repositories} - -The easiest way to isolate a ``big picture'' branch in Mercurial is in -a dedicated repository. If you have an existing shared -repository---let's call it \texttt{myproject}---that reaches a ``1.0'' -milestone, you can start to prepare for future maintenance releases on -top of version~1.0 by tagging the revision from which you prepared -the~1.0 release. -\interaction{branch-repo.tag} -You can then clone a new shared \texttt{myproject-1.0.1} repository as -of that tag. -\interaction{branch-repo.clone} - -Afterwards, if someone needs to work on a bug fix that ought to go -into an upcoming~1.0.1 minor release, they clone the -\texttt{myproject-1.0.1} repository, make their changes, and push them -back. -\interaction{branch-repo.bugfix} -Meanwhile, development for the next major release can continue, -isolated and unabated, in the \texttt{myproject} repository. -\interaction{branch-repo.new} - -\section{Don't repeat yourself: merging across branches} - -In many cases, if you have a bug to fix on a maintenance branch, the -chances are good that the bug exists on your project's main branch -(and possibly other maintenance branches, too). It's a rare developer -who wants to fix the same bug multiple times, so let's look at a few -ways that Mercurial can help you to manage these bugfixes without -duplicating your work. - -In the simplest instance, all you need to do is pull changes from your -maintenance branch into your local clone of the target branch. -\interaction{branch-repo.pull} -You'll then need to merge the heads of the two branches, and push back -to the main branch. 
-\interaction{branch-repo.merge} - -\section{Naming branches within one repository} - -In most instances, isolating branches in repositories is the right -approach. Its simplicity makes it easy to understand; and so it's -hard to make mistakes. There's a one-to-one relationship between -branches you're working in and directories on your system. This lets -you use normal (non-Mercurial-aware) tools to work on files within a -branch/repository. - -If you're more in the ``power user'' category (\emph{and} your -collaborators are too), there is an alternative way of handling -branches that you can consider. I've already mentioned the -human-level distinction between ``small picture'' and ``big picture'' -branches. While Mercurial works with multiple ``small picture'' -branches in a repository all the time (for example after you pull -changes in, but before you merge them), it can \emph{also} work with -multiple ``big picture'' branches. - -The key to working this way is that Mercurial lets you assign a -persistent \emph{name} to a branch. There always exists a branch -named \texttt{default}. Even before you start naming branches -yourself, you can find traces of the \texttt{default} branch if you -look for them. - -As an example, when you run the \hgcmd{commit} command, and it pops up -your editor so that you can enter a commit message, look for a line -that contains the text ``\texttt{HG: branch default}'' at the bottom. -This is telling you that your commit will occur on the branch named -\texttt{default}. - -To start working with named branches, use the \hgcmd{branches} -command. This command lists the named branches already present in -your repository, telling you which changeset is the tip of each. -\interaction{branch-named.branches} -Since you haven't created any named branches yet, the only one that -exists is \texttt{default}. - -To find out what the ``current'' branch is, run the \hgcmd{branch} -command, giving it no arguments. 
This tells you what branch the -parent of the current changeset is on. -\interaction{branch-named.branch} - -To create a new branch, run the \hgcmd{branch} command again. This -time, give it one argument: the name of the branch you want to create. -\interaction{branch-named.create} - -After you've created a branch, you might wonder what effect the -\hgcmd{branch} command has had. What do the \hgcmd{status} and -\hgcmd{tip} commands report? -\interaction{branch-named.status} -Nothing has changed in the working directory, and there's been no new -history created. As this suggests, running the \hgcmd{branch} command -has no permanent effect; it only tells Mercurial what branch name to -use the \emph{next} time you commit a changeset. - -When you commit a change, Mercurial records the name of the branch on -which you committed. Once you've switched from the \texttt{default} -branch to another and committed, you'll see the name of the new branch -show up in the output of \hgcmd{log}, \hgcmd{tip}, and other commands -that display the same kind of output. -\interaction{branch-named.commit} -The \hgcmd{log}-like commands will print the branch name of every -changeset that's not on the \texttt{default} branch. As a result, if -you never use named branches, you'll never see this information. - -Once you've named a branch and committed a change with that name, -every subsequent commit that descends from that change will inherit -the same branch name. You can change the name of a branch at any -time, using the \hgcmd{branch} command. -\interaction{branch-named.rebranch} -In practice, this is something you won't do very often, as branch -names tend to have fairly long lifetimes. (This isn't a rule, just an -observation.) 
- -\section{Dealing with multiple named branches in a repository} - -If you have more than one named branch in a repository, Mercurial will -remember the branch that your working directory on when you start a -command like \hgcmd{update} or \hgcmdargs{pull}{-u}. It will update -the working directory to the tip of this branch, no matter what the -``repo-wide'' tip is. To update to a revision that's on a different -named branch, you may need to use the \hgopt{update}{-C} option to -\hgcmd{update}. - -This behaviour is a little subtle, so let's see it in action. First, -let's remind ourselves what branch we're currently on, and what -branches are in our repository. -\interaction{branch-named.parents} -We're on the \texttt{bar} branch, but there also exists an older -\hgcmd{foo} branch. - -We can \hgcmd{update} back and forth between the tips of the -\texttt{foo} and \texttt{bar} branches without needing to use the -\hgopt{update}{-C} option, because this only involves going backwards -and forwards linearly through our change history. -\interaction{branch-named.update-switchy} - -If we go back to the \texttt{foo} branch and then run \hgcmd{update}, -it will keep us on \texttt{foo}, not move us to the tip of -\texttt{bar}. -\interaction{branch-named.update-nothing} - -Committing a new change on the \texttt{foo} branch introduces a new -head. -\interaction{branch-named.foo-commit} - -\section{Branch names and merging} - -As you've probably noticed, merges in Mercurial are not symmetrical. -Let's say our repository has two heads, 17 and 23. If I -\hgcmd{update} to 17 and then \hgcmd{merge} with 23, Mercurial records -17 as the first parent of the merge, and 23 as the second. Whereas if -I \hgcmd{update} to 23 and then \hgcmd{merge} with 17, it records 23 -as the first parent, and 17 as the second. - -This affects Mercurial's choice of branch name when you merge. 
After -a merge, Mercurial will retain the branch name of the first parent -when you commit the result of the merge. If your first parent's -branch name is \texttt{foo}, and you merge with \texttt{bar}, the -branch name will still be \texttt{foo} after you merge. - -It's not unusual for a repository to contain multiple heads, each with -the same branch name. Let's say I'm working on the \texttt{foo} -branch, and so are you. We commit different changes; I pull your -changes; I now have two heads, each claiming to be on the \texttt{foo} -branch. The result of a merge will be a single head on the -\texttt{foo} branch, as you might hope. - -But if I'm working on the \texttt{bar} branch, and I merge work from -the \texttt{foo} branch, the result will remain on the \texttt{bar} -branch. -\interaction{branch-named.merge} - -To give a more concrete example, if I'm working on the -\texttt{bleeding-edge} branch, and I want to bring in the latest fixes -from the \texttt{stable} branch, Mercurial will choose the ``right'' -(\texttt{bleeding-edge}) branch name when I pull and merge from -\texttt{stable}. - -\section{Branch naming is generally useful} - -You shouldn't think of named branches as applicable only to situations -where you have multiple long-lived branches cohabiting in a single -repository. They're very useful even in the one-branch-per-repository -case. - -In the simplest case, giving a name to each branch gives you a -permanent record of which branch a changeset originated on. This -gives you more context when you're trying to follow the history of a -long-lived branchy project. - -If you're working with shared repositories, you can set up a -\hook{pretxnchangegroup} hook on each that will block incoming changes -that have the ``wrong'' branch name. This provides a simple, but -effective, defence against people accidentally pushing changes from a -``bleeding edge'' branch to a ``stable'' branch. Such a hook might -look like this inside the shared repo's \hgrc. 
-\begin{codesample2} - [hooks] - pretxnchangegroup.branch = hg heads --template '{branches} ' | grep mybranch -\end{codesample2} - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/ch00-preface.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch00-preface.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,67 @@ +\chapter*{Preface} +\addcontentsline{toc}{chapter}{Preface} +\label{chap:preface} + +Distributed revision control is a relatively new territory, and has +thus far grown due to people's willingness to strike out into +ill-charted territory. + +I am writing a book about distributed revision control because I +believe that it is an important subject that deserves a field guide. +I chose to write about Mercurial because it is the easiest tool to +learn the terrain with, and yet it scales to the demands of real, +challenging environments where many other revision control tools fail. + +\section{This book is a work in progress} + +I am releasing this book while I am still writing it, in the hope that +it will prove useful to others. I also hope that readers will +contribute as they see fit. + +\section{About the examples in this book} + +This book takes an unusual approach to code samples. Every example is +``live''---each one is actually the result of a shell script that +executes the Mercurial commands you see. Every time an image of the +book is built from its sources, all the example scripts are +automatically run, and their current results compared against their +expected results. + +The advantage of this approach is that the examples are always +accurate; they describe \emph{exactly} the behaviour of the version of +Mercurial that's mentioned at the front of the book. If I update the +version of Mercurial that I'm documenting, and the output of some +command changes, the build fails. 
+ +There is a small disadvantage to this approach, which is that the +dates and times you'll see in examples tend to be ``squashed'' +together in a way that they wouldn't be if the same commands were +being typed by a human. Where a human can issue no more than one +command every few seconds, with any resulting timestamps +correspondingly spread out, my automated example scripts run many +commands in one second. + +As an instance of this, several consecutive commits in an example can +show up as having occurred during the same second. You can see this +occur in the \hgext{bisect} example in section~\ref{sec:undo:bisect}, +for instance. + +So when you're reading examples, don't place too much weight on the +dates or times you see in the output of commands. But \emph{do} be +confident that the behaviour you're seeing is consistent and +reproducible. + +\section{Colophon---this book is Free} + +This book is licensed under the Open Publication License, and is +produced entirely using Free Software tools. It is typeset with +\LaTeX{}; illustrations are drawn and rendered with +\href{http://www.inkscape.org/}{Inkscape}. + +The complete source code for this book is published as a Mercurial +repository, at \url{http://hg.serpentine.com/mercurial/book}. + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/ch01-intro.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch01-intro.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,561 @@ +\chapter{Introduction} +\label{chap:intro} + +\section{About revision control} + +Revision control is the process of managing multiple versions of a +piece of information. In its simplest form, this is something that +many people do by hand: every time you modify a file, save it under a +new name that contains a number, each one higher than the number of +the preceding version. 
+ +Manually managing multiple versions of even a single file is an +error-prone task, though, so software tools to help automate this +process have long been available. The earliest automated revision +control tools were intended to help a single user to manage revisions +of a single file. Over the past few decades, the scope of revision +control tools has expanded greatly; they now manage multiple files, +and help multiple people to work together. The best modern revision +control tools have no problem coping with thousands of people working +together on projects that consist of hundreds of thousands of files. + +\subsection{Why use revision control?} + +There are a number of reasons why you or your team might want to use +an automated revision control tool for a project. +\begin{itemize} +\item It will track the history and evolution of your project, so you + don't have to. For every change, you'll have a log of \emph{who} + made it; \emph{why} they made it; \emph{when} they made it; and + \emph{what} the change was. +\item When you're working with other people, revision control software + makes it easier for you to collaborate. For example, when people + more or less simultaneously make potentially incompatible changes, + the software will help you to identify and resolve those conflicts. +\item It can help you to recover from mistakes. If you make a change + that later turns out to be in error, you can revert to an earlier + version of one or more files. In fact, a \emph{really} good + revision control tool will even help you to efficiently figure out + exactly when a problem was introduced (see + section~\ref{sec:undo:bisect} for details). +\item It will help you to work simultaneously on, and manage the drift + between, multiple versions of your project. +\end{itemize} +Most of these reasons are equally valid---at least in theory---whether +you're working on a project by yourself, or with a hundred other +people. 
+ +A key question about the practicality of revision control at these two +different scales (``lone hacker'' and ``huge team'') is how its +\emph{benefits} compare to its \emph{costs}. A revision control tool +that's difficult to understand or use is going to impose a high cost. + +A five-hundred-person project is likely to collapse under its own +weight almost immediately without a revision control tool and process. +In this case, the cost of using revision control might hardly seem +worth considering, since \emph{without} it, failure is almost +guaranteed. + +On the other hand, a one-person ``quick hack'' might seem like a poor +place to use a revision control tool, because surely the cost of using +one must be close to the overall cost of the project. Right? + +Mercurial uniquely supports \emph{both} of these scales of +development. You can learn the basics in just a few minutes, and due +to its low overhead, you can apply revision control to the smallest of +projects with ease. Its simplicity means you won't have a lot of +abstruse concepts or command sequences competing for mental space with +whatever you're \emph{really} trying to do. At the same time, +Mercurial's high performance and peer-to-peer nature let you scale +painlessly to handle large projects. + +No revision control tool can rescue a poorly run project, but a good +choice of tools can make a huge difference to the fluidity with which +you can work on a project. + +\subsection{The many names of revision control} + +Revision control is a diverse field, so much so that it doesn't +actually have a single name or acronym. 
Here are a few of the more
+common names and acronyms you'll encounter:
+\begin{itemize}
+\item Revision control (RCS)
+\item Software configuration management (SCM), or configuration management
+\item Source code management
+\item Source code control, or source control
+\item Version control (VCS)
+\end{itemize}
+Some people claim that these terms actually have different meanings,
+but in practice they overlap so much that there's no agreed or even
+useful way to tease them apart.
+
+\section{A short history of revision control}
+
+The best known of the old-time revision control tools is SCCS (Source
+Code Control System), which Marc Rochkind wrote at Bell Labs, in the
+early 1970s. SCCS operated on individual files, and required every
+person working on a project to have access to a shared workspace on a
+single system. Only one person could modify a file at any time;
+arbitration for access to files was via locks. It was common for
+people to lock files, and later forget to unlock them, preventing
+anyone else from modifying those files without the help of an
+administrator.
+
+Walter Tichy developed a free alternative to SCCS in the early 1980s;
+he called his program RCS (Revision Control System). Like SCCS, RCS
+required developers to work in a single shared workspace, and to lock
+files to prevent multiple people from modifying them simultaneously.
+
+Later in the 1980s, Dick Grune used RCS as a building block for a set
+of shell scripts he initially called cmt, but then renamed to CVS
+(Concurrent Versions System). The big innovation of CVS was that it
+let developers work simultaneously and somewhat independently in their
+own personal workspaces. The personal workspaces prevented developers
+from stepping on each other's toes all the time, as was common with
+SCCS and RCS. Each developer had a copy of every project file, and
+could modify their copies independently. They had to merge their
+edits prior to committing changes to the central repository.
+ +Brian Berliner took Grune's original scripts and rewrote them in~C, +releasing in 1989 the code that has since developed into the modern +version of CVS. CVS subsequently acquired the ability to operate over +a network connection, giving it a client/server architecture. CVS's +architecture is centralised; only the server has a copy of the history +of the project. Client workspaces just contain copies of recent +versions of the project's files, and a little metadata to tell them +where the server is. CVS has been enormously successful; it is +probably the world's most widely used revision control system. + +In the early 1990s, Sun Microsystems developed an early distributed +revision control system, called TeamWare. A TeamWare workspace +contains a complete copy of the project's history. TeamWare has no +notion of a central repository. (CVS relied upon RCS for its history +storage; TeamWare used SCCS.) + +As the 1990s progressed, awareness grew of a number of problems with +CVS. It records simultaneous changes to multiple files individually, +instead of grouping them together as a single logically atomic +operation. It does not manage its file hierarchy well; it is easy to +make a mess of a repository by renaming files and directories. Worse, +its source code is difficult to read and maintain, which made the +``pain level'' of fixing these architectural problems prohibitive. + +In 2001, Jim Blandy and Karl Fogel, two developers who had worked on +CVS, started a project to replace it with a tool that would have a +better architecture and cleaner code. The result, Subversion, does +not stray from CVS's centralised client/server model, but it adds +multi-file atomic commits, better namespace management, and a number +of other features that make it a generally better tool than CVS. +Since its initial release, it has rapidly grown in popularity. 
+ +More or less simultaneously, Graydon Hoare began working on an +ambitious distributed revision control system that he named Monotone. +While Monotone addresses many of CVS's design flaws and has a +peer-to-peer architecture, it goes beyond earlier (and subsequent) +revision control tools in a number of innovative ways. It uses +cryptographic hashes as identifiers, and has an integral notion of +``trust'' for code from different sources. + +Mercurial began life in 2005. While a few aspects of its design are +influenced by Monotone, Mercurial focuses on ease of use, high +performance, and scalability to very large projects. + +\section{Trends in revision control} + +There has been an unmistakable trend in the development and use of +revision control tools over the past four decades, as people have +become familiar with the capabilities of their tools and constrained +by their limitations. + +The first generation began by managing single files on individual +computers. Although these tools represented a huge advance over +ad-hoc manual revision control, their locking model and reliance on a +single computer limited them to small, tightly-knit teams. + +The second generation loosened these constraints by moving to +network-centered architectures, and managing entire projects at a +time. As projects grew larger, they ran into new problems. With +clients needing to talk to servers very frequently, server scaling +became an issue for large projects. An unreliable network connection +could prevent remote users from being able to talk to the server at +all. As open source projects started making read-only access +available anonymously to anyone, people without commit privileges +found that they could not use the tools to interact with a project in +a natural way, as they could not record their changes. + +The current generation of revision control tools is peer-to-peer in +nature. 
All of these systems have dropped the dependency on a single
+central server, and allow people to distribute their revision control
+data to where it's actually needed. Collaboration over the Internet
+has moved from being constrained by technology to being a matter of choice and
+consensus. Modern tools can operate offline indefinitely and
+autonomously, with a network connection only needed when syncing
+changes with another repository.
+
+\section{A few of the advantages of distributed revision control}
+
+Even though distributed revision control tools have for several years
+been as robust and usable as their previous-generation counterparts,
+people using older tools have not yet necessarily woken up to their
+advantages. There are a number of ways in which distributed tools
+shine relative to centralised ones.
+
+For an individual developer, distributed tools are almost always much
+faster than centralised tools. This is for a simple reason: a
+centralised tool needs to talk over the network for many common
+operations, because most metadata is stored in a single copy on the
+central server. A distributed tool stores all of its metadata
+locally. All else being equal, talking over the network adds overhead
+to a centralised tool. Don't underestimate the value of a snappy,
+responsive tool: you're going to spend a lot of time interacting with
+your revision control software.
+
+Distributed tools are indifferent to the vagaries of your server
+infrastructure, again because they replicate metadata to so many
+locations. If you use a centralised system and your server catches
+fire, you'd better hope that your backup media are reliable, and that
+your last backup was recent and actually worked. With a distributed
+tool, you have many backups available on every contributor's computer.
+
+The reliability of your network will affect distributed tools far less
+than it will centralised tools.
You can't even use a centralised tool +without a network connection, except for a few highly constrained +commands. With a distributed tool, if your network connection goes +down while you're working, you may not even notice. The only thing +you won't be able to do is talk to repositories on other computers, +something that is relatively rare compared with local operations. If +you have a far-flung team of collaborators, this may be significant. + +\subsection{Advantages for open source projects} + +If you take a shine to an open source project and decide that you +would like to start hacking on it, and that project uses a distributed +revision control tool, you are at once a peer with the people who +consider themselves the ``core'' of that project. If they publish +their repositories, you can immediately copy their project history, +start making changes, and record your work, using the same tools in +the same ways as insiders. By contrast, with a centralised tool, you +must use the software in a ``read only'' mode unless someone grants +you permission to commit changes to their central server. Until then, +you won't be able to record changes, and your local modifications will +be at risk of corruption any time you try to update your client's view +of the repository. + +\subsubsection{The forking non-problem} + +It has been suggested that distributed revision control tools pose +some sort of risk to open source projects because they make it easy to +``fork'' the development of a project. A fork happens when there are +differences in opinion or attitude between groups of developers that +cause them to decide that they can't work together any longer. Each +side takes a more or less complete copy of the project's source code, +and goes off in its own direction. + +Sometimes the camps in a fork decide to reconcile their differences. 
+With a centralised revision control system, the \emph{technical}
+process of reconciliation is painful, and has to be performed largely
+by hand. You have to decide whose revision history is going to
+``win'', and graft the other team's changes into the tree somehow.
+This usually loses some or all of one side's revision history.
+
+What distributed tools do with respect to forking is make forking
+the \emph{only} way to develop a project. Every single change that
+you make is potentially a fork point. The great strength of this
+approach is that a distributed revision control tool has to be really
+good at \emph{merging} forks, because forks are absolutely
+fundamental: they happen all the time.
+
+If every piece of work that everybody does, all the time, is framed in
+terms of forking and merging, then what the open source world refers
+to as a ``fork'' becomes \emph{purely} a social issue. If anything,
+distributed tools \emph{lower} the likelihood of a fork:
+\begin{itemize}
+\item They eliminate the social distinction that centralised tools
+ impose: that between insiders (people with commit access) and
+ outsiders (people without).
+\item They make it easier to reconcile after a social fork, because
+ all that's involved from the perspective of the revision control
+ software is just another merge.
+\end{itemize}
+
+Some people resist distributed tools because they want to retain tight
+control over their projects, and they believe that centralised tools
+give them this control. However, if you're of this belief, and you
+publish your CVS or Subversion repositories publicly, there are
+plenty of tools available that can pull out your entire project's
+history (albeit slowly) and recreate it somewhere that you don't
+control. So while your control in this case is illusory, you are
+forgoing the ability to fluidly collaborate with whatever people feel
+compelled to mirror and fork your history.
+ +\subsection{Advantages for commercial projects} + +Many commercial projects are undertaken by teams that are scattered +across the globe. Contributors who are far from a central server will +see slower command execution and perhaps less reliability. Commercial +revision control systems attempt to ameliorate these problems with +remote-site replication add-ons that are typically expensive to buy +and cantankerous to administer. A distributed system doesn't suffer +from these problems in the first place. Better yet, you can easily +set up multiple authoritative servers, say one per site, so that +there's no redundant communication between repositories over expensive +long-haul network links. + +Centralised revision control systems tend to have relatively low +scalability. It's not unusual for an expensive centralised system to +fall over under the combined load of just a few dozen concurrent +users. Once again, the typical response tends to be an expensive and +clunky replication facility. Since the load on a central server---if +you have one at all---is many times lower with a distributed +tool (because all of the data is replicated everywhere), a single +cheap server can handle the needs of a much larger team, and +replication to balance load becomes a simple matter of scripting. + +If you have an employee in the field, troubleshooting a problem at a +customer's site, they'll benefit from distributed revision control. +The tool will let them generate custom builds, try different fixes in +isolation from each other, and search efficiently through history for +the sources of bugs and regressions in the customer's environment, all +without needing to connect to your company's network. + +\section{Why choose Mercurial?} + +Mercurial has a unique set of properties that make it a particularly +good choice as a revision control system. +\begin{itemize} +\item It is easy to learn and use. +\item It is lightweight. +\item It scales excellently. 
+\item It is easy to customise.
+\end{itemize}
+
+If you are at all familiar with revision control systems, you should
+be able to get up and running with Mercurial in less than five
+minutes. Even if not, it will take no more than a few minutes
+longer. Mercurial's command and feature sets are generally uniform
+and consistent, so you can keep track of a few general rules instead
+of a host of exceptions.
+
+On a small project, you can start working with Mercurial in moments.
+Creating new changes and branches; transferring changes around
+(whether locally or over a network); and history and status operations
+are all fast. Mercurial attempts to stay nimble and largely out of
+your way by combining low cognitive overhead with blazingly fast
+operations.
+
+The usefulness of Mercurial is not limited to small projects: it is
+used by projects with hundreds to thousands of contributors, whose
+repositories contain tens of thousands of files and hundreds of megabytes of
+source code.
+
+If the core functionality of Mercurial is not enough for you, it's
+easy to build on. Mercurial is well suited to scripting tasks, and
+its clean internals and implementation in Python make it easy to add
+features in the form of extensions. There are a number of popular and
+useful extensions already available, ranging from helping to identify
+bugs to improving performance.
+
+\section{Mercurial compared with other tools}
+
+Before you read on, please understand that this section necessarily
+reflects my own experiences, interests, and (dare I say it) biases. I
+have used every one of the revision control tools listed below, in
+most cases for several years at a time.
+
+
+\subsection{Subversion}
+
+Subversion is a popular revision control tool, developed to replace
+CVS. It has a centralised client/server architecture.
+
+Subversion and Mercurial have similarly named commands for performing
+the same operations, so if you're familiar with one, it is easy to
+learn to use the other.
Both tools are portable to all popular
+operating systems.
+
+Prior to version 1.5, Subversion had no useful support for merges.
+At the time of writing, its merge tracking capability is new, and known to be
+\href{http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword}{complicated
+ and buggy}.
+
+Mercurial has a substantial performance advantage over Subversion on
+every revision control operation I have benchmarked. I have measured
+its advantage as ranging from a factor of two to a factor of six when
+compared with Subversion~1.4.3's \emph{ra\_local} file store, which is
+the fastest access method available. In more realistic deployments
+involving a network-based store, Subversion will be at a substantially
+larger disadvantage. Because many Subversion commands must talk to
+the server and Subversion does not have useful replication facilities,
+server capacity and network bandwidth become bottlenecks for modestly
+large projects.
+
+Additionally, Subversion incurs substantial storage overhead to avoid
+network transactions for a few common operations, such as finding
+modified files (\texttt{status}) and displaying modifications against
+the current revision (\texttt{diff}). As a result, a Subversion
+working copy is often the same size as, or larger than, a Mercurial
+repository and working directory, even though the Mercurial repository
+contains a complete history of the project.
+
+Subversion is widely supported by third-party tools. Mercurial
+currently lags considerably in this area. This gap is closing,
+however, and indeed some of Mercurial's GUI tools now outshine their
+Subversion equivalents. Like Mercurial, Subversion has an excellent
+user manual.
+
+Because Subversion doesn't store revision history on the client, it is
+well suited to managing projects that deal with lots of large, opaque
+binary files.
If you check in fifty revisions to an incompressible
+10MB file, Subversion's client-side space usage stays constant. The
+space used by any distributed SCM will grow rapidly in proportion to
+the number of revisions, because the differences between successive revisions
+are large.
+
+In addition, it's often difficult or, more usually, impossible to
+merge different versions of a binary file. Subversion's ability to
+let a user lock a file, so that they temporarily have the exclusive
+right to commit changes to it, can be a significant advantage to a
+project where binary files are widely used.
+
+Mercurial can import revision history from a Subversion repository.
+It can also export revision history to a Subversion repository. This
+makes it easy to ``test the waters'' and use Mercurial and Subversion
+in parallel before deciding to switch. History conversion is
+incremental, so you can perform an initial conversion, then small
+additional conversions afterwards to bring in new changes.
+
+
+\subsection{Git}
+
+Git is a distributed revision control tool that was developed for
+managing the Linux kernel source tree. Like Mercurial, its early
+design was somewhat influenced by Monotone.
+
+Git has a very large command set, with version~1.5.0 providing~139
+individual commands. It has something of a reputation for being
+difficult to learn. Compared to Git, Mercurial has a strong focus on
+simplicity.
+
+In terms of performance, Git is extremely fast. In several cases, it
+is faster than Mercurial, at least on Linux, while Mercurial performs
+better on other operations. However, on Windows, the performance and
+general level of support that Git provides is, at the time of writing,
+far behind that of Mercurial.
+
+While a Mercurial repository needs no maintenance, a Git repository
+requires frequent manual ``repacks'' of its metadata. Without these,
+performance degrades, while space usage grows rapidly.
A server that +contains many Git repositories that are not rigorously and frequently +repacked will become heavily disk-bound during backups, and there have +been instances of daily backups taking far longer than~24 hours as a +result. A freshly packed Git repository is slightly smaller than a +Mercurial repository, but an unpacked repository is several orders of +magnitude larger. + +The core of Git is written in C. Many Git commands are implemented as +shell or Perl scripts, and the quality of these scripts varies widely. +I have encountered several instances where scripts charged along +blindly in the presence of errors that should have been fatal. + +Mercurial can import revision history from a Git repository. + + +\subsection{CVS} + +CVS is probably the most widely used revision control tool in the +world. Due to its age and internal untidiness, it has been only +lightly maintained for many years. + +It has a centralised client/server architecture. It does not group +related file changes into atomic commits, making it easy for people to +``break the build'': one person can successfully commit part of a +change and then be blocked by the need for a merge, causing other +people to see only a portion of the work they intended to do. This +also affects how you work with project history. If you want to see +all of the modifications someone made as part of a task, you will need +to manually inspect the descriptions and timestamps of the changes +made to each file involved (if you even know what those files were). + +CVS has a muddled notion of tags and branches that I will not attempt +to even describe. It does not support renaming of files or +directories well, making it easy to corrupt a repository. It has +almost no internal consistency checking capabilities, so it is usually +not even possible to tell whether or how a repository is corrupt. I +would not recommend CVS for any project, existing or new. + +Mercurial can import CVS revision history. 
However, there are a few +caveats that apply; these are true of every other revision control +tool's CVS importer, too. Due to CVS's lack of atomic changes and +unversioned filesystem hierarchy, it is not possible to reconstruct +CVS history completely accurately; some guesswork is involved, and +renames will usually not show up. Because a lot of advanced CVS +administration has to be done by hand and is hence error-prone, it's +common for CVS importers to run into multiple problems with corrupted +repositories (completely bogus revision timestamps and files that have +remained locked for over a decade are just two of the less interesting +problems I can recall from personal experience). + +Mercurial can import revision history from a CVS repository. + + +\subsection{Commercial tools} + +Perforce has a centralised client/server architecture, with no +client-side caching of any data. Unlike modern revision control +tools, Perforce requires that a user run a command to inform the +server about every file they intend to edit. + +The performance of Perforce is quite good for small teams, but it +falls off rapidly as the number of users grows beyond a few dozen. +Modestly large Perforce installations require the deployment of +proxies to cope with the load their users generate. + + +\subsection{Choosing a revision control tool} + +With the exception of CVS, all of the tools listed above have unique +strengths that suit them to particular styles of work. There is no +single revision control tool that is best in all situations. + +As an example, Subversion is a good choice for working with frequently +edited binary files, due to its centralised nature and support for +file locking. + +I personally find Mercurial's properties of simplicity, performance, +and good merge support to be a compelling combination that has served +me well for several years. 
+
+
+\section{Switching from another tool to Mercurial}
+
+Mercurial is bundled with an extension named \hgext{convert}, which
+can incrementally import revision history from several other revision
+control tools. By ``incremental'', I mean that you can convert all of
+a project's history to date in one go, then rerun the conversion later
+to obtain new changes that happened after the initial conversion.
+
+The revision control tools supported by \hgext{convert} are as
+follows:
+\begin{itemize}
+\item Subversion
+\item CVS
+\item Git
+\item Darcs
+\end{itemize}
+
+In addition, \hgext{convert} can export changes from Mercurial to
+Subversion. This makes it possible to try Subversion and Mercurial in
+parallel before committing to a switchover, without risking the loss
+of any work.
+
+The \hgxcmd{convert}{convert} command is easy to use. Simply point it
+at the path or URL of the source repository, optionally give it the
+name of the destination repository, and it will start working. After
+the initial conversion, just run the same command again to import new
+changes.
+
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: "00book"
+%%% End:
diff -r bc14f94e726a -r 5cd47f721686 en/ch02-tour-basic.tex
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/en/ch02-tour-basic.tex Thu Jan 29 22:56:27 2009 -0800
@@ -0,0 +1,624 @@
+\chapter{A tour of Mercurial: the basics}
+\label{chap:tour-basic}
+
+\section{Installing Mercurial on your system}
+\label{sec:tour:install}
+
+Prebuilt binary packages of Mercurial are available for every popular
+operating system. These make it easy to start using Mercurial on your
+computer immediately.
+
+\subsection{Linux}
+
+Because each Linux distribution has its own packaging tools, policies,
+and rate of development, it's difficult to give a comprehensive set of
+instructions on how to install Mercurial binaries.
The version of
+Mercurial that you will end up with can vary depending on how active
+the person is who maintains the package for your distribution.
+
+To keep things simple, I will focus on installing Mercurial from the
+command line under the most popular Linux distributions. Most of
+these distributions provide graphical package managers that will let
+you install Mercurial with a single click; the package name to look
+for is \texttt{mercurial}.
+
+\begin{itemize}
+\item[Debian]
+ \begin{codesample4}
+ apt-get install mercurial
+ \end{codesample4}
+
+\item[Fedora Core]
+ \begin{codesample4}
+ yum install mercurial
+ \end{codesample4}
+
+\item[Gentoo]
+ \begin{codesample4}
+ emerge mercurial
+ \end{codesample4}
+
+\item[OpenSUSE]
+ \begin{codesample4}
+ zypper install mercurial
+ \end{codesample4}
+
+\item[Ubuntu] Ubuntu's Mercurial package is based on Debian's. To
+ install it, run the following command.
+ \begin{codesample4}
+ apt-get install mercurial
+ \end{codesample4}
+ The Ubuntu package for Mercurial tends to lag behind the Debian
+ version by a considerable time margin (at the time of writing, seven
+ months), which means that on Ubuntu you may sometimes run
+ into problems that have since been fixed in the Debian package.
+\end{itemize}
+
+\subsection{Solaris}
+
+SunFreeWare, at \url{http://www.sunfreeware.com}, is a good source for a
+large number of pre-built Solaris packages for 32- and 64-bit Intel and
+SPARC architectures, including current versions of Mercurial.
+
+\subsection{Mac OS X}
+
+Lee Cantey publishes an installer of Mercurial for Mac OS~X at
+\url{http://mercurial.berkwood.com}. This package works on both
+Intel-~and PowerPC-based Macs. Before you can use it, you must install
+a compatible version of Universal MacPython~\cite{web:macpython}. This
+is easy to do; simply follow the instructions on Lee's site.
+
+It's also possible to install Mercurial using Fink or MacPorts,
+two popular free package managers for Mac OS X.
If you have Fink,
+use \command{sudo apt-get install mercurial-py25}. If you use MacPorts,
+run \command{sudo port install mercurial}.
+
+\subsection{Windows}
+
+Lee Cantey publishes an installer of Mercurial for Windows at
+\url{http://mercurial.berkwood.com}. This package has no external
+dependencies; it ``just works''.
+
+\begin{note}
+ The Windows version of Mercurial does not automatically convert line
+ endings between Windows and Unix styles. If you want to share work
+ with Unix users, you must do a little additional configuration
+ work. XXX Flesh this out.
+\end{note}
+
+\section{Getting started}
+
+To begin, we'll use the \hgcmd{version} command to find out whether
+Mercurial is actually installed properly. The actual version
+information that it prints isn't so important; it's whether it prints
+anything at all that we care about.
+\interaction{tour.version}
+
+\subsection{Built-in help}
+
+Mercurial provides a built-in help system. This is invaluable for those
+times when you find yourself stuck trying to remember how to run a
+command. If you are completely stuck, simply run \hgcmd{help}; it
+will print a brief list of commands, along with a description of what
+each does. If you ask for help on a specific command (as below), it
+prints more detailed information.
+\interaction{tour.help}
+For a more impressive level of detail (which you won't usually need)
+run \hgcmdargs{help}{\hggopt{-v}}. The \hggopt{-v} option is short
+for \hggopt{--verbose}, and tells Mercurial to print more information
+than it usually would.
+
+\section{Working with a repository}
+
+In Mercurial, everything happens inside a \emph{repository}. The
+repository for a project contains all of the files that ``belong to''
+that project, along with a historical record of the project's files.
+
+There's nothing particularly magical about a repository; it is simply
+a directory tree in your filesystem that Mercurial treats as special.
+You can rename or delete a repository any time you like, using either the +command line or your file browser. + +\subsection{Making a local copy of a repository} + +\emph{Copying} a repository is just a little bit special. While you +could use a normal file copying command to make a copy of a +repository, it's best to use a built-in command that Mercurial +provides. This command is called \hgcmd{clone}, because it creates an +identical copy of an existing repository. +\interaction{tour.clone} +If our clone succeeded, we should now have a local directory called +\dirname{hello}. This directory will contain some files. +\interaction{tour.ls} +These files have the same contents and history in our repository as +they do in the repository we cloned. + +Every Mercurial repository is complete, self-contained, and +independent. It contains its own private copy of a project's files +and history. A cloned repository remembers the location of the +repository it was cloned from, but it does not communicate with that +repository, or any other, unless you tell it to. + +What this means for now is that we're free to experiment with our +repository, safe in the knowledge that it's a private ``sandbox'' that +won't affect anyone else. + +\subsection{What's in a repository?} + +When we take a more detailed look inside a repository, we can see that +it contains a directory named \dirname{.hg}. This is where Mercurial +keeps all of its metadata for the repository. +\interaction{tour.ls-a} + +The contents of the \dirname{.hg} directory and its subdirectories are +private to Mercurial. Every other file and directory in the +repository is yours to do with as you please. + +To introduce a little terminology, the \dirname{.hg} directory is the +``real'' repository, and all of the files and directories that coexist +with it are said to live in the \emph{working directory}. 
An easy way +to remember the distinction is that the \emph{repository} contains the +\emph{history} of your project, while the \emph{working directory} +contains a \emph{snapshot} of your project at a particular point in +history. + +\section{A tour through history} + +One of the first things we might want to do with a new, unfamiliar +repository is understand its history. The \hgcmd{log} command gives +us a view of history. +\interaction{tour.log} +By default, this command prints a brief paragraph of output for each +change to the project that was recorded. In Mercurial terminology, we +call each of these recorded events a \emph{changeset}, because it can +contain a record of changes to several files. + +The fields in a record of output from \hgcmd{log} are as follows. +\begin{itemize} +\item[\texttt{changeset}] This field has the format of a number, + followed by a colon, followed by a hexadecimal string. These are + \emph{identifiers} for the changeset. There are two identifiers + because the number is shorter and easier to type than the hex + string. +\item[\texttt{user}] The identity of the person who created the + changeset. This is a free-form field, but it most often contains a + person's name and email address. +\item[\texttt{date}] The date and time on which the changeset was + created, and the timezone in which it was created. (The date and + time are local to that timezone; they display what time and date it + was for the person who created the changeset.) +\item[\texttt{summary}] The first line of the text message that the + creator of the changeset entered to describe the changeset. +\end{itemize} +The default output printed by \hgcmd{log} is purely a summary; it is +missing a lot of detail. + +Figure~\ref{fig:tour-basic:history} provides a graphical representation of +the history of the \dirname{hello} repository, to make it a little +easier to see which direction history is ``flowing'' in. 
We'll be
+returning to this figure several times in this chapter and the chapter
+that follows.
+
+\begin{figure}[ht]
+ \centering
+ \grafix{tour-history}
+ \caption{Graphical history of the \dirname{hello} repository}
+ \label{fig:tour-basic:history}
+\end{figure}
+
+\subsection{Changesets, revisions, and talking to other
+ people}
+
+As English is a notoriously sloppy language, and computer science has
+a hallowed history of terminological confusion (why use one term when
+four will do?), revision control has a variety of words and phrases
+that mean the same thing. If you are talking about Mercurial history
+with other people, you will find that the word ``changeset'' is often
+compressed to ``change'' or (when written) ``cset'', and sometimes a
+changeset is referred to as a ``revision'' or a ``rev''.
+
+While it doesn't matter what \emph{word} you use to refer to the
+concept of ``a~changeset'', the \emph{identifier} that you use to
+refer to ``a~\emph{specific} changeset'' is of great importance.
+Recall that the \texttt{changeset} field in the output from
+\hgcmd{log} identifies a changeset using both a number and a
+hexadecimal string.
+\begin{itemize}
+\item The revision number is \emph{only valid in that repository}.
+\item The hex string is the \emph{permanent, unchanging
+ identifier} that will always identify that exact changeset in
+ \emph{every} copy of the repository.
+\end{itemize}
+This distinction is important. If you send someone an email talking
+about ``revision~33'', there's a high likelihood that their
+revision~33 will \emph{not be the same} as yours. The reason for this
+is that a revision number depends on the order in which changes
+arrived in a repository, and there is no guarantee that the same
+changes will arrive in the same order in different repositories.
+Three changes $a,b,c$ can easily appear in one repository as $0,1,2$,
+and in another as $1,0,2$.
+
+Mercurial uses revision numbers purely as a convenient shorthand.
If +you need to discuss a changeset with someone, or make a record of a +changeset for some other reason (for example, in a bug report), use +the hexadecimal identifier. + +\subsection{Viewing specific revisions} + +To narrow the output of \hgcmd{log} down to a single revision, use the +\hgopt{log}{-r} (or \hgopt{log}{--rev}) option. You can use either a +revision number or a long-form changeset identifier, and you can +provide as many revisions as you want. \interaction{tour.log-r} + +If you want to see the history of several revisions without having to +list each one, you can use \emph{range notation}; this lets you +express the idea ``I want all revisions between $a$ and $b$, +inclusive''. +\interaction{tour.log.range} +Mercurial also honours the order in which you specify revisions, so +\hgcmdargs{log}{-r 2:4} prints $2,3,4$ while \hgcmdargs{log}{-r 4:2} +prints $4,3,2$. + +\subsection{More detailed information} + +While the summary information printed by \hgcmd{log} is useful if you +already know what you're looking for, you may need to see a complete +description of the change, or a list of the files changed, if you're +trying to decide whether a changeset is the one you're looking for. +The \hgcmd{log} command's \hggopt{-v} (or \hggopt{--verbose}) +option gives you this extra detail. +\interaction{tour.log-v} + +If you want to see both the description and content of a change, add +the \hgopt{log}{-p} (or \hgopt{log}{--patch}) option. This displays +the content of a change as a \emph{unified diff} (if you've never seen +a unified diff before, see section~\ref{sec:mq:patch} for an overview). +\interaction{tour.log-vp} + +\section{All about command options} + +Let's take a brief break from exploring Mercurial commands to discuss +a pattern in the way that they work; you may find this useful to keep +in mind as we continue our tour. + +Mercurial has a consistent and straightforward approach to dealing +with the options that you can pass to commands. 
It follows the +conventions for options that are common to modern Linux and Unix +systems. +\begin{itemize} +\item Every option has a long name. For example, as we've already + seen, the \hgcmd{log} command accepts a \hgopt{log}{--rev} option. +\item Most options have short names, too. Instead of + \hgopt{log}{--rev}, we can use \hgopt{log}{-r}. (The reason that + some options don't have short names is that the options in question + are rarely used.) +\item Long options start with two dashes (e.g.~\hgopt{log}{--rev}), + while short options start with one (e.g.~\hgopt{log}{-r}). +\item Option naming and usage is consistent across commands. For + example, every command that lets you specify a changeset~ID or + revision number accepts both \hgopt{log}{-r} and \hgopt{log}{--rev} + arguments. +\end{itemize} +In the examples throughout this book, I use short options instead of +long. This just reflects my own preference, so don't read anything +significant into it. + +Most commands that print output of some kind will print more output +when passed a \hggopt{-v} (or \hggopt{--verbose}) option, and less +when passed \hggopt{-q} (or \hggopt{--quiet}). + +\section{Making and reviewing changes} + +Now that we have a grasp of viewing history in Mercurial, let's take a +look at making some changes and examining them. + +The first thing we'll do is isolate our experiment in a repository of +its own. We use the \hgcmd{clone} command, but we don't need to +clone a copy of the remote repository. Since we already have a copy +of it locally, we can just clone that instead. This is much faster +than cloning over the network, and cloning a local repository uses +less disk space in most cases, too. +\interaction{tour.reclone} +As an aside, it's often good practice to keep a ``pristine'' copy of a +remote repository around, which you can then make temporary clones of +to create sandboxes for each task you want to work on. 
This lets you +work on multiple tasks in parallel, each isolated from the others +until it's complete and you're ready to integrate it back. Because +local clones are so cheap, there's almost no overhead to cloning and +destroying repositories whenever you want. + +In our \dirname{my-hello} repository, we have a file +\filename{hello.c} that contains the classic ``hello, world'' program. +Let's use the ancient and venerable \command{sed} command to edit this +file so that it prints a second line of output. (I'm only using +\command{sed} to do this because it's easy to write a scripted example +this way. Since you're not under the same constraint, you probably +won't want to use \command{sed}; simply use your preferred text editor to +do the same thing.) +\interaction{tour.sed} + +Mercurial's \hgcmd{status} command will tell us what Mercurial knows +about the files in the repository. +\interaction{tour.status} +The \hgcmd{status} command prints no output for some files, but a line +starting with ``\texttt{M}'' for \filename{hello.c}. Unless you tell +it to, \hgcmd{status} will not print any output for files that have +not been modified. + +The ``\texttt{M}'' indicates that Mercurial has noticed that we +modified \filename{hello.c}. We didn't need to \emph{inform} +Mercurial that we were going to modify the file before we started, or +that we had modified the file after we were done; it was able to +figure this out itself. + +It's a little bit helpful to know that we've modified +\filename{hello.c}, but we might prefer to know exactly \emph{what} +changes we've made to it. To do this, we use the \hgcmd{diff} +command. +\interaction{tour.diff} + +\section{Recording changes in a new changeset} + +We can modify files, build and test our changes, and use +\hgcmd{status} and \hgcmd{diff} to review our changes, until we're +satisfied with what we've done and arrive at a natural stopping point +where we want to record our work in a new changeset. 
+ +The \hgcmd{commit} command lets us create a new changeset; we'll +usually refer to this as ``making a commit'' or ``committing''. + +\subsection{Setting up a username} + +When you try to run \hgcmd{commit} for the first time, it is not +guaranteed to succeed. Mercurial records your name and address with +each change that you commit, so that you and others will later be able +to tell who made each change. Mercurial tries to automatically figure +out a sensible username to commit the change with. It will attempt +each of the following methods, in order: +\begin{enumerate} +\item If you specify a \hgopt{commit}{-u} option to the \hgcmd{commit} + command on the command line, followed by a username, this is always + given the highest precedence. +\item If you have set the \envar{HGUSER} environment variable, this is + checked next. +\item If you create a file in your home directory called + \sfilename{.hgrc}, with a \rcitem{ui}{username} entry, that will be + used next. To see what the contents of this file should look like, + refer to section~\ref{sec:tour-basic:username} below. +\item If you have set the \envar{EMAIL} environment variable, this + will be used next. +\item Mercurial will query your system to find out your local user + name and host name, and construct a username from these components. + Since this often results in a username that is not very useful, it + will print a warning if it has to do this. +\end{enumerate} +If all of these mechanisms fail, Mercurial will fail, printing an +error message. In this case, it will not let you commit until you set +up a username. + +You should think of the \envar{HGUSER} environment variable and the +\hgopt{commit}{-u} option to the \hgcmd{commit} command as ways to +\emph{override} Mercurial's default selection of username. For normal +use, the simplest and most robust way to set a username for yourself +is by creating a \sfilename{.hgrc} file; see below for details. 
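+
+The two override mechanisms mentioned above might be used as sketched
+here. The name and address are placeholders, and the second line
+assumes a Bourne-style shell, which lets you set \envar{HGUSER} for a
+single invocation only. The first command supplies a username for one
+commit via \hgopt{commit}{-u}; the second achieves the same effect
+through the environment.
+\begin{codesample2}
+  hg commit -u 'Jane Doe <jane.doe@example.com>'
+  HGUSER='Jane Doe <jane.doe@example.com>' hg commit
+\end{codesample2}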
+ +\subsubsection{Creating a Mercurial configuration file} +\label{sec:tour-basic:username} + +To set a user name, use your favourite editor to create a file called +\sfilename{.hgrc} in your home directory. Mercurial will use this +file to look up your personalised configuration settings. The initial +contents of your \sfilename{.hgrc} should look like this. +\begin{codesample2} + # This is a Mercurial configuration file. + [ui] + username = Firstname Lastname <email.address@example.net> +\end{codesample2} +The ``\texttt{[ui]}'' line begins a \emph{section} of the config file, +so you can read the ``\texttt{username = ...}'' line as meaning ``set +the value of the \texttt{username} item in the \texttt{ui} section''. +A section continues until a new section begins, or the end of the +file. Mercurial ignores empty lines and treats any text from +``\texttt{\#}'' to the end of a line as a comment. + +\subsubsection{Choosing a user name} + +You can use any text you like as the value of the \texttt{username} +config item, since this information is for reading by other people, +not for interpreting by Mercurial. The convention that most people +follow is to use their name and email address, as in the example +above. + +\begin{note} + Mercurial's built-in web server obfuscates email addresses, to make + it more difficult for the email harvesting tools that spammers use. + This reduces the likelihood that you'll start receiving more junk + email if you publish a Mercurial repository on the web. +\end{note} + +\subsection{Writing a commit message} + +When we commit a change, Mercurial drops us into a text editor, to +enter a message that will describe the modifications we've made in +this changeset. This is called the \emph{commit message}. It will be +a record for readers of what we did and why, and it will be printed by +\hgcmd{log} after we've finished committing.
+\interaction{tour.commit} + +The editor that the \hgcmd{commit} command drops us into will contain +an empty line, followed by a number of lines starting with +``\texttt{HG:}''. +\begin{codesample2} + \emph{empty line} + HG: changed hello.c +\end{codesample2} +Mercurial ignores the lines that start with ``\texttt{HG:}''; it uses +them only to tell us which files it's recording changes to. Modifying +or deleting these lines has no effect. + +\subsection{Writing a good commit message} + +Since \hgcmd{log} only prints the first line of a commit message by +default, it's best to write a commit message whose first line stands +alone. Here's a real example of a commit message that \emph{doesn't} +follow this guideline, and hence has a summary that is not readable. +\begin{codesample2} + changeset: 73:584af0e231be + user: Censored Person + date: Tue Sep 26 21:37:07 2006 -0700 + summary: include buildmeister/commondefs. Add an exports and install +\end{codesample2} + +As far as the remainder of the contents of the commit message is +concerned, there are no hard-and-fast rules. Mercurial itself doesn't +interpret or care about the contents of the commit message, though +your project may have policies that dictate a certain kind of +formatting. + +My personal preference is for short, but informative, commit messages +that tell me something that I can't figure out with a quick glance at +the output of \hgcmdargs{log}{--patch}. + +\subsection{Aborting a commit} + +If you decide that you don't want to commit while in the middle of +editing a commit message, simply exit from your editor without saving +the file that it's editing. This will cause nothing to happen to +either the repository or the working directory. + +If we run the \hgcmd{commit} command without any arguments, it records +all of the changes we've made, as reported by \hgcmd{status} and +\hgcmd{diff}.
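+
+\hgcmd{commit} can also be told to record changes to specific files
+only, by naming them on the command line, and its \hgopt{commit}{-m}
+option supplies the commit message directly instead of starting an
+editor. For instance, the following would commit just the change to
+\filename{hello.c}. The message text is only an illustration.
+\begin{codesample2}
+  hg commit -m 'Print a second line of output' hello.c
+\end{codesample2}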
+ +\subsection{Admiring our new handiwork} + +Once we've finished the commit, we can use the \hgcmd{tip} command to +display the changeset we just created. This command produces output +that is identical to \hgcmd{log}, but it only displays the newest +revision in the repository. +\interaction{tour.tip} +We refer to the newest revision in the repository as the tip revision, +or simply the tip. + +\section{Sharing changes} + +We mentioned earlier that repositories in Mercurial are +self-contained. This means that the changeset we just created exists +only in our \dirname{my-hello} repository. Let's look at a few ways +that we can propagate this change into other repositories. + +\subsection{Pulling changes from another repository} +\label{sec:tour:pull} + +To get started, let's clone our original \dirname{hello} repository, +which does not contain the change we just committed. We'll call our +temporary repository \dirname{hello-pull}. +\interaction{tour.clone-pull} + +We'll use the \hgcmd{pull} command to bring changes from +\dirname{my-hello} into \dirname{hello-pull}. However, blindly +pulling unknown changes into a repository is a somewhat scary +prospect. Mercurial provides the \hgcmd{incoming} command to tell us +what changes the \hgcmd{pull} command \emph{would} pull into the +repository, without actually pulling the changes in. +\interaction{tour.incoming} +(Of course, someone could add more changesets to the repository we are +pulling from after we run \hgcmd{incoming} but before we get a chance to +\hgcmd{pull} the changes, so that we could end up pulling changes that we +didn't expect.) + +Bringing changes into a repository is a simple matter of running the +\hgcmd{pull} command, and telling it which repository to pull from. +\interaction{tour.pull} +As you can see from the before-and-after output of \hgcmd{tip}, we +have successfully pulled changes into our repository. There remains +one step before we can see these changes in the working directory.
+ +\subsection{Updating the working directory} + +We have so far glossed over the relationship between a repository and +its working directory. The \hgcmd{pull} command that we ran in +section~\ref{sec:tour:pull} brought changes into the repository, but +if we check, there's no sign of those changes in the working +directory. This is because \hgcmd{pull} does not (by default) touch +the working directory. Instead, we use the \hgcmd{update} command to +do this. +\interaction{tour.update} + +It might seem a bit strange that \hgcmd{pull} doesn't update the +working directory automatically. There's actually a good reason for +this: you can use \hgcmd{update} to update the working directory to +the state it was in at \emph{any revision} in the history of the +repository. If you had the working directory updated to an old +revision---to hunt down the origin of a bug, say---and ran a +\hgcmd{pull} which automatically updated the working directory to a +new revision, you might not be terribly happy. + +However, since pull-then-update is such a common thing to do, +Mercurial lets you combine the two by passing the \hgopt{pull}{-u} +option to \hgcmd{pull}. +\begin{codesample2} + hg pull -u +\end{codesample2} +If you look back at the output of \hgcmd{pull} in +section~\ref{sec:tour:pull} when we ran it without \hgopt{pull}{-u}, +you can see that it printed a helpful reminder that we'd have to take +an explicit step to update the working directory: +\begin{codesample2} + (run 'hg update' to get a working copy) +\end{codesample2} + +To find out what revision the working directory is at, use the +\hgcmd{parents} command. +\interaction{tour.parents} +If you look back at figure~\ref{fig:tour-basic:history}, you'll see +arrows connecting each changeset. The node that the arrow leads +\emph{from} in each case is a parent, and the node that the arrow +leads \emph{to} is its child. 
The working directory has a parent in +just the same way; this is the changeset that the working directory +currently contains. + +To update the working directory to a particular revision, give a +revision number or changeset~ID to the \hgcmd{update} command. +\interaction{tour.older} +If you omit an explicit revision, \hgcmd{update} will update to the +tip revision, as shown by the second call to \hgcmd{update} in the +example above. + +\subsection{Pushing changes to another repository} + +Mercurial lets us push changes to another repository, from the +repository we're currently visiting. As with the example of +\hgcmd{pull} above, we'll create a temporary repository to push our +changes into. +\interaction{tour.clone-push} +The \hgcmd{outgoing} command tells us what changes would be pushed +into another repository. +\interaction{tour.outgoing} +And the \hgcmd{push} command does the actual push. +\interaction{tour.push} +As with \hgcmd{pull}, the \hgcmd{push} command does not update the +working directory in the repository that it's pushing changes into. +(Unlike \hgcmd{pull}, \hgcmd{push} does not provide a \texttt{-u} +option that updates the other repository's working directory.) + +What happens if we try to pull or push changes and the receiving +repository already has those changes? Nothing too exciting. +\interaction{tour.push.nothing} + +\subsection{Sharing changes over a network} + +The commands we have covered in the previous few sections are not +limited to working with local repositories. Each works in exactly the +same fashion over a network connection; simply pass in a URL instead +of a local path. +\interaction{tour.outgoing.net} +In this example, we can see what changes we could push to the remote +repository, but the repository is understandably not set up to let +anonymous users push to it. 
+\interaction{tour.push.net} + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/ch03-tour-merge.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch03-tour-merge.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,286 @@ +\chapter{A tour of Mercurial: merging work} +\label{chap:tour-merge} + +We've now covered cloning a repository, making changes in a +repository, and pulling or pushing changes from one repository into +another. Our next step is \emph{merging} changes from separate +repositories. + +\section{Merging streams of work} + +Merging is a fundamental part of working with a distributed revision +control tool. +\begin{itemize} +\item Alice and Bob each have a personal copy of a repository for a + project they're collaborating on. Alice fixes a bug in her + repository; Bob adds a new feature in his. They want the shared + repository to contain both the bug fix and the new feature. +\item I frequently work on several different tasks for a single + project at once, each safely isolated in its own repository. + Working this way means that I often need to merge one piece of my + own work with another. +\end{itemize} + +Because merging is such a common thing to need to do, Mercurial makes +it easy. Let's walk through the process. We'll begin by cloning yet +another repository (see how often they spring up?) and making a change +in it. +\interaction{tour.merge.clone} +We should now have two copies of \filename{hello.c} with different +contents. The histories of the two repositories have also diverged, +as illustrated in figure~\ref{fig:tour-merge:sep-repos}. 
+\interaction{tour.merge.cat} + +\begin{figure}[ht] + \centering + \grafix{tour-merge-sep-repos} + \caption{Divergent recent histories of the \dirname{my-hello} and + \dirname{my-new-hello} repositories} + \label{fig:tour-merge:sep-repos} +\end{figure} + +We already know that pulling changes from our \dirname{my-hello} +repository will have no effect on the working directory. +\interaction{tour.merge.pull} +However, the \hgcmd{pull} command says something about ``heads''. + +\subsection{Head changesets} + +A head is a change that has no descendants, or children, as they're +also known. The tip revision is thus a head, because the newest +revision in a repository doesn't have any children, but a repository +can contain more than one head. + +\begin{figure}[ht] + \centering + \grafix{tour-merge-pull} + \caption{Repository contents after pulling from \dirname{my-hello} into + \dirname{my-new-hello}} + \label{fig:tour-merge:pull} +\end{figure} + +In figure~\ref{fig:tour-merge:pull}, you can see the effect of the +pull from \dirname{my-hello} into \dirname{my-new-hello}. The history +that was already present in \dirname{my-new-hello} is untouched, but a +new revision has been added. By referring to +figure~\ref{fig:tour-merge:sep-repos}, we can see that the +\emph{changeset ID} remains the same in the new repository, but the +\emph{revision number} has changed. (This, incidentally, is a fine +example of why it's not safe to use revision numbers when discussing +changesets.) We can view the heads in a repository using the +\hgcmd{heads} command. +\interaction{tour.merge.heads} + +\subsection{Performing the merge} + +What happens if we try to use the normal \hgcmd{update} command to +update to the new tip? +\interaction{tour.merge.update} +Mercurial is telling us that the \hgcmd{update} command won't do a +merge; it won't update the working directory when it thinks we might +be wanting to do a merge, unless we force it to do so. 
Instead, we +use the \hgcmd{merge} command to merge the two heads. +\interaction{tour.merge.merge} + +\begin{figure}[ht] + \centering + \grafix{tour-merge-merge} + \caption{Working directory and repository during merge, and + following commit} + \label{fig:tour-merge:merge} +\end{figure} + +This updates the working directory so that it contains changes from +\emph{both} heads, which is reflected in both the output of +\hgcmd{parents} and the contents of \filename{hello.c}. +\interaction{tour.merge.parents} + +\subsection{Committing the results of the merge} + +Whenever we've done a merge, \hgcmd{parents} will display two parents +until we \hgcmd{commit} the results of the merge. +\interaction{tour.merge.commit} +We now have a new tip revision; notice that it has \emph{both} of +our former heads as its parents. These are the same revisions that +were previously displayed by \hgcmd{parents}. +\interaction{tour.merge.tip} +In figure~\ref{fig:tour-merge:merge}, you can see a representation of +what happens to the working directory during the merge, and how this +affects the repository when the commit happens. During the merge, the +working directory has two parent changesets, and these become the +parents of the new changeset. + +\section{Merging conflicting changes} + +Most merges are simple affairs, but sometimes you'll find yourself +merging changes where each modifies the same portions of the same +files. Unless both modifications are identical, this results in a +\emph{conflict}, where you have to decide how to reconcile the +different changes into something coherent. + +\begin{figure}[ht] + \centering + \grafix{tour-merge-conflict} + \caption{Conflicting changes to a document} + \label{fig:tour-merge:conflict} +\end{figure} + +Figure~\ref{fig:tour-merge:conflict} illustrates an instance of two +conflicting changes to a document. We started with a single version +of the file; then we made some changes, while someone else made +different changes to the same text.
Our task in resolving the +conflicting changes is to decide what the file should look like. + +Mercurial doesn't have a built-in facility for handling conflicts. +Instead, it runs an external program called \command{hgmerge}. This +is a shell script that is bundled with Mercurial; you can change it to +behave however you please. What it does by default is try to find one +of several different merging tools that are likely to be installed on +your system. It first tries a few fully automatic merging tools; if +these don't succeed (because the resolution process requires human +guidance) or aren't present, the script tries a few different +graphical merging tools. + +It's also possible to get Mercurial to run another program or script +instead of \command{hgmerge}, by setting the \envar{HGMERGE} +environment variable to the name of your preferred program. + +\subsection{Using a graphical merge tool} + +My preferred graphical merge tool is \command{kdiff3}, which I'll use +to describe the features that are common to graphical file merging +tools. You can see a screenshot of \command{kdiff3} in action in +figure~\ref{fig:tour-merge:kdiff3}. The kind of merge it is +performing is called a \emph{three-way merge}, because there are three +different versions of the file of interest to us. The tool thus +splits the upper portion of the window into three panes: +\begin{itemize} +\item At the left is the \emph{base} version of the file, i.e.~the + most recent version from which the two versions we're trying to + merge are descended. +\item In the middle is ``our'' version of the file, with the contents + that we modified. +\item On the right is ``their'' version of the file, the one + from the changeset that we're trying to merge with. +\end{itemize} +In the pane below these is the current \emph{result} of the merge.
+Our task is to replace all of the red text, which indicates unresolved +conflicts, with some sensible merger of the ``ours'' and ``theirs'' +versions of the file. + +All four of these panes are \emph{locked together}; if we scroll +vertically or horizontally in any of them, the others are updated to +display the corresponding sections of their respective files. + +\begin{figure}[ht] + \centering + \grafix{kdiff3} + \caption{Using \command{kdiff3} to merge versions of a file} + \label{fig:tour-merge:kdiff3} +\end{figure} + +For each conflicting portion of the file, we can choose to resolve +the conflict using some combination of text from the base version, +ours, or theirs. We can also manually edit the merged file at any +time, in case we need to make further modifications. + +There are \emph{many} file merging tools available, too many to cover +here. They vary in which platforms they are available for, and in +their particular strengths and weaknesses. Most are tuned for merging +files containing plain text, while a few are aimed at specialised file +formats (generally XML). + +\subsection{A worked example} + +In this example, we will reproduce the file modification history of +figure~\ref{fig:tour-merge:conflict} above. Let's begin by creating a +repository with a base version of our document. +\interaction{tour-merge-conflict.wife} +We'll clone the repository and make a change to the file. +\interaction{tour-merge-conflict.cousin} +And another clone, to simulate someone else making a change to the +file. (This hints at the idea that it's not all that unusual to merge +with yourself when you isolate tasks in separate repositories, and +indeed to find and resolve conflicts while doing so.) +\interaction{tour-merge-conflict.son} +Having created two different versions of the file, we'll set up an +environment suitable for running our merge. 
+\interaction{tour-merge-conflict.pull} + +In this example, I won't use Mercurial's normal \command{hgmerge} +program to do the merge, because it would drop my nice automated +example-running tool into a graphical user interface. Instead, I'll +set \envar{HGMERGE} to tell Mercurial to use the non-interactive +\command{merge} command. This is bundled with many Unix-like systems. +If you're following this example on your computer, don't bother +setting \envar{HGMERGE}. + +\textbf{XXX FIX THIS EXAMPLE.} + +\interaction{tour-merge-conflict.merge} +Because \command{merge} can't resolve the conflicting changes, it +leaves \emph{merge markers} inside the file that has conflicts, +indicating which lines have conflicts, and whether they came from our +version of the file or theirs. + +Mercurial can tell from the way \command{merge} exits that it wasn't +able to merge successfully, so it tells us what commands we'll need to +run if we want to redo the merging operation. This could be useful +if, for example, we were running a graphical merge tool and quit +because we were confused or realised we had made a mistake. + +If automatic or manual merges fail, there's nothing to prevent us from +``fixing up'' the affected files ourselves, and committing the results +of our merge: +\interaction{tour-merge-conflict.commit} + +\section{Simplifying the pull-merge-commit sequence} +\label{sec:tour-merge:fetch} + +The process of merging changes as outlined above is straightforward, +but requires running three commands in sequence. +\begin{codesample2} + hg pull + hg merge + hg commit -m 'Merged remote changes' +\end{codesample2} +In the case of the final commit, you also need to enter a commit +message, which is almost always going to be a piece of uninteresting +``boilerplate'' text. + +It would be nice to reduce the number of steps needed, if this were +possible. Indeed, Mercurial is distributed with an extension called +\hgext{fetch} that does just this. 
+ +Mercurial provides a flexible extension mechanism that lets people +extend its functionality, while keeping the core of Mercurial small +and easy to deal with. Some extensions add new commands that you can +use from the command line, while others work ``behind the scenes,'' +for example adding capabilities to the server. + +The \hgext{fetch} extension adds a new command called, not +surprisingly, \hgcmd{fetch}. This extension acts as a combination of +\hgcmd{pull}, \hgcmd{update} and \hgcmd{merge}. It begins by pulling +changes from another repository into the current repository. If it +finds that the changes added a new head to the repository, it begins a +merge, then commits the result of the merge with an +automatically-generated commit message. If no new heads were added, +it updates the working directory to the new tip changeset. + +Enabling the \hgext{fetch} extension is easy. Edit your +\sfilename{.hgrc}, and either go to the \rcsection{extensions} section +or create an \rcsection{extensions} section. Then add a line that +simply reads ``\Verb+fetch =+''. +\begin{codesample2} + [extensions] + fetch = +\end{codesample2} +(Normally, on the right-hand side of the ``\texttt{=}'' would appear +the location of the extension, but since the \hgext{fetch} extension +is in the standard distribution, Mercurial knows where to search for +it.) + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/ch04-concepts.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch04-concepts.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,577 @@ +\chapter{Behind the scenes} +\label{chap:concepts} + +Unlike those of many revision control systems, the concepts upon which +Mercurial is built are simple enough that it's easy to understand how +the software really works. Knowing this certainly isn't necessary, +but I find it useful to have a ``mental model'' of what's going on.
+ +This understanding gives me confidence that Mercurial has been +carefully designed to be both \emph{safe} and \emph{efficient}. And +just as importantly, if it's easy for me to retain a good idea of what +the software is doing when I perform a revision control task, I'm less +likely to be surprised by its behaviour. + +In this chapter, we'll initially cover the core concepts behind +Mercurial's design, then continue to discuss some of the interesting +details of its implementation. + +\section{Mercurial's historical record} + +\subsection{Tracking the history of a single file} + +When Mercurial tracks modifications to a file, it stores the history +of that file in a metadata object called a \emph{filelog}. Each entry +in the filelog contains enough information to reconstruct one revision +of the file that is being tracked. Filelogs are stored as files in +the \sdirname{.hg/store/data} directory. A filelog contains two kinds +of information: revision data, and an index to help Mercurial to find +a revision efficiently. + +A file that is large, or has a lot of history, has its filelog stored +in separate data (``\texttt{.d}'' suffix) and index (``\texttt{.i}'' +suffix) files. For small files without much history, the revision +data and index are combined in a single ``\texttt{.i}'' file. The +correspondence between a file in the working directory and the filelog +that tracks its history in the repository is illustrated in +figure~\ref{fig:concepts:filelog}. + +\begin{figure}[ht] + \centering + \grafix{filelog} + \caption{Relationships between files in working directory and + filelogs in repository} + \label{fig:concepts:filelog} +\end{figure} + +\subsection{Managing tracked files} + +Mercurial uses a structure called a \emph{manifest} to collect +together information about the files that it tracks. Each entry in +the manifest contains information about the files present in a single +changeset. 
An entry records which files are present in the changeset, +the revision of each file, and a few other pieces of file metadata. + +\subsection{Recording changeset information} + +The \emph{changelog} contains information about each changeset. Each +revision records who committed a change, the changeset comment, other +pieces of changeset-related information, and the revision of the +manifest to use. + +\subsection{Relationships between revisions} + +Within a changelog, a manifest, or a filelog, each revision stores a +pointer to its immediate parent (or to its two parents, if it's a +merge revision). As I mentioned above, there are also relationships +between revisions \emph{across} these structures, and they are +hierarchical in nature. + +For every changeset in a repository, there is exactly one revision +stored in the changelog. Each revision of the changelog contains a +pointer to a single revision of the manifest. A revision of the +manifest stores a pointer to a single revision of each filelog tracked +when that changeset was created. These relationships are illustrated +in figure~\ref{fig:concepts:metadata}. + +\begin{figure}[ht] + \centering + \grafix{metadata} + \caption{Metadata relationships} + \label{fig:concepts:metadata} +\end{figure} + +As the illustration shows, there is \emph{not} a ``one to one'' +relationship between revisions in the changelog, manifest, or filelog. +If the manifest hasn't changed between two changesets, the changelog +entries for those changesets will point to the same revision of the +manifest. If a file that Mercurial tracks hasn't changed between two +changesets, the entry for that file in the two revisions of the +manifest will point to the same revision of its filelog. + +\section{Safe, efficient storage} + +The underpinnings of changelogs, manifests, and filelogs are provided +by a single structure called the \emph{revlog}. 
+ +\subsection{Efficient storage} + +The revlog provides efficient storage of revisions using a +\emph{delta} mechanism. Instead of storing a complete copy of a file +for each revision, it stores the changes needed to transform an older +revision into the new revision. For many kinds of file data, these +deltas are typically a fraction of a percent of the size of a full +copy of a file. + +Some obsolete revision control systems can only work with deltas of +text files. They must either store binary files as complete snapshots +or encode them into a text representation, both of which are wasteful +approaches. Mercurial can efficiently handle deltas of files with +arbitrary binary contents; it doesn't need to treat text as special. + +\subsection{Safe operation} +\label{sec:concepts:txn} + +Mercurial only ever \emph{appends} data to the end of a revlog file. +It never modifies a section of a file after it has written it. This +is both more robust and efficient than schemes that need to modify or +rewrite data. + +In addition, Mercurial treats every write as part of a +\emph{transaction} that can span a number of files. A transaction is +\emph{atomic}: either the entire transaction succeeds and its effects +are all visible to readers in one go, or the whole thing is undone. +This guarantee of atomicity means that if you're running two copies of +Mercurial, where one is reading data and one is writing it, the reader +will never see a partially written result that might confuse it. + +The fact that Mercurial only appends to files makes it easier to +provide this transactional guarantee. The easier it is to do stuff +like this, the more confident you should be that it's done correctly. + +\subsection{Fast retrieval} + +Mercurial cleverly avoids a pitfall common to all earlier +revision control systems: the problem of \emph{inefficient retrieval}. +Most revision control systems store the contents of a revision as an +incremental series of modifications against a ``snapshot''.
To +reconstruct a specific revision, you must first read the snapshot, and +then every one of the revisions between the snapshot and your target +revision. The more history that a file accumulates, the more +revisions you must read, hence the longer it takes to reconstruct a +particular revision. + +\begin{figure}[ht] + \centering + \grafix{snapshot} + \caption{Snapshot of a revlog, with incremental deltas} + \label{fig:concepts:snapshot} +\end{figure} + +The innovation that Mercurial applies to this problem is simple but +effective. Once the cumulative amount of delta information stored +since the last snapshot exceeds a fixed threshold, it stores a new +snapshot (compressed, of course), instead of another delta. This +makes it possible to reconstruct \emph{any} revision of a file +quickly. This approach works so well that it has since been copied by +several other revision control systems. + +Figure~\ref{fig:concepts:snapshot} illustrates the idea. In an entry +in a revlog's index file, Mercurial stores the range of entries from +the data file that it must read to reconstruct a particular revision. + +\subsubsection{Aside: the influence of video compression} + +If you're familiar with video compression or have ever watched a TV +feed through a digital cable or satellite service, you may know that +most video compression schemes store each frame of video as a delta +against its predecessor frame. In addition, these schemes use +``lossy'' compression techniques to increase the compression ratio, so +visual errors accumulate over the course of a number of inter-frame +deltas. + +Because it's possible for a video stream to ``drop out'' occasionally +due to signal glitches, and to limit the accumulation of artefacts +introduced by the lossy compression process, video encoders +periodically insert a complete frame (called a ``key frame'') into the +video stream; the next delta is generated against that frame. 
This +means that if the video signal gets interrupted, it will resume once +the next key frame is received. Also, the accumulation of encoding +errors restarts anew with each key frame. + +\subsection{Identification and strong integrity} + +Along with delta or snapshot information, a revlog entry contains a +cryptographic hash of the data that it represents. This makes it +difficult to forge the contents of a revision, and easy to detect +accidental corruption. + +Hashes provide more than a mere check against corruption; they are +used as the identifiers for revisions. The changeset identification +hashes that you see as an end user are from revisions of the +changelog. Although filelogs and the manifest also use hashes, +Mercurial only uses these behind the scenes. + +Mercurial verifies that hashes are correct when it retrieves file +revisions and when it pulls changes from another repository. If it +encounters an integrity problem, it will complain and stop whatever +it's doing. + +In addition to the effect it has on retrieval efficiency, Mercurial's +use of periodic snapshots makes it more robust against partial data +corruption. If a revlog becomes partly corrupted due to a hardware +error or system bug, it's often possible to reconstruct some or most +revisions from the uncorrupted sections of the revlog, both before and +after the corrupted section. This would not be possible with a +delta-only storage model. + +\section{Revision history, branching, + and merging} + +Every entry in a Mercurial revlog knows the identity of its immediate +ancestor revision, usually referred to as its \emph{parent}. In fact, +a revision contains room for not one parent, but two. Mercurial uses +a special hash, called the ``null ID'', to represent the idea ``there +is no parent here''. This hash is simply a string of zeroes. + +In figure~\ref{fig:concepts:revlog}, you can see an example of the +conceptual structure of a revlog. 
+Filelogs, manifests, and changelogs +all have this same structure; they differ only in the kind of data +stored in each delta or snapshot. + +The first revision in a revlog (at the bottom of the image) has the +null ID in both of its parent slots. For a ``normal'' revision, its +first parent slot contains the ID of its parent revision, and its +second contains the null ID, indicating that the revision has only one +real parent. Any two revisions that have the same parent ID are +branches. A revision that represents a merge between branches has two +normal revision IDs in its parent slots. + +\begin{figure}[ht] + \centering + \grafix{revlog} + \caption{The conceptual structure of a revlog} + \label{fig:concepts:revlog} +\end{figure} + +\section{The working directory} + +In the working directory, Mercurial stores a snapshot of the files +from the repository as of a particular changeset. + +The working directory ``knows'' which changeset it contains. When you +update the working directory to contain a particular changeset, +Mercurial looks up the appropriate revision of the manifest to find +out which files it was tracking at the time that changeset was +committed, and which revision of each file was then current. It then +recreates a copy of each of those files, with the same contents it had +when the changeset was committed. + +The \emph{dirstate} contains Mercurial's knowledge of the working +directory. It records which changeset the working directory is +updated to, and all of the files that Mercurial is tracking in the +working directory. + +Just as a revision of a revlog has room for two parents, so that it +can represent either a normal revision (with one parent) or a merge of +two earlier revisions, the dirstate has slots for two parents. When +you use the \hgcmd{update} command, the changeset that you update to +is stored in the ``first parent'' slot, and the null ID in the second.
+When you \hgcmd{merge} with another changeset, the first parent +remains unchanged, and the second parent is filled in with the +changeset you're merging with. The \hgcmd{parents} command tells you +what the parents of the dirstate are. + +\subsection{What happens when you commit} + +The dirstate stores parent information for more than just book-keeping +purposes. Mercurial uses the parents of the dirstate as \emph{the + parents of a new changeset} when you perform a commit. + +\begin{figure}[ht] + \centering + \grafix{wdir} + \caption{The working directory can have two parents} + \label{fig:concepts:wdir} +\end{figure} + +Figure~\ref{fig:concepts:wdir} shows the normal state of the working +directory, where it has a single changeset as parent. That changeset +is the \emph{tip}, the newest changeset in the repository that has no +children. + +\begin{figure}[ht] + \centering + \grafix{wdir-after-commit} + \caption{The working directory gains new parents after a commit} + \label{fig:concepts:wdir-after-commit} +\end{figure} + +It's useful to think of the working directory as ``the changeset I'm +about to commit''. Any files that you tell Mercurial that you've +added, removed, renamed, or copied will be reflected in that +changeset, as will modifications to any files that Mercurial is +already tracking; the new changeset will have the parents of the +working directory as its parents. + +After a commit, Mercurial will update the parents of the working +directory, so that the first parent is the ID of the new changeset, +and the second is the null ID. This is shown in +figure~\ref{fig:concepts:wdir-after-commit}. Mercurial doesn't touch +any of the files in the working directory when you commit; it just +modifies the dirstate to note its new parents. + +\subsection{Creating a new head} + +It's perfectly normal to update the working directory to a changeset +other than the current tip. 
For example, you might want to know what +your project looked like last Tuesday, or you could be looking through +changesets to see which one introduced a bug. In cases like this, the +natural thing to do is update the working directory to the changeset +you're interested in, and then examine the files in the working +directory directly to see their contents as they were when you +committed that changeset. The effect of this is shown in +figure~\ref{fig:concepts:wdir-pre-branch}. + +\begin{figure}[ht] + \centering + \grafix{wdir-pre-branch} + \caption{The working directory, updated to an older changeset} + \label{fig:concepts:wdir-pre-branch} +\end{figure} + +Having updated the working directory to an older changeset, what +happens if you make some changes, and then commit? Mercurial behaves +in the same way as I outlined above. The parents of the working +directory become the parents of the new changeset. This new changeset +has no children, so it becomes the new tip. And the repository now +contains two changesets that have no children; we call these +\emph{heads}. You can see the structure that this creates in +figure~\ref{fig:concepts:wdir-branch}. + +\begin{figure}[ht] + \centering + \grafix{wdir-branch} + \caption{After a commit made while synced to an older changeset} + \label{fig:concepts:wdir-branch} +\end{figure} + +\begin{note} + If you're new to Mercurial, you should keep in mind a common + ``error'', which is to use the \hgcmd{pull} command without any + options. By default, the \hgcmd{pull} command \emph{does not} + update the working directory, so you'll bring new changesets into + your repository, but the working directory will stay synced at the + same changeset as before the pull. If you make some changes and + commit afterwards, you'll thus create a new head, because your + working directory isn't synced to whatever the current tip is. 
+ + I put the word ``error'' in quotes because all that you need to do + to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}. In + other words, this almost never has negative consequences; it just + surprises people. I'll discuss other ways to avoid this behaviour, + and why Mercurial behaves in this initially surprising way, later + on. +\end{note} + +\subsection{Merging heads} + +When you run the \hgcmd{merge} command, Mercurial leaves the first +parent of the working directory unchanged, and sets the second parent +to the changeset you're merging with, as shown in +figure~\ref{fig:concepts:wdir-merge}. + +\begin{figure}[ht] + \centering + \grafix{wdir-merge} + \caption{Merging two heads} + \label{fig:concepts:wdir-merge} +\end{figure} + +Mercurial also has to modify the working directory, to merge the files +managed in the two changesets. Simplified a little, the merging +process goes like this, for every file in the manifests of both +changesets. +\begin{itemize} +\item If neither changeset has modified a file, do nothing with that + file. +\item If one changeset has modified a file, and the other hasn't, + create the modified copy of the file in the working directory. +\item If one changeset has removed a file, and the other hasn't modified it (or + has also removed it), delete the file from the working directory. +\item If one changeset has removed a file, but the other has modified + the file, ask the user what to do: keep the modified file, or remove + it? +\item If both changesets have modified a file, invoke an external + merge program to choose the new contents for the merged file. This + may require input from the user. +\item If one changeset has modified a file, and the other has renamed + or copied the file, make sure that the changes follow the new name + of the file. +\end{itemize} +There are more details---merging has plenty of corner cases---but +these are the most common choices that are involved in a merge.
As +you can see, most cases are completely automatic, and indeed most +merges finish automatically, without requiring your input to resolve +any conflicts. + +When you're thinking about what happens when you commit after a merge, +once again the working directory is ``the changeset I'm about to +commit''. After the \hgcmd{merge} command completes, the working +directory has two parents; these will become the parents of the new +changeset. + +Mercurial lets you perform multiple merges, but you must commit the +results of each individual merge as you go. This is necessary because +Mercurial only tracks two parents for both revisions and the working +directory. While it would be technically possible to merge multiple +changesets at once, the prospect of user confusion and making a +terrible mess of a merge immediately becomes overwhelming. + +\section{Other interesting design features} + +In the sections above, I've tried to highlight some of the most +important aspects of Mercurial's design, to illustrate that it pays +careful attention to reliability and performance. However, the +attention to detail doesn't stop there. There are a number of other +aspects of Mercurial's construction that I personally find +interesting. I'll detail a few of them here, separate from the ``big +ticket'' items above, so that if you're interested, you can gain a +better idea of the amount of thinking that goes into a well-designed +system. + +\subsection{Clever compression} + +When appropriate, Mercurial will store both snapshots and deltas in +compressed form. It does this by always \emph{trying to} compress a +snapshot or delta, but only storing the compressed version if it's +smaller than the uncompressed version. + +This means that Mercurial does ``the right thing'' when storing a file +whose native form is compressed, such as a \texttt{zip} archive or a +JPEG image. 
When these types of files are compressed a second time, +the resulting file is usually bigger than the once-compressed form, +and so Mercurial will store the plain \texttt{zip} or JPEG. + +Deltas between revisions of a compressed file are usually larger than +snapshots of the file, and Mercurial again does ``the right thing'' in +these cases. It finds that such a delta exceeds the threshold at +which it should store a complete snapshot of the file, so it stores +the snapshot, again saving space compared to a naive delta-only +approach. + +\subsubsection{Network recompression} + +When storing revisions on disk, Mercurial uses the ``deflate'' +compression algorithm (the same one used by the popular \texttt{zip} +archive format), which balances good speed with a respectable +compression ratio. However, when transmitting revision data over a +network connection, Mercurial uncompresses the compressed revision +data. + +If the connection is over HTTP, Mercurial recompresses the entire +stream of data using a compression algorithm that gives a better +compression ratio (the Burrows-Wheeler algorithm from the widely used +\texttt{bzip2} compression package). This combination of algorithm +and compression of the entire stream (instead of a revision at a time) +substantially reduces the number of bytes to be transferred, yielding +better network performance over almost all kinds of network. + +(If the connection is over \command{ssh}, Mercurial \emph{doesn't} +recompress the stream, because \command{ssh} can already do this +itself.) + +\subsection{Read/write ordering and atomicity} + +Appending to files isn't the whole story when it comes to guaranteeing +that a reader won't see a partial write. If you recall +figure~\ref{fig:concepts:metadata}, revisions in the changelog point to +revisions in the manifest, and revisions in the manifest point to +revisions in filelogs. This hierarchy is deliberate. 
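The hierarchy, together with the write and read orders that the following paragraphs describe, can be modelled in a few lines of Python. This is a toy sketch, not Mercurial's code: in-memory dictionaries stand in for revlog files, there are no deltas, and a changelog entry holds nothing but a manifest ID. The \texttt{node\_id} function follows my understanding of how Mercurial's revlogs compute revision IDs: a SHA-1 hash of the two parent IDs (smaller first), followed by the revision text. The names \texttt{ToyRevlog}, \texttt{commit} and \texttt{checkout} are hypothetical.

```python
import hashlib

NULL_ID = b"\0" * 20  # the "null ID": 20 zero bytes, meaning "no parent here"


def node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """Revision ID: SHA-1 of both parent IDs (smaller first), then the text."""
    lo, hi = sorted((p1, p2))
    return hashlib.sha1(lo + hi + text).digest()


class ToyRevlog:
    """Toy append-only store: node ID -> full text (no deltas, no files)."""

    def __init__(self):
        self.texts = {}

    def append(self, text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
        node = node_id(text, p1, p2)
        self.texts.setdefault(node, text)  # existing entries are never rewritten
        return node

    def read(self, node: bytes) -> bytes:
        return self.texts[node]


def commit(filelogs, manifest, changelog, files):
    """Write in the safe order: filelogs, then the manifest, then the changelog."""
    file_nodes = {}
    for name, data in sorted(files.items()):
        file_nodes[name] = filelogs.setdefault(name, ToyRevlog()).append(data)
    # One manifest line per file: name, NUL, hex ID of the filelog revision.
    mtext = b"".join(
        name.encode() + b"\0" + node.hex().encode() + b"\n"
        for name, node in sorted(file_nodes.items())
    )
    mnode = manifest.append(mtext)
    # Only now does a changelog entry point at the completed manifest.
    return changelog.append(mnode.hex().encode())


def checkout(filelogs, manifest, changelog, cnode):
    """Read in the opposite order: changelog, then manifest, then filelogs."""
    mnode = bytes.fromhex(changelog.read(cnode).decode())
    files = {}
    for line in manifest.read(mnode).splitlines():
        name, fhex = line.split(b"\0")
        files[name.decode()] = filelogs[name.decode()].read(bytes.fromhex(fhex.decode()))
    return files
```

Because \texttt{commit} appends to the changelog last, any changelog entry a reader can see already points at a complete manifest entry, which in turn points at complete filelog entries.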
+ +A writer starts a transaction by writing filelog and manifest data, +and doesn't write any changelog data until those are finished. A +reader starts by reading changelog data, then manifest data, followed +by filelog data. + +Since the writer has always finished writing filelog and manifest data +before it writes to the changelog, a reader will never read a pointer +to a partially written manifest revision from the changelog, and it will +never read a pointer to a partially written filelog revision from the +manifest. + +\subsection{Concurrent access} + +The read/write ordering and atomicity guarantees mean that Mercurial +never needs to \emph{lock} a repository when it's reading data, even +if the repository is being written to while the read is occurring. +This has a big effect on scalability; you can have an arbitrary number +of Mercurial processes safely reading data from a repository +all at once, no matter whether it's being written to or not. + +The lockless nature of reading means that if you're sharing a +repository on a multi-user system, you don't need to grant other local +users permission to \emph{write} to your repository in order for them +to be able to clone it or pull changes from it; they only need +\emph{read} permission. (This is \emph{not} a common feature among +revision control systems, so don't take it for granted! Most require +readers to be able to lock a repository to access it safely, and this +requires write permission on at least one directory, which of course +makes for all kinds of nasty and annoying security and administrative +problems.) + +Mercurial uses locks to ensure that only one process can write to a +repository at a time (the locking mechanism is safe even over +filesystems that are notoriously hostile to locking, such as NFS).
If +a repository is locked, a writer will wait and retry until the +repository becomes unlocked, but if the repository remains locked for +too long, the process attempting to write will eventually time out. +This means that your daily automated scripts won't get stuck forever +and pile up if a system crashes unnoticed, for example. (Yes, the +timeout is configurable, from zero to infinity.) + +\subsubsection{Safe dirstate access} + +As with revision data, Mercurial doesn't take a lock to read the +dirstate file; it does acquire a lock to write it. To avoid the +possibility of reading a partially written copy of the dirstate file, +Mercurial writes to a file with a unique name in the same directory as +the dirstate file, then renames the temporary file atomically to +\filename{dirstate}. The file named \filename{dirstate} is thus +guaranteed to be complete, not partially written. + +\subsection{Avoiding seeks} + +Critical to Mercurial's performance is the avoidance of seeks of the +disk head, since any seek is far more expensive than even a +comparatively large read operation. + +This is why, for example, the dirstate is stored in a single file. If +there were a dirstate file per directory that Mercurial tracked, the +disk would seek once per directory. Instead, Mercurial reads the +entire single dirstate file in one step. + +Mercurial also uses a ``copy on write'' scheme when cloning a +repository on local storage. Instead of copying every revlog file +from the old repository into the new repository, it makes a ``hard +link'', which is a shorthand way to say ``these two names point to the +same file''. When Mercurial is about to write to one of a revlog's +files, it checks to see if the number of names pointing at the file is +greater than one. If it is, more than one repository is using the +file, so Mercurial makes a new copy of the file that is private to +this repository.
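Here is a minimal Python sketch of the hard-link check, assuming a POSIX filesystem that supports hard links. The helper \texttt{open\_for\_write} is hypothetical, not Mercurial's API. It breaks the link with the same technique that the dirstate uses: write a uniquely named temporary file in the same directory, then rename it atomically over the original name. As a result, other clones never see a half-copied file.

```python
import os
import shutil
import tempfile


def open_for_write(path: str):
    """Open a revlog-like file for appending, breaking any hard link first.

    If more than one directory entry names this file (for example, two
    hard-linked clones), give this repository a private copy before writing.
    """
    if os.stat(path).st_nlink > 1:
        # Copy into a uniquely named temporary file in the same directory...
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        os.close(fd)
        try:
            shutil.copyfile(path, tmp)
            shutil.copymode(path, tmp)
            # ...then atomically rename it over our name. Other names keep
            # pointing at the old, unmodified file.
            os.replace(tmp, path)
        except BaseException:
            os.unlink(tmp)
            raise
    return open(path, "ab")
```

After the rename, the link count of both names drops back to one. Only the repository that writes pays for the copy.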
+ +A few revision control developers have pointed out that this idea of +making a complete private copy of a file is not very efficient in its +use of storage. While this is true, storage is cheap, and this method +gives the highest performance while deferring most book-keeping to the +operating system. An alternative scheme would most likely reduce +performance and increase the complexity of the software, and both +performance and simplicity matter far more to the ``feel'' of day-to-day use. + +\subsection{Other contents of the dirstate} + +Because Mercurial doesn't force you to tell it when you're modifying a +file, it uses the dirstate to store some extra information so it can +determine efficiently whether you have modified a file. For each file +in the working directory, it stores the time that it last modified the +file itself, and the size of the file at that time. + +When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or +\hgcmd{copy} files, Mercurial updates the dirstate so that it knows +what to do with those files when you commit. + +When Mercurial is checking the states of files in the working +directory, it first checks a file's modification time. If that has +not changed, the file must not have been modified. If the file's size +has changed, the file must have been modified. If the modification +time has changed, but the size has not, only then does Mercurial need +to read the actual contents of the file to see if they've changed. +Storing these few extra pieces of information dramatically reduces the +amount of data that Mercurial needs to read, which yields large +performance improvements compared to other revision control systems.
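The three-step check above can be sketched as a small Python function. This is an illustration, not Mercurial's dirstate code. \texttt{file\_status} is a hypothetical name, and the recorded modification time and size are assumed to be the values saved when Mercurial last wrote the file. A real implementation must also cope with a file changed twice within the filesystem's timestamp resolution. Like the description above, this sketch ignores that case.

```python
import os


def file_status(path: str, recorded_mtime: float, recorded_size: int,
                committed: bytes) -> str:
    """Classify a tracked file as "clean" or "modified" (toy sketch).

    recorded_mtime, recorded_size: what the dirstate stored for this file.
    committed: the file's contents as of the working directory's parent.
    """
    st = os.stat(path)
    if st.st_mtime == recorded_mtime:
        return "clean"       # mtime unchanged: assume the file is untouched
    if st.st_size != recorded_size:
        return "modified"    # size differs: it must have changed
    # mtime changed but size did not: only now read the contents.
    with open(path, "rb") as f:
        return "clean" if f.read() == committed else "modified"
```

Only the last branch reads file data. For most files in a large working directory, a single \texttt{stat} call settles the question.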
+ +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/ch05-daily.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch05-daily.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,381 @@ +\chapter{Mercurial in daily use} +\label{chap:daily} + +\section{Telling Mercurial which files to track} + +Mercurial does not work with files in your repository unless you tell +it to manage them. The \hgcmd{status} command will tell you which +files Mercurial doesn't know about; it uses a ``\texttt{?}'' to +display such files. + +To tell Mercurial to track a file, use the \hgcmd{add} command. Once +you have added a file, the entry in the output of \hgcmd{status} for +that file changes from ``\texttt{?}'' to ``\texttt{A}''. +\interaction{daily.files.add} + +After you run a \hgcmd{commit}, the files that you added before the +commit will no longer be listed in the output of \hgcmd{status}. The +reason for this is that \hgcmd{status} only tells you about +``interesting'' files---those that you have modified or told Mercurial +to do something with---by default. If you have a repository that +contains thousands of files, you will rarely want to know about files +that Mercurial is tracking, but that have not changed. (You can still +get this information; we'll return to this later.) + +Once you add a file, Mercurial doesn't do anything with it +immediately. Instead, it will take a snapshot of the file's state the +next time you perform a commit. It will then continue to track the +changes you make to the file every time you commit, until you remove +the file. + +\subsection{Explicit versus implicit file naming} + +A useful behaviour that Mercurial has is that if you pass the name of +a directory to a command, every Mercurial command will treat this as +``I want to operate on every file in this directory and its +subdirectories''. 
+\interaction{daily.files.add-dir} +Notice in this example that Mercurial printed the names of the files +it added, whereas it didn't do so when we added the file named +\filename{a} in the earlier example. + +What's going on is that in the earlier example, we explicitly named the +file to add on the command line, so Mercurial assumes that you know +what you are doing, and it doesn't print any output. + +However, when we \emph{imply} the names of files by giving the name of +a directory, Mercurial takes the extra step of printing the name of +each file that it does something with. This makes it clearer what +is happening, and reduces the likelihood of a silent and nasty +surprise. This behaviour is common to most Mercurial commands. + +\subsection{Aside: Mercurial tracks files, not directories} + +Mercurial does not track directory information. Instead, it tracks +the path to a file. Before creating a file, it first creates any +missing directory components of the path. After it deletes a file, it +then deletes any empty directories that were in the deleted file's +path. This sounds like a trivial distinction, but it has one minor +practical consequence: it is not possible to represent a completely +empty directory in Mercurial. + +Empty directories are rarely useful, and there are unintrusive +workarounds that you can use to achieve an appropriate effect. The +developers of Mercurial thus felt that the complexity that would be +required to manage empty directories was not worth the limited benefit +this feature would bring. + +If you need an empty directory in your repository, there are a few +ways to achieve this. One is to create a directory, then \hgcmd{add} a +``hidden'' file to that directory. On Unix-like systems, any file +name that begins with a period (``\texttt{.}'') is treated as hidden +by most commands and GUI tools. This approach is illustrated in +figure~\ref{ex:daily:hidden}.
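Because only file paths are recorded, the set of directories that a checkout can recreate is derived entirely from those paths. The sketch below uses a hypothetical \texttt{implied\_dirs} helper and assumes slash-separated, repository-relative paths. It shows why an empty directory disappears, and why adding a hidden file keeps it in existence.

```python
import posixpath


def implied_dirs(tracked_files):
    """Return the directories that exist only because tracked files live in them.

    With file paths as the only record, this set is all the directory
    information that can be reconstructed.
    """
    dirs = set()
    for path in tracked_files:
        parent = posixpath.dirname(path)
        while parent:            # walk up: "a/b/c" -> "a/b" -> "a" -> ""
            dirs.add(parent)
            parent = posixpath.dirname(parent)
    return dirs
```

For example, \texttt{implied\_dirs(["src/main.c", "docs/empty/.hidden"])} yields \texttt{src}, \texttt{docs} and \texttt{docs/empty}. If the hidden file is dropped, \texttt{docs/empty} no longer appears.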
+ +\begin{figure}[ht] + \interaction{daily.files.hidden} + \caption{Simulating an empty directory using a hidden file} + \label{ex:daily:hidden} +\end{figure} + +Another way to tackle a need for an empty directory is to simply +create one in your automated build scripts before they will need it. + +\section{How to stop tracking a file} + +Once you decide that a file no longer belongs in your repository, use +the \hgcmd{remove} command; this deletes the file, and tells Mercurial +to stop tracking it. A removed file is represented in the output of +\hgcmd{status} with a ``\texttt{R}''. +\interaction{daily.files.remove} + +After you \hgcmd{remove} a file, Mercurial will no longer track +changes to that file, even if you recreate a file with the same name +in your working directory. If you do recreate a file with the same +name and want Mercurial to track the new file, simply \hgcmd{add} it. +Mercurial will know that the newly added file is not related to the +old file of the same name. + +\subsection{Removing a file does not affect its history} + +It is important to understand that removing a file has only two +effects. +\begin{itemize} +\item It removes the current version of the file from the working + directory. +\item It stops Mercurial from tracking changes to the file, from the + time of the next commit. +\end{itemize} +Removing a file \emph{does not} in any way alter the \emph{history} of +the file. + +If you update the working directory to a changeset in which a file +that you have removed was still tracked, it will reappear in the +working directory, with the contents it had when you committed that +changeset. If you then update the working directory to a later +changeset, in which the file had been removed, Mercurial will once +again remove the file from the working directory. + +\subsection{Missing files} + +Mercurial considers a file that you have deleted, but not used +\hgcmd{remove} to delete, to be \emph{missing}. 
A missing file is +represented with ``\texttt{!}'' in the output of \hgcmd{status}. +Mercurial commands will not generally do anything with missing files. +\interaction{daily.files.missing} + +If your repository contains a file that \hgcmd{status} reports as +missing, and you want the file to stay gone, you can run +\hgcmdargs{remove}{\hgopt{remove}{--after}} at any time later on, to +tell Mercurial that you really did mean to remove the file. +\interaction{daily.files.remove-after} + +On the other hand, if you deleted the missing file by accident, use +\hgcmdargs{revert}{\emph{filename}} to recover the file. It will +reappear, in unmodified form. +\interaction{daily.files.recover-missing} + +\subsection{Aside: why tell Mercurial explicitly to + remove a file?} + +You might wonder why Mercurial requires you to explicitly tell it that +you are deleting a file. Early during the development of Mercurial, +it let you delete a file however you pleased; Mercurial would notice +the absence of the file automatically when you next ran a +\hgcmd{commit}, and stop tracking the file. In practice, this made it +too easy to accidentally remove a file without noticing. + +\subsection{Useful shorthand---adding and removing files + in one step} + +Mercurial offers a combination command, \hgcmd{addremove}, that adds +untracked files and marks missing files as removed. +\interaction{daily.files.addremove} +The \hgcmd{commit} command also provides a \hgopt{commit}{-A} option +that performs this same add-and-remove, immediately followed by a +commit. +\interaction{daily.files.commit-addremove} + +\section{Copying files} + +Mercurial provides a \hgcmd{copy} command that lets you make a new +copy of a file. When you copy a file using this command, Mercurial +makes a record of the fact that the new file is a copy of the original +file. It treats these copied files specially when you merge your work +with someone else's. 
+ +\subsection{The results of copying during a merge} + +What happens during a merge is that changes ``follow'' a copy. To +best illustrate what this means, let's create an example. We'll start +with the usual tiny repository that contains a single file. +\interaction{daily.copy.init} +We need to do some work in parallel, so that we'll have something to +merge. So let's clone our repository. +\interaction{daily.copy.clone} +Back in our initial repository, let's use the \hgcmd{copy} command to +make a copy of the first file we created. +\interaction{daily.copy.copy} + +If we look at the output of the \hgcmd{status} command afterwards, the +copied file looks just like a normal added file. +\interaction{daily.copy.status} +But if we pass the \hgopt{status}{-C} option to \hgcmd{status}, it +prints another line of output: this is the file that our newly-added +file was copied \emph{from}. +\interaction{daily.copy.status-copy} + +Now, back in the repository we cloned, let's make a change in +parallel. We'll add a line of content to the original file that we +created. +\interaction{daily.copy.other} +Now we have a modified \filename{file} in this repository. When we +pull the changes from the first repository, and merge the two heads, +Mercurial will propagate the changes that we made locally to +\filename{file} into its copy, \filename{new-file}. +\interaction{daily.copy.merge} + +\subsection{Why should changes follow copies?} +\label{sec:daily:why-copy} + +This behaviour, of changes to a file propagating out to copies of the +file, might seem esoteric, but in most cases it's highly desirable. + +First of all, remember that this propagation \emph{only} happens when +you merge. So if you \hgcmd{copy} a file, and subsequently modify the +original file during the normal course of your work, nothing will +happen. 
+ +The second thing to know is that modifications will only propagate +across a copy as long as the repository that you're pulling changes +from \emph{doesn't know} about the copy. + +The reason that Mercurial does this is as follows. Let's say I make +an important bug fix in a source file, and commit my changes. +Meanwhile, you've decided to \hgcmd{copy} the file in your repository, +without knowing about the bug or having seen the fix, and you have +started hacking on your copy of the file. + +If you pulled and merged my changes, and Mercurial \emph{didn't} +propagate changes across copies, my fix would reach only the original +file: unless you remembered to propagate the bug fix by hand, +the bug would \emph{remain} in your copy of the file. + +By automatically propagating the change that fixed the bug from the +original file to the copy, Mercurial prevents this class of problem. +To my knowledge, Mercurial is the \emph{only} revision control system +that propagates changes across copies like this. + +Once your change history has a record that the copy and subsequent +merge occurred, there's usually no further need to propagate changes +from the original file to the copied file, and that's why Mercurial +only propagates changes across copies until this point, and no +further. + +\subsection{How to make changes \emph{not} follow a copy} + +If, for some reason, you decide that this business of automatically +propagating changes across copies is not for you, simply use your +system's normal file copy command (on Unix-like systems, that's +\command{cp}) to make a copy of a file, then \hgcmd{add} the new copy +by hand. Before you do so, though, please do reread +section~\ref{sec:daily:why-copy}, and make an informed decision that +this behaviour is not appropriate to your specific case.
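The propagation rule from the preceding subsections can be captured in a toy model. This is an illustration under stated assumptions, not Mercurial's merge algorithm. Files are whole strings. The function \texttt{merge3} takes whichever side changed a file and gives up if both sides changed it. ``The other repository doesn't know about the copy'' is approximated by the copy's name being absent from the other snapshot. Both \texttt{merge3} and \texttt{merge\_with\_copies} are hypothetical names.

```python
def merge3(base, local, other):
    """Whole-file three-way merge: take whichever side changed (toy rule)."""
    if local == other:
        return local
    if local == base:
        return other
    if other == base:
        return local
    raise ValueError("conflict: both sides changed the file")


def merge_with_copies(base, local, other, local_copies):
    """Merge two snapshots so that changes follow copies made locally.

    base, local, other: dicts mapping file name -> contents (absent = no file).
    local_copies: dict mapping copy destination -> copy source, for copies
    recorded only on the local side.
    """
    merged = {}
    for name in set(local) | set(other):
        source = local_copies.get(name)
        if source is not None and name not in other:
            # The other side doesn't know about this copy, so merge the
            # copy against the history of its source file instead.
            result = merge3(base.get(source), local[name], other.get(source))
        else:
            result = merge3(base.get(name), local.get(name), other.get(name))
        if result is not None:
            merged[name] = result
    return merged
```

Given the copy record, a line that the other side added to \filename{file} also reaches \filename{new-file}, as in the earlier interactive example. Without the record, \filename{new-file} keeps its old contents.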
+ +\subsection{Behaviour of the \hgcmd{copy} command} + +When you use the \hgcmd{copy} command, Mercurial makes a copy of each +source file as it currently stands in the working directory. This +means that if you make some modifications to a file, then \hgcmd{copy} +it without first having committed those changes, the new copy will +also contain the modifications you have made up until that point. (I +find this behaviour a little counterintuitive, which is why I mention +it here.) + +The \hgcmd{copy} command acts similarly to the Unix \command{cp} +command (you can use the \hgcmd{cp} alias if you prefer). The last +argument is the \emph{destination}, and all prior arguments are +\emph{sources}. If you pass it a single file as the source, and the +destination does not exist, it creates a new file with that name. +\interaction{daily.copy.simple} +If the destination is a directory, Mercurial copies its sources into +that directory. +\interaction{daily.copy.dir-dest} +Copying a directory is recursive, and preserves the directory +structure of the source. +\interaction{daily.copy.dir-src} +If the source and destination are both directories, the source tree is +recreated in the destination directory. +\interaction{daily.copy.dir-src-dest} + +As with the \hgcmd{rename} command, if you copy a file manually and +then want Mercurial to know that you've copied the file, simply use +the \hgopt{copy}{--after} option to \hgcmd{copy}. +\interaction{daily.copy.after} + +\section{Renaming files} + +It's rather more common to need to rename a file than to make a copy +of it. The reason I discussed the \hgcmd{copy} command before talking +about renaming files is that Mercurial treats a rename in essentially +the same way as a copy. Therefore, knowing what Mercurial does when +you copy a file tells you what to expect when you rename a file. + +When you use the \hgcmd{rename} command, Mercurial makes a copy of +each source file, then deletes it and marks the file as removed. 
+\interaction{daily.rename.rename} +The \hgcmd{status} command shows the newly copied file as added, and +the copied-from file as removed. +\interaction{daily.rename.status} +As with the results of a \hgcmd{copy}, we must use the +\hgopt{status}{-C} option to \hgcmd{status} to see that the added file +is really being tracked by Mercurial as a copy of the original, now +removed, file. +\interaction{daily.rename.status-copy} + +As with \hgcmd{remove} and \hgcmd{copy}, you can tell Mercurial about +a rename after the fact using the \hgopt{rename}{--after} option. In +most other respects, the behaviour of the \hgcmd{rename} command, and +the options it accepts, are similar to the \hgcmd{copy} command. + +\subsection{Renaming files and merging changes} + +Since Mercurial's rename is implemented as copy-and-remove, the same +propagation of changes happens when you merge after a rename as after +a copy. + +If I modify a file, and you rename it to a new name, and then we merge +our respective changes, my modifications to the file under its +original name will be propagated into the file under its new name. +(This is something you might expect to ``simply work,'' but not all +revision control systems actually do this.) + +Whereas having changes follow a copy is a feature where you can +perhaps nod and say ``yes, that might be useful,'' it should be clear +that having them follow a rename is definitely important. Without +this facility, it would simply be too easy for changes to become +orphaned when files are renamed. + +\subsection{Divergent renames and merging} + +The case of diverging names occurs when two developers start with a +file---let's call it \filename{foo}---in their respective +repositories. + +\interaction{rename.divergent.clone} +Anne renames the file to \filename{bar}. +\interaction{rename.divergent.rename.anne} +Meanwhile, Bob renames it to \filename{quux}. 
+\interaction{rename.divergent.rename.bob}
+
+I like to think of this as a conflict because each developer has
+expressed different intentions about what the file ought to be named.
+
+What do you think should happen when they merge their work?
+Mercurial's actual behaviour is that it always preserves \emph{both}
+names when it merges changesets that contain divergent renames.
+\interaction{rename.divergent.merge}
+
+Notice that Mercurial does warn about the divergent renames, but it
+leaves it up to you to do something about the divergence after the merge.
+
+\subsection{Convergent renames and merging}
+
+Another kind of rename conflict occurs when two people choose to
+rename different \emph{source} files to the same \emph{destination}.
+In this case, Mercurial runs its normal merge machinery, and lets you
+guide it to a suitable resolution.
+
+\subsection{Other name-related corner cases}
+
+Mercurial has a longstanding bug in which it fails to handle a merge
+where one side has a file with a given name, while another has a
+directory with the same name. This is documented as~\bug{29}.
+\interaction{issue29.go}
+
+\section{Recovering from mistakes}
+
+Mercurial has some useful commands that will help you to recover from
+some common mistakes.
+
+The \hgcmd{revert} command lets you undo changes that you have made to
+your working directory. For example, if you \hgcmd{add} a file by
+accident, just run \hgcmd{revert} with the name of the file you added,
+and while the file won't be touched in any way, it won't be tracked
+for adding by Mercurial any longer, either. You can also use
+\hgcmd{revert} to get rid of erroneous changes to a file.
+
+Keep in mind that the \hgcmd{revert} command only applies to
+changes that you have not yet committed. Once you've committed a
+change, if you decide it was a mistake, you can still do something
+about it, though your options may be more limited.
+ +For more information about the \hgcmd{revert} command, and details +about how to deal with changes you have already committed, see +chapter~\ref{chap:undo}. + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/ch06-collab.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch06-collab.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,1118 @@ +\chapter{Collaborating with other people} +\label{cha:collab} + +As a completely decentralised tool, Mercurial doesn't impose any +policy on how people ought to work with each other. However, if +you're new to distributed revision control, it helps to have some +tools and examples in mind when you're thinking about possible +workflow models. + +\section{Mercurial's web interface} + +Mercurial has a powerful web interface that provides several +useful capabilities. + +For interactive use, the web interface lets you browse a single +repository or a collection of repositories. You can view the history +of a repository, examine each change (comments and diffs), and view +the contents of each directory and file. + +Also for human consumption, the web interface provides an RSS feed of +the changes in a repository. This lets you ``subscribe'' to a +repository using your favourite feed reader, and be automatically +notified of activity in that repository as soon as it happens. I find +this capability much more convenient than the model of subscribing to +a mailing list to which notifications are sent, as it requires no +additional configuration on the part of whoever is serving the +repository. + +The web interface also lets remote users clone a repository, pull +changes from it, and (when the server is configured to permit it) push +changes back to it. Mercurial's HTTP tunneling protocol aggressively +compresses data, so that it works efficiently even over low-bandwidth +network connections. 
+ +The easiest way to get started with the web interface is to use your +web browser to visit an existing repository, such as the master +Mercurial repository at +\url{http://www.selenic.com/repo/hg?style=gitweb}. + +If you're interested in providing a web interface to your own +repositories, Mercurial provides two ways to do this. The first is +using the \hgcmd{serve} command, which is best suited to short-term +``lightweight'' serving. See section~\ref{sec:collab:serve} below for +details of how to use this command. If you have a long-lived +repository that you'd like to make permanently available, Mercurial +has built-in support for the CGI (Common Gateway Interface) standard, +which all common web servers support. See +section~\ref{sec:collab:cgi} for details of CGI configuration. + +\section{Collaboration models} + +With a suitably flexible tool, making decisions about workflow is much +more of a social engineering challenge than a technical one. +Mercurial imposes few limitations on how you can structure the flow of +work in a project, so it's up to you and your group to set up and live +with a model that matches your own particular needs. + +\subsection{Factors to keep in mind} + +The most important aspect of any model that you must keep in mind is +how well it matches the needs and capabilities of the people who will +be using it. This might seem self-evident; even so, you still can't +afford to forget it for a moment. + +I once put together a workflow model that seemed to make perfect sense +to me, but that caused a considerable amount of consternation and +strife within my development team. In spite of my attempts to explain +why we needed a complex set of branches, and how changes ought to flow +between them, a few team members revolted. Even though they were +smart people, they didn't want to pay attention to the constraints we +were operating under, or face the consequences of those constraints in +the details of the model that I was advocating. 
+
+Don't sweep foreseeable social or technical problems under the rug.
+Whatever scheme you put into effect, you should plan for mistakes and
+problem scenarios. Consider adding automated machinery to prevent, or
+quickly recover from, trouble that you can anticipate. As an example,
+if you intend to have a branch with not-for-release changes in it,
+you'd do well to think early about the possibility that someone might
+accidentally merge those changes into a release branch. You could
+avoid this particular problem by writing a hook that prevents changes
+from being merged from an inappropriate branch.
+
+\subsection{Informal anarchy}
+
+I wouldn't suggest an ``anything goes'' approach as something
+sustainable, but it's a model that's easy to grasp, and it works
+perfectly well in a few unusual situations.
+
+As one example, many projects have a loose-knit group of collaborators
+who rarely physically meet each other. Some groups like to overcome
+the isolation of working at a distance by organising occasional
+``sprints''. In a sprint, a number of people get together in a single
+location (a company's conference room, a hotel meeting room, that kind
+of place) and spend several days more or less locked in there, hacking
+intensely on a handful of projects.
+
+A sprint is the perfect place to use the \hgcmd{serve} command, since
+\hgcmd{serve} does not require any fancy server infrastructure. You
+can get started with \hgcmd{serve} in moments, by reading
+section~\ref{sec:collab:serve} below. Then simply tell the person
+next to you that you're running a server, send the URL to them in an
+instant message, and you immediately have a quick-turnaround way to
+work together. They can type your URL into their web browser and
+quickly review your changes; or they can pull a bugfix from you and
+verify it; or they can clone a branch containing a new feature and try
+it out.
+
+The charm, and the problem, with doing things in an ad hoc fashion
+like this is that only people who know about your changes, and where
+they are, can see them. Such an informal approach simply doesn't
+scale beyond a handful of people, because each individual needs to know
+about $n$ different repositories to pull from.
+
+\subsection{A single central repository}
+
+For smaller projects migrating from a centralised revision control
+tool, perhaps the easiest way to get started is to have changes flow
+through a single shared central repository. This is also the
+most common ``building block'' for more ambitious workflow schemes.
+
+Contributors start by cloning a copy of this repository. They can
+pull changes from it whenever they need to, and some (perhaps all)
+developers have permission to push a change back when they're ready
+for other people to see it.
+
+Under this model, it can still often make sense for people to pull
+changes directly from each other, without going through the central
+repository. Consider a case in which I have a tentative bug fix, but
+I am worried that if I were to publish it to the central repository,
+it might subsequently break everyone else's trees as they pull it. To
+reduce the potential for damage, I can ask you to clone my repository
+into a temporary repository of your own and test it. This lets us put
+off publishing the potentially unsafe change until it has had a little
+testing.
+
+In this kind of scenario, people usually use the \command{ssh}
+protocol to securely push changes to the central repository, as
+documented in section~\ref{sec:collab:ssh}. It's also usual to
+publish a read-only copy of the repository over HTTP using CGI, as in
+section~\ref{sec:collab:cgi}. Publishing over HTTP satisfies the
+needs of people who don't have push access, and those who want to use
+web browsers to browse the repository's history.
+ +\subsection{Working with multiple branches} + +Projects of any significant size naturally tend to make progress on +several fronts simultaneously. In the case of software, it's common +for a project to go through periodic official releases. A release +might then go into ``maintenance mode'' for a while after its first +publication; maintenance releases tend to contain only bug fixes, not +new features. In parallel with these maintenance releases, one or +more future releases may be under development. People normally use +the word ``branch'' to refer to one of these many slightly different +directions in which development is proceeding. + +Mercurial is particularly well suited to managing a number of +simultaneous, but not identical, branches. Each ``development +direction'' can live in its own central repository, and you can merge +changes from one to another as the need arises. Because repositories +are independent of each other, unstable changes in a development +branch will never affect a stable branch unless someone explicitly +merges those changes in. + +Here's an example of how this can work in practice. Let's say you +have one ``main branch'' on a central server. +\interaction{branching.init} +People clone it, make changes locally, test them, and push them back. + +Once the main branch reaches a release milestone, you can use the +\hgcmd{tag} command to give a permanent name to the milestone +revision. +\interaction{branching.tag} +Let's say some ongoing development occurs on the main branch. +\interaction{branching.main} +Using the tag that was recorded at the milestone, people who clone +that repository at any time in the future can use \hgcmd{update} to +get a copy of the working directory exactly as it was when that tagged +revision was committed. +\interaction{branching.update} + +In addition, immediately after the main branch is tagged, someone can +then clone the main branch on the server to a new ``stable'' branch, +also on the server. 
+\interaction{branching.clone} + +Someone who needs to make a change to the stable branch can then clone +\emph{that} repository, make their changes, commit, and push their +changes back there. +\interaction{branching.stable} +Because Mercurial repositories are independent, and Mercurial doesn't +move changes around automatically, the stable and main branches are +\emph{isolated} from each other. The changes that you made on the +main branch don't ``leak'' to the stable branch, and vice versa. + +You'll often want all of your bugfixes on the stable branch to show up +on the main branch, too. Rather than rewrite a bugfix on the main +branch, you can simply pull and merge changes from the stable to the +main branch, and Mercurial will bring those bugfixes in for you. +\interaction{branching.merge} +The main branch will still contain changes that are not on the stable +branch, but it will also contain all of the bugfixes from the stable +branch. The stable branch remains unaffected by these changes. + +\subsection{Feature branches} + +For larger projects, an effective way to manage change is to break up +a team into smaller groups. Each group has a shared branch of its +own, cloned from a single ``master'' branch used by the entire +project. People working on an individual branch are typically quite +isolated from developments on other branches. + +\begin{figure}[ht] + \centering + \grafix{feature-branches} + \caption{Feature branches} + \label{fig:collab:feature-branches} +\end{figure} + +When a particular feature is deemed to be in suitable shape, someone +on that feature team pulls and merges from the master branch into the +feature branch, then pushes back up to the master branch. + +\subsection{The release train} + +Some projects are organised on a ``train'' basis: a release is +scheduled to happen every few months, and whatever features are ready +when the ``train'' is ready to leave are allowed in. + +This model resembles working with feature branches. 
 The difference is
+that when a feature branch misses a train, someone on the feature team
+pulls and merges the changes that went out on that train release into
+the feature branch, and the team continues its work on top of that
+release so that their feature can make the next release.
+
+\subsection{The Linux kernel model}
+
+The development of the Linux kernel has a shallow hierarchical
+structure, surrounded by a cloud of apparent chaos. Because most
+Linux developers use \command{git}, a distributed revision control
+tool with capabilities similar to Mercurial's, it's useful to describe
+the way work flows in that environment; if you like the ideas, the
+approach translates well across tools.
+
+At the center of the community sits Linus Torvalds, the creator of
+Linux. He publishes a single source repository that is considered the
+``authoritative'' current tree by the entire developer community.
+Anyone can clone Linus's tree, but he is very choosy about whose trees
+he pulls from.
+
+Linus has a number of ``trusted lieutenants''. As a general rule, he
+pulls whatever changes they publish, in most cases without even
+reviewing those changes. Some of those lieutenants are generally
+agreed to be ``maintainers'', responsible for specific subsystems
+within the kernel. If a random kernel hacker wants to make a change
+to a subsystem that they want to end up in Linus's tree, they must
+find out who the subsystem's maintainer is, and ask that maintainer to
+take their change. If the maintainer reviews their changes and agrees
+to take them, they'll pass them along to Linus in due course.
+
+Individual lieutenants have their own approaches to reviewing,
+accepting, and publishing changes; and for deciding when to feed them
+to Linus. In addition, there are several well known branches that
+people use for different purposes. For example, a few people maintain
+``stable'' repositories of older versions of the kernel, to which they
+apply critical fixes as needed.
Some maintainers publish multiple +trees: one for experimental changes; one for changes that they are +about to feed upstream; and so on. Others just publish a single +tree. + +This model has two notable features. The first is that it's ``pull +only''. You have to ask, convince, or beg another developer to take a +change from you, because there are almost no trees to which more than +one person can push, and there's no way to push changes into a tree +that someone else controls. + +The second is that it's based on reputation and acclaim. If you're an +unknown, Linus will probably ignore changes from you without even +responding. But a subsystem maintainer will probably review them, and +will likely take them if they pass their criteria for suitability. +The more ``good'' changes you contribute to a maintainer, the more +likely they are to trust your judgment and accept your changes. If +you're well-known and maintain a long-lived branch for something Linus +hasn't yet accepted, people with similar interests may pull your +changes regularly to keep up with your work. + +Reputation and acclaim don't necessarily cross subsystem or ``people'' +boundaries. If you're a respected but specialised storage hacker, and +you try to fix a networking bug, that change will receive a level of +scrutiny from a network maintainer comparable to a change from a +complete stranger. + +To people who come from more orderly project backgrounds, the +comparatively chaotic Linux kernel development process often seems +completely insane. It's subject to the whims of individuals; people +make sweeping changes whenever they deem it appropriate; and the pace +of development is astounding. And yet Linux is a highly successful, +well-regarded piece of software. 
+ +\subsection{Pull-only versus shared-push collaboration} + +A perpetual source of heat in the open source community is whether a +development model in which people only ever pull changes from others +is ``better than'' one in which multiple people can push changes to a +shared repository. + +Typically, the backers of the shared-push model use tools that +actively enforce this approach. If you're using a centralised +revision control tool such as Subversion, there's no way to make a +choice over which model you'll use: the tool gives you shared-push, +and if you want to do anything else, you'll have to roll your own +approach on top (such as applying a patch by hand). + +A good distributed revision control tool, such as Mercurial, will +support both models. You and your collaborators can then structure +how you work together based on your own needs and preferences, not on +what contortions your tools force you into. + +\subsection{Where collaboration meets branch management} + +Once you and your team set up some shared repositories and start +propagating changes back and forth between local and shared repos, you +begin to face a related, but slightly different challenge: that of +managing the multiple directions in which your team may be moving at +once. Even though this subject is intimately related to how your team +collaborates, it's dense enough to merit treatment of its own, in +chapter~\ref{chap:branch}. + +\section{The technical side of sharing} + +The remainder of this chapter is devoted to the question of serving +data to your collaborators. + +\section{Informal sharing with \hgcmd{serve}} +\label{sec:collab:serve} + +Mercurial's \hgcmd{serve} command is wonderfully suited to small, +tight-knit, and fast-paced group environments. It also provides a +great way to get a feel for using Mercurial commands over a network. 
+
+Run \hgcmd{serve} inside a repository, and in under a second it will
+bring up a specialised HTTP server; this will accept connections from
+any client, and serve up data for that repository until you terminate
+it. Anyone who knows the URL of the server you just started, and can
+talk to your computer over the network, can then use a web browser or
+Mercurial to read data from that repository. A URL for a
+\hgcmd{serve} instance running on a laptop is likely to look something
+like \Verb|http://my-laptop.local:8000/|.
+
+The \hgcmd{serve} command is \emph{not} a general-purpose web server.
+It can do only two things:
+\begin{itemize}
+\item Allow people to browse the history of the repository it's
+ serving, from their normal web browsers.
+\item Speak Mercurial's wire protocol, so that people can
+ \hgcmd{clone} or \hgcmd{pull} changes from that repository.
+\end{itemize}
+In particular, \hgcmd{serve} won't allow remote users to \emph{modify}
+your repository. It's intended for read-only use.
+
+If you're getting started with Mercurial, there's nothing to prevent
+you from using \hgcmd{serve} to serve up a repository on your own
+computer and then using commands like \hgcmd{clone}, \hgcmd{incoming}, and
+so on to talk to that server as if the repository were hosted remotely.
+This can help you to quickly get acquainted with using commands on
+network-hosted repositories.
+
+\subsection{A few things to keep in mind}
+
+Because it provides unauthenticated read access to all clients, you
+should only use \hgcmd{serve} in an environment where you either don't
+care about, or have complete control over, who can access your network and
+pull data from your repository.
+
+The \hgcmd{serve} command knows nothing about any firewall software
+you might have installed on your system or network. It cannot detect
+or control your firewall software.
If other people are unable to talk +to a running \hgcmd{serve} instance, the second thing you should do +(\emph{after} you make sure that they're using the correct URL) is +check your firewall configuration. + +By default, \hgcmd{serve} listens for incoming connections on +port~8000. If another process is already listening on the port you +want to use, you can specify a different port to listen on using the +\hgopt{serve}{-p} option. + +Normally, when \hgcmd{serve} starts, it prints no output, which can be +a bit unnerving. If you'd like to confirm that it is indeed running +correctly, and find out what URL you should send to your +collaborators, start it with the \hggopt{-v} option. + +\section{Using the Secure Shell (ssh) protocol} +\label{sec:collab:ssh} + +You can pull and push changes securely over a network connection using +the Secure Shell (\texttt{ssh}) protocol. To use this successfully, +you may have to do a little bit of configuration on the client or +server sides. + +If you're not familiar with ssh, it's a network protocol that lets you +securely communicate with another computer. To use it with Mercurial, +you'll be setting up one or more user accounts on a server so that +remote users can log in and execute commands. + +(If you \emph{are} familiar with ssh, you'll probably find some of the +material that follows to be elementary in nature.) + +\subsection{How to read and write ssh URLs} + +An ssh URL tends to look like this: +\begin{codesample2} + ssh://bos@hg.serpentine.com:22/hg/hgbook +\end{codesample2} +\begin{enumerate} +\item The ``\texttt{ssh://}'' part tells Mercurial to use the ssh + protocol. +\item The ``\texttt{bos@}'' component indicates what username to log + into the server as. You can leave this out if the remote username + is the same as your local username. +\item The ``\texttt{hg.serpentine.com}'' gives the hostname of the + server to log into. +\item The ``:22'' identifies the port number to connect to the server + on. 
The default port is~22, so you only need to specify this part + if you're \emph{not} using port~22. +\item The remainder of the URL is the local path to the repository on + the server. +\end{enumerate} + +There's plenty of scope for confusion with the path component of ssh +URLs, as there is no standard way for tools to interpret it. Some +programs behave differently than others when dealing with these paths. +This isn't an ideal situation, but it's unlikely to change. Please +read the following paragraphs carefully. + +Mercurial treats the path to a repository on the server as relative to +the remote user's home directory. For example, if user \texttt{foo} +on the server has a home directory of \dirname{/home/foo}, then an ssh +URL that contains a path component of \dirname{bar} +\emph{really} refers to the directory \dirname{/home/foo/bar}. + +If you want to specify a path relative to another user's home +directory, you can use a path that starts with a tilde character +followed by the user's name (let's call them \texttt{otheruser}), like +this. +\begin{codesample2} + ssh://server/~otheruser/hg/repo +\end{codesample2} + +And if you really want to specify an \emph{absolute} path on the +server, begin the path component with two slashes, as in this example. +\begin{codesample2} + ssh://server//absolute/path +\end{codesample2} + +\subsection{Finding an ssh client for your system} + +Almost every Unix-like system comes with OpenSSH preinstalled. If +you're using such a system, run \Verb|which ssh| to find out if +the \command{ssh} command is installed (it's usually in +\dirname{/usr/bin}). In the unlikely event that it isn't present, +take a look at your system documentation to figure out how to install +it. + +On Windows, you'll first need to download a suitable ssh +client. There are two alternatives. +\begin{itemize} +\item Simon Tatham's excellent PuTTY package~\cite{web:putty} provides + a complete suite of ssh client commands. 
+\item If you have a high tolerance for pain, you can use the Cygwin + port of OpenSSH. +\end{itemize} +In either case, you'll need to edit your \hgini\ file to tell +Mercurial where to find the actual client command. For example, if +you're using PuTTY, you'll need to use the \command{plink} command as +a command-line ssh client. +\begin{codesample2} + [ui] + ssh = C:/path/to/plink.exe -ssh -i "C:/path/to/my/private/key" +\end{codesample2} + +\begin{note} + The path to \command{plink} shouldn't contain any whitespace + characters, or Mercurial may not be able to run it correctly (so + putting it in \dirname{C:\\Program Files} is probably not a good + idea). +\end{note} + +\subsection{Generating a key pair} + +To avoid the need to repetitively type a password every time you need +to use your ssh client, I recommend generating a key pair. On a +Unix-like system, the \command{ssh-keygen} command will do the trick. +On Windows, if you're using PuTTY, the \command{puttygen} command is +what you'll need. + +When you generate a key pair, it's usually \emph{highly} advisable to +protect it with a passphrase. (The only time that you might not want +to do this is when you're using the ssh protocol for automated tasks +on a secure network.) + +Simply generating a key pair isn't enough, however. You'll need to +add the public key to the set of authorised keys for whatever user +you're logging in remotely as. For servers using OpenSSH (the vast +majority), this will mean adding the public key to a list in a file +called \sfilename{authorized\_keys} in their \sdirname{.ssh} +directory. + +On a Unix-like system, your public key will have a \filename{.pub} +extension. If you're using \command{puttygen} on Windows, you can +save the public key to a file of your choosing, or paste it from the +window it's displayed in straight into the +\sfilename{authorized\_keys} file. 
+ +\subsection{Using an authentication agent} + +An authentication agent is a daemon that stores passphrases in memory +(so it will forget passphrases if you log out and log back in again). +An ssh client will notice if it's running, and query it for a +passphrase. If there's no authentication agent running, or the agent +doesn't store the necessary passphrase, you'll have to type your +passphrase every time Mercurial tries to communicate with a server on +your behalf (e.g.~whenever you pull or push changes). + +The downside of storing passphrases in an agent is that it's possible +for a well-prepared attacker to recover the plain text of your +passphrases, in some cases even if your system has been power-cycled. +You should make your own judgment as to whether this is an acceptable +risk. It certainly saves a lot of repeated typing. + +On Unix-like systems, the agent is called \command{ssh-agent}, and +it's often run automatically for you when you log in. You'll need to +use the \command{ssh-add} command to add passphrases to the agent's +store. On Windows, if you're using PuTTY, the \command{pageant} +command acts as the agent. It adds an icon to your system tray that +will let you manage stored passphrases. + +\subsection{Configuring the server side properly} + +Because ssh can be fiddly to set up if you're new to it, there's a +variety of things that can go wrong. Add Mercurial on top, and +there's plenty more scope for head-scratching. Most of these +potential problems occur on the server side, not the client side. The +good news is that once you've gotten a configuration working, it will +usually continue to work indefinitely. + +Before you try using Mercurial to talk to an ssh server, it's best to +make sure that you can use the normal \command{ssh} or \command{putty} +command to talk to the server first. If you run into problems with +using these commands directly, Mercurial surely won't work. Worse, it +will obscure the underlying problem. 
Any time you want to debug
+ssh-related Mercurial problems, you should drop back to making sure
+that plain ssh client commands work first, \emph{before} you worry
+about whether there's a problem with Mercurial.
+
+The first thing to be sure of on the server side is that you can
+actually log in from another machine at all. If you can't use
+\command{ssh} or \command{putty} to log in, the error message you get
+may give you a few hints as to what's wrong. The most common problems
+are as follows.
+\begin{itemize}
+\item If you get a ``connection refused'' error, either there isn't an
+  SSH daemon running on the server at all, or it's inaccessible due to
+  firewall configuration.
+\item If you get a ``no route to host'' error, you either have an
+  incorrect address for the server or a seriously locked down firewall
+  that won't admit its existence at all.
+\item If you get a ``permission denied'' error, you may have mistyped
+  the username on the server, or you could have mistyped your key's
+  passphrase or the remote user's password.
+\end{itemize}
+In summary, if you're having trouble talking to the server's ssh
+daemon, first make sure that one is running at all. On many systems
+it will be installed, but disabled, by default. Once that's done,
+check that the server's firewall is configured to allow incoming
+connections on the port the ssh daemon is listening on (usually~22).
+Don't worry about more exotic possibilities for misconfiguration
+until you've checked these two first.
+
+If you're using an authentication agent on the client side to hold
+your keys, you ought to be able to log into the server without being
+prompted for a passphrase or a password. If you're prompted for a
+passphrase, there are a few possible culprits.
+\begin{itemize}
+\item You might have forgotten to use \command{ssh-add} or
+  \command{pageant} to load your key into the agent.
+\item You might have loaded the wrong key into the agent.
+\end{itemize}
+If you're being prompted for the remote user's password, there are
+another few possible problems to check.
+\begin{itemize}
+\item Either the user's home directory or their \sdirname{.ssh}
+  directory might have excessively liberal permissions. As a result,
+  the ssh daemon will not trust or read their
+  \sfilename{authorized\_keys} file. For example, a group-writable
+  home or \sdirname{.ssh} directory will often cause this symptom.
+\item The user's \sfilename{authorized\_keys} file may have a problem.
+  If anyone other than the user owns or can write to that file, the
+  ssh daemon will not trust or read it.
+\end{itemize}
+
+Ideally, you should be able to run the following command
+successfully, and it should print exactly one line of output: the
+current date and time.
+\begin{codesample2}
+  ssh myserver date
+\end{codesample2}
+
+If, on your server, you have login scripts that print banners or other
+junk even when running non-interactive commands like this, you should
+fix them before you continue, so that they only print output if
+they're run interactively. Otherwise these banners will at least
+clutter up Mercurial's output. Worse, they could potentially cause
+problems with running Mercurial commands remotely. Mercurial tries
+to detect and ignore banners in non-interactive \command{ssh}
+sessions, but it is not foolproof. (If you're editing your login
+scripts on your server, the usual way to see if a login script is
+running in an interactive shell is to check the return code from the
+command \Verb|tty -s|.)
+
+Once you've verified that plain old ssh is working with your server,
+the next step is to ensure that Mercurial runs on the server. The
+following command should run successfully:
+\begin{codesample2}
+  ssh myserver hg version
+\end{codesample2}
+If you see an error message instead of normal \hgcmd{version} output,
+this is usually because you haven't installed Mercurial to
+\dirname{/usr/bin}.
Don't worry if this is the case; you don't need +to do that. But you should check for a few possible problems. +\begin{itemize} +\item Is Mercurial really installed on the server at all? I know this + sounds trivial, but it's worth checking! +\item Maybe your shell's search path (usually set via the \envar{PATH} + environment variable) is simply misconfigured. +\item Perhaps your \envar{PATH} environment variable is only being set + to point to the location of the \command{hg} executable if the login + session is interactive. This can happen if you're setting the path + in the wrong shell login script. See your shell's documentation for + details. +\item The \envar{PYTHONPATH} environment variable may need to contain + the path to the Mercurial Python modules. It might not be set at + all; it could be incorrect; or it may be set only if the login is + interactive. +\end{itemize} + +If you can run \hgcmd{version} over an ssh connection, well done! +You've got the server and client sorted out. You should now be able +to use Mercurial to access repositories hosted by that username on +that server. If you run into problems with Mercurial and ssh at this +point, try using the \hggopt{--debug} option to get a clearer picture +of what's going on. + +\subsection{Using compression with ssh} + +Mercurial does not compress data when it uses the ssh protocol, +because the ssh protocol can transparently compress data. However, +the default behaviour of ssh clients is \emph{not} to request +compression. + +Over any network other than a fast LAN (even a wireless network), +using compression is likely to significantly speed up Mercurial's +network operations. For example, over a WAN, someone measured +compression as reducing the amount of time required to clone a +particularly large repository from~51 minutes to~17 minutes. + +Both \command{ssh} and \command{plink} accept a \cmdopt{ssh}{-C} +option which turns on compression. 
You can easily edit your \hgrc\ to +enable compression for all of Mercurial's uses of the ssh protocol. +\begin{codesample2} + [ui] + ssh = ssh -C +\end{codesample2} + +If you use \command{ssh}, you can configure it to always use +compression when talking to your server. To do this, edit your +\sfilename{.ssh/config} file (which may not yet exist), as follows. +\begin{codesample2} + Host hg + Compression yes + HostName hg.example.com +\end{codesample2} +This defines an alias, \texttt{hg}. When you use it on the +\command{ssh} command line or in a Mercurial \texttt{ssh}-protocol +URL, it will cause \command{ssh} to connect to \texttt{hg.example.com} +and use compression. This gives you both a shorter name to type and +compression, each of which is a good thing in its own right. + +\section{Serving over HTTP using CGI} +\label{sec:collab:cgi} + +Depending on how ambitious you are, configuring Mercurial's CGI +interface can take anything from a few moments to several hours. + +We'll begin with the simplest of examples, and work our way towards a +more complex configuration. Even for the most basic case, you're +almost certainly going to need to read and modify your web server's +configuration. + +\begin{note} + Configuring a web server is a complex, fiddly, and highly + system-dependent activity. I can't possibly give you instructions + that will cover anything like all of the cases you will encounter. + Please use your discretion and judgment in following the sections + below. Be prepared to make plenty of mistakes, and to spend a lot + of time reading your server's error logs. +\end{note} + +\subsection{Web server configuration checklist} + +Before you continue, do take a few moments to check a few aspects of +your system's setup. + +\begin{enumerate} +\item Do you have a web server installed at all? Mac OS X ships with + Apache, but many other systems may not have a web server installed. +\item If you have a web server installed, is it actually running? 
On + most systems, even if one is present, it will be disabled by + default. +\item Is your server configured to allow you to run CGI programs in + the directory where you plan to do so? Most servers default to + explicitly disabling the ability to run CGI programs. +\end{enumerate} + +If you don't have a web server installed, and don't have substantial +experience configuring Apache, you should consider using the +\texttt{lighttpd} web server instead of Apache. Apache has a +well-deserved reputation for baroque and confusing configuration. +While \texttt{lighttpd} is less capable in some ways than Apache, most +of these capabilities are not relevant to serving Mercurial +repositories. And \texttt{lighttpd} is undeniably \emph{much} easier +to get started with than Apache. + +\subsection{Basic CGI configuration} + +On Unix-like systems, it's common for users to have a subdirectory +named something like \dirname{public\_html} in their home directory, +from which they can serve up web pages. A file named \filename{foo} +in this directory will be accessible at a URL of the form +\texttt{http://www.example.com/\~{}username/foo}. + +To get started, find the \sfilename{hgweb.cgi} script that should be +present in your Mercurial installation. If you can't quickly find a +local copy on your system, simply download one from the master +Mercurial repository at +\url{http://www.selenic.com/repo/hg/raw-file/tip/hgweb.cgi}. + +You'll need to copy this script into your \dirname{public\_html} +directory, and ensure that it's executable. +\begin{codesample2} + cp .../hgweb.cgi ~/public_html + chmod 755 ~/public_html/hgweb.cgi +\end{codesample2} +The \texttt{755} argument to \command{chmod} is a little more general +than just making the script executable: it ensures that the script is +executable by anyone, and that ``group'' and ``other'' write +permissions are \emph{not} set. 
If you were to leave those write +permissions enabled, Apache's \texttt{suexec} subsystem would likely +refuse to execute the script. In fact, \texttt{suexec} also insists +that the \emph{directory} in which the script resides must not be +writable by others. +\begin{codesample2} + chmod 755 ~/public_html +\end{codesample2} + +\subsubsection{What could \emph{possibly} go wrong?} +\label{sec:collab:wtf} + +Once you've copied the CGI script into place, go into a web browser, +and try to open the URL \url{http://myhostname/~myuser/hgweb.cgi}, +\emph{but} brace yourself for instant failure. There's a high +probability that trying to visit this URL will fail, and there are +many possible reasons for this. In fact, you're likely to stumble +over almost every one of the possible errors below, so please read +carefully. The following are all of the problems I ran into on a +system running Fedora~7, with a fresh installation of Apache, and a +user account that I created specially to perform this exercise. + +Your web server may have per-user directories disabled. If you're +using Apache, search your config file for a \texttt{UserDir} +directive. If there's none present, per-user directories will be +disabled. If one exists, but its value is \texttt{disabled}, then +per-user directories will be disabled. Otherwise, the string after +\texttt{UserDir} gives the name of the subdirectory that Apache will +look in under your home directory, for example \dirname{public\_html}. + +Your file access permissions may be too restrictive. The web server +must be able to traverse your home directory and directories under +your \dirname{public\_html} directory, and read files under the latter +too. Here's a quick recipe to help you to make your permissions more +appropriate. 
+\begin{codesample2}
+  chmod 755 ~
+  find ~/public_html -type d -print0 | xargs -0r chmod 755
+  find ~/public_html -type f -print0 | xargs -0r chmod 644
+\end{codesample2}
+
+The other possibility with permissions is that you might get a
+completely empty window when you try to load the script. In this
+case, it's likely that your access permissions are \emph{too
+  permissive}. Apache's \texttt{suexec} subsystem won't execute a
+script that's group-~or world-writable, for example.
+
+Your web server may be configured to disallow execution of CGI
+programs in your per-user web directory. Here's Apache's
+default per-user configuration from my Fedora system.
+\begin{codesample2}
+  <Directory /home/*/public_html>
+    AllowOverride FileInfo AuthConfig Limit
+    Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
+    <Limit GET POST OPTIONS>
+      Order allow,deny
+      Allow from all
+    </Limit>
+    <LimitExcept GET POST OPTIONS>
+      Order deny,allow
+      Deny from all
+    </LimitExcept>
+  </Directory>
+\end{codesample2}
+If you find a similar-looking \texttt{Directory} group in your Apache
+configuration, the directive to look at inside it is \texttt{Options}.
+Add \texttt{ExecCGI} to the end of this list if it's missing, and
+restart the web server.
+
+If you find that Apache serves you the text of the CGI script instead
+of executing it, you may need to either uncomment (if already present)
+or add a directive like this.
+\begin{codesample2}
+  AddHandler cgi-script .cgi
+\end{codesample2}
+
+The next possibility is that you might be served with a colourful
+Python backtrace claiming that it can't import a
+\texttt{mercurial}-related module. This is actually progress! The
+server is now capable of executing your CGI script. This error is
+only likely to occur if you're running a private installation of
+Mercurial, instead of a system-wide version. Remember that the web
+server runs the CGI program without any of the environment variables
+that you take for granted in an interactive session.
If this error
+happens to you, edit your copy of \sfilename{hgweb.cgi} and follow the
+directions inside it to correctly set your \envar{PYTHONPATH}
+environment variable.
+
+Finally, you are \emph{certain} to be served with another colourful
+Python backtrace: this one will complain that it can't find
+\dirname{/path/to/repository}. Edit your \sfilename{hgweb.cgi} script
+and replace the \dirname{/path/to/repository} string with the complete
+path to the repository you want to serve up.
+
+At this point, when you try to reload the page, you should be
+presented with a nice HTML view of your repository's history. Whew!
+
+\subsubsection{Configuring lighttpd}
+
+To be exhaustive in my experiments, I tried configuring the
+increasingly popular \texttt{lighttpd} web server to serve the same
+repository as I described with Apache above. I had already overcome
+all of the problems I outlined with Apache, many of which are not
+server-specific. As a result, I was fairly sure that my file and
+directory permissions were good, and that my \sfilename{hgweb.cgi}
+script was properly edited.
+
+Once I had Apache running, getting \texttt{lighttpd} to serve the
+repository was a snap (in other words, even if you're trying to use
+\texttt{lighttpd}, you should read the Apache section). I first had
+to edit the \texttt{mod\_access} section of its config file to enable
+\texttt{mod\_cgi} and \texttt{mod\_userdir}, both of which were
+disabled by default on my system. I then added a few lines to the end
+of the config file, to configure these modules.
+\begin{codesample2}
+  userdir.path = "public_html"
+  cgi.assign = ( ".cgi" => "" )
+\end{codesample2}
+With this done, \texttt{lighttpd} ran immediately for me. If I had
+configured \texttt{lighttpd} before Apache, I'd almost certainly have
+run into many of the same system-level configuration problems as I did
+with Apache.
However, I found \texttt{lighttpd} to be noticeably
+easier to configure than Apache, even though I've used Apache for over
+a decade, and this was my first exposure to \texttt{lighttpd}.
+
+\subsection{Sharing multiple repositories with one CGI script}
+
+The \sfilename{hgweb.cgi} script only lets you publish a single
+repository, which is an annoying restriction. If you want to publish
+more than one without maintaining multiple copies of the same script
+under different names, a better choice is to use the
+\sfilename{hgwebdir.cgi} script.
+
+The procedure to configure \sfilename{hgwebdir.cgi} is only a little
+more involved than for \sfilename{hgweb.cgi}. First, you must obtain
+a copy of the script. If you don't have one handy, you can download a
+copy from the master Mercurial repository at
+\url{http://www.selenic.com/repo/hg/raw-file/tip/hgwebdir.cgi}.
+
+You'll need to copy this script into your \dirname{public\_html}
+directory, and ensure that it's executable.
+\begin{codesample2}
+  cp .../hgwebdir.cgi ~/public_html
+  chmod 755 ~/public_html ~/public_html/hgwebdir.cgi
+\end{codesample2}
+With basic configuration out of the way, try to visit
+\url{http://myhostname/~myuser/hgwebdir.cgi} in your browser. It
+should display an empty list of repositories. If you get a blank
+window or error message, try walking through the list of potential
+problems in section~\ref{sec:collab:wtf}.
+
+The \sfilename{hgwebdir.cgi} script relies on an external
+configuration file. By default, it searches for a file named
+\sfilename{hgweb.config} in the same directory as itself. You'll need
+to create this file, and make it world-readable. The format of the
+file is similar to a Windows ``ini'' file, as understood by Python's
+\texttt{ConfigParser}~\cite{web:configparser} module.
+
+The easiest way to configure \sfilename{hgwebdir.cgi} is with a
+section named \texttt{collections}.
This will automatically publish
+\emph{every} repository under the directories you name. The section
+should look like this:
+\begin{codesample2}
+  [collections]
+  /my/root = /my/root
+\end{codesample2}
+Mercurial interprets this by looking at the directory name on the
+\emph{right}-hand side of the ``\texttt{=}'' sign; finding
+repositories in that directory hierarchy; and using the text on the
+\emph{left} to strip off matching text from the names it will actually
+list in the web interface. The remaining component of a path after
+this stripping has occurred is called a ``virtual path''.
+
+Given the example above, if we have a repository whose local path is
+\dirname{/my/root/this/repo}, the CGI script will strip the leading
+\dirname{/my/root} from the name, and publish the repository with a
+virtual path of \dirname{this/repo}. If the base URL for our CGI
+script is \url{http://myhostname/~myuser/hgwebdir.cgi}, the complete
+URL for that repository will be
+\url{http://myhostname/~myuser/hgwebdir.cgi/this/repo}.
+
+If we replace \dirname{/my/root} on the left-hand side of this example
+with \dirname{/my}, then \sfilename{hgwebdir.cgi} will only strip off
+\dirname{/my} from the repository name, and will give us a virtual
+path of \dirname{root/this/repo} instead of \dirname{this/repo}.
+
+The \sfilename{hgwebdir.cgi} script will recursively search each
+directory listed in the \texttt{collections} section of its
+configuration file, but it will \emph{not} recurse into the
+repositories it finds.
+
+The \texttt{collections} mechanism makes it easy to publish many
+repositories in a ``fire and forget'' manner. You only need to set up
+the CGI script and configuration file one time. Afterwards, you can
+publish or unpublish a repository at any time by simply moving it
+into, or out of, the directory hierarchy in which you've configured
+\sfilename{hgwebdir.cgi} to look.
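+
+To make the path-stripping rule concrete, here is a minimal Python
+sketch of how a \texttt{collections} entry could be turned into a set
+of virtual paths. It is an illustration, not the code that
+\sfilename{hgwebdir.cgi} actually runs: it assumes Unix-style paths,
+treats any directory containing a \sdirname{.hg} subdirectory as a
+repository, and its function names are hypothetical. It reads the
+configuration with Python's \texttt{configparser}, keeping the case of
+the paths intact.

```python
import configparser
import os
import tempfile


def find_repos(root):
    """Yield each repository below root, without descending into repositories."""
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".hg" in dirnames:
            yield dirpath
            dirnames[:] = []  # found a repository: don't recurse into it
        else:
            dirnames.sort()   # visit subdirectories in a predictable order


def virtual_paths(prefix, root):
    """Map virtual path -> local path for one [collections] entry."""
    prefix = prefix.rstrip("/")
    mapping = {}
    for repo in find_repos(root):
        path = repo.replace(os.sep, "/")
        # Strip the left-hand side of the entry from the repository's path.
        if path.startswith(prefix + "/"):
            path = path[len(prefix) + 1:]
        mapping[path] = repo
    return mapping


def collections_from_config(text):
    """Collect virtual paths from every entry in a [collections] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # don't lowercase path names used as keys
    parser.read_string(text)
    repos = {}
    for prefix, root in parser.items("collections"):
        repos.update(virtual_paths(prefix, root))
    return repos


if __name__ == "__main__":
    # Throwaway tree: two repositories, plus one nested inside a repository.
    base = tempfile.mkdtemp()
    root = os.path.join(base, "root")
    for repo in ("this/repo", "this/repo/nested", "other"):
        os.makedirs(os.path.join(root, repo, ".hg"))
    config = "[collections]\n%s = %s\n" % (root, root)
    print(sorted(collections_from_config(config)))
```

+Clearing the list of subdirectories as soon as a \sdirname{.hg} turns
+up is what keeps the sketch from descending into a repository it has
+already found, so the nested repository in the demonstration is not
+published.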
+ +\subsubsection{Explicitly specifying which repositories to publish} + +In addition to the \texttt{collections} mechanism, the +\sfilename{hgwebdir.cgi} script allows you to publish a specific list +of repositories. To do so, create a \texttt{paths} section, with +contents of the following form. +\begin{codesample2} + [paths] + repo1 = /my/path/to/some/repo + repo2 = /some/path/to/another +\end{codesample2} +In this case, the virtual path (the component that will appear in a +URL) is on the left hand side of each definition, while the path to +the repository is on the right. Notice that there does not need to be +any relationship between the virtual path you choose and the location +of a repository in your filesystem. + +If you wish, you can use both the \texttt{collections} and +\texttt{paths} mechanisms simultaneously in a single configuration +file. + +\begin{note} + If multiple repositories have the same virtual path, + \sfilename{hgwebdir.cgi} will not report an error. Instead, it will + behave unpredictably. +\end{note} + +\subsection{Downloading source archives} + +Mercurial's web interface lets users download an archive of any +revision. This archive will contain a snapshot of the working +directory as of that revision, but it will not contain a copy of the +repository data. + +By default, this feature is not enabled. To enable it, you'll need to +add an \rcitem{web}{allow\_archive} item to the \rcsection{web} +section of your \hgrc. + +\subsection{Web configuration options} + +Mercurial's web interfaces (the \hgcmd{serve} command, and the +\sfilename{hgweb.cgi} and \sfilename{hgwebdir.cgi} scripts) have a +number of configuration options that you can set. These belong in a +section named \rcsection{web}. +\begin{itemize} +\item[\rcitem{web}{allow\_archive}] Determines which (if any) archive + download mechanisms Mercurial supports. 
If you enable this
+  feature, users of the web interface will be able to download an
+  archive of whatever revision of a repository they are viewing.
+  To enable the archive feature, this item must take the form of a
+  sequence of words drawn from the list below.
+  \begin{itemize}
+  \item[\texttt{bz2}] A \command{tar} archive, compressed using
+    \texttt{bzip2} compression. This has the best compression ratio,
+    but uses the most CPU time on the server.
+  \item[\texttt{gz}] A \command{tar} archive, compressed using
+    \texttt{gzip} compression.
+  \item[\texttt{zip}] A \command{zip} archive, compressed using
+    DEFLATE compression. This format has the worst compression ratio,
+    but is widely used in the Windows world.
+  \end{itemize}
+  If you provide an empty list, or don't have an
+  \rcitem{web}{allow\_archive} entry at all, this feature will be
+  disabled. Here is an example of how to enable all three supported
+  formats.
+  \begin{codesample4}
+    [web]
+    allow_archive = bz2 gz zip
+  \end{codesample4}
+\item[\rcitem{web}{allowpull}] Boolean. Determines whether the web
+  interface allows remote users to \hgcmd{pull} and \hgcmd{clone} this
+  repository over~HTTP. If set to \texttt{no} or \texttt{false}, only
+  the ``human-oriented'' portion of the web interface is available.
+\item[\rcitem{web}{contact}] String. A free-form (but preferably
+  brief) string identifying the person or group in charge of the
+  repository. This often contains the name and email address of a
+  person or mailing list. It often makes sense to place this entry in
+  a repository's own \sfilename{.hg/hgrc} file, but it can also belong
+  in a global \hgrc\ if every repository has a single maintainer.
+\item[\rcitem{web}{maxchanges}] Integer. The default maximum number
+  of changesets to display in a single page of output.
+\item[\rcitem{web}{maxfiles}] Integer. The default maximum number
+  of modified files to display in a single page of output.
+\item[\rcitem{web}{stripes}] Integer.
If the web interface displays + alternating ``stripes'' to make it easier to visually align rows + when you are looking at a table, this number controls the number of + rows in each stripe. +\item[\rcitem{web}{style}] Controls the template Mercurial uses to + display the web interface. Mercurial ships with two web templates, + named \texttt{default} and \texttt{gitweb} (the latter is much more + visually attractive). You can also specify a custom template of + your own; see chapter~\ref{chap:template} for details. Here, you + can see how to enable the \texttt{gitweb} style. + \begin{codesample4} + [web] + style = gitweb + \end{codesample4} +\item[\rcitem{web}{templates}] Path. The directory in which to search + for template files. By default, Mercurial searches in the directory + in which it was installed. +\end{itemize} +If you are using \sfilename{hgwebdir.cgi}, you can place a few +configuration items in a \rcsection{web} section of the +\sfilename{hgweb.config} file instead of a \hgrc\ file, for +convenience. These items are \rcitem{web}{motd} and +\rcitem{web}{style}. + +\subsubsection{Options specific to an individual repository} + +A few \rcsection{web} configuration items ought to be placed in a +repository's local \sfilename{.hg/hgrc}, rather than a user's or +global \hgrc. +\begin{itemize} +\item[\rcitem{web}{description}] String. A free-form (but preferably + brief) string that describes the contents or purpose of the + repository. +\item[\rcitem{web}{name}] String. The name to use for the repository + in the web interface. This overrides the default name, which is the + last component of the repository's path. +\end{itemize} + +\subsubsection{Options specific to the \hgcmd{serve} command} + +Some of the items in the \rcsection{web} section of a \hgrc\ file are +only for use with the \hgcmd{serve} command. +\begin{itemize} +\item[\rcitem{web}{accesslog}] Path. The name of a file into which to + write an access log. 
By default, the \hgcmd{serve} command writes + this information to standard output, not to a file. Log entries are + written in the standard ``combined'' file format used by almost all + web servers. +\item[\rcitem{web}{address}] String. The local address on which the + server should listen for incoming connections. By default, the + server listens on all addresses. +\item[\rcitem{web}{errorlog}] Path. The name of a file into which to + write an error log. By default, the \hgcmd{serve} command writes this + information to standard error, not to a file. +\item[\rcitem{web}{ipv6}] Boolean. Whether to use the IPv6 protocol. + By default, IPv6 is not used. +\item[\rcitem{web}{port}] Integer. The TCP~port number on which the + server should listen. The default port number used is~8000. +\end{itemize} + +\subsubsection{Choosing the right \hgrc\ file to add \rcsection{web} + items to} + +It is important to remember that a web server like Apache or +\texttt{lighttpd} will run under a user~ID that is different to yours. +CGI scripts run by your server, such as \sfilename{hgweb.cgi}, will +usually also run under that user~ID. + +If you add \rcsection{web} items to your own personal \hgrc\ file, CGI +scripts won't read that \hgrc\ file. Those settings will thus only +affect the behaviour of the \hgcmd{serve} command when you run it. To +cause CGI scripts to see your settings, either create a \hgrc\ file in +the home directory of the user ID that runs your web server, or add +those settings to a system-wide \hgrc\ file. + + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/ch07-filenames.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch07-filenames.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,306 @@ +\chapter{File names and pattern matching} +\label{chap:names} + +Mercurial provides mechanisms that let you work with file names in a +consistent and expressive way. 
+ +\section{Simple file naming} + +Mercurial uses a unified piece of machinery ``under the hood'' to +handle file names. Every command behaves uniformly with respect to +file names. The way in which commands work with file names is as +follows. + +If you explicitly name real files on the command line, Mercurial works +with exactly those files, as you would expect. +\interaction{filenames.files} + +When you provide a directory name, Mercurial will interpret this as +``operate on every file in this directory and its subdirectories''. +Mercurial traverses the files and subdirectories in a directory in +alphabetical order. When it encounters a subdirectory, it will +traverse that subdirectory before continuing with the current +directory. +\interaction{filenames.dirs} + +\section{Running commands without any file names} + +Mercurial's commands that work with file names have useful default +behaviours when you invoke them without providing any file names or +patterns. What kind of behaviour you should expect depends on what +the command does. Here are a few rules of thumb you can use to +predict what a command is likely to do if you don't give it any names +to work with. +\begin{itemize} +\item Most commands will operate on the entire working directory. + This is what the \hgcmd{add} command does, for example. +\item If the command has effects that are difficult or impossible to + reverse, it will force you to explicitly provide at least one name + or pattern (see below). This protects you from accidentally + deleting files by running \hgcmd{remove} with no arguments, for + example. +\end{itemize} + +It's easy to work around these default behaviours if they don't suit +you. If a command normally operates on the whole working directory, +you can invoke it on just the current directory and its subdirectories +by giving it the name ``\dirname{.}''. 
+\interaction{filenames.wdir-subdir}
+
+Along the same lines, some commands normally print file names relative
+to the root of the repository, even if you're invoking them from a
+subdirectory. Such a command will print file names relative to your
+subdirectory if you give it explicit names. Here, we're going to run
+\hgcmd{status} from a subdirectory, and get it to operate on the
+entire working directory while printing file names relative to our
+subdirectory, by passing it the output of the \hgcmd{root} command.
+\interaction{filenames.wdir-relname}
+
+\section{Telling you what's going on}
+
+The \hgcmd{add} example in the preceding section illustrates something
+else that's helpful about Mercurial commands. If a command operates
+on a file that you didn't name explicitly on the command line, it will
+usually print the name of the file, so that you will not be surprised
+by what's going on.
+
+The principle here is one of \emph{least surprise}. If you've exactly
+named a file on the command line, there's no point in repeating it
+back at you. If Mercurial is acting on a file \emph{implicitly},
+because you provided no names, or a directory, or a pattern (see
+below), it's safest to tell you what it's doing.
+
+For commands that behave this way, you can silence them using the
+\hggopt{-q} option. You can also get them to print the name of every
+file, even those you've named explicitly, using the \hggopt{-v}
+option.
+
+\section{Using patterns to identify files}
+
+In addition to working with file and directory names, Mercurial lets
+you use \emph{patterns} to identify files. Mercurial's pattern
+handling is expressive.
+
+On Unix-like systems (Linux, MacOS, etc.), the job of matching file
+names to patterns normally falls to the shell. On these systems, you
+must explicitly tell Mercurial that a name is a pattern. On Windows,
+the shell does not expand patterns, so Mercurial will automatically
+identify names that are patterns, and expand them for you.
+ +To provide a pattern in place of a regular name on the command line, +the mechanism is simple: +\begin{codesample2} + syntax:patternbody +\end{codesample2} +That is, a pattern is identified by a short text string that says what +kind of pattern this is, followed by a colon, followed by the actual +pattern. + +Mercurial supports two kinds of pattern syntax. The most frequently +used is called \texttt{glob}; this is the same kind of pattern +matching used by the Unix shell, and should be familiar to Windows +command prompt users, too. + +When Mercurial does automatic pattern matching on Windows, it uses +\texttt{glob} syntax. You can thus omit the ``\texttt{glob:}'' prefix +on Windows, but it's safe to use it, too. + +The \texttt{re} syntax is more powerful; it lets you specify patterns +using regular expressions, also known as regexps. + +By the way, in the examples that follow, notice that I'm careful to +wrap all of my patterns in quote characters, so that they won't get +expanded by the shell before Mercurial sees them. + +\subsection{Shell-style \texttt{glob} patterns} + +This is an overview of the kinds of patterns you can use when you're +matching on glob patterns. + +The ``\texttt{*}'' character matches any string, within a single +directory. +\interaction{filenames.glob.star} + +The ``\texttt{**}'' pattern matches any string, and crosses directory +boundaries. It's not a standard Unix glob token, but it's accepted by +several popular Unix shells, and is very useful. +\interaction{filenames.glob.starstar} + +The ``\texttt{?}'' pattern matches any single character. +\interaction{filenames.glob.question} + +The ``\texttt{[}'' character begins a \emph{character class}. This +matches any single character within the class. The class ends with a +``\texttt{]}'' character. A class may contain multiple \emph{range}s +of the form ``\texttt{a-f}'', which is shorthand for +``\texttt{abcdef}''. 
+\interaction{filenames.glob.range} +If the first character after the ``\texttt{[}'' in a character class +is a ``\texttt{!}'', it \emph{negates} the class, making it match any +single character not in the class. + +A ``\texttt{\{}'' begins a group of subpatterns, where the whole group +matches if any subpattern in the group matches. The ``\texttt{,}'' +character separates subpatterns, and ``\texttt{\}}'' ends the group. +\interaction{filenames.glob.group} + +\subsubsection{Watch out!} + +Don't forget that if you want to match a pattern in any directory, you +should not be using the ``\texttt{*}'' match-any token, as this will +only match within one directory. Instead, use the ``\texttt{**}'' +token. This small example illustrates the difference between the two. +\interaction{filenames.glob.star-starstar} + +\subsection{Regular expression matching with \texttt{re} patterns} + +Mercurial accepts the same regular expression syntax as the Python +programming language (it uses Python's regexp engine internally). +This is based on the Perl language's regexp syntax, which is the most +popular dialect in use (it's also used in Java, for example). + +I won't discuss Mercurial's regexp dialect in any detail here, as +regexps are not often used. Perl-style regexps are in any case +already exhaustively documented on a multitude of web sites, and in +many books. Instead, I will focus here on a few things you should +know if you find yourself needing to use regexps with Mercurial. + +A regexp is matched against an entire file name, relative to the root +of the repository. In other words, even if you're already in +subdirectory \dirname{foo}, if you want to match files under this +directory, your pattern must start with ``\texttt{foo/}''. + +One thing to note, if you're familiar with Perl-style regexps, is that +Mercurial's are \emph{rooted}. That is, a regexp starts matching +against the beginning of a string; it doesn't look for a match +anywhere within the string.
To match anywhere in a string, start +your pattern with ``\texttt{.*}''. + +\section{Filtering files} + +Not only does Mercurial give you a variety of ways to specify files, +it lets you further winnow those files using \emph{filters}. Commands +that work with file names accept two filtering options. +\begin{itemize} +\item \hggopt{-I}, or \hggopt{--include}, lets you specify a pattern + that file names must match in order to be processed. +\item \hggopt{-X}, or \hggopt{--exclude}, gives you a way to + \emph{avoid} processing files, if they match this pattern. +\end{itemize} +You can provide multiple \hggopt{-I} and \hggopt{-X} options on the +command line, and intermix them as you please. Mercurial interprets +the patterns you provide using glob syntax by default (but you can use +regexps if you need to). + +You can read a \hggopt{-I} filter as ``process only the files that +match this filter''. +\interaction{filenames.filter.include} +The \hggopt{-X} filter is best read as ``process only the files that +don't match this pattern''. +\interaction{filenames.filter.exclude} + +\section{Ignoring unwanted files and directories} + +XXX. + +\section{Case sensitivity} +\label{sec:names:case} + +If you're working in a mixed development environment that contains +both Linux (or other Unix) systems and Macs or Windows systems, you +should keep in mind that they treat the +case (``N'' versus ``n'') of file names in incompatible ways. This is +not very likely to affect you, and it's easy to deal with if it does, +but it could surprise you if you don't know about it. + +Operating systems and filesystems differ in the way they handle the +\emph{case} of characters in file and directory names. There are +three common ways to handle case in names. +\begin{itemize} +\item Completely case insensitive. Uppercase and lowercase versions + of a letter are treated as identical, both when creating a file and + during subsequent accesses.
This is common on older DOS-based + systems. +\item Case preserving, but insensitive. When a file or directory is + created, the case of its name is stored, and can be retrieved and + displayed by the operating system. When an existing file is being + looked up, its case is ignored. This is the standard arrangement on + Windows and MacOS. The names \filename{foo} and \filename{FoO} + identify the same file. This treatment of uppercase and lowercase + letters as interchangeable is also referred to as \emph{case + folding}. +\item Case sensitive. The case of a name is significant at all times. + The names \filename{foo} and \filename{FoO} identify different files. This + is the way Linux and Unix systems normally work. +\end{itemize} + +On Unix-like systems, it is possible to have any or all of the above +ways of handling case in action at once. For example, if you use a +USB thumb drive formatted with a FAT32 filesystem on a Linux system, +Linux will handle names on that filesystem in a case preserving, but +insensitive, way. + +\subsection{Safe, portable repository storage} + +Mercurial's repository storage mechanism is \emph{case safe}. It +translates file names so that they can be safely stored on both case +sensitive and case insensitive filesystems. This means that you can +use normal file copying tools to transfer a Mercurial repository onto, +for example, a USB thumb drive, and safely move that drive and +repository back and forth between a Mac, a PC running Windows, and a +Linux box. + +\subsection{Detecting case conflicts} + +When operating in the working directory, Mercurial honours the naming +policy of the filesystem where the working directory is located. If +the filesystem is case preserving, but insensitive, Mercurial will +treat names that differ only in case as the same.
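The folding rule can be illustrated with a minimal Python sketch. It is not Mercurial's implementation. The helper \texttt{find\_case\_conflicts} is hypothetical, and Python's \texttt{str.casefold} stands in for whatever folding rules a particular case insensitive filesystem really applies. The sketch groups file names that become identical once case is ignored. Any group with more than one member is a set of names that cannot coexist in a working directory on such a filesystem.

```python
from collections import defaultdict


def find_case_conflicts(paths):
    """Return groups of paths that would collide on a case-folding filesystem.

    Hypothetical helper for illustration only; Mercurial's own check is
    internal and may fold names differently. str.casefold() is used here
    as an approximation of the filesystem's folding rules.
    """
    groups = defaultdict(list)
    for path in paths:
        # On a case preserving but insensitive filesystem, names that are
        # equal after folding refer to the same file.
        groups[path.casefold()].append(path)
    # Only groups with more than one distinct spelling are conflicts.
    return [sorted(names) for names in groups.values() if len(names) > 1]


manifest = ["myfile.c", "MyFile.C", "README", "src/main.c"]
print(find_case_conflicts(manifest))
```

The sketch compares whole paths only. It would therefore miss two distinct files inside directories whose names differ only in case, such as \filename{foo/a} and \filename{FOO/b}. A fuller check would also fold each directory prefix.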
+ +An important aspect of this approach is that it is possible to commit +a changeset on a case sensitive (typically Linux or Unix) filesystem +that will cause trouble for users on case insensitive (usually Windows +and MacOS) systems. If a Linux user commits changes to two files, one +named \filename{myfile.c} and the other named \filename{MyFile.C}, +they will be stored correctly in the repository, and in the working +directories of other Linux users, they will be correctly represented +as separate files. + +If a Windows or Mac user pulls this change, they will not initially +have a problem, because Mercurial's repository storage mechanism is +case safe. However, once they try to \hgcmd{update} the working +directory to that changeset, or \hgcmd{merge} with that changeset, +Mercurial will spot the conflict between the two file names that the +filesystem would treat as the same, and forbid the update or merge +from occurring. + +\subsection{Fixing a case conflict} + +If you are using Windows or a Mac in a mixed environment where some of +your collaborators are using Linux or Unix, and Mercurial reports a +case folding conflict when you try to \hgcmd{update} or \hgcmd{merge}, +the procedure to fix the problem is simple. + +Just find a nearby Linux or Unix box, clone the problem repository +onto it, and use Mercurial's \hgcmd{rename} command to change the +names of any offending files or directories so that they will no +longer cause case folding conflicts. Commit this change, \hgcmd{pull} +or \hgcmd{push} it across to your Windows or MacOS system, and +\hgcmd{update} to the revision with the non-conflicting names. + +The changeset with case-conflicting names will remain in your +project's history, and you still won't be able to \hgcmd{update} your +working directory to that changeset on a Windows or MacOS system, but +you can continue development unimpeded.
+ +\begin{note} + Prior to version~0.9.3, Mercurial did not use a case safe repository + storage mechanism, and did not detect case folding conflicts. If + you are using an older version of Mercurial on Windows or MacOS, I + strongly recommend that you upgrade. +\end{note} + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/ch08-branch.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch08-branch.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,392 @@ +\chapter{Managing releases and branchy development} +\label{chap:branch} + +Mercurial provides several mechanisms for you to manage a project that +is making progress on multiple fronts at once. To understand these +mechanisms, let's first take a brief look at a fairly normal software +project structure. + +Many software projects issue periodic ``major'' releases that contain +substantial new features. In parallel, they may issue ``minor'' +releases. These are usually identical to the major releases on which +they're based, but with a few bugs fixed. + +In this chapter, we'll start by talking about how to keep records of +project milestones such as releases. We'll then go on to talk +about the flow of work between different phases of a project, and how +Mercurial can help you to isolate and manage this work. + +\section{Giving a persistent name to a revision} + +Once you decide that you'd like to call a particular revision a +``release'', it's a good idea to record the identity of that revision. +This will let you reproduce that release at a later date, for whatever +purpose you might need at the time (reproducing a bug, porting to a +new platform, etc.). +\interaction{tag.init} + +Mercurial lets you give a permanent name to any revision using the +\hgcmd{tag} command. Not surprisingly, these names are called +``tags''. +\interaction{tag.tag} + +A tag is nothing more than a ``symbolic name'' for a revision.
Tags +exist purely for your convenience, so that you have a handy permanent +way to refer to a revision; Mercurial doesn't interpret the tag names +you use in any way. Neither does Mercurial place any restrictions on +the name of a tag, beyond a few that are necessary to ensure that a +tag can be parsed unambiguously. A tag name cannot contain any of the +following characters: +\begin{itemize} +\item Colon (ASCII 58, ``\texttt{:}'') +\item Carriage return (ASCII 13, ``\Verb+\r+'') +\item Newline (ASCII 10, ``\Verb+\n+'') +\end{itemize} + +You can use the \hgcmd{tags} command to display the tags present in +your repository. In the output, each tagged revision is identified +first by its name, then by revision number, and finally by the unique +hash of the revision. +\interaction{tag.tags} +Notice that \texttt{tip} is listed in the output of \hgcmd{tags}. The +\texttt{tip} tag is a special ``floating'' tag, which always +identifies the newest revision in the repository. + +In the output of the \hgcmd{tags} command, tags are listed in reverse +order, by revision number. This usually means that recent tags are +listed before older tags. It also means that \texttt{tip} is always +going to be the first tag listed in the output of \hgcmd{tags}. + +When you run \hgcmd{log}, if it displays a revision that has tags +associated with it, it will print those tags. +\interaction{tag.log} + +Any time you need to provide a revision~ID to a Mercurial command, the +command will accept a tag name in its place. Internally, Mercurial +will translate your tag name into the corresponding revision~ID, then +use that. +\interaction{tag.log.v1.0} + +There's no limit on the number of tags you can have in a repository, +or on the number of tags that a single revision can have. As a +practical matter, it's not a great idea to have ``too many'' (a number +which will vary from project to project), simply because tags are +supposed to help you to find revisions. 
If you have lots of tags, the +ease of using them to identify revisions diminishes rapidly. + +For example, if your project has milestones as frequent as every few +days, it's perfectly reasonable to tag each one of those. But if you +have a continuous build system that makes sure every revision can be +built cleanly, you'd be introducing a lot of noise if you were to tag +every clean build. Instead, you could tag failed builds (on the +assumption that they're rare!), or simply not use tags to track +buildability. + +To remove a tag that you no longer want, use +\hgcmdargs{tag}{--remove}. +\interaction{tag.remove} +You can also modify a tag at any time, so that it identifies a +different revision, by simply issuing a new \hgcmd{tag} command. +You'll have to use the \hgopt{tag}{-f} option to tell Mercurial that +you \emph{really} want to update the tag. +\interaction{tag.replace} +There will still be a permanent record of the previous identity of the +tag, but Mercurial will no longer use it. There's thus no penalty to +tagging the wrong revision; all you have to do is turn around and tag +the correct revision once you discover your error. + +Mercurial stores tags in a normal revision-controlled file in your +repository. If you've created any tags, you'll find them in a file +named \sfilename{.hgtags}. When you run the \hgcmd{tag} command, +Mercurial modifies this file, then automatically commits the change to +it. This means that every time you run \hgcmd{tag}, you'll see a +corresponding changeset in the output of \hgcmd{log}. +\interaction{tag.tip} + +\subsection{Handling tag conflicts during a merge} + +You won't often need to care about the \sfilename{.hgtags} file, but +it sometimes makes its presence known during a merge. The format of +the file is simple: it consists of a series of lines. Each line +starts with a changeset hash, followed by a space, followed by the +name of a tag.
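Because the format is so simple, it is easy to sketch a reader for it. The following Python function is an illustration only, not the parser that Mercurial uses, and the name \texttt{parse\_hgtags} is hypothetical. It makes two assumptions:
\begin{itemize}
\item A later line for the same tag overrides an earlier one. This matches the behaviour described above for a tag moved with \hgopt{tag}{-f}, whose old identity stays on record but is no longer used.
\item A tag removed with \hgcmdargs{tag}{--remove} is recorded as a line whose hash is all zeros, as current Mercurial releases do.
\end{itemize}

```python
NULL_HASH = "0" * 40  # assumption: a removed tag is recorded against the null changeset


def parse_hgtags(text):
    """Map each tag name to the changeset hash it currently identifies.

    Illustrative sketch only; this is not Mercurial's own parser.
    Each non-blank line is "<hash> <name>"; a later line for the same
    name replaces an earlier one.
    """
    tags = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        parts = line.split(" ", 1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected '<hash> <name>', got {line!r}")
        node, name = parts
        tags[name.strip()] = node  # later entries win
    # A tag whose most recent entry is the null hash has been removed.
    return {name: node for name, node in tags.items() if node != NULL_HASH}


example = (
    f"{'1' * 40} v1.0\n"
    f"{'2' * 40} v1.1\n"
    f"{'3' * 40} v1.0\n"   # v1.0 moved with hg tag -f
    f"{NULL_HASH} v1.1\n"  # v1.1 removed with hg tag --remove
)
print(parse_hgtags(example))
```

When a line cannot be split, this sketch raises \texttt{ValueError} and includes the offending line number. It does not try to reproduce exactly how Mercurial reports such an error.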
+ +If you're resolving a conflict in the \sfilename{.hgtags} file during +a merge, there's one twist to modifying the \sfilename{.hgtags} file: +when Mercurial is parsing the tags in a repository, it \emph{never} +reads the working copy of the \sfilename{.hgtags} file. Instead, it +reads the \emph{most recently committed} revision of the file. + +An unfortunate consequence of this design is that you can't actually +verify that your merged \sfilename{.hgtags} file is correct until +\emph{after} you've committed a change. So if you find yourself +resolving a conflict on \sfilename{.hgtags} during a merge, be sure to +run \hgcmd{tags} after you commit. If it finds an error in the +\sfilename{.hgtags} file, it will report the location of the error, +which you can then fix and commit. You should then run \hgcmd{tags} +again, just to be sure that your fix is correct. + +\subsection{Tags and cloning} + +You may have noticed that the \hgcmd{clone} command has a +\hgopt{clone}{-r} option that lets you clone an exact copy of the +repository as of a particular changeset. The new clone will not +contain any project history that comes after the revision you +specified. This has an interaction with tags that can surprise the +unwary. + +Recall that a tag is stored as a revision to the \sfilename{.hgtags} +file, so that when you create a tag, the changeset in which it's +recorded necessarily refers to an older changeset. When you run +\hgcmdargs{clone}{-r foo} to clone a repository as of tag +\texttt{foo}, the new clone \emph{will not contain the history that + created the tag} that you used to clone the repository. The result +is that you'll get exactly the right subset of the project's history +in the new repository, but \emph{not} the tag you might have expected. + +\subsection{When permanent tags are too much} + +Since Mercurial's tags are revision controlled and carried around with +a project's history, everyone you work with will see the tags you +create. 
But giving names to revisions has uses beyond simply noting +that revision \texttt{4237e45506ee} is really \texttt{v2.0.2}. If +you're trying to track down a subtle bug, you might want a tag to +remind you of something like ``Anne saw the symptoms with this +revision''. + +For cases like this, what you might want to use are \emph{local} tags. +You can create a local tag with the \hgopt{tag}{-l} option to the +\hgcmd{tag} command. This will store the tag in a file called +\sfilename{.hg/localtags}. Unlike \sfilename{.hgtags}, +\sfilename{.hg/localtags} is not revision controlled. Any tags you +create using \hgopt{tag}{-l} remain strictly local to the repository +you're currently working in. + +\section{The flow of changes---big picture vs. little} + +To return to the outline I sketched at the beginning of this chapter, +let's think about a project that has multiple pieces of +work under development at once. + +There might be a push for a new ``main'' release; a new minor bugfix +release to the last main release; and an unexpected ``hot fix'' to an +old release that is now in maintenance mode. + +The usual way people refer to these different concurrent directions of +development is as ``branches''. However, we've already seen numerous +times that Mercurial treats \emph{all of history} as a series of +branches and merges. Really, what we have here are two ideas that are +peripherally related, but which happen to share a name. +\begin{itemize} +\item ``Big picture'' branches represent the sweep of a project's + evolution; people give them names, and talk about them in + conversation. +\item ``Little picture'' branches are artefacts of the day-to-day + activity of developing and merging changes. They expose the + narrative of how the code was developed. +\end{itemize} + +\section{Managing big-picture branches in repositories} + +The easiest way to isolate a ``big picture'' branch in Mercurial is in +a dedicated repository.
If you have an existing shared +repository---let's call it \texttt{myproject}---that reaches a ``1.0'' +milestone, you can start to prepare for future maintenance releases on +top of version~1.0 by tagging the revision from which you prepared +the~1.0 release. +\interaction{branch-repo.tag} +You can then clone a new shared \texttt{myproject-1.0.1} repository as +of that tag. +\interaction{branch-repo.clone} + +Afterwards, if someone needs to work on a bug fix that ought to go +into an upcoming~1.0.1 minor release, they clone the +\texttt{myproject-1.0.1} repository, make their changes, and push them +back. +\interaction{branch-repo.bugfix} +Meanwhile, development for the next major release can continue, +isolated and unabated, in the \texttt{myproject} repository. +\interaction{branch-repo.new} + +\section{Don't repeat yourself: merging across branches} + +In many cases, if you have a bug to fix on a maintenance branch, the +chances are good that the bug exists on your project's main branch +(and possibly other maintenance branches, too). It's a rare developer +who wants to fix the same bug multiple times, so let's look at a few +ways that Mercurial can help you to manage these bugfixes without +duplicating your work. + +In the simplest instance, all you need to do is pull changes from your +maintenance branch into your local clone of the target branch. +\interaction{branch-repo.pull} +You'll then need to merge the heads of the two branches, and push back +to the main branch. +\interaction{branch-repo.merge} + +\section{Naming branches within one repository} + +In most instances, isolating branches in repositories is the right +approach. Its simplicity makes it easy to understand, and so it's +hard to make mistakes. There's a one-to-one relationship between +branches you're working in and directories on your system. This lets +you use normal (non-Mercurial-aware) tools to work on files within a +branch/repository.
+ +If you're more in the ``power user'' category (\emph{and} your +collaborators are too), there is an alternative way of handling +branches that you can consider. I've already mentioned the +human-level distinction between ``little picture'' and ``big picture'' +branches. While Mercurial works with multiple ``little picture'' +branches in a repository all the time (for example, after you pull +changes in, but before you merge them), it can \emph{also} work with +multiple ``big picture'' branches. + +The key to working this way is that Mercurial lets you assign a +persistent \emph{name} to a branch. There always exists a branch +named \texttt{default}. Even before you start naming branches +yourself, you can find traces of the \texttt{default} branch if you +look for them. + +As an example, when you run the \hgcmd{commit} command, and it pops up +your editor so that you can enter a commit message, look for a line +that contains the text ``\texttt{HG: branch default}'' at the bottom. +This is telling you that your commit will occur on the branch named +\texttt{default}. + +To start working with named branches, use the \hgcmd{branches} +command. This command lists the named branches already present in +your repository, telling you which changeset is the tip of each. +\interaction{branch-named.branches} +Since you haven't created any named branches yet, the only one that +exists is \texttt{default}. + +To find out what the ``current'' branch is, run the \hgcmd{branch} +command, giving it no arguments. This tells you which branch the +parent of the working directory is on. +\interaction{branch-named.branch} + +To create a new branch, run the \hgcmd{branch} command again. This +time, give it one argument: the name of the branch you want to create. +\interaction{branch-named.create} + +After you've created a branch, you might wonder what effect the +\hgcmd{branch} command has had. What do the \hgcmd{status} and +\hgcmd{tip} commands report?
+\interaction{branch-named.status} +Nothing has changed in the working directory, and there's been no new +history created. As this suggests, running the \hgcmd{branch} command +has no permanent effect; it only tells Mercurial what branch name to +use the \emph{next} time you commit a changeset. + +When you commit a change, Mercurial records the name of the branch on +which you committed. Once you've switched from the \texttt{default} +branch to another and committed, you'll see the name of the new branch +show up in the output of \hgcmd{log}, \hgcmd{tip}, and other commands +that display the same kind of output. +\interaction{branch-named.commit} +The \hgcmd{log}-like commands will print the branch name of every +changeset that's not on the \texttt{default} branch. As a result, if +you never use named branches, you'll never see this information. + +Once you've named a branch and committed a change with that name, +every subsequent commit that descends from that change will inherit +the same branch name. You can change the name of a branch at any +time, using the \hgcmd{branch} command. +\interaction{branch-named.rebranch} +In practice, this is something you won't do very often, as branch +names tend to have fairly long lifetimes. (This isn't a rule, just an +observation.) + +\section{Dealing with multiple named branches in a repository} + +If you have more than one named branch in a repository, Mercurial will +remember the branch that your working directory is on when you start a +command like \hgcmd{update} or \hgcmdargs{pull}{-u}. It will update +the working directory to the tip of this branch, no matter what the +``repo-wide'' tip is. To update to a revision that's on a different +named branch, you may need to use the \hgopt{update}{-C} option to +\hgcmd{update}. + +This behaviour is a little subtle, so let's see it in action. First, +let's remind ourselves what branch we're currently on, and what +branches are in our repository.
+\interaction{branch-named.parents} +We're on the \texttt{bar} branch, but there also exists an older +\texttt{foo} branch. + +We can \hgcmd{update} back and forth between the tips of the +\texttt{foo} and \texttt{bar} branches without needing to use the +\hgopt{update}{-C} option, because this only involves going backwards +and forwards linearly through our change history. +\interaction{branch-named.update-switchy} + +If we go back to the \texttt{foo} branch and then run \hgcmd{update}, +it will keep us on \texttt{foo}, not move us to the tip of +\texttt{bar}. +\interaction{branch-named.update-nothing} + +Committing a new change on the \texttt{foo} branch introduces a new +head. +\interaction{branch-named.foo-commit} + +\section{Branch names and merging} + +As you've probably noticed, merges in Mercurial are not symmetrical. +Let's say our repository has two heads, 17 and 23. If I +\hgcmd{update} to 17 and then \hgcmd{merge} with 23, Mercurial records +17 as the first parent of the merge, and 23 as the second. If instead +I \hgcmd{update} to 23 and then \hgcmd{merge} with 17, it records 23 +as the first parent, and 17 as the second. + +This affects Mercurial's choice of branch name when you merge. After +a merge, Mercurial will retain the branch name of the first parent +when you commit the result of the merge. If your first parent's +branch name is \texttt{foo}, and you merge with \texttt{bar}, the +branch name will still be \texttt{foo} after you merge. + +It's not unusual for a repository to contain multiple heads, each with +the same branch name. Let's say I'm working on the \texttt{foo} +branch, and so are you. We commit different changes; I pull your +changes; I now have two heads, each claiming to be on the \texttt{foo} +branch. The result of a merge will be a single head on the +\texttt{foo} branch, as you might hope.
+ +But if I'm working on the \texttt{bar} branch, and I merge work from +the \texttt{foo} branch, the result will remain on the \texttt{bar} +branch. +\interaction{branch-named.merge} + +To give a more concrete example, if I'm working on the +\texttt{bleeding-edge} branch, and I want to bring in the latest fixes +from the \texttt{stable} branch, Mercurial will choose the ``right'' +(\texttt{bleeding-edge}) branch name when I pull and merge from +\texttt{stable}. + +\section{Branch naming is generally useful} + +You shouldn't think of named branches as applicable only to situations +where you have multiple long-lived branches cohabiting in a single +repository. They're very useful even in the one-branch-per-repository +case. + +In the simplest case, giving a name to each branch gives you a +permanent record of which branch a changeset originated on. This +gives you more context when you're trying to follow the history of a +long-lived branchy project. + +If you're working with shared repositories, you can set up a +\hook{pretxnchangegroup} hook on each that will block incoming changes +that have the ``wrong'' branch name. This provides a simple, but +effective, defence against people accidentally pushing changes from a +``bleeding edge'' branch to a ``stable'' branch. Such a hook might +look like this inside the shared repo's \hgrc. +\begin{codesample2} + [hooks] + pretxnchangegroup.branch = hg heads --template '{branches} ' | grep mybranch +\end{codesample2} + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/ch09-undo.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch09-undo.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,767 @@ +\chapter{Finding and fixing your mistakes} +\label{chap:undo} + +To err might be human, but to really handle the consequences well +takes a top-notch revision control system. 
In this chapter, we'll +discuss some of the techniques you can use when you find that a +problem has crept into your project. Mercurial has some highly +capable features that will help you to isolate the sources of +problems, and to handle them appropriately. + +\section{Erasing local history} + +\subsection{The accidental commit} + +I have the occasional but persistent problem of typing rather more +quickly than I can think, which sometimes results in me committing a +changeset that is either incomplete or plain wrong. In my case, the +usual kind of incomplete changeset is one in which I've created a new +source file, but forgotten to \hgcmd{add} it. A ``plain wrong'' +changeset is not as common, but no less annoying. + +\subsection{Rolling back a transaction} +\label{sec:undo:rollback} + +In section~\ref{sec:concepts:txn}, I mentioned that Mercurial treats +each modification of a repository as a \emph{transaction}. Every time +you commit a changeset or pull changes from another repository, +Mercurial remembers what you did. You can undo, or \emph{roll back}, +exactly one of these actions using the \hgcmd{rollback} command. (See +section~\ref{sec:undo:rollback-after-push} for an important caveat +about the use of this command.) + +Here's a mistake that I often find myself making: committing a change +in which I've created a new file, but forgotten to \hgcmd{add} it. +\interaction{rollback.commit} +Looking at the output of \hgcmd{status} after the commit immediately +confirms the error. +\interaction{rollback.status} +The commit captured the changes to the file \filename{a}, but not the +new file \filename{b}. If I were to push this changeset to a +repository that I shared with a colleague, the chances are high that +something in \filename{a} would refer to \filename{b}, which would not +be present in their repository when they pulled my changes. I would +thus become the object of some indignation. 
+ +However, luck is with me---I've caught my error before I pushed the +changeset. I use the \hgcmd{rollback} command, and Mercurial makes +that last changeset vanish. +\interaction{rollback.rollback} +Notice that the changeset is no longer present in the repository's +history, and the working directory once again thinks that the file +\filename{a} is modified. The commit and rollback have left the +working directory exactly as it was prior to the commit; the changeset +has been completely erased. I can now safely \hgcmd{add} the file +\filename{b}, and rerun my commit. +\interaction{rollback.add} + +\subsection{The erroneous pull} + +It's common practice with Mercurial to maintain separate development +branches of a project in different repositories. Your development +team might have one shared repository for your project's ``0.9'' +release, and another, containing different changes, for the ``1.0'' +release. + +Given this, you can imagine that the consequences could be messy if +you had a local ``0.9'' repository, and accidentally pulled changes +from the shared ``1.0'' repository into it. At worst, you could be +paying insufficient attention, and push those changes into the shared +``0.9'' tree, confusing your entire team (but don't worry, we'll +return to this horror scenario later). However, it's more likely that +you'll notice immediately, because Mercurial will display the URL it's +pulling from, or you will see it pull a suspiciously large number of +changes into the repository. + +The \hgcmd{rollback} command will work nicely to expunge all of the +changesets that you just pulled. Mercurial groups all changes from +one \hgcmd{pull} into a single transaction, so one \hgcmd{rollback} is +all you need to undo this mistake. + +\subsection{Rolling back is useless once you've pushed} +\label{sec:undo:rollback-after-push} + +The value of the \hgcmd{rollback} command drops to zero once you've +pushed your changes to another repository. 
Rolling back a change +makes it disappear entirely, but \emph{only} in the repository in +which you perform the \hgcmd{rollback}. Because a rollback eliminates +history, there's no way for the disappearance of a change to propagate +between repositories. + +If you've pushed a change to another repository---particularly if it's +a shared repository---it has essentially ``escaped into the wild,'' +and you'll have to recover from your mistake in a different way. If +you push a changeset somewhere, then roll it back, then +pull from the repository you pushed to, the changeset will +reappear in your repository. + +(If you know for sure that the change you want to roll back +is the most recent change in the repository that you pushed to, +\emph{and} you know that nobody else could have pulled it from that +repository, you can roll back the changeset there, too, but you +should not rely on this working reliably. If you do this, +sooner or later a change really will make it into a repository that +you don't directly control (or have forgotten about), and come back to +bite you.) + +\subsection{You can only roll back once} + +Mercurial stores exactly one transaction in its transaction log; that +transaction is the most recent one that occurred in the repository. +This means that you can only roll back one transaction. If you expect +to be able to roll back one transaction, then its predecessor, this is +not the behaviour you will get. +\interaction{rollback.twice} +Once you've rolled back one transaction in a repository, you can't +roll back again in that repository until you perform another commit or +pull. + +\section{Reverting the mistaken change} + +If you make a modification to a file, and decide that you really +didn't want to change the file at all, and you haven't yet committed +your changes, the \hgcmd{revert} command is the one you'll need.
It +looks at the changeset that's the parent of the working directory, and +restores the contents of the file to their state as of that changeset. +(That's a long-winded way of saying that, in the normal case, it +undoes your modifications.) + +Let's illustrate how the \hgcmd{revert} command works with yet another +small example. We'll begin by modifying a file that Mercurial is +already tracking. +\interaction{daily.revert.modify} +If we don't want that change, we can simply \hgcmd{revert} the file. +\interaction{daily.revert.unmodify} +The \hgcmd{revert} command provides us with an extra degree of safety +by saving our modified file with a \filename{.orig} extension. +\interaction{daily.revert.status} + +Here is a summary of the cases that the \hgcmd{revert} command can +deal with. We will describe each of these in more detail in the +section that follows. +\begin{itemize} +\item If you modify a file, it will restore the file to its unmodified + state. +\item If you \hgcmd{add} a file, it will undo the ``added'' state of + the file, but leave the file itself untouched. +\item If you delete a file without telling Mercurial, it will restore + the file to its unmodified contents. +\item If you use the \hgcmd{remove} command to remove a file, it will + undo the ``removed'' state of the file, and restore the file to its + unmodified contents. +\end{itemize} + +\subsection{File management errors} +\label{sec:undo:mgmt} + +The \hgcmd{revert} command is useful for more than just modified +files. It lets you reverse the results of all of Mercurial's file +management commands---\hgcmd{add}, \hgcmd{remove}, and so on. + +If you \hgcmd{add} a file, then decide that in fact you don't want +Mercurial to track it, use \hgcmd{revert} to undo the add. Don't +worry; Mercurial will not modify the file in any way. It will just +``unmark'' the file. 
+\interaction{daily.revert.add} + +Similarly, if you ask Mercurial to \hgcmd{remove} a file, you can use +\hgcmd{revert} to restore it to the contents it had as of the parent +of the working directory. +\interaction{daily.revert.remove} +This works just as well for a file that you deleted by hand, without +telling Mercurial (recall that in Mercurial terminology, this kind of +file is called ``missing''). +\interaction{daily.revert.missing} + +If you revert a \hgcmd{copy}, the copied-to file remains in your +working directory afterwards, untracked. Since a copy doesn't affect +the copied-from file in any way, Mercurial doesn't do anything with +the copied-from file. +\interaction{daily.revert.copy} + +\subsubsection{A slightly special case: reverting a rename} + +If you \hgcmd{rename} a file, there is one small detail that +you should remember. When you \hgcmd{revert} a rename, it's not +enough to provide the name of the renamed-to file, as this example +shows. +\interaction{daily.revert.rename} +As you can see from the output of \hgcmd{status}, the renamed-to file +is no longer identified as added, but the renamed-\emph{from} file is +still removed! This is counter-intuitive (at least to me), but +fortunately it's easy to deal with. +\interaction{daily.revert.rename-orig} +So remember, to revert a \hgcmd{rename}, you must provide \emph{both} +the source and destination names. + +% TODO: the output doesn't look like it will be removed! + +(By the way, if you rename a file, then modify the renamed-to file, +then revert both components of the rename, Mercurial will restore the +file that was removed as part of the rename to its unmodified state. +If you need the modifications in the renamed-to file to show up in the +renamed-from file, don't forget to copy them over.) + +These fiddly aspects of reverting a rename arguably constitute a small +bug in Mercurial.
+ +\section{Dealing with committed changes} + +Consider a case where you have committed a change $a$, and another +change $b$ on top of it; you then realise that change $a$ was +incorrect. Mercurial lets you ``back out'' an entire changeset +automatically, and provides building blocks that let you reverse part of a +changeset by hand. + +Before you read this section, here's something to keep in mind: the +\hgcmd{backout} command undoes changes by \emph{adding} history, not +by modifying or erasing it. It's the right tool to use if you're +fixing bugs, but not if you're trying to undo some change that has +catastrophic consequences. To deal with those, see +section~\ref{sec:undo:aaaiiieee}. + +\subsection{Backing out a changeset} + +The \hgcmd{backout} command lets you ``undo'' the effects of an entire +changeset in an automated fashion. Because Mercurial's history is +immutable, this command \emph{does not} get rid of the changeset you +want to undo. Instead, it creates a new changeset that +\emph{reverses} the effect of the to-be-undone changeset. + +The operation of the \hgcmd{backout} command is a little intricate, so +let's illustrate it with some examples. First, we'll create a +repository with some simple changes. +\interaction{backout.init} + +The \hgcmd{backout} command takes a single changeset ID as its +argument; this is the changeset to back out. Normally, +\hgcmd{backout} will drop you into a text editor to write a commit +message, so you can record why you're backing the change out. In this +example, we provide a commit message on the command line using the +\hgopt{backout}{-m} option. + +\subsection{Backing out the tip changeset} + +We're going to start by backing out the last changeset we committed. +\interaction{backout.simple} +You can see that the second line from \filename{myfile} is no longer +present. Taking a look at the output of \hgcmd{log} gives us an idea +of what the \hgcmd{backout} command has done.
+\interaction{backout.simple.log} +Notice that the new changeset that \hgcmd{backout} has created is a +child of the changeset we backed out. It's easier to see this in +figure~\ref{fig:undo:backout}, which presents a graphical view of the +change history. As you can see, the history is nice and linear. + +\begin{figure}[htb] + \centering + \grafix{undo-simple} + \caption{Backing out a change using the \hgcmd{backout} command} + \label{fig:undo:backout} +\end{figure} + +\subsection{Backing out a non-tip change} + +If you want to back out a change other than the last one you +committed, pass the \hgopt{backout}{--merge} option to the +\hgcmd{backout} command. +\interaction{backout.non-tip.clone} +This makes backing out any changeset a ``one-shot'' operation that's +usually simple and fast. +\interaction{backout.non-tip.backout} + +If you take a look at the contents of \filename{myfile} after the +backout finishes, you'll see that the first and third changes are +present, but not the second. +\interaction{backout.non-tip.cat} + +As the graphical history in figure~\ref{fig:undo:backout-non-tip} +illustrates, Mercurial actually commits \emph{two} changes in this +kind of situation (the box-shaped nodes are the ones that Mercurial +commits automatically). Before Mercurial begins the backout process, +it first remembers what the current parent of the working directory +is. It then backs out the target changeset, and commits that as a +changeset. Finally, it merges back to the previous parent of the +working directory, and commits the result of the merge. + +% TODO: to me it looks like mercurial doesn't commit the second merge automatically! 
+ +\begin{figure}[htb] + \centering + \grafix{undo-non-tip} + \caption{Automated backout of a non-tip change using the \hgcmd{backout} command} + \label{fig:undo:backout-non-tip} +\end{figure} + +The result is that you end up ``back where you were'', only with some +extra history that undoes the effect of the changeset you wanted to +back out. + +\subsubsection{Always use the \hgopt{backout}{--merge} option} + +In fact, since the \hgopt{backout}{--merge} option will do the ``right +thing'' whether or not the changeset you're backing out is the tip +(i.e.~it won't try to merge if it's backing out the tip, since there's +no need), you should \emph{always} use this option when you run the +\hgcmd{backout} command. + +\subsection{Gaining more control of the backout process} + +While I've recommended that you always use the +\hgopt{backout}{--merge} option when backing out a change, the +\hgcmd{backout} command lets you decide how to merge a backout +changeset. Taking control of the backout process by hand is something +you will rarely need to do, but it can be useful to understand what +the \hgcmd{backout} command is doing for you automatically. To +illustrate this, let's clone our first repository, but omit the +backout change that it contains. + +\interaction{backout.manual.clone} +As with our earlier example, we'll commit a third changeset, then back +out its parent, and see what happens. +\interaction{backout.manual.backout} +Our new changeset is again a descendant of the changeset we backed +out; it's thus a new head, \emph{not} a descendant of the changeset +that was the tip. The \hgcmd{backout} command was quite explicit in +telling us this. +\interaction{backout.manual.log} + +Again, it's easier to see what has happened by looking at a graph of +the revision history, in figure~\ref{fig:undo:backout-manual}.
This +makes it clear that when we use \hgcmd{backout} to back out a change +other than the tip, Mercurial adds a new head to the repository (the +change it committed is box-shaped). + +\begin{figure}[htb] + \centering + \grafix{undo-manual} + \caption{Backing out a non-tip change without \hgopt{backout}{--merge}} + \label{fig:undo:backout-manual} +\end{figure} + +After the \hgcmd{backout} command has completed, it leaves the new +``backout'' changeset as the parent of the working directory. +\interaction{backout.manual.parents} +Now we have two isolated sets of changes. +\interaction{backout.manual.heads} + +Let's think about what we expect to see as the contents of +\filename{myfile} now. The first change should be present, because +we've never backed it out. The second change should be missing, as +that's the change we backed out. Since the history graph shows the +third change as a separate head, we \emph{don't} expect to see the +third change present in \filename{myfile}. +\interaction{backout.manual.cat} +To get the third change back into the file, we just do a normal merge +of our two heads. +\interaction{backout.manual.merge} +Afterwards, the graphical history of our repository looks like +figure~\ref{fig:undo:backout-manual-merge}. + +\begin{figure}[htb] + \centering + \grafix{undo-manual-merge} + \caption{Manually merging a backout change} + \label{fig:undo:backout-manual-merge} +\end{figure} + +\subsection{Why \hgcmd{backout} works as it does} + +Here's a brief description of how the \hgcmd{backout} command works. +\begin{enumerate} +\item It ensures that the working directory is ``clean'', i.e.~that + the output of \hgcmd{status} would be empty. +\item It remembers the current parent of the working directory. Let's + call this changeset \texttt{orig}. +\item It does the equivalent of an \hgcmd{update} to sync the working + directory to the changeset you want to back out. Let's call this + changeset \texttt{backout}. +\item It finds the parent of that changeset.
Let's call that + changeset \texttt{parent}. +\item For each file that the \texttt{backout} changeset affected, it + does the equivalent of an \hgcmdargs{revert}{-r parent} on that file, + to restore it to the contents it had before that changeset was + committed. +\item It commits the result as a new changeset. This changeset has + \texttt{backout} as its parent. +\item If you specify \hgopt{backout}{--merge} on the command line, it + merges with \texttt{orig}, and commits the result of the merge. +\end{enumerate} + +An alternative way to implement the \hgcmd{backout} command would be +to \hgcmd{export} the to-be-backed-out changeset as a diff, then use +the \cmdopt{patch}{--reverse} option to the \command{patch} command to +reverse the effect of the change without fiddling with the working +directory. This sounds much simpler, but it would not work nearly as +well. + +The reason that \hgcmd{backout} does an update, a commit, a merge, and +another commit is to give the merge machinery the best chance to do a +good job when dealing with all the changes \emph{between} the change +you're backing out and the current tip. + +If you're backing out a changeset that's~100 revisions back in your +project's history, the chances that the \command{patch} command will +be able to apply a reverse diff cleanly are not good, because +intervening changes are likely to have ``broken the context'' that +\command{patch} uses to determine whether it can apply a patch (if +this sounds like gibberish, see section~\ref{sec:mq:patch} for a +discussion of the \command{patch} command). Also, Mercurial's merge +machinery will handle files and directories being renamed, permission +changes, and modifications to binary files, none of which +\command{patch} can deal with. + +\section{Changes that should never have been} +\label{sec:undo:aaaiiieee} + +Most of the time, the \hgcmd{backout} command is exactly what you need +if you want to undo the effects of a change.
It leaves a permanent +record of exactly what you did, both when committing the original +changeset and when you cleaned up after it. + +On rare occasions, though, you may find that you've committed a change +that really should not be present in the repository at all. For +example, it would be very unusual, and generally considered a mistake, +to commit a software project's object files as well as its source +files. Object files have almost no intrinsic value, and they're +\emph{big}, so they increase the size of the repository and the amount +of time it takes to clone or pull changes. + +Before I discuss the options that you have if you commit a ``brown +paper bag'' change (the kind that's so bad that you want to pull a +brown paper bag over your head), let me first discuss some approaches +that probably won't work. + +Since Mercurial treats history as cumulative---every change builds +on top of all changes that preceded it---you generally can't just make +disastrous changes disappear. The one exception is when you've just +committed a change, and it hasn't been pushed or pulled into another +repository. That's when you can safely use the \hgcmd{rollback} +command, as I detailed in section~\ref{sec:undo:rollback}. + +After you've pushed a bad change to another repository, you +\emph{could} still use \hgcmd{rollback} to make your local copy of the +change disappear, but it won't have the consequences you want. The +change will still be present in the remote repository, so it will +reappear in your local repository the next time you pull. + +If a situation like this arises, and you know which repositories your +bad change has propagated into, you can \emph{try} to get rid of the +change from \emph{every} one of those repositories. This is, of +course, not a satisfactory solution: if you miss even a single +repository while you're expunging, the change is still ``in the +wild'', and could propagate further.
+ +If you've committed one or more changes \emph{after} the change that +you'd like to see disappear, your options are further reduced. +Mercurial doesn't provide a way to ``punch a hole'' in history, +removing one changeset while leaving its descendants intact. + +XXX This needs filling out. The \texttt{hg-replay} script in the +\texttt{examples} directory works, but doesn't handle merge +changesets. Kind of an important omission. + +\subsection{Protect yourself from ``escaped'' changes} + +If you've committed some changes to your local repository and they've +been pushed or pulled somewhere else, this isn't necessarily a +disaster. You can protect yourself ahead of time against some classes +of bad changeset. This is particularly easy if your team usually +pulls changes from a central repository. + +By configuring some hooks on that repository to validate incoming +changesets (see chapter~\ref{chap:hook}), you can automatically +prevent some kinds of bad changeset from being pushed to the central +repository at all. With such a configuration in place, some kinds of +bad changeset will naturally tend to ``die out'' because they can't +propagate into the central repository. Better yet, this happens +without any need for explicit intervention. + +For instance, an incoming change hook that verifies that a changeset +will actually compile can prevent people from inadvertently ``breaking +the build''. + +\section{Finding the source of a bug} +\label{sec:undo:bisect} + +While it's all very well to be able to back out a changeset that +introduced a bug, this requires that you know which changeset to back +out. Mercurial provides an invaluable command, called +\hgcmd{bisect}, that helps you to automate this process and accomplish +it very efficiently. + +The idea behind the \hgcmd{bisect} command is that a changeset has +introduced some change of behaviour that you can identify with a +simple binary test.
You don't know which piece of code introduced the +change, but you know how to test for the presence of the bug. The +\hgcmd{bisect} command uses your test to direct its search for the +changeset that introduced the code that caused the bug. + +Here are a few scenarios to help you understand how you might apply +this command. +\begin{itemize} +\item The most recent version of your software has a bug that you + remember wasn't present a few weeks ago, but you don't know when it + was introduced. Here, your binary test checks for the presence of + that bug. +\item You fixed a bug in a rush, and now it's time to close the entry + in your team's bug database. The bug database requires a changeset + ID when you close an entry, but you don't remember which changeset + you fixed the bug in. Once again, your binary test checks for the + presence of the bug. +\item Your software works correctly, but runs~15\% slower than the + last time you measured it. You want to know which changeset + introduced the performance regression. In this case, your binary + test measures the performance of your software, to see whether it's + ``fast'' or ``slow''. +\item The sizes of the components of your project that you ship + exploded recently, and you suspect that something changed in the way + you build your project. +\end{itemize} + +From these examples, it should be clear that the \hgcmd{bisect} +command is not useful only for finding the sources of bugs. You can +use it to find any ``emergent property'' of a repository (anything +that you can't find from a simple text search of the files in the +tree) for which you can write a binary test. + +We'll introduce a little bit of terminology here, just to make it +clear which parts of the search process are your responsibility, and +which are Mercurial's. A \emph{test} is something that \emph{you} run +when \hgcmd{bisect} chooses a changeset. A \emph{probe} is what +\hgcmd{bisect} runs to tell whether a revision is good. 
Finally, +we'll use the word ``bisect'', as both a noun and a verb, to stand in +for the phrase ``search using the \hgcmd{bisect} command''. + +One simple way to automate the searching process would be to +probe every changeset. However, this scales poorly. If it took ten +minutes to test a single changeset, and you had 10,000 changesets in +your repository, the exhaustive approach would take on average~35 +\emph{days} (about 5,000 tests of ten minutes each) to find the changeset that introduced a bug. Even if you +knew that the bug was introduced by one of the last 500 changesets, +and limited your search to those, you'd still be looking at over 40 +hours to find the changeset that introduced your bug. + +What the \hgcmd{bisect} command does is use its knowledge of the +``shape'' of your project's revision history to perform a search in +time proportional to the \emph{logarithm} of the number of changesets +to check (the kind of search it performs is called a dichotomic +search). With this approach, searching through 10,000 changesets will +take less than three hours, even at ten minutes per test (the search +will require about 14 tests). Limit your search to the last hundred +changesets, and it will take only about an hour (roughly seven tests). + +The \hgcmd{bisect} command is aware of the ``branchy'' nature of a +Mercurial project's revision history, so it has no problems dealing +with branches, merges, or multiple heads in a repository. It can +prune entire branches of history with a single probe, which is how it +operates so efficiently. + +\subsection{Using the \hgcmd{bisect} command} + +Here's an example of \hgcmd{bisect} in action. + +\begin{note} + In versions 0.9.5 and earlier of Mercurial, \hgcmd{bisect} was not a + core command: it was distributed with Mercurial as an extension. + This section describes the built-in command, not the old extension. +\end{note} + +Now let's create a repository, so that we can try out the +\hgcmd{bisect} command in isolation.
+\interaction{bisect.init} +We'll simulate a project that has a bug in it in a simple-minded way: +create trivial changes in a loop, and nominate one specific change +that will have the ``bug''. This loop creates 35 changesets, each +adding a single file to the repository. We'll represent our ``bug'' +with a file that contains the text ``i have a gub''. +\interaction{bisect.commits} + +The next thing that we'd like to do is figure out how to use the +\hgcmd{bisect} command. We can use Mercurial's normal built-in help +mechanism for this. +\interaction{bisect.help} + +The \hgcmd{bisect} command works in steps. Each step proceeds as follows. +\begin{enumerate} +\item You run your binary test. + \begin{itemize} + \item If the test succeeded, you tell \hgcmd{bisect} by running the + \hgcmdargs{bisect}{--good} command. + \item If it failed, you run the \hgcmdargs{bisect}{--bad} command. + \end{itemize} +\item The command uses your information to decide which changeset to + test next. +\item It updates the working directory to that changeset, and the + process begins again. +\end{enumerate} +The process ends when \hgcmd{bisect} identifies a unique changeset +that marks the point where your test transitioned from ``succeeding'' +to ``failing''. + +To start the search, we must run the \hgcmdargs{bisect}{--reset} command. +\interaction{bisect.search.init} + +In our case, the binary test we use is simple: we check to see if any +file in the repository contains the string ``i have a gub''. If it +does, this changeset contains the change that ``caused the bug''. By +convention, a changeset that has the property we're searching for is +``bad'', while one that doesn't is ``good''. + +Most of the time, the revision to which the working directory is +synced (usually the tip) already exhibits the problem introduced by +the buggy change, so we'll mark it as ``bad''.
+\interaction{bisect.search.bad-init} + +Our next task is to nominate a changeset that we know \emph{doesn't} +have the bug; the \hgcmd{bisect} command will ``bracket'' its search +between the first pair of good and bad changesets. In our case, we +know that revision~10 didn't have the bug. (I'll say more +about choosing the first ``good'' changeset later.) +\interaction{bisect.search.good-init} + +Notice that this command printed some output. +\begin{itemize} +\item It told us how many changesets it must consider before it can + identify the one that introduced the bug, and how many tests that + will require. +\item It updated the working directory to the next changeset to test, + and told us which changeset it's testing. +\end{itemize} + +We now run our test in the working directory. We use the +\command{grep} command to see if our ``bad'' file is present in the +working directory. If it is, this revision is bad; if not, this +revision is good. +\interaction{bisect.search.step1} + +This test looks like a perfect candidate for automation, so let's turn +it into a shell function. +\interaction{bisect.search.mytest} +We can now run an entire test step with a single command, +\texttt{mytest}. +\interaction{bisect.search.step2} +A few more invocations of our canned test step command, and we're +done. +\interaction{bisect.search.rest} + +Even though our repository contained~35 changesets, the \hgcmd{bisect} +command let us find the changeset that introduced our ``bug'' with +only five tests. Because the number of tests that the \hgcmd{bisect} +command performs grows logarithmically with the number of changesets to +search, the advantage that it has over the ``brute force'' search +approach increases with every changeset you add. + +\subsection{Cleaning up after your search} + +When you're finished using the \hgcmd{bisect} command in a +repository, you can use the \hgcmdargs{bisect}{--reset} command to drop +the information it was using to drive your search.
This information +doesn't take up much space, so it doesn't matter if you forget to run this +command. However, \hgcmd{bisect} won't let you start a new search in +that repository until you do a \hgcmdargs{bisect}{--reset}. +\interaction{bisect.search.reset} + +\section{Tips for finding bugs effectively} + +\subsection{Give consistent input} + +The \hgcmd{bisect} command requires that you correctly report the +result of every test you perform. If you tell it that a test failed +when it really succeeded, it \emph{might} be able to detect the +inconsistency. If it can identify an inconsistency in your reports, +it will tell you that a particular changeset is both good and bad. +However, it can't do this perfectly; it's about as likely to simply report +the wrong changeset as the source of the bug. + +\subsection{Automate as much as possible} + +When I started using the \hgcmd{bisect} command, I tried a few times +to run my tests by hand, on the command line. This is an approach +that I, at least, am not suited to. After a few tries, I found that I +was making enough mistakes that I was having to restart my searches +several times before finally getting correct results. + +My initial problems with driving the \hgcmd{bisect} command by hand +occurred even with simple searches on small repositories; if the +problem you're looking for is more subtle, or the number of tests that +\hgcmd{bisect} must perform increases, the likelihood of operator +error ruining the search is much higher. Once I started automating my +tests, I had much better results. + +The key to automated testing is twofold: +\begin{itemize} +\item always test for the same symptom, and +\item always feed consistent input to the \hgcmd{bisect} command. +\end{itemize} +In my tutorial example above, the \command{grep} command tests for the +symptom, and the \texttt{if} statement takes the result of this check +and ensures that we always feed the same input to the \hgcmd{bisect} +command.
The \texttt{mytest} function marries these together in a +reproducible way, so that every test is uniform and consistent. + +\subsection{Check your results} + +Because the output of a \hgcmd{bisect} search is only as good as the +input you give it, don't take the changeset it reports as the +absolute truth. A simple way to cross-check its report is to manually +run your test at each of the following changesets: +\begin{itemize} +\item The changeset that it reports as the first bad revision. Your + test should still report this as bad. +\item The parent of that changeset (either parent, if it's a merge). + Your test should report this changeset as good. +\item A child of that changeset. Your test should report this + changeset as bad. +\end{itemize} + +\subsection{Beware interference between bugs} + +It's possible that your search for one bug could be disrupted by the +presence of another. For example, let's say your software crashes at +revision 100, and worked correctly at revision 50. Unknown to you, +someone else introduced a different crashing bug at revision 60, and +fixed it at revision 80. This could distort your results in one of +several ways. + +It is possible that this other bug completely ``masks'' yours, which +is to say that it occurs before your bug has a chance to manifest +itself. If you can't avoid that other bug (for example, it prevents +your project from building), and so can't tell whether your bug is +present in a particular changeset, the \hgcmd{bisect} command cannot +help you directly. Instead, you can mark a changeset as untested by +running \hgcmdargs{bisect}{--skip}. + +A different problem could arise if your test for a bug's presence is +not specific enough. If you check for ``my program crashes'', then +both your crashing bug and an unrelated crashing bug that masks it +will look like the same thing, and mislead \hgcmd{bisect}. 
+ +Another useful situation in which to use \hgcmdargs{bisect}{--skip} is +if you can't test a revision because your project was in a broken and +hence untestable state at that revision, perhaps because someone +checked in a change that prevented the project from building. + +\subsection{Bracket your search lazily} + +Choosing the first ``good'' and ``bad'' changesets that will mark the +end points of your search is often easy, but it bears a little +discussion nevertheless. From the perspective of \hgcmd{bisect}, the +``newest'' changeset is conventionally ``bad'', and the older +changeset is ``good''. + +If you're having trouble remembering when a suitable ``good'' change +was, so that you can tell \hgcmd{bisect}, you could do worse than +testing changesets at random. Just remember to eliminate contenders +that can't possibly exhibit the bug (perhaps because the feature with +the bug isn't present yet) and those where another problem masks the +bug (as I discussed above). + +Even if you end up ``early'' by thousands of changesets or months of +history, you will only add a handful of tests to the total number that +\hgcmd{bisect} must perform, thanks to its logarithmic behaviour. + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/ch10-hook.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch10-hook.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,1413 @@ +\chapter{Handling repository events with hooks} +\label{chap:hook} + +Mercurial offers a powerful mechanism to let you perform automated +actions in response to events that occur in a repository. In some +cases, you can even control Mercurial's response to those events. + +The name Mercurial uses for one of these actions is a \emph{hook}. +Hooks are called ``triggers'' in some revision control systems, but +the two names refer to the same idea. 
+ +\section{An overview of hooks in Mercurial} + +Here is a brief list of the hooks that Mercurial supports. We will +revisit each of these hooks in more detail later, in +section~\ref{sec:hook:ref}. + +\begin{itemize} +\item[\small\hook{changegroup}] This is run after a group of + changesets has been brought into the repository from elsewhere. +\item[\small\hook{commit}] This is run after a new changeset has been + created in the local repository. +\item[\small\hook{incoming}] This is run once for each new changeset + that is brought into the repository from elsewhere. Notice the + difference from \hook{changegroup}, which is run once per + \emph{group} of changesets brought in. +\item[\small\hook{outgoing}] This is run after a group of changesets + has been transmitted from this repository. +\item[\small\hook{prechangegroup}] This is run before starting to + bring a group of changesets into the repository. +\item[\small\hook{precommit}] Controlling. This is run before starting + a commit. +\item[\small\hook{preoutgoing}] Controlling. This is run before + starting to transmit a group of changesets from this repository. +\item[\small\hook{pretag}] Controlling. This is run before creating a tag. +\item[\small\hook{pretxnchangegroup}] Controlling. This is run after a + group of changesets has been brought into the local repository from + another, but before the transaction completes that will make the + changes permanent in the repository. +\item[\small\hook{pretxncommit}] Controlling. This is run after a new + changeset has been created in the local repository, but before the + transaction completes that will make it permanent. +\item[\small\hook{preupdate}] Controlling. This is run before starting + an update or merge of the working directory. +\item[\small\hook{tag}] This is run after a tag is created. +\item[\small\hook{update}] This is run after an update or merge of the + working directory has finished. 
+\end{itemize} +Each of the hooks whose description begins with the word +``Controlling'' has the ability to determine whether an activity can +proceed. If the hook succeeds, the activity may proceed; if it fails, +the activity is either not permitted or undone, depending on the hook. + +\section{Hooks and security} + +\subsection{Hooks are run with your privileges} + +When you run a Mercurial command in a repository, and the command +causes a hook to run, that hook runs on \emph{your} system, under +\emph{your} user account, with \emph{your} privilege level. Since +hooks are arbitrary pieces of executable code, you should treat them +with an appropriate level of suspicion. Do not install a hook unless +you are confident that you know who created it and what it does. + +In some cases, you may be exposed to hooks that you did not install +yourself. If you work with Mercurial on an unfamiliar system, +Mercurial will run hooks defined in that system's global \hgrc\ file. + +If you are working with a repository owned by another user, Mercurial +can run hooks defined in that user's repository, but it will still run +them as ``you''. For example, if you \hgcmd{pull} from that +repository, and its \sfilename{.hg/hgrc} defines a local +\hook{outgoing} hook, that hook will run under your user account, even +though you don't own that repository. + +\begin{note} + This only applies if you are pulling from a repository on a local or + network filesystem. If you're pulling over http or ssh, any + \hook{outgoing} hook will run under whatever account is executing + the server process, on the server. +\end{note} + +To see what hooks are defined in a repository, use the +\hgcmdargs{config}{hooks} command. If you are working in one +repository, but talking to another that you do not own (e.g.~using +\hgcmd{pull} or \hgcmd{incoming}), remember that it is the other +repository's hooks you should be checking, not your own.
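+
+If the other repository is on a local or network filesystem, you can
+also read its \sfilename{.hg/hgrc} directly, without running any
+Mercurial command in it. The following Python function is a rough
+sketch of that check, not part of Mercurial. The name
+\texttt{repo\_hooks} is hypothetical. The function assumes that the file
+uses plain INI syntax that Python's \texttt{configparser} module accepts,
+so it does not understand \texttt{\%include} directives, and it cannot
+see hooks defined in system-wide or per-user \hgrc\ files.
+\begin{codesample2}
+  import configparser
+  import os
+
+
+  def repo_hooks(repo_root):
+      """Return (name, command) pairs from the [hooks] section of
+      repo_root/.hg/hgrc.  Hypothetical helper: plain INI syntax only."""
+      path = os.path.join(repo_root, ".hg", "hgrc")
+      # RawConfigParser does no %-interpolation, so hook commands that
+      # contain "%" come back unchanged.
+      parser = configparser.RawConfigParser()
+      parser.optionxform = str  # keep hook names exactly as written
+      parser.read(path)  # a missing file just yields no sections
+      if not parser.has_section("hooks"):
+          return []
+      return parser.items("hooks")
+\end{codesample2}
+Calling \texttt{repo\_hooks} on the path of a repository that you are
+about to pull from lists any hooks it defines locally, such as an
+\hook{outgoing} hook.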
+ +\subsection{Hooks do not propagate} + +In Mercurial, hooks are not revision controlled, and do not propagate +when you clone, or pull from, a repository. The reason for this is +simple: a hook is a completely arbitrary piece of executable code. It +runs under your user identity, with your privilege level, on your +machine. + +It would be extremely reckless for any distributed revision control +system to implement revision-controlled hooks, as this would offer an +easily exploitable way to subvert the accounts of users of the +revision control system. + +Since Mercurial does not propagate hooks, if you are collaborating +with other people on a common project, you should not assume that they +are using the same Mercurial hooks as you are, or that theirs are +correctly configured. You should document the hooks you expect people +to use. + +In a corporate intranet, this is somewhat easier to control, as you +can for example provide a ``standard'' installation of Mercurial on an +NFS filesystem, and use a site-wide \hgrc\ file to define hooks that +all users will see. However, this too has its limits; see below. + +\subsection{Hooks can be overridden} + +Mercurial allows you to override a hook definition by redefining the +hook. You can disable it by setting its value to the empty string, or +change its behaviour as you wish. + +If you deploy a system-~or site-wide \hgrc\ file that defines some +hooks, you should thus understand that your users can disable or +override those hooks. + +\subsection{Ensuring that critical hooks are run} + +Sometimes you may want to enforce a policy that you do not want others +to be able to work around. For example, you may have a requirement +that every changeset must pass a rigorous set of tests. Defining this +requirement via a hook in a site-wide \hgrc\ won't work for remote +users on laptops, and of course local users can subvert it at will by +overriding the hook. 
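+
+For example, suppose that a site-wide \hgrc\ runs a test suite before
+every commit. The hook name and command here are hypothetical. Any user
+can switch the hook off again in a personal or per-repository \hgrc,
+because Mercurial reads those files after the site-wide one:
+\begin{codesample2}
+  # site-wide hgrc
+  [hooks]
+  pretxncommit.runtests = make test
+
+  # the user's own hgrc: an empty value disables the hook
+  [hooks]
+  pretxncommit.runtests =
+\end{codesample2}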
+ +Instead, you can set up your policies for use of Mercurial so that +people are expected to propagate changes through a well-known +``canonical'' server that you have locked down and configured +appropriately. + +One way to do this is via a combination of social engineering and +technology. Set up a restricted-access account; users can push +changes over the network to repositories managed by this account, but +they cannot log into the account and run normal shell commands. In +this scenario, a user can commit a changeset that contains any old +garbage they want. + +When someone pushes a changeset to the server that everyone pulls +from, the server will test the changeset before it accepts it as +permanent, and reject it if it fails to pass the test suite. If +people only pull changes from this filtering server, it will serve to +ensure that all changes that people pull have been automatically +vetted. + +\section{Care with \texttt{pretxn} hooks in a shared-access repository} + +If you want to use hooks to do some automated work in a repository +that a number of people have shared access to, you need to be careful +in how you do this. + +Mercurial only locks a repository when it is writing to the +repository, and only the parts of Mercurial that write to the +repository pay attention to locks. Write locks are necessary to +prevent multiple simultaneous writers from scribbling on each other's +work, corrupting the repository. + +Because Mercurial is careful with the order in which it reads and +writes data, it does not need to acquire a lock when it wants to read +data from the repository. The parts of Mercurial that read from the +repository never pay attention to locks. This lockless reading scheme +greatly increases performance and concurrency. + +With great performance comes a trade-off, though, one which has the +potential to cause you trouble unless you're aware of it. 
To describe +this requires a little detail about how Mercurial adds changesets to a +repository and reads those changes. + +When Mercurial \emph{writes} metadata, it writes it straight into the +destination file. It writes file data first, then manifest data +(which contains pointers to the new file data), then changelog data +(which contains pointers to the new manifest data). Before the first +write to each file, it stores a record of where the end of the file +was in its transaction log. If the transaction must be rolled back, +Mercurial simply truncates each file back to the size it was before the +transaction began. + +When Mercurial \emph{reads} metadata, it reads the changelog first, +then everything else. Since a reader will only access parts of the +manifest or file metadata that it can see in the changelog, it can +never see partially written data. + +Some controlling hooks (\hook{pretxncommit} and +\hook{pretxnchangegroup}) run when a transaction is almost complete. +All of the metadata has been written, but Mercurial can still roll the +transaction back and cause the newly-written data to disappear. + +If one of these hooks runs for a long time, it opens a window of time during +which a reader can see the metadata for changesets that are not yet +permanent, and should not be thought of as ``really there''. The +longer the hook runs, the longer that window is open. + +\subsection{The problem illustrated} + +In principle, a good use for the \hook{pretxnchangegroup} hook would +be to automatically build and test incoming changes before they are +accepted into a central repository. This could let you guarantee that +nobody can push changes to this repository that ``break the build''. +But if a client can pull changes while they're being tested, the +usefulness of the test is zero; an unsuspecting someone can pull +untested changes, potentially breaking their build.
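+
+Before looking at remedies, it may help to see the rollback mechanism
+behind this window in miniature. The following Python sketch is a toy
+model of the write-then-truncate scheme described above, not
+Mercurial's transaction code. The class name \texttt{ToyTransaction}
+and the three file names are hypothetical. The model records each
+file's size before the first write to it and appends data in the same
+order as Mercurial: file data, then manifest, then changelog. It rolls
+back by truncating every file to its recorded size.
+\begin{codesample2}
+  import os
+
+
+  class ToyTransaction:
+      """Append-only writes with truncate-on-rollback."""
+
+      def __init__(self):
+          # Maps each path written so far to its size before this transaction.
+          self.journal = dict()
+
+      def append(self, path, data):
+          if path not in self.journal:
+              # Remember where the file ended before our first write to it.
+              if os.path.exists(path):
+                  self.journal[path] = os.path.getsize(path)
+              else:
+                  self.journal[path] = 0
+          with open(path, "ab") as f:
+              f.write(data)
+
+      def rollback(self):
+          # Undo every write by cutting each file back to its old length.
+          for path, size in self.journal.items():
+              with open(path, "r+b") as f:
+                  f.truncate(size)
+          self.journal.clear()
+
+      def close(self):
+          # Completing the transaction just forgets the journal.
+          self.journal.clear()
+
+
+  def add_changeset(directory, tr, entry=b"new entry;"):
+      # File data first, then manifest, then changelog, as in the text.
+      for name in ("filelog", "manifest", "changelog"):
+          tr.append(os.path.join(directory, name), entry)
+
+
+  def read_file(path):
+      with open(path, "rb") as f:
+          return f.read()
+\end{codesample2}
+Between \texttt{add\_changeset} and \texttt{rollback}, anyone reading
+these files sees the extra entries. A long-running \texttt{pretxn} hook
+holds that same gap open.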
+ +The safest technological answer to this challenge is to set up such a +``gatekeeper'' repository as \emph{unidirectional}. Let it take +changes pushed in from the outside, but do not allow anyone to pull +changes from it (use the \hook{preoutgoing} hook to lock it down). +Configure a \hook{changegroup} hook so that if a build or test +succeeds, the hook will push the new changes out to another repository +that people \emph{can} pull from. + +In practice, putting a centralised bottleneck like this in place is +not often a good idea, and transaction visibility has nothing to do +with the problem. As the size of a project---and the time it takes to +build and test---grows, you rapidly run into a wall with this ``try +before you buy'' approach, where you have more changesets to test than +time in which to deal with them. The inevitable result is frustration +on the part of all involved. + +An approach that scales better is to get people to build and test +before they push, then run automated builds and tests centrally +\emph{after} a push, to be sure all is well. The advantage of this +approach is that it does not impose a limit on the rate at which the +repository can accept changes. + +\section{A short tutorial on using hooks} +\label{sec:hook:simple} + +It is easy to write a Mercurial hook. Let's start with a hook that +runs when you finish a \hgcmd{commit}, and simply prints the hash of +the changeset you just created. The hook is called \hook{commit}. + +\begin{figure}[ht] + \interaction{hook.simple.init} + \caption{A simple hook that runs when a changeset is committed} + \label{ex:hook:init} +\end{figure} + +All hooks follow the pattern in example~\ref{ex:hook:init}. You add +an entry to the \rcsection{hooks} section of your \hgrc. On the left +is the name of the event to trigger on; on the right is the action to +take. As you can see, you can run an arbitrary shell command in a +hook. 
Mercurial passes extra information to the hook using +environment variables (look for \envar{HG\_NODE} in the example). + +\subsection{Performing multiple actions per event} + +Quite often, you will want to define more than one hook for a +particular kind of event, as shown in example~\ref{ex:hook:ext}. +Mercurial lets you do this by adding an \emph{extension} to the end of +a hook's name. You extend a hook's name by giving the name of the +hook, followed by a full stop (the ``\texttt{.}'' character), followed +by some more text of your choosing. For example, Mercurial will run +both \texttt{commit.foo} and \texttt{commit.bar} when the +\texttt{commit} event occurs. + +\begin{figure}[ht] + \interaction{hook.simple.ext} + \caption{Defining a second \hook{commit} hook} + \label{ex:hook:ext} +\end{figure} + +To give a well-defined order of execution when there are multiple +hooks defined for an event, Mercurial sorts hooks by extension, and +executes the hook commands in this sorted order. In the above +example, it will execute \texttt{commit.bar} before +\texttt{commit.foo}, and \texttt{commit} before both. + +It is a good idea to use a somewhat descriptive extension when you +define a new hook. This will help you to remember what the hook was +for. If the hook fails, you'll get an error message that contains the +hook name and extension, so using a descriptive extension could give +you an immediate hint as to why the hook failed (see +section~\ref{sec:hook:perm} for an example). + +\subsection{Controlling whether an activity can proceed} +\label{sec:hook:perm} + +In our earlier examples, we used the \hook{commit} hook, which is +run after a commit has completed. This is one of several Mercurial +hooks that run after an activity finishes. Such hooks have no way of +influencing the activity itself. + +Mercurial defines a number of events that occur before an activity +starts; or after it starts, but before it finishes. 
Hooks that +trigger on these events have the added ability to choose whether the +activity can continue, or will abort. + +The \hook{pretxncommit} hook runs after a commit has all but +completed. In other words, the metadata representing the changeset +has been written out to disk, but the transaction has not yet been +allowed to complete. The \hook{pretxncommit} hook has the ability to +decide whether the transaction can complete, or must be rolled back. + +If the \hook{pretxncommit} hook exits with a status code of zero, the +transaction is allowed to complete; the commit finishes; and the +\hook{commit} hook is run. If the \hook{pretxncommit} hook exits with +a non-zero status code, the transaction is rolled back; the metadata +representing the changeset is erased; and the \hook{commit} hook is +not run. + +\begin{figure}[ht] + \interaction{hook.simple.pretxncommit} + \caption{Using the \hook{pretxncommit} hook to control commits} + \label{ex:hook:pretxncommit} +\end{figure} + +The hook in example~\ref{ex:hook:pretxncommit} checks that a commit +comment contains a bug ID. If it does, the commit can complete. If +not, the commit is rolled back. + +\section{Writing your own hooks} + +When you are writing a hook, you might find it useful to run Mercurial +either with the \hggopt{-v} option, or the \rcitem{ui}{verbose} config +item set to ``true''. When you do so, Mercurial will print a message +before it calls each hook. + +\subsection{Choosing how your hook should run} +\label{sec:hook:lang} + +You can write a hook either as a normal program---typically a shell +script---or as a Python function that is executed within the Mercurial +process. + +Writing a hook as an external program has the advantage that it +requires no knowledge of Mercurial's internals. You can call normal +Mercurial commands to get any added information you need. The +trade-off is that external hooks are slower than in-process hooks. 
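+
+As an example of an external hook that calls ordinary Mercurial
+commands, here is a sketch of a bug-ID check written as a standalone
+Python script instead of a shell one-liner. Several details are
+assumptions. The script name \filename{check\_bug\_id.py} is
+hypothetical, and the pattern (the word ``bug'' followed by a number) is
+deliberately simple. The script also relies on the \texttt{-T json}
+output of \hgcmd{log}, which only newer Mercurial releases provide. You
+might enable it with an entry such as
+\texttt{pretxncommit.bugid = python3 /path/to/check\_bug\_id.py}.
+\begin{codesample2}
+  #!/usr/bin/env python3
+  """Hypothetical external pretxncommit hook: reject a commit whose
+  description does not mention a bug ID such as "bug 1234"."""
+  import json
+  import os
+  import re
+  import subprocess
+  import sys
+
+  # Simplified pattern: "bug", optional spaces, then one or more digits.
+  BUG_ID = re.compile(r"bug *[0-9]+", re.IGNORECASE)
+
+
+  def has_bug_id(description):
+      return BUG_ID.search(description) is not None
+
+
+  def pending_description(node):
+      # The hook runs in the repository root, so "hg log" queries this repo.
+      result = subprocess.run(["hg", "log", "-r", node, "-T", "json"],
+                              check=True, capture_output=True, text=True)
+      return json.loads(result.stdout)[0]["desc"]
+
+
+  def main():
+      node = os.environ["HG_NODE"]  # the "node" parameter, as HG_NODE
+      if has_bug_id(pending_description(node)):
+          return 0  # zero exit status: the commit may complete
+      print("commit message must mention a bug ID", file=sys.stderr)
+      return 1  # non-zero exit status: the transaction is rolled back
+
+
+  # Only act when Mercurial has invoked the script as a hook.
+  if __name__ == "__main__" and "HG_NODE" in os.environ:
+      sys.exit(main())
+\end{codesample2}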
+ +An in-process Python hook has complete access to the Mercurial API, +and does not ``shell out'' to another process, so it is inherently +faster than an external hook. It is also easier to obtain much of the +information that a hook requires by using the Mercurial API than by +running Mercurial commands. + +If you are comfortable with Python, or require high performance, +writing your hooks in Python may be a good choice. However, when you +have a straightforward hook to write and you don't need to care about +performance (probably the majority of hooks), a shell script is +perfectly fine. + +\subsection{Hook parameters} +\label{sec:hook:param} + +Mercurial calls each hook with a set of well-defined parameters. In +Python, a parameter is passed as a keyword argument to your hook +function. For an external program, a parameter is passed as an +environment variable. + +Whether your hook is written in Python or as a shell script, the +hook-specific parameter names and values will be the same. A boolean +parameter will be represented as a boolean value in Python, but as the +number 1 (for ``true'') or 0 (for ``false'') as an environment +variable for an external hook. If a hook parameter is named +\texttt{foo}, the keyword argument for a Python hook will also be +named \texttt{foo}, while the environment variable for an external +hook will be named \texttt{HG\_FOO}. + +\subsection{Hook return values and activity control} + +A hook that executes successfully must exit with a status of zero if +external, or return boolean ``false'' if in-process. Failure is +indicated with a non-zero exit status from an external hook, or an +in-process hook returning boolean ``true''. If an in-process hook +raises an exception, the hook is considered to have failed. + +For a hook that controls whether an activity can proceed, zero/false +means ``allow'', while non-zero/true/exception means ``deny''. 
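+
+The rules in the last two subsections can be summarised as a small
+Python model. It is a sketch of the conventions as stated here, not
+Mercurial's own hook runner, and all three function names are
+hypothetical.
+\begin{codesample2}
+  def hook_environment(params):
+      """Environment variables that an external hook would receive."""
+      env = dict()
+      for name, value in params.items():
+          if isinstance(value, bool):
+              value = "1" if value else "0"  # booleans become 1 or 0
+          env["HG_" + name.upper()] = str(value)
+      return env
+
+
+  def external_hook_failed(exit_status):
+      # Zero means success ("allow"); anything else means failure ("deny").
+      return exit_status != 0
+
+
+  def inprocess_hook_failed(hook, ui, repo, **params):
+      # A false return value means success; a true value or any
+      # exception means failure.
+      try:
+          return bool(hook(ui, repo, **params))
+      except Exception:
+          return True
+\end{codesample2}
+For instance, \texttt{hook\_environment(dict(node="abc123", foo=True))}
+sets \envar{HG\_NODE} to \texttt{abc123} and \envar{HG\_FOO} to
+\texttt{1}.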
+ +\subsection{Writing an external hook} + +When you define an external hook in your \hgrc\ and the hook is run, +its value is passed to your shell, which interprets it. This means +that you can use normal shell constructs in the body of the hook. + +An executable hook is always run with its current directory set to a +repository's root directory. + +Each hook parameter is passed in as an environment variable; the name +is upper-cased, and prefixed with the string ``\texttt{HG\_}''. + +With the exception of hook parameters, Mercurial does not set or +modify any environment variables when running a hook. This is useful +to remember if you are writing a site-wide hook that may be run by a +number of different users with differing environment variables set. +In multi-user situations, you should not rely on environment variables +being set to the values you have in your environment when testing the +hook. + +\subsection{Telling Mercurial to use an in-process hook} + +The \hgrc\ syntax for defining an in-process hook is slightly +different than for an executable hook. The value of the hook must +start with the text ``\texttt{python:}'', and continue with the +fully-qualified name of a callable object to use as the hook's value. + +The module in which a hook lives is automatically imported when a hook +is run. So long as you have the module name and \envar{PYTHONPATH} +right, it should ``just work''. + +The following \hgrc\ example snippet illustrates the syntax and +meaning of the notions we just described. +\begin{codesample2} + [hooks] + commit.example = python:mymodule.submodule.myhook +\end{codesample2} +When Mercurial runs the \texttt{commit.example} hook, it imports +\texttt{mymodule.submodule}, looks for the callable object named +\texttt{myhook}, and calls it. 
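+
+The lookup just described can be sketched with Python's standard
+\texttt{importlib} module. This sketch illustrates the idea only and is
+not Mercurial's loader. The function name \texttt{resolve\_hook} is
+hypothetical.
+\begin{codesample2}
+  import importlib
+
+
+  def resolve_hook(value):
+      """Turn "python:package.module.function" into that callable."""
+      prefix = "python:"
+      if not value.startswith(prefix):
+          raise ValueError("not an in-process hook: " + value)
+      # Everything before the last dot names the module to import.
+      module_name, _, attr = value[len(prefix):].rpartition(".")
+      if not module_name:
+          raise ValueError("hook name must include a module: " + value)
+      module = importlib.import_module(module_name)  # searches sys.path
+      hook = getattr(module, attr)
+      if not callable(hook):
+          raise TypeError(value + " is not callable")
+      return hook
+\end{codesample2}
+With this sketch, \texttt{resolve\_hook("python:mymodule.submodule.myhook")}
+imports \texttt{mymodule.submodule} and returns its \texttt{myhook}
+function, provided that the package can be found on \envar{PYTHONPATH}.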
+ +\subsection{Writing an in-process hook} + +The simplest in-process hook does nothing, but illustrates the basic +shape of the hook API: +\begin{codesample2} + def myhook(ui, repo, **kwargs): + pass +\end{codesample2} +The first argument to a Python hook is always a +\pymodclass{mercurial.ui}{ui} object. The second is a repository object; +at the moment, it is always an instance of +\pymodclass{mercurial.localrepo}{localrepository}. Following these two +arguments are other keyword arguments. Which ones are passed in +depends on the hook being called, but a hook can ignore arguments it +doesn't care about by dropping them into a keyword argument dict, as +with \texttt{**kwargs} above. + +\section{Some hook examples} + +\subsection{Writing meaningful commit messages} + +It's hard to imagine a useful commit message being very short. The +simple \hook{pretxncommit} hook of figure~\ref{ex:hook:msglen.go} +will prevent you from committing a changeset with a message that is +less than ten bytes long. + +\begin{figure}[ht] + \interaction{hook.msglen.go} + \caption{A hook that forbids overly short commit messages} + \label{ex:hook:msglen.go} +\end{figure} + +\subsection{Checking for trailing whitespace} + +An interesting use of a commit-related hook is to help you to write +cleaner code. A simple example of ``cleaner code'' is the dictum that +a change should not add any new lines of text that contain ``trailing +whitespace''. Trailing whitespace is a series of space and tab +characters at the end of a line of text. In most cases, trailing +whitespace is unnecessary, invisible noise, but it is occasionally +problematic, and people often prefer to get rid of it. + +You can use either the \hook{precommit} or \hook{pretxncommit} hook to +tell whether you have a trailing whitespace problem. If you use the +\hook{precommit} hook, the hook will not know which files you are +committing, so it will have to check every modified file in the +repository for trailing white space. 
If you want to commit a change +to just the file \filename{foo}, but the file \filename{bar} contains +trailing whitespace, doing a check in the \hook{precommit} hook will +prevent you from committing \filename{foo} due to the problem with +\filename{bar}. This doesn't seem right. + +Should you choose the \hook{pretxncommit} hook, the check won't occur +until just before the transaction for the commit completes. This will +allow you to check for problems in only the exact files that are being +committed. However, if you entered the commit message interactively +and the hook fails, the transaction will roll back; you'll have to +re-enter the commit message after you fix the trailing whitespace and +run \hgcmd{commit} again. + +\begin{figure}[ht] + \interaction{hook.ws.simple} + \caption{A simple hook that checks for trailing whitespace} + \label{ex:hook:ws.simple} +\end{figure} + +Figure~\ref{ex:hook:ws.simple} introduces a simple \hook{pretxncommit} +hook that checks for trailing whitespace. This hook is short, but not +very helpful. It exits with an error status if a change adds a line +with trailing whitespace to any file, but does not print any +information that might help us to identify the offending file or +line. It also has the nice property of not paying attention to +unmodified lines; only lines that introduce new trailing whitespace +cause problems. + +\begin{figure}[ht] + \interaction{hook.ws.better} + \caption{A better trailing whitespace hook} + \label{ex:hook:ws.better} +\end{figure} + +The example of figure~\ref{ex:hook:ws.better} is much more complex, +but also more useful. It parses a unified diff to see if any lines +add trailing whitespace, and prints the name of the file and the line +number of each such occurrence.
Even better, if the change adds +trailing whitespace, this hook saves the commit comment and prints the +name of the save file before exiting and telling Mercurial to roll the +transaction back, so you can use +\hgcmdargs{commit}{\hgopt{commit}{-l}~\emph{filename}} to reuse the +saved commit message once you've corrected the problem. + +As a final aside, note in figure~\ref{ex:hook:ws.better} the use of +\command{perl}'s in-place editing feature to get rid of trailing +whitespace from a file. This is concise and useful enough that I will +reproduce it here. +\begin{codesample2} + perl -pi -e 's,\textbackslash{}s+\$,,' filename +\end{codesample2} + +\section{Bundled hooks} + +Mercurial ships with several bundled hooks. You can find them in the +\dirname{hgext} directory of a Mercurial source tree. If you are +using a Mercurial binary package, the hooks will be located in the +\dirname{hgext} directory of wherever your package installer put +Mercurial. + +\subsection{\hgext{acl}---access control for parts of a repository} + +The \hgext{acl} extension lets you control which remote users are +allowed to push changesets to a networked server. You can protect any +portion of a repository (including the entire repo), so that a +specific remote user can push changes that do not affect the protected +portion. + +This extension implements access control based on the identity of the +user performing a push, \emph{not} on who committed the changesets +they're pushing. It makes sense to use this hook only if you have a +locked-down server environment that authenticates remote users, and +you want to be sure that only specific users are allowed to push +changes to that server. + +\subsubsection{Configuring the \hook{acl} hook} + +In order to manage incoming changesets, the \hgext{acl} hook must be +used as a \hook{pretxnchangegroup} hook. 
This lets it see which files +are modified by each incoming changeset, and roll back a group of +changesets if they modify ``forbidden'' files. Example: +\begin{codesample2} + [hooks] + pretxnchangegroup.acl = python:hgext.acl.hook +\end{codesample2} + +The \hgext{acl} extension is configured using three sections. + +The \rcsection{acl} section has only one entry, \rcitem{acl}{sources}, +which lists the sources of incoming changesets that the hook should +pay attention to. You don't normally need to configure this section. +\begin{itemize} +\item[\rcitem{acl}{serve}] Control incoming changesets that are arriving + from a remote repository over http or ssh. This is the default + value of \rcitem{acl}{sources}, and usually the only setting you'll + need for this configuration item. +\item[\rcitem{acl}{pull}] Control incoming changesets that are + arriving via a pull from a local repository. +\item[\rcitem{acl}{push}] Control incoming changesets that are + arriving via a push from a local repository. +\item[\rcitem{acl}{bundle}] Control incoming changesets that are + arriving from another repository via a bundle. +\end{itemize} + +The \rcsection{acl.allow} section controls the users that are allowed to +add changesets to the repository. If this section is not present, all +users that are not explicitly denied are allowed. If this section is +present, all users that are not explicitly allowed are denied (so an +empty section means that all users are denied). + +The \rcsection{acl.deny} section determines which users are denied +from adding changesets to the repository. If this section is not +present or is empty, no users are denied. + +The syntaxes for the \rcsection{acl.allow} and \rcsection{acl.deny} +sections are identical. On the left of each entry is a glob pattern +that matches files or directories, relative to the root of the +repository; on the right, a user name. 
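+
+Because the two sections interact, it can help to model the rules just
+described. The following Python sketch is not the \hgext{acl}
+extension's code, and its function names are hypothetical. Each section
+is either absent (\texttt{None}) or a list of (pattern, user) pairs.
+Python's \texttt{fnmatch} module stands in for Mercurial's glob
+matching, whose treatment of \texttt{**} and directory separators
+differs in detail.
+\begin{codesample2}
+  from fnmatch import fnmatchcase
+
+
+  def _matches(entries, user, path):
+      # True if some (pattern, user) entry names this user and this path
+      # (one user name per entry in this model).
+      return any(u == user and fnmatchcase(path, pattern)
+                 for pattern, u in entries)
+
+
+  def push_allowed(user, changed_files, allow=None, deny=None):
+      """Apply [acl.deny] and [acl.allow] to every file a push touches.
+      A section that is absent is passed as None."""
+      for path in changed_files:
+          if deny and _matches(deny, user, path):
+              return False  # explicitly denied
+          if allow is not None and not _matches(allow, user, path):
+              return False  # allow section exists but has no match
+      return True
+\end{codesample2}
+In this model, as in the rules above, a user with no entry in an
+existing \rcsection{acl.allow} section cannot push changes to any file,
+whatever \rcsection{acl.deny} says.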
+ +In the following example, the user \texttt{docwriter} can only push +changes to the \dirname{docs} subtree of the repository, while +\texttt{intern} can push changes to any file or directory except +\dirname{source/sensitive}. Because an \rcsection{acl.allow} section +is present, \texttt{intern} needs an entry in it as well; without +one, \texttt{intern} would not be allowed to push anything. +\begin{codesample2} + [acl.allow] + docs/** = docwriter + ** = intern + + [acl.deny] + source/sensitive/** = intern +\end{codesample2} + +\subsubsection{Testing and troubleshooting} + +If you want to test the \hgext{acl} hook, run it with Mercurial's +debugging output enabled. Since you'll probably be running it on a +server where it's not convenient (or sometimes possible) to pass in +the \hggopt{--debug} option, don't forget that you can enable +debugging output in your \hgrc: +\begin{codesample2} + [ui] + debug = true +\end{codesample2} +With this enabled, the \hgext{acl} hook will print enough information +to let you figure out why it is allowing or forbidding pushes from +specific users. + +\subsection{\hgext{bugzilla}---integration with Bugzilla} + +The \hgext{bugzilla} extension adds a comment to a Bugzilla bug +whenever it finds a reference to that bug ID in a commit comment. You +can install this hook on a shared server, so that any time a remote +user pushes changes to this server, the hook gets run. + +It adds a comment to the bug that looks like this (you can configure +the contents of the comment---see below): +\begin{codesample2} + Changeset aad8b264143a, made by Joe User in + the frobnitz repository, refers to this bug. + + For complete details, see + http://hg.domain.com/frobnitz?cmd=changeset;node=aad8b264143a + + Changeset description: + Fix bug 10483 by guarding against some NULL pointers +\end{codesample2} +The value of this hook is that it automates the process of updating a +bug any time a changeset refers to it. If you configure the hook +properly, it makes it easy for people to browse straight from a +Bugzilla bug to a changeset that refers to that bug.
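+
+Everything this hook does starts with finding bug references in a
+commit comment. The following Python sketch shows the general idea with
+a deliberately simple, hypothetical pattern: ``bug'' or ``bugs'', an
+optional \texttt{\#}, then a number. The pattern that the
+\hgext{bugzilla} extension actually uses may differ.
+\begin{codesample2}
+  import re
+
+  # "bug" or "bugs", optional spaces, an optional "#", optional spaces,
+  # then the numeric bug ID, captured as group 1.
+  BUG_REF = re.compile(r"bugs? *#? *([0-9]+)", re.IGNORECASE)
+
+
+  def referenced_bugs(comment):
+      """Return the distinct bug IDs mentioned in a comment, in order."""
+      found = []
+      for match in BUG_REF.finditer(comment):
+          bug_id = int(match.group(1))
+          if bug_id not in found:
+              found.append(bug_id)
+      return found
+\end{codesample2}
+Applied to the sample description above, ``Fix bug 10483 by guarding
+against some NULL pointers'', the function returns the single ID 10483.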
+ +You can use the code in this hook as a starting point for some more +exotic Bugzilla integration recipes. Here are a few possibilities: +\begin{itemize} +\item Require that every changeset pushed to the server have a valid + bug~ID in its commit comment. In this case, you'd want to configure + the hook as a \hook{pretxnchangegroup} hook, since a + \hook{pretxncommit} hook only runs for commits made in the local + repository. This would allow the hook + to reject changes that didn't contain bug IDs. +\item Allow incoming changesets to automatically modify the + \emph{state} of a bug, as well as simply adding a comment. For + example, the hook could recognise the string ``fixed bug 31337'' as + indicating that it should update the state of bug 31337 to + ``requires testing''. +\end{itemize} + +\subsubsection{Configuring the \hook{bugzilla} hook} +\label{sec:hook:bugzilla:config} + +You should configure this hook in your server's \hgrc\ as an +\hook{incoming} hook, for example as follows: +\begin{codesample2} + [hooks] + incoming.bugzilla = python:hgext.bugzilla.hook +\end{codesample2} + +Because of the specialised nature of this hook, and because Bugzilla +was not written with this kind of integration in mind, configuring +this hook is a somewhat involved process. + +Before you begin, you must install the MySQL bindings for Python on +the host(s) where you'll be running the hook. If they are not +available as a binary package for your system, you can download them +from~\cite{web:mysql-python}. + +Configuration information for this hook lives in the +\rcsection{bugzilla} section of your \hgrc. +\begin{itemize} +\item[\rcitem{bugzilla}{version}] The version of Bugzilla installed on + the server. The database schema that Bugzilla uses changes + occasionally, so this hook has to know exactly which schema to use. + At the moment, the only version supported is \texttt{2.16}. +\item[\rcitem{bugzilla}{host}] The hostname of the MySQL server that + stores your Bugzilla data.
The database must be configured to allow + connections from whatever host you are running the \hook{bugzilla} + hook on. +\item[\rcitem{bugzilla}{user}] The username with which to connect to + the MySQL server. The database must be configured to allow this + user to connect from whatever host you are running the + \hook{bugzilla} hook on. This user must be able to access and + modify Bugzilla tables. The default value of this item is + \texttt{bugs}, which is the standard name of the Bugzilla user in a + MySQL database. +\item[\rcitem{bugzilla}{password}] The MySQL password for the user you + configured above. This is stored as plain text, so you should make + sure that unauthorised users cannot read the \hgrc\ file where you + store this information. +\item[\rcitem{bugzilla}{db}] The name of the Bugzilla database on the + MySQL server. The default value of this item is \texttt{bugs}, + which is the standard name of the MySQL database where Bugzilla + stores its data. +\item[\rcitem{bugzilla}{notify}] If you want Bugzilla to send out a + notification email to subscribers after this hook has added a + comment to a bug, you will need this hook to run a command whenever + it updates the database. The command to run depends on where you + have installed Bugzilla, but it will typically look something like + this, if you have Bugzilla installed in + \dirname{/var/www/html/bugzilla}: + \begin{codesample4} + cd /var/www/html/bugzilla && ./processmail %s nobody@nowhere.com + \end{codesample4} + The Bugzilla \texttt{processmail} program expects to be given a + bug~ID (the hook replaces ``\texttt{\%s}'' with the bug~ID) and an + email address. It also expects to be able to write to some files in + the directory that it runs in. If Bugzilla and this hook are not + installed on the same machine, you will need to find a way to run + \texttt{processmail} on the server where Bugzilla is installed. 
+\end{itemize} + +\subsubsection{Mapping committer names to Bugzilla user names} + +By default, the \hgext{bugzilla} hook tries to use the email address +of a changeset's committer as the Bugzilla user name with which to +update a bug. If this does not suit your needs, you can map committer +email addresses to Bugzilla user names using a \rcsection{usermap} +section. + +Each item in the \rcsection{usermap} section contains an email address +on the left, and a Bugzilla user name on the right. +\begin{codesample2} + [usermap] + jane.user@example.com = jane +\end{codesample2} +You can either keep the \rcsection{usermap} data in a normal \hgrc, or +tell the \hgext{bugzilla} hook to read the information from an +external \filename{usermap} file. In the latter case, you can store +\filename{usermap} data by itself in (for example) a user-modifiable +repository. This makes it possible to let your users maintain their +own \rcitem{bugzilla}{usermap} entries. The main \hgrc\ file might +look like this: +\begin{codesample2} + # regular hgrc file refers to external usermap file + [bugzilla] + usermap = /home/hg/repos/userdata/bugzilla-usermap.conf +\end{codesample2} +While the \filename{usermap} file that it refers to might look like +this: +\begin{codesample2} + # bugzilla-usermap.conf - inside a hg repository + [usermap] + stephanie@example.com = steph +\end{codesample2} + +\subsubsection{Configuring the text that gets added to a bug} + +You can configure the text that this hook adds as a comment; you +specify it in the form of a Mercurial template. Several \hgrc\ +entries (still in the \rcsection{bugzilla} section) control this +behaviour. +\begin{itemize} +\item[\texttt{strip}] The number of leading path elements to strip + from a repository's path name to construct a partial path for a URL. 
+ For example, if the repositories on your server live under + \dirname{/home/hg/repos}, and you have a repository whose path is + \dirname{/home/hg/repos/app/tests}, then setting \texttt{strip} to + \texttt{4} will give a partial path of \dirname{app/tests}. The + hook will make this partial path available when expanding a + template, as \texttt{webroot}. +\item[\texttt{template}] The text of the template to use. In addition + to the usual changeset-related variables, this template can use + \texttt{hgweb} (the value of the \texttt{hgweb} configuration item + above) and \texttt{webroot} (the path constructed using + \texttt{strip} above). +\end{itemize} + +In addition, you can add a \rcitem{web}{baseurl} item to the +\rcsection{web} section of your \hgrc. The \hgext{bugzilla} hook will +make this available when expanding a template, as the base string to +use when constructing a URL that will let users browse from a Bugzilla +comment to view a changeset. Example: +\begin{codesample2} + [web] + baseurl = http://hg.domain.com/ +\end{codesample2} + +Here is an example set of \hgext{bugzilla} hook config information. +\begin{codesample2} + [bugzilla] + host = bugzilla.example.com + password = mypassword + version = 2.16 + # server-side repos live in /home/hg/repos, so strip 4 leading + # separators + strip = 4 + hgweb = http://hg.example.com/ + usermap = /home/hg/repos/notify/bugzilla.conf + template = Changeset \{node|short\}, made by \{author\} in the \{webroot\} + repo, refers to this bug.\\nFor complete details, see + \{hgweb\}\{webroot\}?cmd=changeset;node=\{node|short\}\\nChangeset + description:\\n\\t\{desc|tabindent\} +\end{codesample2} + +\subsubsection{Testing and troubleshooting} + +The most common problems with configuring the \hgext{bugzilla} hook +relate to running Bugzilla's \filename{processmail} script and mapping +committer names to user names. 
+ +Recall from section~\ref{sec:hook:bugzilla:config} above that the user +that runs the Mercurial process on the server is also the one that +will run the \filename{processmail} script. The +\filename{processmail} script sometimes causes Bugzilla to write to +files in its configuration directory, and Bugzilla's configuration +files are usually owned by the user that your web server runs under. + +You can cause \filename{processmail} to be run under the appropriate +user's identity using the \command{sudo} command. Here is an example +entry for a \filename{sudoers} file. +\begin{codesample2} + hg_user ALL = (httpd_user) NOPASSWD: /var/www/html/bugzilla/processmail-wrapper +\end{codesample2} +This allows the \texttt{hg\_user} user to run a +\filename{processmail-wrapper} program, with any arguments, under the +identity of \texttt{httpd\_user}. + +This indirection through a wrapper script is necessary, because +\filename{processmail} expects to be run with its current directory +set to wherever you installed Bugzilla; you can't specify that kind of +constraint in a \filename{sudoers} file. The contents of the wrapper +script are simple: +\begin{codesample2} + #!/bin/sh + cd `dirname $0` && ./processmail "$1" nobody@example.com +\end{codesample2} +It doesn't seem to matter what email address you pass to +\filename{processmail}. + +If your \rcsection{usermap} is not set up correctly, users will see an +error message from the \hgext{bugzilla} hook when they push changes +to the server. The error message will look like this: +\begin{codesample2} + cannot find bugzilla user id for john.q.public@example.com +\end{codesample2} +This means that the committer's address, +\texttt{john.q.public@example.com}, is not a valid Bugzilla user name, +nor does it have an entry in your \rcsection{usermap} that maps it to +a valid Bugzilla user name.
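The name-mapping rule described above can be sketched in a few lines of Python. This is an illustration only, not the \hgext{bugzilla} extension's own code: the function name \texttt{bugzilla\_user} is hypothetical, and the sketch uses Python's standard \texttt{configparser} module to read a \rcsection{usermap} section. The real hook then looks the resulting name up in Bugzilla's user table, which this sketch does not model; that lookup is the step that produces the ``cannot find bugzilla user id'' error.

```python
import configparser


def bugzilla_user(committer_email, usermap_text):
    """Hypothetical helper: choose the Bugzilla user name for a committer.

    Uses the committer's entry in a [usermap] section if there is one,
    otherwise falls back to the committer's email address, which is the
    bugzilla hook's default behaviour.
    """
    parser = configparser.ConfigParser()
    parser.read_string(usermap_text)
    if parser.has_section('usermap'):
        # configparser lower-cases option names, so this lookup is
        # case-insensitive with respect to the email address.
        return parser.get('usermap', committer_email, fallback=committer_email)
    return committer_email


USERMAP = """\
[usermap]
jane.user@example.com = jane
"""

print(bugzilla_user('jane.user@example.com', USERMAP))      # jane
print(bugzilla_user('john.q.public@example.com', USERMAP))  # john.q.public@example.com
```

In the second call there is no \rcsection{usermap} entry, so the committer's address is used unchanged; if that address is not also a Bugzilla login, the hook fails with the error message shown above.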
+ +\subsection{\hgext{notify}---send email notifications} + +Although Mercurial's built-in web server provides RSS feeds of changes +in every repository, many people prefer to receive change +notifications via email. The \hgext{notify} hook lets you send out +notifications to a set of email addresses whenever changesets arrive +that those subscribers are interested in. + +As with the \hgext{bugzilla} hook, the \hgext{notify} hook is +template-driven, so you can customise the contents of the notification +messages that it sends. + +By default, the \hgext{notify} hook includes a diff of every changeset +that it sends out; you can limit the size of the diff, or turn this +feature off entirely. It is useful for letting subscribers review +changes immediately, rather than clicking to follow a URL. + +\subsubsection{Configuring the \hgext{notify} hook} + +You can set up the \hgext{notify} hook to send one email message per +incoming changeset, or one per incoming group of changesets (all those +that arrived in a single pull or push). +\begin{codesample2} + [hooks] + # send one email per group of changes + changegroup.notify = python:hgext.notify.hook + # send one email per change + incoming.notify = python:hgext.notify.hook +\end{codesample2} + +Configuration information for this hook lives in the +\rcsection{notify} section of a \hgrc\ file. +\begin{itemize} +\item[\rcitem{notify}{test}] By default, this hook does not send out + email at all; instead, it prints the message that it \emph{would} + send. Set this item to \texttt{false} to allow email to be sent. + The reason that sending of email is turned off by default is that it + takes several tries to configure this extension exactly as you would + like, and it would be bad form to spam subscribers with a number of + ``broken'' notifications while you debug your configuration. +\item[\rcitem{notify}{config}] The path to a configuration file that + contains subscription information. 
This is kept separate from the + main \hgrc\ so that you can maintain it in a repository of its own. + People can then clone that repository, update their subscriptions, + and push the changes back to your server. +\item[\rcitem{notify}{strip}] The number of leading path separator + characters to strip from a repository's path, when deciding whether + a repository has subscribers. For example, if the repositories on + your server live in \dirname{/home/hg/repos}, and \hgext{notify} is + considering a repository named \dirname{/home/hg/repos/shared/test}, + setting \rcitem{notify}{strip} to \texttt{4} will cause + \hgext{notify} to trim the path it considers down to + \dirname{shared/test}, and it will match subscribers against that. +\item[\rcitem{notify}{template}] The template text to use when sending + messages. This specifies both the contents of the message header + and its body. +\item[\rcitem{notify}{maxdiff}] The maximum number of lines of diff + data to append to the end of a message. If a diff is longer than + this, it is truncated. By default, this is set to 300. Set this to + \texttt{0} to omit diffs from notification emails. +\item[\rcitem{notify}{sources}] A list of sources of changesets to + consider. This lets you limit \hgext{notify} to sending email only + about changes that remote users pushed into this repository + via a server, for example. See section~\ref{sec:hook:sources} for + the sources you can specify here. +\end{itemize} + +If you set the \rcitem{web}{baseurl} item in the \rcsection{web} +section, you can use it in a template; it will be available as +\texttt{baseurl}. + +Here is an example set of \hgext{notify} configuration information.
+\begin{codesample2} + [notify] + # really send email + test = false + # subscriber data lives in the notify repo + config = /home/hg/repos/notify/notify.conf + # repos live in /home/hg/repos on server, so strip 4 "/" chars + strip = 4 + template = X-Hg-Repo: \{webroot\} + Subject: \{webroot\}: \{desc|firstline|strip\} + From: \{author\} + + changeset \{node|short\} in \{root\} + details: \{baseurl\}\{webroot\}?cmd=changeset;node=\{node|short\} + description: + \{desc|tabindent|strip\} + + [web] + baseurl = http://hg.example.com/ +\end{codesample2} + +This will produce a message that looks like the following: +\begin{codesample2} + X-Hg-Repo: tests/slave + Subject: tests/slave: Handle error case when slave has no buffers + Date: Wed, 2 Aug 2006 15:25:46 -0700 (PDT) + + changeset 3cba9bfe74b5 in /home/hg/repos/tests/slave + details: http://hg.example.com/tests/slave?cmd=changeset;node=3cba9bfe74b5 + description: + Handle error case when slave has no buffers + diffs (54 lines): + + diff -r 9d95df7cf2ad -r 3cba9bfe74b5 include/tests.h + --- a/include/tests.h Wed Aug 02 15:19:52 2006 -0700 + +++ b/include/tests.h Wed Aug 02 15:25:26 2006 -0700 + @@ -212,6 +212,15 @@ static __inline__ void test_headers(void *h) + [...snip...] +\end{codesample2} + +\subsubsection{Testing and troubleshooting} + +Do not forget that by default, the \hgext{notify} extension \emph{will + not send any mail} until you explicitly configure it to do so, by +setting \rcitem{notify}{test} to \texttt{false}. Until you do that, +it simply prints the message it \emph{would} send. + +\section{Information for writers of hooks} +\label{sec:hook:ref} + +\subsection{In-process hook execution} + +An in-process hook is called with arguments of the following form: +\begin{codesample2} + def myhook(ui, repo, **kwargs): + pass +\end{codesample2} +The \texttt{ui} parameter is a \pymodclass{mercurial.ui}{ui} object. +The \texttt{repo} parameter is a +\pymodclass{mercurial.localrepo}{localrepository} object. 
The +names and values of the \texttt{**kwargs} parameters depend on the +hook being invoked, with the following common features: +\begin{itemize} +\item If a parameter is named \texttt{node} or + \texttt{parent\emph{N}}, it will contain a hexadecimal changeset ID. + The empty string represents the ``null changeset ID'', rather than + a string of zeroes. +\item If a parameter is named \texttt{url}, it will contain the URL of + a remote repository, if that can be determined. +\item Boolean-valued parameters are represented as Python + \texttt{bool} objects. +\end{itemize} + +An in-process hook is called without a change to the process's working +directory (unlike external hooks, which are run in the root of the +repository). It must not change the process's working directory; if it +does, any calls it makes into the Mercurial API will fail. + +If a hook returns a boolean ``false'' value, it is considered to have +succeeded. If it returns a boolean ``true'' value or raises an +exception, it is considered to have failed. A useful way to think of +the calling convention is ``tell me if you fail''. + +Note that changeset IDs are passed into Python hooks as hexadecimal +strings, not the binary hashes that Mercurial's APIs normally use. To +convert a hash from hex to binary, use the +\pymodfunc{mercurial.node}{bin} function. + +\subsection{External hook execution} + +An external hook is passed to the shell of the user running Mercurial. +Features of that shell, such as variable substitution and command +redirection, are available. The hook is run in the root directory of +the repository (unlike in-process hooks, which are run in the same +directory that Mercurial was run in). + +Hook parameters are passed to the hook as environment variables. The +name of each environment variable is the parameter's name, converted +to upper case and prefixed with the string ``\texttt{HG\_}''.
For example, if the name of a +parameter is ``\texttt{node}'', the name of the environment variable +representing that parameter will be ``\texttt{HG\_NODE}''. + +A boolean parameter is represented as the string ``\texttt{1}'' for +``true'' and ``\texttt{0}'' for ``false''. If an environment variable is +named \envar{HG\_NODE}, \envar{HG\_PARENT1} or \envar{HG\_PARENT2}, it +contains a changeset ID represented as a hexadecimal string. The +empty string represents the ``null changeset ID'', rather than a +string of zeroes. If an environment variable is named +\envar{HG\_URL}, it will contain the URL of a remote repository, if +that can be determined. + +If a hook exits with a status of zero, it is considered to have +succeeded. If it exits with a non-zero status, it is considered to +have failed. + +\subsection{Finding out where changesets come from} + +A hook that involves the transfer of changesets between a local +repository and another may be able to find out about the +``far side''. Mercurial knows \emph{how} changes are being +transferred, and in many cases \emph{where} they are being transferred +to or from. + +\subsubsection{Sources of changesets} +\label{sec:hook:sources} + +Mercurial tells a hook how changesets are being, or were, transferred +between repositories. It provides this information in a +Python parameter named \texttt{source}, or an environment variable named +\envar{HG\_SOURCE}. + +\begin{itemize} +\item[\texttt{serve}] Changesets are being transferred to or from a remote + repository over http or ssh. +\item[\texttt{pull}] Changesets are being transferred via a pull from + one repository into another. +\item[\texttt{push}] Changesets are being transferred via a push from + one repository into another. +\item[\texttt{bundle}] Changesets are being transferred to or from a + bundle.
+\end{itemize} + +\subsubsection{Where changes are going---remote repository URLs} +\label{sec:hook:url} + +When possible, Mercurial will tell a hook the location of the ``far +side'' of an activity that transfers changeset data between +repositories. This is provided by Mercurial in a Python parameter +named \texttt{url}, or an environment variable named \envar{HG\_URL}. + +This information is not always known. If a hook is invoked in a +repository that is being served via http or ssh, Mercurial cannot tell +where the remote repository is, but it may know where the client is +connecting from. In such cases, the URL will take one of the +following forms: +\begin{itemize} +\item \texttt{remote:ssh:\emph{ip-address}}---remote ssh client, at + the given IP address. +\item \texttt{remote:http:\emph{ip-address}}---remote http client, at + the given IP address. If the client is using SSL, this will be of + the form \texttt{remote:https:\emph{ip-address}}. +\item Empty---no information could be discovered about the remote + client. +\end{itemize} + +\section{Hook reference} + +\subsection{\hook{changegroup}---after remote changesets added} +\label{sec:hook:changegroup} + +This hook is run after a group of pre-existing changesets has been +added to the repository, for example via a \hgcmd{pull} or +\hgcmd{unbundle}. This hook is run once per operation that added one +or more changesets. This is in contrast to the \hook{incoming} hook, +which is run once per changeset, regardless of whether the changesets +arrive in a group. + +Some possible uses for this hook include kicking off an automated +build or test of the added changesets, updating a bug database, or +notifying subscribers that a repository contains new changes. + +Parameters to this hook: +\begin{itemize} +\item[\texttt{node}] A changeset ID. The changeset ID of the first + changeset in the group that was added. 
All changesets between this + and \index{tags!\texttt{tip}}\texttt{tip}, inclusive, were added by + a single \hgcmd{pull}, \hgcmd{push} or \hgcmd{unbundle}. +\item[\texttt{source}] A string. The source of these changes. See + section~\ref{sec:hook:sources} for details. +\item[\texttt{url}] A URL. The location of the remote repository, if + known. See section~\ref{sec:hook:url} for more information. +\end{itemize} + +See also: \hook{incoming} (section~\ref{sec:hook:incoming}), +\hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}), +\hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup}) + +\subsection{\hook{commit}---after a new changeset is created} +\label{sec:hook:commit} + +This hook is run after a new changeset has been created. + +Parameters to this hook: +\begin{itemize} +\item[\texttt{node}] A changeset ID. The changeset ID of the newly + committed changeset. +\item[\texttt{parent1}] A changeset ID. The changeset ID of the first + parent of the newly committed changeset. +\item[\texttt{parent2}] A changeset ID. The changeset ID of the second + parent of the newly committed changeset. +\end{itemize} + +See also: \hook{precommit} (section~\ref{sec:hook:precommit}), +\hook{pretxncommit} (section~\ref{sec:hook:pretxncommit}) + +\subsection{\hook{incoming}---after one remote changeset is added} +\label{sec:hook:incoming} + +This hook is run after a pre-existing changeset has been added to the +repository, for example via a \hgcmd{push}. If a group of changesets +was added in a single operation, this hook is called once for each +added changeset. + +You can use this hook for the same purposes as the \hook{changegroup} +hook (section~\ref{sec:hook:changegroup}); it's simply more convenient +sometimes to run a hook once per group of changesets, while other +times it's handier once per changeset. + +Parameters to this hook: +\begin{itemize} +\item[\texttt{node}] A changeset ID. The ID of the newly added + changeset. 
+\item[\texttt{source}] A string. The source of these changes. See + section~\ref{sec:hook:sources} for details. +\item[\texttt{url}] A URL. The location of the remote repository, if + known. See section~\ref{sec:hook:url} for more information. +\end{itemize} + +See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}), \hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}), \hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup}) + +\subsection{\hook{outgoing}---after changesets are propagated} +\label{sec:hook:outgoing} + +This hook is run after a group of changesets has been propagated out +of this repository, for example by a \hgcmd{push} or \hgcmd{bundle} +command. + +One possible use for this hook is to notify administrators that +changes have been pulled. + +Parameters to this hook: +\begin{itemize} +\item[\texttt{node}] A changeset ID. The changeset ID of the first + changeset of the group that was sent. +\item[\texttt{source}] A string. The source of the operation + (see section~\ref{sec:hook:sources}). If a remote client pulled + changes from this repository, \texttt{source} will be + \texttt{serve}. If the client that obtained changes from this + repository was local, \texttt{source} will be \texttt{bundle}, + \texttt{pull}, or \texttt{push}, depending on the operation the + client performed. +\item[\texttt{url}] A URL. The location of the remote repository, if + known. See section~\ref{sec:hook:url} for more information. +\end{itemize} + +See also: \hook{preoutgoing} (section~\ref{sec:hook:preoutgoing}) + +\subsection{\hook{prechangegroup}---before starting to add remote changesets} +\label{sec:hook:prechangegroup} + +This controlling hook is run before Mercurial begins to add a group of +changesets from another repository. + +This hook does not have any information about the changesets to be +added, because it is run before transmission of those changesets is +allowed to begin.
If this hook fails, the changesets will not be +transmitted. + +One use for this hook is to prevent external changes from being added +to a repository. For example, you could use this to ``freeze'' a +server-hosted branch temporarily or permanently so that users cannot +push to it, while still allowing a local administrator to modify the +repository. + +Parameters to this hook: +\begin{itemize} +\item[\texttt{source}] A string. The source of these changes. See + section~\ref{sec:hook:sources} for details. +\item[\texttt{url}] A URL. The location of the remote repository, if + known. See section~\ref{sec:hook:url} for more information. +\end{itemize} + +See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}), +\hook{incoming} (section~\ref{sec:hook:incoming}), +\hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup}) + +\subsection{\hook{precommit}---before starting to commit a changeset} +\label{sec:hook:precommit} + +This hook is run before Mercurial begins to commit a new changeset. +It is run before Mercurial has any of the metadata for the commit, +such as the files to be committed, the commit message, or the commit +date. + +One use for this hook is to disable the ability to commit new +changesets, while still allowing incoming changesets. Another is to +run a build or test, and only allow the commit to begin if the build +or test succeeds. + +Parameters to this hook: +\begin{itemize} +\item[\texttt{parent1}] A changeset ID. The changeset ID of the first + parent of the working directory. +\item[\texttt{parent2}] A changeset ID. The changeset ID of the second + parent of the working directory. +\end{itemize} +If the commit proceeds, the parents of the working directory will +become the parents of the new changeset.
+ +See also: \hook{commit} (section~\ref{sec:hook:commit}), +\hook{pretxncommit} (section~\ref{sec:hook:pretxncommit}) + +\subsection{\hook{preoutgoing}---before starting to propagate changesets} +\label{sec:hook:preoutgoing} + +This controlling hook is run before Mercurial knows the identities of the +changesets to be transmitted. + +One use for this hook is to prevent changes from being transmitted to +another repository. + +Parameters to this hook: +\begin{itemize} +\item[\texttt{source}] A string. The source of the operation that is + attempting to obtain changes from this repository (see + section~\ref{sec:hook:sources}). See the documentation for the + \texttt{source} parameter to the \hook{outgoing} hook, in + section~\ref{sec:hook:outgoing}, for possible values of this + parameter. +\item[\texttt{url}] A URL. The location of the remote repository, if + known. See section~\ref{sec:hook:url} for more information. +\end{itemize} + +See also: \hook{outgoing} (section~\ref{sec:hook:outgoing}) + +\subsection{\hook{pretag}---before tagging a changeset} +\label{sec:hook:pretag} + +This controlling hook is run before a tag is created. If the hook +succeeds, creation of the tag proceeds. If the hook fails, the tag is +not created. + +Parameters to this hook: +\begin{itemize} +\item[\texttt{local}] A boolean. Whether the tag is local to this + repository instance (i.e.~stored in \sfilename{.hg/localtags}) or + managed by Mercurial (stored in \sfilename{.hgtags}). +\item[\texttt{node}] A changeset ID. The ID of the changeset to be tagged. +\item[\texttt{tag}] A string. The name of the tag to be created. +\end{itemize} + +If the tag to be created is revision-controlled, the \hook{precommit} +and \hook{pretxncommit} hooks (sections~\ref{sec:hook:precommit} +and~\ref{sec:hook:pretxncommit}) will also be run.
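To illustrate a controlling hook that uses these parameters, here is a sketch of an in-process \hook{pretag} hook that enforces a naming policy for revision-controlled tags while leaving local tags alone. The policy itself (names such as \texttt{v1.2} or \texttt{v1.2.3}) and the function name are hypothetical examples, not anything Mercurial prescribes.

```python
import re

# Hypothetical policy: shared tags must look like v1.2 or v1.2.3.
RELEASE_TAG = re.compile(r'v\d+\.\d+(\.\d+)?')


def pretag_check_name(ui, repo, node='', tag='', local=False, **kwargs):
    """Sketch of a pretag hook: reject badly named revision-controlled tags."""
    if local:
        # Local tags live in .hg/localtags and are never shared, so allow them.
        return False
    if RELEASE_TAG.fullmatch(tag):
        return False  # success: creation of the tag proceeds
    ui.warn('tag %r does not follow the release naming policy\n' % tag)
    return True       # failure: the tag is not created
```

Because the check runs in \hook{pretag}, a rejected name never reaches \sfilename{.hgtags}, so no commit needs to be rolled back.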
+ +See also: \hook{tag} (section~\ref{sec:hook:tag}) + +\subsection{\hook{pretxnchangegroup}---before completing addition of + remote changesets} +\label{sec:hook:pretxnchangegroup} + +This controlling hook is run before a transaction---that manages the +addition of a group of new changesets from outside the +repository---completes. If the hook succeeds, the transaction +completes, and all of the changesets become permanent within this +repository. If the hook fails, the transaction is rolled back, and +the data for the changesets is erased. + +This hook can access the metadata associated with the almost-added +changesets, but it should not do anything permanent with this data. +It must also not modify the working directory. + +While this hook is running, if other Mercurial processes access this +repository, they will be able to see the almost-added changesets as if +they are permanent. This may lead to race conditions if you do not +take steps to avoid them. + +This hook can be used to automatically vet a group of changesets. If +the hook fails, all of the changesets are ``rejected'' when the +transaction rolls back. + +Parameters to this hook: +\begin{itemize} +\item[\texttt{node}] A changeset ID. The changeset ID of the first + changeset in the group that was added. All changesets between this + and \index{tags!\texttt{tip}}\texttt{tip}, inclusive, were added by + a single \hgcmd{pull}, \hgcmd{push} or \hgcmd{unbundle}. +\item[\texttt{source}] A string. The source of these changes. See + section~\ref{sec:hook:sources} for details. +\item[\texttt{url}] A URL. The location of the remote repository, if + known. See section~\ref{sec:hook:url} for more information. 
+\end{itemize} + +See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}), +\hook{incoming} (section~\ref{sec:hook:incoming}), +\hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}) + +\subsection{\hook{pretxncommit}---before completing commit of new changeset} +\label{sec:hook:pretxncommit} + +This controlling hook is run before a transaction---that manages a new +commit---completes. If the hook succeeds, the transaction completes +and the changeset becomes permanent within this repository. If the +hook fails, the transaction is rolled back, and the commit data is +erased. + +This hook can access the metadata associated with the almost-new +changeset, but it should not do anything permanent with this data. It +must also not modify the working directory. + +While this hook is running, if other Mercurial processes access this +repository, they will be able to see the almost-new changeset as if it +is permanent. This may lead to race conditions if you do not take +steps to avoid them. + +Parameters to this hook: +\begin{itemize} +\item[\texttt{node}] A changeset ID. The changeset ID of the newly + committed changeset. +\item[\texttt{parent1}] A changeset ID. The changeset ID of the first + parent of the newly committed changeset. +\item[\texttt{parent2}] A changeset ID. The changeset ID of the second + parent of the newly committed changeset. +\end{itemize} + +See also: \hook{precommit} (section~\ref{sec:hook:precommit}) + +\subsection{\hook{preupdate}---before updating or merging working directory} +\label{sec:hook:preupdate} + +This controlling hook is run before an update or merge of the working +directory begins. It is run only if Mercurial's normal pre-update +checks determine that the update or merge can proceed. If the hook +succeeds, the update or merge may proceed; if it fails, the update or +merge does not start. + +Parameters to this hook: +\begin{itemize} +\item[\texttt{parent1}] A changeset ID. 
The ID of the parent that the + working directory is to be updated to. If the working directory is + being merged, this parent does not change. +\item[\texttt{parent2}] A changeset ID. Only set if the working + directory is being merged. The ID of the revision that the working + directory is being merged with. +\end{itemize} + +See also: \hook{update} (section~\ref{sec:hook:update}) + +\subsection{\hook{tag}---after tagging a changeset} +\label{sec:hook:tag} + +This hook is run after a tag has been created. + +Parameters to this hook: +\begin{itemize} +\item[\texttt{local}] A boolean. Whether the new tag is local to this + repository instance (i.e.~stored in \sfilename{.hg/localtags}) or + managed by Mercurial (stored in \sfilename{.hgtags}). +\item[\texttt{node}] A changeset ID. The ID of the changeset that was + tagged. +\item[\texttt{tag}] A string. The name of the tag that was created. +\end{itemize} + +If the created tag is revision-controlled, the \hook{commit} hook +(section~\ref{sec:hook:commit}) is run before this hook. + +See also: \hook{pretag} (section~\ref{sec:hook:pretag}) + +\subsection{\hook{update}---after updating or merging working directory} +\label{sec:hook:update} + +This hook is run after an update or merge of the working directory +completes. Since a merge can fail (if the external \command{hgmerge} +command fails to resolve conflicts in a file), this hook communicates +whether the update or merge completed cleanly. + +Parameters to this hook: +\begin{itemize} +\item[\texttt{error}] A boolean. True if the update or merge did not + complete cleanly. +\item[\texttt{parent1}] A changeset ID. The ID of the parent that the + working directory was updated to. If the working directory was + merged, this parent did not change. +\item[\texttt{parent2}] A changeset ID. Only set if the working + directory was merged. The ID of the revision that the working + directory was merged with.
+\end{itemize} + +See also: \hook{preupdate} (section~\ref{sec:hook:preupdate}) + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/ch11-template.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch11-template.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,475 @@ +\chapter{Customising the output of Mercurial} +\label{chap:template} + +Mercurial provides a powerful mechanism to let you control how it +displays information. The mechanism is based on templates. You can +use templates to generate specific output for a single command, or to +customise the entire appearance of the built-in web interface. + +\section{Using precanned output styles} +\label{sec:style} + +Packaged with Mercurial are some output styles that you can use +immediately. A style is simply a precanned template that someone +wrote and installed somewhere that Mercurial can find. + +Before we take a look at Mercurial's bundled styles, let's review its +normal output. + +\interaction{template.simple.normal} + +This is somewhat informative, but it takes up a lot of space---five +lines of output per changeset. The \texttt{compact} style reduces +this to three lines, presented in a sparse manner. + +\interaction{template.simple.compact} + +The \texttt{changelog} style hints at the expressive power of +Mercurial's templating engine. This style attempts to follow the GNU +Project's changelog guidelines\cite{web:changelog}. + +\interaction{template.simple.changelog} + +You will not be shocked to learn that Mercurial's default output style +is named \texttt{default}. + +\subsection{Setting a default style} + +You can modify the output style that Mercurial will use for every +command by editing your \hgrc\ file, naming the style you would +prefer to use. 
+ +\begin{codesample2} + [ui] + style = compact +\end{codesample2} + +If you write a style of your own, you can use it by either providing +the path to your style file, or copying your style file into a +location where Mercurial can find it (typically the \texttt{templates} +subdirectory of your Mercurial install directory). + +\section{Commands that support styles and templates} + +All of Mercurial's ``\texttt{log}-like'' commands let you use styles +and templates: \hgcmd{incoming}, \hgcmd{log}, \hgcmd{outgoing}, and +\hgcmd{tip}. + +As I write this manual, these are so far the only commands that +support styles and templates. Since these are the most important +commands that need customisable output, there has been little pressure +from the Mercurial user community to add style and template support to +other commands. + +\section{The basics of templating} + +At its simplest, a Mercurial template is a piece of text. Some of the +text never changes, while other parts are \emph{expanded}, or replaced +with new text, when necessary. + +Before we continue, let's look again at a simple example of +Mercurial's normal output. + +\interaction{template.simple.normal} + +Now, let's run the same command, but using a template to change its +output. + +\interaction{template.simple.simplest} + +The example above illustrates the simplest possible template; it's +just a piece of static text, printed once for each changeset. The +\hgopt{log}{--template} option to the \hgcmd{log} command tells +Mercurial to use the given text as the template when printing each +changeset. + +Notice that the template string above ends with the text +``\Verb+\n+''. This is an \emph{escape sequence}, telling Mercurial +to print a newline at the end of each template item. If you omit this +newline, Mercurial will run each piece of output together. See +section~\ref{sec:template:escape} for more details of escape sequences. 
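The following toy model, written in Python, may help make the expansion rules concrete. It is not Mercurial's templating engine: it handles only plain \Verb+{keyword}+ substitution (described next, and with no filters) and the escape sequences listed in section~\ref{sec:template:escape}, and it represents a changeset as a plain dictionary. It shows why a template without a trailing \Verb+\n+ runs the output for successive changesets together.

```python
# Escape sequences from the "Escape sequences" section: the character after
# a backslash maps to its replacement.
ESCAPES = {'\\': '\\', 'n': '\n', 'r': '\r', 't': '\t', 'v': '\v',
           '{': '{', '}': '}'}


def expand(template, changeset):
    """Toy template expansion: substitute {keyword} and escape sequences."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == '\\' and i + 1 < len(template) and template[i + 1] in ESCAPES:
            out.append(ESCAPES[template[i + 1]])
            i += 2
        elif ch == '{':
            # Raises ValueError if the brace is never closed.
            end = template.index('}', i)
            out.append(str(changeset[template[i + 1:end]]))
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return ''.join(out)


changesets = [{'rev': 1, 'desc': 'added hello'}, {'rev': 0, 'desc': 'initial'}]

# With a trailing \n, each changeset's output ends with a newline.
print(''.join(expand(r'{rev}: {desc}\n', cs) for cs in changesets), end='')
# Without it, the output runs together: i saw a changeseti saw a changeset
print(''.join(expand('i saw a changeset', cs) for cs in changesets))
```

Note that the template strings use Python raw strings (\texttt{r'...'}), so that \Verb+\n+ reaches \texttt{expand} as a backslash followed by \texttt{n}, just as it does when you type it in a \hgopt{log}{--template} argument.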
+ +A template that prints a fixed string of text all the time isn't very +useful; let's try something a bit more complex. + +\interaction{template.simple.simplesub} + +As you can see, the string ``\Verb+{desc}+'' in the template has been +replaced in the output with the description of each changeset. Every +time Mercurial finds text enclosed in curly braces (``\texttt{\{}'' +and ``\texttt{\}}''), it will try to replace the braces and text with +the expansion of whatever is inside. To print a literal curly brace, +you must escape it, as described in section~\ref{sec:template:escape}. + +\section{Common template keywords} +\label{sec:template:keyword} + +You can start writing simple templates immediately using the keywords +below. + +\begin{itemize} +\item[\tplkword{author}] String. The unmodified author of the changeset. +\item[\tplkword{branches}] String. The name of the branch on which + the changeset was committed. Will be empty if the branch name was + \texttt{default}. +\item[\tplkword{date}] Date information. The date when the changeset + was committed. This is \emph{not} human-readable; you must pass it + through a filter that will render it appropriately. See + section~\ref{sec:template:filter} for more information on filters. + The date is expressed as a pair of numbers. The first number is a + Unix UTC timestamp (seconds since January 1, 1970); the second is + the offset of the committer's timezone from UTC, in seconds. +\item[\tplkword{desc}] String. The text of the changeset description. +\item[\tplkword{files}] List of strings. All files modified, added, or + removed by this changeset. +\item[\tplkword{file\_adds}] List of strings. Files added by this + changeset. +\item[\tplkword{file\_dels}] List of strings. Files removed by this + changeset. +\item[\tplkword{node}] String. The changeset identification hash, as a + 40-character hexadecimal string. +\item[\tplkword{parents}] List of strings. The parents of the + changeset. 
+\item[\tplkword{rev}] Integer. The repository-local changeset revision
  number.
\item[\tplkword{tags}] List of strings. Any tags associated with the
  changeset.
\end{itemize}

A few simple experiments will show us what to expect when we use these
keywords; you can see the results in
figure~\ref{fig:template:keywords}.

\begin{figure}
  \interaction{template.simple.keywords}
  \caption{Template keywords in use}
  \label{fig:template:keywords}
\end{figure}

As we noted above, the \tplkword{date} keyword does not produce
human-readable output, so we must treat it specially. This involves
using a \emph{filter}, about which more in
section~\ref{sec:template:filter}.

\interaction{template.simple.datekeyword}

\section{Escape sequences}
\label{sec:template:escape}

Mercurial's templating engine recognises the most commonly used escape
sequences in strings. When it sees a backslash (``\Verb+\+'')
character, it looks at the following character and replaces the two
characters with a single character, as described below. (The ASCII
codes in this list are given in octal.)

\begin{itemize}
\item[\Verb+\textbackslash\textbackslash+] Backslash, ``\Verb+\+'',
  ASCII~134.
\item[\Verb+\textbackslash n+] Newline, ASCII~12.
\item[\Verb+\textbackslash r+] Carriage return, ASCII~15.
\item[\Verb+\textbackslash t+] Tab, ASCII~11.
\item[\Verb+\textbackslash v+] Vertical tab, ASCII~13.
\item[\Verb+\textbackslash \{+] Open curly brace, ``\Verb+{+'', ASCII~173.
\item[\Verb+\textbackslash \}+] Close curly brace, ``\Verb+}+'', ASCII~175.
\end{itemize}

As indicated above, if you want the expansion of a template to contain
a literal ``\Verb+\+'', ``\Verb+{+'', or ``\Verb+}+'' character, you
must escape it.

\section{Filtering keywords to change their results}
\label{sec:template:filter}

Some of the results of template expansion are not immediately easy to
use. Mercurial lets you specify an optional chain of \emph{filters}
to modify the result of expanding a keyword.
You have already seen a
common filter, \tplkwfilt{date}{isodate}, in action above, to make a
date readable.

Below is a list of the most commonly used filters that Mercurial
supports. While some filters can be applied to any text, others can
only be used in specific circumstances. The name of each filter is
followed first by an indication of where it can be used, then a
description of its effect.

\begin{itemize}
\item[\tplfilter{addbreaks}] Any text. Add an XHTML ``\Verb+<br/>+''
  tag before the end of every line except the last. For example,
  ``\Verb+foo\nbar+'' becomes ``\Verb+foo<br/>\nbar+''.
+\item[\tplkwfilt{date}{age}] \tplkword{date} keyword. Render the
  age of the date, relative to the current time. Yields a string like
  ``\Verb+10 minutes+''.
\item[\tplfilter{basename}] Any text, but most useful for the
  \tplkword{files} keyword and its relatives. Treat the text as a
  path, and return the basename. For example, ``\Verb+foo/bar/baz+''
  becomes ``\Verb+baz+''.
\item[\tplkwfilt{date}{date}] \tplkword{date} keyword. Render a date
  in a similar format to the Unix \command{date} command, but with
  timezone included. Yields a string like
  ``\Verb+Mon Sep 04 15:13:13 2006 -0700+''.
\item[\tplkwfilt{author}{domain}] Any text, but most useful for the
  \tplkword{author} keyword. Find the first string that looks like
  an email address, and extract just the domain component. For
  example, ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes
  ``\Verb+serpentine.com+''.
\item[\tplkwfilt{author}{email}] Any text, but most useful for the
  \tplkword{author} keyword. Extract the first string that looks like
  an email address. For example,
  ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes
  ``\Verb+bos@serpentine.com+''.
\item[\tplfilter{escape}] Any text. Replace the special XML/XHTML
  characters ``\Verb+&+'', ``\Verb+<+'' and ``\Verb+>+'' with
  XML entities.
\item[\tplfilter{fill68}] Any text. Wrap the text to fit in 68
  columns. This is useful if you will then pass the text through the
  \tplfilter{tabindent} filter and still want it to fit in an
  80-column fixed-font window.
\item[\tplfilter{fill76}] Any text. Wrap the text to fit in 76
  columns.
\item[\tplfilter{firstline}] Any text. Yield the first line of text,
  without any trailing newlines.
\item[\tplkwfilt{date}{hgdate}] \tplkword{date} keyword. Render the
  date as a pair of readable numbers. Yields a string like
  ``\Verb+1157407993 25200+''.
\item[\tplkwfilt{date}{isodate}] \tplkword{date} keyword. Render the
  date as a text string in ISO~8601 format.
Yields a string like
  ``\Verb+2006-09-04 15:13:13 -0700+''.
\item[\tplfilter{obfuscate}] Any text, but most useful for the
  \tplkword{author} keyword. Yield the input text rendered as a
  sequence of XML entities. This helps to defeat some particularly
  stupid screen-scraping email harvesting spambots.
\item[\tplkwfilt{author}{person}] Any text, but most useful for the
  \tplkword{author} keyword. Yield the text before an email address.
  For example, ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+''
  becomes ``\Verb+Bryan O'Sullivan+''.
\item[\tplkwfilt{date}{rfc822date}] \tplkword{date} keyword. Render a
  date using the same format used in email headers. Yields a string
  like ``\Verb+Mon, 04 Sep 2006 15:13:13 -0700+''.
\item[\tplkwfilt{node}{short}] Changeset hash. Yield the short form
  of a changeset hash, i.e.~a 12-character hexadecimal string.
\item[\tplkwfilt{date}{shortdate}] \tplkword{date} keyword. Render
  the year, month, and day of the date. Yields a string like
  ``\Verb+2006-09-04+''.
\item[\tplfilter{strip}] Any text. Strip all leading and trailing
  whitespace from the string.
\item[\tplfilter{tabindent}] Any text. Yield the text, with every line
  except the first starting with a tab character.
\item[\tplfilter{urlescape}] Any text. Escape all characters that are
  considered ``special'' by URL parsers. For example, \Verb+foo bar+
  becomes \Verb+foo%20bar+.
\item[\tplkwfilt{author}{user}] Any text, but most useful for the
  \tplkword{author} keyword. Return the ``user'' portion of an email
  address. For example,
  ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes
  ``\Verb+bos+''.
\end{itemize}

Figure~\ref{fig:template:filters} shows a number of these filters in
action.

\begin{figure}
  \interaction{template.simple.manyfilters}
  \caption{Template filters in action}
  \label{fig:template:filters}
\end{figure}

\begin{note}
  If you try to apply a filter to a piece of data that it cannot
  process, Mercurial will fail and print a Python exception.
For
  example, trying to run the output of the \tplkword{desc} keyword
  into the \tplkwfilt{date}{isodate} filter is not a good idea.
\end{note}

\subsection{Combining filters}

It is easy to combine filters to yield output in the form you would
like. The following chain of filters tidies up a description, then
makes sure that it fits cleanly into 68 columns, then indents it by a
further 8~characters (at least on Unix-like systems, where a tab is
conventionally 8~characters wide).

\interaction{template.simple.combine}

Note the use of ``\Verb+\t+'' (a tab character) in the template to
force the first line to be indented; this is necessary since
\tplfilter{tabindent} indents all lines \emph{except} the first.

Keep in mind that the order of filters in a chain is significant. The
first filter is applied to the result of the keyword; the second to
the result of the first filter; and so on. For example, using
\Verb+fill68|tabindent+ gives very different results from
\Verb+tabindent|fill68+.


\section{From templates to styles}

A command line template provides a quick and simple way to format some
output. Templates can become verbose, though, and it's useful to be
able to give a template a name. A style file is a template with a
name, stored in a file.

More than that, using a style file unlocks the power of Mercurial's
templating engine in ways that are not possible using the command line
\hgopt{log}{--template} option.

\subsection{The simplest of style files}

Our simple style file contains just one line:

\interaction{template.simple.rev}

This tells Mercurial, ``if you're printing a changeset, use the text
on the right as the template''.

\subsection{Style file syntax}

The syntax rules for a style file are simple.

\begin{itemize}
\item The file is processed one line at a time.

\item Leading and trailing white space are ignored.

\item Empty lines are skipped.
+ +\item If a line starts with either of the characters ``\texttt{\#}'' or + ``\texttt{;}'', the entire line is treated as a comment, and skipped + as if empty. + +\item A line starts with a keyword. This must start with an + alphabetic character or underscore, and can subsequently contain any + alphanumeric character or underscore. (In regexp notation, a + keyword must match \Verb+[A-Za-z_][A-Za-z0-9_]*+.) + +\item The next element must be an ``\texttt{=}'' character, which can + be preceded or followed by an arbitrary amount of white space. + +\item If the rest of the line starts and ends with matching quote + characters (either single or double quote), it is treated as a + template body. + +\item If the rest of the line \emph{does not} start with a quote + character, it is treated as the name of a file; the contents of this + file will be read and used as a template body. +\end{itemize} + +\section{Style files by example} + +To illustrate how to write a style file, we will construct a few by +example. Rather than provide a complete style file and walk through +it, we'll mirror the usual process of developing a style file by +starting with something very simple, and walking through a series of +successively more complete examples. + +\subsection{Identifying mistakes in style files} + +If Mercurial encounters a problem in a style file you are working on, +it prints a terse error message that, once you figure out what it +means, is actually quite useful. + +\interaction{template.svnstyle.syntax.input} + +Notice that \filename{broken.style} attempts to define a +\texttt{changeset} keyword, but forgets to give any content for it. +When instructed to use this style file, Mercurial promptly complains. + +\interaction{template.svnstyle.syntax.error} + +This error message looks intimidating, but it is not too hard to +follow. + +\begin{itemize} +\item The first component is simply Mercurial's way of saying ``I am + giving up''. 
+ \begin{codesample4} + \textbf{abort:} broken.style:1: parse error + \end{codesample4} + +\item Next comes the name of the style file that contains the error. + \begin{codesample4} + abort: \textbf{broken.style}:1: parse error + \end{codesample4} + +\item Following the file name is the line number where the error was + encountered. + \begin{codesample4} + abort: broken.style:\textbf{1}: parse error + \end{codesample4} + +\item Finally, a description of what went wrong. + \begin{codesample4} + abort: broken.style:1: \textbf{parse error} + \end{codesample4} + The description of the problem is not always clear (as in this + case), but even when it is cryptic, it is almost always trivial to + visually inspect the offending line in the style file and see what + is wrong. +\end{itemize} + +\subsection{Uniquely identifying a repository} + +If you would like to be able to identify a Mercurial repository +``fairly uniquely'' using a short string as an identifier, you can +use the first revision in the repository. +\interaction{template.svnstyle.id} +This is not guaranteed to be unique, but it is nevertheless useful in +many cases. +\begin{itemize} +\item It will not work in a completely empty repository, because such + a repository does not have a revision~zero. +\item Neither will it work in the (extremely rare) case where a + repository is a merge of two or more formerly independent + repositories, and you still have those repositories around. +\end{itemize} +Here are some uses to which you could put this identifier: +\begin{itemize} +\item As a key into a table for a database that manages repositories + on a server. +\item As half of a \{\emph{repository~ID}, \emph{revision~ID}\} tuple. + Save this information away when you run an automated build or other + activity, so that you can ``replay'' the build later if necessary. 
+\end{itemize} + +\subsection{Mimicking Subversion's output} + +Let's try to emulate the default output format used by another +revision control tool, Subversion. +\interaction{template.svnstyle.short} + +Since Subversion's output style is fairly simple, it is easy to +copy-and-paste a hunk of its output into a file, and replace the text +produced above by Subversion with the template values we'd like to see +expanded. +\interaction{template.svnstyle.template} + +There are a few small ways in which this template deviates from the +output produced by Subversion. +\begin{itemize} +\item Subversion prints a ``readable'' date (the ``\texttt{Wed, 27 Sep + 2006}'' in the example output above) in parentheses. Mercurial's + templating engine does not provide a way to display a date in this + format without also printing the time and time zone. +\item We emulate Subversion's printing of ``separator'' lines full of + ``\texttt{-}'' characters by ending the template with such a line. + We use the templating engine's \tplkword{header} keyword to print a + separator line as the first line of output (see below), thus + achieving similar output to Subversion. +\item Subversion's output includes a count in the header of the number + of lines in the commit message. We cannot replicate this in + Mercurial; the templating engine does not currently provide a filter + that counts the number of lines the template generates. +\end{itemize} +It took me no more than a minute or two of work to replace literal +text from an example of Subversion's output with some keywords and +filters to give the template above. The style file simply refers to +the template. +\interaction{template.svnstyle.style} + +We could have included the text of the template file directly in the +style file by enclosing it in quotes and replacing the newlines with +``\verb!\n!'' sequences, but it would have made the style file too +difficult to read. 
Readability is a good guide when you're trying to +decide whether some text belongs in a style file, or in a template +file that the style file points to. If the style file will look too +big or cluttered if you insert a literal piece of text, drop it into a +template instead. + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/ch12-mq.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch12-mq.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,1043 @@ +\chapter{Managing change with Mercurial Queues} +\label{chap:mq} + +\section{The patch management problem} +\label{sec:mq:patch-mgmt} + +Here is a common scenario: you need to install a software package from +source, but you find a bug that you must fix in the source before you +can start using the package. You make your changes, forget about the +package for a while, and a few months later you need to upgrade to a +newer version of the package. If the newer version of the package +still has the bug, you must extract your fix from the older source +tree and apply it against the newer version. This is a tedious task, +and it's easy to make mistakes. + +This is a simple case of the ``patch management'' problem. You have +an ``upstream'' source tree that you can't change; you need to make +some local changes on top of the upstream tree; and you'd like to be +able to keep those changes separate, so that you can apply them to +newer versions of the upstream source. + +The patch management problem arises in many situations. Probably the +most visible is that a user of an open source software project will +contribute a bug fix or new feature to the project's maintainers in the +form of a patch. + +Distributors of operating systems that include open source software +often need to make changes to the packages they distribute so that +they will build properly in their environments. 
+ +When you have few changes to maintain, it is easy to manage a single +patch using the standard \command{diff} and \command{patch} programs +(see section~\ref{sec:mq:patch} for a discussion of these tools). +Once the number of changes grows, it starts to make sense to maintain +patches as discrete ``chunks of work,'' so that for example a single +patch will contain only one bug fix (the patch might modify several +files, but it's doing ``only one thing''), and you may have a number +of such patches for different bugs you need fixed and local changes +you require. In this situation, if you submit a bug fix patch to the +upstream maintainers of a package and they include your fix in a +subsequent release, you can simply drop that single patch when you're +updating to the newer release. + +Maintaining a single patch against an upstream tree is a little +tedious and error-prone, but not difficult. However, the complexity +of the problem grows rapidly as the number of patches you have to +maintain increases. With more than a tiny number of patches in hand, +understanding which ones you have applied and maintaining them moves +from messy to overwhelming. + +Fortunately, Mercurial includes a powerful extension, Mercurial Queues +(or simply ``MQ''), that massively simplifies the patch management +problem. + +\section{The prehistory of Mercurial Queues} +\label{sec:mq:history} + +During the late 1990s, several Linux kernel developers started to +maintain ``patch series'' that modified the behaviour of the Linux +kernel. Some of these series were focused on stability, some on +feature coverage, and others were more speculative. + +The sizes of these patch series grew rapidly. In 2002, Andrew Morton +published some shell scripts he had been using to automate the task of +managing his patch queues. Andrew was successfully using these +scripts to manage hundreds (sometimes thousands) of patches on top of +the Linux kernel. 
+ +\subsection{A patchwork quilt} +\label{sec:mq:quilt} + +In early 2003, Andreas Gruenbacher and Martin Quinson borrowed the +approach of Andrew's scripts and published a tool called ``patchwork +quilt''~\cite{web:quilt}, or simply ``quilt'' +(see~\cite{gruenbacher:2005} for a paper describing it). Because +quilt substantially automated patch management, it rapidly gained a +large following among open source software developers. + +Quilt manages a \emph{stack of patches} on top of a directory tree. +To begin, you tell quilt to manage a directory tree, and tell it which +files you want to manage; it stores away the names and contents of +those files. To fix a bug, you create a new patch (using a single +command), edit the files you need to fix, then ``refresh'' the patch. + +The refresh step causes quilt to scan the directory tree; it updates +the patch with all of the changes you have made. You can create +another patch on top of the first, which will track the changes +required to modify the tree from ``tree with one patch applied'' to +``tree with two patches applied''. + +You can \emph{change} which patches are applied to the tree. If you +``pop'' a patch, the changes made by that patch will vanish from the +directory tree. Quilt remembers which patches you have popped, +though, so you can ``push'' a popped patch again, and the directory +tree will be restored to contain the modifications in the patch. Most +importantly, you can run the ``refresh'' command at any time, and the +topmost applied patch will be updated. This means that you can, at +any time, change both which patches are applied and what +modifications those patches make. + +Quilt knows nothing about revision control tools, so it works equally +well on top of an unpacked tarball or a Subversion working copy. 
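
The push, pop, and refresh model described above can be sketched as a
stack sitting on an ordered series of patches. The Python below is a
hypothetical, simplified model, not code from quilt or MQ: the class
and method names (\texttt{PatchStack}, \texttt{new}, \texttt{refresh},
\texttt{pop}, \texttt{push}) are assumptions, and each patch is just a
string standing in for a real diff.

\begin{verbatim}
from dataclasses import dataclass, field


# Hypothetical model of a quilt-style patch stack (not quilt's or MQ's code).
@dataclass
class PatchStack:
    series: list = field(default_factory=list)    # every known patch, in order
    contents: dict = field(default_factory=dict)  # patch name -> patch text
    applied: int = 0                              # first `applied` entries are applied

    def new(self, name):
        # In this sketch a new patch goes directly above the topmost
        # applied patch, and is applied immediately.
        self.series.insert(self.applied, name)
        self.contents[name] = ""
        self.applied += 1

    def top(self):
        return self.series[self.applied - 1] if self.applied else None

    def refresh(self, changes):
        # Fold the current working-directory changes into the top patch.
        if not self.applied:
            raise RuntimeError("no patches applied")
        self.contents[self.top()] = changes

    def pop(self):
        # A popped patch stays in the series; it is only unapplied.
        if not self.applied:
            raise RuntimeError("no patches applied")
        self.applied -= 1

    def push(self):
        if self.applied == len(self.series):
            raise RuntimeError("all patches applied")
        self.applied += 1

    def applied_patches(self):
        return self.series[: self.applied]


stack = PatchStack()
stack.new("fix-bug")
stack.refresh("changes for the bug fix")
stack.new("new-feature")
stack.pop()
print(stack.applied_patches())  # ['fix-bug']
print(stack.series)             # ['fix-bug', 'new-feature']
\end{verbatim}

The key design point is that the series (everything the tool knows
about) is kept separate from how many of those patches are currently
applied. Popping a patch therefore never loses it, and a later push
restores it.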
+ +\subsection{From patchwork quilt to Mercurial Queues} +\label{sec:mq:quilt-mq} + +In mid-2005, Chris Mason took the features of quilt and wrote an +extension that he called Mercurial Queues, which added quilt-like +behaviour to Mercurial. + +The key difference between quilt and MQ is that quilt knows nothing +about revision control systems, while MQ is \emph{integrated} into +Mercurial. Each patch that you push is represented as a Mercurial +changeset. Pop a patch, and the changeset goes away. + +Because quilt does not care about revision control tools, it is still +a tremendously useful piece of software to know about for situations +where you cannot use Mercurial and MQ. + +\section{The huge advantage of MQ} + +I cannot overstate the value that MQ offers through the unification of +patches and revision control. + +A major reason that patches have persisted in the free software and +open source world---in spite of the availability of increasingly +capable revision control tools over the years---is the \emph{agility} +they offer. + +Traditional revision control tools make a permanent, irreversible +record of everything that you do. While this has great value, it's +also somewhat stifling. If you want to perform a wild-eyed +experiment, you have to be careful in how you go about it, or you risk +leaving unneeded---or worse, misleading or destabilising---traces of +your missteps and errors in the permanent revision record. + +By contrast, MQ's marriage of distributed revision control with +patches makes it much easier to isolate your work. Your patches live +on top of normal revision history, and you can make them disappear or +reappear at will. If you don't like a patch, you can drop it. If a +patch isn't quite as you want it to be, simply fix it---as many times +as you need to, until you have refined it into the form you desire. 
+ +As an example, the integration of patches with revision control makes +understanding patches and debugging their effects---and their +interplay with the code they're based on---\emph{enormously} easier. +Since every applied patch has an associated changeset, you can use +\hgcmdargs{log}{\emph{filename}} to see which changesets and patches +affected a file. You can use the \hgext{bisect} command to +binary-search through all changesets and applied patches to see where +a bug got introduced or fixed. You can use the \hgcmd{annotate} +command to see which changeset or patch modified a particular line of +a source file. And so on. + +\section{Understanding patches} +\label{sec:mq:patch} + +Because MQ doesn't hide its patch-oriented nature, it is helpful to +understand what patches are, and a little about the tools that work +with them. + +The traditional Unix \command{diff} command compares two files, and +prints a list of differences between them. The \command{patch} command +understands these differences as \emph{modifications} to make to a +file. Take a look at figure~\ref{ex:mq:diff} for a simple example of +these commands in action. + +\begin{figure}[ht] + \interaction{mq.dodiff.diff} + \caption{Simple uses of the \command{diff} and \command{patch} commands} + \label{ex:mq:diff} +\end{figure} + +The type of file that \command{diff} generates (and \command{patch} +takes as input) is called a ``patch'' or a ``diff''; there is no +difference between a patch and a diff. (We'll use the term ``patch'', +since it's more commonly used.) + +A patch file can start with arbitrary text; the \command{patch} +command ignores this text, but MQ uses it as the commit message when +creating changesets. To find the beginning of the patch content, +\command{patch} searches for the first line that starts with the +string ``\texttt{diff~-}''. + +MQ works with \emph{unified} diffs (\command{patch} can accept several +other diff formats, but MQ doesn't). 
A unified diff contains two
kinds of header. The \emph{file header} describes the file being
modified; it contains the name of the file to modify. When
\command{patch} sees a new file header, it looks for a file with that
name to start modifying.

After the file header comes a series of \emph{hunks}. Each hunk
starts with a header; this identifies the range of line numbers within
the file that the hunk should modify. Following the header, a hunk
starts and ends with a few (usually three) lines of text from the
unmodified file; these are called the \emph{context} for the hunk. If
there's only a small amount of context between successive hunks,
\command{diff} doesn't print a new hunk header; it just runs the hunks
together, with a few lines of context between modifications.

Each line of context begins with a space character. Within the hunk,
a line that begins with ``\texttt{-}'' means ``remove this line,''
while a line that begins with ``\texttt{+}'' means ``insert this
line.'' For example, a line that is modified is represented by one
deletion and one insertion.

We will return to some of the more subtle aspects of patches later (in
section~\ref{sec:mq:adv-patch}), but you should have enough information
now to use MQ.

\section{Getting started with Mercurial Queues}
\label{sec:mq:start}

Because MQ is implemented as an extension, you must explicitly enable
it before you can use it. (You don't need to download anything; MQ
ships with the standard Mercurial distribution.) To enable MQ, edit
your \tildefile{.hgrc} file, and add the lines in
figure~\ref{ex:mq:config}.

\begin{figure}[ht]
  \begin{codesample4}
    [extensions]
    hgext.mq =
  \end{codesample4}
  \caption{Contents to add to \tildefile{.hgrc} to enable the MQ extension}
  \label{ex:mq:config}
\end{figure}

Once the extension is enabled, it will make a number of new commands
available.
To verify that the extension is working, you can use +\hgcmd{help} to see if the \hgxcmd{mq}{qinit} command is now available; see +the example in figure~\ref{ex:mq:enabled}. + +\begin{figure}[ht] + \interaction{mq.qinit-help.help} + \caption{How to verify that MQ is enabled} + \label{ex:mq:enabled} +\end{figure} + +You can use MQ with \emph{any} Mercurial repository, and its commands +only operate within that repository. To get started, simply prepare +the repository using the \hgxcmd{mq}{qinit} command (see +figure~\ref{ex:mq:qinit}). This command creates an empty directory +called \sdirname{.hg/patches}, where MQ will keep its metadata. As +with many Mercurial commands, the \hgxcmd{mq}{qinit} command prints nothing +if it succeeds. + +\begin{figure}[ht] + \interaction{mq.tutorial.qinit} + \caption{Preparing a repository for use with MQ} + \label{ex:mq:qinit} +\end{figure} + +\begin{figure}[ht] + \interaction{mq.tutorial.qnew} + \caption{Creating a new patch} + \label{ex:mq:qnew} +\end{figure} + +\subsection{Creating a new patch} + +To begin work on a new patch, use the \hgxcmd{mq}{qnew} command. This +command takes one argument, the name of the patch to create. MQ will +use this as the name of an actual file in the \sdirname{.hg/patches} +directory, as you can see in figure~\ref{ex:mq:qnew}. + +Also newly present in the \sdirname{.hg/patches} directory are two +other files, \sfilename{series} and \sfilename{status}. The +\sfilename{series} file lists all of the patches that MQ knows about +for this repository, with one patch per line. Mercurial uses the +\sfilename{status} file for internal book-keeping; it tracks all of the +patches that MQ has \emph{applied} in this repository. + +\begin{note} + You may sometimes want to edit the \sfilename{series} file by hand; + for example, to change the sequence in which some patches are + applied. 
However, manually editing the \sfilename{status} file is
  almost always a bad idea, as it's easy to corrupt MQ's idea of what
  is happening.
\end{note}

Once you have created your new patch, you can edit files in the
working directory as you usually would. All of the normal Mercurial
commands, such as \hgcmd{diff} and \hgcmd{annotate}, work exactly as
they did before.

\subsection{Refreshing a patch}

When you reach a point where you want to save your work, use the
\hgxcmd{mq}{qrefresh} command (figure~\ref{ex:mq:qrefresh}) to update
the patch you are working on. This command folds the changes you have
made in the working directory into your patch, and updates its
corresponding changeset to contain those changes.

\begin{figure}[ht]
  \interaction{mq.tutorial.qrefresh}
  \caption{Refreshing a patch}
  \label{ex:mq:qrefresh}
\end{figure}

You can run \hgxcmd{mq}{qrefresh} as often as you like (see
figure~\ref{ex:mq:qrefresh2}), so it's a good way to ``checkpoint''
your work. Refresh your patch at an opportune time; try an
experiment; and if the experiment doesn't work out, \hgcmd{revert}
your modifications back to the last time you refreshed.

\begin{figure}[ht]
  \interaction{mq.tutorial.qrefresh2}
  \caption{Refresh a patch many times to accumulate changes}
  \label{ex:mq:qrefresh2}
\end{figure}

\subsection{Stacking and tracking patches}

Once you have finished working on a patch, or need to work on another,
you can use the \hgxcmd{mq}{qnew} command again to create a new patch.
Mercurial will apply this patch on top of your existing patch. See
figure~\ref{ex:mq:qnew2} for an example. Notice that the patch
contains the changes in our prior patch as part of its context (you
can see this more clearly in the output of \hgcmd{annotate}).
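
Why does the second patch show the first patch's change as context?
Because it is computed against the tree \emph{with the first patch
already applied}. The sketch below illustrates this with Python's
standard \texttt{difflib} module and made-up file contents. It is not
what MQ runs internally. It only shows that a diff taken on top of the
first patch treats that patch's edit as unchanged context.

\begin{verbatim}
import difflib

# Made-up file contents, for illustration only.
after_first = [
    "line 1\n",
    "line 2 (changed by first patch)\n",
    "line 3\n",
]
# The working file after editing for the second patch.
after_second = after_first + ["line 4 (added by second patch)\n"]

# The second patch is relative to the tree with the first patch applied,
# so the first patch's edit appears as a context line (leading space).
second_patch = difflib.unified_diff(after_first, after_second,
                                    "a/file", "b/file")
print("".join(second_patch), end="")
\end{verbatim}

In the printed hunk, ``\texttt{line 2 (changed by first patch)}''
starts with a space, marking it as context. Only the new fourth line
starts with ``\texttt{+}''.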
+ +\begin{figure}[ht] + \interaction{mq.tutorial.qnew2} + \caption{Stacking a second patch on top of the first} + \label{ex:mq:qnew2} +\end{figure} + +So far, with the exception of \hgxcmd{mq}{qnew} and \hgxcmd{mq}{qrefresh}, we've +been careful to only use regular Mercurial commands. However, MQ +provides many commands that are easier to use when you are thinking +about patches, as illustrated in figure~\ref{ex:mq:qseries}: + +\begin{itemize} +\item The \hgxcmd{mq}{qseries} command lists every patch that MQ knows + about in this repository, from oldest to newest (most recently + \emph{created}). +\item The \hgxcmd{mq}{qapplied} command lists every patch that MQ has + \emph{applied} in this repository, again from oldest to newest (most + recently applied). +\end{itemize} + +\begin{figure}[ht] + \interaction{mq.tutorial.qseries} + \caption{Understanding the patch stack with \hgxcmd{mq}{qseries} and + \hgxcmd{mq}{qapplied}} + \label{ex:mq:qseries} +\end{figure} + +\subsection{Manipulating the patch stack} + +The previous discussion implied that there must be a difference +between ``known'' and ``applied'' patches, and there is. MQ can +manage a patch without it being applied in the repository. + +An \emph{applied} patch has a corresponding changeset in the +repository, and the effects of the patch and changeset are visible in +the working directory. You can undo the application of a patch using +the \hgxcmd{mq}{qpop} command. MQ still \emph{knows about}, or manages, a +popped patch, but the patch no longer has a corresponding changeset in +the repository, and the working directory does not contain the changes +made by the patch. Figure~\ref{fig:mq:stack} illustrates the +difference between applied and tracked patches. + +\begin{figure}[ht] + \centering + \grafix{mq-stack} + \caption{Applied and unapplied patches in the MQ patch stack} + \label{fig:mq:stack} +\end{figure} + +You can reapply an unapplied, or popped, patch using the \hgxcmd{mq}{qpush} +command. 
This creates a new changeset to correspond to the patch, and
the patch's changes once again become present in the working
directory. See figure~\ref{ex:mq:qpop} for examples of \hgxcmd{mq}{qpop}
and \hgxcmd{mq}{qpush} in action. Notice that once we have popped one
or two patches, the output of \hgxcmd{mq}{qseries} remains the same,
while that of \hgxcmd{mq}{qapplied} has changed.

\begin{figure}[ht]
  \interaction{mq.tutorial.qpop}
  \caption{Modifying the stack of applied patches}
  \label{ex:mq:qpop}
\end{figure}

\subsection{Pushing and popping many patches}

While \hgxcmd{mq}{qpush} and \hgxcmd{mq}{qpop} each operate on a single patch at
a time by default, you can push and pop many patches in one go. The
\hgxopt{mq}{qpush}{-a} option to \hgxcmd{mq}{qpush} causes it to push all
unapplied patches (see figure~\ref{ex:mq:qpush-a}), while the
\hgxopt{mq}{qpop}{-a} option to \hgxcmd{mq}{qpop} causes it to pop all
applied patches. (For some more ways to push and pop many patches, see
section~\ref{sec:mq:perf} below.)

\begin{figure}[ht]
  \interaction{mq.tutorial.qpush-a}
  \caption{Pushing all unapplied patches}
  \label{ex:mq:qpush-a}
\end{figure}

\subsection{Safety checks, and overriding them}

Several MQ commands check the working directory before they do
anything, and fail if they find any modifications. They do this to
ensure that you won't lose any changes that you have made but not yet
incorporated into a patch. Figure~\ref{ex:mq:add} illustrates this;
the \hgxcmd{mq}{qnew} command will not create a new patch if there are
outstanding changes, caused in this case by the \hgcmd{add} of
\filename{file3}.

\begin{figure}[ht]
  \interaction{mq.tutorial.add}
  \caption{Forcibly creating a patch}
  \label{ex:mq:add}
\end{figure}

Commands that check the working directory all take an ``I know what
I'm doing'' option, which is always named \option{-f}. The exact
meaning of \option{-f} depends on the command.
For example, +\hgcmdargs{qnew}{\hgxopt{mq}{qnew}{-f}} will incorporate any outstanding +changes into the new patch it creates, but +\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-f}} will revert modifications to any +files affected by the patch that it is popping. Be sure to read the +documentation for a command's \option{-f} option before you use it! + +\subsection{Working on several patches at once} + +The \hgxcmd{mq}{qrefresh} command always refreshes the \emph{topmost} +applied patch. This means that you can suspend work on one patch (by +refreshing it), pop or push to make a different patch the top, and +work on \emph{that} patch for a while. + +Here's an example that illustrates how you can use this ability. +Let's say you're developing a new feature as two patches. The first +is a change to the core of your software, and the second---layered on +top of the first---changes the user interface to use the code you just +added to the core. If you notice a bug in the core while you're +working on the UI patch, it's easy to fix the core. Simply +\hgxcmd{mq}{qrefresh} the UI patch to save your in-progress changes, and +\hgxcmd{mq}{qpop} down to the core patch. Fix the core bug, +\hgxcmd{mq}{qrefresh} the core patch, and \hgxcmd{mq}{qpush} back to the UI +patch to continue where you left off. + +\section{More about patches} +\label{sec:mq:adv-patch} + +MQ uses the GNU \command{patch} command to apply patches, so it's +helpful to know a few more detailed aspects of how \command{patch} +works, and about patches themselves. + +\subsection{The strip count} + +If you look at the file headers in a patch, you will notice that the +pathnames usually have an extra component on the front that isn't +present in the actual path name. This is a holdover from the way that +people used to generate patches (people still do this, but it's +somewhat rare with modern revision control tools). + +Alice would unpack a tarball, edit her files, then decide that she +wanted to create a patch. 
So she'd rename her working directory,
+unpack the tarball again (hence the need for the rename), and use the
+\cmdopt{diff}{-r} and \cmdopt{diff}{-N} options to \command{diff} to
+recursively generate a patch between the unmodified directory and the
+modified one. The result would be that the name of the unmodified
+directory would be at the front of the left-hand path in every file
+header, and the name of the modified directory would be at the front
+of the right-hand path.
+
+Since someone receiving a patch from the Alices of the net would be
+unlikely to have unmodified and modified directories with exactly the
+same names, the \command{patch} command has a \cmdopt{patch}{-p}
+option that indicates the number of leading path name components to
+strip when trying to apply a patch. This number is called the
+\emph{strip count}.
+
+An option of ``\texttt{-p1}'' means ``use a strip count of one''. If
+\command{patch} sees a file name \filename{foo/bar/baz} in a file
+header, it will strip \filename{foo} and try to patch a file named
+\filename{bar/baz}. (Strictly speaking, the strip count refers to the
+number of \emph{path separators}, and the components that go with
+them, to strip. A strip count of one will turn \filename{foo/bar} into
+\filename{bar}, but \filename{/foo/bar} (notice the extra leading
+slash) into \filename{foo/bar}.)
+
+The ``standard'' strip count for patches is one; almost all patches
+contain one leading path name component that needs to be stripped.
+Mercurial's \hgcmd{diff} command generates path names in this form,
+and the \hgcmd{import} command and MQ expect patches to have a strip
+count of one.
+
+If you receive a patch from someone that you want to add to your patch
+queue, and the patch needs a strip count other than one, you cannot
+just \hgxcmd{mq}{qimport} the patch, because \hgxcmd{mq}{qimport} does not yet
+have a \texttt{-p} option (see~\bug{311}).
Your best bet is to
+\hgxcmd{mq}{qnew} a patch of your own, then use \cmdargs{patch}{-p\emph{N}}
+to apply their patch, followed by \hgcmd{addremove} to pick up any
+files added or removed by the patch, and finally \hgxcmd{mq}{qrefresh}.
+This complexity may become unnecessary; see~\bug{311} for details.
+\subsection{Strategies for applying a patch}
+
+When \command{patch} applies a hunk, it tries a handful of
+successively less accurate strategies to make the hunk apply.
+This falling-back technique often makes it possible to take a patch
+that was generated against an old version of a file, and apply it
+against a newer version of that file.
+
+First, \command{patch} tries an exact match, where the line numbers,
+the context, and the text to be modified must apply exactly. If it
+cannot make an exact match, it tries to find an exact match for the
+context, without honouring the line numbering information. If this
+succeeds, it prints a line of output saying that the hunk was applied,
+but at some \emph{offset} from the original line number.
+
+If a context-only match fails, \command{patch} removes the first and
+last lines of the context, and tries a \emph{reduced} context-only
+match. If the hunk with reduced context succeeds, it prints a message
+saying that it applied the hunk with a \emph{fuzz factor} (the
+number of lines of context that
+\command{patch} had to trim before the hunk applied).
+
+When neither of these techniques works, \command{patch} prints a
+message saying that the hunk in question was rejected. It saves
+rejected hunks (also simply called ``rejects'') to a file with the
+same name, and an added \sfilename{.rej} extension. It also saves an
+unmodified copy of the file with a \sfilename{.orig} extension; the
+copy of the file without any extensions will contain any changes made
+by hunks that \emph{did} apply cleanly.
If you have a patch that +modifies \filename{foo} with six hunks, and one of them fails to +apply, you will have: an unmodified \filename{foo.orig}, a +\filename{foo.rej} containing one hunk, and \filename{foo}, containing +the changes made by the five successful hunks. + +\subsection{Some quirks of patch representation} + +There are a few useful things to know about how \command{patch} works +with files. +\begin{itemize} +\item This should already be obvious, but \command{patch} cannot + handle binary files. +\item Neither does it care about the executable bit; it creates new + files as readable, but not executable. +\item \command{patch} treats the removal of a file as a diff between + the file to be removed and the empty file. So your idea of ``I + deleted this file'' looks like ``every line of this file was + deleted'' in a patch. +\item It treats the addition of a file as a diff between the empty + file and the file to be added. So in a patch, your idea of ``I + added this file'' looks like ``every line of this file was added''. +\item It treats a renamed file as the removal of the old name, and the + addition of the new name. This means that renamed files have a big + footprint in patches. (Note also that Mercurial does not currently + try to infer when files have been renamed or copied in a patch.) +\item \command{patch} cannot represent empty files, so you cannot use + a patch to represent the notion ``I added this empty file to the + tree''. +\end{itemize} +\subsection{Beware the fuzz} + +While applying a hunk at an offset, or with a fuzz factor, will often +be completely successful, these inexact techniques naturally leave +open the possibility of corrupting the patched file. The most common +cases typically involve applying a patch twice, or at an incorrect +location in the file. If \command{patch} or \hgxcmd{mq}{qpush} ever +mentions an offset or fuzz factor, you should make sure that the +modified files are correct afterwards. 
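Because an offset or fuzz message is easy to overlook in a long run of output, it can help to filter for it. The following POSIX shell function is a minimal sketch, assuming the message wording used by GNU \command{patch} (for example ``succeeded at 57 (offset 3 lines)'', ``succeeded at 10 with fuzz 2'', and ``FAILED''); other versions of \command{patch} may word these messages differently, and the name \texttt{check\_patch\_output} is hypothetical. It prints the suspicious lines and returns a non-zero status if it found any.

```shell
#!/bin/sh
# check_patch_output (hypothetical helper): read patch's messages on stdin,
# print every line that reports an offset, a fuzz factor, or a rejected
# hunk, and fail if there was at least one. The patterns assume GNU
# patch's wording, e.g. "Hunk #2 succeeded at 57 (offset 3 lines)."
check_patch_output() {
    if grep -E 'offset -?[0-9]+ lines?\)|with fuzz [0-9]+|FAILED'; then
        echo "warning: some hunks did not apply exactly; check the result" >&2
        return 1
    fi
    return 0
}
```

For example, \texttt{patch -p1 < fix.patch | check\_patch\_output} would apply a patch by hand and then point you at any hunks whose results deserve a second look. Returning failure when something was found lets the check sit in an \texttt{if} or a chain of commands that should stop for review.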
+ +It's often a good idea to refresh a patch that has applied with an +offset or fuzz factor; refreshing the patch generates new context +information that will make it apply cleanly. I say ``often,'' not +``always,'' because sometimes refreshing a patch will make it fail to +apply against a different revision of the underlying files. In some +cases, such as when you're maintaining a patch that must sit on top of +multiple versions of a source tree, it's acceptable to have a patch +apply with some fuzz, provided you've verified the results of the +patching process in such cases. + +\subsection{Handling rejection} + +If \hgxcmd{mq}{qpush} fails to apply a patch, it will print an error +message and exit. If it has left \sfilename{.rej} files behind, it is +usually best to fix up the rejected hunks before you push more patches +or do any further work. + +If your patch \emph{used to} apply cleanly, and no longer does because +you've changed the underlying code that your patches are based on, +Mercurial Queues can help; see section~\ref{sec:mq:merge} for details. + +Unfortunately, there aren't any great techniques for dealing with +rejected hunks. Most often, you'll need to view the \sfilename{.rej} +file and edit the target file, applying the rejected hunks by hand. + +If you're feeling adventurous, Neil Brown, a Linux kernel hacker, +wrote a tool called \command{wiggle}~\cite{web:wiggle}, which is more +vigorous than \command{patch} in its attempts to make a patch apply. + +Another Linux kernel hacker, Chris Mason (the author of Mercurial +Queues), wrote a similar tool called +\command{mpatch}~\cite{web:mpatch}, which takes a simple approach to +automating the application of hunks rejected by \command{patch}. The +\command{mpatch} command can help with four common reasons that a hunk +may be rejected: + +\begin{itemize} +\item The context in the middle of a hunk has changed. +\item A hunk is missing some context at the beginning or end. 
+\item A large hunk might apply better---either entirely or in
+  part---if it were broken up into smaller hunks.
+\item A hunk removes lines with slightly different content than those
+  currently present in the file.
+\end{itemize}
+
+If you use \command{wiggle} or \command{mpatch}, you should be doubly
+careful to check your results when you're done. In fact,
+\command{mpatch} enforces this method of double-checking the tool's
+output, by automatically dropping you into a merge program when it has
+done its job, so that you can verify its work and finish off any
+remaining merges.
+
+\section{Getting the best performance out of MQ}
+\label{sec:mq:perf}
+
+MQ is very efficient at handling a large number of patches. I ran
+some performance experiments in mid-2006 for a talk that I gave at the
+2006 EuroPython conference~\cite{web:europython}. I used as my data
+set the Linux 2.6.17-mm1 patch series, which consists of 1,738
+patches. I applied these on top of a Linux kernel repository
+containing all 27,472 revisions between Linux 2.6.12-rc2 and Linux
+2.6.17.
+
+On my old, slow laptop, I was able to
+\hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} all 1,738 patches in 3.5 minutes,
+and \hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}} them all in 30 seconds. (On a
+newer laptop, the time to push all patches dropped to two minutes.) I
+could \hgxcmd{mq}{qrefresh} one of the biggest patches (which made 22,779
+lines of changes to 287 files) in 6.6 seconds.
+
+Clearly, MQ is well suited to working in large trees, but there are a
+few tricks you can use to get the best performance out of it.
+
+First of all, try to ``batch'' operations together. Every time you
+run \hgxcmd{mq}{qpush} or \hgxcmd{mq}{qpop}, these commands scan the working
+directory once to make sure you haven't made some changes and then
+forgotten to run \hgxcmd{mq}{qrefresh}. On a small tree, the time that
+this scan takes is unnoticeable.
However, on a medium-sized tree
+(containing tens of thousands of files), it can take a second or more.
+
+The \hgxcmd{mq}{qpush} and \hgxcmd{mq}{qpop} commands allow you to push and pop
+multiple patches at a time. You can identify the ``destination
+patch'' that you want to end up at. When you \hgxcmd{mq}{qpush} with a
+destination specified, it will push patches until that patch is at the
+top of the applied stack. When you \hgxcmd{mq}{qpop} to a destination, MQ
+will pop patches until the destination patch is at the top.
+
+You can identify a destination patch either by name or by
+number. If you use numeric addressing, patches are
+counted from zero; this means that the first patch is patch zero, the second
+is one, and so on.
+
+\section{Updating your patches when the underlying code changes}
+\label{sec:mq:merge}
+
+It's common to have a stack of patches on top of an underlying
+repository that you don't modify directly. If you're working on
+changes to third-party code, or on a feature that is taking longer to
+develop than the rate of change of the code beneath, you will often
+need to sync up with the underlying code, and fix up any hunks in your
+patches that no longer apply. This is called \emph{rebasing} your
+patch series.
+
+The simplest way to do this is to \hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}}
+your patches, then \hgcmd{pull} changes into the underlying
+repository, and finally \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} your
+patches again. MQ will stop pushing any time it runs across a patch
+that fails to apply, allowing you to fix the resulting
+conflicts, \hgxcmd{mq}{qrefresh} the affected patch, and continue pushing
+until you have fixed your entire stack.
+
+This approach is easy to use and works well if you don't expect
+changes to the underlying code to affect how well your patches apply.
+If your patch stack touches code that is modified frequently or +invasively in the underlying repository, however, fixing up rejected +hunks by hand quickly becomes tiresome. + +It's possible to partially automate the rebasing process. If your +patches apply cleanly against some revision of the underlying repo, MQ +can use this information to help you to resolve conflicts between your +patches and a different revision. + +The process is a little involved. +\begin{enumerate} +\item To begin, \hgcmdargs{qpush}{-a} all of your patches on top of + the revision where you know that they apply cleanly. +\item Save a backup copy of your patch directory using + \hgcmdargs{qsave}{\hgxopt{mq}{qsave}{-e} \hgxopt{mq}{qsave}{-c}}. This prints + the name of the directory that it has saved the patches in. It will + save the patches to a directory called + \sdirname{.hg/patches.\emph{N}}, where \texttt{\emph{N}} is a small + integer. It also commits a ``save changeset'' on top of your + applied patches; this is for internal book-keeping, and records the + states of the \sfilename{series} and \sfilename{status} files. +\item Use \hgcmd{pull} to bring new changes into the underlying + repository. (Don't run \hgcmdargs{pull}{-u}; see below for why.) +\item Update to the new tip revision, using + \hgcmdargs{update}{\hgopt{update}{-C}} to override the patches you + have pushed. +\item Merge all patches using \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-m} + \hgxopt{mq}{qpush}{-a}}. The \hgxopt{mq}{qpush}{-m} option to \hgxcmd{mq}{qpush} + tells MQ to perform a three-way merge if the patch fails to apply. +\end{enumerate} + +During the \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-m}}, each patch in the +\sfilename{series} file is applied normally. If a patch applies with +fuzz or rejects, MQ looks at the queue you \hgxcmd{mq}{qsave}d, and +performs a three-way merge with the corresponding changeset. 
This +merge uses Mercurial's normal merge machinery, so it may pop up a GUI +merge tool to help you to resolve problems. + +When you finish resolving the effects of a patch, MQ refreshes your +patch based on the result of the merge. + +At the end of this process, your repository will have one extra head +from the old patch queue, and a copy of the old patch queue will be in +\sdirname{.hg/patches.\emph{N}}. You can remove the extra head using +\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a} \hgxopt{mq}{qpop}{-n} patches.\emph{N}} +or \hgcmd{strip}. You can delete \sdirname{.hg/patches.\emph{N}} once +you are sure that you no longer need it as a backup. + +\section{Identifying patches} + +MQ commands that work with patches let you refer to a patch either by +using its name or by a number. By name is obvious enough; pass the +name \filename{foo.patch} to \hgxcmd{mq}{qpush}, for example, and it will +push patches until \filename{foo.patch} is applied. + +As a shortcut, you can refer to a patch using both a name and a +numeric offset; \texttt{foo.patch-2} means ``two patches before +\texttt{foo.patch}'', while \texttt{bar.patch+4} means ``four patches +after \texttt{bar.patch}''. + +Referring to a patch by index isn't much different. The first patch +printed in the output of \hgxcmd{mq}{qseries} is patch zero (yes, it's one +of those start-at-zero counting systems); the second is patch one; and +so on. + +MQ also makes it easy to work with patches when you are using normal +Mercurial commands. Every command that accepts a changeset ID will +also accept the name of an applied patch. MQ augments the tags +normally in the repository with an eponymous one for each applied +patch. In addition, the special tags \index{tags!special tag + names!\texttt{qbase}}\texttt{qbase} and \index{tags!special tag + names!\texttt{qtip}}\texttt{qtip} identify the ``bottom-most'' and +topmost applied patches, respectively. 
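The addressing rules above (a zero-based index, an exact name, or a name followed by \texttt{+N} or \texttt{-N}) can be sketched in a few lines of shell and \command{awk}. This is a hypothetical helper for illustration, not MQ's own lookup code: it assumes a \sfilename{series} file that lists one patch name per line with no comments or guards, and it tries an exact name before treating a trailing \texttt{+N} or \texttt{-N} as an offset, so a patch whose name happens to end in digits still resolves to itself.

```shell
#!/bin/sh
# resolve_patch SPEC SERIES_FILE   (hypothetical helper, not part of MQ)
# Prints the patch that SPEC refers to, where SPEC is a zero-based index,
# an exact patch name, or NAME+N / NAME-N. Fails if nothing matches.
resolve_patch() {
    awk -v spec="$1" '
        BEGIN { n = 0 }
        # Remember each non-blank series entry and its zero-based position.
        NF > 0 { names[n] = $0; pos[$0] = n; n++ }
        END {
            if (spec ~ /^[0-9]+$/) {
                i = spec + 0                        # numeric index
            } else if (spec in pos) {
                i = pos[spec]                       # exact name wins
            } else if (match(spec, /[+-][0-9]+$/)) {
                base = substr(spec, 1, RSTART - 1)
                if (!(base in pos)) exit 1
                off = substr(spec, RSTART + 1) + 0
                if (substr(spec, RSTART, 1) == "-") off = -off
                i = pos[base] + off                 # NAME+N or NAME-N
            } else {
                exit 1                              # unknown patch
            }
            if (i < 0 || i >= n) exit 1             # offset ran off the series
            print names[i]
        }
    ' "$2"
}
```

For instance, with a series of \texttt{a.patch}, \texttt{foo.patch}, \texttt{bar.patch} and \texttt{baz.patch}, \texttt{resolve\_patch foo.patch+2 .hg/patches/series} would print \texttt{baz.patch}.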
+
+These additions to Mercurial's normal tagging capabilities make
+dealing with patches even more of a breeze.
+\begin{itemize}
+\item Want to patchbomb a mailing list with your latest series of
+  changes?
+  \begin{codesample4}
+    hg email qbase:qtip
+  \end{codesample4}
+  (Don't know what ``patchbombing'' is? See
+  section~\ref{sec:hgext:patchbomb}.)
+\item Need to see all of the patches since \texttt{foo.patch} that
+  have touched files in a subdirectory of your tree?
+  \begin{codesample4}
+    hg log -r foo.patch:qtip \emph{subdir}
+  \end{codesample4}
+\end{itemize}
+
+Because MQ makes the names of patches available to the rest of
+Mercurial through its normal internal tag machinery, you don't need to
+type in the entire name of a patch when you want to identify it by
+name.
+
+\begin{figure}[ht]
+  \interaction{mq.id.output}
+  \caption{Using MQ's tag features to work with patches}
+  \label{ex:mq:id}
+\end{figure}
+
+Another nice consequence of representing patch names as tags is that
+when you run the \hgcmd{log} command, it will display a patch's name
+as a tag, simply as part of its normal output. This makes it easy to
+visually distinguish applied patches from underlying ``normal''
+revisions. Figure~\ref{ex:mq:id} shows a few normal Mercurial
+commands in use with applied patches.
+
+\section{Useful things to know about}
+
+There are a number of aspects of MQ usage that don't fit tidily into
+sections of their own, but that are good to know. Here they are, in
+one place.
+
+\begin{itemize}
+\item Normally, when you \hgxcmd{mq}{qpop} a patch and \hgxcmd{mq}{qpush} it
+  again, the changeset that represents the patch after the pop/push
+  will have a \emph{different identity} than the changeset that
+  represented the patch beforehand. See
+  section~\ref{sec:mqref:cmd:qpush} for information as to why this is.
+\item It's not a good idea to \hgcmd{merge} changes from another + branch with a patch changeset, at least if you want to maintain the + ``patchiness'' of that changeset and changesets below it on the + patch stack. If you try to do this, it will appear to succeed, but + MQ will become confused. +\end{itemize} + +\section{Managing patches in a repository} +\label{sec:mq:repo} + +Because MQ's \sdirname{.hg/patches} directory resides outside a +Mercurial repository's working directory, the ``underlying'' Mercurial +repository knows nothing about the management or presence of patches. + +This presents the interesting possibility of managing the contents of +the patch directory as a Mercurial repository in its own right. This +can be a useful way to work. For example, you can work on a patch for +a while, \hgxcmd{mq}{qrefresh} it, then \hgcmd{commit} the current state of +the patch. This lets you ``roll back'' to that version of the patch +later on. + +You can then share different versions of the same patch stack among +multiple underlying repositories. I use this when I am developing a +Linux kernel feature. I have a pristine copy of my kernel sources for +each of several CPU architectures, and a cloned repository under each +that contains the patches I am working on. When I want to test a +change on a different architecture, I push my current patches to the +patch repository associated with that kernel tree, pop and push all of +my patches, and build and test that kernel. + +Managing patches in a repository makes it possible for multiple +developers to work on the same patch series without colliding with +each other, all on top of an underlying source base that they may or +may not control. 
+
+\subsection{MQ support for patch repositories}
+
+MQ helps you to work with the \sdirname{.hg/patches} directory as a
+repository; when you prepare a repository for working with patches
+using \hgxcmd{mq}{qinit}, you can pass the \hgxopt{mq}{qinit}{-c} option to
+create the \sdirname{.hg/patches} directory as a Mercurial repository.
+
+\begin{note}
+  If you forget to use the \hgxopt{mq}{qinit}{-c} option, you can simply go
+  into the \sdirname{.hg/patches} directory at any time and run
+  \hgcmd{init}. Don't forget to add an entry for the
+  \sfilename{status} file to the \sfilename{.hgignore} file,
+  though
+  (\hgcmdargs{qinit}{\hgxopt{mq}{qinit}{-c}} does this for you
+  automatically); you \emph{really} don't want to manage the
+  \sfilename{status} file.
+\end{note}
+
+As a convenience, if MQ notices that the \sdirname{.hg/patches}
+directory is a repository, it will automatically \hgcmd{add} every
+patch that you create and import.
+
+MQ provides a shortcut command, \hgxcmd{mq}{qcommit}, that runs
+\hgcmd{commit} in the \sdirname{.hg/patches} directory. This saves
+some bothersome typing.
+
+Finally, as a convenience to manage the patch directory, you can
+define the alias \command{mq} on Unix systems. For example, on Linux
+systems using the \command{bash} shell, you can include the following
+snippet in your \tildefile{.bashrc}.
+
+\begin{codesample2}
+  alias mq='hg -R \$(hg root)/.hg/patches'
+\end{codesample2}
+
+You can then issue commands of the form \cmdargs{mq}{pull} from
+the main repository.
+
+\subsection{A few things to watch out for}
+
+MQ's support for working with a repository full of patches is limited
+in a few small respects.
+
+MQ cannot automatically detect changes that you make to the patch
+directory.
If you \hgcmd{pull}, manually edit, or \hgcmd{update} +changes to patches or the \sfilename{series} file, you will have to +\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}} and then +\hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} in the underlying repository to +see those changes show up there. If you forget to do this, you can +confuse MQ's idea of which patches are applied. + +\section{Third party tools for working with patches} +\label{sec:mq:tools} + +Once you've been working with patches for a while, you'll find +yourself hungry for tools that will help you to understand and +manipulate the patches you're dealing with. + +The \command{diffstat} command~\cite{web:diffstat} generates a +histogram of the modifications made to each file in a patch. It +provides a good way to ``get a sense of'' a patch---which files it +affects, and how much change it introduces to each file and as a +whole. (I find that it's a good idea to use \command{diffstat}'s +\cmdopt{diffstat}{-p} option as a matter of course, as otherwise it +will try to do clever things with prefixes of file names that +inevitably confuse at least me.) + +\begin{figure}[ht] + \interaction{mq.tools.tools} + \caption{The \command{diffstat}, \command{filterdiff}, and \command{lsdiff} commands} + \label{ex:mq:tools} +\end{figure} + +The \package{patchutils} package~\cite{web:patchutils} is invaluable. +It provides a set of small utilities that follow the ``Unix +philosophy;'' each does one useful thing with a patch. The +\package{patchutils} command I use most is \command{filterdiff}, which +extracts subsets from a patch file. For example, given a patch that +modifies hundreds of files across dozens of directories, a single +invocation of \command{filterdiff} can generate a smaller patch that +only touches files whose names match a particular glob pattern. See +section~\ref{mq-collab:tips:interdiff} for another example. 
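To get a feel for what \command{lsdiff} does, and for how the strip count discussed earlier applies to file headers, here is a rough stand-in written with \command{awk}. It is only a sketch under simplifying assumptions: it treats a line beginning with \texttt{+++} as a file header only when it directly follows a \texttt{-{}-{}-} line, falls back to the old name when a file is deleted (its new name is \texttt{/dev/null}), and strips one leading separator and its component per unit of strip count, as described above. The name \texttt{list\_patch\_files} is hypothetical, and real tools such as \command{lsdiff} handle many more cases.

```shell
#!/bin/sh
# list_patch_files STRIP [PATCH_FILE]   (hypothetical helper)
# Print the name of each file a unified diff touches, after removing
# STRIP leading path components. Reads stdin if PATCH_FILE is omitted.
list_patch_files() {
    awk -v strip="$1" '
        /^\+\+\+ / && prev ~ /^--- / {
            path = substr($0, 5)
            if (path ~ /^\/dev\/null/)
                path = substr(prev, 5)       # deleted file: use the old name
            sub(/\t.*/, "", path)            # drop the optional timestamp
            for (k = 0; k < strip; k++) {    # remove everything up to and
                p = index(path, "/")         # including the next "/"
                if (p == 0) break
                path = substr(path, p + 1)
            }
            print path
        }
        { prev = $0 }
    ' "${2:--}"
}
```

For example, \texttt{hg diff | list\_patch\_files 1} would list the files touched by uncommitted changes, since \hgcmd{diff} writes headers that expect a strip count of one.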
+
+\section{Good ways to work with patches}
+
+Whether you are working on a patch series to submit to a free software
+or open source project, or a series that you intend to treat as a
+sequence of regular changesets when you're done, you can use some
+simple techniques to keep your work well organised.
+
+Give your patches descriptive names. A good name for a patch might be
+\filename{rework-device-alloc.patch}, because it will immediately give
+you a hint of what the purpose of the patch is. Long names shouldn't be
+a problem; you won't be typing the names often, but you \emph{will} be
+running commands like \hgxcmd{mq}{qapplied} and \hgxcmd{mq}{qtop} over and over.
+Good naming becomes especially important when you have a number of
+patches to work with, or if you are juggling a number of different
+tasks and your patches only get a fraction of your attention.
+
+Be aware of which patch you're working on. Use the \hgxcmd{mq}{qtop}
+command and skim over the text of your patches frequently (for
+example, using \hgcmdargs{tip}{\hgopt{tip}{-p}}) to be sure of where
+you stand. I have several times worked on and \hgxcmd{mq}{qrefresh}ed a
+patch other than the one I intended, and it's often tricky to migrate
+changes into the right patch after making them in the wrong one.
+
+For this reason, it is very much worth investing a little time to
+learn how to use some of the third-party tools I described in
+section~\ref{sec:mq:tools}, particularly \command{diffstat} and
+\command{filterdiff}. The former will give you a quick idea of what
+changes your patch is making, while the latter makes it easy to splice
+hunks selectively out of one patch and into another.
+
+\section{MQ cookbook}
+
+\subsection{Manage ``trivial'' patches}
+
+Because the overhead of dropping files into a new Mercurial repository
+is so low, it makes a lot of sense to manage patches this way even if
+you simply want to make a few changes to a source tarball that you
+downloaded.
+
+Begin by downloading and unpacking the source tarball,
+and turning it into a Mercurial repository.
+\interaction{mq.tarball.download}
+
+Continue by creating a patch stack and making your changes.
+\interaction{mq.tarball.qinit}
+
+Let's say a few weeks or months pass, and your package author releases
+a new version. First, bring their changes into the repository.
+\interaction{mq.tarball.newsource}
+The pipeline starting with \hgcmd{locate} above deletes all files in
+the working directory, so that \hgcmd{commit}'s
+\hgopt{commit}{--addremove} option can actually tell which files have
+really been removed in the newer version of the source.
+
+Finally, you can apply your patches on top of the new tree.
+\interaction{mq.tarball.repush}
+
+\subsection{Combining entire patches}
+\label{sec:mq:combine}
+
+MQ provides a command, \hgxcmd{mq}{qfold}, that lets you combine entire
+patches. This ``folds'' the patches you name, in the order you name
+them, into the topmost applied patch, and concatenates their
+descriptions onto the end of its description. The patches that you
+fold must be unapplied before you fold them.
+
+The order in which you fold patches matters. If your topmost applied
+patch is \texttt{foo}, and you \hgxcmd{mq}{qfold} \texttt{bar} and
+\texttt{quux} into it, you will end up with a patch that has the same
+effect as if you applied first \texttt{foo}, then \texttt{bar},
+followed by \texttt{quux}.
+
+\subsection{Merging part of one patch into another}
+
+Merging \emph{part} of one patch into another is more difficult than
+combining entire patches.
+
+If you want to move changes to entire files, you can use
+\command{filterdiff}'s \cmdopt{filterdiff}{-i} and
+\cmdopt{filterdiff}{-x} options to choose the modifications to snip
+out of one patch, concatenating its output onto the end of the patch
+you want to merge into. You usually won't need to modify the patch
+you've merged the changes from.
Instead, MQ will report some rejected
+hunks when you \hgxcmd{mq}{qpush} it (from the hunks you moved into the
+other patch), and you can simply \hgxcmd{mq}{qrefresh} the patch to drop
+the duplicate hunks.
+
+If you have a patch that has multiple hunks modifying a file, and you
+only want to move a few of those hunks, the job becomes messier,
+but you can still partly automate it. Use \cmdargs{lsdiff}{-nvv} to
+print some metadata about the patch.
+\interaction{mq.tools.lsdiff}
+
+This command prints three different kinds of numbers:
+\begin{itemize}
+\item (in the first column) a \emph{file number} to identify each file
+  modified in the patch;
+\item (on the next line, indented) the line number within a modified
+  file where a hunk starts; and
+\item (on the same line) a \emph{hunk number} to identify that hunk.
+\end{itemize}
+
+You'll have to use some visual inspection, and reading of the patch,
+to identify the file and hunk numbers you'll want, but you can then
+pass them to \command{filterdiff}'s \cmdopt{filterdiff}{--files}
+and \cmdopt{filterdiff}{--hunks} options, to select exactly the file
+and hunk you want to extract.
+
+Once you have this hunk, you can concatenate it onto the end of your
+destination patch and continue with the remainder of
+section~\ref{sec:mq:combine}.
+
+\section{Differences between quilt and MQ}
+
+If you are already familiar with quilt, MQ provides a similar command
+set. There are a few differences in the way that it works.
+
+You will already have noticed that most quilt commands have MQ
+counterparts that simply begin with a ``\texttt{q}''. The exceptions
+are quilt's \texttt{add} and \texttt{remove} commands, the
+counterparts for which are the normal Mercurial \hgcmd{add} and
+\hgcmd{remove} commands. There is no MQ equivalent of the quilt
+\texttt{edit} command.
+ +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/ch13-mq-collab.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch13-mq-collab.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,393 @@ +\chapter{Advanced uses of Mercurial Queues} +\label{chap:mq-collab} + +While it's easy to pick up straightforward uses of Mercurial Queues, +use of a little discipline and some of MQ's less frequently used +capabilities makes it possible to work in complicated development +environments. + +In this chapter, I will use as an example a technique I have used to +manage the development of an Infiniband device driver for the Linux +kernel. The driver in question is large (at least as drivers go), +with 25,000 lines of code spread across 35 source files. It is +maintained by a small team of developers. + +While much of the material in this chapter is specific to Linux, the +same principles apply to any code base for which you're not the +primary owner, and upon which you need to do a lot of development. + +\section{The problem of many targets} + +The Linux kernel changes rapidly, and has never been internally +stable; developers frequently make drastic changes between releases. +This means that a version of the driver that works well with a +particular released version of the kernel will not even \emph{compile} +correctly against, typically, any other version. + +To maintain a driver, we have to keep a number of distinct versions of +Linux in mind. +\begin{itemize} +\item One target is the main Linux kernel development tree. + Maintenance of the code is in this case partly shared by other + developers in the kernel community, who make ``drive-by'' + modifications to the driver as they develop and refine kernel + subsystems. 
+\item We also maintain a number of ``backports'' to older versions of + the Linux kernel, to support the needs of customers who are running + older Linux distributions that do not incorporate our drivers. (To + \emph{backport} a piece of code is to modify it to work in an older + version of its target environment than the version it was developed + for.) +\item Finally, we make software releases on a schedule that is + necessarily not aligned with those used by Linux distributors and + kernel developers, so that we can deliver new features to customers + without forcing them to upgrade their entire kernels or + distributions. +\end{itemize} + +\subsection{Tempting approaches that don't work well} + +There are two ``standard'' ways to maintain a piece of software that +has to target many different environments. + +The first is to maintain a number of branches, each intended for a +single target. The trouble with this approach is that you must +maintain iron discipline in the flow of changes between repositories. +A new feature or bug fix must start life in a ``pristine'' repository, +then percolate out to every backport repository. Backport changes are +more limited in the branches they should propagate to; a backport +change that is applied to a branch where it doesn't belong will +probably stop the driver from compiling. + +The second is to maintain a single source tree filled with conditional +statements that turn chunks of code on or off depending on the +intended target. Because these ``ifdefs'' are not allowed in the +Linux kernel tree, a manual or automatic process must be followed to +strip them out and yield a clean tree. A code base maintained in this +fashion rapidly becomes a rat's nest of conditional blocks that are +difficult to understand and maintain. + +Neither of these approaches is well suited to a situation where you +don't ``own'' the canonical copy of a source tree. 
In the case of a +Linux driver that is distributed with the standard kernel, Linus's +tree contains the copy of the code that will be treated by the world +as canonical. The upstream version of ``my'' driver can be modified +by people I don't know, without me even finding out about it until +after the changes show up in Linus's tree. + +These approaches have the added weakness of making it difficult to +generate well-formed patches to submit upstream. + +In principle, Mercurial Queues seems like a good candidate to manage a +development scenario such as the above. While this is indeed the +case, MQ contains a few added features that make the job more +pleasant. + +\section{Conditionally applying patches with + guards} + +Perhaps the best way to maintain sanity with so many targets is to be +able to choose specific patches to apply for a given situation. MQ +provides a feature called ``guards'' (which originates with quilt's +\texttt{guards} command) that does just this. To start off, let's +create a simple repository for experimenting in. +\interaction{mq.guards.init} +This gives us a tiny repository that contains two patches that don't +have any dependencies on each other, because they touch different files. + +The idea behind conditional application is that you can ``tag'' a +patch with a \emph{guard}, which is simply a text string of your +choosing, then tell MQ to select specific guards to use when applying +patches. MQ will then either apply, or skip over, a guarded patch, +depending on the guards that you have selected. + +A patch can have an arbitrary number of guards; +each one is \emph{positive} (``apply this patch if this guard is +selected'') or \emph{negative} (``skip this patch if this guard is +selected''). A patch with no guards is always applied. + +\section{Controlling the guards on a patch} + +The \hgxcmd{mq}{qguard} command lets you determine which guards should +apply to a patch, or display the guards that are already in effect. 
+Without any arguments, it displays the guards on the current topmost +patch. +\interaction{mq.guards.qguard} +To set a positive guard on a patch, prefix the name of the guard with +a ``\texttt{+}''. +\interaction{mq.guards.qguard.pos} +To set a negative guard on a patch, prefix the name of the guard with +a ``\texttt{-}''. +\interaction{mq.guards.qguard.neg} + +\begin{note} + The \hgxcmd{mq}{qguard} command \emph{sets} the guards on a patch; it + doesn't \emph{modify} them. What this means is that if you run + \hgcmdargs{qguard}{+a +b} on a patch, then \hgcmdargs{qguard}{+c} on + the same patch, the \emph{only} guard that will be set on it + afterwards is \texttt{+c}. +\end{note} + +Mercurial stores guards in the \sfilename{series} file; the form in +which they are stored is easy both to understand and to edit by hand. +(In other words, you don't have to use the \hgxcmd{mq}{qguard} command if +you don't want to; it's okay to simply edit the \sfilename{series} +file.) +\interaction{mq.guards.series} + +\section{Selecting the guards to use} + +The \hgxcmd{mq}{qselect} command determines which guards are active at a +given time. The effect of this is to determine which patches MQ will +apply the next time you run \hgxcmd{mq}{qpush}. It has no other effect; in +particular, it doesn't do anything to patches that are already +applied. + +With no arguments, the \hgxcmd{mq}{qselect} command lists the guards +currently in effect, one per line of output. Each argument is treated +as the name of a guard to apply. +\interaction{mq.guards.qselect.foo} +In case you're interested, the currently selected guards are stored in +the \sfilename{guards} file. +\interaction{mq.guards.qselect.cat} +We can see the effect the selected guards have when we run +\hgxcmd{mq}{qpush}. +\interaction{mq.guards.qselect.qpush} + +A guard cannot start with a ``\texttt{+}'' or ``\texttt{-}'' +character. The name of a guard must not contain white space, but most +other characters are acceptable. 
If you try to use a guard with an +invalid name, MQ will complain: +\interaction{mq.guards.qselect.error} +Changing the selected guards changes the patches that are applied. +\interaction{mq.guards.qselect.quux} +You can see in the example below that negative guards take precedence +over positive guards. +\interaction{mq.guards.qselect.foobar} + +\section{MQ's rules for applying patches} + +The rules that MQ uses when deciding whether to apply a patch +are as follows. +\begin{itemize} +\item A patch that has no guards is always applied. +\item If the patch has any negative guard that matches any currently + selected guard, the patch is skipped. +\item If the patch has any positive guard that matches any currently + selected guard, the patch is applied. +\item If the patch has positive or negative guards, but none matches + any currently selected guard, the patch is skipped. +\end{itemize} + +\section{Trimming the work environment} + +In working on the device driver I mentioned earlier, I don't apply the +patches to a normal Linux kernel tree. Instead, I use a repository +that contains only a snapshot of the source files and headers that are +relevant to Infiniband development. This repository is~1\% the size +of a kernel repository, so it's easier to work with. + +I then choose a ``base'' version on top of which the patches are +applied. This is a snapshot of the Linux kernel tree as of a revision +of my choosing. When I take the snapshot, I record the changeset ID +from the kernel repository in the commit message. Since the snapshot +preserves the ``shape'' and content of the relevant parts of the +kernel tree, I can apply my patches on top of either my tiny +repository or a normal kernel tree. + +Normally, the base tree atop which the patches apply should be a +snapshot of a very recent upstream tree. This best facilitates the +development of patches that can easily be submitted upstream with few +or no modifications. 
+ +\section{Dividing up the \sfilename{series} file} + +I categorise the patches in the \sfilename{series} file into a number +of logical groups. Each section of like patches begins with a block +of comments that describes the purpose of the patches that follow. + +The sequence of patch groups that I maintain follows. The ordering of +these groups is important; I'll describe why after I introduce the +groups. +\begin{itemize} +\item The ``accepted'' group. Patches that the development team has + submitted to the maintainer of the Infiniband subsystem, and which + he has accepted, but which are not present in the snapshot that the + tiny repository is based on. These are ``read only'' patches, + present only to bring the tree into a state similar to that of + the upstream maintainer's repository. +\item The ``rework'' group. Patches that I have submitted, but that + the upstream maintainer has requested modifications to before he + will accept them. +\item The ``pending'' group. Patches that I have not yet submitted to + the upstream maintainer, but which we have finished working on. + These will be ``read only'' for a while. If the upstream maintainer + accepts them upon submission, I'll move them to the end of the + ``accepted'' group. If he requests that I modify any, I'll move + them to the beginning of the ``rework'' group. +\item The ``in progress'' group. Patches that are actively being + developed, and should not be submitted anywhere yet. +\item The ``backport'' group. Patches that adapt the source tree to + older versions of the kernel tree. +\item The ``do not ship'' group. Patches that for some reason should + never be submitted upstream. For example, one such patch might + change embedded driver identification strings to make it easier to + distinguish, in the field, between an out-of-tree version of the + driver and a version shipped by a distribution vendor.
+\end{itemize} + +Now to return to the reasons for ordering groups of patches in this +way. We would like the lowest patches in the stack to be as stable as +possible, so that we will not need to rework higher patches due to +changes in context. Putting patches that will never be changed first +in the \sfilename{series} file serves this purpose. + +We would also like the patches that we know we'll need to modify to be +applied on top of a source tree that resembles the upstream tree as +closely as possible. This is why we keep accepted patches around for +a while. + +The ``backport'' and ``do not ship'' patches float at the end of the +\sfilename{series} file. The backport patches must be applied on top +of all other patches, and the ``do not ship'' patches might as well +stay out of harm's way. + +\section{Maintaining the patch series} + +In my work, I use a number of guards to control which patches are to +be applied. + +\begin{itemize} +\item ``Accepted'' patches are guarded with \texttt{accepted}. I + enable this guard most of the time. When I'm applying the patches + on top of a tree where the patches are already present, I can turn + this guard off, and the patches that follow will apply cleanly. +\item Patches that are ``finished'', but not yet submitted, have no + guards. If I'm applying the patch stack to a copy of the upstream + tree, I don't need to enable any guards in order to get a reasonably + safe source tree. +\item Those patches that need reworking before being resubmitted are + guarded with \texttt{rework}. +\item For those patches that are still under development, I use + \texttt{devel}. +\item A backport patch may have several guards, one for each version + of the kernel to which it applies. For example, a patch that + backports a piece of code to~2.6.9 will have a~\texttt{2.6.9} guard. +\end{itemize} +This variety of guards gives me considerable flexibility in +determining what kind of source tree I want to end up with.
For most +situations, the selection of appropriate guards is automated during +the build process, but I can manually tune the guards to use for less +common circumstances. + +\subsection{The art of writing backport patches} + +Using MQ, writing a backport patch is a simple process. All such a +patch has to do is modify a piece of code that uses a kernel feature +not present in the older version of the kernel, so that the driver +continues to work correctly under that older version. + +A useful goal when writing a good backport patch is to make your code +look as if it was written for the older version of the kernel you're +targeting. The less obtrusive the patch, the easier it will be to +understand and maintain. If you're writing a collection of backport +patches to avoid the ``rat's nest'' effect of lots of +\texttt{\#ifdef}s (hunks of source code that are only used +conditionally) in your code, don't introduce version-dependent +\texttt{\#ifdef}s into the patches. Instead, write several patches, +each of which makes unconditional changes, and control their +application using guards. + +There are two reasons to divide backport patches into a distinct +group, away from the ``regular'' patches whose effects they modify. +The first is that intermingling the two makes it more difficult to use +a tool like the \hgext{patchbomb} extension to automate the process of +submitting the patches to an upstream maintainer. The second is that +a backport patch could perturb the context in which a subsequent +regular patch is applied, making it impossible to apply the regular +patch cleanly \emph{without} the earlier backport patch already being +applied. + +\section{Useful tips for developing with MQ} + +\subsection{Organising patches in directories} + +If you're working on a substantial project with MQ, it's not difficult +to accumulate a large number of patches. For example, I have one +patch repository that contains over 250 patches. 
+ +If you can group these patches into separate logical categories, you +can if you like store them in different directories; MQ has no +problems with patch names that contain path separators. + +\subsection{Viewing the history of a patch} +\label{mq-collab:tips:interdiff} + +If you're developing a set of patches over a long time, it's a good +idea to maintain them in a repository, as discussed in +section~\ref{sec:mq:repo}. If you do so, you'll quickly discover that +using the \hgcmd{diff} command to look at the history of changes to a +patch is unworkable. This is in part because you're looking at the +second derivative of the real code (a diff of a diff), but also +because MQ adds noise to the process by modifying time stamps and +directory names when it updates a patch. + +However, you can use the \hgext{extdiff} extension, which is bundled +with Mercurial, to turn a diff of two versions of a patch into +something readable. To do this, you will need a third-party package +called \package{patchutils}~\cite{web:patchutils}. This provides a +command named \command{interdiff}, which shows the differences between +two diffs as a diff. Used on two versions of the same diff, it +generates a diff that represents the diff from the first to the second +version. + +You can enable the \hgext{extdiff} extension in the usual way, by +adding a line to the \rcsection{extensions} section of your \hgrc. +\begin{codesample2} + [extensions] + extdiff = +\end{codesample2} +The \command{interdiff} command expects to be passed the names of two +files, but the \hgext{extdiff} extension passes the program it runs a +pair of directories, each of which can contain an arbitrary number of +files. We thus need a small program that will run \command{interdiff} +on each pair of files in these two directories. This program is +available as \sfilename{hg-interdiff} in the \dirname{examples} +directory of the source code repository that accompanies this book. 
+\excode{hg-interdiff} + +With the \sfilename{hg-interdiff} program in your shell's search path, +you can run it as follows, from inside an MQ patch directory: +\begin{codesample2} + hg extdiff -p hg-interdiff -r A:B my-change.patch +\end{codesample2} +Since you'll probably want to use this long-winded command a lot, you +can get \hgext{extdiff} to make it available as a normal Mercurial +command, again by editing your \hgrc. +\begin{codesample2} + [extdiff] + cmd.interdiff = hg-interdiff +\end{codesample2} +This directs \hgext{extdiff} to make an \texttt{interdiff} command +available, so you can now shorten the previous invocation of +\hgxcmd{extdiff}{extdiff} to something a little more wieldy. +\begin{codesample2} + hg interdiff -r A:B my-change.patch +\end{codesample2} + +\begin{note} + The \command{interdiff} command works well only if the underlying + files against which versions of a patch are generated remain the + same. If you create a patch, modify the underlying files, and then + regenerate the patch, \command{interdiff} may not produce useful + output. +\end{note} + +The \hgext{extdiff} extension is useful for more than merely improving +the presentation of MQ~patches. To read more about it, go to +section~\ref{sec:hgext:extdiff}. + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/ch14-hgext.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch14-hgext.tex Thu Jan 29 22:56:27 2009 -0800 @@ -0,0 +1,429 @@ +\chapter{Adding functionality with extensions} +\label{chap:hgext} + +While the core of Mercurial is quite complete from a functionality +standpoint, it's deliberately shorn of fancy features. This approach +of preserving simplicity keeps the software easy to deal with for both +maintainers and users. + +However, Mercurial doesn't box you in with an inflexible command set: +you can add features to it as \emph{extensions} (sometimes known as +\emph{plugins}).
We've already discussed a few of these extensions in +earlier chapters. +\begin{itemize} +\item Section~\ref{sec:tour-merge:fetch} covers the \hgext{fetch} + extension; this combines pulling new changes and merging them with + local changes into a single command, \hgxcmd{fetch}{fetch}. +\item In chapter~\ref{chap:hook}, we covered several extensions that + are useful for hook-related functionality: \hgext{acl} adds access + control lists; \hgext{bugzilla} adds integration with the Bugzilla + bug tracking system; and \hgext{notify} sends notification emails on + new changes. +\item The Mercurial Queues patch management extension is so invaluable + that it merits two chapters and an appendix all to itself. + Chapter~\ref{chap:mq} covers the basics; + chapter~\ref{chap:mq-collab} discusses advanced topics; and + appendix~\ref{chap:mqref} goes into detail on each command. +\end{itemize} + +In this chapter, we'll cover some of the other extensions that are +available for Mercurial, and briefly touch on some of the machinery +you'll need to know about if you want to write an extension of your +own. +\begin{itemize} +\item In section~\ref{sec:hgext:inotify}, we'll discuss the + possibility of \emph{huge} performance improvements using the + \hgext{inotify} extension. +\end{itemize} + +\section{Improve performance with the \hgext{inotify} extension} +\label{sec:hgext:inotify} + +Are you interested in having some of the most common Mercurial +operations run as much as a hundred times faster? Read on! + +Mercurial has great performance under normal circumstances. For +example, when you run the \hgcmd{status} command, Mercurial has to +scan almost every directory and file in your repository so that it can +display file status. Many other Mercurial commands need to do the +same work behind the scenes; for example, the \hgcmd{diff} command +uses the status machinery to avoid doing an expensive comparison +operation on files that obviously haven't changed. 
+ +Because obtaining file status is crucial to good performance, the +authors of Mercurial have optimised this code to within an inch of its +life. However, there's no avoiding the fact that when you run +\hgcmd{status}, Mercurial is going to have to perform at least one +expensive system call for each managed file to determine whether it's +changed since the last time Mercurial checked. For a sufficiently +large repository, this can take a long time. + +To put a number on the magnitude of this effect, I created a +repository containing 150,000 managed files. I timed \hgcmd{status} +as taking ten seconds to run, even when \emph{none} of those files had +been modified. + +Many modern operating systems contain a file notification facility. +If a program signs up to an appropriate service, the operating system +will notify it every time a file of interest is created, modified, or +deleted. On Linux systems, the kernel component that does this is +called \texttt{inotify}. + +Mercurial's \hgext{inotify} extension talks to the kernel's +\texttt{inotify} component to optimise \hgcmd{status} commands. The +extension has two components. A daemon sits in the background and +receives notifications from the \texttt{inotify} subsystem. It also +listens for connections from a regular Mercurial command. The +extension modifies Mercurial's behaviour so that instead of scanning +the filesystem, it queries the daemon. Since the daemon has perfect +information about the state of the repository, it can respond with a +result instantaneously, avoiding the need to scan every directory and +file in the repository. + +Recall the ten seconds that I measured plain Mercurial as taking to +run \hgcmd{status} on a 150,000 file repository. With the +\hgext{inotify} extension enabled, the time dropped to 0.1~seconds, a +factor of \emph{one hundred} faster. + +Before we continue, please pay attention to some caveats. +\begin{itemize} +\item The \hgext{inotify} extension is Linux-specific. 
Because it + interfaces directly to the Linux kernel's \texttt{inotify} + subsystem, it does not work on other operating systems. +\item It should work on any Linux distribution that was released after + early~2005. Older distributions are likely to have a kernel that + lacks \texttt{inotify}, or a version of \texttt{glibc} that does not + have the necessary interfacing support. +\item Not all filesystems are suitable for use with the + \hgext{inotify} extension. Network filesystems such as NFS are a + non-starter, for example, particularly if you're running Mercurial + on several systems, all mounting the same network filesystem. The + kernel's \texttt{inotify} system has no way of knowing about changes + made on another system. Most local filesystems (e.g.~ext3, XFS, + ReiserFS) should work fine. +\end{itemize} + +The \hgext{inotify} extension is not yet shipped with Mercurial as of +May~2007, so it's a little more involved to set up than other +extensions. But the performance improvement is worth it! + +The extension currently comes in two parts: a set of patches to the +Mercurial source code, and a library of Python bindings to the +\texttt{inotify} subsystem. +\begin{note} + There are \emph{two} Python \texttt{inotify} binding libraries. One + of them is called \texttt{pyinotify}, and is packaged by some Linux + distributions as \texttt{python-inotify}. This is \emph{not} the + one you'll need, as it is too buggy and inefficient to be practical. +\end{note} +To get going, it's best to already have a functioning copy of +Mercurial installed. +\begin{note} + If you follow the instructions below, you'll be \emph{replacing} and + overwriting any existing installation of Mercurial that you might + already have, using the latest ``bleeding edge'' Mercurial code. + Don't say you weren't warned! +\end{note} +\begin{enumerate} +\item Clone the Python \texttt{inotify} binding repository. Build and + install it. 
+ \begin{codesample4} + hg clone http://hg.kublai.com/python/inotify + cd inotify + python setup.py build --force + sudo python setup.py install --skip-build + \end{codesample4} +\item Clone the \dirname{crew} Mercurial repository. Clone the + \hgext{inotify} patch repository so that Mercurial Queues will be + able to apply patches to your copy of the \dirname{crew} repository. + \begin{codesample4} + hg clone http://hg.intevation.org/mercurial/crew + hg clone crew inotify + hg clone http://hg.kublai.com/mercurial/patches/inotify inotify/.hg/patches + \end{codesample4} +\item Make sure that you have the Mercurial Queues extension, + \hgext{mq}, enabled. If you've never used MQ, read + section~\ref{sec:mq:start} to get started quickly. +\item Go into the \dirname{inotify} repo, and apply all of the + \hgext{inotify} patches using the \hgxopt{mq}{qpush}{-a} option to + the \hgxcmd{mq}{qpush} command. + \begin{codesample4} + cd inotify + hg qpush -a + \end{codesample4} + If you get an error message from \hgxcmd{mq}{qpush}, you should not + continue. Instead, ask for help. +\item Build and install the patched version of Mercurial. + \begin{codesample4} + python setup.py build --force + sudo python setup.py install --skip-build + \end{codesample4} +\end{enumerate} +Once you've built a suitably patched version of Mercurial, all you +need to do to enable the \hgext{inotify} extension is add an entry to +your \hgrc. +\begin{codesample2} + [extensions] + inotify = +\end{codesample2} +When the \hgext{inotify} extension is enabled, Mercurial will +automatically and transparently start the status daemon the first time +you run a command that needs status in a repository. It runs one +status daemon per repository. + +The status daemon is started silently, and runs in the background.
If +you look at a list of running processes after you've enabled the +\hgext{inotify} extension and run a few commands in different +repositories, you'll thus see a few \texttt{hg} processes sitting +around, waiting for updates from the kernel and queries from +Mercurial. + +The first time you run a Mercurial command in a repository when you +have the \hgext{inotify} extension enabled, it will run with about the +same performance as a normal Mercurial command. This is because the +status daemon needs to perform a normal status scan so that it has a +baseline against which to apply later updates from the kernel. +However, \emph{every} subsequent command that does any kind of status +check should be noticeably faster on repositories of even fairly +modest size. Better yet, the bigger your repository is, the greater a +performance advantage you'll see. The \hgext{inotify} daemon makes +status operations almost instantaneous on repositories of all sizes! + +If you like, you can manually start a status daemon using the +\hgxcmd{inotify}{inserve} command. This gives you slightly finer +control over how the daemon ought to run. This command will of course +only be available when the \hgext{inotify} extension is enabled. + +When you're using the \hgext{inotify} extension, you should notice +\emph{no difference at all} in Mercurial's behaviour, with the sole +exception of status-related commands running a whole lot faster than +they used to. You should specifically expect that commands will not +print different output; neither should they give different results. +If either of these situations occurs, please report a bug. + +\section{Flexible diff support with the \hgext{extdiff} extension} +\label{sec:hgext:extdiff} + +Mercurial's built-in \hgcmd{diff} command outputs plaintext unified +diffs. +\interaction{extdiff.diff} +If you would like to use an external tool to display modifications, +you'll want to use the \hgext{extdiff} extension. 
This will let you +use, for example, a graphical diff tool. + +The \hgext{extdiff} extension is bundled with Mercurial, so it's easy +to set up. In the \rcsection{extensions} section of your \hgrc, +simply add a one-line entry to enable the extension. +\begin{codesample2} + [extensions] + extdiff = +\end{codesample2} +This introduces a command named \hgxcmd{extdiff}{extdiff}, which by +default uses your system's \command{diff} command to generate a +unified diff in the same form as the built-in \hgcmd{diff} command. +\interaction{extdiff.extdiff} +The result won't be exactly the same as with the built-in \hgcmd{diff} +variations, because the output of \command{diff} varies from one +system to another, even when passed the same options. + +As the ``\texttt{making snapshot}'' lines of output above imply, the +\hgxcmd{extdiff}{extdiff} command works by creating two snapshots of +your source tree. The first snapshot is of the source revision; the +second, of the target revision or working directory. The +\hgxcmd{extdiff}{extdiff} command generates these snapshots in a +temporary directory, passes the name of each directory to an external +diff viewer, then deletes the temporary directory. For efficiency, it +only snapshots the directories and files that have changed between the +two revisions. + +Snapshot directory names have the same base name as your repository. +If your repository path is \dirname{/quux/bar/foo}, then \dirname{foo} +will be the name of each snapshot directory. Each snapshot directory +name has its changeset ID appended, if appropriate. If a snapshot is +of revision \texttt{a631aca1083f}, the directory will be named +\dirname{foo.a631aca1083f}. A snapshot of the working directory won't +have a changeset ID appended, so it would just be \dirname{foo} in +this example. To see what this looks like in practice, look again at +the \hgxcmd{extdiff}{extdiff} example above. Notice that the diff has +the snapshot directory names embedded in its header. 
+ +The \hgxcmd{extdiff}{extdiff} command accepts two important options. +The \hgxopt{extdiff}{extdiff}{-p} option lets you choose a program to +view differences with, instead of \command{diff}. With the +\hgxopt{extdiff}{extdiff}{-o} option, you can change the options that +\hgxcmd{extdiff}{extdiff} passes to the program (by default, these +options are ``\texttt{-Npru}'', which only make sense if you're +running \command{diff}). In other respects, the +\hgxcmd{extdiff}{extdiff} command acts similarly to the built-in +\hgcmd{diff} command: you use the same option names, syntax, and +arguments to specify the revisions you want, the files you want, and +so on. + +As an example, here's how to run the normal system \command{diff} +command, getting it to generate context diffs (using the +\cmdopt{diff}{-c} option) instead of unified diffs, and five lines of +context instead of the default three (passing \texttt{5} as the +argument to the \cmdopt{diff}{-C} option). +\interaction{extdiff.extdiff-ctx} + +Launching a visual diff tool is just as easy. Here's how to launch +the \command{kdiff3} viewer. +\begin{codesample2} + hg extdiff -p kdiff3 -o '' +\end{codesample2} + +If your diff viewing command can't deal with directories, you can +easily work around this with a little scripting. For an example of +such scripting in action with the \hgext{mq} extension and the +\command{interdiff} command, see +section~\ref{mq-collab:tips:interdiff}. + +\subsection{Defining command aliases} + +It can be cumbersome to remember the options to both the +\hgxcmd{extdiff}{extdiff} command and the diff viewer you want to use, +so the \hgext{extdiff} extension lets you define \emph{new} commands +that will invoke your diff viewer with exactly the right options. + +All you need to do is edit your \hgrc, and add a section named +\rcsection{extdiff}. Inside this section, you can define multiple +commands. Here's how to add a \texttt{kdiff3} command. 
Once you've +defined this, you can type ``\texttt{hg kdiff3}'' and the +\hgext{extdiff} extension will run \command{kdiff3} for you. +\begin{codesample2} + [extdiff] + cmd.kdiff3 = +\end{codesample2} +If you leave the right hand side of the definition empty, as above, +the \hgext{extdiff} extension uses the name of the command you defined +as the name of the external program to run. But these names don't +have to be the same. Here, we define a command named ``\texttt{hg + wibble}'', which runs \command{kdiff3}. +\begin{codesample2} + [extdiff] + cmd.wibble = kdiff3 +\end{codesample2} + +You can also specify the default options that you want to invoke your +diff viewing program with. The prefix to use is ``\texttt{opts.}'', +followed by the name of the command to which the options apply. This +example defines a ``\texttt{hg vimdiff}'' command that runs the +\command{vim} editor's \texttt{DirDiff} extension. +\begin{codesample2} + [extdiff] + cmd.vimdiff = vim + opts.vimdiff = -f '+next' '+execute "DirDiff" argv(0) argv(1)' +\end{codesample2} + +\section{Cherrypicking changes with the \hgext{transplant} extension} +\label{sec:hgext:transplant} + +Need to have a long chat with Brendan about this. + +\section{Send changes via email with the \hgext{patchbomb} extension} +\label{sec:hgext:patchbomb} + +Many projects have a culture of ``change review'', in which people +send their modifications to a mailing list for others to read and +comment on before they commit the final version to a shared +repository. Some projects have people who act as gatekeepers; they +apply changes from other people to a repository to which those others +don't have access. + +Mercurial makes it easy to send changes over email for review or +application, via its \hgext{patchbomb} extension. The extension is so +namd because changes are formatted as patches, and it's usual to send +one changeset per email message. 
Sending a long series of changes by +email is thus much like ``bombing'' the recipient's inbox, hence +``patchbomb''. + +As usual, the basic configuration of the \hgext{patchbomb} extension +takes just one or two lines in your \hgrc. +\begin{codesample2} + [extensions] + patchbomb = +\end{codesample2} +Once you've enabled the extension, you will have a new command +available, named \hgxcmd{patchbomb}{email}. + +The safest and best way to invoke the \hgxcmd{patchbomb}{email} +command is to \emph{always} run it first with the +\hgxopt{patchbomb}{email}{-n} option. This will show you what the +command \emph{would} send, without actually sending anything. Once +you've had a quick glance over the changes and verified that you are +sending the right ones, you can rerun the same command, with the +\hgxopt{patchbomb}{email}{-n} option removed. + +The \hgxcmd{patchbomb}{email} command accepts the same kind of +revision syntax as every other Mercurial command. For example, this +command will send every revision between 7 and \texttt{tip}, +inclusive. +\begin{codesample2} + hg email -n 7:tip +\end{codesample2} +You can also specify a \emph{repository} to compare with. If you +provide a repository but no revisions, the \hgxcmd{patchbomb}{email} +command will send all revisions in the local repository that are not +present in the remote repository. If you additionally specify +revisions or a branch name (the latter using the +\hgxopt{patchbomb}{email}{-b} option), this will constrain the +revisions sent. + +It's perfectly safe to run the \hgxcmd{patchbomb}{email} command +without the names of the people you want to send to: if you do this, +it will just prompt you for those values interactively. (If you're +using a Linux or Unix-like system, you should have enhanced +\texttt{readline}-style editing capabilities when entering those +headers, too, which is useful.) 
+ +When you are sending just one revision, the \hgxcmd{patchbomb}{email} +command will by default use the first line of the changeset +description as the subject of the single email message it sends. + +If you send multiple revisions, the \hgxcmd{patchbomb}{email} command +will usually send one message per changeset. It will preface the +series with an introductory message, in which you should describe the +purpose of the series of changes you're sending. + +\subsection{Changing the behaviour of patchbombs} + +Not every project has exactly the same conventions for sending changes +in email; the \hgext{patchbomb} extension tries to accommodate a +number of variations through command line options. +\begin{itemize} +\item You can write a subject for the introductory message on the + command line using the \hgxopt{patchbomb}{email}{-s} option. This + takes one argument, the text of the subject to use. +\item To change the email address from which the messages originate, + use the \hgxopt{patchbomb}{email}{-f} option. This takes one + argument, the email address to use. +\item The default behaviour is to send unified diffs (see + section~\ref{sec:mq:patch} for a description of the format), one per + message. You can send a binary bundle instead with the + \hgxopt{patchbomb}{email}{-b} option. +\item Unified diffs are normally prefaced with a metadata header. You + can omit this, and send unadorned diffs, with the + \hgxopt{patchbomb}{email}{--plain} option. +\item Diffs are normally sent ``inline'', in the same body part as the + description of a patch. This makes it easiest for the largest + number of readers to quote and respond to parts of a diff, as some + mail clients will only quote the first MIME body part in a message. + If you'd prefer to send the description and the diff in separate + body parts, use the \hgxopt{patchbomb}{email}{-a} option. 
+\item Instead of sending mail messages, you can write them to an + \texttt{mbox}-format mail folder using the + \hgxopt{patchbomb}{email}{-m} option. That option takes one + argument, the name of the file to write to. +\item If you would like to add a \command{diffstat}-format summary to + each patch, and one to the introductory message, use the + \hgxopt{patchbomb}{email}{-d} option. The \command{diffstat} + command displays a table containing the name of each file patched, + the number of lines affected, and a histogram showing how much each + file is modified. This gives readers a qualitative glance at how + complex a patch is. +\end{itemize} + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/cmdref.tex --- a/en/cmdref.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,176 +0,0 @@ -\chapter{Command reference} -\label{cmdref} - -\cmdref{add}{add files at the next commit} -\optref{add}{I}{include} -\optref{add}{X}{exclude} -\optref{add}{n}{dry-run} - -\cmdref{diff}{print changes in history or working directory} - -Show differences between revisions for the specified files or -directories, using the unified diff format. For a description of the -unified diff format, see section~\ref{sec:mq:patch}. - -By default, this command does not print diffs for files that Mercurial -considers to contain binary data. To control this behaviour, see the -\hgopt{diff}{-a} and \hgopt{diff}{--git} options. - -\subsection{Options} - -\loptref{diff}{nodates} - -Omit date and time information when printing diff headers. - -\optref{diff}{B}{ignore-blank-lines} - -Do not print changes that only insert or delete blank lines. A line -that contains only whitespace is not considered blank. - -\optref{diff}{I}{include} - -Include files and directories whose names match the given patterns. - -\optref{diff}{X}{exclude} - -Exclude files and directories whose names match the given patterns. 
- -\optref{diff}{a}{text} - -If this option is not specified, \hgcmd{diff} will refuse to print -diffs for files that it detects as binary. Specifying \hgopt{diff}{-a} -forces \hgcmd{diff} to treat all files as text, and generate diffs for -all of them. - -This option is useful for files that are ``mostly text'' but have a -few embedded NUL characters. If you use it on files that contain a -lot of binary data, its output will be incomprehensible. - -\optref{diff}{b}{ignore-space-change} - -Do not print a line if the only change to that line is in the amount -of white space it contains. - -\optref{diff}{g}{git} - -Print \command{git}-compatible diffs. XXX reference a format -description. - -\optref{diff}{p}{show-function} - -Display the name of the enclosing function in a hunk header, using a -simple heuristic. This functionality is enabled by default, so the -\hgopt{diff}{-p} option has no effect unless you change the value of -the \rcitem{diff}{showfunc} config item, as in the following example. -\interaction{cmdref.diff-p} - -\optref{diff}{r}{rev} - -Specify one or more revisions to compare. The \hgcmd{diff} command -accepts up to two \hgopt{diff}{-r} options to specify the revisions to -compare. - -\begin{enumerate} -\setcounter{enumi}{0} -\item Display the differences between the parent revision of the - working directory and the working directory. -\item Display the differences between the specified changeset and the - working directory. -\item Display the differences between the two specified changesets. -\end{enumerate} - -You can specify two revisions using either two \hgopt{diff}{-r} -options or revision range notation. For example, the two revision -specifications below are equivalent. -\begin{codesample2} - hg diff -r 10 -r 20 - hg diff -r10:20 -\end{codesample2} - -When you provide two revisions, Mercurial treats the order of those -revisions as significant. 
Thus, \hgcmdargs{diff}{-r10:20} will -produce a diff that will transform files from their contents as of -revision~10 to their contents as of revision~20, while -\hgcmdargs{diff}{-r20:10} means the opposite: the diff that will -transform files from their revision~20 contents to their revision~10 -contents. You cannot reverse the ordering in this way if you are -diffing against the working directory. - -\optref{diff}{w}{ignore-all-space} - -\cmdref{version}{print version and copyright information} - -This command displays the version of Mercurial you are running, and -its copyright license. There are four kinds of version string that -you may see. -\begin{itemize} -\item The string ``\texttt{unknown}''. This version of Mercurial was - not built in a Mercurial repository, and cannot determine its own - version. -\item A short numeric string, such as ``\texttt{1.1}''. This is a - build of a revision of Mercurial that was identified by a specific - tag in the repository where it was built. (This doesn't necessarily - mean that you're running an official release; someone else could - have added that tag to any revision in the repository where they - built Mercurial.) -\item A hexadecimal string, such as ``\texttt{875489e31abe}''. This - is a build of the given revision of Mercurial. -\item A hexadecimal string followed by a date, such as - ``\texttt{875489e31abe+20070205}''. This is a build of the given - revision of Mercurial, where the build repository contained some - local changes that had not been committed. -\end{itemize} - -\subsection{Tips and tricks} - -\subsubsection{Why do the results of \hgcmd{diff} and \hgcmd{status} - differ?} -\label{cmdref:diff-vs-status} - -When you run the \hgcmd{status} command, you'll see a list of files -that Mercurial will record changes for the next time you perform a -commit. If you run the \hgcmd{diff} command, you may notice that it -prints diffs for only a \emph{subset} of the files that \hgcmd{status} -listed. 
There are two possible reasons for this. - -The first is that \hgcmd{status} prints some kinds of modifications -that \hgcmd{diff} doesn't normally display. The \hgcmd{diff} command -normally outputs unified diffs, which don't have the ability to -represent some changes that Mercurial can track. Most notably, -traditional diffs can't represent a change in whether or not a file is -executable, but Mercurial records this information. - -If you use the \hgopt{diff}{--git} option to \hgcmd{diff}, it will -display \command{git}-compatible diffs that \emph{can} display this -extra information. - -The second possible reason that \hgcmd{diff} might be printing diffs -for a subset of the files displayed by \hgcmd{status} is that if you -invoke it without any arguments, \hgcmd{diff} prints diffs against the -first parent of the working directory. If you have run \hgcmd{merge} -to merge two changesets, but you haven't yet committed the results of -the merge, your working directory has two parents (use \hgcmd{parents} -to see them). While \hgcmd{status} prints modifications relative to -\emph{both} parents after an uncommitted merge, \hgcmd{diff} still -operates relative only to the first parent. You can get it to print -diffs relative to the second parent by specifying that parent with the -\hgopt{diff}{-r} option. There is no way to print diffs relative to -both parents. - -\subsubsection{Generating safe binary diffs} - -If you use the \hgopt{diff}{-a} option to force Mercurial to print -diffs of files that are either ``mostly text'' or contain lots of -binary data, those diffs cannot subsequently be applied by either -Mercurial's \hgcmd{import} command or the system's \command{patch} -command. - -If you want to generate a diff of a binary file that is safe to use as -input for \hgcmd{import}, use the \hgcmd{diff}{--git} option when you -generate the patch. The system \command{patch} command cannot handle -binary patches at all. 
- -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/collab.tex --- a/en/collab.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,1118 +0,0 @@ -\chapter{Collaborating with other people} -\label{cha:collab} - -As a completely decentralised tool, Mercurial doesn't impose any -policy on how people ought to work with each other. However, if -you're new to distributed revision control, it helps to have some -tools and examples in mind when you're thinking about possible -workflow models. - -\section{Mercurial's web interface} - -Mercurial has a powerful web interface that provides several -useful capabilities. - -For interactive use, the web interface lets you browse a single -repository or a collection of repositories. You can view the history -of a repository, examine each change (comments and diffs), and view -the contents of each directory and file. - -Also for human consumption, the web interface provides an RSS feed of -the changes in a repository. This lets you ``subscribe'' to a -repository using your favourite feed reader, and be automatically -notified of activity in that repository as soon as it happens. I find -this capability much more convenient than the model of subscribing to -a mailing list to which notifications are sent, as it requires no -additional configuration on the part of whoever is serving the -repository. - -The web interface also lets remote users clone a repository, pull -changes from it, and (when the server is configured to permit it) push -changes back to it. Mercurial's HTTP tunneling protocol aggressively -compresses data, so that it works efficiently even over low-bandwidth -network connections. - -The easiest way to get started with the web interface is to use your -web browser to visit an existing repository, such as the master -Mercurial repository at -\url{http://www.selenic.com/repo/hg?style=gitweb}. 
- -If you're interested in providing a web interface to your own -repositories, Mercurial provides two ways to do this. The first is -using the \hgcmd{serve} command, which is best suited to short-term -``lightweight'' serving. See section~\ref{sec:collab:serve} below for -details of how to use this command. If you have a long-lived -repository that you'd like to make permanently available, Mercurial -has built-in support for the CGI (Common Gateway Interface) standard, -which all common web servers support. See -section~\ref{sec:collab:cgi} for details of CGI configuration. - -\section{Collaboration models} - -With a suitably flexible tool, making decisions about workflow is much -more of a social engineering challenge than a technical one. -Mercurial imposes few limitations on how you can structure the flow of -work in a project, so it's up to you and your group to set up and live -with a model that matches your own particular needs. - -\subsection{Factors to keep in mind} - -The most important aspect of any model that you must keep in mind is -how well it matches the needs and capabilities of the people who will -be using it. This might seem self-evident; even so, you still can't -afford to forget it for a moment. - -I once put together a workflow model that seemed to make perfect sense -to me, but that caused a considerable amount of consternation and -strife within my development team. In spite of my attempts to explain -why we needed a complex set of branches, and how changes ought to flow -between them, a few team members revolted. Even though they were -smart people, they didn't want to pay attention to the constraints we -were operating under, or face the consequences of those constraints in -the details of the model that I was advocating. - -Don't sweep foreseeable social or technical problems under the rug. -Whatever scheme you put into effect, you should plan for mistakes and -problem scenarios. 
Consider adding automated machinery to prevent, or -quickly recover from, trouble that you can anticipate. As an example, -if you intend to have a branch with not-for-release changes in it, -you'd do well to think early about the possibility that someone might -accidentally merge those changes into a release branch. You could -avoid this particular problem by writing a hook that prevents changes -from being merged from an inappropriate branch. - -\subsection{Informal anarchy} - -I wouldn't suggest an ``anything goes'' approach as something -sustainable, but it's a model that's easy to grasp, and it works -perfectly well in a few unusual situations. - -As one example, many projects have a loose-knit group of collaborators -who rarely physically meet each other. Some groups like to overcome -the isolation of working at a distance by organising occasional -``sprints''. In a sprint, a number of people get together in a single -location (a company's conference room, a hotel meeting room, that kind -of place) and spend several days more or less locked in there, hacking -intensely on a handful of projects. - -A sprint is the perfect place to use the \hgcmd{serve} command, since -\hgcmd{serve} does not requires any fancy server infrastructure. You -can get started with \hgcmd{serve} in moments, by reading -section~\ref{sec:collab:serve} below. Then simply tell the person -next to you that you're running a server, send the URL to them in an -instant message, and you immediately have a quick-turnaround way to -work together. They can type your URL into their web browser and -quickly review your changes; or they can pull a bugfix from you and -verify it; or they can clone a branch containing a new feature and try -it out. - -The charm, and the problem, with doing things in an ad hoc fashion -like this is that only people who know about your changes, and where -they are, can see them. 
Such an informal approach simply doesn't -scale beyond a handful people, because each individual needs to know -about $n$ different repositories to pull from. - -\subsection{A single central repository} - -For smaller projects migrating from a centralised revision control -tool, perhaps the easiest way to get started is to have changes flow -through a single shared central repository. This is also the -most common ``building block'' for more ambitious workflow schemes. - -Contributors start by cloning a copy of this repository. They can -pull changes from it whenever they need to, and some (perhaps all) -developers have permission to push a change back when they're ready -for other people to see it. - -Under this model, it can still often make sense for people to pull -changes directly from each other, without going through the central -repository. Consider a case in which I have a tentative bug fix, but -I am worried that if I were to publish it to the central repository, -it might subsequently break everyone else's trees as they pull it. To -reduce the potential for damage, I can ask you to clone my repository -into a temporary repository of your own and test it. This lets us put -off publishing the potentially unsafe change until it has had a little -testing. - -In this kind of scenario, people usually use the \command{ssh} -protocol to securely push changes to the central repository, as -documented in section~\ref{sec:collab:ssh}. It's also usual to -publish a read-only copy of the repository over HTTP using CGI, as in -section~\ref{sec:collab:cgi}. Publishing over HTTP satisfies the -needs of people who don't have push access, and those who want to use -web browsers to browse the repository's history. - -\subsection{Working with multiple branches} - -Projects of any significant size naturally tend to make progress on -several fronts simultaneously. In the case of software, it's common -for a project to go through periodic official releases. 
A release -might then go into ``maintenance mode'' for a while after its first -publication; maintenance releases tend to contain only bug fixes, not -new features. In parallel with these maintenance releases, one or -more future releases may be under development. People normally use -the word ``branch'' to refer to one of these many slightly different -directions in which development is proceeding. - -Mercurial is particularly well suited to managing a number of -simultaneous, but not identical, branches. Each ``development -direction'' can live in its own central repository, and you can merge -changes from one to another as the need arises. Because repositories -are independent of each other, unstable changes in a development -branch will never affect a stable branch unless someone explicitly -merges those changes in. - -Here's an example of how this can work in practice. Let's say you -have one ``main branch'' on a central server. -\interaction{branching.init} -People clone it, make changes locally, test them, and push them back. - -Once the main branch reaches a release milestone, you can use the -\hgcmd{tag} command to give a permanent name to the milestone -revision. -\interaction{branching.tag} -Let's say some ongoing development occurs on the main branch. -\interaction{branching.main} -Using the tag that was recorded at the milestone, people who clone -that repository at any time in the future can use \hgcmd{update} to -get a copy of the working directory exactly as it was when that tagged -revision was committed. -\interaction{branching.update} - -In addition, immediately after the main branch is tagged, someone can -then clone the main branch on the server to a new ``stable'' branch, -also on the server. -\interaction{branching.clone} - -Someone who needs to make a change to the stable branch can then clone -\emph{that} repository, make their changes, commit, and push their -changes back there. 
-\interaction{branching.stable} -Because Mercurial repositories are independent, and Mercurial doesn't -move changes around automatically, the stable and main branches are -\emph{isolated} from each other. The changes that you made on the -main branch don't ``leak'' to the stable branch, and vice versa. - -You'll often want all of your bugfixes on the stable branch to show up -on the main branch, too. Rather than rewrite a bugfix on the main -branch, you can simply pull and merge changes from the stable to the -main branch, and Mercurial will bring those bugfixes in for you. -\interaction{branching.merge} -The main branch will still contain changes that are not on the stable -branch, but it will also contain all of the bugfixes from the stable -branch. The stable branch remains unaffected by these changes. - -\subsection{Feature branches} - -For larger projects, an effective way to manage change is to break up -a team into smaller groups. Each group has a shared branch of its -own, cloned from a single ``master'' branch used by the entire -project. People working on an individual branch are typically quite -isolated from developments on other branches. - -\begin{figure}[ht] - \centering - \grafix{feature-branches} - \caption{Feature branches} - \label{fig:collab:feature-branches} -\end{figure} - -When a particular feature is deemed to be in suitable shape, someone -on that feature team pulls and merges from the master branch into the -feature branch, then pushes back up to the master branch. - -\subsection{The release train} - -Some projects are organised on a ``train'' basis: a release is -scheduled to happen every few months, and whatever features are ready -when the ``train'' is ready to leave are allowed in. - -This model resembles working with feature branches. 
The difference is -that when a feature branch misses a train, someone on the feature team -pulls and merges the changes that went out on that train release into -the feature branch, and the team continues its work on top of that -release so that their feature can make the next release. - -\subsection{The Linux kernel model} - -The development of the Linux kernel has a shallow hierarchical -structure, surrounded by a cloud of apparent chaos. Because most -Linux developers use \command{git}, a distributed revision control -tool with capabilities similar to Mercurial, it's useful to describe -the way work flows in that environment; if you like the ideas, the -approach translates well across tools. - -At the center of the community sits Linus Torvalds, the creator of -Linux. He publishes a single source repository that is considered the -``authoritative'' current tree by the entire developer community. -Anyone can clone Linus's tree, but he is very choosy about whose trees -he pulls from. - -Linus has a number of ``trusted lieutenants''. As a general rule, he -pulls whatever changes they publish, in most cases without even -reviewing those changes. Some of those lieutenants are generally -agreed to be ``maintainers'', responsible for specific subsystems -within the kernel. If a random kernel hacker wants to make a change -to a subsystem that they want to end up in Linus's tree, they must -find out who the subsystem's maintainer is, and ask that maintainer to -take their change. If the maintainer reviews their changes and agrees -to take them, they'll pass them along to Linus in due course. - -Individual lieutenants have their own approaches to reviewing, -accepting, and publishing changes; and for deciding when to feed them -to Linus. In addition, there are several well known branches that -people use for different purposes. For example, a few people maintain -``stable'' repositories of older versions of the kernel, to which they -apply critical fixes as needed. 
Some maintainers publish multiple -trees: one for experimental changes; one for changes that they are -about to feed upstream; and so on. Others just publish a single -tree. - -This model has two notable features. The first is that it's ``pull -only''. You have to ask, convince, or beg another developer to take a -change from you, because there are almost no trees to which more than -one person can push, and there's no way to push changes into a tree -that someone else controls. - -The second is that it's based on reputation and acclaim. If you're an -unknown, Linus will probably ignore changes from you without even -responding. But a subsystem maintainer will probably review them, and -will likely take them if they pass their criteria for suitability. -The more ``good'' changes you contribute to a maintainer, the more -likely they are to trust your judgment and accept your changes. If -you're well-known and maintain a long-lived branch for something Linus -hasn't yet accepted, people with similar interests may pull your -changes regularly to keep up with your work. - -Reputation and acclaim don't necessarily cross subsystem or ``people'' -boundaries. If you're a respected but specialised storage hacker, and -you try to fix a networking bug, that change will receive a level of -scrutiny from a network maintainer comparable to a change from a -complete stranger. - -To people who come from more orderly project backgrounds, the -comparatively chaotic Linux kernel development process often seems -completely insane. It's subject to the whims of individuals; people -make sweeping changes whenever they deem it appropriate; and the pace -of development is astounding. And yet Linux is a highly successful, -well-regarded piece of software. 
- -\subsection{Pull-only versus shared-push collaboration} - -A perpetual source of heat in the open source community is whether a -development model in which people only ever pull changes from others -is ``better than'' one in which multiple people can push changes to a -shared repository. - -Typically, the backers of the shared-push model use tools that -actively enforce this approach. If you're using a centralised -revision control tool such as Subversion, there's no way to make a -choice over which model you'll use: the tool gives you shared-push, -and if you want to do anything else, you'll have to roll your own -approach on top (such as applying a patch by hand). - -A good distributed revision control tool, such as Mercurial, will -support both models. You and your collaborators can then structure -how you work together based on your own needs and preferences, not on -what contortions your tools force you into. - -\subsection{Where collaboration meets branch management} - -Once you and your team set up some shared repositories and start -propagating changes back and forth between local and shared repos, you -begin to face a related, but slightly different challenge: that of -managing the multiple directions in which your team may be moving at -once. Even though this subject is intimately related to how your team -collaborates, it's dense enough to merit treatment of its own, in -chapter~\ref{chap:branch}. - -\section{The technical side of sharing} - -The remainder of this chapter is devoted to the question of serving -data to your collaborators. - -\section{Informal sharing with \hgcmd{serve}} -\label{sec:collab:serve} - -Mercurial's \hgcmd{serve} command is wonderfully suited to small, -tight-knit, and fast-paced group environments. It also provides a -great way to get a feel for using Mercurial commands over a network. 
- -Run \hgcmd{serve} inside a repository, and in under a second it will -bring up a specialised HTTP server; this will accept connections from -any client, and serve up data for that repository until you terminate -it. Anyone who knows the URL of the server you just started, and can -talk to your computer over the network, can then use a web browser or -Mercurial to read data from that repository. A URL for a -\hgcmd{serve} instance running on a laptop is likely to look something -like \Verb|http://my-laptop.local:8000/|. - -The \hgcmd{serve} command is \emph{not} a general-purpose web server. -It can do only two things: -\begin{itemize} -\item Allow people to browse the history of the repository it's - serving, from their normal web browsers. -\item Speak Mercurial's wire protocol, so that people can - \hgcmd{clone} or \hgcmd{pull} changes from that repository. -\end{itemize} -In particular, \hgcmd{serve} won't allow remote users to \emph{modify} -your repository. It's intended for read-only use. - -If you're getting started with Mercurial, there's nothing to prevent -you from using \hgcmd{serve} to serve up a repository on your own -computer, then use commands like \hgcmd{clone}, \hgcmd{incoming}, and -so on to talk to that server as if the repository was hosted remotely. -This can help you to quickly get acquainted with using commands on -network-hosted repositories. - -\subsection{A few things to keep in mind} - -Because it provides unauthenticated read access to all clients, you -should only use \hgcmd{serve} in an environment where you either don't -care, or have complete control over, who can access your network and -pull data from your repository. - -The \hgcmd{serve} command knows nothing about any firewall software -you might have installed on your system or network. It cannot detect -or control your firewall software. 
If other people are unable to talk -to a running \hgcmd{serve} instance, the second thing you should do -(\emph{after} you make sure that they're using the correct URL) is -check your firewall configuration. - -By default, \hgcmd{serve} listens for incoming connections on -port~8000. If another process is already listening on the port you -want to use, you can specify a different port to listen on using the -\hgopt{serve}{-p} option. - -Normally, when \hgcmd{serve} starts, it prints no output, which can be -a bit unnerving. If you'd like to confirm that it is indeed running -correctly, and find out what URL you should send to your -collaborators, start it with the \hggopt{-v} option. - -\section{Using the Secure Shell (ssh) protocol} -\label{sec:collab:ssh} - -You can pull and push changes securely over a network connection using -the Secure Shell (\texttt{ssh}) protocol. To use this successfully, -you may have to do a little bit of configuration on the client or -server sides. - -If you're not familiar with ssh, it's a network protocol that lets you -securely communicate with another computer. To use it with Mercurial, -you'll be setting up one or more user accounts on a server so that -remote users can log in and execute commands. - -(If you \emph{are} familiar with ssh, you'll probably find some of the -material that follows to be elementary in nature.) - -\subsection{How to read and write ssh URLs} - -An ssh URL tends to look like this: -\begin{codesample2} - ssh://bos@hg.serpentine.com:22/hg/hgbook -\end{codesample2} -\begin{enumerate} -\item The ``\texttt{ssh://}'' part tells Mercurial to use the ssh - protocol. -\item The ``\texttt{bos@}'' component indicates what username to log - into the server as. You can leave this out if the remote username - is the same as your local username. -\item The ``\texttt{hg.serpentine.com}'' gives the hostname of the - server to log into. -\item The ``:22'' identifies the port number to connect to the server - on. 
The default port is~22, so you only need to specify this part - if you're \emph{not} using port~22. -\item The remainder of the URL is the local path to the repository on - the server. -\end{enumerate} - -There's plenty of scope for confusion with the path component of ssh -URLs, as there is no standard way for tools to interpret it. Some -programs behave differently than others when dealing with these paths. -This isn't an ideal situation, but it's unlikely to change. Please -read the following paragraphs carefully. - -Mercurial treats the path to a repository on the server as relative to -the remote user's home directory. For example, if user \texttt{foo} -on the server has a home directory of \dirname{/home/foo}, then an ssh -URL that contains a path component of \dirname{bar} -\emph{really} refers to the directory \dirname{/home/foo/bar}. - -If you want to specify a path relative to another user's home -directory, you can use a path that starts with a tilde character -followed by the user's name (let's call them \texttt{otheruser}), like -this. -\begin{codesample2} - ssh://server/~otheruser/hg/repo -\end{codesample2} - -And if you really want to specify an \emph{absolute} path on the -server, begin the path component with two slashes, as in this example. -\begin{codesample2} - ssh://server//absolute/path -\end{codesample2} - -\subsection{Finding an ssh client for your system} - -Almost every Unix-like system comes with OpenSSH preinstalled. If -you're using such a system, run \Verb|which ssh| to find out if -the \command{ssh} command is installed (it's usually in -\dirname{/usr/bin}). In the unlikely event that it isn't present, -take a look at your system documentation to figure out how to install -it. - -On Windows, you'll first need to download a suitable ssh -client. There are two alternatives. -\begin{itemize} -\item Simon Tatham's excellent PuTTY package~\cite{web:putty} provides - a complete suite of ssh client commands. 
-\item If you have a high tolerance for pain, you can use the Cygwin
-  port of OpenSSH.
-\end{itemize}
-In either case, you'll need to edit your \hgini\ file to tell
-Mercurial where to find the actual client command. For example, if
-you're using PuTTY, you'll need to use the \command{plink} command as
-a command-line ssh client.
-\begin{codesample2}
-  [ui]
-  ssh = C:/path/to/plink.exe -ssh -i "C:/path/to/my/private/key"
-\end{codesample2}
-
-\begin{note}
-  The path to \command{plink} shouldn't contain any whitespace
-  characters, or Mercurial may not be able to run it correctly (so
-  putting it in \dirname{C:\\Program Files} is probably not a good
-  idea).
-\end{note}
-
-\subsection{Generating a key pair}
-
-To avoid having to type a password every time you use your ssh
-client, I recommend generating a key pair. On a Unix-like system, the
-\command{ssh-keygen} command will do the trick. On Windows, if you're
-using PuTTY, the \command{puttygen} command is what you'll need.
-
-When you generate a key pair, it's usually \emph{highly} advisable to
-protect it with a passphrase. (The only time that you might not want
-to do this is when you're using the ssh protocol for automated tasks
-on a secure network.)
-
-Simply generating a key pair isn't enough, however. You'll need to
-add the public key to the set of authorised keys for whatever user
-you're logging in remotely as. For servers using OpenSSH (the vast
-majority), this will mean adding the public key to a list in a file
-called \sfilename{authorized\_keys} in their \sdirname{.ssh}
-directory.
-
-On a Unix-like system, your public key will have a \filename{.pub}
-extension. If you're using \command{puttygen} on Windows, you can
-save the public key to a file of your choosing, or paste it from the
-window it's displayed in straight into the
-\sfilename{authorized\_keys} file.
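-
-If you have shell access to the server, you can also install a public
-key by hand. The Python sketch below shows one way to do this. It is
-an illustration, not part of Mercurial or OpenSSH, and
-\texttt{install\_public\_key} is a hypothetical helper. It assumes an
-OpenSSH server, and that you run it as the remote user on the server.
-It sets restrictive permissions because, as described later in this
-chapter, the ssh daemon will not trust an \sfilename{authorized\_keys}
-file, or a \sdirname{.ssh} directory, that anyone other than the user
-can write to.
-\begin{codesample2}
-  import os
-
-  def install_public_key(home, pubkey_line):
-      # Hypothetical helper: append one public key (the single line
-      # found in a .pub file) to HOME/.ssh/authorized_keys.
-      ssh_dir = os.path.join(home, ".ssh")
-      os.makedirs(ssh_dir, exist_ok=True)
-      os.chmod(ssh_dir, 0o700)        # only the owner may use .ssh
-      keys_path = os.path.join(ssh_dir, "authorized_keys")
-      line = pubkey_line.strip()
-      existing = []
-      if os.path.exists(keys_path):
-          with open(keys_path) as f:
-              existing = [entry.strip() for entry in f]
-      if line not in existing:        # don't add the same key twice
-          with open(keys_path, "a") as f:
-              print(line, file=f)
-      os.chmod(keys_path, 0o600)      # owner may read and write
-      return keys_path
-\end{codesample2}
-The sketch leaves the permissions of the home directory itself alone;
-if that directory is group- or world-writable, the ssh daemon may
-still refuse to use the key.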
-
-\subsection{Using an authentication agent}
-
-An authentication agent is a daemon that keeps your unlocked private
-keys in memory (so it will forget them if you log out and log back in
-again). An ssh client will notice if an agent is running, and ask it
-to authenticate on your behalf. If there's no authentication agent
-running, or the agent doesn't hold the necessary key, you'll have to
-type your passphrase every time Mercurial tries to communicate with a
-server on your behalf (e.g.~whenever you pull or push changes).
-
-The downside of keeping unlocked keys in an agent is that a
-well-prepared attacker may be able to recover them from memory, in
-some cases even after your system has been power-cycled. You should
-make your own judgment as to whether this is an acceptable risk. It
-certainly saves a lot of repeated typing.
-
-On Unix-like systems, the agent is called \command{ssh-agent}, and
-it's often run automatically for you when you log in. You'll need to
-use the \command{ssh-add} command to load your keys into the agent;
-it asks for each key's passphrase once. On Windows, if you're using
-PuTTY, the \command{pageant} command acts as the agent. It adds an
-icon to your system tray that will let you manage the keys it holds.
-
-\subsection{Configuring the server side properly}
-
-Because ssh can be fiddly to set up if you're new to it, there's a
-variety of things that can go wrong. Add Mercurial on top, and
-there's plenty more scope for head-scratching. Most of these
-potential problems occur on the server side, not the client side. The
-good news is that once you've gotten a configuration working, it will
-usually continue to work indefinitely.
-
-Before you try using Mercurial to talk to an ssh server, it's best to
-make sure that you can use the normal \command{ssh} or \command{putty}
-command to talk to the server first. If you run into problems with
-using these commands directly, Mercurial surely won't work. Worse, it
-will obscure the underlying problem.
Any time you want to debug
-ssh-related Mercurial problems, you should drop back to making sure
-that plain ssh client commands work first, \emph{before} you worry
-about whether there's a problem with Mercurial.
-
-The first thing to be sure of on the server side is that you can
-actually log in from another machine at all. If you can't use
-\command{ssh} or \command{putty} to log in, the error message you get
-may give you a few hints as to what's wrong. The most common problems
-are as follows.
-\begin{itemize}
-\item If you get a ``connection refused'' error, either there isn't an
-  ssh daemon running on the server at all, or it's inaccessible due to
-  firewall configuration.
-\item If you get a ``no route to host'' error, you either have an
-  incorrect address for the server or a seriously locked-down firewall
-  that won't admit its existence at all.
-\item If you get a ``permission denied'' error, you may have mistyped
-  the username on the server, or you could have mistyped your key's
-  passphrase or the remote user's password.
-\end{itemize}
-In summary, if you're having trouble talking to the server's ssh
-daemon, first make sure that one is running at all. On many systems
-it will be installed, but disabled, by default. Once you're done with
-this step, you should then check that the server's firewall is
-configured to allow incoming connections on the port the ssh daemon is
-listening on (usually~22). Don't worry about more exotic
-possibilities for misconfiguration until you've checked these two
-first.
-
-If you're using an authentication agent on the client side to hold
-your keys, you ought to be able to log into the server without being
-prompted for a passphrase or a password. If you're prompted for a
-passphrase, there are a few possible culprits.
-\begin{itemize}
-\item You might have forgotten to use \command{ssh-add} or
-  \command{pageant} to load your key into the agent.
-\item You might have loaded the wrong key into the agent.
-\end{itemize}
-If you're being prompted for the remote user's password, there are a
-few other possible problems to check.
-\begin{itemize}
-\item Either the user's home directory or their \sdirname{.ssh}
-  directory might have excessively liberal permissions. As a result,
-  the ssh daemon will not trust or read their
-  \sfilename{authorized\_keys} file. For example, a group-writable
-  home or \sdirname{.ssh} directory will often cause this symptom.
-\item The user's \sfilename{authorized\_keys} file may have a problem.
-  If anyone other than the user owns or can write to that file, the
-  ssh daemon will not trust or read it.
-\end{itemize}
-
-Ideally, you should be able to run the following command
-successfully, and it should print exactly one line of output, the
-current date and time.
-\begin{codesample2}
-  ssh myserver date
-\end{codesample2}
-
-If, on your server, you have login scripts that print banners or other
-junk even when running non-interactive commands like this, you should
-fix them before you continue, so that they only print output if
-they're run interactively. Otherwise these banners will at least
-clutter up Mercurial's output. Worse, they could cause problems when
-running Mercurial commands remotely. Mercurial tries to detect and
-ignore banners in non-interactive \command{ssh} sessions, but it is
-not foolproof. (If you're editing your login scripts on your server,
-the usual way to see if a login script is running in an interactive
-shell is to check the return code from the command \Verb|tty -s|.)
-
-Once you've verified that plain old ssh is working with your server,
-the next step is to ensure that Mercurial runs on the server. The
-following command should run successfully:
-\begin{codesample2}
-  ssh myserver hg version
-\end{codesample2}
-If you see an error message instead of normal \hgcmd{version} output,
-this is usually because you haven't installed Mercurial to
-\dirname{/usr/bin}.
Don't worry if this is the case; you don't need -to do that. But you should check for a few possible problems. -\begin{itemize} -\item Is Mercurial really installed on the server at all? I know this - sounds trivial, but it's worth checking! -\item Maybe your shell's search path (usually set via the \envar{PATH} - environment variable) is simply misconfigured. -\item Perhaps your \envar{PATH} environment variable is only being set - to point to the location of the \command{hg} executable if the login - session is interactive. This can happen if you're setting the path - in the wrong shell login script. See your shell's documentation for - details. -\item The \envar{PYTHONPATH} environment variable may need to contain - the path to the Mercurial Python modules. It might not be set at - all; it could be incorrect; or it may be set only if the login is - interactive. -\end{itemize} - -If you can run \hgcmd{version} over an ssh connection, well done! -You've got the server and client sorted out. You should now be able -to use Mercurial to access repositories hosted by that username on -that server. If you run into problems with Mercurial and ssh at this -point, try using the \hggopt{--debug} option to get a clearer picture -of what's going on. - -\subsection{Using compression with ssh} - -Mercurial does not compress data when it uses the ssh protocol, -because the ssh protocol can transparently compress data. However, -the default behaviour of ssh clients is \emph{not} to request -compression. - -Over any network other than a fast LAN (even a wireless network), -using compression is likely to significantly speed up Mercurial's -network operations. For example, over a WAN, someone measured -compression as reducing the amount of time required to clone a -particularly large repository from~51 minutes to~17 minutes. - -Both \command{ssh} and \command{plink} accept a \cmdopt{ssh}{-C} -option which turns on compression. 
You can easily edit your \hgrc\ to -enable compression for all of Mercurial's uses of the ssh protocol. -\begin{codesample2} - [ui] - ssh = ssh -C -\end{codesample2} - -If you use \command{ssh}, you can configure it to always use -compression when talking to your server. To do this, edit your -\sfilename{.ssh/config} file (which may not yet exist), as follows. -\begin{codesample2} - Host hg - Compression yes - HostName hg.example.com -\end{codesample2} -This defines an alias, \texttt{hg}. When you use it on the -\command{ssh} command line or in a Mercurial \texttt{ssh}-protocol -URL, it will cause \command{ssh} to connect to \texttt{hg.example.com} -and use compression. This gives you both a shorter name to type and -compression, each of which is a good thing in its own right. - -\section{Serving over HTTP using CGI} -\label{sec:collab:cgi} - -Depending on how ambitious you are, configuring Mercurial's CGI -interface can take anything from a few moments to several hours. - -We'll begin with the simplest of examples, and work our way towards a -more complex configuration. Even for the most basic case, you're -almost certainly going to need to read and modify your web server's -configuration. - -\begin{note} - Configuring a web server is a complex, fiddly, and highly - system-dependent activity. I can't possibly give you instructions - that will cover anything like all of the cases you will encounter. - Please use your discretion and judgment in following the sections - below. Be prepared to make plenty of mistakes, and to spend a lot - of time reading your server's error logs. -\end{note} - -\subsection{Web server configuration checklist} - -Before you continue, do take a few moments to check a few aspects of -your system's setup. - -\begin{enumerate} -\item Do you have a web server installed at all? Mac OS X ships with - Apache, but many other systems may not have a web server installed. -\item If you have a web server installed, is it actually running? 
On - most systems, even if one is present, it will be disabled by - default. -\item Is your server configured to allow you to run CGI programs in - the directory where you plan to do so? Most servers default to - explicitly disabling the ability to run CGI programs. -\end{enumerate} - -If you don't have a web server installed, and don't have substantial -experience configuring Apache, you should consider using the -\texttt{lighttpd} web server instead of Apache. Apache has a -well-deserved reputation for baroque and confusing configuration. -While \texttt{lighttpd} is less capable in some ways than Apache, most -of these capabilities are not relevant to serving Mercurial -repositories. And \texttt{lighttpd} is undeniably \emph{much} easier -to get started with than Apache. - -\subsection{Basic CGI configuration} - -On Unix-like systems, it's common for users to have a subdirectory -named something like \dirname{public\_html} in their home directory, -from which they can serve up web pages. A file named \filename{foo} -in this directory will be accessible at a URL of the form -\texttt{http://www.example.com/\~{}username/foo}. - -To get started, find the \sfilename{hgweb.cgi} script that should be -present in your Mercurial installation. If you can't quickly find a -local copy on your system, simply download one from the master -Mercurial repository at -\url{http://www.selenic.com/repo/hg/raw-file/tip/hgweb.cgi}. - -You'll need to copy this script into your \dirname{public\_html} -directory, and ensure that it's executable. -\begin{codesample2} - cp .../hgweb.cgi ~/public_html - chmod 755 ~/public_html/hgweb.cgi -\end{codesample2} -The \texttt{755} argument to \command{chmod} is a little more general -than just making the script executable: it ensures that the script is -executable by anyone, and that ``group'' and ``other'' write -permissions are \emph{not} set. 
If you were to leave those write -permissions enabled, Apache's \texttt{suexec} subsystem would likely -refuse to execute the script. In fact, \texttt{suexec} also insists -that the \emph{directory} in which the script resides must not be -writable by others. -\begin{codesample2} - chmod 755 ~/public_html -\end{codesample2} - -\subsubsection{What could \emph{possibly} go wrong?} -\label{sec:collab:wtf} - -Once you've copied the CGI script into place, go into a web browser, -and try to open the URL \url{http://myhostname/~myuser/hgweb.cgi}, -\emph{but} brace yourself for instant failure. There's a high -probability that trying to visit this URL will fail, and there are -many possible reasons for this. In fact, you're likely to stumble -over almost every one of the possible errors below, so please read -carefully. The following are all of the problems I ran into on a -system running Fedora~7, with a fresh installation of Apache, and a -user account that I created specially to perform this exercise. - -Your web server may have per-user directories disabled. If you're -using Apache, search your config file for a \texttt{UserDir} -directive. If there's none present, per-user directories will be -disabled. If one exists, but its value is \texttt{disabled}, then -per-user directories will be disabled. Otherwise, the string after -\texttt{UserDir} gives the name of the subdirectory that Apache will -look in under your home directory, for example \dirname{public\_html}. - -Your file access permissions may be too restrictive. The web server -must be able to traverse your home directory and directories under -your \dirname{public\_html} directory, and read files under the latter -too. Here's a quick recipe to help you to make your permissions more -appropriate. 
-\begin{codesample2}
-  chmod 755 ~
-  find ~/public_html -type d -print0 | xargs -0r chmod 755
-  find ~/public_html -type f -print0 | xargs -0r chmod 644
-\end{codesample2}
-
-The other possibility with permissions is that you might get a
-completely empty window when you try to load the script. In this
-case, it's likely that your access permissions are \emph{too
-  permissive}. Apache's \texttt{suexec} subsystem won't execute a
-script that's group-~or world-writable, for example.
-
-Your web server may be configured to disallow execution of CGI
-programs in your per-user web directory. Here's Apache's
-default per-user configuration from my Fedora system.
-\begin{codesample2}
-  <Directory /home/*/public_html>
-    AllowOverride FileInfo AuthConfig Limit
-    Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
-    <Limit GET POST OPTIONS>
-      Order allow,deny
-      Allow from all
-    </Limit>
-    <LimitExcept GET POST OPTIONS>
-      Order deny,allow
-      Deny from all
-    </LimitExcept>
-  </Directory>
-\end{codesample2}
-If you find a similar-looking \texttt{Directory} group in your Apache
-configuration, the directive to look at inside it is \texttt{Options}.
-Add \texttt{ExecCGI} to the end of this list if it's missing, and
-restart the web server.
-
-If you find that Apache serves you the text of the CGI script instead
-of executing it, you may need to either uncomment (if already present)
-or add a directive like this.
-\begin{codesample2}
-  AddHandler cgi-script .cgi
-\end{codesample2}
-
-The next possibility is that you might be served with a colourful
-Python backtrace claiming that it can't import a
-\texttt{mercurial}-related module. This is actually progress! The
-server is now capable of executing your CGI script. This error is
-only likely to occur if you're running a private installation of
-Mercurial, instead of a system-wide version. Remember that the web
-server runs the CGI program without any of the environment variables
-that you take for granted in an interactive session.
If this error
-happens to you, edit your copy of \sfilename{hgweb.cgi} and follow the
-directions inside it to correctly set your \envar{PYTHONPATH}
-environment variable.
-
-Finally, you are \emph{certain} to be served with another colourful
-Python backtrace: this one will complain that it can't find
-\dirname{/path/to/repository}. Edit your \sfilename{hgweb.cgi} script
-and replace the \dirname{/path/to/repository} string with the complete
-path to the repository you want to serve up.
-
-At this point, when you try to reload the page, you should be
-presented with a nice HTML view of your repository's history. Whew!
-
-\subsubsection{Configuring lighttpd}
-
-To be exhaustive in my experiments, I tried configuring the
-increasingly popular \texttt{lighttpd} web server to serve the same
-repository as I described with Apache above. I had already overcome
-all of the problems I outlined with Apache, many of which are not
-server-specific. As a result, I was fairly sure that my file and
-directory permissions were good, and that my \sfilename{hgweb.cgi}
-script was properly edited.
-
-Once I had Apache running, getting \texttt{lighttpd} to serve the
-repository was a snap (in other words, even if you're trying to use
-\texttt{lighttpd}, you should read the Apache section). I first had
-to edit the \texttt{mod\_access} section of its config file to enable
-\texttt{mod\_cgi} and \texttt{mod\_userdir}, both of which were
-disabled by default on my system. I then added a few lines to the end
-of the config file, to configure these modules.
-\begin{codesample2}
-  userdir.path = "public_html"
-  cgi.assign = ( ".cgi" => "" )
-\end{codesample2}
-With this done, \texttt{lighttpd} ran immediately for me. If I had
-configured \texttt{lighttpd} before Apache, I'd almost certainly have
-run into many of the same system-level configuration problems as I did
-with Apache.
However, I found \texttt{lighttpd} to be noticeably
-easier to configure than Apache, even though I've used Apache for over
-a decade, and this was my first exposure to \texttt{lighttpd}.
-
-\subsection{Sharing multiple repositories with one CGI script}
-
-The \sfilename{hgweb.cgi} script only lets you publish a single
-repository, which is an annoying restriction. If you want to publish
-more than one without maintaining multiple copies of the same script,
-each under a different name, a better choice is to use the
-\sfilename{hgwebdir.cgi} script.
-
-The procedure to configure \sfilename{hgwebdir.cgi} is only a little
-more involved than for \sfilename{hgweb.cgi}. First, you must obtain
-a copy of the script. If you don't have one handy, you can download a
-copy from the master Mercurial repository at
-\url{http://www.selenic.com/repo/hg/raw-file/tip/hgwebdir.cgi}.
-
-You'll need to copy this script into your \dirname{public\_html}
-directory, and ensure that it's executable.
-\begin{codesample2}
-  cp .../hgwebdir.cgi ~/public_html
-  chmod 755 ~/public_html ~/public_html/hgwebdir.cgi
-\end{codesample2}
-With basic configuration out of the way, try to visit
-\url{http://myhostname/~myuser/hgwebdir.cgi} in your browser. It
-should display an empty list of repositories. If you get a blank
-window or error message, try walking through the list of potential
-problems in section~\ref{sec:collab:wtf}.
-
-The \sfilename{hgwebdir.cgi} script relies on an external
-configuration file. By default, it searches for a file named
-\sfilename{hgweb.config} in the same directory as itself. You'll need
-to create this file, and make it world-readable. The format of the
-file is similar to a Windows ``ini'' file, as understood by Python's
-\texttt{ConfigParser}~\cite{web:configparser} module.
-
-The easiest way to configure \sfilename{hgwebdir.cgi} is with a
-section named \texttt{collections}.
This will automatically publish
-\emph{every} repository under the directories you name. The section
-should look like this:
-\begin{codesample2}
-  [collections]
-  /my/root = /my/root
-\end{codesample2}
-Mercurial interprets this by looking at the directory name on the
-\emph{right}-hand side of the ``\texttt{=}'' sign; finding
-repositories in that directory hierarchy; and using the text on the
-\emph{left} to strip off matching text from the names it will actually
-list in the web interface. The remaining component of a path after
-this stripping has occurred is called a ``virtual path''.
-
-Given the example above, if we have a repository whose local path is
-\dirname{/my/root/this/repo}, the CGI script will strip the leading
-\dirname{/my/root} from the name, and publish the repository with a
-virtual path of \dirname{this/repo}. If the base URL for our CGI
-script is \url{http://myhostname/~myuser/hgwebdir.cgi}, the complete
-URL for that repository will be
-\url{http://myhostname/~myuser/hgwebdir.cgi/this/repo}.
-
-If we replace \dirname{/my/root} on the left-hand side of this example
-with \dirname{/my}, then \sfilename{hgwebdir.cgi} will only strip off
-\dirname{/my} from the repository name, and will give us a virtual
-path of \dirname{root/this/repo} instead of \dirname{this/repo}.
-
-The \sfilename{hgwebdir.cgi} script will recursively search each
-directory listed in the \texttt{collections} section of its
-configuration file, but it will \emph{not} recurse into the
-repositories it finds.
-
-The \texttt{collections} mechanism makes it easy to publish many
-repositories in a ``fire and forget'' manner. You only need to set up
-the CGI script and configuration file one time. Afterwards, you can
-publish or unpublish a repository at any time by simply moving it
-into, or out of, the directory hierarchy in which you've configured
-\sfilename{hgwebdir.cgi} to look.
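-
-The stripping rule is simple enough to model in a few lines. The
-following Python sketch illustrates the behaviour described above; it
-is not the code that \sfilename{hgwebdir.cgi} actually runs. Both
-function names are hypothetical, and the sketch assumes POSIX-style
-paths and that every repository lies under the left-hand prefix.
-\begin{codesample2}
-  import os
-
-  def find_repositories(root):
-      # Walk the tree under root, yielding each directory that
-      # contains a .hg subdirectory.  Once a repository is found,
-      # don't descend into it.
-      for dirpath, dirnames, _ in os.walk(root):
-          if ".hg" in dirnames:
-              yield dirpath
-              dirnames[:] = []      # prune: don't recurse into a repo
-          else:
-              dirnames.sort()       # visit subdirectories in order
-
-  def virtual_path(prefix, repo_path):
-      # Strip the left-hand side of a collections entry from the
-      # repository's local path; what remains is its virtual path.
-      prefix = prefix.rstrip("/")
-      if not repo_path.startswith(prefix + "/"):
-          raise ValueError("repository is not under " + prefix)
-      return repo_path[len(prefix) + 1:]
-\end{codesample2}
-With the first example above, \texttt{virtual\_path} turns
-\dirname{/my/root/this/repo} into \dirname{this/repo} given a prefix
-of \dirname{/my/root}, and into \dirname{root/this/repo} given a
-prefix of \dirname{/my}.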
- -\subsubsection{Explicitly specifying which repositories to publish} - -In addition to the \texttt{collections} mechanism, the -\sfilename{hgwebdir.cgi} script allows you to publish a specific list -of repositories. To do so, create a \texttt{paths} section, with -contents of the following form. -\begin{codesample2} - [paths] - repo1 = /my/path/to/some/repo - repo2 = /some/path/to/another -\end{codesample2} -In this case, the virtual path (the component that will appear in a -URL) is on the left hand side of each definition, while the path to -the repository is on the right. Notice that there does not need to be -any relationship between the virtual path you choose and the location -of a repository in your filesystem. - -If you wish, you can use both the \texttt{collections} and -\texttt{paths} mechanisms simultaneously in a single configuration -file. - -\begin{note} - If multiple repositories have the same virtual path, - \sfilename{hgwebdir.cgi} will not report an error. Instead, it will - behave unpredictably. -\end{note} - -\subsection{Downloading source archives} - -Mercurial's web interface lets users download an archive of any -revision. This archive will contain a snapshot of the working -directory as of that revision, but it will not contain a copy of the -repository data. - -By default, this feature is not enabled. To enable it, you'll need to -add an \rcitem{web}{allow\_archive} item to the \rcsection{web} -section of your \hgrc. - -\subsection{Web configuration options} - -Mercurial's web interfaces (the \hgcmd{serve} command, and the -\sfilename{hgweb.cgi} and \sfilename{hgwebdir.cgi} scripts) have a -number of configuration options that you can set. These belong in a -section named \rcsection{web}. -\begin{itemize} -\item[\rcitem{web}{allow\_archive}] Determines which (if any) archive - download mechanisms Mercurial supports. 
If you enable this
-  feature, users of the web interface will be able to download an
-  archive of whatever revision of a repository they are viewing.
-  To enable the archive feature, this item must take the form of a
-  sequence of words drawn from the list below.
-  \begin{itemize}
-  \item[\texttt{bz2}] A \command{tar} archive, compressed using
-    \texttt{bzip2} compression. This has the best compression ratio,
-    but uses the most CPU time on the server.
-  \item[\texttt{gz}] A \command{tar} archive, compressed using
-    \texttt{gzip} compression.
-  \item[\texttt{zip}] A \command{zip} archive, compressed using the
-    deflate algorithm. This format has the worst compression ratio, but
-    is widely used in the Windows world.
-  \end{itemize}
-  If you provide an empty list, or don't have an
-  \rcitem{web}{allow\_archive} entry at all, this feature will be
-  disabled. Here is an example of how to enable all three supported
-  formats.
-  \begin{codesample4}
-    [web]
-    allow_archive = bz2 gz zip
-  \end{codesample4}
-\item[\rcitem{web}{allowpull}] Boolean. Determines whether the web
-  interface allows remote users to \hgcmd{pull} and \hgcmd{clone} this
-  repository over~HTTP. If set to \texttt{no} or \texttt{false}, only
-  the ``human-oriented'' portion of the web interface is available.
-\item[\rcitem{web}{contact}] String. A free-form (but preferably
-  brief) string identifying the person or group in charge of the
-  repository. This often contains the name and email address of a
-  person or mailing list. It often makes sense to place this entry in
-  a repository's own \sfilename{.hg/hgrc} file, but it can also make
-  sense to put it in a global \hgrc\ if every repository has the same
-  maintainer.
-\item[\rcitem{web}{maxchanges}] Integer. The default maximum number
-  of changesets to display in a single page of output.
-\item[\rcitem{web}{maxfiles}] Integer. The default maximum number
-  of modified files to display in a single page of output.
-\item[\rcitem{web}{stripes}] Integer.
If the web interface displays - alternating ``stripes'' to make it easier to visually align rows - when you are looking at a table, this number controls the number of - rows in each stripe. -\item[\rcitem{web}{style}] Controls the template Mercurial uses to - display the web interface. Mercurial ships with two web templates, - named \texttt{default} and \texttt{gitweb} (the latter is much more - visually attractive). You can also specify a custom template of - your own; see chapter~\ref{chap:template} for details. Here, you - can see how to enable the \texttt{gitweb} style. - \begin{codesample4} - [web] - style = gitweb - \end{codesample4} -\item[\rcitem{web}{templates}] Path. The directory in which to search - for template files. By default, Mercurial searches in the directory - in which it was installed. -\end{itemize} -If you are using \sfilename{hgwebdir.cgi}, you can place a few -configuration items in a \rcsection{web} section of the -\sfilename{hgweb.config} file instead of a \hgrc\ file, for -convenience. These items are \rcitem{web}{motd} and -\rcitem{web}{style}. - -\subsubsection{Options specific to an individual repository} - -A few \rcsection{web} configuration items ought to be placed in a -repository's local \sfilename{.hg/hgrc}, rather than a user's or -global \hgrc. -\begin{itemize} -\item[\rcitem{web}{description}] String. A free-form (but preferably - brief) string that describes the contents or purpose of the - repository. -\item[\rcitem{web}{name}] String. The name to use for the repository - in the web interface. This overrides the default name, which is the - last component of the repository's path. -\end{itemize} - -\subsubsection{Options specific to the \hgcmd{serve} command} - -Some of the items in the \rcsection{web} section of a \hgrc\ file are -only for use with the \hgcmd{serve} command. -\begin{itemize} -\item[\rcitem{web}{accesslog}] Path. The name of a file into which to - write an access log. 
By default, the \hgcmd{serve} command writes - this information to standard output, not to a file. Log entries are - written in the standard ``combined'' file format used by almost all - web servers. -\item[\rcitem{web}{address}] String. The local address on which the - server should listen for incoming connections. By default, the - server listens on all addresses. -\item[\rcitem{web}{errorlog}] Path. The name of a file into which to - write an error log. By default, the \hgcmd{serve} command writes this - information to standard error, not to a file. -\item[\rcitem{web}{ipv6}] Boolean. Whether to use the IPv6 protocol. - By default, IPv6 is not used. -\item[\rcitem{web}{port}] Integer. The TCP~port number on which the - server should listen. The default port number used is~8000. -\end{itemize} - -\subsubsection{Choosing the right \hgrc\ file to add \rcsection{web} - items to} - -It is important to remember that a web server like Apache or -\texttt{lighttpd} will run under a user~ID that is different to yours. -CGI scripts run by your server, such as \sfilename{hgweb.cgi}, will -usually also run under that user~ID. - -If you add \rcsection{web} items to your own personal \hgrc\ file, CGI -scripts won't read that \hgrc\ file. Those settings will thus only -affect the behaviour of the \hgcmd{serve} command when you run it. To -cause CGI scripts to see your settings, either create a \hgrc\ file in -the home directory of the user ID that runs your web server, or add -those settings to a system-wide \hgrc\ file. - - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/concepts.tex --- a/en/concepts.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,577 +0,0 @@ -\chapter{Behind the scenes} -\label{chap:concepts} - -Unlike many revision control systems, the concepts upon which -Mercurial is built are simple enough that it's easy to understand how -the software really works. 
Knowing this certainly isn't necessary, -but I find it useful to have a ``mental model'' of what's going on. - -This understanding gives me confidence that Mercurial has been -carefully designed to be both \emph{safe} and \emph{efficient}. And -just as importantly, if it's easy for me to retain a good idea of what -the software is doing when I perform a revision control task, I'm less -likely to be surprised by its behaviour. - -In this chapter, we'll initially cover the core concepts behind -Mercurial's design, then continue to discuss some of the interesting -details of its implementation. - -\section{Mercurial's historical record} - -\subsection{Tracking the history of a single file} - -When Mercurial tracks modifications to a file, it stores the history -of that file in a metadata object called a \emph{filelog}. Each entry -in the filelog contains enough information to reconstruct one revision -of the file that is being tracked. Filelogs are stored as files in -the \sdirname{.hg/store/data} directory. A filelog contains two kinds -of information: revision data, and an index to help Mercurial to find -a revision efficiently. - -A file that is large, or has a lot of history, has its filelog stored -in separate data (``\texttt{.d}'' suffix) and index (``\texttt{.i}'' -suffix) files. For small files without much history, the revision -data and index are combined in a single ``\texttt{.i}'' file. The -correspondence between a file in the working directory and the filelog -that tracks its history in the repository is illustrated in -figure~\ref{fig:concepts:filelog}. - -\begin{figure}[ht] - \centering - \grafix{filelog} - \caption{Relationships between files in working directory and - filelogs in repository} - \label{fig:concepts:filelog} -\end{figure} - -\subsection{Managing tracked files} - -Mercurial uses a structure called a \emph{manifest} to collect -together information about the files that it tracks. 
Each entry in -the manifest contains information about the files present in a single -changeset. An entry records which files are present in the changeset, -the revision of each file, and a few other pieces of file metadata. - -\subsection{Recording changeset information} - -The \emph{changelog} contains information about each changeset. Each -revision records who committed a change, the changeset comment, other -pieces of changeset-related information, and the revision of the -manifest to use. - -\subsection{Relationships between revisions} - -Within a changelog, a manifest, or a filelog, each revision stores a -pointer to its immediate parent (or to its two parents, if it's a -merge revision). As I mentioned above, there are also relationships -between revisions \emph{across} these structures, and they are -hierarchical in nature. - -For every changeset in a repository, there is exactly one revision -stored in the changelog. Each revision of the changelog contains a -pointer to a single revision of the manifest. A revision of the -manifest stores a pointer to a single revision of each filelog tracked -when that changeset was created. These relationships are illustrated -in figure~\ref{fig:concepts:metadata}. - -\begin{figure}[ht] - \centering - \grafix{metadata} - \caption{Metadata relationships} - \label{fig:concepts:metadata} -\end{figure} - -As the illustration shows, there is \emph{not} a ``one to one'' -relationship between revisions in the changelog, manifest, or filelog. -If the manifest hasn't changed between two changesets, the changelog -entries for those changesets will point to the same revision of the -manifest. If a file that Mercurial tracks hasn't changed between two -changesets, the entry for that file in the two revisions of the -manifest will point to the same revision of its filelog. - -\section{Safe, efficient storage} - -The underpinnings of changelogs, manifests, and filelogs are provided -by a single structure called the \emph{revlog}. 
- -\subsection{Efficient storage} - -The revlog provides efficient storage of revisions using a -\emph{delta} mechanism. Instead of storing a complete copy of a file -for each revision, it stores the changes needed to transform an older -revision into the new revision. For many kinds of file data, these -deltas are typically a fraction of a percent of the size of a full -copy of a file. - -Some obsolete revision control systems can only work with deltas of -text files. They must either store binary files as complete snapshots -or encoded into a text representation, both of which are wasteful -approaches. Mercurial can efficiently handle deltas of files with -arbitrary binary contents; it doesn't need to treat text as special. - -\subsection{Safe operation} -\label{sec:concepts:txn} - -Mercurial only ever \emph{appends} data to the end of a revlog file. -It never modifies a section of a file after it has written it. This -is both more robust and efficient than schemes that need to modify or -rewrite data. - -In addition, Mercurial treats every write as part of a -\emph{transaction} that can span a number of files. A transaction is -\emph{atomic}: either the entire transaction succeeds and its effects -are all visible to readers in one go, or the whole thing is undone. -This guarantee of atomicity means that if you're running two copies of -Mercurial, where one is reading data and one is writing it, the reader -will never see a partially written result that might confuse it. - -The fact that Mercurial only appends to files makes it easier to -provide this transactional guarantee. The easier it is to do stuff -like this, the more confident you should be that it's done correctly. - -\subsection{Fast retrieval} - -Mercurial cleverly avoids a pitfall common to all earlier -revision control systems: the problem of \emph{inefficient retrieval}. -Most revision control systems store the contents of a revision as an -incremental series of modifications against a ``snapshot''. 
To -reconstruct a specific revision, you must first read the snapshot, and -then every one of the revisions between the snapshot and your target -revision. The more history that a file accumulates, the more -revisions you must read, hence the longer it takes to reconstruct a -particular revision. - -\begin{figure}[ht] - \centering - \grafix{snapshot} - \caption{Snapshot of a revlog, with incremental deltas} - \label{fig:concepts:snapshot} -\end{figure} - -The innovation that Mercurial applies to this problem is simple but -effective. Once the cumulative amount of delta information stored -since the last snapshot exceeds a fixed threshold, it stores a new -snapshot (compressed, of course), instead of another delta. This -makes it possible to reconstruct \emph{any} revision of a file -quickly. This approach works so well that it has since been copied by -several other revision control systems. - -Figure~\ref{fig:concepts:snapshot} illustrates the idea. In an entry -in a revlog's index file, Mercurial stores the range of entries from -the data file that it must read to reconstruct a particular revision. - -\subsubsection{Aside: the influence of video compression} - -If you're familiar with video compression or have ever watched a TV -feed through a digital cable or satellite service, you may know that -most video compression schemes store each frame of video as a delta -against its predecessor frame. In addition, these schemes use -``lossy'' compression techniques to increase the compression ratio, so -visual errors accumulate over the course of a number of inter-frame -deltas. - -Because it's possible for a video stream to ``drop out'' occasionally -due to signal glitches, and to limit the accumulation of artefacts -introduced by the lossy compression process, video encoders -periodically insert a complete frame (called a ``key frame'') into the -video stream; the next delta is generated against that frame. 
This -means that if the video signal gets interrupted, it will resume once -the next key frame is received. Also, the accumulation of encoding -errors restarts anew with each key frame. - -\subsection{Identification and strong integrity} - -Along with delta or snapshot information, a revlog entry contains a -cryptographic hash of the data that it represents. This makes it -difficult to forge the contents of a revision, and easy to detect -accidental corruption. - -Hashes provide more than a mere check against corruption; they are -used as the identifiers for revisions. The changeset identification -hashes that you see as an end user are from revisions of the -changelog. Although filelogs and the manifest also use hashes, -Mercurial only uses these behind the scenes. - -Mercurial verifies that hashes are correct when it retrieves file -revisions and when it pulls changes from another repository. If it -encounters an integrity problem, it will complain and stop whatever -it's doing. - -In addition to the effect it has on retrieval efficiency, Mercurial's -use of periodic snapshots makes it more robust against partial data -corruption. If a revlog becomes partly corrupted due to a hardware -error or system bug, it's often possible to reconstruct some or most -revisions from the uncorrupted sections of the revlog, both before and -after the corrupted section. This would not be possible with a -delta-only storage model. - -\section{Revision history, branching, - and merging} - -Every entry in a Mercurial revlog knows the identity of its immediate -ancestor revision, usually referred to as its \emph{parent}. In fact, -a revision contains room for not one parent, but two. Mercurial uses -a special hash, called the ``null ID'', to represent the idea ``there -is no parent here''. This hash is simply a string of zeroes. - -In figure~\ref{fig:concepts:revlog}, you can see an example of the -conceptual structure of a revlog. 
Filelogs, manifests, and changelogs -all have this same structure; they differ only in the kind of data -stored in each delta or snapshot. - -The first revision in a revlog (at the bottom of the image) has the -null ID in both of its parent slots. For a ``normal'' revision, its -first parent slot contains the ID of its parent revision, and its -second contains the null ID, indicating that the revision has only one -real parent. Any two revisions that have the same parent ID are -branches. A revision that represents a merge between branches has two -normal revision IDs in its parent slots. - -\begin{figure}[ht] - \centering - \grafix{revlog} - \caption{} - \label{fig:concepts:revlog} -\end{figure} - -\section{The working directory} - -In the working directory, Mercurial stores a snapshot of the files -from the repository as of a particular changeset. - -The working directory ``knows'' which changeset it contains. When you -update the working directory to contain a particular changeset, -Mercurial looks up the appropriate revision of the manifest to find -out which files it was tracking at the time that changeset was -committed, and which revision of each file was then current. It then -recreates a copy of each of those files, with the same contents it had -when the changeset was committed. - -The \emph{dirstate} contains Mercurial's knowledge of the working -directory. This details which changeset the working directory is -updated to, and all of the files that Mercurial is tracking in the -working directory. - -Just as a revision of a revlog has room for two parents, so that it -can represent either a normal revision (with one parent) or a merge of -two earlier revisions, the dirstate has slots for two parents. When -you use the \hgcmd{update} command, the changeset that you update to -is stored in the ``first parent'' slot, and the null ID in the second. 
-When you \hgcmd{merge} with another changeset, the first parent -remains unchanged, and the second parent is filled in with the -changeset you're merging with. The \hgcmd{parents} command tells you -what the parents of the dirstate are. - -\subsection{What happens when you commit} - -The dirstate stores parent information for more than just book-keeping -purposes. Mercurial uses the parents of the dirstate as \emph{the - parents of a new changeset} when you perform a commit. - -\begin{figure}[ht] - \centering - \grafix{wdir} - \caption{The working directory can have two parents} - \label{fig:concepts:wdir} -\end{figure} - -Figure~\ref{fig:concepts:wdir} shows the normal state of the working -directory, where it has a single changeset as parent. That changeset -is the \emph{tip}, the newest changeset in the repository that has no -children. - -\begin{figure}[ht] - \centering - \grafix{wdir-after-commit} - \caption{The working directory gains new parents after a commit} - \label{fig:concepts:wdir-after-commit} -\end{figure} - -It's useful to think of the working directory as ``the changeset I'm -about to commit''. Any files that you tell Mercurial that you've -added, removed, renamed, or copied will be reflected in that -changeset, as will modifications to any files that Mercurial is -already tracking; the new changeset will have the parents of the -working directory as its parents. - -After a commit, Mercurial will update the parents of the working -directory, so that the first parent is the ID of the new changeset, -and the second is the null ID. This is shown in -figure~\ref{fig:concepts:wdir-after-commit}. Mercurial doesn't touch -any of the files in the working directory when you commit; it just -modifies the dirstate to note its new parents. - -\subsection{Creating a new head} - -It's perfectly normal to update the working directory to a changeset -other than the current tip. 
For example, you might want to know what -your project looked like last Tuesday, or you could be looking through -changesets to see which one introduced a bug. In cases like this, the -natural thing to do is update the working directory to the changeset -you're interested in, and then examine the files in the working -directory directly to see their contents as they were when you -committed that changeset. The effect of this is shown in -figure~\ref{fig:concepts:wdir-pre-branch}. - -\begin{figure}[ht] - \centering - \grafix{wdir-pre-branch} - \caption{The working directory, updated to an older changeset} - \label{fig:concepts:wdir-pre-branch} -\end{figure} - -Having updated the working directory to an older changeset, what -happens if you make some changes, and then commit? Mercurial behaves -in the same way as I outlined above. The parents of the working -directory become the parents of the new changeset. This new changeset -has no children, so it becomes the new tip. And the repository now -contains two changesets that have no children; we call these -\emph{heads}. You can see the structure that this creates in -figure~\ref{fig:concepts:wdir-branch}. - -\begin{figure}[ht] - \centering - \grafix{wdir-branch} - \caption{After a commit made while synced to an older changeset} - \label{fig:concepts:wdir-branch} -\end{figure} - -\begin{note} - If you're new to Mercurial, you should keep in mind a common - ``error'', which is to use the \hgcmd{pull} command without any - options. By default, the \hgcmd{pull} command \emph{does not} - update the working directory, so you'll bring new changesets into - your repository, but the working directory will stay synced at the - same changeset as before the pull. If you make some changes and - commit afterwards, you'll thus create a new head, because your - working directory isn't synced to whatever the current tip is. 
- - I put the word ``error'' in quotes because all that you need to do - to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}. In - other words, this almost never has negative consequences; it just - surprises people. I'll discuss other ways to avoid this behaviour, - and why Mercurial behaves in this initially surprising way, later - on. -\end{note} - -\subsection{Merging heads} - -When you run the \hgcmd{merge} command, Mercurial leaves the first -parent of the working directory unchanged, and sets the second parent -to the changeset you're merging with, as shown in -figure~\ref{fig:concepts:wdir-merge}. - -\begin{figure}[ht] - \centering - \grafix{wdir-merge} - \caption{Merging two heads} - \label{fig:concepts:wdir-merge} -\end{figure} - -Mercurial also has to modify the working directory, to merge the files -managed in the two changesets. Simplified a little, the merging -process goes like this, for every file in the manifests of both -changesets. -\begin{itemize} -\item If neither changeset has modified a file, do nothing with that - file. -\item If one changeset has modified a file, and the other hasn't, - create the modified copy of the file in the working directory. -\item If one changeset has removed a file, and the other hasn't (or - has also deleted it), delete the file from the working directory. -\item If one changeset has removed a file, but the other has modified - the file, ask the user what to do: keep the modified file, or remove - it? -\item If both changesets have modified a file, invoke an external - merge program to choose the new contents for the merged file. This - may require input from the user. -\item If one changeset has modified a file, and the other has renamed - or copied the file, make sure that the changes follow the new name - of the file. -\end{itemize} -There are more details---merging has plenty of corner cases---but -these are the most common choices that are involved in a merge. 
As -you can see, most cases are completely automatic, and indeed most -merges finish automatically, without requiring your input to resolve -any conflicts. - -When you're thinking about what happens when you commit after a merge, -once again the working directory is ``the changeset I'm about to -commit''. After the \hgcmd{merge} command completes, the working -directory has two parents; these will become the parents of the new -changeset. - -Mercurial lets you perform multiple merges, but you must commit the -results of each individual merge as you go. This is necessary because -Mercurial only tracks two parents for both revisions and the working -directory. While it would be technically possible to merge multiple -changesets at once, the prospect of user confusion and making a -terrible mess of a merge immediately becomes overwhelming. - -\section{Other interesting design features} - -In the sections above, I've tried to highlight some of the most -important aspects of Mercurial's design, to illustrate that it pays -careful attention to reliability and performance. However, the -attention to detail doesn't stop there. There are a number of other -aspects of Mercurial's construction that I personally find -interesting. I'll detail a few of them here, separate from the ``big -ticket'' items above, so that if you're interested, you can gain a -better idea of the amount of thinking that goes into a well-designed -system. - -\subsection{Clever compression} - -When appropriate, Mercurial will store both snapshots and deltas in -compressed form. It does this by always \emph{trying to} compress a -snapshot or delta, but only storing the compressed version if it's -smaller than the uncompressed version. - -This means that Mercurial does ``the right thing'' when storing a file -whose native form is compressed, such as a \texttt{zip} archive or a -JPEG image. 
When these types of files are compressed a second time, -the resulting file is usually bigger than the once-compressed form, -and so Mercurial will store the plain \texttt{zip} or JPEG. - -Deltas between revisions of a compressed file are usually larger than -snapshots of the file, and Mercurial again does ``the right thing'' in -these cases. It finds that such a delta exceeds the threshold at -which it should store a complete snapshot of the file, so it stores -the snapshot, again saving space compared to a naive delta-only -approach. - -\subsubsection{Network recompression} - -When storing revisions on disk, Mercurial uses the ``deflate'' -compression algorithm (the same one used by the popular \texttt{zip} -archive format), which balances good speed with a respectable -compression ratio. However, when transmitting revision data over a -network connection, Mercurial uncompresses the compressed revision -data. - -If the connection is over HTTP, Mercurial recompresses the entire -stream of data using a compression algorithm that gives a better -compression ratio (the Burrows-Wheeler algorithm from the widely used -\texttt{bzip2} compression package). This combination of algorithm -and compression of the entire stream (instead of a revision at a time) -substantially reduces the number of bytes to be transferred, yielding -better network performance over almost all kinds of network. - -(If the connection is over \command{ssh}, Mercurial \emph{doesn't} -recompress the stream, because \command{ssh} can already do this -itself.) - -\subsection{Read/write ordering and atomicity} - -Appending to files isn't the whole story when it comes to guaranteeing -that a reader won't see a partial write. If you recall -figure~\ref{fig:concepts:metadata}, revisions in the changelog point to -revisions in the manifest, and revisions in the manifest point to -revisions in filelogs. This hierarchy is deliberate. 
- -A writer starts a transaction by writing filelog and manifest data, -and doesn't write any changelog data until those are finished. A -reader starts by reading changelog data, then manifest data, followed -by filelog data. - -Since the writer has always finished writing filelog and manifest data -before it writes to the changelog, a reader will never read a pointer -to a partially written manifest revision from the changelog, and it will -never read a pointer to a partially written filelog revision from the -manifest. - -\subsection{Concurrent access} - -The read/write ordering and atomicity guarantees mean that Mercurial -never needs to \emph{lock} a repository when it's reading data, even -if the repository is being written to while the read is occurring. -This has a big effect on scalability; you can have an arbitrary number -of Mercurial processes safely reading data from a repository safely -all at once, no matter whether it's being written to or not. - -The lockless nature of reading means that if you're sharing a -repository on a multi-user system, you don't need to grant other local -users permission to \emph{write} to your repository in order for them -to be able to clone it or pull changes from it; they only need -\emph{read} permission. (This is \emph{not} a common feature among -revision control systems, so don't take it for granted! Most require -readers to be able to lock a repository to access it safely, and this -requires write permission on at least one directory, which of course -makes for all kinds of nasty and annoying security and administrative -problems.) - -Mercurial uses locks to ensure that only one process can write to a -repository at a time (the locking mechanism is safe even over -filesystems that are notoriously hostile to locking, such as NFS). 
If -a repository is locked, a writer will wait for a while to retry if the -repository becomes unlocked, but if the repository remains locked for -too long, the process attempting to write will time out after a while. -This means that your daily automated scripts won't get stuck forever -and pile up if a system crashes unnoticed, for example. (Yes, the -timeout is configurable, from zero to infinity.) - -\subsubsection{Safe dirstate access} - -As with revision data, Mercurial doesn't take a lock to read the -dirstate file; it does acquire a lock to write it. To avoid the -possibility of reading a partially written copy of the dirstate file, -Mercurial writes to a file with a unique name in the same directory as -the dirstate file, then renames the temporary file atomically to -\filename{dirstate}. The file named \filename{dirstate} is thus -guaranteed to be complete, not partially written. - -\subsection{Avoiding seeks} - -Critical to Mercurial's performance is the avoidance of seeks of the -disk head, since any seek is far more expensive than even a -comparatively large read operation. - -This is why, for example, the dirstate is stored in a single file. If -there were a dirstate file per directory that Mercurial tracked, the -disk would seek once per directory. Instead, Mercurial reads the -entire single dirstate file in one step. - -Mercurial also uses a ``copy on write'' scheme when cloning a -repository on local storage. Instead of copying every revlog file -from the old repository into the new repository, it makes a ``hard -link'', which is a shorthand way to say ``these two names point to the -same file''. When Mercurial is about to write to one of a revlog's -files, it checks to see if the number of names pointing at the file is -greater than one. If it is, more than one repository is using the -file, so Mercurial makes a new copy of the file that is private to -this repository. 
- -A few revision control developers have pointed out that this idea of -making a complete private copy of a file is not very efficient in its -use of storage. While this is true, storage is cheap, and this method -gives the highest performance while deferring most book-keeping to the -operating system. An alternative scheme would most likely reduce -performance and increase the complexity of the software, each of which -is much more important to the ``feel'' of day-to-day use. - -\subsection{Other contents of the dirstate} - -Because Mercurial doesn't force you to tell it when you're modifying a -file, it uses the dirstate to store some extra information so it can -determine efficiently whether you have modified a file. For each file -in the working directory, it stores the time that it last modified the -file itself, and the size of the file at that time. - -When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or -\hgcmd{copy} files, Mercurial updates the dirstate so that it knows -what to do with those files when you commit. - -When Mercurial is checking the states of files in the working -directory, it first checks a file's modification time. If that has -not changed, the file must not have been modified. If the file's size -has changed, the file must have been modified. If the modification -time has changed, but the size has not, only then does Mercurial need -to read the actual contents of the file to see if they've changed. -Storing these few extra pieces of information dramatically reduces the -amount of data that Mercurial needs to read, which yields large -performance improvements compared to other revision control systems. 
- -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/daily.tex --- a/en/daily.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,381 +0,0 @@ -\chapter{Mercurial in daily use} -\label{chap:daily} - -\section{Telling Mercurial which files to track} - -Mercurial does not work with files in your repository unless you tell -it to manage them. The \hgcmd{status} command will tell you which -files Mercurial doesn't know about; it uses a ``\texttt{?}'' to -display such files. - -To tell Mercurial to track a file, use the \hgcmd{add} command. Once -you have added a file, the entry in the output of \hgcmd{status} for -that file changes from ``\texttt{?}'' to ``\texttt{A}''. -\interaction{daily.files.add} - -After you run a \hgcmd{commit}, the files that you added before the -commit will no longer be listed in the output of \hgcmd{status}. The -reason for this is that \hgcmd{status} only tells you about -``interesting'' files---those that you have modified or told Mercurial -to do something with---by default. If you have a repository that -contains thousands of files, you will rarely want to know about files -that Mercurial is tracking, but that have not changed. (You can still -get this information; we'll return to this later.) - -Once you add a file, Mercurial doesn't do anything with it -immediately. Instead, it will take a snapshot of the file's state the -next time you perform a commit. It will then continue to track the -changes you make to the file every time you commit, until you remove -the file. - -\subsection{Explicit versus implicit file naming} - -A useful behaviour that Mercurial has is that if you pass the name of -a directory to a command, every Mercurial command will treat this as -``I want to operate on every file in this directory and its -subdirectories''. 
-\interaction{daily.files.add-dir} -Notice in this example that Mercurial printed the names of the files -it added, whereas it didn't do so when we added the file named -\filename{a} in the earlier example. - -What's going on is that in the former case, we explicitly named the -file to add on the command line, so the assumption that Mercurial -makes in such cases is that you know what you were doing, and it -doesn't print any output. - -However, when we \emph{imply} the names of files by giving the name of -a directory, Mercurial takes the extra step of printing the name of -each file that it does something with. This makes it more clear what -is happening, and reduces the likelihood of a silent and nasty -surprise. This behaviour is common to most Mercurial commands. - -\subsection{Aside: Mercurial tracks files, not directories} - -Mercurial does not track directory information. Instead, it tracks -the path to a file. Before creating a file, it first creates any -missing directory components of the path. After it deletes a file, it -then deletes any empty directories that were in the deleted file's -path. This sounds like a trivial distinction, but it has one minor -practical consequence: it is not possible to represent a completely -empty directory in Mercurial. - -Empty directories are rarely useful, and there are unintrusive -workarounds that you can use to achieve an appropriate effect. The -developers of Mercurial thus felt that the complexity that would be -required to manage empty directories was not worth the limited benefit -this feature would bring. - -If you need an empty directory in your repository, there are a few -ways to achieve this. One is to create a directory, then \hgcmd{add} a -``hidden'' file to that directory. On Unix-like systems, any file -name that begins with a period (``\texttt{.}'') is treated as hidden -by most commands and GUI tools. This approach is illustrated in -figure~\ref{ex:daily:hidden}. 
- -\begin{figure}[ht] - \interaction{daily.files.hidden} - \caption{Simulating an empty directory using a hidden file} - \label{ex:daily:hidden} -\end{figure} - -Another way to tackle a need for an empty directory is to simply -create one in your automated build scripts before they will need it. - -\section{How to stop tracking a file} - -Once you decide that a file no longer belongs in your repository, use -the \hgcmd{remove} command; this deletes the file, and tells Mercurial -to stop tracking it. A removed file is represented in the output of -\hgcmd{status} with a ``\texttt{R}''. -\interaction{daily.files.remove} - -After you \hgcmd{remove} a file, Mercurial will no longer track -changes to that file, even if you recreate a file with the same name -in your working directory. If you do recreate a file with the same -name and want Mercurial to track the new file, simply \hgcmd{add} it. -Mercurial will know that the newly added file is not related to the -old file of the same name. - -\subsection{Removing a file does not affect its history} - -It is important to understand that removing a file has only two -effects. -\begin{itemize} -\item It removes the current version of the file from the working - directory. -\item It stops Mercurial from tracking changes to the file, from the - time of the next commit. -\end{itemize} -Removing a file \emph{does not} in any way alter the \emph{history} of -the file. - -If you update the working directory to a changeset in which a file -that you have removed was still tracked, it will reappear in the -working directory, with the contents it had when you committed that -changeset. If you then update the working directory to a later -changeset, in which the file had been removed, Mercurial will once -again remove the file from the working directory. - -\subsection{Missing files} - -Mercurial considers a file that you have deleted, but not used -\hgcmd{remove} to delete, to be \emph{missing}. 
A missing file is -represented with ``\texttt{!}'' in the output of \hgcmd{status}. -Mercurial commands will not generally do anything with missing files. -\interaction{daily.files.missing} - -If your repository contains a file that \hgcmd{status} reports as -missing, and you want the file to stay gone, you can run -\hgcmdargs{remove}{\hgopt{remove}{--after}} at any time later on, to -tell Mercurial that you really did mean to remove the file. -\interaction{daily.files.remove-after} - -On the other hand, if you deleted the missing file by accident, use -\hgcmdargs{revert}{\emph{filename}} to recover the file. It will -reappear, in unmodified form. -\interaction{daily.files.recover-missing} - -\subsection{Aside: why tell Mercurial explicitly to - remove a file?} - -You might wonder why Mercurial requires you to explicitly tell it that -you are deleting a file. Early in the development of Mercurial, -it let you delete a file however you pleased; Mercurial would notice -the absence of the file automatically when you next ran a -\hgcmd{commit}, and stop tracking the file. In practice, this made it -too easy to accidentally remove a file without noticing. - -\subsection{Useful shorthand---adding and removing files - in one step} - -Mercurial offers a combination command, \hgcmd{addremove}, that adds -untracked files and marks missing files as removed. -\interaction{daily.files.addremove} -The \hgcmd{commit} command also provides a \hgopt{commit}{-A} option -that performs this same add-and-remove, immediately followed by a -commit. -\interaction{daily.files.commit-addremove} - -\section{Copying files} - -Mercurial provides a \hgcmd{copy} command that lets you make a new -copy of a file. When you copy a file using this command, Mercurial -makes a record of the fact that the new file is a copy of the original -file. It treats these copied files specially when you merge your work -with someone else's.
- -\subsection{The results of copying during a merge} - -What happens during a merge is that changes ``follow'' a copy. To -best illustrate what this means, let's create an example. We'll start -with the usual tiny repository that contains a single file. -\interaction{daily.copy.init} -We need to do some work in parallel, so that we'll have something to -merge. So let's clone our repository. -\interaction{daily.copy.clone} -Back in our initial repository, let's use the \hgcmd{copy} command to -make a copy of the first file we created. -\interaction{daily.copy.copy} - -If we look at the output of the \hgcmd{status} command afterwards, the -copied file looks just like a normal added file. -\interaction{daily.copy.status} -But if we pass the \hgopt{status}{-C} option to \hgcmd{status}, it -prints another line of output: this is the file that our newly-added -file was copied \emph{from}. -\interaction{daily.copy.status-copy} - -Now, back in the repository we cloned, let's make a change in -parallel. We'll add a line of content to the original file that we -created. -\interaction{daily.copy.other} -Now we have a modified \filename{file} in this repository. When we -pull the changes from the first repository, and merge the two heads, -Mercurial will propagate the changes that we made locally to -\filename{file} into its copy, \filename{new-file}. -\interaction{daily.copy.merge} - -\subsection{Why should changes follow copies?} -\label{sec:daily:why-copy} - -This behaviour, of changes to a file propagating out to copies of the -file, might seem esoteric, but in most cases it's highly desirable. - -First of all, remember that this propagation \emph{only} happens when -you merge. So if you \hgcmd{copy} a file, and subsequently modify the -original file during the normal course of your work, nothing will -happen. 
- -The second thing to know is that modifications will only propagate -across a copy as long as the repository that you're pulling changes -from \emph{doesn't know} about the copy. - -The reason that Mercurial does this is as follows. Let's say I make -an important bug fix in a source file, and commit my changes. -Meanwhile, you've decided to \hgcmd{copy} the file in your repository, -without knowing about the bug or having seen the fix, and you have -started hacking on your copy of the file. - -If you pulled and merged my changes, and Mercurial \emph{didn't} -propagate changes across copies, your source file would now contain -the bug, and unless you remembered to propagate the bug fix by hand, -the bug would \emph{remain} in your copy of the file. - -By automatically propagating the change that fixed the bug from the -original file to the copy, Mercurial prevents this class of problem. -To my knowledge, Mercurial is the \emph{only} revision control system -that propagates changes across copies like this. - -Once your change history has a record that the copy and subsequent -merge occurred, there's usually no further need to propagate changes -from the original file to the copied file, and that's why Mercurial -only propagates changes across copies until this point, and no -further. - -\subsection{How to make changes \emph{not} follow a copy} - -If, for some reason, you decide that this business of automatically -propagating changes across copies is not for you, simply use your -system's normal file copy command (on Unix-like systems, that's -\command{cp}) to make a copy of a file, then \hgcmd{add} the new copy -by hand. Before you do so, though, please do reread -section~\ref{sec:daily:why-copy}, and make an informed decision that -this behaviour is not appropriate to your specific case. 
- -\subsection{Behaviour of the \hgcmd{copy} command} - -When you use the \hgcmd{copy} command, Mercurial makes a copy of each -source file as it currently stands in the working directory. This -means that if you make some modifications to a file, then \hgcmd{copy} -it without first having committed those changes, the new copy will -also contain the modifications you have made up until that point. (I -find this behaviour a little counterintuitive, which is why I mention -it here.) - -The \hgcmd{copy} command acts similarly to the Unix \command{cp} -command (you can use the \hgcmd{cp} alias if you prefer). The last -argument is the \emph{destination}, and all prior arguments are -\emph{sources}. If you pass it a single file as the source, and the -destination does not exist, it creates a new file with that name. -\interaction{daily.copy.simple} -If the destination is a directory, Mercurial copies its sources into -that directory. -\interaction{daily.copy.dir-dest} -Copying a directory is recursive, and preserves the directory -structure of the source. -\interaction{daily.copy.dir-src} -If the source and destination are both directories, the source tree is -recreated in the destination directory. -\interaction{daily.copy.dir-src-dest} - -As with the \hgcmd{remove} command, if you copy a file manually and -then want Mercurial to know that you've copied the file, simply use -the \hgopt{copy}{--after} option to \hgcmd{copy}. -\interaction{daily.copy.after} - -\section{Renaming files} - -It's rather more common to need to rename a file than to make a copy -of it. The reason I discussed the \hgcmd{copy} command before talking -about renaming files is that Mercurial treats a rename in essentially -the same way as a copy. Therefore, knowing what Mercurial does when -you copy a file tells you what to expect when you rename a file. - -When you use the \hgcmd{rename} command, Mercurial makes a copy of -each source file, then deletes it and marks the file as removed.
-\interaction{daily.rename.rename} -The \hgcmd{status} command shows the newly copied file as added, and -the copied-from file as removed. -\interaction{daily.rename.status} -As with the results of a \hgcmd{copy}, we must use the -\hgopt{status}{-C} option to \hgcmd{status} to see that the added file -is really being tracked by Mercurial as a copy of the original, now -removed, file. -\interaction{daily.rename.status-copy} - -As with \hgcmd{remove} and \hgcmd{copy}, you can tell Mercurial about -a rename after the fact using the \hgopt{rename}{--after} option. In -most other respects, the behaviour of the \hgcmd{rename} command, and -the options it accepts, are similar to the \hgcmd{copy} command. - -\subsection{Renaming files and merging changes} - -Since Mercurial's rename is implemented as copy-and-remove, the same -propagation of changes happens when you merge after a rename as after -a copy. - -If I modify a file, and you rename it to a new name, and then we merge -our respective changes, my modifications to the file under its -original name will be propagated into the file under its new name. -(This is something you might expect to ``simply work,'' but not all -revision control systems actually do this.) - -Whereas having changes follow a copy is a feature where you can -perhaps nod and say ``yes, that might be useful,'' it should be clear -that having them follow a rename is definitely important. Without -this facility, it would simply be too easy for changes to become -orphaned when files are renamed. - -\subsection{Divergent renames and merging} - -The case of diverging names occurs when two developers start with a -file---let's call it \filename{foo}---in their respective -repositories. - -\interaction{rename.divergent.clone} -Anne renames the file to \filename{bar}. -\interaction{rename.divergent.rename.anne} -Meanwhile, Bob renames it to \filename{quux}. 
-\interaction{rename.divergent.rename.bob} - -I like to think of this as a conflict because each developer has -expressed different intentions about what the file ought to be named. - -What do you think should happen when they merge their work? -Mercurial's actual behaviour is that it always preserves \emph{both} -names when it merges changesets that contain divergent renames. -\interaction{rename.divergent.merge} - -Notice that Mercurial does warn about the divergent renames, but it -leaves it up to you to do something about the divergence after the merge. - -\subsection{Convergent renames and merging} - -Another kind of rename conflict occurs when two people choose to -rename different \emph{source} files to the same \emph{destination}. -In this case, Mercurial runs its normal merge machinery, and lets you -guide it to a suitable resolution. - -\subsection{Other name-related corner cases} - -Mercurial has a longstanding bug in which it fails to handle a merge -where one side has a file with a given name, while another has a -directory with the same name. This is documented as~\bug{29}. -\interaction{issue29.go} - -\section{Recovering from mistakes} - -Mercurial has some useful commands that will help you to recover from -some common mistakes. - -The \hgcmd{revert} command lets you undo changes that you have made to -your working directory. For example, if you \hgcmd{add} a file by -accident, just run \hgcmd{revert} with the name of the file you added, -and while the file won't be touched in any way, it won't be tracked -for adding by Mercurial any longer, either. You can also use -\hgcmd{revert} to get rid of erroneous changes to a file. - -Keep in mind that the \hgcmd{revert} command deals with -changes that you have not yet committed. Once you've committed a -change, if you decide it was a mistake, you can still do something -about it, though your options may be more limited.
- -For more information about the \hgcmd{revert} command, and details -about how to deal with changes you have already committed, see -chapter~\ref{chap:undo}. - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/filenames.tex --- a/en/filenames.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,306 +0,0 @@ -\chapter{File names and pattern matching} -\label{chap:names} - -Mercurial provides mechanisms that let you work with file names in a -consistent and expressive way. - -\section{Simple file naming} - -Mercurial uses a unified piece of machinery ``under the hood'' to -handle file names. Every command behaves uniformly with respect to -file names. The way in which commands work with file names is as -follows. - -If you explicitly name real files on the command line, Mercurial works -with exactly those files, as you would expect. -\interaction{filenames.files} - -When you provide a directory name, Mercurial will interpret this as -``operate on every file in this directory and its subdirectories''. -Mercurial traverses the files and subdirectories in a directory in -alphabetical order. When it encounters a subdirectory, it will -traverse that subdirectory before continuing with the current -directory. -\interaction{filenames.dirs} - -\section{Running commands without any file names} - -Mercurial's commands that work with file names have useful default -behaviours when you invoke them without providing any file names or -patterns. What kind of behaviour you should expect depends on what -the command does. Here are a few rules of thumb you can use to -predict what a command is likely to do if you don't give it any names -to work with. -\begin{itemize} -\item Most commands will operate on the entire working directory. - This is what the \hgcmd{add} command does, for example. 
-\item If the command has effects that are difficult or impossible to - reverse, it will force you to explicitly provide at least one name - or pattern (see below). This protects you from accidentally - deleting files by running \hgcmd{remove} with no arguments, for - example. -\end{itemize} - -It's easy to work around these default behaviours if they don't suit -you. If a command normally operates on the whole working directory, -you can invoke it on just the current directory and its subdirectories -by giving it the name ``\dirname{.}''. -\interaction{filenames.wdir-subdir} - -Along the same lines, some commands normally print file names relative -to the root of the repository, even if you're invoking them from a -subdirectory. Such a command will print file names relative to your -subdirectory if you give it explicit names. Here, we're going to run -\hgcmd{status} from a subdirectory, and get it to operate on the -entire working directory while printing file names relative to our -subdirectory, by passing it the output of the \hgcmd{root} command. -\interaction{filenames.wdir-relname} - -\section{Telling you what's going on} - -The \hgcmd{add} example in the preceding section illustrates something -else that's helpful about Mercurial commands. If a command operates -on a file that you didn't name explicitly on the command line, it will -usually print the name of the file, so that you will not be surprised -by what's going on. - -The principle here is that of \emph{least surprise}. If you've exactly -named a file on the command line, there's no point in repeating it -back at you. If Mercurial is acting on a file \emph{implicitly}, -because you provided no names, or a directory, or a pattern (see -below), it's safest to tell you what it's doing. - -For commands that behave this way, you can silence them using the -\hggopt{-q} option. You can also get them to print the name of every -file, even those you've named explicitly, using the \hggopt{-v} -option.
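The naming rules from this and the previous section can be summarised in a short Python sketch. It is a simplified model, not Mercurial's implementation. The in-memory tree, the helper names `walk`, `lookup` and `add`, and the exact `adding ...` message text are all assumptions made for illustration. Directories are walked in alphabetical order, and each subdirectory is finished before its parent continues. A file's name is echoed only when that file was reached implicitly, unless the `quiet` or `verbose` flags (standing in for `-q` and `-v`) say otherwise.

```python
def walk(tree, prefix=""):
    """Yield file paths depth-first, visiting entries alphabetically.

    `tree` is a hypothetical in-memory directory: a dict mapping each
    name to None (a file) or to another dict (a subdirectory).
    """
    for name in sorted(tree):
        path = prefix + name
        if tree[name] is None:
            yield path
        else:
            # Finish the whole subdirectory before continuing here.
            yield from walk(tree[name], path + "/")


def lookup(tree, name):
    """Return None for a file or a dict for a directory (KeyError if absent)."""
    node = tree
    for part in name.strip("/").split("/"):
        node = node[part]
    return node


def add(tree, names, quiet=False, verbose=False):
    """Model which files an add-like command touches and which it echoes."""
    added, echoed = [], []
    for name in names:
        node = lookup(tree, name)
        implicit = node is not None  # a directory was named, not a file
        paths = list(walk(node, name.strip("/") + "/")) if implicit else [name]
        for path in paths:
            added.append(path)
            if not quiet and (implicit or verbose):
                echoed.append("adding " + path)
    return added, echoed


example = {"c.txt": None, "a.txt": None, "b": {"x.txt": None, "a": {"y.txt": None}}}
added, echoed = add(example, ["a.txt", "b"])
print(added)   # ['a.txt', 'b/a/y.txt', 'b/x.txt']
print(echoed)  # ['adding b/a/y.txt', 'adding b/x.txt']
```

In this model `quiet` takes precedence over `verbose`. That precedence is a choice made for the sketch, not documented Mercurial behaviour.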
- -\section{Using patterns to identify files} - -In addition to working with file and directory names, Mercurial lets -you use \emph{patterns} to identify files. Mercurial's pattern -handling is expressive. - -On Unix-like systems (Linux, MacOS, etc.), the job of matching file -names to patterns normally falls to the shell. On these systems, you -must explicitly tell Mercurial that a name is a pattern. On Windows, -the shell does not expand patterns, so Mercurial will automatically -identify names that are patterns, and expand them for you. - -To provide a pattern in place of a regular name on the command line, -the mechanism is simple: -\begin{codesample2} - syntax:patternbody -\end{codesample2} -That is, a pattern is identified by a short text string that says what -kind of pattern this is, followed by a colon, followed by the actual -pattern. - -Mercurial supports two kinds of pattern syntax. The most frequently -used is called \texttt{glob}; this is the same kind of pattern -matching used by the Unix shell, and should be familiar to Windows -command prompt users, too. - -When Mercurial does automatic pattern matching on Windows, it uses -\texttt{glob} syntax. You can thus omit the ``\texttt{glob:}'' prefix -on Windows, but it's safe to use it, too. - -The \texttt{re} syntax is more powerful; it lets you specify patterns -using regular expressions, also known as regexps. - -By the way, in the examples that follow, notice that I'm careful to -wrap all of my patterns in quote characters, so that they won't get -expanded by the shell before Mercurial sees them. - -\subsection{Shell-style \texttt{glob} patterns} - -This is an overview of the kinds of patterns you can use when you're -matching on glob patterns. - -The ``\texttt{*}'' character matches any string, within a single -directory. -\interaction{filenames.glob.star} - -The ``\texttt{**}'' pattern matches any string, and crosses directory -boundaries. 
It's not a standard Unix glob token, but it's accepted by -several popular Unix shells, and is very useful. -\interaction{filenames.glob.starstar} - -The ``\texttt{?}'' pattern matches any single character. -\interaction{filenames.glob.question} - -The ``\texttt{[}'' character begins a \emph{character class}. This -matches any single character within the class. The class ends with a -``\texttt{]}'' character. A class may contain multiple \emph{range}s -of the form ``\texttt{a-f}'', which is shorthand for -``\texttt{abcdef}''. -\interaction{filenames.glob.range} -If the first character after the ``\texttt{[}'' in a character class -is a ``\texttt{!}'', it \emph{negates} the class, making it match any -single character not in the class. - -A ``\texttt{\{}'' begins a group of subpatterns, where the whole group -matches if any subpattern in the group matches. The ``\texttt{,}'' -character separates subpatterns, and ``\texttt{\}}'' ends the group. -\interaction{filenames.glob.group} - -\subsubsection{Watch out!} - -Don't forget that if you want to match a pattern in any directory, you -should not be using the ``\texttt{*}'' match-any token, as this will -only match within one directory. Instead, use the ``\texttt{**}'' -token. This small example illustrates the difference between the two. -\interaction{filenames.glob.star-starstar} - -\subsection{Regular expression matching with \texttt{re} patterns} - -Mercurial accepts the same regular expression syntax as the Python -programming language (it uses Python's regexp engine internally). -This is based on the Perl language's regexp syntax, which is the most -popular dialect in use (it's also used in Java, for example). - -I won't discuss Mercurial's regexp dialect in any detail here, as -regexps are not often used. Perl-style regexps are in any case -already exhaustively documented on a multitude of web sites, and in -many books. 
Instead, I will focus here on a few things you should -know if you find yourself needing to use regexps with Mercurial. - -A regexp is matched against an entire file name, relative to the root -of the repository. In other words, even if you're already in the -subdirectory \dirname{foo}, if you want to match files under this -directory, your pattern must start with ``\texttt{foo/}''. - -One thing to note, if you're familiar with Perl-style regexps, is that -Mercurial's are \emph{rooted}. That is, a regexp starts matching -against the beginning of a string; it doesn't look for a match -anywhere within the string. To match anywhere in a string, start -your pattern with ``\texttt{.*}''. - -\section{Filtering files} - -Not only does Mercurial give you a variety of ways to specify files; -it lets you further winnow those files using \emph{filters}. Commands -that work with file names accept two filtering options. -\begin{itemize} -\item \hggopt{-I}, or \hggopt{--include}, lets you specify a pattern - that file names must match in order to be processed. -\item \hggopt{-X}, or \hggopt{--exclude}, gives you a way to - \emph{avoid} processing files, if they match this pattern. -\end{itemize} -You can provide multiple \hggopt{-I} and \hggopt{-X} options on the -command line, and intermix them as you please. Mercurial interprets -the patterns you provide using glob syntax by default (but you can use -regexps if you need to). - -You can read a \hggopt{-I} filter as ``process only the files that -match this filter''. -\interaction{filenames.filter.include} -The \hggopt{-X} filter is best read as ``process only the files that -don't match this pattern''. -\interaction{filenames.filter.exclude} - -\section{Ignoring unwanted files and directories} - -XXX.
- -\section{Case sensitivity} -\label{sec:names:case} - -If you're working in a mixed development environment that contains -both Linux (or other Unix) systems and Macs or Windows systems, you -should keep in the back of your mind the knowledge that they treat the -case (``N'' versus ``n'') of file names in incompatible ways. This is -not very likely to affect you, and it's easy to deal with if it does, -but it could surprise you if you don't know about it. - -Operating systems and filesystems differ in the way they handle the -\emph{case} of characters in file and directory names. There are -three common ways to handle case in names. -\begin{itemize} -\item Completely case insensitive. Uppercase and lowercase versions - of a letter are treated as identical, both when creating a file and - during subsequent accesses. This is common on older DOS-based - systems. -\item Case preserving, but insensitive. When a file or directory is - created, the case of its name is stored, and can be retrieved and - displayed by the operating system. When an existing file is being - looked up, its case is ignored. This is the standard arrangement on - Windows and MacOS. The names \filename{foo} and \filename{FoO} - identify the same file. This treatment of uppercase and lowercase - letters as interchangeable is also referred to as \emph{case - folding}. -\item Case sensitive. The case of a name is significant at all times. - The names \filename{foo} and \filename{FoO} identify different files. This - is the way Linux and Unix systems normally work. -\end{itemize} - -On Unix-like systems, it is possible to have any or all of the above -ways of handling case in action at once. For example, if you use a -USB thumb drive formatted with a FAT32 filesystem on a Linux system, -Linux will handle names on that filesystem in a case preserving, but -insensitive, way. - -\subsection{Safe, portable repository storage} - -Mercurial's repository storage mechanism is \emph{case safe}.
It -translates file names so that they can be safely stored on both case -sensitive and case insensitive filesystems. This means that you can -use normal file copying tools to transfer a Mercurial repository onto, -for example, a USB thumb drive, and safely move that drive and -repository back and forth between a Mac, a PC running Windows, and a -Linux box. - -\subsection{Detecting case conflicts} - -When operating in the working directory, Mercurial honours the naming -policy of the filesystem where the working directory is located. If -the filesystem is case preserving, but insensitive, Mercurial will -treat names that differ only in case as the same. - -An important aspect of this approach is that it is possible to commit -a changeset on a case sensitive (typically Linux or Unix) filesystem -that will cause trouble for users on case insensitive (usually Windows -and MacOS) filesystems. If a Linux user commits changes to two files, one -named \filename{myfile.c} and the other named \filename{MyFile.C}, -they will be stored correctly in the repository. And in the working -directories of other Linux users, they will be correctly represented -as separate files. - -If a Windows or Mac user pulls this change, they will not initially -have a problem, because Mercurial's repository storage mechanism is -case safe. However, once they try to \hgcmd{update} the working -directory to that changeset, or \hgcmd{merge} with that changeset, -Mercurial will spot the conflict between the two file names that the -filesystem would treat as the same, and forbid the update or merge -from occurring. - -\subsection{Fixing a case conflict} - -If you are using Windows or a Mac in a mixed environment where some of -your collaborators are using Linux or Unix, and Mercurial reports a -case folding conflict when you try to \hgcmd{update} or \hgcmd{merge}, -the procedure to fix the problem is simple.
- -Just find a nearby Linux or Unix box, clone the problem repository -onto it, and use Mercurial's \hgcmd{rename} command to change the -names of any offending files or directories so that they will no -longer cause case folding conflicts. Commit this change, \hgcmd{pull} -or \hgcmd{push} it across to your Windows or MacOS system, and -\hgcmd{update} to the revision with the non-conflicting names. - -The changeset with case-conflicting names will remain in your -project's history, and you still won't be able to \hgcmd{update} your -working directory to that changeset on a Windows or MacOS system, but -you can continue development unimpeded. - -\begin{note} - Prior to version~0.9.3, Mercurial did not use a case safe repository - storage mechanism, and did not detect case folding conflicts. If - you are using an older version of Mercurial on Windows or MacOS, I - strongly recommend that you upgrade. -\end{note} - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/hgext.tex --- a/en/hgext.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,429 +0,0 @@ -\chapter{Adding functionality with extensions} -\label{chap:hgext} - -While the core of Mercurial is quite complete from a functionality -standpoint, it's deliberately shorn of fancy features. This approach -of preserving simplicity keeps the software easy to deal with for both -maintainers and users. - -However, Mercurial doesn't box you in with an inflexible command set: -you can add features to it as \emph{extensions} (sometimes known as -\emph{plugins}). We've already discussed a few of these extensions in -earlier chapters. -\begin{itemize} -\item Section~\ref{sec:tour-merge:fetch} covers the \hgext{fetch} - extension; this combines pulling new changes and merging them with - local changes into a single command, \hgxcmd{fetch}{fetch}. 
-\item In chapter~\ref{chap:hook}, we covered several extensions that - are useful for hook-related functionality: \hgext{acl} adds access - control lists; \hgext{bugzilla} adds integration with the Bugzilla - bug tracking system; and \hgext{notify} sends notification emails on - new changes. -\item The Mercurial Queues patch management extension is so invaluable - that it merits two chapters and an appendix all to itself. - Chapter~\ref{chap:mq} covers the basics; - chapter~\ref{chap:mq-collab} discusses advanced topics; and - appendix~\ref{chap:mqref} goes into detail on each command. -\end{itemize} - -In this chapter, we'll cover some of the other extensions that are -available for Mercurial, and briefly touch on some of the machinery -you'll need to know about if you want to write an extension of your -own. -\begin{itemize} -\item In section~\ref{sec:hgext:inotify}, we'll discuss the - possibility of \emph{huge} performance improvements using the - \hgext{inotify} extension. -\end{itemize} - -\section{Improve performance with the \hgext{inotify} extension} -\label{sec:hgext:inotify} - -Are you interested in having some of the most common Mercurial -operations run as much as a hundred times faster? Read on! - -Mercurial has great performance under normal circumstances. For -example, when you run the \hgcmd{status} command, Mercurial has to -scan almost every directory and file in your repository so that it can -display file status. Many other Mercurial commands need to do the -same work behind the scenes; for example, the \hgcmd{diff} command -uses the status machinery to avoid doing an expensive comparison -operation on files that obviously haven't changed. - -Because obtaining file status is crucial to good performance, the -authors of Mercurial have optimised this code to within an inch of its -life. 
However, there's no avoiding the fact that when you run -\hgcmd{status}, Mercurial is going to have to perform at least one -expensive system call for each managed file to determine whether it's -changed since the last time Mercurial checked. For a sufficiently -large repository, this can take a long time. - -To put a number on the magnitude of this effect, I created a -repository containing 150,000 managed files. I timed \hgcmd{status} -as taking ten seconds to run, even when \emph{none} of those files had -been modified. - -Many modern operating systems contain a file notification facility. -If a program signs up to an appropriate service, the operating system -will notify it every time a file of interest is created, modified, or -deleted. On Linux systems, the kernel component that does this is -called \texttt{inotify}. - -Mercurial's \hgext{inotify} extension talks to the kernel's -\texttt{inotify} component to optimise \hgcmd{status} commands. The -extension has two components. A daemon sits in the background and -receives notifications from the \texttt{inotify} subsystem. It also -listens for connections from a regular Mercurial command. The -extension modifies Mercurial's behaviour so that instead of scanning -the filesystem, it queries the daemon. Since the daemon has perfect -information about the state of the repository, it can respond with a -result instantaneously, avoiding the need to scan every directory and -file in the repository. - -Recall the ten seconds that I measured plain Mercurial as taking to -run \hgcmd{status} on a 150,000 file repository. With the -\hgext{inotify} extension enabled, the time dropped to 0.1~seconds, a -factor of \emph{one hundred} faster. - -Before we continue, please pay attention to some caveats. -\begin{itemize} -\item The \hgext{inotify} extension is Linux-specific. Because it - interfaces directly to the Linux kernel's \texttt{inotify} - subsystem, it does not work on other operating systems. 
-\item It should work on any Linux distribution that was released after - early~2005. Older distributions are likely to have a kernel that - lacks \texttt{inotify}, or a version of \texttt{glibc} that does not - have the necessary interfacing support. -\item Not all filesystems are suitable for use with the - \hgext{inotify} extension. Network filesystems such as NFS are a - non-starter, for example, particularly if you're running Mercurial - on several systems, all mounting the same network filesystem. The - kernel's \texttt{inotify} system has no way of knowing about changes - made on another system. Most local filesystems (e.g.~ext3, XFS, - ReiserFS) should work fine. -\end{itemize} - -The \hgext{inotify} extension is not yet shipped with Mercurial as of -May~2007, so it's a little more involved to set up than other -extensions. But the performance improvement is worth it! - -The extension currently comes in two parts: a set of patches to the -Mercurial source code, and a library of Python bindings to the -\texttt{inotify} subsystem. -\begin{note} - There are \emph{two} Python \texttt{inotify} binding libraries. One - of them is called \texttt{pyinotify}, and is packaged by some Linux - distributions as \texttt{python-inotify}. This is \emph{not} the - one you'll need, as it is too buggy and inefficient to be practical. -\end{note} -To get going, it's best to already have a functioning copy of -Mercurial installed. -\begin{note} - If you follow the instructions below, you'll be \emph{replacing} and - overwriting any existing installation of Mercurial that you might - already have, using the latest ``bleeding edge'' Mercurial code. - Don't say you weren't warned! -\end{note} -\begin{enumerate} -\item Clone the Python \texttt{inotify} binding repository. Build and - install it. 
- \begin{codesample4} - hg clone http://hg.kublai.com/python/inotify - cd inotify - python setup.py build --force - sudo python setup.py install --skip-build - \end{codesample4} -\item Clone the \dirname{crew} Mercurial repository. Clone the - \hgext{inotify} patch repository so that Mercurial Queues will be - able to apply patches to your copy of the \dirname{crew} repository. - \begin{codesample4} - hg clone http://hg.intevation.org/mercurial/crew - hg clone crew inotify - hg clone http://hg.kublai.com/mercurial/patches/inotify inotify/.hg/patches - \end{codesample4} -\item Make sure that you have the Mercurial Queues extension, - \hgext{mq}, enabled. If you've never used MQ, read - section~\ref{sec:mq:start} to get started quickly. -\item Go into the \dirname{inotify} repo, and apply all of the - \hgext{inotify} patches using the \hgxopt{mq}{qpush}{-a} option to - the \hgxcmd{mq}{qpush} command. - \begin{codesample4} - cd inotify - hg qpush -a - \end{codesample4} - If you get an error message from \hgxcmd{mq}{qpush}, you should not - continue. Instead, ask for help. -\item Build and install the patched version of Mercurial. - \begin{codesample4} - python setup.py build --force - sudo python setup.py install --skip-build - \end{codesample4} -\end{enumerate} -Once you've built a suitably patched version of Mercurial, all you -need to do to enable the \hgext{inotify} extension is add an entry to -your \hgrc. -\begin{codesample2} - [extensions] - inotify = -\end{codesample2} -When the \hgext{inotify} extension is enabled, Mercurial will -automatically and transparently start the status daemon the first time -you run a command that needs status in a repository. It runs one -status daemon per repository. - -The status daemon is started silently, and runs in the background.
If -you look at a list of running processes after you've enabled the -\hgext{inotify} extension and run a few commands in different -repositories, you'll thus see a few \texttt{hg} processes sitting -around, waiting for updates from the kernel and queries from -Mercurial. - -The first time you run a Mercurial command in a repository when you -have the \hgext{inotify} extension enabled, it will run with about the -same performance as a normal Mercurial command. This is because the -status daemon needs to perform a normal status scan so that it has a -baseline against which to apply later updates from the kernel. -However, \emph{every} subsequent command that does any kind of status -check should be noticeably faster on repositories of even fairly -modest size. Better yet, the bigger your repository is, the greater a -performance advantage you'll see. The \hgext{inotify} daemon makes -status operations almost instantaneous on repositories of all sizes! - -If you like, you can manually start a status daemon using the -\hgxcmd{inotify}{inserve} command. This gives you slightly finer -control over how the daemon ought to run. This command will of course -only be available when the \hgext{inotify} extension is enabled. - -When you're using the \hgext{inotify} extension, you should notice -\emph{no difference at all} in Mercurial's behaviour, with the sole -exception of status-related commands running a whole lot faster than -they used to. You should specifically expect that commands will not -print different output; neither should they give different results. -If either of these situations occurs, please report a bug. - -\section{Flexible diff support with the \hgext{extdiff} extension} -\label{sec:hgext:extdiff} - -Mercurial's built-in \hgcmd{diff} command outputs plaintext unified -diffs. -\interaction{extdiff.diff} -If you would like to use an external tool to display modifications, -you'll want to use the \hgext{extdiff} extension. 
This will let you -use, for example, a graphical diff tool. - -The \hgext{extdiff} extension is bundled with Mercurial, so it's easy -to set up. In the \rcsection{extensions} section of your \hgrc, -simply add a one-line entry to enable the extension. -\begin{codesample2} - [extensions] - extdiff = -\end{codesample2} -This introduces a command named \hgxcmd{extdiff}{extdiff}, which by -default uses your system's \command{diff} command to generate a -unified diff in the same form as the built-in \hgcmd{diff} command. -\interaction{extdiff.extdiff} -The result won't be exactly the same as with the built-in \hgcmd{diff} -variations, because the output of \command{diff} varies from one -system to another, even when passed the same options. - -As the ``\texttt{making snapshot}'' lines of output above imply, the -\hgxcmd{extdiff}{extdiff} command works by creating two snapshots of -your source tree. The first snapshot is of the source revision; the -second, of the target revision or working directory. The -\hgxcmd{extdiff}{extdiff} command generates these snapshots in a -temporary directory, passes the name of each directory to an external -diff viewer, then deletes the temporary directory. For efficiency, it -only snapshots the directories and files that have changed between the -two revisions. - -Snapshot directory names have the same base name as your repository. -If your repository path is \dirname{/quux/bar/foo}, then \dirname{foo} -will be the name of each snapshot directory. Each snapshot directory -name has its changeset ID appended, if appropriate. If a snapshot is -of revision \texttt{a631aca1083f}, the directory will be named -\dirname{foo.a631aca1083f}. A snapshot of the working directory won't -have a changeset ID appended, so it would just be \dirname{foo} in -this example. To see what this looks like in practice, look again at -the \hgxcmd{extdiff}{extdiff} example above. Notice that the diff has -the snapshot directory names embedded in its header. 
- -The \hgxcmd{extdiff}{extdiff} command accepts two important options. -The \hgxopt{extdiff}{extdiff}{-p} option lets you choose a program to -view differences with, instead of \command{diff}. With the -\hgxopt{extdiff}{extdiff}{-o} option, you can change the options that -\hgxcmd{extdiff}{extdiff} passes to the program (by default, these -options are ``\texttt{-Npru}'', which only make sense if you're -running \command{diff}). In other respects, the -\hgxcmd{extdiff}{extdiff} command acts similarly to the built-in -\hgcmd{diff} command: you use the same option names, syntax, and -arguments to specify the revisions you want, the files you want, and -so on. - -As an example, here's how to run the normal system \command{diff} -command, getting it to generate context diffs (using the -\cmdopt{diff}{-c} option) instead of unified diffs, and five lines of -context instead of the default three (passing \texttt{5} as the -argument to the \cmdopt{diff}{-C} option). -\interaction{extdiff.extdiff-ctx} - -Launching a visual diff tool is just as easy. Here's how to launch -the \command{kdiff3} viewer. -\begin{codesample2} - hg extdiff -p kdiff3 -o '' -\end{codesample2} - -If your diff viewing command can't deal with directories, you can -easily work around this with a little scripting. For an example of -such scripting in action with the \hgext{mq} extension and the -\command{interdiff} command, see -section~\ref{mq-collab:tips:interdiff}. - -\subsection{Defining command aliases} - -It can be cumbersome to remember the options to both the -\hgxcmd{extdiff}{extdiff} command and the diff viewer you want to use, -so the \hgext{extdiff} extension lets you define \emph{new} commands -that will invoke your diff viewer with exactly the right options. - -All you need to do is edit your \hgrc, and add a section named -\rcsection{extdiff}. Inside this section, you can define multiple -commands. Here's how to add a \texttt{kdiff3} command. 
Once you've -defined this, you can type ``\texttt{hg kdiff3}'' and the -\hgext{extdiff} extension will run \command{kdiff3} for you. -\begin{codesample2} - [extdiff] - cmd.kdiff3 = -\end{codesample2} -If you leave the right hand side of the definition empty, as above, -the \hgext{extdiff} extension uses the name of the command you defined -as the name of the external program to run. But these names don't -have to be the same. Here, we define a command named ``\texttt{hg - wibble}'', which runs \command{kdiff3}. -\begin{codesample2} - [extdiff] - cmd.wibble = kdiff3 -\end{codesample2} - -You can also specify the default options that you want to invoke your -diff viewing program with. The prefix to use is ``\texttt{opts.}'', -followed by the name of the command to which the options apply. This -example defines a ``\texttt{hg vimdiff}'' command that runs the -\command{vim} editor's \texttt{DirDiff} extension. -\begin{codesample2} - [extdiff] - cmd.vimdiff = vim - opts.vimdiff = -f '+next' '+execute "DirDiff" argv(0) argv(1)' -\end{codesample2} - -\section{Cherrypicking changes with the \hgext{transplant} extension} -\label{sec:hgext:transplant} - -Need to have a long chat with Brendan about this. - -\section{Send changes via email with the \hgext{patchbomb} extension} -\label{sec:hgext:patchbomb} - -Many projects have a culture of ``change review'', in which people -send their modifications to a mailing list for others to read and -comment on before they commit the final version to a shared -repository. Some projects have people who act as gatekeepers; they -apply changes from other people to a repository to which those others -don't have access. - -Mercurial makes it easy to send changes over email for review or -application, via its \hgext{patchbomb} extension. The extension is so -named because changes are formatted as patches, and it's usual to send -one changeset per email message.
Sending a long series of changes by -email is thus much like ``bombing'' the recipient's inbox, hence -``patchbomb''. - -As usual, the basic configuration of the \hgext{patchbomb} extension -takes just one or two lines in your \hgrc. -\begin{codesample2} - [extensions] - patchbomb = -\end{codesample2} -Once you've enabled the extension, you will have a new command -available, named \hgxcmd{patchbomb}{email}. - -The safest and best way to invoke the \hgxcmd{patchbomb}{email} -command is to \emph{always} run it first with the -\hgxopt{patchbomb}{email}{-n} option. This will show you what the -command \emph{would} send, without actually sending anything. Once -you've had a quick glance over the changes and verified that you are -sending the right ones, you can rerun the same command, with the -\hgxopt{patchbomb}{email}{-n} option removed. - -The \hgxcmd{patchbomb}{email} command accepts the same kind of -revision syntax as every other Mercurial command. For example, this -command will send every revision between 7 and \texttt{tip}, -inclusive. -\begin{codesample2} - hg email -n 7:tip -\end{codesample2} -You can also specify a \emph{repository} to compare with. If you -provide a repository but no revisions, the \hgxcmd{patchbomb}{email} -command will send all revisions in the local repository that are not -present in the remote repository. If you additionally specify -revisions or a branch name (the latter using the -\hgxopt{patchbomb}{email}{-b} option), this will constrain the -revisions sent. - -It's perfectly safe to run the \hgxcmd{patchbomb}{email} command -without the names of the people you want to send to: if you do this, -it will just prompt you for those values interactively. (If you're -using a Linux or Unix-like system, you should have enhanced -\texttt{readline}-style editing capabilities when entering those -headers, too, which is useful.) 
- -When you are sending just one revision, the \hgxcmd{patchbomb}{email} -command will by default use the first line of the changeset -description as the subject of the single email message it sends. - -If you send multiple revisions, the \hgxcmd{patchbomb}{email} command -will usually send one message per changeset. It will preface the -series with an introductory message, in which you should describe the -purpose of the series of changes you're sending. - -\subsection{Changing the behaviour of patchbombs} - -Not every project has exactly the same conventions for sending changes -in email; the \hgext{patchbomb} extension tries to accommodate a -number of variations through command line options. -\begin{itemize} -\item You can write a subject for the introductory message on the - command line using the \hgxopt{patchbomb}{email}{-s} option. This - takes one argument, the text of the subject to use. -\item To change the email address from which the messages originate, - use the \hgxopt{patchbomb}{email}{-f} option. This takes one - argument, the email address to use. -\item The default behaviour is to send unified diffs (see - section~\ref{sec:mq:patch} for a description of the format), one per - message. You can send a binary bundle instead with the - \hgxopt{patchbomb}{email}{-b} option. -\item Unified diffs are normally prefaced with a metadata header. You - can omit this, and send unadorned diffs, with the - \hgxopt{patchbomb}{email}{--plain} option. -\item Diffs are normally sent ``inline'', in the same body part as the - description of a patch. This makes it easiest for the largest - number of readers to quote and respond to parts of a diff, as some - mail clients will only quote the first MIME body part in a message. - If you'd prefer to send the description and the diff in separate - body parts, use the \hgxopt{patchbomb}{email}{-a} option. 
-\item Instead of sending mail messages, you can write them to an - \texttt{mbox}-format mail folder using the - \hgxopt{patchbomb}{email}{-m} option. That option takes one - argument, the name of the file to write to. -\item If you would like to add a \command{diffstat}-format summary to - each patch, and one to the introductory message, use the - \hgxopt{patchbomb}{email}{-d} option. The \command{diffstat} - command displays a table containing the name of each file patched, - the number of lines affected, and a histogram showing how much each - file is modified. This gives readers a qualitative glance at how - complex a patch is. -\end{itemize} - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/hook.tex --- a/en/hook.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,1413 +0,0 @@ -\chapter{Handling repository events with hooks} -\label{chap:hook} - -Mercurial offers a powerful mechanism to let you perform automated -actions in response to events that occur in a repository. In some -cases, you can even control Mercurial's response to those events. - -The name Mercurial uses for one of these actions is a \emph{hook}. -Hooks are called ``triggers'' in some revision control systems, but -the two names refer to the same idea. - -\section{An overview of hooks in Mercurial} - -Here is a brief list of the hooks that Mercurial supports. We will -revisit each of these hooks in more detail later, in -section~\ref{sec:hook:ref}. - -\begin{itemize} -\item[\small\hook{changegroup}] This is run after a group of - changesets has been brought into the repository from elsewhere. -\item[\small\hook{commit}] This is run after a new changeset has been - created in the local repository. -\item[\small\hook{incoming}] This is run once for each new changeset - that is brought into the repository from elsewhere. 
Notice the - difference from \hook{changegroup}, which is run once per - \emph{group} of changesets brought in. -\item[\small\hook{outgoing}] This is run after a group of changesets - has been transmitted from this repository. -\item[\small\hook{prechangegroup}] This is run before starting to - bring a group of changesets into the repository. -\item[\small\hook{precommit}] Controlling. This is run before starting - a commit. -\item[\small\hook{preoutgoing}] Controlling. This is run before - starting to transmit a group of changesets from this repository. -\item[\small\hook{pretag}] Controlling. This is run before creating a tag. -\item[\small\hook{pretxnchangegroup}] Controlling. This is run after a - group of changesets has been brought into the local repository from - another, but before the transaction completes that will make the - changes permanent in the repository. -\item[\small\hook{pretxncommit}] Controlling. This is run after a new - changeset has been created in the local repository, but before the - transaction completes that will make it permanent. -\item[\small\hook{preupdate}] Controlling. This is run before starting - an update or merge of the working directory. -\item[\small\hook{tag}] This is run after a tag is created. -\item[\small\hook{update}] This is run after an update or merge of the - working directory has finished. -\end{itemize} -Each of the hooks whose description begins with the word -``Controlling'' has the ability to determine whether an activity can -proceed. If the hook succeeds, the activity may proceed; if it fails, -the activity is either not permitted or undone, depending on the hook. - -\section{Hooks and security} - -\subsection{Hooks are run with your privileges} - -When you run a Mercurial command in a repository, and the command -causes a hook to run, that hook runs on \emph{your} system, under -\emph{your} user account, with \emph{your} privilege level. 
Since -hooks are arbitrary pieces of executable code, you should treat them -with an appropriate level of suspicion. Do not install a hook unless -you are confident that you know who created it and what it does. - -In some cases, you may be exposed to hooks that you did not install -yourself. If you work with Mercurial on an unfamiliar system, -Mercurial will run hooks defined in that system's global \hgrc\ file. - -If you are working with a repository owned by another user, Mercurial -can run hooks defined in that user's repository, but it will still run -them as ``you''. For example, if you \hgcmd{pull} from that -repository, and its \sfilename{.hg/hgrc} defines a local -\hook{outgoing} hook, that hook will run under your user account, even -though you don't own that repository. - -\begin{note} - This only applies if you are pulling from a repository on a local or - network filesystem. If you're pulling over http or ssh, any - \hook{outgoing} hook will run under whatever account is executing - the server process, on the server. -\end{note} - -XXX To see what hooks are defined in a repository, use the -\hgcmdargs{config}{hooks} command. If you are working in one -repository, but talking to another that you do not own (e.g.~using -\hgcmd{pull} or \hgcmd{incoming}), remember that it is the other -repository's hooks you should be checking, not your own. - -\subsection{Hooks do not propagate} - -In Mercurial, hooks are not revision controlled, and do not propagate -when you clone, or pull from, a repository. The reason for this is -simple: a hook is a completely arbitrary piece of executable code. It -runs under your user identity, with your privilege level, on your -machine. - -It would be extremely reckless for any distributed revision control -system to implement revision-controlled hooks, as this would offer an -easily exploitable way to subvert the accounts of users of the -revision control system. 
- -Since Mercurial does not propagate hooks, if you are collaborating -with other people on a common project, you should not assume that they -are using the same Mercurial hooks as you are, or that theirs are -correctly configured. You should document the hooks you expect people -to use. - -In a corporate intranet, this is somewhat easier to control, as you -can for example provide a ``standard'' installation of Mercurial on an -NFS filesystem, and use a site-wide \hgrc\ file to define hooks that -all users will see. However, this too has its limits; see below. - -\subsection{Hooks can be overridden} - -Mercurial allows you to override a hook definition by redefining the -hook. You can disable it by setting its value to the empty string, or -change its behaviour as you wish. - -If you deploy a system-~or site-wide \hgrc\ file that defines some -hooks, you should thus understand that your users can disable or -override those hooks. - -\subsection{Ensuring that critical hooks are run} - -Sometimes you may want to enforce a policy that you do not want others -to be able to work around. For example, you may have a requirement -that every changeset must pass a rigorous set of tests. Defining this -requirement via a hook in a site-wide \hgrc\ won't work for remote -users on laptops, and of course local users can subvert it at will by -overriding the hook. - -Instead, you can set up your policies for use of Mercurial so that -people are expected to propagate changes through a well-known -``canonical'' server that you have locked down and configured -appropriately. - -One way to do this is via a combination of social engineering and -technology. Set up a restricted-access account; users can push -changes over the network to repositories managed by this account, but -they cannot log into the account and run normal shell commands. In -this scenario, a user can commit a changeset that contains any old -garbage they want. 
- -When someone pushes a changeset to the server that everyone pulls -from, the server will test the changeset before it accepts it as -permanent, and reject it if it fails to pass the test suite. If -people only pull changes from this filtering server, it will serve to -ensure that all changes that people pull have been automatically -vetted. - -\section{Care with \texttt{pretxn} hooks in a shared-access repository} - -If you want to use hooks to do some automated work in a repository -that a number of people have shared access to, you need to be careful -in how you do this. - -Mercurial only locks a repository when it is writing to the -repository, and only the parts of Mercurial that write to the -repository pay attention to locks. Write locks are necessary to -prevent multiple simultaneous writers from scribbling on each other's -work, corrupting the repository. - -Because Mercurial is careful with the order in which it reads and -writes data, it does not need to acquire a lock when it wants to read -data from the repository. The parts of Mercurial that read from the -repository never pay attention to locks. This lockless reading scheme -greatly increases performance and concurrency. - -With great performance comes a trade-off, though, one which has the -potential to cause you trouble unless you're aware of it. To describe -this requires a little detail about how Mercurial adds changesets to a -repository and reads those changes. - -When Mercurial \emph{writes} metadata, it writes it straight into the -destination file. It writes file data first, then manifest data -(which contains pointers to the new file data), then changelog data -(which contains pointers to the new manifest data). Before the first -write to each file, it stores a record of where the end of the file -was in its transaction log. If the transaction must be rolled back, -Mercurial simply truncates each file back to the size it was before the -transaction began. 
- -When Mercurial \emph{reads} metadata, it reads the changelog first, -then everything else. Since a reader will only access parts of the -manifest or file metadata that it can see in the changelog, it can -never see partially written data. - -Some controlling hooks (\hook{pretxncommit} and -\hook{pretxnchangegroup}) run when a transaction is almost complete. -All of the metadata has been written, but Mercurial can still roll the -transaction back and cause the newly-written data to disappear. - -If one of these hooks runs for long, it opens a window of time during -which a reader can see the metadata for changesets that are not yet -permanent, and should not be thought of as ``really there''. The -longer the hook runs, the longer that window is open. - -\subsection{The problem illustrated} - -In principle, a good use for the \hook{pretxnchangegroup} hook would -be to automatically build and test incoming changes before they are -accepted into a central repository. This could let you guarantee that -nobody can push changes to this repository that ``break the build''. -But if a client can pull changes while they're being tested, the -usefulness of the test is zero; an unsuspecting someone can pull -untested changes, potentially breaking their build. - -The safest technological answer to this challenge is to set up such a -``gatekeeper'' repository as \emph{unidirectional}. Let it take -changes pushed in from the outside, but do not allow anyone to pull -changes from it (use the \hook{preoutgoing} hook to lock it down). -Configure a \hook{changegroup} hook so that if a build or test -succeeds, the hook will push the new changes out to another repository -that people \emph{can} pull from. - -In practice, putting a centralised bottleneck like this in place is -not often a good idea, and transaction visibility has nothing to do -with the problem. 
As the size of a project---and the time it takes to -build and test---grows, you rapidly run into a wall with this ``try -before you buy'' approach, where you have more changesets to test than -time in which to deal with them. The inevitable result is frustration -on the part of all involved. - -An approach that scales better is to get people to build and test -before they push, then run automated builds and tests centrally -\emph{after} a push, to be sure all is well. The advantage of this -approach is that it does not impose a limit on the rate at which the -repository can accept changes. - -\section{A short tutorial on using hooks} -\label{sec:hook:simple} - -It is easy to write a Mercurial hook. Let's start with a hook that -runs when you finish a \hgcmd{commit}, and simply prints the hash of -the changeset you just created. The hook is called \hook{commit}. - -\begin{figure}[ht] - \interaction{hook.simple.init} - \caption{A simple hook that runs when a changeset is committed} - \label{ex:hook:init} -\end{figure} - -All hooks follow the pattern in example~\ref{ex:hook:init}. You add -an entry to the \rcsection{hooks} section of your \hgrc. On the left -is the name of the event to trigger on; on the right is the action to -take. As you can see, you can run an arbitrary shell command in a -hook. Mercurial passes extra information to the hook using -environment variables (look for \envar{HG\_NODE} in the example). - -\subsection{Performing multiple actions per event} - -Quite often, you will want to define more than one hook for a -particular kind of event, as shown in example~\ref{ex:hook:ext}. -Mercurial lets you do this by adding an \emph{extension} to the end of -a hook's name. You extend a hook's name by giving the name of the -hook, followed by a full stop (the ``\texttt{.}'' character), followed -by some more text of your choosing. For example, Mercurial will run -both \texttt{commit.foo} and \texttt{commit.bar} when the -\texttt{commit} event occurs. 
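The naming rule can be modelled in a few lines of Python. This is a sketch under stated assumptions, not Mercurial's own code: \texttt{hooks\_for\_event} is a hypothetical helper, a plain dictionary stands in for the \rcsection{hooks} section, a hook belongs to an event when the text before its first full stop equals the event name, entries whose value is empty are skipped, and matching hooks are returned sorted by their full names so that the run order is deterministic.

```python
def hooks_for_event(hooks_section, event):
    """Return (name, command) pairs that fire for `event`, in run order.

    Hypothetical helper: `hooks_section` is a dict standing in for the
    [hooks] section of an hgrc file.
    """
    matching = [
        (name, command)
        for name, command in hooks_section.items()
        # "commit.foo".split(".", 1)[0] == "commit", while "pretxncommit" stays
        # separate; an empty command means the hook is disabled.
        if name.split(".", 1)[0] == event and command
    ]
    # Assumption: run order is a plain sort on the full hook name.
    return sorted(matching)


hooks = {
    "commit": "echo committed $HG_NODE",
    "commit.foo": "echo foo",
    "commit.bar": "echo bar",
    "incoming.notify": "echo incoming",
}

# Prints commit, then commit.bar, then commit.foo.
for name, command in hooks_for_event(hooks, "commit"):
    print(name, "->", command)
```

Because the bare name \texttt{commit} is a prefix of \texttt{commit.bar} and \texttt{commit.foo}, a plain string sort places it first.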
- -\begin{figure}[ht] - \interaction{hook.simple.ext} - \caption{Defining a second \hook{commit} hook} - \label{ex:hook:ext} -\end{figure} - -To give a well-defined order of execution when there are multiple -hooks defined for an event, Mercurial sorts hooks by extension, and -executes the hook commands in this sorted order. In the above -example, it will execute \texttt{commit.bar} before -\texttt{commit.foo}, and \texttt{commit} before both. - -It is a good idea to use a somewhat descriptive extension when you -define a new hook. This will help you to remember what the hook was -for. If the hook fails, you'll get an error message that contains the -hook name and extension, so using a descriptive extension could give -you an immediate hint as to why the hook failed (see -section~\ref{sec:hook:perm} for an example). - -\subsection{Controlling whether an activity can proceed} -\label{sec:hook:perm} - -In our earlier examples, we used the \hook{commit} hook, which is -run after a commit has completed. This is one of several Mercurial -hooks that run after an activity finishes. Such hooks have no way of -influencing the activity itself. - -Mercurial defines a number of events that occur before an activity -starts; or after it starts, but before it finishes. Hooks that -trigger on these events have the added ability to choose whether the -activity can continue, or will abort. - -The \hook{pretxncommit} hook runs after a commit has all but -completed. In other words, the metadata representing the changeset -has been written out to disk, but the transaction has not yet been -allowed to complete. The \hook{pretxncommit} hook has the ability to -decide whether the transaction can complete, or must be rolled back. - -If the \hook{pretxncommit} hook exits with a status code of zero, the -transaction is allowed to complete; the commit finishes; and the -\hook{commit} hook is run. 
If the \hook{pretxncommit} hook exits with -a non-zero status code, the transaction is rolled back; the metadata -representing the changeset is erased; and the \hook{commit} hook is -not run. - -\begin{figure}[ht] - \interaction{hook.simple.pretxncommit} - \caption{Using the \hook{pretxncommit} hook to control commits} - \label{ex:hook:pretxncommit} -\end{figure} - -The hook in example~\ref{ex:hook:pretxncommit} checks that a commit -comment contains a bug ID. If it does, the commit can complete. If -not, the commit is rolled back. - -\section{Writing your own hooks} - -When you are writing a hook, you might find it useful to run Mercurial -either with the \hggopt{-v} option, or the \rcitem{ui}{verbose} config -item set to ``true''. When you do so, Mercurial will print a message -before it calls each hook. - -\subsection{Choosing how your hook should run} -\label{sec:hook:lang} - -You can write a hook either as a normal program---typically a shell -script---or as a Python function that is executed within the Mercurial -process. - -Writing a hook as an external program has the advantage that it -requires no knowledge of Mercurial's internals. You can call normal -Mercurial commands to get any added information you need. The -trade-off is that external hooks are slower than in-process hooks. - -An in-process Python hook has complete access to the Mercurial API, -and does not ``shell out'' to another process, so it is inherently -faster than an external hook. It is also easier to obtain much of the -information that a hook requires by using the Mercurial API than by -running Mercurial commands. - -If you are comfortable with Python, or require high performance, -writing your hooks in Python may be a good choice. However, when you -have a straightforward hook to write and you don't need to care about -performance (probably the majority of hooks), a shell script is -perfectly fine. 
- -\subsection{Hook parameters} -\label{sec:hook:param} - -Mercurial calls each hook with a set of well-defined parameters. In -Python, a parameter is passed as a keyword argument to your hook -function. For an external program, a parameter is passed as an -environment variable. - -Whether your hook is written in Python or as a shell script, the -hook-specific parameter names and values will be the same. A boolean -parameter will be represented as a boolean value in Python, but as the -number 1 (for ``true'') or 0 (for ``false'') as an environment -variable for an external hook. If a hook parameter is named -\texttt{foo}, the keyword argument for a Python hook will also be -named \texttt{foo}, while the environment variable for an external -hook will be named \texttt{HG\_FOO}. - -\subsection{Hook return values and activity control} - -A hook that executes successfully must exit with a status of zero if -external, or return boolean ``false'' if in-process. Failure is -indicated with a non-zero exit status from an external hook, or an -in-process hook returning boolean ``true''. If an in-process hook -raises an exception, the hook is considered to have failed. - -For a hook that controls whether an activity can proceed, zero/false -means ``allow'', while non-zero/true/exception means ``deny''. - -\subsection{Writing an external hook} - -When you define an external hook in your \hgrc\ and the hook is run, -its value is passed to your shell, which interprets it. This means -that you can use normal shell constructs in the body of the hook. - -An executable hook is always run with its current directory set to a -repository's root directory. - -Each hook parameter is passed in as an environment variable; the name -is upper-cased, and prefixed with the string ``\texttt{HG\_}''. - -With the exception of hook parameters, Mercurial does not set or -modify any environment variables when running a hook. 
This is useful -to remember if you are writing a site-wide hook that may be run by a -number of different users with differing environment variables set. -In multi-user situations, you should not rely on environment variables -being set to the values you have in your environment when testing the -hook. - -\subsection{Telling Mercurial to use an in-process hook} - -The \hgrc\ syntax for defining an in-process hook is slightly -different than for an executable hook. The value of the hook must -start with the text ``\texttt{python:}'', and continue with the -fully-qualified name of a callable object to use as the hook's value. - -The module in which a hook lives is automatically imported when a hook -is run. So long as you have the module name and \envar{PYTHONPATH} -right, it should ``just work''. - -The following \hgrc\ example snippet illustrates the syntax and -meaning of the notions we just described. -\begin{codesample2} - [hooks] - commit.example = python:mymodule.submodule.myhook -\end{codesample2} -When Mercurial runs the \texttt{commit.example} hook, it imports -\texttt{mymodule.submodule}, looks for the callable object named -\texttt{myhook}, and calls it. - -\subsection{Writing an in-process hook} - -The simplest in-process hook does nothing, but illustrates the basic -shape of the hook API: -\begin{codesample2} - def myhook(ui, repo, **kwargs): - pass -\end{codesample2} -The first argument to a Python hook is always a -\pymodclass{mercurial.ui}{ui} object. The second is a repository object; -at the moment, it is always an instance of -\pymodclass{mercurial.localrepo}{localrepository}. Following these two -arguments are other keyword arguments. Which ones are passed in -depends on the hook being called, but a hook can ignore arguments it -doesn't care about by dropping them into a keyword argument dict, as -with \texttt{**kwargs} above. 
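The following sketch makes the relationship between the two calling conventions of section~\ref{sec:hook:param} concrete. It shows a hypothetical helper, \texttt{hook\_env}, which takes the keyword arguments a Python hook would receive and builds the environment variables that an external hook would see for the same parameters. Each name is upper-cased and prefixed with \texttt{HG\_}, and each boolean becomes \texttt{1} or \texttt{0}. The helper only illustrates the rules described in the text. It is not how Mercurial itself is implemented.

```python
def hook_env(params):
    """Build the environment an external hook would see from Python hook kwargs.

    Illustration of the naming rules only; this is not Mercurial's own code.
    """
    env = {}
    for name, value in params.items():
        # Parameter "node" becomes HG_NODE, "parent1" becomes HG_PARENT1, etc.
        key = "HG_" + name.upper()
        if isinstance(value, bool):
            # Booleans are passed to external hooks as the strings 1 and 0.
            env[key] = "1" if value else "0"
        else:
            # Changeset IDs and URLs are already strings; the null ID is "".
            env[key] = str(value)
    return env


print(hook_env({"node": "3cba9bfe74b5", "parent2": ""}))
# prints {'HG_NODE': '3cba9bfe74b5', 'HG_PARENT2': ''}
```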
- -\section{Some hook examples} - -\subsection{Writing meaningful commit messages} - -It's hard to imagine a useful commit message being very short. The -simple \hook{pretxncommit} hook of figure~\ref{ex:hook:msglen.go} -will prevent you from committing a changeset with a message that is -less than ten bytes long. - -\begin{figure}[ht] - \interaction{hook.msglen.go} - \caption{A hook that forbids overly short commit messages} - \label{ex:hook:msglen.go} -\end{figure} - -\subsection{Checking for trailing whitespace} - -An interesting use of a commit-related hook is to help you to write -cleaner code. A simple example of ``cleaner code'' is the dictum that -a change should not add any new lines of text that contain ``trailing -whitespace''. Trailing whitespace is a series of space and tab -characters at the end of a line of text. In most cases, trailing -whitespace is unnecessary, invisible noise, but it is occasionally -problematic, and people often prefer to get rid of it. - -You can use either the \hook{precommit} or \hook{pretxncommit} hook to -tell whether you have a trailing whitespace problem. If you use the -\hook{precommit} hook, the hook will not know which files you are -committing, so it will have to check every modified file in the -repository for trailing whitespace. If you want to commit a change -to just the file \filename{foo}, but the file \filename{bar} contains -trailing whitespace, doing a check in the \hook{precommit} hook will -prevent you from committing \filename{foo} due to the problem with -\filename{bar}. This doesn't seem right. - -Should you choose the \hook{pretxncommit} hook, the check won't occur -until just before the transaction for the commit completes. This will -allow you to check for problems in only the exact files that are being -committed.
However, if you entered the commit message interactively -and the hook fails, the transaction will roll back; you'll have to -re-enter the commit message after you fix the trailing whitespace and -run \hgcmd{commit} again. - -\begin{figure}[ht] - \interaction{hook.ws.simple} - \caption{A simple hook that checks for trailing whitespace} - \label{ex:hook:ws.simple} -\end{figure} - -Figure~\ref{ex:hook:ws.simple} introduces a simple \hook{pretxncommit} -hook that checks for trailing whitespace. This hook is short, but not -very helpful. It exits with an error status if a change adds a line -with trailing whitespace to any file, but does not print any -information that might help us to identify the offending file or -line. It also has the nice property of not paying attention to -unmodified lines; only lines that introduce new trailing whitespace -cause problems. - -\begin{figure}[ht] - \interaction{hook.ws.better} - \caption{A better trailing whitespace hook} - \label{ex:hook:ws.better} -\end{figure} - -The example of figure~\ref{ex:hook:ws.better} is much more complex, -but also more useful. It parses a unified diff to see if any lines -add trailing whitespace, and prints the name of the file and the line -number of each such occurrence. Even better, if the change adds -trailing whitespace, this hook saves the commit comment and prints the -name of the save file before exiting and telling Mercurial to roll the -transaction back, so you can use -\hgcmdargs{commit}{\hgopt{commit}{-l}~\emph{filename}} to reuse the -saved commit message once you've corrected the problem. - -As a final aside, note in figure~\ref{ex:hook:ws.better} the use of -\command{perl}'s in-place editing feature to get rid of trailing -whitespace from a file. This is concise and useful enough that I will -reproduce it here. -\begin{codesample2} - perl -pi -e 's,\textbackslash{}s+\$,,' filename -\end{codesample2} - -\section{Bundled hooks} - -Mercurial ships with several bundled hooks. 
You can find them in the -\dirname{hgext} directory of a Mercurial source tree. If you are -using a Mercurial binary package, the hooks will be located in the -\dirname{hgext} directory of wherever your package installer put -Mercurial. - -\subsection{\hgext{acl}---access control for parts of a repository} - -The \hgext{acl} extension lets you control which remote users are -allowed to push changesets to a networked server. You can protect any -portion of a repository (including the entire repo), so that a -specific remote user can push changes that do not affect the protected -portion. - -This extension implements access control based on the identity of the -user performing a push, \emph{not} on who committed the changesets -they're pushing. It makes sense to use this hook only if you have a -locked-down server environment that authenticates remote users, and -you want to be sure that only specific users are allowed to push -changes to that server. - -\subsubsection{Configuring the \hook{acl} hook} - -In order to manage incoming changesets, the \hgext{acl} hook must be -used as a \hook{pretxnchangegroup} hook. This lets it see which files -are modified by each incoming changeset, and roll back a group of -changesets if they modify ``forbidden'' files. Example: -\begin{codesample2} - [hooks] - pretxnchangegroup.acl = python:hgext.acl.hook -\end{codesample2} - -The \hgext{acl} extension is configured using three sections. - -The \rcsection{acl} section has only one entry, \rcitem{acl}{sources}, -which lists the sources of incoming changesets that the hook should -pay attention to. You don't normally need to configure this section. -\begin{itemize} -\item[\rcitem{acl}{serve}] Control incoming changesets that are arriving - from a remote repository over http or ssh. This is the default - value of \rcitem{acl}{sources}, and usually the only setting you'll - need for this configuration item. 
-\item[\rcitem{acl}{pull}] Control incoming changesets that are - arriving via a pull from a local repository. -\item[\rcitem{acl}{push}] Control incoming changesets that are - arriving via a push from a local repository. -\item[\rcitem{acl}{bundle}] Control incoming changesets that are - arriving from another repository via a bundle. -\end{itemize} - -The \rcsection{acl.allow} section controls the users that are allowed to -add changesets to the repository. If this section is not present, all -users that are not explicitly denied are allowed. If this section is -present, all users that are not explicitly allowed are denied (so an -empty section means that all users are denied). - -The \rcsection{acl.deny} section determines which users are denied -from adding changesets to the repository. If this section is not -present or is empty, no users are denied. - -The syntaxes for the \rcsection{acl.allow} and \rcsection{acl.deny} -sections are identical. On the left of each entry is a glob pattern -that matches files or directories, relative to the root of the -repository; on the right, a user name. - -In the following example, the user \texttt{docwriter} can only push -changes to the \dirname{docs} subtree of the repository, while -\texttt{intern} can push changes to any file or directory except -\dirname{source/sensitive}. -\begin{codesample2} - [acl.allow] - docs/** = docwriter - - [acl.deny] - source/sensitive/** = intern -\end{codesample2} - -\subsubsection{Testing and troubleshooting} - -If you want to test the \hgext{acl} hook, run it with Mercurial's -debugging output enabled. 
Since you'll probably be running it on a -server where it's not convenient (or sometimes possible) to pass in -the \hggopt{--debug} option, don't forget that you can enable -debugging output in your \hgrc: -\begin{codesample2} - [ui] - debug = true -\end{codesample2} -With this enabled, the \hgext{acl} hook will print enough information -to let you figure out why it is allowing or forbidding pushes from -specific users. - -\subsection{\hgext{bugzilla}---integration with Bugzilla} - -The \hgext{bugzilla} extension adds a comment to a Bugzilla bug -whenever it finds a reference to that bug ID in a commit comment. You -can install this hook on a shared server, so that any time a remote -user pushes changes to this server, the hook gets run. - -It adds a comment to the bug that looks like this (you can configure -the contents of the comment---see below): -\begin{codesample2} - Changeset aad8b264143a, made by Joe User in - the frobnitz repository, refers to this bug. - - For complete details, see - http://hg.domain.com/frobnitz?cmd=changeset;node=aad8b264143a - - Changeset description: - Fix bug 10483 by guarding against some NULL pointers -\end{codesample2} -The value of this hook is that it automates the process of updating a -bug any time a changeset refers to it. If you configure the hook -properly, it makes it easy for people to browse straight from a -Bugzilla bug to a changeset that refers to that bug. - -You can use the code in this hook as a starting point for some more -exotic Bugzilla integration recipes. Here are a few possibilities: -\begin{itemize} -\item Require that every changeset pushed to the server have a valid - bug~ID in its commit comment. In this case, you'd want to configure - the hook as a \hook{pretxnchangegroup} hook, since changesets that - arrive by push do not trigger \hook{pretxncommit}. This would allow - the hook to reject changes that didn't contain bug IDs. -\item Allow incoming changesets to automatically modify the - \emph{state} of a bug, as well as simply adding a comment.
For - example, the hook could recognise the string ``fixed bug 31337'' as - indicating that it should update the state of bug 31337 to - ``requires testing''. -\end{itemize} - -\subsubsection{Configuring the \hook{bugzilla} hook} -\label{sec:hook:bugzilla:config} - -You should configure this hook in your server's \hgrc\ as an -\hook{incoming} hook, for example as follows: -\begin{codesample2} - [hooks] - incoming.bugzilla = python:hgext.bugzilla.hook -\end{codesample2} - -Because of the specialised nature of this hook, and because Bugzilla -was not written with this kind of integration in mind, configuring -this hook is a somewhat involved process. - -Before you begin, you must install the MySQL bindings for Python on -the host(s) where you'll be running the hook. If this is not -available as a binary package for your system, you can download it -from~\cite{web:mysql-python}. - -Configuration information for this hook lives in the -\rcsection{bugzilla} section of your \hgrc. -\begin{itemize} -\item[\rcitem{bugzilla}{version}] The version of Bugzilla installed on - the server. The database schema that Bugzilla uses changes - occasionally, so this hook has to know exactly which schema to use. - At the moment, the only version supported is \texttt{2.16}. -\item[\rcitem{bugzilla}{host}] The hostname of the MySQL server that - stores your Bugzilla data. The database must be configured to allow - connections from whatever host you are running the \hook{bugzilla} - hook on. -\item[\rcitem{bugzilla}{user}] The username with which to connect to - the MySQL server. The database must be configured to allow this - user to connect from whatever host you are running the - \hook{bugzilla} hook on. This user must be able to access and - modify Bugzilla tables. The default value of this item is - \texttt{bugs}, which is the standard name of the Bugzilla user in a - MySQL database. -\item[\rcitem{bugzilla}{password}] The MySQL password for the user you - configured above. 
This is stored as plain text, so you should make - sure that unauthorised users cannot read the \hgrc\ file where you - store this information. -\item[\rcitem{bugzilla}{db}] The name of the Bugzilla database on the - MySQL server. The default value of this item is \texttt{bugs}, - which is the standard name of the MySQL database where Bugzilla - stores its data. -\item[\rcitem{bugzilla}{notify}] If you want Bugzilla to send out a - notification email to subscribers after this hook has added a - comment to a bug, you will need this hook to run a command whenever - it updates the database. The command to run depends on where you - have installed Bugzilla, but it will typically look something like - this, if you have Bugzilla installed in - \dirname{/var/www/html/bugzilla}: - \begin{codesample4} - cd /var/www/html/bugzilla && ./processmail %s nobody@nowhere.com - \end{codesample4} - The Bugzilla \texttt{processmail} program expects to be given a - bug~ID (the hook replaces ``\texttt{\%s}'' with the bug~ID) and an - email address. It also expects to be able to write to some files in - the directory that it runs in. If Bugzilla and this hook are not - installed on the same machine, you will need to find a way to run - \texttt{processmail} on the server where Bugzilla is installed. -\end{itemize} - -\subsubsection{Mapping committer names to Bugzilla user names} - -By default, the \hgext{bugzilla} hook tries to use the email address -of a changeset's committer as the Bugzilla user name with which to -update a bug. If this does not suit your needs, you can map committer -email addresses to Bugzilla user names using a \rcsection{usermap} -section. - -Each item in the \rcsection{usermap} section contains an email address -on the left, and a Bugzilla user name on the right. 
-\begin{codesample2} - [usermap] - jane.user@example.com = jane -\end{codesample2} -You can either keep the \rcsection{usermap} data in a normal \hgrc, or -tell the \hgext{bugzilla} hook to read the information from an -external \filename{usermap} file. In the latter case, you can store -\filename{usermap} data by itself in (for example) a user-modifiable -repository. This makes it possible to let your users maintain their -own \rcitem{bugzilla}{usermap} entries. The main \hgrc\ file might -look like this: -\begin{codesample2} - # regular hgrc file refers to external usermap file - [bugzilla] - usermap = /home/hg/repos/userdata/bugzilla-usermap.conf -\end{codesample2} -While the \filename{usermap} file that it refers to might look like -this: -\begin{codesample2} - # bugzilla-usermap.conf - inside a hg repository - [usermap] - stephanie@example.com = steph -\end{codesample2} - -\subsubsection{Configuring the text that gets added to a bug} - -You can configure the text that this hook adds as a comment; you -specify it in the form of a Mercurial template. Several \hgrc\ -entries (still in the \rcsection{bugzilla} section) control this -behaviour. -\begin{itemize} -\item[\texttt{strip}] The number of leading path separators to strip - from a repository's path name to construct a partial path for a URL. - For example, if the repositories on your server live under - \dirname{/home/hg/repos}, and you have a repository whose path is - \dirname{/home/hg/repos/app/tests}, then setting \texttt{strip} to - \texttt{4} will give a partial path of \dirname{app/tests}. The - hook will make this partial path available when expanding a - template, as \texttt{webroot}. -\item[\texttt{template}] The text of the template to use. In addition - to the usual changeset-related variables, this template can use - \texttt{hgweb} (the value of the \texttt{hgweb} configuration item - above) and \texttt{webroot} (the path constructed using - \texttt{strip} above).
-\end{itemize} - -In addition, you can add a \rcitem{web}{baseurl} item to the -\rcsection{web} section of your \hgrc. The \hgext{bugzilla} hook will -make this available when expanding a template, as the base string to -use when constructing a URL that will let users browse from a Bugzilla -comment to view a changeset. Example: -\begin{codesample2} - [web] - baseurl = http://hg.domain.com/ -\end{codesample2} - -Here is an example set of \hgext{bugzilla} hook config information. -\begin{codesample2} - [bugzilla] - host = bugzilla.example.com - password = mypassword - version = 2.16 - # server-side repos live in /home/hg/repos, so strip 4 leading - # separators - strip = 4 - hgweb = http://hg.example.com/ - usermap = /home/hg/repos/notify/bugzilla.conf - template = Changeset \{node|short\}, made by \{author\} in the \{webroot\} - repo, refers to this bug.\\nFor complete details, see - \{hgweb\}\{webroot\}?cmd=changeset;node=\{node|short\}\\nChangeset - description:\\n\\t\{desc|tabindent\} -\end{codesample2} - -\subsubsection{Testing and troubleshooting} - -The most common problems with configuring the \hgext{bugzilla} hook -relate to running Bugzilla's \filename{processmail} script and mapping -committer names to user names. - -Recall from section~\ref{sec:hook:bugzilla:config} above that the user -that runs the Mercurial process on the server is also the one that -will run the \filename{processmail} script. The -\filename{processmail} script sometimes causes Bugzilla to write to -files in its configuration directory, and Bugzilla's configuration -files are usually owned by the user that your web server runs under. - -You can cause \filename{processmail} to be run with the suitable -user's identity using the \command{sudo} command. Here is an example -entry for a \filename{sudoers} file. 
-\begin{codesample2} - hg_user ALL = (httpd_user) NOPASSWD: /var/www/html/bugzilla/processmail-wrapper -\end{codesample2} -This allows the \texttt{hg\_user} user to run a -\filename{processmail-wrapper} program under the identity of -\texttt{httpd\_user}, on any host (the \texttt{ALL} field). - -This indirection through a wrapper script is necessary, because -\filename{processmail} expects to be run with its current directory -set to wherever you installed Bugzilla; you can't specify that kind of -constraint in a \filename{sudoers} file. The contents of the wrapper -script are simple: -\begin{codesample2} - #!/bin/sh - cd `dirname $0` && ./processmail "$1" nobody@example.com -\end{codesample2} -It doesn't seem to matter what email address you pass to -\filename{processmail}. - -If your \rcsection{usermap} is not set up correctly, users will see an -error message from the \hgext{bugzilla} hook when they push changes -to the server. The error message will look like this: -\begin{codesample2} - cannot find bugzilla user id for john.q.public@example.com -\end{codesample2} -What this means is that the committer's address, -\texttt{john.q.public@example.com}, is not a valid Bugzilla user name, -nor does it have an entry in your \rcsection{usermap} that maps it to -a valid Bugzilla user name. - -\subsection{\hgext{notify}---send email notifications} - -Although Mercurial's built-in web server provides RSS feeds of changes -in every repository, many people prefer to receive change -notifications via email. The \hgext{notify} hook lets you send out -notifications to a set of email addresses whenever changesets arrive -that those subscribers are interested in. - -As with the \hgext{bugzilla} hook, the \hgext{notify} hook is -template-driven, so you can customise the contents of the notification -messages that it sends. - -By default, the \hgext{notify} hook includes a diff of every changeset -that it sends out; you can limit the size of the diff, or turn this -feature off entirely.
It is useful for letting subscribers review -changes immediately, rather than clicking to follow a URL. - -\subsubsection{Configuring the \hgext{notify} hook} - -You can set up the \hgext{notify} hook to send one email message per -incoming changeset, or one per incoming group of changesets (all those -that arrived in a single pull or push). -\begin{codesample2} - [hooks] - # send one email per group of changes - changegroup.notify = python:hgext.notify.hook - # send one email per change - incoming.notify = python:hgext.notify.hook -\end{codesample2} - -Configuration information for this hook lives in the -\rcsection{notify} section of a \hgrc\ file. -\begin{itemize} -\item[\rcitem{notify}{test}] By default, this hook does not send out - email at all; instead, it prints the message that it \emph{would} - send. Set this item to \texttt{false} to allow email to be sent. - The reason that sending of email is turned off by default is that it - takes several tries to configure this extension exactly as you would - like, and it would be bad form to spam subscribers with a number of - ``broken'' notifications while you debug your configuration. -\item[\rcitem{notify}{config}] The path to a configuration file that - contains subscription information. This is kept separate from the - main \hgrc\ so that you can maintain it in a repository of its own. - People can then clone that repository, update their subscriptions, - and push the changes back to your server. -\item[\rcitem{notify}{strip}] The number of leading path separator - characters to strip from a repository's path, when deciding whether - a repository has subscribers. 
For example, if the repositories on - your server live in \dirname{/home/hg/repos}, and \hgext{notify} is - considering a repository named \dirname{/home/hg/repos/shared/test}, - setting \rcitem{notify}{strip} to \texttt{4} will cause - \hgext{notify} to trim the path it considers down to - \dirname{shared/test}, and it will match subscribers against that. -\item[\rcitem{notify}{template}] The template text to use when sending - messages. This specifies both the contents of the message header - and its body. -\item[\rcitem{notify}{maxdiff}] The maximum number of lines of diff - data to append to the end of a message. If a diff is longer than - this, it is truncated. By default, this is set to 300. Set this to - \texttt{0} to omit diffs from notification emails. -\item[\rcitem{notify}{sources}] A list of sources of changesets to - consider. This lets you limit \hgext{notify} to only sending out - email about changes that remote users pushed into this repository - via a server, for example. See section~\ref{sec:hook:sources} for - the sources you can specify here. -\end{itemize} - -If you set the \rcitem{web}{baseurl} item in the \rcsection{web} -section, you can use it in a template; it will be available as -\texttt{baseurl}. The repository path left after applying -\rcitem{notify}{strip} is available as \texttt{webroot}. - -Here is an example set of \hgext{notify} configuration information.
-\begin{codesample2} - [notify] - # really send email - test = false - # subscriber data lives in the notify repo - config = /home/hg/repos/notify/notify.conf - # repos live in /home/hg/repos on server, so strip 4 "/" chars - strip = 4 - template = X-Hg-Repo: \{webroot\} - Subject: \{webroot\}: \{desc|firstline|strip\} - From: \{author\} - - changeset \{node|short\} in \{root\} - details: \{baseurl\}\{webroot\}?cmd=changeset;node=\{node|short\} - description: - \{desc|tabindent|strip\} - - [web] - baseurl = http://hg.example.com/ -\end{codesample2} - -This will produce a message that looks like the following: -\begin{codesample2} - X-Hg-Repo: tests/slave - Subject: tests/slave: Handle error case when slave has no buffers - Date: Wed, 2 Aug 2006 15:25:46 -0700 (PDT) - - changeset 3cba9bfe74b5 in /home/hg/repos/tests/slave - details: http://hg.example.com/tests/slave?cmd=changeset;node=3cba9bfe74b5 - description: - Handle error case when slave has no buffers - diffs (54 lines): - - diff -r 9d95df7cf2ad -r 3cba9bfe74b5 include/tests.h - --- a/include/tests.h Wed Aug 02 15:19:52 2006 -0700 - +++ b/include/tests.h Wed Aug 02 15:25:26 2006 -0700 - @@ -212,6 +212,15 @@ static __inline__ void test_headers(void *h) - [...snip...] -\end{codesample2} - -\subsubsection{Testing and troubleshooting} - -Do not forget that by default, the \hgext{notify} extension \emph{will - not send any mail} until you explicitly configure it to do so, by -setting \rcitem{notify}{test} to \texttt{false}. Until you do that, -it simply prints the message it \emph{would} send. - -\section{Information for writers of hooks} -\label{sec:hook:ref} - -\subsection{In-process hook execution} - -An in-process hook is called with arguments of the following form: -\begin{codesample2} - def myhook(ui, repo, **kwargs): - pass -\end{codesample2} -The \texttt{ui} parameter is a \pymodclass{mercurial.ui}{ui} object. -The \texttt{repo} parameter is a -\pymodclass{mercurial.localrepo}{localrepository} object. 
The -names and values of the \texttt{**kwargs} parameters depend on the -hook being invoked, with the following common features: -\begin{itemize} -\item If a parameter is named \texttt{node} or - \texttt{parent\emph{N}}, it will contain a hexadecimal changeset ID. - The empty string is used to represent ``null changeset ID'' instead - of a string of zeroes. -\item If a parameter is named \texttt{url}, it will contain the URL of - a remote repository, if that can be determined. -\item Boolean-valued parameters are represented as Python - \texttt{bool} objects. -\end{itemize} - -An in-process hook is called without a change to the process's working -directory (unlike external hooks, which are run in the root of the -repository). It must not change the process's working directory, or -it will cause any calls it makes into the Mercurial API to fail. - -If a hook returns a boolean ``false'' value, it is considered to have -succeeded. If it returns a boolean ``true'' value or raises an -exception, it is considered to have failed. A useful way to think of -the calling convention is ``tell me if you fail''. - -Note that changeset IDs are passed into Python hooks as hexadecimal -strings, not the binary hashes that Mercurial's APIs normally use. To -convert a hash from hex to binary, use the -\pymodfunc{mercurial.node}{bin} function. - -\subsection{External hook execution} - -An external hook is passed to the shell of the user running Mercurial. -Features of that shell, such as variable substitution and command -redirection, are available. The hook is run in the root directory of -the repository (unlike in-process hooks, which are run in the same -directory that Mercurial was run in). - -Hook parameters are passed to the hook as environment variables. Each -environment variable's name is converted to upper case and prefixed -with the string ``\texttt{HG\_}''.
For example, if the name of a -parameter is ``\texttt{node}'', the name of the environment variable -representing that parameter will be ``\texttt{HG\_NODE}''. - -A boolean parameter is represented as the string ``\texttt{1}'' for -``true'', ``\texttt{0}'' for ``false''. If an environment variable is -named \envar{HG\_NODE}, \envar{HG\_PARENT1} or \envar{HG\_PARENT2}, it -contains a changeset ID represented as a hexadecimal string. The -empty string is used to represent ``null changeset ID'' instead of a -string of zeroes. If an environment variable is named -\envar{HG\_URL}, it will contain the URL of a remote repository, if -that can be determined. - -If a hook exits with a status of zero, it is considered to have -succeeded. If it exits with a non-zero status, it is considered to -have failed. - -\subsection{Finding out where changesets come from} - -A hook that involves the transfer of changesets between a local -repository and another may be able to find out information about the -``far side''. Mercurial knows \emph{how} changes are being -transferred, and in many cases \emph{where} they are being transferred -to or from. - -\subsubsection{Sources of changesets} -\label{sec:hook:sources} - -Mercurial will tell a hook what means are, or were, used to transfer -changesets between repositories. This is provided by Mercurial in a -Python parameter named \texttt{source}, or an environment variable named -\envar{HG\_SOURCE}. - -\begin{itemize} -\item[\texttt{serve}] Changesets are transferred to or from a remote - repository over http or ssh. -\item[\texttt{pull}] Changesets are being transferred via a pull from - one repository into another. -\item[\texttt{push}] Changesets are being transferred via a push from - one repository into another. -\item[\texttt{bundle}] Changesets are being transferred to or from a - bundle. 
-\end{itemize} - -\subsubsection{Where changes are going---remote repository URLs} -\label{sec:hook:url} - -When possible, Mercurial will tell a hook the location of the ``far -side'' of an activity that transfers changeset data between -repositories. This is provided by Mercurial in a Python parameter -named \texttt{url}, or an environment variable named \envar{HG\_URL}. - -This information is not always known. If a hook is invoked in a -repository that is being served via http or ssh, Mercurial cannot tell -where the remote repository is, but it may know where the client is -connecting from. In such cases, the URL will take one of the -following forms: -\begin{itemize} -\item \texttt{remote:ssh:\emph{ip-address}}---remote ssh client, at - the given IP address. -\item \texttt{remote:http:\emph{ip-address}}---remote http client, at - the given IP address. If the client is using SSL, this will be of - the form \texttt{remote:https:\emph{ip-address}}. -\item Empty---no information could be discovered about the remote - client. -\end{itemize} - -\section{Hook reference} - -\subsection{\hook{changegroup}---after remote changesets added} -\label{sec:hook:changegroup} - -This hook is run after a group of pre-existing changesets has been -added to the repository, for example via a \hgcmd{pull} or -\hgcmd{unbundle}. This hook is run once per operation that added one -or more changesets. This is in contrast to the \hook{incoming} hook, -which is run once per changeset, regardless of whether the changesets -arrive in a group. - -Some possible uses for this hook include kicking off an automated -build or test of the added changesets, updating a bug database, or -notifying subscribers that a repository contains new changes. - -Parameters to this hook: -\begin{itemize} -\item[\texttt{node}] A changeset ID. The changeset ID of the first - changeset in the group that was added. 
All changesets between this - and \index{tags!\texttt{tip}}\texttt{tip}, inclusive, were added by - a single \hgcmd{pull}, \hgcmd{push} or \hgcmd{unbundle}. -\item[\texttt{source}] A string. The source of these changes. See - section~\ref{sec:hook:sources} for details. -\item[\texttt{url}] A URL. The location of the remote repository, if - known. See section~\ref{sec:hook:url} for more information. -\end{itemize} - -See also: \hook{incoming} (section~\ref{sec:hook:incoming}), -\hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}), -\hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup}) - -\subsection{\hook{commit}---after a new changeset is created} -\label{sec:hook:commit} - -This hook is run after a new changeset has been created. - -Parameters to this hook: -\begin{itemize} -\item[\texttt{node}] A changeset ID. The changeset ID of the newly - committed changeset. -\item[\texttt{parent1}] A changeset ID. The changeset ID of the first - parent of the newly committed changeset. -\item[\texttt{parent2}] A changeset ID. The changeset ID of the second - parent of the newly committed changeset. -\end{itemize} - -See also: \hook{precommit} (section~\ref{sec:hook:precommit}), -\hook{pretxncommit} (section~\ref{sec:hook:pretxncommit}) - -\subsection{\hook{incoming}---after one remote changeset is added} -\label{sec:hook:incoming} - -This hook is run after a pre-existing changeset has been added to the -repository, for example via a \hgcmd{push}. If a group of changesets -was added in a single operation, this hook is called once for each -added changeset. - -You can use this hook for the same purposes as the \hook{changegroup} -hook (section~\ref{sec:hook:changegroup}); it's simply more convenient -sometimes to run a hook once per group of changesets, while other -times it's handier once per changeset. - -Parameters to this hook: -\begin{itemize} -\item[\texttt{node}] A changeset ID. The ID of the newly added - changeset. 
-\item[\texttt{source}] A string. The source of these changes. See - section~\ref{sec:hook:sources} for details. -\item[\texttt{url}] A URL. The location of the remote repository, if - known. See section~\ref{sec:hook:url} for more information. -\end{itemize} - -See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}), \hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}), \hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup}) - -\subsection{\hook{outgoing}---after changesets are propagated} -\label{sec:hook:outgoing} - -This hook is run after a group of changesets has been propagated out -of this repository, for example by a \hgcmd{push} or \hgcmd{bundle} -command. - -One possible use for this hook is to notify administrators that -changes have been pulled. - -Parameters to this hook: -\begin{itemize} -\item[\texttt{node}] A changeset ID. The changeset ID of the first - changeset of the group that was sent. -\item[\texttt{source}] A string. The source of the operation - (see section~\ref{sec:hook:sources}). If a remote client pulled - changes from this repository, \texttt{source} will be - \texttt{serve}. If the client that obtained changes from this - repository was local, \texttt{source} will be \texttt{bundle}, - \texttt{pull}, or \texttt{push}, depending on the operation the - client performed. -\item[\texttt{url}] A URL. The location of the remote repository, if - known. See section~\ref{sec:hook:url} for more information. -\end{itemize} - -See also: \hook{preoutgoing} (section~\ref{sec:hook:preoutgoing}) - -\subsection{\hook{prechangegroup}---before starting to add remote changesets} -\label{sec:hook:prechangegroup} - -This controlling hook is run before Mercurial begins to add a group of -changesets from another repository. - -This hook does not have any information about the changesets to be -added, because it is run before transmission of those changesets is -allowed to begin.
If this hook fails, the changesets will not be -transmitted. - -One use for this hook is to prevent external changes from being added -to a repository. For example, you could use this to ``freeze'' a -server-hosted branch temporarily or permanently so that users cannot -push to it, while still allowing a local administrator to modify the -repository. - -Parameters to this hook: -\begin{itemize} -\item[\texttt{source}] A string. The source of these changes. See - section~\ref{sec:hook:sources} for details. -\item[\texttt{url}] A URL. The location of the remote repository, if - known. See section~\ref{sec:hook:url} for more information. -\end{itemize} - -See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}), -\hook{incoming} (section~\ref{sec:hook:incoming}), -\hook{pretxnchangegroup} (section~\ref{sec:hook:pretxnchangegroup}) - -\subsection{\hook{precommit}---before starting to commit a changeset} -\label{sec:hook:precommit} - -This hook is run before Mercurial begins to commit a new changeset. -It is run before Mercurial has any of the metadata for the commit, -such as the files to be committed, the commit message, or the commit -date. - -One use for this hook is to disable the ability to commit new -changesets, while still allowing incoming changesets. Another is to -run a build or test, and only allow the commit to begin if the build -or test succeeds. - -Parameters to this hook: -\begin{itemize} -\item[\texttt{parent1}] A changeset ID. The changeset ID of the first - parent of the working directory. -\item[\texttt{parent2}] A changeset ID. The changeset ID of the second - parent of the working directory. -\end{itemize} -If the commit proceeds, the parents of the working directory will -become the parents of the new changeset.
- -See also: \hook{commit} (section~\ref{sec:hook:commit}), -\hook{pretxncommit} (section~\ref{sec:hook:pretxncommit}) - -\subsection{\hook{preoutgoing}---before starting to propagate changesets} -\label{sec:hook:preoutgoing} - -This hook is invoked before Mercurial knows the identities of the -changesets to be transmitted. - -One use for this hook is to prevent changes from being transmitted to -another repository. - -Parameters to this hook: -\begin{itemize} -\item[\texttt{source}] A string. The source of the operation that is - attempting to obtain changes from this repository (see - section~\ref{sec:hook:sources}). See the documentation for the - \texttt{source} parameter to the \hook{outgoing} hook, in - section~\ref{sec:hook:outgoing}, for possible values of this - parameter. -\item[\texttt{url}] A URL. The location of the remote repository, if - known. See section~\ref{sec:hook:url} for more information. -\end{itemize} - -See also: \hook{outgoing} (section~\ref{sec:hook:outgoing}) - -\subsection{\hook{pretag}---before tagging a changeset} -\label{sec:hook:pretag} - -This controlling hook is run before a tag is created. If the hook -succeeds, creation of the tag proceeds. If the hook fails, the tag is -not created. - -Parameters to this hook: -\begin{itemize} -\item[\texttt{local}] A boolean. Whether the tag is local to this - repository instance (i.e.~stored in \sfilename{.hg/localtags}) or - managed by Mercurial (stored in \sfilename{.hgtags}). -\item[\texttt{node}] A changeset ID. The ID of the changeset to be tagged. -\item[\texttt{tag}] A string. The name of the tag to be created. -\end{itemize} - -If the tag to be created is revision-controlled, the \hook{precommit} -and \hook{pretxncommit} hooks (sections~\ref{sec:hook:precommit} -and~\ref{sec:hook:pretxncommit}) will also be run.
- -See also: \hook{tag} (section~\ref{sec:hook:tag}) - -\subsection{\hook{pretxnchangegroup}---before completing addition of - remote changesets} -\label{sec:hook:pretxnchangegroup} - -This controlling hook is run before a transaction---that manages the -addition of a group of new changesets from outside the -repository---completes. If the hook succeeds, the transaction -completes, and all of the changesets become permanent within this -repository. If the hook fails, the transaction is rolled back, and -the data for the changesets is erased. - -This hook can access the metadata associated with the almost-added -changesets, but it should not do anything permanent with this data. -It must also not modify the working directory. - -While this hook is running, if other Mercurial processes access this -repository, they will be able to see the almost-added changesets as if -they are permanent. This may lead to race conditions if you do not -take steps to avoid them. - -This hook can be used to automatically vet a group of changesets. If -the hook fails, all of the changesets are ``rejected'' when the -transaction rolls back. - -Parameters to this hook: -\begin{itemize} -\item[\texttt{node}] A changeset ID. The changeset ID of the first - changeset in the group that was added. All changesets between this - and \index{tags!\texttt{tip}}\texttt{tip}, inclusive, were added by - a single \hgcmd{pull}, \hgcmd{push} or \hgcmd{unbundle}. -\item[\texttt{source}] A string. The source of these changes. See - section~\ref{sec:hook:sources} for details. -\item[\texttt{url}] A URL. The location of the remote repository, if - known. See section~\ref{sec:hook:url} for more information. 
-\end{itemize} - -See also: \hook{changegroup} (section~\ref{sec:hook:changegroup}), -\hook{incoming} (section~\ref{sec:hook:incoming}), -\hook{prechangegroup} (section~\ref{sec:hook:prechangegroup}) - -\subsection{\hook{pretxncommit}---before completing commit of new changeset} -\label{sec:hook:pretxncommit} - -This controlling hook is run before a transaction---that manages a new -commit---completes. If the hook succeeds, the transaction completes -and the changeset becomes permanent within this repository. If the -hook fails, the transaction is rolled back, and the commit data is -erased. - -This hook can access the metadata associated with the almost-new -changeset, but it should not do anything permanent with this data. It -must also not modify the working directory. - -While this hook is running, if other Mercurial processes access this -repository, they will be able to see the almost-new changeset as if it -is permanent. This may lead to race conditions if you do not take -steps to avoid them. - -Parameters to this hook: -\begin{itemize} -\item[\texttt{node}] A changeset ID. The changeset ID of the newly - committed changeset. -\item[\texttt{parent1}] A changeset ID. The changeset ID of the first - parent of the newly committed changeset. -\item[\texttt{parent2}] A changeset ID. The changeset ID of the second - parent of the newly committed changeset. -\end{itemize} - -See also: \hook{precommit} (section~\ref{sec:hook:precommit}) - -\subsection{\hook{preupdate}---before updating or merging working directory} -\label{sec:hook:preupdate} - -This controlling hook is run before an update or merge of the working -directory begins. It is run only if Mercurial's normal pre-update -checks determine that the update or merge can proceed. If the hook -succeeds, the update or merge may proceed; if it fails, the update or -merge does not start. - -Parameters to this hook: -\begin{itemize} -\item[\texttt{parent1}] A changeset ID. 
The ID of the parent that the - working directory is to be updated to. If the working directory is - being merged, it will not change this parent. -\item[\texttt{parent2}] A changeset ID. Only set if the working - directory is being merged. The ID of the revision that the working - directory is being merged with. -\end{itemize} - -See also: \hook{update} (section~\ref{sec:hook:update}) - -\subsection{\hook{tag}---after tagging a changeset} -\label{sec:hook:tag} - -This hook is run after a tag has been created. - -Parameters to this hook: -\begin{itemize} -\item[\texttt{local}] A boolean. Whether the new tag is local to this - repository instance (i.e.~stored in \sfilename{.hg/localtags}) or - managed by Mercurial (stored in \sfilename{.hgtags}). -\item[\texttt{node}] A changeset ID. The ID of the changeset that was - tagged. -\item[\texttt{tag}] A string. The name of the tag that was created. -\end{itemize} - -If the created tag is revision-controlled, the \hook{commit} hook -(section~\ref{sec:hook:commit}) is run before this hook. - -See also: \hook{pretag} (section~\ref{sec:hook:pretag}) - -\subsection{\hook{update}---after updating or merging working directory} -\label{sec:hook:update} - -This hook is run after an update or merge of the working directory -completes. Since a merge can fail (if the external \command{hgmerge} -command fails to resolve conflicts in a file), this hook communicates -whether the update or merge completed cleanly. - -\begin{itemize} -\item[\texttt{error}] A boolean. Indicates whether the update or - merge completed successfully. -\item[\texttt{parent1}] A changeset ID. The ID of the parent that the - working directory was updated to. If the working directory was - merged, it will not have changed this parent. -\item[\texttt{parent2}] A changeset ID. Only set if the working - directory was merged. The ID of the revision that the working - directory was merged with. 
-\end{itemize} - -See also: \hook{preupdate} (section~\ref{sec:hook:preupdate}) - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/intro.tex --- a/en/intro.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,561 +0,0 @@ -\chapter{Introduction} -\label{chap:intro} - -\section{About revision control} - -Revision control is the process of managing multiple versions of a -piece of information. In its simplest form, this is something that -many people do by hand: every time you modify a file, save it under a -new name that contains a number, each one higher than the number of -the preceding version. - -Manually managing multiple versions of even a single file is an -error-prone task, though, so software tools to help automate this -process have long been available. The earliest automated revision -control tools were intended to help a single user to manage revisions -of a single file. Over the past few decades, the scope of revision -control tools has expanded greatly; they now manage multiple files, -and help multiple people to work together. The best modern revision -control tools have no problem coping with thousands of people working -together on projects that consist of hundreds of thousands of files. - -\subsection{Why use revision control?} - -There are a number of reasons why you or your team might want to use -an automated revision control tool for a project. -\begin{itemize} -\item It will track the history and evolution of your project, so you - don't have to. For every change, you'll have a log of \emph{who} - made it; \emph{why} they made it; \emph{when} they made it; and - \emph{what} the change was. -\item When you're working with other people, revision control software - makes it easier for you to collaborate. 
For example, when people - more or less simultaneously make potentially incompatible changes, - the software will help you to identify and resolve those conflicts. -\item It can help you to recover from mistakes. If you make a change - that later turns out to be in error, you can revert to an earlier - version of one or more files. In fact, a \emph{really} good - revision control tool will even help you to efficiently figure out - exactly when a problem was introduced (see - section~\ref{sec:undo:bisect} for details). -\item It will help you to work simultaneously on, and manage the drift - between, multiple versions of your project. -\end{itemize} -Most of these reasons are equally valid---at least in theory---whether -you're working on a project by yourself, or with a hundred other -people. - -A key question about the practicality of revision control at these two -different scales (``lone hacker'' and ``huge team'') is how its -\emph{benefits} compare to its \emph{costs}. A revision control tool -that's difficult to understand or use is going to impose a high cost. - -A five-hundred-person project is likely to collapse under its own -weight almost immediately without a revision control tool and process. -In this case, the cost of using revision control might hardly seem -worth considering, since \emph{without} it, failure is almost -guaranteed. - -On the other hand, a one-person ``quick hack'' might seem like a poor -place to use a revision control tool, because surely the cost of using -one must be close to the overall cost of the project. Right? - -Mercurial uniquely supports \emph{both} of these scales of -development. You can learn the basics in just a few minutes, and due -to its low overhead, you can apply revision control to the smallest of -projects with ease. Its simplicity means you won't have a lot of -abstruse concepts or command sequences competing for mental space with -whatever you're \emph{really} trying to do. 
At the same time, -Mercurial's high performance and peer-to-peer nature let you scale -painlessly to handle large projects. - -No revision control tool can rescue a poorly run project, but a good -choice of tools can make a huge difference to the fluidity with which -you can work on a project. - -\subsection{The many names of revision control} - -Revision control is a diverse field, so much so that it doesn't -actually have a single name or acronym. Here are a few of the more -common names and acronyms you'll encounter: -\begin{itemize} -\item Revision control (RCS) -\item Software configuration management (SCM), or configuration management -\item Source code management -\item Source code control, or source control -\item Version control (VCS) -\end{itemize} -Some people claim that these terms actually have different meanings, -but in practice they overlap so much that there's no agreed or even -useful way to tease them apart. - -\section{A short history of revision control} - -The best known of the old-time revision control tools is SCCS (Source -Code Control System), which Marc Rochkind wrote at Bell Labs, in the -early 1970s. SCCS operated on individual files, and required every -person working on a project to have access to a shared workspace on a -single system. Only one person could modify a file at any time; -arbitration for access to files was via locks. It was common for -people to lock files, and later forget to unlock them, preventing -anyone else from modifying those files without the help of an -administrator. - -Walter Tichy developed a free alternative to SCCS in the early 1980s; -he called his program RCS (Revision Control System). Like SCCS, RCS -required developers to work in a single shared workspace, and to lock -files to prevent multiple people from modifying them simultaneously.
- -Later in the 1980s, Dick Grune used RCS as a building block for a set -of shell scripts he initially called cmt, but then renamed to CVS -(Concurrent Versions System). The big innovation of CVS was that it -let developers work simultaneously and somewhat independently in their -own personal workspaces. The personal workspaces prevented developers -from stepping on each other's toes all the time, as was common with -SCCS and RCS. Each developer had a copy of every project file, and -could modify their copies independently. They had to merge their -edits prior to committing changes to the central repository. - -Brian Berliner took Grune's original scripts and rewrote them in~C, -releasing in 1989 the code that has since developed into the modern -version of CVS. CVS subsequently acquired the ability to operate over -a network connection, giving it a client/server architecture. CVS's -architecture is centralised; only the server has a copy of the history -of the project. Client workspaces just contain copies of recent -versions of the project's files, and a little metadata to tell them -where the server is. CVS has been enormously successful; it is -probably the world's most widely used revision control system. - -In the early 1990s, Sun Microsystems developed an early distributed -revision control system, called TeamWare. A TeamWare workspace -contains a complete copy of the project's history. TeamWare has no -notion of a central repository. (CVS relied upon RCS for its history -storage; TeamWare used SCCS.) - -As the 1990s progressed, awareness grew of a number of problems with -CVS. It records simultaneous changes to multiple files individually, -instead of grouping them together as a single logically atomic -operation. It does not manage its file hierarchy well; it is easy to -make a mess of a repository by renaming files and directories. 
Worse, -its source code is difficult to read and maintain, which made the -``pain level'' of fixing these architectural problems prohibitive. - -In 2001, Jim Blandy and Karl Fogel, two developers who had worked on -CVS, started a project to replace it with a tool that would have a -better architecture and cleaner code. The result, Subversion, does -not stray from CVS's centralised client/server model, but it adds -multi-file atomic commits, better namespace management, and a number -of other features that make it a generally better tool than CVS. -Since its initial release, it has rapidly grown in popularity. - -More or less simultaneously, Graydon Hoare began working on an -ambitious distributed revision control system that he named Monotone. -While Monotone addresses many of CVS's design flaws and has a -peer-to-peer architecture, it goes beyond earlier (and subsequent) -revision control tools in a number of innovative ways. It uses -cryptographic hashes as identifiers, and has an integral notion of -``trust'' for code from different sources. - -Mercurial began life in 2005. While a few aspects of its design are -influenced by Monotone, Mercurial focuses on ease of use, high -performance, and scalability to very large projects. - -\section{Trends in revision control} - -There has been an unmistakable trend in the development and use of -revision control tools over the past four decades, as people have -become familiar with the capabilities of their tools and constrained -by their limitations. - -The first generation began by managing single files on individual -computers. Although these tools represented a huge advance over -ad-hoc manual revision control, their locking model and reliance on a -single computer limited them to small, tightly-knit teams. - -The second generation loosened these constraints by moving to -network-centered architectures, and managing entire projects at a -time. As projects grew larger, they ran into new problems. 
With -clients needing to talk to servers very frequently, server scaling -became an issue for large projects. An unreliable network connection -could prevent remote users from being able to talk to the server at -all. As open source projects started making read-only access -available anonymously to anyone, people without commit privileges -found that they could not use the tools to interact with a project in -a natural way, as they could not record their changes. - -The current generation of revision control tools is peer-to-peer in -nature. All of these systems have dropped the dependency on a single -central server, and allow people to distribute their revision control -data to where it's actually needed. Collaboration over the Internet -has moved from constrained by technology to a matter of choice and -consensus. Modern tools can operate offline indefinitely and -autonomously, with a network connection only needed when syncing -changes with another repository. - -\section{A few of the advantages of distributed revision control} - -Even though distributed revision control tools have for several years -been as robust and usable as their previous-generation counterparts, -people using older tools have not yet necessarily woken up to their -advantages. There are a number of ways in which distributed tools -shine relative to centralised ones. - -For an individual developer, distributed tools are almost always much -faster than centralised tools. This is for a simple reason: a -centralised tool needs to talk over the network for many common -operations, because most metadata is stored in a single copy on the -central server. A distributed tool stores all of its metadata -locally. All else being equal, talking over the network adds overhead -to a centralised tool. Don't underestimate the value of a snappy, -responsive tool: you're going to spend a lot of time interacting with -your revision control software. 
- -Distributed tools are indifferent to the vagaries of your server -infrastructure, again because they replicate metadata to so many -locations. If you use a centralised system and your server catches -fire, you'd better hope that your backup media are reliable, and that -your last backup was recent and actually worked. With a distributed -tool, you have many backups available on every contributor's computer. - -The reliability of your network will affect distributed tools far less -than it will centralised tools. You can't even use a centralised tool -without a network connection, except for a few highly constrained -commands. With a distributed tool, if your network connection goes -down while you're working, you may not even notice. The only thing -you won't be able to do is talk to repositories on other computers, -something that is relatively rare compared with local operations. If -you have a far-flung team of collaborators, this may be significant. - -\subsection{Advantages for open source projects} - -If you take a shine to an open source project and decide that you -would like to start hacking on it, and that project uses a distributed -revision control tool, you are at once a peer with the people who -consider themselves the ``core'' of that project. If they publish -their repositories, you can immediately copy their project history, -start making changes, and record your work, using the same tools in -the same ways as insiders. By contrast, with a centralised tool, you -must use the software in a ``read only'' mode unless someone grants -you permission to commit changes to their central server. Until then, -you won't be able to record changes, and your local modifications will -be at risk of corruption any time you try to update your client's view -of the repository. 
- -\subsubsection{The forking non-problem} - -It has been suggested that distributed revision control tools pose -some sort of risk to open source projects because they make it easy to -``fork'' the development of a project. A fork happens when there are -differences in opinion or attitude between groups of developers that -cause them to decide that they can't work together any longer. Each -side takes a more or less complete copy of the project's source code, -and goes off in its own direction. - -Sometimes the camps in a fork decide to reconcile their differences. -With a centralised revision control system, the \emph{technical} -process of reconciliation is painful, and has to be performed largely -by hand. You have to decide whose revision history is going to -``win'', and graft the other team's changes into the tree somehow. -This usually loses some or all of one side's revision history. - -What distributed tools do with respect to forking is they make forking -the \emph{only} way to develop a project. Every single change that -you make is potentially a fork point. The great strength of this -approach is that a distributed revision control tool has to be really -good at \emph{merging} forks, because forks are absolutely -fundamental: they happen all the time. - -If every piece of work that everybody does, all the time, is framed in -terms of forking and merging, then what the open source world refers -to as a ``fork'' becomes \emph{purely} a social issue. If anything, -distributed tools \emph{lower} the likelihood of a fork: -\begin{itemize} -\item They eliminate the social distinction that centralised tools - impose: that between insiders (people with commit access) and - outsiders (people without). -\item They make it easier to reconcile after a social fork, because - all that's involved from the perspective of the revision control - software is just another merge. 
-\end{itemize} - -Some people resist distributed tools because they want to retain tight -control over their projects, and they believe that centralised tools -give them this control. However, if you're of this belief, and you -publish your CVS or Subversion repositories publicly, there are -plenty of tools available that can pull out your entire project's -history (albeit slowly) and recreate it somewhere that you don't -control. So while your control in this case is illusory, you are -forgoing the ability to fluidly collaborate with whatever people feel -compelled to mirror and fork your history. - -\subsection{Advantages for commercial projects} - -Many commercial projects are undertaken by teams that are scattered -across the globe. Contributors who are far from a central server will -see slower command execution and perhaps less reliability. Commercial -revision control systems attempt to ameliorate these problems with -remote-site replication add-ons that are typically expensive to buy -and cantankerous to administer. A distributed system doesn't suffer -from these problems in the first place. Better yet, you can easily -set up multiple authoritative servers, say one per site, so that -there's no redundant communication between repositories over expensive -long-haul network links. - -Centralised revision control systems tend to have relatively low -scalability. It's not unusual for an expensive centralised system to -fall over under the combined load of just a few dozen concurrent -users. Once again, the typical response tends to be an expensive and -clunky replication facility. Since the load on a central server---if -you have one at all---is many times lower with a distributed -tool (because all of the data is replicated everywhere), a single -cheap server can handle the needs of a much larger team, and -replication to balance load becomes a simple matter of scripting.
- -If you have an employee in the field, troubleshooting a problem at a -customer's site, they'll benefit from distributed revision control. -The tool will let them generate custom builds, try different fixes in -isolation from each other, and search efficiently through history for -the sources of bugs and regressions in the customer's environment, all -without needing to connect to your company's network. - -\section{Why choose Mercurial?} - -Mercurial has a unique set of properties that make it a particularly -good choice as a revision control system. -\begin{itemize} -\item It is easy to learn and use. -\item It is lightweight. -\item It scales excellently. -\item It is easy to customise. -\end{itemize} - -If you are at all familiar with revision control systems, you should -be able to get up and running with Mercurial in less than five -minutes. Even if not, it will take no more than a few minutes -longer. Mercurial's command and feature sets are generally uniform -and consistent, so you can keep track of a few general rules instead -of a host of exceptions. - -On a small project, you can start working with Mercurial in moments. -Creating new changes and branches; transferring changes around -(whether locally or over a network); and history and status operations -are all fast. Mercurial attempts to stay nimble and largely out of -your way by combining low cognitive overhead with blazingly fast -operations. - -The usefulness of Mercurial is not limited to small projects: it is -used by projects with hundreds to thousands of contributors, each -containing tens of thousands of files and hundreds of megabytes of -source code. - -If the core functionality of Mercurial is not enough for you, it's -easy to build on. Mercurial is well suited to scripting tasks, and -its clean internals and implementation in Python make it easy to add -features in the form of extensions. 
There are a number of popular and -useful extensions already available, ranging from helping to identify -bugs to improving performance. - -\section{Mercurial compared with other tools} - -Before you read on, please understand that this section necessarily -reflects my own experiences, interests, and (dare I say it) biases. I -have used every one of the revision control tools listed below, in -most cases for several years at a time. - - -\subsection{Subversion} - -Subversion is a popular revision control tool, developed to replace -CVS. It has a centralised client/server architecture. - -Subversion and Mercurial have similarly named commands for performing -the same operations, so if you're familiar with one, it is easy to -learn to use the other. Both tools are portable to all popular -operating systems. - -Prior to version 1.5, Subversion had no useful support for merges. -At the time of writing, its merge tracking capability is new, and known to be -\href{http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword}{complicated - and buggy}. - -Mercurial has a substantial performance advantage over Subversion on -every revision control operation I have benchmarked. I have measured -its advantage as ranging from a factor of two to a factor of six when -compared with Subversion~1.4.3's \emph{ra\_local} file store, which is -the fastest access method available. In more realistic deployments -involving a network-based store, Subversion will be at a substantially -larger disadvantage. Because many Subversion commands must talk to -the server and Subversion does not have useful replication facilities, -server capacity and network bandwidth become bottlenecks for modestly -large projects. 
- -Additionally, Subversion incurs substantial storage overhead to avoid -network transactions for a few common operations, such as finding -modified files (\texttt{status}) and displaying modifications against -the current revision (\texttt{diff}). As a result, a Subversion -working copy is often the same size as, or larger than, a Mercurial -repository and working directory, even though the Mercurial repository -contains a complete history of the project. - -Subversion is widely supported by third-party tools. Mercurial -currently lags considerably in this area. This gap is closing, -however, and indeed some of Mercurial's GUI tools now outshine their -Subversion equivalents. Like Mercurial, Subversion has an excellent -user manual. - -Because Subversion doesn't store revision history on the client, it is -well suited to managing projects that deal with lots of large, opaque -binary files. If you check in fifty revisions to an incompressible -10MB file, Subversion's client-side space usage stays constant. The -space used by any distributed SCM will grow rapidly in proportion to -the number of revisions, because the differences between each revision -are large. - -In addition, it's often difficult or, more usually, impossible to -merge different versions of a binary file. Subversion's ability to -let a user lock a file, so that they temporarily have the exclusive -right to commit changes to it, can be a significant advantage to a -project where binary files are widely used. - -Mercurial can import revision history from a Subversion repository. -It can also export revision history to a Subversion repository. This -makes it easy to ``test the waters'' and use Mercurial and Subversion -in parallel before deciding to switch. History conversion is -incremental, so you can perform an initial conversion, then small -additional conversions afterwards to bring in new changes.
- - -\subsection{Git} - -Git is a distributed revision control tool that was developed for -managing the Linux kernel source tree. Like Mercurial, its early -design was somewhat influenced by Monotone. - -Git has a very large command set, with version~1.5.0 providing~139 -individual commands. It has something of a reputation for being -difficult to learn. Compared to Git, Mercurial has a strong focus on -simplicity. - -In terms of performance, Git is extremely fast. In several cases, it -is faster than Mercurial, at least on Linux, while Mercurial performs -better on other operations. However, on Windows, the performance and -general level of support that Git provides is, at the time of writing, -far behind that of Mercurial. - -While a Mercurial repository needs no maintenance, a Git repository -requires frequent manual ``repacks'' of its metadata. Without these, -performance degrades, while space usage grows rapidly. A server that -contains many Git repositories that are not rigorously and frequently -repacked will become heavily disk-bound during backups, and there have -been instances of daily backups taking far longer than~24 hours as a -result. A freshly packed Git repository is slightly smaller than a -Mercurial repository, but an unpacked repository is several orders of -magnitude larger. - -The core of Git is written in C. Many Git commands are implemented as -shell or Perl scripts, and the quality of these scripts varies widely. -I have encountered several instances where scripts charged along -blindly in the presence of errors that should have been fatal. - -Mercurial can import revision history from a Git repository. - - -\subsection{CVS} - -CVS is probably the most widely used revision control tool in the -world. Due to its age and internal untidiness, it has been only -lightly maintained for many years. - -It has a centralised client/server architecture. 
It does not group -related file changes into atomic commits, making it easy for people to -``break the build'': one person can successfully commit part of a -change and then be blocked by the need for a merge, causing other -people to see only a portion of the work they intended to do. This -also affects how you work with project history. If you want to see -all of the modifications someone made as part of a task, you will need -to manually inspect the descriptions and timestamps of the changes -made to each file involved (if you even know what those files were). - -CVS has a muddled notion of tags and branches that I will not attempt -to even describe. It does not support renaming of files or -directories well, making it easy to corrupt a repository. It has -almost no internal consistency checking capabilities, so it is usually -not even possible to tell whether or how a repository is corrupt. I -would not recommend CVS for any project, existing or new. - -Mercurial can import CVS revision history. However, there are a few -caveats that apply; these are true of every other revision control -tool's CVS importer, too. Due to CVS's lack of atomic changes and -unversioned filesystem hierarchy, it is not possible to reconstruct -CVS history completely accurately; some guesswork is involved, and -renames will usually not show up. Because a lot of advanced CVS -administration has to be done by hand and is hence error-prone, it's -common for CVS importers to run into multiple problems with corrupted -repositories (completely bogus revision timestamps and files that have -remained locked for over a decade are just two of the less interesting -problems I can recall from personal experience). - -Mercurial can import revision history from a CVS repository. - - -\subsection{Commercial tools} - -Perforce has a centralised client/server architecture, with no -client-side caching of any data. 
Unlike modern revision control -tools, Perforce requires that a user run a command to inform the -server about every file they intend to edit. - -The performance of Perforce is quite good for small teams, but it -falls off rapidly as the number of users grows beyond a few dozen. -Modestly large Perforce installations require the deployment of -proxies to cope with the load their users generate. - - -\subsection{Choosing a revision control tool} - -With the exception of CVS, all of the tools listed above have unique -strengths that suit them to particular styles of work. There is no -single revision control tool that is best in all situations. - -As an example, Subversion is a good choice for working with frequently -edited binary files, due to its centralised nature and support for -file locking. - -I personally find Mercurial's properties of simplicity, performance, -and good merge support to be a compelling combination that has served -me well for several years. - - -\section{Switching from another tool to Mercurial} - -Mercurial is bundled with an extension named \hgext{convert}, which -can incrementally import revision history from several other revision -control tools. By ``incremental'', I mean that you can convert all of -a project's history to date in one go, then rerun the conversion later -to obtain new changes that happened after the initial conversion. - -The revision control tools supported by \hgext{convert} are as -follows: -\begin{itemize} -\item Subversion -\item CVS -\item Git -\item Darcs -\end{itemize} - -In addition, \hgext{convert} can export changes from Mercurial to -Subversion. This makes it possible to try Subversion and Mercurial in -parallel before committing to a switchover, without risking the loss -of any work. - -The \hgxcmd{conver}{convert} command is easy to use. Simply point it -at the path or URL of the source repository, optionally give it the -name of the destination repository, and it will start working. 
After -the initial conversion, just run the same command again to import new -changes. - - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/license.tex --- a/en/license.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,138 +0,0 @@ -\chapter{Open Publication License} -\label{cha:opl} - -Version 1.0, 8 June 1999 - -\section{Requirements on both unmodified and modified versions} - -The Open Publication works may be reproduced and distributed in whole -or in part, in any medium physical or electronic, provided that the -terms of this license are adhered to, and that this license or an -incorporation of it by reference (with any options elected by the -author(s) and/or publisher) is displayed in the reproduction. - -Proper form for an incorporation by reference is as follows: - -\begin{quote} - Copyright (c) \emph{year} by \emph{author's name or designee}. This - material may be distributed only subject to the terms and conditions - set forth in the Open Publication License, v\emph{x.y} or later (the - latest version is presently available at - \url{http://www.opencontent.org/openpub/}). -\end{quote} - -The reference must be immediately followed with any options elected by -the author(s) and/or publisher of the document (see -section~\ref{sec:opl:options}). - -Commercial redistribution of Open Publication-licensed material is -permitted. - -Any publication in standard (paper) book form shall require the -citation of the original publisher and author. The publisher and -author's names shall appear on all outer surfaces of the book. On all -outer surfaces of the book the original publisher's name shall be as -large as the title of the work and cited as possessive with respect to -the title. - -\section{Copyright} - -The copyright to each Open Publication is owned by its author(s) or -designee. 
- -\section{Scope of license} - -The following license terms apply to all Open Publication works, -unless otherwise explicitly stated in the document. - -Mere aggregation of Open Publication works or a portion of an Open -Publication work with other works or programs on the same media shall -not cause this license to apply to those other works. The aggregate -work shall contain a notice specifying the inclusion of the Open -Publication material and appropriate copyright notice. - -\textbf{Severability}. If any part of this license is found to be -unenforceable in any jurisdiction, the remaining portions of the -license remain in force. - -\textbf{No warranty}. Open Publication works are licensed and provided -``as is'' without warranty of any kind, express or implied, including, -but not limited to, the implied warranties of merchantability and -fitness for a particular purpose or a warranty of non-infringement. - -\section{Requirements on modified works} - -All modified versions of documents covered by this license, including -translations, anthologies, compilations and partial documents, must -meet the following requirements: - -\begin{enumerate} -\item The modified version must be labeled as such. -\item The person making the modifications must be identified and the - modifications dated. -\item Acknowledgement of the original author and publisher if - applicable must be retained according to normal academic citation - practices. -\item The location of the original unmodified document must be - identified. -\item The original author's (or authors') name(s) may not be used to - assert or imply endorsement of the resulting document without the - original author's (or authors') permission. 
-\end{enumerate} - -\section{Good-practice recommendations} - -In addition to the requirements of this license, it is requested from -and strongly recommended of redistributors that: - -\begin{enumerate} -\item If you are distributing Open Publication works on hardcopy or - CD-ROM, you provide email notification to the authors of your intent - to redistribute at least thirty days before your manuscript or media - freeze, to give the authors time to provide updated documents. This - notification should describe modifications, if any, made to the - document. -\item All substantive modifications (including deletions) be either - clearly marked up in the document or else described in an attachment - to the document. -\item Finally, while it is not mandatory under this license, it is - considered good form to offer a free copy of any hardcopy and CD-ROM - expression of an Open Publication-licensed work to its author(s). -\end{enumerate} - -\section{License options} -\label{sec:opl:options} - -The author(s) and/or publisher of an Open Publication-licensed -document may elect certain options by appending language to the -reference to or copy of the license. These options are considered part -of the license instance and must be included with the license (or its -incorporation by reference) in derived works. - -\begin{enumerate}[A] -\item To prohibit distribution of substantively modified versions - without the explicit permission of the author(s). ``Substantive - modification'' is defined as a change to the semantic content of the - document, and excludes mere changes in format or typographical - corrections. - - To accomplish this, add the phrase ``Distribution of substantively - modified versions of this document is prohibited without the - explicit permission of the copyright holder.'' to the license - reference or copy. 
- -\item To prohibit any publication of this work or derivative works in - whole or in part in standard (paper) book form for commercial - purposes is prohibited unless prior permission is obtained from the - copyright holder. - - To accomplish this, add the phrase ``Distribution of the work or - derivative of the work in any standard (paper) book form is - prohibited unless prior permission is obtained from the copyright - holder.'' to the license reference or copy. -\end{enumerate} - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/mq-collab.tex --- a/en/mq-collab.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,393 +0,0 @@ -\chapter{Advanced uses of Mercurial Queues} -\label{chap:mq-collab} - -While it's easy to pick up straightforward uses of Mercurial Queues, -use of a little discipline and some of MQ's less frequently used -capabilities makes it possible to work in complicated development -environments. - -In this chapter, I will use as an example a technique I have used to -manage the development of an Infiniband device driver for the Linux -kernel. The driver in question is large (at least as drivers go), -with 25,000 lines of code spread across 35 source files. It is -maintained by a small team of developers. - -While much of the material in this chapter is specific to Linux, the -same principles apply to any code base for which you're not the -primary owner, and upon which you need to do a lot of development. - -\section{The problem of many targets} - -The Linux kernel changes rapidly, and has never been internally -stable; developers frequently make drastic changes between releases. -This means that a version of the driver that works well with a -particular released version of the kernel will not even \emph{compile} -correctly against, typically, any other version. - -To maintain a driver, we have to keep a number of distinct versions of -Linux in mind. 
-\begin{itemize} -\item One target is the main Linux kernel development tree. - Maintenance of the code is in this case partly shared by other - developers in the kernel community, who make ``drive-by'' - modifications to the driver as they develop and refine kernel - subsystems. -\item We also maintain a number of ``backports'' to older versions of - the Linux kernel, to support the needs of customers who are running - older Linux distributions that do not incorporate our drivers. (To - \emph{backport} a piece of code is to modify it to work in an older - version of its target environment than the version it was developed - for.) -\item Finally, we make software releases on a schedule that is - necessarily not aligned with those used by Linux distributors and - kernel developers, so that we can deliver new features to customers - without forcing them to upgrade their entire kernels or - distributions. -\end{itemize} - -\subsection{Tempting approaches that don't work well} - -There are two ``standard'' ways to maintain a piece of software that -has to target many different environments. - -The first is to maintain a number of branches, each intended for a -single target. The trouble with this approach is that you must -maintain iron discipline in the flow of changes between repositories. -A new feature or bug fix must start life in a ``pristine'' repository, -then percolate out to every backport repository. Backport changes are -more limited in the branches they should propagate to; a backport -change that is applied to a branch where it doesn't belong will -probably stop the driver from compiling. - -The second is to maintain a single source tree filled with conditional -statements that turn chunks of code on or off depending on the -intended target. Because these ``ifdefs'' are not allowed in the -Linux kernel tree, a manual or automatic process must be followed to -strip them out and yield a clean tree. 
A code base maintained in this -fashion rapidly becomes a rat's nest of conditional blocks that are -difficult to understand and maintain. - -Neither of these approaches is well suited to a situation where you -don't ``own'' the canonical copy of a source tree. In the case of a -Linux driver that is distributed with the standard kernel, Linus's -tree contains the copy of the code that will be treated by the world -as canonical. The upstream version of ``my'' driver can be modified -by people I don't know, without me even finding out about it until -after the changes show up in Linus's tree. - -These approaches have the added weakness of making it difficult to -generate well-formed patches to submit upstream. - -In principle, Mercurial Queues seems like a good candidate to manage a -development scenario such as the above. While this is indeed the -case, MQ contains a few added features that make the job more -pleasant. - -\section{Conditionally applying patches with - guards} - -Perhaps the best way to maintain sanity with so many targets is to be -able to choose specific patches to apply for a given situation. MQ -provides a feature called ``guards'' (which originates with quilt's -\texttt{guards} command) that does just this. To start off, let's -create a simple repository for experimenting in. -\interaction{mq.guards.init} -This gives us a tiny repository that contains two patches that don't -have any dependencies on each other, because they touch different files. - -The idea behind conditional application is that you can ``tag'' a -patch with a \emph{guard}, which is simply a text string of your -choosing, then tell MQ to select specific guards to use when applying -patches. MQ will then either apply, or skip over, a guarded patch, -depending on the guards that you have selected. 
- -A patch can have an arbitrary number of guards; -each one is \emph{positive} (``apply this patch if this guard is -selected'') or \emph{negative} (``skip this patch if this guard is -selected''). A patch with no guards is always applied. - -\section{Controlling the guards on a patch} - -The \hgxcmd{mq}{qguard} command lets you determine which guards should -apply to a patch, or display the guards that are already in effect. -Without any arguments, it displays the guards on the current topmost -patch. -\interaction{mq.guards.qguard} -To set a positive guard on a patch, prefix the name of the guard with -a ``\texttt{+}''. -\interaction{mq.guards.qguard.pos} -To set a negative guard on a patch, prefix the name of the guard with -a ``\texttt{-}''. -\interaction{mq.guards.qguard.neg} - -\begin{note} - The \hgxcmd{mq}{qguard} command \emph{sets} the guards on a patch; it - doesn't \emph{modify} them. What this means is that if you run - \hgcmdargs{qguard}{+a +b} on a patch, then \hgcmdargs{qguard}{+c} on - the same patch, the \emph{only} guard that will be set on it - afterwards is \texttt{+c}. -\end{note} - -Mercurial stores guards in the \sfilename{series} file; the form in -which they are stored is easy both to understand and to edit by hand. -(In other words, you don't have to use the \hgxcmd{mq}{qguard} command if -you don't want to; it's okay to simply edit the \sfilename{series} -file.) -\interaction{mq.guards.series} - -\section{Selecting the guards to use} - -The \hgxcmd{mq}{qselect} command determines which guards are active at a -given time. The effect of this is to determine which patches MQ will -apply the next time you run \hgxcmd{mq}{qpush}. It has no other effect; in -particular, it doesn't do anything to patches that are already -applied. - -With no arguments, the \hgxcmd{mq}{qselect} command lists the guards -currently in effect, one per line of output. Each argument is treated -as the name of a guard to apply. 
-\interaction{mq.guards.qselect.foo} -In case you're interested, the currently selected guards are stored in -the \sfilename{guards} file. -\interaction{mq.guards.qselect.cat} -We can see the effect the selected guards have when we run -\hgxcmd{mq}{qpush}. -\interaction{mq.guards.qselect.qpush} - -A guard cannot start with a ``\texttt{+}'' or ``\texttt{-}'' -character. The name of a guard must not contain white space, but most -other characters are acceptable. If you try to use a guard with an -invalid name, MQ will complain: -\interaction{mq.guards.qselect.error} -Changing the selected guards changes the patches that are applied. -\interaction{mq.guards.qselect.quux} -You can see in the example below that negative guards take precedence -over positive guards. -\interaction{mq.guards.qselect.foobar} - -\section{MQ's rules for applying patches} - -The rules that MQ uses when deciding whether to apply a patch -are as follows. -\begin{itemize} -\item A patch that has no guards is always applied. -\item If the patch has any negative guard that matches any currently - selected guard, the patch is skipped. -\item If the patch has any positive guard that matches any currently - selected guard, the patch is applied. -\item If the patch has positive or negative guards, but none matches - any currently selected guard, the patch is skipped. -\end{itemize} - -\section{Trimming the work environment} - -In working on the device driver I mentioned earlier, I don't apply the -patches to a normal Linux kernel tree. Instead, I use a repository -that contains only a snapshot of the source files and headers that are -relevant to Infiniband development. This repository is~1\% the size -of a kernel repository, so it's easier to work with. - -I then choose a ``base'' version on top of which the patches are -applied. This is a snapshot of the Linux kernel tree as of a revision -of my choosing. 
When I take the snapshot, I record the changeset ID -from the kernel repository in the commit message. Since the snapshot -preserves the ``shape'' and content of the relevant parts of the -kernel tree, I can apply my patches on top of either my tiny -repository or a normal kernel tree. - -Normally, the base tree atop which the patches apply should be a -snapshot of a very recent upstream tree. This best facilitates the -development of patches that can easily be submitted upstream with few -or no modifications. - -\section{Dividing up the \sfilename{series} file} - -I categorise the patches in the \sfilename{series} file into a number -of logical groups. Each section of like patches begins with a block -of comments that describes the purpose of the patches that follow. - -The sequence of patch groups that I maintain follows. The ordering of -these groups is important; I'll describe why after I introduce the -groups. -\begin{itemize} -\item The ``accepted'' group. Patches that the development team has - submitted to the maintainer of the Infiniband subsystem, and which - he has accepted, but which are not present in the snapshot that the - tiny repository is based on. These are ``read only'' patches, - present only to transform the tree into a similar state as it is in - the upstream maintainer's repository. -\item The ``rework'' group. Patches that I have submitted, but that - the upstream maintainer has requested modifications to before he - will accept them. -\item The ``pending'' group. Patches that I have not yet submitted to - the upstream maintainer, but which we have finished working on. - These will be ``read only'' for a while. If the upstream maintainer - accepts them upon submission, I'll move them to the end of the - ``accepted'' group. If he requests that I modify any, I'll move - them to the beginning of the ``rework'' group. -\item The ``in progress'' group. Patches that are actively being - developed, and should not be submitted anywhere yet. 
-\item The ``backport'' group. Patches that adapt the source tree to - older versions of the kernel tree. -\item The ``do not ship'' group. Patches that for some reason should - never be submitted upstream. For example, one such patch might - change embedded driver identification strings to make it easier to - distinguish, in the field, between an out-of-tree version of the - driver and a version shipped by a distribution vendor. -\end{itemize} - -Now to return to the reasons for ordering groups of patches in this -way. We would like the lowest patches in the stack to be as stable as -possible, so that we will not need to rework higher patches due to -changes in context. Putting patches that will never be changed first -in the \sfilename{series} file serves this purpose. - -We would also like the patches that we know we'll need to modify to be -applied on top of a source tree that resembles the upstream tree as -closely as possible. This is why we keep accepted patches around for -a while. - -The ``backport'' and ``do not ship'' patches float at the end of the -\sfilename{series} file. The backport patches must be applied on top -of all other patches, and the ``do not ship'' patches might as well -stay out of harm's way. - -\section{Maintaining the patch series} - -In my work, I use a number of guards to control which patches are to -be applied. - -\begin{itemize} -\item ``Accepted'' patches are guarded with \texttt{accepted}. I - enable this guard most of the time. When I'm applying the patches - on top of a tree where the patches are already present, I can turn - this patch off, and the patches that follow it will apply cleanly. -\item Patches that are ``finished'', but not yet submitted, have no - guards. If I'm applying the patch stack to a copy of the upstream - tree, I don't need to enable any guards in order to get a reasonably - safe source tree. -\item Those patches that need reworking before being resubmitted are - guarded with \texttt{rework}. 
-\item For those patches that are still under development, I use - \texttt{devel}. -\item A backport patch may have several guards, one for each version - of the kernel to which it applies. For example, a patch that - backports a piece of code to~2.6.9 will have a~\texttt{2.6.9} guard. -\end{itemize} -This variety of guards gives me considerable flexibility in -determining what kind of source tree I want to end up with. For most -situations, the selection of appropriate guards is automated during -the build process, but I can manually tune the guards to use for less -common circumstances. - -\subsection{The art of writing backport patches} - -Using MQ, writing a backport patch is a simple process. All such a -patch has to do is modify a piece of code that uses a kernel feature -not present in the older version of the kernel, so that the driver -continues to work correctly under that older version. - -A useful goal when writing a good backport patch is to make your code -look as if it was written for the older version of the kernel you're -targeting. The less obtrusive the patch, the easier it will be to -understand and maintain. If you're writing a collection of backport -patches to avoid the ``rat's nest'' effect of lots of -\texttt{\#ifdef}s (hunks of source code that are only used -conditionally) in your code, don't introduce version-dependent -\texttt{\#ifdef}s into the patches. Instead, write several patches, -each of which makes unconditional changes, and control their -application using guards. - -There are two reasons to divide backport patches into a distinct -group, away from the ``regular'' patches whose effects they modify. -The first is that intermingling the two makes it more difficult to use -a tool like the \hgext{patchbomb} extension to automate the process of -submitting the patches to an upstream maintainer. 
The second is that -a backport patch could perturb the context in which a subsequent -regular patch is applied, making it impossible to apply the regular -patch cleanly \emph{without} the earlier backport patch already being -applied. - -\section{Useful tips for developing with MQ} - -\subsection{Organising patches in directories} - -If you're working on a substantial project with MQ, it's not difficult -to accumulate a large number of patches. For example, I have one -patch repository that contains over 250 patches. - -If you can group these patches into separate logical categories, you -can if you like store them in different directories; MQ has no -problems with patch names that contain path separators. - -\subsection{Viewing the history of a patch} -\label{mq-collab:tips:interdiff} - -If you're developing a set of patches over a long time, it's a good -idea to maintain them in a repository, as discussed in -section~\ref{sec:mq:repo}. If you do so, you'll quickly discover that -using the \hgcmd{diff} command to look at the history of changes to a -patch is unworkable. This is in part because you're looking at the -second derivative of the real code (a diff of a diff), but also -because MQ adds noise to the process by modifying time stamps and -directory names when it updates a patch. - -However, you can use the \hgext{extdiff} extension, which is bundled -with Mercurial, to turn a diff of two versions of a patch into -something readable. To do this, you will need a third-party package -called \package{patchutils}~\cite{web:patchutils}. This provides a -command named \command{interdiff}, which shows the differences between -two diffs as a diff. Used on two versions of the same diff, it -generates a diff that represents the diff from the first to the second -version. - -You can enable the \hgext{extdiff} extension in the usual way, by -adding a line to the \rcsection{extensions} section of your \hgrc. 
-\begin{codesample2} - [extensions] - extdiff = -\end{codesample2} -The \command{interdiff} command expects to be passed the names of two -files, but the \hgext{extdiff} extension passes the program it runs a -pair of directories, each of which can contain an arbitrary number of -files. We thus need a small program that will run \command{interdiff} -on each pair of files in these two directories. This program is -available as \sfilename{hg-interdiff} in the \dirname{examples} -directory of the source code repository that accompanies this book. -\excode{hg-interdiff} - -With the \sfilename{hg-interdiff} program in your shell's search path, -you can run it as follows, from inside an MQ patch directory: -\begin{codesample2} - hg extdiff -p hg-interdiff -r A:B my-change.patch -\end{codesample2} -Since you'll probably want to use this long-winded command a lot, you -can get \hgext{hgext} to make it available as a normal Mercurial -command, again by editing your \hgrc. -\begin{codesample2} - [extdiff] - cmd.interdiff = hg-interdiff -\end{codesample2} -This directs \hgext{hgext} to make an \texttt{interdiff} command -available, so you can now shorten the previous invocation of -\hgxcmd{extdiff}{extdiff} to something a little more wieldy. -\begin{codesample2} - hg interdiff -r A:B my-change.patch -\end{codesample2} - -\begin{note} - The \command{interdiff} command works well only if the underlying - files against which versions of a patch are generated remain the - same. If you create a patch, modify the underlying files, and then - regenerate the patch, \command{interdiff} may not produce useful - output. -\end{note} - -The \hgext{extdiff} extension is useful for more than merely improving -the presentation of MQ~patches. To read more about it, go to -section~\ref{sec:hgext:extdiff}. 
- -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/mq-ref.tex --- a/en/mq-ref.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,349 +0,0 @@ -\chapter{Mercurial Queues reference} -\label{chap:mqref} - -\section{MQ command reference} -\label{sec:mqref:cmdref} - -For an overview of the commands provided by MQ, use the command -\hgcmdargs{help}{mq}. - -\subsection{\hgxcmd{mq}{qapplied}---print applied patches} - -The \hgxcmd{mq}{qapplied} command prints the current stack of applied -patches. Patches are printed in oldest-to-newest order, so the last -patch in the list is the ``top'' patch. - -\subsection{\hgxcmd{mq}{qcommit}---commit changes in the queue repository} - -The \hgxcmd{mq}{qcommit} command commits any outstanding changes in the -\sdirname{.hg/patches} repository. This command only works if the -\sdirname{.hg/patches} directory is a repository, i.e.~you created the -directory using \hgcmdargs{qinit}{\hgxopt{mq}{qinit}{-c}} or ran -\hgcmd{init} in the directory after running \hgxcmd{mq}{qinit}. - -This command is shorthand for \hgcmdargs{commit}{--cwd .hg/patches}. - -\subsection{\hgxcmd{mq}{qdelete}---delete a patch from the - \sfilename{series} file} - -The \hgxcmd{mq}{qdelete} command removes the entry for a patch from the -\sfilename{series} file in the \sdirname{.hg/patches} directory. It -does not pop the patch if the patch is already applied. By default, -it does not delete the patch file; use the \hgxopt{mq}{qdel}{-f} option to -do that. - -Options: -\begin{itemize} -\item[\hgxopt{mq}{qdel}{-f}] Delete the patch file. -\end{itemize} - -\subsection{\hgxcmd{mq}{qdiff}---print a diff of the topmost applied patch} - -The \hgxcmd{mq}{qdiff} command prints a diff of the topmost applied patch. -It is equivalent to \hgcmdargs{diff}{-r-2:-1}. 
- -\subsection{\hgxcmd{mq}{qfold}---merge (``fold'') several patches into one} - -The \hgxcmd{mq}{qfold} command merges multiple patches into the topmost -applied patch, so that the topmost applied patch makes the union of -all of the changes in the patches in question. - -The patches to fold must not be applied; \hgxcmd{mq}{qfold} will exit with -an error if any is. The order in which patches are folded is -significant; \hgcmdargs{qfold}{a b} means ``apply the current topmost -patch, followed by \texttt{a}, followed by \texttt{b}''. - -The comments from the folded patches are appended to the comments of -the destination patch, with each block of comments separated by three -asterisk (``\texttt{*}'') characters. Use the \hgxopt{mq}{qfold}{-e} -option to edit the commit message for the combined patch/changeset -after the folding has completed. - -Options: -\begin{itemize} -\item[\hgxopt{mq}{qfold}{-e}] Edit the commit message and patch description - for the newly folded patch. -\item[\hgxopt{mq}{qfold}{-l}] Use the contents of the given file as the new - commit message and patch description for the folded patch. -\item[\hgxopt{mq}{qfold}{-m}] Use the given text as the new commit message - and patch description for the folded patch. -\end{itemize} - -\subsection{\hgxcmd{mq}{qheader}---display the header/description of a patch} - -The \hgxcmd{mq}{qheader} command prints the header, or description, of a -patch. By default, it prints the header of the topmost applied patch. -Given an argument, it prints the header of the named patch. - -\subsection{\hgxcmd{mq}{qimport}---import a third-party patch into the queue} - -The \hgxcmd{mq}{qimport} command adds an entry for an external patch to the -\sfilename{series} file, and copies the patch into the -\sdirname{.hg/patches} directory. It adds the entry immediately after -the topmost applied patch, but does not push the patch. 
-If the \sdirname{.hg/patches} directory is a repository, -\hgxcmd{mq}{qimport} automatically does an \hgcmd{add} of the imported -patch. - -\subsection{\hgxcmd{mq}{qinit}---prepare a repository to work with MQ} - -The \hgxcmd{mq}{qinit} command prepares a repository to work with MQ. It -creates a directory called \sdirname{.hg/patches}. - -Options: -\begin{itemize} -\item[\hgxopt{mq}{qinit}{-c}] Create \sdirname{.hg/patches} as a repository - in its own right. Also creates a \sfilename{.hgignore} file that - will ignore the \sfilename{status} file. -\end{itemize} - -When the \sdirname{.hg/patches} directory is a repository, the -\hgxcmd{mq}{qimport} and \hgxcmd{mq}{qnew} commands automatically \hgcmd{add} -new patches. - -\subsection{\hgxcmd{mq}{qnew}---create a new patch} - -The \hgxcmd{mq}{qnew} command creates a new patch. It takes one mandatory -argument, the name to use for the patch file. The new patch -is empty by default. It is added to the \sfilename{series} -file after the current topmost applied patch, and is immediately -pushed on top of that patch. - -If \hgxcmd{mq}{qnew} finds modified files in the working directory, it will -refuse to create a new patch unless the \hgxopt{mq}{qnew}{-f} option is -used (see below). This behaviour allows you to \hgxcmd{mq}{qrefresh} your -topmost applied patch before you apply a new patch on top of it. - -Options: -\begin{itemize} -\item[\hgxopt{mq}{qnew}{-f}] Create a new patch even if the contents of the - working directory are modified. Any outstanding modifications are - added to the newly created patch, so after this command completes, - the working directory will no longer be modified. -\item[\hgxopt{mq}{qnew}{-m}] Use the given text as the commit message. - This text will be stored at the beginning of the patch file, before - the patch data.
-\end{itemize} - -\subsection{\hgxcmd{mq}{qnext}---print the name of the next patch} - -The \hgxcmd{mq}{qnext} command prints the name of the next patch in -the \sfilename{series} file after the topmost applied patch. This -patch will become the topmost applied patch if you run \hgxcmd{mq}{qpush}. - -\subsection{\hgxcmd{mq}{qpop}---pop patches off the stack} - -The \hgxcmd{mq}{qpop} command removes applied patches from the top of the -stack of applied patches. By default, it removes only one patch. - -This command removes the changesets that represent the popped patches -from the repository, and updates the working directory to undo the -effects of the patches. - -This command takes an optional argument, which it uses as the name or -index of the patch to pop to. If given a name, it will pop patches -until the named patch is the topmost applied patch. If given a -number, \hgxcmd{mq}{qpop} treats the number as an index into the entries in -the \sfilename{series} file, counting from zero (empty lines and lines containing -only comments do not count). It pops patches until the patch -identified by the given index is the topmost applied patch. - -The \hgxcmd{mq}{qpop} command does not read or write patches or the -\sfilename{series} file. It is thus safe to \hgxcmd{mq}{qpop} a patch that -you have removed from the \sfilename{series} file, or a patch that you -have renamed or deleted entirely. In the latter two cases, use the -name of the patch as it was when you applied it. - -By default, the \hgxcmd{mq}{qpop} command will not pop any patches if the -working directory has been modified. You can override this behaviour -using the \hgxopt{mq}{qpop}{-f} option, which reverts all modifications in -the working directory. - -Options: -\begin{itemize} -\item[\hgxopt{mq}{qpop}{-a}] Pop all applied patches. This returns the - repository to its state before you applied any patches. -\item[\hgxopt{mq}{qpop}{-f}] Forcibly revert any modifications to the - working directory when popping.
-\item[\hgxopt{mq}{qpop}{-n}] Pop a patch from the named queue. -\end{itemize} - -The \hgxcmd{mq}{qpop} command removes one line from the end of the -\sfilename{status} file for each patch that it pops. - -\subsection{\hgxcmd{mq}{qprev}---print the name of the previous patch} - -The \hgxcmd{mq}{qprev} command prints the name of the patch in the -\sfilename{series} file that comes before the topmost applied patch. -This will become the topmost applied patch if you run \hgxcmd{mq}{qpop}. - -\subsection{\hgxcmd{mq}{qpush}---push patches onto the stack} -\label{sec:mqref:cmd:qpush} - -The \hgxcmd{mq}{qpush} command adds patches onto the applied stack. By -default, it adds only one patch. - -This command creates a new changeset to represent each applied patch, -and updates the working directory to apply the effects of the patches. - -The default data used when creating a changeset are as follows: -\begin{itemize} -\item The commit date and time zone are the current date and time - zone. Because these data are used to compute the identity of a - changeset, this means that if you \hgxcmd{mq}{qpop} a patch and - \hgxcmd{mq}{qpush} it again, the changeset that you push will have a - different identity than the changeset you popped. -\item The author is the same as the default used by the \hgcmd{commit} - command. -\item The commit message is any text from the patch file that comes - before the first diff header. If there is no such text, a default - commit message is used that identifies the name of the patch. -\end{itemize} -If a patch contains a Mercurial patch header (XXX add link), the -information in the patch header overrides these defaults. - -Options: -\begin{itemize} -\item[\hgxopt{mq}{qpush}{-a}] Push all unapplied patches from the - \sfilename{series} file until there are none left to push. -\item[\hgxopt{mq}{qpush}{-l}] Add the name of the patch to the end - of the commit message. 
-\item[\hgxopt{mq}{qpush}{-m}] If a patch fails to apply cleanly, use the - entry for the patch in another saved queue to compute the parameters - for a three-way merge, and perform a three-way merge using the - normal Mercurial merge machinery. Use the resolution of the merge - as the new patch content. -\item[\hgxopt{mq}{qpush}{-n}] Use the named queue if merging while pushing. -\end{itemize} - -The \hgxcmd{mq}{qpush} command reads, but does not modify, the -\sfilename{series} file. It appends one line to the \sfilename{status} -file for each patch that it pushes. - -\subsection{\hgxcmd{mq}{qrefresh}---update the topmost applied patch} - -The \hgxcmd{mq}{qrefresh} command updates the topmost applied patch. It -modifies the patch, removes the old changeset that represented the -patch, and creates a new changeset to represent the modified patch. - -The \hgxcmd{mq}{qrefresh} command looks for the following modifications: -\begin{itemize} -\item Changes to the commit message, i.e.~the text before the first - diff header in the patch file, are reflected in the new changeset - that represents the patch. -\item Modifications to tracked files in the working directory are - added to the patch. -\item Changes to the files tracked using \hgcmd{add}, \hgcmd{copy}, - \hgcmd{remove}, or \hgcmd{rename}. Added files and copy and rename - destinations are added to the patch, while removed files and rename - sources are removed. -\end{itemize} - -Even if \hgxcmd{mq}{qrefresh} detects no changes, it still recreates the -changeset that represents the patch. This causes the identity of the -changeset to differ from the previous changeset that identified the -patch. - -Options: -\begin{itemize} -\item[\hgxopt{mq}{qrefresh}{-e}] Modify the commit message and patch description, - using the preferred text editor. -\item[\hgxopt{mq}{qrefresh}{-m}] Modify the commit message and patch - description, using the given text.
-\item[\hgxopt{mq}{qrefresh}{-l}] Modify the commit message and patch - description, using text from the given file. -\end{itemize} - -\subsection{\hgxcmd{mq}{qrename}---rename a patch} - -The \hgxcmd{mq}{qrename} command renames a patch, and changes the entry for -the patch in the \sfilename{series} file. - -With a single argument, \hgxcmd{mq}{qrename} renames the topmost applied -patch. With two arguments, it renames its first argument to its -second. - -\subsection{\hgxcmd{mq}{qrestore}---restore saved queue state} - -XXX No idea what this does. - -\subsection{\hgxcmd{mq}{qsave}---save current queue state} - -XXX Likewise. - -\subsection{\hgxcmd{mq}{qseries}---print the entire patch series} - -The \hgxcmd{mq}{qseries} command prints the entire patch series from the -\sfilename{series} file. It prints only patch names, not empty lines -or comments. It prints them in order, from the first to be applied to the last. - -\subsection{\hgxcmd{mq}{qtop}---print the name of the current patch} - -The \hgxcmd{mq}{qtop} command prints the name of the topmost currently applied -patch. - -\subsection{\hgxcmd{mq}{qunapplied}---print patches not yet applied} - -The \hgxcmd{mq}{qunapplied} command prints the names of patches from the -\sfilename{series} file that are not yet applied. It prints them in -order from the next patch that will be pushed to the last. - -\subsection{\hgcmd{strip}---remove a revision and descendants} - -The \hgcmd{strip} command removes a revision, and all of its -descendants, from the repository. It undoes the effects of the -removed revisions from the repository, and updates the working -directory to the first parent of the removed revision. - -The \hgcmd{strip} command saves a backup of the removed changesets in -a bundle, so that they can be reapplied if removed in error. - -Options: -\begin{itemize} -\item[\hgopt{strip}{-b}] Save unrelated changesets that are intermixed - with the stripped changesets in the backup bundle.
-\item[\hgopt{strip}{-f}] If a branch has multiple heads, remove all - heads. XXX This should be renamed, and use \texttt{-f} to strip revs - when there are pending changes. -\item[\hgopt{strip}{-n}] Do not save a backup bundle. -\end{itemize} - -\section{MQ file reference} - -\subsection{The \sfilename{series} file} - -The \sfilename{series} file contains a list of the names of all -patches that MQ can apply. It is represented as a list of names, with -one name per line. Leading and trailing white space in each -line is ignored. - -Lines may contain comments. A comment begins with the ``\texttt{\#}'' -character, and extends to the end of the line. Empty lines, and lines -that contain only comments, are ignored. - -You will often need to edit the \sfilename{series} file by hand, hence -the support for comments and empty lines noted above. For example, -you can comment out a patch temporarily, and \hgxcmd{mq}{qpush} will skip -over that patch when applying patches. You can also change the order -in which patches are applied by reordering their entries in the -\sfilename{series} file. - -Placing the \sfilename{series} file under revision control is also -supported; it is a good idea to place all of the patches that it -refers to under revision control, as well. If you create a patch -directory using the \hgxopt{mq}{qinit}{-c} option to \hgxcmd{mq}{qinit}, this -will be done for you automatically. - -\subsection{The \sfilename{status} file} - -The \sfilename{status} file contains the names and changeset hashes of -all patches that MQ currently has applied. Unlike the -\sfilename{series} file, this file is not intended for editing. You -should not place this file under revision control, or modify it in any -way. It is used by MQ strictly for internal book-keeping.
- -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/mq.tex --- a/en/mq.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,1043 +0,0 @@ -\chapter{Managing change with Mercurial Queues} -\label{chap:mq} - -\section{The patch management problem} -\label{sec:mq:patch-mgmt} - -Here is a common scenario: you need to install a software package from -source, but you find a bug that you must fix in the source before you -can start using the package. You make your changes, forget about the -package for a while, and a few months later you need to upgrade to a -newer version of the package. If the newer version of the package -still has the bug, you must extract your fix from the older source -tree and apply it against the newer version. This is a tedious task, -and it's easy to make mistakes. - -This is a simple case of the ``patch management'' problem. You have -an ``upstream'' source tree that you can't change; you need to make -some local changes on top of the upstream tree; and you'd like to be -able to keep those changes separate, so that you can apply them to -newer versions of the upstream source. - -The patch management problem arises in many situations. Probably the -most visible is that a user of an open source software project will -contribute a bug fix or new feature to the project's maintainers in the -form of a patch. - -Distributors of operating systems that include open source software -often need to make changes to the packages they distribute so that -they will build properly in their environments. - -When you have few changes to maintain, it is easy to manage a single -patch using the standard \command{diff} and \command{patch} programs -(see section~\ref{sec:mq:patch} for a discussion of these tools). 
-Once the number of changes grows, it starts to make sense to maintain -patches as discrete ``chunks of work,'' so that for example a single -patch will contain only one bug fix (the patch might modify several -files, but it's doing ``only one thing''), and you may have a number -of such patches for different bugs you need fixed and local changes -you require. In this situation, if you submit a bug fix patch to the -upstream maintainers of a package and they include your fix in a -subsequent release, you can simply drop that single patch when you're -updating to the newer release. - -Maintaining a single patch against an upstream tree is a little -tedious and error-prone, but not difficult. However, the complexity -of the problem grows rapidly as the number of patches you have to -maintain increases. With more than a tiny number of patches in hand, -understanding which ones you have applied and maintaining them moves -from messy to overwhelming. - -Fortunately, Mercurial includes a powerful extension, Mercurial Queues -(or simply ``MQ''), that massively simplifies the patch management -problem. - -\section{The prehistory of Mercurial Queues} -\label{sec:mq:history} - -During the late 1990s, several Linux kernel developers started to -maintain ``patch series'' that modified the behaviour of the Linux -kernel. Some of these series were focused on stability, some on -feature coverage, and others were more speculative. - -The sizes of these patch series grew rapidly. In 2002, Andrew Morton -published some shell scripts he had been using to automate the task of -managing his patch queues. Andrew was successfully using these -scripts to manage hundreds (sometimes thousands) of patches on top of -the Linux kernel. 
- -\subsection{A patchwork quilt} -\label{sec:mq:quilt} - -In early 2003, Andreas Gruenbacher and Martin Quinson borrowed the -approach of Andrew's scripts and published a tool called ``patchwork -quilt''~\cite{web:quilt}, or simply ``quilt'' -(see~\cite{gruenbacher:2005} for a paper describing it). Because -quilt substantially automated patch management, it rapidly gained a -large following among open source software developers. - -Quilt manages a \emph{stack of patches} on top of a directory tree. -To begin, you tell quilt to manage a directory tree, and tell it which -files you want to manage; it stores away the names and contents of -those files. To fix a bug, you create a new patch (using a single -command), edit the files you need to fix, then ``refresh'' the patch. - -The refresh step causes quilt to scan the directory tree; it updates -the patch with all of the changes you have made. You can create -another patch on top of the first, which will track the changes -required to modify the tree from ``tree with one patch applied'' to -``tree with two patches applied''. - -You can \emph{change} which patches are applied to the tree. If you -``pop'' a patch, the changes made by that patch will vanish from the -directory tree. Quilt remembers which patches you have popped, -though, so you can ``push'' a popped patch again, and the directory -tree will be restored to contain the modifications in the patch. Most -importantly, you can run the ``refresh'' command at any time, and the -topmost applied patch will be updated. This means that you can, at -any time, change both which patches are applied and what -modifications those patches make. - -Quilt knows nothing about revision control tools, so it works equally -well on top of an unpacked tarball or a Subversion working copy. 
- -\subsection{From patchwork quilt to Mercurial Queues} -\label{sec:mq:quilt-mq} - -In mid-2005, Chris Mason took the features of quilt and wrote an -extension that he called Mercurial Queues, which added quilt-like -behaviour to Mercurial. - -The key difference between quilt and MQ is that quilt knows nothing -about revision control systems, while MQ is \emph{integrated} into -Mercurial. Each patch that you push is represented as a Mercurial -changeset. Pop a patch, and the changeset goes away. - -Because quilt does not care about revision control tools, it is still -a tremendously useful piece of software to know about for situations -where you cannot use Mercurial and MQ. - -\section{The huge advantage of MQ} - -I cannot overstate the value that MQ offers through the unification of -patches and revision control. - -A major reason that patches have persisted in the free software and -open source world---in spite of the availability of increasingly -capable revision control tools over the years---is the \emph{agility} -they offer. - -Traditional revision control tools make a permanent, irreversible -record of everything that you do. While this has great value, it's -also somewhat stifling. If you want to perform a wild-eyed -experiment, you have to be careful in how you go about it, or you risk -leaving unneeded---or worse, misleading or destabilising---traces of -your missteps and errors in the permanent revision record. - -By contrast, MQ's marriage of distributed revision control with -patches makes it much easier to isolate your work. Your patches live -on top of normal revision history, and you can make them disappear or -reappear at will. If you don't like a patch, you can drop it. If a -patch isn't quite as you want it to be, simply fix it---as many times -as you need to, until you have refined it into the form you desire. 
- -As an example, the integration of patches with revision control makes -understanding patches and debugging their effects---and their -interplay with the code they're based on---\emph{enormously} easier. -Since every applied patch has an associated changeset, you can use -\hgcmdargs{log}{\emph{filename}} to see which changesets and patches -affected a file. You can use the \hgext{bisect} command to -binary-search through all changesets and applied patches to see where -a bug got introduced or fixed. You can use the \hgcmd{annotate} -command to see which changeset or patch modified a particular line of -a source file. And so on. - -\section{Understanding patches} -\label{sec:mq:patch} - -Because MQ doesn't hide its patch-oriented nature, it is helpful to -understand what patches are, and a little about the tools that work -with them. - -The traditional Unix \command{diff} command compares two files, and -prints a list of differences between them. The \command{patch} command -understands these differences as \emph{modifications} to make to a -file. Take a look at figure~\ref{ex:mq:diff} for a simple example of -these commands in action. - -\begin{figure}[ht] - \interaction{mq.dodiff.diff} - \caption{Simple uses of the \command{diff} and \command{patch} commands} - \label{ex:mq:diff} -\end{figure} - -The type of file that \command{diff} generates (and \command{patch} -takes as input) is called a ``patch'' or a ``diff''; there is no -difference between a patch and a diff. (We'll use the term ``patch'', -since it's more commonly used.) - -A patch file can start with arbitrary text; the \command{patch} -command ignores this text, but MQ uses it as the commit message when -creating changesets. To find the beginning of the patch content, -\command{patch} searches for the first line that starts with the -string ``\texttt{diff~-}''. - -MQ works with \emph{unified} diffs (\command{patch} can accept several -other diff formats, but MQ doesn't). 
A unified diff contains two -kinds of header. The \emph{file header} describes the file being -modified; it contains the name of the file to modify. When -\command{patch} sees a new file header, it looks for a file with that -name to start modifying. - -After the file header comes a series of \emph{hunks}. Each hunk -starts with a header; this identifies the range of line numbers within -the file that the hunk should modify. Following the header, a hunk -starts and ends with a few (usually three) lines of text from the -unmodified file; these are called the \emph{context} for the hunk. If -there's only a small amount of context between successive hunks, -\command{diff} doesn't print a new hunk header; it just runs the hunks -together, with a few lines of context between modifications. - -Each line of context begins with a space character. Within the hunk, -a line that begins with ``\texttt{-}'' means ``remove this line,'' -while a line that begins with ``\texttt{+}'' means ``insert this -line.'' For example, a line that is modified is represented by one -deletion and one insertion. - -We will return to some of the more subtle aspects of patches later (in -section~\ref{sec:mq:adv-patch}), but you should have enough information -now to use MQ. - -\section{Getting started with Mercurial Queues} -\label{sec:mq:start} - -Because MQ is implemented as an extension, you must explicitly enable -before you can use it. (You don't need to download anything; MQ ships -with the standard Mercurial distribution.) To enable MQ, edit your -\tildefile{.hgrc} file, and add the lines in figure~\ref{ex:mq:config}. - -\begin{figure}[ht] - \begin{codesample4} - [extensions] - hgext.mq = - \end{codesample4} - \label{ex:mq:config} - \caption{Contents to add to \tildefile{.hgrc} to enable the MQ extension} -\end{figure} - -Once the extension is enabled, it will make a number of new commands -available. 
To verify that the extension is working, you can use -\hgcmd{help} to see if the \hgxcmd{mq}{qinit} command is now available; see -the example in figure~\ref{ex:mq:enabled}. - -\begin{figure}[ht] - \interaction{mq.qinit-help.help} - \caption{How to verify that MQ is enabled} - \label{ex:mq:enabled} -\end{figure} - -You can use MQ with \emph{any} Mercurial repository, and its commands -only operate within that repository. To get started, simply prepare -the repository using the \hgxcmd{mq}{qinit} command (see -figure~\ref{ex:mq:qinit}). This command creates an empty directory -called \sdirname{.hg/patches}, where MQ will keep its metadata. As -with many Mercurial commands, the \hgxcmd{mq}{qinit} command prints nothing -if it succeeds. - -\begin{figure}[ht] - \interaction{mq.tutorial.qinit} - \caption{Preparing a repository for use with MQ} - \label{ex:mq:qinit} -\end{figure} - -\begin{figure}[ht] - \interaction{mq.tutorial.qnew} - \caption{Creating a new patch} - \label{ex:mq:qnew} -\end{figure} - -\subsection{Creating a new patch} - -To begin work on a new patch, use the \hgxcmd{mq}{qnew} command. This -command takes one argument, the name of the patch to create. MQ will -use this as the name of an actual file in the \sdirname{.hg/patches} -directory, as you can see in figure~\ref{ex:mq:qnew}. - -Also newly present in the \sdirname{.hg/patches} directory are two -other files, \sfilename{series} and \sfilename{status}. The -\sfilename{series} file lists all of the patches that MQ knows about -for this repository, with one patch per line. Mercurial uses the -\sfilename{status} file for internal book-keeping; it tracks all of the -patches that MQ has \emph{applied} in this repository. - -\begin{note} - You may sometimes want to edit the \sfilename{series} file by hand; - for example, to change the sequence in which some patches are - applied. 
However, manually editing the \sfilename{status} file is - almost always a bad idea, as it's easy to corrupt MQ's idea of what - is happening. -\end{note} - -Once you have created your new patch, you can edit files in the -working directory as you usually would. All of the normal Mercurial -commands, such as \hgcmd{diff} and \hgcmd{annotate}, work exactly as -they did before. - -\subsection{Refreshing a patch} - -When you reach a point where you want to save your work, use the -\hgxcmd{mq}{qrefresh} command (figure~\ref{ex:mq:qnew}) to update the patch -you are working on. This command folds the changes you have made in -the working directory into your patch, and updates its corresponding -changeset to contain those changes. - -\begin{figure}[ht] - \interaction{mq.tutorial.qrefresh} - \caption{Refreshing a patch} - \label{ex:mq:qrefresh} -\end{figure} - -You can run \hgxcmd{mq}{qrefresh} as often as you like, so it's a good way -to ``checkpoint'' your work. Refresh your patch at an opportune -time; try an experiment; and if the experiment doesn't work out, -\hgcmd{revert} your modifications back to the last time you refreshed. - -\begin{figure}[ht] - \interaction{mq.tutorial.qrefresh2} - \caption{Refresh a patch many times to accumulate changes} - \label{ex:mq:qrefresh2} -\end{figure} - -\subsection{Stacking and tracking patches} - -Once you have finished working on a patch, or need to work on another, -you can use the \hgxcmd{mq}{qnew} command again to create a new patch. -Mercurial will apply this patch on top of your existing patch. See -figure~\ref{ex:mq:qnew2} for an example. Notice that the patch -contains the changes in our prior patch as part of its context (you -can see this more clearly in the output of \hgcmd{annotate}). 
- -\begin{figure}[ht] - \interaction{mq.tutorial.qnew2} - \caption{Stacking a second patch on top of the first} - \label{ex:mq:qnew2} -\end{figure} - -So far, with the exception of \hgxcmd{mq}{qnew} and \hgxcmd{mq}{qrefresh}, we've -been careful to only use regular Mercurial commands. However, MQ -provides many commands that are easier to use when you are thinking -about patches, as illustrated in figure~\ref{ex:mq:qseries}: - -\begin{itemize} -\item The \hgxcmd{mq}{qseries} command lists every patch that MQ knows - about in this repository, from oldest to newest (most recently - \emph{created}). -\item The \hgxcmd{mq}{qapplied} command lists every patch that MQ has - \emph{applied} in this repository, again from oldest to newest (most - recently applied). -\end{itemize} - -\begin{figure}[ht] - \interaction{mq.tutorial.qseries} - \caption{Understanding the patch stack with \hgxcmd{mq}{qseries} and - \hgxcmd{mq}{qapplied}} - \label{ex:mq:qseries} -\end{figure} - -\subsection{Manipulating the patch stack} - -The previous discussion implied that there must be a difference -between ``known'' and ``applied'' patches, and there is. MQ can -manage a patch without it being applied in the repository. - -An \emph{applied} patch has a corresponding changeset in the -repository, and the effects of the patch and changeset are visible in -the working directory. You can undo the application of a patch using -the \hgxcmd{mq}{qpop} command. MQ still \emph{knows about}, or manages, a -popped patch, but the patch no longer has a corresponding changeset in -the repository, and the working directory does not contain the changes -made by the patch. Figure~\ref{fig:mq:stack} illustrates the -difference between applied and tracked patches. - -\begin{figure}[ht] - \centering - \grafix{mq-stack} - \caption{Applied and unapplied patches in the MQ patch stack} - \label{fig:mq:stack} -\end{figure} - -You can reapply an unapplied, or popped, patch using the \hgxcmd{mq}{qpush} -command. 
-This creates a new changeset to correspond to the patch, and -the patch's changes once again become present in the working -directory. See figure~\ref{ex:mq:qpop} for examples of \hgxcmd{mq}{qpop} -and \hgxcmd{mq}{qpush} in action. Notice that once we have popped one -or two patches, the output of \hgxcmd{mq}{qseries} remains the same, while -that of \hgxcmd{mq}{qapplied} has changed. - -\begin{figure}[ht] - \interaction{mq.tutorial.qpop} - \caption{Modifying the stack of applied patches} - \label{ex:mq:qpop} -\end{figure} - -\subsection{Pushing and popping many patches} - -While \hgxcmd{mq}{qpush} and \hgxcmd{mq}{qpop} each operate on a single patch at -a time by default, you can push and pop many patches in one go. The -\hgxopt{mq}{qpush}{-a} option to \hgxcmd{mq}{qpush} causes it to push all -unapplied patches (see figure~\ref{ex:mq:qpush-a}), while the \hgxopt{mq}{qpop}{-a} option to \hgxcmd{mq}{qpop} -causes it to pop all applied patches. (For some more ways to push and -pop many patches, see section~\ref{sec:mq:perf} below.) - -\begin{figure}[ht] - \interaction{mq.tutorial.qpush-a} - \caption{Pushing all unapplied patches} - \label{ex:mq:qpush-a} -\end{figure} - -\subsection{Safety checks, and overriding them} - -Several MQ commands check the working directory before they do -anything, and fail if they find any modifications. They do this to -ensure that you won't lose any changes that you have made but not yet -incorporated into a patch. Figure~\ref{ex:mq:add} illustrates this; -the \hgxcmd{mq}{qnew} command will not create a new patch if there are -outstanding changes, caused in this case by the \hgcmd{add} of -\filename{file3}. - -\begin{figure}[ht] - \interaction{mq.tutorial.add} - \caption{Forcibly creating a patch} - \label{ex:mq:add} -\end{figure} - -Commands that check the working directory all take an ``I know what -I'm doing'' option, which is always named \option{-f}. The exact -meaning of \option{-f} depends on the command.
For example, -\hgcmdargs{qnew}{\hgxopt{mq}{qnew}{-f}} will incorporate any outstanding -changes into the new patch it creates, but -\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-f}} will revert modifications to any -files affected by the patch that it is popping. Be sure to read the -documentation for a command's \option{-f} option before you use it! - -\subsection{Working on several patches at once} - -The \hgxcmd{mq}{qrefresh} command always refreshes the \emph{topmost} -applied patch. This means that you can suspend work on one patch (by -refreshing it), pop or push to make a different patch the top, and -work on \emph{that} patch for a while. - -Here's an example that illustrates how you can use this ability. -Let's say you're developing a new feature as two patches. The first -is a change to the core of your software, and the second---layered on -top of the first---changes the user interface to use the code you just -added to the core. If you notice a bug in the core while you're -working on the UI patch, it's easy to fix the core. Simply -\hgxcmd{mq}{qrefresh} the UI patch to save your in-progress changes, and -\hgxcmd{mq}{qpop} down to the core patch. Fix the core bug, -\hgxcmd{mq}{qrefresh} the core patch, and \hgxcmd{mq}{qpush} back to the UI -patch to continue where you left off. - -\section{More about patches} -\label{sec:mq:adv-patch} - -MQ uses the GNU \command{patch} command to apply patches, so it's -helpful to know a few more detailed aspects of how \command{patch} -works, and about patches themselves. - -\subsection{The strip count} - -If you look at the file headers in a patch, you will notice that the -pathnames usually have an extra component on the front that isn't -present in the actual path name. This is a holdover from the way that -people used to generate patches (people still do this, but it's -somewhat rare with modern revision control tools). - -Alice would unpack a tarball, edit her files, then decide that she -wanted to create a patch. 
So she'd rename her working directory, -unpack the tarball again (hence the need for the rename), and use the -\cmdopt{diff}{-r} and \cmdopt{diff}{-N} options to \command{diff} to -recursively generate a patch between the unmodified directory and the -modified one. The result would be that the name of the unmodified -directory would be at the front of the left-hand path in every file -header, and the name of the modified directory would be at the front -of the right-hand path. - -Since someone receiving a patch from the Alices of the net would be -unlikely to have unmodified and modified directories with exactly the -same names, the \command{patch} command has a \cmdopt{patch}{-p} -option that indicates the number of leading path name components to -strip when trying to apply a patch. This number is called the -\emph{strip count}. - -An option of ``\texttt{-p1}'' means ``use a strip count of one''. If -\command{patch} sees a file name \filename{foo/bar/baz} in a file -header, it will strip \filename{foo} and try to patch a file named -\filename{bar/baz}. (Strictly speaking, the strip count refers to the -number of \emph{path separators} (and the components that go with them -) to strip. A strip count of one will turn \filename{foo/bar} into -\filename{bar}, but \filename{/foo/bar} (notice the extra leading -slash) into \filename{foo/bar}.) - -The ``standard'' strip count for patches is one; almost all patches -contain one leading path name component that needs to be stripped. -Mercurial's \hgcmd{diff} command generates path names in this form, -and the \hgcmd{import} command and MQ expect patches to have a strip -count of one. - -If you receive a patch from someone that you want to add to your patch -queue, and the patch needs a strip count other than one, you cannot -just \hgxcmd{mq}{qimport} the patch, because \hgxcmd{mq}{qimport} does not yet -have a \texttt{-p} option (see~\bug{311}). 
Your best bet is to -\hgxcmd{mq}{qnew} a patch of your own, then use \cmdargs{patch}{-p\emph{N}} -to apply their patch, followed by \hgcmd{addremove} to pick up any -files added or removed by the patch, followed by \hgxcmd{mq}{qrefresh}. -This complexity may become unnecessary; see~\bug{311} for details. - -\subsection{Strategies for applying a patch} - -When \command{patch} applies a hunk, it tries a handful of -successively less accurate strategies to make the hunk apply. -This falling-back technique often makes it possible to take a patch -that was generated against an old version of a file, and apply it -against a newer version of that file. - -First, \command{patch} tries an exact match, where the line numbers, -the context, and the text to be modified must apply exactly. If it -cannot make an exact match, it tries to find an exact match for the -context, without honouring the line numbering information. If this -succeeds, it prints a line of output saying that the hunk was applied, -but at some \emph{offset} from the original line number. - -If a context-only match fails, \command{patch} removes the first and -last lines of the context, and tries a \emph{reduced} context-only -match. If the hunk with reduced context succeeds, it prints a message -saying that it applied the hunk with a \emph{fuzz factor} (the fuzz -factor indicates how many lines of context \command{patch} had to -trim before the hunk applied). - -When none of these techniques works, \command{patch} prints a -message saying that the hunk in question was rejected. It saves -rejected hunks (also simply called ``rejects'') to a file with the -same name, and an added \sfilename{.rej} extension. It also saves an -unmodified copy of the file with a \sfilename{.orig} extension; the -copy of the file without any extensions will contain any changes made -by hunks that \emph{did} apply cleanly.
If you have a patch that -modifies \filename{foo} with six hunks, and one of them fails to -apply, you will have: an unmodified \filename{foo.orig}, a -\filename{foo.rej} containing one hunk, and \filename{foo}, containing -the changes made by the five successful hunks. - -\subsection{Some quirks of patch representation} - -There are a few useful things to know about how \command{patch} works -with files. -\begin{itemize} -\item This should already be obvious, but \command{patch} cannot - handle binary files. -\item Nor does it care about the executable bit; it creates new - files as readable, but not executable. -\item \command{patch} treats the removal of a file as a diff between - the file to be removed and the empty file. So your idea of ``I - deleted this file'' looks like ``every line of this file was - deleted'' in a patch. -\item It treats the addition of a file as a diff between the empty - file and the file to be added. So in a patch, your idea of ``I - added this file'' looks like ``every line of this file was added''. -\item It treats a renamed file as the removal of the old name, and the - addition of the new name. This means that renamed files have a big - footprint in patches. (Note also that Mercurial does not currently - try to infer when files have been renamed or copied in a patch.) -\item \command{patch} cannot represent empty files, so you cannot use - a patch to represent the notion ``I added this empty file to the - tree''. -\end{itemize} - -\subsection{Beware the fuzz} - -While applying a hunk at an offset, or with a fuzz factor, will often -be completely successful, these inexact techniques naturally leave -open the possibility of corrupting the patched file. The most common -problems are applying a patch twice, or applying it at an incorrect -location in the file. If \command{patch} or \hgxcmd{mq}{qpush} ever -mentions an offset or fuzz factor, you should make sure that the -modified files are correct afterwards.
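The fallback order described in the strategies subsection above can be modelled in a few lines of Python, and the model also shows why fuzz deserves suspicion. This is a simplified sketch, not GNU \command{patch}'s actual algorithm: it assumes each hunk has already been parsed into leading context, removed lines, added lines and trailing context, it trims context symmetrically, and the names \texttt{Hunk}, \texttt{apply\_hunk} and \texttt{max\_fuzz} are hypothetical. (The real tool lets you set its limit with its \texttt{-F} option.)

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hunk:
    start: int           # zero-based line where the leading context should begin
    before: List[str]    # leading context lines
    old: List[str]       # lines the hunk removes
    new: List[str]       # lines the hunk adds
    after: List[str]     # trailing context lines


def find_nearest(lines: List[str], pattern: List[str], preferred: int) -> Optional[int]:
    """Position where `pattern` occurs closest to `preferred`, or None."""
    hits = [i for i in range(len(lines) - len(pattern) + 1)
            if lines[i:i + len(pattern)] == pattern]
    return min(hits, key=lambda i: abs(i - preferred)) if hits else None


def apply_hunk(lines: List[str], hunk: Hunk,
               max_fuzz: int = 2) -> Optional[Tuple[List[str], int, int]]:
    """Return (patched lines, offset, fuzz), or None if the hunk is rejected."""
    for fuzz in range(max_fuzz + 1):
        # Fuzz 0 uses all the context; each extra level trims one more
        # line from both the start and the end of the context.
        before = hunk.before[fuzz:]
        after = hunk.after[:max(len(hunk.after) - fuzz, 0)]
        pattern = before + hunk.old + after
        preferred = hunk.start + (len(hunk.before) - len(before))
        # An exact match (offset 0) wins; otherwise take the nearest
        # match anywhere in the file and report how far it moved.
        pos = find_nearest(lines, pattern, preferred)
        if pos is not None:
            patched = lines[:pos] + before + hunk.new + after + lines[pos + len(pattern):]
            return patched, pos - preferred, fuzz
    return None  # a reject: the real tool would write this hunk to a .rej file
```

For example, applying \texttt{Hunk(1, ["b"], ["c"], ["C"], ["d"])} to the lines \texttt{a x c y e} fails with full context, but succeeds at fuzz 1, where the hunk no longer checks \emph{any} context at all. That is exactly how a fuzzy match can silently land in the wrong place.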
- -It's often a good idea to refresh a patch that has applied with an -offset or fuzz factor; refreshing the patch generates new context -information that will make it apply cleanly. I say ``often,'' not -``always,'' because sometimes refreshing a patch will make it fail to -apply against a different revision of the underlying files. In some -cases, such as when you're maintaining a patch that must sit on top of -multiple versions of a source tree, it's acceptable to have a patch -apply with some fuzz, provided you've verified the results of the -patching process. - -\subsection{Handling rejection} - -If \hgxcmd{mq}{qpush} fails to apply a patch, it will print an error -message and exit. If it has left \sfilename{.rej} files behind, it is -usually best to fix up the rejected hunks before you push more patches -or do any further work. - -If your patch \emph{used to} apply cleanly, and no longer does because -you've changed the underlying code that your patches are based on, -Mercurial Queues can help; see section~\ref{sec:mq:merge} for details. - -Unfortunately, there aren't any great techniques for dealing with -rejected hunks. Most often, you'll need to view the \sfilename{.rej} -file and edit the target file, applying the rejected hunks by hand. - -If you're feeling adventurous, Neil Brown, a Linux kernel hacker, -wrote a tool called \command{wiggle}~\cite{web:wiggle}, which is more -vigorous than \command{patch} in its attempts to make a patch apply. - -Another Linux kernel hacker, Chris Mason (the author of Mercurial -Queues), wrote a similar tool called -\command{mpatch}~\cite{web:mpatch}, which takes a simple approach to -automating the application of hunks rejected by \command{patch}. The -\command{mpatch} command can help with four common reasons that a hunk -may be rejected: - -\begin{itemize} -\item The context in the middle of a hunk has changed. -\item A hunk is missing some context at the beginning or end.
-\item A large hunk might apply better---either entirely or in - part---if it were broken up into smaller hunks. -\item A hunk removes lines with slightly different content than those - currently present in the file. -\end{itemize} - -If you use \command{wiggle} or \command{mpatch}, you should be doubly -careful to check your results when you're done. In fact, -\command{mpatch} enforces this method of double-checking the tool's -output, by automatically dropping you into a merge program when it has -done its job, so that you can verify its work and finish off any -remaining merges. - -\section{Getting the best performance out of MQ} -\label{sec:mq:perf} - -MQ is very efficient at handling a large number of patches. I ran -some performance experiments in mid-2006 for a talk that I gave at the -2006 EuroPython conference~\cite{web:europython}. I used as my data -set the Linux 2.6.17-mm1 patch series, which consists of 1,738 -patches. I applied these on top of a Linux kernel repository -containing all 27,472 revisions between Linux 2.6.12-rc2 and Linux -2.6.17. - -On my old, slow laptop, I was able to -\hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} all 1,738 patches in 3.5 minutes, -and \hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}} them all in 30 seconds. (On a -newer laptop, the time to push all patches dropped to two minutes.) I -could \hgxcmd{mq}{qrefresh} one of the biggest patches (which made 22,779 -lines of changes to 287 files) in 6.6 seconds. - -Clearly, MQ is well suited to working in large trees, but there are a -few tricks you can use to get the best performance out of it. - -First of all, try to ``batch'' operations together. Every time you -run \hgxcmd{mq}{qpush} or \hgxcmd{mq}{qpop}, these commands scan the working -directory once to make sure you haven't made some changes and then -forgotten to run \hgxcmd{mq}{qrefresh}. On a small tree, the time that -this scan takes is unnoticeable.
However, on a medium-sized tree -(containing tens of thousands of files), it can take a second or more. - -The \hgxcmd{mq}{qpush} and \hgxcmd{mq}{qpop} commands allow you to push and pop -multiple patches at a time. You can identify the ``destination -patch'' that you want to end up at. When you \hgxcmd{mq}{qpush} with a -destination specified, it will push patches until that patch is at the -top of the applied stack. When you \hgxcmd{mq}{qpop} to a destination, MQ -will pop patches until the destination patch is at the top. - -You can identify a destination patch either by name or by -number. If you use numeric addressing, patches are -counted from zero; this means that the first patch is zero, the second -is one, and so on. - -\section{Updating your patches when the underlying code changes} -\label{sec:mq:merge} - -It's common to have a stack of patches on top of an underlying -repository that you don't modify directly. If you're working on -changes to third-party code, or on a feature that takes long enough to -develop that the code beneath it changes in the meantime, you will often -need to sync up with the underlying code, and fix up any hunks in your -patches that no longer apply. This is called \emph{rebasing} your -patch series. - -The simplest way to do this is to \hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}} -your patches, then \hgcmd{pull} changes into the underlying -repository, and finally \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} your -patches again. MQ will stop pushing any time it runs across a patch -that fails to apply because of conflicts, allowing you to fix your -conflicts, \hgxcmd{mq}{qrefresh} the affected patch, and continue pushing -until you have fixed your entire stack. - -This approach is easy to use and works well if you don't expect -changes to the underlying code to affect how well your patches apply.
-If your patch stack touches code that is modified frequently or -invasively in the underlying repository, however, fixing up rejected -hunks by hand quickly becomes tiresome. - -It's possible to partially automate the rebasing process. If your -patches apply cleanly against some revision of the underlying repo, MQ -can use this information to help you to resolve conflicts between your -patches and a different revision. - -The process is a little involved. -\begin{enumerate} -\item To begin, \hgcmdargs{qpush}{-a} all of your patches on top of - the revision where you know that they apply cleanly. -\item Save a backup copy of your patch directory using - \hgcmdargs{qsave}{\hgxopt{mq}{qsave}{-e} \hgxopt{mq}{qsave}{-c}}. This prints - the name of the directory that it has saved the patches in. It will - save the patches to a directory called - \sdirname{.hg/patches.\emph{N}}, where \texttt{\emph{N}} is a small - integer. It also commits a ``save changeset'' on top of your - applied patches; this is for internal book-keeping, and records the - states of the \sfilename{series} and \sfilename{status} files. -\item Use \hgcmd{pull} to bring new changes into the underlying - repository. (Don't run \hgcmdargs{pull}{-u}; see below for why.) -\item Update to the new tip revision, using - \hgcmdargs{update}{\hgopt{update}{-C}} to override the patches you - have pushed. -\item Merge all patches using \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-m} - \hgxopt{mq}{qpush}{-a}}. The \hgxopt{mq}{qpush}{-m} option to \hgxcmd{mq}{qpush} - tells MQ to perform a three-way merge if the patch fails to apply. -\end{enumerate} - -During the \hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-m}}, each patch in the -\sfilename{series} file is applied normally. If a patch applies with -fuzz or rejects, MQ looks at the queue you \hgxcmd{mq}{qsave}d, and -performs a three-way merge with the corresponding changeset. 
This -merge uses Mercurial's normal merge machinery, so it may pop up a GUI -merge tool to help you to resolve problems. - -When you finish resolving the effects of a patch, MQ refreshes your -patch based on the result of the merge. - -At the end of this process, your repository will have one extra head -from the old patch queue, and a copy of the old patch queue will be in -\sdirname{.hg/patches.\emph{N}}. You can remove the extra head using -\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a} \hgxopt{mq}{qpop}{-n} patches.\emph{N}} -or \hgcmd{strip}. You can delete \sdirname{.hg/patches.\emph{N}} once -you are sure that you no longer need it as a backup. - -\section{Identifying patches} - -MQ commands that work with patches let you refer to a patch either by -name or by number. Referring to a patch by name is straightforward; pass the -name \filename{foo.patch} to \hgxcmd{mq}{qpush}, for example, and it will -push patches until \filename{foo.patch} is applied. - -As a shortcut, you can refer to a patch using both a name and a -numeric offset; \texttt{foo.patch-2} means ``two patches before -\texttt{foo.patch}'', while \texttt{bar.patch+4} means ``four patches -after \texttt{bar.patch}''. - -Referring to a patch by index isn't much different. The first patch -printed in the output of \hgxcmd{mq}{qseries} is patch zero (yes, it's one -of those start-at-zero counting systems); the second is patch one; and -so on. - -MQ also makes it easy to work with patches when you are using normal -Mercurial commands. Every command that accepts a changeset ID will -also accept the name of an applied patch. MQ augments the tags -normally in the repository with an eponymous one for each applied -patch. In addition, the special tags \index{tags!special tag - names!\texttt{qbase}}\texttt{qbase} and \index{tags!special tag - names!\texttt{qtip}}\texttt{qtip} identify the ``bottom-most'' and -topmost applied patches, respectively.
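To make these addressing rules concrete, here is a small Python model of them, together with the ``push or pop to a destination'' behaviour described in the performance section. It is a hypothetical sketch rather than MQ's own lookup code, which handles more cases: it assumes the series is a plain list of patch names, and that the applied patches are always the first \texttt{applied} entries of that list. The names \texttt{resolve\_patch}, \texttt{patches\_to\_push} and \texttt{patches\_to\_pop} are invented for illustration.

```python
import re
from typing import List


def resolve_patch(series: List[str], ref: str) -> int:
    """Map a patch reference to a zero-based index into the series."""
    if ref in series:                         # a plain name, e.g. "foo.patch"
        return series.index(ref)
    if ref.isdigit():                         # an index; the first patch is 0
        index = int(ref)
    else:
        match = re.fullmatch(r"(.+)([+-])(\d+)", ref)  # "foo.patch-2", "bar.patch+4"
        if match is None or match.group(1) not in series:
            raise KeyError(f"unknown patch: {ref}")
        base = series.index(match.group(1))
        step = int(match.group(3))
        index = base + step if match.group(2) == "+" else base - step
    if not 0 <= index < len(series):
        raise IndexError(f"patch reference out of range: {ref}")
    return index


def patches_to_push(series: List[str], applied: int, ref: str) -> List[str]:
    """Patches to push, in order, so that `ref` ends up topmost."""
    target = resolve_patch(series, ref)
    return series[applied:target + 1]         # empty if `ref` is already applied


def patches_to_pop(series: List[str], applied: int, ref: str) -> List[str]:
    """Patches to pop, topmost first, so that `ref` ends up topmost."""
    target = resolve_patch(series, ref)
    return list(reversed(series[target + 1:applied]))
```

Plain names are checked first in this sketch, so a patch that happens to be named something like \texttt{fix-2} is still found by its full name rather than being read as ``two patches before \texttt{fix}''.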
- -These additions to Mercurial's normal tagging capabilities make -dealing with patches even more of a breeze. -\begin{itemize} -\item Want to patchbomb a mailing list with your latest series of - changes? - \begin{codesample4} - hg email qbase:qtip - \end{codesample4} - (Don't know what ``patchbombing'' is? See - section~\ref{sec:hgext:patchbomb}.) -\item Need to see all of the patches since \texttt{foo.patch} that - have touched files in a subdirectory of your tree? - \begin{codesample4} - hg log -r foo.patch:qtip \emph{subdir} - \end{codesample4} -\end{itemize} - -Because MQ makes the names of patches available to the rest of -Mercurial through its normal internal tag machinery, you don't need to -type in the entire name of a patch when you want to identify it by -name. - -\begin{figure}[ht] - \interaction{mq.id.output} - \caption{Using MQ's tag features to work with patches} - \label{ex:mq:id} -\end{figure} - -Another nice consequence of representing patch names as tags is that -when you run the \hgcmd{log} command, it will display a patch's name -as a tag, simply as part of its normal output. This makes it easy to -visually distinguish applied patches from underlying ``normal'' -revisions. Figure~\ref{ex:mq:id} shows a few normal Mercurial -commands in use with applied patches. - -\section{Useful things to know about} - -There are a number of aspects of MQ usage that don't fit tidily into -sections of their own, but that are good to know. Here they are, in -one place. - -\begin{itemize} -\item Normally, when you \hgxcmd{mq}{qpop} a patch and \hgxcmd{mq}{qpush} it - again, the changeset that represents the patch after the pop/push - will have a \emph{different identity} than the changeset that - represented the patch beforehand. See - section~\ref{sec:mqref:cmd:qpush} for information as to why this is.
-\item It's not a good idea to \hgcmd{merge} changes from another - branch with a patch changeset, at least if you want to maintain the - ``patchiness'' of that changeset and changesets below it on the - patch stack. If you try to do this, it will appear to succeed, but - MQ will become confused. -\end{itemize} - -\section{Managing patches in a repository} -\label{sec:mq:repo} - -Because MQ's \sdirname{.hg/patches} directory resides outside a -Mercurial repository's working directory, the ``underlying'' Mercurial -repository knows nothing about the management or presence of patches. - -This presents the interesting possibility of managing the contents of -the patch directory as a Mercurial repository in its own right. This -can be a useful way to work. For example, you can work on a patch for -a while, \hgxcmd{mq}{qrefresh} it, then \hgcmd{commit} the current state of -the patch. This lets you ``roll back'' to that version of the patch -later on. - -You can then share different versions of the same patch stack among -multiple underlying repositories. I use this when I am developing a -Linux kernel feature. I have a pristine copy of my kernel sources for -each of several CPU architectures, and a cloned repository under each -that contains the patches I am working on. When I want to test a -change on a different architecture, I push my current patches to the -patch repository associated with that kernel tree, pop and push all of -my patches, and build and test that kernel. - -Managing patches in a repository makes it possible for multiple -developers to work on the same patch series without colliding with -each other, all on top of an underlying source base that they may or -may not control. 
- -\subsection{MQ support for patch repositories} - -MQ helps you to work with the \sdirname{.hg/patches} directory as a -repository; when you prepare a repository for working with patches -using \hgxcmd{mq}{qinit}, you can pass the \hgxopt{mq}{qinit}{-c} option to -create the \sdirname{.hg/patches} directory as a Mercurial repository. - -\begin{note} - If you forget to use the \hgxopt{mq}{qinit}{-c} option, you can simply go - into the \sdirname{.hg/patches} directory at any time and run - \hgcmd{init}. Don't forget to add an entry for the - \sfilename{status} file to the \sfilename{.hgignore} file, though - (\hgcmdargs{qinit}{\hgxopt{mq}{qinit}{-c}} does this for you - automatically); you \emph{really} don't want to manage the - \sfilename{status} file. -\end{note} - -As a convenience, if MQ notices that the \sdirname{.hg/patches} -directory is a repository, it will automatically \hgcmd{add} every -patch that you create and import. - -MQ provides a shortcut command, \hgxcmd{mq}{qcommit}, that runs -\hgcmd{commit} in the \sdirname{.hg/patches} directory. This saves -some bothersome typing. - -Finally, as a convenience to manage the patch directory, you can -define the alias \command{mq} on Unix systems. For example, on Linux -systems using the \command{bash} shell, you can include the following -snippet in your \tildefile{.bashrc}. - -\begin{codesample2} - alias mq='hg -R \$(hg root)/.hg/patches' -\end{codesample2} - -You can then issue commands of the form \cmdargs{mq}{pull} from -the main repository. - -\subsection{A few things to watch out for} - -MQ's support for working with a repository full of patches is limited -in a few small respects. - -MQ cannot automatically detect changes that you make to the patch -directory.
If you \hgcmd{pull}, manually edit, or \hgcmd{update} -changes to patches or the \sfilename{series} file, you will have to -\hgcmdargs{qpop}{\hgxopt{mq}{qpop}{-a}} and then -\hgcmdargs{qpush}{\hgxopt{mq}{qpush}{-a}} in the underlying repository to -see those changes show up there. If you forget to do this, you can -confuse MQ's idea of which patches are applied. - -\section{Third-party tools for working with patches} -\label{sec:mq:tools} - -Once you've been working with patches for a while, you'll find -yourself hungry for tools that will help you to understand and -manipulate the patches you're dealing with. - -The \command{diffstat} command~\cite{web:diffstat} generates a -histogram of the modifications made to each file in a patch. It -provides a good way to ``get a sense of'' a patch---which files it -affects, and how much change it introduces to each file and as a -whole. (I find that it's a good idea to use \command{diffstat}'s -\cmdopt{diffstat}{-p} option as a matter of course, as otherwise it -will try to do clever things with prefixes of file names that I, -at least, invariably find confusing.) - -\begin{figure}[ht] - \interaction{mq.tools.tools} - \caption{The \command{diffstat}, \command{filterdiff}, and \command{lsdiff} commands} - \label{ex:mq:tools} -\end{figure} - -The \package{patchutils} package~\cite{web:patchutils} is invaluable. -It provides a set of small utilities that follow the ``Unix -philosophy;'' each does one useful thing with a patch. The -\package{patchutils} command I use most is \command{filterdiff}, which -extracts subsets from a patch file. For example, given a patch that -modifies hundreds of files across dozens of directories, a single -invocation of \command{filterdiff} can generate a smaller patch that -only touches files whose names match a particular glob pattern. See -section~\ref{mq-collab:tips:interdiff} for another example.
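The core idea behind selecting files from a patch with \command{filterdiff} can be approximated by a short Python sketch. This is not \package{patchutils} code and does not reproduce its options or its handling of unusual diffs; the functions \texttt{split\_files}, \texttt{file\_name} and \texttt{select\_files} are hypothetical. The sketch assumes that each file section of a unified diff starts either with a \texttt{diff} line or with a pair of old-name/new-name header lines, and it applies a strip count in the way described in the section on the strip count.

```python
import fnmatch
from typing import List, Optional


def split_files(patch_text: str) -> List[List[str]]:
    """Split a unified diff into per-file sections (lists of lines)."""
    lines = patch_text.splitlines(keepends=True)
    sections: List[List[str]] = []
    awaiting_header = False  # True between a "diff ..." line and its header pair
    for i, line in enumerate(lines):
        is_header = (line.startswith("--- ") and i + 1 < len(lines)
                     and lines[i + 1].startswith("+++ "))
        if line.startswith("diff "):
            sections.append([line])          # e.g. "diff -r 1234 -r 5678 foo.c"
            awaiting_header = True
        elif is_header and not awaiting_header:
            sections.append([line])          # bare header pair, as from diff -u
        elif sections:
            sections[-1].append(line)        # hunk lines; any preamble is dropped
        if is_header:
            awaiting_header = False
    return sections


def file_name(section: List[str], strip: int = 1) -> Optional[str]:
    """File name from the section's header, minus `strip` leading components."""
    old = new = None
    for line in section:
        if old is None and line.startswith("--- "):
            old = line[4:]
        elif new is None and line.startswith("+++ "):
            new = line[4:]
    # A deleted file has "+++ /dev/null", so fall back to the old name.
    raw = new if new is not None and not new.startswith("/dev/null") else old
    if raw is None:
        return None
    path = raw.rstrip("\r\n").split("\t")[0]   # drop any trailing timestamp
    return "/".join(path.split("/")[strip:])


def select_files(patch_text: str, pattern: str, strip: int = 1) -> str:
    """Keep only the sections whose (stripped) file name matches `pattern`."""
    kept: List[str] = []
    for section in split_files(patch_text):
        name = file_name(section, strip)
        if name is not None and fnmatch.fnmatchcase(name, pattern):
            kept.extend(section)
    return "".join(kept)
```

Two simplifications are worth knowing about. Unlike a shell glob, \texttt{fnmatch}'s \texttt{*} also matches \texttt{/}, so \texttt{*.c} selects C files in every directory; and a removed line that happens to look like an old-name header, immediately followed by an added line that looks like a new-name header, would confuse this simple splitter.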
- -\section{Good ways to work with patches} - -Whether you are working on a patch series to submit to a free software -or open source project, or a series that you intend to treat as a -sequence of regular changesets when you're done, you can use some -simple techniques to keep your work well organised. - -Give your patches descriptive names. A good name for a patch might be -\filename{rework-device-alloc.patch}, because it will immediately give -you a hint as to what the purpose of the patch is. Long names shouldn't be -a problem; you won't be typing the names often, but you \emph{will} be -running commands like \hgxcmd{mq}{qapplied} and \hgxcmd{mq}{qtop} over and over. -Good naming becomes especially important when you have a number of -patches to work with, or if you are juggling a number of different -tasks and your patches only get a fraction of your attention. - -Be aware of which patch you're working on. Use the \hgxcmd{mq}{qtop} -command and skim over the text of your patches frequently---for -example, using \hgcmdargs{tip}{\hgopt{tip}{-p}}---to be sure of where -you stand. I have several times worked on and \hgxcmd{mq}{qrefresh}ed a -patch other than the one I intended, and it's often tricky to migrate -changes into the right patch after making them in the wrong one. - -For this reason, it is very much worth investing a little time to -learn how to use some of the third-party tools I described in -section~\ref{sec:mq:tools}, particularly \command{diffstat} and -\command{filterdiff}. The former will give you a quick idea of what -changes your patch is making, while the latter makes it easy to splice -hunks selectively out of one patch and into another. - -\section{MQ cookbook} - -\subsection{Manage ``trivial'' patches} - -Because the overhead of dropping files into a new Mercurial repository -is so low, it makes a lot of sense to manage patches this way even if -you simply want to make a few changes to a source tarball that you -downloaded.
- -Begin by downloading and unpacking the source tarball, -and turning it into a Mercurial repository. -\interaction{mq.tarball.download} - -Continue by creating a patch stack and making your changes. -\interaction{mq.tarball.qinit} - -Let's say a few weeks or months pass, and the package's author releases -a new version. First, bring their changes into the repository. -\interaction{mq.tarball.newsource} -The pipeline starting with \hgcmd{locate} above deletes all files in -the working directory, so that \hgcmd{commit}'s -\hgopt{commit}{--addremove} option can tell which files have -really been removed in the newer version of the source. - -Finally, you can apply your patches on top of the new tree. -\interaction{mq.tarball.repush} - -\subsection{Combining entire patches} -\label{sec:mq:combine} - -MQ provides a command, \hgxcmd{mq}{qfold}, that lets you combine entire -patches. This ``folds'' the patches you name, in the order you name -them, into the topmost applied patch, and concatenates their -descriptions onto the end of its description. The patches that you -fold must be unapplied before you fold them. - -The order in which you fold patches matters. If your topmost applied -patch is \texttt{foo}, and you \hgxcmd{mq}{qfold} \texttt{bar} and -\texttt{quux} into it, you will end up with a patch that has the same -effect as if you applied first \texttt{foo}, then \texttt{bar}, -followed by \texttt{quux}. - -\subsection{Merging part of one patch into another} - -Merging \emph{part} of one patch into another is more difficult than -combining entire patches. - -If you want to move changes to entire files, you can use -\command{filterdiff}'s \cmdopt{filterdiff}{-i} and -\cmdopt{filterdiff}{-x} options to choose the modifications to snip -out of one patch, concatenating its output onto the end of the patch -you want to merge into. You usually won't need to modify the patch -you've merged the changes from.
Instead, MQ will report some rejected -hunks when you \hgxcmd{mq}{qpush} it (from the hunks you moved into the -other patch), and you can simply \hgxcmd{mq}{qrefresh} the patch to drop -the duplicate hunks. - -If you have a patch that has multiple hunks modifying a file, and you -only want to move a few of those hunks, the job becomes messier, -but you can still partly automate it. Use \cmdargs{lsdiff}{-nvv} to -print some metadata about the patch. -\interaction{mq.tools.lsdiff} - -This command prints three different kinds of number: -\begin{itemize} -\item (in the first column) a \emph{file number} to identify each file - modified in the patch; -\item (on the next line, indented) the line number within a modified - file where a hunk starts; and -\item (on the same line) a \emph{hunk number} to identify that hunk. -\end{itemize} - -You'll have to inspect the patch by eye -to identify the file and hunk numbers you'll want, but you can then -pass them to \command{filterdiff}'s \cmdopt{filterdiff}{--files} -and \cmdopt{filterdiff}{--hunks} options, to select exactly the file -and hunk you want to extract. - -Once you have this hunk, you can concatenate it onto the end of your -destination patch and continue with the remainder of -section~\ref{sec:mq:combine}. - -\section{Differences between quilt and MQ} - -If you are already familiar with quilt, you will find that MQ provides a -similar command set, although there are a few differences in the way that it works. - -You will already have noticed that most quilt commands have MQ -counterparts that simply begin with a ``\texttt{q}''. The exceptions -are quilt's \texttt{add} and \texttt{remove} commands, the -counterparts for which are the normal Mercurial \hgcmd{add} and -\hgcmd{remove} commands. There is no MQ equivalent of the quilt -\texttt{edit} command.
- -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/preface.tex --- a/en/preface.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,67 +0,0 @@ -\chapter*{Preface} -\addcontentsline{toc}{chapter}{Preface} -\label{chap:preface} - -Distributed revision control is a relatively new field, and has -thus far grown due to people's willingness to strike out into -ill-charted territory. - -I am writing a book about distributed revision control because I -believe that it is an important subject that deserves a field guide. -I chose to write about Mercurial because it is the easiest tool to -learn the terrain with, and yet it scales to the demands of real, -challenging environments where many other revision control tools fail. - -\section{This book is a work in progress} - -I am releasing this book while I am still writing it, in the hope that -it will prove useful to others. I also hope that readers will -contribute as they see fit. - -\section{About the examples in this book} - -This book takes an unusual approach to code samples. Every example is -``live''---each one is actually the result of a shell script that -executes the Mercurial commands you see. Every time an image of the -book is built from its sources, all the example scripts are -automatically run, and their current results compared against their -expected results. - -The advantage of this approach is that the examples are always -accurate; they describe \emph{exactly} the behaviour of the version of -Mercurial that's mentioned at the front of the book. If I update the -version of Mercurial that I'm documenting, and the output of some -command changes, the build fails. - -There is a small disadvantage to this approach, which is that the -dates and times you'll see in examples tend to be ``squashed'' -together in a way that they wouldn't be if the same commands were -being typed by a human.
Where a human can issue no more than one -command every few seconds, with any resulting timestamps -correspondingly spread out, my automated example scripts run many -commands in one second. - -As an instance of this, several consecutive commits in an example can -show up as having occurred during the same second. You can see this -occur in the \hgext{bisect} example in section~\ref{sec:undo:bisect}, -for instance. - -So when you're reading examples, don't place too much weight on the -dates or times you see in the output of commands. But \emph{do} be -confident that the behaviour you're seeing is consistent and -reproducible. - -\section{Colophon---this book is Free} - -This book is licensed under the Open Publication License, and is -produced entirely using Free Software tools. It is typeset with -\LaTeX{}; illustrations are drawn and rendered with -\href{http://www.inkscape.org/}{Inkscape}. - -The complete source code for this book is published as a Mercurial -repository, at \url{http://hg.serpentine.com/mercurial/book}. - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/srcinstall.tex --- a/en/srcinstall.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,53 +0,0 @@ -\chapter{Installing Mercurial from source} -\label{chap:srcinstall} - -\section{On a Unix-like system} -\label{sec:srcinstall:unixlike} - -If you are using a Unix-like system that has a sufficiently recent -version of Python (2.3~or newer) available, it is easy to install -Mercurial from source. -\begin{enumerate} -\item Download a recent source tarball from - \url{http://www.selenic.com/mercurial/download}. -\item Unpack the tarball: - \begin{codesample4} - gzip -dc mercurial-\emph{version}.tar.gz | tar xf - - \end{codesample4} -\item Go into the source directory and run the installer script. This - will build Mercurial and install it in your home directory. 
- \begin{codesample4} - cd mercurial-\emph{version} - python setup.py install --force --home=\$HOME - \end{codesample4} -\end{enumerate} -Once the install finishes, Mercurial will be in the \texttt{bin} -subdirectory of your home directory. Don't forget to make sure that -this directory is present in your shell's search path. - -You will probably need to set the \envar{PYTHONPATH} environment -variable so that the Mercurial executable can find the rest of the -Mercurial packages. For example, on my laptop, I have set it to -\texttt{/home/bos/lib/python}. The exact path that you will need to -use depends on how Python was built for your system, but should be -easy to figure out. If you're uncertain, look through the output of -the installer script above, and see where the contents of the -\texttt{mercurial} directory were installed to. - -\section{On Windows} - -Building and installing Mercurial on Windows requires a variety of -tools, a fair amount of technical knowledge, and considerable -patience. I very much \emph{do not recommend} this route if you are a -``casual user''. Unless you intend to hack on Mercurial, I strongly -suggest that you use a binary package instead. - -If you are intent on building Mercurial from source on Windows, follow -the ``hard way'' directions on the Mercurial wiki at -\url{http://www.selenic.com/mercurial/wiki/index.cgi/WindowsInstall}, -and expect the process to involve a lot of fiddly work. - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/template.tex --- a/en/template.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,475 +0,0 @@ -\chapter{Customising the output of Mercurial} -\label{chap:template} - -Mercurial provides a powerful mechanism to let you control how it -displays information. The mechanism is based on templates. 
You can -use templates to generate specific output for a single command, or to -customise the entire appearance of the built-in web interface. - -\section{Using precanned output styles} -\label{sec:style} - -Packaged with Mercurial are some output styles that you can use -immediately. A style is simply a precanned template that someone -wrote and installed somewhere that Mercurial can find. - -Before we take a look at Mercurial's bundled styles, let's review its -normal output. - -\interaction{template.simple.normal} - -This is somewhat informative, but it takes up a lot of space---five -lines of output per changeset. The \texttt{compact} style reduces -this to three lines, presented in a sparse manner. - -\interaction{template.simple.compact} - -The \texttt{changelog} style hints at the expressive power of -Mercurial's templating engine. This style attempts to follow the GNU -Project's changelog guidelines\cite{web:changelog}. - -\interaction{template.simple.changelog} - -You will not be shocked to learn that Mercurial's default output style -is named \texttt{default}. - -\subsection{Setting a default style} - -You can modify the output style that Mercurial will use for every -command by editing your \hgrc\ file, naming the style you would -prefer to use. - -\begin{codesample2} - [ui] - style = compact -\end{codesample2} - -If you write a style of your own, you can use it by either providing -the path to your style file, or copying your style file into a -location where Mercurial can find it (typically the \texttt{templates} -subdirectory of your Mercurial install directory). - -\section{Commands that support styles and templates} - -All of Mercurial's ``\texttt{log}-like'' commands let you use styles -and templates: \hgcmd{incoming}, \hgcmd{log}, \hgcmd{outgoing}, and -\hgcmd{tip}. - -As I write this manual, these are so far the only commands that -support styles and templates. 
Since these are the most important -commands that need customisable output, there has been little pressure -from the Mercurial user community to add style and template support to -other commands. - -\section{The basics of templating} - -At its simplest, a Mercurial template is a piece of text. Some of the -text never changes, while other parts are \emph{expanded}, or replaced -with new text, when necessary. - -Before we continue, let's look again at a simple example of -Mercurial's normal output. - -\interaction{template.simple.normal} - -Now, let's run the same command, but using a template to change its -output. - -\interaction{template.simple.simplest} - -The example above illustrates the simplest possible template; it's -just a piece of static text, printed once for each changeset. The -\hgopt{log}{--template} option to the \hgcmd{log} command tells -Mercurial to use the given text as the template when printing each -changeset. - -Notice that the template string above ends with the text -``\Verb+\n+''. This is an \emph{escape sequence}, telling Mercurial -to print a newline at the end of each template item. If you omit this -newline, Mercurial will run each piece of output together. See -section~\ref{sec:template:escape} for more details of escape sequences. - -A template that prints a fixed string of text all the time isn't very -useful; let's try something a bit more complex. - -\interaction{template.simple.simplesub} - -As you can see, the string ``\Verb+{desc}+'' in the template has been -replaced in the output with the description of each changeset. Every -time Mercurial finds text enclosed in curly braces (``\texttt{\{}'' -and ``\texttt{\}}''), it will try to replace the braces and text with -the expansion of whatever is inside. To print a literal curly brace, -you must escape it, as described in section~\ref{sec:template:escape}. 
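The substitution rule described above (text inside curly braces is replaced by the value of the keyword it names, and a backslash escapes the character after it) can be modelled in a few lines of Python, the language Mercurial itself is written in. This is a hypothetical toy, not Mercurial's own templater: the function name `expand`, the `ESCAPES` table, and passing keyword values as a plain dictionary are all assumptions made for illustration, and the sketch knows nothing about filters or style files.

```python
# Toy model of template expansion: NOT Mercurial's real templater.
# Assumption: keyword values are supplied as a plain dict (hypothetical).

# Characters recognised after a backslash, mapped to what they produce.
ESCAPES = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t", "v": "\v",
           "{": "{", "}": "}"}


def expand(template: str, values: dict) -> str:
    """Replace each {keyword} with values[keyword] and decode escapes."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        nxt = template[i + 1] if i + 1 < len(template) else ""
        if ch == "\\" and nxt in ESCAPES:
            # A backslash plus one character becomes a single character.
            out.append(ESCAPES[nxt])
            i += 2
        elif ch == "{":
            # Raises ValueError if the opening brace is never closed.
            end = template.index("}", i)
            out.append(str(values[template[i + 1:end]]))
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Raw strings keep "\n" as a backslash followed by "n", which is what
# the templating engine receives when you type the template in a shell.
print(expand(r"{rev}: {desc}\n", {"rev": 3, "desc": "Fix a typo"}), end="")
print(expand(r"\{desc\} is literal\n", {}), end="")
```

The second call shows why a literal brace must be escaped: without the backslashes, the toy would try to look up a keyword named `desc` instead of printing the braces.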
- -\section{Common template keywords} -\label{sec:template:keyword} - -You can start writing simple templates immediately using the keywords -below. - -\begin{itemize} -\item[\tplkword{author}] String. The unmodified author of the changeset. -\item[\tplkword{branches}] String. The name of the branch on which - the changeset was committed. Will be empty if the branch name was - \texttt{default}. -\item[\tplkword{date}] Date information. The date when the changeset - was committed. This is \emph{not} human-readable; you must pass it - through a filter that will render it appropriately. See - section~\ref{sec:template:filter} for more information on filters. - The date is expressed as a pair of numbers. The first number is a - Unix UTC timestamp (seconds since January 1, 1970); the second is - the offset of the committer's timezone from UTC, in seconds. -\item[\tplkword{desc}] String. The text of the changeset description. -\item[\tplkword{files}] List of strings. All files modified, added, or - removed by this changeset. -\item[\tplkword{file\_adds}] List of strings. Files added by this - changeset. -\item[\tplkword{file\_dels}] List of strings. Files removed by this - changeset. -\item[\tplkword{node}] String. The changeset identification hash, as a - 40-character hexadecimal string. -\item[\tplkword{parents}] List of strings. The parents of the - changeset. -\item[\tplkword{rev}] Integer. The repository-local changeset revision - number. -\item[\tplkword{tags}] List of strings. Any tags associated with the - changeset. -\end{itemize} - -A few simple experiments will show us what to expect when we use these -keywords; you can see the results in -figure~\ref{fig:template:keywords}. - -\begin{figure} - \interaction{template.simple.keywords} - \caption{Template keywords in use} - \label{fig:template:keywords} -\end{figure} - -As we noted above, the date keyword does not produce human-readable -output, so we must treat it specially. 
This involves using a -\emph{filter}, about which more in section~\ref{sec:template:filter}. - -\interaction{template.simple.datekeyword} - -\section{Escape sequences} -\label{sec:template:escape} - -Mercurial's templating engine recognises the most commonly used escape -sequences in strings. When it sees a backslash (``\Verb+\+'') -character, it looks at the following character and substitutes the two -characters with a single replacement, as described below. - -\begin{itemize} -\item[\Verb+\textbackslash\textbackslash+] Backslash, ``\Verb+\+'', - ASCII~134. -\item[\Verb+\textbackslash n+] Newline, ASCII~12. -\item[\Verb+\textbackslash r+] Carriage return, ASCII~15. -\item[\Verb+\textbackslash t+] Tab, ASCII~11. -\item[\Verb+\textbackslash v+] Vertical tab, ASCII~13. -\item[\Verb+\textbackslash \{+] Open curly brace, ``\Verb+{+'', ASCII~173. -\item[\Verb+\textbackslash \}+] Close curly brace, ``\Verb+}+'', ASCII~175. -\end{itemize} - -As indicated above, if you want the expansion of a template to contain -a literal ``\Verb+\+'', ``\Verb+{+'', or ``\Verb+{+'' character, you -must escape it. - -\section{Filtering keywords to change their results} -\label{sec:template:filter} - -Some of the results of template expansion are not immediately easy to -use. Mercurial lets you specify an optional chain of \emph{filters} -to modify the result of expanding a keyword. You have already seen a -common filter, \tplkwfilt{date}{isodate}, in action above, to make a -date readable. - -Below is a list of the most commonly used filters that Mercurial -supports. While some filters can be applied to any text, others can -only be used in specific circumstances. The name of each filter is -followed first by an indication of where it can be used, then a -description of its effect. - -\begin{itemize} -\item[\tplfilter{addbreaks}] Any text. Add an XHTML ``\Verb+
+'' - tag before the end of every line except the last. For example, - ``\Verb+foo\nbar+'' becomes ``\Verb+foo
\nbar+''. -\item[\tplkwfilt{date}{age}] \tplkword{date} keyword. Render the - age of the date, relative to the current time. Yields a string like - ``\Verb+10 minutes+''. -\item[\tplfilter{basename}] Any text, but most useful for the - \tplkword{files} keyword and its relatives. Treat the text as a - path, and return the basename. For example, ``\Verb+foo/bar/baz+'' - becomes ``\Verb+baz+''. -\item[\tplkwfilt{date}{date}] \tplkword{date} keyword. Render a date - in a similar format to the Unix \tplkword{date} command, but with - timezone included. Yields a string like - ``\Verb+Mon Sep 04 15:13:13 2006 -0700+''. -\item[\tplkwfilt{author}{domain}] Any text, but most useful for the - \tplkword{author} keyword. Finds the first string that looks like - an email address, and extract just the domain component. For - example, ``\Verb+Bryan O'Sullivan +'' becomes - ``\Verb+serpentine.com+''. -\item[\tplkwfilt{author}{email}] Any text, but most useful for the - \tplkword{author} keyword. Extract the first string that looks like - an email address. For example, - ``\Verb+Bryan O'Sullivan +'' becomes - ``\Verb+bos@serpentine.com+''. -\item[\tplfilter{escape}] Any text. Replace the special XML/XHTML - characters ``\Verb+&+'', ``\Verb+<+'' and ``\Verb+>+'' with - XML entities. -\item[\tplfilter{fill68}] Any text. Wrap the text to fit in 68 - columns. This is useful before you pass text through the - \tplfilter{tabindent} filter, and still want it to fit in an - 80-column fixed-font window. -\item[\tplfilter{fill76}] Any text. Wrap the text to fit in 76 - columns. -\item[\tplfilter{firstline}] Any text. Yield the first line of text, - without any trailing newlines. -\item[\tplkwfilt{date}{hgdate}] \tplkword{date} keyword. Render the - date as a pair of readable numbers. Yields a string like - ``\Verb+1157407993 25200+''. -\item[\tplkwfilt{date}{isodate}] \tplkword{date} keyword. Render the - date as a text string in ISO~8601 format. 
Yields a string like - ``\Verb+2006-09-04 15:13:13 -0700+''. -\item[\tplfilter{obfuscate}] Any text, but most useful for the - \tplkword{author} keyword. Yield the input text rendered as a - sequence of XML entities. This helps to defeat some particularly - stupid screen-scraping email harvesting spambots. -\item[\tplkwfilt{author}{person}] Any text, but most useful for the - \tplkword{author} keyword. Yield the text before an email address. - For example, ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' - becomes ``\Verb+Bryan O'Sullivan+''. -\item[\tplkwfilt{date}{rfc822date}] \tplkword{date} keyword. Render a - date using the same format used in email headers. Yields a string - like ``\Verb+Mon, 04 Sep 2006 15:13:13 -0700+''. -\item[\tplkwfilt{node}{short}] Changeset hash. Yield the short form - of a changeset hash, i.e.~a 12-character hexadecimal string. -\item[\tplkwfilt{date}{shortdate}] \tplkword{date} keyword. Render - the year, month, and day of the date. Yields a string like - ``\Verb+2006-09-04+''. -\item[\tplfilter{strip}] Any text. Strip all leading and trailing - whitespace from the string. -\item[\tplfilter{tabindent}] Any text. Yield the text, with every line - except the first starting with a tab character. -\item[\tplfilter{urlescape}] Any text. Escape all characters that are - considered ``special'' by URL parsers. For example, \Verb+foo bar+ - becomes \Verb+foo%20bar+. -\item[\tplkwfilt{author}{user}] Any text, but most useful for the - \tplkword{author} keyword. Return the ``user'' portion of an email - address. For example, - ``\Verb+Bryan O'Sullivan <bos@serpentine.com>+'' becomes - ``\Verb+bos+''. -\end{itemize} - -\begin{figure} - \interaction{template.simple.manyfilters} - \caption{Template filters in action} - \label{fig:template:filters} -\end{figure} - -\begin{note} - If you try to apply a filter to a piece of data that it cannot - process, Mercurial will fail and print a Python exception.
For - example, trying to run the output of the \tplkword{desc} keyword - into the \tplkwfilt{date}{isodate} filter is not a good idea. -\end{note} - -\subsection{Combining filters} - -It is easy to combine filters to yield output in the form you would -like. The following chain of filters tidies up a description, then -makes sure that it fits cleanly into 68 columns, then indents it by a -further 8~characters (at least on Unix-like systems, where a tab is -conventionally 8~characters wide). - -\interaction{template.simple.combine} - -Note the use of ``\Verb+\t+'' (a tab character) in the template to -force the first line to be indented; this is necessary since -\tplkword{tabindent} indents all lines \emph{except} the first. - -Keep in mind that the order of filters in a chain is significant. The -first filter is applied to the result of the keyword; the second to -the result of the first filter; and so on. For example, using -\Verb+fill68|tabindent+ gives very different results from -\Verb+tabindent|fill68+. - - -\section{From templates to styles} - -A command line template provides a quick and simple way to format some -output. Templates can become verbose, though, and it's useful to be -able to give a template a name. A style file is a template with a -name, stored in a file. - -More than that, using a style file unlocks the power of Mercurial's -templating engine in ways that are not possible using the command line -\hgopt{log}{--template} option. - -\subsection{The simplest of style files} - -Our simple style file contains just one line: - -\interaction{template.simple.rev} - -This tells Mercurial, ``if you're printing a changeset, use the text -on the right as the template''. - -\subsection{Style file syntax} - -The syntax rules for a style file are simple. - -\begin{itemize} -\item The file is processed one line at a time. - -\item Leading and trailing white space are ignored. - -\item Empty lines are skipped. 
- -\item If a line starts with either of the characters ``\texttt{\#}'' or - ``\texttt{;}'', the entire line is treated as a comment, and skipped - as if empty. - -\item A line starts with a keyword. This must start with an - alphabetic character or underscore, and can subsequently contain any - alphanumeric character or underscore. (In regexp notation, a - keyword must match \Verb+[A-Za-z_][A-Za-z0-9_]*+.) - -\item The next element must be an ``\texttt{=}'' character, which can - be preceded or followed by an arbitrary amount of white space. - -\item If the rest of the line starts and ends with matching quote - characters (either single or double quote), it is treated as a - template body. - -\item If the rest of the line \emph{does not} start with a quote - character, it is treated as the name of a file; the contents of this - file will be read and used as a template body. -\end{itemize} - -\section{Style files by example} - -To illustrate how to write a style file, we will construct a few by -example. Rather than provide a complete style file and walk through -it, we'll mirror the usual process of developing a style file by -starting with something very simple, and walking through a series of -successively more complete examples. - -\subsection{Identifying mistakes in style files} - -If Mercurial encounters a problem in a style file you are working on, -it prints a terse error message that, once you figure out what it -means, is actually quite useful. - -\interaction{template.svnstyle.syntax.input} - -Notice that \filename{broken.style} attempts to define a -\texttt{changeset} keyword, but forgets to give any content for it. -When instructed to use this style file, Mercurial promptly complains. - -\interaction{template.svnstyle.syntax.error} - -This error message looks intimidating, but it is not too hard to -follow. - -\begin{itemize} -\item The first component is simply Mercurial's way of saying ``I am - giving up''. 
- \begin{codesample4} - \textbf{abort:} broken.style:1: parse error - \end{codesample4} - -\item Next comes the name of the style file that contains the error. - \begin{codesample4} - abort: \textbf{broken.style}:1: parse error - \end{codesample4} - -\item Following the file name is the line number where the error was - encountered. - \begin{codesample4} - abort: broken.style:\textbf{1}: parse error - \end{codesample4} - -\item Finally, a description of what went wrong. - \begin{codesample4} - abort: broken.style:1: \textbf{parse error} - \end{codesample4} - The description of the problem is not always clear (as in this - case), but even when it is cryptic, it is almost always trivial to - visually inspect the offending line in the style file and see what - is wrong. -\end{itemize} - -\subsection{Uniquely identifying a repository} - -If you would like to be able to identify a Mercurial repository -``fairly uniquely'' using a short string as an identifier, you can -use the first revision in the repository. -\interaction{template.svnstyle.id} -This is not guaranteed to be unique, but it is nevertheless useful in -many cases. -\begin{itemize} -\item It will not work in a completely empty repository, because such - a repository does not have a revision~zero. -\item Neither will it work in the (extremely rare) case where a - repository is a merge of two or more formerly independent - repositories, and you still have those repositories around. -\end{itemize} -Here are some uses to which you could put this identifier: -\begin{itemize} -\item As a key into a table for a database that manages repositories - on a server. -\item As half of a \{\emph{repository~ID}, \emph{revision~ID}\} tuple. - Save this information away when you run an automated build or other - activity, so that you can ``replay'' the build later if necessary. 
-\end{itemize} - -\subsection{Mimicking Subversion's output} - -Let's try to emulate the default output format used by another -revision control tool, Subversion. -\interaction{template.svnstyle.short} - -Since Subversion's output style is fairly simple, it is easy to -copy-and-paste a hunk of its output into a file, and replace the text -produced above by Subversion with the template values we'd like to see -expanded. -\interaction{template.svnstyle.template} - -There are a few small ways in which this template deviates from the -output produced by Subversion. -\begin{itemize} -\item Subversion prints a ``readable'' date (the ``\texttt{Wed, 27 Sep - 2006}'' in the example output above) in parentheses. Mercurial's - templating engine does not provide a way to display a date in this - format without also printing the time and time zone. -\item We emulate Subversion's printing of ``separator'' lines full of - ``\texttt{-}'' characters by ending the template with such a line. - We use the templating engine's \tplkword{header} keyword to print a - separator line as the first line of output (see below), thus - achieving similar output to Subversion. -\item Subversion's output includes a count in the header of the number - of lines in the commit message. We cannot replicate this in - Mercurial; the templating engine does not currently provide a filter - that counts the number of lines the template generates. -\end{itemize} -It took me no more than a minute or two of work to replace literal -text from an example of Subversion's output with some keywords and -filters to give the template above. The style file simply refers to -the template. -\interaction{template.svnstyle.style} - -We could have included the text of the template file directly in the -style file by enclosing it in quotes and replacing the newlines with -``\verb!\n!'' sequences, but it would have made the style file too -difficult to read. 
Readability is a good guide when you're trying to -decide whether some text belongs in a style file, or in a template -file that the style file points to. If the style file will look too -big or cluttered if you insert a literal piece of text, drop it into a -template instead. - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/tour-basic.tex --- a/en/tour-basic.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,624 +0,0 @@ -\chapter{A tour of Mercurial: the basics} -\label{chap:tour-basic} - -\section{Installing Mercurial on your system} -\label{sec:tour:install} - -Prebuilt binary packages of Mercurial are available for every popular -operating system. These make it easy to start using Mercurial on your -computer immediately. - -\subsection{Linux} - -Because each Linux distribution has its own packaging tools, policies, -and rate of development, it's difficult to give a comprehensive set of -instructions on how to install Mercurial binaries. The version of -Mercurial that you will end up with can vary depending on how active -the person is who maintains the package for your distribution. - -To keep things simple, I will focus on installing Mercurial from the -command line under the most popular Linux distributions. Most of -these distributions provide graphical package managers that will let -you install Mercurial with a single click; the package name to look -for is \texttt{mercurial}. - -\begin{itemize} -\item[Debian] - \begin{codesample4} - apt-get install mercurial - \end{codesample4} - -\item[Fedora Core] - \begin{codesample4} - yum install mercurial - \end{codesample4} - -\item[Gentoo] - \begin{codesample4} - emerge mercurial - \end{codesample4} - -\item[OpenSUSE] - \begin{codesample4} - yum install mercurial - \end{codesample4} - -\item[Ubuntu] Ubuntu's Mercurial package is based on Debian's. To - install it, run the following command. 
- \begin{codesample4} - apt-get install mercurial - \end{codesample4} - The Ubuntu package for Mercurial tends to lag behind the Debian - version by a considerable time margin (at the time of writing, seven - months), which in some cases will mean that on Ubuntu, you may run - into problems that have since been fixed in the Debian package. -\end{itemize} - -\subsection{Solaris} - -SunFreeWare, at \url{http://www.sunfreeware.com}, is a good source for a -large number of pre-built Solaris packages for 32 and 64 bit Intel and -Sparc architectures, including current versions of Mercurial. - -\subsection{Mac OS X} - -Lee Cantey publishes an installer of Mercurial for Mac OS~X at -\url{http://mercurial.berkwood.com}. This package works on both -Intel-~and Power-based Macs. Before you can use it, you must install -a compatible version of Universal MacPython~\cite{web:macpython}. This -is easy to do; simply follow the instructions on Lee's site. - -It's also possible to install Mercurial using Fink or MacPorts, -two popular free package managers for Mac OS X. If you have Fink, -use \command{sudo apt-get install mercurial-py25}. If MacPorts, -\command{sudo port install mercurial}. - -\subsection{Windows} - -Lee Cantey publishes an installer of Mercurial for Windows at -\url{http://mercurial.berkwood.com}. This package has no external -dependencies; it ``just works''. - -\begin{note} - The Windows version of Mercurial does not automatically convert line - endings between Windows and Unix styles. If you want to share work - with Unix users, you must do a little additional configuration - work. XXX Flesh this out. -\end{note} - -\section{Getting started} - -To begin, we'll use the \hgcmd{version} command to find out whether -Mercurial is actually installed properly. The actual version -information that it prints isn't so important; it's whether it prints -anything at all that we care about. 
-\interaction{tour.version} - -\subsection{Built-in help} - -Mercurial provides a built-in help system. This is invaluable for those -times when you find yourself stuck trying to remember how to run a -command. If you are completely stuck, simply run \hgcmd{help}; it -will print a brief list of commands, along with a description of what -each does. If you ask for help on a specific command (as below), it -prints more detailed information. -\interaction{tour.help} -For a more impressive level of detail (which you won't usually need) -run \hgcmdargs{help}{\hggopt{-v}}. The \hggopt{-v} option is short -for \hggopt{--verbose}, and tells Mercurial to print more information -than it usually would. - -\section{Working with a repository} - -In Mercurial, everything happens inside a \emph{repository}. The -repository for a project contains all of the files that ``belong to'' -that project, along with a historical record of the project's files. - -There's nothing particularly magical about a repository; it is simply -a directory tree in your filesystem that Mercurial treats as special. -You can rename or delete a repository any time you like, using either the -command line or your file browser. - -\subsection{Making a local copy of a repository} - -\emph{Copying} a repository is just a little bit special. While you -could use a normal file copying command to make a copy of a -repository, it's best to use a built-in command that Mercurial -provides. This command is called \hgcmd{clone}, because it creates an -identical copy of an existing repository. -\interaction{tour.clone} -If our clone succeeded, we should now have a local directory called -\dirname{hello}. This directory will contain some files. -\interaction{tour.ls} -These files have the same contents and history in our repository as -they do in the repository we cloned. - -Every Mercurial repository is complete, self-contained, and -independent. It contains its own private copy of a project's files -and history. 
A cloned repository remembers the location of the -repository it was cloned from, but it does not communicate with that -repository, or any other, unless you tell it to. - -What this means for now is that we're free to experiment with our -repository, safe in the knowledge that it's a private ``sandbox'' that -won't affect anyone else. - -\subsection{What's in a repository?} - -When we take a more detailed look inside a repository, we can see that -it contains a directory named \dirname{.hg}. This is where Mercurial -keeps all of its metadata for the repository. -\interaction{tour.ls-a} - -The contents of the \dirname{.hg} directory and its subdirectories are -private to Mercurial. Every other file and directory in the -repository is yours to do with as you please. - -To introduce a little terminology, the \dirname{.hg} directory is the -``real'' repository, and all of the files and directories that coexist -with it are said to live in the \emph{working directory}. An easy way -to remember the distinction is that the \emph{repository} contains the -\emph{history} of your project, while the \emph{working directory} -contains a \emph{snapshot} of your project at a particular point in -history. - -\section{A tour through history} - -One of the first things we might want to do with a new, unfamiliar -repository is understand its history. The \hgcmd{log} command gives -us a view of history. -\interaction{tour.log} -By default, this command prints a brief paragraph of output for each -change to the project that was recorded. In Mercurial terminology, we -call each of these recorded events a \emph{changeset}, because it can -contain a record of changes to several files. - -The fields in a record of output from \hgcmd{log} are as follows. -\begin{itemize} -\item[\texttt{changeset}] This field has the format of a number, - followed by a colon, followed by a hexadecimal string. These are - \emph{identifiers} for the changeset. 
There are two identifiers - because the number is shorter and easier to type than the hex - string. -\item[\texttt{user}] The identity of the person who created the - changeset. This is a free-form field, but it most often contains a - person's name and email address. -\item[\texttt{date}] The date and time on which the changeset was - created, and the timezone in which it was created. (The date and - time are local to that timezone; they display what time and date it - was for the person who created the changeset.) -\item[\texttt{summary}] The first line of the text message that the - creator of the changeset entered to describe the changeset. -\end{itemize} -The default output printed by \hgcmd{log} is purely a summary; it is -missing a lot of detail. - -Figure~\ref{fig:tour-basic:history} provides a graphical representation of -the history of the \dirname{hello} repository, to make it a little -easier to see which direction history is ``flowing'' in. We'll be -returning to this figure several times in this chapter and the chapter -that follows. - -\begin{figure}[ht] - \centering - \grafix{tour-history} - \caption{Graphical history of the \dirname{hello} repository} - \label{fig:tour-basic:history} -\end{figure} - -\subsection{Changesets, revisions, and talking to other - people} - -As English is a notoriously sloppy language, and computer science has -a hallowed history of terminological confusion (why use one term when -four will do?), revision control has a variety of words and phrases -that mean the same thing. If you are talking about Mercurial history -with other people, you will find that the word ``changeset'' is often -compressed to ``change'' or (when written) ``cset'', and sometimes a -changeset is referred to as a ``revision'' or a ``rev''. - -While it doesn't matter what \emph{word} you use to refer to the -concept of ``a~changeset'', the \emph{identifier} that you use to -refer to ``a~\emph{specific} changeset'' is of great importance. 
-Recall that the \texttt{changeset} field in the output from -\hgcmd{log} identifies a changeset using both a number and a -hexadecimal string. -\begin{itemize} -\item The revision number is \emph{only valid in that repository}, -\item while the hex string is the \emph{permanent, unchanging - identifier} that will always identify that exact changeset in - \emph{every} copy of the repository. -\end{itemize} -This distinction is important. If you send someone an email talking -about ``revision~33'', there's a high likelihood that their -revision~33 will \emph{not be the same} as yours. The reason for this -is that a revision number depends on the order in which changes -arrived in a repository, and there is no guarantee that the same -changes will happen in the same order in different repositories. -Three changes $a,b,c$ can easily appear in one repository as $0,1,2$, -while in another as $1,0,2$. - -Mercurial uses revision numbers purely as a convenient shorthand. If -you need to discuss a changeset with someone, or make a record of a -changeset for some other reason (for example, in a bug report), use -the hexadecimal identifier. - -\subsection{Viewing specific revisions} - -To narrow the output of \hgcmd{log} down to a single revision, use the -\hgopt{log}{-r} (or \hgopt{log}{--rev}) option. You can use either a -revision number or a long-form changeset identifier, and you can -provide as many revisions as you want. \interaction{tour.log-r} - -If you want to see the history of several revisions without having to -list each one, you can use \emph{range notation}; this lets you -express the idea ``I want all revisions between $a$ and $b$, -inclusive''. -\interaction{tour.log.range} -Mercurial also honours the order in which you specify revisions, so -\hgcmdargs{log}{-r 2:4} prints $2,3,4$ while \hgcmdargs{log}{-r 4:2} -prints $4,3,2$. 
- -\subsection{More detailed information} - -While the summary information printed by \hgcmd{log} is useful if you -already know what you're looking for, you may need to see a complete -description of the change, or a list of the files changed, if you're -trying to decide whether a changeset is the one you're looking for. -The \hgcmd{log} command's \hggopt{-v} (or \hggopt{--verbose}) -option gives you this extra detail. -\interaction{tour.log-v} - -If you want to see both the description and content of a change, add -the \hgopt{log}{-p} (or \hgopt{log}{--patch}) option. This displays -the content of a change as a \emph{unified diff} (if you've never seen -a unified diff before, see section~\ref{sec:mq:patch} for an overview). -\interaction{tour.log-vp} - -\section{All about command options} - -Let's take a brief break from exploring Mercurial commands to discuss -a pattern in the way that they work; you may find this useful to keep -in mind as we continue our tour. - -Mercurial has a consistent and straightforward approach to dealing -with the options that you can pass to commands. It follows the -conventions for options that are common to modern Linux and Unix -systems. -\begin{itemize} -\item Every option has a long name. For example, as we've already - seen, the \hgcmd{log} command accepts a \hgopt{log}{--rev} option. -\item Most options have short names, too. Instead of - \hgopt{log}{--rev}, we can use \hgopt{log}{-r}. (The reason that - some options don't have short names is that the options in question - are rarely used.) -\item Long options start with two dashes (e.g.~\hgopt{log}{--rev}), - while short options start with one (e.g.~\hgopt{log}{-r}). -\item Option naming and usage is consistent across commands. For - example, every command that lets you specify a changeset~ID or - revision number accepts both \hgopt{log}{-r} and \hgopt{log}{--rev} - arguments. -\end{itemize} -In the examples throughout this book, I use short options instead of -long. 
This just reflects my own preference, so don't read anything -significant into it. - -Most commands that print output of some kind will print more output -when passed a \hggopt{-v} (or \hggopt{--verbose}) option, and less -when passed \hggopt{-q} (or \hggopt{--quiet}). - -\section{Making and reviewing changes} - -Now that we have a grasp of viewing history in Mercurial, let's take a -look at making some changes and examining them. - -The first thing we'll do is isolate our experiment in a repository of -its own. We use the \hgcmd{clone} command, but we don't need to -clone a copy of the remote repository. Since we already have a copy -of it locally, we can just clone that instead. This is much faster -than cloning over the network, and cloning a local repository uses -less disk space in most cases, too. -\interaction{tour.reclone} -As an aside, it's often good practice to keep a ``pristine'' copy of a -remote repository around, which you can then make temporary clones of -to create sandboxes for each task you want to work on. This lets you -work on multiple tasks in parallel, each isolated from the others -until it's complete and you're ready to integrate it back. Because -local clones are so cheap, there's almost no overhead to cloning and -destroying repositories whenever you want. - -In our \dirname{my-hello} repository, we have a file -\filename{hello.c} that contains the classic ``hello, world'' program. -Let's use the ancient and venerable \command{sed} command to edit this -file so that it prints a second line of output. (I'm only using -\command{sed} to do this because it's easy to write a scripted example -this way. Since you're not under the same constraint, you probably -won't want to use \command{sed}; simply use your preferred text editor to -do the same thing.) -\interaction{tour.sed} - -Mercurial's \hgcmd{status} command will tell us what Mercurial knows -about the files in the repository. 
-\interaction{tour.status} -The \hgcmd{status} command prints no output for some files, but a line -starting with ``\texttt{M}'' for \filename{hello.c}. Unless you tell -it to, \hgcmd{status} will not print any output for files that have -not been modified. - -The ``\texttt{M}'' indicates that Mercurial has noticed that we -modified \filename{hello.c}. We didn't need to \emph{inform} -Mercurial that we were going to modify the file before we started, or -that we had modified the file after we were done; it was able to -figure this out itself. - -It's a little bit helpful to know that we've modified -\filename{hello.c}, but we might prefer to know exactly \emph{what} -changes we've made to it. To do this, we use the \hgcmd{diff} -command. -\interaction{tour.diff} - -\section{Recording changes in a new changeset} - -We can modify files, build and test our changes, and use -\hgcmd{status} and \hgcmd{diff} to review our changes, until we're -satisfied with what we've done and arrive at a natural stopping point -where we want to record our work in a new changeset. - -The \hgcmd{commit} command lets us create a new changeset; we'll -usually refer to this as ``making a commit'' or ``committing''. - -\subsection{Setting up a username} - -When you try to run \hgcmd{commit} for the first time, it is not -guaranteed to succeed. Mercurial records your name and address with -each change that you commit, so that you and others will later be able -to tell who made each change. Mercurial tries to automatically figure -out a sensible username to commit the change with. It will attempt -each of the following methods, in order: -\begin{enumerate} -\item If you specify a \hgopt{commit}{-u} option to the \hgcmd{commit} - command on the command line, followed by a username, this is always - given the highest precedence. -\item If you have set the \envar{HGUSER} environment variable, this is - checked next. 
-\item If you create a file in your home directory called - \sfilename{.hgrc}, with a \rcitem{ui}{username} entry, that will be - used next. To see what the contents of this file should look like, - refer to section~\ref{sec:tour-basic:username} below. -\item If you have set the \envar{EMAIL} environment variable, this - will be used next. -\item Mercurial will query your system to find out your local user - name and host name, and construct a username from these components. - Since this often results in a username that is not very useful, it - will print a warning if it has to do this. -\end{enumerate} -If all of these mechanisms fail, Mercurial will fail, printing an -error message. In this case, it will not let you commit until you set -up a username. - -You should think of the \envar{HGUSER} environment variable and the -\hgopt{commit}{-u} option to the \hgcmd{commit} command as ways to -\emph{override} Mercurial's default selection of username. For normal -use, the simplest and most robust way to set a username for yourself -is by creating a \sfilename{.hgrc} file; see below for details. - -\subsubsection{Creating a Mercurial configuration file} -\label{sec:tour-basic:username} - -To set a user name, use your favourite editor to create a file called -\sfilename{.hgrc} in your home directory. Mercurial will use this -file to look up your personalised configuration settings. The initial -contents of your \sfilename{.hgrc} should look like this. -\begin{codesample2} - # This is a Mercurial configuration file. - [ui] - username = Firstname Lastname -\end{codesample2} -The ``\texttt{[ui]}'' line begins a \emph{section} of the config file, -so you can read the ``\texttt{username = ...}'' line as meaning ``set -the value of the \texttt{username} item in the \texttt{ui} section''. -A section continues until a new section begins, or the end of the -file. Mercurial ignores empty lines and treats any text from -``\texttt{\#}'' to the end of a line as a comment. 
- -\subsubsection{Choosing a user name} - -You can use any text you like as the value of the \texttt{username} -config item, since this information is for reading by other people, -not for interpreting by Mercurial. The convention that most people -follow is to use their name and email address, as in the example -above. - -\begin{note} - Mercurial's built-in web server obfuscates email addresses, to make - it more difficult for the email harvesting tools that spammers use. - This reduces the likelihood that you'll start receiving more junk - email if you publish a Mercurial repository on the web. -\end{note} - -\subsection{Writing a commit message} - -When we commit a change, Mercurial drops us into a text editor, to -enter a message that will describe the modifications we've made in -this changeset. This is called the \emph{commit message}. It will be -a record for readers of what we did and why, and it will be printed by -\hgcmd{log} after we've finished committing. -\interaction{tour.commit} - -The editor that the \hgcmd{commit} command drops us into will contain -an empty line, followed by a number of lines starting with -``\texttt{HG:}''. -\begin{codesample2} - \emph{empty line} - HG: changed hello.c -\end{codesample2} -Mercurial ignores the lines that start with ``\texttt{HG:}''; it uses -them only to tell us which files it's recording changes to. Modifying -or deleting these lines has no effect. - -\subsection{Writing a good commit message} - -Since \hgcmd{log} only prints the first line of a commit message by -default, it's best to write a commit message whose first line stands -alone. Here's a real example of a commit message that \emph{doesn't} -follow this guideline, and hence has a summary that is not readable. -\begin{codesample2} - changeset: 73:584af0e231be - user: Censored Person - date: Tue Sep 26 21:37:07 2006 -0700 - summary: include buildmeister/commondefs.
Add an exports and install -\end{codesample2} - -As far as the remainder of the contents of the commit message are -concerned, there are no hard-and-fast rules. Mercurial itself doesn't -interpret or care about the contents of the commit message, though -your project may have policies that dictate a certain kind of -formatting. - -My personal preference is for short, but informative, commit messages -that tell me something that I can't figure out with a quick glance at -the output of \hgcmdargs{log}{--patch}. - -\subsection{Aborting a commit} - -If you decide that you don't want to commit while in the middle of -editing a commit message, simply exit from your editor without saving -the file that it's editing. This will cause nothing to happen to -either the repository or the working directory. - -If we run the \hgcmd{commit} command without any arguments, it records -all of the changes we've made, as reported by \hgcmd{status} and -\hgcmd{diff}. - -\subsection{Admiring our new handiwork} - -Once we've finished the commit, we can use the \hgcmd{tip} command to -display the changeset we just created. This command produces output -that is identical to \hgcmd{log}, but it only displays the newest -revision in the repository. -\interaction{tour.tip} -We refer to the newest revision in the repository as the tip revision, -or simply the tip. - -\section{Sharing changes} - -We mentioned earlier that repositories in Mercurial are -self-contained. This means that the changeset we just created exists -only in our \dirname{my-hello} repository. Let's look at a few ways -that we can propagate this change into other repositories. - -\subsection{Pulling changes from another repository} -\label{sec:tour:pull} - -To get started, let's clone our original \dirname{hello} repository, -which does not contain the change we just committed. We'll call our -temporary repository \dirname{hello-pull}. 
-\interaction{tour.clone-pull} - -We'll use the \hgcmd{pull} command to bring changes from -\dirname{my-hello} into \dirname{hello-pull}. However, blindly -pulling unknown changes into a repository is a somewhat scary -prospect. Mercurial provides the \hgcmd{incoming} command to tell us -what changes the \hgcmd{pull} command \emph{would} pull into the -repository, without actually pulling the changes in. -\interaction{tour.incoming} -(Of course, someone could cause more changesets to appear in the -repository that we ran \hgcmd{incoming} in, before we get a chance to -\hgcmd{pull} the changes, so that we could end up pulling changes that we -didn't expect.) - -Bringing changes into a repository is a simple matter of running the -\hgcmd{pull} command, and telling it which repository to pull from. -\interaction{tour.pull} -As you can see from the before-and-after output of \hgcmd{tip}, we -have successfully pulled changes into our repository. There remains -one step before we can see these changes in the working directory. - -\subsection{Updating the working directory} - -We have so far glossed over the relationship between a repository and -its working directory. The \hgcmd{pull} command that we ran in -section~\ref{sec:tour:pull} brought changes into the repository, but -if we check, there's no sign of those changes in the working -directory. This is because \hgcmd{pull} does not (by default) touch -the working directory. Instead, we use the \hgcmd{update} command to -do this. -\interaction{tour.update} - -It might seem a bit strange that \hgcmd{pull} doesn't update the -working directory automatically. There's actually a good reason for -this: you can use \hgcmd{update} to update the working directory to -the state it was in at \emph{any revision} in the history of the -repository. 
If you had the working directory updated to an old -revision---to hunt down the origin of a bug, say---and ran a -\hgcmd{pull} which automatically updated the working directory to a -new revision, you might not be terribly happy. - -However, since pull-then-update is such a common thing to do, -Mercurial lets you combine the two by passing the \hgopt{pull}{-u} -option to \hgcmd{pull}. -\begin{codesample2} - hg pull -u -\end{codesample2} -If you look back at the output of \hgcmd{pull} in -section~\ref{sec:tour:pull} when we ran it without \hgopt{pull}{-u}, -you can see that it printed a helpful reminder that we'd have to take -an explicit step to update the working directory: -\begin{codesample2} - (run 'hg update' to get a working copy) -\end{codesample2} - -To find out what revision the working directory is at, use the -\hgcmd{parents} command. -\interaction{tour.parents} -If you look back at figure~\ref{fig:tour-basic:history}, you'll see -arrows connecting each changeset. The node that the arrow leads -\emph{from} in each case is a parent, and the node that the arrow -leads \emph{to} is its child. The working directory has a parent in -just the same way; this is the changeset that the working directory -currently contains. - -To update the working directory to a particular revision, give a -revision number or changeset~ID to the \hgcmd{update} command. -\interaction{tour.older} -If you omit an explicit revision, \hgcmd{update} will update to the -tip revision, as shown by the second call to \hgcmd{update} in the -example above. - -\subsection{Pushing changes to another repository} - -Mercurial lets us push changes to another repository, from the -repository we're currently visiting. As with the example of -\hgcmd{pull} above, we'll create a temporary repository to push our -changes into. -\interaction{tour.clone-push} -The \hgcmd{outgoing} command tells us what changes would be pushed -into another repository. 
-\interaction{tour.outgoing} -And the \hgcmd{push} command does the actual push. -\interaction{tour.push} -As with \hgcmd{pull}, the \hgcmd{push} command does not update the -working directory in the repository that it's pushing changes into. -(Unlike \hgcmd{pull}, \hgcmd{push} does not provide a \texttt{-u} -option that updates the other repository's working directory.) - -What happens if we try to pull or push changes and the receiving -repository already has those changes? Nothing too exciting. -\interaction{tour.push.nothing} - -\subsection{Sharing changes over a network} - -The commands we have covered in the previous few sections are not -limited to working with local repositories. Each works in exactly the -same fashion over a network connection; simply pass in a URL instead -of a local path. -\interaction{tour.outgoing.net} -In this example, we can see what changes we could push to the remote -repository, but the repository is understandably not set up to let -anonymous users push to it. -\interaction{tour.push.net} - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/tour-merge.tex --- a/en/tour-merge.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,283 +0,0 @@ -\chapter{A tour of Mercurial: merging work} -\label{chap:tour-merge} - -We've now covered cloning a repository, making changes in a -repository, and pulling or pushing changes from one repository into -another. Our next step is \emph{merging} changes from separate -repositories. - -\section{Merging streams of work} - -Merging is a fundamental part of working with a distributed revision -control tool. -\begin{itemize} -\item Alice and Bob each have a personal copy of a repository for a - project they're collaborating on. Alice fixes a bug in her - repository; Bob adds a new feature in his. They want the shared - repository to contain both the bug fix and the new feature. 
-\item I frequently work on several different tasks for a single - project at once, each safely isolated in its own repository. - Working this way means that I often need to merge one piece of my - own work with another. -\end{itemize} - -Because merging is such a common thing to need to do, Mercurial makes -it easy. Let's walk through the process. We'll begin by cloning yet -another repository (see how often they spring up?) and making a change -in it. -\interaction{tour.merge.clone} -We should now have two copies of \filename{hello.c} with different -contents. The histories of the two repositories have also diverged, -as illustrated in figure~\ref{fig:tour-merge:sep-repos}. -\interaction{tour.merge.cat} - -\begin{figure}[ht] - \centering - \grafix{tour-merge-sep-repos} - \caption{Divergent recent histories of the \dirname{my-hello} and - \dirname{my-new-hello} repositories} - \label{fig:tour-merge:sep-repos} -\end{figure} - -We already know that pulling changes from our \dirname{my-hello} -repository will have no effect on the working directory. -\interaction{tour.merge.pull} -However, the \hgcmd{pull} command says something about ``heads''. - -\subsection{Head changesets} - -A head is a change that has no descendants, or children, as they're -also known. The tip revision is thus a head, because the newest -revision in a repository doesn't have any children, but a repository -can contain more than one head. - -\begin{figure}[ht] - \centering - \grafix{tour-merge-pull} - \caption{Repository contents after pulling from \dirname{my-hello} into - \dirname{my-new-hello}} - \label{fig:tour-merge:pull} -\end{figure} - -In figure~\ref{fig:tour-merge:pull}, you can see the effect of the -pull from \dirname{my-hello} into \dirname{my-new-hello}. The history -that was already present in \dirname{my-new-hello} is untouched, but a -new revision has been added. 
By referring to -figure~\ref{fig:tour-merge:sep-repos}, we can see that the -\emph{changeset ID} remains the same in the new repository, but the -\emph{revision number} has changed. (This, incidentally, is a fine -example of why it's not safe to use revision numbers when discussing -changesets.) We can view the heads in a repository using the -\hgcmd{heads} command. -\interaction{tour.merge.heads} - -\subsection{Performing the merge} - -What happens if we try to use the normal \hgcmd{update} command to -update to the new tip? -\interaction{tour.merge.update} -Mercurial is telling us that the \hgcmd{update} command won't do a -merge; it won't update the working directory when it thinks we might -be wanting to do a merge, unless we force it to do so. Instead, we -use the \hgcmd{merge} command to merge the two heads. -\interaction{tour.merge.merge} - -\begin{figure}[ht] - \centering - \grafix{tour-merge-merge} - \caption{Working directory and repository during merge, and - following commit} - \label{fig:tour-merge:merge} -\end{figure} - -This updates the working directory so that it contains changes from -\emph{both} heads, which is reflected in both the output of -\hgcmd{parents} and the contents of \filename{hello.c}. -\interaction{tour.merge.parents} - -\subsection{Committing the results of the merge} - -Whenever we've done a merge, \hgcmd{parents} will display two parents -until we \hgcmd{commit} the results of the merge. -\interaction{tour.merge.commit} -We now have a new tip revision; notice that it has \emph{both} of -our former heads as its parents. These are the same revisions that -were previously displayed by \hgcmd{parents}. -\interaction{tour.merge.tip} -In figure~\ref{fig:tour-merge:merge}, you can see a representation of -what happens to the working directory during the merge, and how this -affects the repository when the commit happens. 
During the merge, the -working directory has two parent changesets, and these become the -parents of the new changeset. - -\section{Merging conflicting changes} - -Most merges are simple affairs, but sometimes you'll find yourself -merging changes where each modifies the same portions of the same -files. Unless both modifications are identical, this results in a -\emph{conflict}, where you have to decide how to reconcile the -different changes into something coherent. - -\begin{figure}[ht] - \centering - \grafix{tour-merge-conflict} - \caption{Conflicting changes to a document} - \label{fig:tour-merge:conflict} -\end{figure} - -Figure~\ref{fig:tour-merge:conflict} illustrates an instance of two -conflicting changes to a document. We started with a single version -of the file; then we made some changes; while someone else made -different changes to the same text. Our task in resolving the -conflicting changes is to decide what the file should look like. - -Mercurial doesn't have a built-in facility for handling conflicts. -Instead, it runs an external program called \command{hgmerge}. This -is a shell script that is bundled with Mercurial; you can change it to -behave however you please. What it does by default is try to find one -of several different merging tools that are likely to be installed on -your system. It first tries a few fully automatic merging tools; if -these don't succeed (because the resolution process requires human -guidance) or aren't present, the script tries a few different -graphical merging tools. - -It's also possible to get Mercurial to run another program or script -instead of \command{hgmerge}, by setting the \envar{HGMERGE} -environment variable to the name of your preferred program. - -\subsection{Using a graphical merge tool} - -My preferred graphical merge tool is \command{kdiff3}, which I'll use -to describe the features that are common to graphical file merging -tools. 
You can see a screenshot of \command{kdiff3} in action in -figure~\ref{fig:tour-merge:kdiff3}. The kind of merge it is -performing is called a \emph{three-way merge}, because there are three -different versions of the file of interest to us. The tool thus -splits the upper portion of the window into three panes: -\begin{itemize} -\item At the left is the \emph{base} version of the file, i.e.~the - most recent version from which the two versions we're trying to - merge are descended. -\item In the middle is ``our'' version of the file, with the contents - that we modified. -\item On the right is ``their'' version of the file, the one - from the changeset that we're trying to merge with. -\end{itemize} -In the pane below these is the current \emph{result} of the merge. -Our task is to replace all of the red text, which indicates unresolved -conflicts, with some sensible merger of the ``ours'' and ``theirs'' -versions of the file. - -All four of these panes are \emph{locked together}; if we scroll -vertically or horizontally in any of them, the others are updated to -display the corresponding sections of their respective files. - -\begin{figure}[ht] - \centering - \grafix{kdiff3} - \caption{Using \command{kdiff3} to merge versions of a file} - \label{fig:tour-merge:kdiff3} -\end{figure} - -For each conflicting portion of the file, we can choose to resolve -the conflict using some combination of text from the base version, -ours, or theirs. We can also manually edit the merged file at any -time, in case we need to make further modifications. - -There are \emph{many} file merging tools available, too many to cover -here. They vary in which platforms they are available for, and in -their particular strengths and weaknesses. Most are tuned for merging -files containing plain text, while a few are aimed at specialised file -formats (generally XML).
- -\subsection{A worked example} - -In this example, we will reproduce the file modification history of -figure~\ref{fig:tour-merge:conflict} above. Let's begin by creating a -repository with a base version of our document. -\interaction{tour-merge-conflict.wife} -We'll clone the repository and make a change to the file. -\interaction{tour-merge-conflict.cousin} -And another clone, to simulate someone else making a change to the -file. (This hints at the idea that it's not all that unusual to merge -with yourself when you isolate tasks in separate repositories, and -indeed to find and resolve conflicts while doing so.) -\interaction{tour-merge-conflict.son} -Having created two different versions of the file, we'll set up an -environment suitable for running our merge. -\interaction{tour-merge-conflict.pull} - -In this example, I won't use Mercurial's normal \command{hgmerge} -program to do the merge, because it would drop my nice automated -example-running tool into a graphical user interface. Instead, I'll -set \envar{HGMERGE} to tell Mercurial to use the non-interactive -\command{merge} command. This is bundled with many Unix-like systems. -If you're following this example on your computer, don't bother -setting \envar{HGMERGE}. -\interaction{tour-merge-conflict.merge} -Because \command{merge} can't resolve the conflicting changes, it -leaves \emph{merge markers} inside the file that has conflicts, -indicating which lines have conflicts, and whether they came from our -version of the file or theirs. - -Mercurial can tell from the way \command{merge} exits that it wasn't -able to merge successfully, so it tells us what commands we'll need to -run if we want to redo the merging operation. This could be useful -if, for example, we were running a graphical merge tool and quit -because we were confused or realised we had made a mistake. 
- -If automatic or manual merges fail, there's nothing to prevent us from -``fixing up'' the affected files ourselves, and committing the results -of our merge: -\interaction{tour-merge-conflict.commit} - -\section{Simplifying the pull-merge-commit sequence} -\label{sec:tour-merge:fetch} - -The process of merging changes as outlined above is straightforward, -but requires running three commands in sequence. -\begin{codesample2} - hg pull - hg merge - hg commit -m 'Merged remote changes' -\end{codesample2} -In the case of the final commit, you also need to enter a commit -message, which is almost always going to be a piece of uninteresting -``boilerplate'' text. - -It would be nice to reduce the number of steps needed, if this were -possible. Indeed, Mercurial is distributed with an extension called -\hgext{fetch} that does just this. - -Mercurial provides a flexible extension mechanism that lets people -extend its functionality, while keeping the core of Mercurial small -and easy to deal with. Some extensions add new commands that you can -use from the command line, while others work ``behind the scenes,'' -for example adding capabilities to the server. - -The \hgext{fetch} extension adds a new command called, not -surprisingly, \hgcmd{fetch}. This extension acts as a combination of -\hgcmd{pull}, \hgcmd{update} and \hgcmd{merge}. It begins by pulling -changes from another repository into the current repository. If it -finds that the changes added a new head to the repository, it begins a -merge, then commits the result of the merge with an -automatically-generated commit message. If no new heads were added, -it updates the working directory to the new tip changeset. - -Enabling the \hgext{fetch} extension is easy. Edit your -\sfilename{.hgrc}, and either go to the \rcsection{extensions} section -or create an \rcsection{extensions} section. Then add a line that -simply reads ``\Verb+fetch +''. 
-\begin{codesample2} - [extensions] - fetch = -\end{codesample2} -(Normally, on the right-hand side of the ``\texttt{=}'' would appear -the location of the extension, but since the \hgext{fetch} extension -is in the standard distribution, Mercurial knows where to search for -it.) - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: diff -r bc14f94e726a -r 5cd47f721686 en/undo.tex --- a/en/undo.tex Thu Jan 29 22:47:34 2009 -0800 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,767 +0,0 @@ -\chapter{Finding and fixing your mistakes} -\label{chap:undo} - -To err might be human, but to really handle the consequences well -takes a top-notch revision control system. In this chapter, we'll -discuss some of the techniques you can use when you find that a -problem has crept into your project. Mercurial has some highly -capable features that will help you to isolate the sources of -problems, and to handle them appropriately. - -\section{Erasing local history} - -\subsection{The accidental commit} - -I have the occasional but persistent problem of typing rather more -quickly than I can think, which sometimes results in me committing a -changeset that is either incomplete or plain wrong. In my case, the -usual kind of incomplete changeset is one in which I've created a new -source file, but forgotten to \hgcmd{add} it. A ``plain wrong'' -changeset is not as common, but no less annoying. - -\subsection{Rolling back a transaction} -\label{sec:undo:rollback} - -In section~\ref{sec:concepts:txn}, I mentioned that Mercurial treats -each modification of a repository as a \emph{transaction}. Every time -you commit a changeset or pull changes from another repository, -Mercurial remembers what you did. You can undo, or \emph{roll back}, -exactly one of these actions using the \hgcmd{rollback} command. (See -section~\ref{sec:undo:rollback-after-push} for an important caveat -about the use of this command.) 
- -Here's a mistake that I often find myself making: committing a change -in which I've created a new file, but forgotten to \hgcmd{add} it. -\interaction{rollback.commit} -Looking at the output of \hgcmd{status} after the commit immediately -confirms the error. -\interaction{rollback.status} -The commit captured the changes to the file \filename{a}, but not the -new file \filename{b}. If I were to push this changeset to a -repository that I shared with a colleague, the chances are high that -something in \filename{a} would refer to \filename{b}, which would not -be present in their repository when they pulled my changes. I would -thus become the object of some indignation. - -However, luck is with me---I've caught my error before I pushed the -changeset. I use the \hgcmd{rollback} command, and Mercurial makes -that last changeset vanish. -\interaction{rollback.rollback} -Notice that the changeset is no longer present in the repository's -history, and the working directory once again thinks that the file -\filename{a} is modified. The commit and rollback have left the -working directory exactly as it was prior to the commit; the changeset -has been completely erased. I can now safely \hgcmd{add} the file -\filename{b}, and rerun my commit. -\interaction{rollback.add} - -\subsection{The erroneous pull} - -It's common practice with Mercurial to maintain separate development -branches of a project in different repositories. Your development -team might have one shared repository for your project's ``0.9'' -release, and another, containing different changes, for the ``1.0'' -release. - -Given this, you can imagine that the consequences could be messy if -you had a local ``0.9'' repository, and accidentally pulled changes -from the shared ``1.0'' repository into it. At worst, you could be -paying insufficient attention, and push those changes into the shared -``0.9'' tree, confusing your entire team (but don't worry, we'll -return to this horror scenario later). 
However, it's more likely that
-you'll notice immediately, because Mercurial will display the URL it's
-pulling from, or you will see it pull a suspiciously large number of
-changes into the repository.
-
-The \hgcmd{rollback} command will work nicely to expunge all of the
-changesets that you just pulled. Mercurial groups all changes from
-one \hgcmd{pull} into a single transaction, so one \hgcmd{rollback} is
-all you need to undo this mistake.
-
-\subsection{Rolling back is useless once you've pushed}
-\label{sec:undo:rollback-after-push}
-
-The value of the \hgcmd{rollback} command drops to zero once you've
-pushed your changes to another repository. Rolling back a change
-makes it disappear entirely, but \emph{only} in the repository in
-which you perform the \hgcmd{rollback}. Because a rollback eliminates
-history, there's no way for the disappearance of a change to propagate
-between repositories.
-
-If you've pushed a change to another repository---particularly if it's
-a shared repository---it has essentially ``escaped into the wild,''
-and you'll have to recover from your mistake in a different way. What
-will happen if you push a changeset somewhere, then roll it back, then
-pull from the repository you pushed to, is that the changeset will
-reappear in your repository.
-
-(If you absolutely know for sure that the change you want to roll back
-is the most recent change in the repository that you pushed to,
-\emph{and} you know that nobody else could have pulled it from that
-repository, you can roll back the changeset there, too, but you really
-should not rely on this working reliably. If you do this,
-sooner or later a change really will make it into a repository that
-you don't directly control (or have forgotten about), and come back to
-bite you.)
-
-\subsection{You can only roll back once}
-
-Mercurial stores exactly one transaction in its transaction log; that
-transaction is the most recent one that occurred in the repository.
-This means that you can only roll back one transaction. If you expect -to be able to roll back one transaction, then its predecessor, this is -not the behaviour you will get. -\interaction{rollback.twice} -Once you've rolled back one transaction in a repository, you can't -roll back again in that repository until you perform another commit or -pull. - -\section{Reverting the mistaken change} - -If you make a modification to a file, and decide that you really -didn't want to change the file at all, and you haven't yet committed -your changes, the \hgcmd{revert} command is the one you'll need. It -looks at the changeset that's the parent of the working directory, and -restores the contents of the file to their state as of that changeset. -(That's a long-winded way of saying that, in the normal case, it -undoes your modifications.) - -Let's illustrate how the \hgcmd{revert} command works with yet another -small example. We'll begin by modifying a file that Mercurial is -already tracking. -\interaction{daily.revert.modify} -If we don't want that change, we can simply \hgcmd{revert} the file. -\interaction{daily.revert.unmodify} -The \hgcmd{revert} command provides us with an extra degree of safety -by saving our modified file with a \filename{.orig} extension. -\interaction{daily.revert.status} - -Here is a summary of the cases that the \hgcmd{revert} command can -deal with. We will describe each of these in more detail in the -section that follows. -\begin{itemize} -\item If you modify a file, it will restore the file to its unmodified - state. -\item If you \hgcmd{add} a file, it will undo the ``added'' state of - the file, but leave the file itself untouched. -\item If you delete a file without telling Mercurial, it will restore - the file to its unmodified contents. -\item If you use the \hgcmd{remove} command to remove a file, it will - undo the ``removed'' state of the file, and restore the file to its - unmodified contents. 
-\end{itemize} - -\subsection{File management errors} -\label{sec:undo:mgmt} - -The \hgcmd{revert} command is useful for more than just modified -files. It lets you reverse the results of all of Mercurial's file -management commands---\hgcmd{add}, \hgcmd{remove}, and so on. - -If you \hgcmd{add} a file, then decide that in fact you don't want -Mercurial to track it, use \hgcmd{revert} to undo the add. Don't -worry; Mercurial will not modify the file in any way. It will just -``unmark'' the file. -\interaction{daily.revert.add} - -Similarly, if you ask Mercurial to \hgcmd{remove} a file, you can use -\hgcmd{revert} to restore it to the contents it had as of the parent -of the working directory. -\interaction{daily.revert.remove} -This works just as well for a file that you deleted by hand, without -telling Mercurial (recall that in Mercurial terminology, this kind of -file is called ``missing''). -\interaction{daily.revert.missing} - -If you revert a \hgcmd{copy}, the copied-to file remains in your -working directory afterwards, untracked. Since a copy doesn't affect -the copied-from file in any way, Mercurial doesn't do anything with -the copied-from file. -\interaction{daily.revert.copy} - -\subsubsection{A slightly special case: reverting a rename} - -If you \hgcmd{rename} a file, there is one small detail that -you should remember. When you \hgcmd{revert} a rename, it's not -enough to provide the name of the renamed-to file, as you can see -here. -\interaction{daily.revert.rename} -As you can see from the output of \hgcmd{status}, the renamed-to file -is no longer identified as added, but the renamed-\emph{from} file is -still removed! This is counter-intuitive (at least to me), but at -least it's easy to deal with. -\interaction{daily.revert.rename-orig} -So remember, to revert a \hgcmd{rename}, you must provide \emph{both} -the source and destination names. - -% TODO: the output doesn't look like it will be removed! 
-
-(By the way, if you rename a file, then modify the renamed-to file,
-then revert both components of the rename, when Mercurial restores the
-file that was removed as part of the rename, it will be unmodified.
-If you need the modifications in the renamed-to file to show up in the
-renamed-from file, don't forget to copy them over.)
-
-These fiddly aspects of reverting a rename arguably constitute a small
-bug in Mercurial.
-
-\section{Dealing with committed changes}
-
-Consider a case where you have committed a change $a$, and another
-change $b$ on top of it; you then realise that change $a$ was
-incorrect. Mercurial lets you ``back out'' an entire changeset
-automatically, and provides building blocks that let you reverse part of a
-changeset by hand.
-
-Before you read this section, here's something to keep in mind: the
-\hgcmd{backout} command undoes changes by \emph{adding} history, not
-by modifying or erasing it. It's the right tool to use if you're
-fixing bugs, but not if you're trying to undo some change that has
-catastrophic consequences. To deal with those, see
-section~\ref{sec:undo:aaaiiieee}.
-
-\subsection{Backing out a changeset}
-
-The \hgcmd{backout} command lets you ``undo'' the effects of an entire
-changeset in an automated fashion. Because Mercurial's history is
-immutable, this command \emph{does not} get rid of the changeset you
-want to undo. Instead, it creates a new changeset that
-\emph{reverses} the effect of the to-be-undone changeset.
-
-The operation of the \hgcmd{backout} command is a little intricate, so
-let's illustrate it with some examples. First, we'll create a
-repository with some simple changes.
-\interaction{backout.init}
-
-The \hgcmd{backout} command takes a single changeset ID as its
-argument; this is the changeset to back out. Normally,
-\hgcmd{backout} will drop you into a text editor to write a commit
-message, so you can record why you're backing the change out.
In this -example, we provide a commit message on the command line using the -\hgopt{backout}{-m} option. - -\subsection{Backing out the tip changeset} - -We're going to start by backing out the last changeset we committed. -\interaction{backout.simple} -You can see that the second line from \filename{myfile} is no longer -present. Taking a look at the output of \hgcmd{log} gives us an idea -of what the \hgcmd{backout} command has done. -\interaction{backout.simple.log} -Notice that the new changeset that \hgcmd{backout} has created is a -child of the changeset we backed out. It's easier to see this in -figure~\ref{fig:undo:backout}, which presents a graphical view of the -change history. As you can see, the history is nice and linear. - -\begin{figure}[htb] - \centering - \grafix{undo-simple} - \caption{Backing out a change using the \hgcmd{backout} command} - \label{fig:undo:backout} -\end{figure} - -\subsection{Backing out a non-tip change} - -If you want to back out a change other than the last one you -committed, pass the \hgopt{backout}{--merge} option to the -\hgcmd{backout} command. -\interaction{backout.non-tip.clone} -This makes backing out any changeset a ``one-shot'' operation that's -usually simple and fast. -\interaction{backout.non-tip.backout} - -If you take a look at the contents of \filename{myfile} after the -backout finishes, you'll see that the first and third changes are -present, but not the second. -\interaction{backout.non-tip.cat} - -As the graphical history in figure~\ref{fig:undo:backout-non-tip} -illustrates, Mercurial actually commits \emph{two} changes in this -kind of situation (the box-shaped nodes are the ones that Mercurial -commits automatically). Before Mercurial begins the backout process, -it first remembers what the current parent of the working directory -is. It then backs out the target changeset, and commits that as a -changeset. 
Finally, it merges back to the previous parent of the
-working directory, and commits the result of the merge.
-
-% TODO: to me it looks like mercurial doesn't commit the second merge automatically!
-
-\begin{figure}[htb]
-  \centering
-  \grafix{undo-non-tip}
-  \caption{Automated backout of a non-tip change using the \hgcmd{backout} command}
-  \label{fig:undo:backout-non-tip}
-\end{figure}
-
-The result is that you end up ``back where you were'', only with some
-extra history that undoes the effect of the changeset you wanted to
-back out.
-
-\subsubsection{Always use the \hgopt{backout}{--merge} option}
-
-In fact, since the \hgopt{backout}{--merge} option will do the ``right
-thing'' whether or not the changeset you're backing out is the tip
-(i.e.~it won't try to merge if it's backing out the tip, since there's
-no need), you should \emph{always} use this option when you run the
-\hgcmd{backout} command.
-
-\subsection{Gaining more control of the backout process}
-
-While I've recommended that you always use the
-\hgopt{backout}{--merge} option when backing out a change, the
-\hgcmd{backout} command lets you decide how to merge a backout
-changeset. Taking control of the backout process by hand is something
-you will rarely need to do, but it can be useful to understand what
-the \hgcmd{backout} command is doing for you automatically. To
-illustrate this, let's clone our first repository, but omit the
-backout change that it contains.
-
-\interaction{backout.manual.clone}
-As with our earlier example, we'll commit a third changeset, then back
-out its parent, and see what happens.
-\interaction{backout.manual.backout}
-Our new changeset is again a descendant of the changeset we backed
-out; it's thus a new head, \emph{not} a descendant of the changeset
-that was the tip. The \hgcmd{backout} command was quite explicit in
-telling us this.
-\interaction{backout.manual.log} - -Again, it's easier to see what has happened by looking at a graph of -the revision history, in figure~\ref{fig:undo:backout-manual}. This -makes it clear that when we use \hgcmd{backout} to back out a change -other than the tip, Mercurial adds a new head to the repository (the -change it committed is box-shaped). - -\begin{figure}[htb] - \centering - \grafix{undo-manual} - \caption{Backing out a change using the \hgcmd{backout} command} - \label{fig:undo:backout-manual} -\end{figure} - -After the \hgcmd{backout} command has completed, it leaves the new -``backout'' changeset as the parent of the working directory. -\interaction{backout.manual.parents} -Now we have two isolated sets of changes. -\interaction{backout.manual.heads} - -Let's think about what we expect to see as the contents of -\filename{myfile} now. The first change should be present, because -we've never backed it out. The second change should be missing, as -that's the change we backed out. Since the history graph shows the -third change as a separate head, we \emph{don't} expect to see the -third change present in \filename{myfile}. -\interaction{backout.manual.cat} -To get the third change back into the file, we just do a normal merge -of our two heads. -\interaction{backout.manual.merge} -Afterwards, the graphical history of our repository looks like -figure~\ref{fig:undo:backout-manual-merge}. - -\begin{figure}[htb] - \centering - \grafix{undo-manual-merge} - \caption{Manually merging a backout change} - \label{fig:undo:backout-manual-merge} -\end{figure} - -\subsection{Why \hgcmd{backout} works as it does} - -Here's a brief description of how the \hgcmd{backout} command works. -\begin{enumerate} -\item It ensures that the working directory is ``clean'', i.e.~that - the output of \hgcmd{status} would be empty. -\item It remembers the current parent of the working directory. 
Let's
-  call this changeset \texttt{orig}.
-\item It does the equivalent of a \hgcmd{update} to sync the working
-  directory to the changeset you want to back out. Let's call this
-  changeset \texttt{backout}.
-\item It finds the parent of that changeset. Let's call that
-  changeset \texttt{parent}.
-\item For each file that the \texttt{backout} changeset affected, it
-  does the equivalent of a \hgcmdargs{revert}{-r parent} on that file,
-  to restore it to the contents it had before that changeset was
-  committed.
-\item It commits the result as a new changeset. This changeset has
-  \texttt{backout} as its parent.
-\item If you specify \hgopt{backout}{--merge} on the command line, it
-  merges with \texttt{orig}, and commits the result of the merge.
-\end{enumerate}
-
-An alternative way to implement the \hgcmd{backout} command would be
-to \hgcmd{export} the to-be-backed-out changeset as a diff, then use
-the \cmdopt{patch}{--reverse} option to the \command{patch} command to
-reverse the effect of the change without fiddling with the working
-directory. This sounds much simpler, but it would not work nearly as
-well.
-
-The reason that \hgcmd{backout} does an update, a commit, a merge, and
-another commit is to give the merge machinery the best chance to do a
-good job when dealing with all the changes \emph{between} the change
-you're backing out and the current tip.
-
-If you're backing out a changeset that's~100 revisions back in your
-project's history, the chances that the \command{patch} command will
-be able to apply a reverse diff cleanly are not good, because
-intervening changes are likely to have ``broken the context'' that
-\command{patch} uses to determine whether it can apply a patch (if
-this sounds like gibberish, see section~\ref{sec:mq:patch} for a
-discussion of the \command{patch} command).
Also, Mercurial's merge -machinery will handle files and directories being renamed, permission -changes, and modifications to binary files, none of which -\command{patch} can deal with. - -\section{Changes that should never have been} -\label{sec:undo:aaaiiieee} - -Most of the time, the \hgcmd{backout} command is exactly what you need -if you want to undo the effects of a change. It leaves a permanent -record of exactly what you did, both when committing the original -changeset and when you cleaned up after it. - -On rare occasions, though, you may find that you've committed a change -that really should not be present in the repository at all. For -example, it would be very unusual, and usually considered a mistake, -to commit a software project's object files as well as its source -files. Object files have almost no intrinsic value, and they're -\emph{big}, so they increase the size of the repository and the amount -of time it takes to clone or pull changes. - -Before I discuss the options that you have if you commit a ``brown -paper bag'' change (the kind that's so bad that you want to pull a -brown paper bag over your head), let me first discuss some approaches -that probably won't work. - -Since Mercurial treats history as accumulative---every change builds -on top of all changes that preceded it---you generally can't just make -disastrous changes disappear. The one exception is when you've just -committed a change, and it hasn't been pushed or pulled into another -repository. That's when you can safely use the \hgcmd{rollback} -command, as I detailed in section~\ref{sec:undo:rollback}. - -After you've pushed a bad change to another repository, you -\emph{could} still use \hgcmd{rollback} to make your local copy of the -change disappear, but it won't have the consequences you want. The -change will still be present in the remote repository, so it will -reappear in your local repository the next time you pull. 
-
-If a situation like this arises, and you know which repositories your
-bad change has propagated into, you can \emph{try} to get rid of the
-change from \emph{every} one of those repositories. This is, of
-course, not a satisfactory solution: if you miss even a single
-repository while you're expunging, the change is still ``in the
-wild'', and could propagate further.
-
-If you've committed one or more changes \emph{after} the change that
-you'd like to see disappear, your options are further reduced.
-Mercurial doesn't provide a way to ``punch a hole'' in history,
-leaving the changesets that follow it intact.
-
-XXX This needs filling out. The \texttt{hg-replay} script in the
-\texttt{examples} directory works, but doesn't handle merge
-changesets. Kind of an important omission.
-
-\subsection{Protect yourself from ``escaped'' changes}
-
-If you've committed some changes to your local repository and they've
-been pushed or pulled somewhere else, this isn't necessarily a
-disaster. You can protect yourself ahead of time against some classes
-of bad changeset. This is particularly easy if your team usually
-pulls changes from a central repository.
-
-By configuring some hooks on that repository to validate incoming
-changesets (see chapter~\ref{chap:hook}), you can automatically
-prevent some kinds of bad changeset from being pushed to the central
-repository at all. With such a configuration in place, some kinds of
-bad changeset will naturally tend to ``die out'' because they can't
-propagate into the central repository. Better yet, this happens
-without any need for explicit intervention.
-
-For instance, an incoming change hook that verifies that a changeset
-will actually compile can prevent people from inadvertently ``breaking
-the build''.
-
-\section{Finding the source of a bug}
-\label{sec:undo:bisect}
-
-While it's all very well to be able to back out a changeset that
-introduced a bug, this requires that you know which changeset to back
-out.
Mercurial provides an invaluable command, called -\hgcmd{bisect}, that helps you to automate this process and accomplish -it very efficiently. - -The idea behind the \hgcmd{bisect} command is that a changeset has -introduced some change of behaviour that you can identify with a -simple binary test. You don't know which piece of code introduced the -change, but you know how to test for the presence of the bug. The -\hgcmd{bisect} command uses your test to direct its search for the -changeset that introduced the code that caused the bug. - -Here are a few scenarios to help you understand how you might apply -this command. -\begin{itemize} -\item The most recent version of your software has a bug that you - remember wasn't present a few weeks ago, but you don't know when it - was introduced. Here, your binary test checks for the presence of - that bug. -\item You fixed a bug in a rush, and now it's time to close the entry - in your team's bug database. The bug database requires a changeset - ID when you close an entry, but you don't remember which changeset - you fixed the bug in. Once again, your binary test checks for the - presence of the bug. -\item Your software works correctly, but runs~15\% slower than the - last time you measured it. You want to know which changeset - introduced the performance regression. In this case, your binary - test measures the performance of your software, to see whether it's - ``fast'' or ``slow''. -\item The sizes of the components of your project that you ship - exploded recently, and you suspect that something changed in the way - you build your project. -\end{itemize} - -From these examples, it should be clear that the \hgcmd{bisect} -command is not useful only for finding the sources of bugs. You can -use it to find any ``emergent property'' of a repository (anything -that you can't find from a simple text search of the files in the -tree) for which you can write a binary test. 
-
-We'll introduce a little bit of terminology here, just to make it
-clear which parts of the search process are your responsibility, and
-which are Mercurial's. A \emph{test} is something that \emph{you} run
-when \hgcmd{bisect} chooses a changeset. A \emph{probe} is what
-\hgcmd{bisect} runs to tell whether a revision is good. Finally,
-we'll use the word ``bisect'', as both a noun and a verb, to stand in
-for the phrase ``search using the \hgcmd{bisect} command''.
-
-One simple way to automate the searching process would be to
-probe every changeset. However, this scales poorly. If it took ten
-minutes to test a single changeset, and you had 10,000 changesets in
-your repository, the exhaustive approach would take on average~35
-\emph{days} to find the changeset that introduced a bug. Even if you
-knew that the bug was introduced by one of the last 500 changesets,
-and limited your search to those, you'd still be looking at over 40
-hours to find the changeset that introduced your bug.
-
-What the \hgcmd{bisect} command does is use its knowledge of the
-``shape'' of your project's revision history to perform a search in
-time proportional to the \emph{logarithm} of the number of changesets
-to check (the kind of search it performs is called a dichotomic
-search). With this approach, searching through 10,000 changesets will
-take less than three hours, even at ten minutes per test (the search
-will require about 14 tests). Limit your search to the last hundred
-changesets, and it will take only about an hour (roughly seven tests).
-
-The \hgcmd{bisect} command is aware of the ``branchy'' nature of a
-Mercurial project's revision history, so it has no problems dealing
-with branches, merges, or multiple heads in a repository. It can
-prune entire branches of history with a single probe, which is how it
-operates so efficiently.
-
-\subsection{Using the \hgcmd{bisect} command}
-
-Here's an example of \hgcmd{bisect} in action.
-
-\begin{note}
-  In versions 0.9.5 and earlier of Mercurial, \hgcmd{bisect} was not a
-  core command: it was distributed with Mercurial as an extension.
-  This section describes the built-in command, not the old extension.
-\end{note}
-
-Now let's create a repository, so that we can try out the
-\hgcmd{bisect} command in isolation.
-\interaction{bisect.init}
-We'll simulate a project that has a bug in it in a simple-minded way:
-create trivial changes in a loop, and nominate one specific change
-that will have the ``bug''. This loop creates 35 changesets, each
-adding a single file to the repository. We'll represent our ``bug''
-with a file that contains the text ``i have a gub''.
-\interaction{bisect.commits}
-
-The next thing that we'd like to do is figure out how to use the
-\hgcmd{bisect} command. We can use Mercurial's normal built-in help
-mechanism for this.
-\interaction{bisect.help}
-
-The \hgcmd{bisect} command works in steps. Each step proceeds as follows.
-\begin{enumerate}
-\item You run your binary test.
-  \begin{itemize}
-  \item If the test succeeded, you tell \hgcmd{bisect} by running the
-    \hgcmdargs{bisect}{--good} command.
-  \item If it failed, you run the \hgcmdargs{bisect}{--bad} command.
-  \end{itemize}
-\item The command uses your information to decide which changeset to
-  test next.
-\item It updates the working directory to that changeset, and the
-  process begins again.
-\end{enumerate}
-The process ends when \hgcmd{bisect} identifies a unique changeset
-that marks the point where your test transitioned from ``succeeding''
-to ``failing''.
-
-To start the search, we must run the \hgcmdargs{bisect}{--reset} command.
-\interaction{bisect.search.init}
-
-In our case, the binary test we use is simple: we check to see if any
-file in the repository contains the string ``i have a gub''. If it
-does, this changeset contains the change that ``caused the bug''.
By
-convention, a changeset that has the property we're searching for is
-``bad'', while one that doesn't is ``good''.
-
-Most of the time, the revision to which the working directory is
-synced (usually the tip) already exhibits the problem introduced by
-the buggy change, so we'll mark it as ``bad''.
-\interaction{bisect.search.bad-init}
-
-Our next task is to nominate a changeset that we know \emph{doesn't}
-have the bug; the \hgcmd{bisect} command will ``bracket'' its search
-between the first pair of good and bad changesets. In our case, we
-know that revision~10 didn't have the bug. (I'll have more words
-about choosing the first ``good'' changeset later.)
-\interaction{bisect.search.good-init}
-
-Notice that this command printed some output.
-\begin{itemize}
-\item It told us how many changesets it must consider before it can
-  identify the one that introduced the bug, and how many tests that
-  will require.
-\item It updated the working directory to the next changeset to test,
-  and told us which changeset it's testing.
-\end{itemize}
-
-We now run our test in the working directory. We use the
-\command{grep} command to see if our ``bad'' file is present in the
-working directory. If it is, this revision is bad; if not, this
-revision is good.
-\interaction{bisect.search.step1}
-
-This test looks like a perfect candidate for automation, so let's turn
-it into a shell function.
-\interaction{bisect.search.mytest}
-We can now run an entire test step with a single command,
-\texttt{mytest}.
-\interaction{bisect.search.step2}
-A few more invocations of our canned test step command, and we're
-done.
-\interaction{bisect.search.rest}
-
-Even though we had~35 changesets to search through, the \hgcmd{bisect}
-command let us find the changeset that introduced our ``bug'' with
-only five tests.
Because the number of tests that the \hgcmd{bisect}
-command performs grows logarithmically with the number of changesets to
-search, the advantage that it has over the ``brute force'' search
-approach increases with every changeset you add.
-
-\subsection{Cleaning up after your search}
-
-When you're finished using the \hgcmd{bisect} command in a
-repository, you can use the \hgcmdargs{bisect}{--reset} command to drop
-the information it was using to drive your search. This information
-doesn't take up much space, so it doesn't matter if you forget to run this
-command. However, \hgcmd{bisect} won't let you start a new search in
-that repository until you do a \hgcmdargs{bisect}{--reset}.
-\interaction{bisect.search.reset}
-
-\section{Tips for finding bugs effectively}
-
-\subsection{Give consistent input}
-
-The \hgcmd{bisect} command requires that you correctly report the
-result of every test you perform. If you tell it that a test failed
-when it really succeeded, it \emph{might} be able to detect the
-inconsistency. If it can identify an inconsistency in your reports,
-it will tell you that a particular changeset is both good and bad.
-However, it can't do this perfectly; it's about as likely to report
-the wrong changeset as the source of the bug.
-
-\subsection{Automate as much as possible}
-
-When I started using the \hgcmd{bisect} command, I tried a few times
-to run my tests by hand, on the command line. This is an approach
-that I, at least, am not suited to. After a few tries, I found that I
-was making enough mistakes that I was having to restart my searches
-several times before finally getting correct results.
-
-My initial problems with driving the \hgcmd{bisect} command by hand
-occurred even with simple searches on small repositories; if the
-problem you're looking for is more subtle, or the number of tests that
-\hgcmd{bisect} must perform increases, the likelihood of operator
-error ruining the search is much higher.
Once I started automating my -tests, I had much better results. - -The key to automated testing is twofold: -\begin{itemize} -\item always test for the same symptom, and -\item always feed consistent input to the \hgcmd{bisect} command. -\end{itemize} -In my tutorial example above, the \command{grep} command tests for the -symptom, and the \texttt{if} statement takes the result of this check -and ensures that we always feed the same input to the \hgcmd{bisect} -command. The \texttt{mytest} function marries these together in a -reproducible way, so that every test is uniform and consistent. - -\subsection{Check your results} - -Because the output of a \hgcmd{bisect} search is only as good as the -input you give it, don't take the changeset it reports as the -absolute truth. A simple way to cross-check its report is to manually -run your test at each of the following changesets: -\begin{itemize} -\item The changeset that it reports as the first bad revision. Your - test should still report this as bad. -\item The parent of that changeset (either parent, if it's a merge). - Your test should report this changeset as good. -\item A child of that changeset. Your test should report this - changeset as bad. -\end{itemize} - -\subsection{Beware interference between bugs} - -It's possible that your search for one bug could be disrupted by the -presence of another. For example, let's say your software crashes at -revision 100, and worked correctly at revision 50. Unknown to you, -someone else introduced a different crashing bug at revision 60, and -fixed it at revision 80. This could distort your results in one of -several ways. - -It is possible that this other bug completely ``masks'' yours, which -is to say that it occurs before your bug has a chance to manifest -itself. 
If you can't avoid that other bug (for example, it prevents -your project from building), and so can't tell whether your bug is -present in a particular changeset, the \hgcmd{bisect} command cannot -help you directly. Instead, you can mark a changeset as untested by -running \hgcmdargs{bisect}{--skip}. - -A different problem could arise if your test for a bug's presence is -not specific enough. If you check for ``my program crashes'', then -both your crashing bug and an unrelated crashing bug that masks it -will look like the same thing, and mislead \hgcmd{bisect}. - -Another useful situation in which to use \hgcmdargs{bisect}{--skip} is -if you can't test a revision because your project was in a broken and -hence untestable state at that revision, perhaps because someone -checked in a change that prevented the project from building. - -\subsection{Bracket your search lazily} - -Choosing the first ``good'' and ``bad'' changesets that will mark the -end points of your search is often easy, but it bears a little -discussion nevertheless. From the perspective of \hgcmd{bisect}, the -``newest'' changeset is conventionally ``bad'', and the older -changeset is ``good''. - -If you're having trouble remembering when a suitable ``good'' change -was, so that you can tell \hgcmd{bisect}, you could do worse than -testing changesets at random. Just remember to eliminate contenders -that can't possibly exhibit the bug (perhaps because the feature with -the bug isn't present yet) and those where another problem masks the -bug (as I discussed above). - -Even if you end up ``early'' by thousands of changesets or months of -history, you will only add a handful of tests to the total number that -\hgcmd{bisect} must perform, thanks to its logarithmic behaviour. - -%%% Local Variables: -%%% mode: latex -%%% TeX-master: "00book" -%%% End: