Mercurial > hgbook
changeset 682:28b5a5befb08
Fold preface and intro into one
author | Bryan O'Sullivan <bos@serpentine.com> |
---|---|
date | Thu, 19 Mar 2009 20:54:12 -0700 |
parents | 5bfa0df6aaed |
children | c838b3975bc6 |
files | en/00book.xml en/ch00-preface.xml en/ch01-intro.xml en/ch01-tour-basic.xml en/ch02-tour-basic.xml en/ch02-tour-merge.xml en/ch03-concepts.xml en/ch03-tour-merge.xml en/ch04-concepts.xml en/ch04-daily.xml en/ch05-collab.xml en/ch05-daily.xml en/ch06-collab.xml en/ch06-filenames.xml en/ch07-branch.xml en/ch07-filenames.xml en/ch08-branch.xml en/ch08-undo.xml en/ch09-hook.xml en/ch09-undo.xml en/ch10-hook.xml en/ch10-template.xml en/ch11-mq.xml en/ch11-template.xml en/ch12-mq-collab.xml en/ch12-mq.xml en/ch13-hgext.xml en/ch13-mq-collab.xml en/ch14-hgext.xml |
diffstat | 29 files changed, 11778 insertions(+), 11784 deletions(-) |
--- a/en/00book.xml Wed Mar 18 00:08:22 2009 -0700
+++ b/en/00book.xml Thu Mar 19 20:54:12 2009 -0700
@@ -7,20 +7,20 @@
 <!-- Chapters. -->
-<!ENTITY ch01 SYSTEM "ch01-intro.xml">
-<!ENTITY ch02 SYSTEM "ch02-tour-basic.xml">
-<!ENTITY ch03 SYSTEM "ch03-tour-merge.xml">
-<!ENTITY ch04 SYSTEM "ch04-concepts.xml">
-<!ENTITY ch05 SYSTEM "ch05-daily.xml">
-<!ENTITY ch06 SYSTEM "ch06-collab.xml">
-<!ENTITY ch07 SYSTEM "ch07-filenames.xml">
-<!ENTITY ch08 SYSTEM "ch08-branch.xml">
-<!ENTITY ch09 SYSTEM "ch09-undo.xml">
-<!ENTITY ch10 SYSTEM "ch10-hook.xml">
-<!ENTITY ch11 SYSTEM "ch11-template.xml">
-<!ENTITY ch12 SYSTEM "ch12-mq.xml">
-<!ENTITY ch13 SYSTEM "ch13-mq-collab.xml">
-<!ENTITY ch14 SYSTEM "ch14-hgext.xml">
+<!ENTITY ch00 SYSTEM "ch00-preface.xml">
+<!ENTITY ch01 SYSTEM "ch01-tour-basic.xml">
+<!ENTITY ch02 SYSTEM "ch02-tour-merge.xml">
+<!ENTITY ch03 SYSTEM "ch03-concepts.xml">
+<!ENTITY ch04 SYSTEM "ch04-daily.xml">
+<!ENTITY ch05 SYSTEM "ch05-collab.xml">
+<!ENTITY ch06 SYSTEM "ch06-filenames.xml">
+<!ENTITY ch07 SYSTEM "ch07-branch.xml">
+<!ENTITY ch08 SYSTEM "ch08-undo.xml">
+<!ENTITY ch09 SYSTEM "ch09-hook.xml">
+<!ENTITY ch10 SYSTEM "ch10-template.xml">
+<!ENTITY ch11 SYSTEM "ch11-mq.xml">
+<!ENTITY ch12 SYSTEM "ch12-mq-collab.xml">
+<!ENTITY ch13 SYSTEM "ch13-hgext.xml">
 <!ENTITY appA SYSTEM "appA-cmdref.xml">
 <!ENTITY appB SYSTEM "appB-mq-ref.xml">
 <!ENTITY appC SYSTEM "appC-srcinstall.xml">
@@ -74,7 +74,6 @@
  &ch11;
  &ch12;
  &ch13;
- &ch14;
  <!-- &appA; -->
  &appB;
  &appC;
--- a/en/ch00-preface.xml Wed Mar 18 00:08:22 2009 -0700 +++ b/en/ch00-preface.xml Thu Mar 19 20:54:12 2009 -0700 @@ -3,23 +3,139 @@ <preface id="chap:preface"> <title>Preface</title> - <para>Distributed revision control is a relatively new territory, - and has thus far grown due to people's willingness to strike out - into ill-charted territory.</para> + <sect1> + <title>Why revision control? Why Mercurial?</title> + + <para>Revision control is the process of managing multiple + versions of a piece of information. In its simplest form, this + is something that many people do by hand: every time you modify + a file, save it under a new name that contains a number, each + one higher than the number of the preceding version.</para> + + <para>Manually managing multiple versions of even a single file is + an error-prone task, though, so software tools to help automate + this process have long been available. The earliest automated + revision control tools were intended to help a single user to + manage revisions of a single file. Over the past few decades, + the scope of revision control tools has expanded greatly; they + now manage multiple files, and help multiple people to work + together. The best modern revision control tools have no + problem coping with thousands of people working together on + projects that consist of hundreds of thousands of files.</para> + + <para>The arrival of distributed revision control is relatively + recent, and so far this new field has grown due to people's + willingness to explore ill-charted territory.</para> + + <para>I am writing a book about distributed revision control + because I believe that it is an important subject that deserves + a field guide. 
I chose to write about Mercurial because it is + the easiest tool to learn the terrain with, and yet it scales to + the demands of real, challenging environments where many other + revision control tools buckle.</para> + + <sect2> + <title>Why use revision control?</title> + + <para>There are a number of reasons why you or your team might + want to use an automated revision control tool for a + project.</para> - <para>I am writing a book about distributed revision control because - I believe that it is an important subject that deserves a field - guide. I chose to write about Mercurial because it is the easiest - tool to learn the terrain with, and yet it scales to the demands - of real, challenging environments where many other revision - control tools fail.</para> + <itemizedlist> + <listitem><para>It will track the history and evolution of + your project, so you don't have to. For every change, + you'll have a log of <emphasis>who</emphasis> made it; + <emphasis>why</emphasis> they made it; + <emphasis>when</emphasis> they made it; and + <emphasis>what</emphasis> the change + was.</para></listitem> + <listitem><para>When you're working with other people, + revision control software makes it easier for you to + collaborate. For example, when people more or less + simultaneously make potentially incompatible changes, the + software will help you to identify and resolve those + conflicts.</para></listitem> + <listitem><para>It can help you to recover from mistakes. If + you make a change that later turns out to be in error, you + can revert to an earlier version of one or more files. 
In + fact, a <emphasis>really</emphasis> good revision control + tool will even help you to efficiently figure out exactly + when a problem was introduced (see section <xref + linkend="sec:undo:bisect"/> for details).</para></listitem> + <listitem><para>It will help you to work simultaneously on, + and manage the drift between, multiple versions of your + project.</para></listitem> + </itemizedlist> + + <para>Most of these reasons are equally valid---at least in + theory---whether you're working on a project by yourself, or + with a hundred other people.</para> + + <para>A key question about the practicality of revision control + at these two different scales (<quote>lone hacker</quote> and + <quote>huge team</quote>) is how its + <emphasis>benefits</emphasis> compare to its + <emphasis>costs</emphasis>. A revision control tool that's + difficult to understand or use is going to impose a high + cost.</para> + + <para>A five-hundred-person project is likely to collapse under + its own weight almost immediately without a revision control + tool and process. In this case, the cost of using revision + control might hardly seem worth considering, since + <emphasis>without</emphasis> it, failure is almost + guaranteed.</para> + + <para>On the other hand, a one-person <quote>quick hack</quote> + might seem like a poor place to use a revision control tool, + because surely the cost of using one must be close to the + overall cost of the project. Right?</para> + + <para>Mercurial uniquely supports <emphasis>both</emphasis> of + these scales of development. You can learn the basics in just + a few minutes, and due to its low overhead, you can apply + revision control to the smallest of projects with ease. Its + simplicity means you won't have a lot of abstruse concepts or + command sequences competing for mental space with whatever + you're <emphasis>really</emphasis> trying to do. 
At the same + time, Mercurial's high performance and peer-to-peer nature let + you scale painlessly to handle large projects.</para> + + <para>No revision control tool can rescue a poorly run project, + but a good choice of tools can make a huge difference to the + fluidity with which you can work on a project.</para> + + </sect2> + + <sect2> + <title>The many names of revision control</title> + + <para>Revision control is a diverse field, so much so that it is + referred to by many names and acronyms. Here are a few of the + more common variations you'll encounter:</para> + <itemizedlist> + <listitem><para>Revision control (RCS)</para></listitem> + <listitem><para>Software configuration management (SCM), or + configuration management</para></listitem> + <listitem><para>Source code management</para></listitem> + <listitem><para>Source code control, or source + control</para></listitem> + <listitem><para>Version control + (VCS)</para></listitem></itemizedlist> + <para>Some people claim that these terms actually have different + meanings, but in practice they overlap so much that there's no + agreed or even useful way to tease them apart.</para> + + </sect2> + </sect1> <sect1> <title>This book is a work in progress</title> <para>I am releasing this book while I am still writing it, in the - hope that it will prove useful to others. I also hope that - readers will contribute as they see fit.</para> + hope that it will prove useful to others. 
I am writing under an + open license in the hope that you, my readers, will contribute + feedback and perhaps content of your own.</para> </sect1> <sect1> @@ -59,8 +175,567 @@ seeing is consistent and reproducible.</para> </sect1> + <sect1> - <title>Colophon---this book is Free</title> + <title>Trends in the field</title> + + <para>There has been an unmistakable trend in the development and + use of revision control tools over the past four decades, as + people have become familiar with the capabilities of their tools + and constrained by their limitations.</para> + + <para>The first generation began by managing single files on + individual computers. Although these tools represented a huge + advance over ad-hoc manual revision control, their locking model + and reliance on a single computer limited them to small, + tightly-knit teams.</para> + + <para>The second generation loosened these constraints by moving + to network-centered architectures, and managing entire projects + at a time. As projects grew larger, they ran into new problems. + With clients needing to talk to servers very frequently, server + scaling became an issue for large projects. An unreliable + network connection could prevent remote users from being able to + talk to the server at all. As open source projects started + making read-only access available anonymously to anyone, people + without commit privileges found that they could not use the + tools to interact with a project in a natural way, as they could + not record their changes.</para> + + <para>The current generation of revision control tools is + peer-to-peer in nature. All of these systems have dropped the + dependency on a single central server, and allow people to + distribute their revision control data to where it's actually + needed. Collaboration over the Internet has moved from + constrained by technology to a matter of choice and consensus. 
+ Modern tools can operate offline indefinitely and autonomously, + with a network connection only needed when syncing changes with + another repository.</para> + + </sect1> + <sect1> + <title>A few of the advantages of distributed revision + control</title> + + <para>Even though distributed revision control tools have for + several years been as robust and usable as their + previous-generation counterparts, people using older tools have + not yet necessarily woken up to their advantages. There are a + number of ways in which distributed tools shine relative to + centralised ones.</para> + + <para>For an individual developer, distributed tools are almost + always much faster than centralised tools. This is for a simple + reason: a centralised tool needs to talk over the network for + many common operations, because most metadata is stored in a + single copy on the central server. A distributed tool stores + all of its metadata locally. All else being equal, talking over + the network adds overhead to a centralised tool. Don't + underestimate the value of a snappy, responsive tool: you're + going to spend a lot of time interacting with your revision + control software.</para> + + <para>Distributed tools are indifferent to the vagaries of your + server infrastructure, again because they replicate metadata to + so many locations. If you use a centralised system and your + server catches fire, you'd better hope that your backup media + are reliable, and that your last backup was recent and actually + worked. With a distributed tool, you have many backups + available on every contributor's computer.</para> + + <para>The reliability of your network will affect distributed + tools far less than it will centralised tools. You can't even + use a centralised tool without a network connection, except for + a few highly constrained commands. With a distributed tool, if + your network connection goes down while you're working, you may + not even notice. 
The only thing you won't be able to do is talk + to repositories on other computers, something that is relatively + rare compared with local operations. If you have a far-flung + team of collaborators, this may be significant.</para> + + <sect2> + <title>Advantages for open source projects</title> + + <para>If you take a shine to an open source project and decide + that you would like to start hacking on it, and that project + uses a distributed revision control tool, you are at once a + peer with the people who consider themselves the + <quote>core</quote> of that project. If they publish their + repositories, you can immediately copy their project history, + start making changes, and record your work, using the same + tools in the same ways as insiders. By contrast, with a + centralised tool, you must use the software in a <quote>read + only</quote> mode unless someone grants you permission to + commit changes to their central server. Until then, you won't + be able to record changes, and your local modifications will + be at risk of corruption any time you try to update your + client's view of the repository.</para> + + <sect3> + <title>The forking non-problem</title> + + <para>It has been suggested that distributed revision control + tools pose some sort of risk to open source projects because + they make it easy to <quote>fork</quote> the development of + a project. A fork happens when there are differences in + opinion or attitude between groups of developers that cause + them to decide that they can't work together any longer. + Each side takes a more or less complete copy of the + project's source code, and goes off in its own + direction.</para> + + <para>Sometimes the camps in a fork decide to reconcile their + differences. With a centralised revision control system, the + <emphasis>technical</emphasis> process of reconciliation is + painful, and has to be performed largely by hand. 
You have + to decide whose revision history is going to + <quote>win</quote>, and graft the other team's changes into + the tree somehow. This usually loses some or all of one + side's revision history.</para> + + <para>What distributed tools do with respect to forking is + they make forking the <emphasis>only</emphasis> way to + develop a project. Every single change that you make is + potentially a fork point. The great strength of this + approach is that a distributed revision control tool has to + be really good at <emphasis>merging</emphasis> forks, + because forks are absolutely fundamental: they happen all + the time.</para> + + <para>If every piece of work that everybody does, all the + time, is framed in terms of forking and merging, then what + the open source world refers to as a <quote>fork</quote> + becomes <emphasis>purely</emphasis> a social issue. If + anything, distributed tools <emphasis>lower</emphasis> the + likelihood of a fork:</para> + <itemizedlist> + <listitem><para>They eliminate the social distinction that + centralised tools impose: that between insiders (people + with commit access) and outsiders (people + without).</para></listitem> + <listitem><para>They make it easier to reconcile after a + social fork, because all that's involved from the + perspective of the revision control software is just + another merge.</para></listitem></itemizedlist> + + <para>Some people resist distributed tools because they want + to retain tight control over their projects, and they + believe that centralised tools give them this control. + However, if you're of this belief, and you publish your CVS + or Subversion repositories publicly, there are plenty of + tools available that can pull out your entire project's + history (albeit slowly) and recreate it somewhere that you + don't control. 
So while your control in this case is + illusory, you are forgoing the ability to fluidly + collaborate with whatever people feel compelled to mirror + and fork your history.</para> + + </sect3> + </sect2> + <sect2> + <title>Advantages for commercial projects</title> + + <para>Many commercial projects are undertaken by teams that are + scattered across the globe. Contributors who are far from a + central server will see slower command execution and perhaps + less reliability. Commercial revision control systems attempt + to ameliorate these problems with remote-site replication + add-ons that are typically expensive to buy and cantankerous + to administer. A distributed system doesn't suffer from these + problems in the first place. Better yet, you can easily set + up multiple authoritative servers, say one per site, so that + there's no redundant communication between repositories over + expensive long-haul network links.</para> + + <para>Centralised revision control systems tend to have + relatively low scalability. It's not unusual for an expensive + centralised system to fall over under the combined load of + just a few dozen concurrent users. Once again, the typical + response tends to be an expensive and clunky replication + facility. Since the load on a central server---if you have + one at all---is many times lower with a distributed tool + (because all of the data is replicated everywhere), a single + cheap server can handle the needs of a much larger team, and + replication to balance load becomes a simple matter of + scripting.</para> + + <para>If you have an employee in the field, troubleshooting a + problem at a customer's site, they'll benefit from distributed + revision control. 
The tool will let them generate custom + builds, try different fixes in isolation from each other, and + search efficiently through history for the sources of bugs and + regressions in the customer's environment, all without needing + to connect to your company's network.</para> + + </sect2> + </sect1> + <sect1> + <title>Why choose Mercurial?</title> + + <para>Mercurial has a unique set of properties that make it a + particularly good choice as a revision control system.</para> + <itemizedlist> + <listitem><para>It is easy to learn and use.</para></listitem> + <listitem><para>It is lightweight.</para></listitem> + <listitem><para>It scales excellently.</para></listitem> + <listitem><para>It is easy to + customise.</para></listitem></itemizedlist> + + <para>If you are at all familiar with revision control systems, + you should be able to get up and running with Mercurial in less + than five minutes. Even if not, it will take no more than a few + minutes longer. Mercurial's command and feature sets are + generally uniform and consistent, so you can keep track of a few + general rules instead of a host of exceptions.</para> + + <para>On a small project, you can start working with Mercurial in + moments. Creating new changes and branches; transferring changes + around (whether locally or over a network); and history and + status operations are all fast. Mercurial attempts to stay + nimble and largely out of your way by combining low cognitive + overhead with blazingly fast operations.</para> + + <para>The usefulness of Mercurial is not limited to small + projects: it is used by projects with hundreds to thousands of + contributors, each containing tens of thousands of files and + hundreds of megabytes of source code.</para> + + <para>If the core functionality of Mercurial is not enough for + you, it's easy to build on. 
Mercurial is well suited to + scripting tasks, and its clean internals and implementation in + Python make it easy to add features in the form of extensions. + There are a number of popular and useful extensions already + available, ranging from helping to identify bugs to improving + performance.</para> + + </sect1> + <sect1> + <title>Mercurial compared with other tools</title> + + <para>Before you read on, please understand that this section + necessarily reflects my own experiences, interests, and (dare I + say it) biases. I have used every one of the revision control + tools listed below, in most cases for several years at a + time.</para> + + + <sect2> + <title>Subversion</title> + + <para>Subversion is a popular revision control tool, developed + to replace CVS. It has a centralised client/server + architecture.</para> + + <para>Subversion and Mercurial have similarly named commands for + performing the same operations, so if you're familiar with + one, it is easy to learn to use the other. Both tools are + portable to all popular operating systems.</para> + + <para>Prior to version 1.5, Subversion had no useful support for + merges. At the time of writing, its merge tracking capability + is new, and known to be <ulink + url="http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword">complicated + and buggy</ulink>.</para> + + <para>Mercurial has a substantial performance advantage over + Subversion on every revision control operation I have + benchmarked. I have measured its advantage as ranging from a + factor of two to a factor of six when compared with Subversion + 1.4.3's <emphasis>ra_local</emphasis> file store, which is the + fastest access method available. In more realistic + deployments involving a network-based store, Subversion will + be at a substantially larger disadvantage. 
Because many + Subversion commands must talk to the server and Subversion + does not have useful replication facilities, server capacity + and network bandwidth become bottlenecks for modestly large + projects.</para> + + <para>Additionally, Subversion incurs substantial storage + overhead to avoid network transactions for a few common + operations, such as finding modified files + (<literal>status</literal>) and displaying modifications + against the current revision (<literal>diff</literal>). As a + result, a Subversion working copy is often the same size as, + or larger than, a Mercurial repository and working directory, + even though the Mercurial repository contains a complete + history of the project.</para> + + <para>Subversion is widely supported by third party tools. + Mercurial currently lags considerably in this area. This gap + is closing, however, and indeed some of Mercurial's GUI tools + now outshine their Subversion equivalents. Like Mercurial, + Subversion has an excellent user manual.</para> + + <para>Because Subversion doesn't store revision history on the + client, it is well suited to managing projects that deal with + lots of large, opaque binary files. If you check in fifty + revisions to an incompressible 10MB file, Subversion's + client-side space usage stays constant The space used by any + distributed SCM will grow rapidly in proportion to the number + of revisions, because the differences between each revision + are large.</para> + + <para>In addition, it's often difficult or, more usually, + impossible to merge different versions of a binary file. + Subversion's ability to let a user lock a file, so that they + temporarily have the exclusive right to commit changes to it, + can be a significant advantage to a project where binary files + are widely used.</para> + + <para>Mercurial can import revision history from a Subversion + repository. It can also export revision history to a + Subversion repository. 
This makes it easy to <quote>test the + waters</quote> and use Mercurial and Subversion in parallel + before deciding to switch. History conversion is incremental, + so you can perform an initial conversion, then small + additional conversions afterwards to bring in new + changes.</para> + + + </sect2> + <sect2> + <title>Git</title> + + <para>Git is a distributed revision control tool that was + developed for managing the Linux kernel source tree. Like + Mercurial, its early design was somewhat influenced by + Monotone.</para> + + <para>Git has a very large command set, with version 1.5.0 + providing 139 individual commands. It has something of a + reputation for being difficult to learn. Compared to Git, + Mercurial has a strong focus on simplicity.</para> + + <para>In terms of performance, Git is extremely fast. In + several cases, it is faster than Mercurial, at least on Linux, + while Mercurial performs better on other operations. However, + on Windows, the performance and general level of support that + Git provides is, at the time of writing, far behind that of + Mercurial.</para> + + <para>While a Mercurial repository needs no maintenance, a Git + repository requires frequent manual <quote>repacks</quote> of + its metadata. Without these, performance degrades, while + space usage grows rapidly. A server that contains many Git + repositories that are not rigorously and frequently repacked + will become heavily disk-bound during backups, and there have + been instances of daily backups taking far longer than 24 + hours as a result. A freshly packed Git repository is + slightly smaller than a Mercurial repository, but an unpacked + repository is several orders of magnitude larger.</para> + + <para>The core of Git is written in C. Many Git commands are + implemented as shell or Perl scripts, and the quality of these + scripts varies widely. 
I have encountered several instances + where scripts charged along blindly in the presence of errors + that should have been fatal.</para> + + <para>Mercurial can import revision history from a Git + repository.</para> + + + </sect2> + <sect2> + <title>CVS</title> + + <para>CVS is probably the most widely used revision control tool + in the world. Due to its age and internal untidiness, it has + been only lightly maintained for many years.</para> + + <para>It has a centralised client/server architecture. It does + not group related file changes into atomic commits, making it + easy for people to <quote>break the build</quote>: one person + can successfully commit part of a change and then be blocked + by the need for a merge, causing other people to see only a + portion of the work they intended to do. This also affects + how you work with project history. If you want to see all of + the modifications someone made as part of a task, you will + need to manually inspect the descriptions and timestamps of + the changes made to each file involved (if you even know what + those files were).</para> + + <para>CVS has a muddled notion of tags and branches that I will + not attempt to even describe. It does not support renaming of + files or directories well, making it easy to corrupt a + repository. It has almost no internal consistency checking + capabilities, so it is usually not even possible to tell + whether or how a repository is corrupt. I would not recommend + CVS for any project, existing or new.</para> + + <para>Mercurial can import CVS revision history. However, there + are a few caveats that apply; these are true of every other + revision control tool's CVS importer, too. Due to CVS's lack + of atomic changes and unversioned filesystem hierarchy, it is + not possible to reconstruct CVS history completely accurately; + some guesswork is involved, and renames will usually not show + up. 
Because a lot of advanced CVS administration has to be + done by hand and is hence error-prone, it's common for CVS + importers to run into multiple problems with corrupted + repositories (completely bogus revision timestamps and files + that have remained locked for over a decade are just two of + the less interesting problems I can recall from personal + experience).</para> + + <para>Mercurial can import revision history from a CVS + repository.</para> + + + </sect2> + <sect2> + <title>Commercial tools</title> + + <para>Perforce has a centralised client/server architecture, + with no client-side caching of any data. Unlike modern + revision control tools, Perforce requires that a user run a + command to inform the server about every file they intend to + edit.</para> + + <para>The performance of Perforce is quite good for small teams, + but it falls off rapidly as the number of users grows beyond a + few dozen. Modestly large Perforce installations require the + deployment of proxies to cope with the load their users + generate.</para> + + + </sect2> + <sect2> + <title>Choosing a revision control tool</title> + + <para>With the exception of CVS, all of the tools listed above + have unique strengths that suit them to particular styles of + work. There is no single revision control tool that is best + in all situations.</para> + + <para>As an example, Subversion is a good choice for working + with frequently edited binary files, due to its centralised + nature and support for file locking.</para> + + <para>I personally find Mercurial's properties of simplicity, + performance, and good merge support to be a compelling + combination that has served me well for several years.</para> + + + </sect2> + </sect1> + <sect1> + <title>Switching from another tool to Mercurial</title> + + <para>Mercurial is bundled with an extension named <literal + role="hg-ext">convert</literal>, which can incrementally + import revision history from several other revision control + tools. 
By <quote>incremental</quote>, I mean that you can + convert all of a project's history to date in one go, then rerun + the conversion later to obtain new changes that happened after + the initial conversion.</para> + + <para>The revision control tools supported by <literal + role="hg-ext">convert</literal> are as follows:</para> + <itemizedlist> + <listitem><para>Subversion</para></listitem> + <listitem><para>CVS</para></listitem> + <listitem><para>Git</para></listitem> + <listitem><para>Darcs</para></listitem></itemizedlist> + + <para>In addition, <literal role="hg-ext">convert</literal> can + export changes from Mercurial to Subversion. This makes it + possible to try Subversion and Mercurial in parallel before + committing to a switchover, without risking the loss of any + work.</para> + + <para>The <command role="hg-ext-convert">convert</command> command + is easy to use. Simply point it at the path or URL of the + source repository, optionally give it the name of the + destination repository, and it will start working. After the + initial conversion, just run the same command again to import + new changes.</para> + </sect1> + + <sect1> + <title>A short history of revision control</title> + + <para>The best known of the old-time revision control tools is + SCCS (Source Code Control System), which Marc Rochkind wrote at + Bell Labs, in the early 1970s. SCCS operated on individual + files, and required every person working on a project to have + access to a shared workspace on a single system. Only one + person could modify a file at any time; arbitration for access + to files was via locks. It was common for people to lock files, + and later forget to unlock them, preventing anyone else from + modifying those files without the help of an + administrator.</para> + + <para>Walter Tichy developed a free alternative to SCCS in the + early 1980s; he called his program RCS (Revision Control System). 
+ Like SCCS, RCS required developers to work in a single shared + workspace, and to lock files to prevent multiple people from + modifying them simultaneously.</para> + + <para>Later in the 1980s, Dick Grune used RCS as a building block + for a set of shell scripts he initially called cmt, but then + renamed to CVS (Concurrent Versions System). The big innovation + of CVS was that it let developers work simultaneously and + somewhat independently in their own personal workspaces. The + personal workspaces prevented developers from stepping on each + other's toes all the time, as was common with SCCS and RCS. Each + developer had a copy of every project file, and could modify + their copies independently. They had to merge their edits prior + to committing changes to the central repository.</para> + + <para>Brian Berliner took Grune's original scripts and rewrote + them in C, releasing in 1989 the code that has since developed + into the modern version of CVS. CVS subsequently acquired the + ability to operate over a network connection, giving it a + client/server architecture. CVS's architecture is centralised; + only the server has a copy of the history of the project. Client + workspaces just contain copies of recent versions of the + project's files, and a little metadata to tell them where the + server is. CVS has been enormously successful; it is probably + the world's most widely used revision control system.</para> + + <para>In the early 1990s, Sun Microsystems developed an early + distributed revision control system, called TeamWare. A + TeamWare workspace contains a complete copy of the project's + history. TeamWare has no notion of a central repository. (CVS + relied upon RCS for its history storage; TeamWare used + SCCS.)</para> + + <para>As the 1990s progressed, awareness grew of a number of + problems with CVS. It records simultaneous changes to multiple + files individually, instead of grouping them together as a + single logically atomic operation. 
It does not manage its file + hierarchy well; it is easy to make a mess of a repository by + renaming files and directories. Worse, its source code is + difficult to read and maintain, which made the <quote>pain + level</quote> of fixing these architectural problems + prohibitive.</para> + + <para>In 2001, Jim Blandy and Karl Fogel, two developers who had + worked on CVS, started a project to replace it with a tool that + would have a better architecture and cleaner code. The result, + Subversion, does not stray from CVS's centralised client/server + model, but it adds multi-file atomic commits, better namespace + management, and a number of other features that make it a + generally better tool than CVS. Since its initial release, it + has rapidly grown in popularity.</para> + + <para>More or less simultaneously, Graydon Hoare began working on + an ambitious distributed revision control system that he named + Monotone. While Monotone addresses many of CVS's design flaws + and has a peer-to-peer architecture, it goes beyond earlier (and + subsequent) revision control tools in a number of innovative + ways. It uses cryptographic hashes as identifiers, and has an + integral notion of <quote>trust</quote> for code from different + sources.</para> + + <para>Mercurial began life in 2005. While a few aspects of its + design are influenced by Monotone, Mercurial focuses on ease of + use, high performance, and scalability to very large + projects.</para> + + </sect1> + + <sect1> + <title>Colophon&emdash;this book is Free</title> <para>This book is licensed under the Open Publication License, and is produced entirely using Free Software tools. It is
--- a/en/ch01-intro.xml Wed Mar 18 00:08:22 2009 -0700 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,680 +0,0 @@ -<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> - -<chapter id="chap:intro"> - <?dbhtml filename="introduction.html"?> - <title>Introduction</title> - - <sect1> - <title>About revision control</title> - - <para>Revision control is the process of managing multiple - versions of a piece of information. In its simplest form, this - is something that many people do by hand: every time you modify - a file, save it under a new name that contains a number, each - one higher than the number of the preceding version.</para> - - <para>Manually managing multiple versions of even a single file is - an error-prone task, though, so software tools to help automate - this process have long been available. The earliest automated - revision control tools were intended to help a single user to - manage revisions of a single file. Over the past few decades, - the scope of revision control tools has expanded greatly; they - now manage multiple files, and help multiple people to work - together. The best modern revision control tools have no - problem coping with thousands of people working together on - projects that consist of hundreds of thousands of files.</para> - - <sect2> - <title>Why use revision control?</title> - - <para>There are a number of reasons why you or your team might - want to use an automated revision control tool for a - project.</para> - <itemizedlist> - <listitem><para>It will track the history and evolution of - your project, so you don't have to. For every change, - you'll have a log of <emphasis>who</emphasis> made it; - <emphasis>why</emphasis> they made it; - <emphasis>when</emphasis> they made it; and - <emphasis>what</emphasis> the change - was.</para></listitem> - <listitem><para>When you're working with other people, - revision control software makes it easier for you to - collaborate. 
For example, when people more or less - simultaneously make potentially incompatible changes, the - software will help you to identify and resolve those - conflicts.</para></listitem> - <listitem><para>It can help you to recover from mistakes. If - you make a change that later turns out to be in error, you - can revert to an earlier version of one or more files. In - fact, a <emphasis>really</emphasis> good revision control - tool will even help you to efficiently figure out exactly - when a problem was introduced (see section <xref - linkend="sec:undo:bisect"/> for details).</para></listitem> - <listitem><para>It will help you to work simultaneously on, - and manage the drift between, multiple versions of your - project.</para></listitem></itemizedlist> - <para>Most of these reasons are equally valid---at least in - theory---whether you're working on a project by yourself, or - with a hundred other people.</para> - - <para>A key question about the practicality of revision control - at these two different scales (<quote>lone hacker</quote> and - <quote>huge team</quote>) is how its - <emphasis>benefits</emphasis> compare to its - <emphasis>costs</emphasis>. A revision control tool that's - difficult to understand or use is going to impose a high - cost.</para> - - <para>A five-hundred-person project is likely to collapse under - its own weight almost immediately without a revision control - tool and process. In this case, the cost of using revision - control might hardly seem worth considering, since - <emphasis>without</emphasis> it, failure is almost - guaranteed.</para> - - <para>On the other hand, a one-person <quote>quick hack</quote> - might seem like a poor place to use a revision control tool, - because surely the cost of using one must be close to the - overall cost of the project. Right?</para> - - <para>Mercurial uniquely supports <emphasis>both</emphasis> of - these scales of development. 
You can learn the basics in just - a few minutes, and due to its low overhead, you can apply - revision control to the smallest of projects with ease. Its - simplicity means you won't have a lot of abstruse concepts or - command sequences competing for mental space with whatever - you're <emphasis>really</emphasis> trying to do. At the same - time, Mercurial's high performance and peer-to-peer nature let - you scale painlessly to handle large projects.</para> - - <para>No revision control tool can rescue a poorly run project, - but a good choice of tools can make a huge difference to the - fluidity with which you can work on a project.</para> - - </sect2> - <sect2> - <title>The many names of revision control</title> - - <para>Revision control is a diverse field, so much so that it - doesn't actually have a single name or acronym. Here are a - few of the more common names and acronyms you'll - encounter:</para> - <itemizedlist> - <listitem><para>Revision control (RCS)</para></listitem> - <listitem><para>Software configuration management (SCM), or - configuration management</para></listitem> - <listitem><para>Source code management</para></listitem> - <listitem><para>Source code control, or source - control</para></listitem> - <listitem><para>Version control - (VCS)</para></listitem></itemizedlist> - <para>Some people claim that these terms actually have different - meanings, but in practice they overlap so much that there's no - agreed or even useful way to tease them apart.</para> - - </sect2> - </sect1> - <sect1> - <title>A short history of revision control</title> - - <para>The best known of the old-time revision control tools is - SCCS (Source Code Control System), which Marc Rochkind wrote at - Bell Labs, in the early 1970s. SCCS operated on individual - files, and required every person working on a project to have - access to a shared workspace on a single system. 
Only one - person could modify a file at any time; arbitration for access - to files was via locks. It was common for people to lock files, - and later forget to unlock them, preventing anyone else from - modifying those files without the help of an - administrator.</para> - - <para>Walter Tichy developed a free alternative to SCCS in the - early 1980s; he called his program RCS (Revision Control System). - Like SCCS, RCS required developers to work in a single shared - workspace, and to lock files to prevent multiple people from - modifying them simultaneously.</para> - - <para>Later in the 1980s, Dick Grune used RCS as a building block - for a set of shell scripts he initially called cmt, but then - renamed to CVS (Concurrent Versions System). The big innovation - of CVS was that it let developers work simultaneously and - somewhat independently in their own personal workspaces. The - personal workspaces prevented developers from stepping on each - other's toes all the time, as was common with SCCS and RCS. Each - developer had a copy of every project file, and could modify - their copies independently. They had to merge their edits prior - to committing changes to the central repository.</para> - - <para>Brian Berliner took Grune's original scripts and rewrote - them in C, releasing in 1989 the code that has since developed - into the modern version of CVS. CVS subsequently acquired the - ability to operate over a network connection, giving it a - client/server architecture. CVS's architecture is centralised; - only the server has a copy of the history of the project. Client - workspaces just contain copies of recent versions of the - project's files, and a little metadata to tell them where the - server is. CVS has been enormously successful; it is probably - the world's most widely used revision control system.</para> - - <para>In the early 1990s, Sun Microsystems developed an early - distributed revision control system, called TeamWare. 
A - TeamWare workspace contains a complete copy of the project's - history. TeamWare has no notion of a central repository. (CVS - relied upon RCS for its history storage; TeamWare used - SCCS.)</para> - - <para>As the 1990s progressed, awareness grew of a number of - problems with CVS. It records simultaneous changes to multiple - files individually, instead of grouping them together as a - single logically atomic operation. It does not manage its file - hierarchy well; it is easy to make a mess of a repository by - renaming files and directories. Worse, its source code is - difficult to read and maintain, which made the <quote>pain - level</quote> of fixing these architectural problems - prohibitive.</para> - - <para>In 2001, Jim Blandy and Karl Fogel, two developers who had - worked on CVS, started a project to replace it with a tool that - would have a better architecture and cleaner code. The result, - Subversion, does not stray from CVS's centralised client/server - model, but it adds multi-file atomic commits, better namespace - management, and a number of other features that make it a - generally better tool than CVS. Since its initial release, it - has rapidly grown in popularity.</para> - - <para>More or less simultaneously, Graydon Hoare began working on - an ambitious distributed revision control system that he named - Monotone. While Monotone addresses many of CVS's design flaws - and has a peer-to-peer architecture, it goes beyond earlier (and - subsequent) revision control tools in a number of innovative - ways. It uses cryptographic hashes as identifiers, and has an - integral notion of <quote>trust</quote> for code from different - sources.</para> - - <para>Mercurial began life in 2005. 
While a few aspects of its - design are influenced by Monotone, Mercurial focuses on ease of - use, high performance, and scalability to very large - projects.</para> - - </sect1> - <sect1> - <title>Trends in revision control</title> - - <para>There has been an unmistakable trend in the development and - use of revision control tools over the past four decades, as - people have become familiar with the capabilities of their tools - and constrained by their limitations.</para> - - <para>The first generation began by managing single files on - individual computers. Although these tools represented a huge - advance over ad-hoc manual revision control, their locking model - and reliance on a single computer limited them to small, - tightly-knit teams.</para> - - <para>The second generation loosened these constraints by moving - to network-centered architectures, and managing entire projects - at a time. As projects grew larger, they ran into new problems. - With clients needing to talk to servers very frequently, server - scaling became an issue for large projects. An unreliable - network connection could prevent remote users from being able to - talk to the server at all. As open source projects started - making read-only access available anonymously to anyone, people - without commit privileges found that they could not use the - tools to interact with a project in a natural way, as they could - not record their changes.</para> - - <para>The current generation of revision control tools is - peer-to-peer in nature. All of these systems have dropped the - dependency on a single central server, and allow people to - distribute their revision control data to where it's actually - needed. Collaboration over the Internet has moved from - constrained by technology to a matter of choice and consensus. 
- Modern tools can operate offline indefinitely and autonomously, - with a network connection only needed when syncing changes with - another repository.</para> - - </sect1> - <sect1> - <title>A few of the advantages of distributed revision - control</title> - - <para>Even though distributed revision control tools have for - several years been as robust and usable as their - previous-generation counterparts, people using older tools have - not yet necessarily woken up to their advantages. There are a - number of ways in which distributed tools shine relative to - centralised ones.</para> - - <para>For an individual developer, distributed tools are almost - always much faster than centralised tools. This is for a simple - reason: a centralised tool needs to talk over the network for - many common operations, because most metadata is stored in a - single copy on the central server. A distributed tool stores - all of its metadata locally. All else being equal, talking over - the network adds overhead to a centralised tool. Don't - underestimate the value of a snappy, responsive tool: you're - going to spend a lot of time interacting with your revision - control software.</para> - - <para>Distributed tools are indifferent to the vagaries of your - server infrastructure, again because they replicate metadata to - so many locations. If you use a centralised system and your - server catches fire, you'd better hope that your backup media - are reliable, and that your last backup was recent and actually - worked. With a distributed tool, you have many backups - available on every contributor's computer.</para> - - <para>The reliability of your network will affect distributed - tools far less than it will centralised tools. You can't even - use a centralised tool without a network connection, except for - a few highly constrained commands. With a distributed tool, if - your network connection goes down while you're working, you may - not even notice. 
The only thing you won't be able to do is talk - to repositories on other computers, something that is relatively - rare compared with local operations. If you have a far-flung - team of collaborators, this may be significant.</para> - - <sect2> - <title>Advantages for open source projects</title> - - <para>If you take a shine to an open source project and decide - that you would like to start hacking on it, and that project - uses a distributed revision control tool, you are at once a - peer with the people who consider themselves the - <quote>core</quote> of that project. If they publish their - repositories, you can immediately copy their project history, - start making changes, and record your work, using the same - tools in the same ways as insiders. By contrast, with a - centralised tool, you must use the software in a <quote>read - only</quote> mode unless someone grants you permission to - commit changes to their central server. Until then, you won't - be able to record changes, and your local modifications will - be at risk of corruption any time you try to update your - client's view of the repository.</para> - - <sect3> - <title>The forking non-problem</title> - - <para>It has been suggested that distributed revision control - tools pose some sort of risk to open source projects because - they make it easy to <quote>fork</quote> the development of - a project. A fork happens when there are differences in - opinion or attitude between groups of developers that cause - them to decide that they can't work together any longer. - Each side takes a more or less complete copy of the - project's source code, and goes off in its own - direction.</para> - - <para>Sometimes the camps in a fork decide to reconcile their - differences. With a centralised revision control system, the - <emphasis>technical</emphasis> process of reconciliation is - painful, and has to be performed largely by hand. 
You have - to decide whose revision history is going to - <quote>win</quote>, and graft the other team's changes into - the tree somehow. This usually loses some or all of one - side's revision history.</para> - - <para>What distributed tools do with respect to forking is - they make forking the <emphasis>only</emphasis> way to - develop a project. Every single change that you make is - potentially a fork point. The great strength of this - approach is that a distributed revision control tool has to - be really good at <emphasis>merging</emphasis> forks, - because forks are absolutely fundamental: they happen all - the time.</para> - - <para>If every piece of work that everybody does, all the - time, is framed in terms of forking and merging, then what - the open source world refers to as a <quote>fork</quote> - becomes <emphasis>purely</emphasis> a social issue. If - anything, distributed tools <emphasis>lower</emphasis> the - likelihood of a fork:</para> - <itemizedlist> - <listitem><para>They eliminate the social distinction that - centralised tools impose: that between insiders (people - with commit access) and outsiders (people - without).</para></listitem> - <listitem><para>They make it easier to reconcile after a - social fork, because all that's involved from the - perspective of the revision control software is just - another merge.</para></listitem></itemizedlist> - - <para>Some people resist distributed tools because they want - to retain tight control over their projects, and they - believe that centralised tools give them this control. - However, if you're of this belief, and you publish your CVS - or Subversion repositories publicly, there are plenty of - tools available that can pull out your entire project's - history (albeit slowly) and recreate it somewhere that you - don't control. 
So while your control in this case is - illusory, you are forgoing the ability to fluidly - collaborate with whatever people feel compelled to mirror - and fork your history.</para> - - </sect3> - </sect2> - <sect2> - <title>Advantages for commercial projects</title> - - <para>Many commercial projects are undertaken by teams that are - scattered across the globe. Contributors who are far from a - central server will see slower command execution and perhaps - less reliability. Commercial revision control systems attempt - to ameliorate these problems with remote-site replication - add-ons that are typically expensive to buy and cantankerous - to administer. A distributed system doesn't suffer from these - problems in the first place. Better yet, you can easily set - up multiple authoritative servers, say one per site, so that - there's no redundant communication between repositories over - expensive long-haul network links.</para> - - <para>Centralised revision control systems tend to have - relatively low scalability. It's not unusual for an expensive - centralised system to fall over under the combined load of - just a few dozen concurrent users. Once again, the typical - response tends to be an expensive and clunky replication - facility. Since the load on a central server---if you have - one at all---is many times lower with a distributed tool - (because all of the data is replicated everywhere), a single - cheap server can handle the needs of a much larger team, and - replication to balance load becomes a simple matter of - scripting.</para> - - <para>If you have an employee in the field, troubleshooting a - problem at a customer's site, they'll benefit from distributed - revision control. 
The tool will let them generate custom - builds, try different fixes in isolation from each other, and - search efficiently through history for the sources of bugs and - regressions in the customer's environment, all without needing - to connect to your company's network.</para> - - </sect2> - </sect1> - <sect1> - <title>Why choose Mercurial?</title> - - <para>Mercurial has a unique set of properties that make it a - particularly good choice as a revision control system.</para> - <itemizedlist> - <listitem><para>It is easy to learn and use.</para></listitem> - <listitem><para>It is lightweight.</para></listitem> - <listitem><para>It scales excellently.</para></listitem> - <listitem><para>It is easy to - customise.</para></listitem></itemizedlist> - - <para>If you are at all familiar with revision control systems, - you should be able to get up and running with Mercurial in less - than five minutes. Even if not, it will take no more than a few - minutes longer. Mercurial's command and feature sets are - generally uniform and consistent, so you can keep track of a few - general rules instead of a host of exceptions.</para> - - <para>On a small project, you can start working with Mercurial in - moments. Creating new changes and branches; transferring changes - around (whether locally or over a network); and history and - status operations are all fast. Mercurial attempts to stay - nimble and largely out of your way by combining low cognitive - overhead with blazingly fast operations.</para> - - <para>The usefulness of Mercurial is not limited to small - projects: it is used by projects with hundreds to thousands of - contributors, each containing tens of thousands of files and - hundreds of megabytes of source code.</para> - - <para>If the core functionality of Mercurial is not enough for - you, it's easy to build on. 
Mercurial is well suited to - scripting tasks, and its clean internals and implementation in - Python make it easy to add features in the form of extensions. - There are a number of popular and useful extensions already - available, ranging from helping to identify bugs to improving - performance.</para> - - </sect1> - <sect1> - <title>Mercurial compared with other tools</title> - - <para>Before you read on, please understand that this section - necessarily reflects my own experiences, interests, and (dare I - say it) biases. I have used every one of the revision control - tools listed below, in most cases for several years at a - time.</para> - - - <sect2> - <title>Subversion</title> - - <para>Subversion is a popular revision control tool, developed - to replace CVS. It has a centralised client/server - architecture.</para> - - <para>Subversion and Mercurial have similarly named commands for - performing the same operations, so if you're familiar with - one, it is easy to learn to use the other. Both tools are - portable to all popular operating systems.</para> - - <para>Prior to version 1.5, Subversion had no useful support for - merges. At the time of writing, its merge tracking capability - is new, and known to be <ulink - url="http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword">complicated - and buggy</ulink>.</para> - - <para>Mercurial has a substantial performance advantage over - Subversion on every revision control operation I have - benchmarked. I have measured its advantage as ranging from a - factor of two to a factor of six when compared with Subversion - 1.4.3's <emphasis>ra_local</emphasis> file store, which is the - fastest access method available. In more realistic - deployments involving a network-based store, Subversion will - be at a substantially larger disadvantage. 
Because many - Subversion commands must talk to the server and Subversion - does not have useful replication facilities, server capacity - and network bandwidth become bottlenecks for modestly large - projects.</para> - - <para>Additionally, Subversion incurs substantial storage - overhead to avoid network transactions for a few common - operations, such as finding modified files - (<literal>status</literal>) and displaying modifications - against the current revision (<literal>diff</literal>). As a - result, a Subversion working copy is often the same size as, - or larger than, a Mercurial repository and working directory, - even though the Mercurial repository contains a complete - history of the project.</para> - - <para>Subversion is widely supported by third party tools. - Mercurial currently lags considerably in this area. This gap - is closing, however, and indeed some of Mercurial's GUI tools - now outshine their Subversion equivalents. Like Mercurial, - Subversion has an excellent user manual.</para> - - <para>Because Subversion doesn't store revision history on the - client, it is well suited to managing projects that deal with - lots of large, opaque binary files. If you check in fifty - revisions to an incompressible 10MB file, Subversion's - client-side space usage stays constant. The space used by any - distributed SCM will grow rapidly in proportion to the number - of revisions, because the differences between each revision - are large.</para> - - <para>In addition, it's often difficult or, more usually, - impossible to merge different versions of a binary file. - Subversion's ability to let a user lock a file, so that they - temporarily have the exclusive right to commit changes to it, - can be a significant advantage to a project where binary files - are widely used.</para> - - <para>Mercurial can import revision history from a Subversion - repository. It can also export revision history to a - Subversion repository. 
This makes it easy to <quote>test the - waters</quote> and use Mercurial and Subversion in parallel - before deciding to switch. History conversion is incremental, - so you can perform an initial conversion, then small - additional conversions afterwards to bring in new - changes.</para> - - - </sect2> - <sect2> - <title>Git</title> - - <para>Git is a distributed revision control tool that was - developed for managing the Linux kernel source tree. Like - Mercurial, its early design was somewhat influenced by - Monotone.</para> - - <para>Git has a very large command set, with version 1.5.0 - providing 139 individual commands. It has something of a - reputation for being difficult to learn. Compared to Git, - Mercurial has a strong focus on simplicity.</para> - - <para>In terms of performance, Git is extremely fast. In - several cases, it is faster than Mercurial, at least on Linux, - while Mercurial performs better on other operations. However, - on Windows, the performance and general level of support that - Git provides is, at the time of writing, far behind that of - Mercurial.</para> - - <para>While a Mercurial repository needs no maintenance, a Git - repository requires frequent manual <quote>repacks</quote> of - its metadata. Without these, performance degrades, while - space usage grows rapidly. A server that contains many Git - repositories that are not rigorously and frequently repacked - will become heavily disk-bound during backups, and there have - been instances of daily backups taking far longer than 24 - hours as a result. A freshly packed Git repository is - slightly smaller than a Mercurial repository, but an unpacked - repository is several orders of magnitude larger.</para> - - <para>The core of Git is written in C. Many Git commands are - implemented as shell or Perl scripts, and the quality of these - scripts varies widely. 
I have encountered several instances - where scripts charged along blindly in the presence of errors - that should have been fatal.</para> - - <para>Mercurial can import revision history from a Git - repository.</para> - - - </sect2> - <sect2> - <title>CVS</title> - - <para>CVS is probably the most widely used revision control tool - in the world. Due to its age and internal untidiness, it has - been only lightly maintained for many years.</para> - - <para>It has a centralised client/server architecture. It does - not group related file changes into atomic commits, making it - easy for people to <quote>break the build</quote>: one person - can successfully commit part of a change and then be blocked - by the need for a merge, causing other people to see only a - portion of the work they intended to do. This also affects - how you work with project history. If you want to see all of - the modifications someone made as part of a task, you will - need to manually inspect the descriptions and timestamps of - the changes made to each file involved (if you even know what - those files were).</para> - - <para>CVS has a muddled notion of tags and branches that I will - not attempt to even describe. It does not support renaming of - files or directories well, making it easy to corrupt a - repository. It has almost no internal consistency checking - capabilities, so it is usually not even possible to tell - whether or how a repository is corrupt. I would not recommend - CVS for any project, existing or new.</para> - - <para>Mercurial can import CVS revision history. However, there - are a few caveats that apply; these are true of every other - revision control tool's CVS importer, too. Due to CVS's lack - of atomic changes and unversioned filesystem hierarchy, it is - not possible to reconstruct CVS history completely accurately; - some guesswork is involved, and renames will usually not show - up. 
Because a lot of advanced CVS administration has to be - done by hand and is hence error-prone, it's common for CVS - importers to run into multiple problems with corrupted - repositories (completely bogus revision timestamps and files - that have remained locked for over a decade are just two of - the less interesting problems I can recall from personal - experience).</para> - - <para>Mercurial can import revision history from a CVS - repository.</para> - - - </sect2> - <sect2> - <title>Commercial tools</title> - - <para>Perforce has a centralised client/server architecture, - with no client-side caching of any data. Unlike modern - revision control tools, Perforce requires that a user run a - command to inform the server about every file they intend to - edit.</para> - - <para>The performance of Perforce is quite good for small teams, - but it falls off rapidly as the number of users grows beyond a - few dozen. Modestly large Perforce installations require the - deployment of proxies to cope with the load their users - generate.</para> - - - </sect2> - <sect2> - <title>Choosing a revision control tool</title> - - <para>With the exception of CVS, all of the tools listed above - have unique strengths that suit them to particular styles of - work. There is no single revision control tool that is best - in all situations.</para> - - <para>As an example, Subversion is a good choice for working - with frequently edited binary files, due to its centralised - nature and support for file locking.</para> - - <para>I personally find Mercurial's properties of simplicity, - performance, and good merge support to be a compelling - combination that has served me well for several years.</para> - - - </sect2> - </sect1> - <sect1> - <title>Switching from another tool to Mercurial</title> - - <para>Mercurial is bundled with an extension named <literal - role="hg-ext">convert</literal>, which can incrementally - import revision history from several other revision control - tools. 
By <quote>incremental</quote>, I mean that you can - convert all of a project's history to date in one go, then rerun - the conversion later to obtain new changes that happened after - the initial conversion.</para> - - <para>The revision control tools supported by <literal - role="hg-ext">convert</literal> are as follows:</para> - <itemizedlist> - <listitem><para>Subversion</para></listitem> - <listitem><para>CVS</para></listitem> - <listitem><para>Git</para></listitem> - <listitem><para>Darcs</para></listitem></itemizedlist> - - <para>In addition, <literal role="hg-ext">convert</literal> can - export changes from Mercurial to Subversion. This makes it - possible to try Subversion and Mercurial in parallel before - committing to a switchover, without risking the loss of any - work.</para> - - <para>The <command role="hg-ext-conver">convert</command> command - is easy to use. Simply point it at the path or URL of the - source repository, optionally give it the name of the - destination repository, and it will start working. After the - initial conversion, just run the same command again to import - new changes.</para> - </sect1> -</chapter> - -<!-- -local variables: -sgml-parent-document: ("00book.xml" "book" "chapter") -end: --->
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch01-tour-basic.xml Thu Mar 19 20:54:12 2009 -0700 @@ -0,0 +1,860 @@ +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> + +<chapter id="chap:tour-basic"> + <?dbhtml filename="a-tour-of-mercurial-the-basics.html"?> + <title>A tour of Mercurial: the basics</title> + + <sect1 id="sec:tour:install"> + <title>Installing Mercurial on your system</title> + + <para>Prebuilt binary packages of Mercurial are available for + every popular operating system. These make it easy to start + using Mercurial on your computer immediately.</para> + + <sect2> + <title>Linux</title> + + <para>Because each Linux distribution has its own packaging + tools, policies, and rate of development, it's difficult to + give a comprehensive set of instructions on how to install + Mercurial binaries. The version of Mercurial that you will + end up with can vary depending on how active the person is who + maintains the package for your distribution.</para> + + <para>To keep things simple, I will focus on installing + Mercurial from the command line under the most popular Linux + distributions. Most of these distributions provide graphical + package managers that will let you install Mercurial with a + single click; the package name to look for is + <literal>mercurial</literal>.</para> + + <itemizedlist> + <listitem><para>Debian:</para> + <programlisting>apt-get install mercurial</programlisting></listitem> + <listitem><para>Fedora:</para> + <programlisting>yum install mercurial</programlisting></listitem> + <listitem><para>Gentoo:</para> + <programlisting>emerge mercurial</programlisting></listitem> + <listitem><para>OpenSUSE:</para> + <programlisting>zypper install mercurial</programlisting></listitem> + <listitem><para>Ubuntu: Ubuntu's Mercurial package is based on + Debian's. 
To install it, run the following + command.</para> + <programlisting>apt-get install mercurial</programlisting></listitem> + </itemizedlist> + + </sect2> + <sect2> + <title>Solaris</title> + + <para>SunFreeWare, at <ulink + url="http://www.sunfreeware.com">http://www.sunfreeware.com</ulink>, + is a good source for a large number of pre-built Solaris + packages for 32 and 64 bit Intel and Sparc architectures, + including current versions of Mercurial.</para> + + </sect2> + <sect2> + <title>Mac OS X</title> + + <para>Lee Cantey publishes an installer of Mercurial for Mac OS + X at <ulink + url="http://mercurial.berkwood.com">http://mercurial.berkwood.com</ulink>. + This package works on both Intel- and Power-based Macs. Before + you can use it, you must install a compatible version of + Universal MacPython <citation>web:macpython</citation>. This + is easy to do; simply follow the instructions on Lee's + site.</para> + + <para>It's also possible to install Mercurial using Fink or + MacPorts, two popular free package managers for Mac OS X. If + you have Fink, use <command>sudo apt-get install + mercurial-py25</command>. If MacPorts, <command>sudo port + install mercurial</command>.</para> + + </sect2> + <sect2> + <title>Windows</title> + + <para>Lee Cantey publishes an installer of Mercurial for Windows + at <ulink + url="http://mercurial.berkwood.com">http://mercurial.berkwood.com</ulink>. + This package has no external dependencies; it <quote>just + works</quote>.</para> + + <note> + <para> The Windows version of Mercurial does not + automatically convert line endings between Windows and Unix + styles. If you want to share work with Unix users, you must + do a little additional configuration work. XXX Flesh this + out.</para> + </note> + + </sect2> + </sect1> + <sect1> + <title>Getting started</title> + + <para>To begin, we'll use the <command role="hg-cmd">hg + version</command> command to find out whether Mercurial is + actually installed properly. 
The actual version information + that it prints isn't so important; it's whether it prints + anything at all that we care about.</para> + + &interaction.tour.version; + + <sect2> + <title>Built-in help</title> + + <para>Mercurial provides a built-in help system. This is + invaluable for those times when you find yourself stuck + trying to remember how to run a command. If you are + completely stuck, simply run <command role="hg-cmd">hg + help</command>; it will print a brief list of commands, + along with a description of what each does. If you ask for + help on a specific command (as below), it prints more + detailed information.</para> + + &interaction.tour.help; + + <para>For a more impressive level of detail (which you won't + usually need) run <command role="hg-cmd">hg help <option + role="hg-opt-global">-v</option></command>. The <option + role="hg-opt-global">-v</option> option is short for + <option role="hg-opt-global">--verbose</option>, and tells + Mercurial to print more information than it usually + would.</para> + + </sect2> + </sect1> + <sect1> + <title>Working with a repository</title> + + <para>In Mercurial, everything happens inside a + <emphasis>repository</emphasis>. The repository for a project + contains all of the files that <quote>belong to</quote> that + project, along with a historical record of the project's + files.</para> + + <para>There's nothing particularly magical about a repository; it + is simply a directory tree in your filesystem that Mercurial + treats as special. You can rename or delete a repository any + time you like, using either the command line or your file + browser.</para> + + <sect2> + <title>Making a local copy of a repository</title> + + <para><emphasis>Copying</emphasis> a repository is just a little + bit special. While you could use a normal file copying + command to make a copy of a repository, it's best to use a + built-in command that Mercurial provides. 
This command is + called <command role="hg-cmd">hg clone</command>, because it + creates an identical copy of an existing repository.</para> + + &interaction.tour.clone; + + <para>If our clone succeeded, we should now have a local + directory called <filename class="directory">hello</filename>. + This directory will contain some files.</para> + + &interaction.tour.ls; + + <para>These files have the same contents and history in our + repository as they do in the repository we cloned.</para> + + <para>Every Mercurial repository is complete, self-contained, + and independent. It contains its own private copy of a + project's files and history. A cloned repository remembers + the location of the repository it was cloned from, but it does + not communicate with that repository, or any other, unless you + tell it to.</para> + + <para>What this means for now is that we're free to experiment + with our repository, safe in the knowledge that it's a private + <quote>sandbox</quote> that won't affect anyone else.</para> + + </sect2> + <sect2> + <title>What's in a repository?</title> + + <para>When we take a more detailed look inside a repository, we + can see that it contains a directory named <filename + class="directory">.hg</filename>. This is where Mercurial + keeps all of its metadata for the repository.</para> + + &interaction.tour.ls-a; + + <para>The contents of the <filename + class="directory">.hg</filename> directory and its + subdirectories are private to Mercurial. Every other file and + directory in the repository is yours to do with as you + please.</para> + + <para>To introduce a little terminology, the <filename + class="directory">.hg</filename> directory is the + <quote>real</quote> repository, and all of the files and + directories that coexist with it are said to live in the + <emphasis>working directory</emphasis>. 
An easy way to + remember the distinction is that the + <emphasis>repository</emphasis> contains the + <emphasis>history</emphasis> of your project, while the + <emphasis>working directory</emphasis> contains a + <emphasis>snapshot</emphasis> of your project at a particular + point in history.</para> + + </sect2> + </sect1> + <sect1> + <title>A tour through history</title> + + <para>One of the first things we might want to do with a new, + unfamiliar repository is understand its history. The <command + role="hg-cmd">hg log</command> command gives us a view of + history.</para> + + &interaction.tour.log; + + <para>By default, this command prints a brief paragraph of output + for each change to the project that was recorded. In Mercurial + terminology, we call each of these recorded events a + <emphasis>changeset</emphasis>, because it can contain a record + of changes to several files.</para> + + <para>The fields in a record of output from <command + role="hg-cmd">hg log</command> are as follows.</para> + <itemizedlist> + <listitem><para><literal>changeset</literal>: This field has the + format of a number, followed by a colon, followed by a + hexadecimal string. These are + <emphasis>identifiers</emphasis> for the changeset. There + are two identifiers because the number is shorter and easier + to type than the hex string.</para></listitem> + <listitem><para><literal>user</literal>: The identity of the + person who created the changeset. This is a free-form + field, but it most often contains a person's name and email + address.</para></listitem> + <listitem><para><literal>date</literal>: The date and time on + which the changeset was created, and the timezone in which + it was created. 
(The date and time are local to that + timezone; they display what time and date it was for the + person who created the changeset.)</para></listitem> + <listitem><para><literal>summary</literal>: The first line of + the text message that the creator of the changeset entered + to describe the changeset.</para></listitem></itemizedlist> + <para>The default output printed by <command role="hg-cmd">hg + log</command> is purely a summary; it is missing a lot of + detail.</para> + + <para>Figure <xref linkend="fig:tour-basic:history"/> provides a + graphical representation of the history of the <filename + class="directory">hello</filename> repository, to make it a + little easier to see which direction history is + <quote>flowing</quote> in. We'll be returning to this figure + several times in this chapter and the chapter that + follows.</para> + + <informalfigure id="fig:tour-basic:history"> + <mediaobject> + <imageobject><imagedata fileref="tour-history"/></imageobject> + <textobject><phrase>XXX add text</phrase></textobject> + <caption><para>Graphical history of the <filename + class="directory">hello</filename> + repository</para></caption> + </mediaobject> + </informalfigure> + + <sect2> + <title>Changesets, revisions, and talking to other + people</title> + + <para>As English is a notoriously sloppy language, and computer + science has a hallowed history of terminological confusion + (why use one term when four will do?), revision control has a + variety of words and phrases that mean the same thing. 
If you + are talking about Mercurial history with other people, you + will find that the word <quote>changeset</quote> is often + compressed to <quote>change</quote> or (when written) + <quote>cset</quote>, and sometimes a changeset is referred to + as a <quote>revision</quote> or a <quote>rev</quote>.</para> + + <para>While it doesn't matter what <emphasis>word</emphasis> you + use to refer to the concept of <quote>a changeset</quote>, the + <emphasis>identifier</emphasis> that you use to refer to + <quote>a <emphasis>specific</emphasis> changeset</quote> is of + great importance. Recall that the <literal>changeset</literal> + field in the output from <command role="hg-cmd">hg + log</command> identifies a changeset using both a number and + a hexadecimal string.</para> + <itemizedlist> + <listitem><para>The revision number is <emphasis>only valid in + that repository</emphasis>,</para></listitem> + <listitem><para>while the hex string is the + <emphasis>permanent, unchanging identifier</emphasis> that + will always identify that exact changeset in + <emphasis>every</emphasis> copy of the + repository.</para></listitem></itemizedlist> + <para>This distinction is important. If you send someone an + email talking about <quote>revision 33</quote>, there's a high + likelihood that their revision 33 will <emphasis>not be the + same</emphasis> as yours. The reason for this is that a + revision number depends on the order in which changes arrived + in a repository, and there is no guarantee that the same + changes will happen in the same order in different + repositories. Three changes $a,b,c$ can easily appear in one + repository as $0,1,2$, while in another as $1,0,2$.</para> + + <para>Mercurial uses revision numbers purely as a convenient + shorthand. 
If you need to discuss a changeset with someone, + or make a record of a changeset for some other reason (for + example, in a bug report), use the hexadecimal + identifier.</para> + + </sect2> + <sect2> + <title>Viewing specific revisions</title> + + <para>To narrow the output of <command role="hg-cmd">hg + log</command> down to a single revision, use the <option + role="hg-opt-log">-r</option> (or <option + role="hg-opt-log">--rev</option>) option. You can use + either a revision number or a long-form changeset identifier, + and you can provide as many revisions as you want.</para> + + &interaction.tour.log-r; + + <para>If you want to see the history of several revisions + without having to list each one, you can use <emphasis>range + notation</emphasis>; this lets you express the idea <quote>I + want all revisions between <literal>abc</literal> and + <literal>def</literal>, inclusive</quote>.</para> + + &interaction.tour.log.range; + + <para>Mercurial also honours the order in which you specify + revisions, so <command role="hg-cmd">hg log -r 2:4</command> + prints 2, 3, and 4, while <command role="hg-cmd">hg log -r + 4:2</command> prints 4, 3, and 2.</para> + + </sect2> + <sect2> + <title>More detailed information</title> + + <para>While the summary information printed by <command + role="hg-cmd">hg log</command> is useful if you already know + what you're looking for, you may need to see a complete + description of the change, or a list of the files changed, if + you're trying to decide whether a changeset is the one you're + looking for. The <command role="hg-cmd">hg log</command> + command's <option role="hg-opt-global">-v</option> (or <option + role="hg-opt-global">--verbose</option>) option gives you + this extra detail.</para> + + &interaction.tour.log-v; + + <para>If you want to see both the description and content of a + change, add the <option role="hg-opt-log">-p</option> (or + <option role="hg-opt-log">--patch</option>) option. 
This + displays the content of a change as a <emphasis>unified + diff</emphasis> (if you've never seen a unified diff before, + see section <xref linkend="sec:mq:patch"/> for an + overview).</para> + + &interaction.tour.log-vp; + + </sect2> + </sect1> + <sect1> + <title>All about command options</title> + + <para>Let's take a brief break from exploring Mercurial commands + to discuss a pattern in the way that they work; you may find + this useful to keep in mind as we continue our tour.</para> + + <para>Mercurial has a consistent and straightforward approach to + dealing with the options that you can pass to commands. It + follows the conventions for options that are common to modern + Linux and Unix systems.</para> + <itemizedlist> + <listitem><para>Every option has a long name. For example, as + we've already seen, the <command role="hg-cmd">hg + log</command> command accepts a <option + role="hg-opt-log">--rev</option> option.</para></listitem> + <listitem><para>Most options have short names, too. Instead of + <option role="hg-opt-log">--rev</option>, we can use <option + role="hg-opt-log">-r</option>. (The reason that some + options don't have short names is that the options in + question are rarely used.)</para></listitem> + <listitem><para>Long options start with two dashes (e.g. <option + role="hg-opt-log">--rev</option>), while short options + start with one (e.g. <option + role="hg-opt-log">-r</option>).</para></listitem> + <listitem><para>Option naming and usage is consistent across + commands. For example, every command that lets you specify + a changeset ID or revision number accepts both <option + role="hg-opt-log">-r</option> and <option + role="hg-opt-log">--rev</option> + arguments.</para></listitem></itemizedlist> + <para>In the examples throughout this book, I use short options + instead of long. 
This just reflects my own preference, so don't + read anything significant into it.</para> + + <para>Most commands that print output of some kind will print more + output when passed a <option role="hg-opt-global">-v</option> + (or <option role="hg-opt-global">--verbose</option>) option, and + less when passed <option role="hg-opt-global">-q</option> (or + <option role="hg-opt-global">--quiet</option>).</para> + + </sect1> + <sect1> + <title>Making and reviewing changes</title> + + <para>Now that we have a grasp of viewing history in Mercurial, + let's take a look at making some changes and examining + them.</para> + + <para>The first thing we'll do is isolate our experiment in a + repository of its own. We use the <command role="hg-cmd">hg + clone</command> command, but we don't need to clone a copy of + the remote repository. Since we already have a copy of it + locally, we can just clone that instead. This is much faster + than cloning over the network, and cloning a local repository + uses less disk space in most cases, too.</para> + + &interaction.tour.reclone; + + <para>As an aside, it's often good practice to keep a + <quote>pristine</quote> copy of a remote repository around, + which you can then make temporary clones of to create sandboxes + for each task you want to work on. This lets you work on + multiple tasks in parallel, each isolated from the others until + it's complete and you're ready to integrate it back. Because + local clones are so cheap, there's almost no overhead to cloning + and destroying repositories whenever you want.</para> + + <para>In our <filename class="directory">my-hello</filename> + repository, we have a file <filename>hello.c</filename> that + contains the classic <quote>hello, world</quote> program. Let's + use the ancient and venerable <command>sed</command> command to + edit this file so that it prints a second line of output. 
(I'm + only using <command>sed</command> to do this because it's easy + to write a scripted example this way. Since you're not under + the same constraint, you probably won't want to use + <command>sed</command>; simply use your preferred text editor to + do the same thing.)</para> + + &interaction.tour.sed; + + <para>Mercurial's <command role="hg-cmd">hg status</command> + command will tell us what Mercurial knows about the files in the + repository.</para> + + &interaction.tour.status; + + <para>The <command role="hg-cmd">hg status</command> command + prints no output for some files, but a line starting with + <quote><literal>M</literal></quote> for + <filename>hello.c</filename>. Unless you tell it to, <command + role="hg-cmd">hg status</command> will not print any output + for files that have not been modified.</para> + + <para>The <quote><literal>M</literal></quote> indicates that + Mercurial has noticed that we modified + <filename>hello.c</filename>. We didn't need to + <emphasis>inform</emphasis> Mercurial that we were going to + modify the file before we started, or that we had modified the + file after we were done; it was able to figure this out + itself.</para> + + <para>It's a little bit helpful to know that we've modified + <filename>hello.c</filename>, but we might prefer to know + exactly <emphasis>what</emphasis> changes we've made to it. 
To + do this, we use the <command role="hg-cmd">hg diff</command> + command.</para> + + &interaction.tour.diff; + + </sect1> + <sect1> + <title>Recording changes in a new changeset</title> + + <para>We can modify files, build and test our changes, and use + <command role="hg-cmd">hg status</command> and <command + role="hg-cmd">hg diff</command> to review our changes, until + we're satisfied with what we've done and arrive at a natural + stopping point where we want to record our work in a new + changeset.</para> + + <para>The <command role="hg-cmd">hg commit</command> command lets + us create a new changeset; we'll usually refer to this as + <quote>making a commit</quote> or + <quote>committing</quote>.</para> + + <sect2> + <title>Setting up a username</title> + + <para>When you try to run <command role="hg-cmd">hg + commit</command> for the first time, it is not guaranteed to + succeed. Mercurial records your name and address with each + change that you commit, so that you and others will later be + able to tell who made each change. Mercurial tries to + automatically figure out a sensible username to commit the + change with. It will attempt each of the following methods, + in order:</para> + <orderedlist> + <listitem><para>If you specify a <option + role="hg-opt-commit">-u</option> option to the <command + role="hg-cmd">hg commit</command> command on the command + line, followed by a username, this is always given the + highest precedence.</para></listitem> + <listitem><para>If you have set the <envar>HGUSER</envar> + environment variable, this is checked + next.</para></listitem> + <listitem><para>If you create a file in your home directory + called <filename role="special">.hgrc</filename>, with a + <envar role="rc-item-ui">username</envar> entry, that will + be used next. 
To see what the contents of this file + should look like, refer to section <xref + linkend="sec:tour-basic:username"/> + below.</para></listitem> + <listitem><para>If you have set the <envar>EMAIL</envar> + environment variable, this will be used + next.</para></listitem> + <listitem><para>Mercurial will query your system to find out + your local user name and host name, and construct a + username from these components. Since this often results + in a username that is not very useful, it will print a + warning if it has to do + this.</para></listitem> + </orderedlist> + <para>If all of these mechanisms fail, Mercurial will + fail, printing an error message. In this case, it will not + let you commit until you set up a + username.</para> + <para>You should think of the <envar>HGUSER</envar> environment + variable and the <option role="hg-opt-commit">-u</option> + option to the <command role="hg-cmd">hg commit</command> + command as ways to <emphasis>override</emphasis> Mercurial's + default selection of username. For normal use, the simplest + and most robust way to set a username for yourself is by + creating a <filename role="special">.hgrc</filename> file; see + below for details.</para> + <sect3 id="sec:tour-basic:username"> + <title>Creating a Mercurial configuration file</title> + + <para>To set a user name, use your favourite editor + to create a file called <filename + role="special">.hgrc</filename> in your home directory. + Mercurial will use this file to look up your personalised + configuration settings. The initial contents of your + <filename role="special">.hgrc</filename> should look like + this.</para> + <programlisting># This is a Mercurial configuration file. 
+[ui] +username = Firstname Lastname <email.address@domain.net></programlisting> + + <para>The <quote><literal>[ui]</literal></quote> line begins a + <emphasis>section</emphasis> of the config file, so you can + read the <quote><literal>username = ...</literal></quote> + line as meaning <quote>set the value of the + <literal>username</literal> item in the + <literal>ui</literal> section</quote>. A section continues + until a new section begins, or the end of the file. + Mercurial ignores empty lines and treats any text from + <quote><literal>#</literal></quote> to the end of a line as + a comment.</para> + </sect3> + + <sect3> + <title>Choosing a user name</title> + + <para>You can use any text you like as the value of + the <literal>username</literal> config item, since this + information is for reading by other people, but not for + interpreting by Mercurial. The convention that most + people follow is to use their name and email address, as + in the example above.</para> + <note> + <para>Mercurial's built-in web server obfuscates + email addresses, to make it more difficult for the email + harvesting tools that spammers use. This reduces the + likelihood that you'll start receiving more junk email + if you publish a Mercurial repository on the + web.</para></note> + + </sect3> + </sect2> + <sect2> + <title>Writing a commit message</title> + + <para>When we commit a change, Mercurial drops us into + a text editor, to enter a message that will describe the + modifications we've made in this changeset. This is called + the <emphasis>commit message</emphasis>. 
It will be a + record for readers of what we did and why, and it will be + printed by <command role="hg-cmd">hg log</command> after + we've finished committing.</para> + + &interaction.tour.commit; + + <para>The editor that the <command role="hg-cmd">hg + commit</command> command drops us into will contain an + empty line, followed by a number of lines starting with + <quote><literal>HG:</literal></quote>.</para> + + <programlisting>XXX fix this XXX</programlisting> + + <para>Mercurial ignores the lines that start with + <quote><literal>HG:</literal></quote>; it uses them only to + tell us which files it's recording changes to. Modifying or + deleting these lines has no effect.</para> + </sect2> + <sect2> + <title>Writing a good commit message</title> + + <para>Since <command role="hg-cmd">hg log</command> + only prints the first line of a commit message by default, + it's best to write a commit message whose first line stands + alone. Here's a real example of a commit message that + <emphasis>doesn't</emphasis> follow this guideline, and + hence has a summary that is not + readable.</para> + + <programlisting> +changeset: 73:584af0e231be +user: Censored Person <censored.person@example.org> +date: Tue Sep 26 21:37:07 2006 -0700 +summary: include buildmeister/commondefs. Add exports.</programlisting> + + <para>As far as the remainder of the contents of the + commit message are concerned, there are no hard-and-fast + rules. 
Mercurial itself doesn't interpret or care about the + contents of the commit message, though your project may have + policies that dictate a certain kind of + formatting.</para> + <para>My personal preference is for short, but + informative, commit messages that tell me something that I + can't figure out with a quick glance at the output of + <command role="hg-cmd">hg log + --patch</command>.</para> + </sect2> + <sect2> + <title>Aborting a commit</title> + + <para>If you decide that you don't want to commit + while in the middle of editing a commit message, simply exit + from your editor without saving the file that it's editing. + This will cause nothing to happen to either the repository + or the working directory.</para> + <para>If we run the <command role="hg-cmd">hg + commit</command> command without any arguments, it records + all of the changes we've made, as reported by <command + role="hg-cmd">hg status</command> and <command + role="hg-cmd">hg diff</command>.</para> + </sect2> + <sect2> + <title>Admiring our new handiwork</title> + + <para>Once we've finished the commit, we can use the + <command role="hg-cmd">hg tip</command> command to display + the changeset we just created. This command produces output + that is identical to <command role="hg-cmd">hg + log</command>, but it only displays the newest revision in + the repository.</para> + + &interaction.tour.tip; + + <para>We refer to + the newest revision in the repository as the tip revision, + or simply the tip.</para> + </sect2> + </sect1> + + <sect1> + <title>Sharing changes</title> + + <para>We mentioned earlier that repositories in + Mercurial are self-contained. This means that the changeset + we just created exists only in our <filename + class="directory">my-hello</filename> repository. 
Let's + look at a few ways that we can propagate this change into + other repositories.</para> + + <sect2 id="sec:tour:pull"> + <title>Pulling changes from another repository</title> + <para>To get started, let's clone our original + <filename class="directory">hello</filename> repository, + which does not contain the change we just committed. We'll + call our temporary repository <filename + class="directory">hello-pull</filename>.</para> + + &interaction.tour.clone-pull; + + <para>We'll use the <command role="hg-cmd">hg + pull</command> command to bring changes from <filename + class="directory">my-hello</filename> into <filename + class="directory">hello-pull</filename>. However, blindly + pulling unknown changes into a repository is a somewhat + scary prospect. Mercurial provides the <command + role="hg-cmd">hg incoming</command> command to tell us + what changes the <command role="hg-cmd">hg pull</command> + command <emphasis>would</emphasis> pull into the repository, + without actually pulling the changes in.</para> + + &interaction.tour.incoming; + + <para>(Of course, someone could + cause more changesets to appear in the repository that we + ran <command role="hg-cmd">hg incoming</command> in, before + we get a chance to <command role="hg-cmd">hg pull</command> + the changes, so that we could end up pulling changes that we + didn't expect.)</para> + + <para>Bringing changes into a repository is a simple + matter of running the <command role="hg-cmd">hg + pull</command> command, and telling it which repository to + pull from.</para> + + &interaction.tour.pull; + + <para>As you can see + from the before-and-after output of <command + role="hg-cmd">hg tip</command>, we have successfully + pulled changes into our repository. 
There remains one step + before we can see these changes in the working + directory.</para> + </sect2> + <sect2> + <title>Updating the working directory</title> + + <para>We have so far glossed over the relationship between a + repository and its working directory. The <command + role="hg-cmd">hg pull</command> command that we ran in + section <xref linkend="sec:tour:pull"/> brought changes + into the repository, but if we check, there's no sign of those + changes in the working directory. This is because <command + role="hg-cmd">hg pull</command> does not (by default) touch + the working directory. Instead, we use the <command + role="hg-cmd">hg update</command> command to do this.</para> + + &interaction.tour.update; + + <para>It might seem a bit strange that <command role="hg-cmd">hg + pull</command> doesn't update the working directory + automatically. There's actually a good reason for this: you + can use <command role="hg-cmd">hg update</command> to update + the working directory to the state it was in at <emphasis>any + revision</emphasis> in the history of the repository. 
If + you had the working directory updated to an old revision---to + hunt down the origin of a bug, say---and ran a <command + role="hg-cmd">hg pull</command> which automatically updated + the working directory to a new revision, you might not be + terribly happy.</para> + <para>However, since pull-then-update is such a common thing to + do, Mercurial lets you combine the two by passing the <option + role="hg-opt-pull">-u</option> option to <command + role="hg-cmd">hg pull</command>.</para> + + <para>If you look back at the output of <command + role="hg-cmd">hg pull</command> in section <xref + linkend="sec:tour:pull"/> when we ran it without <option + role="hg-opt-pull">-u</option>, you can see that it printed + a helpful reminder that we'd have to take an explicit step to + update the working directory:</para> + + <!-- &interaction.xxx.fixme; --> + + <para>To find out what revision the working directory is at, use + the <command role="hg-cmd">hg parents</command> + command.</para> + + &interaction.tour.parents; + + <para>If you look back at figure <xref + linkend="fig:tour-basic:history"/>, + you'll see arrows connecting each changeset. The node that + the arrow leads <emphasis>from</emphasis> in each case is a + parent, and the node that the arrow leads + <emphasis>to</emphasis> is its child. 
The working directory + has a parent in just the same way; this is the changeset that + the working directory currently contains.</para> + + <para>To update the working directory to a particular revision, + + give a revision number or changeset ID to the <command + role="hg-cmd">hg update</command> command.</para> + + &interaction.tour.older; + + <para>If you omit an explicit revision, <command + role="hg-cmd">hg update</command> will update to the tip + revision, as shown by the second call to <command + role="hg-cmd">hg update</command> in the example + above.</para> + </sect2> + + <sect2> + <title>Pushing changes to another repository</title> + + <para>Mercurial lets us push changes to another + repository, from the repository we're currently visiting. + As with the example of <command role="hg-cmd">hg + pull</command> above, we'll create a temporary repository + to push our changes into.</para> + + &interaction.tour.clone-push; + + <para>The <command role="hg-cmd">hg outgoing</command> command + tells us what changes would be pushed into another + repository.</para> + + &interaction.tour.outgoing; + + <para>And the + <command role="hg-cmd">hg push</command> command does the + actual push.</para> + + &interaction.tour.push; + + <para>As with + <command role="hg-cmd">hg pull</command>, the <command + role="hg-cmd">hg push</command> command does not update + the working directory in the repository that it's pushing + changes into. (Unlike <command role="hg-cmd">hg + pull</command>, <command role="hg-cmd">hg push</command> + does not provide a <literal>-u</literal> option that updates + the other repository's working directory.)</para> + + <para>What happens if we try to pull or push changes + and the receiving repository already has those changes? 
+ Nothing too exciting.</para> + + &interaction.tour.push.nothing; + </sect2> + <sect2> + <title>Sharing changes over a network</title> + + <para>The commands we have covered in the previous few + sections are not limited to working with local repositories. + Each works in exactly the same fashion over a network + connection; simply pass in a URL instead of a local + path.</para> + + &interaction.tour.outgoing.net; + + <para>In this example, we + can see what changes we could push to the remote repository, + but the repository is understandably not set up to let + anonymous users push to it.</para> + + &interaction.tour.push.net; + </sect2> + </sect1> +</chapter> + +<!-- +local variables: +sgml-parent-document: ("00book.xml" "book" "chapter") +end: +-->
--- a/en/ch02-tour-basic.xml Wed Mar 18 00:08:22 2009 -0700 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,860 +0,0 @@ -<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> - -<chapter id="chap:tour-basic"> - <?dbhtml filename="a-tour-of-mercurial-the-basics.html"?> - <title>A tour of Mercurial: the basics</title> - - <sect1 id="sec:tour:install"> - <title>Installing Mercurial on your system</title> - - <para>Prebuilt binary packages of Mercurial are available for - every popular operating system. These make it easy to start - using Mercurial on your computer immediately.</para> - - <sect2> - <title>Linux</title> - - <para>Because each Linux distribution has its own packaging - tools, policies, and rate of development, it's difficult to - give a comprehensive set of instructions on how to install - Mercurial binaries. The version of Mercurial that you will - end up with can vary depending on how active the person is who - maintains the package for your distribution.</para> - - <para>To keep things simple, I will focus on installing - Mercurial from the command line under the most popular Linux - distributions. Most of these distributions provide graphical - package managers that will let you install Mercurial with a - single click; the package name to look for is - <literal>mercurial</literal>.</para> - - <itemizedlist> - <listitem><para>Debian:</para> - <programlisting>apt-get install mercurial</programlisting></listitem> - <listitem><para>Fedora Core:</para> - <programlisting>yum install mercurial</programlisting></listitem> - <listitem><para>Gentoo:</para> - <programlisting>emerge mercurial</programlisting></listitem> - <listitem><para>OpenSUSE:</para> - <programlisting>yum install mercurial</programlisting></listitem> - <listitem><para>Ubuntu: Ubuntu's Mercurial package is based on - Debian's. 
To install it, run the following - command.</para> - <programlisting>apt-get install mercurial</programlisting></listitem> - </itemizedlist> - - </sect2> - <sect2> - <title>Solaris</title> - - <para>SunFreeWare, at <ulink - url="http://www.sunfreeware.com">http://www.sunfreeware.com</ulink>, - is a good source for a large number of pre-built Solaris - packages for 32 and 64 bit Intel and Sparc architectures, - including current versions of Mercurial.</para> - - </sect2> - <sect2> - <title>Mac OS X</title> - - <para>Lee Cantey publishes an installer of Mercurial for Mac OS - X at <ulink - url="http://mercurial.berkwood.com">http://mercurial.berkwood.com</ulink>. - This package works on both Intel- and Power-based Macs. Before - you can use it, you must install a compatible version of - Universal MacPython <citation>web:macpython</citation>. This - is easy to do; simply follow the instructions on Lee's - site.</para> - - <para>It's also possible to install Mercurial using Fink or - MacPorts, two popular free package managers for Mac OS X. If - you have Fink, use <command>sudo apt-get install - mercurial-py25</command>. If MacPorts, <command>sudo port - install mercurial</command>.</para> - - </sect2> - <sect2> - <title>Windows</title> - - <para>Lee Cantey publishes an installer of Mercurial for Windows - at <ulink - url="http://mercurial.berkwood.com">http://mercurial.berkwood.com</ulink>. - This package has no external dependencies; it <quote>just - works</quote>.</para> - - <note> - <para> The Windows version of Mercurial does not - automatically convert line endings between Windows and Unix - styles. If you want to share work with Unix users, you must - do a little additional configuration work. XXX Flesh this - out.</para> - </note> - - </sect2> - </sect1> - <sect1> - <title>Getting started</title> - - <para>To begin, we'll use the <command role="hg-cmd">hg - version</command> command to find out whether Mercurial is - actually installed properly. 
The actual version information - that it prints isn't so important; it's whether it prints - anything at all that we care about.</para> - - &interaction.tour.version; - - <sect2> - <title>Built-in help</title> - - <para>Mercurial provides a built-in help system. This is - invaluable for those times when you find yourself stuck - trying to remember how to run a command. If you are - completely stuck, simply run <command role="hg-cmd">hg - help</command>; it will print a brief list of commands, - along with a description of what each does. If you ask for - help on a specific command (as below), it prints more - detailed information.</para> - - &interaction.tour.help; - - <para>For a more impressive level of detail (which you won't - usually need) run <command role="hg-cmd">hg help <option - role="hg-opt-global">-v</option></command>. The <option - role="hg-opt-global">-v</option> option is short for - <option role="hg-opt-global">--verbose</option>, and tells - Mercurial to print more information than it usually - would.</para> - - </sect2> - </sect1> - <sect1> - <title>Working with a repository</title> - - <para>In Mercurial, everything happens inside a - <emphasis>repository</emphasis>. The repository for a project - contains all of the files that <quote>belong to</quote> that - project, along with a historical record of the project's - files.</para> - - <para>There's nothing particularly magical about a repository; it - is simply a directory tree in your filesystem that Mercurial - treats as special. You can rename or delete a repository any - time you like, using either the command line or your file - browser.</para> - - <sect2> - <title>Making a local copy of a repository</title> - - <para><emphasis>Copying</emphasis> a repository is just a little - bit special. While you could use a normal file copying - command to make a copy of a repository, it's best to use a - built-in command that Mercurial provides. 
This command is - called <command role="hg-cmd">hg clone</command>, because it - creates an identical copy of an existing repository.</para> - - &interaction.tour.clone; - - <para>If our clone succeeded, we should now have a local - directory called <filename class="directory">hello</filename>. - This directory will contain some files.</para> - - &interaction.tour.ls; - - <para>These files have the same contents and history in our - repository as they do in the repository we cloned.</para> - - <para>Every Mercurial repository is complete, self-contained, - and independent. It contains its own private copy of a - project's files and history. A cloned repository remembers - the location of the repository it was cloned from, but it does - not communicate with that repository, or any other, unless you - tell it to.</para> - - <para>What this means for now is that we're free to experiment - with our repository, safe in the knowledge that it's a private - <quote>sandbox</quote> that won't affect anyone else.</para> - - </sect2> - <sect2> - <title>What's in a repository?</title> - - <para>When we take a more detailed look inside a repository, we - can see that it contains a directory named <filename - class="directory">.hg</filename>. This is where Mercurial - keeps all of its metadata for the repository.</para> - - &interaction.tour.ls-a; - - <para>The contents of the <filename - class="directory">.hg</filename> directory and its - subdirectories are private to Mercurial. Every other file and - directory in the repository is yours to do with as you - please.</para> - - <para>To introduce a little terminology, the <filename - class="directory">.hg</filename> directory is the - <quote>real</quote> repository, and all of the files and - directories that coexist with it are said to live in the - <emphasis>working directory</emphasis>. 
An easy way to - remember the distinction is that the - <emphasis>repository</emphasis> contains the - <emphasis>history</emphasis> of your project, while the - <emphasis>working directory</emphasis> contains a - <emphasis>snapshot</emphasis> of your project at a particular - point in history.</para> - - </sect2> - </sect1> - <sect1> - <title>A tour through history</title> - - <para>One of the first things we might want to do with a new, - unfamiliar repository is understand its history. The <command - role="hg-cmd">hg log</command> command gives us a view of - history.</para> - - &interaction.tour.log; - - <para>By default, this command prints a brief paragraph of output - for each change to the project that was recorded. In Mercurial - terminology, we call each of these recorded events a - <emphasis>changeset</emphasis>, because it can contain a record - of changes to several files.</para> - - <para>The fields in a record of output from <command - role="hg-cmd">hg log</command> are as follows.</para> - <itemizedlist> - <listitem><para><literal>changeset</literal>: This field has the - format of a number, followed by a colon, followed by a - hexadecimal string. These are - <emphasis>identifiers</emphasis> for the changeset. There - are two identifiers because the number is shorter and easier - to type than the hex string.</para></listitem> - <listitem><para><literal>user</literal>: The identity of the - person who created the changeset. This is a free-form - field, but it most often contains a person's name and email - address.</para></listitem> - <listitem><para><literal>date</literal>: The date and time on - which the changeset was created, and the timezone in which - it was created. 
(The date and time are local to that - timezone; they display what time and date it was for the - person who created the changeset.)</para></listitem> - <listitem><para><literal>summary</literal>: The first line of - the text message that the creator of the changeset entered - to describe the changeset.</para></listitem></itemizedlist> - <para>The default output printed by <command role="hg-cmd">hg - log</command> is purely a summary; it is missing a lot of - detail.</para> - - <para>Figure <xref linkend="fig:tour-basic:history"/> provides a - graphical representation of the history of the <filename - class="directory">hello</filename> repository, to make it a - little easier to see which direction history is - <quote>flowing</quote> in. We'll be returning to this figure - several times in this chapter and the chapter that - follows.</para> - - <informalfigure id="fig:tour-basic:history"> - <mediaobject> - <imageobject><imagedata fileref="tour-history"/></imageobject> - <textobject><phrase>XXX add text</phrase></textobject> - <caption><para>Graphical history of the <filename - class="directory">hello</filename> - repository</para></caption> - </mediaobject> - </informalfigure> - - <sect2> - <title>Changesets, revisions, and talking to other - people</title> - - <para>As English is a notoriously sloppy language, and computer - science has a hallowed history of terminological confusion - (why use one term when four will do?), revision control has a - variety of words and phrases that mean the same thing. 
If you - are talking about Mercurial history with other people, you - will find that the word <quote>changeset</quote> is often - compressed to <quote>change</quote> or (when written) - <quote>cset</quote>, and sometimes a changeset is referred to - as a <quote>revision</quote> or a <quote>rev</quote>.</para> - - <para>While it doesn't matter what <emphasis>word</emphasis> you - use to refer to the concept of <quote>a changeset</quote>, the - <emphasis>identifier</emphasis> that you use to refer to - <quote>a <emphasis>specific</emphasis> changeset</quote> is of - great importance. Recall that the <literal>changeset</literal> - field in the output from <command role="hg-cmd">hg - log</command> identifies a changeset using both a number and - a hexadecimal string.</para> - <itemizedlist> - <listitem><para>The revision number is <emphasis>only valid in - that repository</emphasis>,</para></listitem> - <listitem><para>while the hex string is the - <emphasis>permanent, unchanging identifier</emphasis> that - will always identify that exact changeset in - <emphasis>every</emphasis> copy of the - repository.</para></listitem></itemizedlist> - <para>This distinction is important. If you send someone an - email talking about <quote>revision 33</quote>, there's a high - likelihood that their revision 33 will <emphasis>not be the - same</emphasis> as yours. The reason for this is that a - revision number depends on the order in which changes arrived - in a repository, and there is no guarantee that the same - changes will happen in the same order in different - repositories. Three changes $a,b,c$ can easily appear in one - repository as $0,1,2$, while in another as $1,0,2$.</para> - - <para>Mercurial uses revision numbers purely as a convenient - shorthand. 
If you need to discuss a changeset with someone, - or make a record of a changeset for some other reason (for - example, in a bug report), use the hexadecimal - identifier.</para> - - </sect2> - <sect2> - <title>Viewing specific revisions</title> - - <para>To narrow the output of <command role="hg-cmd">hg - log</command> down to a single revision, use the <option - role="hg-opt-log">-r</option> (or <option - role="hg-opt-log">--rev</option>) option. You can use - either a revision number or a long-form changeset identifier, - and you can provide as many revisions as you want.</para> - - &interaction.tour.log-r; - - <para>If you want to see the history of several revisions - without having to list each one, you can use <emphasis>range - notation</emphasis>; this lets you express the idea <quote>I - want all revisions between <literal>abc</literal> and - <literal>def</literal>, inclusive</quote>.</para> - - &interaction.tour.log.range; - - <para>Mercurial also honours the order in which you specify - revisions, so <command role="hg-cmd">hg log -r 2:4</command> - prints 2, 3, and 4. while <command role="hg-cmd">hg log -r - 4:2</command> prints 4, 3, and 2.</para> - - </sect2> - <sect2> - <title>More detailed information</title> - - <para>While the summary information printed by <command - role="hg-cmd">hg log</command> is useful if you already know - what you're looking for, you may need to see a complete - description of the change, or a list of the files changed, if - you're trying to decide whether a changeset is the one you're - looking for. The <command role="hg-cmd">hg log</command> - command's <option role="hg-opt-global">-v</option> (or <option - role="hg-opt-global">--verbose</option>) option gives you - this extra detail.</para> - - &interaction.tour.log-v; - - <para>If you want to see both the description and content of a - change, add the <option role="hg-opt-log">-p</option> (or - <option role="hg-opt-log">--patch</option>) option. 
This - displays the content of a change as a <emphasis>unified - diff</emphasis> (if you've never seen a unified diff before, - see section <xref linkend="sec:mq:patch"/> for an - overview).</para> - - &interaction.tour.log-vp; - - </sect2> - </sect1> - <sect1> - <title>All about command options</title> - - <para>Let's take a brief break from exploring Mercurial commands - to discuss a pattern in the way that they work; you may find - this useful to keep in mind as we continue our tour.</para> - - <para>Mercurial has a consistent and straightforward approach to - dealing with the options that you can pass to commands. It - follows the conventions for options that are common to modern - Linux and Unix systems.</para> - <itemizedlist> - <listitem><para>Every option has a long name. For example, as - we've already seen, the <command role="hg-cmd">hg - log</command> command accepts a <option - role="hg-opt-log">--rev</option> option.</para></listitem> - <listitem><para>Most options have short names, too. Instead of - <option role="hg-opt-log">--rev</option>, we can use <option - role="hg-opt-log">-r</option>. (The reason that some - options don't have short names is that the options in - question are rarely used.)</para></listitem> - <listitem><para>Long options start with two dashes (e.g. <option - role="hg-opt-log">--rev</option>), while short options - start with one (e.g. <option - role="hg-opt-log">-r</option>).</para></listitem> - <listitem><para>Option naming and usage is consistent across - commands. For example, every command that lets you specify - a changeset ID or revision number accepts both <option - role="hg-opt-log">-r</option> and <option - role="hg-opt-log">--rev</option> - arguments.</para></listitem></itemizedlist> - <para>In the examples throughout this book, I use short options - instead of long. 
This just reflects my own preference, so don't - read anything significant into it.</para> - - <para>Most commands that print output of some kind will print more - output when passed a <option role="hg-opt-global">-v</option> - (or <option role="hg-opt-global">--verbose</option>) option, and - less when passed <option role="hg-opt-global">-q</option> (or - <option role="hg-opt-global">--quiet</option>).</para> - - </sect1> - <sect1> - <title>Making and reviewing changes</title> - - <para>Now that we have a grasp of viewing history in Mercurial, - let's take a look at making some changes and examining - them.</para> - - <para>The first thing we'll do is isolate our experiment in a - repository of its own. We use the <command role="hg-cmd">hg - clone</command> command, but we don't need to clone a copy of - the remote repository. Since we already have a copy of it - locally, we can just clone that instead. This is much faster - than cloning over the network, and cloning a local repository - uses less disk space in most cases, too.</para> - - &interaction.tour.reclone; - - <para>As an aside, it's often good practice to keep a - <quote>pristine</quote> copy of a remote repository around, - which you can then make temporary clones of to create sandboxes - for each task you want to work on. This lets you work on - multiple tasks in parallel, each isolated from the others until - it's complete and you're ready to integrate it back. Because - local clones are so cheap, there's almost no overhead to cloning - and destroying repositories whenever you want.</para> - - <para>In our <filename class="directory">my-hello</filename> - repository, we have a file <filename>hello.c</filename> that - contains the classic <quote>hello, world</quote> program. Let's - use the ancient and venerable <command>sed</command> command to - edit this file so that it prints a second line of output. 
(I'm - only using <command>sed</command> to do this because it's easy - to write a scripted example this way. Since you're not under - the same constraint, you probably won't want to use - <command>sed</command>; simply use your preferred text editor to - do the same thing.)</para> - - &interaction.tour.sed; - - <para>Mercurial's <command role="hg-cmd">hg status</command> - command will tell us what Mercurial knows about the files in the - repository.</para> - - &interaction.tour.status; - - <para>The <command role="hg-cmd">hg status</command> command - prints no output for some files, but a line starting with - <quote><literal>M</literal></quote> for - <filename>hello.c</filename>. Unless you tell it to, <command - role="hg-cmd">hg status</command> will not print any output - for files that have not been modified.</para> - - <para>The <quote><literal>M</literal></quote> indicates that - Mercurial has noticed that we modified - <filename>hello.c</filename>. We didn't need to - <emphasis>inform</emphasis> Mercurial that we were going to - modify the file before we started, or that we had modified the - file after we were done; it was able to figure this out - itself.</para> - - <para>It's a little bit helpful to know that we've modified - <filename>hello.c</filename>, but we might prefer to know - exactly <emphasis>what</emphasis> changes we've made to it. 
To - do this, we use the <command role="hg-cmd">hg diff</command> - command.</para> - - &interaction.tour.diff; - - </sect1> - <sect1> - <title>Recording changes in a new changeset</title> - - <para>We can modify files, build and test our changes, and use - <command role="hg-cmd">hg status</command> and <command - role="hg-cmd">hg diff</command> to review our changes, until - we're satisfied with what we've done and arrive at a natural - stopping point where we want to record our work in a new - changeset.</para> - - <para>The <command role="hg-cmd">hg commit</command> command lets - us create a new changeset; we'll usually refer to this as - <quote>making a commit</quote> or - <quote>committing</quote>.</para> - - <sect2> - <title>Setting up a username</title> - - <para>When you try to run <command role="hg-cmd">hg - commit</command> for the first time, it is not guaranteed to - succeed. Mercurial records your name and address with each - change that you commit, so that you and others will later be - able to tell who made each change. Mercurial tries to - automatically figure out a sensible username to commit the - change with. It will attempt each of the following methods, - in order:</para> - <orderedlist> - <listitem><para>If you specify a <option - role="hg-opt-commit">-u</option> option to the <command - role="hg-cmd">hg commit</command> command on the command - line, followed by a username, this is always given the - highest precedence.</para></listitem> - <listitem><para>If you have set the <envar>HGUSER</envar> - environment variable, this is checked - next.</para></listitem> - <listitem><para>If you create a file in your home directory - called <filename role="special">.hgrc</filename>, with a - <envar role="rc-item-ui">username</envar> entry, that will - be used next. 
To see what the contents of this file - should look like, refer to section <xref - linkend="sec:tour-basic:username"/> - below.</para></listitem> - <listitem><para>If you have set the <envar>EMAIL</envar> - environment variable, this will be used - next.</para></listitem> - <listitem><para>Mercurial will query your system to find out - your local user name and host name, and construct a - username from these components. Since this often results - in a username that is not very useful, it will print a - warning if it has to do - this.</para></listitem> - </orderedlist> - <para>If all of these mechanisms fail, Mercurial will - fail, printing an error message. In this case, it will not - let you commit until you set up a - username.</para> - <para>You should think of the <envar>HGUSER</envar> environment - variable and the <option role="hg-opt-commit">-u</option> - option to the <command role="hg-cmd">hg commit</command> - command as ways to <emphasis>override</emphasis> Mercurial's - default selection of username. For normal use, the simplest - and most robust way to set a username for yourself is by - creating a <filename role="special">.hgrc</filename> file; see - below for details.</para> - <sect3 id="sec:tour-basic:username"> - <title>Creating a Mercurial configuration file</title> - - <para>To set a user name, use your favourite editor - to create a file called <filename - role="special">.hgrc</filename> in your home directory. - Mercurial will use this file to look up your personalised - configuration settings. The initial contents of your - <filename role="special">.hgrc</filename> should look like - this.</para> - <programlisting># This is a Mercurial configuration file. 
-[ui] -username = Firstname Lastname -<email.address@domain.net></programlisting> - - <para>The <quote><literal>[ui]</literal></quote> line begins a - <emphasis>section</emphasis> of the config file, so you can - read the <quote><literal>username = ...</literal></quote> - line as meaning <quote>set the value of the - <literal>username</literal> item in the - <literal>ui</literal> section</quote>. A section continues - until a new section begins, or the end of the file. - Mercurial ignores empty lines and treats any text from - <quote><literal>#</literal></quote> to the end of a line as - a comment.</para> - </sect3> - - <sect3> - <title>Choosing a user name</title> - - <para>You can use any text you like as the value of - the <literal>username</literal> config item, since this - information is for reading by other people, but for - interpreting by Mercurial. The convention that most - people follow is to use their name and email address, as - in the example above.</para> - <note> - <para>Mercurial's built-in web server obfuscates - email addresses, to make it more difficult for the email - harvesting tools that spammers use. This reduces the - likelihood that you'll start receiving more junk email - if you publish a Mercurial repository on the - web.</para></note> - - </sect3> - </sect2> - <sect2> - <title>Writing a commit message</title> - - <para>When we commit a change, Mercurial drops us into - a text editor, to enter a message that will describe the - modifications we've made in this changeset. This is called - the <emphasis>commit message</emphasis>. 
It will be a - record for readers of what we did and why, and it will be - printed by <command role="hg-cmd">hg log</command> after - we've finished committing.</para> - - &interaction.tour.commit; - - <para>The editor that the <command role="hg-cmd">hg - commit</command> command drops us into will contain an - empty line, followed by a number of lines starting with - <quote><literal>HG:</literal></quote>.</para> - - <programlisting>XXX fix this XXX</programlisting> - - <para>Mercurial ignores the lines that start with - <quote><literal>HG:</literal></quote>; it uses them only to - tell us which files it's recording changes to. Modifying or - deleting these lines has no effect.</para> - </sect2> - <sect2> - <title>Writing a good commit message</title> - - <para>Since <command role="hg-cmd">hg log</command> - only prints the first line of a commit message by default, - it's best to write a commit message whose first line stands - alone. Here's a real example of a commit message that - <emphasis>doesn't</emphasis> follow this guideline, and - hence has a summary that is not - readable.</para> - - <programlisting> -changeset: 73:584af0e231be -user: Censored Person <censored.person@example.org> -date: Tue Sep 26 21:37:07 2006 -0700 -summary: include buildmeister/commondefs. Add exports.</programlisting> - - <para>As far as the remainder of the contents of the - commit message are concerned, there are no hard-and-fast - rules. 
Mercurial itself doesn't interpret or care about the - contents of the commit message, though your project may have - policies that dictate a certain kind of - formatting.</para> - <para>My personal preference is for short, but - informative, commit messages that tell me something that I - can't figure out with a quick glance at the output of - <command role="hg-cmd">hg log - --patch</command>.</para> - </sect2> - <sect2> - <title>Aborting a commit</title> - - <para>If you decide that you don't want to commit - while in the middle of editing a commit message, simply exit - from your editor without saving the file that it's editing. - This will cause nothing to happen to either the repository - or the working directory.</para> - <para>If we run the <command role="hg-cmd">hg - commit</command> command without any arguments, it records - all of the changes we've made, as reported by <command - role="hg-cmd">hg status</command> and <command - role="hg-cmd">hg diff</command>.</para> - </sect2> - <sect2> - <title>Admiring our new handiwork</title> - - <para>Once we've finished the commit, we can use the - <command role="hg-cmd">hg tip</command> command to display - the changeset we just created. This command produces output - that is identical to <command role="hg-cmd">hg - log</command>, but it only displays the newest revision in - the repository.</para> - - &interaction.tour.tip; - - <para>We refer to - the newest revision in the repository as the tip revision, - or simply the tip.</para> - </sect2> - </sect1> - - <sect1> - <title>Sharing changes</title> - - <para>We mentioned earlier that repositories in - Mercurial are self-contained. This means that the changeset - we just created exists only in our <filename - class="directory">my-hello</filename> repository. 
Let's - look at a few ways that we can propagate this change into - other repositories.</para> - - <sect2 id="sec:tour:pull"> - <title>Pulling changes from another repository</title> - <para>To get started, let's clone our original - <filename class="directory">hello</filename> repository, - which does not contain the change we just committed. We'll - call our temporary repository <filename - class="directory">hello-pull</filename>.</para> - - &interaction.tour.clone-pull; - - <para>We'll use the <command role="hg-cmd">hg - pull</command> command to bring changes from <filename - class="directory">my-hello</filename> into <filename - class="directory">hello-pull</filename>. However, blindly - pulling unknown changes into a repository is a somewhat - scary prospect. Mercurial provides the <command - role="hg-cmd">hg incoming</command> command to tell us - what changes the <command role="hg-cmd">hg pull</command> - command <emphasis>would</emphasis> pull into the repository, - without actually pulling the changes in.</para> - - &interaction.tour.incoming; - - <para>(Of course, someone could - cause more changesets to appear in the repository that we - ran <command role="hg-cmd">hg incoming</command> in, before - we get a chance to <command role="hg-cmd">hg pull</command> - the changes, so that we could end up pulling changes that we - didn't expect.)</para> - - <para>Bringing changes into a repository is a simple - matter of running the <command role="hg-cmd">hg - pull</command> command, and telling it which repository to - pull from.</para> - - &interaction.tour.pull; - - <para>As you can see - from the before-and-after output of <command - role="hg-cmd">hg tip</command>, we have successfully - pulled changes into our repository. 
There remains one step - before we can see these changes in the working - directory.</para> - </sect2> - <sect2> - <title>Updating the working directory</title> - - <para>We have so far glossed over the relationship between a - repository and its working directory. The <command - role="hg-cmd">hg pull</command> command that we ran in - section <xref linkend="sec:tour:pull"/> brought changes - into the repository, but if we check, there's no sign of those - changes in the working directory. This is because <command - role="hg-cmd">hg pull</command> does not (by default) touch - the working directory. Instead, we use the <command - role="hg-cmd">hg update</command> command to do this.</para> - - &interaction.tour.update; - - <para>It might seem a bit strange that <command role="hg-cmd">hg - pull</command> doesn't update the working directory - automatically. There's actually a good reason for this: you - can use <command role="hg-cmd">hg update</command> to update - the working directory to the state it was in at <emphasis>any - revision</emphasis> in the history of the repository. 
If - you had the working directory updated to an old revision---to - hunt down the origin of a bug, say---and ran a <command - role="hg-cmd">hg pull</command> which automatically updated - the working directory to a new revision, you might not be - terribly happy.</para> - <para>However, since pull-then-update is such a common thing to - do, Mercurial lets you combine the two by passing the <option - role="hg-opt-pull">-u</option> option to <command - role="hg-cmd">hg pull</command>.</para> - - <para>If you look back at the output of <command - role="hg-cmd">hg pull</command> in section <xref - linkend="sec:tour:pull"/> when we ran it without <option - role="hg-opt-pull">-u</option>, you can see that it printed - a helpful reminder that we'd have to take an explicit step to - update the working directory:</para> - - <!-- &interaction.xxx.fixme; --> - - <para>To find out what revision the working directory is at, use - the <command role="hg-cmd">hg parents</command> - command.</para> - - &interaction.tour.parents; - - <para>If you look back at figure <xref - linkend="fig:tour-basic:history"/>, - you'll see arrows connecting each changeset. The node that - the arrow leads <emphasis>from</emphasis> in each case is a - parent, and the node that the arrow leads - <emphasis>to</emphasis> is its child. 
The working directory - has a parent in just the same way; this is the changeset that - the working directory currently contains.</para> - - <para>To update the working directory to a particular revision, - - give a revision number or changeset ID to the <command - role="hg-cmd">hg update</command> command.</para> - - &interaction.tour.older; - - <para>If you omit an explicit revision, <command - role="hg-cmd">hg update</command> will update to the tip - revision, as shown by the second call to <command - role="hg-cmd">hg update</command> in the example - above.</para> - </sect2> - - <sect2> - <title>Pushing changes to another repository</title> - - <para>Mercurial lets us push changes to another - repository, from the repository we're currently visiting. - As with the example of <command role="hg-cmd">hg - pull</command> above, we'll create a temporary repository - to push our changes into.</para> - - &interaction.tour.clone-push; - - <para>The <command role="hg-cmd">hg outgoing</command> command - tells us what changes would be pushed into another - repository.</para> - - &interaction.tour.outgoing; - - <para>And the - <command role="hg-cmd">hg push</command> command does the - actual push.</para> - - &interaction.tour.push; - - <para>As with - <command role="hg-cmd">hg pull</command>, the <command - role="hg-cmd">hg push</command> command does not update - the working directory in the repository that it's pushing - changes into. (Unlike <command role="hg-cmd">hg - pull</command>, <command role="hg-cmd">hg push</command> - does not provide a <literal>-u</literal> option that updates - the other repository's working directory.)</para> - - <para>What happens if we try to pull or push changes - and the receiving repository already has those changes? 
- Nothing too exciting.</para> - - &interaction.tour.push.nothing; - </sect2> - <sect2> - <title>Sharing changes over a network</title> - - <para>The commands we have covered in the previous few - sections are not limited to working with local repositories. - Each works in exactly the same fashion over a network - connection; simply pass in a URL instead of a local - path.</para> - - &interaction.tour.outgoing.net; - - <para>In this example, we - can see what changes we could push to the remote repository, - but the repository is understandably not set up to let - anonymous users push to it.</para> - - &interaction.tour.push.net; - </sect2> - </sect1> -</chapter> - -<!-- -local variables: -sgml-parent-document: ("00book.xml" "book" "chapter") -end: --->
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch02-tour-merge.xml Thu Mar 19 20:54:12 2009 -0700 @@ -0,0 +1,394 @@ +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> + +<chapter id="chap:tour-merge"> + <?dbhtml filename="a-tour-of-mercurial-merging-work.html"?> + <title>A tour of Mercurial: merging work</title> + + <para>We've now covered cloning a repository, making changes in a + repository, and pulling or pushing changes from one repository + into another. Our next step is <emphasis>merging</emphasis> + changes from separate repositories.</para> + + <sect1> + <title>Merging streams of work</title> + + <para>Merging is a fundamental part of working with a distributed + revision control tool.</para> + <itemizedlist> + <listitem><para>Alice and Bob each have a personal copy of a + repository for a project they're collaborating on. Alice + fixes a bug in her repository; Bob adds a new feature in + his. They want the shared repository to contain both the + bug fix and the new feature.</para> + </listitem> + <listitem><para>I frequently work on several different tasks for + a single project at once, each safely isolated in its own + repository. Working this way means that I often need to + merge one piece of my own work with another.</para> + </listitem></itemizedlist> + + <para>Because merging is such a common thing to need to do, + Mercurial makes it easy. Let's walk through the process. We'll + begin by cloning yet another repository (see how often they + spring up?) and making a change in it.</para> + + &interaction.tour.merge.clone; + + <para>We should now have two copies of + <filename>hello.c</filename> with different contents. 
The + histories of the two repositories have also diverged, as + illustrated in figure <xref + linkend="fig:tour-merge:sep-repos"/>.</para> + + &interaction.tour.merge.cat; + + <informalfigure id="fig:tour-merge:sep-repos"> + <mediaobject> + <imageobject><imagedata fileref="tour-merge-sep-repos"/></imageobject> + <textobject><phrase>XXX add text</phrase></textobject> + <caption><para>Divergent recent histories of the <filename + class="directory">my-hello</filename> and <filename + class="directory">my-new-hello</filename> + repositories</para></caption> + </mediaobject> + </informalfigure> + + <para>We already know that pulling changes from our <filename + class="directory">my-hello</filename> repository will have no + effect on the working directory.</para> + + &interaction.tour.merge.pull; + + <para>However, the <command role="hg-cmd">hg pull</command> + command says something about <quote>heads</quote>.</para> + + <sect2> + <title>Head changesets</title> + + <para>A head is a change that has no descendants, or children, + as they're also known. The tip revision is thus a head, + because the newest revision in a repository doesn't have any + children, but a repository can contain more than one + head.</para> + + <informalfigure id="fig:tour-merge:pull"> + <mediaobject><imageobject><imagedata + fileref="tour-merge-pull"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject> + <caption><para>Repository contents after pulling from + <filename class="directory">my-hello</filename> into + <filename + class="directory">my-new-hello</filename></para></caption> + </mediaobject> + </informalfigure> + + <para>In figure <xref linkend="fig:tour-merge:pull"/>, you can + see the effect of the pull from <filename + class="directory">my-hello</filename> into <filename + class="directory">my-new-hello</filename>. The history that + was already present in <filename + class="directory">my-new-hello</filename> is untouched, but + a new revision has been added. 
By referring to figure <xref + linkend="fig:tour-merge:sep-repos"/>, we can see that the + <emphasis>changeset ID</emphasis> remains the same in the new + repository, but the <emphasis>revision number</emphasis> has + changed. (This, incidentally, is a fine example of why it's + not safe to use revision numbers when discussing changesets.) + We can view the heads in a repository using the <command + role="hg-cmd">hg heads</command> command.</para> + + &interaction.tour.merge.heads; + + </sect2> + <sect2> + <title>Performing the merge</title> + + <para>What happens if we try to use the normal <command + role="hg-cmd">hg update</command> command to update to the + new tip?</para> + + &interaction.tour.merge.update; + + <para>Mercurial is telling us that the <command role="hg-cmd">hg + update</command> command won't do a merge; it won't update + the working directory when it thinks we might be wanting to do + a merge, unless we force it to do so. Instead, we use the + <command role="hg-cmd">hg merge</command> command to merge the + two heads.</para> + + &interaction.tour.merge.merge; + + <informalfigure id="fig:tour-merge:merge"> + + <mediaobject><imageobject><imagedata + fileref="tour-merge-merge"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject> + <caption><para>Working directory and repository during + merge, and following commit</para></caption> + </mediaobject> + </informalfigure> + + <para>This updates the working directory so that it contains + changes from <emphasis>both</emphasis> heads, which is + reflected in both the output of <command role="hg-cmd">hg + parents</command> and the contents of + <filename>hello.c</filename>.</para> + + &interaction.tour.merge.parents; + + </sect2> + <sect2> + <title>Committing the results of the merge</title> + + <para>Whenever we've done a merge, <command role="hg-cmd">hg + parents</command> will display two parents until we <command + role="hg-cmd">hg commit</command> the results of the + 
merge.</para> + + &interaction.tour.merge.commit; + + <para>We now have a new tip revision; notice that it has + <emphasis>both</emphasis> of our former heads as its parents. + These are the same revisions that were previously displayed by + <command role="hg-cmd">hg parents</command>.</para> + + &interaction.tour.merge.tip; + + <para>In figure <xref + linkend="fig:tour-merge:merge"/>, you can see a + representation of what happens to the working directory during + the merge, and how this affects the repository when the commit + happens. During the merge, the working directory has two + parent changesets, and these become the parents of the new + changeset.</para> + + </sect2> + </sect1> + <sect1> + <title>Merging conflicting changes</title> + + <para>Most merges are simple affairs, but sometimes you'll find + yourself merging changes where each modifies the same portions + of the same files. Unless both modifications are identical, + this results in a <emphasis>conflict</emphasis>, where you have + to decide how to reconcile the different changes into something + coherent.</para> + + <informalfigure> + + <mediaobject id="fig:tour-merge:conflict"> + <imageobject><imagedata fileref="tour-merge-conflict"/></imageobject> + <textobject><phrase>XXX add text</phrase></textobject> + <caption><para>Conflicting changes to a + document</para></caption> </mediaobject> + </informalfigure> + + <para>Figure <xref linkend="fig:tour-merge:conflict"/> illustrates + an instance of two conflicting changes to a document. We + started with a single version of the file; then we made some + changes; while someone else made different changes to the same + text. Our task in resolving the conflicting changes is to + decide what the file should look like.</para> + + <para>Mercurial doesn't have a built-in facility for handling + conflicts. Instead, it runs an external program called + <command>hgmerge</command>. 
This is a shell script that is + bundled with Mercurial; you can change it to behave however you + please. What it does by default is try to find one of several + different merging tools that are likely to be installed on your + system. It first tries a few fully automatic merging tools; if + these don't succeed (because the resolution process requires + human guidance) or aren't present, the script tries a few + different graphical merging tools.</para> + + <para>It's also possible to get Mercurial to run another program + or script instead of <command>hgmerge</command>, by setting the + <envar>HGMERGE</envar> environment variable to the name of your + preferred program.</para> + + <sect2> + <title>Using a graphical merge tool</title> + + <para>My preferred graphical merge tool is + <command>kdiff3</command>, which I'll use to describe the + features that are common to graphical file merging tools. You + can see a screenshot of <command>kdiff3</command> in action in + figure <xref linkend="fig:tour-merge:kdiff3"/>. The kind of + merge it is performing is called a <emphasis>three-way + merge</emphasis>, because there are three different versions + of the file of interest to us. The tool thus splits the upper + portion of the window into three panes:</para> + <itemizedlist> + <listitem><para>At the left is the <emphasis>base</emphasis> + version of the file, i.e. the most recent version from + which the two versions we're trying to merge are + descended.</para> + </listitem> + <listitem><para>In the middle is <quote>our</quote> version of + the file, with the contents that we modified.</para> + </listitem> + <listitem><para>On the right is <quote>their</quote> version + of the file, the one from the changeset that we're + trying to merge with.</para> + </listitem></itemizedlist> + <para>In the pane below these is the current + <emphasis>result</emphasis> of the merge. 
Our task is to + replace all of the red text, which indicates unresolved + conflicts, with some sensible merger of the + <quote>ours</quote> and <quote>theirs</quote> versions of the + file.</para> + + <para>All four of these panes are <emphasis>locked + together</emphasis>; if we scroll vertically or horizontally + in any of them, the others are updated to display the + corresponding sections of their respective files.</para> + + <informalfigure id="fig:tour-merge:kdiff3"> + <mediaobject><imageobject><imagedata + fileref="kdiff3"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject> + <caption><para>Using <command>kdiff3</command> to merge + versions of a file</para></caption> + </mediaobject> + </informalfigure> + + <para>For each conflicting portion of the file, we can choose to + resolve the conflict using some combination of text from the + base version, ours, or theirs. We can also manually edit the + merged file at any time, in case we need to make further + modifications.</para> + + <para>There are <emphasis>many</emphasis> file merging tools + available, too many to cover here. They vary in which + platforms they are available for, and in their particular + strengths and weaknesses. Most are tuned for merging files + containing plain text, while a few are aimed at specialised + file formats (generally XML).</para> + + </sect2> + <sect2> + <title>A worked example</title> + + <para>In this example, we will reproduce the file modification + history of figure <xref linkend="fig:tour-merge:conflict"/> + above. Let's begin by creating a repository with a base + version of our document.</para> + + &interaction.tour-merge-conflict.wife; + + <para>We'll clone the repository and make a change to the + file.</para> + + &interaction.tour-merge-conflict.cousin; + + <para>And another clone, to simulate someone else making a + change to the file. 
(This hints at the idea that it's not all + that unusual to merge with yourself when you isolate tasks in + separate repositories, and indeed to find and resolve + conflicts while doing so.)</para> + + &interaction.tour-merge-conflict.son; + + <para>Having created two + different versions of the file, we'll set up an environment + suitable for running our merge.</para> + + &interaction.tour-merge-conflict.pull; + + <para>In this example, I won't use Mercurial's normal + <command>hgmerge</command> program to do the merge, because it + would drop my nice automated example-running tool into a + graphical user interface. Instead, I'll set + <envar>HGMERGE</envar> to tell Mercurial to use the + non-interactive <command>merge</command> command. This is + bundled with many Unix-like systems. If you're following this + example on your computer, don't bother setting + <envar>HGMERGE</envar>.</para> + + <para><emphasis role="bold">XXX FIX THIS + EXAMPLE.</emphasis></para> + + &interaction.tour-merge-conflict.merge; + + <para>Because <command>merge</command> can't resolve the + conflicting changes, it leaves <emphasis>merge + markers</emphasis> inside the file that has conflicts, + indicating which lines have conflicts, and whether they came + from our version of the file or theirs.</para> + + <para>Mercurial can tell from the way <command>merge</command> + exits that it wasn't able to merge successfully, so it tells + us what commands we'll need to run if we want to redo the + merging operation. 
This could be useful if, for example, we + were running a graphical merge tool and quit because we were + confused or realised we had made a mistake.</para> + + <para>If automatic or manual merges fail, there's nothing to + prevent us from <quote>fixing up</quote> the affected files + ourselves, and committing the results of our merge:</para> + + &interaction.tour-merge-conflict.commit; + + </sect2> + </sect1> + <sect1 id="sec:tour-merge:fetch"> + <title>Simplifying the pull-merge-commit sequence</title> + + <para>The process of merging changes as outlined above is + straightforward, but requires running three commands in + sequence.</para> + <programlisting>hg pull +hg merge +hg commit -m 'Merged remote changes'</programlisting> + <para>In the case of the final commit, you also need to enter a + commit message, which is almost always going to be a piece of + uninteresting <quote>boilerplate</quote> text.</para> + + <para>It would be nice to reduce the number of steps needed, if + this were possible. Indeed, Mercurial is distributed with an + extension called <literal role="hg-ext">fetch</literal> that + does just this.</para> + + <para>Mercurial provides a flexible extension mechanism that lets + people extend its functionality, while keeping the core of + Mercurial small and easy to deal with. Some extensions add new + commands that you can use from the command line, while others + work <quote>behind the scenes,</quote> for example adding + capabilities to the server.</para> + + <para>The <literal role="hg-ext">fetch</literal> extension adds a + new command called, not surprisingly, <command role="hg-cmd">hg + fetch</command>. This extension acts as a combination of + <command role="hg-cmd">hg pull</command>, <command + role="hg-cmd">hg update</command> and <command + role="hg-cmd">hg merge</command>. It begins by pulling + changes from another repository into the current repository. 
If + it finds that the changes added a new head to the repository, it + begins a merge, then commits the result of the merge with an + automatically-generated commit message. If no new heads were + added, it updates the working directory to the new tip + changeset.</para> + + <para>Enabling the <literal role="hg-ext">fetch</literal> + extension is easy. Edit your <filename + role="special">.hgrc</filename>, and either go to the <literal + role="rc-extensions">extensions</literal> section or create an + <literal role="rc-extensions">extensions</literal> section. Then + add a line that simply reads <quote><literal>fetch + =</literal></quote>.</para> + <programlisting>[extensions] +fetch =</programlisting> + <para>(Normally, on the right-hand side of the + <quote><literal>=</literal></quote> would appear the location of + the extension, but since the <literal + role="hg-ext">fetch</literal> extension is in the standard + distribution, Mercurial knows where to search for it.)</para> + + </sect1> +</chapter> + +<!-- +local variables: +sgml-parent-document: ("00book.xml" "book" "chapter") +end: +-->
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch03-concepts.xml Thu Mar 19 20:54:12 2009 -0700 @@ -0,0 +1,726 @@ +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> + +<chapter id="chap:concepts"> + <?dbhtml filename="behind-the-scenes.html"?> + <title>Behind the scenes</title> + + <para>Unlike many revision control systems, the concepts upon which + Mercurial is built are simple enough that it's easy to understand + how the software really works. Knowing this certainly isn't + necessary, but I find it useful to have a <quote>mental + model</quote> of what's going on.</para> + + <para>This understanding gives me confidence that Mercurial has been + carefully designed to be both <emphasis>safe</emphasis> and + <emphasis>efficient</emphasis>. And just as importantly, if it's + easy for me to retain a good idea of what the software is doing + when I perform a revision control task, I'm less likely to be + surprised by its behaviour.</para> + + <para>In this chapter, we'll initially cover the core concepts + behind Mercurial's design, then continue to discuss some of the + interesting details of its implementation.</para> + + <sect1> + <title>Mercurial's historical record</title> + + <sect2> + <title>Tracking the history of a single file</title> + + <para>When Mercurial tracks modifications to a file, it stores + the history of that file in a metadata object called a + <emphasis>filelog</emphasis>. Each entry in the filelog + contains enough information to reconstruct one revision of the + file that is being tracked. Filelogs are stored as files in + the <filename role="special" + class="directory">.hg/store/data</filename> directory. 
A + filelog contains two kinds of information: revision data, and + an index to help Mercurial to find a revision + efficiently.</para> + + <para>A file that is large, or has a lot of history, has its + filelog stored in separate data + (<quote><literal>.d</literal></quote> suffix) and index + (<quote><literal>.i</literal></quote> suffix) files. For + small files without much history, the revision data and index + are combined in a single <quote><literal>.i</literal></quote> + file. The correspondence between a file in the working + directory and the filelog that tracks its history in the + repository is illustrated in figure <xref + linkend="fig:concepts:filelog"/>.</para> + + <informalfigure id="fig:concepts:filelog"> + <mediaobject><imageobject><imagedata + fileref="filelog"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject> + <caption><para>Relationships between files in working + directory and filelogs in + repository</para></caption></mediaobject> + </informalfigure> + + </sect2> + <sect2> + <title>Managing tracked files</title> + + <para>Mercurial uses a structure called a + <emphasis>manifest</emphasis> to collect together information + about the files that it tracks. Each entry in the manifest + contains information about the files present in a single + changeset. An entry records which files are present in the + changeset, the revision of each file, and a few other pieces + of file metadata.</para> + + </sect2> + <sect2> + <title>Recording changeset information</title> + + <para>The <emphasis>changelog</emphasis> contains information + about each changeset. 
Each revision records who committed a + change, the changeset comment, other pieces of + changeset-related information, and the revision of the + manifest to use.</para> + + </sect2> + <sect2> + <title>Relationships between revisions</title> + + <para>Within a changelog, a manifest, or a filelog, each + revision stores a pointer to its immediate parent (or to its + two parents, if it's a merge revision). As I mentioned above, + there are also relationships between revisions + <emphasis>across</emphasis> these structures, and they are + hierarchical in nature.</para> + + <para>For every changeset in a repository, there is exactly one + revision stored in the changelog. Each revision of the + changelog contains a pointer to a single revision of the + manifest. A revision of the manifest stores a pointer to a + single revision of each filelog tracked when that changeset + was created. These relationships are illustrated in figure + <xref linkend="fig:concepts:metadata"/>.</para> + + <informalfigure id="fig:concepts:metadata"> + <mediaobject><imageobject><imagedata + fileref="metadata"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject><caption><para>Metadata + relationships</para></caption> + </mediaobject> + </informalfigure> + + <para>As the illustration shows, there is + <emphasis>not</emphasis> a <quote>one to one</quote> + relationship between revisions in the changelog, manifest, or + filelog. If the manifest hasn't changed between two + changesets, the changelog entries for those changesets will + point to the same revision of the manifest. 
If a file that + Mercurial tracks hasn't changed between two changesets, the + entry for that file in the two revisions of the manifest will + point to the same revision of its filelog.</para> + + </sect2> + </sect1> + <sect1> + <title>Safe, efficient storage</title> + + <para>The underpinnings of changelogs, manifests, and filelogs are + provided by a single structure called the + <emphasis>revlog</emphasis>.</para> + + <sect2> + <title>Efficient storage</title> + + <para>The revlog provides efficient storage of revisions using a + <emphasis>delta</emphasis> mechanism. Instead of storing a + complete copy of a file for each revision, it stores the + changes needed to transform an older revision into the new + revision. For many kinds of file data, these deltas are + typically a fraction of a percent of the size of a full copy + of a file.</para> + + <para>Some obsolete revision control systems can only work with + deltas of text files. They must either store binary files as + complete snapshots or encoded into a text representation, both + of which are wasteful approaches. Mercurial can efficiently + handle deltas of files with arbitrary binary contents; it + doesn't need to treat text as special.</para> + + </sect2> + <sect2 id="sec:concepts:txn"> + <title>Safe operation</title> + + <para>Mercurial only ever <emphasis>appends</emphasis> data to + the end of a revlog file. It never modifies a section of a + file after it has written it. This is both more robust and + efficient than schemes that need to modify or rewrite + data.</para> + + <para>In addition, Mercurial treats every write as part of a + <emphasis>transaction</emphasis> that can span a number of + files. A transaction is <emphasis>atomic</emphasis>: either + the entire transaction succeeds and its effects are all + visible to readers in one go, or the whole thing is undone. 
+ This guarantee of atomicity means that if you're running two + copies of Mercurial, where one is reading data and one is + writing it, the reader will never see a partially written + result that might confuse it.</para> + + <para>The fact that Mercurial only appends to files makes it + easier to provide this transactional guarantee. The easier it + is to do stuff like this, the more confident you should be + that it's done correctly.</para> + + </sect2> + <sect2> + <title>Fast retrieval</title> + + <para>Mercurial cleverly avoids a pitfall common to all earlier + revision control systems: the problem of <emphasis>inefficient + retrieval</emphasis>. Most revision control systems store + the contents of a revision as an incremental series of + modifications against a <quote>snapshot</quote>. To + reconstruct a specific revision, you must first read the + snapshot, and then every one of the revisions between the + snapshot and your target revision. The more history that a + file accumulates, the more revisions you must read, hence the + longer it takes to reconstruct a particular revision.</para> + + <informalfigure id="fig:concepts:snapshot"> + <mediaobject><imageobject><imagedata + fileref="snapshot"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject><caption><para>Snapshot of + a revlog, with incremental + deltas</para></caption></mediaobject> + </informalfigure> + + <para>The innovation that Mercurial applies to this problem is + simple but effective. Once the cumulative amount of delta + information stored since the last snapshot exceeds a fixed + threshold, it stores a new snapshot (compressed, of course), + instead of another delta. This makes it possible to + reconstruct <emphasis>any</emphasis> revision of a file + quickly. This approach works so well that it has since been + copied by several other revision control systems.</para> + + <para>Figure <xref linkend="fig:concepts:snapshot"/> illustrates + the idea. 
In an entry in a revlog's index file, Mercurial + stores the range of entries from the data file that it must + read to reconstruct a particular revision.</para> + + <sect3> + <title>Aside: the influence of video compression</title> + + <para>If you're familiar with video compression or have ever + watched a TV feed through a digital cable or satellite + service, you may know that most video compression schemes + store each frame of video as a delta against its predecessor + frame. In addition, these schemes use <quote>lossy</quote> + compression techniques to increase the compression ratio, so + visual errors accumulate over the course of a number of + inter-frame deltas.</para> + + <para>Because it's possible for a video stream to <quote>drop + out</quote> occasionally due to signal glitches, and to + limit the accumulation of artefacts introduced by the lossy + compression process, video encoders periodically insert a + complete frame (called a <quote>key frame</quote>) into the + video stream; the next delta is generated against that + frame. This means that if the video signal gets + interrupted, it will resume once the next key frame is + received. Also, the accumulation of encoding errors + restarts anew with each key frame.</para> + + </sect3> + </sect2> + <sect2> + <title>Identification and strong integrity</title> + + <para>Along with delta or snapshot information, a revlog entry + contains a cryptographic hash of the data that it represents. + This makes it difficult to forge the contents of a revision, + and easy to detect accidental corruption.</para> + + <para>Hashes provide more than a mere check against corruption; + they are used as the identifiers for revisions. The changeset + identification hashes that you see as an end user are from + revisions of the changelog. 
Although filelogs and the + manifest also use hashes, Mercurial only uses these behind the + scenes.</para> + + <para>Mercurial verifies that hashes are correct when it + retrieves file revisions and when it pulls changes from + another repository. If it encounters an integrity problem, it + will complain and stop whatever it's doing.</para> + + <para>In addition to the effect it has on retrieval efficiency, + Mercurial's use of periodic snapshots makes it more robust + against partial data corruption. If a revlog becomes partly + corrupted due to a hardware error or system bug, it's often + possible to reconstruct some or most revisions from the + uncorrupted sections of the revlog, both before and after the + corrupted section. This would not be possible with a + delta-only storage model.</para> + + </sect2> + </sect1> + <sect1> + <title>Revision history, branching, and merging</title> + + <para>Every entry in a Mercurial revlog knows the identity of its + immediate ancestor revision, usually referred to as its + <emphasis>parent</emphasis>. In fact, a revision contains room + for not one parent, but two. Mercurial uses a special hash, + called the <quote>null ID</quote>, to represent the idea + <quote>there is no parent here</quote>. This hash is simply a + string of zeroes.</para> + + <para>In figure <xref linkend="fig:concepts:revlog"/>, you can see + an example of the conceptual structure of a revlog. Filelogs, + manifests, and changelogs all have this same structure; they + differ only in the kind of data stored in each delta or + snapshot.</para> + + <para>The first revision in a revlog (at the bottom of the image) + has the null ID in both of its parent slots. For a + <quote>normal</quote> revision, its first parent slot contains + the ID of its parent revision, and its second contains the null + ID, indicating that the revision has only one real parent. Any + two revisions that have the same parent ID are branches. 
A + revision that represents a merge between branches has two normal + revision IDs in its parent slots.</para> + + <informalfigure id="fig:concepts:revlog"> + <mediaobject><imageobject><imagedata + fileref="revlog"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject></mediaobject> + </informalfigure> + + </sect1> + <sect1> + <title>The working directory</title> + + <para>In the working directory, Mercurial stores a snapshot of the + files from the repository as of a particular changeset.</para> + + <para>The working directory <quote>knows</quote> which changeset + it contains. When you update the working directory to contain a + particular changeset, Mercurial looks up the appropriate + revision of the manifest to find out which files it was tracking + at the time that changeset was committed, and which revision of + each file was then current. It then recreates a copy of each of + those files, with the same contents it had when the changeset + was committed.</para> + + <para>The <emphasis>dirstate</emphasis> contains Mercurial's + knowledge of the working directory. This details which + changeset the working directory is updated to, and all of the + files that Mercurial is tracking in the working + directory.</para> + + <para>Just as a revision of a revlog has room for two parents, so + that it can represent either a normal revision (with one parent) + or a merge of two earlier revisions, the dirstate has slots for + two parents. When you use the <command role="hg-cmd">hg + update</command> command, the changeset that you update to is + stored in the <quote>first parent</quote> slot, and the null ID + in the second. When you <command role="hg-cmd">hg + merge</command> with another changeset, the first parent + remains unchanged, and the second parent is filled in with the + changeset you're merging with. 
The <command role="hg-cmd">hg + parents</command> command tells you what the parents of the + dirstate are.</para> + + <sect2> + <title>What happens when you commit</title> + + <para>The dirstate stores parent information for more than just + book-keeping purposes. Mercurial uses the parents of the + dirstate as <emphasis>the parents of a new + changeset</emphasis> when you perform a commit.</para> + + <informalfigure id="fig:concepts:wdir"> + <mediaobject><imageobject><imagedata + fileref="wdir"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject><caption><para>The working + directory can have two + parents</para></caption></mediaobject> + </informalfigure> + + <para>Figure <xref linkend="fig:concepts:wdir"/> shows the + normal state of the working directory, where it has a single + changeset as parent. That changeset is the + <emphasis>tip</emphasis>, the newest changeset in the + repository that has no children.</para> + + <informalfigure id="fig:concepts:wdir-after-commit"> + <mediaobject><imageobject><imagedata + fileref="wdir-after-commit"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject><caption><para>The working + directory gains new parents after a + commit</para></caption></mediaobject> + </informalfigure> + + <para>It's useful to think of the working directory as + <quote>the changeset I'm about to commit</quote>. Any files + that you tell Mercurial that you've added, removed, renamed, + or copied will be reflected in that changeset, as will + modifications to any files that Mercurial is already tracking; + the new changeset will have the parents of the working + directory as its parents.</para> + + <para>After a commit, Mercurial will update the parents of the + working directory, so that the first parent is the ID of the + new changeset, and the second is the null ID. This is shown + in figure <xref linkend="fig:concepts:wdir-after-commit"/>. 
+ Mercurial + doesn't touch any of the files in the working directory when + you commit; it just modifies the dirstate to note its new + parents.</para> + + </sect2> + <sect2> + <title>Creating a new head</title> + + <para>It's perfectly normal to update the working directory to a + changeset other than the current tip. For example, you might + want to know what your project looked like last Tuesday, or + you could be looking through changesets to see which one + introduced a bug. In cases like this, the natural thing to do + is update the working directory to the changeset you're + interested in, and then examine the files in the working + directory directly to see their contents as they were when you + committed that changeset. The effect of this is shown in + figure <xref linkend="fig:concepts:wdir-pre-branch"/>.</para> + + <informalfigure id="fig:concepts:wdir-pre-branch"> + <mediaobject><imageobject><imagedata + fileref="wdir-pre-branch"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject><caption><para>The working + directory, updated to an older + changeset</para></caption></mediaobject> + </informalfigure> + + <para>Having updated the working directory to an older + changeset, what happens if you make some changes, and then + commit? Mercurial behaves in the same way as I outlined + above. The parents of the working directory become the + parents of the new changeset. This new changeset has no + children, so it becomes the new tip. And the repository now + contains two changesets that have no children; we call these + <emphasis>heads</emphasis>. 
You can see the structure that + this creates in figure <xref + linkend="fig:concepts:wdir-branch"/>.</para> + + <informalfigure id="fig:concepts:wdir-branch"> + <mediaobject><imageobject><imagedata + fileref="wdir-branch"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject><caption><para>After a + commit made while synced to an older + changeset</para></caption></mediaobject> + </informalfigure> + + <note> + <para> If you're new to Mercurial, you should keep in mind a + common <quote>error</quote>, which is to use the <command + role="hg-cmd">hg pull</command> command without any + options. By default, the <command role="hg-cmd">hg + pull</command> command <emphasis>does not</emphasis> + update the working directory, so you'll bring new changesets + into your repository, but the working directory will stay + synced at the same changeset as before the pull. If you + make some changes and commit afterwards, you'll thus create + a new head, because your working directory isn't synced to + whatever the current tip is.</para> + + <para> I put the word <quote>error</quote> in quotes because + all that you need to do to rectify this situation is + <command role="hg-cmd">hg merge</command>, then <command + role="hg-cmd">hg commit</command>. In other words, this + almost never has negative consequences; it just surprises + people. 
I'll discuss other ways to avoid this behaviour, + and why Mercurial behaves in this initially surprising way, + later on.</para> + </note> + + </sect2> + <sect2> + <title>Merging heads</title> + + <para>When you run the <command role="hg-cmd">hg merge</command> + command, Mercurial leaves the first parent of the working + directory unchanged, and sets the second parent to the + changeset you're merging with, as shown in figure <xref + linkend="fig:concepts:wdir-merge"/>.</para> + + <informalfigure id="fig:concepts:wdir-merge"> + <mediaobject><imageobject><imagedata + fileref="wdir-merge"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject><caption><para>Merging two + heads</para></caption></mediaobject> + </informalfigure> + + <para>Mercurial also has to modify the working directory, to + merge the files managed in the two changesets. Simplified a + little, the merging process goes like this, for every file in + the manifests of both changesets.</para> + <itemizedlist> + <listitem><para>If neither changeset has modified a file, do + nothing with that file.</para> + </listitem> + <listitem><para>If one changeset has modified a file, and the + other hasn't, create the modified copy of the file in the + working directory.</para> + </listitem> + <listitem><para>If one changeset has removed a file, and the + other hasn't (or has also deleted it), delete the file + from the working directory.</para> + </listitem> + <listitem><para>If one changeset has removed a file, but the + other has modified the file, ask the user what to do: keep + the modified file, or remove it?</para> + </listitem> + <listitem><para>If both changesets have modified a file, + invoke an external merge program to choose the new + contents for the merged file. 
This may require input from + the user.</para> + </listitem> + <listitem><para>If one changeset has modified a file, and the + other has renamed or copied the file, make sure that the + changes follow the new name of the file.</para> + </listitem></itemizedlist> + <para>There are more details&emdash;merging has plenty of corner + cases&emdash;but these are the most common choices that are + involved in a merge. As you can see, most cases are + completely automatic, and indeed most merges finish + automatically, without requiring your input to resolve any + conflicts.</para> + + <para>When you're thinking about what happens when you commit + after a merge, once again the working directory is <quote>the + changeset I'm about to commit</quote>. After the <command + role="hg-cmd">hg merge</command> command completes, the + working directory has two parents; these will become the + parents of the new changeset.</para> + + <para>Mercurial lets you perform multiple merges, but you must + commit the results of each individual merge as you go. This + is necessary because Mercurial only tracks two parents for + both revisions and the working directory. While it would be + technically possible to merge multiple changesets at once, the + prospect of user confusion and making a terrible mess of a + merge immediately becomes overwhelming.</para> + + </sect2> + </sect1> + <sect1> + <title>Other interesting design features</title> + + <para>In the sections above, I've tried to highlight some of the + most important aspects of Mercurial's design, to illustrate that + it pays careful attention to reliability and performance. + However, the attention to detail doesn't stop there. There are + a number of other aspects of Mercurial's construction that I + personally find interesting. 
I'll detail a few of them here, + separate from the <quote>big ticket</quote> items above, so that + if you're interested, you can gain a better idea of the amount + of thinking that goes into a well-designed system.</para> + + <sect2> + <title>Clever compression</title> + + <para>When appropriate, Mercurial will store both snapshots and + deltas in compressed form. It does this by always + <emphasis>trying to</emphasis> compress a snapshot or delta, + but only storing the compressed version if it's smaller than + the uncompressed version.</para> + + <para>This means that Mercurial does <quote>the right + thing</quote> when storing a file whose native form is + compressed, such as a <literal>zip</literal> archive or a JPEG + image. When these types of files are compressed a second + time, the resulting file is usually bigger than the + once-compressed form, and so Mercurial will store the plain + <literal>zip</literal> or JPEG.</para> + + <para>Deltas between revisions of a compressed file are usually + larger than snapshots of the file, and Mercurial again does + <quote>the right thing</quote> in these cases. It finds that + such a delta exceeds the threshold at which it should store a + complete snapshot of the file, so it stores the snapshot, + again saving space compared to a naive delta-only + approach.</para> + + <sect3> + <title>Network recompression</title> + + <para>When storing revisions on disk, Mercurial uses the + <quote>deflate</quote> compression algorithm (the same one + used by the popular <literal>zip</literal> archive format), + which balances good speed with a respectable compression + ratio. 
However, when transmitting revision data over a + network connection, Mercurial uncompresses the compressed + revision data.</para> + + <para>If the connection is over HTTP, Mercurial recompresses + the entire stream of data using a compression algorithm that + gives a better compression ratio (the Burrows-Wheeler + algorithm from the widely used <literal>bzip2</literal> + compression package). This combination of algorithm and + compression of the entire stream (instead of a revision at a + time) substantially reduces the number of bytes to be + transferred, yielding better network performance over almost + all kinds of network.</para> + + <para>(If the connection is over <command>ssh</command>, + Mercurial <emphasis>doesn't</emphasis> recompress the + stream, because <command>ssh</command> can already do this + itself.)</para> + + </sect3> + </sect2> + <sect2> + <title>Read/write ordering and atomicity</title> + + <para>Appending to files isn't the whole story when it comes to + guaranteeing that a reader won't see a partial write. If you + recall figure <xref linkend="fig:concepts:metadata"/>, + revisions in the + changelog point to revisions in the manifest, and revisions in + the manifest point to revisions in filelogs. This hierarchy + is deliberate.</para> + + <para>A writer starts a transaction by writing filelog and + manifest data, and doesn't write any changelog data until + those are finished. 
A reader starts by reading changelog + data, then manifest data, followed by filelog data.</para> + + <para>Since the writer has always finished writing filelog and + manifest data before it writes to the changelog, a reader will + never read a pointer to a partially written manifest revision + from the changelog, and it will never read a pointer to a + partially written filelog revision from the manifest.</para> + + </sect2> + <sect2> + <title>Concurrent access</title> + + <para>The read/write ordering and atomicity guarantees mean that + Mercurial never needs to <emphasis>lock</emphasis> a + repository when it's reading data, even if the repository is + being written to while the read is occurring. This has a big + effect on scalability; you can have an arbitrary number of + Mercurial processes safely reading data from a repository + all at once, no matter whether it's being written to or + not.</para> + + <para>The lockless nature of reading means that if you're + sharing a repository on a multi-user system, you don't need to + grant other local users permission to + <emphasis>write</emphasis> to your repository in order for + them to be able to clone it or pull changes from it; they only + need <emphasis>read</emphasis> permission. (This is + <emphasis>not</emphasis> a common feature among revision + control systems, so don't take it for granted! Most require + readers to be able to lock a repository to access it safely, + and this requires write permission on at least one directory, + which of course makes for all kinds of nasty and annoying + security and administrative problems.)</para> + + <para>Mercurial uses locks to ensure that only one process can + write to a repository at a time (the locking mechanism is safe + even over filesystems that are notoriously hostile to locking, + such as NFS). 
If a repository is locked, a writer will wait + for a while and retry once the repository becomes unlocked, but + if the repository remains locked for too long, the process + attempting to write will time out. This means + that your daily automated scripts won't get stuck forever and + pile up if a system crashes unnoticed, for example. (Yes, the + timeout is configurable, from zero to infinity.)</para> + + <sect3> + <title>Safe dirstate access</title> + + <para>As with revision data, Mercurial doesn't take a lock to + read the dirstate file; it does acquire a lock to write it. + To avoid the possibility of reading a partially written copy + of the dirstate file, Mercurial writes to a file with a + unique name in the same directory as the dirstate file, then + renames the temporary file atomically to + <filename>dirstate</filename>. The file named + <filename>dirstate</filename> is thus guaranteed to be + complete, not partially written.</para> + + </sect3> + </sect2> + <sect2> + <title>Avoiding seeks</title> + + <para>Critical to Mercurial's performance is the avoidance of + seeks of the disk head, since any seek is far more expensive + than even a comparatively large read operation.</para> + + <para>This is why, for example, the dirstate is stored in a + single file. If there were a dirstate file per directory that + Mercurial tracked, the disk would seek once per directory. + Instead, Mercurial reads the entire single dirstate file in + one step.</para> + + <para>Mercurial also uses a <quote>copy on write</quote> scheme + when cloning a repository on local storage. Instead of + copying every revlog file from the old repository into the new + repository, it makes a <quote>hard link</quote>, which is a + shorthand way to say <quote>these two names point to the same + file</quote>. When Mercurial is about to write to one of a + revlog's files, it checks to see if the number of names + pointing at the file is greater than one. 
If it is, more than + one repository is using the file, so Mercurial makes a new + copy of the file that is private to this repository.</para> + + <para>A few revision control developers have pointed out that + this idea of making a complete private copy of a file is not + very efficient in its use of storage. While this is true, + storage is cheap, and this method gives the highest + performance while deferring most book-keeping to the operating + system. An alternative scheme would most likely reduce + performance and increase the complexity of the software, each + of which is much more important to the <quote>feel</quote> of + day-to-day use.</para> + + </sect2> + <sect2> + <title>Other contents of the dirstate</title> + + <para>Because Mercurial doesn't force you to tell it when you're + modifying a file, it uses the dirstate to store some extra + information so it can determine efficiently whether you have + modified a file. For each file in the working directory, it + stores the time that it last modified the file itself, and the + size of the file at that time.</para> + + <para>When you explicitly <command role="hg-cmd">hg + add</command>, <command role="hg-cmd">hg remove</command>, + <command role="hg-cmd">hg rename</command> or <command + role="hg-cmd">hg copy</command> files, Mercurial updates the + dirstate so that it knows what to do with those files when you + commit.</para> + + <para>When Mercurial is checking the states of files in the + working directory, it first checks a file's modification time. + If that has not changed, the file must not have been modified. + If the file's size has changed, the file must have been + modified. If the modification time has changed, but the size + has not, only then does Mercurial need to read the actual + contents of the file to see if they've changed. 
Storing these + few extra pieces of information dramatically reduces the + amount of data that Mercurial needs to read, which yields + large performance improvements compared to other revision + control systems.</para> + + </sect2> + </sect1> +</chapter> + +<!-- +local variables: +sgml-parent-document: ("00book.xml" "book" "chapter") +end: +-->
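A minimal sketch of the status check described in the "Other contents of the dirstate" section that closes the chapter above, written in Python purely for illustration. It is not Mercurial's own code: `DirstateEntry`, `record` and `is_modified` are hypothetical names, and keeping the file's bytes inside the entry is an assumption that stands in for comparing against the revision stored in the repository.

```python
import os
from dataclasses import dataclass


@dataclass
class DirstateEntry:
    """Hypothetical per-file record, loosely modelled on the text above."""
    mtime_ns: int   # modification time noted when the file was last recorded
    size: int       # file size at that time
    content: bytes  # assumption: stand-in for the revision kept in the repository


def record(path: str) -> DirstateEntry:
    """Note the current mtime, size and contents of a file."""
    st = os.stat(path)
    with open(path, "rb") as f:
        return DirstateEntry(st.st_mtime_ns, st.st_size, f.read())


def is_modified(path: str, entry: DirstateEntry) -> bool:
    """Apply the three checks in the order the text gives them."""
    st = os.stat(path)
    # 1. Modification time unchanged: treat the file as unmodified.
    if st.st_mtime_ns == entry.mtime_ns:
        return False
    # 2. Size changed: the file must have been modified.
    if st.st_size != entry.size:
        return True
    # 3. Time changed but size did not: only now read the contents.
    with open(path, "rb") as f:
        return f.read() != entry.content
```

The only expensive step, reading the file, is reached solely when the modification time has changed but the size has not; the first two checks need nothing beyond a single `os.stat` call.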
--- a/en/ch03-tour-merge.xml Wed Mar 18 00:08:22 2009 -0700 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,394 +0,0 @@ -<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> - -<chapter id="chap:tour-merge"> - <?dbhtml filename="a-tour-of-mercurial-merging-work.html"?> - <title>A tour of Mercurial: merging work</title> - - <para>We've now covered cloning a repository, making changes in a - repository, and pulling or pushing changes from one repository - into another. Our next step is <emphasis>merging</emphasis> - changes from separate repositories.</para> - - <sect1> - <title>Merging streams of work</title> - - <para>Merging is a fundamental part of working with a distributed - revision control tool.</para> - <itemizedlist> - <listitem><para>Alice and Bob each have a personal copy of a - repository for a project they're collaborating on. Alice - fixes a bug in her repository; Bob adds a new feature in - his. They want the shared repository to contain both the - bug fix and the new feature.</para> - </listitem> - <listitem><para>I frequently work on several different tasks for - a single project at once, each safely isolated in its own - repository. Working this way means that I often need to - merge one piece of my own work with another.</para> - </listitem></itemizedlist> - - <para>Because merging is such a common thing to need to do, - Mercurial makes it easy. Let's walk through the process. We'll - begin by cloning yet another repository (see how often they - spring up?) and making a change in it.</para> - - &interaction.tour.merge.clone; - - <para>We should now have two copies of - <filename>hello.c</filename> with different contents. 
The - histories of the two repositories have also diverged, as - illustrated in figure <xref - linkend="fig:tour-merge:sep-repos"/>.</para> - - &interaction.tour.merge.cat; - - <informalfigure id="fig:tour-merge:sep-repos"> - <mediaobject> - <imageobject><imagedata fileref="tour-merge-sep-repos"/></imageobject> - <textobject><phrase>XXX add text</phrase></textobject> - <caption><para>Divergent recent histories of the <filename - class="directory">my-hello</filename> and <filename - class="directory">my-new-hello</filename> - repositories</para></caption> - </mediaobject> - </informalfigure> - - <para>We already know that pulling changes from our <filename - class="directory">my-hello</filename> repository will have no - effect on the working directory.</para> - - &interaction.tour.merge.pull; - - <para>However, the <command role="hg-cmd">hg pull</command> - command says something about <quote>heads</quote>.</para> - - <sect2> - <title>Head changesets</title> - - <para>A head is a change that has no descendants, or children, - as they're also known. The tip revision is thus a head, - because the newest revision in a repository doesn't have any - children, but a repository can contain more than one - head.</para> - - <informalfigure id="fig:tour-merge:pull"> - <mediaobject><imageobject><imagedata - fileref="tour-merge-pull"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject> - <caption><para>Repository contents after pulling from - <filename class="directory">my-hello</filename> into - <filename - class="directory">my-new-hello</filename></para></caption> - </mediaobject> - </informalfigure> - - <para>In figure <xref linkend="fig:tour-merge:pull"/>, you can - see the effect of the pull from <filename - class="directory">my-hello</filename> into <filename - class="directory">my-new-hello</filename>. The history that - was already present in <filename - class="directory">my-new-hello</filename> is untouched, but - a new revision has been added. 
By referring to figure <xref - linkend="fig:tour-merge:sep-repos"/>, we can see that the - <emphasis>changeset ID</emphasis> remains the same in the new - repository, but the <emphasis>revision number</emphasis> has - changed. (This, incidentally, is a fine example of why it's - not safe to use revision numbers when discussing changesets.) - We can view the heads in a repository using the <command - role="hg-cmd">hg heads</command> command.</para> - - &interaction.tour.merge.heads; - - </sect2> - <sect2> - <title>Performing the merge</title> - - <para>What happens if we try to use the normal <command - role="hg-cmd">hg update</command> command to update to the - new tip?</para> - - &interaction.tour.merge.update; - - <para>Mercurial is telling us that the <command role="hg-cmd">hg - update</command> command won't do a merge; it won't update - the working directory when it thinks we might be wanting to do - a merge, unless we force it to do so. Instead, we use the - <command role="hg-cmd">hg merge</command> command to merge the - two heads.</para> - - &interaction.tour.merge.merge; - - <informalfigure id="fig:tour-merge:merge"> - - <mediaobject><imageobject><imagedata - fileref="tour-merge-merge"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject> - <caption><para>Working directory and repository during - merge, and following commit</para></caption> - </mediaobject> - </informalfigure> - - <para>This updates the working directory so that it contains - changes from <emphasis>both</emphasis> heads, which is - reflected in both the output of <command role="hg-cmd">hg - parents</command> and the contents of - <filename>hello.c</filename>.</para> - - &interaction.tour.merge.parents; - - </sect2> - <sect2> - <title>Committing the results of the merge</title> - - <para>Whenever we've done a merge, <command role="hg-cmd">hg - parents</command> will display two parents until we <command - role="hg-cmd">hg commit</command> the results of the - 
merge.</para> - - &interaction.tour.merge.commit; - - <para>We now have a new tip revision; notice that it has - <emphasis>both</emphasis> of our former heads as its parents. - These are the same revisions that were previously displayed by - <command role="hg-cmd">hg parents</command>.</para> - - &interaction.tour.merge.tip; - - <para>In figure <xref - linkend="fig:tour-merge:merge"/>, you can see a - representation of what happens to the working directory during - the merge, and how this affects the repository when the commit - happens. During the merge, the working directory has two - parent changesets, and these become the parents of the new - changeset.</para> - - </sect2> - </sect1> - <sect1> - <title>Merging conflicting changes</title> - - <para>Most merges are simple affairs, but sometimes you'll find - yourself merging changes where each modifies the same portions - of the same files. Unless both modifications are identical, - this results in a <emphasis>conflict</emphasis>, where you have - to decide how to reconcile the different changes into something - coherent.</para> - - <informalfigure> - - <mediaobject id="fig:tour-merge:conflict"> - <imageobject><imagedata fileref="tour-merge-conflict"/></imageobject> - <textobject><phrase>XXX add text</phrase></textobject> - <caption><para>Conflicting changes to a - document</para></caption> </mediaobject> - </informalfigure> - - <para>Figure <xref linkend="fig:tour-merge:conflict"/> illustrates - an instance of two conflicting changes to a document. We - started with a single version of the file; then we made some - changes; while someone else made different changes to the same - text. Our task in resolving the conflicting changes is to - decide what the file should look like.</para> - - <para>Mercurial doesn't have a built-in facility for handling - conflicts. Instead, it runs an external program called - <command>hgmerge</command>. 
This is a shell script that is - bundled with Mercurial; you can change it to behave however you - please. What it does by default is try to find one of several - different merging tools that are likely to be installed on your - system. It first tries a few fully automatic merging tools; if - these don't succeed (because the resolution process requires - human guidance) or aren't present, the script tries a few - different graphical merging tools.</para> - - <para>It's also possible to get Mercurial to run another program - or script instead of <command>hgmerge</command>, by setting the - <envar>HGMERGE</envar> environment variable to the name of your - preferred program.</para> - - <sect2> - <title>Using a graphical merge tool</title> - - <para>My preferred graphical merge tool is - <command>kdiff3</command>, which I'll use to describe the - features that are common to graphical file merging tools. You - can see a screenshot of <command>kdiff3</command> in action in - figure <xref linkend="fig:tour-merge:kdiff3"/>. The kind of - merge it is performing is called a <emphasis>three-way - merge</emphasis>, because there are three different versions - of the file of interest to us. The tool thus splits the upper - portion of the window into three panes:</para> - <itemizedlist> - <listitem><para>At the left is the <emphasis>base</emphasis> - version of the file, i.e. the most recent version from - which the two versions we're trying to merge are - descended.</para> - </listitem> - <listitem><para>In the middle is <quote>our</quote> version of - the file, with the contents that we modified.</para> - </listitem> - <listitem><para>On the right is <quote>their</quote> version - of the file, the one that from the changeset that we're - trying to merge with.</para> - </listitem></itemizedlist> - <para>In the pane below these is the current - <emphasis>result</emphasis> of the merge. 
Our task is to - replace all of the red text, which indicates unresolved - conflicts, with some sensible merger of the - <quote>ours</quote> and <quote>theirs</quote> versions of the - file.</para> - - <para>All four of these panes are <emphasis>locked - together</emphasis>; if we scroll vertically or horizontally - in any of them, the others are updated to display the - corresponding sections of their respective files.</para> - - <informalfigure id="fig:tour-merge:kdiff3"> - <mediaobject><imageobject><imagedata - fileref="kdiff3"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject> - <caption><para>Using <command>kdiff3</command> to merge - versions of a file</para></caption> - </mediaobject> - </informalfigure> - - <para>For each conflicting portion of the file, we can choose to - resolve the conflict using some combination of text from the - base version, ours, or theirs. We can also manually edit the - merged file at any time, in case we need to make further - modifications.</para> - - <para>There are <emphasis>many</emphasis> file merging tools - available, too many to cover here. They vary in which - platforms they are available for, and in their particular - strengths and weaknesses. Most are tuned for merging files - containing plain text, while a few are aimed at specialised - file formats (generally XML).</para> - - </sect2> - <sect2> - <title>A worked example</title> - - <para>In this example, we will reproduce the file modification - history of figure <xref linkend="fig:tour-merge:conflict"/> - above. Let's begin by creating a repository with a base - version of our document.</para> - - &interaction.tour-merge-conflict.wife; - - <para>We'll clone the repository and make a change to the - file.</para> - - &interaction.tour-merge-conflict.cousin; - - <para>And another clone, to simulate someone else making a - change to the file. 
(This hints at the idea that it's not all - that unusual to merge with yourself when you isolate tasks in - separate repositories, and indeed to find and resolve - conflicts while doing so.)</para> - - &interaction.tour-merge-conflict.son; - - <para>Having created two - different versions of the file, we'll set up an environment - suitable for running our merge.</para> - - &interaction.tour-merge-conflict.pull; - - <para>In this example, I won't use Mercurial's normal - <command>hgmerge</command> program to do the merge, because it - would drop my nice automated example-running tool into a - graphical user interface. Instead, I'll set - <envar>HGMERGE</envar> to tell Mercurial to use the - non-interactive <command>merge</command> command. This is - bundled with many Unix-like systems. If you're following this - example on your computer, don't bother setting - <envar>HGMERGE</envar>.</para> - - <para><emphasis role="bold">XXX FIX THIS - EXAMPLE.</emphasis></para> - - &interaction.tour-merge-conflict.merge; - - <para>Because <command>merge</command> can't resolve the - conflicting changes, it leaves <emphasis>merge - markers</emphasis> inside the file that has conflicts, - indicating which lines have conflicts, and whether they came - from our version of the file or theirs.</para> - - <para>Mercurial can tell from the way <command>merge</command> - exits that it wasn't able to merge successfully, so it tells - us what commands we'll need to run if we want to redo the - merging operation. 
This could be useful if, for example, we - were running a graphical merge tool and quit because we were - confused or realised we had made a mistake.</para> - - <para>If automatic or manual merges fail, there's nothing to - prevent us from <quote>fixing up</quote> the affected files - ourselves, and committing the results of our merge:</para> - - &interaction.tour-merge-conflict.commit; - - </sect2> - </sect1> - <sect1 id="sec:tour-merge:fetch"> - <title>Simplifying the pull-merge-commit sequence</title> - - <para>The process of merging changes as outlined above is - straightforward, but requires running three commands in - sequence.</para> - <programlisting>hg pull -hg merge -hg commit -m 'Merged remote changes'</programlisting> - <para>In the case of the final commit, you also need to enter a - commit message, which is almost always going to be a piece of - uninteresting <quote>boilerplate</quote> text.</para> - - <para>It would be nice to reduce the number of steps needed, if - this were possible. Indeed, Mercurial is distributed with an - extension called <literal role="hg-ext">fetch</literal> that - does just this.</para> - - <para>Mercurial provides a flexible extension mechanism that lets - people extend its functionality, while keeping the core of - Mercurial small and easy to deal with. Some extensions add new - commands that you can use from the command line, while others - work <quote>behind the scenes,</quote> for example adding - capabilities to the server.</para> - - <para>The <literal role="hg-ext">fetch</literal> extension adds a - new command called, not surprisingly, <command role="hg-cmd">hg - fetch</command>. This extension acts as a combination of - <command role="hg-cmd">hg pull</command>, <command - role="hg-cmd">hg update</command> and <command - role="hg-cmd">hg merge</command>. It begins by pulling - changes from another repository into the current repository. 
If - it finds that the changes added a new head to the repository, it - begins a merge, then commits the result of the merge with an - automatically-generated commit message. If no new heads were - added, it updates the working directory to the new tip - changeset.</para> - - <para>Enabling the <literal role="hg-ext">fetch</literal> - extension is easy. Edit your <filename - role="special">.hgrc</filename>, and either go to the <literal - role="rc-extensions">extensions</literal> section or create an - <literal role="rc-extensions">extensions</literal> section. Then - add a line that simply reads <quote><literal>fetch - </literal></quote>.</para> - <programlisting>[extensions] -fetch =</programlisting> - <para>(Normally, on the right-hand side of the - <quote><literal>=</literal></quote> would appear the location of - the extension, but since the <literal - role="hg-ext">fetch</literal> extension is in the standard - distribution, Mercurial knows where to search for it.)</para> - - </sect1> -</chapter> - -<!-- -local variables: -sgml-parent-document: ("00book.xml" "book" "chapter") -end: --->
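Note on the removed "Simplifying the pull-merge-commit sequence" section: the decision it describes for hg fetch (merge and commit if the pull added a head, otherwise update) can be sketched roughly as follows. This is only an illustration, not the extension's actual code. The function name `fetch_plan`, its two arguments, and the commit message are hypothetical, and "added a new head" is assumed here to mean that the number of heads grew.

```python
def fetch_plan(heads_before, heads_after):
    """Return the hg commands a fetch-like wrapper would run, in order.

    heads_before / heads_after are collections of head changeset IDs as
    seen before and after the pull (hypothetical inputs for this sketch).
    """
    plan = ["hg pull"]
    if len(set(heads_after)) > len(set(heads_before)):
        # Assumption: a larger head count means the pull added a new head,
        # so merge it and commit the result with a boilerplate message.
        plan += ["hg merge", "hg commit -m 'Automated merge'"]
    else:
        # No new head: just move the working directory to the new tip.
        plan.append("hg update")
    return plan
```

The sketch returns the command list instead of running anything, so the branching can be checked without a repository.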
--- a/en/ch04-concepts.xml Wed Mar 18 00:08:22 2009 -0700 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,726 +0,0 @@ -<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> - -<chapter id="chap:concepts"> - <?dbhtml filename="behind-the-scenes.html"?> - <title>Behind the scenes</title> - - <para>Unlike many revision control systems, the concepts upon which - Mercurial is built are simple enough that it's easy to understand - how the software really works. Knowing this certainly isn't - necessary, but I find it useful to have a <quote>mental - model</quote> of what's going on.</para> - - <para>This understanding gives me confidence that Mercurial has been - carefully designed to be both <emphasis>safe</emphasis> and - <emphasis>efficient</emphasis>. And just as importantly, if it's - easy for me to retain a good idea of what the software is doing - when I perform a revision control task, I'm less likely to be - surprised by its behaviour.</para> - - <para>In this chapter, we'll initially cover the core concepts - behind Mercurial's design, then continue to discuss some of the - interesting details of its implementation.</para> - - <sect1> - <title>Mercurial's historical record</title> - - <sect2> - <title>Tracking the history of a single file</title> - - <para>When Mercurial tracks modifications to a file, it stores - the history of that file in a metadata object called a - <emphasis>filelog</emphasis>. Each entry in the filelog - contains enough information to reconstruct one revision of the - file that is being tracked. Filelogs are stored as files in - the <filename role="special" - class="directory">.hg/store/data</filename> directory. 
A - filelog contains two kinds of information: revision data, and - an index to help Mercurial to find a revision - efficiently.</para> - - <para>A file that is large, or has a lot of history, has its - filelog stored in separate data - (<quote><literal>.d</literal></quote> suffix) and index - (<quote><literal>.i</literal></quote> suffix) files. For - small files without much history, the revision data and index - are combined in a single <quote><literal>.i</literal></quote> - file. The correspondence between a file in the working - directory and the filelog that tracks its history in the - repository is illustrated in figure <xref - linkend="fig:concepts:filelog"/>.</para> - - <informalfigure id="fig:concepts:filelog"> - <mediaobject><imageobject><imagedata - fileref="filelog"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject> - <caption><para>Relationships between files in working - directory and filelogs in - repository</para></caption></mediaobject> - </informalfigure> - - </sect2> - <sect2> - <title>Managing tracked files</title> - - <para>Mercurial uses a structure called a - <emphasis>manifest</emphasis> to collect together information - about the files that it tracks. Each entry in the manifest - contains information about the files present in a single - changeset. An entry records which files are present in the - changeset, the revision of each file, and a few other pieces - of file metadata.</para> - - </sect2> - <sect2> - <title>Recording changeset information</title> - - <para>The <emphasis>changelog</emphasis> contains information - about each changeset. 
Each revision records who committed a - change, the changeset comment, other pieces of - changeset-related information, and the revision of the - manifest to use.</para> - - </sect2> - <sect2> - <title>Relationships between revisions</title> - - <para>Within a changelog, a manifest, or a filelog, each - revision stores a pointer to its immediate parent (or to its - two parents, if it's a merge revision). As I mentioned above, - there are also relationships between revisions - <emphasis>across</emphasis> these structures, and they are - hierarchical in nature.</para> - - <para>For every changeset in a repository, there is exactly one - revision stored in the changelog. Each revision of the - changelog contains a pointer to a single revision of the - manifest. A revision of the manifest stores a pointer to a - single revision of each filelog tracked when that changeset - was created. These relationships are illustrated in figure - <xref linkend="fig:concepts:metadata"/>.</para> - - <informalfigure id="fig:concepts:metadata"> - <mediaobject><imageobject><imagedata - fileref="metadata"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject><caption><para>Metadata - relationships</para></caption> - </mediaobject> - </informalfigure> - - <para>As the illustration shows, there is - <emphasis>not</emphasis> a <quote>one to one</quote> - relationship between revisions in the changelog, manifest, or - filelog. If the manifest hasn't changed between two - changesets, the changelog entries for those changesets will - point to the same revision of the manifest. 
If a file that - Mercurial tracks hasn't changed between two changesets, the - entry for that file in the two revisions of the manifest will - point to the same revision of its filelog.</para> - - </sect2> - </sect1> - <sect1> - <title>Safe, efficient storage</title> - - <para>The underpinnings of changelogs, manifests, and filelogs are - provided by a single structure called the - <emphasis>revlog</emphasis>.</para> - - <sect2> - <title>Efficient storage</title> - - <para>The revlog provides efficient storage of revisions using a - <emphasis>delta</emphasis> mechanism. Instead of storing a - complete copy of a file for each revision, it stores the - changes needed to transform an older revision into the new - revision. For many kinds of file data, these deltas are - typically a fraction of a percent of the size of a full copy - of a file.</para> - - <para>Some obsolete revision control systems can only work with - deltas of text files. They must either store binary files as - complete snapshots or encoded into a text representation, both - of which are wasteful approaches. Mercurial can efficiently - handle deltas of files with arbitrary binary contents; it - doesn't need to treat text as special.</para> - - </sect2> - <sect2 id="sec:concepts:txn"> - <title>Safe operation</title> - - <para>Mercurial only ever <emphasis>appends</emphasis> data to - the end of a revlog file. It never modifies a section of a - file after it has written it. This is both more robust and - efficient than schemes that need to modify or rewrite - data.</para> - - <para>In addition, Mercurial treats every write as part of a - <emphasis>transaction</emphasis> that can span a number of - files. A transaction is <emphasis>atomic</emphasis>: either - the entire transaction succeeds and its effects are all - visible to readers in one go, or the whole thing is undone. 
- This guarantee of atomicity means that if you're running two - copies of Mercurial, where one is reading data and one is - writing it, the reader will never see a partially written - result that might confuse it.</para> - - <para>The fact that Mercurial only appends to files makes it - easier to provide this transactional guarantee. The easier it - is to do stuff like this, the more confident you should be - that it's done correctly.</para> - - </sect2> - <sect2> - <title>Fast retrieval</title> - - <para>Mercurial cleverly avoids a pitfall common to all earlier - revision control systems: the problem of <emphasis>inefficient - retrieval</emphasis>. Most revision control systems store - the contents of a revision as an incremental series of - modifications against a <quote>snapshot</quote>. To - reconstruct a specific revision, you must first read the - snapshot, and then every one of the revisions between the - snapshot and your target revision. The more history that a - file accumulates, the more revisions you must read, hence the - longer it takes to reconstruct a particular revision.</para> - - <informalfigure id="fig:concepts:snapshot"> - <mediaobject><imageobject><imagedata - fileref="snapshot"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject><caption><para>Snapshot of - a revlog, with incremental - deltas</para></caption></mediaobject> - </informalfigure> - - <para>The innovation that Mercurial applies to this problem is - simple but effective. Once the cumulative amount of delta - information stored since the last snapshot exceeds a fixed - threshold, it stores a new snapshot (compressed, of course), - instead of another delta. This makes it possible to - reconstruct <emphasis>any</emphasis> revision of a file - quickly. This approach works so well that it has since been - copied by several other revision control systems.</para> - - <para>Figure <xref linkend="fig:concepts:snapshot"/> illustrates - the idea. 
In an entry in a revlog's index file, Mercurial - stores the range of entries from the data file that it must - read to reconstruct a particular revision.</para> - - <sect3> - <title>Aside: the influence of video compression</title> - - <para>If you're familiar with video compression or have ever - watched a TV feed through a digital cable or satellite - service, you may know that most video compression schemes - store each frame of video as a delta against its predecessor - frame. In addition, these schemes use <quote>lossy</quote> - compression techniques to increase the compression ratio, so - visual errors accumulate over the course of a number of - inter-frame deltas.</para> - - <para>Because it's possible for a video stream to <quote>drop - out</quote> occasionally due to signal glitches, and to - limit the accumulation of artefacts introduced by the lossy - compression process, video encoders periodically insert a - complete frame (called a <quote>key frame</quote>) into the - video stream; the next delta is generated against that - frame. This means that if the video signal gets - interrupted, it will resume once the next key frame is - received. Also, the accumulation of encoding errors - restarts anew with each key frame.</para> - - </sect3> - </sect2> - <sect2> - <title>Identification and strong integrity</title> - - <para>Along with delta or snapshot information, a revlog entry - contains a cryptographic hash of the data that it represents. - This makes it difficult to forge the contents of a revision, - and easy to detect accidental corruption.</para> - - <para>Hashes provide more than a mere check against corruption; - they are used as the identifiers for revisions. The changeset - identification hashes that you see as an end user are from - revisions of the changelog. 
Although filelogs and the - manifest also use hashes, Mercurial only uses these behind the - scenes.</para> - - <para>Mercurial verifies that hashes are correct when it - retrieves file revisions and when it pulls changes from - another repository. If it encounters an integrity problem, it - will complain and stop whatever it's doing.</para> - - <para>In addition to the effect it has on retrieval efficiency, - Mercurial's use of periodic snapshots makes it more robust - against partial data corruption. If a revlog becomes partly - corrupted due to a hardware error or system bug, it's often - possible to reconstruct some or most revisions from the - uncorrupted sections of the revlog, both before and after the - corrupted section. This would not be possible with a - delta-only storage model.</para> - - </sect2> - </sect1> - <sect1> - <title>Revision history, branching, and merging</title> - - <para>Every entry in a Mercurial revlog knows the identity of its - immediate ancestor revision, usually referred to as its - <emphasis>parent</emphasis>. In fact, a revision contains room - for not one parent, but two. Mercurial uses a special hash, - called the <quote>null ID</quote>, to represent the idea - <quote>there is no parent here</quote>. This hash is simply a - string of zeroes.</para> - - <para>In figure <xref linkend="fig:concepts:revlog"/>, you can see - an example of the conceptual structure of a revlog. Filelogs, - manifests, and changelogs all have this same structure; they - differ only in the kind of data stored in each delta or - snapshot.</para> - - <para>The first revision in a revlog (at the bottom of the image) - has the null ID in both of its parent slots. For a - <quote>normal</quote> revision, its first parent slot contains - the ID of its parent revision, and its second contains the null - ID, indicating that the revision has only one real parent. Any - two revisions that have the same parent ID are branches. 
A - revision that represents a merge between branches has two normal - revision IDs in its parent slots.</para> - - <informalfigure id="fig:concepts:revlog"> - <mediaobject><imageobject><imagedata - fileref="revlog"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject></mediaobject> - </informalfigure> - - </sect1> - <sect1> - <title>The working directory</title> - - <para>In the working directory, Mercurial stores a snapshot of the - files from the repository as of a particular changeset.</para> - - <para>The working directory <quote>knows</quote> which changeset - it contains. When you update the working directory to contain a - particular changeset, Mercurial looks up the appropriate - revision of the manifest to find out which files it was tracking - at the time that changeset was committed, and which revision of - each file was then current. It then recreates a copy of each of - those files, with the same contents it had when the changeset - was committed.</para> - - <para>The <emphasis>dirstate</emphasis> contains Mercurial's - knowledge of the working directory. This details which - changeset the working directory is updated to, and all of the - files that Mercurial is tracking in the working - directory.</para> - - <para>Just as a revision of a revlog has room for two parents, so - that it can represent either a normal revision (with one parent) - or a merge of two earlier revisions, the dirstate has slots for - two parents. When you use the <command role="hg-cmd">hg - update</command> command, the changeset that you update to is - stored in the <quote>first parent</quote> slot, and the null ID - in the second. When you <command role="hg-cmd">hg - merge</command> with another changeset, the first parent - remains unchanged, and the second parent is filled in with the - changeset you're merging with. 
The <command role="hg-cmd">hg - parents</command> command tells you what the parents of the - dirstate are.</para> - - <sect2> - <title>What happens when you commit</title> - - <para>The dirstate stores parent information for more than just - book-keeping purposes. Mercurial uses the parents of the - dirstate as <emphasis>the parents of a new - changeset</emphasis> when you perform a commit.</para> - - <informalfigure id="fig:concepts:wdir"> - <mediaobject><imageobject><imagedata - fileref="wdir"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject><caption><para>The working - directory can have two - parents</para></caption></mediaobject> - </informalfigure> - - <para>Figure <xref linkend="fig:concepts:wdir"/> shows the - normal state of the working directory, where it has a single - changeset as parent. That changeset is the - <emphasis>tip</emphasis>, the newest changeset in the - repository that has no children.</para> - - <informalfigure id="fig:concepts:wdir-after-commit"> - <mediaobject><imageobject><imagedata - fileref="wdir-after-commit"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject><caption><para>The working - directory gains new parents after a - commit</para></caption></mediaobject> - </informalfigure> - - <para>It's useful to think of the working directory as - <quote>the changeset I'm about to commit</quote>. Any files - that you tell Mercurial that you've added, removed, renamed, - or copied will be reflected in that changeset, as will - modifications to any files that Mercurial is already tracking; - the new changeset will have the parents of the working - directory as its parents.</para> - - <para>After a commit, Mercurial will update the parents of the - working directory, so that the first parent is the ID of the - new changeset, and the second is the null ID. This is shown - in figure <xref linkend="fig:concepts:wdir-after-commit"/>. 
- Mercurial - doesn't touch any of the files in the working directory when - you commit; it just modifies the dirstate to note its new - parents.</para> - - </sect2> - <sect2> - <title>Creating a new head</title> - - <para>It's perfectly normal to update the working directory to a - changeset other than the current tip. For example, you might - want to know what your project looked like last Tuesday, or - you could be looking through changesets to see which one - introduced a bug. In cases like this, the natural thing to do - is update the working directory to the changeset you're - interested in, and then examine the files in the working - directory directly to see their contents as they were when you - committed that changeset. The effect of this is shown in - figure <xref linkend="fig:concepts:wdir-pre-branch"/>.</para> - - <informalfigure id="fig:concepts:wdir-pre-branch"> - <mediaobject><imageobject><imagedata - fileref="wdir-pre-branch"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject><caption><para>The working - directory, updated to an older - changeset</para></caption></mediaobject> - </informalfigure> - - <para>Having updated the working directory to an older - changeset, what happens if you make some changes, and then - commit? Mercurial behaves in the same way as I outlined - above. The parents of the working directory become the - parents of the new changeset. This new changeset has no - children, so it becomes the new tip. And the repository now - contains two changesets that have no children; we call these - <emphasis>heads</emphasis>. 
You can see the structure that - this creates in figure <xref - linkend="fig:concepts:wdir-branch"/>.</para> - - <informalfigure id="fig:concepts:wdir-branch"> - <mediaobject><imageobject><imagedata - fileref="wdir-branch"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject><caption><para>After a - commit made while synced to an older - changeset</para></caption></mediaobject> - </informalfigure> - - <note> - <para> If you're new to Mercurial, you should keep in mind a - common <quote>error</quote>, which is to use the <command - role="hg-cmd">hg pull</command> command without any - options. By default, the <command role="hg-cmd">hg - pull</command> command <emphasis>does not</emphasis> - update the working directory, so you'll bring new changesets - into your repository, but the working directory will stay - synced at the same changeset as before the pull. If you - make some changes and commit afterwards, you'll thus create - a new head, because your working directory isn't synced to - whatever the current tip is.</para> - - <para> I put the word <quote>error</quote> in quotes because - all that you need to do to rectify this situation is - <command role="hg-cmd">hg merge</command>, then <command - role="hg-cmd">hg commit</command>. In other words, this - almost never has negative consequences; it just surprises - people. 
I'll discuss other ways to avoid this behaviour, - and why Mercurial behaves in this initially surprising way, - later on.</para> - </note> - - </sect2> - <sect2> - <title>Merging heads</title> - - <para>When you run the <command role="hg-cmd">hg merge</command> - command, Mercurial leaves the first parent of the working - directory unchanged, and sets the second parent to the - changeset you're merging with, as shown in figure <xref - linkend="fig:concepts:wdir-merge"/>.</para> - - <informalfigure id="fig:concepts:wdir-merge"> - <mediaobject><imageobject><imagedata - fileref="wdir-merge"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject><caption><para>Merging two - heads</para></caption></mediaobject> - </informalfigure> - - <para>Mercurial also has to modify the working directory, to - merge the files managed in the two changesets. Simplified a - little, the merging process goes like this, for every file in - the manifests of both changesets.</para> - <itemizedlist> - <listitem><para>If neither changeset has modified a file, do - nothing with that file.</para> - </listitem> - <listitem><para>If one changeset has modified a file, and the - other hasn't, create the modified copy of the file in the - working directory.</para> - </listitem> - <listitem><para>If one changeset has removed a file, and the - other hasn't (or has also deleted it), delete the file - from the working directory.</para> - </listitem> - <listitem><para>If one changeset has removed a file, but the - other has modified the file, ask the user what to do: keep - the modified file, or remove it?</para> - </listitem> - <listitem><para>If both changesets have modified a file, - invoke an external merge program to choose the new - contents for the merged file. 
This may require input from - the user.</para> - </listitem> - <listitem><para>If one changeset has modified a file, and the - other has renamed or copied the file, make sure that the - changes follow the new name of the file.</para> - </listitem></itemizedlist> - <para>There are more details&emdash;merging has plenty of corner - cases&emdash;but these are the most common choices that are - involved in a merge. As you can see, most cases are - completely automatic, and indeed most merges finish - automatically, without requiring your input to resolve any - conflicts.</para> - - <para>When you're thinking about what happens when you commit - after a merge, once again the working directory is <quote>the - changeset I'm about to commit</quote>. After the <command - role="hg-cmd">hg merge</command> command completes, the - working directory has two parents; these will become the - parents of the new changeset.</para> - - <para>Mercurial lets you perform multiple merges, but you must - commit the results of each individual merge as you go. This - is necessary because Mercurial only tracks two parents for - both revisions and the working directory. While it would be - technically possible to merge multiple changesets at once, the - prospect of user confusion and making a terrible mess of a - merge immediately becomes overwhelming.</para> - - </sect2> - </sect1> - <sect1> - <title>Other interesting design features</title> - - <para>In the sections above, I've tried to highlight some of the - most important aspects of Mercurial's design, to illustrate that - it pays careful attention to reliability and performance. - However, the attention to detail doesn't stop there. There are - a number of other aspects of Mercurial's construction that I - personally find interesting. 
I'll detail a few of them here, - separate from the <quote>big ticket</quote> items above, so that - if you're interested, you can gain a better idea of the amount - of thinking that goes into a well-designed system.</para> - - <sect2> - <title>Clever compression</title> - - <para>When appropriate, Mercurial will store both snapshots and - deltas in compressed form. It does this by always - <emphasis>trying to</emphasis> compress a snapshot or delta, - but only storing the compressed version if it's smaller than - the uncompressed version.</para> - - <para>This means that Mercurial does <quote>the right - thing</quote> when storing a file whose native form is - compressed, such as a <literal>zip</literal> archive or a JPEG - image. When these types of files are compressed a second - time, the resulting file is usually bigger than the - once-compressed form, and so Mercurial will store the plain - <literal>zip</literal> or JPEG.</para> - - <para>Deltas between revisions of a compressed file are usually - larger than snapshots of the file, and Mercurial again does - <quote>the right thing</quote> in these cases. It finds that - such a delta exceeds the threshold at which it should store a - complete snapshot of the file, so it stores the snapshot, - again saving space compared to a naive delta-only - approach.</para> - - <sect3> - <title>Network recompression</title> - - <para>When storing revisions on disk, Mercurial uses the - <quote>deflate</quote> compression algorithm (the same one - used by the popular <literal>zip</literal> archive format), - which balances good speed with a respectable compression - ratio. 
However, when transmitting revision data over a - network connection, Mercurial uncompresses the compressed - revision data.</para> - - <para>If the connection is over HTTP, Mercurial recompresses - the entire stream of data using a compression algorithm that - gives a better compression ratio (the Burrows-Wheeler - algorithm from the widely used <literal>bzip2</literal> - compression package). This combination of algorithm and - compression of the entire stream (instead of a revision at a - time) substantially reduces the number of bytes to be - transferred, yielding better network performance over almost - all kinds of network.</para> - - <para>(If the connection is over <command>ssh</command>, - Mercurial <emphasis>doesn't</emphasis> recompress the - stream, because <command>ssh</command> can already do this - itself.)</para> - - </sect3> - </sect2> - <sect2> - <title>Read/write ordering and atomicity</title> - - <para>Appending to files isn't the whole story when it comes to - guaranteeing that a reader won't see a partial write. If you - recall figure <xref linkend="fig:concepts:metadata"/>, - revisions in the - changelog point to revisions in the manifest, and revisions in - the manifest point to revisions in filelogs. This hierarchy - is deliberate.</para> - - <para>A writer starts a transaction by writing filelog and - manifest data, and doesn't write any changelog data until - those are finished. 
A reader starts by reading changelog - data, then manifest data, followed by filelog data.</para> - - <para>Since the writer has always finished writing filelog and - manifest data before it writes to the changelog, a reader will - never read a pointer to a partially written manifest revision - from the changelog, and it will never read a pointer to a - partially written filelog revision from the manifest.</para> - - </sect2> - <sect2> - <title>Concurrent access</title> - - <para>The read/write ordering and atomicity guarantees mean that - Mercurial never needs to <emphasis>lock</emphasis> a - repository when it's reading data, even if the repository is - being written to while the read is occurring. This has a big - effect on scalability; you can have an arbitrary number of - Mercurial processes safely reading data from a repository - safely all at once, no matter whether it's being written to or - not.</para> - - <para>The lockless nature of reading means that if you're - sharing a repository on a multi-user system, you don't need to - grant other local users permission to - <emphasis>write</emphasis> to your repository in order for - them to be able to clone it or pull changes from it; they only - need <emphasis>read</emphasis> permission. (This is - <emphasis>not</emphasis> a common feature among revision - control systems, so don't take it for granted! Most require - readers to be able to lock a repository to access it safely, - and this requires write permission on at least one directory, - which of course makes for all kinds of nasty and annoying - security and administrative problems.)</para> - - <para>Mercurial uses locks to ensure that only one process can - write to a repository at a time (the locking mechanism is safe - even over filesystems that are notoriously hostile to locking, - such as NFS). 
If a repository is locked, a writer will wait - for a while to retry if the repository becomes unlocked, but - if the repository remains locked for too long, the process - attempting to write will time out after a while. This means - that your daily automated scripts won't get stuck forever and - pile up if a system crashes unnoticed, for example. (Yes, the - timeout is configurable, from zero to infinity.)</para> - - <sect3> - <title>Safe dirstate access</title> - - <para>As with revision data, Mercurial doesn't take a lock to - read the dirstate file; it does acquire a lock to write it. - To avoid the possibility of reading a partially written copy - of the dirstate file, Mercurial writes to a file with a - unique name in the same directory as the dirstate file, then - renames the temporary file atomically to - <filename>dirstate</filename>. The file named - <filename>dirstate</filename> is thus guaranteed to be - complete, not partially written.</para> - - </sect3> - </sect2> - <sect2> - <title>Avoiding seeks</title> - - <para>Critical to Mercurial's performance is the avoidance of - seeks of the disk head, since any seek is far more expensive - than even a comparatively large read operation.</para> - - <para>This is why, for example, the dirstate is stored in a - single file. If there were a dirstate file per directory that - Mercurial tracked, the disk would seek once per directory. - Instead, Mercurial reads the entire single dirstate file in - one step.</para> - - <para>Mercurial also uses a <quote>copy on write</quote> scheme - when cloning a repository on local storage. Instead of - copying every revlog file from the old repository into the new - repository, it makes a <quote>hard link</quote>, which is a - shorthand way to say <quote>these two names point to the same - file</quote>. When Mercurial is about to write to one of a - revlog's files, it checks to see if the number of names - pointing at the file is greater than one. 
If it is, more than - one repository is using the file, so Mercurial makes a new - copy of the file that is private to this repository.</para> - - <para>A few revision control developers have pointed out that - this idea of making a complete private copy of a file is not - very efficient in its use of storage. While this is true, - storage is cheap, and this method gives the highest - performance while deferring most book-keeping to the operating - system. An alternative scheme would most likely reduce - performance and increase the complexity of the software, each - of which is much more important to the <quote>feel</quote> of - day-to-day use.</para> - - </sect2> - <sect2> - <title>Other contents of the dirstate</title> - - <para>Because Mercurial doesn't force you to tell it when you're - modifying a file, it uses the dirstate to store some extra - information so it can determine efficiently whether you have - modified a file. For each file in the working directory, it - stores the time that it last modified the file itself, and the - size of the file at that time.</para> - - <para>When you explicitly <command role="hg-cmd">hg - add</command>, <command role="hg-cmd">hg remove</command>, - <command role="hg-cmd">hg rename</command> or <command - role="hg-cmd">hg copy</command> files, Mercurial updates the - dirstate so that it knows what to do with those files when you - commit.</para> - - <para>When Mercurial is checking the states of files in the - working directory, it first checks a file's modification time. - If that has not changed, the file must not have been modified. - If the file's size has changed, the file must have been - modified. If the modification time has changed, but the size - has not, only then does Mercurial need to read the actual - contents of the file to see if they've changed. 
Storing these - few extra pieces of information dramatically reduces the - amount of data that Mercurial needs to read, which yields - large performance improvements compared to other revision - control systems.</para> - - </sect2> - </sect1> -</chapter> - -<!-- -local variables: -sgml-parent-document: ("00book.xml" "book" "chapter") -end: --->
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch04-daily.xml Thu Mar 19 20:54:12 2009 -0700 @@ -0,0 +1,544 @@ +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> + +<chapter id="chap:daily"> + <?dbhtml filename="mercurial-in-daily-use.html"?> + <title>Mercurial in daily use</title> + + <sect1> + <title>Telling Mercurial which files to track</title> + + <para>Mercurial does not work with files in your repository unless + you tell it to manage them. The <command role="hg-cmd">hg + status</command> command will tell you which files Mercurial + doesn't know about; it uses a + <quote><literal>?</literal></quote> to display such + files.</para> + + <para>To tell Mercurial to track a file, use the <command + role="hg-cmd">hg add</command> command. Once you have added a + file, the entry in the output of <command role="hg-cmd">hg + status</command> for that file changes from + <quote><literal>?</literal></quote> to + <quote><literal>A</literal></quote>.</para> + + &interaction.daily.files.add; + + <para>After you run a <command role="hg-cmd">hg commit</command>, + the files that you added before the commit will no longer be + listed in the output of <command role="hg-cmd">hg + status</command>. The reason for this is that <command + role="hg-cmd">hg status</command> only tells you about + <quote>interesting</quote> files&emdash;those that you have + modified or told Mercurial to do something with&emdash;by + default. If you have a repository that contains thousands of + files, you will rarely want to know about files that Mercurial + is tracking, but that have not changed. (You can still get this + information; we'll return to this later.)</para> + + <para>Once you add a file, Mercurial doesn't do anything with it + immediately. Instead, it will take a snapshot of the file's + state the next time you perform a commit. 
It will then continue + to track the changes you make to the file every time you commit, + until you remove the file.</para> + + <sect2> + <title>Explicit versus implicit file naming</title> + + <para>A useful behaviour that Mercurial has is that if you pass + the name of a directory to a command, every Mercurial command + will treat this as <quote>I want to operate on every file in + this directory and its subdirectories</quote>.</para> + + &interaction.daily.files.add-dir; + + <para>Notice in this example that Mercurial printed the names of + the files it added, whereas it didn't do so when we added the + file named <filename>a</filename> in the earlier + example.</para> + + <para>What's going on is that in the former case, we explicitly + named the file to add on the command line, so the assumption + that Mercurial makes in such cases is that you know what you + were doing, and it doesn't print any output.</para> + + <para>However, when we <emphasis>imply</emphasis> the names of + files by giving the name of a directory, Mercurial takes the + extra step of printing the name of each file that it does + something with. This makes it more clear what is happening, + and reduces the likelihood of a silent and nasty surprise. + This behaviour is common to most Mercurial commands.</para> + + </sect2> + <sect2> + <title>Aside: Mercurial tracks files, not directories</title> + + <para>Mercurial does not track directory information. Instead, + it tracks the path to a file. Before creating a file, it + first creates any missing directory components of the path. + After it deletes a file, it then deletes any empty directories + that were in the deleted file's path. This sounds like a + trivial distinction, but it has one minor practical + consequence: it is not possible to represent a completely + empty directory in Mercurial.</para> + + <para>Empty directories are rarely useful, and there are + unintrusive workarounds that you can use to achieve an + appropriate effect. 
The developers of Mercurial thus felt + that the complexity that would be required to manage empty + directories was not worth the limited benefit this feature + would bring.</para> + + <para>If you need an empty directory in your repository, there + are a few ways to achieve this. One is to create a directory, + then <command role="hg-cmd">hg add</command> a + <quote>hidden</quote> file to that directory. On Unix-like + systems, any file name that begins with a period + (<quote><literal>.</literal></quote>) is treated as hidden by + most commands and GUI tools. This approach is illustrated + below.</para> + +&interaction.daily.files.hidden; + + <para>Another way to tackle a need for an empty directory is to + simply create one in your automated build scripts before they + will need it.</para> + + </sect2> + </sect1> + <sect1> + <title>How to stop tracking a file</title> + + <para>Once you decide that a file no longer belongs in your + repository, use the <command role="hg-cmd">hg remove</command> + command; this deletes the file, and tells Mercurial to stop + tracking it. A removed file is represented in the output of + <command role="hg-cmd">hg status</command> with a + <quote><literal>R</literal></quote>.</para> + + &interaction.daily.files.remove; + + <para>After you <command role="hg-cmd">hg remove</command> a file, + Mercurial will no longer track changes to that file, even if you + recreate a file with the same name in your working directory. + If you do recreate a file with the same name and want Mercurial + to track the new file, simply <command role="hg-cmd">hg + add</command> it. 
Mercurial will know that the newly added + file is not related to the old file of the same name.</para> + + <sect2> + <title>Removing a file does not affect its history</title> + + <para>It is important to understand that removing a file has + only two effects.</para> + <itemizedlist> + <listitem><para>It removes the current version of the file + from the working directory.</para> + </listitem> + <listitem><para>It stops Mercurial from tracking changes to + the file, from the time of the next commit.</para> + </listitem></itemizedlist> + <para>Removing a file <emphasis>does not</emphasis> in any way + alter the <emphasis>history</emphasis> of the file.</para> + + <para>If you update the working directory to a changeset in + which a file that you have removed was still tracked, it will + reappear in the working directory, with the contents it had + when you committed that changeset. If you then update the + working directory to a later changeset, in which the file had + been removed, Mercurial will once again remove the file from + the working directory.</para> + + </sect2> + <sect2> + <title>Missing files</title> + + <para>Mercurial considers a file that you have deleted, but not + used <command role="hg-cmd">hg remove</command> to delete, to + be <emphasis>missing</emphasis>. A missing file is + represented with <quote><literal>!</literal></quote> in the + output of <command role="hg-cmd">hg status</command>. 
+ Mercurial commands will not generally do anything with missing + files.</para> + + &interaction.daily.files.missing; + + <para>If your repository contains a file that <command + role="hg-cmd">hg status</command> reports as missing, and + you want the file to stay gone, you can run <command + role="hg-cmd">hg remove <option + role="hg-opt-remove">--after</option></command> at any + time later on, to tell Mercurial that you really did mean to + remove the file.</para> + + &interaction.daily.files.remove-after; + + <para>On the other hand, if you deleted the missing file by + accident, give <command role="hg-cmd">hg revert</command> the + name of the file to recover. It will reappear, in unmodified + form.</para> + +&interaction.daily.files.recover-missing; + + </sect2> + <sect2> + <title>Aside: why tell Mercurial explicitly to remove a + file?</title> + + <para>You might wonder why Mercurial requires you to explicitly + tell it that you are deleting a file. Early during the + development of Mercurial, it let you delete a file however you + pleased; Mercurial would notice the absence of the file + automatically when you next ran a <command role="hg-cmd">hg + commit</command>, and stop tracking the file. 
In practice, + this made it too easy to accidentally remove a file without + noticing.</para> + + </sect2> + <sect2> + <title>Useful shorthand&emdash;adding and removing files in one + step</title> + + <para>Mercurial offers a combination command, <command + role="hg-cmd">hg addremove</command>, that adds untracked + files and marks missing files as removed.</para> + + &interaction.daily.files.addremove; + + <para>The <command role="hg-cmd">hg commit</command> command + also provides a <option role="hg-opt-commit">-A</option> + option that performs this same add-and-remove, immediately + followed by a commit.</para> + + &interaction.daily.files.commit-addremove; + + </sect2> + </sect1> + <sect1> + <title>Copying files</title> + + <para>Mercurial provides a <command role="hg-cmd">hg + copy</command> command that lets you make a new copy of a + file. When you copy a file using this command, Mercurial makes + a record of the fact that the new file is a copy of the original + file. It treats these copied files specially when you merge + your work with someone else's.</para> + + <sect2> + <title>The results of copying during a merge</title> + + <para>What happens during a merge is that changes + <quote>follow</quote> a copy. To best illustrate what this + means, let's create an example. We'll start with the usual + tiny repository that contains a single file.</para> + + &interaction.daily.copy.init; + + <para>We need to do some work in + parallel, so that we'll have something to merge. 
So let's + clone our repository.</para> + + &interaction.daily.copy.clone; + + <para>Back in our initial repository, let's use the <command + role="hg-cmd">hg copy</command> command to make a copy of + the first file we created.</para> + + &interaction.daily.copy.copy; + + <para>If we look at the output of the <command role="hg-cmd">hg + status</command> command afterwards, the copied file looks + just like a normal added file.</para> + + &interaction.daily.copy.status; + + <para>But if we pass the <option + role="hg-opt-status">-C</option> option to <command + role="hg-cmd">hg status</command>, it prints another line of + output: this is the file that our newly-added file was copied + <emphasis>from</emphasis>.</para> + + &interaction.daily.copy.status-copy; + + <para>Now, back in the repository we cloned, let's make a change + in parallel. We'll add a line of content to the original file + that we created.</para> + + &interaction.daily.copy.other; + + <para>Now we have a modified <filename>file</filename> in this + repository. When we pull the changes from the first + repository, and merge the two heads, Mercurial will propagate + the changes that we made locally to <filename>file</filename> + into its copy, <filename>new-file</filename>.</para> + + &interaction.daily.copy.merge; + + </sect2> + <sect2 id="sec:daily:why-copy"> + <title>Why should changes follow copies?</title> + + <para>This behaviour, of changes to a file propagating out to + copies of the file, might seem esoteric, but in most cases + it's highly desirable.</para> + + <para>First of all, remember that this propagation + <emphasis>only</emphasis> happens when you merge. 
So if you + <command role="hg-cmd">hg copy</command> a file, and + subsequently modify the original file during the normal course + of your work, nothing will happen.</para> + + <para>The second thing to know is that modifications will only + propagate across a copy as long as the repository that you're + pulling changes from <emphasis>doesn't know</emphasis> about + the copy.</para> + + <para>The reason that Mercurial does this is as follows. Let's + say I make an important bug fix in a source file, and commit + my changes. Meanwhile, you've decided to <command + role="hg-cmd">hg copy</command> the file in your repository, + without knowing about the bug or having seen the fix, and you + have started hacking on your copy of the file.</para> + + <para>If you pulled and merged my changes, and Mercurial + <emphasis>didn't</emphasis> propagate changes across copies, + your source file would now contain the bug, and unless you + remembered to propagate the bug fix by hand, the bug would + <emphasis>remain</emphasis> in your copy of the file.</para> + + <para>By automatically propagating the change that fixed the bug + from the original file to the copy, Mercurial prevents this + class of problem. 
To my knowledge, Mercurial is the + <emphasis>only</emphasis> revision control system that + propagates changes across copies like this.</para> + + <para>Once your change history has a record that the copy and + subsequent merge occurred, there's usually no further need to + propagate changes from the original file to the copied file, + and that's why Mercurial only propagates changes across copies + until this point, and no further.</para> + + </sect2> + <sect2> + <title>How to make changes <emphasis>not</emphasis> follow a + copy</title> + + <para>If, for some reason, you decide that this business of + automatically propagating changes across copies is not for + you, simply use your system's normal file copy command (on + Unix-like systems, that's <command>cp</command>) to make a + copy of a file, then <command role="hg-cmd">hg add</command> + the new copy by hand. Before you do so, though, please do + reread section <xref linkend="sec:daily:why-copy"/>, and make + an informed + decision that this behaviour is not appropriate to your + specific case.</para> + + </sect2> + <sect2> + <title>Behaviour of the <command role="hg-cmd">hg copy</command> + command</title> + + <para>When you use the <command role="hg-cmd">hg copy</command> + command, Mercurial makes a copy of each source file as it + currently stands in the working directory. This means that if + you make some modifications to a file, then <command + role="hg-cmd">hg copy</command> it without first having + committed those changes, the new copy will also contain the + modifications you have made up until that point. (I find this + behaviour a little counterintuitive, which is why I mention it + here.)</para> + + <para>The <command role="hg-cmd">hg copy</command> command acts + similarly to the Unix <command>cp</command> command (you can + use the <command role="hg-cmd">hg cp</command> alias if you + prefer). 
The last argument is the + <emphasis>destination</emphasis>, and all prior arguments are + <emphasis>sources</emphasis>. If you pass it a single file as + the source, and the destination does not exist, it creates a + new file with that name.</para> + + &interaction.daily.copy.simple; + + <para>If the destination is a directory, Mercurial copies its + sources into that directory.</para> + + &interaction.daily.copy.dir-dest; + + <para>Copying a directory is + recursive, and preserves the directory structure of the + source.</para> + + &interaction.daily.copy.dir-src; + + <para>If the source and destination are both directories, the + source tree is recreated in the destination directory.</para> + + &interaction.daily.copy.dir-src-dest; + + <para>As with the <command role="hg-cmd">hg rename</command> + command, if you copy a file manually and then want Mercurial + to know that you've copied the file, simply use the <option + role="hg-opt-copy">--after</option> option to <command + role="hg-cmd">hg copy</command>.</para> + + &interaction.daily.copy.after; + + </sect2> + </sect1> + <sect1> + <title>Renaming files</title> + + <para>It's rather more common to need to rename a file than to + make a copy of it. The reason I discussed the <command + role="hg-cmd">hg copy</command> command before talking about + renaming files is that Mercurial treats a rename in essentially + the same way as a copy. 
Therefore, knowing what Mercurial does + when you copy a file tells you what to expect when you rename a + file.</para> + + <para>When you use the <command role="hg-cmd">hg rename</command> + command, Mercurial makes a copy of each source file, then + deletes it and marks the file as removed.</para> + + &interaction.daily.rename.rename; + + <para>The <command role="hg-cmd">hg status</command> command shows + the newly copied file as added, and the copied-from file as + removed.</para> + + &interaction.daily.rename.status; + + <para>As with the results of a <command role="hg-cmd">hg + copy</command>, we must use the <option + role="hg-opt-status">-C</option> option to <command + role="hg-cmd">hg status</command> to see that the added file + is really being tracked by Mercurial as a copy of the original, + now removed, file.</para> + + &interaction.daily.rename.status-copy; + + <para>As with <command role="hg-cmd">hg remove</command> and + <command role="hg-cmd">hg copy</command>, you can tell Mercurial + about a rename after the fact using the <option + role="hg-opt-rename">--after</option> option. In most other + respects, the behaviour of the <command role="hg-cmd">hg + rename</command> command, and the options it accepts, are + similar to the <command role="hg-cmd">hg copy</command> + command.</para> + + <sect2> + <title>Renaming files and merging changes</title> + + <para>Since Mercurial's rename is implemented as + copy-and-remove, the same propagation of changes happens when + you merge after a rename as after a copy.</para> + + <para>If I modify a file, and you rename it to a new name, and + then we merge our respective changes, my modifications to the + file under its original name will be propagated into the file + under its new name. 
(This is something you might expect to + <quote>simply work,</quote> but not all revision control + systems actually do this.)</para> + + <para>Whereas having changes follow a copy is a feature where + you can perhaps nod and say <quote>yes, that might be + useful,</quote> it should be clear that having them follow a + rename is definitely important. Without this facility, it + would simply be too easy for changes to become orphaned when + files are renamed.</para> + + </sect2> + <sect2> + <title>Divergent renames and merging</title> + + <para>The case of diverging names occurs when two developers + start with a file&emdash;let's call it + <filename>foo</filename>&emdash;in their respective + repositories.</para> + + &interaction.rename.divergent.clone; + + <para>Anne renames the file to <filename>bar</filename>.</para> + + &interaction.rename.divergent.rename.anne; + + <para>Meanwhile, Bob renames it to + <filename>quux</filename>.</para> + + &interaction.rename.divergent.rename.bob; + + <para>I like to think of this as a conflict because each + developer has expressed different intentions about what the + file ought to be named.</para> + + <para>What do you think should happen when they merge their + work? Mercurial's actual behaviour is that it always preserves + <emphasis>both</emphasis> names when it merges changesets that + contain divergent renames.</para> + + &interaction.rename.divergent.merge; + + <para>Notice that Mercurial does warn about the divergent + renames, but it leaves it up to you to do something about the + divergence after the merge.</para> + + </sect2> + <sect2> + <title>Convergent renames and merging</title> + + <para>Another kind of rename conflict occurs when two people + choose to rename different <emphasis>source</emphasis> files + to the same <emphasis>destination</emphasis>. 
In this case, + Mercurial runs its normal merge machinery, and lets you guide + it to a suitable resolution.</para> + + </sect2> + <sect2> + <title>Other name-related corner cases</title> + + <para>Mercurial has a longstanding bug in which it fails to + handle a merge where one side has a file with a given name, + while another has a directory with the same name. This is + documented as <ulink role="hg-bug" + url="http://www.selenic.com/mercurial/bts/issue29">issue + 29</ulink>.</para> + + &interaction.issue29.go; + + </sect2> + </sect1> + <sect1> + <title>Recovering from mistakes</title> + + <para>Mercurial has some useful commands that will help you to + recover from some common mistakes.</para> + + <para>The <command role="hg-cmd">hg revert</command> command lets + you undo changes that you have made to your working directory. + For example, if you <command role="hg-cmd">hg add</command> a + file by accident, just run <command role="hg-cmd">hg + revert</command> with the name of the file you added, and + while the file won't be touched in any way, it won't be tracked + for adding by Mercurial any longer, either. You can also use + <command role="hg-cmd">hg revert</command> to get rid of + erroneous changes to a file.</para> + + <para>It's useful to remember that the <command role="hg-cmd">hg + revert</command> command is useful for changes that you have + not yet committed. Once you've committed a change, if you + decide it was a mistake, you can still do something about it, + though your options may be more limited.</para> + + <para>For more information about the <command role="hg-cmd">hg + revert</command> command, and details about how to deal with + changes you have already committed, see chapter <xref + linkend="chap:undo"/>.</para> + + </sect1> +</chapter> + +<!-- +local variables: +sgml-parent-document: ("00book.xml" "book" "chapter") +end: +-->
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch05-collab.xml Thu Mar 19 20:54:12 2009 -0700 @@ -0,0 +1,1434 @@ +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> + +<chapter id="cha:collab"> + <?dbhtml filename="collaborating-with-other-people.html"?> + <title>Collaborating with other people</title> + + <para>As a completely decentralised tool, Mercurial doesn't impose + any policy on how people ought to work with each other. However, + if you're new to distributed revision control, it helps to have + some tools and examples in mind when you're thinking about + possible workflow models.</para> + + <sect1> + <title>Mercurial's web interface</title> + + <para>Mercurial has a powerful web interface that provides several + useful capabilities.</para> + + <para>For interactive use, the web interface lets you browse a + single repository or a collection of repositories. You can view + the history of a repository, examine each change (comments and + diffs), and view the contents of each directory and file.</para> + + <para>Also for human consumption, the web interface provides an + RSS feed of the changes in a repository. This lets you + <quote>subscribe</quote> to a repository using your favourite + feed reader, and be automatically notified of activity in that + repository as soon as it happens. I find this capability much + more convenient than the model of subscribing to a mailing list + to which notifications are sent, as it requires no additional + configuration on the part of whoever is serving the + repository.</para> + + <para>The web interface also lets remote users clone a repository, + pull changes from it, and (when the server is configured to + permit it) push changes back to it. 
Mercurial's HTTP tunneling + protocol aggressively compresses data, so that it works + efficiently even over low-bandwidth network connections.</para> + + <para>The easiest way to get started with the web interface is to + use your web browser to visit an existing repository, such as + the master Mercurial repository at <ulink + url="http://www.selenic.com/repo/hg?style=gitweb">http://www.selenic.com/repo/hg?style=gitweb</ulink>.</para> + + <para>If you're interested in providing a web interface to your + own repositories, Mercurial provides two ways to do this. The + first is using the <command role="hg-cmd">hg serve</command> + command, which is best suited to short-term + <quote>lightweight</quote> serving. See section <xref + linkend="sec:collab:serve"/> below for details of how to use + this command. If you have a long-lived repository that you'd + like to make permanently available, Mercurial has built-in + support for the CGI (Common Gateway Interface) standard, which + all common web servers support. See section <xref + linkend="sec:collab:cgi"/> for details of CGI + configuration.</para> + + </sect1> + <sect1> + <title>Collaboration models</title> + + <para>With a suitably flexible tool, making decisions about + workflow is much more of a social engineering challenge than a + technical one. Mercurial imposes few limitations on how you can + structure the flow of work in a project, so it's up to you and + your group to set up and live with a model that matches your own + particular needs.</para> + + <sect2> + <title>Factors to keep in mind</title> + + <para>The most important aspect of any model that you must keep + in mind is how well it matches the needs and capabilities of + the people who will be using it. 
This might seem + self-evident; even so, you still can't afford to forget it for + a moment.</para> + + <para>I once put together a workflow model that seemed to make + perfect sense to me, but that caused a considerable amount of + consternation and strife within my development team. In spite + of my attempts to explain why we needed a complex set of + branches, and how changes ought to flow between them, a few + team members revolted. Even though they were smart people, + they didn't want to pay attention to the constraints we were + operating under, or face the consequences of those constraints + in the details of the model that I was advocating.</para> + + <para>Don't sweep foreseeable social or technical problems under + the rug. Whatever scheme you put into effect, you should plan + for mistakes and problem scenarios. Consider adding automated + machinery to prevent, or quickly recover from, trouble that + you can anticipate. As an example, if you intend to have a + branch with not-for-release changes in it, you'd do well to + think early about the possibility that someone might + accidentally merge those changes into a release branch. You + could avoid this particular problem by writing a hook that + prevents changes from being merged from an inappropriate + branch.</para> + + </sect2> + <sect2> + <title>Informal anarchy</title> + + <para>I wouldn't suggest an <quote>anything goes</quote> + approach as something sustainable, but it's a model that's + easy to grasp, and it works perfectly well in a few unusual + situations.</para> + + <para>As one example, many projects have a loose-knit group of + collaborators who rarely physically meet each other. Some + groups like to overcome the isolation of working at a distance + by organising occasional <quote>sprints</quote>. 
In a sprint, + a number of people get together in a single location (a + company's conference room, a hotel meeting room, that kind of + place) and spend several days more or less locked in there, + hacking intensely on a handful of projects.</para> + + <para>A sprint is the perfect place to use the <command + role="hg-cmd">hg serve</command> command, since <command + role="hg-cmd">hg serve</command> does not require any fancy + server infrastructure. You can get started with <command + role="hg-cmd">hg serve</command> in moments, by reading + section <xref linkend="sec:collab:serve"/> below. Then simply + tell + the person next to you that you're running a server, send the + URL to them in an instant message, and you immediately have a + quick-turnaround way to work together. They can type your URL + into their web browser and quickly review your changes; or + they can pull a bugfix from you and verify it; or they can + clone a branch containing a new feature and try it out.</para> + + <para>The charm, and the problem, with doing things in an ad hoc + fashion like this is that only people who know about your + changes, and where they are, can see them. Such an informal + approach simply doesn't scale beyond a handful of people, because + each individual needs to know about n different repositories + to pull from.</para> + + </sect2> + <sect2> + <title>A single central repository</title> + + <para>For smaller projects migrating from a centralised revision + control tool, perhaps the easiest way to get started is to + have changes flow through a single shared central repository. + This is also the most common <quote>building block</quote> for + more ambitious workflow schemes.</para> + + <para>Contributors start by cloning a copy of this repository. 
+ They can pull changes from it whenever they need to, and some + (perhaps all) developers have permission to push a change back + when they're ready for other people to see it.</para> + + <para>Under this model, it can still often make sense for people + to pull changes directly from each other, without going + through the central repository. Consider a case in which I + have a tentative bug fix, but I am worried that if I were to + publish it to the central repository, it might subsequently + break everyone else's trees as they pull it. To reduce the + potential for damage, I can ask you to clone my repository + into a temporary repository of your own and test it. This + lets us put off publishing the potentially unsafe change until + it has had a little testing.</para> + + <para>In this kind of scenario, people usually use the + <command>ssh</command> protocol to securely push changes to + the central repository, as documented in section <xref + linkend="sec:collab:ssh"/>. It's also + usual to publish a read-only copy of the repository over HTTP + using CGI, as in section <xref linkend="sec:collab:cgi"/>. + Publishing over HTTP + satisfies the needs of people who don't have push access, and + those who want to use web browsers to browse the repository's + history.</para> + + </sect2> + <sect2> + <title>Working with multiple branches</title> + + <para>Projects of any significant size naturally tend to make + progress on several fronts simultaneously. In the case of + software, it's common for a project to go through periodic + official releases. A release might then go into + <quote>maintenance mode</quote> for a while after its first + publication; maintenance releases tend to contain only bug + fixes, not new features. In parallel with these maintenance + releases, one or more future releases may be under + development. 
People normally use the word + <quote>branch</quote> to refer to one of these many slightly + different directions in which development is + proceeding.</para> + + <para>Mercurial is particularly well suited to managing a number + of simultaneous, but not identical, branches. Each + <quote>development direction</quote> can live in its own + central repository, and you can merge changes from one to + another as the need arises. Because repositories are + independent of each other, unstable changes in a development + branch will never affect a stable branch unless someone + explicitly merges those changes in.</para> + + <para>Here's an example of how this can work in practice. Let's + say you have one <quote>main branch</quote> on a central + server.</para> + + &interaction.branching.init; + + <para>People clone it, make changes locally, test them, and push + them back.</para> + + <para>Once the main branch reaches a release milestone, you can + use the <command role="hg-cmd">hg tag</command> command to + give a permanent name to the milestone revision.</para> + + &interaction.branching.tag; + + <para>Let's say some ongoing + development occurs on the main branch.</para> + + &interaction.branching.main; + + <para>Using the tag that was recorded at the milestone, people + who clone that repository at any time in the future can use + <command role="hg-cmd">hg update</command> to get a copy of + the working directory exactly as it was when that tagged + revision was committed.</para> + + &interaction.branching.update; + + <para>In addition, immediately after the main branch is tagged, + someone can then clone the main branch on the server to a new + <quote>stable</quote> branch, also on the server.</para> + + &interaction.branching.clone; + + <para>Someone who needs to make a change to the stable branch + can then clone <emphasis>that</emphasis> repository, make + their changes, commit, and push their changes back there.</para> + + &interaction.branching.stable; + + 
<para>Because Mercurial repositories are independent, and + Mercurial doesn't move changes around automatically, the + stable and main branches are <emphasis>isolated</emphasis> + from each other. The changes that you made on the main branch + don't <quote>leak</quote> to the stable branch, and vice + versa.</para> + + <para>You'll often want all of your bugfixes on the stable + branch to show up on the main branch, too. Rather than + rewrite a bugfix on the main branch, you can simply pull and + merge changes from the stable to the main branch, and + Mercurial will bring those bugfixes in for you.</para> + + &interaction.branching.merge; + + <para>The main branch will still contain changes that are not on + the stable branch, but it will also contain all of the + bugfixes from the stable branch. The stable branch remains + unaffected by these changes.</para> + + </sect2> + <sect2> + <title>Feature branches</title> + + <para>For larger projects, an effective way to manage change is + to break up a team into smaller groups. Each group has a + shared branch of its own, cloned from a single + <quote>master</quote> branch used by the entire project. 
+ People working on an individual branch are typically quite + isolated from developments on other branches.</para> + + <informalfigure id="fig:collab:feature-branches"> + <mediaobject><imageobject><imagedata + fileref="feature-branches"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject><caption><para>Feature + branches</para></caption></mediaobject> + </informalfigure> + + <para>When a particular feature is deemed to be in suitable + shape, someone on that feature team pulls and merges from the + master branch into the feature branch, then pushes back up to + the master branch.</para> + + </sect2> + <sect2> + <title>The release train</title> + + <para>Some projects are organised on a <quote>train</quote> + basis: a release is scheduled to happen every few months, and + whatever features are ready when the <quote>train</quote> is + ready to leave are allowed in.</para> + + <para>This model resembles working with feature branches. The + difference is that when a feature branch misses a train, + someone on the feature team pulls and merges the changes that + went out on that train release into the feature branch, and + the team continues its work on top of that release so that + their feature can make the next release.</para> + + </sect2> + <sect2> + <title>The Linux kernel model</title> + + <para>The development of the Linux kernel has a shallow + hierarchical structure, surrounded by a cloud of apparent + chaos. Because most Linux developers use + <command>git</command>, a distributed revision control tool + with capabilities similar to Mercurial, it's useful to + describe the way work flows in that environment; if you like + the ideas, the approach translates well across tools.</para> + + <para>At the center of the community sits Linus Torvalds, the + creator of Linux. He publishes a single source repository + that is considered the <quote>authoritative</quote> current + tree by the entire developer community. 
Anyone can clone + Linus's tree, but he is very choosy about whose trees he pulls + from.</para> + + <para>Linus has a number of <quote>trusted lieutenants</quote>. + As a general rule, he pulls whatever changes they publish, in + most cases without even reviewing those changes. Some of + those lieutenants are generally agreed to be + <quote>maintainers</quote>, responsible for specific + subsystems within the kernel. If a random kernel hacker wants + to make a change to a subsystem that they want to end up in + Linus's tree, they must find out who the subsystem's + maintainer is, and ask that maintainer to take their change. + If the maintainer reviews their changes and agrees to take + them, they'll pass them along to Linus in due course.</para> + + <para>Individual lieutenants have their own approaches to + reviewing, accepting, and publishing changes; and for deciding + when to feed them to Linus. In addition, there are several + well known branches that people use for different purposes. + For example, a few people maintain <quote>stable</quote> + repositories of older versions of the kernel, to which they + apply critical fixes as needed. Some maintainers publish + multiple trees: one for experimental changes; one for changes + that they are about to feed upstream; and so on. Others just + publish a single tree.</para> + + <para>This model has two notable features. The first is that + it's <quote>pull only</quote>. You have to ask, convince, or + beg another developer to take a change from you, because there + are almost no trees to which more than one person can push, + and there's no way to push changes into a tree that someone + else controls.</para> + + <para>The second is that it's based on reputation and acclaim. + If you're an unknown, Linus will probably ignore changes from + you without even responding. But a subsystem maintainer will + probably review them, and will likely take them if they pass + their criteria for suitability. 
The more <quote>good</quote> + changes you contribute to a maintainer, the more likely they + are to trust your judgment and accept your changes. If you're + well-known and maintain a long-lived branch for something + Linus hasn't yet accepted, people with similar interests may + pull your changes regularly to keep up with your work.</para> + + <para>Reputation and acclaim don't necessarily cross subsystem + or <quote>people</quote> boundaries. If you're a respected + but specialised storage hacker, and you try to fix a + networking bug, that change will receive a level of scrutiny + from a network maintainer comparable to a change from a + complete stranger.</para> + + <para>To people who come from more orderly project backgrounds, + the comparatively chaotic Linux kernel development process + often seems completely insane. It's subject to the whims of + individuals; people make sweeping changes whenever they deem + it appropriate; and the pace of development is astounding. + And yet Linux is a highly successful, well-regarded piece of + software.</para> + + </sect2> + <sect2> + <title>Pull-only versus shared-push collaboration</title> + + <para>A perpetual source of heat in the open source community is + whether a development model in which people only ever pull + changes from others is <quote>better than</quote> one in which + multiple people can push changes to a shared + repository.</para> + + <para>Typically, the backers of the shared-push model use tools + that actively enforce this approach. If you're using a + centralised revision control tool such as Subversion, there's + no way to make a choice over which model you'll use: the tool + gives you shared-push, and if you want to do anything else, + you'll have to roll your own approach on top (such as applying + a patch by hand).</para> + + <para>A good distributed revision control tool, such as + Mercurial, will support both models. 
You and your + collaborators can then structure how you work together based + on your own needs and preferences, not on what contortions + your tools force you into.</para> + + </sect2> + <sect2> + <title>Where collaboration meets branch management</title> + + <para>Once you and your team set up some shared repositories and + start propagating changes back and forth between local and + shared repos, you begin to face a related, but slightly + different challenge: that of managing the multiple directions + in which your team may be moving at once. Even though this + subject is intimately related to how your team collaborates, + it's dense enough to merit treatment of its own, in chapter + <xref linkend="chap:branch"/>.</para> + + </sect2> + </sect1> + <sect1> + <title>The technical side of sharing</title> + + <para>The remainder of this chapter is devoted to the question of + serving data to your collaborators.</para> + + </sect1> + <sect1 id="sec:collab:serve"> + <title>Informal sharing with <command role="hg-cmd">hg + serve</command></title> + + <para>Mercurial's <command role="hg-cmd">hg serve</command> + command is wonderfully suited to small, tight-knit, and + fast-paced group environments. It also provides a great way to + get a feel for using Mercurial commands over a network.</para> + + <para>Run <command role="hg-cmd">hg serve</command> inside a + repository, and in under a second it will bring up a specialised + HTTP server; this will accept connections from any client, and + serve up data for that repository until you terminate it. + Anyone who knows the URL of the server you just started, and can + talk to your computer over the network, can then use a web + browser or Mercurial to read data from that repository. 
A URL + for a <command role="hg-cmd">hg serve</command> instance running + on a laptop is likely to look something like + <literal>http://my-laptop.local:8000/</literal>.</para> + + <para>The <command role="hg-cmd">hg serve</command> command is + <emphasis>not</emphasis> a general-purpose web server. It can do + only two things:</para> + <itemizedlist> + <listitem><para>Allow people to browse the history of the + repository it's serving, from their normal web + browsers.</para> + </listitem> + <listitem><para>Speak Mercurial's wire protocol, so that people + can <command role="hg-cmd">hg clone</command> or <command + role="hg-cmd">hg pull</command> changes from that + repository.</para> + </listitem></itemizedlist> + <para>In particular, <command role="hg-cmd">hg serve</command> + won't allow remote users to <emphasis>modify</emphasis> your + repository. It's intended for read-only use.</para> + + <para>If you're getting started with Mercurial, there's nothing to + prevent you from using <command role="hg-cmd">hg serve</command> + to serve up a repository on your own computer, then using commands + like <command role="hg-cmd">hg clone</command>, <command + role="hg-cmd">hg incoming</command>, and so on to talk to that + server as if the repository were hosted remotely. This can help + you to quickly get acquainted with using commands on + network-hosted repositories.</para> + + <sect2> + <title>A few things to keep in mind</title> + + <para>Because it provides unauthenticated read access to all + clients, you should only use <command role="hg-cmd">hg + serve</command> in an environment where you either don't + care, or have complete control over, who can access your + network and pull data from your repository.</para> + + <para>The <command role="hg-cmd">hg serve</command> command + knows nothing about any firewall software you might have + installed on your system or network. It cannot detect or + control your firewall software. 
If other people are unable to + talk to a running <command role="hg-cmd">hg serve</command> + instance, the second thing you should do + (<emphasis>after</emphasis> you make sure that they're using + the correct URL) is check your firewall configuration.</para> + + <para>By default, <command role="hg-cmd">hg serve</command> + listens for incoming connections on port 8000. If another + process is already listening on the port you want to use, you + can specify a different port to listen on using the <option + role="hg-opt-serve">-p</option> option.</para> + + <para>Normally, when <command role="hg-cmd">hg serve</command> + starts, it prints no output, which can be a bit unnerving. If + you'd like to confirm that it is indeed running correctly, and + find out what URL you should send to your collaborators, start + it with the <option role="hg-opt-global">-v</option> + option.</para> + + </sect2> + </sect1> + <sect1 id="sec:collab:ssh"> + <title>Using the Secure Shell (ssh) protocol</title> + + <para>You can pull and push changes securely over a network + connection using the Secure Shell (<literal>ssh</literal>) + protocol. To use this successfully, you may have to do a little + bit of configuration on the client or server sides.</para> + + <para>If you're not familiar with ssh, it's a network protocol + that lets you securely communicate with another computer. 
To + use it with Mercurial, you'll be setting up one or more user + accounts on a server so that remote users can log in and execute + commands.</para> + + <para>(If you <emphasis>are</emphasis> familiar with ssh, you'll + probably find some of the material that follows to be elementary + in nature.)</para> + + <sect2> + <title>How to read and write ssh URLs</title> + + <para>An ssh URL tends to look like this:</para> + <programlisting>ssh://bos@hg.serpentine.com:22/hg/hgbook</programlisting> + <orderedlist> + <listitem><para>The <quote><literal>ssh://</literal></quote> + part tells Mercurial to use the ssh protocol.</para> + </listitem> + <listitem><para>The <quote><literal>bos@</literal></quote> + component indicates what username to log into the server + as. You can leave this out if the remote username is the + same as your local username.</para> + </listitem> + <listitem><para>The + <quote><literal>hg.serpentine.com</literal></quote> gives + the hostname of the server to log into.</para> + </listitem> + <listitem><para>The <quote>:22</quote> identifies the port + number to connect to the server on. The default port is + 22, so you only need to specify a colon and port number if + you're <emphasis>not</emphasis> using port 22.</para> + </listitem> + <listitem><para>The remainder of the URL is the local path to + the repository on the server.</para> + </listitem></orderedlist> + + <para>There's plenty of scope for confusion with the path + component of ssh URLs, as there is no standard way for tools + to interpret it. Some programs behave differently than others + when dealing with these paths. This isn't an ideal situation, + but it's unlikely to change. Please read the following + paragraphs carefully.</para> + + <para>Mercurial treats the path to a repository on the server as + relative to the remote user's home directory. 
For example, if + user <literal>foo</literal> on the server has a home directory + of <filename class="directory">/home/foo</filename>, then an + ssh URL that contains a path component of <filename + class="directory">bar</filename> <emphasis>really</emphasis> + refers to the directory <filename + class="directory">/home/foo/bar</filename>.</para> + + <para>If you want to specify a path relative to another user's + home directory, you can use a path that starts with a tilde + character followed by the user's name (let's call them + <literal>otheruser</literal>), like this.</para> + <programlisting>ssh://server/~otheruser/hg/repo</programlisting> + + <para>And if you really want to specify an + <emphasis>absolute</emphasis> path on the server, begin the + path component with two slashes, as in this example.</para> + <programlisting>ssh://server//absolute/path</programlisting> + + </sect2> + <sect2> + <title>Finding an ssh client for your system</title> + + <para>Almost every Unix-like system comes with OpenSSH + preinstalled. If you're using such a system, run + <literal>which ssh</literal> to find out if the + <command>ssh</command> command is installed (it's usually in + <filename class="directory">/usr/bin</filename>). In the + unlikely event that it isn't present, take a look at your + system documentation to figure out how to install it.</para> + + <para>On Windows, you'll first need to download a suitable ssh + client. There are two alternatives.</para> + <itemizedlist> + <listitem><para>Simon Tatham's excellent PuTTY package + <citation>web:putty</citation> provides a complete suite + of ssh client commands.</para> + </listitem> + <listitem><para>If you have a high tolerance for pain, you can + use the Cygwin port of OpenSSH.</para> + </listitem></itemizedlist> + <para>In either case, you'll need to edit your <filename + role="special">hg.ini</filename> file to + tell Mercurial where to find the actual client command. 
For + example, if you're using PuTTY, you'll need to use the + <command>plink</command> command as a command-line ssh + client.</para> + <programlisting>[ui] +ssh = C:/path/to/plink.exe -ssh -i "C:/path/to/my/private/key"</programlisting> + + <note> + <para> The path to <command>plink</command> shouldn't contain + any whitespace characters, or Mercurial may not be able to + run it correctly (so putting it in <filename + class="directory">C:\Program Files</filename> is probably + not a good idea).</para> + </note> + + </sect2> + <sect2> + <title>Generating a key pair</title> + + <para>To avoid the need to repetitively type a password every + time you need to use your ssh client, I recommend generating a + key pair. On a Unix-like system, the + <command>ssh-keygen</command> command will do the trick. On + Windows, if you're using PuTTY, the + <command>puttygen</command> command is what you'll + need.</para> + + <para>When you generate a key pair, it's usually + <emphasis>highly</emphasis> advisable to protect it with a + passphrase. (The only time that you might not want to do this + is when you're using the ssh protocol for automated tasks on a + secure network.)</para> + + <para>Simply generating a key pair isn't enough, however. + You'll need to add the public key to the set of authorised + keys for whatever user you're logging in remotely as. For + servers using OpenSSH (the vast majority), this will mean + adding the public key to a list in a file called <filename + role="special">authorized_keys</filename> in their <filename + role="special" class="directory">.ssh</filename> + directory.</para> + + <para>On a Unix-like system, your public key will have a + <filename>.pub</filename> extension. 
If you're using + <command>puttygen</command> on Windows, you can save the + public key to a file of your choosing, or paste it from the + window it's displayed in straight into the <filename + role="special">authorized_keys</filename> file.</para> + + </sect2> + <sect2> + <title>Using an authentication agent</title> + + <para>An authentication agent is a daemon that stores + passphrases in memory (so it will forget passphrases if you + log out and log back in again). An ssh client will notice if + it's running, and query it for a passphrase. If there's no + authentication agent running, or the agent doesn't store the + necessary passphrase, you'll have to type your passphrase + every time Mercurial tries to communicate with a server on + your behalf (e.g. whenever you pull or push changes).</para> + + <para>The downside of storing passphrases in an agent is that + it's possible for a well-prepared attacker to recover the + plain text of your passphrases, in some cases even if your + system has been power-cycled. You should make your own + judgment as to whether this is an acceptable risk. It + certainly saves a lot of repeated typing.</para> + + <para>On Unix-like systems, the agent is called + <command>ssh-agent</command>, and it's often run automatically + for you when you log in. You'll need to use the + <command>ssh-add</command> command to add passphrases to the + agent's store. On Windows, if you're using PuTTY, the + <command>pageant</command> command acts as the agent. It adds + an icon to your system tray that will let you manage stored + passphrases.</para> + + </sect2> + <sect2> + <title>Configuring the server side properly</title> + + <para>Because ssh can be fiddly to set up if you're new to it, + there's a variety of things that can go wrong. Add Mercurial + on top, and there's plenty more scope for head-scratching. + Most of these potential problems occur on the server side, not + the client side. 
The good news is that once you've gotten a + configuration working, it will usually continue to work + indefinitely.</para> + + <para>Before you try using Mercurial to talk to an ssh server, + it's best to make sure that you can use the normal + <command>ssh</command> or <command>putty</command> command to + talk to the server first. If you run into problems with using + these commands directly, Mercurial surely won't work. Worse, + it will obscure the underlying problem. Any time you want to + debug ssh-related Mercurial problems, you should drop back to + making sure that plain ssh client commands work first, + <emphasis>before</emphasis> you worry about whether there's a + problem with Mercurial.</para> + + <para>The first thing to be sure of on the server side is that + you can actually log in from another machine at all. If you + can't use <command>ssh</command> or <command>putty</command> + to log in, the error message you get may give you a few hints + as to what's wrong. The most common problems are as + follows.</para> + <itemizedlist> + <listitem><para>If you get a <quote>connection refused</quote> + error, either there isn't an SSH daemon running on the + server at all, or it's inaccessible due to firewall + configuration.</para> + </listitem> + <listitem><para>If you get a <quote>no route to host</quote> + error, you either have an incorrect address for the server + or a seriously locked down firewall that won't admit its + existence at all.</para> + </listitem> + <listitem><para>If you get a <quote>permission denied</quote> + error, you may have mistyped the username on the server, + or you could have mistyped your key's passphrase or the + remote user's password.</para> + </listitem></itemizedlist> + <para>In summary, if you're having trouble talking to the + server's ssh daemon, first make sure that one is running at + all. On many systems it will be installed, but disabled, by + default. 
Once you're done with this step, you should then + check that the server's firewall is configured to allow + incoming connections on the port the ssh daemon is listening + on (usually 22). Don't worry about more exotic possibilities + for misconfiguration until you've checked these two + first.</para> + + <para>If you're using an authentication agent on the client side + to store passphrases for your keys, you ought to be able to + log into the server without being prompted for a passphrase or + a password. If you're prompted for a passphrase, there are a + few possible culprits.</para> + <itemizedlist> + <listitem><para>You might have forgotten to use + <command>ssh-add</command> or <command>pageant</command> + to store the passphrase.</para> + </listitem> + <listitem><para>You might have stored the passphrase for the + wrong key.</para> + </listitem></itemizedlist> + <para>If you're being prompted for the remote user's password, + there are a few other possible problems to check.</para> + <itemizedlist> + <listitem><para>Either the user's home directory or their + <filename role="special" class="directory">.ssh</filename> + directory might have excessively liberal permissions. As + a result, the ssh daemon will not trust or read their + <filename role="special">authorized_keys</filename> file. + For example, a group-writable home or <filename + role="special" class="directory">.ssh</filename> + directory will often cause this symptom.</para> + </listitem> + <listitem><para>The user's <filename + role="special">authorized_keys</filename> file may have + a problem. 
If anyone other than the user owns or can write + to that file, the ssh daemon will not trust or read + it.</para> + </listitem></itemizedlist> + + <para>In an ideal world, you should be able to run the + following command successfully, and it should print exactly + one line of output, the current date and time.</para> + <programlisting>ssh myserver date</programlisting> + + <para>If, on your server, you have login scripts that print + banners or other junk even when running non-interactive + commands like this, you should fix them before you continue, + so that they only print output if they're run interactively. + Otherwise these banners will at least clutter up Mercurial's + output. Worse, they could potentially cause problems with + running Mercurial commands remotely. Mercurial tries to + detect and ignore banners in non-interactive + <command>ssh</command> sessions, but it is not foolproof. (If + you're editing your login scripts on your server, the usual + way to see if a login script is running in an interactive + shell is to check the return code from the command + <literal>tty -s</literal>.)</para> + + <para>Once you've verified that plain old ssh is working with + your server, the next step is to ensure that Mercurial runs on + the server. The following command should run + successfully:</para> + + <programlisting>ssh myserver hg version</programlisting> + + <para>If you see an error message instead of normal <command + role="hg-cmd">hg version</command> output, this is usually + because you haven't installed Mercurial to <filename + class="directory">/usr/bin</filename>. Don't worry if this + is the case; you don't need to do that. But you should check + for a few possible problems.</para> + <itemizedlist> + <listitem><para>Is Mercurial really installed on the server at + all? 
I know this sounds trivial, but it's worth + checking!</para> + </listitem> + <listitem><para>Maybe your shell's search path (usually set + via the <envar>PATH</envar> environment variable) is + simply misconfigured.</para> + </listitem> + <listitem><para>Perhaps your <envar>PATH</envar> environment + variable is only being set to point to the location of the + <command>hg</command> executable if the login session is + interactive. This can happen if you're setting the path + in the wrong shell login script. See your shell's + documentation for details.</para> + </listitem> + <listitem><para>The <envar>PYTHONPATH</envar> environment + variable may need to contain the path to the Mercurial + Python modules. It might not be set at all; it could be + incorrect; or it may be set only if the login is + interactive.</para> + </listitem></itemizedlist> + + <para>If you can run <command role="hg-cmd">hg version</command> + over an ssh connection, well done! You've got the server and + client sorted out. You should now be able to use Mercurial to + access repositories hosted by that username on that server. + If you run into problems with Mercurial and ssh at this point, + try using the <option role="hg-opt-global">--debug</option> + option to get a clearer picture of what's going on.</para> + + </sect2> + <sect2> + <title>Using compression with ssh</title> + + <para>Mercurial does not compress data when it uses the ssh + protocol, because the ssh protocol can transparently compress + data. However, the default behaviour of ssh clients is + <emphasis>not</emphasis> to request compression.</para> + + <para>Over any network other than a fast LAN (even a wireless + network), using compression is likely to significantly speed + up Mercurial's network operations. 
For example, over a WAN, + someone measured compression as reducing the amount of time + required to clone a particularly large repository from 51 + minutes to 17 minutes.</para> + + <para>Both <command>ssh</command> and <command>plink</command> + accept a <option role="cmd-opt-ssh">-C</option> option which + turns on compression. You can easily edit your <filename + role="special">~/.hgrc</filename> to enable compression for + all of Mercurial's uses of the ssh protocol.</para> + <programlisting>[ui] +ssh = ssh -C</programlisting> + + <para>If you use <command>ssh</command>, you can configure it to + always use compression when talking to your server. To do + this, edit your <filename + role="special">.ssh/config</filename> file (which may not + yet exist), as follows.</para> + <programlisting>Host hg + Compression yes + HostName hg.example.com</programlisting> + <para>This defines an alias, <literal>hg</literal>. When you + use it on the <command>ssh</command> command line or in a + Mercurial <literal>ssh</literal>-protocol URL, it will cause + <command>ssh</command> to connect to + <literal>hg.example.com</literal> and use compression. This + gives you both a shorter name to type and compression, each of + which is a good thing in its own right.</para> + + </sect2> + </sect1> + <sect1 id="sec:collab:cgi"> + <title>Serving over HTTP using CGI</title> + + <para>Depending on how ambitious you are, configuring Mercurial's + CGI interface can take anything from a few moments to several + hours.</para> + + <para>We'll begin with the simplest of examples, and work our way + towards a more complex configuration. Even for the most basic + case, you're almost certainly going to need to read and modify + your web server's configuration.</para> + + <note> + <para> Configuring a web server is a complex, fiddly, and + highly system-dependent activity. I can't possibly give you + instructions that will cover anything like all of the cases + you will encounter. 
Please use your discretion and judgment in + following the sections below. Be prepared to make plenty of + mistakes, and to spend a lot of time reading your server's + error logs.</para> + </note> + + <sect2> + <title>Web server configuration checklist</title> + + <para>Before you continue, do take a few moments to check a few + aspects of your system's setup.</para> + + <orderedlist> + <listitem><para>Do you have a web server installed at all? + Mac OS X ships with Apache, but many other systems may not + have a web server installed.</para> + </listitem> + <listitem><para>If you have a web server installed, is it + actually running? On most systems, even if one is + present, it will be disabled by default.</para> + </listitem> + <listitem><para>Is your server configured to allow you to run + CGI programs in the directory where you plan to do so? + Most servers default to explicitly disabling the ability + to run CGI programs.</para> + </listitem></orderedlist> + + <para>If you don't have a web server installed, and don't have + substantial experience configuring Apache, you should consider + using the <literal>lighttpd</literal> web server instead of + Apache. Apache has a well-deserved reputation for baroque and + confusing configuration. While <literal>lighttpd</literal> is + less capable in some ways than Apache, most of these + capabilities are not relevant to serving Mercurial + repositories. And <literal>lighttpd</literal> is undeniably + <emphasis>much</emphasis> easier to get started with than + Apache.</para> + + </sect2> + <sect2> + <title>Basic CGI configuration</title> + + <para>On Unix-like systems, it's common for users to have a + subdirectory named something like <filename + class="directory">public_html</filename> in their home + directory, from which they can serve up web pages. 
A file + named <filename>foo</filename> in this directory will be + accessible at a URL of the form + <literal>http://www.example.com/username/foo</literal>.</para> + + <para>To get started, find the <filename + role="special">hgweb.cgi</filename> script that should be + present in your Mercurial installation. If you can't quickly + find a local copy on your system, simply download one from the + master Mercurial repository at <ulink + url="http://www.selenic.com/repo/hg/raw-file/tip/hgweb.cgi">http://www.selenic.com/repo/hg/raw-file/tip/hgweb.cgi</ulink>.</para> + + <para>You'll need to copy this script into your <filename + class="directory">public_html</filename> directory, and + ensure that it's executable.</para> + <programlisting>cp .../hgweb.cgi ~/public_html +chmod 755 ~/public_html/hgweb.cgi</programlisting> + <para>The <literal>755</literal> argument to + <command>chmod</command> is a little more general than just + making the script executable: it ensures that the script is + executable by anyone, and that <quote>group</quote> and + <quote>other</quote> write permissions are + <emphasis>not</emphasis> set. If you were to leave those + write permissions enabled, Apache's <literal>suexec</literal> + subsystem would likely refuse to execute the script. In fact, + <literal>suexec</literal> also insists that the + <emphasis>directory</emphasis> in which the script resides + must not be writable by others.</para> + <programlisting>chmod 755 ~/public_html</programlisting> + + <sect3 id="sec:collab:wtf"> + <title>What could <emphasis>possibly</emphasis> go + wrong?</title> + + <para>Once you've copied the CGI script into place, go into a + web browser, and try to open the URL <ulink + url="http://myhostname/ + myuser/hgweb.cgi">http://myhostname/ + myuser/hgweb.cgi</ulink>, <emphasis>but</emphasis> brace + yourself for instant failure. There's a high probability + that trying to visit this URL will fail, and there are many + possible reasons for this. 
In fact, you're likely to + stumble over almost every one of the possible errors below, + so please read carefully. The following are all of the + problems I ran into on a system running Fedora 7, with a + fresh installation of Apache, and a user account that I + created specially to perform this exercise.</para> + + <para>Your web server may have per-user directories disabled. + If you're using Apache, search your config file for a + <literal>UserDir</literal> directive. If there's none + present, per-user directories will be disabled. If one + exists, but its value is <literal>disabled</literal>, then + per-user directories will be disabled. Otherwise, the + string after <literal>UserDir</literal> gives the name of + the subdirectory that Apache will look in under your home + directory, for example <filename + class="directory">public_html</filename>.</para> + + <para>Your file access permissions may be too restrictive. + The web server must be able to traverse your home directory + and directories under your <filename + class="directory">public_html</filename> directory, and + read files under the latter too. Here's a quick recipe to + help you to make your permissions more appropriate.</para> + <programlisting>chmod 755 ~ +find ~/public_html -type d -print0 | xargs -0r chmod 755 +find ~/public_html -type f -print0 | xargs -0r chmod 644</programlisting> + + <para>The other possibility with permissions is that you might + get a completely empty window when you try to load the + script. In this case, it's likely that your access + permissions are <emphasis>too permissive</emphasis>. Apache's + <literal>suexec</literal> subsystem won't execute a script + that's group- or world-writable, for example.</para> + + <para>Your web server may be configured to disallow execution + of CGI programs in your per-user web directory. 
Here's + Apache's default per-user configuration from my Fedora + system.</para> + + &ch06-apache-config.lst; + + <para>If you find a similar-looking + <literal>Directory</literal> group in your Apache + configuration, the directive to look at inside it is + <literal>Options</literal>. Add <literal>ExecCGI</literal> + to the end of this list if it's missing, and restart the web + server.</para> + + <para>If you find that Apache serves you the text of the CGI + script instead of executing it, you may need to either + uncomment (if already present) or add a directive like + this.</para> + <programlisting>AddHandler cgi-script .cgi</programlisting> + + <para>The next possibility is that you might be served with a + colourful Python backtrace claiming that it can't import a + <literal>mercurial</literal>-related module. This is + actually progress! The server is now capable of executing + your CGI script. This error is only likely to occur if + you're running a private installation of Mercurial, instead + of a system-wide version. Remember that the web server runs + the CGI program without any of the environment variables + that you take for granted in an interactive session. If + this error happens to you, edit your copy of <filename + role="special">hgweb.cgi</filename> and follow the + directions inside it to correctly set your + <envar>PYTHONPATH</envar> environment variable.</para> + + <para>Finally, you are <emphasis>certain</emphasis> to be + served with another colourful Python backtrace: this one + will complain that it can't find <filename + class="directory">/path/to/repository</filename>. Edit + your <filename role="special">hgweb.cgi</filename> script + and replace the <filename + class="directory">/path/to/repository</filename> string + with the complete path to the repository you want to serve + up.</para> + + <para>At this point, when you try to reload the page, you + should be presented with a nice HTML view of your + repository's history. 
Whew!</para> + + </sect3> + <sect3> + <title>Configuring lighttpd</title> + + <para>To be exhaustive in my experiments, I tried configuring + the increasingly popular <literal>lighttpd</literal> web + server to serve the same repository as I described with + Apache above. I had already overcome all of the problems I + outlined with Apache, many of which are not server-specific. + As a result, I was fairly sure that my file and directory + permissions were good, and that my <filename + role="special">hgweb.cgi</filename> script was properly + edited.</para> + + <para>Once I had Apache running, getting + <literal>lighttpd</literal> to serve the repository was a + snap (in other words, even if you're trying to use + <literal>lighttpd</literal>, you should read the Apache + section). I first had to edit the + <literal>mod_access</literal> section of its config file to + enable <literal>mod_cgi</literal> and + <literal>mod_userdir</literal>, both of which were disabled + by default on my system. I then added a few lines to the + end of the config file, to configure these modules.</para> + <programlisting>userdir.path = "public_html" +cgi.assign = (".cgi" => "" )</programlisting> + <para>With this done, <literal>lighttpd</literal> ran + immediately for me. If I had configured + <literal>lighttpd</literal> before Apache, I'd almost + certainly have run into many of the same system-level + configuration problems as I did with Apache. However, I + found <literal>lighttpd</literal> to be noticeably easier to + configure than Apache, even though I've used Apache for over + a decade, and this was my first exposure to + <literal>lighttpd</literal>.</para> + + </sect3> + </sect2> + <sect2> + <title>Sharing multiple repositories with one CGI script</title> + + <para>The <filename role="special">hgweb.cgi</filename> script + only lets you publish a single repository, which is an + annoying restriction. 
If you want to publish more than one + without burdening yourself with multiple copies of the same + script, each with a different name, a better choice is to use + the <filename role="special">hgwebdir.cgi</filename> + script.</para> + + <para>The procedure to configure <filename + role="special">hgwebdir.cgi</filename> is only a little more + involved than for <filename + role="special">hgweb.cgi</filename>. First, you must obtain + a copy of the script. If you don't have one handy, you can + download a copy from the master Mercurial repository at <ulink + url="http://www.selenic.com/repo/hg/raw-file/tip/hgwebdir.cgi">http://www.selenic.com/repo/hg/raw-file/tip/hgwebdir.cgi</ulink>.</para> + + <para>You'll need to copy this script into your <filename + class="directory">public_html</filename> directory, and + ensure that it's executable.</para> + <programlisting>cp .../hgwebdir.cgi ~/public_html +chmod 755 ~/public_html ~/public_html/hgwebdir.cgi</programlisting> + <para>With basic configuration out of the way, try to visit + <ulink url="http://myhostname/ + myuser/hgwebdir.cgi">http://myhostname/ + myuser/hgwebdir.cgi</ulink> in your browser. It should + display an empty list of repositories. If you get a blank + window or error message, try walking through the list of + potential problems in section <xref + linkend="sec:collab:wtf"/>.</para> + + <para>The <filename role="special">hgwebdir.cgi</filename> + script relies on an external configuration file. By default, + it searches for a file named <filename + role="special">hgweb.config</filename> in the same directory + as itself. You'll need to create this file, and make it + world-readable. 
The format of the file is similar to a + Windows <quote>ini</quote> file, as understood by Python's + <literal>ConfigParser</literal> + <citation>web:configparser</citation> module.</para> + + <para>The easiest way to configure <filename + role="special">hgwebdir.cgi</filename> is with a section + named <literal>collections</literal>. This will automatically + publish <emphasis>every</emphasis> repository under the + directories you name. The section should look like + this:</para> + <programlisting>[collections] +/my/root = /my/root</programlisting> + <para>Mercurial interprets this by looking at the directory name + on the <emphasis>right</emphasis> hand side of the + <quote><literal>=</literal></quote> sign; finding repositories + in that directory hierarchy; and using the text on the + <emphasis>left</emphasis> to strip off matching text from the + names it will actually list in the web interface. The + remaining component of a path after this stripping has + occurred is called a <quote>virtual path</quote>.</para> + + <para>Given the example above, if we have a repository whose + local path is <filename + class="directory">/my/root/this/repo</filename>, the CGI + script will strip the leading <filename + class="directory">/my/root</filename> from the name, and + publish the repository with a virtual path of <filename + class="directory">this/repo</filename>. 
If the base URL for + our CGI script is <ulink url="http://myhostname/ + myuser/hgwebdir.cgi">http://myhostname/ + myuser/hgwebdir.cgi</ulink>, the complete URL for that + repository will be <ulink url="http://myhostname/ + myuser/hgwebdir.cgi/this/repo">http://myhostname/ + myuser/hgwebdir.cgi/this/repo</ulink>.</para> + + <para>If we replace <filename + class="directory">/my/root</filename> on the left hand side + of this example with <filename + class="directory">/my</filename>, then <filename + role="special">hgwebdir.cgi</filename> will only strip off + <filename class="directory">/my</filename> from the repository + name, and will give us a virtual path of <filename + class="directory">root/this/repo</filename> instead of + <filename class="directory">this/repo</filename>.</para> + + <para>The <filename role="special">hgwebdir.cgi</filename> + script will recursively search each directory listed in the + <literal>collections</literal> section of its configuration + file, but it will <literal>not</literal> recurse into the + repositories it finds.</para> + + <para>The <literal>collections</literal> mechanism makes it easy + to publish many repositories in a <quote>fire and + forget</quote> manner. You only need to set up the CGI + script and configuration file one time. Afterwards, you can + publish or unpublish a repository at any time by simply moving + it into, or out of, the directory hierarchy in which you've + configured <filename role="special">hgwebdir.cgi</filename> to + look.</para> + + <sect3> + <title>Explicitly specifying which repositories to + publish</title> + + <para>In addition to the <literal>collections</literal> + mechanism, the <filename + role="special">hgwebdir.cgi</filename> script allows you + to publish a specific list of repositories. 
To do so, + create a <literal>paths</literal> section, with contents of + the following form.</para> + <programlisting>[paths] +repo1 = /my/path/to/some/repo +repo2 = /some/path/to/another</programlisting> + <para>In this case, the virtual path (the component that will + appear in a URL) is on the left hand side of each + definition, while the path to the repository is on the + right. Notice that there does not need to be any + relationship between the virtual path you choose and the + location of a repository in your filesystem.</para> + + <para>If you wish, you can use both the + <literal>collections</literal> and <literal>paths</literal> + mechanisms simultaneously in a single configuration + file.</para> + + <note> + <para> If multiple repositories have the same virtual path, + <filename role="special">hgwebdir.cgi</filename> will not + report an error. Instead, it will behave + unpredictably.</para> + </note> + + </sect3> + </sect2> + <sect2> + <title>Downloading source archives</title> + + <para>Mercurial's web interface lets users download an archive + of any revision. This archive will contain a snapshot of the + working directory as of that revision, but it will not contain + a copy of the repository data.</para> + + <para>By default, this feature is not enabled. To enable it, + you'll need to add an <envar + role="rc-item-web">allow_archive</envar> item to the + <literal role="rc-web">web</literal> section of your <filename + role="special">~/.hgrc</filename>.</para> + + </sect2> + <sect2> + <title>Web configuration options</title> + + <para>Mercurial's web interfaces (the <command role="hg-cmd">hg + serve</command> command, and the <filename + role="special">hgweb.cgi</filename> and <filename + role="special">hgwebdir.cgi</filename> scripts) have a + number of configuration options that you can set. 
These + belong in a section named <literal + role="rc-web">web</literal>.</para> + <itemizedlist> + <listitem><para><envar + role="rc-item-web">allow_archive</envar>: Determines + which (if any) archive download mechanisms Mercurial + supports. If you enable this feature, users of the web + interface will be able to download an archive of whatever + revision of a repository they are viewing. To enable the + archive feature, this item must take the form of a + sequence of words drawn from the list below.</para> + <itemizedlist> + <listitem><para><literal>bz2</literal>: A + <command>tar</command> archive, compressed using + <literal>bzip2</literal> compression. This has the + best compression ratio, but uses the most CPU time on + the server.</para> + </listitem> + <listitem><para><literal>gz</literal>: A + <command>tar</command> archive, compressed using + <literal>gzip</literal> compression.</para> + </listitem> + <listitem><para><literal>zip</literal>: A + <command>zip</command> archive, compressed using DEFLATE + compression. This format has the worst compression + ratio, but is widely used in the Windows world.</para> + </listitem> + </itemizedlist> + <para> If you provide an empty list, or don't have an + <envar role="rc-item-web">allow_archive</envar> entry at + all, this feature will be disabled. Here is an example of + how to enable all three supported formats.</para> + <programlisting>[web] +allow_archive = bz2 gz zip</programlisting> + </listitem> + <listitem><para><envar role="rc-item-web">allowpull</envar>: + Boolean. Determines whether the web interface allows + remote users to <command role="hg-cmd">hg pull</command> + and <command role="hg-cmd">hg clone</command> this + repository over HTTP. If set to <literal>no</literal> or + <literal>false</literal>, only the + <quote>human-oriented</quote> portion of the web interface + is available.</para> + </listitem> + <listitem><para><envar role="rc-item-web">contact</envar>: + String. 
A free-form (but preferably brief) string + identifying the person or group in charge of the + repository. This often contains the name and email + address of a person or mailing list. It often makes sense + to place this entry in a repository's own <filename + role="special">.hg/hgrc</filename> file, but it can make + sense to use in a global <filename + role="special">~/.hgrc</filename> if every repository + has a single maintainer.</para> + </listitem> + <listitem><para><envar role="rc-item-web">maxchanges</envar>: + Integer. The default maximum number of changesets to + display in a single page of output.</para> + </listitem> + <listitem><para><envar role="rc-item-web">maxfiles</envar>: + Integer. The default maximum number of modified files to + display in a single page of output.</para> + </listitem> + <listitem><para><envar role="rc-item-web">stripes</envar>: + Integer. If the web interface displays alternating + <quote>stripes</quote> to make it easier to visually align + rows when you are looking at a table, this number controls + the number of rows in each stripe.</para> + </listitem> + <listitem><para><envar role="rc-item-web">style</envar>: + Controls the template Mercurial uses to display the web + interface. Mercurial ships with two web templates, named + <literal>default</literal> and <literal>gitweb</literal> + (the latter is much more visually attractive). You can + also specify a custom template of your own; see chapter + <xref linkend="chap:template"/> for details. + Here, you can see how to enable the + <literal>gitweb</literal> style.</para> + <programlisting>[web] +style = gitweb</programlisting> + </listitem> + <listitem><para><envar role="rc-item-web">templates</envar>: + Path. The directory in which to search for template + files. 
By default, Mercurial searches in the directory in + which it was installed.</para> + </listitem></itemizedlist> + <para>If you are using <filename + role="special">hgwebdir.cgi</filename>, you can place a few + configuration items in a <literal role="rc-web">web</literal> + section of the <filename + role="special">hgweb.config</filename> file instead of a + <filename role="special">~/.hgrc</filename> file, for + convenience. These items are <envar + role="rc-item-web">motd</envar> and <envar + role="rc-item-web">style</envar>.</para> + + <sect3> + <title>Options specific to an individual repository</title> + + <para>A few <literal role="rc-web">web</literal> configuration + items ought to be placed in a repository's local <filename + role="special">.hg/hgrc</filename>, rather than a user's + or global <filename role="special">~/.hgrc</filename>.</para> + <itemizedlist> + <listitem><para><envar + role="rc-item-web">description</envar>: String. A + free-form (but preferably brief) string that describes + the contents or purpose of the repository.</para> + </listitem> + <listitem><para><envar role="rc-item-web">name</envar>: + String. The name to use for the repository in the web + interface. This overrides the default name, which is + the last component of the repository's path.</para> + </listitem></itemizedlist> + + </sect3> + <sect3> + <title>Options specific to the <command role="hg-cmd">hg + serve</command> command</title> + + <para>Some of the items in the <literal + role="rc-web">web</literal> section of a <filename + role="special">~/.hgrc</filename> file are only for use + with the <command role="hg-cmd">hg serve</command> + command.</para> + <itemizedlist> + <listitem><para><envar role="rc-item-web">accesslog</envar>: + Path. The name of a file into which to write an access + log. By default, the <command role="hg-cmd">hg + serve</command> command writes this information to + standard output, not to a file. 
Log entries are written + in the standard <quote>combined</quote> file format used + by almost all web servers.</para> + </listitem> + <listitem><para><envar role="rc-item-web">address</envar>: + String. The local address on which the server should + listen for incoming connections. By default, the server + listens on all addresses.</para> + </listitem> + <listitem><para><envar role="rc-item-web">errorlog</envar>: + Path. The name of a file into which to write an error + log. By default, the <command role="hg-cmd">hg + serve</command> command writes this information to + standard error, not to a file.</para> + </listitem> + <listitem><para><envar role="rc-item-web">ipv6</envar>: + Boolean. Whether to use the IPv6 protocol. By default, + IPv6 is not used.</para> + </listitem> + <listitem><para><envar role="rc-item-web">port</envar>: + Integer. The TCP port number on which the server should + listen. The default port number used is 8000.</para> + </listitem></itemizedlist> + + </sect3> + <sect3> + <title>Choosing the right <filename + role="special">~/.hgrc</filename> file to add <literal + role="rc-web">web</literal> items to</title> + + <para>It is important to remember that a web server like + Apache or <literal>lighttpd</literal> will run under a user + ID that is different to yours. CGI scripts run by your + server, such as <filename + role="special">hgweb.cgi</filename>, will usually also run + under that user ID.</para> + + <para>If you add <literal role="rc-web">web</literal> items to + your own personal <filename role="special">~/.hgrc</filename> file, CGI scripts won't read that + <filename role="special">~/.hgrc</filename> file. Those + settings will thus only affect the behaviour of the <command + role="hg-cmd">hg serve</command> command when you run it. 
+ To cause CGI scripts to see your settings, either create a + <filename role="special">~/.hgrc</filename> file in the + home directory of the user ID that runs your web server, or + add those settings to a system-wide <filename + role="special">hgrc</filename> file.</para> + + + </sect3> + </sect2> + </sect1> +</chapter> + +<!-- +local variables: +sgml-parent-document: ("00book.xml" "book" "chapter") +end: +-->
--- a/en/ch05-daily.xml Wed Mar 18 00:08:22 2009 -0700 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,544 +0,0 @@ -<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> - -<chapter id="chap:daily"> - <?dbhtml filename="mercurial-in-daily-use.html"?> - <title>Mercurial in daily use</title> - - <sect1> - <title>Telling Mercurial which files to track</title> - - <para>Mercurial does not work with files in your repository unless - you tell it to manage them. The <command role="hg-cmd">hg - status</command> command will tell you which files Mercurial - doesn't know about; it uses a - <quote><literal>?</literal></quote> to display such - files.</para> - - <para>To tell Mercurial to track a file, use the <command - role="hg-cmd">hg add</command> command. Once you have added a - file, the entry in the output of <command role="hg-cmd">hg - status</command> for that file changes from - <quote><literal>?</literal></quote> to - <quote><literal>A</literal></quote>.</para> - - &interaction.daily.files.add; - - <para>After you run a <command role="hg-cmd">hg commit</command>, - the files that you added before the commit will no longer be - listed in the output of <command role="hg-cmd">hg - status</command>. The reason for this is that <command - role="hg-cmd">hg status</command> only tells you about - <quote>interesting</quote> files&emdash;those that you have - modified or told Mercurial to do something with&emdash;by - default. If you have a repository that contains thousands of - files, you will rarely want to know about files that Mercurial - is tracking, but that have not changed. (You can still get this - information; we'll return to this later.)</para> - - <para>Once you add a file, Mercurial doesn't do anything with it - immediately. Instead, it will take a snapshot of the file's - state the next time you perform a commit. 
It will then continue - to track the changes you make to the file every time you commit, - until you remove the file.</para> - - <sect2> - <title>Explicit versus implicit file naming</title> - - <para>A useful behaviour that Mercurial has is that if you pass - the name of a directory to a command, every Mercurial command - will treat this as <quote>I want to operate on every file in - this directory and its subdirectories</quote>.</para> - - &interaction.daily.files.add-dir; - - <para>Notice in this example that Mercurial printed the names of - the files it added, whereas it didn't do so when we added the - file named <filename>a</filename> in the earlier - example.</para> - - <para>What's going on is that in the former case, we explicitly - named the file to add on the command line, so the assumption - that Mercurial makes in such cases is that you know what you - are doing, and it doesn't print any output.</para> - - <para>However, when we <emphasis>imply</emphasis> the names of - files by giving the name of a directory, Mercurial takes the - extra step of printing the name of each file that it does - something with. This makes it clearer what is happening, - and reduces the likelihood of a silent and nasty surprise. - This behaviour is common to most Mercurial commands.</para> - - </sect2> - <sect2> - <title>Aside: Mercurial tracks files, not directories</title> - - <para>Mercurial does not track directory information. Instead, - it tracks the path to a file. Before creating a file, it - first creates any missing directory components of the path. - After it deletes a file, it then deletes any empty directories - that were in the deleted file's path. This sounds like a - trivial distinction, but it has one minor practical - consequence: it is not possible to represent a completely - empty directory in Mercurial.</para> - - <para>Empty directories are rarely useful, and there are - unintrusive workarounds that you can use to achieve an - appropriate effect. 
The developers of Mercurial thus felt - that the complexity that would be required to manage empty - directories was not worth the limited benefit this feature - would bring.</para> - - <para>If you need an empty directory in your repository, there - are a few ways to achieve this. One is to create a directory, - then <command role="hg-cmd">hg add</command> a - <quote>hidden</quote> file to that directory. On Unix-like - systems, any file name that begins with a period - (<quote><literal>.</literal></quote>) is treated as hidden by - most commands and GUI tools. This approach is illustrated - below.</para> - -&interaction.daily.files.hidden; - - <para>Another way to tackle a need for an empty directory is to - simply create one in your automated build scripts before they - will need it.</para> - - </sect2> - </sect1> - <sect1> - <title>How to stop tracking a file</title> - - <para>Once you decide that a file no longer belongs in your - repository, use the <command role="hg-cmd">hg remove</command> - command; this deletes the file, and tells Mercurial to stop - tracking it. A removed file is represented in the output of - <command role="hg-cmd">hg status</command> with a - <quote><literal>R</literal></quote>.</para> - - &interaction.daily.files.remove; - - <para>After you <command role="hg-cmd">hg remove</command> a file, - Mercurial will no longer track changes to that file, even if you - recreate a file with the same name in your working directory. - If you do recreate a file with the same name and want Mercurial - to track the new file, simply <command role="hg-cmd">hg - add</command> it. 
Mercurial will know that the newly added - file is not related to the old file of the same name.</para> - - <sect2> - <title>Removing a file does not affect its history</title> - - <para>It is important to understand that removing a file has - only two effects.</para> - <itemizedlist> - <listitem><para>It removes the current version of the file - from the working directory.</para> - </listitem> - <listitem><para>It stops Mercurial from tracking changes to - the file, from the time of the next commit.</para> - </listitem></itemizedlist> - <para>Removing a file <emphasis>does not</emphasis> in any way - alter the <emphasis>history</emphasis> of the file.</para> - - <para>If you update the working directory to a changeset in - which a file that you have removed was still tracked, it will - reappear in the working directory, with the contents it had - when you committed that changeset. If you then update the - working directory to a later changeset, in which the file had - been removed, Mercurial will once again remove the file from - the working directory.</para> - - </sect2> - <sect2> - <title>Missing files</title> - - <para>Mercurial considers a file that you have deleted, but not - used <command role="hg-cmd">hg remove</command> to delete, to - be <emphasis>missing</emphasis>. A missing file is - represented with <quote><literal>!</literal></quote> in the - output of <command role="hg-cmd">hg status</command>. 
- Mercurial commands will not generally do anything with missing - files.</para> - - &interaction.daily.files.missing; - - <para>If your repository contains a file that <command - role="hg-cmd">hg status</command> reports as missing, and - you want the file to stay gone, you can run <command - role="hg-cmd">hg remove <option - role="hg-opt-remove">--after</option></command> at any - time later on, to tell Mercurial that you really did mean to - remove the file.</para> - - &interaction.daily.files.remove-after; - - <para>On the other hand, if you deleted the missing file by - accident, give <command role="hg-cmd">hg revert</command> the - name of the file to recover. It will reappear, in unmodified - form.</para> - -&interaction.daily.files.recover-missing; - - </sect2> - <sect2> - <title>Aside: why tell Mercurial explicitly to remove a - file?</title> - - <para>You might wonder why Mercurial requires you to explicitly - tell it that you are deleting a file. Early during the - development of Mercurial, it let you delete a file however you - pleased; Mercurial would notice the absence of the file - automatically when you next ran a <command role="hg-cmd">hg - commit</command>, and stop tracking the file. 
In practice, - this made it too easy to accidentally remove a file without - noticing.</para> - - </sect2> - <sect2> - <title>Useful shorthand&emdash;adding and removing files in one - step</title> - - <para>Mercurial offers a combination command, <command - role="hg-cmd">hg addremove</command>, that adds untracked - files and marks missing files as removed.</para> - - &interaction.daily.files.addremove; - - <para>The <command role="hg-cmd">hg commit</command> command - also provides a <option role="hg-opt-commit">-A</option> - option that performs this same add-and-remove, immediately - followed by a commit.</para> - - &interaction.daily.files.commit-addremove; - - </sect2> - </sect1> - <sect1> - <title>Copying files</title> - - <para>Mercurial provides a <command role="hg-cmd">hg - copy</command> command that lets you make a new copy of a - file. When you copy a file using this command, Mercurial makes - a record of the fact that the new file is a copy of the original - file. It treats these copied files specially when you merge - your work with someone else's.</para> - - <sect2> - <title>The results of copying during a merge</title> - - <para>What happens during a merge is that changes - <quote>follow</quote> a copy. To best illustrate what this - means, let's create an example. We'll start with the usual - tiny repository that contains a single file.</para> - - &interaction.daily.copy.init; - - <para>We need to do some work in - parallel, so that we'll have something to merge. 
So let's - clone our repository.</para> - - &interaction.daily.copy.clone; - - <para>Back in our initial repository, let's use the <command - role="hg-cmd">hg copy</command> command to make a copy of - the first file we created.</para> - - &interaction.daily.copy.copy; - - <para>If we look at the output of the <command role="hg-cmd">hg - status</command> command afterwards, the copied file looks - just like a normal added file.</para> - - &interaction.daily.copy.status; - - <para>But if we pass the <option - role="hg-opt-status">-C</option> option to <command - role="hg-cmd">hg status</command>, it prints another line of - output: this is the file that our newly-added file was copied - <emphasis>from</emphasis>.</para> - - &interaction.daily.copy.status-copy; - - <para>Now, back in the repository we cloned, let's make a change - in parallel. We'll add a line of content to the original file - that we created.</para> - - &interaction.daily.copy.other; - - <para>Now we have a modified <filename>file</filename> in this - repository. When we pull the changes from the first - repository, and merge the two heads, Mercurial will propagate - the changes that we made locally to <filename>file</filename> - into its copy, <filename>new-file</filename>.</para> - - &interaction.daily.copy.merge; - - </sect2> - <sect2 id="sec:daily:why-copy"> - <title>Why should changes follow copies?</title> - - <para>This behaviour, of changes to a file propagating out to - copies of the file, might seem esoteric, but in most cases - it's highly desirable.</para> - - <para>First of all, remember that this propagation - <emphasis>only</emphasis> happens when you merge. 
So if you - <command role="hg-cmd">hg copy</command> a file, and - subsequently modify the original file during the normal course - of your work, nothing will happen.</para> - - <para>The second thing to know is that modifications will only - propagate across a copy as long as the repository that you're - pulling changes from <emphasis>doesn't know</emphasis> about - the copy.</para> - - <para>The reason that Mercurial does this is as follows. Let's - say I make an important bug fix in a source file, and commit - my changes. Meanwhile, you've decided to <command - role="hg-cmd">hg copy</command> the file in your repository, - without knowing about the bug or having seen the fix, and you - have started hacking on your copy of the file.</para> - - <para>If you pulled and merged my changes, and Mercurial - <emphasis>didn't</emphasis> propagate changes across copies, - your source file would now contain the bug, and unless you - remembered to propagate the bug fix by hand, the bug would - <emphasis>remain</emphasis> in your copy of the file.</para> - - <para>By automatically propagating the change that fixed the bug - from the original file to the copy, Mercurial prevents this - class of problem. 
To my knowledge, Mercurial is the - <emphasis>only</emphasis> revision control system that - propagates changes across copies like this.</para> - - <para>Once your change history has a record that the copy and - subsequent merge occurred, there's usually no further need to - propagate changes from the original file to the copied file, - and that's why Mercurial only propagates changes across copies - until this point, and no further.</para> - - </sect2> - <sect2> - <title>How to make changes <emphasis>not</emphasis> follow a - copy</title> - - <para>If, for some reason, you decide that this business of - automatically propagating changes across copies is not for - you, simply use your system's normal file copy command (on - Unix-like systems, that's <command>cp</command>) to make a - copy of a file, then <command role="hg-cmd">hg add</command> - the new copy by hand. Before you do so, though, please do - reread section <xref linkend="sec:daily:why-copy"/>, and make - an informed - decision that this behaviour is not appropriate to your - specific case.</para> - - </sect2> - <sect2> - <title>Behaviour of the <command role="hg-cmd">hg copy</command> - command</title> - - <para>When you use the <command role="hg-cmd">hg copy</command> - command, Mercurial makes a copy of each source file as it - currently stands in the working directory. This means that if - you make some modifications to a file, then <command - role="hg-cmd">hg copy</command> it without first having - committed those changes, the new copy will also contain the - modifications you have made up until that point. (I find this - behaviour a little counterintuitive, which is why I mention it - here.)</para> - - <para>The <command role="hg-cmd">hg copy</command> command acts - similarly to the Unix <command>cp</command> command (you can - use the <command role="hg-cmd">hg cp</command> alias if you - prefer). 
The last argument is the - <emphasis>destination</emphasis>, and all prior arguments are - <emphasis>sources</emphasis>. If you pass it a single file as - the source, and the destination does not exist, it creates a - new file with that name.</para> - - &interaction.daily.copy.simple; - - <para>If the destination is a directory, Mercurial copies its - sources into that directory.</para> - - &interaction.daily.copy.dir-dest; - - <para>Copying a directory is - recursive, and preserves the directory structure of the - source.</para> - - &interaction.daily.copy.dir-src; - - <para>If the source and destination are both directories, the - source tree is recreated in the destination directory.</para> - - &interaction.daily.copy.dir-src-dest; - - <para>As with the <command role="hg-cmd">hg rename</command> - command, if you copy a file manually and then want Mercurial - to know that you've copied the file, simply use the <option - role="hg-opt-copy">--after</option> option to <command - role="hg-cmd">hg copy</command>.</para> - - &interaction.daily.copy.after; - - </sect2> - </sect1> - <sect1> - <title>Renaming files</title> - - <para>It's rather more common to need to rename a file than to - make a copy of it. The reason I discussed the <command - role="hg-cmd">hg copy</command> command before talking about - renaming files is that Mercurial treats a rename in essentially - the same way as a copy. 
Therefore, knowing what Mercurial does - when you copy a file tells you what to expect when you rename a - file.</para> - - <para>When you use the <command role="hg-cmd">hg rename</command> - command, Mercurial makes a copy of each source file, then - deletes it and marks the file as removed.</para> - - &interaction.daily.rename.rename; - - <para>The <command role="hg-cmd">hg status</command> command shows - the newly copied file as added, and the copied-from file as - removed.</para> - - &interaction.daily.rename.status; - - <para>As with the results of a <command role="hg-cmd">hg - copy</command>, we must use the <option - role="hg-opt-status">-C</option> option to <command - role="hg-cmd">hg status</command> to see that the added file - is really being tracked by Mercurial as a copy of the original, - now removed, file.</para> - - &interaction.daily.rename.status-copy; - - <para>As with <command role="hg-cmd">hg remove</command> and - <command role="hg-cmd">hg copy</command>, you can tell Mercurial - about a rename after the fact using the <option - role="hg-opt-rename">--after</option> option. In most other - respects, the behaviour of the <command role="hg-cmd">hg - rename</command> command, and the options it accepts, are - similar to the <command role="hg-cmd">hg copy</command> - command.</para> - - <sect2> - <title>Renaming files and merging changes</title> - - <para>Since Mercurial's rename is implemented as - copy-and-remove, the same propagation of changes happens when - you merge after a rename as after a copy.</para> - - <para>If I modify a file, and you rename it to a new name, and - then we merge our respective changes, my modifications to the - file under its original name will be propagated into the file - under its new name. 
(This is something you might expect to - <quote>simply work,</quote> but not all revision control - systems actually do this.)</para> - - <para>Whereas having changes follow a copy is a feature where - you can perhaps nod and say <quote>yes, that might be - useful,</quote> it should be clear that having them follow a - rename is definitely important. Without this facility, it - would simply be too easy for changes to become orphaned when - files are renamed.</para> - - </sect2> - <sect2> - <title>Divergent renames and merging</title> - - <para>The case of diverging names occurs when two developers - start with a file&emdash;let's call it - <filename>foo</filename>&emdash;in their respective - repositories.</para> - - &interaction.rename.divergent.clone; - - <para>Anne renames the file to <filename>bar</filename>.</para> - - &interaction.rename.divergent.rename.anne; - - <para>Meanwhile, Bob renames it to - <filename>quux</filename>.</para> - - &interaction.rename.divergent.rename.bob; - - <para>I like to think of this as a conflict because each - developer has expressed different intentions about what the - file ought to be named.</para> - - <para>What do you think should happen when they merge their - work? Mercurial's actual behaviour is that it always preserves - <emphasis>both</emphasis> names when it merges changesets that - contain divergent renames.</para> - - &interaction.rename.divergent.merge; - - <para>Notice that Mercurial does warn about the divergent - renames, but it leaves it up to you to do something about the - divergence after the merge.</para> - - </sect2> - <sect2> - <title>Convergent renames and merging</title> - - <para>Another kind of rename conflict occurs when two people - choose to rename different <emphasis>source</emphasis> files - to the same <emphasis>destination</emphasis>. 
In this case, - Mercurial runs its normal merge machinery, and lets you guide - it to a suitable resolution.</para> - - </sect2> - <sect2> - <title>Other name-related corner cases</title> - - <para>Mercurial has a longstanding bug in which it fails to - handle a merge where one side has a file with a given name, - while another has a directory with the same name. This is - documented as <ulink role="hg-bug" - url="http://www.selenic.com/mercurial/bts/issue29">issue - 29</ulink>.</para> - - &interaction.issue29.go; - - </sect2> - </sect1> - <sect1> - <title>Recovering from mistakes</title> - - <para>Mercurial has some useful commands that will help you to - recover from some common mistakes.</para> - - <para>The <command role="hg-cmd">hg revert</command> command lets - you undo changes that you have made to your working directory. - For example, if you <command role="hg-cmd">hg add</command> a - file by accident, just run <command role="hg-cmd">hg - revert</command> with the name of the file you added, and - while the file won't be touched in any way, it won't be tracked - for adding by Mercurial any longer, either. You can also use - <command role="hg-cmd">hg revert</command> to get rid of - erroneous changes to a file.</para> - - <para>Keep in mind that the <command role="hg-cmd">hg - revert</command> command only helps with changes that you have - not yet committed. Once you've committed a change, if you - decide it was a mistake, you can still do something about it, - though your options may be more limited.</para> - - <para>For more information about the <command role="hg-cmd">hg - revert</command> command, and details about how to deal with - changes you have already committed, see chapter <xref - linkend="chap:undo"/>.</para> - - </sect1> -</chapter> - -<!-- -local variables: -sgml-parent-document: ("00book.xml" "book" "chapter") -end: --->
--- a/en/ch06-collab.xml Wed Mar 18 00:08:22 2009 -0700 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,1434 +0,0 @@ -<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> - -<chapter id="cha:collab"> - <?dbhtml filename="collaborating-with-other-people.html"?> - <title>Collaborating with other people</title> - - <para>As a completely decentralised tool, Mercurial doesn't impose - any policy on how people ought to work with each other. However, - if you're new to distributed revision control, it helps to have - some tools and examples in mind when you're thinking about - possible workflow models.</para> - - <sect1> - <title>Mercurial's web interface</title> - - <para>Mercurial has a powerful web interface that provides several - useful capabilities.</para> - - <para>For interactive use, the web interface lets you browse a - single repository or a collection of repositories. You can view - the history of a repository, examine each change (comments and - diffs), and view the contents of each directory and file.</para> - - <para>Also for human consumption, the web interface provides an - RSS feed of the changes in a repository. This lets you - <quote>subscribe</quote> to a repository using your favourite - feed reader, and be automatically notified of activity in that - repository as soon as it happens. I find this capability much - more convenient than the model of subscribing to a mailing list - to which notifications are sent, as it requires no additional - configuration on the part of whoever is serving the - repository.</para> - - <para>The web interface also lets remote users clone a repository, - pull changes from it, and (when the server is configured to - permit it) push changes back to it. 
Mercurial's HTTP tunneling - protocol aggressively compresses data, so that it works - efficiently even over low-bandwidth network connections.</para> - - <para>The easiest way to get started with the web interface is to - use your web browser to visit an existing repository, such as - the master Mercurial repository at <ulink - url="http://www.selenic.com/repo/hg?style=gitweb">http://www.selenic.com/repo/hg?style=gitweb</ulink>.</para> - - <para>If you're interested in providing a web interface to your - own repositories, Mercurial provides two ways to do this. The - first is using the <command role="hg-cmd">hg serve</command> - command, which is best suited to short-term - <quote>lightweight</quote> serving. See section <xref - linkend="sec:collab:serve"/> below for details of how to use - this command. If you have a long-lived repository that you'd - like to make permanently available, Mercurial has built-in - support for the CGI (Common Gateway Interface) standard, which - all common web servers support. See section <xref - linkend="sec:collab:cgi"/> for details of CGI - configuration.</para> - - </sect1> - <sect1> - <title>Collaboration models</title> - - <para>With a suitably flexible tool, making decisions about - workflow is much more of a social engineering challenge than a - technical one. Mercurial imposes few limitations on how you can - structure the flow of work in a project, so it's up to you and - your group to set up and live with a model that matches your own - particular needs.</para> - - <sect2> - <title>Factors to keep in mind</title> - - <para>The most important aspect of any model that you must keep - in mind is how well it matches the needs and capabilities of - the people who will be using it. 
This might seem - self-evident; even so, you still can't afford to forget it for - a moment.</para> - - <para>I once put together a workflow model that seemed to make - perfect sense to me, but that caused a considerable amount of - consternation and strife within my development team. In spite - of my attempts to explain why we needed a complex set of - branches, and how changes ought to flow between them, a few - team members revolted. Even though they were smart people, - they didn't want to pay attention to the constraints we were - operating under, or face the consequences of those constraints - in the details of the model that I was advocating.</para> - - <para>Don't sweep foreseeable social or technical problems under - the rug. Whatever scheme you put into effect, you should plan - for mistakes and problem scenarios. Consider adding automated - machinery to prevent, or quickly recover from, trouble that - you can anticipate. As an example, if you intend to have a - branch with not-for-release changes in it, you'd do well to - think early about the possibility that someone might - accidentally merge those changes into a release branch. You - could avoid this particular problem by writing a hook that - prevents changes from being merged from an inappropriate - branch.</para> - - </sect2> - <sect2> - <title>Informal anarchy</title> - - <para>I wouldn't suggest an <quote>anything goes</quote> - approach as something sustainable, but it's a model that's - easy to grasp, and it works perfectly well in a few unusual - situations.</para> - - <para>As one example, many projects have a loose-knit group of - collaborators who rarely physically meet each other. Some - groups like to overcome the isolation of working at a distance - by organising occasional <quote>sprints</quote>. 
In a sprint, - a number of people get together in a single location (a - company's conference room, a hotel meeting room, that kind of - place) and spend several days more or less locked in there, - hacking intensely on a handful of projects.</para> - - <para>A sprint is the perfect place to use the <command - role="hg-cmd">hg serve</command> command, since <command - role="hg-cmd">hg serve</command> does not require any fancy - server infrastructure. You can get started with <command - role="hg-cmd">hg serve</command> in moments, by reading - section <xref linkend="sec:collab:serve"/> below. Then simply - tell - the person next to you that you're running a server, send the - URL to them in an instant message, and you immediately have a - quick-turnaround way to work together. They can type your URL - into their web browser and quickly review your changes; or - they can pull a bugfix from you and verify it; or they can - clone a branch containing a new feature and try it out.</para> - - <para>The charm, and the problem, with doing things in an ad hoc - fashion like this is that only people who know about your - changes, and where they are, can see them. Such an informal - approach simply doesn't scale beyond a handful of people, because - each individual needs to know about <emphasis>n</emphasis> different repositories - to pull from.</para> - - </sect2> - <sect2> - <title>A single central repository</title> - - <para>For smaller projects migrating from a centralised revision - control tool, perhaps the easiest way to get started is to - have changes flow through a single shared central repository. - This is also the most common <quote>building block</quote> for - more ambitious workflow schemes.</para> - - <para>Contributors start by cloning a copy of this repository. 
- They can pull changes from it whenever they need to, and some - (perhaps all) developers have permission to push a change back - when they're ready for other people to see it.</para> - - <para>Under this model, it can still often make sense for people - to pull changes directly from each other, without going - through the central repository. Consider a case in which I - have a tentative bug fix, but I am worried that if I were to - publish it to the central repository, it might subsequently - break everyone else's trees as they pull it. To reduce the - potential for damage, I can ask you to clone my repository - into a temporary repository of your own and test it. This - lets us put off publishing the potentially unsafe change until - it has had a little testing.</para> - - <para>In this kind of scenario, people usually use the - <command>ssh</command> protocol to securely push changes to - the central repository, as documented in section <xref - linkend="sec:collab:ssh"/>. It's also - usual to publish a read-only copy of the repository over HTTP - using CGI, as in section <xref linkend="sec:collab:cgi"/>. - Publishing over HTTP - satisfies the needs of people who don't have push access, and - those who want to use web browsers to browse the repository's - history.</para> - - </sect2> - <sect2> - <title>Working with multiple branches</title> - - <para>Projects of any significant size naturally tend to make - progress on several fronts simultaneously. In the case of - software, it's common for a project to go through periodic - official releases. A release might then go into - <quote>maintenance mode</quote> for a while after its first - publication; maintenance releases tend to contain only bug - fixes, not new features. In parallel with these maintenance - releases, one or more future releases may be under - development. 
People normally use the word - <quote>branch</quote> to refer to one of these many slightly - different directions in which development is - proceeding.</para> - - <para>Mercurial is particularly well suited to managing a number - of simultaneous, but not identical, branches. Each - <quote>development direction</quote> can live in its own - central repository, and you can merge changes from one to - another as the need arises. Because repositories are - independent of each other, unstable changes in a development - branch will never affect a stable branch unless someone - explicitly merges those changes in.</para> - - <para>Here's an example of how this can work in practice. Let's - say you have one <quote>main branch</quote> on a central - server.</para> - - &interaction.branching.init; - - <para>People clone it, make changes locally, test them, and push - them back.</para> - - <para>Once the main branch reaches a release milestone, you can - use the <command role="hg-cmd">hg tag</command> command to - give a permanent name to the milestone revision.</para> - - &interaction.branching.tag; - - <para>Let's say some ongoing - development occurs on the main branch.</para> - - &interaction.branching.main; - - <para>Using the tag that was recorded at the milestone, people - who clone that repository at any time in the future can use - <command role="hg-cmd">hg update</command> to get a copy of - the working directory exactly as it was when that tagged - revision was committed.</para> - - &interaction.branching.update; - - <para>In addition, immediately after the main branch is tagged, - someone can then clone the main branch on the server to a new - <quote>stable</quote> branch, also on the server.</para> - - &interaction.branching.clone; - - <para>Someone who needs to make a change to the stable branch - can then clone <emphasis>that</emphasis> repository, make - their changes, commit, and push their changes back there.</para> - - &interaction.branching.stable; - - 
<para>Because Mercurial repositories are independent, and - Mercurial doesn't move changes around automatically, the - stable and main branches are <emphasis>isolated</emphasis> - from each other. The changes that you made on the main branch - don't <quote>leak</quote> to the stable branch, and vice - versa.</para> - - <para>You'll often want all of your bugfixes on the stable - branch to show up on the main branch, too. Rather than - rewrite a bugfix on the main branch, you can simply pull and - merge changes from the stable to the main branch, and - Mercurial will bring those bugfixes in for you.</para> - - &interaction.branching.merge; - - <para>The main branch will still contain changes that are not on - the stable branch, but it will also contain all of the - bugfixes from the stable branch. The stable branch remains - unaffected by these changes.</para> - - </sect2> - <sect2> - <title>Feature branches</title> - - <para>For larger projects, an effective way to manage change is - to break up a team into smaller groups. Each group has a - shared branch of its own, cloned from a single - <quote>master</quote> branch used by the entire project. 
- People working on an individual branch are typically quite - isolated from developments on other branches.</para> - - <informalfigure id="fig:collab:feature-branches"> - <mediaobject><imageobject><imagedata - fileref="feature-branches"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject><caption><para>Feature - branches</para></caption></mediaobject> - </informalfigure> - - <para>When a particular feature is deemed to be in suitable - shape, someone on that feature team pulls and merges from the - master branch into the feature branch, then pushes back up to - the master branch.</para> - - </sect2> - <sect2> - <title>The release train</title> - - <para>Some projects are organised on a <quote>train</quote> - basis: a release is scheduled to happen every few months, and - whatever features are ready when the <quote>train</quote> is - ready to leave are allowed in.</para> - - <para>This model resembles working with feature branches. The - difference is that when a feature branch misses a train, - someone on the feature team pulls and merges the changes that - went out on that train release into the feature branch, and - the team continues its work on top of that release so that - their feature can make the next release.</para> - - </sect2> - <sect2> - <title>The Linux kernel model</title> - - <para>The development of the Linux kernel has a shallow - hierarchical structure, surrounded by a cloud of apparent - chaos. Because most Linux developers use - <command>git</command>, a distributed revision control tool - with capabilities similar to Mercurial, it's useful to - describe the way work flows in that environment; if you like - the ideas, the approach translates well across tools.</para> - - <para>At the center of the community sits Linus Torvalds, the - creator of Linux. He publishes a single source repository - that is considered the <quote>authoritative</quote> current - tree by the entire developer community. 
Anyone can clone - Linus's tree, but he is very choosy about whose trees he pulls - from.</para> - - <para>Linus has a number of <quote>trusted lieutenants</quote>. - As a general rule, he pulls whatever changes they publish, in - most cases without even reviewing those changes. Some of - those lieutenants are generally agreed to be - <quote>maintainers</quote>, responsible for specific - subsystems within the kernel. If a random kernel hacker wants - to make a change to a subsystem that they want to end up in - Linus's tree, they must find out who the subsystem's - maintainer is, and ask that maintainer to take their change. - If the maintainer reviews their changes and agrees to take - them, they'll pass them along to Linus in due course.</para> - - <para>Individual lieutenants have their own approaches to - reviewing, accepting, and publishing changes; and for deciding - when to feed them to Linus. In addition, there are several - well known branches that people use for different purposes. - For example, a few people maintain <quote>stable</quote> - repositories of older versions of the kernel, to which they - apply critical fixes as needed. Some maintainers publish - multiple trees: one for experimental changes; one for changes - that they are about to feed upstream; and so on. Others just - publish a single tree.</para> - - <para>This model has two notable features. The first is that - it's <quote>pull only</quote>. You have to ask, convince, or - beg another developer to take a change from you, because there - are almost no trees to which more than one person can push, - and there's no way to push changes into a tree that someone - else controls.</para> - - <para>The second is that it's based on reputation and acclaim. - If you're an unknown, Linus will probably ignore changes from - you without even responding. But a subsystem maintainer will - probably review them, and will likely take them if they pass - their criteria for suitability. 
The more <quote>good</quote> - changes you contribute to a maintainer, the more likely they - are to trust your judgment and accept your changes. If you're - well-known and maintain a long-lived branch for something - Linus hasn't yet accepted, people with similar interests may - pull your changes regularly to keep up with your work.</para> - - <para>Reputation and acclaim don't necessarily cross subsystem - or <quote>people</quote> boundaries. If you're a respected - but specialised storage hacker, and you try to fix a - networking bug, that change will receive a level of scrutiny - from a network maintainer comparable to a change from a - complete stranger.</para> - - <para>To people who come from more orderly project backgrounds, - the comparatively chaotic Linux kernel development process - often seems completely insane. It's subject to the whims of - individuals; people make sweeping changes whenever they deem - it appropriate; and the pace of development is astounding. - And yet Linux is a highly successful, well-regarded piece of - software.</para> - - </sect2> - <sect2> - <title>Pull-only versus shared-push collaboration</title> - - <para>A perpetual source of heat in the open source community is - whether a development model in which people only ever pull - changes from others is <quote>better than</quote> one in which - multiple people can push changes to a shared - repository.</para> - - <para>Typically, the backers of the shared-push model use tools - that actively enforce this approach. If you're using a - centralised revision control tool such as Subversion, there's - no way to make a choice over which model you'll use: the tool - gives you shared-push, and if you want to do anything else, - you'll have to roll your own approach on top (such as applying - a patch by hand).</para> - - <para>A good distributed revision control tool, such as - Mercurial, will support both models. 
You and your - collaborators can then structure how you work together based - on your own needs and preferences, not on what contortions - your tools force you into.</para> - - </sect2> - <sect2> - <title>Where collaboration meets branch management</title> - - <para>Once you and your team set up some shared repositories and - start propagating changes back and forth between local and - shared repos, you begin to face a related, but slightly - different challenge: that of managing the multiple directions - in which your team may be moving at once. Even though this - subject is intimately related to how your team collaborates, - it's dense enough to merit treatment of its own, in chapter - <xref linkend="chap:branch"/>.</para> - - </sect2> - </sect1> - <sect1> - <title>The technical side of sharing</title> - - <para>The remainder of this chapter is devoted to the question of - serving data to your collaborators.</para> - - </sect1> - <sect1 id="sec:collab:serve"> - <title>Informal sharing with <command role="hg-cmd">hg - serve</command></title> - - <para>Mercurial's <command role="hg-cmd">hg serve</command> - command is wonderfully suited to small, tight-knit, and - fast-paced group environments. It also provides a great way to - get a feel for using Mercurial commands over a network.</para> - - <para>Run <command role="hg-cmd">hg serve</command> inside a - repository, and in under a second it will bring up a specialised - HTTP server; this will accept connections from any client, and - serve up data for that repository until you terminate it. - Anyone who knows the URL of the server you just started, and can - talk to your computer over the network, can then use a web - browser or Mercurial to read data from that repository. 
A URL - for a <command role="hg-cmd">hg serve</command> instance running - on a laptop is likely to look something like - <literal>http://my-laptop.local:8000/</literal>.</para> - - <para>The <command role="hg-cmd">hg serve</command> command is - <emphasis>not</emphasis> a general-purpose web server. It can do - only two things:</para> - <itemizedlist> - <listitem><para>Allow people to browse the history of the - repository it's serving, from their normal web - browsers.</para> - </listitem> - <listitem><para>Speak Mercurial's wire protocol, so that people - can <command role="hg-cmd">hg clone</command> or <command - role="hg-cmd">hg pull</command> changes from that - repository.</para> - </listitem></itemizedlist> - <para>In particular, <command role="hg-cmd">hg serve</command> - won't allow remote users to <emphasis>modify</emphasis> your - repository. It's intended for read-only use.</para> - - <para>If you're getting started with Mercurial, there's nothing to - prevent you from using <command role="hg-cmd">hg serve</command> - to serve up a repository on your own computer, then use commands - like <command role="hg-cmd">hg clone</command>, <command - role="hg-cmd">hg incoming</command>, and so on to talk to that - server as if the repository was hosted remotely. This can help - you to quickly get acquainted with using commands on - network-hosted repositories.</para> - - <sect2> - <title>A few things to keep in mind</title> - - <para>Because it provides unauthenticated read access to all - clients, you should only use <command role="hg-cmd">hg - serve</command> in an environment where you either don't - care, or have complete control over, who can access your - network and pull data from your repository.</para> - - <para>The <command role="hg-cmd">hg serve</command> command - knows nothing about any firewall software you might have - installed on your system or network. It cannot detect or - control your firewall software. 
If other people are unable to - talk to a running <command role="hg-cmd">hg serve</command> - instance, the second thing you should do - (<emphasis>after</emphasis> you make sure that they're using - the correct URL) is check your firewall configuration.</para> - - <para>By default, <command role="hg-cmd">hg serve</command> - listens for incoming connections on port 8000. If another - process is already listening on the port you want to use, you - can specify a different port to listen on using the <option - role="hg-opt-serve">-p</option> option.</para> - - <para>Normally, when <command role="hg-cmd">hg serve</command> - starts, it prints no output, which can be a bit unnerving. If - you'd like to confirm that it is indeed running correctly, and - find out what URL you should send to your collaborators, start - it with the <option role="hg-opt-global">-v</option> - option.</para> - - </sect2> - </sect1> - <sect1 id="sec:collab:ssh"> - <title>Using the Secure Shell (ssh) protocol</title> - - <para>You can pull and push changes securely over a network - connection using the Secure Shell (<literal>ssh</literal>) - protocol. To use this successfully, you may have to do a little - bit of configuration on the client or server sides.</para> - - <para>If you're not familiar with ssh, it's a network protocol - that lets you securely communicate with another computer. 
To - use it with Mercurial, you'll be setting up one or more user - accounts on a server so that remote users can log in and execute - commands.</para> - - <para>(If you <emphasis>are</emphasis> familiar with ssh, you'll - probably find some of the material that follows to be elementary - in nature.)</para> - - <sect2> - <title>How to read and write ssh URLs</title> - - <para>An ssh URL tends to look like this:</para> - <programlisting>ssh://bos@hg.serpentine.com:22/hg/hgbook</programlisting> - <orderedlist> - <listitem><para>The <quote><literal>ssh://</literal></quote> - part tells Mercurial to use the ssh protocol.</para> - </listitem> - <listitem><para>The <quote><literal>bos@</literal></quote> - component indicates what username to log into the server - as. You can leave this out if the remote username is the - same as your local username.</para> - </listitem> - <listitem><para>The - <quote><literal>hg.serpentine.com</literal></quote> gives - the hostname of the server to log into.</para> - </listitem> - <listitem><para>The <quote>:22</quote> identifies the port - number to connect to the server on. The default port is - 22, so you only need to specify a colon and port number if - you're <emphasis>not</emphasis> using port 22.</para> - </listitem> - <listitem><para>The remainder of the URL is the local path to - the repository on the server.</para> - </listitem></orderedlist> - - <para>There's plenty of scope for confusion with the path - component of ssh URLs, as there is no standard way for tools - to interpret it. Some programs behave differently than others - when dealing with these paths. This isn't an ideal situation, - but it's unlikely to change. Please read the following - paragraphs carefully.</para> - - <para>Mercurial treats the path to a repository on the server as - relative to the remote user's home directory. 
For example, if - user <literal>foo</literal> on the server has a home directory - of <filename class="directory">/home/foo</filename>, then an - ssh URL that contains a path component of <filename - class="directory">bar</filename> <emphasis>really</emphasis> - refers to the directory <filename - class="directory">/home/foo/bar</filename>.</para> - - <para>If you want to specify a path relative to another user's - home directory, you can use a path that starts with a tilde - character followed by the user's name (let's call them - <literal>otheruser</literal>), like this.</para> - <programlisting>ssh://server/~otheruser/hg/repo</programlisting> - - <para>And if you really want to specify an - <emphasis>absolute</emphasis> path on the server, begin the - path component with two slashes, as in this example.</para> - <programlisting>ssh://server//absolute/path</programlisting> - - </sect2> - <sect2> - <title>Finding an ssh client for your system</title> - - <para>Almost every Unix-like system comes with OpenSSH - preinstalled. If you're using such a system, run - <literal>which ssh</literal> to find out if the - <command>ssh</command> command is installed (it's usually in - <filename class="directory">/usr/bin</filename>). In the - unlikely event that it isn't present, take a look at your - system documentation to figure out how to install it.</para> - - <para>On Windows, you'll first need to download a suitable ssh - client. There are two alternatives.</para> - <itemizedlist> - <listitem><para>Simon Tatham's excellent PuTTY package - <citation>web:putty</citation> provides a complete suite - of ssh client commands.</para> - </listitem> - <listitem><para>If you have a high tolerance for pain, you can - use the Cygwin port of OpenSSH.</para> - </listitem></itemizedlist> - <para>In either case, you'll need to edit your <filename - role="special">hg.ini</filename> file to - tell Mercurial where to find the actual client command. 
For - example, if you're using PuTTY, you'll need to use the - <command>plink</command> command as a command-line ssh - client.</para> - <programlisting>[ui] -ssh = C:/path/to/plink.exe -ssh -i "C:/path/to/my/private/key"</programlisting> - - <note> - <para> The path to <command>plink</command> shouldn't contain - any whitespace characters, or Mercurial may not be able to - run it correctly (so putting it in <filename - class="directory">C:\Program Files</filename> is probably - not a good idea).</para> - </note> - - </sect2> - <sect2> - <title>Generating a key pair</title> - - <para>To avoid the need to repetitively type a password every - time you need to use your ssh client, I recommend generating a - key pair. On a Unix-like system, the - <command>ssh-keygen</command> command will do the trick. On - Windows, if you're using PuTTY, the - <command>puttygen</command> command is what you'll - need.</para> - - <para>When you generate a key pair, it's usually - <emphasis>highly</emphasis> advisable to protect it with a - passphrase. (The only time that you might not want to do this - is when you're using the ssh protocol for automated tasks on a - secure network.)</para> - - <para>Simply generating a key pair isn't enough, however. - You'll need to add the public key to the set of authorised - keys for whatever user you're logging in remotely as. For - servers using OpenSSH (the vast majority), this will mean - adding the public key to a list in a file called <filename - role="special">authorized_keys</filename> in their <filename - role="special" class="directory">.ssh</filename> - directory.</para> - - <para>On a Unix-like system, your public key will have a - <filename>.pub</filename> extension. 
If you're using - <command>puttygen</command> on Windows, you can save the - public key to a file of your choosing, or paste it from the - window it's displayed in straight into the <filename - role="special">authorized_keys</filename> file.</para> - - </sect2> - <sect2> - <title>Using an authentication agent</title> - - <para>An authentication agent is a daemon that stores - passphrases in memory (so it will forget passphrases if you - log out and log back in again). An ssh client will notice if - it's running, and query it for a passphrase. If there's no - authentication agent running, or the agent doesn't store the - necessary passphrase, you'll have to type your passphrase - every time Mercurial tries to communicate with a server on - your behalf (e.g. whenever you pull or push changes).</para> - - <para>The downside of storing passphrases in an agent is that - it's possible for a well-prepared attacker to recover the - plain text of your passphrases, in some cases even if your - system has been power-cycled. You should make your own - judgment as to whether this is an acceptable risk. It - certainly saves a lot of repeated typing.</para> - - <para>On Unix-like systems, the agent is called - <command>ssh-agent</command>, and it's often run automatically - for you when you log in. You'll need to use the - <command>ssh-add</command> command to add passphrases to the - agent's store. On Windows, if you're using PuTTY, the - <command>pageant</command> command acts as the agent. It adds - an icon to your system tray that will let you manage stored - passphrases.</para> - - </sect2> - <sect2> - <title>Configuring the server side properly</title> - - <para>Because ssh can be fiddly to set up if you're new to it, - there's a variety of things that can go wrong. Add Mercurial - on top, and there's plenty more scope for head-scratching. - Most of these potential problems occur on the server side, not - the client side. 
The good news is that once you've gotten a - configuration working, it will usually continue to work - indefinitely.</para> - - <para>Before you try using Mercurial to talk to an ssh server, - it's best to make sure that you can use the normal - <command>ssh</command> or <command>putty</command> command to - talk to the server first. If you run into problems with using - these commands directly, Mercurial surely won't work. Worse, - it will obscure the underlying problem. Any time you want to - debug ssh-related Mercurial problems, you should drop back to - making sure that plain ssh client commands work first, - <emphasis>before</emphasis> you worry about whether there's a - problem with Mercurial.</para> - - <para>The first thing to be sure of on the server side is that - you can actually log in from another machine at all. If you - can't use <command>ssh</command> or <command>putty</command> - to log in, the error message you get may give you a few hints - as to what's wrong. The most common problems are as - follows.</para> - <itemizedlist> - <listitem><para>If you get a <quote>connection refused</quote> - error, either there isn't an SSH daemon running on the - server at all, or it's inaccessible due to firewall - configuration.</para> - </listitem> - <listitem><para>If you get a <quote>no route to host</quote> - error, you either have an incorrect address for the server - or a seriously locked down firewall that won't admit its - existence at all.</para> - </listitem> - <listitem><para>If you get a <quote>permission denied</quote> - error, you may have mistyped the username on the server, - or you could have mistyped your key's passphrase or the - remote user's password.</para> - </listitem></itemizedlist> - <para>In summary, if you're having trouble talking to the - server's ssh daemon, first make sure that one is running at - all. On many systems it will be installed, but disabled, by - default. 
Once you're done with this step, you should then - check that the server's firewall is configured to allow - incoming connections on the port the ssh daemon is listening - on (usually 22). Don't worry about more exotic possibilities - for misconfiguration until you've checked these two - first.</para> - - <para>If you're using an authentication agent on the client side - to store passphrases for your keys, you ought to be able to - log into the server without being prompted for a passphrase or - a password. If you're prompted for a passphrase, there are a - few possible culprits.</para> - <itemizedlist> - <listitem><para>You might have forgotten to use - <command>ssh-add</command> or <command>pageant</command> - to store the passphrase.</para> - </listitem> - <listitem><para>You might have stored the passphrase for the - wrong key.</para> - </listitem></itemizedlist> - <para>If you're being prompted for the remote user's password, - there are another few possible problems to check.</para> - <itemizedlist> - <listitem><para>Either the user's home directory or their - <filename role="special" class="directory">.ssh</filename> - directory might have excessively liberal permissions. As - a result, the ssh daemon will not trust or read their - <filename role="special">authorized_keys</filename> file. - For example, a group-writable home or <filename - role="special" class="directory">.ssh</filename> - directory will often cause this symptom.</para> - </listitem> - <listitem><para>The user's <filename - role="special">authorized_keys</filename> file may have - a problem. 
If anyone other than the user owns or can write - to that file, the ssh daemon will not trust or read - it.</para> - </listitem></itemizedlist> - - <para>In the ideal world, you should be able to run the - following command successfully, and it should print exactly - one line of output, the current date and time.</para> - <programlisting>ssh myserver date</programlisting> - - <para>If, on your server, you have login scripts that print - banners or other junk even when running non-interactive - commands like this, you should fix them before you continue, - so that they only print output if they're run interactively. - Otherwise these banners will at least clutter up Mercurial's - output. Worse, they could potentially cause problems with - running Mercurial commands remotely. Mercurial tries to - detect and ignore banners in non-interactive - <command>ssh</command> sessions, but it is not foolproof. (If - you're editing your login scripts on your server, the usual - way to see if a login script is running in an interactive - shell is to check the return code from the command - <literal>tty -s</literal>.)</para> - - <para>Once you've verified that plain old ssh is working with - your server, the next step is to ensure that Mercurial runs on - the server. The following command should run - successfully:</para> - - <programlisting>ssh myserver hg version</programlisting> - - <para>If you see an error message instead of normal <command - role="hg-cmd">hg version</command> output, this is usually - because you haven't installed Mercurial to <filename - class="directory">/usr/bin</filename>. Don't worry if this - is the case; you don't need to do that. But you should check - for a few possible problems.</para> - <itemizedlist> - <listitem><para>Is Mercurial really installed on the server at - all? 
I know this sounds trivial, but it's worth - checking!</para> - </listitem> - <listitem><para>Maybe your shell's search path (usually set - via the <envar>PATH</envar> environment variable) is - simply misconfigured.</para> - </listitem> - <listitem><para>Perhaps your <envar>PATH</envar> environment - variable is only being set to point to the location of the - <command>hg</command> executable if the login session is - interactive. This can happen if you're setting the path - in the wrong shell login script. See your shell's - documentation for details.</para> - </listitem> - <listitem><para>The <envar>PYTHONPATH</envar> environment - variable may need to contain the path to the Mercurial - Python modules. It might not be set at all; it could be - incorrect; or it may be set only if the login is - interactive.</para> - </listitem></itemizedlist> - - <para>If you can run <command role="hg-cmd">hg version</command> - over an ssh connection, well done! You've got the server and - client sorted out. You should now be able to use Mercurial to - access repositories hosted by that username on that server. - If you run into problems with Mercurial and ssh at this point, - try using the <option role="hg-opt-global">--debug</option> - option to get a clearer picture of what's going on.</para> - - </sect2> - <sect2> - <title>Using compression with ssh</title> - - <para>Mercurial does not compress data when it uses the ssh - protocol, because the ssh protocol can transparently compress - data. However, the default behaviour of ssh clients is - <emphasis>not</emphasis> to request compression.</para> - - <para>Over any network other than a fast LAN (even a wireless - network), using compression is likely to significantly speed - up Mercurial's network operations. 
For example, over a WAN, - someone measured compression as reducing the amount of time - required to clone a particularly large repository from 51 - minutes to 17 minutes.</para> - - <para>Both <command>ssh</command> and <command>plink</command> - accept a <option role="cmd-opt-ssh">-C</option> option which - turns on compression. You can easily edit your <filename - role="special">~/.hgrc</filename> to enable compression for - all of Mercurial's uses of the ssh protocol.</para> - <programlisting>[ui] -ssh = ssh -C</programlisting> - - <para>If you use <command>ssh</command>, you can configure it to - always use compression when talking to your server. To do - this, edit your <filename - role="special">.ssh/config</filename> file (which may not - yet exist), as follows.</para> - <programlisting>Host hg - Compression yes - HostName hg.example.com</programlisting> - <para>This defines an alias, <literal>hg</literal>. When you - use it on the <command>ssh</command> command line or in a - Mercurial <literal>ssh</literal>-protocol URL, it will cause - <command>ssh</command> to connect to - <literal>hg.example.com</literal> and use compression. This - gives you both a shorter name to type and compression, each of - which is a good thing in its own right.</para> - - </sect2> - </sect1> - <sect1 id="sec:collab:cgi"> - <title>Serving over HTTP using CGI</title> - - <para>Depending on how ambitious you are, configuring Mercurial's - CGI interface can take anything from a few moments to several - hours.</para> - - <para>We'll begin with the simplest of examples, and work our way - towards a more complex configuration. Even for the most basic - case, you're almost certainly going to need to read and modify - your web server's configuration.</para> - - <note> - <para> Configuring a web server is a complex, fiddly, and - highly system-dependent activity. I can't possibly give you - instructions that will cover anything like all of the cases - you will encounter. 
Please use your discretion and judgment in - following the sections below. Be prepared to make plenty of - mistakes, and to spend a lot of time reading your server's - error logs.</para> - </note> - - <sect2> - <title>Web server configuration checklist</title> - - <para>Before you continue, do take a few moments to check a few - aspects of your system's setup.</para> - - <orderedlist> - <listitem><para>Do you have a web server installed at all? - Mac OS X ships with Apache, but many other systems may not - have a web server installed.</para> - </listitem> - <listitem><para>If you have a web server installed, is it - actually running? On most systems, even if one is - present, it will be disabled by default.</para> - </listitem> - <listitem><para>Is your server configured to allow you to run - CGI programs in the directory where you plan to do so? - Most servers default to explicitly disabling the ability - to run CGI programs.</para> - </listitem></orderedlist> - - <para>If you don't have a web server installed, and don't have - substantial experience configuring Apache, you should consider - using the <literal>lighttpd</literal> web server instead of - Apache. Apache has a well-deserved reputation for baroque and - confusing configuration. While <literal>lighttpd</literal> is - less capable in some ways than Apache, most of these - capabilities are not relevant to serving Mercurial - repositories. And <literal>lighttpd</literal> is undeniably - <emphasis>much</emphasis> easier to get started with than - Apache.</para> - - </sect2> - <sect2> - <title>Basic CGI configuration</title> - - <para>On Unix-like systems, it's common for users to have a - subdirectory named something like <filename - class="directory">public_html</filename> in their home - directory, from which they can serve up web pages. 
A file - named <filename>foo</filename> in this directory will be - accessible at a URL of the form - <literal>http://www.example.com/username/foo</literal>.</para> - - <para>To get started, find the <filename - role="special">hgweb.cgi</filename> script that should be - present in your Mercurial installation. If you can't quickly - find a local copy on your system, simply download one from the - master Mercurial repository at <ulink - url="http://www.selenic.com/repo/hg/raw-file/tip/hgweb.cgi">http://www.selenic.com/repo/hg/raw-file/tip/hgweb.cgi</ulink>.</para> - - <para>You'll need to copy this script into your <filename - class="directory">public_html</filename> directory, and - ensure that it's executable.</para> - <programlisting>cp .../hgweb.cgi ~/public_html -chmod 755 ~/public_html/hgweb.cgi</programlisting> - <para>The <literal>755</literal> argument to - <command>chmod</command> is a little more general than just - making the script executable: it ensures that the script is - executable by anyone, and that <quote>group</quote> and - <quote>other</quote> write permissions are - <emphasis>not</emphasis> set. If you were to leave those - write permissions enabled, Apache's <literal>suexec</literal> - subsystem would likely refuse to execute the script. In fact, - <literal>suexec</literal> also insists that the - <emphasis>directory</emphasis> in which the script resides - must not be writable by others.</para> - <programlisting>chmod 755 ~/public_html</programlisting> - - <sect3 id="sec:collab:wtf"> - <title>What could <emphasis>possibly</emphasis> go - wrong?</title> - - <para>Once you've copied the CGI script into place, go into a - web browser, and try to open the URL <ulink - url="http://myhostname/ - myuser/hgweb.cgi">http://myhostname/ - myuser/hgweb.cgi</ulink>, <emphasis>but</emphasis> brace - yourself for instant failure. There's a high probability - that trying to visit this URL will fail, and there are many - possible reasons for this. 
In fact, you're likely to - stumble over almost every one of the possible errors below, - so please read carefully. The following are all of the - problems I ran into on a system running Fedora 7, with a - fresh installation of Apache, and a user account that I - created specially to perform this exercise.</para> - - <para>Your web server may have per-user directories disabled. - If you're using Apache, search your config file for a - <literal>UserDir</literal> directive. If there's none - present, per-user directories will be disabled. If one - exists, but its value is <literal>disabled</literal>, then - per-user directories will be disabled. Otherwise, the - string after <literal>UserDir</literal> gives the name of - the subdirectory that Apache will look in under your home - directory, for example <filename - class="directory">public_html</filename>.</para> - - <para>Your file access permissions may be too restrictive. - The web server must be able to traverse your home directory - and directories under your <filename - class="directory">public_html</filename> directory, and - read files under the latter too. Here's a quick recipe to - help you to make your permissions more appropriate.</para> - <programlisting>chmod 755 ~ -find ~/public_html -type d -print0 | xargs -0r chmod 755 -find ~/public_html -type f -print0 | xargs -0r chmod 644</programlisting> - - <para>The other possibility with permissions is that you might - get a completely empty window when you try to load the - script. In this case, it's likely that your access - permissions are <emphasis>too permissive</emphasis>. Apache's - <literal>suexec</literal> subsystem won't execute a script - that's group- or world-writable, for example.</para> - - <para>Your web server may be configured to disallow execution - of CGI programs in your per-user web directory. 
Here's - Apache's default per-user configuration from my Fedora - system.</para> - - &ch06-apache-config.lst; - - <para>If you find a similar-looking - <literal>Directory</literal> group in your Apache - configuration, the directive to look at inside it is - <literal>Options</literal>. Add <literal>ExecCGI</literal> - to the end of this list if it's missing, and restart the web - server.</para> - - <para>If you find that Apache serves you the text of the CGI - script instead of executing it, you may need to either - uncomment (if already present) or add a directive like - this.</para> - <programlisting>AddHandler cgi-script .cgi</programlisting> - - <para>The next possibility is that you might be served with a - colourful Python backtrace claiming that it can't import a - <literal>mercurial</literal>-related module. This is - actually progress! The server is now capable of executing - your CGI script. This error is only likely to occur if - you're running a private installation of Mercurial, instead - of a system-wide version. Remember that the web server runs - the CGI program without any of the environment variables - that you take for granted in an interactive session. If - this error happens to you, edit your copy of <filename - role="special">hgweb.cgi</filename> and follow the - directions inside it to correctly set your - <envar>PYTHONPATH</envar> environment variable.</para> - - <para>Finally, you are <emphasis>certain</emphasis> to be - served with another colourful Python backtrace: this one - will complain that it can't find <filename - class="directory">/path/to/repository</filename>. Edit - your <filename role="special">hgweb.cgi</filename> script - and replace the <filename - class="directory">/path/to/repository</filename> string - with the complete path to the repository you want to serve - up.</para> - - <para>At this point, when you try to reload the page, you - should be presented with a nice HTML view of your - repository's history. 
Whew!</para> - - </sect3> - <sect3> - <title>Configuring lighttpd</title> - - <para>To be exhaustive in my experiments, I tried configuring - the increasingly popular <literal>lighttpd</literal> web - server to serve the same repository as I described with - Apache above. I had already overcome all of the problems I - outlined with Apache, many of which are not server-specific. - As a result, I was fairly sure that my file and directory - permissions were good, and that my <filename - role="special">hgweb.cgi</filename> script was properly - edited.</para> - - <para>Once I had Apache running, getting - <literal>lighttpd</literal> to serve the repository was a - snap (in other words, even if you're trying to use - <literal>lighttpd</literal>, you should read the Apache - section). I first had to edit the - <literal>mod_access</literal> section of its config file to - enable <literal>mod_cgi</literal> and - <literal>mod_userdir</literal>, both of which were disabled - by default on my system. I then added a few lines to the - end of the config file, to configure these modules.</para> - <programlisting>userdir.path = "public_html" -cgi.assign = (".cgi" => "" )</programlisting> - <para>With this done, <literal>lighttpd</literal> ran - immediately for me. If I had configured - <literal>lighttpd</literal> before Apache, I'd almost - certainly have run into many of the same system-level - configuration problems as I did with Apache. However, I - found <literal>lighttpd</literal> to be noticeably easier to - configure than Apache, even though I've used Apache for over - a decade, and this was my first exposure to - <literal>lighttpd</literal>.</para> - - </sect3> - </sect2> - <sect2> - <title>Sharing multiple repositories with one CGI script</title> - - <para>The <filename role="special">hgweb.cgi</filename> script - only lets you publish a single repository, which is an - annoying restriction. 
If you want to publish more than one - without wracking yourself with multiple copies of the same - script, each with different names, a better choice is to use - the <filename role="special">hgwebdir.cgi</filename> - script.</para> - - <para>The procedure to configure <filename - role="special">hgwebdir.cgi</filename> is only a little more - involved than for <filename - role="special">hgweb.cgi</filename>. First, you must obtain - a copy of the script. If you don't have one handy, you can - download a copy from the master Mercurial repository at <ulink - url="http://www.selenic.com/repo/hg/raw-file/tip/hgwebdir.cgi">http://www.selenic.com/repo/hg/raw-file/tip/hgwebdir.cgi</ulink>.</para> - - <para>You'll need to copy this script into your <filename - class="directory">public_html</filename> directory, and - ensure that it's executable.</para> - <programlisting>cp .../hgwebdir.cgi ~/public_html -chmod 755 ~/public_html ~/public_html/hgwebdir.cgi</programlisting> - <para>With basic configuration out of the way, try to visit - <ulink url="http://myhostname/ - myuser/hgwebdir.cgi">http://myhostname/ - myuser/hgwebdir.cgi</ulink> in your browser. It should - display an empty list of repositories. If you get a blank - window or error message, try walking through the list of - potential problems in section <xref - linkend="sec:collab:wtf"/>.</para> - - <para>The <filename role="special">hgwebdir.cgi</filename> - script relies on an external configuration file. By default, - it searches for a file named <filename - role="special">hgweb.config</filename> in the same directory - as itself. You'll need to create this file, and make it - world-readable. 
The format of the file is similar to a - Windows <quote>ini</quote> file, as understood by Python's - <literal>ConfigParser</literal> - <citation>web:configparser</citation> module.</para> - - <para>The easiest way to configure <filename - role="special">hgwebdir.cgi</filename> is with a section - named <literal>collections</literal>. This will automatically - publish <emphasis>every</emphasis> repository under the - directories you name. The section should look like - this:</para> - <programlisting>[collections] -/my/root = /my/root</programlisting> - <para>Mercurial interprets this by looking at the directory name - on the <emphasis>right</emphasis> hand side of the - <quote><literal>=</literal></quote> sign; finding repositories - in that directory hierarchy; and using the text on the - <emphasis>left</emphasis> to strip off matching text from the - names it will actually list in the web interface. The - remaining component of a path after this stripping has - occurred is called a <quote>virtual path</quote>.</para> - - <para>Given the example above, if we have a repository whose - local path is <filename - class="directory">/my/root/this/repo</filename>, the CGI - script will strip the leading <filename - class="directory">/my/root</filename> from the name, and - publish the repository with a virtual path of <filename - class="directory">this/repo</filename>. 
If the base URL for - our CGI script is <ulink url="http://myhostname/ - myuser/hgwebdir.cgi">http://myhostname/ - myuser/hgwebdir.cgi</ulink>, the complete URL for that - repository will be <ulink url="http://myhostname/ - myuser/hgwebdir.cgi/this/repo">http://myhostname/ - myuser/hgwebdir.cgi/this/repo</ulink>.</para> - - <para>If we replace <filename - class="directory">/my/root</filename> on the left hand side - of this example with <filename - class="directory">/my</filename>, then <filename - role="special">hgwebdir.cgi</filename> will only strip off - <filename class="directory">/my</filename> from the repository - name, and will give us a virtual path of <filename - class="directory">root/this/repo</filename> instead of - <filename class="directory">this/repo</filename>.</para> - - <para>The <filename role="special">hgwebdir.cgi</filename> - script will recursively search each directory listed in the - <literal>collections</literal> section of its configuration - file, but it will <emphasis>not</emphasis> recurse into the - repositories it finds.</para> - - <para>The <literal>collections</literal> mechanism makes it easy - to publish many repositories in a <quote>fire and - forget</quote> manner. You only need to set up the CGI - script and configuration file one time. Afterwards, you can - publish or unpublish a repository at any time by simply moving - it into, or out of, the directory hierarchy in which you've - configured <filename role="special">hgwebdir.cgi</filename> to - look.</para> - - <sect3> - <title>Explicitly specifying which repositories to - publish</title> - - <para>In addition to the <literal>collections</literal> - mechanism, the <filename - role="special">hgwebdir.cgi</filename> script allows you - to publish a specific list of repositories. 
To do so, - create a <literal>paths</literal> section, with contents of - the following form.</para> - <programlisting>[paths] -repo1 = /my/path/to/some/repo -repo2 = /some/path/to/another</programlisting> - <para>In this case, the virtual path (the component that will - appear in a URL) is on the left hand side of each - definition, while the path to the repository is on the - right. Notice that there does not need to be any - relationship between the virtual path you choose and the - location of a repository in your filesystem.</para> - - <para>If you wish, you can use both the - <literal>collections</literal> and <literal>paths</literal> - mechanisms simultaneously in a single configuration - file.</para> - - <note> - <para> If multiple repositories have the same virtual path, - <filename role="special">hgwebdir.cgi</filename> will not - report an error. Instead, it will behave - unpredictably.</para> - </note> - - </sect3> - </sect2> - <sect2> - <title>Downloading source archives</title> - - <para>Mercurial's web interface lets users download an archive - of any revision. This archive will contain a snapshot of the - working directory as of that revision, but it will not contain - a copy of the repository data.</para> - - <para>By default, this feature is not enabled. To enable it, - you'll need to add an <envar - role="rc-item-web">allow_archive</envar> item to the - <literal role="rc-web">web</literal> section of your <filename - role="special">~/.hgrc</filename>.</para> - - </sect2> - <sect2> - <title>Web configuration options</title> - - <para>Mercurial's web interfaces (the <command role="hg-cmd">hg - serve</command> command, and the <filename - role="special">hgweb.cgi</filename> and <filename - role="special">hgwebdir.cgi</filename> scripts) have a - number of configuration options that you can set. 
These - belong in a section named <literal - role="rc-web">web</literal>.</para> - <itemizedlist> - <listitem><para><envar - role="rc-item-web">allow_archive</envar>: Determines - which (if any) archive download mechanisms Mercurial - supports. If you enable this feature, users of the web - interface will be able to download an archive of whatever - revision of a repository they are viewing. To enable the - archive feature, this item must take the form of a - sequence of words drawn from the list below.</para> - <itemizedlist> - <listitem><para><literal>bz2</literal>: A - <command>tar</command> archive, compressed using - <literal>bzip2</literal> compression. This has the - best compression ratio, but uses the most CPU time on - the server.</para> - </listitem> - <listitem><para><literal>gz</literal>: A - <command>tar</command> archive, compressed using - <literal>gzip</literal> compression.</para> - </listitem> - <listitem><para><literal>zip</literal>: A - <command>zip</command> archive, compressed using DEFLATE - compression. This format has the worst compression - ratio, but is widely used in the Windows world.</para> - </listitem> - </itemizedlist> - <para> If you provide an empty list, or don't have an - <envar role="rc-item-web">allow_archive</envar> entry at - all, this feature will be disabled. Here is an example of - how to enable all three supported formats.</para> - <programlisting>[web] -allow_archive = bz2 gz zip</programlisting> - </listitem> - <listitem><para><envar role="rc-item-web">allowpull</envar>: - Boolean. Determines whether the web interface allows - remote users to <command role="hg-cmd">hg pull</command> - and <command role="hg-cmd">hg clone</command> this - repository over HTTP. If set to <literal>no</literal> or - <literal>false</literal>, only the - <quote>human-oriented</quote> portion of the web interface - is available.</para> - </listitem> - <listitem><para><envar role="rc-item-web">contact</envar>: - String. 
A free-form (but preferably brief) string - identifying the person or group in charge of the - repository. This often contains the name and email - address of a person or mailing list. It often makes sense - to place this entry in a repository's own <filename - role="special">.hg/hgrc</filename> file, but it can make - sense to use in a global <filename - role="special">~/.hgrc</filename> if every repository - has a single maintainer.</para> - </listitem> - <listitem><para><envar role="rc-item-web">maxchanges</envar>: - Integer. The default maximum number of changesets to - display in a single page of output.</para> - </listitem> - <listitem><para><envar role="rc-item-web">maxfiles</envar>: - Integer. The default maximum number of modified files to - display in a single page of output.</para> - </listitem> - <listitem><para><envar role="rc-item-web">stripes</envar>: - Integer. If the web interface displays alternating - <quote>stripes</quote> to make it easier to visually align - rows when you are looking at a table, this number controls - the number of rows in each stripe.</para> - </listitem> - <listitem><para><envar role="rc-item-web">style</envar>: - Controls the template Mercurial uses to display the web - interface. Mercurial ships with two web templates, named - <literal>default</literal> and <literal>gitweb</literal> - (the latter is much more visually attractive). You can - also specify a custom template of your own; see chapter - <xref linkend="chap:template"/> for details. - Here, you can see how to enable the - <literal>gitweb</literal> style.</para> - <programlisting>[web] -style = gitweb</programlisting> - </listitem> - <listitem><para><envar role="rc-item-web">templates</envar>: - Path. The directory in which to search for template - files. 
By default, Mercurial searches in the directory in - which it was installed.</para> - </listitem></itemizedlist> - <para>If you are using <filename - role="special">hgwebdir.cgi</filename>, you can place a few - configuration items in a <literal role="rc-web">web</literal> - section of the <filename - role="special">hgweb.config</filename> file instead of a - <filename role="special">~/.hgrc</filename> file, for - convenience. These items are <envar - role="rc-item-web">motd</envar> and <envar - role="rc-item-web">style</envar>.</para> - - <sect3> - <title>Options specific to an individual repository</title> - - <para>A few <literal role="rc-web">web</literal> configuration - items ought to be placed in a repository's local <filename - role="special">.hg/hgrc</filename>, rather than a user's - or global <filename role="special">~/.hgrc</filename>.</para> - <itemizedlist> - <listitem><para><envar - role="rc-item-web">description</envar>: String. A - free-form (but preferably brief) string that describes - the contents or purpose of the repository.</para> - </listitem> - <listitem><para><envar role="rc-item-web">name</envar>: - String. The name to use for the repository in the web - interface. This overrides the default name, which is - the last component of the repository's path.</para> - </listitem></itemizedlist> - - </sect3> - <sect3> - <title>Options specific to the <command role="hg-cmd">hg - serve</command> command</title> - - <para>Some of the items in the <literal - role="rc-web">web</literal> section of a <filename - role="special">~/.hgrc</filename> file are only for use - with the <command role="hg-cmd">hg serve</command> - command.</para> - <itemizedlist> - <listitem><para><envar role="rc-item-web">accesslog</envar>: - Path. The name of a file into which to write an access - log. By default, the <command role="hg-cmd">hg - serve</command> command writes this information to - standard output, not to a file. 
Log entries are written - in the standard <quote>combined</quote> file format used - by almost all web servers.</para> - </listitem> - <listitem><para><envar role="rc-item-web">address</envar>: - String. The local address on which the server should - listen for incoming connections. By default, the server - listens on all addresses.</para> - </listitem> - <listitem><para><envar role="rc-item-web">errorlog</envar>: - Path. The name of a file into which to write an error - log. By default, the <command role="hg-cmd">hg - serve</command> command writes this information to - standard error, not to a file.</para> - </listitem> - <listitem><para><envar role="rc-item-web">ipv6</envar>: - Boolean. Whether to use the IPv6 protocol. By default, - IPv6 is not used.</para> - </listitem> - <listitem><para><envar role="rc-item-web">port</envar>: - Integer. The TCP port number on which the server should - listen. The default port number used is 8000.</para> - </listitem></itemizedlist> - - </sect3> - <sect3> - <title>Choosing the right <filename - role="special">~/.hgrc</filename> file to add <literal - role="rc-web">web</literal> items to</title> - - <para>It is important to remember that a web server like - Apache or <literal>lighttpd</literal> will run under a user - ID that is different to yours. CGI scripts run by your - server, such as <filename - role="special">hgweb.cgi</filename>, will usually also run - under that user ID.</para> - - <para>If you add <literal role="rc-web">web</literal> items to - your own personal <filename role="special">~/.hgrc</filename> file, CGI scripts won't read that - <filename role="special">~/.hgrc</filename> file. Those - settings will thus only affect the behaviour of the <command - role="hg-cmd">hg serve</command> command when you run it. 
- To cause CGI scripts to see your settings, either create a - <filename role="special">~/.hgrc</filename> file in the - home directory of the user ID that runs your web server, or - add those settings to a system-wide <filename - role="special">hgrc</filename> file.</para> - - - </sect3> - </sect2> - </sect1> -</chapter> - -<!-- -local variables: -sgml-parent-document: ("00book.xml" "book" "chapter") -end: --->
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch06-filenames.xml Thu Mar 19 20:54:12 2009 -0700 @@ -0,0 +1,408 @@ +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> + +<chapter id="chap:names"> + <?dbhtml filename="file-names-and-pattern-matching.html"?> + <title>File names and pattern matching</title> + + <para>Mercurial provides mechanisms that let you work with file + names in a consistent and expressive way.</para> + + <sect1> + <title>Simple file naming</title> + + <para>Mercurial uses a unified piece of machinery <quote>under the + hood</quote> to handle file names. Every command behaves + uniformly with respect to file names. The way in which commands + work with file names is as follows.</para> + + <para>If you explicitly name real files on the command line, + Mercurial works with exactly those files, as you would expect. + &interaction.filenames.files;</para> + + <para>When you provide a directory name, Mercurial will interpret + this as <quote>operate on every file in this directory and its + subdirectories</quote>. Mercurial traverses the files and + subdirectories in a directory in alphabetical order. When it + encounters a subdirectory, it will traverse that subdirectory + before continuing with the current directory.</para> + + &interaction.filenames.dirs; + + </sect1> + <sect1> + <title>Running commands without any file names</title> + + <para>Mercurial's commands that work with file names have useful + default behaviours when you invoke them without providing any + file names or patterns. What kind of behaviour you should + expect depends on what the command does. Here are a few rules + of thumb you can use to predict what a command is likely to do + if you don't give it any names to work with.</para> + <itemizedlist> + <listitem><para>Most commands will operate on the entire working + directory. 
This is what the <command role="hg-cmd">hg + add</command> command does, for example.</para> + </listitem> + <listitem><para>If the command has effects that are difficult or + impossible to reverse, it will force you to explicitly + provide at least one name or pattern (see below). This + protects you from accidentally deleting files by running + <command role="hg-cmd">hg remove</command> with no + arguments, for example.</para> + </listitem></itemizedlist> + + <para>It's easy to work around these default behaviours if they + don't suit you. If a command normally operates on the whole + working directory, you can invoke it on just the current + directory and its subdirectories by giving it the name + <quote><filename class="directory">.</filename></quote>.</para> + + &interaction.filenames.wdir-subdir; + + <para>Along the same lines, some commands normally print file + names relative to the root of the repository, even if you're + invoking them from a subdirectory. Such a command will print + file names relative to your subdirectory if you give it explicit + names. Here, we're going to run <command role="hg-cmd">hg + status</command> from a subdirectory, and get it to operate on + the entire working directory while printing file names relative + to our subdirectory, by passing it the output of the <command + role="hg-cmd">hg root</command> command.</para> + + &interaction.filenames.wdir-relname; + + </sect1> + <sect1> + <title>Telling you what's going on</title> + + <para>The <command role="hg-cmd">hg add</command> example in the + preceding section illustrates something else that's helpful + about Mercurial commands. If a command operates on a file that + you didn't name explicitly on the command line, it will usually + print the name of the file, so that you will not be surprised + by what's going on.</para> + + <para>The principle here is of <emphasis>least + surprise</emphasis>. 
If you've exactly named a file on the + command line, there's no point in repeating it back at you. If + Mercurial is acting on a file <emphasis>implicitly</emphasis>, + because you provided no names, or a directory, or a pattern (see + below), it's safest to tell you what it's doing.</para> + + <para>For commands that behave this way, you can silence them + using the <option role="hg-opt-global">-q</option> option. You + can also get them to print the name of every file, even those + you've named explicitly, using the <option + role="hg-opt-global">-v</option> option.</para> + + </sect1> + <sect1> + <title>Using patterns to identify files</title> + + <para>In addition to working with file and directory names, + Mercurial lets you use <emphasis>patterns</emphasis> to identify + files. Mercurial's pattern handling is expressive.</para> + + <para>On Unix-like systems (Linux, MacOS, etc.), the job of + matching file names to patterns normally falls to the shell. On + these systems, you must explicitly tell Mercurial that a name is + a pattern. On Windows, the shell does not expand patterns, so + Mercurial will automatically identify names that are patterns, + and expand them for you.</para> + + <para>To provide a pattern in place of a regular name on the + command line, the mechanism is simple:</para> + <programlisting>syntax:patternbody</programlisting> + <para>That is, a pattern is identified by a short text string that + says what kind of pattern this is, followed by a colon, followed + by the actual pattern.</para> + + <para>Mercurial supports two kinds of pattern syntax. The most + frequently used is called <literal>glob</literal>; this is the + same kind of pattern matching used by the Unix shell, and should + be familiar to Windows command prompt users, too.</para> + + <para>When Mercurial does automatic pattern matching on Windows, + it uses <literal>glob</literal> syntax. 
You can thus omit the + <quote><literal>glob:</literal></quote> prefix on Windows, but + it's safe to use it, too.</para> + + <para>The <literal>re</literal> syntax is more powerful; it lets + you specify patterns using regular expressions, also known as + regexps.</para> + + <para>By the way, in the examples that follow, notice that I'm + careful to wrap all of my patterns in quote characters, so that + they won't get expanded by the shell before Mercurial sees + them.</para> + + <sect2> + <title>Shell-style <literal>glob</literal> patterns</title> + + <para>This is an overview of the kinds of patterns you can use + when you're matching on glob patterns.</para> + + <para>The <quote><literal>*</literal></quote> character matches + any string, within a single directory.</para> + + &interaction.filenames.glob.star; + + <para>The <quote><literal>**</literal></quote> pattern matches + any string, and crosses directory boundaries. It's not a + standard Unix glob token, but it's accepted by several popular + Unix shells, and is very useful.</para> + + &interaction.filenames.glob.starstar; + + <para>The <quote><literal>?</literal></quote> pattern matches + any single character.</para> + + &interaction.filenames.glob.question; + + <para>The <quote><literal>[</literal></quote> character begins a + <emphasis>character class</emphasis>. This matches any single + character within the class. The class ends with a + <quote><literal>]</literal></quote> character. 
A class may + contain multiple <emphasis>range</emphasis>s of the form + <quote><literal>a-f</literal></quote>, which is shorthand for + <quote><literal>abcdef</literal></quote>.</para> + + &interaction.filenames.glob.range; + + <para>If the first character after the + <quote><literal>[</literal></quote> in a character class is a + <quote><literal>!</literal></quote>, it + <emphasis>negates</emphasis> the class, making it match any + single character not in the class.</para> + + <para>A <quote><literal>{</literal></quote> begins a group of + subpatterns, where the whole group matches if any subpattern + in the group matches. The <quote><literal>,</literal></quote> + character separates subpatterns, and + <quote><literal>}</literal></quote> ends the group.</para> + + &interaction.filenames.glob.group; + + <sect3> + <title>Watch out!</title> + + <para>Don't forget that if you want to match a pattern in any + directory, you should not be using the + <quote><literal>*</literal></quote> match-any token, as this + will only match within one directory. Instead, use the + <quote><literal>**</literal></quote> token. This small + example illustrates the difference between the two.</para> + + &interaction.filenames.glob.star-starstar; + + </sect3> + </sect2> + <sect2> + <title>Regular expression matching with <literal>re</literal> + patterns</title> + + <para>Mercurial accepts the same regular expression syntax as + the Python programming language (it uses Python's regexp + engine internally). This is based on the Perl language's + regexp syntax, which is the most popular dialect in use (it's + also used in Java, for example).</para> + + <para>I won't discuss Mercurial's regexp dialect in any detail + here, as regexps are not often used. Perl-style regexps are + in any case already exhaustively documented on a multitude of + web sites, and in many books. 
Instead, I will focus here on a + few things you should know if you find yourself needing to use + regexps with Mercurial.</para> + + <para>A regexp is matched against an entire file name, relative + to the root of the repository. In other words, even if you're + already in the subdirectory <filename + class="directory">foo</filename>, if you want to match files + under this directory, your pattern must start with + <quote><literal>foo/</literal></quote>.</para> + + <para>One thing to note, if you're familiar with Perl-style + regexps, is that Mercurial's are <emphasis>rooted</emphasis>. + That is, a regexp starts matching against the beginning of a + string; it doesn't look for a match anywhere within the + string. To match anywhere in a string, start your pattern + with <quote><literal>.*</literal></quote>.</para> + + </sect2> + </sect1> + <sect1> + <title>Filtering files</title> + + <para>Not only does Mercurial give you a variety of ways to + specify files; it lets you further winnow those files using + <emphasis>filters</emphasis>. Commands that work with file + names accept two filtering options.</para> + <itemizedlist> + <listitem><para><option role="hg-opt-global">-I</option>, or + <option role="hg-opt-global">--include</option>, lets you + specify a pattern that file names must match in order to be + processed.</para> + </listitem> + <listitem><para><option role="hg-opt-global">-X</option>, or + <option role="hg-opt-global">--exclude</option>, gives you a + way to <emphasis>avoid</emphasis> processing files, if they + match this pattern.</para> + </listitem></itemizedlist> + <para>You can provide multiple <option + role="hg-opt-global">-I</option> and <option + role="hg-opt-global">-X</option> options on the command line, + and intermix them as you please. 
Mercurial interprets the + patterns you provide using glob syntax by default (but you can + use regexps if you need to).</para> + + <para>You can read a <option role="hg-opt-global">-I</option> + filter as <quote>process only the files that match this + filter</quote>.</para> + + &interaction.filenames.filter.include; + + <para>The <option role="hg-opt-global">-X</option> filter is best + read as <quote>process only the files that don't match this + pattern</quote>.</para> + + &interaction.filenames.filter.exclude; + + </sect1> + <sect1> + <title>Ignoring unwanted files and directories</title> + + <para>XXX.</para> + + </sect1> + <sect1 id="sec:names:case"> + <title>Case sensitivity</title> + + <para>If you're working in a mixed development environment that + contains both Linux (or other Unix) systems and Macs or Windows + systems, you should keep in the back of your mind the knowledge + that they treat the case (<quote>N</quote> versus + <quote>n</quote>) of file names in incompatible ways. This is + not very likely to affect you, and it's easy to deal with if it + does, but it could surprise you if you don't know about + it.</para> + + <para>Operating systems and filesystems differ in the way they + handle the <emphasis>case</emphasis> of characters in file and + directory names. There are three common ways to handle case in + names.</para> + <itemizedlist> + <listitem><para>Completely case insensitive. Uppercase and + lowercase versions of a letter are treated as identical, + both when creating a file and during subsequent accesses. + This is common on older DOS-based systems.</para> + </listitem> + <listitem><para>Case preserving, but insensitive. When a file + or directory is created, the case of its name is stored, and + can be retrieved and displayed by the operating system. + When an existing file is being looked up, its case is + ignored. This is the standard arrangement on Windows and + MacOS. 
The names <filename>foo</filename> and + <filename>FoO</filename> identify the same file. This + treatment of uppercase and lowercase letters as + interchangeable is also referred to as <emphasis>case + folding</emphasis>.</para> + </listitem> + <listitem><para>Case sensitive. The case of a name is + significant at all times. The names <filename>foo</filename> + and <filename>FoO</filename> identify different files. This is the way Linux + and Unix systems normally work.</para> + </listitem></itemizedlist> + + <para>On Unix-like systems, it is possible to have any or all of + the above ways of handling case in action at once. For example, + if you use a USB thumb drive formatted with a FAT32 filesystem + on a Linux system, Linux will handle names on that filesystem in + a case preserving, but insensitive, way.</para> + + <sect2> + <title>Safe, portable repository storage</title> + + <para>Mercurial's repository storage mechanism is <emphasis>case + safe</emphasis>. It translates file names so that they can + be safely stored on both case sensitive and case insensitive + filesystems. This means that you can use normal file copying + tools to transfer a Mercurial repository onto, for example, a + USB thumb drive, and safely move that drive and repository + back and forth between a Mac, a PC running Windows, and a + Linux box.</para> + + </sect2> + <sect2> + <title>Detecting case conflicts</title> + + <para>When operating in the working directory, Mercurial honours + the naming policy of the filesystem where the working + directory is located. If the filesystem is case preserving, + but insensitive, Mercurial will treat names that differ only + in case as the same.</para> + + <para>An important aspect of this approach is that it is + possible to commit a changeset on a case sensitive (typically + Linux or Unix) filesystem that will cause trouble for users on + case insensitive (usually Windows and MacOS) filesystems. 
If a + Linux user commits changes to two files, one named + <filename>myfile.c</filename> and the other named + <filename>MyFile.C</filename>, they will be stored correctly + in the repository. And in the working directories of other + Linux users, they will be correctly represented as separate + files.</para> + + <para>If a Windows or Mac user pulls this change, they will not + initially have a problem, because Mercurial's repository + storage mechanism is case safe. However, once they try to + <command role="hg-cmd">hg update</command> the working + directory to that changeset, or <command role="hg-cmd">hg + merge</command> with that changeset, Mercurial will spot the + conflict between the two file names that the filesystem would + treat as the same, and forbid the update or merge from + occurring.</para> + + </sect2> + <sect2> + <title>Fixing a case conflict</title> + + <para>If you are using Windows or a Mac in a mixed environment + where some of your collaborators are using Linux or Unix, and + Mercurial reports a case folding conflict when you try to + <command role="hg-cmd">hg update</command> or <command + role="hg-cmd">hg merge</command>, the procedure to fix the + problem is simple.</para> + + <para>Just find a nearby Linux or Unix box, clone the problem + repository onto it, and use Mercurial's <command + role="hg-cmd">hg rename</command> command to change the + names of any offending files or directories so that they will + no longer cause case folding conflicts. 
Commit this change, + <command role="hg-cmd">hg pull</command> or <command + role="hg-cmd">hg push</command> it across to your Windows or + MacOS system, and <command role="hg-cmd">hg update</command> + to the revision with the non-conflicting names.</para> + + <para>The changeset with case-conflicting names will remain in + your project's history, and you still won't be able to + <command role="hg-cmd">hg update</command> your working + directory to that changeset on a Windows or MacOS system, but + you can continue development unimpeded.</para> + + <note> + <para> Prior to version 0.9.3, Mercurial did not use a case + safe repository storage mechanism, and did not detect case + folding conflicts. If you are using an older version of + Mercurial on Windows or MacOS, I strongly recommend that you + upgrade.</para> + </note> + + </sect2> + </sect1> +</chapter> + +<!-- +local variables: +sgml-parent-document: ("00book.xml" "book" "chapter") +end: +-->
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch07-branch.xml Thu Mar 19 20:54:12 2009 -0700 @@ -0,0 +1,533 @@ +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> + +<chapter id="chap:branch"> + <?dbhtml filename="managing-releases-and-branchy-development.html"?> + <title>Managing releases and branchy development</title> + + <para>Mercurial provides several mechanisms for you to manage a + project that is making progress on multiple fronts at once. To + understand these mechanisms, let's first take a brief look at a + fairly normal software project structure.</para> + + <para>Many software projects issue periodic <quote>major</quote> + releases that contain substantial new features. In parallel, they + may issue <quote>minor</quote> releases. These are usually + identical to the major releases off which they're based, but with + a few bugs fixed.</para> + + <para>In this chapter, we'll start by talking about how to keep + records of project milestones such as releases. We'll then + continue on to talk about the flow of work between different + phases of a project, and how Mercurial can help you to isolate and + manage this work.</para> + + <sect1> + <title>Giving a persistent name to a revision</title> + + <para>Once you decide that you'd like to call a particular + revision a <quote>release</quote>, it's a good idea to record + the identity of that revision. This will let you reproduce that + release at a later date, for whatever purpose you might need at + the time (reproducing a bug, porting to a new platform, etc). + &interaction.tag.init;</para> + + <para>Mercurial lets you give a permanent name to any revision + using the <command role="hg-cmd">hg tag</command> command. Not + surprisingly, these names are called <quote>tags</quote>.</para> + + &interaction.tag.tag; + + <para>A tag is nothing more than a <quote>symbolic name</quote> + for a revision. 
Tags exist purely for your convenience, so that + you have a handy permanent way to refer to a revision; Mercurial + doesn't interpret the tag names you use in any way. Neither + does Mercurial place any restrictions on the name of a tag, + beyond a few that are necessary to ensure that a tag can be + parsed unambiguously. A tag name cannot contain any of the + following characters:</para> + <itemizedlist> + <listitem><para>Colon (ASCII 58, + <quote><literal>:</literal></quote>)</para> + </listitem> + <listitem><para>Carriage return (ASCII 13, + <quote><literal>\r</literal></quote>)</para> + </listitem> + <listitem><para>Newline (ASCII 10, + <quote><literal>\n</literal></quote>)</para> + </listitem></itemizedlist> + + <para>You can use the <command role="hg-cmd">hg tags</command> + command to display the tags present in your repository. In the + output, each tagged revision is identified first by its name, + then by revision number, and finally by the unique hash of the + revision.</para> + + &interaction.tag.tags; + + <para>Notice that <literal>tip</literal> is listed in the output + of <command role="hg-cmd">hg tags</command>. The + <literal>tip</literal> tag is a special <quote>floating</quote> + tag, which always identifies the newest revision in the + repository.</para> + + <para>In the output of the <command role="hg-cmd">hg + tags</command> command, tags are listed in reverse order, by + revision number. This usually means that recent tags are listed + before older tags. It also means that <literal>tip</literal> is + always going to be the first tag listed in the output of + <command role="hg-cmd">hg tags</command>.</para> + + <para>When you run <command role="hg-cmd">hg log</command>, if it + displays a revision that has tags associated with it, it will + print those tags.</para> + + &interaction.tag.log; + + <para>Any time you need to provide a revision ID to a Mercurial + command, the command will accept a tag name in its place. 
+ Internally, Mercurial will translate your tag name into the + corresponding revision ID, then use that.</para> + + &interaction.tag.log.v1.0; + + <para>There's no limit on the number of tags you can have in a + repository, or on the number of tags that a single revision can + have. As a practical matter, it's not a great idea to have + <quote>too many</quote> (a number which will vary from project + to project), simply because tags are supposed to help you to + find revisions. If you have lots of tags, the ease of using + them to identify revisions diminishes rapidly.</para> + + <para>For example, if your project has milestones as frequent as + every few days, it's perfectly reasonable to tag each one of + those. But if you have a continuous build system that makes + sure every revision can be built cleanly, you'd be introducing a + lot of noise if you were to tag every clean build. Instead, you + could tag failed builds (on the assumption that they're rare!), + or simply not use tags to track buildability.</para> + + <para>If you want to remove a tag that you no longer want, use + <command role="hg-cmd">hg tag --remove</command>.</para> + + &interaction.tag.remove; + + <para>You can also modify a tag at any time, so that it identifies + a different revision, by simply issuing a new <command + role="hg-cmd">hg tag</command> command. You'll have to use the + <option role="hg-opt-tag">-f</option> option to tell Mercurial + that you <emphasis>really</emphasis> want to update the + tag.</para> + + &interaction.tag.replace; + + <para>There will still be a permanent record of the previous + identity of the tag, but Mercurial will no longer use it. + There's thus no penalty to tagging the wrong revision; all you + have to do is turn around and tag the correct revision once you + discover your error.</para> + + <para>Mercurial stores tags in a normal revision-controlled file + in your repository. 
If you've created any tags, you'll find + them in a file named <filename + role="special">.hgtags</filename>. When you run the <command + role="hg-cmd">hg tag</command> command, Mercurial modifies + this file, then automatically commits the change to it. This + means that every time you run <command role="hg-cmd">hg + tag</command>, you'll see a corresponding changeset in the + output of <command role="hg-cmd">hg log</command>.</para> + + &interaction.tag.tip; + + <sect2> + <title>Handling tag conflicts during a merge</title> + + <para>You won't often need to care about the <filename + role="special">.hgtags</filename> file, but it sometimes + makes its presence known during a merge. The format of the + file is simple: it consists of a series of lines. Each line + starts with a changeset hash, followed by a space, followed by + the name of a tag.</para> + + <para>If you're resolving a conflict in the <filename + role="special">.hgtags</filename> file during a merge, + there's one twist to modifying the <filename + role="special">.hgtags</filename> file: when Mercurial is + parsing the tags in a repository, it + <emphasis>never</emphasis> reads the working copy of the + <filename role="special">.hgtags</filename> file. Instead, it + reads the <emphasis>most recently committed</emphasis> + revision of the file.</para> + + <para>An unfortunate consequence of this design is that you + can't actually verify that your merged <filename + role="special">.hgtags</filename> file is correct until + <emphasis>after</emphasis> you've committed a change. So if + you find yourself resolving a conflict on <filename + role="special">.hgtags</filename> during a merge, be sure to + run <command role="hg-cmd">hg tags</command> after you commit. + If it finds an error in the <filename + role="special">.hgtags</filename> file, it will report the + location of the error, which you can then fix and commit. 
You + should then run <command role="hg-cmd">hg tags</command> + again, just to be sure that your fix is correct.</para> + + </sect2> + <sect2> + <title>Tags and cloning</title> + + <para>You may have noticed that the <command role="hg-cmd">hg + clone</command> command has a <option + role="hg-opt-clone">-r</option> option that lets you clone + an exact copy of the repository as of a particular changeset. + The new clone will not contain any project history that comes + after the revision you specified. This has an interaction + with tags that can surprise the unwary.</para> + + <para>Recall that a tag is stored as a revision to the <filename + role="special">.hgtags</filename> file, so that when you + create a tag, the changeset in which it's recorded necessarily + refers to an older changeset. When you run <command + role="hg-cmd">hg clone -r foo</command> to clone a + repository as of tag <literal>foo</literal>, the new clone + <emphasis>will not contain the history that created the + tag</emphasis> that you used to clone the repository. The + result is that you'll get exactly the right subset of the + project's history in the new repository, but + <emphasis>not</emphasis> the tag you might have + expected.</para> + + </sect2> + <sect2> + <title>When permanent tags are too much</title> + + <para>Since Mercurial's tags are revision controlled and carried + around with a project's history, everyone you work with will + see the tags you create. But giving names to revisions has + uses beyond simply noting that revision + <literal>4237e45506ee</literal> is really + <literal>v2.0.2</literal>. If you're trying to track down a + subtle bug, you might want a tag to remind you of something + like <quote>Anne saw the symptoms with this + revision</quote>.</para> + + <para>For cases like this, what you might want to use are + <emphasis>local</emphasis> tags. 
You can create a local tag + with the <option role="hg-opt-tag">-l</option> option to the + <command role="hg-cmd">hg tag</command> command. This will + store the tag in a file called <filename + role="special">.hg/localtags</filename>. Unlike <filename + role="special">.hgtags</filename>, <filename + role="special">.hg/localtags</filename> is not revision + controlled. Any tags you create using <option + role="hg-opt-tag">-l</option> remain strictly local to the + repository you're currently working in.</para> + + </sect2> + </sect1> + <sect1> + <title>The flow of changes&emdash;big picture vs. little</title> + + <para>To return to the outline I sketched at the beginning of the + chapter, let's think about a project that has multiple + concurrent pieces of work under development at once.</para> + + <para>There might be a push for a new <quote>main</quote> release; + a new minor bugfix release to the last main release; and an + unexpected <quote>hot fix</quote> to an old release that is now + in maintenance mode.</para> + + <para>The usual way people refer to these different concurrent + directions of development is as <quote>branches</quote>. + However, we've already seen numerous times that Mercurial treats + <emphasis>all of history</emphasis> as a series of branches and + merges. Really, what we have here is two ideas that are + peripherally related, but which happen to share a name.</para> + <itemizedlist> + <listitem><para><quote>Big picture</quote> branches represent + the sweep of a project's evolution; people give them names, + and talk about them in conversation.</para> + </listitem> + <listitem><para><quote>Little picture</quote> branches are + artefacts of the day-to-day activity of developing and + merging changes. 
They expose the narrative of how the code + was developed.</para> + </listitem></itemizedlist> + + </sect1> + <sect1> + <title>Managing big-picture branches in repositories</title> + + <para>The easiest way to isolate a <quote>big picture</quote> + branch in Mercurial is in a dedicated repository. If you have + an existing shared repository&emdash;let's call it + <literal>myproject</literal>&emdash;that reaches a + <quote>1.0</quote> milestone, you can start to prepare for + future maintenance releases on top of version 1.0 by tagging the + revision from which you prepared the 1.0 release.</para> + + &interaction.branch-repo.tag; + + <para>You can then clone a new shared + <literal>myproject-1.0.1</literal> repository as of that + tag.</para> + + &interaction.branch-repo.clone; + + <para>Afterwards, if someone needs to work on a bug fix that ought + to go into an upcoming 1.0.1 minor release, they clone the + <literal>myproject-1.0.1</literal> repository, make their + changes, and push them back.</para> + + &interaction.branch-repo.bugfix; + + <para>Meanwhile, development for + the next major release can continue, isolated and unabated, in + the <literal>myproject</literal> repository.</para> + + &interaction.branch-repo.new; + + </sect1> + <sect1> + <title>Don't repeat yourself: merging across branches</title> + + <para>In many cases, if you have a bug to fix on a maintenance + branch, the chances are good that the bug exists on your + project's main branch (and possibly other maintenance branches, + too). 
It's a rare developer who wants to fix the same bug + multiple times, so let's look at a few ways that Mercurial can + help you to manage these bugfixes without duplicating your + work.</para> + + <para>In the simplest instance, all you need to do is pull changes + from your maintenance branch into your local clone of the target + branch.</para> + + &interaction.branch-repo.pull; + + <para>You'll then need to merge the heads of the two branches, and + push back to the main branch.</para> + + &interaction.branch-repo.merge; + + </sect1> + <sect1> + <title>Naming branches within one repository</title> + + <para>In most instances, isolating branches in repositories is the + right approach. Its simplicity makes it easy to understand; and + so it's hard to make mistakes. There's a one-to-one + relationship between branches you're working in and directories + on your system. This lets you use normal (non-Mercurial-aware) + tools to work on files within a branch/repository.</para> + + <para>If you're more in the <quote>power user</quote> category + (<emphasis>and</emphasis> your collaborators are too), there is + an alternative way of handling branches that you can consider. + I've already mentioned the human-level distinction between + <quote>little picture</quote> and <quote>big picture</quote> + branches. While Mercurial works with multiple <quote>little + picture</quote> branches in a repository all the time (for + example after you pull changes in, but before you merge them), + it can <emphasis>also</emphasis> work with multiple <quote>big + picture</quote> branches.</para> + + <para>The key to working this way is that Mercurial lets you + assign a persistent <emphasis>name</emphasis> to a branch. + There always exists a branch named <literal>default</literal>. 
+ Even before you start naming branches yourself, you can find + traces of the <literal>default</literal> branch if you look for + them.</para> + + <para>As an example, when you run the <command role="hg-cmd">hg + commit</command> command, and it pops up your editor so that + you can enter a commit message, look for a line that contains + the text <quote><literal>HG: branch default</literal></quote> at + the bottom. This is telling you that your commit will occur on + the branch named <literal>default</literal>.</para> + + <para>To start working with named branches, use the <command + role="hg-cmd">hg branches</command> command. This command + lists the named branches already present in your repository, + telling you which changeset is the tip of each.</para> + + &interaction.branch-named.branches; + + <para>Since you haven't created any named branches yet, the only + one that exists is <literal>default</literal>.</para> + + <para>To find out what the <quote>current</quote> branch is, run + the <command role="hg-cmd">hg branch</command> command, giving + it no arguments. This tells you what branch the parent of the + current changeset is on.</para> + + &interaction.branch-named.branch; + + <para>To create a new branch, run the <command role="hg-cmd">hg + branch</command> command again. This time, give it one + argument: the name of the branch you want to create.</para> + + &interaction.branch-named.create; + + <para>After you've created a branch, you might wonder what effect + the <command role="hg-cmd">hg branch</command> command has had. + What do the <command role="hg-cmd">hg status</command> and + <command role="hg-cmd">hg tip</command> commands report?</para> + + &interaction.branch-named.status; + + <para>Nothing has changed in the + working directory, and there's been no new history created. 
As + this suggests, running the <command role="hg-cmd">hg + branch</command> command has no permanent effect; it only + tells Mercurial what branch name to use the + <emphasis>next</emphasis> time you commit a changeset.</para> + + <para>When you commit a change, Mercurial records the name of the + branch on which you committed. Once you've switched from the + <literal>default</literal> branch to another and committed, + you'll see the name of the new branch show up in the output of + <command role="hg-cmd">hg log</command>, <command + role="hg-cmd">hg tip</command>, and other commands that + display the same kind of output.</para> + + &interaction.branch-named.commit; + + <para>The <command role="hg-cmd">hg log</command>-like commands + will print the branch name of every changeset that's not on the + <literal>default</literal> branch. As a result, if you never + use named branches, you'll never see this information.</para> + + <para>Once you've named a branch and committed a change with that + name, every subsequent commit that descends from that change + will inherit the same branch name. You can change the name of a + branch at any time, using the <command role="hg-cmd">hg + branch</command> command.</para> + + &interaction.branch-named.rebranch; + + <para>In practice, this is something you won't do very often, as + branch names tend to have fairly long lifetimes. (This isn't a + rule, just an observation.)</para> + + </sect1> + <sect1> + <title>Dealing with multiple named branches in a + repository</title> + + <para>If you have more than one named branch in a repository, + Mercurial will remember the branch that your working directory is + on when you start a command like <command role="hg-cmd">hg + update</command> or <command role="hg-cmd">hg pull + -u</command>. It will update the working directory to the tip + of this branch, no matter what the <quote>repo-wide</quote> tip + is. 
To update to a revision that's on a different named branch, + you may need to use the <option role="hg-opt-update">-C</option> + option to <command role="hg-cmd">hg update</command>.</para> + + <para>This behaviour is a little subtle, so let's see it in + action. First, let's remind ourselves what branch we're + currently on, and what branches are in our repository.</para> + + &interaction.branch-named.parents; + + <para>We're on the <literal>bar</literal> branch, but there also + exists an older <literal>foo</literal> + branch.</para> + + <para>We can <command role="hg-cmd">hg update</command> back and + forth between the tips of the <literal>foo</literal> and + <literal>bar</literal> branches without needing to use the + <option role="hg-opt-update">-C</option> option, because this + only involves going backwards and forwards linearly through our + change history.</para> + + &interaction.branch-named.update-switchy; + + <para>If we go back to the <literal>foo</literal> branch and then + run <command role="hg-cmd">hg update</command>, it will keep us + on <literal>foo</literal>, not move us to the tip of + <literal>bar</literal>.</para> + + &interaction.branch-named.update-nothing; + + <para>Committing a new change on the <literal>foo</literal> branch + introduces a new head.</para> + + &interaction.branch-named.foo-commit; + + </sect1> + <sect1> + <title>Branch names and merging</title> + + <para>As you've probably noticed, merges in Mercurial are not + symmetrical. Let's say our repository has two heads, 17 and 23. + If I <command role="hg-cmd">hg update</command> to 17 and then + <command role="hg-cmd">hg merge</command> with 23, Mercurial + records 17 as the first parent of the merge, and 23 as the + second. 
Whereas if I <command role="hg-cmd">hg update</command> + to 23 and then <command role="hg-cmd">hg merge</command> with + 17, it records 23 as the first parent, and 17 as the + second.</para> + + <para>This affects Mercurial's choice of branch name when you + merge. After a merge, Mercurial will retain the branch name of + the first parent when you commit the result of the merge. If + your first parent's branch name is <literal>foo</literal>, and + you merge with <literal>bar</literal>, the branch name will + still be <literal>foo</literal> after you merge.</para> + + <para>It's not unusual for a repository to contain multiple heads, + each with the same branch name. Let's say I'm working on the + <literal>foo</literal> branch, and so are you. We commit + different changes; I pull your changes; I now have two heads, + each claiming to be on the <literal>foo</literal> branch. The + result of a merge will be a single head on the + <literal>foo</literal> branch, as you might hope.</para> + + <para>But if I'm working on the <literal>bar</literal> branch, and + I merge work from the <literal>foo</literal> branch, the result + will remain on the <literal>bar</literal> branch.</para> + + &interaction.branch-named.merge; + + <para>To give a more concrete example, if I'm working on the + <literal>bleeding-edge</literal> branch, and I want to bring in + the latest fixes from the <literal>stable</literal> branch, + Mercurial will choose the <quote>right</quote> + (<literal>bleeding-edge</literal>) branch name when I pull and + merge from <literal>stable</literal>.</para> + + </sect1> + <sect1> + <title>Branch naming is generally useful</title> + + <para>You shouldn't think of named branches as applicable only to + situations where you have multiple long-lived branches + cohabiting in a single repository. 
They're very useful even in + the one-branch-per-repository case.</para> + + <para>In the simplest case, giving a name to each branch gives you + a permanent record of which branch a changeset originated on. + This gives you more context when you're trying to follow the + history of a long-lived branchy project.</para> + + <para>If you're working with shared repositories, you can set up a + <literal role="hook">pretxnchangegroup</literal> hook on each + that will block incoming changes that have the + <quote>wrong</quote> branch name. This provides a simple, but + effective, defence against people accidentally pushing changes + from a <quote>bleeding edge</quote> branch to a + <quote>stable</quote> branch. Such a hook might look like this + inside the shared repo's <filename role="special"> + .hg/hgrc</filename>.</para> + <programlisting>[hooks] +pretxnchangegroup.branch = hg heads --template '{branches} ' | grep mybranch</programlisting> + + </sect1> +</chapter> + +<!-- +local variables: +sgml-parent-document: ("00book.xml" "book" "chapter") +end: +-->
--- a/en/ch07-filenames.xml Wed Mar 18 00:08:22 2009 -0700 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,408 +0,0 @@ -<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> - -<chapter id="chap:names"> - <?dbhtml filename="file-names-and-pattern-matching.html"?> - <title>File names and pattern matching</title> - - <para>Mercurial provides mechanisms that let you work with file - names in a consistent and expressive way.</para> - - <sect1> - <title>Simple file naming</title> - - <para>Mercurial uses a unified piece of machinery <quote>under the - hood</quote> to handle file names. Every command behaves - uniformly with respect to file names. The way in which commands - work with file names is as follows.</para> - - <para>If you explicitly name real files on the command line, - Mercurial works with exactly those files, as you would expect. - &interaction.filenames.files;</para> - - <para>When you provide a directory name, Mercurial will interpret - this as <quote>operate on every file in this directory and its - subdirectories</quote>. Mercurial traverses the files and - subdirectories in a directory in alphabetical order. When it - encounters a subdirectory, it will traverse that subdirectory - before continuing with the current directory.</para> - - &interaction.filenames.dirs; - - </sect1> - <sect1> - <title>Running commands without any file names</title> - - <para>Mercurial's commands that work with file names have useful - default behaviours when you invoke them without providing any - file names or patterns. What kind of behaviour you should - expect depends on what the command does. Here are a few rules - of thumb you can use to predict what a command is likely to do - if you don't give it any names to work with.</para> - <itemizedlist> - <listitem><para>Most commands will operate on the entire working - directory. 
This is what the <command role="hg-cmd">hg - add</command> command does, for example.</para> - </listitem> - <listitem><para>If the command has effects that are difficult or - impossible to reverse, it will force you to explicitly - provide at least one name or pattern (see below). This - protects you from accidentally deleting files by running - <command role="hg-cmd">hg remove</command> with no - arguments, for example.</para> - </listitem></itemizedlist> - - <para>It's easy to work around these default behaviours if they - don't suit you. If a command normally operates on the whole - working directory, you can invoke it on just the current - directory and its subdirectories by giving it the name - <quote><filename class="directory">.</filename></quote>.</para> - - &interaction.filenames.wdir-subdir; - - <para>Along the same lines, some commands normally print file - names relative to the root of the repository, even if you're - invoking them from a subdirectory. Such a command will print - file names relative to your subdirectory if you give it explicit - names. Here, we're going to run <command role="hg-cmd">hg - status</command> from a subdirectory, and get it to operate on - the entire working directory while printing file names relative - to our subdirectory, by passing it the output of the <command - role="hg-cmd">hg root</command> command.</para> - - &interaction.filenames.wdir-relname; - - </sect1> - <sect1> - <title>Telling you what's going on</title> - - <para>The <command role="hg-cmd">hg add</command> example in the - preceding section illustrates something else that's helpful - about Mercurial commands. If a command operates on a file that - you didn't name explicitly on the command line, it will usually - print the name of the file, so that you will not be surprised - by what's going on.</para> - - <para>The principle here is of <emphasis>least - surprise</emphasis>. 
If you've exactly named a file on the - command line, there's no point in repeating it back at you. If - Mercurial is acting on a file <emphasis>implicitly</emphasis>, - because you provided no names, or a directory, or a pattern (see - below), it's safest to tell you what it's doing.</para> - - <para>For commands that behave this way, you can silence them - using the <option role="hg-opt-global">-q</option> option. You - can also get them to print the name of every file, even those - you've named explicitly, using the <option - role="hg-opt-global">-v</option> option.</para> - - </sect1> - <sect1> - <title>Using patterns to identify files</title> - - <para>In addition to working with file and directory names, - Mercurial lets you use <emphasis>patterns</emphasis> to identify - files. Mercurial's pattern handling is expressive.</para> - - <para>On Unix-like systems (Linux, MacOS, etc.), the job of - matching file names to patterns normally falls to the shell. On - these systems, you must explicitly tell Mercurial that a name is - a pattern. On Windows, the shell does not expand patterns, so - Mercurial will automatically identify names that are patterns, - and expand them for you.</para> - - <para>To provide a pattern in place of a regular name on the - command line, the mechanism is simple:</para> - <programlisting>syntax:patternbody</programlisting> - <para>That is, a pattern is identified by a short text string that - says what kind of pattern this is, followed by a colon, followed - by the actual pattern.</para> - - <para>Mercurial supports two kinds of pattern syntax. The most - frequently used is called <literal>glob</literal>; this is the - same kind of pattern matching used by the Unix shell, and should - be familiar to Windows command prompt users, too.</para> - - <para>When Mercurial does automatic pattern matching on Windows, - it uses <literal>glob</literal> syntax. 
You can thus omit the - <quote><literal>glob:</literal></quote> prefix on Windows, but - it's safe to use it, too.</para> - - <para>The <literal>re</literal> syntax is more powerful; it lets - you specify patterns using regular expressions, also known as - regexps.</para> - - <para>By the way, in the examples that follow, notice that I'm - careful to wrap all of my patterns in quote characters, so that - they won't get expanded by the shell before Mercurial sees - them.</para> - - <sect2> - <title>Shell-style <literal>glob</literal> patterns</title> - - <para>This is an overview of the kinds of patterns you can use - when you're matching on glob patterns.</para> - - <para>The <quote><literal>*</literal></quote> character matches - any string, within a single directory.</para> - - &interaction.filenames.glob.star; - - <para>The <quote><literal>**</literal></quote> pattern matches - any string, and crosses directory boundaries. It's not a - standard Unix glob token, but it's accepted by several popular - Unix shells, and is very useful.</para> - - &interaction.filenames.glob.starstar; - - <para>The <quote><literal>?</literal></quote> pattern matches - any single character.</para> - - &interaction.filenames.glob.question; - - <para>The <quote><literal>[</literal></quote> character begins a - <emphasis>character class</emphasis>. This matches any single - character within the class. The class ends with a - <quote><literal>]</literal></quote> character. 
A class may - contain multiple <emphasis>range</emphasis>s of the form - <quote><literal>a-f</literal></quote>, which is shorthand for - <quote><literal>abcdef</literal></quote>.</para> - - &interaction.filenames.glob.range; - - <para>If the first character after the - <quote><literal>[</literal></quote> in a character class is a - <quote><literal>!</literal></quote>, it - <emphasis>negates</emphasis> the class, making it match any - single character not in the class.</para> - - <para>A <quote><literal>{</literal></quote> begins a group of - subpatterns, where the whole group matches if any subpattern - in the group matches. The <quote><literal>,</literal></quote> - character separates subpatterns, and - <quote><literal>}</literal></quote> ends the group.</para> - - &interaction.filenames.glob.group; - - <sect3> - <title>Watch out!</title> - - <para>Don't forget that if you want to match a pattern in any - directory, you should not be using the - <quote><literal>*</literal></quote> match-any token, as this - will only match within one directory. Instead, use the - <quote><literal>**</literal></quote> token. This small - example illustrates the difference between the two.</para> - - &interaction.filenames.glob.star-starstar; - - </sect3> - </sect2> - <sect2> - <title>Regular expression matching with <literal>re</literal> - patterns</title> - - <para>Mercurial accepts the same regular expression syntax as - the Python programming language (it uses Python's regexp - engine internally). This is based on the Perl language's - regexp syntax, which is the most popular dialect in use (it's - also used in Java, for example).</para> - - <para>I won't discuss Mercurial's regexp dialect in any detail - here, as regexps are not often used. Perl-style regexps are - in any case already exhaustively documented on a multitude of - web sites, and in many books. 
Instead, I will focus here on a - few things you should know if you find yourself needing to use - regexps with Mercurial.</para> - - <para>A regexp is matched against an entire file name, relative - to the root of the repository. In other words, even if you're - already in subdirectory <filename - class="directory">foo</filename>, if you want to match files - under this directory, your pattern must start with - <quote><literal>foo/</literal></quote>.</para> - - <para>One thing to note, if you're familiar with Perl-style - regexps, is that Mercurial's are <emphasis>rooted</emphasis>. - That is, a regexp starts matching against the beginning of a - string; it doesn't look for a match anywhere within the - string. To match anywhere in a string, start your pattern - with <quote><literal>.*</literal></quote>.</para> - - </sect2> - </sect1> - <sect1> - <title>Filtering files</title> - - <para>Not only does Mercurial give you a variety of ways to - specify files; it lets you further winnow those files using - <emphasis>filters</emphasis>. Commands that work with file - names accept two filtering options.</para> - <itemizedlist> - <listitem><para><option role="hg-opt-global">-I</option>, or - <option role="hg-opt-global">--include</option>, lets you - specify a pattern that file names must match in order to be - processed.</para> - </listitem> - <listitem><para><option role="hg-opt-global">-X</option>, or - <option role="hg-opt-global">--exclude</option>, gives you a - way to <emphasis>avoid</emphasis> processing files, if they - match this pattern.</para> - </listitem></itemizedlist> - <para>You can provide multiple <option - role="hg-opt-global">-I</option> and <option - role="hg-opt-global">-X</option> options on the command line, - and intermix them as you please. 
Mercurial interprets the - patterns you provide using glob syntax by default (but you can - use regexps if you need to).</para> - - <para>You can read a <option role="hg-opt-global">-I</option> - filter as <quote>process only the files that match this - filter</quote>.</para> - - &interaction.filenames.filter.include; - - <para>The <option role="hg-opt-global">-X</option> filter is best - read as <quote>process only the files that don't match this - pattern</quote>.</para> - - &interaction.filenames.filter.exclude; - - </sect1> - <sect1> - <title>Ignoring unwanted files and directories</title> - - <para>XXX.</para> - - </sect1> - <sect1 id="sec:names:case"> - <title>Case sensitivity</title> - - <para>If you're working in a mixed development environment that - contains both Linux (or other Unix) systems and Macs or Windows - systems, you should keep in the back of your mind the knowledge - that they treat the case (<quote>N</quote> versus - <quote>n</quote>) of file names in incompatible ways. This is - not very likely to affect you, and it's easy to deal with if it - does, but it could surprise you if you don't know about - it.</para> - - <para>Operating systems and filesystems differ in the way they - handle the <emphasis>case</emphasis> of characters in file and - directory names. There are three common ways to handle case in - names.</para> - <itemizedlist> - <listitem><para>Completely case insensitive. Uppercase and - lowercase versions of a letter are treated as identical, - both when creating a file and during subsequent accesses. - This is common on older DOS-based systems.</para> - </listitem> - <listitem><para>Case preserving, but insensitive. When a file - or directory is created, the case of its name is stored, and - can be retrieved and displayed by the operating system. - When an existing file is being looked up, its case is - ignored. This is the standard arrangement on Windows and - MacOS. 
The names <filename>foo</filename> and - <filename>FoO</filename> identify the same file. This - treatment of uppercase and lowercase letters as - interchangeable is also referred to as <emphasis>case - folding</emphasis>.</para> - </listitem> - <listitem><para>Case sensitive. The case of a name is - significant at all times. The names <filename>foo</filename> - and <filename>FoO</filename> identify different files. This is the way Linux - and Unix systems normally work.</para> - </listitem></itemizedlist> - - <para>On Unix-like systems, it is possible to have any or all of - the above ways of handling case in action at once. For example, - if you use a USB thumb drive formatted with a FAT32 filesystem - on a Linux system, Linux will handle names on that filesystem in - a case preserving, but insensitive, way.</para> - - <sect2> - <title>Safe, portable repository storage</title> - - <para>Mercurial's repository storage mechanism is <emphasis>case - safe</emphasis>. It translates file names so that they can - be safely stored on both case sensitive and case insensitive - filesystems. This means that you can use normal file copying - tools to transfer a Mercurial repository onto, for example, a - USB thumb drive, and safely move that drive and repository - back and forth between a Mac, a PC running Windows, and a - Linux box.</para> - - </sect2> - <sect2> - <title>Detecting case conflicts</title> - - <para>When operating in the working directory, Mercurial honours - the naming policy of the filesystem where the working - directory is located. If the filesystem is case preserving, - but insensitive, Mercurial will treat names that differ only - in case as the same.</para> - - <para>An important aspect of this approach is that it is - possible to commit a changeset on a case sensitive (typically - Linux or Unix) filesystem that will cause trouble for users on - case insensitive (usually Windows and MacOS) filesystems. 
If a - Linux user commits changes to two files, one named - <filename>myfile.c</filename> and the other named - <filename>MyFile.C</filename>, they will be stored correctly - in the repository. And in the working directories of other - Linux users, they will be correctly represented as separate - files.</para> - - <para>If a Windows or Mac user pulls this change, they will not - initially have a problem, because Mercurial's repository - storage mechanism is case safe. However, once they try to - <command role="hg-cmd">hg update</command> the working - directory to that changeset, or <command role="hg-cmd">hg - merge</command> with that changeset, Mercurial will spot the - conflict between the two file names that the filesystem would - treat as the same, and forbid the update or merge from - occurring.</para> - - </sect2> - <sect2> - <title>Fixing a case conflict</title> - - <para>If you are using Windows or a Mac in a mixed environment - where some of your collaborators are using Linux or Unix, and - Mercurial reports a case folding conflict when you try to - <command role="hg-cmd">hg update</command> or <command - role="hg-cmd">hg merge</command>, the procedure to fix the - problem is simple.</para> - - <para>Just find a nearby Linux or Unix box, clone the problem - repository onto it, and use Mercurial's <command - role="hg-cmd">hg rename</command> command to change the - names of any offending files or directories so that they will - no longer cause case folding conflicts. 
Commit this change, - <command role="hg-cmd">hg pull</command> or <command - role="hg-cmd">hg push</command> it across to your Windows or - MacOS system, and <command role="hg-cmd">hg update</command> - to the revision with the non-conflicting names.</para> - - <para>The changeset with case-conflicting names will remain in - your project's history, and you still won't be able to - <command role="hg-cmd">hg update</command> your working - directory to that changeset on a Windows or MacOS system, but - you can continue development unimpeded.</para> - - <note> - <para> Prior to version 0.9.3, Mercurial did not use a case - safe repository storage mechanism, and did not detect case - folding conflicts. If you are using an older version of - Mercurial on Windows or MacOS, I strongly recommend that you - upgrade.</para> - </note> - - </sect2> - </sect1> -</chapter> - -<!-- -local variables: -sgml-parent-document: ("00book.xml" "book" "chapter") -end: --->
--- a/en/ch08-branch.xml Wed Mar 18 00:08:22 2009 -0700 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,533 +0,0 @@ -<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> - -<chapter id="chap:branch"> - <?dbhtml filename="managing-releases-and-branchy-development.html"?> - <title>Managing releases and branchy development</title> - - <para>Mercurial provides several mechanisms for you to manage a - project that is making progress on multiple fronts at once. To - understand these mechanisms, let's first take a brief look at a - fairly normal software project structure.</para> - - <para>Many software projects issue periodic <quote>major</quote> - releases that contain substantial new features. In parallel, they - may issue <quote>minor</quote> releases. These are usually - identical to the major releases off which they're based, but with - a few bugs fixed.</para> - - <para>In this chapter, we'll start by talking about how to keep - records of project milestones such as releases. We'll then - continue on to talk about the flow of work between different - phases of a project, and how Mercurial can help you to isolate and - manage this work.</para> - - <sect1> - <title>Giving a persistent name to a revision</title> - - <para>Once you decide that you'd like to call a particular - revision a <quote>release</quote>, it's a good idea to record - the identity of that revision. This will let you reproduce that - release at a later date, for whatever purpose you might need at - the time (reproducing a bug, porting to a new platform, etc). - &interaction.tag.init;</para> - - <para>Mercurial lets you give a permanent name to any revision - using the <command role="hg-cmd">hg tag</command> command. Not - surprisingly, these names are called <quote>tags</quote>.</para> - - &interaction.tag.tag; - - <para>A tag is nothing more than a <quote>symbolic name</quote> - for a revision. 
Tags exist purely for your convenience, so that - you have a handy permanent way to refer to a revision; Mercurial - doesn't interpret the tag names you use in any way. Neither - does Mercurial place any restrictions on the name of a tag, - beyond a few that are necessary to ensure that a tag can be - parsed unambiguously. A tag name cannot contain any of the - following characters:</para> - <itemizedlist> - <listitem><para>Colon (ASCII 58, - <quote><literal>:</literal></quote>)</para> - </listitem> - <listitem><para>Carriage return (ASCII 13, - <quote><literal>\r</literal></quote>)</para> - </listitem> - <listitem><para>Newline (ASCII 10, - <quote><literal>\n</literal></quote>)</para> - </listitem></itemizedlist> - - <para>You can use the <command role="hg-cmd">hg tags</command> - command to display the tags present in your repository. In the - output, each tagged revision is identified first by its name, - then by revision number, and finally by the unique hash of the - revision.</para> - - &interaction.tag.tags; - - <para>Notice that <literal>tip</literal> is listed in the output - of <command role="hg-cmd">hg tags</command>. The - <literal>tip</literal> tag is a special <quote>floating</quote> - tag, which always identifies the newest revision in the - repository.</para> - - <para>In the output of the <command role="hg-cmd">hg - tags</command> command, tags are listed in reverse order, by - revision number. This usually means that recent tags are listed - before older tags. It also means that <literal>tip</literal> is - always going to be the first tag listed in the output of - <command role="hg-cmd">hg tags</command>.</para> - - <para>When you run <command role="hg-cmd">hg log</command>, if it - displays a revision that has tags associated with it, it will - print those tags.</para> - - &interaction.tag.log; - - <para>Any time you need to provide a revision ID to a Mercurial - command, the command will accept a tag name in its place. 
- Internally, Mercurial will translate your tag name into the - corresponding revision ID, then use that.</para> - - &interaction.tag.log.v1.0; - - <para>There's no limit on the number of tags you can have in a - repository, or on the number of tags that a single revision can - have. As a practical matter, it's not a great idea to have - <quote>too many</quote> (a number which will vary from project - to project), simply because tags are supposed to help you to - find revisions. If you have lots of tags, the ease of using - them to identify revisions diminishes rapidly.</para> - - <para>For example, if your project has milestones as frequent as - every few days, it's perfectly reasonable to tag each one of - those. But if you have a continuous build system that makes - sure every revision can be built cleanly, you'd be introducing a - lot of noise if you were to tag every clean build. Instead, you - could tag failed builds (on the assumption that they're rare!), - or simply not use tags to track buildability.</para> - - <para>If you want to remove a tag that you no longer want, use - <command role="hg-cmd">hg tag --remove</command>.</para> - - &interaction.tag.remove; - - <para>You can also modify a tag at any time, so that it identifies - a different revision, by simply issuing a new <command - role="hg-cmd">hg tag</command> command. You'll have to use the - <option role="hg-opt-tag">-f</option> option to tell Mercurial - that you <emphasis>really</emphasis> want to update the - tag.</para> - - &interaction.tag.replace; - - <para>There will still be a permanent record of the previous - identity of the tag, but Mercurial will no longer use it. - There's thus no penalty to tagging the wrong revision; all you - have to do is turn around and tag the correct revision once you - discover your error.</para> - - <para>Mercurial stores tags in a normal revision-controlled file - in your repository. 
If you've created any tags, you'll find - them in a file named <filename - role="special">.hgtags</filename>. When you run the <command - role="hg-cmd">hg tag</command> command, Mercurial modifies - this file, then automatically commits the change to it. This - means that every time you run <command role="hg-cmd">hg - tag</command>, you'll see a corresponding changeset in the - output of <command role="hg-cmd">hg log</command>.</para> - - &interaction.tag.tip; - - <sect2> - <title>Handling tag conflicts during a merge</title> - - <para>You won't often need to care about the <filename - role="special">.hgtags</filename> file, but it sometimes - makes its presence known during a merge. The format of the - file is simple: it consists of a series of lines. Each line - starts with a changeset hash, followed by a space, followed by - the name of a tag.</para> - - <para>If you're resolving a conflict in the <filename - role="special">.hgtags</filename> file during a merge, - there's one twist to modifying the <filename - role="special">.hgtags</filename> file: when Mercurial is - parsing the tags in a repository, it - <emphasis>never</emphasis> reads the working copy of the - <filename role="special">.hgtags</filename> file. Instead, it - reads the <emphasis>most recently committed</emphasis> - revision of the file.</para> - - <para>An unfortunate consequence of this design is that you - can't actually verify that your merged <filename - role="special">.hgtags</filename> file is correct until - <emphasis>after</emphasis> you've committed a change. So if - you find yourself resolving a conflict on <filename - role="special">.hgtags</filename> during a merge, be sure to - run <command role="hg-cmd">hg tags</command> after you commit. - If it finds an error in the <filename - role="special">.hgtags</filename> file, it will report the - location of the error, which you can then fix and commit. 
You - should then run <command role="hg-cmd">hg tags</command> - again, just to be sure that your fix is correct.</para> - - </sect2> - <sect2> - <title>Tags and cloning</title> - - <para>You may have noticed that the <command role="hg-cmd">hg - clone</command> command has a <option - role="hg-opt-clone">-r</option> option that lets you clone - an exact copy of the repository as of a particular changeset. - The new clone will not contain any project history that comes - after the revision you specified. This has an interaction - with tags that can surprise the unwary.</para> - - <para>Recall that a tag is stored as a revision to the <filename - role="special">.hgtags</filename> file, so that when you - create a tag, the changeset in which it's recorded necessarily - refers to an older changeset. When you run <command - role="hg-cmd">hg clone -r foo</command> to clone a - repository as of tag <literal>foo</literal>, the new clone - <emphasis>will not contain the history that created the - tag</emphasis> that you used to clone the repository. The - result is that you'll get exactly the right subset of the - project's history in the new repository, but - <emphasis>not</emphasis> the tag you might have - expected.</para> - - </sect2> - <sect2> - <title>When permanent tags are too much</title> - - <para>Since Mercurial's tags are revision controlled and carried - around with a project's history, everyone you work with will - see the tags you create. But giving names to revisions has - uses beyond simply noting that revision - <literal>4237e45506ee</literal> is really - <literal>v2.0.2</literal>. If you're trying to track down a - subtle bug, you might want a tag to remind you of something - like <quote>Anne saw the symptoms with this - revision</quote>.</para> - - <para>For cases like this, what you might want to use are - <emphasis>local</emphasis> tags. 
You can create a local tag - with the <option role="hg-opt-tag">-l</option> option to the - <command role="hg-cmd">hg tag</command> command. This will - store the tag in a file called <filename - role="special">.hg/localtags</filename>. Unlike <filename - role="special">.hgtags</filename>, <filename - role="special">.hg/localtags</filename> is not revision - controlled. Any tags you create using <option - role="hg-opt-tag">-l</option> remain strictly local to the - repository you're currently working in.</para> - - </sect2> - </sect1> - <sect1> - <title>The flow of changes&emdash;big picture vs. little</title> - - <para>To return to the outline I sketched at the beginning of the - chapter, let's think about a project that has multiple - concurrent pieces of work under development at once.</para> - - <para>There might be a push for a new <quote>main</quote> release; - a new minor bugfix release to the last main release; and an - unexpected <quote>hot fix</quote> to an old release that is now - in maintenance mode.</para> - - <para>The usual way people refer to these different concurrent - directions of development is as <quote>branches</quote>. - However, we've already seen numerous times that Mercurial treats - <emphasis>all of history</emphasis> as a series of branches and - merges. Really, what we have here is two ideas that are - peripherally related, but which happen to share a name.</para> - <itemizedlist> - <listitem><para><quote>Big picture</quote> branches represent - the sweep of a project's evolution; people give them names, - and talk about them in conversation.</para> - </listitem> - <listitem><para><quote>Little picture</quote> branches are - artefacts of the day-to-day activity of developing and - merging changes. 
They expose the narrative of how the code - was developed.</para> - </listitem></itemizedlist> - - </sect1> - <sect1> - <title>Managing big-picture branches in repositories</title> - - <para>The easiest way to isolate a <quote>big picture</quote> - branch in Mercurial is in a dedicated repository. If you have - an existing shared repository&emdash;let's call it - <literal>myproject</literal>&emdash;that reaches a - <quote>1.0</quote> milestone, you can start to prepare for - future maintenance releases on top of version 1.0 by tagging the - revision from which you prepared the 1.0 release.</para> - - &interaction.branch-repo.tag; - - <para>You can then clone a new shared - <literal>myproject-1.0.1</literal> repository as of that - tag.</para> - - &interaction.branch-repo.clone; - - <para>Afterwards, if someone needs to work on a bug fix that ought - to go into an upcoming 1.0.1 minor release, they clone the - <literal>myproject-1.0.1</literal> repository, make their - changes, and push them back.</para> - - &interaction.branch-repo.bugfix; - - <para>Meanwhile, development for - the next major release can continue, isolated and unabated, in - the <literal>myproject</literal> repository.</para> - - &interaction.branch-repo.new; - - </sect1> - <sect1> - <title>Don't repeat yourself: merging across branches</title> - - <para>In many cases, if you have a bug to fix on a maintenance - branch, the chances are good that the bug exists on your - project's main branch (and possibly other maintenance branches, - too). 
It's a rare developer who wants to fix the same bug - multiple times, so let's look at a few ways that Mercurial can - help you to manage these bugfixes without duplicating your - work.</para> - - <para>In the simplest instance, all you need to do is pull changes - from your maintenance branch into your local clone of the target - branch.</para> - - &interaction.branch-repo.pull; - - <para>You'll then need to merge the heads of the two branches, and - push back to the main branch.</para> - - &interaction.branch-repo.merge; - - </sect1> - <sect1> - <title>Naming branches within one repository</title> - - <para>In most instances, isolating branches in repositories is the - right approach. Its simplicity makes it easy to understand; and - so it's hard to make mistakes. There's a one-to-one - relationship between branches you're working in and directories - on your system. This lets you use normal (non-Mercurial-aware) - tools to work on files within a branch/repository.</para> - - <para>If you're more in the <quote>power user</quote> category - (<emphasis>and</emphasis> your collaborators are too), there is - an alternative way of handling branches that you can consider. - I've already mentioned the human-level distinction between - <quote>small picture</quote> and <quote>big picture</quote> - branches. While Mercurial works with multiple <quote>small - picture</quote> branches in a repository all the time (for - example after you pull changes in, but before you merge them), - it can <emphasis>also</emphasis> work with multiple <quote>big - picture</quote> branches.</para> - - <para>The key to working this way is that Mercurial lets you - assign a persistent <emphasis>name</emphasis> to a branch. - There always exists a branch named <literal>default</literal>. 
- Even before you start naming branches yourself, you can find - traces of the <literal>default</literal> branch if you look for - them.</para> - - <para>As an example, when you run the <command role="hg-cmd">hg - commit</command> command, and it pops up your editor so that - you can enter a commit message, look for a line that contains - the text <quote><literal>HG: branch default</literal></quote> at - the bottom. This is telling you that your commit will occur on - the branch named <literal>default</literal>.</para> - - <para>To start working with named branches, use the <command - role="hg-cmd">hg branches</command> command. This command - lists the named branches already present in your repository, - telling you which changeset is the tip of each.</para> - - &interaction.branch-named.branches; - - <para>Since you haven't created any named branches yet, the only - one that exists is <literal>default</literal>.</para> - - <para>To find out what the <quote>current</quote> branch is, run - the <command role="hg-cmd">hg branch</command> command, giving - it no arguments. This tells you what branch the parent of the - current changeset is on.</para> - - &interaction.branch-named.branch; - - <para>To create a new branch, run the <command role="hg-cmd">hg - branch</command> command again. This time, give it one - argument: the name of the branch you want to create.</para> - - &interaction.branch-named.create; - - <para>After you've created a branch, you might wonder what effect - the <command role="hg-cmd">hg branch</command> command has had. - What do the <command role="hg-cmd">hg status</command> and - <command role="hg-cmd">hg tip</command> commands report?</para> - - &interaction.branch-named.status; - - <para>Nothing has changed in the - working directory, and there's been no new history created. 
As - this suggests, running the <command role="hg-cmd">hg - branch</command> command has no permanent effect; it only - tells Mercurial what branch name to use the - <emphasis>next</emphasis> time you commit a changeset.</para> - - <para>When you commit a change, Mercurial records the name of the - branch on which you committed. Once you've switched from the - <literal>default</literal> branch to another and committed, - you'll see the name of the new branch show up in the output of - <command role="hg-cmd">hg log</command>, <command - role="hg-cmd">hg tip</command>, and other commands that - display the same kind of output.</para> - - &interaction.branch-named.commit; - - <para>The <command role="hg-cmd">hg log</command>-like commands - will print the branch name of every changeset that's not on the - <literal>default</literal> branch. As a result, if you never - use named branches, you'll never see this information.</para> - - <para>Once you've named a branch and committed a change with that - name, every subsequent commit that descends from that change - will inherit the same branch name. You can change the name of a - branch at any time, using the <command role="hg-cmd">hg - branch</command> command.</para> - - &interaction.branch-named.rebranch; - - <para>In practice, this is something you won't do very often, as - branch names tend to have fairly long lifetimes. (This isn't a - rule, just an observation.)</para> - - </sect1> - <sect1> - <title>Dealing with multiple named branches in a - repository</title> - - <para>If you have more than one named branch in a repository, - Mercurial will remember the branch that your working directory - is on when you start a command like <command role="hg-cmd">hg - update</command> or <command role="hg-cmd">hg pull - -u</command>. It will update the working directory to the tip - of this branch, no matter what the <quote>repo-wide</quote> tip - is. 
To update to a revision that's on a different named branch, - you may need to use the <option role="hg-opt-update">-C</option> - option to <command role="hg-cmd">hg update</command>.</para> - - <para>This behaviour is a little subtle, so let's see it in - action. First, let's remind ourselves what branch we're - currently on, and what branches are in our repository.</para> - - &interaction.branch-named.parents; - - <para>We're on the <literal>bar</literal> branch, but there also - exists an older <literal>foo</literal> - branch.</para> - - <para>We can <command role="hg-cmd">hg update</command> back and - forth between the tips of the <literal>foo</literal> and - <literal>bar</literal> branches without needing to use the - <option role="hg-opt-update">-C</option> option, because this - only involves going backwards and forwards linearly through our - change history.</para> - - &interaction.branch-named.update-switchy; - - <para>If we go back to the <literal>foo</literal> branch and then - run <command role="hg-cmd">hg update</command>, it will keep us - on <literal>foo</literal>, not move us to the tip of - <literal>bar</literal>.</para> - - &interaction.branch-named.update-nothing; - - <para>Committing a new change on the <literal>foo</literal> branch - introduces a new head.</para> - - &interaction.branch-named.foo-commit; - - </sect1> - <sect1> - <title>Branch names and merging</title> - - <para>As you've probably noticed, merges in Mercurial are not - symmetrical. Let's say our repository has two heads, 17 and 23. - If I <command role="hg-cmd">hg update</command> to 17 and then - <command role="hg-cmd">hg merge</command> with 23, Mercurial - records 17 as the first parent of the merge, and 23 as the - second. 
Whereas if I <command role="hg-cmd">hg update</command> - to 23 and then <command role="hg-cmd">hg merge</command> with - 17, it records 23 as the first parent, and 17 as the - second.</para> - - <para>This affects Mercurial's choice of branch name when you - merge. After a merge, Mercurial will retain the branch name of - the first parent when you commit the result of the merge. If - your first parent's branch name is <literal>foo</literal>, and - you merge with <literal>bar</literal>, the branch name will - still be <literal>foo</literal> after you merge.</para> - - <para>It's not unusual for a repository to contain multiple heads, - each with the same branch name. Let's say I'm working on the - <literal>foo</literal> branch, and so are you. We commit - different changes; I pull your changes; I now have two heads, - each claiming to be on the <literal>foo</literal> branch. The - result of a merge will be a single head on the - <literal>foo</literal> branch, as you might hope.</para> - - <para>But if I'm working on the <literal>bar</literal> branch, and - I merge work from the <literal>foo</literal> branch, the result - will remain on the <literal>bar</literal> branch.</para> - - &interaction.branch-named.merge; - - <para>To give a more concrete example, if I'm working on the - <literal>bleeding-edge</literal> branch, and I want to bring in - the latest fixes from the <literal>stable</literal> branch, - Mercurial will choose the <quote>right</quote> - (<literal>bleeding-edge</literal>) branch name when I pull and - merge from <literal>stable</literal>.</para> - - </sect1> - <sect1> - <title>Branch naming is generally useful</title> - - <para>You shouldn't think of named branches as applicable only to - situations where you have multiple long-lived branches - cohabiting in a single repository. 
They're very useful even in - the one-branch-per-repository case.</para> - - <para>In the simplest case, giving a name to each branch gives you - a permanent record of which branch a changeset originated on. - This gives you more context when you're trying to follow the - history of a long-lived branchy project.</para> - - <para>If you're working with shared repositories, you can set up a - <literal role="hook">pretxnchangegroup</literal> hook on each - that will block incoming changes that have the - <quote>wrong</quote> branch name. This provides a simple, but - effective, defence against people accidentally pushing changes - from a <quote>bleeding edge</quote> branch to a - <quote>stable</quote> branch. Such a hook might look like this - inside the shared repo's <filename role="special"> - .hg/hgrc</filename>.</para> - <programlisting>[hooks] -pretxnchangegroup.branch = hg heads --template '{branches} ' | grep mybranch</programlisting> - - </sect1> -</chapter> - -<!-- -local variables: -sgml-parent-document: ("00book.xml" "book" "chapter") -end: --->
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch08-undo.xml Thu Mar 19 20:54:12 2009 -0700 @@ -0,0 +1,1072 @@ +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> + +<chapter id="chap:undo"> + <?dbhtml filename="finding-and-fixing-mistakes.html"?> + <title>Finding and fixing mistakes</title> + + <para>To err might be human, but to really handle the consequences + well takes a top-notch revision control system. In this chapter, + we'll discuss some of the techniques you can use when you find + that a problem has crept into your project. Mercurial has some + highly capable features that will help you to isolate the sources + of problems, and to handle them appropriately.</para> + + <sect1> + <title>Erasing local history</title> + + <sect2> + <title>The accidental commit</title> + + <para>I have the occasional but persistent problem of typing + rather more quickly than I can think, which sometimes results + in me committing a changeset that is either incomplete or + plain wrong. In my case, the usual kind of incomplete + changeset is one in which I've created a new source file, but + forgotten to <command role="hg-cmd">hg add</command> it. A + <quote>plain wrong</quote> changeset is not as common, but no + less annoying.</para> + + </sect2> + <sect2 id="sec:undo:rollback"> + <title>Rolling back a transaction</title> + + <para>In section <xref linkend="sec:concepts:txn"/>, I mentioned + that Mercurial treats each modification of a repository as a + <emphasis>transaction</emphasis>. Every time you commit a + changeset or pull changes from another repository, Mercurial + remembers what you did. You can undo, or <emphasis>roll + back</emphasis>, exactly one of these actions using the + <command role="hg-cmd">hg rollback</command> command. 
(See + section <xref linkend="sec:undo:rollback-after-push"/> for an + important caveat about the use of this command.)</para> + + <para>Here's a mistake that I often find myself making: + committing a change in which I've created a new file, but + forgotten to <command role="hg-cmd">hg add</command> + it.</para> + + &interaction.rollback.commit; + + <para>Looking at the output of <command role="hg-cmd">hg + status</command> after the commit immediately confirms the + error.</para> + + &interaction.rollback.status; + + <para>The commit captured the changes to the file + <filename>a</filename>, but not the new file + <filename>b</filename>. If I were to push this changeset to a + repository that I shared with a colleague, the chances are + high that something in <filename>a</filename> would refer to + <filename>b</filename>, which would not be present in their + repository when they pulled my changes. I would thus become + the object of some indignation.</para> + + <para>However, luck is with me&emdash;I've caught my error + before I pushed the changeset. I use the <command + role="hg-cmd">hg rollback</command> command, and Mercurial + makes that last changeset vanish.</para> + + &interaction.rollback.rollback; + + <para>Notice that the changeset is no longer present in the + repository's history, and the working directory once again + thinks that the file <filename>a</filename> is modified. The + commit and rollback have left the working directory exactly as + it was prior to the commit; the changeset has been completely + erased. I can now safely <command role="hg-cmd">hg + add</command> the file <filename>b</filename>, and rerun my + commit.</para> + + &interaction.rollback.add; + + </sect2> + <sect2> + <title>The erroneous pull</title> + + <para>It's common practice with Mercurial to maintain separate + development branches of a project in different repositories. 
+ Your development team might have one shared repository for + your project's <quote>0.9</quote> release, and another, + containing different changes, for the <quote>1.0</quote> + release.</para> + + <para>Given this, you can imagine that the consequences could be + messy if you had a local <quote>0.9</quote> repository, and + accidentally pulled changes from the shared <quote>1.0</quote> + repository into it. At worst, you could be paying + insufficient attention, and push those changes into the shared + <quote>0.9</quote> tree, confusing your entire team (but don't + worry, we'll return to this horror scenario later). However, + it's more likely that you'll notice immediately, because + Mercurial will display the URL it's pulling from, or you will + see it pull a suspiciously large number of changes into the + repository.</para> + + <para>The <command role="hg-cmd">hg rollback</command> command + will work nicely to expunge all of the changesets that you + just pulled. Mercurial groups all changes from one <command + role="hg-cmd">hg pull</command> into a single transaction, + so one <command role="hg-cmd">hg rollback</command> is all you + need to undo this mistake.</para> + + </sect2> + <sect2 id="sec:undo:rollback-after-push"> + <title>Rolling back is useless once you've pushed</title> + + <para>The value of the <command role="hg-cmd">hg + rollback</command> command drops to zero once you've pushed + your changes to another repository. Rolling back a change + makes it disappear entirely, but <emphasis>only</emphasis> in + the repository in which you perform the <command + role="hg-cmd">hg rollback</command>. 
Because a rollback + eliminates history, there's no way for the disappearance of a + change to propagate between repositories.</para> + + <para>If you've pushed a change to another + repository&emdash;particularly if it's a shared + repository&emdash;it has essentially <quote>escaped into the + wild,</quote> and you'll have to recover from your mistake + in a different way. What will happen if you push a changeset + somewhere, then roll it back, then pull from the repository + you pushed to, is that the changeset will reappear in your + repository.</para> + + <para>(If you absolutely know for sure that the change you want + to roll back is the most recent change in the repository that + you pushed to, <emphasis>and</emphasis> you know that nobody + else could have pulled it from that repository, you can roll + back the changeset there, too, but you really should + not rely on this working reliably. If you do this, sooner or + later a change really will make it into a repository that you + don't directly control (or have forgotten about), and come + back to bite you.)</para> + + </sect2> + <sect2> + <title>You can only roll back once</title> + + <para>Mercurial stores exactly one transaction in its + transaction log; that transaction is the most recent one that + occurred in the repository. This means that you can only roll + back one transaction. 
If you expect to be able to roll back + one transaction, then its predecessor, this is not the + behaviour you will get.</para> + + &interaction.rollback.twice; + + <para>Once you've rolled back one transaction in a repository, + you can't roll back again in that repository until you perform + another commit or pull.</para> + + </sect2> + </sect1> + <sect1> + <title>Reverting the mistaken change</title> + + <para>If you make a modification to a file, and decide that you + really didn't want to change the file at all, and you haven't + yet committed your changes, the <command role="hg-cmd">hg + revert</command> command is the one you'll need. It looks at + the changeset that's the parent of the working directory, and + restores the contents of the file to their state as of that + changeset. (That's a long-winded way of saying that, in the + normal case, it undoes your modifications.)</para> + + <para>Let's illustrate how the <command role="hg-cmd">hg + revert</command> command works with yet another small example. + We'll begin by modifying a file that Mercurial is already + tracking.</para> + + &interaction.daily.revert.modify; + + <para>If we don't + want that change, we can simply <command role="hg-cmd">hg + revert</command> the file.</para> + + &interaction.daily.revert.unmodify; + + <para>The <command role="hg-cmd">hg revert</command> command + provides us with an extra degree of safety by saving our + modified file with a <filename>.orig</filename> + extension.</para> + + &interaction.daily.revert.status; + + <para>Here is a summary of the cases that the <command + role="hg-cmd">hg revert</command> command can deal with. 
We + will describe each of these in more detail in the section that + follows.</para> + <itemizedlist> + <listitem><para>If you modify a file, it will restore the file + to its unmodified state.</para> + </listitem> + <listitem><para>If you <command role="hg-cmd">hg add</command> a + file, it will undo the <quote>added</quote> state of the + file, but leave the file itself untouched.</para> + </listitem> + <listitem><para>If you delete a file without telling Mercurial, + it will restore the file to its unmodified contents.</para> + </listitem> + <listitem><para>If you use the <command role="hg-cmd">hg + remove</command> command to remove a file, it will undo + the <quote>removed</quote> state of the file, and restore + the file to its unmodified contents.</para> + </listitem></itemizedlist> + + <sect2 id="sec:undo:mgmt"> + <title>File management errors</title> + + <para>The <command role="hg-cmd">hg revert</command> command is + useful for more than just modified files. It lets you reverse + the results of all of Mercurial's file management + commands&emdash;<command role="hg-cmd">hg add</command>, + <command role="hg-cmd">hg remove</command>, and so on.</para> + + <para>If you <command role="hg-cmd">hg add</command> a file, + then decide that in fact you don't want Mercurial to track it, + use <command role="hg-cmd">hg revert</command> to undo the + add. Don't worry; Mercurial will not modify the file in any + way. It will just <quote>unmark</quote> the file.</para> + + &interaction.daily.revert.add; + + <para>Similarly, if you ask Mercurial to <command + role="hg-cmd">hg remove</command> a file, you can use + <command role="hg-cmd">hg revert</command> to restore it to + the contents it had as of the parent of the working directory. 
+ &interaction.daily.revert.remove; This works just as + well for a file that you deleted by hand, without telling + Mercurial (recall that in Mercurial terminology, this kind of + file is called <quote>missing</quote>).</para> + + &interaction.daily.revert.missing; + + <para>If you revert a <command role="hg-cmd">hg copy</command>, + the copied-to file remains in your working directory + afterwards, untracked. Since a copy doesn't affect the + copied-from file in any way, Mercurial doesn't do anything + with the copied-from file.</para> + + &interaction.daily.revert.copy; + + <sect3> + <title>A slightly special case: reverting a rename</title> + + <para>If you <command role="hg-cmd">hg rename</command> a + file, there is one small detail that you should remember. + When you <command role="hg-cmd">hg revert</command> a + rename, it's not enough to provide the name of the + renamed-to file, as you can see here.</para> + + &interaction.daily.revert.rename; + + <para>As you can see from the output of <command + role="hg-cmd">hg status</command>, the renamed-to file is + no longer identified as added, but the + renamed-<emphasis>from</emphasis> file is still removed! + This is counter-intuitive (at least to me), but at least + it's easy to deal with.</para> + + &interaction.daily.revert.rename-orig; + + <para>So remember, to revert a <command role="hg-cmd">hg + rename</command>, you must provide + <emphasis>both</emphasis> the source and destination + names.</para> + + <para>% TODO: the output doesn't look like it will be + removed!</para> + + <para>(By the way, if you rename a file, then modify the + renamed-to file, then revert both components of the rename, + when Mercurial restores the file that was removed as part of + the rename, it will be unmodified. 
If you need the + modifications in the renamed-to file to show up in the + renamed-from file, don't forget to copy them over.)</para> + + <para>These fiddly aspects of reverting a rename arguably + constitute a small bug in Mercurial.</para> + + </sect3> + </sect2> + </sect1> + <sect1> + <title>Dealing with committed changes</title> + + <para>Consider a case where you have committed a change $a$, and + another change $b$ on top of it; you then realise that change + $a$ was incorrect. Mercurial lets you <quote>back out</quote> + an entire changeset automatically, and provides building blocks that let + you reverse part of a changeset by hand.</para> + + <para>Before you read this section, here's something to keep in + mind: the <command role="hg-cmd">hg backout</command> command + undoes changes by <emphasis>adding</emphasis> history, not by + modifying or erasing it. It's the right tool to use if you're + fixing bugs, but not if you're trying to undo some change that + has catastrophic consequences. To deal with those, see section + <xref linkend="sec:undo:aaaiiieee"/>.</para> + + <sect2> + <title>Backing out a changeset</title> + + <para>The <command role="hg-cmd">hg backout</command> command + lets you <quote>undo</quote> the effects of an entire + changeset in an automated fashion. Because Mercurial's + history is immutable, this command <emphasis>does + not</emphasis> get rid of the changeset you want to undo. + Instead, it creates a new changeset that + <emphasis>reverses</emphasis> the effect of the to-be-undone + changeset.</para> + + <para>The operation of the <command role="hg-cmd">hg + backout</command> command is a little intricate, so let's + illustrate it with some examples. First, we'll create a + repository with some simple changes.</para> + + &interaction.backout.init; + + <para>The <command role="hg-cmd">hg backout</command> command + takes a single changeset ID as its argument; this is the + changeset to back out. 
Normally, <command role="hg-cmd">hg + backout</command> will drop you into a text editor to write + a commit message, so you can record why you're backing the + change out. In this example, we provide a commit message on + the command line using the <option + role="hg-opt-backout">-m</option> option.</para> + + </sect2> + <sect2> + <title>Backing out the tip changeset</title> + + <para>We're going to start by backing out the last changeset we + committed.</para> + + &interaction.backout.simple; + + <para>You can see that the second line from + <filename>myfile</filename> is no longer present. Taking a + look at the output of <command role="hg-cmd">hg log</command> + gives us an idea of what the <command role="hg-cmd">hg + backout</command> command has done. + &interaction.backout.simple.log; Notice that the new changeset + that <command role="hg-cmd">hg backout</command> has created + is a child of the changeset we backed out. It's easier to see + this in figure <xref + linkend="fig:undo:backout"/>, which presents a graphical + view of the change history. 
As you can see, the history is + nice and linear.</para> + + <informalfigure id="fig:undo:backout"> + <mediaobject><imageobject><imagedata + fileref="undo-simple"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject><caption><para>Backing out + a change using the <command role="hg-cmd">hg + backout</command> + command</para></caption></mediaobject> + + </informalfigure> + + </sect2> + <sect2> + <title>Backing out a non-tip change</title> + + <para>If you want to back out a change other than the last one + you committed, pass the <option + role="hg-opt-backout">--merge</option> option to the + <command role="hg-cmd">hg backout</command> command.</para> + + &interaction.backout.non-tip.clone; + + <para>This makes backing out any changeset a + <quote>one-shot</quote> operation that's usually simple and + fast.</para> + + &interaction.backout.non-tip.backout; + + <para>If you take a look at the contents of + <filename>myfile</filename> after the backout finishes, you'll + see that the first and third changes are present, but not the + second.</para> + + &interaction.backout.non-tip.cat; + + <para>As the graphical history in figure <xref + linkend="fig:undo:backout-non-tip"/> illustrates, Mercurial + actually commits <emphasis>two</emphasis> changes in this kind + of situation (the box-shaped nodes are the ones that Mercurial + commits automatically). Before Mercurial begins the backout + process, it first remembers what the current parent of the + working directory is. It then backs out the target changeset, + and commits that as a changeset. 
Finally, it merges back to + the previous parent of the working directory, and commits the + result of the merge.</para> + + <para>% TODO: to me it looks like mercurial doesn't commit the + second merge automatically!</para> + + <informalfigure id="fig:undo:backout-non-tip"> + <mediaobject><imageobject><imagedata + fileref="undo-non-tip"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject><caption><para>Automated + backout of a non-tip change using the <command + role="hg-cmd">hg backout</command> + command</para></caption></mediaobject> + </informalfigure> + + <para>The result is that you end up <quote>back where you + were</quote>, only with some extra history that undoes the + effect of the changeset you wanted to back out.</para> + + <sect3> + <title>Always use the <option + role="hg-opt-backout">--merge</option> option</title> + + <para>In fact, since the <option + role="hg-opt-backout">--merge</option> option will do the + <quote>right thing</quote> whether or not the changeset + you're backing out is the tip (i.e. it won't try to merge if + it's backing out the tip, since there's no need), you should + <emphasis>always</emphasis> use this option when you run the + <command role="hg-cmd">hg backout</command> command.</para> + + </sect3> + </sect2> + <sect2> + <title>Gaining more control of the backout process</title> + + <para>While I've recommended that you always use the <option + role="hg-opt-backout">--merge</option> option when backing + out a change, the <command role="hg-cmd">hg backout</command> + command lets you decide how to merge a backout changeset. + Taking control of the backout process by hand is something you + will rarely need to do, but it can be useful to understand + what the <command role="hg-cmd">hg backout</command> command + is doing for you automatically. 
To illustrate this, let's + clone our first repository, but omit the backout change that + it contains.</para> + + &interaction.backout.manual.clone; + + <para>As with our + earlier example, we'll commit a third changeset, then back out + its parent, and see what happens.</para> + + &interaction.backout.manual.backout; + + <para>Our new changeset is again a descendant of the changeset + we backed out; it's thus a new head, <emphasis>not</emphasis> + a descendant of the changeset that was the tip. The <command + role="hg-cmd">hg backout</command> command was quite + explicit in telling us this.</para> + + &interaction.backout.manual.log; + + <para>Again, it's easier to see what has happened by looking at + a graph of the revision history, in figure <xref + linkend="fig:undo:backout-manual"/>. This makes it clear + that when we use <command role="hg-cmd">hg backout</command> + to back out a change other than the tip, Mercurial adds a new + head to the repository (the change it committed is + box-shaped).</para> + + <informalfigure id="fig:undo:backout-manual"> + <mediaobject><imageobject><imagedata + fileref="undo-manual"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject><caption><para>Backing out + a change using the <command role="hg-cmd">hg + backout</command> + command</para></caption></mediaobject> + + </informalfigure> + + <para>After the <command role="hg-cmd">hg backout</command> + command has completed, it leaves the new + <quote>backout</quote> changeset as the parent of the working + directory.</para> + + &interaction.backout.manual.parents; + + <para>Now we have two isolated sets of changes.</para> + + &interaction.backout.manual.heads; + + <para>Let's think about what we expect to see as the contents of + <filename>myfile</filename> now. The first change should be + present, because we've never backed it out. The second change + should be missing, as that's the change we backed out. 
Since + the history graph shows the third change as a separate head, + we <emphasis>don't</emphasis> expect to see the third change + present in <filename>myfile</filename>.</para> + + &interaction.backout.manual.cat; + + <para>To get the third change back into the file, we just do a + normal merge of our two heads.</para> + + &interaction.backout.manual.merge; + + <para>Afterwards, the graphical history of our repository looks + like figure + <xref linkend="fig:undo:backout-manual-merge"/>.</para> + + <informalfigure id="fig:undo:backout-manual-merge"> + <mediaobject><imageobject><imagedata + fileref="undo-manual-merge"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject><caption><para>Manually + merging a backout change</para></caption></mediaobject> + + </informalfigure> + + </sect2> + <sect2> + <title>Why <command role="hg-cmd">hg backout</command> works as + it does</title> + + <para>Here's a brief description of how the <command + role="hg-cmd">hg backout</command> command works.</para> + <orderedlist> + <listitem><para>It ensures that the working directory is + <quote>clean</quote>, i.e. that the output of <command + role="hg-cmd">hg status</command> would be empty.</para> + </listitem> + <listitem><para>It remembers the current parent of the working + directory. Let's call this changeset + <literal>orig</literal>.</para> + </listitem> + <listitem><para>It does the equivalent of a <command + role="hg-cmd">hg update</command> to sync the working + directory to the changeset you want to back out. Let's + call this changeset <literal>backout</literal>.</para> + </listitem> + <listitem><para>It finds the parent of that changeset. 
Let's + call that changeset <literal>parent</literal>.</para> + </listitem> + <listitem><para>For each file that the + <literal>backout</literal> changeset affected, it does the + equivalent of a <command role="hg-cmd">hg revert -r + parent</command> on that file, to restore it to the + contents it had before that changeset was + committed.</para> + </listitem> + <listitem><para>It commits the result as a new changeset. + This changeset has <literal>backout</literal> as its + parent.</para> + </listitem> + <listitem><para>If you specify <option + role="hg-opt-backout">--merge</option> on the command + line, it merges with <literal>orig</literal>, and commits + the result of the merge.</para> + </listitem></orderedlist> + + <para>An alternative way to implement the <command + role="hg-cmd">hg backout</command> command would be to + <command role="hg-cmd">hg export</command> the + to-be-backed-out changeset as a diff, then use the <option + role="cmd-opt-patch">--reverse</option> option to the + <command>patch</command> command to reverse the effect of the + change without fiddling with the working directory. 
This + sounds much simpler, but it would not work nearly as + well.</para> + + <para>The reason that <command role="hg-cmd">hg + backout</command> does an update, a commit, a merge, and + another commit is to give the merge machinery the best chance + to do a good job when dealing with all the changes + <emphasis>between</emphasis> the change you're backing out and + the current tip.</para> + + <para>If you're backing out a changeset that's 100 revisions + back in your project's history, the chances that the + <command>patch</command> command will be able to apply a + reverse diff cleanly are not good, because intervening changes + are likely to have <quote>broken the context</quote> that + <command>patch</command> uses to determine whether it can + apply a patch (if this sounds like gibberish, see <xref + linkend="sec:mq:patch"/> for a + discussion of the <command>patch</command> command). Also, + Mercurial's merge machinery will handle files and directories + being renamed, permission changes, and modifications to binary + files, none of which <command>patch</command> can deal + with.</para> + + </sect2> + </sect1> + <sect1 id="sec:undo:aaaiiieee"> + <title>Changes that should never have been</title> + + <para>Most of the time, the <command role="hg-cmd">hg + backout</command> command is exactly what you need if you want + to undo the effects of a change. It leaves a permanent record + of exactly what you did, both when committing the original + changeset and when you cleaned up after it.</para> + + <para>On rare occasions, though, you may find that you've + committed a change that really should not be present in the + repository at all. For example, it would be very unusual, and + usually considered a mistake, to commit a software project's + object files as well as its source files. 
Object files have + almost no intrinsic value, and they're <emphasis>big</emphasis>, + so they increase the size of the repository and the amount of + time it takes to clone or pull changes.</para> + + <para>Before I discuss the options that you have if you commit a + <quote>brown paper bag</quote> change (the kind that's so bad + that you want to pull a brown paper bag over your head), let me + first discuss some approaches that probably won't work.</para> + + <para>Since Mercurial treats history as accumulative&emdash;every + change builds on top of all changes that preceded it&emdash;you + generally can't just make disastrous changes disappear. The one + exception is when you've just committed a change, and it hasn't + been pushed or pulled into another repository. That's when you + can safely use the <command role="hg-cmd">hg rollback</command> + command, as I detailed in section <xref + linkend="sec:undo:rollback"/>.</para> + + <para>After you've pushed a bad change to another repository, you + <emphasis>could</emphasis> still use <command role="hg-cmd">hg + rollback</command> to make your local copy of the change + disappear, but it won't have the consequences you want. The + change will still be present in the remote repository, so it + will reappear in your local repository the next time you + pull.</para> + + <para>If a situation like this arises, and you know which + repositories your bad change has propagated into, you can + <emphasis>try</emphasis> to get rid of the change from + <emphasis>every</emphasis> one of those repositories. This is, + of course, not a satisfactory solution: if you miss even a + single repository while you're expunging, the change is still + <quote>in the wild</quote>, and could propagate further.</para> + + <para>If you've committed one or more changes + <emphasis>after</emphasis> the change that you'd like to see + disappear, your options are further reduced. 
Mercurial doesn't + provide a way to <quote>punch a hole</quote> in history, leaving + the other changesets intact.</para> + + <para>XXX This needs filling out. The + <literal>hg-replay</literal> script in the + <literal>examples</literal> directory works, but doesn't handle + merge changesets. Kind of an important omission.</para> + + <sect2> + <title>Protect yourself from <quote>escaped</quote> + changes</title> + + <para>If you've committed some changes to your local repository + and they've been pushed or pulled somewhere else, this isn't + necessarily a disaster. You can protect yourself ahead of + time against some classes of bad changeset. This is + particularly easy if your team usually pulls changes from a + central repository.</para> + + <para>By configuring some hooks on that repository to validate + incoming changesets (see chapter <xref linkend="chap:hook"/>), + you can + automatically prevent some kinds of bad changeset from being + pushed to the central repository at all. With such a + configuration in place, some kinds of bad changeset will + naturally tend to <quote>die out</quote> because they can't + propagate into the central repository. Better yet, this + happens without any need for explicit intervention.</para> + + <para>For instance, an incoming change hook that verifies that a + changeset will actually compile can prevent people from + inadvertently <quote>breaking the build</quote>.</para> + + </sect2> + </sect1> + <sect1 id="sec:undo:bisect"> + <title>Finding the source of a bug</title> + + <para>While it's all very well to be able to back out a changeset + that introduced a bug, this requires that you know which + changeset to back out. 
Mercurial provides an invaluable + command, called <command role="hg-cmd">hg bisect</command>, that + helps you to automate this process and accomplish it very + efficiently.</para> + + <para>The idea behind the <command role="hg-cmd">hg + bisect</command> command is that a changeset has introduced + some change of behaviour that you can identify with a simple + binary test. You don't know which piece of code introduced the + change, but you know how to test for the presence of the bug. + The <command role="hg-cmd">hg bisect</command> command uses your + test to direct its search for the changeset that introduced the + code that caused the bug.</para> + + <para>Here are a few scenarios to help you understand how you + might apply this command.</para> + <itemizedlist> + <listitem><para>The most recent version of your software has a + bug that you remember wasn't present a few weeks ago, but + you don't know when it was introduced. Here, your binary + test checks for the presence of that bug.</para> + </listitem> + <listitem><para>You fixed a bug in a rush, and now it's time to + close the entry in your team's bug database. The bug + database requires a changeset ID when you close an entry, + but you don't remember which changeset you fixed the bug in. + Once again, your binary test checks for the presence of the + bug.</para> + </listitem> + <listitem><para>Your software works correctly, but runs 15% + slower than the last time you measured it. You want to know + which changeset introduced the performance regression. 
In + this case, your binary test measures the performance of your + software, to see whether it's <quote>fast</quote> or + <quote>slow</quote>.</para> + </listitem> + <listitem><para>The sizes of the components of your project that + you ship exploded recently, and you suspect that something + changed in the way you build your project.</para> + </listitem></itemizedlist> + + <para>From these examples, it should be clear that the <command + role="hg-cmd">hg bisect</command> command is not useful only + for finding the sources of bugs. You can use it to find any + <quote>emergent property</quote> of a repository (anything that + you can't find from a simple text search of the files in the + tree) for which you can write a binary test.</para> + + <para>We'll introduce a little bit of terminology here, just to + make it clear which parts of the search process are your + responsibility, and which are Mercurial's. A + <emphasis>test</emphasis> is something that + <emphasis>you</emphasis> run when <command role="hg-cmd">hg + bisect</command> chooses a changeset. A + <emphasis>probe</emphasis> is what <command role="hg-cmd">hg + bisect</command> runs to tell whether a revision is good. + Finally, we'll use the word <quote>bisect</quote>, as both a + noun and a verb, to stand in for the phrase <quote>search using + the <command role="hg-cmd">hg bisect</command> + command</quote>.</para> + + <para>One simple way to automate the searching process would be + simply to probe every changeset. However, this scales poorly. + If it took ten minutes to test a single changeset, and you had + 10,000 changesets in your repository, the exhaustive approach + would take on average 35 <emphasis>days</emphasis> to find the + changeset that introduced a bug. 
Even if you knew that the bug + was introduced by one of the last 500 changesets, and limited + your search to those, you'd still be looking at over 40 hours to + find the changeset that introduced your bug.</para> + + <para>What the <command role="hg-cmd">hg bisect</command> command + does is use its knowledge of the <quote>shape</quote> of your + project's revision history to perform a search in time + proportional to the <emphasis>logarithm</emphasis> of the number + of changesets to check (the kind of search it performs is called + a dichotomic search). With this approach, searching through + 10,000 changesets will take less than three hours, even at ten + minutes per test (the search will require about 14 tests). + Limit your search to the last hundred changesets, and it will + take only about an hour (roughly seven tests).</para> + + <para>The <command role="hg-cmd">hg bisect</command> command is + aware of the <quote>branchy</quote> nature of a Mercurial + project's revision history, so it has no problems dealing with + branches, merges, or multiple heads in a repository. It can + prune entire branches of history with a single probe, which is + how it operates so efficiently.</para> + + <sect2> + <title>Using the <command role="hg-cmd">hg bisect</command> + command</title> + + <para>Here's an example of <command role="hg-cmd">hg + bisect</command> in action.</para> + + <note> + <para> In versions 0.9.5 and earlier of Mercurial, <command + role="hg-cmd">hg bisect</command> was not a core command: + it was distributed with Mercurial as an extension. 
This + section describes the built-in command, not the old + extension.</para> + </note> + + <para>Now let's create a repository, so that we can try out the + <command role="hg-cmd">hg bisect</command> command in + isolation.</para> + + &interaction.bisect.init; + + <para>We'll simulate a project that has a bug in it in a + simple-minded way: create trivial changes in a loop, and + nominate one specific change that will have the + <quote>bug</quote>. This loop creates 35 changesets, each + adding a single file to the repository. We'll represent our + <quote>bug</quote> with a file that contains the text <quote>i + have a gub</quote>.</para> + + &interaction.bisect.commits; + + <para>The next thing that we'd like to do is figure out how to + use the <command role="hg-cmd">hg bisect</command> command. + We can use Mercurial's normal built-in help mechanism for + this.</para> + + &interaction.bisect.help; + + <para>The <command role="hg-cmd">hg bisect</command> command + works in steps. Each step proceeds as follows.</para> + <orderedlist> + <listitem><para>You run your binary test.</para> + <itemizedlist> + <listitem><para>If the test succeeded, you tell <command + role="hg-cmd">hg bisect</command> by running the + <command role="hg-cmd">hg bisect good</command> + command.</para> + </listitem> + <listitem><para>If it failed, run the <command + role="hg-cmd">hg bisect bad</command> + command.</para></listitem></itemizedlist> + </listitem> + <listitem><para>The command uses your information to decide + which changeset to test next.</para> + </listitem> + <listitem><para>It updates the working directory to that + changeset, and the process begins again.</para> + </listitem></orderedlist> + <para>The process ends when <command role="hg-cmd">hg + bisect</command> identifies a unique changeset that marks + the point where your test transitioned from + <quote>succeeding</quote> to <quote>failing</quote>.</para> + + <para>To start the search, we must run the <command + 
role="hg-cmd">hg bisect --reset</command> command.</para> + + &interaction.bisect.search.init; + + <para>In our case, the binary test we use is simple: we check to + see if any file in the repository contains the string <quote>i + have a gub</quote>. If it does, this changeset contains the + change that <quote>caused the bug</quote>. By convention, a + changeset that has the property we're searching for is + <quote>bad</quote>, while one that doesn't is + <quote>good</quote>.</para> + + <para>Most of the time, the revision to which the working + directory is synced (usually the tip) already exhibits the + problem introduced by the buggy change, so we'll mark it as + <quote>bad</quote>.</para> + + &interaction.bisect.search.bad-init; + + <para>Our next task is to nominate a changeset that we know + <emphasis>doesn't</emphasis> have the bug; the <command + role="hg-cmd">hg bisect</command> command will + <quote>bracket</quote> its search between the first pair of + good and bad changesets. In our case, we know that revision + 10 didn't have the bug. (I'll have more words about choosing + the first <quote>good</quote> changeset later.)</para> + + &interaction.bisect.search.good-init; + + <para>Notice that this command printed some output.</para> + <itemizedlist> + <listitem><para>It told us how many changesets it must + consider before it can identify the one that introduced + the bug, and how many tests that will require.</para> + </listitem> + <listitem><para>It updated the working directory to the next + changeset to test, and told us which changeset it's + testing.</para> + </listitem></itemizedlist> + + <para>We now run our test in the working directory. We use the + <command>grep</command> command to see if our + <quote>bad</quote> file is present in the working directory. + If it is, this revision is bad; if not, this revision is good. 
+ &interaction.bisect.search.step1;</para> + + <para>This test looks like a perfect candidate for automation, + so let's turn it into a shell function.</para> + &interaction.bisect.search.mytest; + + <para>We can now run an entire test step with a single command, + <literal>mytest</literal>.</para> + + &interaction.bisect.search.step2; + + <para>A few more invocations of our canned test step command, + and we're done.</para> + + &interaction.bisect.search.rest; + + <para>Even though we had 35 changesets to search through, the + <command role="hg-cmd">hg bisect</command> command let us find + the changeset that introduced our <quote>bug</quote> with only + five tests. Because the number of tests that the <command + role="hg-cmd">hg bisect</command> command performs grows + logarithmically with the number of changesets to search, the + advantage that it has over the <quote>brute force</quote> + search approach increases with every changeset you add.</para> + + </sect2> + <sect2> + <title>Cleaning up after your search</title> + + <para>When you're finished using the <command role="hg-cmd">hg + bisect</command> command in a repository, you can use the + <command role="hg-cmd">hg bisect --reset</command> command to + drop the information it was using to drive your search. This + information doesn't take up much space, so it doesn't matter if you + forget to run this command. However, <command + role="hg-cmd">hg bisect</command> won't let you start a new + search in that repository until you do a <command + role="hg-cmd">hg bisect --reset</command>.</para> + + &interaction.bisect.search.reset; + + </sect2> + </sect1> + <sect1> + <title>Tips for finding bugs effectively</title> + + <sect2> + <title>Give consistent input</title> + + <para>The <command role="hg-cmd">hg bisect</command> command + requires that you correctly report the result of every test + you perform. 
If you tell it that a test failed when it really + succeeded, it <emphasis>might</emphasis> be able to detect the + inconsistency. If it can identify an inconsistency in your + reports, it will tell you that a particular changeset is both + good and bad. However, it can't do this perfectly; it's about + as likely to report the wrong changeset as the source of the + bug.</para> + + </sect2> + <sect2> + <title>Automate as much as possible</title> + + <para>When I started using the <command role="hg-cmd">hg + bisect</command> command, I tried a few times to run my + tests by hand, on the command line. This is an approach that + I, at least, am not suited to. After a few tries, I found + that I was making enough mistakes that I was having to restart + my searches several times before finally getting correct + results.</para> + + <para>My initial problems with driving the <command + role="hg-cmd">hg bisect</command> command by hand occurred + even with simple searches on small repositories; if the + problem you're looking for is more subtle, or the number of + tests that <command role="hg-cmd">hg bisect</command> must + perform increases, the likelihood of operator error ruining + the search is much higher. Once I started automating my + tests, I had much better results.</para> + + <para>The key to automated testing is twofold:</para> + <itemizedlist> + <listitem><para>always test for the same symptom, and</para> + </listitem> + <listitem><para>always feed consistent input to the <command + role="hg-cmd">hg bisect</command> command.</para> + </listitem></itemizedlist> + <para>In my tutorial example above, the <command>grep</command> + command tests for the symptom, and the <literal>if</literal> + statement takes the result of this check and ensures that we + always feed the same input to the <command role="hg-cmd">hg + bisect</command> command. 
The <literal>mytest</literal> + function marries these together in a reproducible way, so that + every test is uniform and consistent.</para> + + </sect2> + <sect2> + <title>Check your results</title> + + <para>Because the output of a <command role="hg-cmd">hg + bisect</command> search is only as good as the input you + give it, don't take the changeset it reports as the absolute + truth. A simple way to cross-check its report is to manually + run your test at each of the following changesets:</para> + <itemizedlist> + <listitem><para>The changeset that it reports as the first bad + revision. Your test should still report this as + bad.</para> + </listitem> + <listitem><para>The parent of that changeset (either parent, + if it's a merge). Your test should report this changeset + as good.</para> + </listitem> + <listitem><para>A child of that changeset. Your test should + report this changeset as bad.</para> + </listitem></itemizedlist> + + </sect2> + <sect2> + <title>Beware interference between bugs</title> + + <para>It's possible that your search for one bug could be + disrupted by the presence of another. For example, let's say + your software crashes at revision 100, and worked correctly at + revision 50. Unknown to you, someone else introduced a + different crashing bug at revision 60, and fixed it at + revision 80. This could distort your results in one of + several ways.</para> + + <para>It is possible that this other bug completely + <quote>masks</quote> yours, which is to say that it occurs + before your bug has a chance to manifest itself. If you can't + avoid that other bug (for example, it prevents your project + from building), and so can't tell whether your bug is present + in a particular changeset, the <command role="hg-cmd">hg + bisect</command> command cannot help you directly. 
Instead, + you can mark a changeset as untested by running <command + role="hg-cmd">hg bisect --skip</command>.</para> + + <para>A different problem could arise if your test for a bug's + presence is not specific enough. If you check for <quote>my + program crashes</quote>, then both your crashing bug and an + unrelated crashing bug that masks it will look like the same + thing, and mislead <command role="hg-cmd">hg + bisect</command>.</para> + + <para>Another useful situation in which to use <command + role="hg-cmd">hg bisect --skip</command> is if you can't + test a revision because your project was in a broken and hence + untestable state at that revision, perhaps because someone + checked in a change that prevented the project from + building.</para> + + </sect2> + <sect2> + <title>Bracket your search lazily</title> + + <para>Choosing the first <quote>good</quote> and + <quote>bad</quote> changesets that will mark the end points of + your search is often easy, but it bears a little discussion + nevertheless. From the perspective of <command + role="hg-cmd">hg bisect</command>, the <quote>newest</quote> + changeset is conventionally <quote>bad</quote>, and the older + changeset is <quote>good</quote>.</para> + + <para>If you're having trouble remembering when a suitable + <quote>good</quote> change was, so that you can tell <command + role="hg-cmd">hg bisect</command>, you could do worse than + testing changesets at random. 
Just remember to eliminate + contenders that can't possibly exhibit the bug (perhaps + because the feature with the bug isn't present yet) and those + where another problem masks the bug (as I discussed + above).</para> + + <para>Even if you end up <quote>early</quote> by thousands of + changesets or months of history, you will only add a handful + of tests to the total number that <command role="hg-cmd">hg + bisect</command> must perform, thanks to its logarithmic + behaviour.</para> + + </sect2> + </sect1> +</chapter> + +<!-- +local variables: +sgml-parent-document: ("00book.xml" "book" "chapter") +end: +-->
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch09-hook.xml Thu Mar 19 20:54:12 2009 -0700 @@ -0,0 +1,2037 @@ +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> + +<chapter id="chap:hook"> + <?dbhtml filename="handling-repository-events-with-hooks.html"?> + <title>Handling repository events with hooks</title> + + <para>Mercurial offers a powerful mechanism to let you perform + automated actions in response to events that occur in a + repository. In some cases, you can even control Mercurial's + response to those events.</para> + + <para>The name Mercurial uses for one of these actions is a + <emphasis>hook</emphasis>. Hooks are called + <quote>triggers</quote> in some revision control systems, but the + two names refer to the same idea.</para> + + <sect1> + <title>An overview of hooks in Mercurial</title> + + <para>Here is a brief list of the hooks that Mercurial supports. + We will revisit each of these hooks in more detail later, in + section <xref linkend="sec:hook:ref"/>.</para> + + <itemizedlist> + <listitem><para><literal role="hook">changegroup</literal>: This + is run after a group of changesets has been brought into the + repository from elsewhere.</para> + </listitem> + <listitem><para><literal role="hook">commit</literal>: This is + run after a new changeset has been created in the local + repository.</para> + </listitem> + <listitem><para><literal role="hook">incoming</literal>: This is + run once for each new changeset that is brought into the + repository from elsewhere. 
Notice the difference from + <literal role="hook">changegroup</literal>, which is run + once per <emphasis>group</emphasis> of changesets brought + in.</para> + </listitem> + <listitem><para><literal role="hook">outgoing</literal>: This is + run after a group of changesets has been transmitted from + this repository.</para> + </listitem> + <listitem><para><literal role="hook">prechangegroup</literal>: + Controlling. This is run before starting to bring a group of + changesets into the repository. + </para> + </listitem> + <listitem><para><literal role="hook">precommit</literal>: + Controlling. This is run before starting a commit. + </para> + </listitem> + <listitem><para><literal role="hook">preoutgoing</literal>: + Controlling. This is run before starting to transmit a group + of changesets from this repository. + </para> + </listitem> + <listitem><para><literal role="hook">pretag</literal>: + Controlling. This is run before creating a tag. + </para> + </listitem> + <listitem><para><literal + role="hook">pretxnchangegroup</literal>: Controlling. This + is run after a group of changesets has been brought into the + local repository from another, but before the transaction + completes that will make the changes permanent in the + repository. + </para> + </listitem> + <listitem><para><literal role="hook">pretxncommit</literal>: + Controlling. This is run after a new changeset has been + created in the local repository, but before the transaction + completes that will make it permanent. + </para> + </listitem> + <listitem><para><literal role="hook">preupdate</literal>: + Controlling. This is run before starting an update or merge + of the working directory. + </para> + </listitem> + <listitem><para><literal role="hook">tag</literal>: This is run + after a tag is created. + </para> + </listitem> + <listitem><para><literal role="hook">update</literal>: This is + run after an update or merge of the working directory has + finished. 
+ </para> + </listitem></itemizedlist> + <para>Each of the hooks whose description begins with the word + <quote>Controlling</quote> has the ability to determine whether + an activity can proceed. If the hook succeeds, the activity may + proceed; if it fails, the activity is either not permitted or + undone, depending on the hook. + </para> + + </sect1> + <sect1> + <title>Hooks and security</title> + + <sect2> + <title>Hooks are run with your privileges</title> + + <para>When you run a Mercurial command in a repository, and the + command causes a hook to run, that hook runs on + <emphasis>your</emphasis> system, under + <emphasis>your</emphasis> user account, with + <emphasis>your</emphasis> privilege level. Since hooks are + arbitrary pieces of executable code, you should treat them + with an appropriate level of suspicion. Do not install a hook + unless you are confident that you know who created it and what + it does. + </para> + + <para>In some cases, you may be exposed to hooks that you did + not install yourself. If you work with Mercurial on an + unfamiliar system, Mercurial will run hooks defined in that + system's global <filename role="special">~/.hgrc</filename> + file. + </para> + + <para>If you are working with a repository owned by another + user, Mercurial can run hooks defined in that user's + repository, but it will still run them as <quote>you</quote>. + For example, if you <command role="hg-cmd">hg pull</command> + from that repository, and its <filename + role="special">.hg/hgrc</filename> defines a local <literal + role="hook">outgoing</literal> hook, that hook will run + under your user account, even though you don't own that + repository. + </para> + + <note> + <para> This only applies if you are pulling from a repository + on a local or network filesystem. If you're pulling over + http or ssh, any <literal role="hook">outgoing</literal> + hook will run under whatever account is executing the server + process, on the server. 
+ </para> + </note> + + <para>To see what hooks are defined in a repository, use the + <command role="hg-cmd">hg config hooks</command> command. If + you are working in one repository, but talking to another that + you do not own (e.g. using <command role="hg-cmd">hg + pull</command> or <command role="hg-cmd">hg + incoming</command>), remember that it is the other + repository's hooks you should be checking, not your own. + </para> + + </sect2> + <sect2> + <title>Hooks do not propagate</title> + + <para>In Mercurial, hooks are not revision controlled, and do + not propagate when you clone, or pull from, a repository. The + reason for this is simple: a hook is a completely arbitrary + piece of executable code. It runs under your user identity, + with your privilege level, on your machine. + </para> + + <para>It would be extremely reckless for any distributed + revision control system to implement revision-controlled + hooks, as this would offer an easily exploitable way to + subvert the accounts of users of the revision control system. + </para> + + <para>Since Mercurial does not propagate hooks, if you are + collaborating with other people on a common project, you + should not assume that they are using the same Mercurial hooks + as you are, or that theirs are correctly configured. You + should document the hooks you expect people to use. + </para> + + <para>In a corporate intranet, this is somewhat easier to + control, as you can for example provide a + <quote>standard</quote> installation of Mercurial on an NFS + filesystem, and use a site-wide <filename role="special">~/.hgrc</filename> file to define hooks that all users will + see. However, this too has its limits; see below. + </para> + + </sect2> + <sect2> + <title>Hooks can be overridden</title> + + <para>Mercurial allows you to override a hook definition by + redefining the hook. You can disable it by setting its value + to the empty string, or change its behaviour as you wish. 
+ </para> + + <para>If you deploy a system- or site-wide <filename + role="special">~/.hgrc</filename> file that defines some + hooks, you should thus understand that your users can disable + or override those hooks. + </para> + + </sect2> + <sect2> + <title>Ensuring that critical hooks are run</title> + + <para>Sometimes you may want to enforce a policy that you do not + want others to be able to work around. For example, you may + have a requirement that every changeset must pass a rigorous + set of tests. Defining this requirement via a hook in a + site-wide <filename role="special">~/.hgrc</filename> won't + work for remote users on laptops, and of course local users + can subvert it at will by overriding the hook. + </para> + + <para>Instead, you can set up your policies for use of Mercurial + so that people are expected to propagate changes through a + well-known <quote>canonical</quote> server that you have + locked down and configured appropriately. + </para> + + <para>One way to do this is via a combination of social + engineering and technology. Set up a restricted-access + account; users can push changes over the network to + repositories managed by this account, but they cannot log into + the account and run normal shell commands. In this scenario, + a user can commit a changeset that contains any old garbage + they want. + </para> + + <para>When someone pushes a changeset to the server that + everyone pulls from, the server will test the changeset before + it accepts it as permanent, and reject it if it fails to pass + the test suite. If people only pull changes from this + filtering server, it will serve to ensure that all changes + that people pull have been automatically vetted. 
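+ As a minimal sketch of such a server-side check, an entry like + the following could go in the <filename + role="special">.hg/hgrc</filename> of the repository on the + canonical server. The hook name suffix and the script path are + hypothetical; the script is assumed to test the incoming + changesets and to exit with a non-zero status if they fail. + <programlisting>[hooks]
+pretxnchangegroup.test = /path/to/run-test-suite</programlisting>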
+ </para> + + </sect2> + </sect1> + <sect1> + <title>Care with <literal>pretxn</literal> hooks in a + shared-access repository</title> + + <para>If you want to use hooks to do some automated work in a + repository that a number of people have shared access to, you + need to be careful in how you do this. + </para> + + <para>Mercurial only locks a repository when it is writing to the + repository, and only the parts of Mercurial that write to the + repository pay attention to locks. Write locks are necessary to + prevent multiple simultaneous writers from scribbling on each + other's work, corrupting the repository. + </para> + + <para>Because Mercurial is careful with the order in which it + reads and writes data, it does not need to acquire a lock when + it wants to read data from the repository. The parts of + Mercurial that read from the repository never pay attention to + locks. This lockless reading scheme greatly increases + performance and concurrency. + </para> + + <para>With great performance comes a trade-off, though, one which + has the potential to cause you trouble unless you're aware of + it. To describe this requires a little detail about how + Mercurial adds changesets to a repository and reads those + changes. + </para> + + <para>When Mercurial <emphasis>writes</emphasis> metadata, it + writes it straight into the destination file. It writes file + data first, then manifest data (which contains pointers to the + new file data), then changelog data (which contains pointers to + the new manifest data). Before the first write to each file, it + stores a record of where the end of the file was in its + transaction log. If the transaction must be rolled back, + Mercurial simply truncates each file back to the size it was + before the transaction began. + </para> + + <para>When Mercurial <emphasis>reads</emphasis> metadata, it reads + the changelog first, then everything else. 
Since a reader will + only access parts of the manifest or file metadata that it can + see in the changelog, it can never see partially written data. + </para> + + <para>Some controlling hooks (<literal + role="hook">pretxncommit</literal> and <literal + role="hook">pretxnchangegroup</literal>) run when a + transaction is almost complete. All of the metadata has been + written, but Mercurial can still roll the transaction back and + cause the newly-written data to disappear. + </para> + + <para>If one of these hooks runs for long, it opens a window of + time during which a reader can see the metadata for changesets + that are not yet permanent, and should not be thought of as + <quote>really there</quote>. The longer the hook runs, the + longer that window is open. + </para> + + <sect2> + <title>The problem illustrated</title> + + <para>In principle, a good use for the <literal + role="hook">pretxnchangegroup</literal> hook would be to + automatically build and test incoming changes before they are + accepted into a central repository. This could let you + guarantee that nobody can push changes to this repository that + <quote>break the build</quote>. But if a client can pull + changes while they're being tested, the usefulness of the test + is zero; an unsuspecting someone can pull untested changes, + potentially breaking their build. + </para> + + <para>The safest technological answer to this challenge is to + set up such a <quote>gatekeeper</quote> repository as + <emphasis>unidirectional</emphasis>. Let it take changes + pushed in from the outside, but do not allow anyone to pull + changes from it (use the <literal + role="hook">preoutgoing</literal> hook to lock it down). + Configure a <literal role="hook">changegroup</literal> hook so + that if a build or test succeeds, the hook will push the new + changes out to another repository that people + <emphasis>can</emphasis> pull from. 
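+ One possible shape for this configuration is sketched here, + assuming a POSIX shell; the hook name suffixes, the script and + the repository path are all hypothetical. Mercurial sets + <envar>HG_SOURCE</envar> to <literal>push</literal> when the + gatekeeper itself pushes, so the first hook lets that through + while refusing other outgoing transfers. + <programlisting>[hooks]
+preoutgoing.lockdown = test "$HG_SOURCE" = push
+changegroup.publish = /path/to/build-and-test &amp;&amp; hg push /path/to/public-repo</programlisting>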
+ </para> + + <para>In practice, putting a centralised bottleneck like this in + place is not often a good idea, and transaction visibility has + nothing to do with the problem. As the size of a + project&emdash;and the time it takes to build and + test&emdash;grows, you rapidly run into a wall with this + <quote>try before you buy</quote> approach, where you have + more changesets to test than time in which to deal with them. + The inevitable result is frustration on the part of all + involved. + </para> + + <para>An approach that scales better is to get people to build + and test before they push, then run automated builds and tests + centrally <emphasis>after</emphasis> a push, to be sure all is + well. The advantage of this approach is that it does not + impose a limit on the rate at which the repository can accept + changes. + </para> + + </sect2> + </sect1> + <sect1 id="sec:hook:simple"> + <title>A short tutorial on using hooks</title> + + <para>It is easy to write a Mercurial hook. Let's start with a + hook that runs when you finish a <command role="hg-cmd">hg + commit</command>, and simply prints the hash of the changeset + you just created. The hook is called <literal + role="hook">commit</literal>. + </para> + + <para>All hooks follow the pattern in this example.</para> + +&interaction.hook.simple.init; + + <para>You add an entry to the <literal + role="rc-hooks">hooks</literal> section of your <filename + role="special">~/.hgrc</filename>. On the left is the name of + the event to trigger on; on the right is the action to take. As + you can see, you can run an arbitrary shell command in a hook. + Mercurial passes extra information to the hook using environment + variables (look for <envar>HG_NODE</envar> in the example). 
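+ As a minimal sketch, assuming the hook does nothing more than + echo the hash of the new changeset (the command on the right is + illustrative), such an entry might look like this: + <programlisting>[hooks]
+commit = echo committed $HG_NODE</programlisting>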
+ </para> + + <sect2> + <title>Performing multiple actions per event</title> + + <para>Quite often, you will want to define more than one hook + for a particular kind of event, as shown below.</para> + +&interaction.hook.simple.ext; + + <para>Mercurial lets you do this by adding an + <emphasis>extension</emphasis> to the end of a hook's name. + You extend a hook's name by giving the name of the hook, + followed by a full stop (the + <quote><literal>.</literal></quote> character), followed by + some more text of your choosing. For example, Mercurial will + run both <literal>commit.foo</literal> and + <literal>commit.bar</literal> when the + <literal>commit</literal> event occurs. + </para> + + <para>To give a well-defined order of execution when there are + multiple hooks defined for an event, Mercurial sorts hooks by + extension, and executes the hook commands in this sorted + order. In the above example, it will execute + <literal>commit.bar</literal> before + <literal>commit.foo</literal>, and <literal>commit</literal> + before both. + </para> + + <para>It is a good idea to use a somewhat descriptive extension + when you define a new hook. This will help you to remember + what the hook was for. If the hook fails, you'll get an error + message that contains the hook name and extension, so using a + descriptive extension could give you an immediate hint as to + why the hook failed (see section <xref + linkend="sec:hook:perm"/> for an example). + </para> + + </sect2> + <sect2 id="sec:hook:perm"> + <title>Controlling whether an activity can proceed</title> + + <para>In our earlier examples, we used the <literal + role="hook">commit</literal> hook, which is run after a + commit has completed. This is one of several Mercurial hooks + that run after an activity finishes. Such hooks have no way + of influencing the activity itself. + </para> + + <para>Mercurial defines a number of events that occur before an + activity starts; or after it starts, but before it finishes. 
+ Hooks that trigger on these events have the added ability to + choose whether the activity can continue, or will abort. + </para> + + <para>The <literal role="hook">pretxncommit</literal> hook runs + after a commit has all but completed. In other words, the + metadata representing the changeset has been written out to + disk, but the transaction has not yet been allowed to + complete. The <literal role="hook">pretxncommit</literal> + hook has the ability to decide whether the transaction can + complete, or must be rolled back. + </para> + + <para>If the <literal role="hook">pretxncommit</literal> hook + exits with a status code of zero, the transaction is allowed + to complete; the commit finishes; and the <literal + role="hook">commit</literal> hook is run. If the <literal + role="hook">pretxncommit</literal> hook exits with a + non-zero status code, the transaction is rolled back; the + metadata representing the changeset is erased; and the + <literal role="hook">commit</literal> hook is not run. + </para> + +&interaction.hook.simple.pretxncommit; + + <para>The hook in the example above checks that a commit comment + contains a bug ID. If it does, the commit can complete. If + not, the commit is rolled back. + </para> + + </sect2> + </sect1> + <sect1> + <title>Writing your own hooks</title> + + <para>When you are writing a hook, you might find it useful to run + Mercurial either with the <option + role="hg-opt-global">-v</option> option, or the <envar + role="rc-item-ui">verbose</envar> config item set to + <quote>true</quote>. When you do so, Mercurial will print a + message before it calls each hook. + </para> + + <sect2 id="sec:hook:lang"> + <title>Choosing how your hook should run</title> + + <para>You can write a hook either as a normal + program&emdash;typically a shell script&emdash;or as a Python + function that is executed within the Mercurial process. 
+ </para> + + <para>Writing a hook as an external program has the advantage + that it requires no knowledge of Mercurial's internals. You + can call normal Mercurial commands to get any added + information you need. The trade-off is that external hooks + are slower than in-process hooks. + </para> + + <para>An in-process Python hook has complete access to the + Mercurial API, and does not <quote>shell out</quote> to + another process, so it is inherently faster than an external + hook. It is also easier to obtain much of the information + that a hook requires by using the Mercurial API than by + running Mercurial commands. + </para> + + <para>If you are comfortable with Python, or require high + performance, writing your hooks in Python may be a good + choice. However, when you have a straightforward hook to + write and you don't need to care about performance (probably + the majority of hooks), a shell script is perfectly fine. + </para> + + </sect2> + <sect2 id="sec:hook:param"> + <title>Hook parameters</title> + + <para>Mercurial calls each hook with a set of well-defined + parameters. In Python, a parameter is passed as a keyword + argument to your hook function. For an external program, a + parameter is passed as an environment variable. + </para> + + <para>Whether your hook is written in Python or as a shell + script, the hook-specific parameter names and values will be + the same. A boolean parameter will be represented as a + boolean value in Python, but as the number 1 (for + <quote>true</quote>) or 0 (for <quote>false</quote>) as an + environment variable for an external hook. If a hook + parameter is named <literal>foo</literal>, the keyword + argument for a Python hook will also be named + <literal>foo</literal>, while the environment variable for an + external hook will be named <literal>HG_FOO</literal>. 
+ </para> + + </sect2> + <sect2> + <title>Hook return values and activity control</title> + + <para>A hook that executes successfully must exit with a status + of zero if external, or return boolean <quote>false</quote> if + in-process. Failure is indicated with a non-zero exit status + from an external hook, or an in-process hook returning boolean + <quote>true</quote>. If an in-process hook raises an + exception, the hook is considered to have failed. + </para> + + <para>For a hook that controls whether an activity can proceed, + zero/false means <quote>allow</quote>, while + non-zero/true/exception means <quote>deny</quote>. + </para> + + </sect2> + <sect2> + <title>Writing an external hook</title> + + <para>When you define an external hook in your <filename + role="special">~/.hgrc</filename> and the hook is run, its + value is passed to your shell, which interprets it. This + means that you can use normal shell constructs in the body of + the hook. + </para> + + <para>An executable hook is always run with its current + directory set to a repository's root directory. + </para> + + <para>Each hook parameter is passed in as an environment + variable; the name is upper-cased, and prefixed with the + string <quote><literal>HG_</literal></quote>. + </para> + + <para>With the exception of hook parameters, Mercurial does not + set or modify any environment variables when running a hook. + This is useful to remember if you are writing a site-wide hook + that may be run by a number of different users with differing + environment variables set. In multi-user situations, you + should not rely on environment variables being set to the + values you have in your environment when testing the hook. + </para> + + </sect2> + <sect2> + <title>Telling Mercurial to use an in-process hook</title> + + <para>The <filename role="special">~/.hgrc</filename> syntax + for defining an in-process hook is slightly different than for + an executable hook. 
The value of the hook must start with the + text <quote><literal>python:</literal></quote>, and continue + with the fully-qualified name of a callable object to use as + the hook's value. + </para> + + <para>The module in which a hook lives is automatically imported + when a hook is run. So long as you have the module name and + <envar>PYTHONPATH</envar> right, it should <quote>just + work</quote>. + </para> + + <para>The following <filename role="special">~/.hgrc</filename> + example snippet illustrates the syntax and meaning of the + notions we just described. + </para> + <programlisting>[hooks] +commit.example = python:mymodule.submodule.myhook</programlisting> + <para>When Mercurial runs the <literal>commit.example</literal> + hook, it imports <literal>mymodule.submodule</literal>, looks + for the callable object named <literal>myhook</literal>, and + calls it. + </para> + + </sect2> + <sect2> + <title>Writing an in-process hook</title> + + <para>The simplest in-process hook does nothing, but illustrates + the basic shape of the hook API: + </para> + <programlisting>def myhook(ui, repo, **kwargs): + pass</programlisting> + <para>The first argument to a Python hook is always a <literal + role="py-mod-mercurial.ui">ui</literal> object. The second + is a repository object; at the moment, it is always an + instance of <literal + role="py-mod-mercurial.localrepo">localrepository</literal>. + Following these two arguments are other keyword arguments. + Which ones are passed in depends on the hook being called, but + a hook can ignore arguments it doesn't care about by dropping + them into a keyword argument dict, as with + <literal>**kwargs</literal> above. + </para> + + </sect2> + </sect1> + <sect1> + <title>Some hook examples</title> + + <sect2> + <title>Writing meaningful commit messages</title> + + <para>It's hard to imagine a useful commit message being very + short. 
The simple <literal role="hook">pretxncommit</literal> + hook of the example below will prevent you from committing a + changeset with a message that is less than ten bytes long. + </para> + +&interaction.hook.msglen.go; + + </sect2> + <sect2> + <title>Checking for trailing whitespace</title> + + <para>An interesting use of a commit-related hook is to help you + to write cleaner code. A simple example of <quote>cleaner + code</quote> is the dictum that a change should not add any + new lines of text that contain <quote>trailing + whitespace</quote>. Trailing whitespace is a series of + space and tab characters at the end of a line of text. In + most cases, trailing whitespace is unnecessary, invisible + noise, but it is occasionally problematic, and people often + prefer to get rid of it. + </para> + + <para>You can use either the <literal + role="hook">precommit</literal> or <literal + role="hook">pretxncommit</literal> hook to tell whether you + have a trailing whitespace problem. If you use the <literal + role="hook">precommit</literal> hook, the hook will not know + which files you are committing, so it will have to check every + modified file in the repository for trailing whitespace. If + you want to commit a change to just the file + <filename>foo</filename>, but the file + <filename>bar</filename> contains trailing whitespace, doing a + check in the <literal role="hook">precommit</literal> hook + will prevent you from committing <filename>foo</filename> due + to the problem with <filename>bar</filename>. This doesn't + seem right. + </para> + + <para>Should you choose the <literal + role="hook">pretxncommit</literal> hook, the check won't + occur until just before the transaction for the commit + completes. This will allow you to check for problems in only the + exact files that are being committed. 
However, if you entered + the commit message interactively and the hook fails, the + transaction will roll back; you'll have to re-enter the commit + message after you fix the trailing whitespace and run <command + role="hg-cmd">hg commit</command> again. + </para> + +&interaction.hook.ws.simple; + + <para>In this example, we introduce a simple <literal + role="hook">pretxncommit</literal> hook that checks for + trailing whitespace. This hook is short, but not very + helpful. It exits with an error status if a change adds a + line with trailing whitespace to any file, but does not print + any information that might help us to identify the offending + file or line. It also has the nice property of not paying + attention to unmodified lines; only lines that introduce new + trailing whitespace cause problems. + </para> + + <para>The following version is much more complex, but also more + useful. It parses a unified diff to see if any lines add + trailing whitespace, and prints the name of the file and the + line number of each such occurrence. Even better, if the + change adds trailing whitespace, this hook saves the commit + comment and prints the name of the save file before exiting + and telling Mercurial to roll the transaction back, so you can + use the <option role="hg-opt-commit">-l filename</option> + option to <command role="hg-cmd">hg commit</command> to reuse + the saved commit message once you've corrected the problem. + </para> + +&interaction.hook.ws.better; + + <para>As a final aside, note in the example above the use of + <command>perl</command>'s in-place editing feature to get rid + of trailing whitespace from a file. This is concise and + useful enough that I will reproduce it here. + </para> + <programlisting>perl -pi -e 's,\s+$,,' filename</programlisting> + + </sect2> + </sect1> + <sect1> + <title>Bundled hooks</title> + + <para>Mercurial ships with several bundled hooks. 
You can find + them in the <filename class="directory">hgext</filename> + directory of a Mercurial source tree. If you are using a + Mercurial binary package, the hooks will be located in the + <filename class="directory">hgext</filename> directory of + wherever your package installer put Mercurial. + </para> + + <sect2> + <title><literal role="hg-ext">acl</literal>&emdash;access + control for parts of a repository</title> + + <para>The <literal role="hg-ext">acl</literal> extension lets + you control which remote users are allowed to push changesets + to a networked server. You can protect any portion of a + repository (including the entire repo), so that a specific + remote user can push changes that do not affect the protected + portion. + </para> + + <para>This extension implements access control based on the + identity of the user performing a push, + <emphasis>not</emphasis> on who committed the changesets + they're pushing. It makes sense to use this hook only if you + have a locked-down server environment that authenticates + remote users, and you want to be sure that only specific users + are allowed to push changes to that server. + </para> + + <sect3> + <title>Configuring the <literal role="hook">acl</literal> + hook</title> + + <para>In order to manage incoming changesets, the <literal + role="hg-ext">acl</literal> hook must be used as a + <literal role="hook">pretxnchangegroup</literal> hook. This + lets it see which files are modified by each incoming + changeset, and roll back a group of changesets if they + modify <quote>forbidden</quote> files. Example: + </para> + <programlisting>[hooks] +pretxnchangegroup.acl = python:hgext.acl.hook</programlisting> + + <para>The <literal role="hg-ext">acl</literal> extension is + configured using three sections. 
+ </para> + + <para>The <literal role="rc-acl">acl</literal> section has + only one entry, <envar role="rc-item-acl">sources</envar>, + which lists the sources of incoming changesets that the hook + should pay attention to. You don't normally need to + configure this section. + </para> + <itemizedlist> + <listitem><para><envar role="rc-item-acl">serve</envar>: + Control incoming changesets that are arriving from a + remote repository over http or ssh. This is the default + value of <envar role="rc-item-acl">sources</envar>, and + usually the only setting you'll need for this + configuration item. + </para> + </listitem> + <listitem><para><envar role="rc-item-acl">pull</envar>: + Control incoming changesets that are arriving via a pull + from a local repository. + </para> + </listitem> + <listitem><para><envar role="rc-item-acl">push</envar>: + Control incoming changesets that are arriving via a push + from a local repository. + </para> + </listitem> + <listitem><para><envar role="rc-item-acl">bundle</envar>: + Control incoming changesets that are arriving from + another repository via a bundle. + </para> + </listitem></itemizedlist> + + <para>The <literal role="rc-acl.allow">acl.allow</literal> + section controls the users that are allowed to add + changesets to the repository. If this section is not + present, all users that are not explicitly denied are + allowed. If this section is present, all users that are not + explicitly allowed are denied (so an empty section means + that all users are denied). + </para> + + <para>The <literal role="rc-acl.deny">acl.deny</literal> + section determines which users are denied from adding + changesets to the repository. If this section is not + present or is empty, no users are denied. + </para> + + <para>The syntaxes for the <literal + role="rc-acl.allow">acl.allow</literal> and <literal + role="rc-acl.deny">acl.deny</literal> sections are + identical. 
On the left of each entry is a glob pattern that + matches files or directories, relative to the root of the + repository; on the right, a user name. + </para> + + <para>In the following example, the user + <literal>docwriter</literal> can only push changes to the + <filename class="directory">docs</filename> subtree of the + repository, while <literal>intern</literal> can push changes + to any file or directory except <filename + class="directory">source/sensitive</filename>. + </para> + <programlisting>[acl.allow] +docs/** = docwriter +[acl.deny] +source/sensitive/** = intern</programlisting> + + </sect3> + <sect3> + <title>Testing and troubleshooting</title> + + <para>If you want to test the <literal + role="hg-ext">acl</literal> hook, run it with Mercurial's + debugging output enabled. Since you'll probably be running + it on a server where it's not convenient (or sometimes + possible) to pass in the <option + role="hg-opt-global">--debug</option> option, don't forget + that you can enable debugging output in your <filename + role="special">~/.hgrc</filename>: + </para> + <programlisting>[ui] +debug = true</programlisting> + <para>With this enabled, the <literal + role="hg-ext">acl</literal> hook will print enough + information to let you figure out why it is allowing or + forbidding pushes from specific users. + </para> + + </sect3> + </sect2> + <sect2> + <title><literal + role="hg-ext">bugzilla</literal>&emdash;integration with + Bugzilla</title> + + <para>The <literal role="hg-ext">bugzilla</literal> extension + adds a comment to a Bugzilla bug whenever it finds a reference + to that bug ID in a commit comment. You can install this hook + on a shared server, so that any time a remote user pushes + changes to this server, the hook gets run. 
+ </para> + + <para>It adds a comment to the bug that looks like this (you can + configure the contents of the comment&emdash;see below): + </para> + <programlisting>Changeset aad8b264143a, made by Joe User + <joe.user@domain.com> in the frobnitz repository, refers + to this bug. For complete details, see + http://hg.domain.com/frobnitz?cmd=changeset;node=aad8b264143a + Changeset description: Fix bug 10483 by guarding against some + NULL pointers</programlisting> + <para>The value of this hook is that it automates the process of + updating a bug any time a changeset refers to it. If you + configure the hook properly, it makes it easy for people to + browse straight from a Bugzilla bug to a changeset that refers + to that bug. + </para> + + <para>You can use the code in this hook as a starting point for + some more exotic Bugzilla integration recipes. Here are a few + possibilities: + </para> + <itemizedlist> + <listitem><para>Require that every changeset pushed to the + server have a valid bug ID in its commit comment. In this + case, you'd want to configure the hook as a <literal + role="hook">pretxncommit</literal> hook. This would + allow the hook to reject changes that didn't contain bug + IDs. + </para> + </listitem> + <listitem><para>Allow incoming changesets to automatically + modify the <emphasis>state</emphasis> of a bug, as well as + simply adding a comment. For example, the hook could + recognise the string <quote>fixed bug 31337</quote> as + indicating that it should update the state of bug 31337 to + <quote>requires testing</quote>. 
+ </para> + </listitem></itemizedlist> + + <sect3 id="sec:hook:bugzilla:config"> + <title>Configuring the <literal role="hook">bugzilla</literal> + hook</title> + + <para>You should configure this hook in your server's + <filename role="special">~/.hgrc</filename> as an <literal + role="hook">incoming</literal> hook, for example as + follows: + </para> + <programlisting>[hooks] +incoming.bugzilla = python:hgext.bugzilla.hook</programlisting> + + <para>Because of the specialised nature of this hook, and + because Bugzilla was not written with this kind of + integration in mind, configuring this hook is a somewhat + involved process. + </para> + + <para>Before you begin, you must install the MySQL bindings + for Python on the host(s) where you'll be running the hook. + If this is not available as a binary package for your + system, you can download it from + <citation>web:mysql-python</citation>. + </para> + + <para>Configuration information for this hook lives in the + <literal role="rc-bugzilla">bugzilla</literal> section of + your <filename role="special">~/.hgrc</filename>. + </para> + <itemizedlist> + <listitem><para><envar + role="rc-item-bugzilla">version</envar>: The version + of Bugzilla installed on the server. The database + schema that Bugzilla uses changes occasionally, so this + hook has to know exactly which schema to use. At the + moment, the only version supported is + <literal>2.16</literal>. + </para> + </listitem> + <listitem><para><envar role="rc-item-bugzilla">host</envar>: + The hostname of the MySQL server that stores your + Bugzilla data. The database must be configured to allow + connections from whatever host you are running the + <literal role="hook">bugzilla</literal> hook on. + </para> + </listitem> + <listitem><para><envar role="rc-item-bugzilla">user</envar>: + The username with which to connect to the MySQL server. 
+ The database must be configured to allow this user to + connect from whatever host you are running the <literal + role="hook">bugzilla</literal> hook on. This user + must be able to access and modify Bugzilla tables. The + default value of this item is <literal>bugs</literal>, + which is the standard name of the Bugzilla user in a + MySQL database. + </para> + </listitem> + <listitem><para><envar + role="rc-item-bugzilla">password</envar>: The MySQL + password for the user you configured above. This is + stored as plain text, so you should make sure that + unauthorised users cannot read the <filename + role="special">~/.hgrc</filename> file where you + store this information. + </para> + </listitem> + <listitem><para><envar role="rc-item-bugzilla">db</envar>: + The name of the Bugzilla database on the MySQL server. + The default value of this item is + <literal>bugs</literal>, which is the standard name of + the MySQL database where Bugzilla stores its data. + </para> + </listitem> + <listitem><para><envar + role="rc-item-bugzilla">notify</envar>: If you want + Bugzilla to send out a notification email to subscribers + after this hook has added a comment to a bug, you will + need this hook to run a command whenever it updates the + database. The command to run depends on where you have + installed Bugzilla, but it will typically look something + like this, if you have Bugzilla installed in <filename + class="directory">/var/www/html/bugzilla</filename>: + </para> + <programlisting>cd /var/www/html/bugzilla && + ./processmail %s nobody@nowhere.com</programlisting> + </listitem> + <listitem><para> The Bugzilla + <literal>processmail</literal> program expects to be + given a bug ID (the hook replaces + <quote><literal>%s</literal></quote> with the bug ID) + and an email address. It also expects to be able to + write to some files in the directory that it runs in. 
+ If Bugzilla and this hook are not installed on the same + machine, you will need to find a way to run + <literal>processmail</literal> on the server where + Bugzilla is installed. + </para> + </listitem></itemizedlist> + + </sect3> + <sect3> + <title>Mapping committer names to Bugzilla user names</title> + + <para>By default, the <literal + role="hg-ext">bugzilla</literal> hook tries to use the + email address of a changeset's committer as the Bugzilla + user name with which to update a bug. If this does not suit + your needs, you can map committer email addresses to + Bugzilla user names using a <literal + role="rc-usermap">usermap</literal> section. + </para> + + <para>Each item in the <literal + role="rc-usermap">usermap</literal> section contains an + email address on the left, and a Bugzilla user name on the + right. + </para> + <programlisting>[usermap] +jane.user@example.com = jane</programlisting> + <para>You can either keep the <literal + role="rc-usermap">usermap</literal> data in a normal + <filename role="special">~/.hgrc</filename>, or tell the + <literal role="hg-ext">bugzilla</literal> hook to read the + information from an external <filename>usermap</filename> + file. In the latter case, you can store + <filename>usermap</filename> data by itself in (for example) + a user-modifiable repository. This makes it possible to let + your users maintain their own <envar + role="rc-item-bugzilla">usermap</envar> entries. 
The main + <filename role="special">~/.hgrc</filename> file might look + like this: + </para> + <programlisting># regular hgrc file refers to external usermap file +[bugzilla] +usermap = /home/hg/repos/userdata/bugzilla-usermap.conf</programlisting> + <para>While the <filename>usermap</filename> file that it + refers to might look like this: + </para> + <programlisting># bugzilla-usermap.conf - inside a hg repository +[usermap] +stephanie@example.com = steph</programlisting> + + </sect3> + <sect3> + <title>Configuring the text that gets added to a bug</title> + + <para>You can configure the text that this hook adds as a + comment; you specify it in the form of a Mercurial template. + Several <filename role="special">~/.hgrc</filename> entries + (still in the <literal role="rc-bugzilla">bugzilla</literal> + section) control this behaviour. + </para> + <itemizedlist> + <listitem><para><literal>strip</literal>: The number of + leading path elements to strip from a repository's path + name to construct a partial path for a URL. For example, + if the repositories on your server live under <filename + class="directory">/home/hg/repos</filename>, and you + have a repository whose path is <filename + class="directory">/home/hg/repos/app/tests</filename>, + then setting <literal>strip</literal> to + <literal>4</literal> will give a partial path of + <filename class="directory">app/tests</filename>. The + hook will make this partial path available when + expanding a template, as <literal>webroot</literal>. + </para> + </listitem> + <listitem><para><literal>template</literal>: The text of the + template to use. In addition to the usual + changeset-related variables, this template can use + <literal>hgweb</literal> (the value of the + <literal>hgweb</literal> configuration item above) and + <literal>webroot</literal> (the path constructed using + <literal>strip</literal> above). 
+ </para> + </listitem></itemizedlist> + + <para>In addition, you can add a <envar + role="rc-item-web">baseurl</envar> item to the <literal + role="rc-web">web</literal> section of your <filename + role="special">~/.hgrc</filename>. The <literal + role="hg-ext">bugzilla</literal> hook will make this + available when expanding a template, as the base string to + use when constructing a URL that will let users browse from + a Bugzilla comment to view a changeset. Example: + </para> + <programlisting>[web] +baseurl = http://hg.domain.com/</programlisting> + + <para>Here is an example set of <literal + role="hg-ext">bugzilla</literal> hook config information. + </para> + + &ch10-bugzilla-config.lst; + + </sect3> + <sect3> + <title>Testing and troubleshooting</title> + + <para>The most common problems with configuring the <literal + role="hg-ext">bugzilla</literal> hook relate to running + Bugzilla's <filename>processmail</filename> script and + mapping committer names to user names. + </para> + + <para>Recall from section <xref + linkend="sec:hook:bugzilla:config"/> above that the user + that runs the Mercurial process on the server is also the + one that will run the <filename>processmail</filename> + script. The <filename>processmail</filename> script + sometimes causes Bugzilla to write to files in its + configuration directory, and Bugzilla's configuration files + are usually owned by the user that your web server runs + under. + </para> + + <para>You can cause <filename>processmail</filename> to be run + with the suitable user's identity using the + <command>sudo</command> command. Here is an example entry + for a <filename>sudoers</filename> file. + </para> + <programlisting>hg_user = (httpd_user) +NOPASSWD: /var/www/html/bugzilla/processmail-wrapper %s</programlisting> + <para>This allows the <literal>hg_user</literal> user to run a + <filename>processmail-wrapper</filename> program under the + identity of <literal>httpd_user</literal>. 
+ </para> + + <para>This indirection through a wrapper script is necessary, + because <filename>processmail</filename> expects to be run + with its current directory set to wherever you installed + Bugzilla; you can't specify that kind of constraint in a + <filename>sudoers</filename> file. The contents of the + wrapper script are simple: + </para> + <programlisting>#!/bin/sh +cd `dirname $0` && ./processmail "$1" nobody@example.com</programlisting> + <para>It doesn't seem to matter what email address you pass to + <filename>processmail</filename>. + </para> + + <para>If your <literal role="rc-usermap">usermap</literal> is + not set up correctly, users will see an error message from + the <literal role="hg-ext">bugzilla</literal> hook when they + push changes to the server. The error message will look + like this: + </para> + <programlisting>cannot find bugzilla user id for john.q.public@example.com</programlisting> + <para>What this means is that the committer's address, + <literal>john.q.public@example.com</literal>, is not a valid + Bugzilla user name, nor does it have an entry in your + <literal role="rc-usermap">usermap</literal> that maps it to + a valid Bugzilla user name. + </para> + + </sect3> + </sect2> + <sect2> + <title><literal role="hg-ext">notify</literal>&emdash;send email + notifications</title> + + <para>Although Mercurial's built-in web server provides RSS + feeds of changes in every repository, many people prefer to + receive change notifications via email. The <literal + role="hg-ext">notify</literal> hook lets you send out + notifications to a set of email addresses whenever changesets + arrive that those subscribers are interested in. + </para> + + <para>As with the <literal role="hg-ext">bugzilla</literal> + hook, the <literal role="hg-ext">notify</literal> hook is + template-driven, so you can customise the contents of the + notification messages that it sends. 
+ </para> + + <para>By default, the <literal role="hg-ext">notify</literal> + hook includes a diff of every changeset that it sends out; you + can limit the size of the diff, or turn this feature off + entirely. It is useful for letting subscribers review changes + immediately, rather than clicking to follow a URL. + </para> + + <sect3> + <title>Configuring the <literal role="hg-ext">notify</literal> + hook</title> + + <para>You can set up the <literal + role="hg-ext">notify</literal> hook to send one email + message per incoming changeset, or one per incoming group of + changesets (all those that arrived in a single pull or + push). + </para> + <programlisting>[hooks] +# send one email per group of changes +changegroup.notify = python:hgext.notify.hook +# send one email per change +incoming.notify = python:hgext.notify.hook</programlisting> + + <para>Configuration information for this hook lives in the + <literal role="rc-notify">notify</literal> section of a + <filename role="special">~/.hgrc</filename> file. + </para> + <itemizedlist> + <listitem><para><envar role="rc-item-notify">test</envar>: + By default, this hook does not send out email at all; + instead, it prints the message that it + <emphasis>would</emphasis> send. Set this item to + <literal>false</literal> to allow email to be sent. The + reason that sending of email is turned off by default is + that it takes several tries to configure this extension + exactly as you would like, and it would be bad form to + spam subscribers with a number of <quote>broken</quote> + notifications while you debug your configuration. + </para> + </listitem> + <listitem><para><envar role="rc-item-notify">config</envar>: + The path to a configuration file that contains + subscription information. This is kept separate from + the main <filename role="special">~/.hgrc</filename> so + that you can maintain it in a repository of its own. 
+ People can then clone that repository, update their + subscriptions, and push the changes back to your server. + </para> + </listitem> + <listitem><para><envar role="rc-item-notify">strip</envar>: + The number of leading path separator characters to strip + from a repository's path, when deciding whether a + repository has subscribers. For example, if the + repositories on your server live in <filename + class="directory">/home/hg/repos</filename>, and + <literal role="hg-ext">notify</literal> is considering a + repository named <filename + class="directory">/home/hg/repos/shared/test</filename>, + setting <envar role="rc-item-notify">strip</envar> to + <literal>4</literal> will cause <literal + role="hg-ext">notify</literal> to trim the path it + considers down to <filename + class="directory">shared/test</filename>, and it will + match subscribers against that. + </para> + </listitem> + <listitem><para><envar + role="rc-item-notify">template</envar>: The template + text to use when sending messages. This specifies both + the contents of the message header and its body. + </para> + </listitem> + <listitem><para><envar + role="rc-item-notify">maxdiff</envar>: The maximum + number of lines of diff data to append to the end of a + message. If a diff is longer than this, it is + truncated. By default, this is set to 300. Set this to + <literal>0</literal> to omit diffs from notification + emails. + </para> + </listitem> + <listitem><para><envar + role="rc-item-notify">sources</envar>: A list of + sources of changesets to consider. This lets you limit + <literal role="hg-ext">notify</literal> to only sending + out email about changes that remote users pushed into + this repository via a server, for example. See section + <xref + linkend="sec:hook:sources"/> for the sources you can + specify here. 
+ </para> + </listitem></itemizedlist> + + <para>If you set the <envar role="rc-item-web">baseurl</envar> + item in the <literal role="rc-web">web</literal> section, + you can use it in a template; it will be available as + <literal>webroot</literal>. + </para> + + <para>Here is an example set of <literal + role="hg-ext">notify</literal> configuration information. + </para> + + &ch10-notify-config.lst; + + <para>This will produce a message that looks like the + following: + </para> + + &ch10-notify-config-mail.lst; + + </sect3> + <sect3> + <title>Testing and troubleshooting</title> + + <para>Do not forget that by default, the <literal + role="hg-ext">notify</literal> extension <emphasis>will not + send any mail</emphasis> until you explicitly configure it to do so, + by setting <envar role="rc-item-notify">test</envar> to + <literal>false</literal>. Until you do that, it simply + prints the message it <emphasis>would</emphasis> send. + </para> + + </sect3> + </sect2> + </sect1> + <sect1 id="sec:hook:ref"> + <title>Information for writers of hooks</title> + + <sect2> + <title>In-process hook execution</title> + + <para>An in-process hook is called with arguments of the + following form: + </para> + <programlisting>def myhook(ui, repo, **kwargs): pass</programlisting> + <para>The <literal>ui</literal> parameter is a <literal + role="py-mod-mercurial.ui">ui</literal> object. The + <literal>repo</literal> parameter is a <literal + role="py-mod-mercurial.localrepo">localrepository</literal> + object. The names and values of the + <literal>**kwargs</literal> parameters depend on the hook + being invoked, with the following common features: + </para> + <itemizedlist> + <listitem><para>If a parameter is named + <literal>node</literal> or <literal>parentN</literal>, it + will contain a hexadecimal changeset ID. The empty string + is used to represent <quote>null changeset ID</quote> + instead of a string of zeroes. 
+ </para> + </listitem> + <listitem><para>If a parameter is named + <literal>url</literal>, it will contain the URL of a + remote repository, if that can be determined. + </para> + </listitem> + <listitem><para>Boolean-valued parameters are represented as + Python <literal>bool</literal> objects. + </para> + </listitem></itemizedlist> + + <para>An in-process hook is called without a change to the + process's working directory (unlike external hooks, which are + run in the root of the repository). It must not change the + process's working directory, or it will cause any calls it + makes into the Mercurial API to fail. + </para> + + <para>If a hook returns a boolean <quote>false</quote> value, it + is considered to have succeeded. If it returns a boolean + <quote>true</quote> value or raises an exception, it is + considered to have failed. A useful way to think of the + calling convention is <quote>tell me if you fail</quote>. + </para> + + <para>Note that changeset IDs are passed into Python hooks as + hexadecimal strings, not the binary hashes that Mercurial's + APIs normally use. To convert a hash from hex to binary, use + the <literal>bin</literal> function. + </para> + + </sect2> + <sect2> + <title>External hook execution</title> + + <para>An external hook is passed to the shell of the user + running Mercurial. Features of that shell, such as variable + substitution and command redirection, are available. The hook + is run in the root directory of the repository (unlike + in-process hooks, which are run in the same directory that + Mercurial was run in). + </para> + + <para>Hook parameters are passed to the hook as environment + variables. Each environment variable's name is converted to + upper case and prefixed with the string + <quote><literal>HG_</literal></quote>. 
For example, if the + name of a parameter is <quote><literal>node</literal></quote>, + the name of the environment variable representing that + parameter will be <quote><literal>HG_NODE</literal></quote>. + </para> + + <para>A boolean parameter is represented as the string + <quote><literal>1</literal></quote> for <quote>true</quote>, + <quote><literal>0</literal></quote> for <quote>false</quote>. + If an environment variable is named <envar>HG_NODE</envar>, + <envar>HG_PARENT1</envar> or <envar>HG_PARENT2</envar>, it + contains a changeset ID represented as a hexadecimal string. + The empty string is used to represent <quote>null changeset + ID</quote> instead of a string of zeroes. If an environment + variable is named <envar>HG_URL</envar>, it will contain the + URL of a remote repository, if that can be determined. + </para> + + <para>If a hook exits with a status of zero, it is considered to + have succeeded. If it exits with a non-zero status, it is + considered to have failed. + </para> + + </sect2> + <sect2> + <title>Finding out where changesets come from</title> + + <para>A hook that involves the transfer of changesets between a + local repository and another may be able to find out + information about the <quote>far side</quote>. Mercurial + knows <emphasis>how</emphasis> changes are being transferred, + and in many cases <emphasis>where</emphasis> they are being + transferred to or from. + </para> + + <sect3 id="sec:hook:sources"> + <title>Sources of changesets</title> + + <para>Mercurial will tell a hook what means are, or were, used + to transfer changesets between repositories. This is + provided by Mercurial in a Python parameter named + <literal>source</literal>, or an environment variable named + <envar>HG_SOURCE</envar>. + </para> + + <itemizedlist> + <listitem><para><literal>serve</literal>: Changesets are + transferred to or from a remote repository over http or + ssh. 
+ </para> + </listitem> + <listitem><para><literal>pull</literal>: Changesets are + being transferred via a pull from one repository into + another. + </para> + </listitem> + <listitem><para><literal>push</literal>: Changesets are + being transferred via a push from one repository into + another. + </para> + </listitem> + <listitem><para><literal>bundle</literal>: Changesets are + being transferred to or from a bundle. + </para> + </listitem></itemizedlist> + + </sect3> + <sect3 id="sec:hook:url"> + <title>Where changes are going&emdash;remote repository + URLs</title> + + <para>When possible, Mercurial will tell a hook the location + of the <quote>far side</quote> of an activity that transfers + changeset data between repositories. This is provided by + Mercurial in a Python parameter named + <literal>url</literal>, or an environment variable named + <envar>HG_URL</envar>. + </para> + + <para>This information is not always known. If a hook is + invoked in a repository that is being served via http or + ssh, Mercurial cannot tell where the remote repository is, + but it may know where the client is connecting from. In + such cases, the URL will take one of the following forms: + </para> + <itemizedlist> + <listitem><para><literal>remote:ssh:1.2.3.4</literal>&emdash;remote + ssh client, at the IP address + <literal>1.2.3.4</literal>. + </para> + </listitem> + <listitem><para><literal>remote:http:1.2.3.4</literal>&emdash;remote + http client, at the IP address + <literal>1.2.3.4</literal>. If the client is using SSL, + this will be of the form + <literal>remote:https:1.2.3.4</literal>. + </para> + </listitem> + <listitem><para>Empty&emdash;no information could be + discovered about the remote client. 
+ </para> + </listitem></itemizedlist> + + </sect3> + </sect2> + </sect1> + <sect1> + <title>Hook reference</title> + + <sect2 id="sec:hook:changegroup"> + <title><literal role="hook">changegroup</literal>&emdash;after + remote changesets added</title> + + <para>This hook is run after a group of pre-existing changesets + has been added to the repository, for example via a <command + role="hg-cmd">hg pull</command> or <command role="hg-cmd">hg + unbundle</command>. This hook is run once per operation + that added one or more changesets. This is in contrast to the + <literal role="hook">incoming</literal> hook, which is run + once per changeset, regardless of whether the changesets + arrive in a group. + </para> + + <para>Some possible uses for this hook include kicking off an + automated build or test of the added changesets, updating a + bug database, or notifying subscribers that a repository + contains new changes. + </para> + + <para>Parameters to this hook: + </para> + <itemizedlist> + <listitem><para><literal>node</literal>: A changeset ID. The + changeset ID of the first changeset in the group that was + added. All changesets between this and + <literal role="tag">tip</literal>, inclusive, were added by a single + <command role="hg-cmd">hg pull</command>, <command + role="hg-cmd">hg push</command> or <command + role="hg-cmd">hg unbundle</command>. + </para> + </listitem> + <listitem><para><literal>source</literal>: A string. The + source of these changes. See section <xref + linkend="sec:hook:sources"/> for details. + </para> + </listitem> + <listitem><para><literal>url</literal>: A URL. The location + of the remote repository, if known. See section <xref + linkend="sec:hook:url"/> for more + information. 
+ </para> + </listitem></itemizedlist> + + <para>See also: <literal role="hook">incoming</literal> (section + <xref linkend="sec:hook:incoming"/>), <literal + role="hook">prechangegroup</literal> (section <xref + linkend="sec:hook:prechangegroup"/>), <literal + role="hook">pretxnchangegroup</literal> (section <xref + linkend="sec:hook:pretxnchangegroup"/>) + </para> + + </sect2> + <sect2 id="sec:hook:commit"> + <title><literal role="hook">commit</literal>&emdash;after a new + changeset is created</title> + + <para>This hook is run after a new changeset has been created. + </para> + + <para>Parameters to this hook: + </para> + <itemizedlist> + <listitem><para><literal>node</literal>: A changeset ID. The + changeset ID of the newly committed changeset. + </para> + </listitem> + <listitem><para><literal>parent1</literal>: A changeset ID. + The changeset ID of the first parent of the newly + committed changeset. + </para> + </listitem> + <listitem><para><literal>parent2</literal>: A changeset ID. + The changeset ID of the second parent of the newly + committed changeset. + </para> + </listitem></itemizedlist> + + <para>See also: <literal role="hook">precommit</literal> + (section <xref linkend="sec:hook:precommit"/>), <literal + role="hook">pretxncommit</literal> (section <xref + linkend="sec:hook:pretxncommit"/>) + </para> + + </sect2> + <sect2 id="sec:hook:incoming"> + <title><literal role="hook">incoming</literal>&emdash;after one + remote changeset is added</title> + + <para>This hook is run after a pre-existing changeset has been + added to the repository, for example via a <command + role="hg-cmd">hg push</command>. If a group of changesets + was added in a single operation, this hook is called once for + each added changeset. 
+ </para> + + <para>You can use this hook for the same purposes as the + <literal role="hook">changegroup</literal> hook (section <xref + linkend="sec:hook:changegroup"/>); it's simply + more convenient sometimes to run a hook once per group of + changesets, while other times it's handier once per changeset. + </para> + + <para>Parameters to this hook: + </para> + <itemizedlist> + <listitem><para><literal>node</literal>: A changeset ID. The + ID of the newly added changeset. + </para> + </listitem> + <listitem><para><literal>source</literal>: A string. The + source of these changes. See section <xref + linkend="sec:hook:sources"/> for details. + </para> + </listitem> + <listitem><para><literal>url</literal>: A URL. The location + of the remote repository, if known. See section <xref + linkend="sec:hook:url"/> for more + information. + </para> + </listitem></itemizedlist> + + <para>See also: <literal role="hook">changegroup</literal> + (section <xref linkend="sec:hook:changegroup"/>), <literal + role="hook">prechangegroup</literal> (section <xref + linkend="sec:hook:prechangegroup"/>), <literal + role="hook">pretxnchangegroup</literal> (section <xref + linkend="sec:hook:pretxnchangegroup"/>) + </para> + + </sect2> + <sect2 id="sec:hook:outgoing"> + <title><literal role="hook">outgoing</literal>&emdash;after + changesets are propagated</title> + + <para>This hook is run after a group of changesets has been + propagated out of this repository, for example by a <command + role="hg-cmd">hg push</command> or <command role="hg-cmd">hg + bundle</command> command. + </para> + + <para>One possible use for this hook is to notify administrators + that changes have been pulled. + </para> + + <para>Parameters to this hook: + </para> + <itemizedlist> + <listitem><para><literal>node</literal>: A changeset ID. The + changeset ID of the first changeset of the group that was + sent. + </para> + </listitem> + <listitem><para><literal>source</literal>: A string. 
The + source of the operation (see section <xref + linkend="sec:hook:sources"/>). If a remote + client pulled changes from this repository, + <literal>source</literal> will be + <literal>serve</literal>. If the client that obtained + changes from this repository was local, + <literal>source</literal> will be + <literal>bundle</literal>, <literal>pull</literal>, or + <literal>push</literal>, depending on the operation the + client performed. + </para> + </listitem> + <listitem><para><literal>url</literal>: A URL. The location + of the remote repository, if known. See section <xref + linkend="sec:hook:url"/> for more + information. + </para> + </listitem></itemizedlist> + + <para>See also: <literal role="hook">preoutgoing</literal> + (section <xref linkend="sec:hook:preoutgoing"/>) + </para> + + </sect2> + <sect2 id="sec:hook:prechangegroup"> + <title><literal + role="hook">prechangegroup</literal>&emdash;before starting + to add remote changesets</title> + + <para>This controlling hook is run before Mercurial begins to + add a group of changesets from another repository. + </para> + + <para>This hook does not have any information about the + changesets to be added, because it is run before transmission + of those changesets is allowed to begin. If this hook fails, + the changesets will not be transmitted. + </para> + + <para>One use for this hook is to prevent external changes from + being added to a repository. For example, you could use this + to <quote>freeze</quote> a server-hosted branch temporarily or + permanently so that users cannot push to it, while still + allowing a local administrator to modify the repository. + </para> + + <para>Parameters to this hook: + </para> + <itemizedlist> + <listitem><para><literal>source</literal>: A string. The + source of these changes. See section <xref + linkend="sec:hook:sources"/> for details. + </para> + </listitem> + <listitem><para><literal>url</literal>: A URL. 
The location + of the remote repository, if known. See section <xref + linkend="sec:hook:url"/> for more + information. + </para> + </listitem></itemizedlist> + + <para>See also: <literal role="hook">changegroup</literal> + (section <xref linkend="sec:hook:changegroup"/>), <literal + role="hook">incoming</literal> (section <xref + linkend="sec:hook:incoming"/>), <literal + role="hook">pretxnchangegroup</literal> (section <xref + linkend="sec:hook:pretxnchangegroup"/>) + </para> + + </sect2> + <sect2 id="sec:hook:precommit"> + <title><literal role="hook">precommit</literal>&emdash;before + starting to commit a changeset</title> + + <para>This hook is run before Mercurial begins to commit a new + changeset. It is run before Mercurial has any of the metadata + for the commit, such as the files to be committed, the commit + message, or the commit date. + </para> + + <para>One use for this hook is to disable the ability to commit + new changesets, while still allowing incoming changesets. + Another is to run a build or test, and only allow the commit + to begin if the build or test succeeds. + </para> + + <para>Parameters to this hook: + </para> + <itemizedlist> + <listitem><para><literal>parent1</literal>: A changeset ID. + The changeset ID of the first parent of the working + directory. + </para> + </listitem> + <listitem><para><literal>parent2</literal>: A changeset ID. + The changeset ID of the second parent of the working + directory. + </para> + </listitem></itemizedlist> + <para>If the commit proceeds, the parents of the working + directory will become the parents of the new changeset. 
+ </para> + + <para>See also: <literal role="hook">commit</literal> (section + <xref linkend="sec:hook:commit"/>), <literal + role="hook">pretxncommit</literal> (section <xref + linkend="sec:hook:pretxncommit"/>) + </para> + + </sect2> + <sect2 id="sec:hook:preoutgoing"> + <title><literal role="hook">preoutgoing</literal>&emdash;before + starting to propagate changesets</title> + + <para>This hook is invoked before Mercurial knows the identities + of the changesets to be transmitted. + </para> + + <para>One use for this hook is to prevent changes from being + transmitted to another repository. + </para> + + <para>Parameters to this hook: + </para> + <itemizedlist> + <listitem><para><literal>source</literal>: A string. The + source of the operation that is attempting to obtain + changes from this repository (see section <xref + linkend="sec:hook:sources"/>). See the documentation + for the <literal>source</literal> parameter to the + <literal role="hook">outgoing</literal> hook, in section + <xref linkend="sec:hook:outgoing"/>, for possible values + of + this parameter. + </para> + </listitem> + <listitem><para><literal>url</literal>: A URL. The location + of the remote repository, if known. See section <xref + linkend="sec:hook:url"/> for more + information. + </para> + </listitem></itemizedlist> + + <para>See also: <literal role="hook">outgoing</literal> (section + <xref linkend="sec:hook:outgoing"/>) + </para> + + </sect2> + <sect2 id="sec:hook:pretag"> + <title><literal role="hook">pretag</literal>&emdash;before + tagging a changeset</title> + + <para>This controlling hook is run before a tag is created. If + the hook succeeds, creation of the tag proceeds. If the hook + fails, the tag is not created. + </para> + + <para>Parameters to this hook: + </para> + <itemizedlist> + <listitem><para><literal>local</literal>: A boolean. Whether + the tag is local to this repository instance (i.e. 
stored + in <filename role="special">.hg/localtags</filename>) or + managed by Mercurial (stored in <filename + role="special">.hgtags</filename>). + </para> + </listitem> + <listitem><para><literal>node</literal>: A changeset ID. The + ID of the changeset to be tagged. + </para> + </listitem> + <listitem><para><literal>tag</literal>: A string. The name of + the tag to be created. + </para> + </listitem></itemizedlist> + + <para>If the tag to be created is revision-controlled, the + <literal role="hook">precommit</literal> and <literal + role="hook">pretxncommit</literal> hooks (sections <xref + linkend="sec:hook:precommit"/> and <xref + linkend="sec:hook:pretxncommit"/>) will also be run. + </para> + + <para>See also: <literal role="hook">tag</literal> (section + <xref linkend="sec:hook:tag"/>) + </para> + </sect2> + <sect2 id="sec:hook:pretxnchangegroup"> + <title><literal + role="hook">pretxnchangegroup</literal>&emdash;before + completing addition of remote changesets</title> + + <para>This controlling hook is run before a + transaction&emdash;that manages the addition of a group of new + changesets from outside the repository&emdash;completes. If + the hook succeeds, the transaction completes, and all of the + changesets become permanent within this repository. If the + hook fails, the transaction is rolled back, and the data for + the changesets is erased. + </para> + + <para>This hook can access the metadata associated with the + almost-added changesets, but it should not do anything + permanent with this data. It must also not modify the working + directory. + </para> + + <para>While this hook is running, if other Mercurial processes + access this repository, they will be able to see the + almost-added changesets as if they are permanent. This may + lead to race conditions if you do not take steps to avoid + them. + </para> + + <para>This hook can be used to automatically vet a group of + changesets. 
If the hook fails, all of the changesets are + <quote>rejected</quote> when the transaction rolls back. + </para> + + <para>Parameters to this hook: + </para> + <itemizedlist> + <listitem><para><literal>node</literal>: A changeset ID. The + changeset ID of the first changeset in the group that was + added. All changesets between this and + <literal role="tag">tip</literal>, + inclusive, were added by a single <command + role="hg-cmd">hg pull</command>, <command + role="hg-cmd">hg push</command> or <command + role="hg-cmd">hg unbundle</command>. + </para> + </listitem> + <listitem><para><literal>source</literal>: A string. The + source of these changes. See section <xref + linkend="sec:hook:sources"/> for details. + </para> + </listitem> + <listitem><para><literal>url</literal>: A URL. The location + of the remote repository, if known. See section <xref + linkend="sec:hook:url"/> for more + information. + </para> + </listitem></itemizedlist> + + <para>See also: <literal role="hook">changegroup</literal> + (section <xref linkend="sec:hook:changegroup"/>), <literal + role="hook">incoming</literal> (section <xref + linkend="sec:hook:incoming"/>), <literal + role="hook">prechangegroup</literal> (section <xref + linkend="sec:hook:prechangegroup"/>) + </para> + + </sect2> + <sect2 id="sec:hook:pretxncommit"> + <title><literal role="hook">pretxncommit</literal>&emdash;before + completing commit of new changeset</title> + + <para>This controlling hook is run before a + transaction&emdash;that manages a new commit&emdash;completes. + If the hook succeeds, the transaction completes and the + changeset becomes permanent within this repository. If the + hook fails, the transaction is rolled back, and the commit + data is erased. + </para> + + <para>This hook can access the metadata associated with the + almost-new changeset, but it should not do anything permanent + with this data. It must also not modify the working + directory. 
+ </para> + + <para>While this hook is running, if other Mercurial processes + access this repository, they will be able to see the + almost-new changeset as if it is permanent. This may lead to + race conditions if you do not take steps to avoid them. + </para> + + <para>Parameters to this hook: + </para> + <itemizedlist> + <listitem><para><literal>node</literal>: A changeset ID. The + changeset ID of the newly committed changeset. + </para> + </listitem> + <listitem><para><literal>parent1</literal>: A changeset ID. + The changeset ID of the first parent of the newly + committed changeset. + </para> + </listitem> + <listitem><para><literal>parent2</literal>: A changeset ID. + The changeset ID of the second parent of the newly + committed changeset. + </para> + </listitem></itemizedlist> + + <para>See also: <literal role="hook">precommit</literal> + (section <xref linkend="sec:hook:precommit"/>) + </para> + + </sect2> + <sect2 id="sec:hook:preupdate"> + <title><literal role="hook">preupdate</literal>&emdash;before + updating or merging working directory</title> + + <para>This controlling hook is run before an update or merge of + the working directory begins. It is run only if Mercurial's + normal pre-update checks determine that the update or merge + can proceed. If the hook succeeds, the update or merge may + proceed; if it fails, the update or merge does not start. + </para> + + <para>Parameters to this hook: + </para> + <itemizedlist> + <listitem><para><literal>parent1</literal>: A changeset ID. + The ID of the parent that the working directory is to be + updated to. If the working directory is being merged, it + will not change this parent. + </para> + </listitem> + <listitem><para><literal>parent2</literal>: A changeset ID. + Only set if the working directory is being merged. The ID + of the revision that the working directory is being merged + with. 
+ </para> + </listitem></itemizedlist> + + <para>See also: <literal role="hook">update</literal> (section + <xref linkend="sec:hook:update"/>) + </para> + + </sect2> + <sect2 id="sec:hook:tag"> + <title><literal role="hook">tag</literal>&emdash;after tagging a + changeset</title> + + <para>This hook is run after a tag has been created. + </para> + + <para>Parameters to this hook: + </para> + <itemizedlist> + <listitem><para><literal>local</literal>: A boolean. Whether + the new tag is local to this repository instance (i.e. + stored in <filename + role="special">.hg/localtags</filename>) or managed by + Mercurial (stored in <filename + role="special">.hgtags</filename>). + </para> + </listitem> + <listitem><para><literal>node</literal>: A changeset ID. The + ID of the changeset that was tagged. + </para> + </listitem> + <listitem><para><literal>tag</literal>: A string. The name of + the tag that was created. + </para> + </listitem></itemizedlist> + + <para>If the created tag is revision-controlled, the <literal + role="hook">commit</literal> hook (section <xref + linkend="sec:hook:commit"/>) is run before this hook. + </para> + + <para>See also: <literal role="hook">pretag</literal> (section + <xref linkend="sec:hook:pretag"/>) + </para> + + </sect2> + <sect2 id="sec:hook:update"> + <title><literal role="hook">update</literal>&emdash;after + updating or merging working directory</title> + + <para>This hook is run after an update or merge of the working + directory completes. Since a merge can fail (if the external + <command>hgmerge</command> command fails to resolve conflicts + in a file), this hook communicates whether the update or merge + completed cleanly. + </para> + + <itemizedlist> + <listitem><para><literal>error</literal>: A boolean. + Indicates whether the update or merge completed + successfully. + </para> + </listitem> + <listitem><para><literal>parent1</literal>: A changeset ID. + The ID of the parent that the working directory was + updated to. 
If the working directory was merged, it will + not have changed this parent. + </para> + </listitem> + <listitem><para><literal>parent2</literal>: A changeset ID. + Only set if the working directory was merged. The ID of + the revision that the working directory was merged with. + </para> + </listitem></itemizedlist> + + <para>See also: <literal role="hook">preupdate</literal> + (section <xref linkend="sec:hook:preupdate"/>) + </para> + + </sect2> + </sect1> +</chapter> + +<!-- +local variables: +sgml-parent-document: ("00book.xml" "book" "chapter") +end: +-->
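To make the url parameter described earlier in this chapter concrete, here is a minimal Python sketch that classifies the value a hook might receive in url (or, for an external hook, in the HG_URL environment variable). The function name describe_remote and the labels it returns are assumptions made for illustration and are not part of Mercurial; only the remote:ssh:ADDRESS, remote:http:ADDRESS, remote:https:ADDRESS and empty forms are taken from the text.

```python
import os


def describe_remote(url):
    """Classify a hook's url value into a (kind, detail) pair.

    Hypothetical helper for illustration; it is not a Mercurial API.
    """
    # Empty or missing: nothing could be discovered about the far side.
    if not url:
        return ("unknown", None)
    # Served repository: "remote:<scheme>:<client address>".
    # maxsplit=2 keeps any further colons inside the address part.
    parts = url.split(":", 2)
    if len(parts) == 3 and parts[0] == "remote":
        _, scheme, address = parts
        return (scheme, address)
    # Anything else is treated as the location of the other repository.
    return ("location", url)


if __name__ == "__main__":
    # An external hook would read the value from the environment.
    kind, detail = describe_remote(os.environ.get("HG_URL", ""))
    print("far side: %s %s" % (kind, detail))
```

A hook could use the returned pair to decide, for instance, whether to log the client address of a push that arrived over ssh; what to do with the result is left to the hook author.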
--- a/en/ch09-undo.xml Wed Mar 18 00:08:22 2009 -0700 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,1072 +0,0 @@ -<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> - -<chapter id="chap:undo"> - <?dbhtml filename="finding-and-fixing-mistakes.html"?> - <title>Finding and fixing mistakes</title> - - <para>To err might be human, but to really handle the consequences - well takes a top-notch revision control system. In this chapter, - we'll discuss some of the techniques you can use when you find - that a problem has crept into your project. Mercurial has some - highly capable features that will help you to isolate the sources - of problems, and to handle them appropriately.</para> - - <sect1> - <title>Erasing local history</title> - - <sect2> - <title>The accidental commit</title> - - <para>I have the occasional but persistent problem of typing - rather more quickly than I can think, which sometimes results - in me committing a changeset that is either incomplete or - plain wrong. In my case, the usual kind of incomplete - changeset is one in which I've created a new source file, but - forgotten to <command role="hg-cmd">hg add</command> it. A - <quote>plain wrong</quote> changeset is not as common, but no - less annoying.</para> - - </sect2> - <sect2 id="sec:undo:rollback"> - <title>Rolling back a transaction</title> - - <para>In section <xref linkend="sec:concepts:txn"/>, I mentioned - that Mercurial treats each modification of a repository as a - <emphasis>transaction</emphasis>. Every time you commit a - changeset or pull changes from another repository, Mercurial - remembers what you did. You can undo, or <emphasis>roll - back</emphasis>, exactly one of these actions using the - <command role="hg-cmd">hg rollback</command> command. 
(See - section <xref linkend="sec:undo:rollback-after-push"/> for an - important caveat about the use of this command.)</para> - - <para>Here's a mistake that I often find myself making: - committing a change in which I've created a new file, but - forgotten to <command role="hg-cmd">hg add</command> - it.</para> - - &interaction.rollback.commit; - - <para>Looking at the output of <command role="hg-cmd">hg - status</command> after the commit immediately confirms the - error.</para> - - &interaction.rollback.status; - - <para>The commit captured the changes to the file - <filename>a</filename>, but not the new file - <filename>b</filename>. If I were to push this changeset to a - repository that I shared with a colleague, the chances are - high that something in <filename>a</filename> would refer to - <filename>b</filename>, which would not be present in their - repository when they pulled my changes. I would thus become - the object of some indignation.</para> - - <para>However, luck is with me&emdash;I've caught my error - before I pushed the changeset. I use the <command - role="hg-cmd">hg rollback</command> command, and Mercurial - makes that last changeset vanish.</para> - - &interaction.rollback.rollback; - - <para>Notice that the changeset is no longer present in the - repository's history, and the working directory once again - thinks that the file <filename>a</filename> is modified. The - commit and rollback have left the working directory exactly as - it was prior to the commit; the changeset has been completely - erased. I can now safely <command role="hg-cmd">hg - add</command> the file <filename>b</filename>, and rerun my - commit.</para> - - &interaction.rollback.add; - - </sect2> - <sect2> - <title>The erroneous pull</title> - - <para>It's common practice with Mercurial to maintain separate - development branches of a project in different repositories. 
- Your development team might have one shared repository for - your project's <quote>0.9</quote> release, and another, - containing different changes, for the <quote>1.0</quote> - release.</para> - - <para>Given this, you can imagine that the consequences could be - messy if you had a local <quote>0.9</quote> repository, and - accidentally pulled changes from the shared <quote>1.0</quote> - repository into it. At worst, you could be paying - insufficient attention, and push those changes into the shared - <quote>0.9</quote> tree, confusing your entire team (but don't - worry, we'll return to this horror scenario later). However, - it's more likely that you'll notice immediately, because - Mercurial will display the URL it's pulling from, or you will - see it pull a suspiciously large number of changes into the - repository.</para> - - <para>The <command role="hg-cmd">hg rollback</command> command - will work nicely to expunge all of the changesets that you - just pulled. Mercurial groups all changes from one <command - role="hg-cmd">hg pull</command> into a single transaction, - so one <command role="hg-cmd">hg rollback</command> is all you - need to undo this mistake.</para> - - </sect2> - <sect2 id="sec:undo:rollback-after-push"> - <title>Rolling back is useless once you've pushed</title> - - <para>The value of the <command role="hg-cmd">hg - rollback</command> command drops to zero once you've pushed - your changes to another repository. Rolling back a change - makes it disappear entirely, but <emphasis>only</emphasis> in - the repository in which you perform the <command - role="hg-cmd">hg rollback</command>. 
Because a rollback - eliminates history, there's no way for the disappearance of a - change to propagate between repositories.</para> - - <para>If you've pushed a change to another - repository&emdash;particularly if it's a shared - repository&emdash;it has essentially <quote>escaped into the - wild,</quote> and you'll have to recover from your mistake - in a different way. If you push a changeset - somewhere, then roll it back, then pull from the repository - you pushed to, the changeset will reappear in your - repository.</para> - - <para>(If you absolutely know for sure that the change you want - to roll back is the most recent change in the repository that - you pushed to, <emphasis>and</emphasis> you know that nobody - else could have pulled it from that repository, you can roll - back the changeset there, too, but you really should - not rely on this working reliably. If you do this, sooner or - later a change really will make it into a repository that you - don't directly control (or have forgotten about), and come - back to bite you.)</para> - - </sect2> - <sect2> - <title>You can only roll back once</title> - - <para>Mercurial stores exactly one transaction in its - transaction log; that transaction is the most recent one that - occurred in the repository. This means that you can only roll - back one transaction. 
If you expect to be able to roll back - one transaction, then its predecessor, this is not the - behaviour you will get.</para> - - &interaction.rollback.twice; - - <para>Once you've rolled back one transaction in a repository, - you can't roll back again in that repository until you perform - another commit or pull.</para> - - </sect2> - </sect1> - <sect1> - <title>Reverting the mistaken change</title> - - <para>If you make a modification to a file, and decide that you - really didn't want to change the file at all, and you haven't - yet committed your changes, the <command role="hg-cmd">hg - revert</command> command is the one you'll need. It looks at - the changeset that's the parent of the working directory, and - restores the contents of the file to their state as of that - changeset. (That's a long-winded way of saying that, in the - normal case, it undoes your modifications.)</para> - - <para>Let's illustrate how the <command role="hg-cmd">hg - revert</command> command works with yet another small example. - We'll begin by modifying a file that Mercurial is already - tracking.</para> - - &interaction.daily.revert.modify; - - <para>If we don't - want that change, we can simply <command role="hg-cmd">hg - revert</command> the file.</para> - - &interaction.daily.revert.unmodify; - - <para>The <command role="hg-cmd">hg revert</command> command - provides us with an extra degree of safety by saving our - modified file with a <filename>.orig</filename> - extension.</para> - - &interaction.daily.revert.status; - - <para>Here is a summary of the cases that the <command - role="hg-cmd">hg revert</command> command can deal with. 
We - will describe each of these in more detail in the section that - follows.</para> - <itemizedlist> - <listitem><para>If you modify a file, it will restore the file - to its unmodified state.</para> - </listitem> - <listitem><para>If you <command role="hg-cmd">hg add</command> a - file, it will undo the <quote>added</quote> state of the - file, but leave the file itself untouched.</para> - </listitem> - <listitem><para>If you delete a file without telling Mercurial, - it will restore the file to its unmodified contents.</para> - </listitem> - <listitem><para>If you use the <command role="hg-cmd">hg - remove</command> command to remove a file, it will undo - the <quote>removed</quote> state of the file, and restore - the file to its unmodified contents.</para> - </listitem></itemizedlist> - - <sect2 id="sec:undo:mgmt"> - <title>File management errors</title> - - <para>The <command role="hg-cmd">hg revert</command> command is - useful for more than just modified files. It lets you reverse - the results of all of Mercurial's file management - commands&emdash;<command role="hg-cmd">hg add</command>, - <command role="hg-cmd">hg remove</command>, and so on.</para> - - <para>If you <command role="hg-cmd">hg add</command> a file, - then decide that in fact you don't want Mercurial to track it, - use <command role="hg-cmd">hg revert</command> to undo the - add. Don't worry; Mercurial will not modify the file in any - way. It will just <quote>unmark</quote> the file.</para> - - &interaction.daily.revert.add; - - <para>Similarly, if you ask Mercurial to <command - role="hg-cmd">hg remove</command> a file, you can use - <command role="hg-cmd">hg revert</command> to restore it to - the contents it had as of the parent of the working directory. 
- &interaction.daily.revert.remove; This works just as - well for a file that you deleted by hand, without telling - Mercurial (recall that in Mercurial terminology, this kind of - file is called <quote>missing</quote>).</para> - - &interaction.daily.revert.missing; - - <para>If you revert a <command role="hg-cmd">hg copy</command>, - the copied-to file remains in your working directory - afterwards, untracked. Since a copy doesn't affect the - copied-from file in any way, Mercurial doesn't do anything - with the copied-from file.</para> - - &interaction.daily.revert.copy; - - <sect3> - <title>A slightly special case: reverting a rename</title> - - <para>If you <command role="hg-cmd">hg rename</command> a - file, there is one small detail that you should remember. - When you <command role="hg-cmd">hg revert</command> a - rename, it's not enough to provide the name of the - renamed-to file, as you can see here.</para> - - &interaction.daily.revert.rename; - - <para>As you can see from the output of <command - role="hg-cmd">hg status</command>, the renamed-to file is - no longer identified as added, but the - renamed-<emphasis>from</emphasis> file is still removed! - This is counter-intuitive (at least to me), but at least - it's easy to deal with.</para> - - &interaction.daily.revert.rename-orig; - - <para>So remember, to revert a <command role="hg-cmd">hg - rename</command>, you must provide - <emphasis>both</emphasis> the source and destination - names.</para> - - <para>% TODO: the output doesn't look like it will be - removed!</para> - - <para>(By the way, if you rename a file, then modify the - renamed-to file, then revert both components of the rename, - when Mercurial restores the file that was removed as part of - the rename, it will be unmodified. 
If you need the - modifications in the renamed-to file to show up in the - renamed-from file, don't forget to copy them over.)</para> - - <para>These fiddly aspects of reverting a rename arguably - constitute a small bug in Mercurial.</para> - - </sect3> - </sect2> - </sect1> - <sect1> - <title>Dealing with committed changes</title> - - <para>Consider a case where you have committed a change <emphasis>a</emphasis>, and - another change <emphasis>b</emphasis> on top of it; you then realise that change - <emphasis>a</emphasis> was incorrect. Mercurial lets you <quote>back out</quote> - an entire changeset automatically, and gives you building blocks that let - you reverse part of a changeset by hand.</para> - - <para>Before you read this section, here's something to keep in - mind: the <command role="hg-cmd">hg backout</command> command - undoes changes by <emphasis>adding</emphasis> history, not by - modifying or erasing it. It's the right tool to use if you're - fixing bugs, but not if you're trying to undo some change that - has catastrophic consequences. To deal with those, see section - <xref linkend="sec:undo:aaaiiieee"/>.</para> - - <sect2> - <title>Backing out a changeset</title> - - <para>The <command role="hg-cmd">hg backout</command> command - lets you <quote>undo</quote> the effects of an entire - changeset in an automated fashion. Because Mercurial's - history is immutable, this command <emphasis>does - not</emphasis> get rid of the changeset you want to undo. - Instead, it creates a new changeset that - <emphasis>reverses</emphasis> the effect of the to-be-undone - changeset.</para> - - <para>The operation of the <command role="hg-cmd">hg - backout</command> command is a little intricate, so let's - illustrate it with some examples. First, we'll create a - repository with some simple changes.</para> - - &interaction.backout.init; - - <para>The <command role="hg-cmd">hg backout</command> command - takes a single changeset ID as its argument; this is the - changeset to back out. 
Normally, <command role="hg-cmd">hg - backout</command> will drop you into a text editor to write - a commit message, so you can record why you're backing the - change out. In this example, we provide a commit message on - the command line using the <option - role="hg-opt-backout">-m</option> option.</para> - - </sect2> - <sect2> - <title>Backing out the tip changeset</title> - - <para>We're going to start by backing out the last changeset we - committed.</para> - - &interaction.backout.simple; - - <para>You can see that the second line from - <filename>myfile</filename> is no longer present. Taking a - look at the output of <command role="hg-cmd">hg log</command> - gives us an idea of what the <command role="hg-cmd">hg - backout</command> command has done. - &interaction.backout.simple.log; Notice that the new changeset - that <command role="hg-cmd">hg backout</command> has created - is a child of the changeset we backed out. It's easier to see - this in figure <xref - linkend="fig:undo:backout"/>, which presents a graphical - view of the change history. 
As you can see, the history is - nice and linear.</para> - - <informalfigure id="fig:undo:backout"> - <mediaobject><imageobject><imagedata - fileref="undo-simple"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject><caption><para>Backing out - a change using the <command role="hg-cmd">hg - backout</command> - command</para></caption></mediaobject> - - </informalfigure> - - </sect2> - <sect2> - <title>Backing out a non-tip change</title> - - <para>If you want to back out a change other than the last one - you committed, pass the <option - role="hg-opt-backout">--merge</option> option to the - <command role="hg-cmd">hg backout</command> command.</para> - - &interaction.backout.non-tip.clone; - - <para>This makes backing out any changeset a - <quote>one-shot</quote> operation that's usually simple and - fast.</para> - - &interaction.backout.non-tip.backout; - - <para>If you take a look at the contents of - <filename>myfile</filename> after the backout finishes, you'll - see that the first and third changes are present, but not the - second.</para> - - &interaction.backout.non-tip.cat; - - <para>As the graphical history in figure <xref - linkend="fig:undo:backout-non-tip"/> illustrates, Mercurial - actually commits <emphasis>two</emphasis> changes in this kind - of situation (the box-shaped nodes are the ones that Mercurial - commits automatically). Before Mercurial begins the backout - process, it first remembers what the current parent of the - working directory is. It then backs out the target changeset, - and commits that as a changeset. 
Finally, it merges back to - the previous parent of the working directory, and commits the - result of the merge.</para> - - <para>% TODO: to me it looks like mercurial doesn't commit the - second merge automatically!</para> - - <informalfigure id="fig:undo:backout-non-tip"> - <mediaobject><imageobject><imagedata - fileref="undo-non-tip"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject><caption><para>Automated - backout of a non-tip change using the <command - role="hg-cmd">hg backout</command> - command</para></caption></mediaobject> - </informalfigure> - - <para>The result is that you end up <quote>back where you - were</quote>, only with some extra history that undoes the - effect of the changeset you wanted to back out.</para> - - <sect3> - <title>Always use the <option - role="hg-opt-backout">--merge</option> option</title> - - <para>In fact, since the <option - role="hg-opt-backout">--merge</option> option will do the - <quote>right thing</quote> whether or not the changeset - you're backing out is the tip (i.e. it won't try to merge if - it's backing out the tip, since there's no need), you should - <emphasis>always</emphasis> use this option when you run the - <command role="hg-cmd">hg backout</command> command.</para> - - </sect3> - </sect2> - <sect2> - <title>Gaining more control of the backout process</title> - - <para>While I've recommended that you always use the <option - role="hg-opt-backout">--merge</option> option when backing - out a change, the <command role="hg-cmd">hg backout</command> - command lets you decide how to merge a backout changeset. - Taking control of the backout process by hand is something you - will rarely need to do, but it can be useful to understand - what the <command role="hg-cmd">hg backout</command> command - is doing for you automatically. 
To illustrate this, let's - clone our first repository, but omit the backout change that - it contains.</para> - - &interaction.backout.manual.clone; - - <para>As with our - earlier example, we'll commit a third changeset, then back out - its parent, and see what happens.</para> - - &interaction.backout.manual.backout; - - <para>Our new changeset is again a descendant of the changeset - we backed out; it's thus a new head, <emphasis>not</emphasis> - a descendant of the changeset that was the tip. The <command - role="hg-cmd">hg backout</command> command was quite - explicit in telling us this.</para> - - &interaction.backout.manual.log; - - <para>Again, it's easier to see what has happened by looking at - a graph of the revision history, in figure <xref - linkend="fig:undo:backout-manual"/>. This makes it clear - that when we use <command role="hg-cmd">hg backout</command> - to back out a change other than the tip, Mercurial adds a new - head to the repository (the change it committed is - box-shaped).</para> - - <informalfigure id="fig:undo:backout-manual"> - <mediaobject><imageobject><imagedata - fileref="undo-manual"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject><caption><para>Backing out - a change using the <command role="hg-cmd">hg - backout</command> - command</para></caption></mediaobject> - - </informalfigure> - - <para>After the <command role="hg-cmd">hg backout</command> - command has completed, it leaves the new - <quote>backout</quote> changeset as the parent of the working - directory.</para> - - &interaction.backout.manual.parents; - - <para>Now we have two isolated sets of changes.</para> - - &interaction.backout.manual.heads; - - <para>Let's think about what we expect to see as the contents of - <filename>myfile</filename> now. The first change should be - present, because we've never backed it out. The second change - should be missing, as that's the change we backed out. 
Since - the history graph shows the third change as a separate head, - we <emphasis>don't</emphasis> expect to see the third change - present in <filename>myfile</filename>.</para> - - &interaction.backout.manual.cat; - - <para>To get the third change back into the file, we just do a - normal merge of our two heads.</para> - - &interaction.backout.manual.merge; - - <para>Afterwards, the graphical history of our repository looks - like figure - <xref linkend="fig:undo:backout-manual-merge"/>.</para> - - <informalfigure id="fig:undo:backout-manual-merge"> - <mediaobject><imageobject><imagedata - fileref="undo-manual-merge"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject><caption><para>Manually - merging a backout change</para></caption></mediaobject> - - </informalfigure> - - </sect2> - <sect2> - <title>Why <command role="hg-cmd">hg backout</command> works as - it does</title> - - <para>Here's a brief description of how the <command - role="hg-cmd">hg backout</command> command works.</para> - <orderedlist> - <listitem><para>It ensures that the working directory is - <quote>clean</quote>, i.e. that the output of <command - role="hg-cmd">hg status</command> would be empty.</para> - </listitem> - <listitem><para>It remembers the current parent of the working - directory. Let's call this changeset - <literal>orig</literal></para> - </listitem> - <listitem><para>It does the equivalent of a <command - role="hg-cmd">hg update</command> to sync the working - directory to the changeset you want to back out. Let's - call this changeset <literal>backout</literal></para> - </listitem> - <listitem><para>It finds the parent of that changeset. 
Let's - call that changeset <literal>parent</literal>.</para> - </listitem> - <listitem><para>For each file that the - <literal>backout</literal> changeset affected, it does the - equivalent of a <command role="hg-cmd">hg revert -r - parent</command> on that file, to restore it to the - contents it had before that changeset was - committed.</para> - </listitem> - <listitem><para>It commits the result as a new changeset. - This changeset has <literal>backout</literal> as its - parent.</para> - </listitem> - <listitem><para>If you specify <option - role="hg-opt-backout">--merge</option> on the command - line, it merges with <literal>orig</literal>, and commits - the result of the merge.</para> - </listitem></orderedlist> - - <para>An alternative way to implement the <command - role="hg-cmd">hg backout</command> command would be to - <command role="hg-cmd">hg export</command> the - to-be-backed-out changeset as a diff, then use the <option - role="cmd-opt-patch">--reverse</option> option to the - <command>patch</command> command to reverse the effect of the - change without fiddling with the working directory. 
This - sounds much simpler, but it would not work nearly as - well.</para> - - <para>The reason that <command role="hg-cmd">hg - backout</command> does an update, a commit, a merge, and - another commit is to give the merge machinery the best chance - to do a good job when dealing with all the changes - <emphasis>between</emphasis> the change you're backing out and - the current tip.</para> - - <para>If you're backing out a changeset that's 100 revisions - back in your project's history, the chances that the - <command>patch</command> command will be able to apply a - reverse diff cleanly are not good, because intervening changes - are likely to have <quote>broken the context</quote> that - <command>patch</command> uses to determine whether it can - apply a patch (if this sounds like gibberish, see <xref - linkend="sec:mq:patch"/> for a - discussion of the <command>patch</command> command). Also, - Mercurial's merge machinery will handle files and directories - being renamed, permission changes, and modifications to binary - files, none of which <command>patch</command> can deal - with.</para> - - </sect2> - </sect1> - <sect1 id="sec:undo:aaaiiieee"> - <title>Changes that should never have been</title> - - <para>Most of the time, the <command role="hg-cmd">hg - backout</command> command is exactly what you need if you want - to undo the effects of a change. It leaves a permanent record - of exactly what you did, both when committing the original - changeset and when you cleaned up after it.</para> - - <para>On rare occasions, though, you may find that you've - committed a change that really should not be present in the - repository at all. For example, it would be very unusual, and - usually considered a mistake, to commit a software project's - object files as well as its source files. 
Object files have - almost no intrinsic value, and they're <emphasis>big</emphasis>, - so they increase the size of the repository and the amount of - time it takes to clone or pull changes.</para> - - <para>Before I discuss the options that you have if you commit a - <quote>brown paper bag</quote> change (the kind that's so bad - that you want to pull a brown paper bag over your head), let me - first discuss some approaches that probably won't work.</para> - - <para>Since Mercurial treats history as accumulative&emdash;every - change builds on top of all changes that preceded it&emdash;you - generally can't just make disastrous changes disappear. The one - exception is when you've just committed a change, and it hasn't - been pushed or pulled into another repository. That's when you - can safely use the <command role="hg-cmd">hg rollback</command> - command, as I detailed in section <xref - linkend="sec:undo:rollback"/>.</para> - - <para>After you've pushed a bad change to another repository, you - <emphasis>could</emphasis> still use <command role="hg-cmd">hg - rollback</command> to make your local copy of the change - disappear, but it won't have the consequences you want. The - change will still be present in the remote repository, so it - will reappear in your local repository the next time you - pull.</para> - - <para>If a situation like this arises, and you know which - repositories your bad change has propagated into, you can - <emphasis>try</emphasis> to get rid of the change from - <emphasis>every</emphasis> one of those repositories. This is, - of course, not a satisfactory solution: if you miss even a - single repository while you're expunging, the change is still - <quote>in the wild</quote>, and could propagate further.</para> - - <para>If you've committed one or more changes - <emphasis>after</emphasis> the change that you'd like to see - disappear, your options are further reduced. 
Mercurial doesn't - provide a way to <quote>punch a hole</quote> in history, leaving - changesets intact.</para> - - <para>XXX This needs filling out. The - <literal>hg-replay</literal> script in the - <literal>examples</literal> directory works, but doesn't handle - merge changesets. Kind of an important omission.</para> - - <sect2> - <title>Protect yourself from <quote>escaped</quote> - changes</title> - - <para>If you've committed some changes to your local repository - and they've been pushed or pulled somewhere else, this isn't - necessarily a disaster. You can protect yourself ahead of - time against some classes of bad changeset. This is - particularly easy if your team usually pulls changes from a - central repository.</para> - - <para>By configuring some hooks on that repository to validate - incoming changesets (see chapter <xref linkend="chap:hook"/>), - you can - automatically prevent some kinds of bad changeset from being - pushed to the central repository at all. With such a - configuration in place, some kinds of bad changeset will - naturally tend to <quote>die out</quote> because they can't - propagate into the central repository. Better yet, this - happens without any need for explicit intervention.</para> - - <para>For instance, an incoming change hook that verifies that a - changeset will actually compile can prevent people from - inadvertently <quote>breaking the build</quote>.</para> - - </sect2> - </sect1> - <sect1 id="sec:undo:bisect"> - <title>Finding the source of a bug</title> - - <para>While it's all very well to be able to back out a changeset - that introduced a bug, this requires that you know which - changeset to back out. 
Mercurial provides an invaluable - command, called <command role="hg-cmd">hg bisect</command>, that - helps you to automate this process and accomplish it very - efficiently.</para> - - <para>The idea behind the <command role="hg-cmd">hg - bisect</command> command is that a changeset has introduced - some change of behaviour that you can identify with a simple - binary test. You don't know which piece of code introduced the - change, but you know how to test for the presence of the bug. - The <command role="hg-cmd">hg bisect</command> command uses your - test to direct its search for the changeset that introduced the - code that caused the bug.</para> - - <para>Here are a few scenarios to help you understand how you - might apply this command.</para> - <itemizedlist> - <listitem><para>The most recent version of your software has a - bug that you remember wasn't present a few weeks ago, but - you don't know when it was introduced. Here, your binary - test checks for the presence of that bug.</para> - </listitem> - <listitem><para>You fixed a bug in a rush, and now it's time to - close the entry in your team's bug database. The bug - database requires a changeset ID when you close an entry, - but you don't remember which changeset you fixed the bug in. - Once again, your binary test checks for the presence of the - bug.</para> - </listitem> - <listitem><para>Your software works correctly, but runs 15% - slower than the last time you measured it. You want to know - which changeset introduced the performance regression. 
In - this case, your binary test measures the performance of your - software, to see whether it's <quote>fast</quote> or - <quote>slow</quote>.</para> - </listitem> - <listitem><para>The sizes of the components of your project that - you ship exploded recently, and you suspect that something - changed in the way you build your project.</para> - </listitem></itemizedlist> - - <para>From these examples, it should be clear that the <command - role="hg-cmd">hg bisect</command> command is not useful only - for finding the sources of bugs. You can use it to find any - <quote>emergent property</quote> of a repository (anything that - you can't find from a simple text search of the files in the - tree) for which you can write a binary test.</para> - - <para>We'll introduce a little bit of terminology here, just to - make it clear which parts of the search process are your - responsibility, and which are Mercurial's. A - <emphasis>test</emphasis> is something that - <emphasis>you</emphasis> run when <command role="hg-cmd">hg - bisect</command> chooses a changeset. A - <emphasis>probe</emphasis> is what <command role="hg-cmd">hg - bisect</command> runs to tell whether a revision is good. - Finally, we'll use the word <quote>bisect</quote>, as both a - noun and a verb, to stand in for the phrase <quote>search using - the <command role="hg-cmd">hg bisect</command> - command</quote>.</para> - - <para>One simple way to automate the searching process would be - simply to probe every changeset. However, this scales poorly. - If it took ten minutes to test a single changeset, and you had - 10,000 changesets in your repository, the exhaustive approach - would take on average 35 <emphasis>days</emphasis> to find the - changeset that introduced a bug. 
Even if you knew that the bug - was introduced by one of the last 500 changesets, and limited - your search to those, you'd still be looking at over 40 hours to - find the changeset that introduced your bug.</para> - - <para>What the <command role="hg-cmd">hg bisect</command> command - does is use its knowledge of the <quote>shape</quote> of your - project's revision history to perform a search in time - proportional to the <emphasis>logarithm</emphasis> of the number - of changesets to check (the kind of search it performs is called - a dichotomic search). With this approach, searching through - 10,000 changesets will take less than three hours, even at ten - minutes per test (the search will require about 14 tests). - Limit your search to the last hundred changesets, and it will - take only about an hour (roughly seven tests).</para> - - <para>The <command role="hg-cmd">hg bisect</command> command is - aware of the <quote>branchy</quote> nature of a Mercurial - project's revision history, so it has no problems dealing with - branches, merges, or multiple heads in a repository. It can - prune entire branches of history with a single probe, which is - how it operates so efficiently.</para> - - <sect2> - <title>Using the <command role="hg-cmd">hg bisect</command> - command</title> - - <para>Here's an example of <command role="hg-cmd">hg - bisect</command> in action.</para> - - <note> - <para> In versions 0.9.5 and earlier of Mercurial, <command - role="hg-cmd">hg bisect</command> was not a core command: - it was distributed with Mercurial as an extension. 
This - section describes the built-in command, not the old - extension.</para> - </note> - - <para>Now let's create a repository, so that we can try out the - <command role="hg-cmd">hg bisect</command> command in - isolation.</para> - - &interaction.bisect.init; - - <para>We'll simulate a project that has a bug in it in a - simple-minded way: create trivial changes in a loop, and - nominate one specific change that will have the - <quote>bug</quote>. This loop creates 35 changesets, each - adding a single file to the repository. We'll represent our - <quote>bug</quote> with a file that contains the text <quote>i - have a gub</quote>.</para> - - &interaction.bisect.commits; - - <para>The next thing that we'd like to do is figure out how to - use the <command role="hg-cmd">hg bisect</command> command. - We can use Mercurial's normal built-in help mechanism for - this.</para> - - &interaction.bisect.help; - - <para>The <command role="hg-cmd">hg bisect</command> command - works in steps. Each step proceeds as follows.</para> - <orderedlist> - <listitem><para>You run your binary test.</para> - <itemizedlist> - <listitem><para>If the test succeeded, you tell <command - role="hg-cmd">hg bisect</command> by running the - <command role="hg-cmd">hg bisect --good</command> - command.</para> - </listitem> - <listitem><para>If it failed, run the <command - role="hg-cmd">hg bisect --bad</command> - command.</para></listitem></itemizedlist> - </listitem> - <listitem><para>The command uses your information to decide - which changeset to test next.</para> - </listitem> - <listitem><para>It updates the working directory to that - changeset, and the process begins again.</para> - </listitem></orderedlist> - <para>The process ends when <command role="hg-cmd">hg - bisect</command> identifies a unique changeset that marks - the point where your test transitioned from - <quote>succeeding</quote> to <quote>failing</quote>.</para> - - <para>To start the search, we must run the <command - 
role="hg-cmd">hg bisect --reset</command> command.</para> - - &interaction.bisect.search.init; - - <para>In our case, the binary test we use is simple: we check to - see if any file in the repository contains the string <quote>i - have a gub</quote>. If it does, this changeset contains the - change that <quote>caused the bug</quote>. By convention, a - changeset that has the property we're searching for is - <quote>bad</quote>, while one that doesn't is - <quote>good</quote>.</para> - - <para>Most of the time, the revision to which the working - directory is synced (usually the tip) already exhibits the - problem introduced by the buggy change, so we'll mark it as - <quote>bad</quote>.</para> - - &interaction.bisect.search.bad-init; - - <para>Our next task is to nominate a changeset that we know - <emphasis>doesn't</emphasis> have the bug; the <command - role="hg-cmd">hg bisect</command> command will - <quote>bracket</quote> its search between the first pair of - good and bad changesets. In our case, we know that revision - 10 didn't have the bug. (I'll have more words about choosing - the first <quote>good</quote> changeset later.)</para> - - &interaction.bisect.search.good-init; - - <para>Notice that this command printed some output.</para> - <itemizedlist> - <listitem><para>It told us how many changesets it must - consider before it can identify the one that introduced - the bug, and how many tests that will require.</para> - </listitem> - <listitem><para>It updated the working directory to the next - changeset to test, and told us which changeset it's - testing.</para> - </listitem></itemizedlist> - - <para>We now run our test in the working directory. We use the - <command>grep</command> command to see if our - <quote>bad</quote> file is present in the working directory. - If it is, this revision is bad; if not, this revision is good. 
- &interaction.bisect.search.step1;</para> - - <para>This test looks like a perfect candidate for automation, - so let's turn it into a shell function.</para> - &interaction.bisect.search.mytest; - - <para>We can now run an entire test step with a single command, - <literal>mytest</literal>.</para> - - &interaction.bisect.search.step2; - - <para>A few more invocations of our canned test step command, - and we're done.</para> - - &interaction.bisect.search.rest; - - <para>Even though we had 40 changesets to search through, the - <command role="hg-cmd">hg bisect</command> command let us find - the changeset that introduced our <quote>bug</quote> with only - five tests. Because the number of tests that the <command - role="hg-cmd">hg bisect</command> command performs grows - logarithmically with the number of changesets to search, the - advantage that it has over the <quote>brute force</quote> - search approach increases with every changeset you add.</para> - - </sect2> - <sect2> - <title>Cleaning up after your search</title> - - <para>When you're finished using the <command role="hg-cmd">hg - bisect</command> command in a repository, you can use the - <command role="hg-cmd">hg bisect --reset</command> command to - drop the information it was using to drive your search. The - command doesn't use much space, so it doesn't matter if you - forget to run this command. However, <command - role="hg-cmd">hg bisect</command> won't let you start a new - search in that repository until you do a <command - role="hg-cmd">hg bisect --reset</command>.</para> - - &interaction.bisect.search.reset; - - </sect2> - </sect1> - <sect1> - <title>Tips for finding bugs effectively</title> - - <sect2> - <title>Give consistent input</title> - - <para>The <command role="hg-cmd">hg bisect</command> command - requires that you correctly report the result of every test - you perform. 
If you tell it that a test failed when it really - succeeded, it <emphasis>might</emphasis> be able to detect the - inconsistency. If it can identify an inconsistency in your - reports, it will tell you that a particular changeset is both - good and bad. However, it can't do this perfectly; it's about - as likely to report the wrong changeset as the source of the - bug.</para> - - </sect2> - <sect2> - <title>Automate as much as possible</title> - - <para>When I started using the <command role="hg-cmd">hg - bisect</command> command, I tried a few times to run my - tests by hand, on the command line. This is an approach that - I, at least, am not suited to. After a few tries, I found - that I was making enough mistakes that I was having to restart - my searches several times before finally getting correct - results.</para> - - <para>My initial problems with driving the <command - role="hg-cmd">hg bisect</command> command by hand occurred - even with simple searches on small repositories; if the - problem you're looking for is more subtle, or the number of - tests that <command role="hg-cmd">hg bisect</command> must - perform increases, the likelihood of operator error ruining - the search is much higher. Once I started automating my - tests, I had much better results.</para> - - <para>The key to automated testing is twofold:</para> - <itemizedlist> - <listitem><para>always test for the same symptom, and</para> - </listitem> - <listitem><para>always feed consistent input to the <command - role="hg-cmd">hg bisect</command> command.</para> - </listitem></itemizedlist> - <para>In my tutorial example above, the <command>grep</command> - command tests for the symptom, and the <literal>if</literal> - statement takes the result of this check and ensures that we - always feed the same input to the <command role="hg-cmd">hg - bisect</command> command. 
The <literal>mytest</literal> - function marries these together in a reproducible way, so that - every test is uniform and consistent.</para> - - </sect2> - <sect2> - <title>Check your results</title> - - <para>Because the output of a <command role="hg-cmd">hg - bisect</command> search is only as good as the input you - give it, don't take the changeset it reports as the absolute - truth. A simple way to cross-check its report is to manually - run your test at each of the following changesets:</para> - <itemizedlist> - <listitem><para>The changeset that it reports as the first bad - revision. Your test should still report this as - bad.</para> - </listitem> - <listitem><para>The parent of that changeset (either parent, - if it's a merge). Your test should report this changeset - as good.</para> - </listitem> - <listitem><para>A child of that changeset. Your test should - report this changeset as bad.</para> - </listitem></itemizedlist> - - </sect2> - <sect2> - <title>Beware interference between bugs</title> - - <para>It's possible that your search for one bug could be - disrupted by the presence of another. For example, let's say - your software crashes at revision 100, and worked correctly at - revision 50. Unknown to you, someone else introduced a - different crashing bug at revision 60, and fixed it at - revision 80. This could distort your results in one of - several ways.</para> - - <para>It is possible that this other bug completely - <quote>masks</quote> yours, which is to say that it occurs - before your bug has a chance to manifest itself. If you can't - avoid that other bug (for example, it prevents your project - from building), and so can't tell whether your bug is present - in a particular changeset, the <command role="hg-cmd">hg - bisect</command> command cannot help you directly. 
Instead, - you can mark a changeset as untested by running <command - role="hg-cmd">hg bisect --skip</command>.</para> - - <para>A different problem could arise if your test for a bug's - presence is not specific enough. If you check for <quote>my - program crashes</quote>, then both your crashing bug and an - unrelated crashing bug that masks it will look like the same - thing, and mislead <command role="hg-cmd">hg - bisect</command>.</para> - - <para>Another useful situation in which to use <command - role="hg-cmd">hg bisect --skip</command> is if you can't - test a revision because your project was in a broken and hence - untestable state at that revision, perhaps because someone - checked in a change that prevented the project from - building.</para> - - </sect2> - <sect2> - <title>Bracket your search lazily</title> - - <para>Choosing the first <quote>good</quote> and - <quote>bad</quote> changesets that will mark the end points of - your search is often easy, but it bears a little discussion - nevertheless. From the perspective of <command - role="hg-cmd">hg bisect</command>, the <quote>newest</quote> - changeset is conventionally <quote>bad</quote>, and the older - changeset is <quote>good</quote>.</para> - - <para>If you're having trouble remembering when a suitable - <quote>good</quote> change was, so that you can tell <command - role="hg-cmd">hg bisect</command>, you could do worse than - testing changesets at random. 
Just remember to eliminate - contenders that can't possibly exhibit the bug (perhaps - because the feature with the bug isn't present yet) and those - where another problem masks the bug (as I discussed - above).</para> - - <para>Even if you end up <quote>early</quote> by thousands of - changesets or months of history, you will only add a handful - of tests to the total number that <command role="hg-cmd">hg - bisect</command> must perform, thanks to its logarithmic - behaviour.</para> - - </sect2> - </sect1> -</chapter> - -<!-- -local variables: -sgml-parent-document: ("00book.xml" "book" "chapter") -end: --->
--- a/en/ch10-hook.xml Wed Mar 18 00:08:22 2009 -0700 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,2037 +0,0 @@ -<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> - -<chapter id="chap:hook"> - <?dbhtml filename="handling-repository-events-with-hooks.html"?> - <title>Handling repository events with hooks</title> - - <para>Mercurial offers a powerful mechanism to let you perform - automated actions in response to events that occur in a - repository. In some cases, you can even control Mercurial's - response to those events.</para> - - <para>The name Mercurial uses for one of these actions is a - <emphasis>hook</emphasis>. Hooks are called - <quote>triggers</quote> in some revision control systems, but the - two names refer to the same idea.</para> - - <sect1> - <title>An overview of hooks in Mercurial</title> - - <para>Here is a brief list of the hooks that Mercurial supports. - We will revisit each of these hooks in more detail later, in - section <xref linkend="sec:hook:ref"/>.</para> - - <itemizedlist> - <listitem><para><literal role="hook">changegroup</literal>: This - is run after a group of changesets has been brought into the - repository from elsewhere.</para> - </listitem> - <listitem><para><literal role="hook">commit</literal>: This is - run after a new changeset has been created in the local - repository.</para> - </listitem> - <listitem><para><literal role="hook">incoming</literal>: This is - run once for each new changeset that is brought into the - repository from elsewhere. 
Notice the difference from - <literal role="hook">changegroup</literal>, which is run - once per <emphasis>group</emphasis> of changesets brought - in.</para> - </listitem> - <listitem><para><literal role="hook">outgoing</literal>: This is - run after a group of changesets has been transmitted from - this repository.</para> - </listitem> - <listitem><para><literal role="hook">prechangegroup</literal>: - This is run before starting to bring a group of changesets - into the repository. - </para> - </listitem> - <listitem><para><literal role="hook">precommit</literal>: - Controlling. This is run before starting a commit. - </para> - </listitem> - <listitem><para><literal role="hook">preoutgoing</literal>: - Controlling. This is run before starting to transmit a group - of changesets from this repository. - </para> - </listitem> - <listitem><para><literal role="hook">pretag</literal>: - Controlling. This is run before creating a tag. - </para> - </listitem> - <listitem><para><literal - role="hook">pretxnchangegroup</literal>: Controlling. This - is run after a group of changesets has been brought into the - local repository from another, but before the transaction - completes that will make the changes permanent in the - repository. - </para> - </listitem> - <listitem><para><literal role="hook">pretxncommit</literal>: - Controlling. This is run after a new changeset has been - created in the local repository, but before the transaction - completes that will make it permanent. - </para> - </listitem> - <listitem><para><literal role="hook">preupdate</literal>: - Controlling. This is run before starting an update or merge - of the working directory. - </para> - </listitem> - <listitem><para><literal role="hook">tag</literal>: This is run - after a tag is created. - </para> - </listitem> - <listitem><para><literal role="hook">update</literal>: This is - run after an update or merge of the working directory has - finished. 
- </para> - </listitem></itemizedlist> - <para>Each of the hooks whose description begins with the word - <quote>Controlling</quote> has the ability to determine whether - an activity can proceed. If the hook succeeds, the activity may - proceed; if it fails, the activity is either not permitted or - undone, depending on the hook. - </para> - - </sect1> - <sect1> - <title>Hooks and security</title> - - <sect2> - <title>Hooks are run with your privileges</title> - - <para>When you run a Mercurial command in a repository, and the - command causes a hook to run, that hook runs on - <emphasis>your</emphasis> system, under - <emphasis>your</emphasis> user account, with - <emphasis>your</emphasis> privilege level. Since hooks are - arbitrary pieces of executable code, you should treat them - with an appropriate level of suspicion. Do not install a hook - unless you are confident that you know who created it and what - it does. - </para> - - <para>In some cases, you may be exposed to hooks that you did - not install yourself. If you work with Mercurial on an - unfamiliar system, Mercurial will run hooks defined in that - system's global <filename role="special">~/.hgrc</filename> - file. - </para> - - <para>If you are working with a repository owned by another - user, Mercurial can run hooks defined in that user's - repository, but it will still run them as <quote>you</quote>. - For example, if you <command role="hg-cmd">hg pull</command> - from that repository, and its <filename - role="special">.hg/hgrc</filename> defines a local <literal - role="hook">outgoing</literal> hook, that hook will run - under your user account, even though you don't own that - repository. - </para> - - <note> - <para> This only applies if you are pulling from a repository - on a local or network filesystem. If you're pulling over - http or ssh, any <literal role="hook">outgoing</literal> - hook will run under whatever account is executing the server - process, on the server. 
- </para> - </note> - - <para>XXX To see what hooks are defined in a repository, use the - <command role="hg-cmd">hg config hooks</command> command. If - you are working in one repository, but talking to another that - you do not own (e.g. using <command role="hg-cmd">hg - pull</command> or <command role="hg-cmd">hg - incoming</command>), remember that it is the other - repository's hooks you should be checking, not your own. - </para> - - </sect2> - <sect2> - <title>Hooks do not propagate</title> - - <para>In Mercurial, hooks are not revision controlled, and do - not propagate when you clone, or pull from, a repository. The - reason for this is simple: a hook is a completely arbitrary - piece of executable code. It runs under your user identity, - with your privilege level, on your machine. - </para> - - <para>It would be extremely reckless for any distributed - revision control system to implement revision-controlled - hooks, as this would offer an easily exploitable way to - subvert the accounts of users of the revision control system. - </para> - - <para>Since Mercurial does not propagate hooks, if you are - collaborating with other people on a common project, you - should not assume that they are using the same Mercurial hooks - as you are, or that theirs are correctly configured. You - should document the hooks you expect people to use. - </para> - - <para>In a corporate intranet, this is somewhat easier to - control, as you can for example provide a - <quote>standard</quote> installation of Mercurial on an NFS - filesystem, and use a site-wide <filename role="special">~/.hgrc</filename> file to define hooks that all users will - see. However, this too has its limits; see below. - </para> - - </sect2> - <sect2> - <title>Hooks can be overridden</title> - - <para>Mercurial allows you to override a hook definition by - redefining the hook. You can disable it by setting its value - to the empty string, or change its behaviour as you wish. 
- </para> - - <para>If you deploy a system- or site-wide <filename - role="special">~/.hgrc</filename> file that defines some - hooks, you should thus understand that your users can disable - or override those hooks. - </para> - - </sect2> - <sect2> - <title>Ensuring that critical hooks are run</title> - - <para>Sometimes you may want to enforce a policy that you do not - want others to be able to work around. For example, you may - have a requirement that every changeset must pass a rigorous - set of tests. Defining this requirement via a hook in a - site-wide <filename role="special">~/.hgrc</filename> won't - work for remote users on laptops, and of course local users - can subvert it at will by overriding the hook. - </para> - - <para>Instead, you can set up your policies for use of Mercurial - so that people are expected to propagate changes through a - well-known <quote>canonical</quote> server that you have - locked down and configured appropriately. - </para> - - <para>One way to do this is via a combination of social - engineering and technology. Set up a restricted-access - account; users can push changes over the network to - repositories managed by this account, but they cannot log into - the account and run normal shell commands. In this scenario, - a user can commit a changeset that contains any old garbage - they want. - </para> - - <para>When someone pushes a changeset to the server that - everyone pulls from, the server will test the changeset before - it accepts it as permanent, and reject it if it fails to pass - the test suite. If people only pull changes from this - filtering server, it will serve to ensure that all changes - that people pull have been automatically vetted. 
- </para> - - </sect2> - </sect1> - <sect1> - <title>Care with <literal>pretxn</literal> hooks in a - shared-access repository</title> - - <para>If you want to use hooks to do some automated work in a - repository that a number of people have shared access to, you - need to be careful in how you do this. - </para> - - <para>Mercurial only locks a repository when it is writing to the - repository, and only the parts of Mercurial that write to the - repository pay attention to locks. Write locks are necessary to - prevent multiple simultaneous writers from scribbling on each - other's work, corrupting the repository. - </para> - - <para>Because Mercurial is careful with the order in which it - reads and writes data, it does not need to acquire a lock when - it wants to read data from the repository. The parts of - Mercurial that read from the repository never pay attention to - locks. This lockless reading scheme greatly increases - performance and concurrency. - </para> - - <para>With great performance comes a trade-off, though, one which - has the potential to cause you trouble unless you're aware of - it. To describe this requires a little detail about how - Mercurial adds changesets to a repository and reads those - changes. - </para> - - <para>When Mercurial <emphasis>writes</emphasis> metadata, it - writes it straight into the destination file. It writes file - data first, then manifest data (which contains pointers to the - new file data), then changelog data (which contains pointers to - the new manifest data). Before the first write to each file, it - stores a record of where the end of the file was in its - transaction log. If the transaction must be rolled back, - Mercurial simply truncates each file back to the size it was - before the transaction began. - </para> - - <para>When Mercurial <emphasis>reads</emphasis> metadata, it reads - the changelog first, then everything else. 
Since a reader will - only access parts of the manifest or file metadata that it can - see in the changelog, it can never see partially written data. - </para> - - <para>Some controlling hooks (<literal - role="hook">pretxncommit</literal> and <literal - role="hook">pretxnchangegroup</literal>) run when a - transaction is almost complete. All of the metadata has been - written, but Mercurial can still roll the transaction back and - cause the newly-written data to disappear. - </para> - - <para>If one of these hooks runs for long, it opens a window of - time during which a reader can see the metadata for changesets - that are not yet permanent, and should not be thought of as - <quote>really there</quote>. The longer the hook runs, the - longer that window is open. - </para> - - <sect2> - <title>The problem illustrated</title> - - <para>In principle, a good use for the <literal - role="hook">pretxnchangegroup</literal> hook would be to - automatically build and test incoming changes before they are - accepted into a central repository. This could let you - guarantee that nobody can push changes to this repository that - <quote>break the build</quote>. But if a client can pull - changes while they're being tested, the usefulness of the test - is zero; an unsuspecting someone can pull untested changes, - potentially breaking their build. - </para> - - <para>The safest technological answer to this challenge is to - set up such a <quote>gatekeeper</quote> repository as - <emphasis>unidirectional</emphasis>. Let it take changes - pushed in from the outside, but do not allow anyone to pull - changes from it (use the <literal - role="hook">preoutgoing</literal> hook to lock it down). - Configure a <literal role="hook">changegroup</literal> hook so - that if a build or test succeeds, the hook will push the new - changes out to another repository that people - <emphasis>can</emphasis> pull from. 
- </para> - - <para>In practice, putting a centralised bottleneck like this in - place is not often a good idea, and transaction visibility has - nothing to do with the problem. As the size of a - project&emdash;and the time it takes to build and - test&emdash;grows, you rapidly run into a wall with this - <quote>try before you buy</quote> approach, where you have - more changesets to test than time in which to deal with them. - The inevitable result is frustration on the part of all - involved. - </para> - - <para>An approach that scales better is to get people to build - and test before they push, then run automated builds and tests - centrally <emphasis>after</emphasis> a push, to be sure all is - well. The advantage of this approach is that it does not - impose a limit on the rate at which the repository can accept - changes. - </para> - - </sect2> - </sect1> - <sect1 id="sec:hook:simple"> - <title>A short tutorial on using hooks</title> - - <para>It is easy to write a Mercurial hook. Let's start with a - hook that runs when you finish a <command role="hg-cmd">hg - commit</command>, and simply prints the hash of the changeset - you just created. The hook is called <literal - role="hook">commit</literal>. - </para> - - <para>All hooks follow the pattern in this example.</para> - -&interaction.hook.simple.init; - - <para>You add an entry to the <literal - role="rc-hooks">hooks</literal> section of your <filename - role="special">~/.hgrc</filename>. On the left is the name of - the event to trigger on; on the right is the action to take. As - you can see, you can run an arbitrary shell command in a hook. - Mercurial passes extra information to the hook using environment - variables (look for <envar>HG_NODE</envar> in the example). 
- </para> - - <sect2> - <title>Performing multiple actions per event</title> - - <para>Quite often, you will want to define more than one hook - for a particular kind of event, as shown below.</para> - -&interaction.hook.simple.ext; - - <para>Mercurial lets you do this by adding an - <emphasis>extension</emphasis> to the end of a hook's name. - You extend a hook's name by giving the name of the hook, - followed by a full stop (the - <quote><literal>.</literal></quote> character), followed by - some more text of your choosing. For example, Mercurial will - run both <literal>commit.foo</literal> and - <literal>commit.bar</literal> when the - <literal>commit</literal> event occurs. - </para> - - <para>To give a well-defined order of execution when there are - multiple hooks defined for an event, Mercurial sorts hooks by - extension, and executes the hook commands in this sorted - order. In the above example, it will execute - <literal>commit.bar</literal> before - <literal>commit.foo</literal>, and <literal>commit</literal> - before both. - </para> - - <para>It is a good idea to use a somewhat descriptive extension - when you define a new hook. This will help you to remember - what the hook was for. If the hook fails, you'll get an error - message that contains the hook name and extension, so using a - descriptive extension could give you an immediate hint as to - why the hook failed (see section <xref - linkend="sec:hook:perm"/> for an example). - </para> - - </sect2> - <sect2 id="sec:hook:perm"> - <title>Controlling whether an activity can proceed</title> - - <para>In our earlier examples, we used the <literal - role="hook">commit</literal> hook, which is run after a - commit has completed. This is one of several Mercurial hooks - that run after an activity finishes. Such hooks have no way - of influencing the activity itself. - </para> - - <para>Mercurial defines a number of events that occur before an - activity starts; or after it starts, but before it finishes. 
- Hooks that trigger on these events have the added ability to - choose whether the activity can continue, or will abort. - </para> - - <para>The <literal role="hook">pretxncommit</literal> hook runs - after a commit has all but completed. In other words, the - metadata representing the changeset has been written out to - disk, but the transaction has not yet been allowed to - complete. The <literal role="hook">pretxncommit</literal> - hook has the ability to decide whether the transaction can - complete, or must be rolled back. - </para> - - <para>If the <literal role="hook">pretxncommit</literal> hook - exits with a status code of zero, the transaction is allowed - to complete; the commit finishes; and the <literal - role="hook">commit</literal> hook is run. If the <literal - role="hook">pretxncommit</literal> hook exits with a - non-zero status code, the transaction is rolled back; the - metadata representing the changeset is erased; and the - <literal role="hook">commit</literal> hook is not run. - </para> - -&interaction.hook.simple.pretxncommit; - - <para>The hook in the example above checks that a commit comment - contains a bug ID. If it does, the commit can complete. If - not, the commit is rolled back. - </para> - - </sect2> - </sect1> - <sect1> - <title>Writing your own hooks</title> - - <para>When you are writing a hook, you might find it useful to run - Mercurial either with the <option - role="hg-opt-global">-v</option> option, or the <envar - role="rc-item-ui">verbose</envar> config item set to - <quote>true</quote>. When you do so, Mercurial will print a - message before it calls each hook. - </para> - - <sect2 id="sec:hook:lang"> - <title>Choosing how your hook should run</title> - - <para>You can write a hook either as a normal - program&emdash;typically a shell script&emdash;or as a Python - function that is executed within the Mercurial process. 
- </para> - - <para>Writing a hook as an external program has the advantage - that it requires no knowledge of Mercurial's internals. You - can call normal Mercurial commands to get any added - information you need. The trade-off is that external hooks - are slower than in-process hooks. - </para> - - <para>An in-process Python hook has complete access to the - Mercurial API, and does not <quote>shell out</quote> to - another process, so it is inherently faster than an external - hook. It is also easier to obtain much of the information - that a hook requires by using the Mercurial API than by - running Mercurial commands. - </para> - - <para>If you are comfortable with Python, or require high - performance, writing your hooks in Python may be a good - choice. However, when you have a straightforward hook to - write and you don't need to care about performance (probably - the majority of hooks), a shell script is perfectly fine. - </para> - - </sect2> - <sect2 id="sec:hook:param"> - <title>Hook parameters</title> - - <para>Mercurial calls each hook with a set of well-defined - parameters. In Python, a parameter is passed as a keyword - argument to your hook function. For an external program, a - parameter is passed as an environment variable. - </para> - - <para>Whether your hook is written in Python or as a shell - script, the hook-specific parameter names and values will be - the same. A boolean parameter will be represented as a - boolean value in Python, but as the number 1 (for - <quote>true</quote>) or 0 (for <quote>false</quote>) as an - environment variable for an external hook. If a hook - parameter is named <literal>foo</literal>, the keyword - argument for a Python hook will also be named - <literal>foo</literal>, while the environment variable for an - external hook will be named <literal>HG_FOO</literal>. 
- </para> - - </sect2> - <sect2> - <title>Hook return values and activity control</title> - - <para>A hook that executes successfully must exit with a status - of zero if external, or return boolean <quote>false</quote> if - in-process. Failure is indicated with a non-zero exit status - from an external hook, or an in-process hook returning boolean - <quote>true</quote>. If an in-process hook raises an - exception, the hook is considered to have failed. - </para> - - <para>For a hook that controls whether an activity can proceed, - zero/false means <quote>allow</quote>, while - non-zero/true/exception means <quote>deny</quote>. - </para> - - </sect2> - <sect2> - <title>Writing an external hook</title> - - <para>When you define an external hook in your <filename - role="special">~/.hgrc</filename> and the hook is run, its - value is passed to your shell, which interprets it. This - means that you can use normal shell constructs in the body of - the hook. - </para> - - <para>An executable hook is always run with its current - directory set to a repository's root directory. - </para> - - <para>Each hook parameter is passed in as an environment - variable; the name is upper-cased, and prefixed with the - string <quote><literal>HG_</literal></quote>. - </para> - - <para>With the exception of hook parameters, Mercurial does not - set or modify any environment variables when running a hook. - This is useful to remember if you are writing a site-wide hook - that may be run by a number of different users with differing - environment variables set. In multi-user situations, you - should not rely on environment variables being set to the - values you have in your environment when testing the hook. - </para> - - </sect2> - <sect2> - <title>Telling Mercurial to use an in-process hook</title> - - <para>The <filename role="special">~/.hgrc</filename> syntax - for defining an in-process hook is slightly different than for - an executable hook. 
The value of the hook must start with the - text <quote><literal>python:</literal></quote>, and continue - with the fully-qualified name of a callable object to use as - the hook's value. - </para> - - <para>The module in which a hook lives is automatically imported - when a hook is run. So long as you have the module name and - <envar>PYTHONPATH</envar> right, it should <quote>just - work</quote>. - </para> - - <para>The following <filename role="special">~/.hgrc</filename> - example snippet illustrates the syntax and meaning of the - notions we just described. - </para> - <programlisting>[hooks] -commit.example = python:mymodule.submodule.myhook</programlisting> - <para>When Mercurial runs the <literal>commit.example</literal> - hook, it imports <literal>mymodule.submodule</literal>, looks - for the callable object named <literal>myhook</literal>, and - calls it. - </para> - - </sect2> - <sect2> - <title>Writing an in-process hook</title> - - <para>The simplest in-process hook does nothing, but illustrates - the basic shape of the hook API: - </para> - <programlisting>def myhook(ui, repo, **kwargs): - pass</programlisting> - <para>The first argument to a Python hook is always a <literal - role="py-mod-mercurial.ui">ui</literal> object. The second - is a repository object; at the moment, it is always an - instance of <literal - role="py-mod-mercurial.localrepo">localrepository</literal>. - Following these two arguments are other keyword arguments. - Which ones are passed in depends on the hook being called, but - a hook can ignore arguments it doesn't care about by dropping - them into a keyword argument dict, as with - <literal>**kwargs</literal> above. - </para> - - </sect2> - </sect1> - <sect1> - <title>Some hook examples</title> - - <sect2> - <title>Writing meaningful commit messages</title> - - <para>It's hard to imagine a useful commit message being very - short. 
The simple <literal role="hook">pretxncommit</literal> - hook of the example below will prevent you from committing a - changeset with a message that is less than ten bytes long. - </para> - -&interaction.hook.msglen.go; - - </sect2> - <sect2> - <title>Checking for trailing whitespace</title> - - <para>An interesting use of a commit-related hook is to help you - to write cleaner code. A simple example of <quote>cleaner - code</quote> is the dictum that a change should not add any - new lines of text that contain <quote>trailing - whitespace</quote>. Trailing whitespace is a series of - space and tab characters at the end of a line of text. In - most cases, trailing whitespace is unnecessary, invisible - noise, but it is occasionally problematic, and people often - prefer to get rid of it. - </para> - - <para>You can use either the <literal - role="hook">precommit</literal> or <literal - role="hook">pretxncommit</literal> hook to tell whether you - have a trailing whitespace problem. If you use the <literal - role="hook">precommit</literal> hook, the hook will not know - which files you are committing, so it will have to check every - modified file in the repository for trailing whitespace. If - you want to commit a change to just the file - <filename>foo</filename>, but the file - <filename>bar</filename> contains trailing whitespace, doing a - check in the <literal role="hook">precommit</literal> hook - will prevent you from committing <filename>foo</filename> due - to the problem with <filename>bar</filename>. This doesn't - seem right. - </para> - - <para>Should you choose the <literal - role="hook">pretxncommit</literal> hook, the check won't - occur until just before the transaction for the commit - completes. This will allow you to check for problems only in the - exact files that are being committed. 
However, if you entered - the commit message interactively and the hook fails, the - transaction will roll back; you'll have to re-enter the commit - message after you fix the trailing whitespace and run <command - role="hg-cmd">hg commit</command> again. - </para> - -&interaction.hook.ws.simple; - - <para>In this example, we introduce a simple <literal - role="hook">pretxncommit</literal> hook that checks for - trailing whitespace. This hook is short, but not very - helpful. It exits with an error status if a change adds a - line with trailing whitespace to any file, but does not print - any information that might help us to identify the offending - file or line. It also has the nice property of not paying - attention to unmodified lines; only lines that introduce new - trailing whitespace cause problems. - </para> - - <para>The version below is much more complex, but also more - useful. It parses a unified diff to see if any lines add - trailing whitespace, and prints the name of the file and the - line number of each such occurrence. Even better, if the - change adds trailing whitespace, this hook saves the commit - comment and prints the name of the save file before exiting - and telling Mercurial to roll the transaction back, so you can - use the <option role="hg-opt-commit">-l filename</option> - option to <command role="hg-cmd">hg commit</command> to reuse - the saved commit message once you've corrected the problem. - </para> - -&interaction.hook.ws.better; - - <para>As a final aside, note in the example above the use of - <command>perl</command>'s in-place editing feature to get rid - of trailing whitespace from a file. This is concise and - useful enough that I will reproduce it here. - </para> - <programlisting>perl -pi -e 's,\s+$,,' filename</programlisting> - - </sect2> - </sect1> - <sect1> - <title>Bundled hooks</title> - - <para>Mercurial ships with several bundled hooks. 
You can find - them in the <filename class="directory">hgext</filename> - directory of a Mercurial source tree. If you are using a - Mercurial binary package, the hooks will be located in the - <filename class="directory">hgext</filename> directory of - wherever your package installer put Mercurial. - </para> - - <sect2> - <title><literal role="hg-ext">acl</literal>&emdash;access - control for parts of a repository</title> - - <para>The <literal role="hg-ext">acl</literal> extension lets - you control which remote users are allowed to push changesets - to a networked server. You can protect any portion of a - repository (including the entire repo), so that a specific - remote user can push changes that do not affect the protected - portion. - </para> - - <para>This extension implements access control based on the - identity of the user performing a push, - <emphasis>not</emphasis> on who committed the changesets - they're pushing. It makes sense to use this hook only if you - have a locked-down server environment that authenticates - remote users, and you want to be sure that only specific users - are allowed to push changes to that server. - </para> - - <sect3> - <title>Configuring the <literal role="hook">acl</literal> - hook</title> - - <para>In order to manage incoming changesets, the <literal - role="hg-ext">acl</literal> hook must be used as a - <literal role="hook">pretxnchangegroup</literal> hook. This - lets it see which files are modified by each incoming - changeset, and roll back a group of changesets if they - modify <quote>forbidden</quote> files. Example: - </para> - <programlisting>[hooks] -pretxnchangegroup.acl = python:hgext.acl.hook</programlisting> - - <para>The <literal role="hg-ext">acl</literal> extension is - configured using three sections. 
- </para> - - <para>The <literal role="rc-acl">acl</literal> section has - only one entry, <envar role="rc-item-acl">sources</envar>, - which lists the sources of incoming changesets that the hook - should pay attention to. You don't normally need to - configure this section. - </para> - <itemizedlist> - <listitem><para><envar role="rc-item-acl">serve</envar>: - Control incoming changesets that are arriving from a - remote repository over http or ssh. This is the default - value of <envar role="rc-item-acl">sources</envar>, and - usually the only setting you'll need for this - configuration item. - </para> - </listitem> - <listitem><para><envar role="rc-item-acl">pull</envar>: - Control incoming changesets that are arriving via a pull - from a local repository. - </para> - </listitem> - <listitem><para><envar role="rc-item-acl">push</envar>: - Control incoming changesets that are arriving via a push - from a local repository. - </para> - </listitem> - <listitem><para><envar role="rc-item-acl">bundle</envar>: - Control incoming changesets that are arriving from - another repository via a bundle. - </para> - </listitem></itemizedlist> - - <para>The <literal role="rc-acl.allow">acl.allow</literal> - section controls the users that are allowed to add - changesets to the repository. If this section is not - present, all users that are not explicitly denied are - allowed. If this section is present, all users that are not - explicitly allowed are denied (so an empty section means - that all users are denied). - </para> - - <para>The <literal role="rc-acl.deny">acl.deny</literal> - section determines which users are denied from adding - changesets to the repository. If this section is not - present or is empty, no users are denied. - </para> - - <para>The syntaxes for the <literal - role="rc-acl.allow">acl.allow</literal> and <literal - role="rc-acl.deny">acl.deny</literal> sections are - identical. 
On the left of each entry is a glob pattern that - matches files or directories, relative to the root of the - repository; on the right, a user name. - </para> - - <para>In the following example, the user - <literal>docwriter</literal> can only push changes to the - <filename class="directory">docs</filename> subtree of the - repository, while <literal>intern</literal> can push changes - to any file or directory except <filename - class="directory">source/sensitive</filename>. - </para> - <programlisting>[acl.allow] -docs/** = docwriter -[acl.deny] -source/sensitive/** = intern</programlisting> - - </sect3> - <sect3> - <title>Testing and troubleshooting</title> - - <para>If you want to test the <literal - role="hg-ext">acl</literal> hook, run it with Mercurial's - debugging output enabled. Since you'll probably be running - it on a server where it's not convenient (or sometimes - possible) to pass in the <option - role="hg-opt-global">--debug</option> option, don't forget - that you can enable debugging output in your <filename - role="special">~/.hgrc</filename>: - </para> - <programlisting>[ui] -debug = true</programlisting> - <para>With this enabled, the <literal - role="hg-ext">acl</literal> hook will print enough - information to let you figure out why it is allowing or - forbidding pushes from specific users. - </para> - - </sect3> - </sect2> - <sect2> - <title><literal - role="hg-ext">bugzilla</literal>&emdash;integration with - Bugzilla</title> - - <para>The <literal role="hg-ext">bugzilla</literal> extension - adds a comment to a Bugzilla bug whenever it finds a reference - to that bug ID in a commit comment. You can install this hook - on a shared server, so that any time a remote user pushes - changes to this server, the hook gets run. 
- </para> - - <para>It adds a comment to the bug that looks like this (you can - configure the contents of the comment&emdash;see below): - </para> - <programlisting>Changeset aad8b264143a, made by Joe User - &lt;joe.user@domain.com&gt; in the frobnitz repository, refers - to this bug. For complete details, see - http://hg.domain.com/frobnitz?cmd=changeset;node=aad8b264143a - Changeset description: Fix bug 10483 by guarding against some - NULL pointers</programlisting> - <para>The value of this hook is that it automates the process of - updating a bug any time a changeset refers to it. If you - configure the hook properly, it makes it easy for people to - browse straight from a Bugzilla bug to a changeset that refers - to that bug. - </para> - - <para>You can use the code in this hook as a starting point for - some more exotic Bugzilla integration recipes. Here are a few - possibilities: - </para> - <itemizedlist> - <listitem><para>Require that every changeset pushed to the - server have a valid bug ID in its commit comment. In this - case, you'd want to configure the hook as a <literal - role="hook">pretxnchangegroup</literal> hook. This would - allow the hook to reject changes that didn't contain bug - IDs. - </para> - </listitem> - <listitem><para>Allow incoming changesets to automatically - modify the <emphasis>state</emphasis> of a bug, as well as - simply adding a comment. For example, the hook could - recognise the string <quote>fixed bug 31337</quote> as - indicating that it should update the state of bug 31337 to - <quote>requires testing</quote>. 
- </para> - </listitem></itemizedlist> - - <sect3 id="sec:hook:bugzilla:config"> - <title>Configuring the <literal role="hook">bugzilla</literal> - hook</title> - - <para>You should configure this hook in your server's - <filename role="special">~/.hgrc</filename> as an <literal - role="hook">incoming</literal> hook, for example as - follows: - </para> - <programlisting>[hooks] -incoming.bugzilla = python:hgext.bugzilla.hook</programlisting> - - <para>Because of the specialised nature of this hook, and - because Bugzilla was not written with this kind of - integration in mind, configuring this hook is a somewhat - involved process. - </para> - - <para>Before you begin, you must install the MySQL bindings - for Python on the host(s) where you'll be running the hook. - If this is not available as a binary package for your - system, you can download it from - <citation>web:mysql-python</citation>. - </para> - - <para>Configuration information for this hook lives in the - <literal role="rc-bugzilla">bugzilla</literal> section of - your <filename role="special">~/.hgrc</filename>. - </para> - <itemizedlist> - <listitem><para><envar - role="rc-item-bugzilla">version</envar>: The version - of Bugzilla installed on the server. The database - schema that Bugzilla uses changes occasionally, so this - hook has to know exactly which schema to use. At the - moment, the only version supported is - <literal>2.16</literal>. - </para> - </listitem> - <listitem><para><envar role="rc-item-bugzilla">host</envar>: - The hostname of the MySQL server that stores your - Bugzilla data. The database must be configured to allow - connections from whatever host you are running the - <literal role="hook">bugzilla</literal> hook on. - </para> - </listitem> - <listitem><para><envar role="rc-item-bugzilla">user</envar>: - The username with which to connect to the MySQL server. 
- The database must be configured to allow this user to - connect from whatever host you are running the <literal - role="hook">bugzilla</literal> hook on. This user - must be able to access and modify Bugzilla tables. The - default value of this item is <literal>bugs</literal>, - which is the standard name of the Bugzilla user in a - MySQL database. - </para> - </listitem> - <listitem><para><envar - role="rc-item-bugzilla">password</envar>: The MySQL - password for the user you configured above. This is - stored as plain text, so you should make sure that - unauthorised users cannot read the <filename - role="special">~/.hgrc</filename> file where you - store this information. - </para> - </listitem> - <listitem><para><envar role="rc-item-bugzilla">db</envar>: - The name of the Bugzilla database on the MySQL server. - The default value of this item is - <literal>bugs</literal>, which is the standard name of - the MySQL database where Bugzilla stores its data. - </para> - </listitem> - <listitem><para><envar - role="rc-item-bugzilla">notify</envar>: If you want - Bugzilla to send out a notification email to subscribers - after this hook has added a comment to a bug, you will - need this hook to run a command whenever it updates the - database. The command to run depends on where you have - installed Bugzilla, but it will typically look something - like this, if you have Bugzilla installed in <filename - class="directory">/var/www/html/bugzilla</filename>: - </para> - <programlisting>cd /var/www/html/bugzilla && - ./processmail %s nobody@nowhere.com</programlisting> - </listitem> - <listitem><para> The Bugzilla - <literal>processmail</literal> program expects to be - given a bug ID (the hook replaces - <quote><literal>%s</literal></quote> with the bug ID) - and an email address. It also expects to be able to - write to some files in the directory that it runs in. 
- If Bugzilla and this hook are not installed on the same - machine, you will need to find a way to run - <literal>processmail</literal> on the server where - Bugzilla is installed. - </para> - </listitem></itemizedlist> - - </sect3> - <sect3> - <title>Mapping committer names to Bugzilla user names</title> - - <para>By default, the <literal - role="hg-ext">bugzilla</literal> hook tries to use the - email address of a changeset's committer as the Bugzilla - user name with which to update a bug. If this does not suit - your needs, you can map committer email addresses to - Bugzilla user names using a <literal - role="rc-usermap">usermap</literal> section. - </para> - - <para>Each item in the <literal - role="rc-usermap">usermap</literal> section contains an - email address on the left, and a Bugzilla user name on the - right. - </para> - <programlisting>[usermap] -jane.user@example.com = jane</programlisting> - <para>You can either keep the <literal - role="rc-usermap">usermap</literal> data in a normal - <filename role="special">~/.hgrc</filename>, or tell the - <literal role="hg-ext">bugzilla</literal> hook to read the - information from an external <filename>usermap</filename> - file. In the latter case, you can store - <filename>usermap</filename> data by itself in (for example) - a user-modifiable repository. This makes it possible to let - your users maintain their own <envar - role="rc-item-bugzilla">usermap</envar> entries. 
The main - <filename role="special">~/.hgrc</filename> file might look - like this: - </para> - <programlisting># regular hgrc file refers to external usermap file -[bugzilla] -usermap = /home/hg/repos/userdata/bugzilla-usermap.conf</programlisting> - <para>While the <filename>usermap</filename> file that it - refers to might look like this: - </para> - <programlisting># bugzilla-usermap.conf - inside a hg repository -[usermap] stephanie@example.com = steph</programlisting> - - </sect3> - <sect3> - <title>Configuring the text that gets added to a bug</title> - - <para>You can configure the text that this hook adds as a - comment; you specify it in the form of a Mercurial template. - Several <filename role="special">~/.hgrc</filename> entries - (still in the <literal role="rc-bugzilla">bugzilla</literal> - section) control this behaviour. - </para> - <itemizedlist> - <listitem><para><literal>strip</literal>: The number of - leading path elements to strip from a repository's path - name to construct a partial path for a URL. For example, - if the repositories on your server live under <filename - class="directory">/home/hg/repos</filename>, and you - have a repository whose path is <filename - class="directory">/home/hg/repos/app/tests</filename>, - then setting <literal>strip</literal> to - <literal>4</literal> will give a partial path of - <filename class="directory">app/tests</filename>. The - hook will make this partial path available when - expanding a template, as <literal>webroot</literal>. - </para> - </listitem> - <listitem><para><literal>template</literal>: The text of the - template to use. In addition to the usual - changeset-related variables, this template can use - <literal>hgweb</literal> (the value of the - <literal>hgweb</literal> configuration item above) and - <literal>webroot</literal> (the path constructed using - <literal>strip</literal> above). 
- </para> - </listitem></itemizedlist> - - <para>In addition, you can add a <envar - role="rc-item-web">baseurl</envar> item to the <literal - role="rc-web">web</literal> section of your <filename - role="special">~/.hgrc</filename>. The <literal - role="hg-ext">bugzilla</literal> hook will make this - available when expanding a template, as the base string to - use when constructing a URL that will let users browse from - a Bugzilla comment to view a changeset. Example: - </para> - <programlisting>[web] -baseurl = http://hg.domain.com/</programlisting> - - <para>Here is an example set of <literal - role="hg-ext">bugzilla</literal> hook config information. - </para> - - &ch10-bugzilla-config.lst; - - </sect3> - <sect3> - <title>Testing and troubleshooting</title> - - <para>The most common problems with configuring the <literal - role="hg-ext">bugzilla</literal> hook relate to running - Bugzilla's <filename>processmail</filename> script and - mapping committer names to user names. - </para> - - <para>Recall from section <xref - linkend="sec:hook:bugzilla:config"/> above that the user - that runs the Mercurial process on the server is also the - one that will run the <filename>processmail</filename> - script. The <filename>processmail</filename> script - sometimes causes Bugzilla to write to files in its - configuration directory, and Bugzilla's configuration files - are usually owned by the user that your web server runs - under. - </para> - - <para>You can cause <filename>processmail</filename> to be run - with the suitable user's identity using the - <command>sudo</command> command. Here is an example entry - for a <filename>sudoers</filename> file. - </para> - <programlisting>hg_user = (httpd_user) -NOPASSWD: /var/www/html/bugzilla/processmail-wrapper %s</programlisting> - <para>This allows the <literal>hg_user</literal> user to run a - <filename>processmail-wrapper</filename> program under the - identity of <literal>httpd_user</literal>. 
- </para> - - <para>This indirection through a wrapper script is necessary, - because <filename>processmail</filename> expects to be run - with its current directory set to wherever you installed - Bugzilla; you can't specify that kind of constraint in a - <filename>sudoers</filename> file. The contents of the - wrapper script are simple: - </para> - <programlisting>#!/bin/sh -cd `dirname $0` && ./processmail "$1" nobody@example.com</programlisting> - <para>It doesn't seem to matter what email address you pass to - <filename>processmail</filename>. - </para> - - <para>If your <literal role="rc-usermap">usermap</literal> is - not set up correctly, users will see an error message from - the <literal role="hg-ext">bugzilla</literal> hook when they - push changes to the server. The error message will look - like this: - </para> - <programlisting>cannot find bugzilla user id for john.q.public@example.com</programlisting> - <para>What this means is that the committer's address, - <literal>john.q.public@example.com</literal>, is not a valid - Bugzilla user name, nor does it have an entry in your - <literal role="rc-usermap">usermap</literal> that maps it to - a valid Bugzilla user name. - </para> - - </sect3> - </sect2> - <sect2> - <title><literal role="hg-ext">notify</literal>&emdash;send email - notifications</title> - - <para>Although Mercurial's built-in web server provides RSS - feeds of changes in every repository, many people prefer to - receive change notifications via email. The <literal - role="hg-ext">notify</literal> hook lets you send out - notifications to a set of email addresses whenever changesets - arrive that those subscribers are interested in. - </para> - - <para>As with the <literal role="hg-ext">bugzilla</literal> - hook, the <literal role="hg-ext">notify</literal> hook is - template-driven, so you can customise the contents of the - notification messages that it sends. 
- </para> - - <para>By default, the <literal role="hg-ext">notify</literal> - hook includes a diff of every changeset that it sends out; you - can limit the size of the diff, or turn this feature off - entirely. It is useful for letting subscribers review changes - immediately, rather than clicking to follow a URL. - </para> - - <sect3> - <title>Configuring the <literal role="hg-ext">notify</literal> - hook</title> - - <para>You can set up the <literal - role="hg-ext">notify</literal> hook to send one email - message per incoming changeset, or one per incoming group of - changesets (all those that arrived in a single pull or - push). - </para> - <programlisting>[hooks] -# send one email per group of changes -changegroup.notify = python:hgext.notify.hook -# send one email per change -incoming.notify = python:hgext.notify.hook</programlisting> - - <para>Configuration information for this hook lives in the - <literal role="rc-notify">notify</literal> section of a - <filename role="special">~/.hgrc</filename> file. - </para> - <itemizedlist> - <listitem><para><envar role="rc-item-notify">test</envar>: - By default, this hook does not send out email at all; - instead, it prints the message that it - <emphasis>would</emphasis> send. Set this item to - <literal>false</literal> to allow email to be sent. The - reason that sending of email is turned off by default is - that it takes several tries to configure this extension - exactly as you would like, and it would be bad form to - spam subscribers with a number of <quote>broken</quote> - notifications while you debug your configuration. - </para> - </listitem> - <listitem><para><envar role="rc-item-notify">config</envar>: - The path to a configuration file that contains - subscription information. This is kept separate from - the main <filename role="special">~/.hgrc</filename> so - that you can maintain it in a repository of its own. 
- People can then clone that repository, update their - subscriptions, and push the changes back to your server. - </para> - </listitem> - <listitem><para><envar role="rc-item-notify">strip</envar>: - The number of leading path separator characters to strip - from a repository's path, when deciding whether a - repository has subscribers. For example, if the - repositories on your server live in <filename - class="directory">/home/hg/repos</filename>, and - <literal role="hg-ext">notify</literal> is considering a - repository named <filename - class="directory">/home/hg/repos/shared/test</filename>, - setting <envar role="rc-item-notify">strip</envar> to - <literal>4</literal> will cause <literal - role="hg-ext">notify</literal> to trim the path it - considers down to <filename - class="directory">shared/test</filename>, and it will - match subscribers against that. - </para> - </listitem> - <listitem><para><envar - role="rc-item-notify">template</envar>: The template - text to use when sending messages. This specifies both - the contents of the message header and its body. - </para> - </listitem> - <listitem><para><envar - role="rc-item-notify">maxdiff</envar>: The maximum - number of lines of diff data to append to the end of a - message. If a diff is longer than this, it is - truncated. By default, this is set to 300. Set this to - <literal>0</literal> to omit diffs from notification - emails. - </para> - </listitem> - <listitem><para><envar - role="rc-item-notify">sources</envar>: A list of - sources of changesets to consider. This lets you limit - <literal role="hg-ext">notify</literal> to only sending - out email about changes that remote users pushed into - this repository via a server, for example. See section - <xref - linkend="sec:hook:sources"/> for the sources you can - specify here. 
- </para> - </listitem></itemizedlist> - - <para>If you set the <envar role="rc-item-web">baseurl</envar> - item in the <literal role="rc-web">web</literal> section, - you can use it in a template; it will be available as - <literal>webroot</literal>. - </para> - - <para>Here is an example set of <literal - role="hg-ext">notify</literal> configuration information. - </para> - - &ch10-notify-config.lst; - - <para>This will produce a message that looks like the - following: - </para> - - &ch10-notify-config-mail.lst; - - </sect3> - <sect3> - <title>Testing and troubleshooting</title> - - <para>Do not forget that by default, the <literal - role="hg-ext">notify</literal> extension <emphasis>will not - send any mail</emphasis> until you explicitly configure it to do so, - by setting <envar role="rc-item-notify">test</envar> to - <literal>false</literal>. Until you do that, it simply - prints the message it <emphasis>would</emphasis> send. - </para> - - </sect3> - </sect2> - </sect1> - <sect1 id="sec:hook:ref"> - <title>Information for writers of hooks</title> - - <sect2> - <title>In-process hook execution</title> - - <para>An in-process hook is called with arguments of the - following form: - </para> - <programlisting>def myhook(ui, repo, **kwargs): pass</programlisting> - <para>The <literal>ui</literal> parameter is a <literal - role="py-mod-mercurial.ui">ui</literal> object. The - <literal>repo</literal> parameter is a <literal - role="py-mod-mercurial.localrepo">localrepository</literal> - object. The names and values of the - <literal>**kwargs</literal> parameters depend on the hook - being invoked, with the following common features: - </para> - <itemizedlist> - <listitem><para>If a parameter is named - <literal>node</literal> or <literal>parentN</literal>, it - will contain a hexadecimal changeset ID. The empty string - is used to represent <quote>null changeset ID</quote> - instead of a string of zeroes. 
- </para> - </listitem> - <listitem><para>If a parameter is named - <literal>url</literal>, it will contain the URL of a - remote repository, if that can be determined. - </para> - </listitem> - <listitem><para>Boolean-valued parameters are represented as - Python <literal>bool</literal> objects. - </para> - </listitem></itemizedlist> - - <para>An in-process hook is called without a change to the - process's working directory (unlike external hooks, which are - run in the root of the repository). It must not change the - process's working directory, or it will cause any calls it - makes into the Mercurial API to fail. - </para> - - <para>If a hook returns a boolean <quote>false</quote> value, it - is considered to have succeeded. If it returns a boolean - <quote>true</quote> value or raises an exception, it is - considered to have failed. A useful way to think of the - calling convention is <quote>tell me if you fail</quote>. - </para> - - <para>Note that changeset IDs are passed into Python hooks as - hexadecimal strings, not the binary hashes that Mercurial's - APIs normally use. To convert a hash from hex to binary, use - the <literal>bin</literal> function. - </para> - - </sect2> - <sect2> - <title>External hook execution</title> - - <para>An external hook is passed to the shell of the user - running Mercurial. Features of that shell, such as variable - substitution and command redirection, are available. The hook - is run in the root directory of the repository (unlike - in-process hooks, which are run in the same directory that - Mercurial was run in). - </para> - - <para>Hook parameters are passed to the hook as environment - variables. Each environment variable's name is converted to - upper case and prefixed with the string - <quote><literal>HG_</literal></quote>. 
For example, if the - name of a parameter is <quote><literal>node</literal></quote>, - the name of the environment variable representing that - parameter will be <quote><literal>HG_NODE</literal></quote>. - </para> - - <para>A boolean parameter is represented as the string - <quote><literal>1</literal></quote> for <quote>true</quote>, - <quote><literal>0</literal></quote> for <quote>false</quote>. - If an environment variable is named <envar>HG_NODE</envar>, - <envar>HG_PARENT1</envar> or <envar>HG_PARENT2</envar>, it - contains a changeset ID represented as a hexadecimal string. - The empty string is used to represent <quote>null changeset - ID</quote> instead of a string of zeroes. If an environment - variable is named <envar>HG_URL</envar>, it will contain the - URL of a remote repository, if that can be determined. - </para> - - <para>If a hook exits with a status of zero, it is considered to - have succeeded. If it exits with a non-zero status, it is - considered to have failed. - </para> - - </sect2> - <sect2> - <title>Finding out where changesets come from</title> - - <para>A hook that involves the transfer of changesets between a - local repository and another may be able to find out - information about the <quote>far side</quote>. Mercurial - knows <emphasis>how</emphasis> changes are being transferred, - and in many cases <emphasis>where</emphasis> they are being - transferred to or from. - </para> - - <sect3 id="sec:hook:sources"> - <title>Sources of changesets</title> - - <para>Mercurial will tell a hook what means are, or were, used - to transfer changesets between repositories. This is - provided by Mercurial in a Python parameter named - <literal>source</literal>, or an environment variable named - <envar>HG_SOURCE</envar>. - </para> - - <itemizedlist> - <listitem><para><literal>serve</literal>: Changesets are - transferred to or from a remote repository over http or - ssh. 
- </para> - </listitem> - <listitem><para><literal>pull</literal>: Changesets are - being transferred via a pull from one repository into - another. - </para> - </listitem> - <listitem><para><literal>push</literal>: Changesets are - being transferred via a push from one repository into - another. - </para> - </listitem> - <listitem><para><literal>bundle</literal>: Changesets are - being transferred to or from a bundle. - </para> - </listitem></itemizedlist> - - </sect3> - <sect3 id="sec:hook:url"> - <title>Where changes are going&emdash;remote repository - URLs</title> - - <para>When possible, Mercurial will tell a hook the location - of the <quote>far side</quote> of an activity that transfers - changeset data between repositories. This is provided by - Mercurial in a Python parameter named - <literal>url</literal>, or an environment variable named - <envar>HG_URL</envar>. - </para> - - <para>This information is not always known. If a hook is - invoked in a repository that is being served via http or - ssh, Mercurial cannot tell where the remote repository is, - but it may know where the client is connecting from. In - such cases, the URL will take one of the following forms: - </para> - <itemizedlist> - <listitem><para><literal>remote:ssh:1.2.3.4</literal>&emdash;remote - ssh client, at the IP address - <literal>1.2.3.4</literal>. - </para> - </listitem> - <listitem><para><literal>remote:http:1.2.3.4</literal>&emdash;remote - http client, at the IP address - <literal>1.2.3.4</literal>. If the client is using SSL, - this will be of the form - <literal>remote:https:1.2.3.4</literal>. - </para> - </listitem> - <listitem><para>Empty&emdash;no information could be - discovered about the remote client. 
- </para> - </listitem></itemizedlist> - - </sect3> - </sect2> - </sect1> - <sect1> - <title>Hook reference</title> - - <sect2 id="sec:hook:changegroup"> - <title><literal role="hook">changegroup</literal>&emdash;after - remote changesets added</title> - - <para>This hook is run after a group of pre-existing changesets - has been added to the repository, for example via a <command - role="hg-cmd">hg pull</command> or <command role="hg-cmd">hg - unbundle</command>. This hook is run once per operation - that added one or more changesets. This is in contrast to the - <literal role="hook">incoming</literal> hook, which is run - once per changeset, regardless of whether the changesets - arrive in a group. - </para> - - <para>Some possible uses for this hook include kicking off an - automated build or test of the added changesets, updating a - bug database, or notifying subscribers that a repository - contains new changes. - </para> - - <para>Parameters to this hook: - </para> - <itemizedlist> - <listitem><para><literal>node</literal>: A changeset ID. The - changeset ID of the first changeset in the group that was - added. All changesets between this and - <literal role="tag">tip</literal>, inclusive, were added by a single - <command role="hg-cmd">hg pull</command>, <command - role="hg-cmd">hg push</command> or <command - role="hg-cmd">hg unbundle</command>. - </para> - </listitem> - <listitem><para><literal>source</literal>: A string. The - source of these changes. See section <xref - linkend="sec:hook:sources"/> for details. - </para> - </listitem> - <listitem><para><literal>url</literal>: A URL. The location - of the remote repository, if known. See section <xref - linkend="sec:hook:url"/> for more - information. 
- </para> - </listitem></itemizedlist> - - <para>See also: <literal role="hook">incoming</literal> (section - <xref linkend="sec:hook:incoming"/>), <literal - role="hook">prechangegroup</literal> (section <xref - linkend="sec:hook:prechangegroup"/>), <literal - role="hook">pretxnchangegroup</literal> (section <xref - linkend="sec:hook:pretxnchangegroup"/>) - </para> - - </sect2> - <sect2 id="sec:hook:commit"> - <title><literal role="hook">commit</literal>&emdash;after a new - changeset is created</title> - - <para>This hook is run after a new changeset has been created. - </para> - - <para>Parameters to this hook: - </para> - <itemizedlist> - <listitem><para><literal>node</literal>: A changeset ID. The - changeset ID of the newly committed changeset. - </para> - </listitem> - <listitem><para><literal>parent1</literal>: A changeset ID. - The changeset ID of the first parent of the newly - committed changeset. - </para> - </listitem> - <listitem><para><literal>parent2</literal>: A changeset ID. - The changeset ID of the second parent of the newly - committed changeset. - </para> - </listitem></itemizedlist> - - <para>See also: <literal role="hook">precommit</literal> - (section <xref linkend="sec:hook:precommit"/>), <literal - role="hook">pretxncommit</literal> (section <xref - linkend="sec:hook:pretxncommit"/>) - </para> - - </sect2> - <sect2 id="sec:hook:incoming"> - <title><literal role="hook">incoming</literal>&emdash;after one - remote changeset is added</title> - - <para>This hook is run after a pre-existing changeset has been - added to the repository, for example via a <command - role="hg-cmd">hg push</command>. If a group of changesets - was added in a single operation, this hook is called once for - each added changeset. 
- </para> - - <para>You can use this hook for the same purposes as the - <literal role="hook">changegroup</literal> hook (section <xref - linkend="sec:hook:changegroup"/>); it's simply - more convenient sometimes to run a hook once per group of - changesets, while other times it's handier once per changeset. - </para> - - <para>Parameters to this hook: - </para> - <itemizedlist> - <listitem><para><literal>node</literal>: A changeset ID. The - ID of the newly added changeset. - </para> - </listitem> - <listitem><para><literal>source</literal>: A string. The - source of these changes. See section <xref - linkend="sec:hook:sources"/> for details. - </para> - </listitem> - <listitem><para><literal>url</literal>: A URL. The location - of the remote repository, if known. See section <xref - linkend="sec:hook:url"/> for more - information. - </para> - </listitem></itemizedlist> - - <para>See also: <literal role="hook">changegroup</literal> - (section <xref linkend="sec:hook:changegroup"/>), <literal - role="hook">prechangegroup</literal> (section <xref - linkend="sec:hook:prechangegroup"/>), <literal - role="hook">pretxnchangegroup</literal> (section <xref - linkend="sec:hook:pretxnchangegroup"/>) - </para> - - </sect2> - <sect2 id="sec:hook:outgoing"> - <title><literal role="hook">outgoing</literal>&emdash;after - changesets are propagated</title> - - <para>This hook is run after a group of changesets has been - propagated out of this repository, for example by a <command - role="hg-cmd">hg push</command> or <command role="hg-cmd">hg - bundle</command> command. - </para> - - <para>One possible use for this hook is to notify administrators - that changes have been pulled. - </para> - - <para>Parameters to this hook: - </para> - <itemizedlist> - <listitem><para><literal>node</literal>: A changeset ID. The - changeset ID of the first changeset of the group that was - sent. - </para> - </listitem> - <listitem><para><literal>source</literal>: A string. 
The - source of the operation (see section <xref - linkend="sec:hook:sources"/>). If a remote - client pulled changes from this repository, - <literal>source</literal> will be - <literal>serve</literal>. If the client that obtained - changes from this repository was local, - <literal>source</literal> will be - <literal>bundle</literal>, <literal>pull</literal>, or - <literal>push</literal>, depending on the operation the - client performed. - </para> - </listitem> - <listitem><para><literal>url</literal>: A URL. The location - of the remote repository, if known. See section <xref - linkend="sec:hook:url"/> for more - information. - </para> - </listitem></itemizedlist> - - <para>See also: <literal role="hook">preoutgoing</literal> - (section <xref linkend="sec:hook:preoutgoing"/>) - </para> - - </sect2> - <sect2 id="sec:hook:prechangegroup"> - <title><literal - role="hook">prechangegroup</literal>&emdash;before starting - to add remote changesets</title> - - <para>This controlling hook is run before Mercurial begins to - add a group of changesets from another repository. - </para> - - <para>This hook does not have any information about the - changesets to be added, because it is run before transmission - of those changesets is allowed to begin. If this hook fails, - the changesets will not be transmitted. - </para> - - <para>One use for this hook is to prevent external changes from - being added to a repository. For example, you could use this - to <quote>freeze</quote> a server-hosted branch temporarily or - permanently so that users cannot push to it, while still - allowing a local administrator to modify the repository. - </para> - - <para>Parameters to this hook: - </para> - <itemizedlist> - <listitem><para><literal>source</literal>: A string. The - source of these changes. See section <xref - linkend="sec:hook:sources"/> for details. - </para> - </listitem> - <listitem><para><literal>url</literal>: A URL. 
The location - of the remote repository, if known. See section <xref - linkend="sec:hook:url"/> for more - information. - </para> - </listitem></itemizedlist> - - <para>See also: <literal role="hook">changegroup</literal> - (section <xref linkend="sec:hook:changegroup"/>), <literal - role="hook">incoming</literal> (section <xref - linkend="sec:hook:incoming"/>), <literal - role="hook">pretxnchangegroup</literal> (section <xref - linkend="sec:hook:pretxnchangegroup"/>) - </para> - - </sect2> - <sect2 id="sec:hook:precommit"> - <title><literal role="hook">precommit</literal>&emdash;before - starting to commit a changeset</title> - - <para>This hook is run before Mercurial begins to commit a new - changeset. It is run before Mercurial has any of the metadata - for the commit, such as the files to be committed, the commit - message, or the commit date. - </para> - - <para>One use for this hook is to disable the ability to commit - new changesets, while still allowing incoming changesets. - Another is to run a build or test, and only allow the commit - to begin if the build or test succeeds. - </para> - - <para>Parameters to this hook: - </para> - <itemizedlist> - <listitem><para><literal>parent1</literal>: A changeset ID. - The changeset ID of the first parent of the working - directory. - </para> - </listitem> - <listitem><para><literal>parent2</literal>: A changeset ID. - The changeset ID of the second parent of the working - directory. - </para> - </listitem></itemizedlist> - <para>If the commit proceeds, the parents of the working - directory will become the parents of the new changeset. 
- </para> - - <para>See also: <literal role="hook">commit</literal> (section - <xref linkend="sec:hook:commit"/>), <literal - role="hook">pretxncommit</literal> (section <xref - linkend="sec:hook:pretxncommit"/>) - </para> - - </sect2> - <sect2 id="sec:hook:preoutgoing"> - <title><literal role="hook">preoutgoing</literal>&emdash;before - starting to propagate changesets</title> - - <para>This hook is invoked before Mercurial knows the identities - of the changesets to be transmitted. - </para> - - <para>One use for this hook is to prevent changes from being - transmitted to another repository. - </para> - - <para>Parameters to this hook: - </para> - <itemizedlist> - <listitem><para><literal>source</literal>: A string. The - source of the operation that is attempting to obtain - changes from this repository (see section <xref - linkend="sec:hook:sources"/>). See the documentation - for the <literal>source</literal> parameter to the - <literal role="hook">outgoing</literal> hook, in section - <xref linkend="sec:hook:outgoing"/>, for possible values - of - this parameter. - </para> - </listitem> - <listitem><para><literal>url</literal>: A URL. The location - of the remote repository, if known. See section <xref - linkend="sec:hook:url"/> for more - information. - </para> - </listitem></itemizedlist> - - <para>See also: <literal role="hook">outgoing</literal> (section - <xref linkend="sec:hook:outgoing"/>) - </para> - - </sect2> - <sect2 id="sec:hook:pretag"> - <title><literal role="hook">pretag</literal>&emdash;before - tagging a changeset</title> - - <para>This controlling hook is run before a tag is created. If - the hook succeeds, creation of the tag proceeds. If the hook - fails, the tag is not created. - </para> - - <para>Parameters to this hook: - </para> - <itemizedlist> - <listitem><para><literal>local</literal>: A boolean. Whether - the tag is local to this repository instance (i.e. 
stored - in <filename role="special">.hg/localtags</filename>) or - managed by Mercurial (stored in <filename - role="special">.hgtags</filename>). - </para> - </listitem> - <listitem><para><literal>node</literal>: A changeset ID. The - ID of the changeset to be tagged. - </para> - </listitem> - <listitem><para><literal>tag</literal>: A string. The name of - the tag to be created. - </para> - </listitem></itemizedlist> - - <para>If the tag to be created is revision-controlled, the - <literal role="hook">precommit</literal> and <literal - role="hook">pretxncommit</literal> hooks (sections <xref - linkend="sec:hook:commit"/> and <xref - linkend="sec:hook:pretxncommit"/>) will also be run. - </para> - - <para>See also: <literal role="hook">tag</literal> (section - <xref linkend="sec:hook:tag"/>) - </para> - </sect2> - <sect2 id="sec:hook:pretxnchangegroup"> - <title><literal - role="hook">pretxnchangegroup</literal>&emdash;before - completing addition of remote changesets</title> - - <para>This controlling hook is run before a - transaction&emdash;that manages the addition of a group of new - changesets from outside the repository&emdash;completes. If - the hook succeeds, the transaction completes, and all of the - changesets become permanent within this repository. If the - hook fails, the transaction is rolled back, and the data for - the changesets is erased. - </para> - - <para>This hook can access the metadata associated with the - almost-added changesets, but it should not do anything - permanent with this data. It must also not modify the working - directory. - </para> - - <para>While this hook is running, if other Mercurial processes - access this repository, they will be able to see the - almost-added changesets as if they are permanent. This may - lead to race conditions if you do not take steps to avoid - them. - </para> - - <para>This hook can be used to automatically vet a group of - changesets. 
If the hook fails, all of the changesets are - <quote>rejected</quote> when the transaction rolls back. - </para> - - <para>Parameters to this hook: - </para> - <itemizedlist> - <listitem><para><literal>node</literal>: A changeset ID. The - changeset ID of the first changeset in the group that was - added. All changesets between this and - <literal role="tag">tip</literal>, - inclusive, were added by a single <command - role="hg-cmd">hg pull</command>, <command - role="hg-cmd">hg push</command> or <command - role="hg-cmd">hg unbundle</command>. - </para> - </listitem> - <listitem><para><literal>source</literal>: A string. The - source of these changes. See section <xref - linkend="sec:hook:sources"/> for details. - </para> - </listitem> - <listitem><para><literal>url</literal>: A URL. The location - of the remote repository, if known. See section <xref - linkend="sec:hook:url"/> for more - information. - </para> - </listitem></itemizedlist> - - <para>See also: <literal role="hook">changegroup</literal> - (section <xref linkend="sec:hook:changegroup"/>), <literal - role="hook">incoming</literal> (section <xref - linkend="sec:hook:incoming"/>), <literal - role="hook">prechangegroup</literal> (section <xref - linkend="sec:hook:prechangegroup"/>) - </para> - - </sect2> - <sect2 id="sec:hook:pretxncommit"> - <title><literal role="hook">pretxncommit</literal>&emdash;before - completing commit of new changeset</title> - - <para>This controlling hook is run before a - transaction&emdash;that manages a new commit&emdash;completes. - If the hook succeeds, the transaction completes and the - changeset becomes permanent within this repository. If the - hook fails, the transaction is rolled back, and the commit - data is erased. - </para> - - <para>This hook can access the metadata associated with the - almost-new changeset, but it should not do anything permanent - with this data. It must also not modify the working - directory. 
- </para> - - <para>While this hook is running, if other Mercurial processes - access this repository, they will be able to see the - almost-new changeset as if it is permanent. This may lead to - race conditions if you do not take steps to avoid them. - </para> - - <para>Parameters to this hook: - </para> - <itemizedlist> - <listitem><para><literal>node</literal>: A changeset ID. The - changeset ID of the newly committed changeset. - </para> - </listitem> - <listitem><para><literal>parent1</literal>: A changeset ID. - The changeset ID of the first parent of the newly - committed changeset. - </para> - </listitem> - <listitem><para><literal>parent2</literal>: A changeset ID. - The changeset ID of the second parent of the newly - committed changeset. - </para> - </listitem></itemizedlist> - - <para>See also: <literal role="hook">precommit</literal> - (section <xref linkend="sec:hook:precommit"/>) - </para> - - </sect2> - <sect2 id="sec:hook:preupdate"> - <title><literal role="hook">preupdate</literal>&emdash;before - updating or merging working directory</title> - - <para>This controlling hook is run before an update or merge of - the working directory begins. It is run only if Mercurial's - normal pre-update checks determine that the update or merge - can proceed. If the hook succeeds, the update or merge may - proceed; if it fails, the update or merge does not start. - </para> - - <para>Parameters to this hook: - </para> - <itemizedlist> - <listitem><para><literal>parent1</literal>: A changeset ID. - The ID of the parent that the working directory is to be - updated to. If the working directory is being merged, it - will not change this parent. - </para> - </listitem> - <listitem><para><literal>parent2</literal>: A changeset ID. - Only set if the working directory is being merged. The ID - of the revision that the working directory is being merged - with. 
- </para> - </listitem></itemizedlist> - - <para>See also: <literal role="hook">update</literal> (section - <xref linkend="sec:hook:update"/>) - </para> - - </sect2> - <sect2 id="sec:hook:tag"> - <title><literal role="hook">tag</literal>&emdash;after tagging a - changeset</title> - - <para>This hook is run after a tag has been created. - </para> - - <para>Parameters to this hook: - </para> - <itemizedlist> - <listitem><para><literal>local</literal>: A boolean. Whether - the new tag is local to this repository instance (i.e. - stored in <filename - role="special">.hg/localtags</filename>) or managed by - Mercurial (stored in <filename - role="special">.hgtags</filename>). - </para> - </listitem> - <listitem><para><literal>node</literal>: A changeset ID. The - ID of the changeset that was tagged. - </para> - </listitem> - <listitem><para><literal>tag</literal>: A string. The name of - the tag that was created. - </para> - </listitem></itemizedlist> - - <para>If the created tag is revision-controlled, the <literal - role="hook">commit</literal> hook (section <xref - linkend="sec:hook:commit"/>) is run before this hook. - </para> - - <para>See also: <literal role="hook">pretag</literal> (section - <xref linkend="sec:hook:pretag"/>) - </para> - - </sect2> - <sect2 id="sec:hook:update"> - <title><literal role="hook">update</literal>&emdash;after - updating or merging working directory</title> - - <para>This hook is run after an update or merge of the working - directory completes. Since a merge can fail (if the external - <command>hgmerge</command> command fails to resolve conflicts - in a file), this hook communicates whether the update or merge - completed cleanly. - </para> - - <itemizedlist> - <listitem><para><literal>error</literal>: A boolean. - Indicates whether the update or merge completed - successfully. - </para> - </listitem> - <listitem><para><literal>parent1</literal>: A changeset ID. - The ID of the parent that the working directory was - updated to. 
If the working directory was merged, it will - not have changed this parent. - </para> - </listitem> - <listitem><para><literal>parent2</literal>: A changeset ID. - Only set if the working directory was merged. The ID of - the revision that the working directory was merged with. - </para> - </listitem></itemizedlist> - - <para>See also: <literal role="hook">preupdate</literal> - (section <xref linkend="sec:hook:preupdate"/>) - </para> - - </sect2> - </sect1> -</chapter> - -<!-- -local variables: -sgml-parent-document: ("00book.xml" "book" "chapter") -end: --->
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch10-template.xml Thu Mar 19 20:54:12 2009 -0700 @@ -0,0 +1,675 @@ +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> + +<chapter id="chap:template"> + <?dbhtml filename="customizing-the-output-of-mercurial.html"?> + <title>Customising the output of Mercurial</title> + + <para>Mercurial provides a powerful mechanism to let you control how + it displays information. The mechanism is based on templates. + You can use templates to generate specific output for a single + command, or to customise the entire appearance of the built-in web + interface.</para> + + <sect1 id="sec:style"> + <title>Using precanned output styles</title> + + <para>Packaged with Mercurial are some output styles that you can + use immediately. A style is simply a precanned template that + someone wrote and installed somewhere that Mercurial can + find.</para> + + <para>Before we take a look at Mercurial's bundled styles, let's + review its normal output.</para> + + &interaction.template.simple.normal; + + <para>This is somewhat informative, but it takes up a lot of + space&emdash;five lines of output per changeset. The + <literal>compact</literal> style reduces this to three lines, + presented in a sparse manner.</para> + + &interaction.template.simple.compact; + + <para>The <literal>changelog</literal> style hints at the + expressive power of Mercurial's templating engine. 
This style + attempts to follow the GNU Project's changelog + guidelines<citation>web:changelog</citation>.</para> + + &interaction.template.simple.changelog; + + <para>You will not be shocked to learn that Mercurial's default + output style is named <literal>default</literal>.</para> + + <sect2> + <title>Setting a default style</title> + + <para>You can modify the output style that Mercurial will use + for every command by editing your <filename + role="special">~/.hgrc</filename> file, naming the style + you would prefer to use.</para> + + <programlisting>[ui] +style = compact</programlisting> + + <para>If you write a style of your own, you can use it by either + providing the path to your style file, or copying your style + file into a location where Mercurial can find it (typically + the <literal>templates</literal> subdirectory of your + Mercurial install directory).</para> + + </sect2> + </sect1> + <sect1> + <title>Commands that support styles and templates</title> + + <para>All of Mercurial's + <quote><literal>log</literal>-like</quote> commands let you use + styles and templates: <command role="hg-cmd">hg + incoming</command>, <command role="hg-cmd">hg log</command>, + <command role="hg-cmd">hg outgoing</command>, and <command + role="hg-cmd">hg tip</command>.</para> + + <para>As I write this manual, these are so far the only commands + that support styles and templates. Since these are the most + important commands that need customisable output, there has been + little pressure from the Mercurial user community to add style + and template support to other commands.</para> + + </sect1> + <sect1> + <title>The basics of templating</title> + + <para>At its simplest, a Mercurial template is a piece of text. 
+ Some of the text never changes, while other parts are + <emphasis>expanded</emphasis>, or replaced with new text, when + necessary.</para> + + <para>Before we continue, let's look again at a simple example of + Mercurial's normal output.</para> + + &interaction.template.simple.normal; + + <para>Now, let's run the same command, but using a template to + change its output.</para> + + &interaction.template.simple.simplest; + + <para>The example above illustrates the simplest possible + template; it's just a piece of static text, printed once for + each changeset. The <option + role="hg-opt-log">--template</option> option to the <command + role="hg-cmd">hg log</command> command tells Mercurial to use + the given text as the template when printing each + changeset.</para> + + <para>Notice that the template string above ends with the text + <quote><literal>\n</literal></quote>. This is an + <emphasis>escape sequence</emphasis>, telling Mercurial to print + a newline at the end of each template item. If you omit this + newline, Mercurial will run each piece of output together. See + section <xref linkend="sec:template:escape"/> for more details + of escape sequences.</para> + + <para>A template that prints a fixed string of text all the time + isn't very useful; let's try something a bit more + complex.</para> + + &interaction.template.simple.simplesub; + + <para>As you can see, the string + <quote><literal>{desc}</literal></quote> in the template has + been replaced in the output with the description of each + changeset. Every time Mercurial finds text enclosed in curly + braces (<quote><literal>{</literal></quote> and + <quote><literal>}</literal></quote>), it will try to replace the braces + and text with the expansion of whatever is inside. 
To print a + literal curly brace, you must escape it, as described in section + <xref + linkend="sec:template:escape"/>.</para> + + </sect1> + <sect1 id="sec:template:keyword"> + <title>Common template keywords</title> + + <para>You can start writing simple templates immediately using the + keywords below.</para> + + <itemizedlist> + <listitem><para><literal + role="template-keyword">author</literal>: String. The + unmodified author of the changeset.</para> + </listitem> + <listitem><para><literal + role="template-keyword">branches</literal>: String. The + name of the branch on which the changeset was committed. + Will be empty if the branch name was + <literal>default</literal>.</para> + </listitem> + <listitem><para><literal role="template-keyword">date</literal>: + Date information. The date when the changeset was + committed. This is <emphasis>not</emphasis> human-readable; + you must pass it through a filter that will render it + appropriately. See section <xref + linkend="sec:template:filter"/> for more information + on filters. The date is expressed as a pair of numbers. The + first number is a Unix UTC timestamp (seconds since January + 1, 1970); the second is the offset of the committer's + timezone from UTC, in seconds.</para> + </listitem> + <listitem><para><literal role="template-keyword">desc</literal>: + String. The text of the changeset description.</para> + </listitem> + <listitem><para><literal + role="template-keyword">files</literal>: List of strings. + All files modified, added, or removed by this + changeset.</para> + </listitem> + <listitem><para><literal + role="template-keyword">file_adds</literal>: List of + strings. Files added by this changeset.</para> + </listitem> + <listitem><para><literal + role="template-keyword">file_dels</literal>: List of + strings. Files removed by this changeset.</para> + </listitem> + <listitem><para><literal role="template-keyword">node</literal>: + String. 
The changeset identification hash, as a + 40-character hexadecimal string.</para> + </listitem> + <listitem><para><literal + role="template-keyword">parents</literal>: List of + strings. The parents of the changeset.</para> + </listitem> + <listitem><para><literal role="template-keyword">rev</literal>: + Integer. The repository-local changeset revision + number.</para> + </listitem> + <listitem><para><literal role="template-keyword">tags</literal>: + List of strings. Any tags associated with the + changeset.</para> + </listitem></itemizedlist> + + <para>A few simple experiments will show us what to expect when we + use these keywords; you can see the results below.</para> + +&interaction.template.simple.keywords; + + <para>As we noted above, the date keyword does not produce + human-readable output, so we must treat it specially. This + involves using a <emphasis>filter</emphasis>, about which more + in section <xref + linkend="sec:template:filter"/>.</para> + + &interaction.template.simple.datekeyword; + + </sect1> + <sect1 id="sec:template:escape"> + <title>Escape sequences</title> + + <para>Mercurial's templating engine recognises the most commonly + used escape sequences in strings. 
When it sees a backslash + (<quote><literal>\</literal></quote>) character, it looks at the + following character and replaces the two characters with a + single character, as described below.</para> + + <itemizedlist> + <listitem><para><literal>\\</literal>: + Backslash, <quote><literal>\</literal></quote>, ASCII + 92 (octal 134).</para> + </listitem> + <listitem><para><literal>\n</literal>: Newline, + ASCII 10 (octal 12).</para> + </listitem> + <listitem><para><literal>\r</literal>: Carriage + return, ASCII 13 (octal 15).</para> + </listitem> + <listitem><para><literal>\t</literal>: Tab, ASCII + 9 (octal 11).</para> + </listitem> + <listitem><para><literal>\v</literal>: Vertical + tab, ASCII 11 (octal 13).</para> + </listitem> + <listitem><para><literal>\{</literal>: Open curly + brace, <quote><literal>{</literal></quote>, ASCII + 123 (octal 173).</para> + </listitem> + <listitem><para><literal>\}</literal>: Close curly + brace, <quote><literal>}</literal></quote>, ASCII + 125 (octal 175).</para> + </listitem></itemizedlist> + + <para>As indicated above, if you want the expansion of a template + to contain a literal <quote><literal>\</literal></quote>, + <quote><literal>{</literal></quote>, or + <quote><literal>}</literal></quote> character, you must escape + it.</para> + + </sect1> + <sect1 id="sec:template:filter"> + <title>Filtering keywords to change their results</title> + + <para>Some of the results of template expansion are not + immediately easy to use. Mercurial lets you specify an optional + chain of <emphasis>filters</emphasis> to modify the result of + expanding a keyword. You have already seen a common filter, + <literal role="template-kw-filt-date">isodate</literal>, in + action above, to make a date readable.</para> + + <para>Below is a list of the most commonly used filters that + Mercurial supports. While some filters can be applied to any + text, others can only be used in specific circumstances.
The + name of each filter is followed first by an indication of where + it can be used, then a description of its effect.</para> + + <itemizedlist> + <listitem><para><literal + role="template-filter">addbreaks</literal>: Any text. Add + an XHTML <quote><literal>&lt;br/&gt;</literal></quote> tag + before the end of every line except the last. For example, + <quote><literal>foo\nbar</literal></quote> becomes + <quote><literal>foo&lt;br/&gt;\nbar</literal></quote>.</para> + </listitem> + <listitem><para><literal + role="template-kw-filt-date">age</literal>: <literal + role="template-keyword">date</literal> keyword. Render + the age of the date, relative to the current time. Yields a + string like <quote><literal>10 + minutes</literal></quote>.</para> + </listitem> + <listitem><para><literal + role="template-filter">basename</literal>: Any text, but + most useful for the <literal + role="template-keyword">files</literal> keyword and its + relatives. Treat the text as a path, and return the + basename. For example, + <quote><literal>foo/bar/baz</literal></quote> becomes + <quote><literal>baz</literal></quote>.</para> + </listitem> + <listitem><para><literal + role="template-kw-filt-date">date</literal>: <literal + role="template-keyword">date</literal> keyword. Render a + date in a similar format to the Unix <literal + role="template-keyword">date</literal> command, but with + timezone included. Yields a string like <quote><literal>Mon + Sep 04 15:13:13 2006 -0700</literal></quote>.</para> + </listitem> + <listitem><para><literal + role="template-kw-filt-author">domain</literal>: Any text, + but most useful for the <literal + role="template-keyword">author</literal> keyword. Find + the first string that looks like an email address, and + extract just the domain component.
For example, + <quote><literal>Bryan O'Sullivan + &lt;bos@serpentine.com&gt;</literal></quote> becomes + <quote><literal>serpentine.com</literal></quote>.</para> + </listitem> + <listitem><para><literal + role="template-kw-filt-author">email</literal>: Any text, + but most useful for the <literal + role="template-keyword">author</literal> keyword. Extract + the first string that looks like an email address. For + example, <quote><literal>Bryan O'Sullivan + &lt;bos@serpentine.com&gt;</literal></quote> becomes + <quote><literal>bos@serpentine.com</literal></quote>.</para> + </listitem> + <listitem><para><literal + role="template-filter">escape</literal>: Any text. + Replace the special XML/XHTML characters + <quote><literal>&amp;</literal></quote>, + <quote><literal>&lt;</literal></quote> and + <quote><literal>&gt;</literal></quote> with XML + entities.</para> + </listitem> + <listitem><para><literal + role="template-filter">fill68</literal>: Any text. Wrap + the text to fit in 68 columns. This is useful if you then + pass the text through the <literal + role="template-filter">tabindent</literal> filter and + still want it to fit in an 80-column fixed-font + window.</para> + </listitem> + <listitem><para><literal + role="template-filter">fill76</literal>: Any text. Wrap + the text to fit in 76 columns.</para> + </listitem> + <listitem><para><literal + role="template-filter">firstline</literal>: Any text. + Yield the first line of text, without any trailing + newlines.</para> + </listitem> + <listitem><para><literal + role="template-kw-filt-date">hgdate</literal>: <literal + role="template-keyword">date</literal> keyword. Render + the date as a pair of readable numbers. Yields a string + like <quote><literal>1157407993 + 25200</literal></quote>.</para> + </listitem> + <listitem><para><literal + role="template-kw-filt-date">isodate</literal>: <literal + role="template-keyword">date</literal> keyword. Render + the date as a text string in ISO 8601 format.
Yields a + string like <quote><literal>2006-09-04 15:13:13 + -0700</literal></quote>.</para> + </listitem> + <listitem><para><literal + role="template-filter">obfuscate</literal>: Any text, but + most useful for the <literal + role="template-keyword">author</literal> keyword. Yield + the input text rendered as a sequence of XML entities. This + helps to defeat some particularly stupid screen-scraping + email harvesting spambots.</para> + </listitem> + <listitem><para><literal + role="template-kw-filt-author">person</literal>: Any text, + but most useful for the <literal + role="template-keyword">author</literal> keyword. Yield + the text before an email address. For example, + <quote><literal>Bryan O'Sullivan + &lt;bos@serpentine.com&gt;</literal></quote> becomes + <quote><literal>Bryan O'Sullivan</literal></quote>.</para> + </listitem> + <listitem><para><literal + role="template-kw-filt-date">rfc822date</literal>: + <literal role="template-keyword">date</literal> keyword. + Render a date using the same format used in email headers. + Yields a string like <quote><literal>Mon, 04 Sep 2006 + 15:13:13 -0700</literal></quote>.</para> + </listitem> + <listitem><para><literal + role="template-kw-filt-node">short</literal>: Changeset + hash. Yield the short form of a changeset hash, i.e. a + 12-character hexadecimal string.</para> + </listitem> + <listitem><para><literal + role="template-kw-filt-date">shortdate</literal>: <literal + role="template-keyword">date</literal> keyword. Render + the year, month, and day of the date. Yields a string like + <quote><literal>2006-09-04</literal></quote>.</para> + </listitem> + <listitem><para><literal role="template-filter">strip</literal>: + Any text. Strip all leading and trailing whitespace from + the string.</para> + </listitem> + <listitem><para><literal + role="template-filter">tabindent</literal>: Any text.
+ Yield the text, with every line except the first starting + with a tab character.</para> + </listitem> + <listitem><para><literal + role="template-filter">urlescape</literal>: Any text. + Escape all characters that are considered + <quote>special</quote> by URL parsers. For example, + <literal>foo bar</literal> becomes + <literal>foo%20bar</literal>.</para> + </listitem> + <listitem><para><literal + role="template-kw-filt-author">user</literal>: Any text, + but most useful for the <literal + role="template-keyword">author</literal> keyword. Return + the <quote>user</quote> portion of an email address. For + example, <quote><literal>Bryan O'Sullivan + &lt;bos@serpentine.com&gt;</literal></quote> becomes + <quote><literal>bos</literal></quote>.</para> + </listitem></itemizedlist> + +&interaction.template.simple.manyfilters; + + <note> + <para> If you try to apply a filter to a piece of data that it + cannot process, Mercurial will fail and print a Python + exception. For example, trying to pass the output of the + <literal role="template-keyword">desc</literal> keyword through + the <literal role="template-kw-filt-date">isodate</literal> + filter is not a good idea.</para> + </note> + + <sect2> + <title>Combining filters</title> + + <para>It is easy to combine filters to yield output in the form + you would like. The following chain of filters tidies up a + description, then makes sure that it fits cleanly into 68 + columns, then indents it by a further 8 characters (at least + on Unix-like systems, where a tab is conventionally 8 + characters wide).</para> + + &interaction.template.simple.combine; + + <para>Note the use of <quote><literal>\t</literal></quote> (a + tab character) in the template to force the first line to be + indented; this is necessary since <literal + role="template-keyword">tabindent</literal> indents all + lines <emphasis>except</emphasis> the first.</para> + + <para>Keep in mind that the order of filters in a chain is + significant.
The first filter is applied to the result of the + keyword; the second to the result of the first filter; and so + on. For example, using <literal>fill68|tabindent</literal> + gives very different results from + <literal>tabindent|fill68</literal>.</para> + + + </sect2> + </sect1> + <sect1> + <title>From templates to styles</title> + + <para>A command line template provides a quick and simple way to + format some output. Templates can become verbose, though, and + it's useful to be able to give a template a name. A style file + is a template with a name, stored in a file.</para> + + <para>More than that, using a style file unlocks the power of + Mercurial's templating engine in ways that are not possible + using the command line <option + role="hg-opt-log">--template</option> option.</para> + + <sect2> + <title>The simplest of style files</title> + + <para>Our simple style file contains just one line:</para> + + &interaction.template.simple.rev; + + <para>This tells Mercurial, <quote>if you're printing a + changeset, use the text on the right as the + template</quote>.</para> + + </sect2> + <sect2> + <title>Style file syntax</title> + + <para>The syntax rules for a style file are simple.</para> + + <itemizedlist> + <listitem><para>The file is processed one line at a + time.</para> + </listitem> + <listitem><para>Leading and trailing white space are + ignored.</para> + </listitem> + <listitem><para>Empty lines are skipped.</para> + </listitem> + <listitem><para>If a line starts with either of the characters + <quote><literal>#</literal></quote> or + <quote><literal>;</literal></quote>, the entire line is + treated as a comment, and skipped as if empty.</para> + </listitem> + <listitem><para>A line starts with a keyword. This must start + with an alphabetic character or underscore, and can + subsequently contain any alphanumeric character or + underscore. 
(In regexp notation, a keyword must match + <literal>[A-Za-z_][A-Za-z0-9_]*</literal>.)</para> + </listitem> + <listitem><para>The next element must be an + <quote><literal>=</literal></quote> character, which can + be preceded or followed by an arbitrary amount of white + space.</para> + </listitem> + <listitem><para>If the rest of the line starts and ends with + matching quote characters (either single or double quote), + it is treated as a template body.</para> + </listitem> + <listitem><para>If the rest of the line <emphasis>does + not</emphasis> start with a quote character, it is + treated as the name of a file; the contents of this file + will be read and used as a template body.</para> + </listitem></itemizedlist> + + </sect2> + </sect1> + <sect1> + <title>Style files by example</title> + + <para>To illustrate how to write a style file, we will construct a + few by example. Rather than provide a complete style file and + walk through it, we'll mirror the usual process of developing a + style file by starting with something very simple, and walking + through a series of successively more complete examples.</para> + + <sect2> + <title>Identifying mistakes in style files</title> + + <para>If Mercurial encounters a problem in a style file you are + working on, it prints a terse error message that, once you + figure out what it means, is actually quite useful.</para> + +&interaction.template.svnstyle.syntax.input; + + <para>Notice that <filename>broken.style</filename> attempts to + define a <literal>changeset</literal> keyword, but forgets to + give any content for it. 
When instructed to use this style + file, Mercurial promptly complains.</para> + + &interaction.template.svnstyle.syntax.error; + + <para>This error message looks intimidating, but it is not too + hard to follow.</para> + + <itemizedlist> + <listitem><para>The first component is simply Mercurial's way + of saying <quote>I am giving up</quote>.</para> + <programlisting>___abort___: broken.style:1: parse error</programlisting> + </listitem> + <listitem><para>Next comes the name of the style file that + contains the error.</para> + <programlisting>abort: ___broken.style___:1: parse error</programlisting> + </listitem> + <listitem><para>Following the file name is the line number + where the error was encountered.</para> + <programlisting>abort: broken.style:___1___: parse error</programlisting> + </listitem> + <listitem><para>Finally, a description of what went + wrong.</para> + <programlisting>abort: broken.style:1: ___parse error___</programlisting> + </listitem> + <listitem><para>The description of the problem is not always + clear (as in this case), but even when it is cryptic, it + is almost always trivial to visually inspect the offending + line in the style file and see what is wrong.</para> + </listitem></itemizedlist> + + </sect2> + <sect2> + <title>Uniquely identifying a repository</title> + + <para>If you would like to be able to identify a Mercurial + repository <quote>fairly uniquely</quote> using a short string + as an identifier, you can use the first revision in the + repository.</para> + + &interaction.template.svnstyle.id; + + <para>This is not guaranteed to be unique, but it is + nevertheless useful in many cases.</para> + <itemizedlist> + <listitem><para>It will not work in a completely empty + repository, because such a repository does not have a + revision zero.</para> + </listitem> + <listitem><para>Neither will it work in the (extremely rare) + case where a repository is a merge of two or more formerly + independent repositories, and you still 
have those + repositories around.</para> + </listitem></itemizedlist> + <para>Here are some uses to which you could put this + identifier:</para> + <itemizedlist> + <listitem><para>As a key into a table for a database that + manages repositories on a server.</para> + </listitem> + <listitem><para>As half of a {<emphasis>repository + ID</emphasis>, <emphasis>revision ID</emphasis>} tuple. + Save this information away when you run an automated build + or other activity, so that you can <quote>replay</quote> + the build later if necessary.</para> + </listitem></itemizedlist> + + </sect2> + <sect2> + <title>Mimicking Subversion's output</title> + + <para>Let's try to emulate the default output format used by + another revision control tool, Subversion.</para> + + &interaction.template.svnstyle.short; + + <para>Since Subversion's output style is fairly simple, it is + easy to copy-and-paste a hunk of its output into a file, and + replace the text produced above by Subversion with the + template values we'd like to see expanded.</para> + + &interaction.template.svnstyle.template; + + <para>There are a few small ways in which this template deviates + from the output produced by Subversion.</para> + <itemizedlist> + <listitem><para>Subversion prints a <quote>readable</quote> + date (the <quote><literal>Wed, 27 Sep 2006</literal></quote> in the + example output above) in parentheses. Mercurial's + templating engine does not provide a way to display a date + in this format without also printing the time and time + zone.</para> + </listitem> + <listitem><para>We emulate Subversion's printing of + <quote>separator</quote> lines full of + <quote><literal>-</literal></quote> characters by ending + the template with such a line. 
We use the templating + engine's <literal role="template-keyword">header</literal> + keyword to print a separator line as the first line of + output (see below), thus achieving similar output to + Subversion.</para> + </listitem> + <listitem><para>Subversion's output includes a count in the + header of the number of lines in the commit message. We + cannot replicate this in Mercurial; the templating engine + does not currently provide a filter that counts the number + of lines the template generates.</para> + </listitem></itemizedlist> + <para>It took me no more than a minute or two of work to replace + literal text from an example of Subversion's output with some + keywords and filters to give the template above. The style + file simply refers to the template.</para> + + &interaction.template.svnstyle.style; + + <para>We could have included the text of the template file + directly in the style file by enclosing it in quotes and + replacing the newlines with + <quote><literal>\n</literal></quote> sequences, but it would + have made the style file too difficult to read. Readability + is a good guide when you're trying to decide whether some text + belongs in a style file, or in a template file that the style + file points to. If the style file will look too big or + cluttered if you insert a literal piece of text, drop it into + a template instead.</para> + + </sect2> + </sect1> +</chapter> + +<!-- +local variables: +sgml-parent-document: ("00book.xml" "book" "chapter") +end: +-->
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch11-mq.xml Thu Mar 19 20:54:12 2009 -0700 @@ -0,0 +1,1322 @@ +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> + +<chapter id="chap:mq"> + <?dbhtml filename="managing-change-with-mercurial-queues.html"?> + <title>Managing change with Mercurial Queues</title> + + <sect1 id="sec:mq:patch-mgmt"> + <title>The patch management problem</title> + + <para>Here is a common scenario: you need to install a software + package from source, but you find a bug that you must fix in the + source before you can start using the package. You make your + changes, forget about the package for a while, and a few months + later you need to upgrade to a newer version of the package. If + the newer version of the package still has the bug, you must + extract your fix from the older source tree and apply it against + the newer version. This is a tedious task, and it's easy to + make mistakes.</para> + + <para>This is a simple case of the <quote>patch management</quote> + problem. You have an <quote>upstream</quote> source tree that + you can't change; you need to make some local changes on top of + the upstream tree; and you'd like to be able to keep those + changes separate, so that you can apply them to newer versions + of the upstream source.</para> + + <para>The patch management problem arises in many situations. 
+ Probably the most visible is that a user of an open source + software project will contribute a bug fix or new feature to the + project's maintainers in the form of a patch.</para> + + <para>Distributors of operating systems that include open source + software often need to make changes to the packages they + distribute so that they will build properly in their + environments.</para> + + <para>When you have few changes to maintain, it is easy to manage + a single patch using the standard <command>diff</command> and + <command>patch</command> programs (see section <xref + linkend="sec:mq:patch"/> for a discussion of these + tools). Once the number of changes grows, it starts to make + sense to maintain patches as discrete <quote>chunks of + work,</quote> so that for example a single patch will contain + only one bug fix (the patch might modify several files, but it's + doing <quote>only one thing</quote>), and you may have a number + of such patches for different bugs you need fixed and local + changes you require. In this situation, if you submit a bug fix + patch to the upstream maintainers of a package and they include + your fix in a subsequent release, you can simply drop that + single patch when you're updating to the newer release.</para> + + <para>Maintaining a single patch against an upstream tree is a + little tedious and error-prone, but not difficult. However, the + complexity of the problem grows rapidly as the number of patches + you have to maintain increases. 
With more than a tiny number of + patches in hand, understanding which ones you have applied and + maintaining them moves from messy to overwhelming.</para> + + <para>Fortunately, Mercurial includes a powerful extension, + Mercurial Queues (or simply <quote>MQ</quote>), that massively + simplifies the patch management problem.</para> + + </sect1> + <sect1 id="sec:mq:history"> + <title>The prehistory of Mercurial Queues</title> + + <para>During the late 1990s, several Linux kernel developers + started to maintain <quote>patch series</quote> that modified + the behaviour of the Linux kernel. Some of these series were + focused on stability, some on feature coverage, and others were + more speculative.</para> + + <para>The sizes of these patch series grew rapidly. In 2002, + Andrew Morton published some shell scripts he had been using to + automate the task of managing his patch queues. Andrew was + successfully using these scripts to manage hundreds (sometimes + thousands) of patches on top of the Linux kernel.</para> + + <sect2 id="sec:mq:quilt"> + <title>A patchwork quilt</title> + + <para>In early 2003, Andreas Gruenbacher and Martin Quinson + borrowed the approach of Andrew's scripts and published a tool + called <quote>patchwork quilt</quote> + <citation>web:quilt</citation>, or simply <quote>quilt</quote> + (see <citation>gruenbacher:2005</citation> for a paper + describing it). Because quilt substantially automated patch + management, it rapidly gained a large following among open + source software developers.</para> + + <para>Quilt manages a <emphasis>stack of patches</emphasis> on + top of a directory tree. To begin, you tell quilt to manage a + directory tree, and tell it which files you want to manage; it + stores away the names and contents of those files. 
To fix a + bug, you create a new patch (using a single command), edit the + files you need to fix, then <quote>refresh</quote> the + patch.</para> + + <para>The refresh step causes quilt to scan the directory tree; + it updates the patch with all of the changes you have made. + You can create another patch on top of the first, which will + track the changes required to modify the tree from <quote>tree + with one patch applied</quote> to <quote>tree with two + patches applied</quote>.</para> + + <para>You can <emphasis>change</emphasis> which patches are + applied to the tree. If you <quote>pop</quote> a patch, the + changes made by that patch will vanish from the directory + tree. Quilt remembers which patches you have popped, though, + so you can <quote>push</quote> a popped patch again, and the + directory tree will be restored to contain the modifications + in the patch. Most importantly, you can run the + <quote>refresh</quote> command at any time, and the topmost + applied patch will be updated. This means that you can, at + any time, change both which patches are applied and what + modifications those patches make.</para> + + <para>Quilt knows nothing about revision control tools, so it + works equally well on top of an unpacked tarball or a + Subversion working copy.</para> + + </sect2> + <sect2 id="sec:mq:quilt-mq"> + <title>From patchwork quilt to Mercurial Queues</title> + + <para>In mid-2005, Chris Mason took the features of quilt and + wrote an extension that he called Mercurial Queues, which + added quilt-like behaviour to Mercurial.</para> + + <para>The key difference between quilt and MQ is that quilt + knows nothing about revision control systems, while MQ is + <emphasis>integrated</emphasis> into Mercurial. Each patch + that you push is represented as a Mercurial changeset. 
Pop a + patch, and the changeset goes away.</para> + + <para>Because quilt does not care about revision control tools, + it is still a tremendously useful piece of software to know + about for situations where you cannot use Mercurial and + MQ.</para> + + </sect2> + </sect1> + <sect1> + <title>The huge advantage of MQ</title> + + <para>I cannot overstate the value that MQ offers through the + unification of patches and revision control.</para> + + <para>A major reason that patches have persisted in the free + software and open source world&emdash;in spite of the + availability of increasingly capable revision control tools over + the years&emdash;is the <emphasis>agility</emphasis> they + offer.</para> + + <para>Traditional revision control tools make a permanent, + irreversible record of everything that you do. While this has + great value, it's also somewhat stifling. If you want to + perform a wild-eyed experiment, you have to be careful in how + you go about it, or you risk leaving unneeded&emdash;or worse, + misleading or destabilising&emdash;traces of your missteps and + errors in the permanent revision record.</para> + + <para>By contrast, MQ's marriage of distributed revision control + with patches makes it much easier to isolate your work. Your + patches live on top of normal revision history, and you can make + them disappear or reappear at will. If you don't like a patch, + you can drop it. If a patch isn't quite as you want it to be, + simply fix it&emdash;as many times as you need to, until you + have refined it into the form you desire.</para> + + <para>As an example, the integration of patches with revision + control makes understanding patches and debugging their + effects&emdash;and their interplay with the code they're based + on&emdash;<emphasis>enormously</emphasis> easier. 
Since every + applied patch has an associated changeset, you can give <command + role="hg-cmd">hg log</command> a file name to see which + changesets and patches affected the file. You can use the + <command role="hg-cmd">hg bisect</command> command to + binary-search through all changesets and applied patches to see + where a bug got introduced or fixed. You can use the <command + role="hg-cmd">hg annotate</command> command to see which + changeset or patch modified a particular line of a source file. + And so on.</para> + + </sect1> + <sect1 id="sec:mq:patch"> + <title>Understanding patches</title> + + <para>Because MQ doesn't hide its patch-oriented nature, it is + helpful to understand what patches are, and a little about the + tools that work with them.</para> + + <para>The traditional Unix <command>diff</command> command + compares two files, and prints a list of differences between + them. The <command>patch</command> command understands these + differences as <emphasis>modifications</emphasis> to make to a + file. Take a look below for a simple example of these commands + in action.</para> + +&interaction.mq.dodiff.diff; + + <para>The type of file that <command>diff</command> generates (and + <command>patch</command> takes as input) is called a + <quote>patch</quote> or a <quote>diff</quote>; there is no + difference between a patch and a diff. (We'll use the term + <quote>patch</quote>, since it's more commonly used.)</para> + + <para>A patch file can start with arbitrary text; the + <command>patch</command> command ignores this text, but MQ uses + it as the commit message when creating changesets. To find the + beginning of the patch content, <command>patch</command> + searches for the first line that starts with the string + <quote><literal>diff -</literal></quote>.</para> + + <para>MQ works with <emphasis>unified</emphasis> diffs + (<command>patch</command> can accept several other diff formats, + but MQ doesn't). 
A unified diff contains two kinds of header. + The <emphasis>file header</emphasis> describes the file being + modified; it contains the name of the file to modify. When + <command>patch</command> sees a new file header, it looks for a + file with that name to start modifying.</para> + + <para>After the file header comes a series of + <emphasis>hunks</emphasis>. Each hunk starts with a header; + this identifies the range of line numbers within the file that + the hunk should modify. Following the header, a hunk starts and + ends with a few (usually three) lines of text from the + unmodified file; these are called the + <emphasis>context</emphasis> for the hunk. If there's only a + small amount of context between successive hunks, + <command>diff</command> doesn't print a new hunk header; it just + runs the hunks together, with a few lines of context between + modifications.</para> + + <para>Each line of context begins with a space character. Within + the hunk, a line that begins with + <quote><literal>-</literal></quote> means <quote>remove this + line,</quote> while a line that begins with + <quote><literal>+</literal></quote> means <quote>insert this + line.</quote> For example, a line that is modified is + represented by one deletion and one insertion.</para> + + <para>We will return to some of the more subtle aspects of patches + later (in section <xref linkend="sec:mq:adv-patch"/>), but you + should have + enough information now to use MQ.</para> + + </sect1> + <sect1 id="sec:mq:start"> + <title>Getting started with Mercurial Queues</title> + + <para>Because MQ is implemented as an extension, you must + explicitly enable it before you can use it. (You don't need to + download anything; MQ ships with the standard Mercurial + distribution.) 
To enable MQ, edit your <filename + role="home">~/.hgrc</filename> file, and add the lines + below.</para> + + <programlisting>[extensions] +hgext.mq =</programlisting> + + <para>Once the extension is enabled, it will make a number of new + commands available. To verify that the extension is working, + you can use <command role="hg-cmd">hg help</command> to see if + the <command role="hg-ext-mq">qinit</command> command is now + available.</para> + +&interaction.mq.qinit-help.help; + + <para>You can use MQ with <emphasis>any</emphasis> Mercurial + repository, and its commands only operate within that + repository. To get started, simply prepare the repository using + the <command role="hg-ext-mq">qinit</command> command.</para> + +&interaction.mq.tutorial.qinit; + + <para>This command creates an empty directory called <filename + role="special" class="directory">.hg/patches</filename>, where + MQ will keep its metadata. As with many Mercurial commands, the + <command role="hg-ext-mq">qinit</command> command prints nothing + if it succeeds.</para> + + <sect2> + <title>Creating a new patch</title> + + <para>To begin work on a new patch, use the <command + role="hg-ext-mq">qnew</command> command. This command takes + one argument, the name of the patch to create.</para> + + <para>MQ will use this as the name of an actual file in the + <filename role="special" + class="directory">.hg/patches</filename> directory, as you + can see below.</para> + +&interaction.mq.tutorial.qnew; + + <para>Also newly present in the <filename role="special" + class="directory">.hg/patches</filename> directory are two + other files, <filename role="special">series</filename> and + <filename role="special">status</filename>. The <filename + role="special">series</filename> file lists all of the + patches that MQ knows about for this repository, with one + patch per line. 
Mercurial uses the <filename + role="special">status</filename> file for internal + book-keeping; it tracks all of the patches that MQ has + <emphasis>applied</emphasis> in this repository.</para> + + <note> + <para> You may sometimes want to edit the <filename + role="special">series</filename> file by hand; for + example, to change the sequence in which some patches are + applied. However, manually editing the <filename + role="special">status</filename> file is almost always a + bad idea, as it's easy to corrupt MQ's idea of what is + happening.</para> + </note> + + <para>Once you have created your new patch, you can edit files + in the working directory as you usually would. All of the + normal Mercurial commands, such as <command role="hg-cmd">hg + diff</command> and <command role="hg-cmd">hg + annotate</command>, work exactly as they did before.</para> + + </sect2> + <sect2> + <title>Refreshing a patch</title> + + <para>When you reach a point where you want to save your work, + use the <command role="hg-ext-mq">qrefresh</command> command + to update the patch you are working on.</para> + +&interaction.mq.tutorial.qrefresh; + + <para>This command folds the changes you have made in the + working directory into your patch, and updates its + corresponding changeset to contain those changes.</para> + + <para>You can run <command role="hg-ext-mq">qrefresh</command> + as often as you like, so it's a good way to + <quote>checkpoint</quote> your work. Refresh your patch at an + opportune time; try an experiment; and if the experiment + doesn't work out, <command role="hg-cmd">hg revert</command> + your modifications back to the last time you refreshed.</para> + +&interaction.mq.tutorial.qrefresh2; + + </sect2> + <sect2> + <title>Stacking and tracking patches</title> + + <para>Once you have finished working on a patch, or need to work + on another, you can use the <command + role="hg-ext-mq">qnew</command> command again to create a + new patch. 
Mercurial will apply this patch on top of your + existing patch.</para> + +&interaction.mq.tutorial.qnew2; + <para>Notice that the patch contains the changes in our prior + patch as part of its context (you can see this more clearly in + the output of <command role="hg-cmd">hg + annotate</command>).</para> + + <para>So far, with the exception of <command + role="hg-ext-mq">qnew</command> and <command + role="hg-ext-mq">qrefresh</command>, we've been careful to + only use regular Mercurial commands. However, MQ provides + many commands that are easier to use when you are thinking + about patches, as illustrated below.</para> + +&interaction.mq.tutorial.qseries; + + <itemizedlist> + <listitem><para>The <command + role="hg-ext-mq">qseries</command> command lists every + patch that MQ knows about in this repository, from oldest + to newest (most recently + <emphasis>created</emphasis>).</para> + </listitem> + <listitem><para>The <command + role="hg-ext-mq">qapplied</command> command lists every + patch that MQ has <emphasis>applied</emphasis> in this + repository, again from oldest to newest (most recently + applied).</para> + </listitem></itemizedlist> + + </sect2> + <sect2> + <title>Manipulating the patch stack</title> + + <para>The previous discussion implied that there must be a + difference between <quote>known</quote> and + <quote>applied</quote> patches, and there is. MQ can manage a + patch without it being applied in the repository.</para> + + <para>An <emphasis>applied</emphasis> patch has a corresponding + changeset in the repository, and the effects of the patch and + changeset are visible in the working directory. You can undo + the application of a patch using the <command + role="hg-ext-mq">qpop</command> command. MQ still + <emphasis>knows about</emphasis>, or manages, a popped patch, + but the patch no longer has a corresponding changeset in the + repository, and the working directory does not contain the + changes made by the patch. 
Figure <xref + linkend="fig:mq:stack"/> illustrates + the difference between applied and tracked patches.</para> + + <informalfigure id="fig:mq:stack"> + <mediaobject><imageobject><imagedata + fileref="mq-stack"/></imageobject><textobject><phrase>XXX + add text</phrase></textobject><caption><para>Applied and + unapplied patches in the MQ patch + stack</para></caption></mediaobject> + </informalfigure> + + <para>You can reapply an unapplied, or popped, patch using the + <command role="hg-ext-mq">qpush</command> command. This + creates a new changeset to correspond to the patch, and the + patch's changes once again become present in the working + directory. See below for examples of <command + role="hg-ext-mq">qpop</command> and <command + role="hg-ext-mq">qpush</command> in action.</para> +&interaction.mq.tutorial.qpop; + + <para>Notice that once we have popped one or two patches, + the output of <command role="hg-ext-mq">qseries</command> + remains the same, while that of <command + role="hg-ext-mq">qapplied</command> has changed.</para> + + + </sect2> + <sect2> + <title>Pushing and popping many patches</title> + + <para>While <command role="hg-ext-mq">qpush</command> and + <command role="hg-ext-mq">qpop</command> each operate on a + single patch at a time by default, you can push and pop many + patches in one go. The <option + role="hg-ext-mq-cmd-qpush-opt">-a</option> option to + <command role="hg-ext-mq">qpush</command> causes it to push + all unapplied patches, while the <option + role="hg-ext-mq-cmd-qpop-opt">-a</option> option to <command + role="hg-ext-mq">qpop</command> causes it to pop all applied + patches. 
(For some more ways to push and pop many patches, + see section <xref linkend="sec:mq:perf"/> + below.)</para> + +&interaction.mq.tutorial.qpush-a; + + </sect2> + <sect2> + <title>Safety checks, and overriding them</title> + + <para>Several MQ commands check the working directory before + they do anything, and fail if they find any modifications. + They do this to ensure that you won't lose any changes that + you have made, but not yet incorporated into a patch. The + example below illustrates this; the <command + role="hg-ext-mq">qnew</command> command will not create a + new patch if there are outstanding changes, caused in this + case by the <command role="hg-cmd">hg add</command> of + <filename>file3</filename>.</para> + +&interaction.mq.tutorial.add; + + <para>Commands that check the working directory all take an + <quote>I know what I'm doing</quote> option, which is always + named <option>-f</option>. The exact meaning of + <option>-f</option> depends on the command. For example, + <command role="hg-cmd">hg qnew <option + role="hg-ext-mq-cmd-qnew-opt">-f</option></command> + will incorporate any outstanding changes into the new patch it + creates, but <command role="hg-cmd">hg qpop <option + role="hg-ext-mq-cmd-qpop-opt">-f</option></command> + will revert modifications to any files affected by the patch + that it is popping. Be sure to read the documentation for a + command's <option>-f</option> option before you use it!</para> + + </sect2> + <sect2> + <title>Working on several patches at once</title> + + <para>The <command role="hg-ext-mq">qrefresh</command> command + always refreshes the <emphasis>topmost</emphasis> applied + patch. This means that you can suspend work on one patch (by + refreshing it), pop or push to make a different patch the top, + and work on <emphasis>that</emphasis> patch for a + while.</para> + + <para>Here's an example that illustrates how you can use this + ability. 
Let's say you're developing a new feature as two + patches. The first is a change to the core of your software, + and the second&emdash;layered on top of the + first&emdash;changes the user interface to use the code you + just added to the core. If you notice a bug in the core while + you're working on the UI patch, it's easy to fix the core. + Simply <command role="hg-ext-mq">qrefresh</command> the UI + patch to save your in-progress changes, and <command + role="hg-ext-mq">qpop</command> down to the core patch. Fix + the core bug, <command role="hg-ext-mq">qrefresh</command> the + core patch, and <command role="hg-ext-mq">qpush</command> back + to the UI patch to continue where you left off.</para> + + </sect2> + </sect1> + <sect1 id="sec:mq:adv-patch"> + <title>More about patches</title> + + <para>MQ uses the GNU <command>patch</command> command to apply + patches, so it's helpful to know a few more detailed aspects of + how <command>patch</command> works, and about patches + themselves.</para> + + <sect2> + <title>The strip count</title> + + <para>If you look at the file headers in a patch, you will + notice that the pathnames usually have an extra component on + the front that isn't present in the actual path name. This is + a holdover from the way that people used to generate patches + (people still do this, but it's somewhat rare with modern + revision control tools).</para> + + <para>Alice would unpack a tarball, edit her files, then decide + that she wanted to create a patch. So she'd rename her + working directory, unpack the tarball again (hence the need + for the rename), and use the <option + role="cmd-opt-diff">-r</option> and <option + role="cmd-opt-diff">-N</option> options to + <command>diff</command> to recursively generate a patch + between the unmodified directory and the modified one. 
The + result would be that the name of the unmodified directory + would be at the front of the left-hand path in every file + header, and the name of the modified directory would be at the + front of the right-hand path.</para> + + <para>Since someone receiving a patch from the Alices of the net + would be unlikely to have unmodified and modified directories + with exactly the same names, the <command>patch</command> + command has a <option role="cmd-opt-patch">-p</option> option + that indicates the number of leading path name components to + strip when trying to apply a patch. This number is called the + <emphasis>strip count</emphasis>.</para> + + <para>An option of <quote><literal>-p1</literal></quote> means + <quote>use a strip count of one</quote>. If + <command>patch</command> sees a file name + <filename>foo/bar/baz</filename> in a file header, it will + strip <filename>foo</filename> and try to patch a file named + <filename>bar/baz</filename>. (Strictly speaking, the strip + count refers to the number of <emphasis>path + separators</emphasis> (and the components that go with them) + to strip. A strip count of one will turn + <filename>foo/bar</filename> into <filename>bar</filename>, + but <filename>/foo/bar</filename> (notice the extra leading + slash) into <filename>foo/bar</filename>.)</para> + + <para>The <quote>standard</quote> strip count for patches is + one; almost all patches contain one leading path name + component that needs to be stripped. 
Mercurial's <command + role="hg-cmd">hg diff</command> command generates path names + in this form, and the <command role="hg-cmd">hg + import</command> command and MQ expect patches to have a + strip count of one.</para> + + <para>If you receive a patch from someone that you want to add + to your patch queue, and the patch needs a strip count other + than one, you cannot just <command + role="hg-ext-mq">qimport</command> the patch, because + <command role="hg-ext-mq">qimport</command> does not yet have + a <literal>-p</literal> option (see <ulink role="hg-bug" + url="http://www.selenic.com/mercurial/bts/issue311">issue + 311</ulink>). Your best bet is to <command + role="hg-ext-mq">qnew</command> a patch of your own, then + use <command>patch -pN</command> to apply their patch, + followed by <command role="hg-cmd">hg addremove</command> to + pick up any files added or removed by the patch, followed by + <command role="hg-ext-mq">hg qrefresh</command>. This + complexity may become unnecessary; see <ulink role="hg-bug" + url="http://www.selenic.com/mercurial/bts/issue311">issue + 311</ulink> for details. + </para> + </sect2> + <sect2> + <title>Strategies for applying a patch</title> + + <para>When <command>patch</command> applies a hunk, it tries a + handful of successively less accurate strategies to try to + make the hunk apply. This falling-back technique often makes + it possible to take a patch that was generated against an old + version of a file, and apply it against a newer version of + that file.</para> + + <para>First, <command>patch</command> tries an exact match, + where the line numbers, the context, and the text to be + modified must apply exactly. If it cannot make an exact + match, it tries to find an exact match for the context, + without honouring the line numbering information. 
If this + succeeds, it prints a line of output saying that the hunk was + applied, but at some <emphasis>offset</emphasis> from the + original line number.</para> + + <para>If a context-only match fails, <command>patch</command> + removes the first and last lines of the context, and tries a + <emphasis>reduced</emphasis> context-only match. If the hunk + with reduced context succeeds, it prints a message saying that + it applied the hunk with a <emphasis>fuzz factor</emphasis> + (the number after the fuzz factor indicates how many lines of + context <command>patch</command> had to trim before the patch + applied).</para> + + <para>When neither of these techniques works, + <command>patch</command> prints a message saying that the hunk + in question was rejected. It saves rejected hunks (also + simply called <quote>rejects</quote>) to a file with the same + name, and an added <filename role="special">.rej</filename> + extension. It also saves an unmodified copy of the file with + a <filename role="special">.orig</filename> extension; the + copy of the file without any extensions will contain any + changes made by hunks that <emphasis>did</emphasis> apply + cleanly. 
If you have a patch that modifies + <filename>foo</filename> with six hunks, and one of them fails + to apply, you will have: an unmodified + <filename>foo.orig</filename>, a <filename>foo.rej</filename> + containing one hunk, and <filename>foo</filename>, containing + the changes made by the five successful hunks.</para> + + </sect2> + <sect2> + <title>Some quirks of patch representation</title> + + <para>There are a few useful things to know about how + <command>patch</command> works with files.</para> + <itemizedlist> + <listitem><para>This should already be obvious, but + <command>patch</command> cannot handle binary + files.</para> + </listitem> + <listitem><para>Neither does it care about the executable bit; + it creates new files as readable, but not + executable.</para> + </listitem> + <listitem><para><command>patch</command> treats the removal of + a file as a diff between the file to be removed and the + empty file. So your idea of <quote>I deleted this + file</quote> looks like <quote>every line of this file + was deleted</quote> in a patch.</para> + </listitem> + <listitem><para>It treats the addition of a file as a diff + between the empty file and the file to be added. So in a + patch, your idea of <quote>I added this file</quote> looks + like <quote>every line of this file was + added</quote>.</para> + </listitem> + <listitem><para>It treats a renamed file as the removal of the + old name, and the addition of the new name. This means + that renamed files have a big footprint in patches. 
(Note + also that Mercurial does not currently try to infer when + files have been renamed or copied in a patch.)</para> + </listitem> + <listitem><para><command>patch</command> cannot represent + empty files, so you cannot use a patch to represent the + notion <quote>I added this empty file to the + tree</quote>.</para> + </listitem></itemizedlist> + </sect2> + <sect2> + <title>Beware the fuzz</title> + + <para>While applying a hunk at an offset, or with a fuzz factor, + will often be completely successful, these inexact techniques + naturally leave open the possibility of corrupting the patched + file. The most common cases typically involve applying a + patch twice, or at an incorrect location in the file. If + <command>patch</command> or <command + role="hg-ext-mq">qpush</command> ever mentions an offset or + fuzz factor, you should make sure that the modified files are + correct afterwards.</para> + + <para>It's often a good idea to refresh a patch that has applied + with an offset or fuzz factor; refreshing the patch generates + new context information that will make it apply cleanly. I + say <quote>often,</quote> not <quote>always,</quote> because + sometimes refreshing a patch will make it fail to apply + against a different revision of the underlying files. In some + cases, such as when you're maintaining a patch that must sit + on top of multiple versions of a source tree, it's acceptable + to have a patch apply with some fuzz, provided you've verified + the results of the patching process in such cases.</para> + + </sect2> + <sect2> + <title>Handling rejection</title> + + <para>If <command role="hg-ext-mq">qpush</command> fails to + apply a patch, it will print an error message and exit. 
If it + has left <filename role="special">.rej</filename> files + behind, it is usually best to fix up the rejected hunks before + you push more patches or do any further work.</para> + + <para>If your patch <emphasis>used to</emphasis> apply cleanly, + and no longer does because you've changed the underlying code + that your patches are based on, Mercurial Queues can help; see + section <xref + linkend="sec:mq:merge"/> for details.</para> + + <para>Unfortunately, there aren't any great techniques for + dealing with rejected hunks. Most often, you'll need to view + the <filename role="special">.rej</filename> file and edit the + target file, applying the rejected hunks by hand.</para> + + <para>If you're feeling adventurous, Neil Brown, a Linux kernel + hacker, wrote a tool called <command>wiggle</command> + <citation>web:wiggle</citation>, which is more vigorous than + <command>patch</command> in its attempts to make a patch + apply.</para> + + <para>Another Linux kernel hacker, Chris Mason (the author of + Mercurial Queues), wrote a similar tool called + <command>mpatch</command> <citation>web:mpatch</citation>, + which takes a simple approach to automating the application of + hunks rejected by <command>patch</command>. 
The + <command>mpatch</command> command can help with four common + reasons that a hunk may be rejected:</para> + + <itemizedlist> + <listitem><para>The context in the middle of a hunk has + changed.</para> + </listitem> + <listitem><para>A hunk is missing some context at the + beginning or end.</para> + </listitem> + <listitem><para>A large hunk might apply better&emdash;either + entirely or in part&emdash;if it was broken up into + smaller hunks.</para> + </listitem> + <listitem><para>A hunk removes lines with slightly different + content than those currently present in the file.</para> + </listitem></itemizedlist> + + <para>If you use <command>wiggle</command> or + <command>mpatch</command>, you should be doubly careful to + check your results when you're done. In fact, + <command>mpatch</command> enforces this method of + double-checking the tool's output, by automatically dropping + you into a merge program when it has done its job, so that you + can verify its work and finish off any remaining + merges.</para> + + </sect2> + </sect1> + <sect1 id="sec:mq:perf"> + <title>Getting the best performance out of MQ</title> + + <para>MQ is very efficient at handling a large number of patches. + I ran some performance experiments in mid-2006 for a talk that I + gave at the 2006 EuroPython conference + <citation>web:europython</citation>. I used as my data set the + Linux 2.6.17-mm1 patch series, which consists of 1,738 patches. + I applied these on top of a Linux kernel repository containing + all 27,472 revisions between Linux 2.6.12-rc2 and Linux + 2.6.17.</para> + + <para>On my old, slow laptop, I was able to <command + role="hg-cmd">hg qpush <option + role="hg-ext-mq-cmd-qpush-opt">hg -a</option></command> all + 1,738 patches in 3.5 minutes, and <command role="hg-cmd">hg qpop + <option role="hg-ext-mq-cmd-qpop-opt">hg -a</option></command> + them all in 30 seconds. (On a newer laptop, the time to push + all patches dropped to two minutes.) 
I could <command + role="hg-ext-mq">qrefresh</command> one of the biggest patches + (which made 22,779 lines of changes to 287 files) in 6.6 + seconds.</para> + + <para>Clearly, MQ is well suited to working in large trees, but + there are a few tricks you can use to get the best performance + out of it.</para> + + <para>First of all, try to <quote>batch</quote> operations + together. Every time you run <command + role="hg-ext-mq">qpush</command> or <command + role="hg-ext-mq">qpop</command>, these commands scan the + working directory once to make sure you haven't made some + changes and then forgotten to run <command + role="hg-ext-mq">qrefresh</command>. On a small tree, the + time that this scan takes is unnoticeable. However, on a + medium-sized tree (containing tens of thousands of files), it + can take a second or more.</para> + + <para>The <command role="hg-ext-mq">qpush</command> and <command + role="hg-ext-mq">qpop</command> commands allow you to push and + pop multiple patches at a time. You can identify the + <quote>destination patch</quote> that you want to end up at. + When you <command role="hg-ext-mq">qpush</command> with a + destination specified, it will push patches until that patch is + at the top of the applied stack. When you <command + role="hg-ext-mq">qpop</command> to a destination, MQ will pop + patches until the destination patch is at the top.</para> + + <para>You can identify a destination patch either by the name + of the patch or by number. If you use numeric addressing, + patches are counted from zero; this means that the first patch + is zero, the second is one, and so on.</para> + + </sect1> + <sect1 id="sec:mq:merge"> + <title>Updating your patches when the underlying code + changes</title> + + <para>It's common to have a stack of patches on top of an + underlying repository that you don't modify directly. 
If you're + working on changes to third-party code, or on a feature that is + taking longer to develop than the rate of change of the code + beneath, you will often need to sync up with the underlying + code, and fix up any hunks in your patches that no longer apply. + This is called <emphasis>rebasing</emphasis> your patch + series.</para> + + <para>The simplest way to do this is to <command role="hg-cmd">hg + qpop <option role="hg-ext-mq-cmd-qpop-opt">hg + -a</option></command> your patches, then <command + role="hg-cmd">hg pull</command> changes into the underlying + repository, and finally <command role="hg-cmd">hg qpush <option + role="hg-ext-mq-cmd-qpop-opt">hg -a</option></command> your + patches again. MQ will stop pushing any time it runs across a + patch that fails to apply because of conflicts, allowing you to fix + them, <command role="hg-ext-mq">qrefresh</command> the + affected patch, and continue pushing until you have fixed your + entire stack.</para> + + <para>This approach is easy to use and works well if you don't + expect changes to the underlying code to affect how well your + patches apply. If your patch stack touches code that is modified + frequently or invasively in the underlying repository, however, + fixing up rejected hunks by hand quickly becomes + tiresome.</para> + + <para>It's possible to partially automate the rebasing process. 
+ If your patches apply cleanly against some revision of the + underlying repo, MQ can use this information to help you to + resolve conflicts between your patches and a different + revision.</para> + + <para>The process is a little involved.</para> + <orderedlist> + <listitem><para>To begin, <command role="hg-cmd">hg qpush + -a</command> all of your patches on top of the revision + where you know that they apply cleanly.</para> + </listitem> + <listitem><para>Save a backup copy of your patch directory using + <command role="hg-cmd">hg qsave <option + role="hg-ext-mq-cmd-qsave-opt">hg -e</option> <option + role="hg-ext-mq-cmd-qsave-opt">hg -c</option></command>. + This prints the name of the directory that it has saved the + patches in. It will save the patches to a directory called + <filename role="special" + class="directory">.hg/patches.N</filename>, where + <literal>N</literal> is a small integer. It also commits a + <quote>save changeset</quote> on top of your applied + patches; this is for internal book-keeping, and records the + states of the <filename role="special">series</filename> and + <filename role="special">status</filename> files.</para> + </listitem> + <listitem><para>Use <command role="hg-cmd">hg pull</command> to + bring new changes into the underlying repository. (Don't + run <command role="hg-cmd">hg pull -u</command>; see below + for why.)</para> + </listitem> + <listitem><para>Update to the new tip revision, using <command + role="hg-cmd">hg update <option + role="hg-opt-update">-C</option></command> to override + the patches you have pushed.</para> + </listitem> + <listitem><para>Merge all patches using <command>hg qpush -m + -a</command>. 
The <option + role="hg-ext-mq-cmd-qpush-opt">-m</option> option to + <command role="hg-ext-mq">qpush</command> tells MQ to + perform a three-way merge if the patch fails to + apply.</para> + </listitem></orderedlist> + + <para>During the <command role="hg-cmd">hg qpush <option + role="hg-ext-mq-cmd-qpush-opt">hg -m</option></command>, + each patch in the <filename role="special">series</filename> + file is applied normally. If a patch applies with fuzz or + rejects, MQ looks at the queue you <command + role="hg-ext-mq">qsave</command>d, and performs a three-way + merge with the corresponding changeset. This merge uses + Mercurial's normal merge machinery, so it may pop up a GUI merge + tool to help you to resolve problems.</para> + + <para>When you finish resolving the effects of a patch, MQ + refreshes your patch based on the result of the merge.</para> + + <para>At the end of this process, your repository will have one + extra head from the old patch queue, and a copy of the old patch + queue will be in <filename role="special" + class="directory">.hg/patches.N</filename>. You can remove the + extra head using <command role="hg-cmd">hg qpop -a -n + patches.N</command> or <command role="hg-cmd">hg + strip</command>. You can delete <filename role="special" + class="directory">.hg/patches.N</filename> once you are sure + that you no longer need it as a backup.</para> + + </sect1> + <sect1> + <title>Identifying patches</title> + + <para>MQ commands that work with patches let you refer to a patch + either by using its name or by a number. 
By name is obvious + enough; pass the name <filename>foo.patch</filename> to <command + role="hg-ext-mq">qpush</command>, for example, and it will + push patches until <filename>foo.patch</filename> is + applied.</para> + + <para>As a shortcut, you can refer to a patch using both a name + and a numeric offset; <literal>foo.patch-2</literal> means + <quote>two patches before <literal>foo.patch</literal></quote>, + while <literal>bar.patch+4</literal> means <quote>four patches + after <literal>bar.patch</literal></quote>.</para> + + <para>Referring to a patch by index isn't much different. The + first patch printed in the output of <command + role="hg-ext-mq">qseries</command> is patch zero (yes, it's + one of those start-at-zero counting systems); the second is + patch one; and so on.</para> + + <para>MQ also makes it easy to work with patches when you are + using normal Mercurial commands. Every command that accepts a + changeset ID will also accept the name of an applied patch. MQ + augments the tags normally in the repository with an eponymous + one for each applied patch. In addition, the special tags + <literal role="tag">qbase</literal> and + <literal role="tag">qtip</literal> identify + the <quote>bottom-most</quote> and topmost applied patches, + respectively.</para> + + <para>These additions to Mercurial's normal tagging capabilities + make dealing with patches even more of a breeze.</para> + <itemizedlist> + <listitem><para>Want to patchbomb a mailing list with your + latest series of changes?</para> + <programlisting>hg email qbase:qtip</programlisting> + <para> (Don't know what <quote>patchbombing</quote> is? 
See + section <xref linkend="sec:hgext:patchbomb"/>.)</para> + </listitem> + <listitem><para>Need to see all of the patches since + <literal>foo.patch</literal> that have touched files in a + subdirectory of your tree?</para> + <programlisting>hg log -r foo.patch:qtip subdir</programlisting> + </listitem> + </itemizedlist> + + <para>Because MQ makes the names of patches available to the rest + of Mercurial through its normal internal tag machinery, you + don't need to type in the entire name of a patch when you want + to identify it by name.</para> + + <para>Another nice consequence of representing patch names as tags + is that when you run the <command role="hg-cmd">hg log</command> + command, it will display a patch's name as a tag, simply as part + of its normal output. This makes it easy to visually + distinguish applied patches from underlying + <quote>normal</quote> revisions. The following example shows a + few normal Mercurial commands in use with applied + patches.</para> + +&interaction.mq.id.output; + + </sect1> + <sect1> + <title>Useful things to know about</title> + + <para>There are a number of aspects of MQ usage that don't fit + tidily into sections of their own, but that are good to know. + Here they are, in one place.</para> + + <itemizedlist> + <listitem><para>Normally, when you <command + role="hg-ext-mq">qpop</command> a patch and <command + role="hg-ext-mq">qpush</command> it again, the changeset + that represents the patch after the pop/push will have a + <emphasis>different identity</emphasis> than the changeset + that represented the patch beforehand. See section <xref + linkend="sec:mqref:cmd:qpush"/> for + information as to why this is.</para> + </listitem> + <listitem><para>It's not a good idea to <command + role="hg-cmd">hg merge</command> changes from another + branch with a patch changeset, at least if you want to + maintain the <quote>patchiness</quote> of that changeset and + changesets below it on the patch stack. 
If you try to do + this, it will appear to succeed, but MQ will become + confused.</para> + </listitem></itemizedlist> + + </sect1> + <sect1 id="sec:mq:repo"> + <title>Managing patches in a repository</title> + + <para>Because MQ's <filename role="special" + class="directory">.hg/patches</filename> directory resides + outside a Mercurial repository's working directory, the + <quote>underlying</quote> Mercurial repository knows nothing + about the management or presence of patches.</para> + + <para>This presents the interesting possibility of managing the + contents of the patch directory as a Mercurial repository in its + own right. This can be a useful way to work. For example, you + can work on a patch for a while, <command + role="hg-ext-mq">qrefresh</command> it, then <command + role="hg-cmd">hg commit</command> the current state of the + patch. This lets you <quote>roll back</quote> to that version + of the patch later on.</para> + + <para>You can then share different versions of the same patch + stack among multiple underlying repositories. I use this when I + am developing a Linux kernel feature. I have a pristine copy of + my kernel sources for each of several CPU architectures, and a + cloned repository under each that contains the patches I am + working on. 
When I want to test a change on a different + architecture, I push my current patches to the patch repository + associated with that kernel tree, pop and push all of my + patches, and build and test that kernel.</para> + + <para>Managing patches in a repository makes it possible for + multiple developers to work on the same patch series without + colliding with each other, all on top of an underlying source + base that they may or may not control.</para> + + <sect2> + <title>MQ support for patch repositories</title> + + <para>MQ helps you to work with the <filename role="special" + class="directory">.hg/patches</filename> directory as a + repository; when you prepare a repository for working with + patches using <command role="hg-ext-mq">qinit</command>, you + can pass the <option role="hg-ext-mq-cmd-qinit-opt">hg + -c</option> option to create the <filename role="special" + class="directory">.hg/patches</filename> directory as a + Mercurial repository.</para> + + <note> + <para> If you forget to use the <option + role="hg-ext-mq-cmd-qinit-opt">hg -c</option> option, you + can simply go into the <filename role="special" + class="directory">.hg/patches</filename> directory at any + time and run <command role="hg-cmd">hg init</command>. 
+ Don't forget to add an entry for the <filename + role="special">status</filename> file to the <filename + role="special">.hgignore</filename> file, though + (<command role="hg-cmd">hg qinit <option + role="hg-ext-mq-cmd-qinit-opt">hg -c</option></command> + does this for you automatically); you + <emphasis>really</emphasis> don't want to manage the + <filename role="special">status</filename> file.</para> + </note> + + <para>As a convenience, if MQ notices that the <filename + class="directory">.hg/patches</filename> directory is a + repository, it will automatically <command role="hg-cmd">hg + add</command> every patch that you create and import.</para> + + <para>MQ provides a shortcut command, <command + role="hg-ext-mq">qcommit</command>, that runs <command + role="hg-cmd">hg commit</command> in the <filename + role="special" class="directory">.hg/patches</filename> + directory. This saves some bothersome typing.</para> + + <para>Finally, as a convenience to manage the patch directory, + you can define the alias <command>mq</command> on Unix + systems. For example, on Linux systems using the + <command>bash</command> shell, you can include the following + snippet in your <filename + role="home">~/.bashrc</filename>.</para> + + <programlisting>alias mq='hg -R $(hg root)/.hg/patches'</programlisting> + + <para>You can then issue commands of the form <command>mq + pull</command> from the main repository.</para> + + </sect2> + <sect2> + <title>A few things to watch out for</title> + + <para>MQ's support for working with a repository full of patches + is limited in a few small respects.</para> + + <para>MQ cannot automatically detect changes that you make to + the patch directory. 
If you <command role="hg-cmd">hg + pull</command>, manually edit, or <command role="hg-cmd">hg + update</command> changes to patches or the <filename + role="special">series</filename> file, you will have to + <command role="hg-cmd">hg qpop <option + role="hg-ext-mq-cmd-qpop-opt">hg -a</option></command> and + then <command role="hg-cmd">hg qpush <option + role="hg-ext-mq-cmd-qpush-opt">hg -a</option></command> in + the underlying repository to see those changes show up there. + If you forget to do this, you can confuse MQ's idea of which + patches are applied.</para> + + </sect2> + </sect1> + <sect1 id="sec:mq:tools"> + <title>Third party tools for working with patches</title> + + <para>Once you've been working with patches for a while, you'll + find yourself hungry for tools that will help you to understand + and manipulate the patches you're dealing with.</para> + + <para>The <command>diffstat</command> command + <citation>web:diffstat</citation> generates a histogram of the + modifications made to each file in a patch. It provides a good + way to <quote>get a sense of</quote> a patch&emdash;which files + it affects, and how much change it introduces to each file and + as a whole. (I find that it's a good idea to use + <command>diffstat</command>'s <option + role="cmd-opt-diffstat">-p</option> option as a matter of + course, as otherwise it will try to do clever things with + prefixes of file names that inevitably confuse at least + me.)</para> + +&interaction.mq.tools.tools; + + <para>The <literal role="package">patchutils</literal> package + <citation>web:patchutils</citation> is invaluable. It provides a + set of small utilities that follow the <quote>Unix + philosophy;</quote> each does one useful thing with a patch. + The <literal role="package">patchutils</literal> command I use + most is <command>filterdiff</command>, which extracts subsets + from a patch file. 
For example, given a patch that modifies + hundreds of files across dozens of directories, a single + invocation of <command>filterdiff</command> can generate a + smaller patch that only touches files whose names match a + particular glob pattern. See section <xref + linkend="mq-collab:tips:interdiff"/> for another + example.</para> + + </sect1> + <sect1> + <title>Good ways to work with patches</title> + + <para>Whether you are working on a patch series to submit to a + free software or open source project, or a series that you + intend to treat as a sequence of regular changesets when you're + done, you can use some simple techniques to keep your work well + organised.</para> + + <para>Give your patches descriptive names. A good name for a + patch might be <filename>rework-device-alloc.patch</filename>, + because it will immediately give you a hint as to what the purpose of + the patch is. Long names shouldn't be a problem; you won't be + typing the names often, but you <emphasis>will</emphasis> be + running commands like <command + role="hg-ext-mq">qapplied</command> and <command + role="hg-ext-mq">qtop</command> over and over. Good naming + becomes especially important when you have a number of patches + to work with, or if you are juggling a number of different tasks + and your patches only get a fraction of your attention.</para> + + <para>Be aware of what patch you're working on. Use the <command + role="hg-ext-mq">qtop</command> command and skim over the text + of your patches frequently&emdash;for example, using <command + role="hg-cmd">hg tip <option + role="hg-opt-tip">-p</option></command>&emdash;to be sure + of where you stand. 
I have several times worked on and <command + role="hg-ext-mq">qrefresh</command>ed a patch other than the + one I intended, and it's often tricky to migrate changes into + the right patch after making them in the wrong one.</para> + + <para>For this reason, it is very much worth investing a little + time to learn how to use some of the third-party tools I + described in section <xref linkend="sec:mq:tools"/>, + particularly + <command>diffstat</command> and <command>filterdiff</command>. + The former will give you a quick idea of what changes your patch + is making, while the latter makes it easy to splice hunks + selectively out of one patch and into another.</para> + + </sect1> + <sect1> + <title>MQ cookbook</title> + + <sect2> + <title>Manage <quote>trivial</quote> patches</title> + + <para>Because the overhead of dropping files into a new + Mercurial repository is so low, it makes a lot of sense to + manage patches this way even if you simply want to make a few + changes to a source tarball that you downloaded.</para> + + <para>Begin by downloading and unpacking the source tarball, and + turning it into a Mercurial repository.</para> + + &interaction.mq.tarball.download; + + <para>Continue by creating a patch stack and making your + changes.</para> + + &interaction.mq.tarball.qinit; + + <para>Let's say a few weeks or months pass, and your package + author releases a new version. 
First, bring their changes + into the repository.</para> + + &interaction.mq.tarball.newsource; + + <para>The pipeline starting with <command role="hg-cmd">hg + locate</command> above deletes all files in the working + directory, so that <command role="hg-cmd">hg + commit</command>'s <option + role="hg-opt-commit">--addremove</option> option can + actually tell which files have really been removed in the + newer version of the source.</para> + + <para>Finally, you can apply your patches on top of the new + tree.</para> + + &interaction.mq.tarball.repush; + + </sect2> + <sect2 id="sec:mq:combine"> + <title>Combining entire patches</title> + + <para>MQ provides a command, <command + role="hg-ext-mq">qfold</command>, that lets you combine + entire patches. This <quote>folds</quote> the patches you + name, in the order you name them, into the topmost applied + patch, and concatenates their descriptions onto the end of its + description. The patches that you fold must be unapplied + before you fold them.</para> + + <para>The order in which you fold patches matters. If your + topmost applied patch is <literal>foo</literal>, and you + <command role="hg-ext-mq">qfold</command> + <literal>bar</literal> and <literal>quux</literal> into it, + you will end up with a patch that has the same effect as if + you applied first <literal>foo</literal>, then + <literal>bar</literal>, followed by + <literal>quux</literal>.</para> + + </sect2> + <sect2> + <title>Merging part of one patch into another</title> + + <para>Merging <emphasis>part</emphasis> of one patch into + another is more difficult than combining entire + patches.</para> + + <para>If you want to move changes to entire files, you can use + <command>filterdiff</command>'s <option + role="cmd-opt-filterdiff">-i</option> and <option + role="cmd-opt-filterdiff">-x</option> options to choose the + modifications to snip out of one patch, concatenating its + output onto the end of the patch you want to merge into. 
You + usually won't need to modify the patch you've merged the + changes from. Instead, MQ will report some rejected hunks + when you <command role="hg-ext-mq">qpush</command> it (from + the hunks you moved into the other patch), and you can simply + <command role="hg-ext-mq">qrefresh</command> the patch to drop + the duplicate hunks.</para> + + <para>If you have a patch that has multiple hunks modifying a + file, and you only want to move a few of those hunks, the job + becomes messier, but you can still partly automate it. Use + <command>lsdiff -nvv</command> to print some metadata about + the patch.</para> + + &interaction.mq.tools.lsdiff; + + <para>This command prints three different kinds of + number:</para> + <itemizedlist> + <listitem><para>(in the first column) a <emphasis>file + number</emphasis> to identify each file modified in the + patch;</para> + </listitem> + <listitem><para>(on the next line, indented) the line number + within a modified file where a hunk starts; and</para> + </listitem> + <listitem><para>(on the same line) a <emphasis>hunk + number</emphasis> to identify that hunk.</para> + </listitem></itemizedlist> + + <para>You'll have to use some visual inspection, and reading of + the patch, to identify the file and hunk numbers you'll want, + but you can then pass them to + <command>filterdiff</command>'s <option + role="cmd-opt-filterdiff">--files</option> and <option + role="cmd-opt-filterdiff">--hunks</option> options, to + select exactly the file and hunk you want to extract.</para> + + <para>Once you have this hunk, you can concatenate it onto the + end of your destination patch and continue with the remainder + of section <xref linkend="sec:mq:combine"/>.</para> + + </sect2> + </sect1> + <sect1> + <title>Differences between quilt and MQ</title> + + <para>If you are already familiar with quilt, MQ provides a + similar command set. 
There are a few differences in the way + that it works.</para> + + <para>You will already have noticed that most quilt commands have + MQ counterparts that simply begin with a + <quote><literal>q</literal></quote>. The exceptions are quilt's + <literal>add</literal> and <literal>remove</literal> commands, + the counterparts for which are the normal Mercurial <command + role="hg-cmd">hg add</command> and <command role="hg-cmd">hg + remove</command> commands. There is no MQ equivalent of the + quilt <literal>edit</literal> command.</para> + + </sect1> +</chapter> + +<!-- +local variables: +sgml-parent-document: ("00book.xml" "book" "chapter") +end: +-->
--- a/en/ch11-template.xml Wed Mar 18 00:08:22 2009 -0700 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,675 +0,0 @@ -<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> - -<chapter id="chap:template"> - <?dbhtml filename="customizing-the-output-of-mercurial.html"?> - <title>Customising the output of Mercurial</title> - - <para>Mercurial provides a powerful mechanism to let you control how - it displays information. The mechanism is based on templates. - You can use templates to generate specific output for a single - command, or to customise the entire appearance of the built-in web - interface.</para> - - <sect1 id="sec:style"> - <title>Using precanned output styles</title> - - <para>Packaged with Mercurial are some output styles that you can - use immediately. A style is simply a precanned template that - someone wrote and installed somewhere that Mercurial can - find.</para> - - <para>Before we take a look at Mercurial's bundled styles, let's - review its normal output.</para> - - &interaction.template.simple.normal; - - <para>This is somewhat informative, but it takes up a lot of - space&emdash;five lines of output per changeset. The - <literal>compact</literal> style reduces this to three lines, - presented in a sparse manner.</para> - - &interaction.template.simple.compact; - - <para>The <literal>changelog</literal> style hints at the - expressive power of Mercurial's templating engine. 
This style - attempts to follow the GNU Project's changelog - guidelines<citation>web:changelog</citation>.</para> - - &interaction.template.simple.changelog; - - <para>You will not be shocked to learn that Mercurial's default - output style is named <literal>default</literal>.</para> - - <sect2> - <title>Setting a default style</title> - - <para>You can modify the output style that Mercurial will use - for every command by editing your <filename - role="special">~/.hgrc</filename> file, naming the style - you would prefer to use.</para> - - <programlisting>[ui] -style = compact</programlisting> - - <para>If you write a style of your own, you can use it by either - providing the path to your style file, or copying your style - file into a location where Mercurial can find it (typically - the <literal>templates</literal> subdirectory of your - Mercurial install directory).</para> - - </sect2> - </sect1> - <sect1> - <title>Commands that support styles and templates</title> - - <para>All of Mercurial's - <quote><literal>log</literal>-like</quote> commands let you use - styles and templates: <command role="hg-cmd">hg - incoming</command>, <command role="hg-cmd">hg log</command>, - <command role="hg-cmd">hg outgoing</command>, and <command - role="hg-cmd">hg tip</command>.</para> - - <para>As I write this manual, these are so far the only commands - that support styles and templates. Since these are the most - important commands that need customisable output, there has been - little pressure from the Mercurial user community to add style - and template support to other commands.</para> - - </sect1> - <sect1> - <title>The basics of templating</title> - - <para>At its simplest, a Mercurial template is a piece of text. 
- Some of the text never changes, while other parts are - <emphasis>expanded</emphasis>, or replaced with new text, when - necessary.</para> - - <para>Before we continue, let's look again at a simple example of - Mercurial's normal output.</para> - - &interaction.template.simple.normal; - - <para>Now, let's run the same command, but using a template to - change its output.</para> - - &interaction.template.simple.simplest; - - <para>The example above illustrates the simplest possible - template; it's just a piece of static text, printed once for - each changeset. The <option - role="hg-opt-log">--template</option> option to the <command - role="hg-cmd">hg log</command> command tells Mercurial to use - the given text as the template when printing each - changeset.</para> - - <para>Notice that the template string above ends with the text - <quote><literal>\n</literal></quote>. This is an - <emphasis>escape sequence</emphasis>, telling Mercurial to print - a newline at the end of each template item. If you omit this - newline, Mercurial will run each piece of output together. See - section <xref linkend="sec:template:escape"/> for more details - of escape sequences.</para> - - <para>A template that prints a fixed string of text all the time - isn't very useful; let's try something a bit more - complex.</para> - - &interaction.template.simple.simplesub; - - <para>As you can see, the string - <quote><literal>{desc}</literal></quote> in the template has - been replaced in the output with the description of each - changeset. Every time Mercurial finds text enclosed in curly - braces (<quote><literal>{</literal></quote> and - <quote><literal>}</literal></quote>), it will try to replace the braces - and text with the expansion of whatever is inside. 
To print a - literal curly brace, you must escape it, as described in section - <xref - linkend="sec:template:escape"/>.</para> - - </sect1> - <sect1 id="sec:template:keyword"> - <title>Common template keywords</title> - - <para>You can start writing simple templates immediately using the - keywords below.</para> - - <itemizedlist> - <listitem><para><literal - role="template-keyword">author</literal>: String. The - unmodified author of the changeset.</para> - </listitem> - <listitem><para><literal - role="template-keyword">branches</literal>: String. The - name of the branch on which the changeset was committed. - Will be empty if the branch name was - <literal>default</literal>.</para> - </listitem> - <listitem><para><literal role="template-keyword">date</literal>: - Date information. The date when the changeset was - committed. This is <emphasis>not</emphasis> human-readable; - you must pass it through a filter that will render it - appropriately. See section <xref - linkend="sec:template:filter"/> for more information - on filters. The date is expressed as a pair of numbers. The - first number is a Unix UTC timestamp (seconds since January - 1, 1970); the second is the offset of the committer's - timezone from UTC, in seconds.</para> - </listitem> - <listitem><para><literal role="template-keyword">desc</literal>: - String. The text of the changeset description.</para> - </listitem> - <listitem><para><literal - role="template-keyword">files</literal>: List of strings. - All files modified, added, or removed by this - changeset.</para> - </listitem> - <listitem><para><literal - role="template-keyword">file_adds</literal>: List of - strings. Files added by this changeset.</para> - </listitem> - <listitem><para><literal - role="template-keyword">file_dels</literal>: List of - strings. Files removed by this changeset.</para> - </listitem> - <listitem><para><literal role="template-keyword">node</literal>: - String. 
The changeset identification hash, as a - 40-character hexadecimal string.</para> - </listitem> - <listitem><para><literal - role="template-keyword">parents</literal>: List of - strings. The parents of the changeset.</para> - </listitem> - <listitem><para><literal role="template-keyword">rev</literal>: - Integer. The repository-local changeset revision - number.</para> - </listitem> - <listitem><para><literal role="template-keyword">tags</literal>: - List of strings. Any tags associated with the - changeset.</para> - </listitem></itemizedlist> - - <para>A few simple experiments will show us what to expect when we - use these keywords; you can see the results below.</para> - -&interaction.template.simple.keywords; - - <para>As we noted above, the date keyword does not produce - human-readable output, so we must treat it specially. This - involves using a <emphasis>filter</emphasis>, about which more - in section <xref - linkend="sec:template:filter"/>.</para> - - &interaction.template.simple.datekeyword; - - </sect1> - <sect1 id="sec:template:escape"> - <title>Escape sequences</title> - - <para>Mercurial's templating engine recognises the most commonly - used escape sequences in strings. 
When it sees a backslash - (<quote><literal>\</literal></quote>) character, it looks at the - following character and substitutes the two characters with a - single replacement, as described below. The ASCII codes are given - in octal.</para> - - <itemizedlist> - <listitem><para><literal>\\</literal>: - Backslash, <quote><literal>\</literal></quote>, ASCII - 134.</para> - </listitem> - <listitem><para><literal>\n</literal>: Newline, - ASCII 12.</para> - </listitem> - <listitem><para><literal>\r</literal>: Carriage - return, ASCII 15.</para> - </listitem> - <listitem><para><literal>\t</literal>: Tab, ASCII - 11.</para> - </listitem> - <listitem><para><literal>\v</literal>: Vertical - tab, ASCII 13.</para> - </listitem> - <listitem><para><literal>\{</literal>: Open curly - brace, <quote><literal>{</literal></quote>, ASCII - 173.</para> - </listitem> - <listitem><para><literal>\}</literal>: Close curly - brace, <quote><literal>}</literal></quote>, ASCII - 175.</para> - </listitem></itemizedlist> - - <para>As indicated above, if you want the expansion of a template - to contain a literal <quote><literal>\</literal></quote>, - <quote><literal>{</literal></quote>, or - <quote><literal>}</literal></quote> character, you must escape - it.</para> - - </sect1> - <sect1 id="sec:template:filter"> - <title>Filtering keywords to change their results</title> - - <para>Some of the results of template expansion are not - immediately easy to use. Mercurial lets you specify an optional - chain of <emphasis>filters</emphasis> to modify the result of - expanding a keyword. You have already seen a common filter, - <literal role="template-kw-filt-date">isodate</literal>, in - action above, to make a date readable.</para> - - <para>Below is a list of the most commonly used filters that - Mercurial supports. While some filters can be applied to any - text, others can only be used in specific circumstances. 
The - name of each filter is followed first by an indication of where - it can be used, then a description of its effect.</para> - - <itemizedlist> - <listitem><para><literal - role="template-filter">addbreaks</literal>: Any text. Add - an XHTML <quote><literal><br/></literal></quote> tag - before the end of every line except the last. For example, - <quote><literal>foo\nbar</literal></quote> becomes - <quote><literal>foo<br/>\nbar</literal></quote>.</para> - </listitem> - <listitem><para><literal - role="template-kw-filt-date">age</literal>: <literal - role="template-keyword">date</literal> keyword. Render - the age of the date, relative to the current time. Yields a - string like <quote><literal>10 - minutes</literal></quote>.</para> - </listitem> - <listitem><para><literal - role="template-filter">basename</literal>: Any text, but - most useful for the <literal - role="template-keyword">files</literal> keyword and its - relatives. Treat the text as a path, and return the - basename. For example, - <quote><literal>foo/bar/baz</literal></quote> becomes - <quote><literal>baz</literal></quote>.</para> - </listitem> - <listitem><para><literal - role="template-kw-filt-date">date</literal>: <literal - role="template-keyword">date</literal> keyword. Render a - date in a similar format to the Unix <literal - role="template-keyword">date</literal> command, but with - timezone included. Yields a string like <quote><literal>Mon - Sep 04 15:13:13 2006 -0700</literal></quote>.</para> - </listitem> - <listitem><para><literal - role="template-kw-filt-author">domain</literal>: Any text, - but most useful for the <literal - role="template-keyword">author</literal> keyword. Find - the first string that looks like an email address, and - extract just the domain component. 
For example, - <quote><literal>Bryan O'Sullivan - <bos@serpentine.com></literal></quote> becomes - <quote><literal>serpentine.com</literal></quote>.</para> - </listitem> - <listitem><para><literal - role="template-kw-filt-author">email</literal>: Any text, - but most useful for the <literal - role="template-keyword">author</literal> keyword. Extract - the first string that looks like an email address. For - example, <quote><literal>Bryan O'Sullivan - <bos@serpentine.com></literal></quote> becomes - <quote><literal>bos@serpentine.com</literal></quote>.</para> - </listitem> - <listitem><para><literal - role="template-filter">escape</literal>: Any text. - Replace the special XML/XHTML characters - <quote><literal>&</literal></quote>, - <quote><literal><</literal></quote> and - <quote><literal>></literal></quote> with XML - entities.</para> - </listitem> - <listitem><para><literal - role="template-filter">fill68</literal>: Any text. Wrap - the text to fit in 68 columns. This is useful before you - pass text through the <literal - role="template-filter">tabindent</literal> filter, and - still want it to fit in an 80-column fixed-font - window.</para> - </listitem> - <listitem><para><literal - role="template-filter">fill76</literal>: Any text. Wrap - the text to fit in 76 columns.</para> - </listitem> - <listitem><para><literal - role="template-filter">firstline</literal>: Any text. - Yield the first line of text, without any trailing - newlines.</para> - </listitem> - <listitem><para><literal - role="template-kw-filt-date">hgdate</literal>: <literal - role="template-keyword">date</literal> keyword. Render - the date as a pair of readable numbers. Yields a string - like <quote><literal>1157407993 - 25200</literal></quote>.</para> - </listitem> - <listitem><para><literal - role="template-kw-filt-date">isodate</literal>: <literal - role="template-keyword">date</literal> keyword. Render - the date as a text string in ISO 8601 format. 
Yields a - string like <quote><literal>2006-09-04 15:13:13 - -0700</literal></quote>.</para> - </listitem> - <listitem><para><literal - role="template-filter">obfuscate</literal>: Any text, but - most useful for the <literal - role="template-keyword">author</literal> keyword. Yield - the input text rendered as a sequence of XML entities. This - helps to defeat some particularly stupid screen-scraping - email harvesting spambots.</para> - </listitem> - <listitem><para><literal - role="template-kw-filt-author">person</literal>: Any text, - but most useful for the <literal - role="template-keyword">author</literal> keyword. Yield - the text before an email address. For example, - <quote><literal>Bryan O'Sullivan - <bos@serpentine.com></literal></quote> becomes - <quote><literal>Bryan O'Sullivan</literal></quote>.</para> - </listitem> - <listitem><para><literal - role="template-kw-filt-date">rfc822date</literal>: - <literal role="template-keyword">date</literal> keyword. - Render a date using the same format used in email headers. - Yields a string like <quote><literal>Mon, 04 Sep 2006 - 15:13:13 -0700</literal></quote>.</para> - </listitem> - <listitem><para><literal - role="template-kw-filt-node">short</literal>: Changeset - hash. Yield the short form of a changeset hash, i.e. a - 12-character hexadecimal string.</para> - </listitem> - <listitem><para><literal - role="template-kw-filt-date">shortdate</literal>: <literal - role="template-keyword">date</literal> keyword. Render - the year, month, and day of the date. Yields a string like - <quote><literal>2006-09-04</literal></quote>.</para> - </listitem> - <listitem><para><literal role="template-filter">strip</literal>: - Any text. Strip all leading and trailing whitespace from - the string.</para> - </listitem> - <listitem><para><literal - role="template-filter">tabindent</literal>: Any text. 
- Yield the text, with every line except the first starting - with a tab character.</para> - </listitem> - <listitem><para><literal - role="template-filter">urlescape</literal>: Any text. - Escape all characters that are considered - <quote>special</quote> by URL parsers. For example, - <literal>foo bar</literal> becomes - <literal>foo%20bar</literal>.</para> - </listitem> - <listitem><para><literal - role="template-kw-filt-author">user</literal>: Any text, - but most useful for the <literal - role="template-keyword">author</literal> keyword. Return - the <quote>user</quote> portion of an email address. For - example, <quote><literal>Bryan O'Sullivan - <bos@serpentine.com></literal></quote> becomes - <quote><literal>bos</literal></quote>.</para> - </listitem></itemizedlist> - -&interaction.template.simple.manyfilters; - - <note> - <para> If you try to apply a filter to a piece of data that it - cannot process, Mercurial will fail and print a Python - exception. For example, trying to run the output of the - <literal role="template-keyword">desc</literal> keyword into - the <literal role="template-kw-filt-date">isodate</literal> - filter is not a good idea.</para> - </note> - - <sect2> - <title>Combining filters</title> - - <para>It is easy to combine filters to yield output in the form - you would like. The following chain of filters tidies up a - description, then makes sure that it fits cleanly into 68 - columns, then indents it by a further 8 characters (at least - on Unix-like systems, where a tab is conventionally 8 - characters wide).</para> - - &interaction.template.simple.combine; - - <para>Note the use of <quote><literal>\t</literal></quote> (a - tab character) in the template to force the first line to be - indented; this is necessary since <literal - role="template-keyword">tabindent</literal> indents all - lines <emphasis>except</emphasis> the first.</para> - - <para>Keep in mind that the order of filters in a chain is - significant. 
The first filter is applied to the result of the - keyword; the second to the result of the first filter; and so - on. For example, using <literal>fill68|tabindent</literal> - gives very different results from - <literal>tabindent|fill68</literal>.</para> - - - </sect2> - </sect1> - <sect1> - <title>From templates to styles</title> - - <para>A command line template provides a quick and simple way to - format some output. Templates can become verbose, though, and - it's useful to be able to give a template a name. A style file - is a template with a name, stored in a file.</para> - - <para>More than that, using a style file unlocks the power of - Mercurial's templating engine in ways that are not possible - using the command line <option - role="hg-opt-log">--template</option> option.</para> - - <sect2> - <title>The simplest of style files</title> - - <para>Our simple style file contains just one line:</para> - - &interaction.template.simple.rev; - - <para>This tells Mercurial, <quote>if you're printing a - changeset, use the text on the right as the - template</quote>.</para> - - </sect2> - <sect2> - <title>Style file syntax</title> - - <para>The syntax rules for a style file are simple.</para> - - <itemizedlist> - <listitem><para>The file is processed one line at a - time.</para> - </listitem> - <listitem><para>Leading and trailing white space are - ignored.</para> - </listitem> - <listitem><para>Empty lines are skipped.</para> - </listitem> - <listitem><para>If a line starts with either of the characters - <quote><literal>#</literal></quote> or - <quote><literal>;</literal></quote>, the entire line is - treated as a comment, and skipped as if empty.</para> - </listitem> - <listitem><para>A line starts with a keyword. This must start - with an alphabetic character or underscore, and can - subsequently contain any alphanumeric character or - underscore. 
(In regexp notation, a keyword must match - <literal>[A-Za-z_][A-Za-z0-9_]*</literal>.)</para> - </listitem> - <listitem><para>The next element must be an - <quote><literal>=</literal></quote> character, which can - be preceded or followed by an arbitrary amount of white - space.</para> - </listitem> - <listitem><para>If the rest of the line starts and ends with - matching quote characters (either single or double quote), - it is treated as a template body.</para> - </listitem> - <listitem><para>If the rest of the line <emphasis>does - not</emphasis> start with a quote character, it is - treated as the name of a file; the contents of this file - will be read and used as a template body.</para> - </listitem></itemizedlist> - - </sect2> - </sect1> - <sect1> - <title>Style files by example</title> - - <para>To illustrate how to write a style file, we will construct a - few by example. Rather than provide a complete style file and - walk through it, we'll mirror the usual process of developing a - style file by starting with something very simple, and walking - through a series of successively more complete examples.</para> - - <sect2> - <title>Identifying mistakes in style files</title> - - <para>If Mercurial encounters a problem in a style file you are - working on, it prints a terse error message that, once you - figure out what it means, is actually quite useful.</para> - -&interaction.template.svnstyle.syntax.input; - - <para>Notice that <filename>broken.style</filename> attempts to - define a <literal>changeset</literal> keyword, but forgets to - give any content for it. 
When instructed to use this style - file, Mercurial promptly complains.</para> - - &interaction.template.svnstyle.syntax.error; - - <para>This error message looks intimidating, but it is not too - hard to follow.</para> - - <itemizedlist> - <listitem><para>The first component is simply Mercurial's way - of saying <quote>I am giving up</quote>.</para> - <programlisting>___abort___: broken.style:1: parse error</programlisting> - </listitem> - <listitem><para>Next comes the name of the style file that - contains the error.</para> - <programlisting>abort: ___broken.style___:1: parse error</programlisting> - </listitem> - <listitem><para>Following the file name is the line number - where the error was encountered.</para> - <programlisting>abort: broken.style:___1___: parse error</programlisting> - </listitem> - <listitem><para>Finally, a description of what went - wrong.</para> - <programlisting>abort: broken.style:1: ___parse error___</programlisting> - </listitem> - <listitem><para>The description of the problem is not always - clear (as in this case), but even when it is cryptic, it - is almost always trivial to visually inspect the offending - line in the style file and see what is wrong.</para> - </listitem></itemizedlist> - - </sect2> - <sect2> - <title>Uniquely identifying a repository</title> - - <para>If you would like to be able to identify a Mercurial - repository <quote>fairly uniquely</quote> using a short string - as an identifier, you can use the first revision in the - repository.</para> - - &interaction.template.svnstyle.id; - - <para>This is not guaranteed to be unique, but it is - nevertheless useful in many cases.</para> - <itemizedlist> - <listitem><para>It will not work in a completely empty - repository, because such a repository does not have a - revision zero.</para> - </listitem> - <listitem><para>Neither will it work in the (extremely rare) - case where a repository is a merge of two or more formerly - independent repositories, and you still 
have those - repositories around.</para> - </listitem></itemizedlist> - <para>Here are some uses to which you could put this - identifier:</para> - <itemizedlist> - <listitem><para>As a key into a table for a database that - manages repositories on a server.</para> - </listitem> - <listitem><para>As half of a {<emphasis>repository - ID</emphasis>, <emphasis>revision ID</emphasis>} tuple. - Save this information away when you run an automated build - or other activity, so that you can <quote>replay</quote> - the build later if necessary.</para> - </listitem></itemizedlist> - - </sect2> - <sect2> - <title>Mimicking Subversion's output</title> - - <para>Let's try to emulate the default output format used by - another revision control tool, Subversion.</para> - - &interaction.template.svnstyle.short; - - <para>Since Subversion's output style is fairly simple, it is - easy to copy-and-paste a hunk of its output into a file, and - replace the text produced above by Subversion with the - template values we'd like to see expanded.</para> - - &interaction.template.svnstyle.template; - - <para>There are a few small ways in which this template deviates - from the output produced by Subversion.</para> - <itemizedlist> - <listitem><para>Subversion prints a <quote>readable</quote> - date (the <quote><literal>Wed, 27 Sep 2006</literal></quote> in the - example output above) in parentheses. Mercurial's - templating engine does not provide a way to display a date - in this format without also printing the time and time - zone.</para> - </listitem> - <listitem><para>We emulate Subversion's printing of - <quote>separator</quote> lines full of - <quote><literal>-</literal></quote> characters by ending - the template with such a line. 
We use the templating - engine's <literal role="template-keyword">header</literal> - keyword to print a separator line as the first line of - output (see below), thus achieving similar output to - Subversion.</para> - </listitem> - <listitem><para>Subversion's output includes a count in the - header of the number of lines in the commit message. We - cannot replicate this in Mercurial; the templating engine - does not currently provide a filter that counts the number - of lines the template generates.</para> - </listitem></itemizedlist> - <para>It took me no more than a minute or two of work to replace - literal text from an example of Subversion's output with some - keywords and filters to give the template above. The style - file simply refers to the template.</para> - - &interaction.template.svnstyle.style; - - <para>We could have included the text of the template file - directly in the style file by enclosing it in quotes and - replacing the newlines with - <quote><literal>\n</literal></quote> sequences, but it would - have made the style file too difficult to read. Readability - is a good guide when you're trying to decide whether some text - belongs in a style file, or in a template file that the style - file points to. If the style file will look too big or - cluttered if you insert a literal piece of text, drop it into - a template instead.</para> - - </sect2> - </sect1> -</chapter> - -<!-- -local variables: -sgml-parent-document: ("00book.xml" "book" "chapter") -end: --->
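As an aside to the template chapter that ends above: its claim that the order of filters in a chain is significant can be checked with a small Python sketch. The `fill68` and `tabindent` functions below are simplified stand-ins written for illustration, assuming that "wrap to 68 columns" and "start every line except the first with a tab" capture the behaviour the chapter describes. They are not Mercurial's own implementations, and `apply_chain` is a hypothetical helper.

```python
import textwrap


def fill68(text):
    # Stand-in for the fill68 filter: wrap the text to fit in 68 columns.
    return textwrap.fill(text, width=68)


def tabindent(text):
    # Stand-in for the tabindent filter: every line except the first
    # starts with a tab character.
    lines = text.split("\n")
    return "\n".join([lines[0]] + ["\t" + line for line in lines[1:]])


def apply_chain(value, filters):
    # Apply the first filter to the keyword's value, the second to the
    # result of the first, and so on, as in {desc|fill68|tabindent}.
    for f in filters:
        value = f(value)
    return value


if __name__ == "__main__":
    desc = " ".join(["word"] * 40)  # one long line, like an unwrapped description
    print(apply_chain(desc, [fill68, tabindent]))
    print(apply_chain(desc, [tabindent, fill68]))
```

With a long one-line description, the first chain wraps the text and then indents the continuation lines. The second chain finds only a single line to indent, so it adds no tabs, and only then wraps the text.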
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch12-mq-collab.xml Thu Mar 19 20:54:12 2009 -0700 @@ -0,0 +1,518 @@ +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> + +<chapter id="chap:mq-collab"> + <?dbhtml filename="advanced-uses-of-mercurial-queues.html"?> + <title>Advanced uses of Mercurial Queues</title> + + <para>While it's easy to pick up straightforward uses of Mercurial + Queues, use of a little discipline and some of MQ's less + frequently used capabilities makes it possible to work in + complicated development environments.</para> + + <para>In this chapter, I will use as an example a technique I have + used to manage the development of an Infiniband device driver for + the Linux kernel. The driver in question is large (at least as + drivers go), with 25,000 lines of code spread across 35 source + files. It is maintained by a small team of developers.</para> + + <para>While much of the material in this chapter is specific to + Linux, the same principles apply to any code base for which you're + not the primary owner, and upon which you need to do a lot of + development.</para> + + <sect1> + <title>The problem of many targets</title> + + <para>The Linux kernel changes rapidly, and has never been + internally stable; developers frequently make drastic changes + between releases. This means that a version of the driver that + works well with a particular released version of the kernel will + not even <emphasis>compile</emphasis> correctly against, + typically, any other version.</para> + + <para>To maintain a driver, we have to keep a number of distinct + versions of Linux in mind.</para> + <itemizedlist> + <listitem><para>One target is the main Linux kernel development + tree. 
Maintenance of the code is in this case partly shared + by other developers in the kernel community, who make + <quote>drive-by</quote> modifications to the driver as they + develop and refine kernel subsystems.</para> + </listitem> + <listitem><para>We also maintain a number of + <quote>backports</quote> to older versions of the Linux + kernel, to support the needs of customers who are running + older Linux distributions that do not incorporate our + drivers. (To <emphasis>backport</emphasis> a piece of code + is to modify it to work in an older version of its target + environment than the version it was developed for.)</para> + </listitem> + <listitem><para>Finally, we make software releases on a schedule + that is necessarily not aligned with those used by Linux + distributors and kernel developers, so that we can deliver + new features to customers without forcing them to upgrade + their entire kernels or distributions.</para> + </listitem></itemizedlist> + + <sect2> + <title>Tempting approaches that don't work well</title> + + <para>There are two <quote>standard</quote> ways to maintain a + piece of software that has to target many different + environments.</para> + + <para>The first is to maintain a number of branches, each + intended for a single target. The trouble with this approach + is that you must maintain iron discipline in the flow of + changes between repositories. A new feature or bug fix must + start life in a <quote>pristine</quote> repository, then + percolate out to every backport repository. Backport changes + are more limited in the branches they should propagate to; a + backport change that is applied to a branch where it doesn't + belong will probably stop the driver from compiling.</para> + + <para>The second is to maintain a single source tree filled with + conditional statements that turn chunks of code on or off + depending on the intended target. 
Because these + <quote>ifdefs</quote> are not allowed in the Linux kernel + tree, a manual or automatic process must be followed to strip + them out and yield a clean tree. A code base maintained in + this fashion rapidly becomes a rat's nest of conditional + blocks that are difficult to understand and maintain.</para> + + <para>Neither of these approaches is well suited to a situation + where you don't <quote>own</quote> the canonical copy of a + source tree. In the case of a Linux driver that is + distributed with the standard kernel, Linus's tree contains + the copy of the code that will be treated by the world as + canonical. The upstream version of <quote>my</quote> driver + can be modified by people I don't know, without me even + finding out about it until after the changes show up in + Linus's tree.</para> + + <para>These approaches have the added weakness of making it + difficult to generate well-formed patches to submit + upstream.</para> + + <para>In principle, Mercurial Queues seems like a good candidate + to manage a development scenario such as the above. While + this is indeed the case, MQ contains a few added features that + make the job more pleasant.</para> + + </sect2> + </sect1> + <sect1> + <title>Conditionally applying patches with guards</title> + + <para>Perhaps the best way to maintain sanity with so many targets + is to be able to choose specific patches to apply for a given + situation. MQ provides a feature called <quote>guards</quote> + (which originates with quilt's <literal>guards</literal> + command) that does just this. 
To start off, let's create a + simple repository for experimenting in.</para> + + &interaction.mq.guards.init; + + <para>This gives us a tiny repository that contains two patches + that don't have any dependencies on each other, because they + touch different files.</para> + + <para>The idea behind conditional application is that you can + <quote>tag</quote> a patch with a <emphasis>guard</emphasis>, + which is simply a text string of your choosing, then tell MQ to + select specific guards to use when applying patches. MQ will + then either apply, or skip over, a guarded patch, depending on + the guards that you have selected.</para> + + <para>A patch can have an arbitrary number of guards; each one is + <emphasis>positive</emphasis> (<quote>apply this patch if this + guard is selected</quote>) or <emphasis>negative</emphasis> + (<quote>skip this patch if this guard is selected</quote>). A + patch with no guards is always applied.</para> + + </sect1> + <sect1> + <title>Controlling the guards on a patch</title> + + <para>The <command role="hg-ext-mq">qguard</command> command lets + you determine which guards should apply to a patch, or display + the guards that are already in effect. Without any arguments, it + displays the guards on the current topmost patch.</para> + + &interaction.mq.guards.qguard; + + <para>To set a positive guard on a patch, prefix the name of the + guard with a <quote><literal>+</literal></quote>.</para> + + &interaction.mq.guards.qguard.pos; + + <para>To set a negative guard + on a patch, prefix the name of the guard with a + <quote><literal>-</literal></quote>.</para> + + &interaction.mq.guards.qguard.neg; + + <note> + <para> The <command role="hg-ext-mq">qguard</command> command + <emphasis>sets</emphasis> the guards on a patch; it doesn't + <emphasis>modify</emphasis> them. 
What this means is that if + you run <command role="hg-cmd">hg qguard +a +b</command> on a + patch, then <command role="hg-cmd">hg qguard +c</command> on + the same patch, the <emphasis>only</emphasis> guard that will + be set on it afterwards is <literal>+c</literal>.</para> + </note> + + <para>Mercurial stores guards in the <filename + role="special">series</filename> file; the form in which they + are stored is easy both to understand and to edit by hand. (In + other words, you don't have to use the <command + role="hg-ext-mq">qguard</command> command if you don't want + to; it's okay to simply edit the <filename + role="special">series</filename> file.)</para> + + &interaction.mq.guards.series; + + </sect1> + <sect1> + <title>Selecting the guards to use</title> + + <para>The <command role="hg-ext-mq">qselect</command> command + determines which guards are active at a given time. The effect + of this is to determine which patches MQ will apply the next + time you run <command role="hg-ext-mq">qpush</command>. It has + no other effect; in particular, it doesn't do anything to + patches that are already applied.</para> + + <para>With no arguments, the <command + role="hg-ext-mq">qselect</command> command lists the guards + currently in effect, one per line of output. Each argument is + treated as the name of a guard to apply.</para> + + &interaction.mq.guards.qselect.foo; + + <para>In case you're interested, the currently selected guards are + stored in the <filename role="special">guards</filename> file.</para> + + &interaction.mq.guards.qselect.cat; + + <para>We can see the effect the selected guards have when we run + <command role="hg-ext-mq">qpush</command>.</para> + + &interaction.mq.guards.qselect.qpush; + + <para>A guard cannot start with a + <quote><literal>+</literal></quote> or + <quote><literal>-</literal></quote> character. The name of a + guard must not contain white space, but most other characters + are acceptable. 
If you try to use a guard with an invalid name, + MQ will complain:</para> + + &interaction.mq.guards.qselect.error; + + <para>Changing the selected guards changes the patches that are + applied.</para> + + &interaction.mq.guards.qselect.quux; + + <para>You can see in the example below that negative guards take + precedence over positive guards.</para> + + &interaction.mq.guards.qselect.foobar; + + </sect1> + <sect1> + <title>MQ's rules for applying patches</title> + + <para>The rules that MQ uses when deciding whether to apply a + patch are as follows.</para> + <itemizedlist> + <listitem><para>A patch that has no guards is always + applied.</para> + </listitem> + <listitem><para>If the patch has any negative guard that matches + any currently selected guard, the patch is skipped.</para> + </listitem> + <listitem><para>If the patch has any positive guard that matches + any currently selected guard, the patch is applied.</para> + </listitem> + <listitem><para>If the patch has positive or negative guards, + but none matches any currently selected guard, the patch is + skipped.</para> + </listitem></itemizedlist> + + </sect1> + <sect1> + <title>Trimming the work environment</title> + + <para>In working on the device driver I mentioned earlier, I don't + apply the patches to a normal Linux kernel tree. Instead, I use + a repository that contains only a snapshot of the source files + and headers that are relevant to Infiniband development. This + repository is 1% the size of a kernel repository, so it's easier + to work with.</para> + + <para>I then choose a <quote>base</quote> version on top of which + the patches are applied. This is a snapshot of the Linux kernel + tree as of a revision of my choosing. When I take the snapshot, + I record the changeset ID from the kernel repository in the + commit message. 
Since the snapshot preserves the + <quote>shape</quote> and content of the relevant parts of the + kernel tree, I can apply my patches on top of either my tiny + repository or a normal kernel tree.</para> + + <para>Normally, the base tree atop which the patches apply should + be a snapshot of a very recent upstream tree. This best + facilitates the development of patches that can easily be + submitted upstream with few or no modifications.</para> + + </sect1> + <sect1> + <title>Dividing up the <filename role="special">series</filename> + file</title> + + <para>I categorise the patches in the <filename + role="special">series</filename> file into a number of logical + groups. Each section of like patches begins with a block of + comments that describes the purpose of the patches that + follow.</para> + + <para>The sequence of patch groups that I maintain follows. The + ordering of these groups is important; I'll describe why after I + introduce the groups.</para> + <itemizedlist> + <listitem><para>The <quote>accepted</quote> group. Patches that + the development team has submitted to the maintainer of the + Infiniband subsystem, and which he has accepted, but which + are not present in the snapshot that the tiny repository is + based on. These are <quote>read only</quote> patches, + present only to transform the tree into a similar state as + it is in the upstream maintainer's repository.</para> + </listitem> + <listitem><para>The <quote>rework</quote> group. Patches that I + have submitted, but that the upstream maintainer has + requested modifications to before he will accept + them.</para> + </listitem> + <listitem><para>The <quote>pending</quote> group. Patches that + I have not yet submitted to the upstream maintainer, but + which we have finished working on. These will be <quote>read + only</quote> for a while. If the upstream maintainer + accepts them upon submission, I'll move them to the end of + the <quote>accepted</quote> group. 
If he requests that I + modify any, I'll move them to the beginning of the + <quote>rework</quote> group.</para> + </listitem> + <listitem><para>The <quote>in progress</quote> group. Patches + that are actively being developed, and should not be + submitted anywhere yet.</para> + </listitem> + <listitem><para>The <quote>backport</quote> group. Patches that + adapt the source tree to older versions of the kernel + tree.</para> + </listitem> + <listitem><para>The <quote>do not ship</quote> group. Patches + that for some reason should never be submitted upstream. + For example, one such patch might change embedded driver + identification strings to make it easier to distinguish, in + the field, between an out-of-tree version of the driver and + a version shipped by a distribution vendor.</para> + </listitem></itemizedlist> + + <para>Now to return to the reasons for ordering groups of patches + in this way. We would like the lowest patches in the stack to + be as stable as possible, so that we will not need to rework + higher patches due to changes in context. Putting patches that + will never be changed first in the <filename + role="special">series</filename> file serves this + purpose.</para> + + <para>We would also like the patches that we know we'll need to + modify to be applied on top of a source tree that resembles the + upstream tree as closely as possible. This is why we keep + accepted patches around for a while.</para> + + <para>The <quote>backport</quote> and <quote>do not ship</quote> + patches float at the end of the <filename + role="special">series</filename> file. 
The backport patches + must be applied on top of all other patches, and the <quote>do + not ship</quote> patches might as well stay out of harm's + way.</para> + + </sect1> + <sect1> + <title>Maintaining the patch series</title> + + <para>In my work, I use a number of guards to control which + patches are to be applied.</para> + + <itemizedlist> + <listitem><para><quote>Accepted</quote> patches are guarded with + <literal>accepted</literal>. I enable this guard most of + the time. When I'm applying the patches on top of a tree + where the patches are already present, I can turn this guard + off, and the patches that follow it will apply + cleanly.</para> + </listitem> + <listitem><para>Patches that are <quote>finished</quote>, but + not yet submitted, have no guards. If I'm applying the + patch stack to a copy of the upstream tree, I don't need to + enable any guards in order to get a reasonably safe source + tree.</para> + </listitem> + <listitem><para>Those patches that need reworking before being + resubmitted are guarded with + <literal>rework</literal>.</para> + </listitem> + <listitem><para>For those patches that are still under + development, I use <literal>devel</literal>.</para> + </listitem> + <listitem><para>A backport patch may have several guards, one + for each version of the kernel to which it applies. For + example, a patch that backports a piece of code to 2.6.9 + will have a <literal>2.6.9</literal> guard.</para> + </listitem></itemizedlist> + <para>This variety of guards gives me considerable flexibility in + determining what kind of source tree I want to end up with. For + most situations, the selection of appropriate guards is + automated during the build process, but I can manually tune the + guards to use for less common circumstances.</para> + + <sect2> + <title>The art of writing backport patches</title> + + <para>Using MQ, writing a backport patch is a simple process. 
+ All such a patch has to do is modify a piece of code that uses + a kernel feature not present in the older version of the + kernel, so that the driver continues to work correctly under + that older version.</para> + + <para>A useful goal when writing a good backport patch is to + make your code look as if it was written for the older version + of the kernel you're targeting. The less obtrusive the patch, + the easier it will be to understand and maintain. If you're + writing a collection of backport patches to avoid the + <quote>rat's nest</quote> effect of lots of + <literal>#ifdef</literal>s (hunks of source code that are only + used conditionally) in your code, don't introduce + version-dependent <literal>#ifdef</literal>s into the patches. + Instead, write several patches, each of which makes + unconditional changes, and control their application using + guards.</para> + + <para>There are two reasons to divide backport patches into a + distinct group, away from the <quote>regular</quote> patches + whose effects they modify. The first is that intermingling the + two makes it more difficult to use a tool like the <literal + role="hg-ext">patchbomb</literal> extension to automate the + process of submitting the patches to an upstream maintainer. + The second is that a backport patch could perturb the context + in which a subsequent regular patch is applied, making it + impossible to apply the regular patch cleanly + <emphasis>without</emphasis> the earlier backport patch + already being applied.</para> + + </sect2> + </sect1> + <sect1> + <title>Useful tips for developing with MQ</title> + + <sect2> + <title>Organising patches in directories</title> + + <para>If you're working on a substantial project with MQ, it's + not difficult to accumulate a large number of patches. 
For + example, I have one patch repository that contains over 250 + patches.</para> + + <para>If you can group these patches into separate logical + categories, you can if you like store them in different + directories; MQ has no problems with patch names that contain + path separators.</para> + + </sect2> + <sect2 id="mq-collab:tips:interdiff"> + <title>Viewing the history of a patch</title> + + <para>If you're developing a set of patches over a long time, + it's a good idea to maintain them in a repository, as + discussed in section <xref linkend="sec:mq:repo"/>. If you do + so, you'll quickly + discover that using the <command role="hg-cmd">hg + diff</command> command to look at the history of changes to + a patch is unworkable. This is in part because you're looking + at the second derivative of the real code (a diff of a diff), + but also because MQ adds noise to the process by modifying + time stamps and directory names when it updates a + patch.</para> + + <para>However, you can use the <literal + role="hg-ext">extdiff</literal> extension, which is bundled + with Mercurial, to turn a diff of two versions of a patch into + something readable. To do this, you will need a third-party + package called <literal role="package">patchutils</literal> + <citation>web:patchutils</citation>. This provides a command + named <command>interdiff</command>, which shows the + differences between two diffs as a diff. 
Used on two versions + of the same diff, it generates a diff that represents the diff + from the first to the second version.</para> + + <para>You can enable the <literal + role="hg-ext">extdiff</literal> extension in the usual way, + by adding a line to the <literal + role="rc-extensions">extensions</literal> section of your + <filename role="special">~/.hgrc</filename>.</para> + <programlisting>[extensions] +extdiff =</programlisting> + <para>The <command>interdiff</command> command expects to be + passed the names of two files, but the <literal + role="hg-ext">extdiff</literal> extension passes the program + it runs a pair of directories, each of which can contain an + arbitrary number of files. We thus need a small program that + will run <command>interdiff</command> on each pair of files in + these two directories. This program is available as <filename + role="special">hg-interdiff</filename> in the <filename + class="directory">examples</filename> directory of the + source code repository that accompanies this book. 
<!-- + &example.hg-interdiff; --></para> + + <para>With the <filename role="special">hg-interdiff</filename> + program in your shell's search path, you can run it as + follows, from inside an MQ patch directory:</para> + <programlisting>hg extdiff -p hg-interdiff -r A:B my-change.patch</programlisting> + <para>Since you'll probably want to use this long-winded command + a lot, you can get <literal role="hg-ext">extdiff</literal> to + make it available as a normal Mercurial command, again by + editing your <filename + role="special">~/.hgrc</filename>.</para> + <programlisting>[extdiff] +cmd.interdiff = hg-interdiff</programlisting> + <para>This directs <literal role="hg-ext">extdiff</literal> to + make an <literal>interdiff</literal> command available, so you + can now shorten the previous invocation of <command + role="hg-ext-extdiff">extdiff</command> to something a + little more wieldy.</para> + <programlisting>hg interdiff -r A:B my-change.patch</programlisting> + + <note> + <para> The <command>interdiff</command> command works well + only if the underlying files against which versions of a + patch are generated remain the same. If you create a patch, + modify the underlying files, and then regenerate the patch, + <command>interdiff</command> may not produce useful + output.</para> + </note> + + <para>The <literal role="hg-ext">extdiff</literal> extension is + useful for more than merely improving the presentation of MQ + patches. To read more about it, go to section <xref + linkend="sec:hgext:extdiff"/>.</para> + + </sect2> + </sect1> +</chapter> + +<!-- +local variables: +sgml-parent-document: ("00book.xml" "book" "chapter") +end: +-->
--- a/en/ch12-mq.xml Wed Mar 18 00:08:22 2009 -0700 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,1322 +0,0 @@ -<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> - -<chapter id="chap:mq"> - <?dbhtml filename="managing-change-with-mercurial-queues.html"?> - <title>Managing change with Mercurial Queues</title> - - <sect1 id="sec:mq:patch-mgmt"> - <title>The patch management problem</title> - - <para>Here is a common scenario: you need to install a software - package from source, but you find a bug that you must fix in the - source before you can start using the package. You make your - changes, forget about the package for a while, and a few months - later you need to upgrade to a newer version of the package. If - the newer version of the package still has the bug, you must - extract your fix from the older source tree and apply it against - the newer version. This is a tedious task, and it's easy to - make mistakes.</para> - - <para>This is a simple case of the <quote>patch management</quote> - problem. You have an <quote>upstream</quote> source tree that - you can't change; you need to make some local changes on top of - the upstream tree; and you'd like to be able to keep those - changes separate, so that you can apply them to newer versions - of the upstream source.</para> - - <para>The patch management problem arises in many situations. 
- Probably the most visible is that a user of an open source - software project will contribute a bug fix or new feature to the - project's maintainers in the form of a patch.</para> - - <para>Distributors of operating systems that include open source - software often need to make changes to the packages they - distribute so that they will build properly in their - environments.</para> - - <para>When you have few changes to maintain, it is easy to manage - a single patch using the standard <command>diff</command> and - <command>patch</command> programs (see section <xref - linkend="sec:mq:patch"/> for a discussion of these - tools). Once the number of changes grows, it starts to make - sense to maintain patches as discrete <quote>chunks of - work,</quote> so that for example a single patch will contain - only one bug fix (the patch might modify several files, but it's - doing <quote>only one thing</quote>), and you may have a number - of such patches for different bugs you need fixed and local - changes you require. In this situation, if you submit a bug fix - patch to the upstream maintainers of a package and they include - your fix in a subsequent release, you can simply drop that - single patch when you're updating to the newer release.</para> - - <para>Maintaining a single patch against an upstream tree is a - little tedious and error-prone, but not difficult. However, the - complexity of the problem grows rapidly as the number of patches - you have to maintain increases. 
With more than a tiny number of - patches in hand, understanding which ones you have applied and - maintaining them moves from messy to overwhelming.</para> - - <para>Fortunately, Mercurial includes a powerful extension, - Mercurial Queues (or simply <quote>MQ</quote>), that massively - simplifies the patch management problem.</para> - - </sect1> - <sect1 id="sec:mq:history"> - <title>The prehistory of Mercurial Queues</title> - - <para>During the late 1990s, several Linux kernel developers - started to maintain <quote>patch series</quote> that modified - the behaviour of the Linux kernel. Some of these series were - focused on stability, some on feature coverage, and others were - more speculative.</para> - - <para>The sizes of these patch series grew rapidly. In 2002, - Andrew Morton published some shell scripts he had been using to - automate the task of managing his patch queues. Andrew was - successfully using these scripts to manage hundreds (sometimes - thousands) of patches on top of the Linux kernel.</para> - - <sect2 id="sec:mq:quilt"> - <title>A patchwork quilt</title> - - <para>In early 2003, Andreas Gruenbacher and Martin Quinson - borrowed the approach of Andrew's scripts and published a tool - called <quote>patchwork quilt</quote> - <citation>web:quilt</citation>, or simply <quote>quilt</quote> - (see <citation>gruenbacher:2005</citation> for a paper - describing it). Because quilt substantially automated patch - management, it rapidly gained a large following among open - source software developers.</para> - - <para>Quilt manages a <emphasis>stack of patches</emphasis> on - top of a directory tree. To begin, you tell quilt to manage a - directory tree, and tell it which files you want to manage; it - stores away the names and contents of those files. 
To fix a - bug, you create a new patch (using a single command), edit the - files you need to fix, then <quote>refresh</quote> the - patch.</para> - - <para>The refresh step causes quilt to scan the directory tree; - it updates the patch with all of the changes you have made. - You can create another patch on top of the first, which will - track the changes required to modify the tree from <quote>tree - with one patch applied</quote> to <quote>tree with two - patches applied</quote>.</para> - - <para>You can <emphasis>change</emphasis> which patches are - applied to the tree. If you <quote>pop</quote> a patch, the - changes made by that patch will vanish from the directory - tree. Quilt remembers which patches you have popped, though, - so you can <quote>push</quote> a popped patch again, and the - directory tree will be restored to contain the modifications - in the patch. Most importantly, you can run the - <quote>refresh</quote> command at any time, and the topmost - applied patch will be updated. This means that you can, at - any time, change both which patches are applied and what - modifications those patches make.</para> - - <para>Quilt knows nothing about revision control tools, so it - works equally well on top of an unpacked tarball or a - Subversion working copy.</para> - - </sect2> - <sect2 id="sec:mq:quilt-mq"> - <title>From patchwork quilt to Mercurial Queues</title> - - <para>In mid-2005, Chris Mason took the features of quilt and - wrote an extension that he called Mercurial Queues, which - added quilt-like behaviour to Mercurial.</para> - - <para>The key difference between quilt and MQ is that quilt - knows nothing about revision control systems, while MQ is - <emphasis>integrated</emphasis> into Mercurial. Each patch - that you push is represented as a Mercurial changeset. 
Pop a - patch, and the changeset goes away.</para> - - <para>Because quilt does not care about revision control tools, - it is still a tremendously useful piece of software to know - about for situations where you cannot use Mercurial and - MQ.</para> - - </sect2> - </sect1> - <sect1> - <title>The huge advantage of MQ</title> - - <para>I cannot overstate the value that MQ offers through the - unification of patches and revision control.</para> - - <para>A major reason that patches have persisted in the free - software and open source world&emdash;in spite of the - availability of increasingly capable revision control tools over - the years&emdash;is the <emphasis>agility</emphasis> they - offer.</para> - - <para>Traditional revision control tools make a permanent, - irreversible record of everything that you do. While this has - great value, it's also somewhat stifling. If you want to - perform a wild-eyed experiment, you have to be careful in how - you go about it, or you risk leaving unneeded&emdash;or worse, - misleading or destabilising&emdash;traces of your missteps and - errors in the permanent revision record.</para> - - <para>By contrast, MQ's marriage of distributed revision control - with patches makes it much easier to isolate your work. Your - patches live on top of normal revision history, and you can make - them disappear or reappear at will. If you don't like a patch, - you can drop it. If a patch isn't quite as you want it to be, - simply fix it&emdash;as many times as you need to, until you - have refined it into the form you desire.</para> - - <para>As an example, the integration of patches with revision - control makes understanding patches and debugging their - effects&emdash;and their interplay with the code they're based - on&emdash;<emphasis>enormously</emphasis> easier. 
Since every - applied patch has an associated changeset, you can give <command - role="hg-cmd">hg log</command> a file name to see which - changesets and patches affected the file. You can use the - <command role="hg-cmd">hg bisect</command> command to - binary-search through all changesets and applied patches to see - where a bug got introduced or fixed. You can use the <command - role="hg-cmd">hg annotate</command> command to see which - changeset or patch modified a particular line of a source file. - And so on.</para> - - </sect1> - <sect1 id="sec:mq:patch"> - <title>Understanding patches</title> - - <para>Because MQ doesn't hide its patch-oriented nature, it is - helpful to understand what patches are, and a little about the - tools that work with them.</para> - - <para>The traditional Unix <command>diff</command> command - compares two files, and prints a list of differences between - them. The <command>patch</command> command understands these - differences as <emphasis>modifications</emphasis> to make to a - file. Take a look below for a simple example of these commands - in action.</para> - -&interaction.mq.dodiff.diff; - - <para>The type of file that <command>diff</command> generates (and - <command>patch</command> takes as input) is called a - <quote>patch</quote> or a <quote>diff</quote>; there is no - difference between a patch and a diff. (We'll use the term - <quote>patch</quote>, since it's more commonly used.)</para> - - <para>A patch file can start with arbitrary text; the - <command>patch</command> command ignores this text, but MQ uses - it as the commit message when creating changesets. To find the - beginning of the patch content, <command>patch</command> - searches for the first line that starts with the string - <quote><literal>diff -</literal></quote>.</para> - - <para>MQ works with <emphasis>unified</emphasis> diffs - (<command>patch</command> can accept several other diff formats, - but MQ doesn't). 
A unified diff contains two kinds of header. - The <emphasis>file header</emphasis> describes the file being - modified; it contains the name of the file to modify. When - <command>patch</command> sees a new file header, it looks for a - file with that name to start modifying.</para> - - <para>After the file header comes a series of - <emphasis>hunks</emphasis>. Each hunk starts with a header; - this identifies the range of line numbers within the file that - the hunk should modify. Following the header, a hunk starts and - ends with a few (usually three) lines of text from the - unmodified file; these are called the - <emphasis>context</emphasis> for the hunk. If there's only a - small amount of context between successive hunks, - <command>diff</command> doesn't print a new hunk header; it just - runs the hunks together, with a few lines of context between - modifications.</para> - - <para>Each line of context begins with a space character. Within - the hunk, a line that begins with - <quote><literal>-</literal></quote> means <quote>remove this - line,</quote> while a line that begins with - <quote><literal>+</literal></quote> means <quote>insert this - line.</quote> For example, a line that is modified is - represented by one deletion and one insertion.</para> - - <para>We will return to some of the more subtle aspects of patches - later (in section <xref linkend="sec:mq:adv-patch"/>), but you - should have - enough information now to use MQ.</para> - - </sect1> - <sect1 id="sec:mq:start"> - <title>Getting started with Mercurial Queues</title> - - <para>Because MQ is implemented as an extension, you must - explicitly enable it before you can use it. (You don't need to - download anything; MQ ships with the standard Mercurial - distribution.) 
To enable MQ, edit your <filename - role="home">~/.hgrc</filename> file, and add the lines - below.</para> - - <programlisting>[extensions] -hgext.mq =</programlisting> - - <para>Once the extension is enabled, it will make a number of new - commands available. To verify that the extension is working, - you can use <command role="hg-cmd">hg help</command> to see if - the <command role="hg-ext-mq">qinit</command> command is now - available.</para> - -&interaction.mq.qinit-help.help; - - <para>You can use MQ with <emphasis>any</emphasis> Mercurial - repository, and its commands only operate within that - repository. To get started, simply prepare the repository using - the <command role="hg-ext-mq">qinit</command> command.</para> - -&interaction.mq.tutorial.qinit; - - <para>This command creates an empty directory called <filename - role="special" class="directory">.hg/patches</filename>, where - MQ will keep its metadata. As with many Mercurial commands, the - <command role="hg-ext-mq">qinit</command> command prints nothing - if it succeeds.</para> - - <sect2> - <title>Creating a new patch</title> - - <para>To begin work on a new patch, use the <command - role="hg-ext-mq">qnew</command> command. This command takes - one argument, the name of the patch to create.</para> - - <para>MQ will use this as the name of an actual file in the - <filename role="special" - class="directory">.hg/patches</filename> directory, as you - can see below.</para> - -&interaction.mq.tutorial.qnew; - - <para>Also newly present in the <filename role="special" - class="directory">.hg/patches</filename> directory are two - other files, <filename role="special">series</filename> and - <filename role="special">status</filename>. The <filename - role="special">series</filename> file lists all of the - patches that MQ knows about for this repository, with one - patch per line. 
Mercurial uses the <filename - role="special">status</filename> file for internal - book-keeping; it tracks all of the patches that MQ has - <emphasis>applied</emphasis> in this repository.</para> - - <note> - <para> You may sometimes want to edit the <filename - role="special">series</filename> file by hand; for - example, to change the sequence in which some patches are - applied. However, manually editing the <filename - role="special">status</filename> file is almost always a - bad idea, as it's easy to corrupt MQ's idea of what is - happening.</para> - </note> - - <para>Once you have created your new patch, you can edit files - in the working directory as you usually would. All of the - normal Mercurial commands, such as <command role="hg-cmd">hg - diff</command> and <command role="hg-cmd">hg - annotate</command>, work exactly as they did before.</para> - - </sect2> - <sect2> - <title>Refreshing a patch</title> - - <para>When you reach a point where you want to save your work, - use the <command role="hg-ext-mq">qrefresh</command> command - to update the patch you are working on.</para> - -&interaction.mq.tutorial.qrefresh; - - <para>This command folds the changes you have made in the - working directory into your patch, and updates its - corresponding changeset to contain those changes.</para> - - <para>You can run <command role="hg-ext-mq">qrefresh</command> - as often as you like, so it's a good way to - <quote>checkpoint</quote> your work. Refresh your patch at an - opportune time; try an experiment; and if the experiment - doesn't work out, <command role="hg-cmd">hg revert</command> - your modifications back to the last time you refreshed.</para> - -&interaction.mq.tutorial.qrefresh2; - - </sect2> - <sect2> - <title>Stacking and tracking patches</title> - - <para>Once you have finished working on a patch, or need to work - on another, you can use the <command - role="hg-ext-mq">qnew</command> command again to create a - new patch. 
Mercurial will apply this patch on top of your - existing patch.</para> - -&interaction.mq.tutorial.qnew2; - <para>Notice that the patch contains the changes in our prior - patch as part of its context (you can see this more clearly in - the output of <command role="hg-cmd">hg - annotate</command>).</para> - - <para>So far, with the exception of <command - role="hg-ext-mq">qnew</command> and <command - role="hg-ext-mq">qrefresh</command>, we've been careful to - only use regular Mercurial commands. However, MQ provides - many commands that are easier to use when you are thinking - about patches, as illustrated below.</para> - -&interaction.mq.tutorial.qseries; - - <itemizedlist> - <listitem><para>The <command - role="hg-ext-mq">qseries</command> command lists every - patch that MQ knows about in this repository, from oldest - to newest (most recently - <emphasis>created</emphasis>).</para> - </listitem> - <listitem><para>The <command - role="hg-ext-mq">qapplied</command> command lists every - patch that MQ has <emphasis>applied</emphasis> in this - repository, again from oldest to newest (most recently - applied).</para> - </listitem></itemizedlist> - - </sect2> - <sect2> - <title>Manipulating the patch stack</title> - - <para>The previous discussion implied that there must be a - difference between <quote>known</quote> and - <quote>applied</quote> patches, and there is. MQ can manage a - patch without it being applied in the repository.</para> - - <para>An <emphasis>applied</emphasis> patch has a corresponding - changeset in the repository, and the effects of the patch and - changeset are visible in the working directory. You can undo - the application of a patch using the <command - role="hg-ext-mq">qpop</command> command. MQ still - <emphasis>knows about</emphasis>, or manages, a popped patch, - but the patch no longer has a corresponding changeset in the - repository, and the working directory does not contain the - changes made by the patch. 
Figure <xref - linkend="fig:mq:stack"/> illustrates - the difference between applied and tracked patches.</para> - - <informalfigure id="fig:mq:stack"> - <mediaobject><imageobject><imagedata - fileref="mq-stack"/></imageobject><textobject><phrase>XXX - add text</phrase></textobject><caption><para>Applied and - unapplied patches in the MQ patch - stack</para></caption></mediaobject> - </informalfigure> - - <para>You can reapply an unapplied, or popped, patch using the - <command role="hg-ext-mq">qpush</command> command. This - creates a new changeset to correspond to the patch, and the - patch's changes once again become present in the working - directory. See below for examples of <command - role="hg-ext-mq">qpop</command> and <command - role="hg-ext-mq">qpush</command> in action.</para> -&interaction.mq.tutorial.qpop; - - <para>Notice that once we have popped a patch or two patches, - the output of <command role="hg-ext-mq">qseries</command> - remains the same, while that of <command - role="hg-ext-mq">qapplied</command> has changed.</para> - - - </sect2> - <sect2> - <title>Pushing and popping many patches</title> - - <para>While <command role="hg-ext-mq">qpush</command> and - <command role="hg-ext-mq">qpop</command> each operate on a - single patch at a time by default, you can push and pop many - patches in one go. The <option - role="hg-ext-mq-cmd-qpush-opt">-a</option> option to - <command role="hg-ext-mq">qpush</command> causes it to push - all unapplied patches, while the <option - role="hg-ext-mq-cmd-qpop-opt">-a</option> option to <command - role="hg-ext-mq">qpop</command> causes it to pop all applied - patches. 
(For some more ways to push and pop many patches, - see section <xref linkend="sec:mq:perf"/> - below.)</para> - -&interaction.mq.tutorial.qpush-a; - - </sect2> - <sect2> - <title>Safety checks, and overriding them</title> - - <para>Several MQ commands check the working directory before - they do anything, and fail if they find any modifications. - They do this to ensure that you won't lose any changes that - you have made, but not yet incorporated into a patch. The - example below illustrates this; the <command - role="hg-ext-mq">qnew</command> command will not create a - new patch if there are outstanding changes, caused in this - case by the <command role="hg-cmd">hg add</command> of - <filename>file3</filename>.</para> - -&interaction.mq.tutorial.add; - - <para>Commands that check the working directory all take an - <quote>I know what I'm doing</quote> option, which is always - named <option>-f</option>. The exact meaning of - <option>-f</option> depends on the command. For example, - <command role="hg-cmd">hg qnew <option - role="hg-ext-mq-cmd-qnew-opt">-f</option></command> - will incorporate any outstanding changes into the new patch it - creates, but <command role="hg-cmd">hg qpop <option - role="hg-ext-mq-cmd-qpop-opt">-f</option></command> - will revert modifications to any files affected by the patch - that it is popping. Be sure to read the documentation for a - command's <option>-f</option> option before you use it!</para> - - </sect2> - <sect2> - <title>Working on several patches at once</title> - - <para>The <command role="hg-ext-mq">qrefresh</command> command - always refreshes the <emphasis>topmost</emphasis> applied - patch. This means that you can suspend work on one patch (by - refreshing it), pop or push to make a different patch the top, - and work on <emphasis>that</emphasis> patch for a - while.</para> - - <para>Here's an example that illustrates how you can use this - ability. 
Let's say you're developing a new feature as two - patches. The first is a change to the core of your software, - and the second&emdash;layered on top of the - first&emdash;changes the user interface to use the code you - just added to the core. If you notice a bug in the core while - you're working on the UI patch, it's easy to fix the core. - Simply <command role="hg-ext-mq">qrefresh</command> the UI - patch to save your in-progress changes, and <command - role="hg-ext-mq">qpop</command> down to the core patch. Fix - the core bug, <command role="hg-ext-mq">qrefresh</command> the - core patch, and <command role="hg-ext-mq">qpush</command> back - to the UI patch to continue where you left off.</para> - - </sect2> - </sect1> - <sect1 id="sec:mq:adv-patch"> - <title>More about patches</title> - - <para>MQ uses the GNU <command>patch</command> command to apply - patches, so it's helpful to know a few more detailed aspects of - how <command>patch</command> works, and about patches - themselves.</para> - - <sect2> - <title>The strip count</title> - - <para>If you look at the file headers in a patch, you will - notice that the pathnames usually have an extra component on - the front that isn't present in the actual path name. This is - a holdover from the way that people used to generate patches - (people still do this, but it's somewhat rare with modern - revision control tools).</para> - - <para>Alice would unpack a tarball, edit her files, then decide - that she wanted to create a patch. So she'd rename her - working directory, unpack the tarball again (hence the need - for the rename), and use the <option - role="cmd-opt-diff">-r</option> and <option - role="cmd-opt-diff">-N</option> options to - <command>diff</command> to recursively generate a patch - between the unmodified directory and the modified one. 
The - result would be that the name of the unmodified directory - would be at the front of the left-hand path in every file - header, and the name of the modified directory would be at the - front of the right-hand path.</para> - - <para>Since someone receiving a patch from the Alices of the net - would be unlikely to have unmodified and modified directories - with exactly the same names, the <command>patch</command> - command has a <option role="cmd-opt-patch">-p</option> option - that indicates the number of leading path name components to - strip when trying to apply a patch. This number is called the - <emphasis>strip count</emphasis>.</para> - - <para>An option of <quote><literal>-p1</literal></quote> means - <quote>use a strip count of one</quote>. If - <command>patch</command> sees a file name - <filename>foo/bar/baz</filename> in a file header, it will - strip <filename>foo</filename> and try to patch a file named - <filename>bar/baz</filename>. (Strictly speaking, the strip - count refers to the number of <emphasis>path - separators</emphasis> (and the components that go with them - ) to strip. A strip count of one will turn - <filename>foo/bar</filename> into <filename>bar</filename>, - but <filename>/foo/bar</filename> (notice the extra leading - slash) into <filename>foo/bar</filename>.)</para> - - <para>The <quote>standard</quote> strip count for patches is - one; almost all patches contain one leading path name - component that needs to be stripped. 
Mercurial's <command - role="hg-cmd">hg diff</command> command generates path names - in this form, and the <command role="hg-cmd">hg - import</command> command and MQ expect patches to have a - strip count of one.</para> - - <para>If you receive a patch from someone that you want to add - to your patch queue, and the patch needs a strip count other - than one, you cannot just <command - role="hg-ext-mq">qimport</command> the patch, because - <command role="hg-ext-mq">qimport</command> does not yet have - a <literal>-p</literal> option (see <ulink role="hg-bug" - url="http://www.selenic.com/mercurial/bts/issue311">issue - 311</ulink>). Your best bet is to <command - role="hg-ext-mq">qnew</command> a patch of your own, then - use <command>patch -pN</command> to apply their patch, - followed by <command role="hg-cmd">hg addremove</command> to - pick up any files added or removed by the patch, followed by - <command role="hg-ext-mq">hg qrefresh</command>. This - complexity may become unnecessary; see <ulink role="hg-bug" - url="http://www.selenic.com/mercurial/bts/issue311">issue - 311</ulink> for details. - </para> - </sect2> - <sect2> - <title>Strategies for applying a patch</title> - - <para>When <command>patch</command> applies a hunk, it tries a - handful of successively less accurate strategies to try to - make the hunk apply. This falling-back technique often makes - it possible to take a patch that was generated against an old - version of a file, and apply it against a newer version of - that file.</para> - - <para>First, <command>patch</command> tries an exact match, - where the line numbers, the context, and the text to be - modified must apply exactly. If it cannot make an exact - match, it tries to find an exact match for the context, - without honouring the line numbering information. 
If this - succeeds, it prints a line of output saying that the hunk was - applied, but at some <emphasis>offset</emphasis> from the - original line number.</para> - - <para>If a context-only match fails, <command>patch</command> - removes the first and last lines of the context, and tries a - <emphasis>reduced</emphasis> context-only match. If the hunk - with reduced context succeeds, it prints a message saying that - it applied the hunk with a <emphasis>fuzz factor</emphasis> - (the number after the fuzz factor indicates how many lines of - context <command>patch</command> had to trim before the patch - applied).</para> - - <para>When neither of these techniques works, - <command>patch</command> prints a message saying that the hunk - in question was rejected. It saves rejected hunks (also - simply called <quote>rejects</quote>) to a file with the same - name, and an added <filename role="special">.rej</filename> - extension. It also saves an unmodified copy of the file with - a <filename role="special">.orig</filename> extension; the - copy of the file without any extensions will contain any - changes made by hunks that <emphasis>did</emphasis> apply - cleanly. 
If you have a patch that modifies - <filename>foo</filename> with six hunks, and one of them fails - to apply, you will have: an unmodified - <filename>foo.orig</filename>, a <filename>foo.rej</filename> - containing one hunk, and <filename>foo</filename>, containing - the changes made by the five successful hunks.</para> - - </sect2> - <sect2> - <title>Some quirks of patch representation</title> - - <para>There are a few useful things to know about how - <command>patch</command> works with files.</para> - <itemizedlist> - <listitem><para>This should already be obvious, but - <command>patch</command> cannot handle binary - files.</para> - </listitem> - <listitem><para>Neither does it care about the executable bit; - it creates new files as readable, but not - executable.</para> - </listitem> - <listitem><para><command>patch</command> treats the removal of - a file as a diff between the file to be removed and the - empty file. So your idea of <quote>I deleted this - file</quote> looks like <quote>every line of this file - was deleted</quote> in a patch.</para> - </listitem> - <listitem><para>It treats the addition of a file as a diff - between the empty file and the file to be added. So in a - patch, your idea of <quote>I added this file</quote> looks - like <quote>every line of this file was - added</quote>.</para> - </listitem> - <listitem><para>It treats a renamed file as the removal of the - old name, and the addition of the new name. This means - that renamed files have a big footprint in patches. 
(Note - also that Mercurial does not currently try to infer when - files have been renamed or copied in a patch.)</para> - </listitem> - <listitem><para><command>patch</command> cannot represent - empty files, so you cannot use a patch to represent the - notion <quote>I added this empty file to the - tree</quote>.</para> - </listitem></itemizedlist> - </sect2> - <sect2> - <title>Beware the fuzz</title> - - <para>While applying a hunk at an offset, or with a fuzz factor, - will often be completely successful, these inexact techniques - naturally leave open the possibility of corrupting the patched - file. The most common cases typically involve applying a - patch twice, or at an incorrect location in the file. If - <command>patch</command> or <command - role="hg-ext-mq">qpush</command> ever mentions an offset or - fuzz factor, you should make sure that the modified files are - correct afterwards.</para> - - <para>It's often a good idea to refresh a patch that has applied - with an offset or fuzz factor; refreshing the patch generates - new context information that will make it apply cleanly. I - say <quote>often,</quote> not <quote>always,</quote> because - sometimes refreshing a patch will make it fail to apply - against a different revision of the underlying files. In some - cases, such as when you're maintaining a patch that must sit - on top of multiple versions of a source tree, it's acceptable - to have a patch apply with some fuzz, provided you've verified - the results of the patching process in such cases.</para> - - </sect2> - <sect2> - <title>Handling rejection</title> - - <para>If <command role="hg-ext-mq">qpush</command> fails to - apply a patch, it will print an error message and exit. 
If it - has left <filename role="special">.rej</filename> files - behind, it is usually best to fix up the rejected hunks before - you push more patches or do any further work.</para> - - <para>If your patch <emphasis>used to</emphasis> apply cleanly, - and no longer does because you've changed the underlying code - that your patches are based on, Mercurial Queues can help; see - section <xref - linkend="sec:mq:merge"/> for details.</para> - - <para>Unfortunately, there aren't any great techniques for - dealing with rejected hunks. Most often, you'll need to view - the <filename role="special">.rej</filename> file and edit the - target file, applying the rejected hunks by hand.</para> - - <para>If you're feeling adventurous, Neil Brown, a Linux kernel - hacker, wrote a tool called <command>wiggle</command> - <citation>web:wiggle</citation>, which is more vigorous than - <command>patch</command> in its attempts to make a patch - apply.</para> - - <para>Another Linux kernel hacker, Chris Mason (the author of - Mercurial Queues), wrote a similar tool called - <command>mpatch</command> <citation>web:mpatch</citation>, - which takes a simple approach to automating the application of - hunks rejected by <command>patch</command>. 
The - <command>mpatch</command> command can help with four common - reasons that a hunk may be rejected:</para> - - <itemizedlist> - <listitem><para>The context in the middle of a hunk has - changed.</para> - </listitem> - <listitem><para>A hunk is missing some context at the - beginning or end.</para> - </listitem> - <listitem><para>A large hunk might apply better&emdash;either - entirely or in part&emdash;if it was broken up into - smaller hunks.</para> - </listitem> - <listitem><para>A hunk removes lines with slightly different - content than those currently present in the file.</para> - </listitem></itemizedlist> - - <para>If you use <command>wiggle</command> or - <command>mpatch</command>, you should be doubly careful to - check your results when you're done. In fact, - <command>mpatch</command> enforces this method of - double-checking the tool's output, by automatically dropping - you into a merge program when it has done its job, so that you - can verify its work and finish off any remaining - merges.</para> - - </sect2> - </sect1> - <sect1 id="sec:mq:perf"> - <title>Getting the best performance out of MQ</title> - - <para>MQ is very efficient at handling a large number of patches. - I ran some performance experiments in mid-2006 for a talk that I - gave at the 2006 EuroPython conference - <citation>web:europython</citation>. I used as my data set the - Linux 2.6.17-mm1 patch series, which consists of 1,738 patches. - I applied these on top of a Linux kernel repository containing - all 27,472 revisions between Linux 2.6.12-rc2 and Linux - 2.6.17.</para> - - <para>On my old, slow laptop, I was able to <command - role="hg-cmd">hg qpush <option - role="hg-ext-mq-cmd-qpush-opt">hg -a</option></command> all - 1,738 patches in 3.5 minutes, and <command role="hg-cmd">hg qpop - <option role="hg-ext-mq-cmd-qpop-opt">hg -a</option></command> - them all in 30 seconds. (On a newer laptop, the time to push - all patches dropped to two minutes.) 
I could <command - role="hg-ext-mq">qrefresh</command> one of the biggest patches - (which made 22,779 lines of changes to 287 files) in 6.6 - seconds.</para> - - <para>Clearly, MQ is well suited to working in large trees, but - there are a few tricks you can use to get the best performance - out of it.</para> - - <para>First of all, try to <quote>batch</quote> operations - together. Every time you run <command - role="hg-ext-mq">qpush</command> or <command - role="hg-ext-mq">qpop</command>, these commands scan the - working directory once to make sure you haven't made some - changes and then forgotten to run <command - role="hg-ext-mq">qrefresh</command>. On a small tree, the - time that this scan takes is unnoticeable. However, on a - medium-sized tree (containing tens of thousands of files), it - can take a second or more.</para> - - <para>The <command role="hg-ext-mq">qpush</command> and <command - role="hg-ext-mq">qpop</command> commands allow you to push and - pop multiple patches at a time. You can identify the - <quote>destination patch</quote> that you want to end up at. - When you <command role="hg-ext-mq">qpush</command> with a - destination specified, it will push patches until that patch is - at the top of the applied stack. When you <command - role="hg-ext-mq">qpop</command> to a destination, MQ will pop - patches until the destination patch is at the top.</para> - - <para>You can identify a destination patch using either the name - of the patch or its number. If you use numeric addressing, - patches are counted from zero; this means that the first patch - is zero, the second is one, and so on.</para> - - </sect1> - <sect1 id="sec:mq:merge"> - <title>Updating your patches when the underlying code - changes</title> - - <para>It's common to have a stack of patches on top of an - underlying repository that you don't modify directly. 
If you're - working on changes to third-party code, or on a feature that is - taking longer to develop than the rate of change of the code - beneath, you will often need to sync up with the underlying - code, and fix up any hunks in your patches that no longer apply. - This is called <emphasis>rebasing</emphasis> your patch - series.</para> - - <para>The simplest way to do this is to <command role="hg-cmd">hg - qpop <option role="hg-ext-mq-cmd-qpop-opt">hg - -a</option></command> your patches, then <command - role="hg-cmd">hg pull</command> changes into the underlying - repository, and finally <command role="hg-cmd">hg qpush <option - role="hg-ext-mq-cmd-qpop-opt">hg -a</option></command> your - patches again. MQ will stop pushing any time it runs across a - patch that fails to apply because of conflicts, allowing you to fix - your conflicts, <command role="hg-ext-mq">qrefresh</command> the - affected patch, and continue pushing until you have fixed your - entire stack.</para> - - <para>This approach is easy to use and works well if you don't - expect changes to the underlying code to affect how well your - patches apply. If your patch stack touches code that is modified - frequently or invasively in the underlying repository, however, - fixing up rejected hunks by hand quickly becomes - tiresome.</para> - - <para>It's possible to partially automate the rebasing process. 
- If your patches apply cleanly against some revision of the - underlying repo, MQ can use this information to help you to - resolve conflicts between your patches and a different - revision.</para> - - <para>The process is a little involved.</para> - <orderedlist> - <listitem><para>To begin, <command role="hg-cmd">hg qpush - -a</command> all of your patches on top of the revision - where you know that they apply cleanly.</para> - </listitem> - <listitem><para>Save a backup copy of your patch directory using - <command role="hg-cmd">hg qsave <option - role="hg-ext-mq-cmd-qsave-opt">hg -e</option> <option - role="hg-ext-mq-cmd-qsave-opt">hg -c</option></command>. - This prints the name of the directory that it has saved the - patches in. It will save the patches to a directory called - <filename role="special" - class="directory">.hg/patches.N</filename>, where - <literal>N</literal> is a small integer. It also commits a - <quote>save changeset</quote> on top of your applied - patches; this is for internal book-keeping, and records the - states of the <filename role="special">series</filename> and - <filename role="special">status</filename> files.</para> - </listitem> - <listitem><para>Use <command role="hg-cmd">hg pull</command> to - bring new changes into the underlying repository. (Don't - run <command role="hg-cmd">hg pull -u</command>; see below - for why.)</para> - </listitem> - <listitem><para>Update to the new tip revision, using <command - role="hg-cmd">hg update <option - role="hg-opt-update">-C</option></command> to override - the patches you have pushed.</para> - </listitem> - <listitem><para>Merge all patches using <command>hg qpush -m - -a</command>. 
The <option - role="hg-ext-mq-cmd-qpush-opt">-m</option> option to - <command role="hg-ext-mq">qpush</command> tells MQ to - perform a three-way merge if the patch fails to - apply.</para> - </listitem></orderedlist> - - <para>During the <command role="hg-cmd">hg qpush <option - role="hg-ext-mq-cmd-qpush-opt">hg -m</option></command>, - each patch in the <filename role="special">series</filename> - file is applied normally. If a patch applies with fuzz or - rejects, MQ looks at the queue you <command - role="hg-ext-mq">qsave</command>d, and performs a three-way - merge with the corresponding changeset. This merge uses - Mercurial's normal merge machinery, so it may pop up a GUI merge - tool to help you to resolve problems.</para> - - <para>When you finish resolving the effects of a patch, MQ - refreshes your patch based on the result of the merge.</para> - - <para>At the end of this process, your repository will have one - extra head from the old patch queue, and a copy of the old patch - queue will be in <filename role="special" - class="directory">.hg/patches.N</filename>. You can remove the - extra head using <command role="hg-cmd">hg qpop -a -n - patches.N</command> or <command role="hg-cmd">hg - strip</command>. You can delete <filename role="special" - class="directory">.hg/patches.N</filename> once you are sure - that you no longer need it as a backup.</para> - - </sect1> - <sect1> - <title>Identifying patches</title> - - <para>MQ commands that work with patches let you refer to a patch - either by using its name or by a number. 
By name is obvious - enough; pass the name <filename>foo.patch</filename> to <command - role="hg-ext-mq">qpush</command>, for example, and it will - push patches until <filename>foo.patch</filename> is - applied.</para> - - <para>As a shortcut, you can refer to a patch using both a name - and a numeric offset; <literal>foo.patch-2</literal> means - <quote>two patches before <literal>foo.patch</literal></quote>, - while <literal>bar.patch+4</literal> means <quote>four patches - after <literal>bar.patch</literal></quote>.</para> - - <para>Referring to a patch by index isn't much different. The - first patch printed in the output of <command - role="hg-ext-mq">qseries</command> is patch zero (yes, it's - one of those start-at-zero counting systems); the second is - patch one; and so on.</para> - - <para>MQ also makes it easy to work with patches when you are - using normal Mercurial commands. Every command that accepts a - changeset ID will also accept the name of an applied patch. MQ - augments the tags normally in the repository with an eponymous - one for each applied patch. In addition, the special tags - <literal role="tag">qbase</literal> and - <literal role="tag">qtip</literal> identify - the <quote>bottom-most</quote> and topmost applied patches, - respectively.</para> - - <para>These additions to Mercurial's normal tagging capabilities - make dealing with patches even more of a breeze.</para> - <itemizedlist> - <listitem><para>Want to patchbomb a mailing list with your - latest series of changes?</para> - <programlisting>hg email qbase:qtip</programlisting> - <para> (Don't know what <quote>patchbombing</quote> is? 
See - section <xref linkend="sec:hgext:patchbomb"/>.)</para> - </listitem> - <listitem><para>Need to see all of the patches since - <literal>foo.patch</literal> that have touched files in a - subdirectory of your tree?</para> - <programlisting>hg log -r foo.patch:qtip subdir</programlisting> - </listitem> - </itemizedlist> - - <para>Because MQ makes the names of patches available to the rest - of Mercurial through its normal internal tag machinery, you - don't need to type in the entire name of a patch when you want - to identify it by name.</para> - - <para>Another nice consequence of representing patch names as tags - is that when you run the <command role="hg-cmd">hg log</command> - command, it will display a patch's name as a tag, simply as part - of its normal output. This makes it easy to visually - distinguish applied patches from underlying - <quote>normal</quote> revisions. The following example shows a - few normal Mercurial commands in use with applied - patches.</para> - -&interaction.mq.id.output; - - </sect1> - <sect1> - <title>Useful things to know about</title> - - <para>There are a number of aspects of MQ usage that don't fit - tidily into sections of their own, but that are good to know. - Here they are, in one place.</para> - - <itemizedlist> - <listitem><para>Normally, when you <command - role="hg-ext-mq">qpop</command> a patch and <command - role="hg-ext-mq">qpush</command> it again, the changeset - that represents the patch after the pop/push will have a - <emphasis>different identity</emphasis> than the changeset - that represented the patch beforehand. See section <xref - linkend="sec:mqref:cmd:qpush"/> for - information as to why this is.</para> - </listitem> - <listitem><para>It's not a good idea to <command - role="hg-cmd">hg merge</command> changes from another - branch with a patch changeset, at least if you want to - maintain the <quote>patchiness</quote> of that changeset and - changesets below it on the patch stack. 
If you try to do - this, it will appear to succeed, but MQ will become - confused.</para> - </listitem></itemizedlist> - - </sect1> - <sect1 id="sec:mq:repo"> - <title>Managing patches in a repository</title> - - <para>Because MQ's <filename role="special" - class="directory">.hg/patches</filename> directory resides - outside a Mercurial repository's working directory, the - <quote>underlying</quote> Mercurial repository knows nothing - about the management or presence of patches.</para> - - <para>This presents the interesting possibility of managing the - contents of the patch directory as a Mercurial repository in its - own right. This can be a useful way to work. For example, you - can work on a patch for a while, <command - role="hg-ext-mq">qrefresh</command> it, then <command - role="hg-cmd">hg commit</command> the current state of the - patch. This lets you <quote>roll back</quote> to that version - of the patch later on.</para> - - <para>You can then share different versions of the same patch - stack among multiple underlying repositories. I use this when I - am developing a Linux kernel feature. I have a pristine copy of - my kernel sources for each of several CPU architectures, and a - cloned repository under each that contains the patches I am - working on. 
When I want to test a change on a different - architecture, I push my current patches to the patch repository - associated with that kernel tree, pop and push all of my - patches, and build and test that kernel.</para> - - <para>Managing patches in a repository makes it possible for - multiple developers to work on the same patch series without - colliding with each other, all on top of an underlying source - base that they may or may not control.</para> - - <sect2> - <title>MQ support for patch repositories</title> - - <para>MQ helps you to work with the <filename role="special" - class="directory">.hg/patches</filename> directory as a - repository; when you prepare a repository for working with - patches using <command role="hg-ext-mq">qinit</command>, you - can pass the <option role="hg-ext-mq-cmd-qinit-opt">hg - -c</option> option to create the <filename role="special" - class="directory">.hg/patches</filename> directory as a - Mercurial repository.</para> - - <note> - <para> If you forget to use the <option - role="hg-ext-mq-cmd-qinit-opt">hg -c</option> option, you - can simply go into the <filename role="special" - class="directory">.hg/patches</filename> directory at any - time and run <command role="hg-cmd">hg init</command>. 
- Don't forget to add an entry for the <filename - role="special">status</filename> file to the <filename - role="special">.hgignore</filename> file, though</para> - - <para> (<command role="hg-cmd">hg qinit <option - role="hg-ext-mq-cmd-qinit-opt">hg -c</option></command> - does this for you automatically); you - <emphasis>really</emphasis> don't want to manage the - <filename role="special">status</filename> file.</para> - </note> - - <para>As a convenience, if MQ notices that the <filename - class="directory">.hg/patches</filename> directory is a - repository, it will automatically <command role="hg-cmd">hg - add</command> every patch that you create and import.</para> - - <para>MQ provides a shortcut command, <command - role="hg-ext-mq">qcommit</command>, that runs <command - role="hg-cmd">hg commit</command> in the <filename - role="special" class="directory">.hg/patches</filename> - directory. This saves some bothersome typing.</para> - - <para>Finally, as a convenience to manage the patch directory, - you can define the alias <command>mq</command> on Unix - systems. For example, on Linux systems using the - <command>bash</command> shell, you can include the following - snippet in your <filename - role="home">~/.bashrc</filename>.</para> - - <programlisting>alias mq='hg -R $(hg root)/.hg/patches'</programlisting> - - <para>You can then issue commands of the form <command>mq - pull</command> from the main repository.</para> - - </sect2> - <sect2> - <title>A few things to watch out for</title> - - <para>MQ's support for working with a repository full of patches - is limited in a few small respects.</para> - - <para>MQ cannot automatically detect changes that you make to - the patch directory. 
If you <command role="hg-cmd">hg - pull</command>, manually edit, or <command role="hg-cmd">hg - update</command> changes to patches or the <filename - role="special">series</filename> file, you will have to - <command role="hg-cmd">hg qpop <option - role="hg-ext-mq-cmd-qpop-opt">hg -a</option></command> and - then <command role="hg-cmd">hg qpush <option - role="hg-ext-mq-cmd-qpush-opt">hg -a</option></command> in - the underlying repository to see those changes show up there. - If you forget to do this, you can confuse MQ's idea of which - patches are applied.</para> - - </sect2> - </sect1> - <sect1 id="sec:mq:tools"> - <title>Third party tools for working with patches</title> - - <para>Once you've been working with patches for a while, you'll - find yourself hungry for tools that will help you to understand - and manipulate the patches you're dealing with.</para> - - <para>The <command>diffstat</command> command - <citation>web:diffstat</citation> generates a histogram of the - modifications made to each file in a patch. It provides a good - way to <quote>get a sense of</quote> a patch&emdash;which files - it affects, and how much change it introduces to each file and - as a whole. (I find that it's a good idea to use - <command>diffstat</command>'s <option - role="cmd-opt-diffstat">-p</option> option as a matter of - course, as otherwise it will try to do clever things with - prefixes of file names that inevitably confuse at least - me.)</para> - -&interaction.mq.tools.tools; - - <para>The <literal role="package">patchutils</literal> package - <citation>web:patchutils</citation> is invaluable. It provides a - set of small utilities that follow the <quote>Unix - philosophy;</quote> each does one useful thing with a patch. - The <literal role="package">patchutils</literal> command I use - most is <command>filterdiff</command>, which extracts subsets - from a patch file. 
For example, given a patch that modifies - hundreds of files across dozens of directories, a single - invocation of <command>filterdiff</command> can generate a - smaller patch that only touches files whose names match a - particular glob pattern. See section <xref - linkend="mq-collab:tips:interdiff"/> for another - example.</para> - - </sect1> - <sect1> - <title>Good ways to work with patches</title> - - <para>Whether you are working on a patch series to submit to a - free software or open source project, or a series that you - intend to treat as a sequence of regular changesets when you're - done, you can use some simple techniques to keep your work well - organised.</para> - - <para>Give your patches descriptive names. A good name for a - patch might be <filename>rework-device-alloc.patch</filename>, - because it will immediately give you a hint as to what the purpose of - the patch is. Long names shouldn't be a problem; you won't be - typing the names often, but you <emphasis>will</emphasis> be - running commands like <command - role="hg-ext-mq">qapplied</command> and <command - role="hg-ext-mq">qtop</command> over and over. Good naming - becomes especially important when you have a number of patches - to work with, or if you are juggling a number of different tasks - and your patches only get a fraction of your attention.</para> - - <para>Be aware of what patch you're working on. Use the <command - role="hg-ext-mq">qtop</command> command and skim over the text - of your patches frequently&emdash;for example, using <command - role="hg-cmd">hg tip <option - role="hg-opt-tip">-p</option></command>&emdash;to be sure - of where you stand. 
I have several times worked on and <command - role="hg-ext-mq">qrefresh</command>ed a patch other than the - one I intended, and it's often tricky to migrate changes into - the right patch after making them in the wrong one.</para> - - <para>For this reason, it is very much worth investing a little - time to learn how to use some of the third-party tools I - described in section <xref linkend="sec:mq:tools"/>, - particularly - <command>diffstat</command> and <command>filterdiff</command>. - The former will give you a quick idea of what changes your patch - is making, while the latter makes it easy to splice hunks - selectively out of one patch and into another.</para> - - </sect1> - <sect1> - <title>MQ cookbook</title> - - <sect2> - <title>Manage <quote>trivial</quote> patches</title> - - <para>Because the overhead of dropping files into a new - Mercurial repository is so low, it makes a lot of sense to - manage patches this way even if you simply want to make a few - changes to a source tarball that you downloaded.</para> - - <para>Begin by downloading and unpacking the source tarball, and - turning it into a Mercurial repository.</para> - - &interaction.mq.tarball.download; - - <para>Continue by creating a patch stack and making your - changes.</para> - - &interaction.mq.tarball.qinit; - - <para>Let's say a few weeks or months pass, and your package - author releases a new version. 
First, bring their changes - into the repository.</para> - - &interaction.mq.tarball.newsource; - - <para>The pipeline starting with <command role="hg-cmd">hg - locate</command> above deletes all files in the working - directory, so that <command role="hg-cmd">hg - commit</command>'s <option - role="hg-opt-commit">--addremove</option> option can - actually tell which files have really been removed in the - newer version of the source.</para> - - <para>Finally, you can apply your patches on top of the new - tree.</para> - - &interaction.mq.tarball.repush; - - </sect2> - <sect2 id="sec:mq:combine"> - <title>Combining entire patches</title> - - <para>MQ provides a command, <command - role="hg-ext-mq">qfold</command>, that lets you combine - entire patches. This <quote>folds</quote> the patches you - name, in the order you name them, into the topmost applied - patch, and concatenates their descriptions onto the end of its - description. The patches that you fold must be unapplied - before you fold them.</para> - - <para>The order in which you fold patches matters. If your - topmost applied patch is <literal>foo</literal>, and you - <command role="hg-ext-mq">qfold</command> - <literal>bar</literal> and <literal>quux</literal> into it, - you will end up with a patch that has the same effect as if - you applied first <literal>foo</literal>, then - <literal>bar</literal>, followed by - <literal>quux</literal>.</para> - - </sect2> - <sect2> - <title>Merging part of one patch into another</title> - - <para>Merging <emphasis>part</emphasis> of one patch into - another is more difficult than combining entire - patches.</para> - - <para>If you want to move changes to entire files, you can use - <command>filterdiff</command>'s <option - role="cmd-opt-filterdiff">-i</option> and <option - role="cmd-opt-filterdiff">-x</option> options to choose the - modifications to snip out of one patch, concatenating its - output onto the end of the patch you want to merge into. 
You - usually won't need to modify the patch you've merged the - changes from. Instead, MQ will report some rejected hunks - when you <command role="hg-ext-mq">qpush</command> it (from - the hunks you moved into the other patch), and you can simply - <command role="hg-ext-mq">qrefresh</command> the patch to drop - the duplicate hunks.</para> - - <para>If you have a patch that has multiple hunks modifying a - file, and you only want to move a few of those hunks, the job - becomes messier, but you can still partly automate it. Use - <command>lsdiff -nvv</command> to print some metadata about - the patch.</para> - - &interaction.mq.tools.lsdiff; - - <para>This command prints three different kinds of - number:</para> - <itemizedlist> - <listitem><para>(in the first column) a <emphasis>file - number</emphasis> to identify each file modified in the - patch;</para> - </listitem> - <listitem><para>(on the next line, indented) the line number - within a modified file where a hunk starts; and</para> - </listitem> - <listitem><para>(on the same line) a <emphasis>hunk - number</emphasis> to identify that hunk.</para> - </listitem></itemizedlist> - - <para>You'll have to use some visual inspection, and reading of - the patch, to identify the file and hunk numbers you'll want, - but you can then pass them to - <command>filterdiff</command>'s <option - role="cmd-opt-filterdiff">--files</option> and <option - role="cmd-opt-filterdiff">--hunks</option> options, to - select exactly the file and hunk you want to extract.</para> - - <para>Once you have this hunk, you can concatenate it onto the - end of your destination patch and continue with the remainder - of section <xref linkend="sec:mq:combine"/>.</para> - - </sect2> - </sect1> - <sect1> - <title>Differences between quilt and MQ</title> - - <para>If you are already familiar with quilt, MQ provides a - similar command set. 
There are a few differences in the way - that it works.</para> - - <para>You will already have noticed that most quilt commands have - MQ counterparts that simply begin with a - <quote><literal>q</literal></quote>. The exceptions are quilt's - <literal>add</literal> and <literal>remove</literal> commands, - the counterparts for which are the normal Mercurial <command - role="hg-cmd">hg add</command> and <command role="hg-cmd">hg - remove</command> commands. There is no MQ equivalent of the - quilt <literal>edit</literal> command.</para> - - </sect1> -</chapter> - -<!-- -local variables: -sgml-parent-document: ("00book.xml" "book" "chapter") -end: --->
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch13-hgext.xml Thu Mar 19 20:54:12 2009 -0700 @@ -0,0 +1,554 @@ +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> + +<chapter id="chap:hgext"> + <?dbhtml filename="adding-functionality-with-extensions.html"?> + <title>Adding functionality with extensions</title> + + <para>While the core of Mercurial is quite complete from a + functionality standpoint, it's deliberately shorn of fancy + features. This approach of preserving simplicity keeps the + software easy to deal with for both maintainers and users.</para> + + <para>However, Mercurial doesn't box you in with an inflexible + command set: you can add features to it as + <emphasis>extensions</emphasis> (sometimes known as + <emphasis>plugins</emphasis>). We've already discussed a few of + these extensions in earlier chapters.</para> + <itemizedlist> + <listitem><para>Section <xref linkend="sec:tour-merge:fetch"/> + covers the <literal role="hg-ext">fetch</literal> extension; + this combines pulling new changes and merging them with local + changes into a single command, <command + role="hg-ext-fetch">fetch</command>.</para> + </listitem> + <listitem><para>In chapter <xref linkend="chap:hook"/>, we covered + several extensions that are useful for hook-related + functionality: <literal role="hg-ext">acl</literal> adds + access control lists; <literal + role="hg-ext">bugzilla</literal> adds integration with the + Bugzilla bug tracking system; and <literal + role="hg-ext">notify</literal> sends notification emails on + new changes.</para> + </listitem> + <listitem><para>The Mercurial Queues patch management extension is + so invaluable that it merits two chapters and an appendix all + to itself. 
Chapter <xref linkend="chap:mq"/> covers the + basics; chapter <xref + linkend="chap:mq-collab"/> discusses advanced topics; + and appendix <xref linkend="chap:mqref"/> goes into detail on + each + command.</para> + </listitem></itemizedlist> + + <para>In this chapter, we'll cover some of the other extensions that + are available for Mercurial, and briefly touch on some of the + machinery you'll need to know about if you want to write an + extension of your own.</para> + <itemizedlist> + <listitem><para>In section <xref linkend="sec:hgext:inotify"/>, + we'll discuss the possibility of <emphasis>huge</emphasis> + performance improvements using the <literal + role="hg-ext">inotify</literal> extension.</para> + </listitem></itemizedlist> + + <sect1 id="sec:hgext:inotify"> + <title>Improve performance with the <literal + role="hg-ext">inotify</literal> extension</title> + + <para>Are you interested in having some of the most common + Mercurial operations run as much as a hundred times faster? + Read on!</para> + + <para>Mercurial has great performance under normal circumstances. + For example, when you run the <command role="hg-cmd">hg + status</command> command, Mercurial has to scan almost every + directory and file in your repository so that it can display + file status. Many other Mercurial commands need to do the same + work behind the scenes; for example, the <command + role="hg-cmd">hg diff</command> command uses the status + machinery to avoid doing an expensive comparison operation on + files that obviously haven't changed.</para> + + <para>Because obtaining file status is crucial to good + performance, the authors of Mercurial have optimised this code + to within an inch of its life. However, there's no avoiding the + fact that when you run <command role="hg-cmd">hg + status</command>, Mercurial is going to have to perform at + least one expensive system call for each managed file to + determine whether it's changed since the last time Mercurial + checked. 
For a sufficiently large repository, this can take a + long time.</para> + + <para>To put a number on the magnitude of this effect, I created a + repository containing 150,000 managed files. I timed <command + role="hg-cmd">hg status</command> as taking ten seconds to + run, even when <emphasis>none</emphasis> of those files had been + modified.</para> + + <para>Many modern operating systems contain a file notification + facility. If a program signs up to an appropriate service, the + operating system will notify it every time a file of interest is + created, modified, or deleted. On Linux systems, the kernel + component that does this is called + <literal>inotify</literal>.</para> + + <para>Mercurial's <literal role="hg-ext">inotify</literal> + extension talks to the kernel's <literal>inotify</literal> + component to optimise <command role="hg-cmd">hg status</command> + commands. The extension has two components. A daemon sits in + the background and receives notifications from the + <literal>inotify</literal> subsystem. It also listens for + connections from a regular Mercurial command. The extension + modifies Mercurial's behaviour so that instead of scanning the + filesystem, it queries the daemon. Since the daemon has perfect + information about the state of the repository, it can respond + with a result instantaneously, avoiding the need to scan every + directory and file in the repository.</para> + + <para>Recall the ten seconds that I measured plain Mercurial as + taking to run <command role="hg-cmd">hg status</command> on a + 150,000 file repository. With the <literal + role="hg-ext">inotify</literal> extension enabled, the time + dropped to 0.1 seconds, a factor of <emphasis>one + hundred</emphasis> faster.</para> + + <para>Before we continue, please pay attention to some + caveats.</para> + <itemizedlist> + <listitem><para>The <literal role="hg-ext">inotify</literal> + extension is Linux-specific. 
Because it interfaces directly + to the Linux kernel's <literal>inotify</literal> subsystem, + it does not work on other operating systems.</para> + </listitem> + <listitem><para>It should work on any Linux distribution that + was released after early 2005. Older distributions are + likely to have a kernel that lacks + <literal>inotify</literal>, or a version of + <literal>glibc</literal> that does not have the necessary + interfacing support.</para> + </listitem> + <listitem><para>Not all filesystems are suitable for use with + the <literal role="hg-ext">inotify</literal> extension. + Network filesystems such as NFS are a non-starter, for + example, particularly if you're running Mercurial on several + systems, all mounting the same network filesystem. The + kernel's <literal>inotify</literal> system has no way of + knowing about changes made on another system. Most local + filesystems (e.g. ext3, XFS, ReiserFS) should work + fine.</para> + </listitem></itemizedlist> + + <para>The <literal role="hg-ext">inotify</literal> extension is + not yet shipped with Mercurial as of May 2007, so it's a little + more involved to set up than other extensions. But the + performance improvement is worth it!</para> + + <para>The extension currently comes in two parts: a set of patches + to the Mercurial source code, and a library of Python bindings + to the <literal>inotify</literal> subsystem.</para> + <note> + <para> There are <emphasis>two</emphasis> Python + <literal>inotify</literal> binding libraries. One of them is + called <literal>pyinotify</literal>, and is packaged by some + Linux distributions as <literal>python-inotify</literal>. 
+ This is <emphasis>not</emphasis> the one you'll need, as it is + too buggy and inefficient to be practical.</para> + </note> + <para>To get going, it's best to already have a functioning copy + of Mercurial installed.</para> + <note> + <para> If you follow the instructions below, you'll be + <emphasis>replacing</emphasis> and overwriting any existing + installation of Mercurial that you might already have, using + the latest <quote>bleeding edge</quote> Mercurial code. Don't + say you weren't warned!</para> + </note> + <orderedlist> + <listitem><para>Clone the Python <literal>inotify</literal> + binding repository. Build and install it.</para> + <programlisting>hg clone http://hg.kublai.com/python/inotify +cd inotify +python setup.py build --force +sudo python setup.py install --skip-build</programlisting> + </listitem> + <listitem><para>Clone the <filename + class="directory">crew</filename> Mercurial repository. + Clone the <literal role="hg-ext">inotify</literal> patch + repository so that Mercurial Queues will be able to apply + patches to your copy of the <filename + class="directory">crew</filename> repository.</para> + <programlisting>hg clone http://hg.intevation.org/mercurial/crew +hg clone crew inotify +hg clone http://hg.kublai.com/mercurial/patches/inotify inotify/.hg/patches</programlisting> + </listitem> + <listitem><para>Make sure that you have the Mercurial Queues + extension, <literal role="hg-ext">mq</literal>, enabled. 
If + you've never used MQ, read section <xref + linkend="sec:mq:start"/> to get started + quickly.</para> + </listitem> + <listitem><para>Go into the <filename + class="directory">inotify</filename> repo, and apply all + of the <literal role="hg-ext">inotify</literal> patches + using the <option role="hg-ext-mq-cmd-qpush-opt">hg + -a</option> option to the <command + role="hg-ext-mq">qpush</command> command.</para> + <programlisting>cd inotify +hg qpush -a</programlisting> + </listitem> + <listitem><para> If you get an error message from <command + role="hg-ext-mq">qpush</command>, you should not continue. + Instead, ask for help.</para> + </listitem> + <listitem><para>Build and install the patched version of + Mercurial.</para> + <programlisting>python setup.py build --force +sudo python setup.py install --skip-build</programlisting> + </listitem> + </orderedlist> + <para>Once you've built a suitably patched version of Mercurial, + all you need to do to enable the <literal + role="hg-ext">inotify</literal> extension is add an entry to + your <filename role="special">~/.hgrc</filename>.</para> + <programlisting>[extensions] +inotify =</programlisting> + <para>When the <literal role="hg-ext">inotify</literal> extension + is enabled, Mercurial will automatically and transparently start + the status daemon the first time you run a command that needs + status in a repository. It runs one status daemon per + repository.</para> + + <para>The status daemon is started silently, and runs in the + background. 
If you look at a list of running processes after + you've enabled the <literal role="hg-ext">inotify</literal> + extension and run a few commands in different repositories, + you'll thus see a few <literal>hg</literal> processes sitting + around, waiting for updates from the kernel and queries from + Mercurial.</para> + + <para>The first time you run a Mercurial command in a repository + when you have the <literal role="hg-ext">inotify</literal> + extension enabled, it will run with about the same performance + as a normal Mercurial command. This is because the status + daemon needs to perform a normal status scan so that it has a + baseline against which to apply later updates from the kernel. + However, <emphasis>every</emphasis> subsequent command that does + any kind of status check should be noticeably faster on + repositories of even fairly modest size. Better yet, the bigger + your repository is, the greater a performance advantage you'll + see. The <literal role="hg-ext">inotify</literal> daemon makes + status operations almost instantaneous on repositories of all + sizes!</para> + + <para>If you like, you can manually start a status daemon using + the <command role="hg-ext-inotify">inserve</command> command. + This gives you slightly finer control over how the daemon ought + to run. This command will of course only be available when the + <literal role="hg-ext">inotify</literal> extension is + enabled.</para> + + <para>When you're using the <literal + role="hg-ext">inotify</literal> extension, you should notice + <emphasis>no difference at all</emphasis> in Mercurial's + behaviour, with the sole exception of status-related commands + running a whole lot faster than they used to. You should + specifically expect that commands will not print different + output; neither should they give different results. 
If either of + these situations occurs, please report a bug.</para> + + </sect1> + <sect1 id="sec:hgext:extdiff"> + <title>Flexible diff support with the <literal + role="hg-ext">extdiff</literal> extension</title> + + <para>Mercurial's built-in <command role="hg-cmd">hg + diff</command> command outputs plaintext unified diffs.</para> + + &interaction.extdiff.diff; + + <para>If you would like to use an external tool to display + modifications, you'll want to use the <literal + role="hg-ext">extdiff</literal> extension. This will let you + use, for example, a graphical diff tool.</para> + + <para>The <literal role="hg-ext">extdiff</literal> extension is + bundled with Mercurial, so it's easy to set up. In the <literal + role="rc-extensions">extensions</literal> section of your + <filename role="special">~/.hgrc</filename>, simply add a + one-line entry to enable the extension.</para> + <programlisting>[extensions] +extdiff =</programlisting> + <para>This introduces a command named <command + role="hg-ext-extdiff">extdiff</command>, which by default uses + your system's <command>diff</command> command to generate a + unified diff in the same form as the built-in <command + role="hg-cmd">hg diff</command> command.</para> + + &interaction.extdiff.extdiff; + + <para>The result won't be exactly the same as with the built-in + <command role="hg-cmd">hg diff</command> variations, because the + output of <command>diff</command> varies from one system to + another, even when passed the same options.</para> + + <para>As the <quote><literal>making snapshot</literal></quote> + lines of output above imply, the <command + role="hg-ext-extdiff">extdiff</command> command works by + creating two snapshots of your source tree. The first snapshot + is of the source revision; the second, of the target revision or + working directory. 
The <command + role="hg-ext-extdiff">extdiff</command> command generates + these snapshots in a temporary directory, passes the name of + each directory to an external diff viewer, then deletes the + temporary directory. For efficiency, it only snapshots the + directories and files that have changed between the two + revisions.</para> + + <para>Snapshot directory names have the same base name as your + repository. If your repository path is <filename + class="directory">/quux/bar/foo</filename>, then <filename + class="directory">foo</filename> will be the name of each + snapshot directory. Each snapshot directory name has its + changeset ID appended, if appropriate. If a snapshot is of + revision <literal>a631aca1083f</literal>, the directory will be + named <filename class="directory">foo.a631aca1083f</filename>. + A snapshot of the working directory won't have a changeset ID + appended, so it would just be <filename + class="directory">foo</filename> in this example. To see what + this looks like in practice, look again at the <command + role="hg-ext-extdiff">extdiff</command> example above. Notice + that the diff has the snapshot directory names embedded in its + header.</para> + + <para>The <command role="hg-ext-extdiff">extdiff</command> command + accepts two important options. The <option + role="hg-ext-extdiff-cmd-extdiff-opt">hg -p</option> option + lets you choose a program to view differences with, instead of + <command>diff</command>. With the <option + role="hg-ext-extdiff-cmd-extdiff-opt">hg -o</option> option, + you can change the options that <command + role="hg-ext-extdiff">extdiff</command> passes to the program + (by default, these options are + <quote><literal>-Npru</literal></quote>, which only make sense + if you're running <command>diff</command>). 
In other respects, + the <command role="hg-ext-extdiff">extdiff</command> command + acts similarly to the built-in <command role="hg-cmd">hg + diff</command> command: you use the same option names, syntax, + and arguments to specify the revisions you want, the files you + want, and so on.</para> + + <para>As an example, here's how to run the normal system + <command>diff</command> command, getting it to generate context + diffs (using the <option role="cmd-opt-diff">-c</option> option) + instead of unified diffs, and five lines of context instead of + the default three (passing <literal>5</literal> as the argument + to the <option role="cmd-opt-diff">-C</option> option).</para> + + &interaction.extdiff.extdiff-ctx; + + <para>Launching a visual diff tool is just as easy. Here's how to + launch the <command>kdiff3</command> viewer.</para> + <programlisting>hg extdiff -p kdiff3 -o ''</programlisting> + + <para>If your diff viewing command can't deal with directories, + you can easily work around this with a little scripting. For an + example of such scripting in action with the <literal + role="hg-ext">mq</literal> extension and the + <command>interdiff</command> command, see section <xref + linkend="mq-collab:tips:interdiff"/>.</para> + + <sect2> + <title>Defining command aliases</title> + + <para>It can be cumbersome to remember the options to both the + <command role="hg-ext-extdiff">extdiff</command> command and + the diff viewer you want to use, so the <literal + role="hg-ext">extdiff</literal> extension lets you define + <emphasis>new</emphasis> commands that will invoke your diff + viewer with exactly the right options.</para> + + <para>All you need to do is edit your <filename + role="special">~/.hgrc</filename>, and add a section named + <literal role="rc-extdiff">extdiff</literal>. Inside this + section, you can define multiple commands. Here's how to add + a <literal>kdiff3</literal> command. 
Once you've defined + this, you can type <quote><literal>hg kdiff3</literal></quote> + and the <literal role="hg-ext">extdiff</literal> extension + will run <command>kdiff3</command> for you.</para> + <programlisting>[extdiff] +cmd.kdiff3 =</programlisting> + <para>If you leave the right hand side of the definition empty, + as above, the <literal role="hg-ext">extdiff</literal> + extension uses the name of the command you defined as the name + of the external program to run. But these names don't have to + be the same. Here, we define a command named + <quote><literal>hg wibble</literal></quote>, which runs + <command>kdiff3</command>.</para> + <programlisting>[extdiff] + cmd.wibble = kdiff3</programlisting> + + <para>You can also specify the default options that you want to + invoke your diff viewing program with. The prefix to use is + <quote><literal>opts.</literal></quote>, followed by the name + of the command to which the options apply. This example + defines a <quote><literal>hg vimdiff</literal></quote> command + that runs the <command>vim</command> editor's + <literal>DirDiff</literal> extension.</para> + <programlisting>[extdiff] + cmd.vimdiff = vim +opts.vimdiff = -f '+next' '+execute "DirDiff" argv(0) argv(1)'</programlisting> + + </sect2> + </sect1> + <sect1 id="sec:hgext:transplant"> + <title>Cherrypicking changes with the <literal + role="hg-ext">transplant</literal> extension</title> + + <para>Need to have a long chat with Brendan about this.</para> + + </sect1> + <sect1 id="sec:hgext:patchbomb"> + <title>Send changes via email with the <literal + role="hg-ext">patchbomb</literal> extension</title> + + <para>Many projects have a culture of <quote>change + review</quote>, in which people send their modifications to a + mailing list for others to read and comment on before they + commit the final version to a shared repository. 
Some projects + have people who act as gatekeepers; they apply changes from + other people to a repository to which those others don't have + access.</para> + + <para>Mercurial makes it easy to send changes over email for + review or application, via its <literal + role="hg-ext">patchbomb</literal> extension. The extension is + so named because changes are formatted as patches, and it's usual + to send one changeset per email message. Sending a long series + of changes by email is thus much like <quote>bombing</quote> the + recipient's inbox, hence <quote>patchbomb</quote>.</para> + + <para>As usual, the basic configuration of the <literal + role="hg-ext">patchbomb</literal> extension takes just one or + two lines in your <filename role="special"> + ~/.hgrc</filename>.</para> + <programlisting>[extensions] +patchbomb =</programlisting> + <para>Once you've enabled the extension, you will have a new + command available, named <command + role="hg-ext-patchbomb">email</command>.</para> + + <para>The safest and best way to invoke the <command + role="hg-ext-patchbomb">email</command> command is to + <emphasis>always</emphasis> run it first with the <option + role="hg-ext-patchbomb-cmd-email-opt">hg -n</option> option. + This will show you what the command <emphasis>would</emphasis> + send, without actually sending anything. Once you've had a + quick glance over the changes and verified that you are sending + the right ones, you can rerun the same command, with the <option + role="hg-ext-patchbomb-cmd-email-opt">hg -n</option> option + removed.</para> + + <para>The <command role="hg-ext-patchbomb">email</command> command + accepts the same kind of revision syntax as every other + Mercurial command. For example, this command will send every + revision between 7 and <literal>tip</literal>, inclusive.</para> + <programlisting>hg email -n 7:tip</programlisting> + <para>You can also specify a <emphasis>repository</emphasis> to + compare with. 
If you provide a repository but no revisions, the + <command role="hg-ext-patchbomb">email</command> command will + send all revisions in the local repository that are not present + in the remote repository. If you additionally specify revisions + or a branch name (the latter using the <option + role="hg-ext-patchbomb-cmd-email-opt">hg -b</option> option), + this will constrain the revisions sent.</para> + + <para>It's perfectly safe to run the <command + role="hg-ext-patchbomb">email</command> command without the + names of the people you want to send to: if you do this, it will + just prompt you for those values interactively. (If you're + using a Linux or Unix-like system, you should have enhanced + <literal>readline</literal>-style editing capabilities when + entering those headers, too, which is useful.)</para> + + <para>When you are sending just one revision, the <command + role="hg-ext-patchbomb">email</command> command will by + default use the first line of the changeset description as the + subject of the single email message it sends.</para> + + <para>If you send multiple revisions, the <command + role="hg-ext-patchbomb">email</command> command will usually + send one message per changeset. It will preface the series with + an introductory message, in which you should describe the + purpose of the series of changes you're sending.</para> + + <sect2> + <title>Changing the behaviour of patchbombs</title> + + <para>Not every project has exactly the same conventions for + sending changes in email; the <literal + role="hg-ext">patchbomb</literal> extension tries to + accommodate a number of variations through command line + options.</para> + <itemizedlist> + <listitem><para>You can write a subject for the introductory + message on the command line using the <option + role="hg-ext-patchbomb-cmd-email-opt">hg -s</option> + option. 
This takes one argument, the text of the subject + to use.</para> + </listitem> + <listitem><para>To change the email address from which the + messages originate, use the <option + role="hg-ext-patchbomb-cmd-email-opt">hg -f</option> + option. This takes one argument, the email address to + use.</para> + </listitem> + <listitem><para>The default behaviour is to send unified diffs + (see section <xref linkend="sec:mq:patch"/> for a + description of the + format), one per message. You can send a binary bundle + instead with the <option + role="hg-ext-patchbomb-cmd-email-opt">hg -b</option> + option.</para> + </listitem> + <listitem><para>Unified diffs are normally prefaced with a + metadata header. You can omit this, and send unadorned + diffs, with the <option + role="hg-ext-patchbomb-cmd-email-opt">hg + --plain</option> option.</para> + </listitem> + <listitem><para>Diffs are normally sent <quote>inline</quote>, + in the same body part as the description of a patch. This + makes it easiest for the largest number of readers to + quote and respond to parts of a diff, as some mail clients + will only quote the first MIME body part in a message. If + you'd prefer to send the description and the diff in + separate body parts, use the <option + role="hg-ext-patchbomb-cmd-email-opt">hg -a</option> + option.</para> + </listitem> + <listitem><para>Instead of sending mail messages, you can + write them to an <literal>mbox</literal>-format mail + folder using the <option + role="hg-ext-patchbomb-cmd-email-opt">hg -m</option> + option. That option takes one argument, the name of the + file to write to.</para> + </listitem> + <listitem><para>If you would like to add a + <command>diffstat</command>-format summary to each patch, + and one to the introductory message, use the <option + role="hg-ext-patchbomb-cmd-email-opt">hg -d</option> + option. 
The <command>diffstat</command> command displays + a table containing the name of each file patched, the + number of lines affected, and a histogram showing how much + each file is modified. This gives readers a qualitative + glance at how complex a patch is.</para> + </listitem></itemizedlist> + + </sect2> + </sect1> +</chapter> + +<!-- +local variables: +sgml-parent-document: ("00book.xml" "book" "chapter") +end: +-->
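For orientation, the separate ~/.hgrc listings in the new en/ch13-hgext.xml above can be collected into a single file. This is only a sketch assembled from the chapter's own listings. It assumes the bundled extdiff and patchbomb extensions, and that kdiff3 and vim's DirDiff plugin are installed; the inotify entry is left out because the chapter says it needs a specially patched Mercurial.

```ini
[extensions]
# An empty right-hand side enables an extension that ships with Mercurial.
extdiff =
patchbomb =

[extdiff]
# Defines "hg kdiff3". With an empty value, the external program
# has the same name as the new command.
cmd.kdiff3 =
# Defines "hg vimdiff", which runs vim and hands both snapshot
# directories to the DirDiff plugin (assumed to be installed).
cmd.vimdiff = vim
opts.vimdiff = -f '+next' '+execute "DirDiff" argv(0) argv(1)'
```

With these entries in place, hg kdiff3 should open the graphical viewer on the two snapshots, and hg email -n 7:tip previews a patch series without sending anything, as the chapter describes.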
--- a/en/ch13-mq-collab.xml Wed Mar 18 00:08:22 2009 -0700 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,518 +0,0 @@ -<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> - -<chapter id="chap:mq-collab"> - <?dbhtml filename="advanced-uses-of-mercurial-queues.html"?> - <title>Advanced uses of Mercurial Queues</title> - - <para>While it's easy to pick up straightforward uses of Mercurial - Queues, use of a little discipline and some of MQ's less - frequently used capabilities makes it possible to work in - complicated development environments.</para> - - <para>In this chapter, I will use as an example a technique I have - used to manage the development of an Infiniband device driver for - the Linux kernel. The driver in question is large (at least as - drivers go), with 25,000 lines of code spread across 35 source - files. It is maintained by a small team of developers.</para> - - <para>While much of the material in this chapter is specific to - Linux, the same principles apply to any code base for which you're - not the primary owner, and upon which you need to do a lot of - development.</para> - - <sect1> - <title>The problem of many targets</title> - - <para>The Linux kernel changes rapidly, and has never been - internally stable; developers frequently make drastic changes - between releases. This means that a version of the driver that - works well with a particular released version of the kernel will - not even <emphasis>compile</emphasis> correctly against, - typically, any other version.</para> - - <para>To maintain a driver, we have to keep a number of distinct - versions of Linux in mind.</para> - <itemizedlist> - <listitem><para>One target is the main Linux kernel development - tree. 
Maintenance of the code is in this case partly shared - by other developers in the kernel community, who make - <quote>drive-by</quote> modifications to the driver as they - develop and refine kernel subsystems.</para> - </listitem> - <listitem><para>We also maintain a number of - <quote>backports</quote> to older versions of the Linux - kernel, to support the needs of customers who are running - older Linux distributions that do not incorporate our - drivers. (To <emphasis>backport</emphasis> a piece of code - is to modify it to work in an older version of its target - environment than the version it was developed for.)</para> - </listitem> - <listitem><para>Finally, we make software releases on a schedule - that is necessarily not aligned with those used by Linux - distributors and kernel developers, so that we can deliver - new features to customers without forcing them to upgrade - their entire kernels or distributions.</para> - </listitem></itemizedlist> - - <sect2> - <title>Tempting approaches that don't work well</title> - - <para>There are two <quote>standard</quote> ways to maintain a - piece of software that has to target many different - environments.</para> - - <para>The first is to maintain a number of branches, each - intended for a single target. The trouble with this approach - is that you must maintain iron discipline in the flow of - changes between repositories. A new feature or bug fix must - start life in a <quote>pristine</quote> repository, then - percolate out to every backport repository. Backport changes - are more limited in the branches they should propagate to; a - backport change that is applied to a branch where it doesn't - belong will probably stop the driver from compiling.</para> - - <para>The second is to maintain a single source tree filled with - conditional statements that turn chunks of code on or off - depending on the intended target. 
Because these - <quote>ifdefs</quote> are not allowed in the Linux kernel - tree, a manual or automatic process must be followed to strip - them out and yield a clean tree. A code base maintained in - this fashion rapidly becomes a rat's nest of conditional - blocks that are difficult to understand and maintain.</para> - - <para>Neither of these approaches is well suited to a situation - where you don't <quote>own</quote> the canonical copy of a - source tree. In the case of a Linux driver that is - distributed with the standard kernel, Linus's tree contains - the copy of the code that will be treated by the world as - canonical. The upstream version of <quote>my</quote> driver - can be modified by people I don't know, without me even - finding out about it until after the changes show up in - Linus's tree.</para> - - <para>These approaches have the added weakness of making it - difficult to generate well-formed patches to submit - upstream.</para> - - <para>In principle, Mercurial Queues seems like a good candidate - to manage a development scenario such as the above. While - this is indeed the case, MQ contains a few added features that - make the job more pleasant.</para> - - </sect2> - </sect1> - <sect1> - <title>Conditionally applying patches with guards</title> - - <para>Perhaps the best way to maintain sanity with so many targets - is to be able to choose specific patches to apply for a given - situation. MQ provides a feature called <quote>guards</quote> - (which originates with quilt's <literal>guards</literal> - command) that does just this. 
To start off, let's create a - simple repository for experimenting in.</para> - - &interaction.mq.guards.init; - - <para>This gives us a tiny repository that contains two patches - that don't have any dependencies on each other, because they - touch different files.</para> - - <para>The idea behind conditional application is that you can - <quote>tag</quote> a patch with a <emphasis>guard</emphasis>, - which is simply a text string of your choosing, then tell MQ to - select specific guards to use when applying patches. MQ will - then either apply, or skip over, a guarded patch, depending on - the guards that you have selected.</para> - - <para>A patch can have an arbitrary number of guards; each one is - <emphasis>positive</emphasis> (<quote>apply this patch if this - guard is selected</quote>) or <emphasis>negative</emphasis> - (<quote>skip this patch if this guard is selected</quote>). A - patch with no guards is always applied.</para> - - </sect1> - <sect1> - <title>Controlling the guards on a patch</title> - - <para>The <command role="hg-ext-mq">qguard</command> command lets - you determine which guards should apply to a patch, or display - the guards that are already in effect. Without any arguments, it - displays the guards on the current topmost patch.</para> - - &interaction.mq.guards.qguard; - - <para>To set a positive guard on a patch, prefix the name of the - guard with a <quote><literal>+</literal></quote>.</para> - - &interaction.mq.guards.qguard.pos; - - <para>To set a negative guard - on a patch, prefix the name of the guard with a - <quote><literal>-</literal></quote>.</para> - - &interaction.mq.guards.qguard.neg; - - <note> - <para> The <command role="hg-ext-mq">qguard</command> command - <emphasis>sets</emphasis> the guards on a patch; it doesn't - <emphasis>modify</emphasis> them. 
What this means is that if - you run <command role="hg-cmd">hg qguard +a +b</command> on a - patch, then <command role="hg-cmd">hg qguard +c</command> on - the same patch, the <emphasis>only</emphasis> guard that will - be set on it afterwards is <literal>+c</literal>.</para> - </note> - - <para>Mercurial stores guards in the <filename - role="special">series</filename> file; the form in which they - are stored is easy both to understand and to edit by hand. (In - other words, you don't have to use the <command - role="hg-ext-mq">qguard</command> command if you don't want - to; it's okay to simply edit the <filename - role="special">series</filename> file.)</para> - - &interaction.mq.guards.series; - - </sect1> - <sect1> - <title>Selecting the guards to use</title> - - <para>The <command role="hg-ext-mq">qselect</command> command - determines which guards are active at a given time. The effect - of this is to determine which patches MQ will apply the next - time you run <command role="hg-ext-mq">qpush</command>. It has - no other effect; in particular, it doesn't do anything to - patches that are already applied.</para> - - <para>With no arguments, the <command - role="hg-ext-mq">qselect</command> command lists the guards - currently in effect, one per line of output. Each argument is - treated as the name of a guard to apply.</para> - - &interaction.mq.guards.qselect.foo; - - <para>In case you're interested, the currently selected guards are - stored in the <filename role="special">guards</filename> file.</para> - - &interaction.mq.guards.qselect.cat; - - <para>We can see the effect the selected guards have when we run - <command role="hg-ext-mq">qpush</command>.</para> - - &interaction.mq.guards.qselect.qpush; - - <para>A guard cannot start with a - <quote><literal>+</literal></quote> or - <quote><literal>-</literal></quote> character. The name of a - guard must not contain white space, but most other characters - are acceptable. 
If you try to use a guard with an invalid name, - MQ will complain:</para> - - &interaction.mq.guards.qselect.error; - - <para>Changing the selected guards changes the patches that are - applied.</para> - - &interaction.mq.guards.qselect.quux; - - <para>You can see in the example below that negative guards take - precedence over positive guards.</para> - - &interaction.mq.guards.qselect.foobar; - - </sect1> - <sect1> - <title>MQ's rules for applying patches</title> - - <para>The rules that MQ uses when deciding whether to apply a - patch are as follows.</para> - <itemizedlist> - <listitem><para>A patch that has no guards is always - applied.</para> - </listitem> - <listitem><para>If the patch has any negative guard that matches - any currently selected guard, the patch is skipped.</para> - </listitem> - <listitem><para>If the patch has any positive guard that matches - any currently selected guard, the patch is applied.</para> - </listitem> - <listitem><para>If the patch has positive or negative guards, - but none matches any currently selected guard, the patch is - skipped.</para> - </listitem></itemizedlist> - - </sect1> - <sect1> - <title>Trimming the work environment</title> - - <para>In working on the device driver I mentioned earlier, I don't - apply the patches to a normal Linux kernel tree. Instead, I use - a repository that contains only a snapshot of the source files - and headers that are relevant to Infiniband development. This - repository is 1% the size of a kernel repository, so it's easier - to work with.</para> - - <para>I then choose a <quote>base</quote> version on top of which - the patches are applied. This is a snapshot of the Linux kernel - tree as of a revision of my choosing. When I take the snapshot, - I record the changeset ID from the kernel repository in the - commit message. 
Since the snapshot preserves the - <quote>shape</quote> and content of the relevant parts of the - kernel tree, I can apply my patches on top of either my tiny - repository or a normal kernel tree.</para> - - <para>Normally, the base tree atop which the patches apply should - be a snapshot of a very recent upstream tree. This best - facilitates the development of patches that can easily be - submitted upstream with few or no modifications.</para> - - </sect1> - <sect1> - <title>Dividing up the <filename role="special">series</filename> - file</title> - - <para>I categorise the patches in the <filename - role="special">series</filename> file into a number of logical - groups. Each section of like patches begins with a block of - comments that describes the purpose of the patches that - follow.</para> - - <para>The sequence of patch groups that I maintain follows. The - ordering of these groups is important; I'll describe why after I - introduce the groups.</para> - <itemizedlist> - <listitem><para>The <quote>accepted</quote> group. Patches that - the development team has submitted to the maintainer of the - Infiniband subsystem, and which he has accepted, but which - are not present in the snapshot that the tiny repository is - based on. These are <quote>read only</quote> patches, - present only to transform the tree into a similar state as - it is in the upstream maintainer's repository.</para> - </listitem> - <listitem><para>The <quote>rework</quote> group. Patches that I - have submitted, but that the upstream maintainer has - requested modifications to before he will accept - them.</para> - </listitem> - <listitem><para>The <quote>pending</quote> group. Patches that - I have not yet submitted to the upstream maintainer, but - which we have finished working on. These will be <quote>read - only</quote> for a while. If the upstream maintainer - accepts them upon submission, I'll move them to the end of - the <quote>accepted</quote> group. 
If he requests that I - modify any, I'll move them to the beginning of the - <quote>rework</quote> group.</para> - </listitem> - <listitem><para>The <quote>in progress</quote> group. Patches - that are actively being developed, and should not be - submitted anywhere yet.</para> - </listitem> - <listitem><para>The <quote>backport</quote> group. Patches that - adapt the source tree to older versions of the kernel - tree.</para> - </listitem> - <listitem><para>The <quote>do not ship</quote> group. Patches - that for some reason should never be submitted upstream. - For example, one such patch might change embedded driver - identification strings to make it easier to distinguish, in - the field, between an out-of-tree version of the driver and - a version shipped by a distribution vendor.</para> - </listitem></itemizedlist> - - <para>Now to return to the reasons for ordering groups of patches - in this way. We would like the lowest patches in the stack to - be as stable as possible, so that we will not need to rework - higher patches due to changes in context. Putting patches that - will never be changed first in the <filename - role="special">series</filename> file serves this - purpose.</para> - - <para>We would also like the patches that we know we'll need to - modify to be applied on top of a source tree that resembles the - upstream tree as closely as possible. This is why we keep - accepted patches around for a while.</para> - - <para>The <quote>backport</quote> and <quote>do not ship</quote> - patches float at the end of the <filename - role="special">series</filename> file. 
The backport patches - must be applied on top of all other patches, and the <quote>do - not ship</quote> patches might as well stay out of harm's - way.</para> - - </sect1> - <sect1> - <title>Maintaining the patch series</title> - - <para>In my work, I use a number of guards to control which - patches are to be applied.</para> - - <itemizedlist> - <listitem><para><quote>Accepted</quote> patches are guarded with - <literal>accepted</literal>. I enable this guard most of - the time. When I'm applying the patches on top of a tree - where the patches are already present, I can turn this guard - off, and the patches that follow it will apply - cleanly.</para> - </listitem> - <listitem><para>Patches that are <quote>finished</quote>, but - not yet submitted, have no guards. If I'm applying the - patch stack to a copy of the upstream tree, I don't need to - enable any guards in order to get a reasonably safe source - tree.</para> - </listitem> - <listitem><para>Those patches that need reworking before being - resubmitted are guarded with - <literal>rework</literal>.</para> - </listitem> - <listitem><para>For those patches that are still under - development, I use <literal>devel</literal>.</para> - </listitem> - <listitem><para>A backport patch may have several guards, one - for each version of the kernel to which it applies. For - example, a patch that backports a piece of code to 2.6.9 - will have a <literal>2.6.9</literal> guard.</para> - </listitem></itemizedlist> - <para>This variety of guards gives me considerable flexibility in - determining what kind of source tree I want to end up with. For - most situations, the selection of appropriate guards is - automated during the build process, but I can manually tune the - guards to use for less common circumstances.</para> - - <sect2> - <title>The art of writing backport patches</title> - - <para>Using MQ, writing a backport patch is a simple process.
- All such a patch has to do is modify a piece of code that uses - a kernel feature not present in the older version of the - kernel, so that the driver continues to work correctly under - that older version.</para> - - <para>A useful goal when writing a good backport patch is to - make your code look as if it was written for the older version - of the kernel you're targeting. The less obtrusive the patch, - the easier it will be to understand and maintain. If you're - writing a collection of backport patches to avoid the - <quote>rat's nest</quote> effect of lots of - <literal>#ifdef</literal>s (hunks of source code that are only - used conditionally) in your code, don't introduce - version-dependent <literal>#ifdef</literal>s into the patches. - Instead, write several patches, each of which makes - unconditional changes, and control their application using - guards.</para> - - <para>There are two reasons to divide backport patches into a - distinct group, away from the <quote>regular</quote> patches - whose effects they modify. The first is that intermingling the - two makes it more difficult to use a tool like the <literal - role="hg-ext">patchbomb</literal> extension to automate the - process of submitting the patches to an upstream maintainer. - The second is that a backport patch could perturb the context - in which a subsequent regular patch is applied, making it - impossible to apply the regular patch cleanly - <emphasis>without</emphasis> the earlier backport patch - already being applied.</para> - - </sect2> - </sect1> - <sect1> - <title>Useful tips for developing with MQ</title> - - <sect2> - <title>Organising patches in directories</title> - - <para>If you're working on a substantial project with MQ, it's - not difficult to accumulate a large number of patches. 
For - example, I have one patch repository that contains over 250 - patches.</para> - - <para>If you can group these patches into separate logical - categories, you can if you like store them in different - directories; MQ has no problems with patch names that contain - path separators.</para> - - </sect2> - <sect2 id="mq-collab:tips:interdiff"> - <title>Viewing the history of a patch</title> - - <para>If you're developing a set of patches over a long time, - it's a good idea to maintain them in a repository, as - discussed in section <xref linkend="sec:mq:repo"/>. If you do - so, you'll quickly - discover that using the <command role="hg-cmd">hg - diff</command> command to look at the history of changes to - a patch is unworkable. This is in part because you're looking - at the second derivative of the real code (a diff of a diff), - but also because MQ adds noise to the process by modifying - time stamps and directory names when it updates a - patch.</para> - - <para>However, you can use the <literal - role="hg-ext">extdiff</literal> extension, which is bundled - with Mercurial, to turn a diff of two versions of a patch into - something readable. To do this, you will need a third-party - package called <literal role="package">patchutils</literal> - <citation>web:patchutils</citation>. This provides a command - named <command>interdiff</command>, which shows the - differences between two diffs as a diff. 
Used on two versions - of the same diff, it generates a diff that represents the diff - from the first to the second version.</para> - - <para>You can enable the <literal - role="hg-ext">extdiff</literal> extension in the usual way, - by adding a line to the <literal - role="rc-extensions">extensions</literal> section of your - <filename role="special">~/.hgrc</filename>.</para> - <programlisting>[extensions] -extdiff =</programlisting> - <para>The <command>interdiff</command> command expects to be - passed the names of two files, but the <literal - role="hg-ext">extdiff</literal> extension passes the program - it runs a pair of directories, each of which can contain an - arbitrary number of files. We thus need a small program that - will run <command>interdiff</command> on each pair of files in - these two directories. This program is available as <filename - role="special">hg-interdiff</filename> in the <filename - class="directory">examples</filename> directory of the - source code repository that accompanies this book. 
<!-- - &example.hg-interdiff; --></para> - - <para>With the <filename role="special">hg-interdiff</filename> - program in your shell's search path, you can run it as - follows, from inside an MQ patch directory:</para> - <programlisting>hg extdiff -p hg-interdiff -r A:B my-change.patch</programlisting> - <para>Since you'll probably want to use this long-winded command - a lot, you can get <literal role="hg-ext">extdiff</literal> to - make it available as a normal Mercurial command, again by - editing your <filename - role="special">~/.hgrc</filename>.</para> - <programlisting>[extdiff] -cmd.interdiff = hg-interdiff</programlisting> - <para>This directs <literal role="hg-ext">extdiff</literal> to - make an <literal>interdiff</literal> command available, so you - can now shorten the previous invocation of <command - role="hg-ext-extdiff">extdiff</command> to something a - little more wieldy.</para> - <programlisting>hg interdiff -r A:B my-change.patch</programlisting> - - <note> - <para> The <command>interdiff</command> command works well - only if the underlying files against which versions of a - patch are generated remain the same. If you create a patch, - modify the underlying files, and then regenerate the patch, - <command>interdiff</command> may not produce useful - output.</para> - </note> - - <para>The <literal role="hg-ext">extdiff</literal> extension is - useful for more than merely improving the presentation of MQ - patches. To read more about it, go to section <xref - linkend="sec:hgext:extdiff"/>.</para> - - </sect2> - </sect1> -</chapter> - -<!-- -local variables: -sgml-parent-document: ("00book.xml" "book" "chapter") -end: --->
--- a/en/ch14-hgext.xml Wed Mar 18 00:08:22 2009 -0700 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,554 +0,0 @@ -<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> - -<chapter id="chap:hgext"> - <?dbhtml filename="adding-functionality-with-extensions.html"?> - <title>Adding functionality with extensions</title> - - <para>While the core of Mercurial is quite complete from a - functionality standpoint, it's deliberately shorn of fancy - features. This approach of preserving simplicity keeps the - software easy to deal with for both maintainers and users.</para> - - <para>However, Mercurial doesn't box you in with an inflexible - command set: you can add features to it as - <emphasis>extensions</emphasis> (sometimes known as - <emphasis>plugins</emphasis>). We've already discussed a few of - these extensions in earlier chapters.</para> - <itemizedlist> - <listitem><para>Section <xref linkend="sec:tour-merge:fetch"/> - covers the <literal role="hg-ext">fetch</literal> extension; - this combines pulling new changes and merging them with local - changes into a single command, <command - role="hg-ext-fetch">fetch</command>.</para> - </listitem> - <listitem><para>In chapter <xref linkend="chap:hook"/>, we covered - several extensions that are useful for hook-related - functionality: <literal role="hg-ext">acl</literal> adds - access control lists; <literal - role="hg-ext">bugzilla</literal> adds integration with the - Bugzilla bug tracking system; and <literal - role="hg-ext">notify</literal> sends notification emails on - new changes.</para> - </listitem> - <listitem><para>The Mercurial Queues patch management extension is - so invaluable that it merits two chapters and an appendix all - to itself. 
Chapter <xref linkend="chap:mq"/> covers the - basics; chapter <xref - linkend="chap:mq-collab"/> discusses advanced topics; - and appendix <xref linkend="chap:mqref"/> goes into detail on - each - command.</para> - </listitem></itemizedlist> - - <para>In this chapter, we'll cover some of the other extensions that - are available for Mercurial, and briefly touch on some of the - machinery you'll need to know about if you want to write an - extension of your own.</para> - <itemizedlist> - <listitem><para>In section <xref linkend="sec:hgext:inotify"/>, - we'll discuss the possibility of <emphasis>huge</emphasis> - performance improvements using the <literal - role="hg-ext">inotify</literal> extension.</para> - </listitem></itemizedlist> - - <sect1 id="sec:hgext:inotify"> - <title>Improve performance with the <literal - role="hg-ext">inotify</literal> extension</title> - - <para>Are you interested in having some of the most common - Mercurial operations run as much as a hundred times faster? - Read on!</para> - - <para>Mercurial has great performance under normal circumstances. - For example, when you run the <command role="hg-cmd">hg - status</command> command, Mercurial has to scan almost every - directory and file in your repository so that it can display - file status. Many other Mercurial commands need to do the same - work behind the scenes; for example, the <command - role="hg-cmd">hg diff</command> command uses the status - machinery to avoid doing an expensive comparison operation on - files that obviously haven't changed.</para> - - <para>Because obtaining file status is crucial to good - performance, the authors of Mercurial have optimised this code - to within an inch of its life. However, there's no avoiding the - fact that when you run <command role="hg-cmd">hg - status</command>, Mercurial is going to have to perform at - least one expensive system call for each managed file to - determine whether it's changed since the last time Mercurial - checked. 
For a sufficiently large repository, this can take a - long time.</para> - - <para>To put a number on the magnitude of this effect, I created a - repository containing 150,000 managed files. I timed <command - role="hg-cmd">hg status</command> as taking ten seconds to - run, even when <emphasis>none</emphasis> of those files had been - modified.</para> - - <para>Many modern operating systems contain a file notification - facility. If a program signs up to an appropriate service, the - operating system will notify it every time a file of interest is - created, modified, or deleted. On Linux systems, the kernel - component that does this is called - <literal>inotify</literal>.</para> - - <para>Mercurial's <literal role="hg-ext">inotify</literal> - extension talks to the kernel's <literal>inotify</literal> - component to optimise <command role="hg-cmd">hg status</command> - commands. The extension has two components. A daemon sits in - the background and receives notifications from the - <literal>inotify</literal> subsystem. It also listens for - connections from a regular Mercurial command. The extension - modifies Mercurial's behaviour so that instead of scanning the - filesystem, it queries the daemon. Since the daemon has perfect - information about the state of the repository, it can respond - with a result instantaneously, avoiding the need to scan every - directory and file in the repository.</para> - - <para>Recall the ten seconds that I measured plain Mercurial as - taking to run <command role="hg-cmd">hg status</command> on a - 150,000 file repository. With the <literal - role="hg-ext">inotify</literal> extension enabled, the time - dropped to 0.1 seconds, a factor of <emphasis>one - hundred</emphasis> faster.</para> - - <para>Before we continue, please pay attention to some - caveats.</para> - <itemizedlist> - <listitem><para>The <literal role="hg-ext">inotify</literal> - extension is Linux-specific. 
Because it interfaces directly - to the Linux kernel's <literal>inotify</literal> subsystem, - it does not work on other operating systems.</para> - </listitem> - <listitem><para>It should work on any Linux distribution that - was released after early 2005. Older distributions are - likely to have a kernel that lacks - <literal>inotify</literal>, or a version of - <literal>glibc</literal> that does not have the necessary - interfacing support.</para> - </listitem> - <listitem><para>Not all filesystems are suitable for use with - the <literal role="hg-ext">inotify</literal> extension. - Network filesystems such as NFS are a non-starter, for - example, particularly if you're running Mercurial on several - systems, all mounting the same network filesystem. The - kernel's <literal>inotify</literal> system has no way of - knowing about changes made on another system. Most local - filesystems (e.g. ext3, XFS, ReiserFS) should work - fine.</para> - </listitem></itemizedlist> - - <para>The <literal role="hg-ext">inotify</literal> extension is - not yet shipped with Mercurial as of May 2007, so it's a little - more involved to set up than other extensions. But the - performance improvement is worth it!</para> - - <para>The extension currently comes in two parts: a set of patches - to the Mercurial source code, and a library of Python bindings - to the <literal>inotify</literal> subsystem.</para> - <note> - <para> There are <emphasis>two</emphasis> Python - <literal>inotify</literal> binding libraries. One of them is - called <literal>pyinotify</literal>, and is packaged by some - Linux distributions as <literal>python-inotify</literal>. 
- This is <emphasis>not</emphasis> the one you'll need, as it is - too buggy and inefficient to be practical.</para> - </note> - <para>To get going, it's best to already have a functioning copy - of Mercurial installed.</para> - <note> - <para> If you follow the instructions below, you'll be - <emphasis>replacing</emphasis> and overwriting any existing - installation of Mercurial that you might already have, using - the latest <quote>bleeding edge</quote> Mercurial code. Don't - say you weren't warned!</para> - </note> - <orderedlist> - <listitem><para>Clone the Python <literal>inotify</literal> - binding repository. Build and install it.</para> - <programlisting>hg clone http://hg.kublai.com/python/inotify -cd inotify -python setup.py build --force -sudo python setup.py install --skip-build</programlisting> - </listitem> - <listitem><para>Clone the <filename - class="directory">crew</filename> Mercurial repository. - Clone the <literal role="hg-ext">inotify</literal> patch - repository so that Mercurial Queues will be able to apply - patches to your copy of the <filename - class="directory">crew</filename> repository.</para> - <programlisting>hg clone http://hg.intevation.org/mercurial/crew -hg clone crew inotify -hg clone http://hg.kublai.com/mercurial/patches/inotify inotify/.hg/patches</programlisting> - </listitem> - <listitem><para>Make sure that you have the Mercurial Queues - extension, <literal role="hg-ext">mq</literal>, enabled.
If - you've never used MQ, read section <xref - linkend="sec:mq:start"/> to get started - quickly.</para> - </listitem> - <listitem><para>Go into the <filename - class="directory">inotify</filename> repo, and apply all - of the <literal role="hg-ext">inotify</literal> patches - using the <option role="hg-ext-mq-cmd-qpush-opt">hg - -a</option> option to the <command - role="hg-ext-mq">qpush</command> command.</para> - <programlisting>cd inotify -hg qpush -a</programlisting> - </listitem> - <listitem><para> If you get an error message from <command - role="hg-ext-mq">qpush</command>, you should not continue. - Instead, ask for help.</para> - </listitem> - <listitem><para>Build and install the patched version of - Mercurial.</para> - <programlisting>python setup.py build --force -sudo python setup.py install --skip-build</programlisting> - </listitem> - </orderedlist> - <para>Once you've built a suitably patched version of Mercurial, - all you need to do to enable the <literal - role="hg-ext">inotify</literal> extension is add an entry to - your <filename role="special">~/.hgrc</filename>.</para> - <programlisting>[extensions] inotify =</programlisting> - <para>When the <literal role="hg-ext">inotify</literal> extension - is enabled, Mercurial will automatically and transparently start - the status daemon the first time you run a command that needs - status in a repository. It runs one status daemon per - repository.</para> - - <para>The status daemon is started silently, and runs in the - background.
If you look at a list of running processes after - you've enabled the <literal role="hg-ext">inotify</literal> - extension and run a few commands in different repositories, - you'll thus see a few <literal>hg</literal> processes sitting - around, waiting for updates from the kernel and queries from - Mercurial.</para> - - <para>The first time you run a Mercurial command in a repository - when you have the <literal role="hg-ext">inotify</literal> - extension enabled, it will run with about the same performance - as a normal Mercurial command. This is because the status - daemon needs to perform a normal status scan so that it has a - baseline against which to apply later updates from the kernel. - However, <emphasis>every</emphasis> subsequent command that does - any kind of status check should be noticeably faster on - repositories of even fairly modest size. Better yet, the bigger - your repository is, the greater a performance advantage you'll - see. The <literal role="hg-ext">inotify</literal> daemon makes - status operations almost instantaneous on repositories of all - sizes!</para> - - <para>If you like, you can manually start a status daemon using - the <command role="hg-ext-inotify">inserve</command> command. - This gives you slightly finer control over how the daemon ought - to run. This command will of course only be available when the - <literal role="hg-ext">inotify</literal> extension is - enabled.</para> - - <para>When you're using the <literal - role="hg-ext">inotify</literal> extension, you should notice - <emphasis>no difference at all</emphasis> in Mercurial's - behaviour, with the sole exception of status-related commands - running a whole lot faster than they used to. You should - specifically expect that commands will not print different - output; neither should they give different results. 
If either of - these situations occurs, please report a bug.</para> - - </sect1> - <sect1 id="sec:hgext:extdiff"> - <title>Flexible diff support with the <literal - role="hg-ext">extdiff</literal> extension</title> - - <para>Mercurial's built-in <command role="hg-cmd">hg - diff</command> command outputs plaintext unified diffs.</para> - - &interaction.extdiff.diff; - - <para>If you would like to use an external tool to display - modifications, you'll want to use the <literal - role="hg-ext">extdiff</literal> extension. This will let you - use, for example, a graphical diff tool.</para> - - <para>The <literal role="hg-ext">extdiff</literal> extension is - bundled with Mercurial, so it's easy to set up. In the <literal - role="rc-extensions">extensions</literal> section of your - <filename role="special">~/.hgrc</filename>, simply add a - one-line entry to enable the extension.</para> - <programlisting>[extensions] -extdiff =</programlisting> - <para>This introduces a command named <command - role="hg-ext-extdiff">extdiff</command>, which by default uses - your system's <command>diff</command> command to generate a - unified diff in the same form as the built-in <command - role="hg-cmd">hg diff</command> command.</para> - - &interaction.extdiff.extdiff; - - <para>The result won't be exactly the same as with the built-in - <command role="hg-cmd">hg diff</command> variations, because the - output of <command>diff</command> varies from one system to - another, even when passed the same options.</para> - - <para>As the <quote><literal>making snapshot</literal></quote> - lines of output above imply, the <command - role="hg-ext-extdiff">extdiff</command> command works by - creating two snapshots of your source tree. The first snapshot - is of the source revision; the second, of the target revision or - working directory. 
The <command - role="hg-ext-extdiff">extdiff</command> command generates - these snapshots in a temporary directory, passes the name of - each directory to an external diff viewer, then deletes the - temporary directory. For efficiency, it only snapshots the - directories and files that have changed between the two - revisions.</para> - - <para>Snapshot directory names have the same base name as your - repository. If your repository path is <filename - class="directory">/quux/bar/foo</filename>, then <filename - class="directory">foo</filename> will be the name of each - snapshot directory. Each snapshot directory name has its - changeset ID appended, if appropriate. If a snapshot is of - revision <literal>a631aca1083f</literal>, the directory will be - named <filename class="directory">foo.a631aca1083f</filename>. - A snapshot of the working directory won't have a changeset ID - appended, so it would just be <filename - class="directory">foo</filename> in this example. To see what - this looks like in practice, look again at the <command - role="hg-ext-extdiff">extdiff</command> example above. Notice - that the diff has the snapshot directory names embedded in its - header.</para> - - <para>The <command role="hg-ext-extdiff">extdiff</command> command - accepts two important options. The <option - role="hg-ext-extdiff-cmd-extdiff-opt">hg -p</option> option - lets you choose a program to view differences with, instead of - <command>diff</command>. With the <option - role="hg-ext-extdiff-cmd-extdiff-opt">hg -o</option> option, - you can change the options that <command - role="hg-ext-extdiff">extdiff</command> passes to the program - (by default, these options are - <quote><literal>-Npru</literal></quote>, which only make sense - if you're running <command>diff</command>). 
In other respects,
- the <command role="hg-ext-extdiff">extdiff</command> command
- acts similarly to the built-in <command role="hg-cmd">hg
- diff</command> command: you use the same option names, syntax,
- and arguments to specify the revisions you want, the files you
- want, and so on.</para>
-
- <para>As an example, here's how to run the normal system
- <command>diff</command> command, getting it to generate context
- diffs (using the <option role="cmd-opt-diff">-c</option> option)
- instead of unified diffs, and five lines of context instead of
- the default three (passing <literal>5</literal> as the argument
- to the <option role="cmd-opt-diff">-C</option> option).</para>
-
- &interaction.extdiff.extdiff-ctx;
-
- <para>Launching a visual diff tool is just as easy. Here's how to
- launch the <command>kdiff3</command> viewer.</para>
- <programlisting>hg extdiff -p kdiff3 -o</programlisting>
-
- <para>If your diff viewing command can't deal with directories,
- you can easily work around this with a little scripting. For an
- example of such scripting in action with the <literal
- role="hg-ext">mq</literal> extension and the
- <command>interdiff</command> command, see section <xref
- linkend="mq-collab:tips:interdiff"/>.</para>
-
- <sect2>
- <title>Defining command aliases</title>
-
- <para>It can be cumbersome to remember the options to both the
- <command role="hg-ext-extdiff">extdiff</command> command and
- the diff viewer you want to use, so the <literal
- role="hg-ext">extdiff</literal> extension lets you define
- <emphasis>new</emphasis> commands that will invoke your diff
- viewer with exactly the right options.</para>
-
- <para>All you need to do is edit your <filename
- role="special">~/.hgrc</filename>, and add a section named
- <literal role="rc-extdiff">extdiff</literal>. Inside this
- section, you can define multiple commands. Here's how to add
- a <literal>kdiff3</literal> command.
Once you've defined
- this, you can type <quote><literal>hg kdiff3</literal></quote>
- and the <literal role="hg-ext">extdiff</literal> extension
- will run <command>kdiff3</command> for you.</para>
- <programlisting>[extdiff]
-cmd.kdiff3 =</programlisting>
- <para>If you leave the right hand side of the definition empty,
- as above, the <literal role="hg-ext">extdiff</literal>
- extension uses the name of the command you defined as the name
- of the external program to run. But these names don't have to
- be the same. Here, we define a command named
- <quote><literal>hg wibble</literal></quote>, which runs
- <command>kdiff3</command>.</para>
- <programlisting>[extdiff]
- cmd.wibble = kdiff3</programlisting>
-
- <para>You can also specify the default options that you want to
- invoke your diff viewing program with. The prefix to use is
- <quote><literal>opts.</literal></quote>, followed by the name
- of the command to which the options apply. This example
- defines a <quote><literal>hg vimdiff</literal></quote> command
- that runs the <command>vim</command> editor's
- <literal>DirDiff</literal> extension.</para>
- <programlisting>[extdiff]
- cmd.vimdiff = vim
-opts.vimdiff = -f '+next' '+execute "DirDiff" argv(0) argv(1)'</programlisting>
-
- </sect2>
- </sect1>
- <sect1 id="sec:hgext:transplant">
- <title>Cherrypicking changes with the <literal
- role="hg-ext">transplant</literal> extension</title>
-
- <para>Need to have a long chat with Brendan about this.</para>
-
- </sect1>
- <sect1 id="sec:hgext:patchbomb">
- <title>Send changes via email with the <literal
- role="hg-ext">patchbomb</literal> extension</title>
-
- <para>Many projects have a culture of <quote>change
- review</quote>, in which people send their modifications to a
- mailing list for others to read and comment on before they
- commit the final version to a shared repository.
Some projects
- have people who act as gatekeepers; they apply changes from
- other people to a repository to which those others don't have
- access.</para>
-
- <para>Mercurial makes it easy to send changes over email for
- review or application, via its <literal
- role="hg-ext">patchbomb</literal> extension. The extension is
- so named because changes are formatted as patches, and it's usual
- to send one changeset per email message. Sending a long series
- of changes by email is thus much like <quote>bombing</quote> the
- recipient's inbox, hence <quote>patchbomb</quote>.</para>
-
- <para>As usual, the basic configuration of the <literal
- role="hg-ext">patchbomb</literal> extension takes just one or
- two lines in your <filename role="special">
- /.hgrc</filename>.</para>
- <programlisting>[extensions]
-patchbomb =</programlisting>
- <para>Once you've enabled the extension, you will have a new
- command available, named <command
- role="hg-ext-patchbomb">email</command>.</para>
-
- <para>The safest and best way to invoke the <command
- role="hg-ext-patchbomb">email</command> command is to
- <emphasis>always</emphasis> run it first with the <option
- role="hg-ext-patchbomb-cmd-email-opt">hg -n</option> option.
- This will show you what the command <emphasis>would</emphasis>
- send, without actually sending anything. Once you've had a
- quick glance over the changes and verified that you are sending
- the right ones, you can rerun the same command, with the <option
- role="hg-ext-patchbomb-cmd-email-opt">hg -n</option> option
- removed.</para>
-
- <para>The <command role="hg-ext-patchbomb">email</command> command
- accepts the same kind of revision syntax as every other
- Mercurial command. For example, this command will send every
- revision between 7 and <literal>tip</literal>, inclusive.</para>
- <programlisting>hg email -n 7:tip</programlisting>
- <para>You can also specify a <emphasis>repository</emphasis> to
- compare with.
If you provide a repository but no revisions, the
- <command role="hg-ext-patchbomb">email</command> command will
- send all revisions in the local repository that are not present
- in the remote repository. If you additionally specify revisions
- or a branch name (the latter using the <option
- role="hg-ext-patchbomb-cmd-email-opt">hg -b</option> option),
- this will constrain the revisions sent.</para>
-
- <para>It's perfectly safe to run the <command
- role="hg-ext-patchbomb">email</command> command without the
- names of the people you want to send to: if you do this, it will
- just prompt you for those values interactively. (If you're
- using a Linux or Unix-like system, you should have enhanced
- <literal>readline</literal>-style editing capabilities when
- entering those headers, too, which is useful.)</para>
-
- <para>When you are sending just one revision, the <command
- role="hg-ext-patchbomb">email</command> command will by
- default use the first line of the changeset description as the
- subject of the single email message it sends.</para>
-
- <para>If you send multiple revisions, the <command
- role="hg-ext-patchbomb">email</command> command will usually
- send one message per changeset. It will preface the series with
- an introductory message, in which you should describe the
- purpose of the series of changes you're sending.</para>
-
- <sect2>
- <title>Changing the behaviour of patchbombs</title>
-
- <para>Not every project has exactly the same conventions for
- sending changes in email; the <literal
- role="hg-ext">patchbomb</literal> extension tries to
- accommodate a number of variations through command line
- options.</para>
- <itemizedlist>
- <listitem><para>You can write a subject for the introductory
- message on the command line using the <option
- role="hg-ext-patchbomb-cmd-email-opt">hg -s</option>
- option.
This takes one argument, the text of the subject
- to use.</para>
- </listitem>
- <listitem><para>To change the email address from which the
- messages originate, use the <option
- role="hg-ext-patchbomb-cmd-email-opt">hg -f</option>
- option. This takes one argument, the email address to
- use.</para>
- </listitem>
- <listitem><para>The default behaviour is to send unified diffs
- (see section <xref linkend="sec:mq:patch"/> for a
- description of the
- format), one per message. You can send a binary bundle
- instead with the <option
- role="hg-ext-patchbomb-cmd-email-opt">hg -b</option>
- option.</para>
- </listitem>
- <listitem><para>Unified diffs are normally prefaced with a
- metadata header. You can omit this, and send unadorned
- diffs, with the <option
- role="hg-ext-patchbomb-cmd-email-opt">hg
- --plain</option> option.</para>
- </listitem>
- <listitem><para>Diffs are normally sent <quote>inline</quote>,
- in the same body part as the description of a patch. This
- makes it easiest for the largest number of readers to
- quote and respond to parts of a diff, as some mail clients
- will only quote the first MIME body part in a message. If
- you'd prefer to send the description and the diff in
- separate body parts, use the <option
- role="hg-ext-patchbomb-cmd-email-opt">hg -a</option>
- option.</para>
- </listitem>
- <listitem><para>Instead of sending mail messages, you can
- write them to an <literal>mbox</literal>-format mail
- folder using the <option
- role="hg-ext-patchbomb-cmd-email-opt">hg -m</option>
- option. That option takes one argument, the name of the
- file to write to.</para>
- </listitem>
- <listitem><para>If you would like to add a
- <command>diffstat</command>-format summary to each patch,
- and one to the introductory message, use the <option
- role="hg-ext-patchbomb-cmd-email-opt">hg -d</option>
- option.
The <command>diffstat</command> command displays
- a table containing the name of each file patched, the
- number of lines affected, and a histogram showing how much
- each file is modified. This gives readers a qualitative
- glance at how complex a patch is.</para>
- </listitem></itemizedlist>
-
- </sect2>
- </sect1>
-</chapter>
-
-<!--
-local variables:
-sgml-parent-document: ("00book.xml" "book" "chapter")
-end:
--->