Mercurial: The Definitive Guide

# HG changeset patch # User Bryan O'Sullivan # Date 1233902748 28800 # Node ID 863a82f13901e1b6bdd2817e6517550a7a3f66df # Parent cf006cabe489e775bbdbdb3845bb5625b779785b Basic progress on XML. diff -r cf006cabe489 -r 863a82f13901 en/00book.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/00book.xml Thu Feb 05 22:45:48 2009 -0800 @@ -0,0 +1,43 @@ + + + + + + + + + + + + +%SHORTCUTS; +]> + + + Mercurial: The Definitive Guide + + + + Bryan + O'Sullivan + + + + + Mike + Loukides + + + + 2007 + 2008 + Bryan O'Sullivan + + + + &ch01; + &ch02; + diff -r cf006cabe489 -r 863a82f13901 en/ch01-intro.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch01-intro.xml Thu Feb 05 22:45:48 2009 -0800 @@ -0,0 +1,680 @@ + + + + Introduction + \label{chap:intro} + + + About revision control + + Revision control is the process of managing multiple + versions of a piece of information. In its simplest form, this + is something that many people do by hand: every time you modify + a file, save it under a new name that contains a number, each + one higher than the number of the preceding version. + + Manually managing multiple versions of even a single file is + an error-prone task, though, so software tools to help automate + this process have long been available. The earliest automated + revision control tools were intended to help a single user to + manage revisions of a single file. Over the past few decades, + the scope of revision control tools has expanded greatly; they + now manage multiple files, and help multiple people to work + together. The best modern revision control tools have no + problem coping with thousands of people working together on + projects that consist of hundreds of thousands of files. + + + Why use revision control? + + There are a number of reasons why you or your team might + want to use an automated revision control tool for a + project. + + It will track the history and evolution of + your project, so you don't have to. 
For every change, + you'll have a log of who made it; + why they made it; + when they made it; and + what the change + was. + When you're working with other people, + revision control software makes it easier for you to + collaborate. For example, when people more or less + simultaneously make potentially incompatible changes, the + software will help you to identify and resolve those + conflicts. + It can help you to recover from mistakes. If + you make a change that later turns out to be in error, you + can revert to an earlier version of one or more files. In + fact, a really good revision control + tool will even help you to efficiently figure out exactly + when a problem was introduced (see section for details). + It will help you to work simultaneously on, + and manage the drift between, multiple versions of your + project. + Most of these reasons are equally valid---at least in + theory---whether you're working on a project by yourself, or + with a hundred other people. + + A key question about the practicality of revision control + at these two different scales (lone hacker and + huge team) is how its + benefits compare to its + costs. A revision control tool that's + difficult to understand or use is going to impose a high + cost. + + A five-hundred-person project is likely to collapse under + its own weight almost immediately without a revision control + tool and process. In this case, the cost of using revision + control might hardly seem worth considering, since + without it, failure is almost + guaranteed. + + On the other hand, a one-person quick hack + might seem like a poor place to use a revision control tool, + because surely the cost of using one must be close to the + overall cost of the project. Right? + + Mercurial uniquely supports both of + these scales of development. You can learn the basics in just + a few minutes, and due to its low overhead, you can apply + revision control to the smallest of projects with ease. 
Its + simplicity means you won't have a lot of abstruse concepts or + command sequences competing for mental space with whatever + you're really trying to do. At the same + time, Mercurial's high performance and peer-to-peer nature let + you scale painlessly to handle large projects. + + No revision control tool can rescue a poorly run project, + but a good choice of tools can make a huge difference to the + fluidity with which you can work on a project. + + + + The many names of revision control + + Revision control is a diverse field, so much so that it + doesn't actually have a single name or acronym. Here are a + few of the more common names and acronyms you'll + encounter: + + Revision control (RCS) + Software configuration management (SCM), or + configuration management + Source code management + Source code control, or source + control + Version control + (VCS) + Some people claim that these terms actually have different + meanings, but in practice they overlap so much that there's no + agreed or even useful way to tease them apart. + + + + + A short history of revision control + + The best known of the old-time revision control tools is + SCCS (Source Code Control System), which Marc Rochkind wrote at + Bell Labs, in the early 1970s. SCCS operated on individual + files, and required every person working on a project to have + access to a shared workspace on a single system. Only one + person could modify a file at any time; arbitration for access + to files was via locks. It was common for people to lock files, + and later forget to unlock them, preventing anyone else from + modifying those files without the help of an + administrator. + + Walter Tichy developed a free alternative to SCCS in the + early 1980s; he called his program RCS (Revision Control System). + Like SCCS, RCS required developers to work in a single shared + workspace, and to lock files to prevent multiple people from + modifying them simultaneously. 
+ + Later in the 1980s, Dick Grune used RCS as a building block + for a set of shell scripts he initially called cmt, but then + renamed to CVS (Concurrent Versions System). The big innovation + of CVS was that it let developers work simultaneously and + somewhat independently in their own personal workspaces. The + personal workspaces prevented developers from stepping on each + other's toes all the time, as was common with SCCS and RCS. Each + developer had a copy of every project file, and could modify + their copies independently. They had to merge their edits prior + to committing changes to the central repository. + + Brian Berliner took Grune's original scripts and rewrote + them in C, releasing in 1989 the code that has since developed + into the modern version of CVS. CVS subsequently acquired the + ability to operate over a network connection, giving it a + client/server architecture. CVS's architecture is centralised; + only the server has a copy of the history of the project. Client + workspaces just contain copies of recent versions of the + project's files, and a little metadata to tell them where the + server is. CVS has been enormously successful; it is probably + the world's most widely used revision control system. + + In the early 1990s, Sun Microsystems developed an early + distributed revision control system, called TeamWare. A + TeamWare workspace contains a complete copy of the project's + history. TeamWare has no notion of a central repository. (CVS + relied upon RCS for its history storage; TeamWare used + SCCS.) + + As the 1990s progressed, awareness grew of a number of + problems with CVS. It records simultaneous changes to multiple + files individually, instead of grouping them together as a + single logically atomic operation. It does not manage its file + hierarchy well; it is easy to make a mess of a repository by + renaming files and directories. 
Worse, its source code is + difficult to read and maintain, which made the pain + level of fixing these architectural problems + prohibitive. + + In 2001, Jim Blandy and Karl Fogel, two developers who had + worked on CVS, started a project to replace it with a tool that + would have a better architecture and cleaner code. The result, + Subversion, does not stray from CVS's centralised client/server + model, but it adds multi-file atomic commits, better namespace + management, and a number of other features that make it a + generally better tool than CVS. Since its initial release, it + has rapidly grown in popularity. + + More or less simultaneously, Graydon Hoare began working on + an ambitious distributed revision control system that he named + Monotone. While Monotone addresses many of CVS's design flaws + and has a peer-to-peer architecture, it goes beyond earlier (and + subsequent) revision control tools in a number of innovative + ways. It uses cryptographic hashes as identifiers, and has an + integral notion of trust for code from different + sources. + + Mercurial began life in 2005. While a few aspects of its + design are influenced by Monotone, Mercurial focuses on ease of + use, high performance, and scalability to very large + projects. + + + + Trends in revision control + + There has been an unmistakable trend in the development and + use of revision control tools over the past four decades, as + people have become familiar with the capabilities of their tools + and constrained by their limitations. + + The first generation began by managing single files on + individual computers. Although these tools represented a huge + advance over ad-hoc manual revision control, their locking model + and reliance on a single computer limited them to small, + tightly-knit teams. + + The second generation loosened these constraints by moving + to network-centered architectures, and managing entire projects + at a time. 
As projects grew larger, they ran into new problems. + With clients needing to talk to servers very frequently, server + scaling became an issue for large projects. An unreliable + network connection could prevent remote users from being able to + talk to the server at all. As open source projects started + making read-only access available anonymously to anyone, people + without commit privileges found that they could not use the + tools to interact with a project in a natural way, as they could + not record their changes. + + The current generation of revision control tools is + peer-to-peer in nature. All of these systems have dropped the + dependency on a single central server, and allow people to + distribute their revision control data to where it's actually + needed. Collaboration over the Internet has moved from + being constrained by technology to being a matter of choice and consensus. + Modern tools can operate offline indefinitely and autonomously, + with a network connection only needed when syncing changes with + another repository. + + + + A few of the advantages of distributed revision + control + + Even though distributed revision control tools have for + several years been as robust and usable as their + previous-generation counterparts, people using older tools have + not yet necessarily woken up to their advantages. There are a + number of ways in which distributed tools shine relative to + centralised ones. + + For an individual developer, distributed tools are almost + always much faster than centralised tools. This is for a simple + reason: a centralised tool needs to talk over the network for + many common operations, because most metadata is stored in a + single copy on the central server. A distributed tool stores + all of its metadata locally. All else being equal, talking over + the network adds overhead to a centralised tool. 
Don't + underestimate the value of a snappy, responsive tool: you're + going to spend a lot of time interacting with your revision + control software. + + Distributed tools are indifferent to the vagaries of your + server infrastructure, again because they replicate metadata to + so many locations. If you use a centralised system and your + server catches fire, you'd better hope that your backup media + are reliable, and that your last backup was recent and actually + worked. With a distributed tool, you have many backups + available on every contributor's computer. + + The reliability of your network will affect distributed + tools far less than it will centralised tools. You can't even + use a centralised tool without a network connection, except for + a few highly constrained commands. With a distributed tool, if + your network connection goes down while you're working, you may + not even notice. The only thing you won't be able to do is talk + to repositories on other computers, something that is relatively + rare compared with local operations. If you have a far-flung + team of collaborators, this may be significant. + + + Advantages for open source projects + + If you take a shine to an open source project and decide + that you would like to start hacking on it, and that project + uses a distributed revision control tool, you are at once a + peer with the people who consider themselves the + core of that project. If they publish their + repositories, you can immediately copy their project history, + start making changes, and record your work, using the same + tools in the same ways as insiders. By contrast, with a + centralised tool, you must use the software in a read + only mode unless someone grants you permission to + commit changes to their central server. Until then, you won't + be able to record changes, and your local modifications will + be at risk of corruption any time you try to update your + client's view of the repository. 
+ + + The forking non-problem + + It has been suggested that distributed revision control + tools pose some sort of risk to open source projects because + they make it easy to fork the development of + a project. A fork happens when there are differences in + opinion or attitude between groups of developers that cause + them to decide that they can't work together any longer. + Each side takes a more or less complete copy of the + project's source code, and goes off in its own + direction. + + Sometimes the camps in a fork decide to reconcile their + differences. With a centralised revision control system, the + technical process of reconciliation is + painful, and has to be performed largely by hand. You have + to decide whose revision history is going to + win, and graft the other team's changes into + the tree somehow. This usually loses some or all of one + side's revision history. + + What distributed tools do with respect to forking is + they make forking the only way to + develop a project. Every single change that you make is + potentially a fork point. The great strength of this + approach is that a distributed revision control tool has to + be really good at merging forks, + because forks are absolutely fundamental: they happen all + the time. + + If every piece of work that everybody does, all the + time, is framed in terms of forking and merging, then what + the open source world refers to as a fork + becomes purely a social issue. If + anything, distributed tools lower the + likelihood of a fork: + + They eliminate the social distinction that + centralised tools impose: that between insiders (people + with commit access) and outsiders (people + without). + They make it easier to reconcile after a + social fork, because all that's involved from the + perspective of the revision control software is just + another merge. 
+ + Some people resist distributed tools because they want + to retain tight control over their projects, and they + believe that centralised tools give them this control. + However, if you're of this belief, and you publish your CVS + or Subversion repositories publicly, there are plenty of + tools available that can pull out your entire project's + history (albeit slowly) and recreate it somewhere that you + don't control. So while your control in this case is + illusory, you are forgoing the ability to fluidly + collaborate with whatever people feel compelled to mirror + and fork your history. + + + + + Advantages for commercial projects + + Many commercial projects are undertaken by teams that are + scattered across the globe. Contributors who are far from a + central server will see slower command execution and perhaps + less reliability. Commercial revision control systems attempt + to ameliorate these problems with remote-site replication + add-ons that are typically expensive to buy and cantankerous + to administer. A distributed system doesn't suffer from these + problems in the first place. Better yet, you can easily set + up multiple authoritative servers, say one per site, so that + there's no redundant communication between repositories over + expensive long-haul network links. + + Centralised revision control systems tend to have + relatively low scalability. It's not unusual for an expensive + centralised system to fall over under the combined load of + just a few dozen concurrent users. Once again, the typical + response tends to be an expensive and clunky replication + facility. Since the load on a central server---if you have + one at all---is many times lower with a distributed tool + (because all of the data is replicated everywhere), a single + cheap server can handle the needs of a much larger team, and + replication to balance load becomes a simple matter of + scripting. 
+ + If you have an employee in the field, troubleshooting a + problem at a customer's site, they'll benefit from distributed + revision control. The tool will let them generate custom + builds, try different fixes in isolation from each other, and + search efficiently through history for the sources of bugs and + regressions in the customer's environment, all without needing + to connect to your company's network. + + + + + Why choose Mercurial? + + Mercurial has a unique set of properties that make it a + particularly good choice as a revision control system. + + It is easy to learn and use. + It is lightweight. + It scales excellently. + It is easy to + customise. + + If you are at all familiar with revision control systems, + you should be able to get up and running with Mercurial in less + than five minutes. Even if not, it will take no more than a few + minutes longer. Mercurial's command and feature sets are + generally uniform and consistent, so you can keep track of a few + general rules instead of a host of exceptions. + + On a small project, you can start working with Mercurial in + moments. Creating new changes and branches; transferring changes + around (whether locally or over a network); and history and + status operations are all fast. Mercurial attempts to stay + nimble and largely out of your way by combining low cognitive + overhead with blazingly fast operations. + + The usefulness of Mercurial is not limited to small + projects: it is used by projects with hundreds to thousands of + contributors, each containing tens of thousands of files and + hundreds of megabytes of source code. + + If the core functionality of Mercurial is not enough for + you, it's easy to build on. Mercurial is well suited to + scripting tasks, and its clean internals and implementation in + Python make it easy to add features in the form of extensions. 
+ There are a number of popular and useful extensions already + available, ranging from helping to identify bugs to improving + performance. + + + + Mercurial compared with other tools + + Before you read on, please understand that this section + necessarily reflects my own experiences, interests, and (dare I + say it) biases. I have used every one of the revision control + tools listed below, in most cases for several years at a + time. + + + + Subversion + + Subversion is a popular revision control tool, developed + to replace CVS. It has a centralised client/server + architecture. + + Subversion and Mercurial have similarly named commands for + performing the same operations, so if you're familiar with + one, it is easy to learn to use the other. Both tools are + portable to all popular operating systems. + + Prior to version 1.5, Subversion had no useful support for + merges. At the time of writing, its merge tracking capability + is new, and known to be complicated + and buggy. + + Mercurial has a substantial performance advantage over + Subversion on every revision control operation I have + benchmarked. I have measured its advantage as ranging from a + factor of two to a factor of six when compared with Subversion + 1.4.3's ra_local file store, which is the + fastest access method available. In more realistic + deployments involving a network-based store, Subversion will + be at a substantially larger disadvantage. Because many + Subversion commands must talk to the server and Subversion + does not have useful replication facilities, server capacity + and network bandwidth become bottlenecks for modestly large + projects. + + Additionally, Subversion incurs substantial storage + overhead to avoid network transactions for a few common + operations, such as finding modified files + (status) and displaying modifications + against the current revision (diff). 
As a + result, a Subversion working copy is often the same size as, + or larger than, a Mercurial repository and working directory, + even though the Mercurial repository contains a complete + history of the project. + + Subversion is widely supported by third-party tools. + Mercurial currently lags considerably in this area. This gap + is closing, however, and indeed some of Mercurial's GUI tools + now outshine their Subversion equivalents. Like Mercurial, + Subversion has an excellent user manual. + + Because Subversion doesn't store revision history on the + client, it is well suited to managing projects that deal with + lots of large, opaque binary files. If you check in fifty + revisions to an incompressible 10MB file, Subversion's + client-side space usage stays constant. The space used by any + distributed SCM will grow rapidly in proportion to the number + of revisions, because the differences between each revision + are large. + + In addition, it's often difficult or, more usually, + impossible to merge different versions of a binary file. + Subversion's ability to let a user lock a file, so that they + temporarily have the exclusive right to commit changes to it, + can be a significant advantage to a project where binary files + are widely used. + + Mercurial can import revision history from a Subversion + repository. It can also export revision history to a + Subversion repository. This makes it easy to test the + waters and use Mercurial and Subversion in parallel + before deciding to switch. History conversion is incremental, + so you can perform an initial conversion, then small + additional conversions afterwards to bring in new + changes. + + + + + Git + + Git is a distributed revision control tool that was + developed for managing the Linux kernel source tree. Like + Mercurial, its early design was somewhat influenced by + Monotone. + + Git has a very large command set, with version 1.5.0 + providing 139 individual commands. 
It has something of a + reputation for being difficult to learn. Compared to Git, + Mercurial has a strong focus on simplicity. + + In terms of performance, Git is extremely fast. In + several cases, it is faster than Mercurial, at least on Linux, + while Mercurial performs better on other operations. However, + on Windows, the performance and general level of support that + Git provides is, at the time of writing, far behind that of + Mercurial. + + While a Mercurial repository needs no maintenance, a Git + repository requires frequent manual repacks of + its metadata. Without these, performance degrades, while + space usage grows rapidly. A server that contains many Git + repositories that are not rigorously and frequently repacked + will become heavily disk-bound during backups, and there have + been instances of daily backups taking far longer than 24 + hours as a result. A freshly packed Git repository is + slightly smaller than a Mercurial repository, but an unpacked + repository is several orders of magnitude larger. + + The core of Git is written in C. Many Git commands are + implemented as shell or Perl scripts, and the quality of these + scripts varies widely. I have encountered several instances + where scripts charged along blindly in the presence of errors + that should have been fatal. + + Mercurial can import revision history from a Git + repository. + + + + + CVS + + CVS is probably the most widely used revision control tool + in the world. Due to its age and internal untidiness, it has + been only lightly maintained for many years. + + It has a centralised client/server architecture. It does + not group related file changes into atomic commits, making it + easy for people to break the build: one person + can successfully commit part of a change and then be blocked + by the need for a merge, causing other people to see only a + portion of the work they intended to do. This also affects + how you work with project history. 
If you want to see all of + the modifications someone made as part of a task, you will + need to manually inspect the descriptions and timestamps of + the changes made to each file involved (if you even know what + those files were). + + CVS has a muddled notion of tags and branches that I will + not attempt to even describe. It does not support renaming of + files or directories well, making it easy to corrupt a + repository. It has almost no internal consistency checking + capabilities, so it is usually not even possible to tell + whether or how a repository is corrupt. I would not recommend + CVS for any project, existing or new. + + Mercurial can import CVS revision history. However, there + are a few caveats that apply; these are true of every other + revision control tool's CVS importer, too. Due to CVS's lack + of atomic changes and unversioned filesystem hierarchy, it is + not possible to reconstruct CVS history completely accurately; + some guesswork is involved, and renames will usually not show + up. Because a lot of advanced CVS administration has to be + done by hand and is hence error-prone, it's common for CVS + importers to run into multiple problems with corrupted + repositories (completely bogus revision timestamps and files + that have remained locked for over a decade are just two of + the less interesting problems I can recall from personal + experience). + + Mercurial can import revision history from a CVS + repository. + + + + + Commercial tools + + Perforce has a centralised client/server architecture, + with no client-side caching of any data. Unlike modern + revision control tools, Perforce requires that a user run a + command to inform the server about every file they intend to + edit. + + The performance of Perforce is quite good for small teams, + but it falls off rapidly as the number of users grows beyond a + few dozen. 
Modestly large Perforce installations require the + deployment of proxies to cope with the load their users + generate. + + + + + Choosing a revision control tool + + With the exception of CVS, all of the tools listed above + have unique strengths that suit them to particular styles of + work. There is no single revision control tool that is best + in all situations. + + As an example, Subversion is a good choice for working + with frequently edited binary files, due to its centralised + nature and support for file locking. + + I personally find Mercurial's properties of simplicity, + performance, and good merge support to be a compelling + combination that has served me well for several years. + + + + + + Switching from another tool to Mercurial + + Mercurial is bundled with an extension named convert, which can incrementally + import revision history from several other revision control + tools. By incremental, I mean that you can + convert all of a project's history to date in one go, then rerun + the conversion later to obtain new changes that happened after + the initial conversion. + + The revision control tools supported by convert are as follows: + + Subversion + CVS + Git + Darcs + + In addition, convert can + export changes from Mercurial to Subversion. This makes it + possible to try Subversion and Mercurial in parallel before + committing to a switchover, without risking the loss of any + work. + + The convert command + is easy to use. Simply point it at the path or URL of the + source repository, optionally give it the name of the + destination repository, and it will start working. After the + initial conversion, just run the same command again to import + new changes. 
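The invocation described above (a source path or URL, then an optional destination name) can be sketched in Python as a small helper that builds the argument list. This is only an illustration under stated assumptions: the helper name `build_convert_command` and the example paths are hypothetical, and the `--config extensions.convert=` switch is used here to enable the bundled extension for a single run rather than through a configuration file.

```python
import shlex


def build_convert_command(source, dest=None):
    """Return the argument list for one `hg convert` run.

    `source` is the path or URL of the repository to import from;
    `dest` is the optional name of the Mercurial repository to create.
    """
    # convert is bundled as an extension, so enable it for this one
    # invocation instead of relying on the user's configuration.
    cmd = ["hg", "--config", "extensions.convert=", "convert", source]
    if dest is not None:
        cmd.append(dest)
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only. Running the same
    # command a second time is how new changes are brought in later.
    print(shlex.join(build_convert_command("/path/to/svn/repo", "project-hg")))
```

Passing the resulting list to `subprocess.run` would perform the conversion; the helper only builds the command, so it can be inspected first.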
+ + + + diff -r cf006cabe489 -r 863a82f13901 en/ch02-tour-basic.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch02-tour-basic.xml Thu Feb 05 22:45:48 2009 -0800 @@ -0,0 +1,787 @@ + + + + A tour of Mercurial: the basics + \label{chap:tour-basic} + + + Installing Mercurial on your system + \label{sec:tour:install} + + Prebuilt binary packages of Mercurial are available for + every popular operating system. These make it easy to start + using Mercurial on your computer immediately. + + + Linux + + Because each Linux distribution has its own packaging + tools, policies, and rate of development, it's difficult to + give a comprehensive set of instructions on how to install + Mercurial binaries. The version of Mercurial that you will + end up with can vary depending on how active the person is who + maintains the package for your distribution. + + To keep things simple, I will focus on installing + Mercurial from the command line under the most popular Linux + distributions. Most of these distributions provide graphical + package managers that will let you install Mercurial with a + single click; the package name to look for is + mercurial. + + + Debian: + apt-get install + mercurial + Fedora Core: + yum install + mercurial + Gentoo: + emerge mercurial + OpenSUSE: + zypper install + mercurial + Ubuntu: Ubuntu's Mercurial package is based on + Debian's. To install it, run the following + command. + apt-get install + mercurial + + + + + Solaris + + SunFreeWare, at http://www.sunfreeware.com, + is a good source for a large number of pre-built Solaris + packages for 32 and 64 bit Intel and Sparc architectures, + including current versions of Mercurial. + + + + Mac OS X + + Lee Cantey publishes an installer of Mercurial for Mac OS + X at http://mercurial.berkwood.com. + This package works on both Intel- and Power-based Macs. + Before you can use it, you must install a compatible version + of Universal MacPython web:macpython. 
+ This is easy to do; simply follow the instructions on Lee's + site. + + It's also possible to install Mercurial using Fink or + MacPorts, two popular free package managers for Mac OS X. If + you have Fink, use sudo apt-get install + mercurial-py25. If MacPorts, sudo port + install mercurial. + + + + Windows + + Lee Cantey publishes an installer of Mercurial for Windows + at http://mercurial.berkwood.com. + This package has no external dependencies; it just + works. + + + The Windows version of Mercurial does not + automatically convert line endings between Windows and Unix + styles. If you want to share work with Unix users, you must + do a little additional configuration work. XXX Flesh this + out. + + + + + + Getting started + + To begin, we'll use the hg + version command to find out whether Mercurial is + actually installed properly. The actual version information + that it prints isn't so important; it's whether it prints + anything at all that we care about. + + + Built-in help + + Mercurial provides a built-in help system. This is + invaluable for those times when you find yourself stuck trying + to remember how to run a command. If you are completely + stuck, simply run hg help; it + will print a brief list of commands, along with a description + of what each does. If you ask for help on a specific command + (as below), it prints more detailed information. For a more impressive level of + detail (which you won't usually need) run hg help -v. The -v option is short for --verbose, and tells Mercurial + to print more information than it usually would. + + + + + Working with a repository + + In Mercurial, everything happens inside a + repository. The repository for a project + contains all of the files that belong to that + project, along with a historical record of the project's + files. + + There's nothing particularly magical about a repository; it + is simply a directory tree in your filesystem that Mercurial + treats as special. 
You can rename or delete a repository any + time you like, using either the command line or your file + browser. + + + Making a local copy of a repository + + Copying a repository is just a little + bit special. While you could use a normal file copying + command to make a copy of a repository, it's best to use a + built-in command that Mercurial provides. This command is + called hg clone, because it + creates an identical copy of an existing repository. If our clone succeeded, we should + now have a local directory called hello. This directory will + contain some files. These files + have the same contents and history in our repository as they + do in the repository we cloned. + + Every Mercurial repository is complete, self-contained, + and independent. It contains its own private copy of a + project's files and history. A cloned repository remembers + the location of the repository it was cloned from, but it does + not communicate with that repository, or any other, unless you + tell it to. + + What this means for now is that we're free to experiment + with our repository, safe in the knowledge that it's a private + sandbox that won't affect anyone else. + + + + What's in a repository? + + When we take a more detailed look inside a repository, we + can see that it contains a directory named .hg. This is where Mercurial + keeps all of its metadata for the repository. + + The contents of the .hg directory and its + subdirectories are private to Mercurial. Every other file and + directory in the repository is yours to do with as you + please. + + To introduce a little terminology, the .hg directory is the + real repository, and all of the files and + directories that coexist with it are said to live in the + working directory. An easy way to + remember the distinction is that the + repository contains the + history of your project, while the + working directory contains a + snapshot of your project at a particular + point in history. 
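The split between the .hg directory and the working directory can be made concrete with a few lines of Python. This is only a sketch under my own assumptions: the function names find_repo_root and working_directory_entries are invented for illustration, and this is not how Mercurial itself locates a repository. It simply treats "a directory that contains .hg" as the root, and "everything next to .hg" as the working directory.

```python
from pathlib import Path
from typing import List, Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Walk upwards from `start` until a directory containing `.hg` is found.

    Hypothetical helper for illustration; returns None if nothing is found.
    """
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".hg").is_dir():
            return candidate
    return None


def working_directory_entries(root: Path) -> List[str]:
    """List the top-level names that live alongside `.hg`.

    These are the files and directories that are "yours to do with as
    you please"; `.hg` itself is skipped because it is private metadata.
    """
    return sorted(p.name for p in root.iterdir() if p.name != ".hg")
```

Calling find_repo_root from any subdirectory of a clone should give back the directory that holds .hg, and working_directory_entries on that root lists the snapshot of project files without the metadata.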
+ + + + + A tour through history + + One of the first things we might want to do with a new, + unfamiliar repository is understand its history. The hg log command gives us a view of + history. By default, this + command prints a brief paragraph of output for each change to + the project that was recorded. In Mercurial terminology, we + call each of these recorded events a + changeset, because it can contain a record + of changes to several files. + + The fields in a record of output from hg log are as follows. + + changeset: This field has the + format of a number, followed by a colon, followed by a + hexadecimal string. These are + identifiers for the changeset. There + are two identifiers because the number is shorter and easier + to type than the hex string. + user: The identity of the + person who created the changeset. This is a free-form + field, but it most often contains a person's name and email + address. + date: The date and time on + which the changeset was created, and the timezone in which + it was created. (The date and time are local to that + timezone; they display what time and date it was for the + person who created the changeset.) + summary: The first line of + the text message that the creator of the changeset entered + to describe the changeset. + The default output printed by hg + log is purely a summary; it is missing a lot of + detail. + + Figure provides a + graphical representation of the history of the hello repository, to make it a + little easier to see which direction history is + flowing in. We'll be returning to this figure + several times in this chapter and the chapter that + follows. + +

+ + XXX + add text + Graphical history of the hello repository + \label{fig:tour-basic:history} + + + + Changesets, revisions, and talking to other + people + + As English is a notoriously sloppy language, and computer + science has a hallowed history of terminological confusion + (why use one term when four will do?), revision control has a + variety of words and phrases that mean the same thing. If you + are talking about Mercurial history with other people, you + will find that the word changeset is often + compressed to change or (when written) + cset, and sometimes a changeset is referred to + as a revision or a rev. + + While it doesn't matter what word you + use to refer to the concept of a changeset, the + identifier that you use to refer to + a specific changeset is of + great importance. Recall that the changeset + field in the output from hg + log identifies a changeset using both a number and + a hexadecimal string. + + The revision number is only valid in + that repository, + while the hex string is the + permanent, unchanging identifier that + will always identify that exact changeset in + every copy of the + repository. + This distinction is important. If you send someone an + email talking about revision 33, there's a high + likelihood that their revision 33 will not be the + same as yours. The reason for this is that a + revision number depends on the order in which changes arrived + in a repository, and there is no guarantee that the same + changes will happen in the same order in different + repositories. Three changes $a,b,c$ can easily appear in one + repository as $0,1,2$, while in another as $1,0,2$. + + Mercurial uses revision numbers purely as a convenient + shorthand. If you need to discuss a changeset with someone, + or make a record of a changeset for some other reason (for + example, in a bug report), use the hexadecimal + identifier. 
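To see why a revision number is only meaningful inside one repository, here is a toy model in Python. Everything in it is an assumption made for illustration: ToyRepo and changeset_id are invented names, and the identifier is just a SHA-1 of a string, which is not how Mercurial derives its real changeset IDs. The point is only that the number depends on arrival order, while the identifier depends on the change itself.

```python
import hashlib
from typing import Dict, List


def changeset_id(content: str) -> str:
    """Toy stand-in for a changeset's hex identifier (assumption: a plain
    hash of the content; real Mercurial computes its IDs differently)."""
    return hashlib.sha1(content.encode("utf-8")).hexdigest()


class ToyRepo:
    """Toy repository: a revision number is a position in arrival order."""

    def __init__(self) -> None:
        self.order: List[str] = []

    def add(self, content: str) -> int:
        """Record a change and return its local revision number."""
        cid = changeset_id(content)
        if cid not in self.order:
            self.order.append(cid)
        return self.order.index(cid)

    def numbering(self) -> Dict[str, int]:
        """Map each identifier to its revision number in this repository."""
        return {cid: rev for rev, cid in enumerate(self.order)}


mine, yours = ToyRepo(), ToyRepo()
for change in ["a", "b", "c"]:
    mine.add(change)       # arrives as 0, 1, 2
for change in ["b", "a", "c"]:
    yours.add(change)      # the same changes, in a different order
```

In this sketch, change a is revision 0 in mine and revision 1 in yours, yet changeset_id("a") is the same string in both, which is why the hexadecimal identifier is the one to quote in email or a bug report.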
+ + + + Viewing specific revisions + + To narrow the output of hg + log down to a single revision, use the -r (or --rev) option. You can use + either a revision number or a long-form changeset identifier, + and you can provide as many revisions as you want. + + If you want to see the history of several revisions + without having to list each one, you can use range + notation; this lets you express the idea "I + want all revisions between $a$ and $b$, inclusive." + Mercurial also honours + the order in which you specify revisions, so hg log -r 2:4 prints $2,3,4$ while + hg log -r 4:2 prints + $4,3,2$. + + + + More detailed information + + While the summary information printed by hg log is useful if you already know + what you're looking for, you may need to see a complete + description of the change, or a list of the files changed, if + you're trying to decide whether a changeset is the one you're + looking for. The hg log + command's -v (or --verbose) option gives you + this extra detail. + + If you want to see both the description and content of a + change, add the -p (or + --patch) option. This + displays the content of a change as a unified + diff (if you've never seen a unified diff before, + see section for an + overview). + + + + + All about command options + + Let's take a brief break from exploring Mercurial commands + to discuss a pattern in the way that they work; you may find + this useful to keep in mind as we continue our tour. + + Mercurial has a consistent and straightforward approach to + dealing with the options that you can pass to commands. It + follows the conventions for options that are common to modern + Linux and Unix systems. + + Every option has a long name. For example, as + we've already seen, the hg + log command accepts a --rev option. + Most options have short names, too. Instead of + --rev, we can use -r. (The reason that some + options don't have short names is that the options in + question are rarely used.) + Long options start with two dashes (e.g. --rev
), while short options + start with one (e.g. -r). + Option naming and usage is consistent across + commands. For example, every command that lets you specify + a changeset ID or revision number accepts both -r and + --rev arguments. + In the examples throughout this book, I use short options + instead of long. This just reflects my own preference, so don't + read anything significant into it. + + Most commands that print output of some kind will print more + output when passed a -v + (or --verbose) option, and + less when passed -q (or + --quiet). + + + + Making and reviewing changes + + Now that we have a grasp of viewing history in Mercurial, + let's take a look at making some changes and examining + them. + + The first thing we'll do is isolate our experiment in a + repository of its own. We use the hg + clone command, but we don't need to clone a copy of + the remote repository. Since we already have a copy of it + locally, we can just clone that instead. This is much faster + than cloning over the network, and cloning a local repository + uses less disk space in most cases, too. As an aside, it's often good + practice to keep a pristine copy of a remote + repository around, which you can then make temporary clones of + to create sandboxes for each task you want to work on. This + lets you work on multiple tasks in parallel, each isolated from + the others until it's complete and you're ready to integrate it + back. Because local clones are so cheap, there's almost no + overhead to cloning and destroying repositories whenever you + want. + + In our my-hello + repository, we have a file hello.c that + contains the classic "hello, world" program. Let's + use the ancient and venerable sed command to + edit this file so that it prints a second line of output. (I'm + only using sed to do this because it's easy + to write a scripted example this way. Since you're not under + the same constraint, you probably won't want to use + sed; simply use your preferred text editor to + do the same thing.)
+ + Mercurial's hg status + command will tell us what Mercurial knows about the files in the + repository. The hg status command prints no output for + some files, but a line starting with + M for + hello.c. Unless you tell it to, hg status will not print any output + for files that have not been modified. + + The M indicates that + Mercurial has noticed that we modified + hello.c. We didn't need to + inform Mercurial that we were going to + modify the file before we started, or that we had modified the + file after we were done; it was able to figure this out + itself. + + It's somewhat helpful to know that we've modified + hello.c, but we might prefer to know + exactly what changes we've made to it. To + do this, we use the hg diff + command. + + + + Recording changes in a new changeset + + We can modify files, build and test our changes, and use + hg status and hg diff to review our changes, until + we're satisfied with what we've done and arrive at a natural + stopping point where we want to record our work in a new + changeset. + + The hg commit command lets + us create a new changeset; we'll usually refer to this as + making a commit or + committing. + + + Setting up a username + + When you try to run hg + commit for the first time, it is not guaranteed to + succeed. Mercurial records your name and address with each + change that you commit, so that you and others will later be + able to tell who made each change. Mercurial tries to + automatically figure out a sensible username to commit the + change with. It will attempt each of the following methods, + in order: + + If you specify a -u option to the hg commit command on the command + line, followed by a username, this is always given the + highest precedence. + If you have set the HGUSER + environment variable, this is checked + next. + If you create a file in your home directory + called .hgrc, with a + username entry, that will + be used next.
To see what the contents of this file + should look like, refer to section + below. + If you have set the EMAIL + environment variable, this will be used + next. + Mercurial will query your system to find out + your local user name and host name, and construct a + username from these components. Since this often results + in a username that is not very useful, it will print a + warning if it has to do + this. + If all of these mechanisms fail, Mercurial will + fail, printing an error message. In this case, it will not + let you commit until you set up a + username. + You should think of the HGUSER + environment variable and the -u option to the hg commit command as ways to + override Mercurial's default selection + of username. For normal use, the simplest and most robust + way to set a username for yourself is by creating a + .hgrc file; see below + for details. + + Creating a Mercurial configuration file + \label{sec:tour-basic:username} + To set a user name, use your favourite editor + to create a file called .hgrc in your home directory. + Mercurial will use this file to look up your personalised + configuration settings. The initial contents of your + .hgrc should look like + this. + # This is a Mercurial configuration file. + [ui] username = Firstname Lastname + <email.address@domain.net> + The [ui] + line begins a section of the config + file, so you can read the username = + ... line as meaning "set the + value of the username item in the + ui section". A section + continues until a new section begins, or the end of the + file. Mercurial ignores empty lines and treats any text + from # to the end of a + line as a comment. + + + Choosing a user name + + You can use any text you like as the value of + the username config item, since this + information is for reading by other people, but not for + interpreting by Mercurial. The convention that most + people follow is to use their name and email address, as + in the example above.
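The lookup order for usernames and the shape of the .hgrc file can be sketched in Python. Please read this as a rough model, not as Mercurial's implementation: the function names hgrc_username and pick_username are made up for this example, Python's configparser is only standing in for Mercurial's own config reader, and the system_guess parameter is a placeholder for whatever user-and-host name Mercurial would construct as a last resort.

```python
import configparser
from typing import Mapping, Optional


def hgrc_username(hgrc_text: str) -> Optional[str]:
    """Read the `username` item from the `[ui]` section of hgrc-style text.

    Assumption: configparser is close enough to the file format for this
    simple case (sections, `name = value` items, `#` comment lines).
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(hgrc_text)
    return parser.get("ui", "username", fallback=None)


def pick_username(
    cli_user: Optional[str],
    env: Mapping[str, str],
    hgrc_text: str,
    system_guess: str,
) -> str:
    """Try each source in the order listed above; the first one set wins."""
    candidates = [
        cli_user,                  # username given on the command line
        env.get("HGUSER"),         # HGUSER environment variable
        hgrc_username(hgrc_text),  # username entry in .hgrc
        env.get("EMAIL"),          # EMAIL environment variable
    ]
    for candidate in candidates:
        if candidate:
            return candidate
    return system_guess            # hypothetical "user@host" fallback


EXAMPLE_HGRC = """\
# This is a Mercurial configuration file.
[ui]
username = Firstname Lastname <email.address@domain.net>
"""
```

With EXAMPLE_HGRC in place and nothing given on the command line or in HGUSER, pick_username returns the name from the file even if EMAIL is set, which matches the ordering described above.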
+ + Mercurial's built-in web server obfuscates + email addresses, to make it more difficult for the email + harvesting tools that spammers use. This reduces the + likelihood that you'll start receiving more junk email + if you publish a Mercurial repository on the + web. + + + + + Writing a commit message + + When we commit a change, Mercurial drops us into + a text editor, to enter a message that will describe the + modifications we've made in this changeset. This is called + the commit message. It will be a + record for readers of what we did and why, and it will be + printed by hg log after + we've finished committing. + The editor that the hg + commit command drops us into will contain an + empty line, followed by a number of lines starting with + HG:. + empty line HG: changed + hello.c + Mercurial ignores the lines that start with + HG:; it uses them only to + tell us which files it's recording changes to. Modifying or + deleting these lines has no effect. + + + Writing a good commit message + + Since hg log + only prints the first line of a commit message by default, + it's best to write a commit message whose first line stands + alone. Here's a real example of a commit message that + doesn't follow this guideline, and + hence has a summary that is not + readable. + changeset: 73:584af0e231be user: Censored + Person <censored.person@example.org> date: Tue Sep + 26 21:37:07 2006 -0700 summary: include + buildmeister/commondefs. Add an exports and + install + + As far as the remainder of the contents of the + commit message are concerned, there are no hard-and-fast + rules. Mercurial itself doesn't interpret or care about the + contents of the commit message, though your project may have + policies that dictate a certain kind of + formatting. + My personal preference is for short, but + informative, commit messages that tell me something that I + can't figure out with a quick glance at the output of + hg log + --patch. 
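Here is a small Python sketch of the two ideas above: lines beginning with HG: are dropped from what we typed in the editor, and the first remaining line is what shows up as the summary. The function names clean_commit_message and summary_line are my own, chosen for illustration, and the exact whitespace handling is an assumption rather than a description of Mercurial's behaviour.

```python
def clean_commit_message(editor_text: str) -> str:
    """Drop the informational `HG:` lines and surrounding blank space."""
    kept = [
        line for line in editor_text.splitlines()
        if not line.startswith("HG:")
    ]
    return "\n".join(kept).strip()


def summary_line(message: str) -> str:
    """Return the first line of a message: the part a brief log would show."""
    return message.splitlines()[0] if message else ""


editor_text = (
    "Add a second line of output\n"
    "\n"
    "The program now prints two greetings instead of one.\n"
    "HG: changed hello.c\n"
)
message = clean_commit_message(editor_text)
```

Because summary_line(message) only ever sees "Add a second line of output", a first line that stands alone is the part of the message most readers will ever see.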
+ + + Aborting a commit + + If you decide that you don't want to commit + while in the middle of editing a commit message, simply exit + from your editor without saving the file that it's editing. + This will cause nothing to happen to either the repository + or the working directory. + If we run the hg + commit command without any arguments, it records + all of the changes we've made, as reported by hg status and hg diff. + + + Admiring our new handiwork + + Once we've finished the commit, we can use the + hg tip command to display + the changeset we just created. This command produces output + that is identical to hg + log, but it only displays the newest revision in + the repository. We refer to + the newest revision in the repository as the tip revision, + or simply the tip. + + + + Sharing changes + + We mentioned earlier that repositories in + Mercurial are self-contained. This means that the changeset + we just created exists only in our my-hello repository. Let's + look at a few ways that we can propagate this change into + other repositories. + + Pulling changes from another repository + \label{sec:tour:pull} + To get started, let's clone our original + hello repository, + which does not contain the change we just committed. We'll + call our temporary repository hello-pull. + We'll use the hg + pull command to bring changes from my-hello into hello-pull. However, blindly + pulling unknown changes into a repository is a somewhat + scary prospect. Mercurial provides the hg incoming command to tell us + what changes the hg pull + command would pull into the repository, + without actually pulling the changes in. (Of course, someone could + cause more changesets to appear in the repository that we + ran hg incoming in, before + we get a chance to hg pull + the changes, so that we could end up pulling changes that we + didn't expect.) 
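The relationship between hg incoming and hg pull can be modelled with plain lists of changeset identifiers. This is a toy sketch under my own assumptions: the functions incoming and pull are illustrative names, the identifiers are made-up strings, and real repositories track far more than a flat list.

```python
from typing import List, Sequence


def incoming(source: Sequence[str], destination: Sequence[str]) -> List[str]:
    """Preview: changesets in `source` that `destination` lacks.

    Nothing is modified; this only reports what a pull would bring in.
    """
    have = set(destination)
    return [cid for cid in source if cid not in have]


def pull(source: Sequence[str], destination: List[str]) -> int:
    """Copy the missing changesets into `destination`; return how many."""
    new = incoming(source, destination)
    destination.extend(new)
    return len(new)


hello = ["a1", "b2"]                # hypothetical changeset IDs
my_hello = ["a1", "b2", "c3"]       # has one extra commit
hello_pull = list(hello)            # a fresh clone of hello

preview = incoming(my_hello, hello_pull)
```

Note that preview is computed at one moment and pull at another; if my_hello gains a changeset in between, the pull brings in more than the preview listed, which is the caveat mentioned in the parenthesis above.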
+ Bringing changes into a repository is a simple + matter of running the hg + pull command, and telling it which repository to + pull from. As you can see + from the before-and-after output of hg tip, we have successfully + pulled changes into our repository. There remains one step + before we can see these changes in the working + directory. + + + Updating the working directory + + We have so far glossed over the relationship + between a repository and its working directory. The + hg pull command that we ran + in section brought changes into + the + repository, but if we check, there's no sign of those + changes in the working directory. This is because hg pull does not (by default) + touch the working directory. Instead, we use the hg update command to do this. + It might seem a bit strange that hg pull doesn't update the working + directory automatically. There's actually a good reason for + this: you can use hg update + to update the working directory to the state it was in at + any revision in the history of the + repository. If you had the working directory updated to an + old revision---to hunt down the origin of a bug, say---and + ran a hg pull which + automatically updated the working directory to a new + revision, you might not be terribly happy. + However, since pull-then-update is such a common + thing to do, Mercurial lets you combine the two by passing + the -u option to + hg + pull. + hg pull + -u + If you look back at the output of hg pull in section when we ran it without -u, you can see that it + printed a helpful reminder that we'd have to take an + explicit step to update the working + directory: + (run 'hg update' to get a working + copy) + + To find out what revision the working directory + is at, use the hg parents + command. If you look + back at figure , you'll + see arrows connecting each changeset. The node that the + arrow leads from in each case is a + parent, and the node that the arrow leads + to is its child.
The working directory + has a parent in just the same way; this is the changeset + that the working directory currently + contains. + To update the working directory to a particular + revision, give a revision number or changeset ID to the + hg update command. If you omit an explicit + revision, hg update will + update to the tip revision, as shown by the second call to + hg update in the example + above. + + + Pushing changes to another repository + + Mercurial lets us push changes to another + repository, from the repository we're currently visiting. + As with the example of hg + pull above, we'll create a temporary repository + to push our changes into. The hg outgoing command + tells us what changes would be pushed into another + repository. And the + hg push command does the + actual push. As with + hg pull, the hg push command does not update + the working directory in the repository that it's pushing + changes into. (Unlike hg + pull, hg push + does not provide a -u option that updates + the other repository's working directory.) + What happens if we try to pull or push changes + and the receiving repository already has those changes? + Nothing too exciting. + + + Sharing changes over a network + + The commands we have covered in the previous few + sections are not limited to working with local repositories. + Each works in exactly the same fashion over a network + connection; simply pass in a URL instead of a local path. + In this example, we + can see what changes we could push to the remote repository, + but the repository is understandably not set up to let + anonymous users push to it. + + + + +
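As a recap of how pulling and updating fit together, here is a toy Python model. It is a sketch under my own assumptions: SketchRepo is an invented class, changesets are just strings in a list, and the working directory is reduced to a single number naming its parent revision. The update flag on pull plays the role of hg pull -u.

```python
from typing import List, Optional


class SketchRepo:
    """Toy repository: a list of changesets plus a working-directory parent."""

    def __init__(self, changesets: List[str]) -> None:
        self.changesets = list(changesets)
        # Revision number of the changeset the working directory is based on.
        self.parent = len(self.changesets) - 1

    def tip(self) -> int:
        """Revision number of the newest changeset."""
        return len(self.changesets) - 1

    def pull(self, other: "SketchRepo", update: bool = False) -> None:
        """Bring in missing changesets; leave the working directory alone
        unless `update` is set."""
        for cid in other.changesets:
            if cid not in self.changesets:
                self.changesets.append(cid)
        if update:
            self.update()

    def update(self, rev: Optional[int] = None) -> None:
        """Move the working directory to `rev`, or to the tip if omitted."""
        self.parent = self.tip() if rev is None else rev


my_hello = SketchRepo(["a1", "b2", "c3"])   # hypothetical changeset IDs
hello_pull = SketchRepo(["a1", "b2"])
hello_pull.pull(my_hello)                   # history grows, parent stays put
```

After the pull, hello_pull.tip() is 2 while hello_pull.parent is still 1; only a call to update (or a pull with update=True) moves the working directory forward, and update(0) moves it back to an old revision without touching history.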