% en/tour-basic.tex @ 159:7355af913937
%
% First steps on collaboration chapter.
% author Bryan O'Sullivan <bos@serpentine.com>
% date Thu, 22 Mar 2007 23:03:11 -0700
% parents d3f8aec5beff
% children ef6a1427d0af

\chapter{A tour of Mercurial: the basics}
\label{chap:tour-basic}

\section{Installing Mercurial on your system}
\label{sec:tour:install}

Prebuilt binary packages of Mercurial are available for every popular
operating system.  These make it easy to start using Mercurial on your
computer immediately.

\subsection{Linux}

Because each Linux distribution has its own packaging tools, policies,
and rate of development, it's difficult to give a comprehensive set of
instructions on how to install Mercurial binaries.  The version of
Mercurial that you end up with can also vary, depending on how active
your distribution's package maintainer is.

To keep things simple, I will focus on installing Mercurial from the
command line under the most popular Linux distributions.  Most of
these distributions provide graphical package managers that will let
you install Mercurial with a single click; the package name to look
for is \texttt{mercurial}.

\begin{itemize}
\item[Debian]
  \begin{codesample4}
    apt-get install mercurial
  \end{codesample4}

\item[Fedora Core]
  \begin{codesample4}
    yum install mercurial
  \end{codesample4}

\item[Gentoo]
  \begin{codesample4}
    emerge mercurial
  \end{codesample4}

\item[OpenSUSE]
  \begin{codesample4}
    zypper install mercurial
  \end{codesample4}

\item[Ubuntu] Ubuntu's Mercurial package is particularly old, and you
  should not use it.  If you know how, you can rebuild and install the
  Debian package.  It's probably easier to build Mercurial from source
  and simply run that; see section~\ref{sec:srcinstall:unixlike} for
  details.
\end{itemize}

\subsection{Mac OS X}

Lee Cantey publishes an installer of Mercurial for Mac OS~X at
\url{http://mercurial.berkwood.com}.  This package works on both
Intel-~and Power-based Macs.  Before you can use it, you must install
a compatible version of Universal MacPython~\cite{web:macpython}.  This
is easy to do; simply follow the instructions on Lee's site.

\subsection{Solaris}

XXX.

\subsection{Windows}

Lee Cantey publishes an installer of Mercurial for Windows at
\url{http://mercurial.berkwood.com}.  This package has no external
dependencies; it ``just works''.

\begin{note}
  The Windows version of Mercurial does not automatically convert line
  endings between Windows and Unix styles.  If you want to share work
  with Unix users, you must do a little additional configuration
  work. XXX Flesh this out.
\end{note}

\section{Getting started}

To begin, we'll use the \hgcmd{version} command to find out whether
Mercurial is installed properly.  The version information that it
prints isn't so important; what we care about is whether it prints
anything at all.
\interaction{tour.version}

\subsection{Built-in help}

Mercurial provides a built-in help system.  This is invaluable for those
times when you find yourself stuck trying to remember how to run a
command.  If you are completely stuck, simply run \hgcmd{help}; it
will print a brief list of commands, along with a description of what
each does.  If you ask for help on a specific command (as below), it
prints more detailed information.
\interaction{tour.help}
For a more impressive level of detail (which you won't usually need),
run \hgcmdargs{help}{\hggopt{-v}}.  The \hggopt{-v} option is short
for \hggopt{--verbose}, and tells Mercurial to print more information
than it usually would.

\section{Working with a repository}

In Mercurial, everything happens inside a \emph{repository}.  The
repository for a project contains all of the files that ``belong to''
that project, along with a historical record of the project's files.

There's nothing particularly magical about a repository; it is simply
a directory tree in your filesystem that Mercurial treats as special.
You can rename or delete a repository any time you like, using either the
command line or your file browser.

\subsection{Making a local copy of a repository}

\emph{Copying} a repository is just a little bit special.  While you
could use a normal file copying command to make a copy of a
repository, it's best to use a built-in command that Mercurial
provides.  This command is called \hgcmd{clone}, because it creates an
identical copy of an existing repository.
\interaction{tour.clone}
If our clone succeeded, we should now have a local directory called
\dirname{hello}.  This directory will contain some files.
\interaction{tour.ls}
These files have the same contents and history in our repository as
they do in the repository we cloned.

Every Mercurial repository is complete, self-contained, and
independent.  It contains its own private copy of a project's files
and history.  A cloned repository remembers the location of the
repository it was cloned from, but it does not communicate with that
repository, or any other, unless you tell it to.

What this means for now is that we're free to experiment with our
repository, safe in the knowledge that it's a private ``sandbox'' that
won't affect anyone else.

\subsection{What's in a repository?}

When we take a more detailed look inside a repository, we can see that
it contains a directory named \dirname{.hg}.  This is where Mercurial
keeps all of its metadata for the repository.
\interaction{tour.ls-a}

The contents of the \dirname{.hg} directory and its subdirectories are
private to Mercurial.  Every other file and directory in the
repository is yours to do with as you please.

To introduce a little terminology, the \dirname{.hg} directory is the
``real'' repository, and all of the files and directories that coexist
with it are said to live in the \emph{working directory}.  An easy way
to remember the distinction is that the \emph{repository} contains the
\emph{history} of your project, while the \emph{working directory}
contains a \emph{snapshot} of your project at a particular point in
history.

\section{A tour through history}

One of the first things we might want to do with a new, unfamiliar
repository is understand its history.  The \hgcmd{log} command gives
us a view of history.
\interaction{tour.log}
By default, this command prints a brief paragraph of output for each
change to the project that was recorded.  In Mercurial terminology, we
call each of these recorded events a \emph{changeset}, because it can
contain a record of changes to several files.

The fields in a record of output from \hgcmd{log} are as follows.
\begin{itemize}
\item[\texttt{changeset}] This field has the format of a number,
  followed by a colon, followed by a hexadecimal string.  These are
  \emph{identifiers} for the changeset.  There are two identifiers
  because the number is shorter and easier to type than the hex
  string.
\item[\texttt{user}] The identity of the person who created the
  changeset.  This is a free-form field, but it most often contains a
  person's name and email address.
\item[\texttt{date}] The date and time on which the changeset was
  created, and the timezone in which it was created.  (The date and
  time are local to that timezone; they display what time and date it
  was for the person who created the changeset.)
\item[\texttt{summary}] The first line of the text message that the
  creator of the changeset entered to describe the changeset.
\end{itemize}
The default output printed by \hgcmd{log} is purely a summary; it is
missing a lot of detail.

Figure~\ref{fig:tour-basic:history} provides a graphical representation of
the history of the \dirname{hello} repository, to make it a little
easier to see which direction history is ``flowing'' in.  We'll be
returning to this figure several times in this chapter and the chapter
that follows.

\begin{figure}[ht]
  \centering
  \grafix{tour-history}
  \caption{Graphical history of the \dirname{hello} repository}
  \label{fig:tour-basic:history}
\end{figure}

\subsection{Changesets, revisions, and talking to other people}

As English is a notoriously sloppy language, and computer science has
a hallowed history of terminological confusion (why use one term when
four will do?), revision control has a variety of words and phrases
that mean the same thing.  If you are talking about Mercurial history
with other people, you will find that the word ``changeset'' is often
compressed to ``change'' or (when written) ``cset'', and sometimes a
changeset is referred to as a ``revision'' or a ``rev''.

While it doesn't matter what \emph{word} you use to refer to the
concept of ``a~changeset'', the \emph{identifier} that you use to
refer to ``a~\emph{specific} changeset'' is of great importance.
Recall that the \texttt{changeset} field in the output from
\hgcmd{log} identifies a changeset using both a number and a
hexadecimal string.
\begin{itemize}
\item The revision number is \emph{only valid in that repository},
\item while the hex string is the \emph{permanent, unchanging
    identifier} that will always identify that exact changeset in
  \emph{every} copy of the repository.
\end{itemize}
This distinction is important.  If you send someone an email talking
about ``revision~33'', there's a high likelihood that their
revision~33 will \emph{not be the same} as yours.  The reason for this
is that a revision number depends on the order in which changes
arrived in a repository, and there is no guarantee that the same
changes will happen in the same order in different repositories.
Three changes $a,b,c$ can easily appear in one repository as $0,1,2$,
while in another as $1,0,2$.
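
To make this concrete, here is a purely hypothetical illustration; the
letters stand in for hexadecimal identifiers, and the two repositories
are invented for the example.  The same three changesets carry
different revision numbers in each repository, because they arrived in
a different order.
\begin{center}
  \begin{tabular}{lcc}
    Changeset & Revision in repository~1 & Revision in repository~2 \\
    \hline
    $a$ & 0 & 1 \\
    $b$ & 1 & 0 \\
    $c$ & 2 & 2 \\
  \end{tabular}
\end{center}
Telling someone to ``look at revision~0'' would point them at $a$ in
the first repository, but at $b$ in the second.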

Mercurial uses revision numbers purely as a convenient shorthand.  If
you need to discuss a changeset with someone, or make a record of a
changeset for some other reason (for example, in a bug report), use
the hexadecimal identifier.

\subsection{Viewing specific revisions}

To narrow the output of \hgcmd{log} down to a single revision, use the
\hgopt{log}{-r} (or \hgopt{log}{--rev}) option.  You can use either a
revision number or a long-form changeset identifier, and you can
provide as many revisions as you want.
\interaction{tour.log-r}

If you want to see the history of several revisions without having to
list each one, you can use \emph{range notation}; this lets you
express the idea ``I want all revisions between $a$ and $b$,
inclusive''.
\interaction{tour.log.range}
Mercurial also honours the order in which you specify revisions, so
\hgcmdargs{log}{-r 2:4} prints $2,3,4$ while \hgcmdargs{log}{-r 4:2}
prints $4,3,2$.

\subsection{More detailed information}

While the summary information printed by \hgcmd{log} is useful if you
already know what you're looking for, you may need to see a complete
description of the change, or a list of the files changed, if you're
trying to decide whether a changeset is the one you're looking for.
The \hgcmd{log} command's \hggopt{-v} (or \hggopt{--verbose})
option gives you this extra detail.
\interaction{tour.log-v}

If you want to see both the description and content of a change, add
the \hgopt{log}{-p} (or \hgopt{log}{--patch}) option.  This displays
the content of a change as a \emph{unified diff} (if you've never seen
a unified diff before, see section~\ref{sec:mq:patch} for an overview).
\interaction{tour.log-vp}

\section{All about command options}

Let's take a brief break from exploring Mercurial commands to discuss
a pattern in the way that they work; you may find this useful to keep
in mind as we continue our tour.

Mercurial has a consistent and straightforward approach to dealing
with the options that you can pass to commands.  It follows the
conventions for options that are common to modern Linux and Unix
systems.
\begin{itemize}
\item Every option has a long name.  For example, as we've already
  seen, the \hgcmd{log} command accepts a \hgopt{log}{--rev} option.
\item Most options have short names, too.  Instead of
  \hgopt{log}{--rev}, we can use \hgopt{log}{-r}.  (The reason that
  some options don't have short names is that the options in question
  are rarely used.)
\item Long options start with two dashes (e.g.~\hgopt{log}{--rev}),
  while short options start with one (e.g.~\hgopt{log}{-r}).
\item Option naming and usage is consistent across commands.  For
  example, every command that lets you specify a changeset~ID or
  revision number accepts both \hgopt{log}{-r} and \hgopt{log}{--rev}
  arguments.
\end{itemize}
In the examples throughout this book, I use short options instead of
long.  This just reflects my own preference, so don't read anything
significant into it.

Most commands that print output of some kind will print more output
when passed a \hggopt{-v} (or \hggopt{--verbose}) option, and less
when passed \hggopt{-q} (or \hggopt{--quiet}).
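
As a small sketch of this pattern, we can ask \hgcmd{log} about the
same changeset at three levels of detail.  Revision~0 is chosen here
only because every non-empty repository has one.
\begin{codesample2}
  # Terse output.
  hg log -q -r 0
  # Default output.
  hg log -r 0
  # More detailed output.
  hg log -v -r 0
\end{codesample2}
Each command describes the same changeset; only the amount of output
differs.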

\section{Making and reviewing changes}

Now that we have a grasp of viewing history in Mercurial, let's take a
look at making some changes and examining them.

The first thing we'll do is isolate our experiment in a repository of
its own.  We use the \hgcmd{clone} command, but we don't need to
clone a copy of the remote repository.  Since we already have a copy
of it locally, we can just clone that instead.  This is much faster
than cloning over the network, and cloning a local repository uses
less disk space in most cases, too.
\interaction{tour.reclone}
As an aside, it's often good practice to keep a ``pristine'' copy of a
remote repository around, which you can then make temporary clones of
to create sandboxes for each task you want to work on.  This lets you
work on multiple tasks in parallel, each isolated from the others
until it's complete and you're ready to integrate it back.  Because
local clones are so cheap, there's almost no overhead to cloning and
destroying repositories whenever you want.
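
A minimal sketch of that workflow might look like the following.  The
name \dirname{my-task} is made up for this illustration, and the
\command{rm} step assumes a Unix-like shell.
\begin{codesample2}
  # Make a cheap local clone of the pristine copy to work in.
  hg clone hello my-task
  cd my-task
  # ... edit, build, test, and commit here ...
  cd ..
  # Once the work is integrated (or abandoned), discard the sandbox.
  rm -rf my-task
\end{codesample2}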

In our \dirname{my-hello} repository, we have a file
\filename{hello.c} that contains the classic ``hello, world'' program.
Let's use the ancient and venerable \command{sed} command to edit this
file so that it prints a second line of output.  (I'm only using
\command{sed} to do this because it's easy to write a scripted example
this way.  Since you're not under the same constraint, you probably
won't want to use \command{sed}; simply use your preferred text editor to
do the same thing.)
\interaction{tour.sed}

Mercurial's \hgcmd{status} command will tell us what Mercurial knows
about the files in the repository.
\interaction{tour.status}
The \hgcmd{status} command prints no output for some files, but a line
starting with ``\texttt{M}'' for \filename{hello.c}.  Unless you tell
it to, \hgcmd{status} will not print any output for files that have
not been modified.  

The ``\texttt{M}'' indicates that Mercurial has noticed that we
modified \filename{hello.c}.  We didn't need to \emph{inform}
Mercurial that we were going to modify the file before we started, or
that we had modified the file after we were done; it was able to
figure this out itself.

It's a little bit helpful to know that we've modified
\filename{hello.c}, but we might prefer to know exactly \emph{what}
changes we've made to it.  To do this, we use the \hgcmd{diff}
command.
\interaction{tour.diff}

\section{Recording changes in a new changeset}

We can modify files, build and test our changes, and use
\hgcmd{status} and \hgcmd{diff} to review our changes, until we're
satisfied with what we've done and arrive at a natural stopping point
where we want to record our work in a new changeset.

The \hgcmd{commit} command lets us create a new changeset; we'll
usually refer to this as ``making a commit'' or ``committing''.  

\subsection{Setting up a username}

When you try to run \hgcmd{commit} for the first time, it may succeed
immediately, or it may fail with an error message that looks like
this.
\interaction{tour.commit-no-user}
If it succeeds for you, the chances are that either you already have a
file called \sfilename{.hgrc} in your home directory, or you have an
environment variable named \envar{EMAIL} set.

When you commit, Mercurial wants to know what your name is, so that it
can record it.  If you have created a \sfilename{.hgrc} file, it will
look in there.  If it doesn't find something suitable, it will check
whether the \envar{EMAIL} environment variable is set.  If neither of
these is present, it will produce the error message you can see above.
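
For instance, assuming a Bourne-style shell (other shells use
different syntax), you could set \envar{EMAIL} for the current session
before committing.  The name and address below are placeholders.
\begin{codesample2}
  # Placeholder identity; substitute your own name and address.
  export EMAIL="Firstname Lastname <email.address@domain.net>"
  hg commit
\end{codesample2}
A configuration file is the more durable choice, since it does not
depend on how your shell happens to be set up.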

\subsubsection{Creating a Mercurial configuration file}

To set a user name, use your favourite editor to create a file called
\sfilename{.hgrc} in your home directory.  Mercurial will use this
file to look up your personalised configuration settings.  The initial
contents of your \sfilename{.hgrc} should look like this.
\begin{codesample2}
  # This is a Mercurial configuration file.
  [ui]
  username = Firstname Lastname <email.address@domain.net>
\end{codesample2}
The ``\texttt{[ui]}'' line begins a \emph{section} of the config file,
so you can read the ``\texttt{username = ...}'' line as meaning ``set
the value of the \texttt{username} item in the \texttt{ui} section''.
A section continues until a new section begins, or the end of the
file.  Mercurial ignores empty lines and treats any text from
``\texttt{\#}'' to the end of a line as a comment.

\subsubsection{Choosing a user name}

You can use any text you like as the value of the \texttt{username}
config item, since this information is for reading by other people,
not for interpreting by Mercurial.  The convention that most people
follow is to use their name and email address, as in the example
above.

\begin{note}
  Mercurial's built-in web server obfuscates email addresses, to make
  it more difficult for the email harvesting tools that spammers use.
  This reduces the likelihood that you'll start receiving more junk
  email if you publish a Mercurial repository on the web.
\end{note}

\subsection{Writing a commit message}

When we commit a change, Mercurial drops us into a text editor, to
enter a message that will describe the modifications we've made in
this changeset.  This is called the \emph{commit message}.  It will be
a record for readers of what we did and why, and it will be printed by
\hgcmd{log} after we've finished committing.
\interaction{tour.commit}

The editor that the \hgcmd{commit} command drops us into will contain
an empty line, followed by a number of lines starting with
``\texttt{HG:}''.
\begin{codesample2}
  \emph{empty line}
  HG: changed hello.c
\end{codesample2}
Mercurial ignores the lines that start with ``\texttt{HG:}''; it uses
them only to tell us which files it's recording changes to.  Modifying
or deleting these lines has no effect.
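
If you would rather not be dropped into an editor, \hgcmd{commit} also
accepts the message directly on the command line through its
\hgopt{commit}{-m} (or \hgopt{commit}{--message}) option.  The message
text below is only an example.
\begin{codesample2}
  # Commit all pending changes with this message, skipping the editor.
  hg commit -m 'Added an extra line of output'
\end{codesample2}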

\subsection{Writing a good commit message}

Since \hgcmd{log} only prints the first line of a commit message by
default, it's best to write a commit message whose first line stands
alone.  Here's a real example of a commit message that \emph{doesn't}
follow this guideline, and hence has a summary that is not readable.
\begin{codesample2}
  changeset:   73:584af0e231be
  user:        Censored Person <censored.person@example.org>
  date:        Tue Sep 26 21:37:07 2006 -0700
  summary:     include buildmeister/commondefs.   Add an exports and install
\end{codesample2}

As far as the remainder of the contents of the commit message are
concerned, there are no hard-and-fast rules.  Mercurial itself doesn't
interpret or care about the contents of the commit message, though
your project may have policies that dictate a certain kind of
formatting.

My personal preference is for short, but informative, commit messages
that tell me something that I can't figure out with a quick glance at
the output of \hgcmdargs{log}{--patch}.

\subsection{Aborting a commit}

If you decide that you don't want to commit while in the middle of
editing a commit message, simply exit from your editor without saving
the file that it's editing.  This will cause nothing to happen to
either the repository or the working directory.

If we run the \hgcmd{commit} command without any arguments, it records
all of the changes we've made, as reported by \hgcmd{status} and
\hgcmd{diff}.

\subsection{Admiring our new handiwork}

Once we've finished the commit, we can use the \hgcmd{tip} command to
display the changeset we just created.  This command produces output
that is identical to \hgcmd{log}, but it only displays the newest
revision in the repository.
\interaction{tour.tip}
We refer to the newest revision in the repository as the tip revision,
or simply the tip.

\section{Sharing changes}

We mentioned earlier that repositories in Mercurial are
self-contained.  This means that the changeset we just created exists
only in our \dirname{my-hello} repository.  Let's look at a few ways
that we can propagate this change into other repositories.

\subsection{Pulling changes from another repository}
\label{sec:tour:pull}

To get started, let's clone our original \dirname{hello} repository,
which does not contain the change we just committed.  We'll call our
temporary repository \dirname{hello-pull}.
\interaction{tour.clone-pull}

We'll use the \hgcmd{pull} command to bring changes from
\dirname{my-hello} into \dirname{hello-pull}.  However, blindly
pulling unknown changes into a repository is a somewhat scary
prospect.  Mercurial provides the \hgcmd{incoming} command to tell us
what changes the \hgcmd{pull} command \emph{would} pull into the
repository, without actually pulling the changes in.
\interaction{tour.incoming}
(Of course, someone could cause more changesets to appear in the
repository that we ran \hgcmd{incoming} against, before we get a
chance to \hgcmd{pull} the changes, so that we could end up pulling
changes that we didn't expect.)

Bringing changes into a repository is a simple matter of running the
\hgcmd{pull} command, and telling it which repository to pull from.
\interaction{tour.pull}
As you can see from the before-and-after output of \hgcmd{tip}, we
have successfully pulled changes into our repository.  There remains
one step before we can see these changes in the working directory.

\subsection{Updating the working directory}

We have so far glossed over the relationship between a repository and
its working directory.  The \hgcmd{pull} command that we ran in
section~\ref{sec:tour:pull} brought changes into the repository, but
if we check, there's no sign of those changes in the working
directory.  This is because \hgcmd{pull} does not (by default) touch
the working directory.  Instead, we use the \hgcmd{update} command to
do this.
\interaction{tour.update}

It might seem a bit strange that \hgcmd{pull} doesn't update the
working directory automatically.  There's actually a good reason for
this: you can use \hgcmd{update} to update the working directory to
the state it was in at \emph{any revision} in the history of the
repository.  If you had the working directory updated to an old
revision---to hunt down the origin of a bug, say---and ran a
\hgcmd{pull} which automatically updated the working directory to a
new revision, you might not be terribly happy.

However, since pull-then-update is such a common thing to do,
Mercurial lets you combine the two by passing the \hgopt{pull}{-u}
option to \hgcmd{pull}.
\begin{codesample2}
  hg pull -u
\end{codesample2}
If you look back at the output of \hgcmd{pull} in
section~\ref{sec:tour:pull} when we ran it without \hgopt{pull}{-u},
you can see that it printed a helpful reminder that we'd have to take
an explicit step to update the working directory:
\begin{codesample2}
  (run 'hg update' to get a working copy)
\end{codesample2}

To find out what revision the working directory is at, use the
\hgcmd{parents} command.
\interaction{tour.parents}
If you look back at figure~\ref{fig:tour-basic:history}, you'll see
arrows connecting each changeset.  The node that the arrow leads
\emph{from} in each case is a parent, and the node that the arrow
leads \emph{to} is its child.  The working directory has a parent in
just the same way; this is the changeset that the working directory
currently contains.

To update the working directory to a particular revision, give a
revision number or changeset~ID to the \hgcmd{update} command.
\interaction{tour.older}
If you omit an explicit revision, \hgcmd{update} will update to the
tip revision, as shown by the second call to \hgcmd{update} in the
example above.

\subsection{Pushing changes to another repository}

Mercurial lets us push changes to another repository, from the
repository we're currently visiting.  As with the example of
\hgcmd{pull} above, we'll create a temporary repository to push our
changes into.
\interaction{tour.clone-push}
The \hgcmd{outgoing} command tells us what changes would be pushed
into another repository.
\interaction{tour.outgoing}
And the \hgcmd{push} command does the actual push.
\interaction{tour.push}
As with \hgcmd{pull}, the \hgcmd{push} command does not update the
working directory in the repository that it's pushing changes into.
(Unlike \hgcmd{pull}, \hgcmd{push} does not provide a \texttt{-u}
option that updates the other repository's working directory.)

What happens if we try to pull or push changes and the receiving
repository already has those changes?  Nothing too exciting.
\interaction{tour.push.nothing}

\subsection{Sharing changes over a network}

The commands we have covered in the previous few sections are not
limited to working with local repositories.  Each works in exactly the
same fashion over a network connection; simply pass in a URL instead
of a local path.
\interaction{tour.outgoing.net}
In this example, we can see what changes we could push to the remote
repository, but the repository is understandably not set up to let
anonymous users push to it.
\interaction{tour.push.net}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "00book"
%%% End: