diff es/collab.tex @ 501:b05e35d641e4

Copying the files from en to es and taking intro chapter
author Igor TAmara <igor@tamarapatino.org>
date Fri, 07 Nov 2008 21:42:57 -0500
parents 04c08ad7e92e
children a9ea523446cc
line wrap: on
line diff
--- a/es/collab.tex	Fri Nov 07 21:33:22 2008 -0500
+++ b/es/collab.tex	Fri Nov 07 21:42:57 2008 -0500
@@ -0,0 +1,1118 @@
+\chapter{Collaborating with other people}
+\label{cha:collab}
+
+As a completely decentralised tool, Mercurial doesn't impose any
+policy on how people ought to work with each other.  However, if
+you're new to distributed revision control, it helps to have some
+tools and examples in mind when you're thinking about possible
+workflow models.
+
+\section{Mercurial's web interface}
+
+Mercurial has a powerful web interface that provides several 
+useful capabilities.
+
+For interactive use, the web interface lets you browse a single
+repository or a collection of repositories.  You can view the history
+of a repository, examine each change (comments and diffs), and view
+the contents of each directory and file.
+
+Also for human consumption, the web interface provides an RSS feed of
+the changes in a repository.  This lets you ``subscribe'' to a
+repository using your favourite feed reader, and be automatically
+notified of activity in that repository as soon as it happens.  I find
+this capability much more convenient than the model of subscribing to
+a mailing list to which notifications are sent, as it requires no
+additional configuration on the part of whoever is serving the
+repository.
+
+The web interface also lets remote users clone a repository, pull
+changes from it, and (when the server is configured to permit it) push
+changes back to it.  Mercurial's HTTP tunneling protocol aggressively
+compresses data, so that it works efficiently even over low-bandwidth
+network connections.
+
+The easiest way to get started with the web interface is to use your
+web browser to visit an existing repository, such as the master
+Mercurial repository at
+\url{http://www.selenic.com/repo/hg?style=gitweb}.
+
+If you're interested in providing a web interface to your own
+repositories, Mercurial provides two ways to do this.  The first is
+using the \hgcmd{serve} command, which is best suited to short-term
+``lightweight'' serving.  See section~\ref{sec:collab:serve} below for
+details of how to use this command.  If you have a long-lived
+repository that you'd like to make permanently available, Mercurial
+has built-in support for the CGI (Common Gateway Interface) standard,
+which all common web servers support.  See
+section~\ref{sec:collab:cgi} for details of CGI configuration.
+
+\section{Collaboration models}
+
+With a suitably flexible tool, making decisions about workflow is much
+more of a social engineering challenge than a technical one.
+Mercurial imposes few limitations on how you can structure the flow of
+work in a project, so it's up to you and your group to set up and live
+with a model that matches your own particular needs.
+
+\subsection{Factors to keep in mind}
+
+The most important aspect of any model that you must keep in mind is
+how well it matches the needs and capabilities of the people who will
+be using it.  This might seem self-evident; even so, you still can't
+afford to forget it for a moment.
+
+I once put together a workflow model that seemed to make perfect sense
+to me, but that caused a considerable amount of consternation and
+strife within my development team.  In spite of my attempts to explain
+why we needed a complex set of branches, and how changes ought to flow
+between them, a few team members revolted.  Even though they were
+smart people, they didn't want to pay attention to the constraints we
+were operating under, or face the consequences of those constraints in
+the details of the model that I was advocating.
+
+Don't sweep foreseeable social or technical problems under the rug.
+Whatever scheme you put into effect, you should plan for mistakes and
+problem scenarios.  Consider adding automated machinery to prevent, or
+quickly recover from, trouble that you can anticipate.  As an example,
+if you intend to have a branch with not-for-release changes in it,
+you'd do well to think early about the possibility that someone might
+accidentally merge those changes into a release branch.  You could
+avoid this particular problem by writing a hook that prevents changes
+from being merged from an inappropriate branch.
+
+\subsection{Informal anarchy}
+
+I wouldn't suggest an ``anything goes'' approach as something
+sustainable, but it's a model that's easy to grasp, and it works
+perfectly well in a few unusual situations.
+
+As one example, many projects have a loose-knit group of collaborators
+who rarely physically meet each other.  Some groups like to overcome
+the isolation of working at a distance by organising occasional
+``sprints''.  In a sprint, a number of people get together in a single
+location (a company's conference room, a hotel meeting room, that kind
+of place) and spend several days more or less locked in there, hacking
+intensely on a handful of projects.
+
+A sprint is the perfect place to use the \hgcmd{serve} command, since
+\hgcmd{serve} does not requires any fancy server infrastructure.  You
+can get started with \hgcmd{serve} in moments, by reading
+section~\ref{sec:collab:serve} below.  Then simply tell the person
+next to you that you're running a server, send the URL to them in an
+instant message, and you immediately have a quick-turnaround way to
+work together.  They can type your URL into their web browser and
+quickly review your changes; or they can pull a bugfix from you and
+verify it; or they can clone a branch containing a new feature and try
+it out.
+
+The charm, and the problem, with doing things in an ad hoc fashion
+like this is that only people who know about your changes, and where
+they are, can see them.  Such an informal approach simply doesn't
+scale beyond a handful people, because each individual needs to know
+about $n$ different repositories to pull from.
+
+\subsection{A single central repository}
+
+For smaller projects migrating from a centralised revision control
+tool, perhaps the easiest way to get started is to have changes flow
+through a single shared central repository.  This is also the
+most common ``building block'' for more ambitious workflow schemes.
+
+Contributors start by cloning a copy of this repository.  They can
+pull changes from it whenever they need to, and some (perhaps all)
+developers have permission to push a change back when they're ready
+for other people to see it.
+
+Under this model, it can still often make sense for people to pull
+changes directly from each other, without going through the central
+repository.  Consider a case in which I have a tentative bug fix, but
+I am worried that if I were to publish it to the central repository,
+it might subsequently break everyone else's trees as they pull it.  To
+reduce the potential for damage, I can ask you to clone my repository
+into a temporary repository of your own and test it.  This lets us put
+off publishing the potentially unsafe change until it has had a little
+testing.
+
+In this kind of scenario, people usually use the \command{ssh}
+protocol to securely push changes to the central repository, as
+documented in section~\ref{sec:collab:ssh}.  It's also usual to
+publish a read-only copy of the repository over HTTP using CGI, as in
+section~\ref{sec:collab:cgi}.  Publishing over HTTP satisfies the
+needs of people who don't have push access, and those who want to use
+web browsers to browse the repository's history.
+
+\subsection{Working with multiple branches}
+
+Projects of any significant size naturally tend to make progress on
+several fronts simultaneously.  In the case of software, it's common
+for a project to go through periodic official releases.  A release
+might then go into ``maintenance mode'' for a while after its first
+publication; maintenance releases tend to contain only bug fixes, not
+new features.  In parallel with these maintenance releases, one or
+more future releases may be under development.  People normally use
+the word ``branch'' to refer to one of these many slightly different
+directions in which development is proceeding.
+
+Mercurial is particularly well suited to managing a number of
+simultaneous, but not identical, branches.  Each ``development
+direction'' can live in its own central repository, and you can merge
+changes from one to another as the need arises.  Because repositories
+are independent of each other, unstable changes in a development
+branch will never affect a stable branch unless someone explicitly
+merges those changes in.
+
+Here's an example of how this can work in practice.  Let's say you
+have one ``main branch'' on a central server.
+\interaction{branching.init}
+People clone it, make changes locally, test them, and push them back.
+
+Once the main branch reaches a release milestone, you can use the
+\hgcmd{tag} command to give a permanent name to the milestone
+revision.
+\interaction{branching.tag}
+Let's say some ongoing development occurs on the main branch.
+\interaction{branching.main}
+Using the tag that was recorded at the milestone, people who clone
+that repository at any time in the future can use \hgcmd{update} to
+get a copy of the working directory exactly as it was when that tagged
+revision was committed.  
+\interaction{branching.update}
+
+In addition, immediately after the main branch is tagged, someone can
+then clone the main branch on the server to a new ``stable'' branch,
+also on the server.
+\interaction{branching.clone}
+
+Someone who needs to make a change to the stable branch can then clone
+\emph{that} repository, make their changes, commit, and push their
+changes back there.
+\interaction{branching.stable}
+Because Mercurial repositories are independent, and Mercurial doesn't
+move changes around automatically, the stable and main branches are
+\emph{isolated} from each other.  The changes that you made on the
+main branch don't ``leak'' to the stable branch, and vice versa.
+
+You'll often want all of your bugfixes on the stable branch to show up
+on the main branch, too.  Rather than rewrite a bugfix on the main
+branch, you can simply pull and merge changes from the stable to the
+main branch, and Mercurial will bring those bugfixes in for you.
+\interaction{branching.merge}
+The main branch will still contain changes that are not on the stable
+branch, but it will also contain all of the bugfixes from the stable
+branch.  The stable branch remains unaffected by these changes.
+
+\subsection{Feature branches}
+
+For larger projects, an effective way to manage change is to break up
+a team into smaller groups.  Each group has a shared branch of its
+own, cloned from a single ``master'' branch used by the entire
+project.  People working on an individual branch are typically quite
+isolated from developments on other branches.
+
+\begin{figure}[ht]
+  \centering
+  \grafix{feature-branches}
+  \caption{Feature branches}
+  \label{fig:collab:feature-branches}
+\end{figure}
+
+When a particular feature is deemed to be in suitable shape, someone
+on that feature team pulls and merges from the master branch into the
+feature branch, then pushes back up to the master branch.
+
+\subsection{The release train}
+
+Some projects are organised on a ``train'' basis: a release is
+scheduled to happen every few months, and whatever features are ready
+when the ``train'' is ready to leave are allowed in.
+
+This model resembles working with feature branches.  The difference is
+that when a feature branch misses a train, someone on the feature team
+pulls and merges the changes that went out on that train release into
+the feature branch, and the team continues its work on top of that
+release so that their feature can make the next release.
+
+\subsection{The Linux kernel model}
+
+The development of the Linux kernel has a shallow hierarchical
+structure, surrounded by a cloud of apparent chaos.  Because most
+Linux developers use \command{git}, a distributed revision control
+tool with capabilities similar to Mercurial, it's useful to describe
+the way work flows in that environment; if you like the ideas, the
+approach translates well across tools.
+
+At the center of the community sits Linus Torvalds, the creator of
+Linux.  He publishes a single source repository that is considered the
+``authoritative'' current tree by the entire developer community.
+Anyone can clone Linus's tree, but he is very choosy about whose trees
+he pulls from.
+
+Linus has a number of ``trusted lieutenants''.  As a general rule, he
+pulls whatever changes they publish, in most cases without even
+reviewing those changes.  Some of those lieutenants are generally
+agreed to be ``maintainers'', responsible for specific subsystems
+within the kernel.  If a random kernel hacker wants to make a change
+to a subsystem that they want to end up in Linus's tree, they must
+find out who the subsystem's maintainer is, and ask that maintainer to
+take their change.  If the maintainer reviews their changes and agrees
+to take them, they'll pass them along to Linus in due course.
+
+Individual lieutenants have their own approaches to reviewing,
+accepting, and publishing changes; and for deciding when to feed them
+to Linus.  In addition, there are several well known branches that
+people use for different purposes.  For example, a few people maintain
+``stable'' repositories of older versions of the kernel, to which they
+apply critical fixes as needed.  Some maintainers publish multiple
+trees: one for experimental changes; one for changes that they are
+about to feed upstream; and so on.  Others just publish a single
+tree.
+
+This model has two notable features.  The first is that it's ``pull
+only''.  You have to ask, convince, or beg another developer to take a
+change from you, because there are almost no trees to which more than
+one person can push, and there's no way to push changes into a tree
+that someone else controls.
+
+The second is that it's based on reputation and acclaim.  If you're an
+unknown, Linus will probably ignore changes from you without even
+responding.  But a subsystem maintainer will probably review them, and
+will likely take them if they pass their criteria for suitability.
+The more ``good'' changes you contribute to a maintainer, the more
+likely they are to trust your judgment and accept your changes.  If
+you're well-known and maintain a long-lived branch for something Linus
+hasn't yet accepted, people with similar interests may pull your
+changes regularly to keep up with your work.
+
+Reputation and acclaim don't necessarily cross subsystem or ``people''
+boundaries.  If you're a respected but specialised storage hacker, and
+you try to fix a networking bug, that change will receive a level of
+scrutiny from a network maintainer comparable to a change from a
+complete stranger.
+
+To people who come from more orderly project backgrounds, the
+comparatively chaotic Linux kernel development process often seems
+completely insane.  It's subject to the whims of individuals; people
+make sweeping changes whenever they deem it appropriate; and the pace
+of development is astounding.  And yet Linux is a highly successful,
+well-regarded piece of software.
+
+\subsection{Pull-only versus shared-push collaboration}
+
+A perpetual source of heat in the open source community is whether a
+development model in which people only ever pull changes from others
+is ``better than'' one in which multiple people can push changes to a
+shared repository.
+
+Typically, the backers of the shared-push model use tools that
+actively enforce this approach.  If you're using a centralised
+revision control tool such as Subversion, there's no way to make a
+choice over which model you'll use: the tool gives you shared-push,
+and if you want to do anything else, you'll have to roll your own
+approach on top (such as applying a patch by hand).
+
+A good distributed revision control tool, such as Mercurial, will
+support both models.  You and your collaborators can then structure
+how you work together based on your own needs and preferences, not on
+what contortions your tools force you into.
+
+\subsection{Where collaboration meets branch management}
+
+Once you and your team set up some shared repositories and start
+propagating changes back and forth between local and shared repos, you
+begin to face a related, but slightly different challenge: that of
+managing the multiple directions in which your team may be moving at
+once.  Even though this subject is intimately related to how your team
+collaborates, it's dense enough to merit treatment of its own, in
+chapter~\ref{chap:branch}.
+
+\section{The technical side of sharing}
+
+The remainder of this chapter is devoted to the question of serving
+data to your collaborators.
+
+\section{Informal sharing with \hgcmd{serve}}
+\label{sec:collab:serve}
+
+Mercurial's \hgcmd{serve} command is wonderfully suited to small,
+tight-knit, and fast-paced group environments.  It also provides a
+great way to get a feel for using Mercurial commands over a network.
+
+Run \hgcmd{serve} inside a repository, and in under a second it will
+bring up a specialised HTTP server; this will accept connections from
+any client, and serve up data for that repository until you terminate
+it.  Anyone who knows the URL of the server you just started, and can
+talk to your computer over the network, can then use a web browser or
+Mercurial to read data from that repository.  A URL for a
+\hgcmd{serve} instance running on a laptop is likely to look something
+like \Verb|http://my-laptop.local:8000/|.
+
+The \hgcmd{serve} command is \emph{not} a general-purpose web server.
+It can do only two things:
+\begin{itemize}
+\item Allow people to browse the history of the repository it's
+  serving, from their normal web browsers.
+\item Speak Mercurial's wire protocol, so that people can
+  \hgcmd{clone} or \hgcmd{pull} changes from that repository.
+\end{itemize}
+In particular, \hgcmd{serve} won't allow remote users to \emph{modify}
+your repository.  It's intended for read-only use.
+
+If you're getting started with Mercurial, there's nothing to prevent
+you from using \hgcmd{serve} to serve up a repository on your own
+computer, then use commands like \hgcmd{clone}, \hgcmd{incoming}, and
+so on to talk to that server as if the repository was hosted remotely.
+This can help you to quickly get acquainted with using commands on
+network-hosted repositories.
+
+\subsection{A few things to keep in mind}
+
+Because it provides unauthenticated read access to all clients, you
+should only use \hgcmd{serve} in an environment where you either don't
+care, or have complete control over, who can access your network and
+pull data from your repository.
+
+The \hgcmd{serve} command knows nothing about any firewall software
+you might have installed on your system or network.  It cannot detect
+or control your firewall software.  If other people are unable to talk
+to a running \hgcmd{serve} instance, the second thing you should do
+(\emph{after} you make sure that they're using the correct URL) is
+check your firewall configuration.
+
+By default, \hgcmd{serve} listens for incoming connections on
+port~8000.  If another process is already listening on the port you
+want to use, you can specify a different port to listen on using the
+\hgopt{serve}{-p} option.
+
+Normally, when \hgcmd{serve} starts, it prints no output, which can be
+a bit unnerving.  If you'd like to confirm that it is indeed running
+correctly, and find out what URL you should send to your
+collaborators, start it with the \hggopt{-v} option.
+
+\section{Using the Secure Shell (ssh) protocol}
+\label{sec:collab:ssh}
+
+You can pull and push changes securely over a network connection using
+the Secure Shell (\texttt{ssh}) protocol.  To use this successfully,
+you may have to do a little bit of configuration on the client or
+server sides.
+
+If you're not familiar with ssh, it's a network protocol that lets you
+securely communicate with another computer.  To use it with Mercurial,
+you'll be setting up one or more user accounts on a server so that
+remote users can log in and execute commands.
+
+(If you \emph{are} familiar with ssh, you'll probably find some of the
+material that follows to be elementary in nature.)
+
+\subsection{How to read and write ssh URLs}
+
+An ssh URL tends to look like this:
+\begin{codesample2}
+  ssh://bos@hg.serpentine.com:22/hg/hgbook
+\end{codesample2}
+\begin{enumerate}
+\item The ``\texttt{ssh://}'' part tells Mercurial to use the ssh
+  protocol.
+\item The ``\texttt{bos@}'' component indicates what username to log
+  into the server as.  You can leave this out if the remote username
+  is the same as your local username.
+\item The ``\texttt{hg.serpentine.com}'' gives the hostname of the
+  server to log into.
+\item The ``:22'' identifies the port number to connect to the server
+  on.  The default port is~22, so you only need to specify this part
+  if you're \emph{not} using port~22.
+\item The remainder of the URL is the local path to the repository on
+  the server.
+\end{enumerate}
+
+There's plenty of scope for confusion with the path component of ssh
+URLs, as there is no standard way for tools to interpret it.  Some
+programs behave differently than others when dealing with these paths.
+This isn't an ideal situation, but it's unlikely to change.  Please
+read the following paragraphs carefully.
+
+Mercurial treats the path to a repository on the server as relative to
+the remote user's home directory.  For example, if user \texttt{foo}
+on the server has a home directory of \dirname{/home/foo}, then an ssh
+URL that contains a path component of \dirname{bar}
+\emph{really} refers to the directory \dirname{/home/foo/bar}.
+
+If you want to specify a path relative to another user's home
+directory, you can use a path that starts with a tilde character
+followed by the user's name (let's call them \texttt{otheruser}), like
+this.
+\begin{codesample2}
+  ssh://server/~otheruser/hg/repo
+\end{codesample2}
+
+And if you really want to specify an \emph{absolute} path on the
+server, begin the path component with two slashes, as in this example.
+\begin{codesample2}
+  ssh://server//absolute/path
+\end{codesample2}
+
+\subsection{Finding an ssh client for your system}
+
+Almost every Unix-like system comes with OpenSSH preinstalled.  If
+you're using such a system, run \Verb|which ssh| to find out if
+the \command{ssh} command is installed (it's usually in
+\dirname{/usr/bin}).  In the unlikely event that it isn't present,
+take a look at your system documentation to figure out how to install
+it.
+
+On Windows, you'll first need to choose download a suitable ssh
+client.  There are two alternatives.
+\begin{itemize}
+\item Simon Tatham's excellent PuTTY package~\cite{web:putty} provides
+  a complete suite of ssh client commands.
+\item If you have a high tolerance for pain, you can use the Cygwin
+  port of OpenSSH.
+\end{itemize}
+In either case, you'll need to edit your \hgini\ file to tell
+Mercurial where to find the actual client command.  For example, if
+you're using PuTTY, you'll need to use the \command{plink} command as
+a command-line ssh client.
+\begin{codesample2}
+  [ui]
+  ssh = C:/path/to/plink.exe -ssh -i "C:/path/to/my/private/key"
+\end{codesample2}
+
+\begin{note}
+  The path to \command{plink} shouldn't contain any whitespace
+  characters, or Mercurial may not be able to run it correctly (so
+  putting it in \dirname{C:\\Program Files} is probably not a good
+  idea).
+\end{note}
+
+\subsection{Generating a key pair}
+
+To avoid the need to repetitively type a password every time you need
+to use your ssh client, I recommend generating a key pair.  On a
+Unix-like system, the \command{ssh-keygen} command will do the trick.
+On Windows, if you're using PuTTY, the \command{puttygen} command is
+what you'll need.
+
+When you generate a key pair, it's usually \emph{highly} advisable to
+protect it with a passphrase.  (The only time that you might not want
+to do this id when you're using the ssh protocol for automated tasks
+on a secure network.)
+
+Simply generating a key pair isn't enough, however.  You'll need to
+add the public key to the set of authorised keys for whatever user
+you're logging in remotely as.  For servers using OpenSSH (the vast
+majority), this will mean adding the public key to a list in a file
+called \sfilename{authorized\_keys} in their \sdirname{.ssh}
+directory.
+
+On a Unix-like system, your public key will have a \filename{.pub}
+extension.  If you're using \command{puttygen} on Windows, you can
+save the public key to a file of your choosing, or paste it from the
+window it's displayed in straight into the
+\sfilename{authorized\_keys} file.
+
+\subsection{Using an authentication agent}
+
+An authentication agent is a daemon that stores passphrases in memory
+(so it will forget passphrases if you log out and log back in again).
+An ssh client will notice if it's running, and query it for a
+passphrase.  If there's no authentication agent running, or the agent
+doesn't store the necessary passphrase, you'll have to type your
+passphrase every time Mercurial tries to communicate with a server on
+your behalf (e.g.~whenever you pull or push changes).
+
+The downside of storing passphrases in an agent is that it's possible
+for a well-prepared attacker to recover the plain text of your
+passphrases, in some cases even if your system has been power-cycled.
+You should make your own judgment as to whether this is an acceptable
+risk.  It certainly saves a lot of repeated typing.
+
+On Unix-like systems, the agent is called \command{ssh-agent}, and
+it's often run automatically for you when you log in.  You'll need to
+use the \command{ssh-add} command to add passphrases to the agent's
+store.  On Windows, if you're using PuTTY, the \command{pageant}
+command acts as the agent.  It adds an icon to your system tray that
+will let you manage stored passphrases.
+
+\subsection{Configuring the server side properly}
+
+Because ssh can be fiddly to set up if you're new to it, there's a
+variety of things that can go wrong.  Add Mercurial on top, and
+there's plenty more scope for head-scratching.  Most of these
+potential problems occur on the server side, not the client side.  The
+good news is that once you've gotten a configuration working, it will
+usually continue to work indefinitely.
+
+Before you try using Mercurial to talk to an ssh server, it's best to
+make sure that you can use the normal \command{ssh} or \command{putty}
+command to talk to the server first.  If you run into problems with
+using these commands directly, Mercurial surely won't work.  Worse, it
+will obscure the underlying problem.  Any time you want to debug
+ssh-related Mercurial problems, you should drop back to making sure
+that plain ssh client commands work first, \emph{before} you worry
+about whether there's a problem with Mercurial.
+
+The first thing to be sure of on the server side is that you can
+actually log in from another machine at all.  If you can't use
+\command{ssh} or \command{putty} to log in, the error message you get
+may give you a few hints as to what's wrong.  The most common problems
+are as follows.
+\begin{itemize}
+\item If you get a ``connection refused'' error, either there isn't an
+  SSH daemon running on the server at all, or it's inaccessible due to
+  firewall configuration.
+\item If you get a ``no route to host'' error, you either have an
+  incorrect address for the server or a seriously locked down firewall
+  that won't admit its existence at all.
+\item If you get a ``permission denied'' error, you may have mistyped
+  the username on the server, or you could have mistyped your key's
+  passphrase or the remote user's password.
+\end{itemize}
+In summary, if you're having trouble talking to the server's ssh
+daemon, first make sure that one is running at all.  On many systems
+it will be installed, but disabled, by default.  Once you're done with
+this step, you should then check that the server's firewall is
+configured to allow incoming connections on the port the ssh daemon is
+listening on (usually~22).  Don't worry about more exotic
+possibilities for misconfiguration until you've checked these two
+first.
+
+If you're using an authentication agent on the client side to store
+passphrases for your keys, you ought to be able to log into the server
+without being prompted for a passphrase or a password.  If you're
+prompted for a passphrase, there are a few possible culprits.
+\begin{itemize}
+\item You might have forgotten to use \command{ssh-add} or
+  \command{pageant} to store the passphrase.
+\item You might have stored the passphrase for the wrong key.
+\end{itemize}
+If you're being prompted for the remote user's password, there are
+another few possible problems to check.
+\begin{itemize}
+\item Either the user's home directory or their \sdirname{.ssh}
+  directory might have excessively liberal permissions.  As a result,
+  the ssh daemon will not trust or read their
+  \sfilename{authorized\_keys} file.  For example, a group-writable
+  home or \sdirname{.ssh} directory will often cause this symptom.
+\item The user's \sfilename{authorized\_keys} file may have a problem.
+  If anyone other than the user owns or can write to that file, the
+  ssh daemon will not trust or read it.
+\end{itemize}
+
+In the ideal world, you should be able to run the following command
+successfully, and it should print exactly one line of output, the
+current date and time.
+\begin{codesample2}
+  ssh myserver date
+\end{codesample2}
+
+If, on your server, you have login scripts that print banners or other
+junk even when running non-interactive commands like this, you should
+fix them before you continue, so that they only print output if
+they're run interactively.  Otherwise these banners will at least
+clutter up Mercurial's output.  Worse, they could potentially cause
+problems with running Mercurial commands remotely.  Mercurial makes
+tries to detect and ignore banners in non-interactive \command{ssh}
+sessions, but it is not foolproof.  (If you're editing your login
+scripts on your server, the usual way to see if a login script is
+running in an interactive shell is to check the return code from the
+command \Verb|tty -s|.)
+
+Once you've verified that plain old ssh is working with your server,
+the next step is to ensure that Mercurial runs on the server.  The
+following command should run successfully:
+\begin{codesample2}
+  ssh myserver hg version
+\end{codesample2}
+If you see an error message instead of normal \hgcmd{version} output,
+this is usually because you haven't installed Mercurial to
+\dirname{/usr/bin}.  Don't worry if this is the case; you don't need
+to do that.  But you should check for a few possible problems.
+\begin{itemize}
+\item Is Mercurial really installed on the server at all?  I know this
+  sounds trivial, but it's worth checking!
+\item Maybe your shell's search path (usually set via the \envar{PATH}
+  environment variable) is simply misconfigured.
+\item Perhaps your \envar{PATH} environment variable is only being set
+  to point to the location of the \command{hg} executable if the login
+  session is interactive.  This can happen if you're setting the path
+  in the wrong shell login script.  See your shell's documentation for
+  details.
+\item The \envar{PYTHONPATH} environment variable may need to contain
+  the path to the Mercurial Python modules.  It might not be set at
+  all; it could be incorrect; or it may be set only if the login is
+  interactive.
+\end{itemize}
+
+If you can run \hgcmd{version} over an ssh connection, well done!
+You've got the server and client sorted out.  You should now be able
+to use Mercurial to access repositories hosted by that username on
+that server.  If you run into problems with Mercurial and ssh at this
+point, try using the \hggopt{--debug} option to get a clearer picture
+of what's going on.
+
+\subsection{Using compression with ssh}
+
+Mercurial does not compress data when it uses the ssh protocol,
+because the ssh protocol can transparently compress data.  However,
+the default behaviour of ssh clients is \emph{not} to request
+compression.
+
+Over any network other than a fast LAN (even a wireless network),
+using compression is likely to significantly speed up Mercurial's
+network operations.  For example, over a WAN, someone measured
+compression as reducing the amount of time required to clone a
+particularly large repository from~51 minutes to~17 minutes.
+
+Both \command{ssh} and \command{plink} accept a \cmdopt{ssh}{-C}
+option which turns on compression.  You can easily edit your \hgrc\ to
+enable compression for all of Mercurial's uses of the ssh protocol.
+\begin{codesample2}
+  [ui]
+  ssh = ssh -C
+\end{codesample2}
+
+If you use \command{ssh}, you can configure it to always use
+compression when talking to your server.  To do this, edit your
+\sfilename{.ssh/config} file (which may not yet exist), as follows.
+\begin{codesample2}
+  Host hg
+    Compression yes
+    HostName hg.example.com
+\end{codesample2}
+This defines an alias, \texttt{hg}.  When you use it on the
+\command{ssh} command line or in a Mercurial \texttt{ssh}-protocol
+URL, it will cause \command{ssh} to connect to \texttt{hg.example.com}
+and use compression.  This gives you both a shorter name to type and
+compression, each of which is a good thing in its own right.
+
+\section{Serving over HTTP using CGI}
+\label{sec:collab:cgi}
+
+Depending on how ambitious you are, configuring Mercurial's CGI
+interface can take anything from a few moments to several hours.
+
+We'll begin with the simplest of examples, and work our way towards a
+more complex configuration.  Even for the most basic case, you're
+almost certainly going to need to read and modify your web server's
+configuration.
+
+\begin{note}
+  Configuring a web server is a complex, fiddly, and highly
+  system-dependent activity.  I can't possibly give you instructions
+  that will cover anything like all of the cases you will encounter.
+  Please use your discretion and judgment in following the sections
+  below.  Be prepared to make plenty of mistakes, and to spend a lot
+  of time reading your server's error logs.
+\end{note}
+
+\subsection{Web server configuration checklist}
+
+Before you continue, do take a few moments to check a few aspects of
+your system's setup.
+
+\begin{enumerate}
+\item Do you have a web server installed at all?  Mac OS X ships with
+  Apache, but many other systems may not have a web server installed.
+\item If you have a web server installed, is it actually running?  On
+  most systems, even if one is present, it will be disabled by
+  default.
+\item Is your server configured to allow you to run CGI programs in
+  the directory where you plan to do so?  Most servers default to
+  explicitly disabling the ability to run CGI programs.
+\end{enumerate}
+
+If you don't have a web server installed, and don't have substantial
+experience configuring Apache, you should consider using the
+\texttt{lighttpd} web server instead of Apache.  Apache has a
+well-deserved reputation for baroque and confusing configuration.
+While \texttt{lighttpd} is less capable in some ways than Apache, most
+of these capabilities are not relevant to serving Mercurial
+repositories.  And \texttt{lighttpd} is undeniably \emph{much} easier
+to get started with than Apache.
+
+\subsection{Basic CGI configuration}
+
+On Unix-like systems, it's common for users to have a subdirectory
+named something like \dirname{public\_html} in their home directory,
+from which they can serve up web pages.  A file named \filename{foo}
+in this directory will be accessible at a URL of the form
+\texttt{http://www.example.com/\~username/foo}.
+
+To get started, find the \sfilename{hgweb.cgi} script that should be
+present in your Mercurial installation.  If you can't quickly find a
+local copy on your system, simply download one from the master
+Mercurial repository at
+\url{http://www.selenic.com/repo/hg/raw-file/tip/hgweb.cgi}.
+
+You'll need to copy this script into your \dirname{public\_html}
+directory, and ensure that it's executable.
+\begin{codesample2}
+  cp .../hgweb.cgi ~/public_html
+  chmod 755 ~/public_html/hgweb.cgi
+\end{codesample2}
+The \texttt{755} argument to \command{chmod} is a little more general
+than just making the script executable: it ensures that the script is
+executable by anyone, and that ``group'' and ``other'' write
+permissions are \emph{not} set.  If you were to leave those write
+permissions enabled, Apache's \texttt{suexec} subsystem would likely
+refuse to execute the script.  In fact, \texttt{suexec} also insists
+that the \emph{directory} in which the script resides must not be
+writable by others.
+\begin{codesample2}
+  chmod 755 ~/public_html
+\end{codesample2}
+
+\subsubsection{What could \emph{possibly} go wrong?}
+\label{sec:collab:wtf}
+
+Once you've copied the CGI script into place, go into a web browser,
+and try to open the URL \url{http://myhostname/~myuser/hgweb.cgi},
+\emph{but} brace yourself for instant failure.  There's a high
+probability that trying to visit this URL will fail, and there are
+many possible reasons for this.  In fact, you're likely to stumble
+over almost every one of the possible errors below, so please read
+carefully.  The following are all of the problems I ran into on a
+system running Fedora~7, with a fresh installation of Apache, and a
+user account that I created specially to perform this exercise.
+
+Your web server may have per-user directories disabled.  If you're
+using Apache, search your config file for a \texttt{UserDir}
+directive.  If there's none present, per-user directories will be
+disabled.  If one exists, but its value is \texttt{disabled}, then
+per-user directories will be disabled.  Otherwise, the string after
+\texttt{UserDir} gives the name of the subdirectory that Apache will
+look in under your home directory, for example \dirname{public\_html}.
+
+Your file access permissions may be too restrictive.  The web server
+must be able to traverse your home directory and directories under
+your \dirname{public\_html} directory, and read files under the latter
+too.  Here's a quick recipe to help you to make your permissions more
+appropriate.
+\begin{codesample2}
+  chmod 755 ~
+  find ~/public_html -type d -print0 | xargs -0r chmod 755
+  find ~/public_html -type f -print0 | xargs -0r chmod 644
+\end{codesample2}
+
+The other possibility with permissions is that you might get a
+completely empty window when you try to load the script.  In this
+case, it's likely that your access permissions are \emph{too
+  permissive}.  Apache's \texttt{suexec} subsystem won't execute a
+script that's group-~or world-writable, for example.
+
+Your web server may be configured to disallow execution of CGI
+programs in your per-user web directory.  Here's Apache's
+default per-user configuration from my Fedora system.
+\begin{codesample2}
+  <Directory /home/*/public_html>
+      AllowOverride FileInfo AuthConfig Limit
+      Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
+      <Limit GET POST OPTIONS>
+          Order allow,deny
+          Allow from all
+      </Limit>
+      <LimitExcept GET POST OPTIONS>
+          Order deny,allow
+          Deny from all
+      </LimitExcept>
+  </Directory>
+\end{codesample2}
+If you find a similar-looking \texttt{Directory} group in your Apache
+configuration, the directive to look at inside it is \texttt{Options}.
+Add \texttt{ExecCGI} to the end of this list if it's missing, and
+restart the web server.
+
+If you find that Apache serves you the text of the CGI script instead
+of executing it, you may need to either uncomment (if already present)
+or add a directive like this.
+\begin{codesample2}
+  AddHandler cgi-script .cgi
+\end{codesample2}
+
+The next possibility is that you might be served with a colourful
+Python backtrace claiming that it can't import a
+\texttt{mercurial}-related module.  This is actually progress!  The
+server is now capable of executing your CGI script.  This error is
+only likely to occur if you're running a private installation of
+Mercurial, instead of a system-wide version.  Remember that the web
+server runs the CGI program without any of the environment variables
+that you take for granted in an interactive session.  If this error
+happens to you, edit your copy of \sfilename{hgweb.cgi} and follow the
+directions inside it to correctly set your \envar{PYTHONPATH}
+environment variable.
+
+Finally, you are \emph{certain} to by served with another colourful
+Python backtrace: this one will complain that it can't find
+\dirname{/path/to/repository}.  Edit your \sfilename{hgweb.cgi} script
+and replace the \dirname{/path/to/repository} string with the complete
+path to the repository you want to serve up.
+
+At this point, when you try to reload the page, you should be
+presented with a nice HTML view of your repository's history.  Whew!
+
+\subsubsection{Configuring lighttpd}
+
+To be exhaustive in my experiments, I tried configuring the
+increasingly popular \texttt{lighttpd} web server to serve the same
+repository as I described with Apache above.  I had already overcome
+all of the problems I outlined with Apache, many of which are not
+server-specific.  As a result, I was fairly sure that my file and
+directory permissions were good, and that my \sfilename{hgweb.cgi}
+script was properly edited.
+
+Once I had Apache running, getting \texttt{lighttpd} to serve the
+repository was a snap (in other words, even if you're trying to use
+\texttt{lighttpd}, you should read the Apache section).  I first had
+to edit the \texttt{mod\_access} section of its config file to enable
+\texttt{mod\_cgi} and \texttt{mod\_userdir}, both of which were
+disabled by default on my system.  I then added a few lines to the end
+of the config file, to configure these modules.
+\begin{codesample2}
+  userdir.path = "public_html"
+  cgi.assign = ( ".cgi" => "" )
+\end{codesample2}
+With this done, \texttt{lighttpd} ran immediately for me.  If I had
+configured \texttt{lighttpd} before Apache, I'd almost certainly have
+run into many of the same system-level configuration problems as I did
+with Apache.  However, I found \texttt{lighttpd} to be noticeably
+easier to configure than Apache, even though I've used Apache for over
+a decade, and this was my first exposure to \texttt{lighttpd}.
+
+\subsection{Sharing multiple repositories with one CGI script}
+
+The \sfilename{hgweb.cgi} script only lets you publish a single
+repository, which is an annoying restriction.  If you want to publish
+more than one without wracking yourself with multiple copies of the
+same script, each with different names, a better choice is to use the
+\sfilename{hgwebdir.cgi} script.
+
+The procedure to configure \sfilename{hgwebdir.cgi} is only a little
+more involved than for \sfilename{hgweb.cgi}.  First, you must obtain
+a copy of the script.  If you don't have one handy, you can download a
+copy from the master Mercurial repository at
+\url{http://www.selenic.com/repo/hg/raw-file/tip/hgwebdir.cgi}.
+
+You'll need to copy this script into your \dirname{public\_html}
+directory, and ensure that it's executable.
+\begin{codesample2}
+  cp .../hgwebdir.cgi ~/public_html
+  chmod 755 ~/public_html ~/public_html/hgwebdir.cgi
+\end{codesample2}
+With basic configuration out of the way, try to visit
+\url{http://myhostname/~myuser/hgwebdir.cgi} in your browser.  It
+should display an empty list of repositories.  If you get a blank
+window or error message, try walking through the list of potential
+problems in section~\ref{sec:collab:wtf}.
+
+The \sfilename{hgwebdir.cgi} script relies on an external
+configuration file.  By default, it searches for a file named
+\sfilename{hgweb.config} in the same directory as itself.  You'll need
+to create this file, and make it world-readable.  The format of the
+file is similar to a Windows ``ini'' file, as understood by Python's
+\texttt{ConfigParser}~\cite{web:configparser} module.
+
+The easiest way to configure \sfilename{hgwebdir.cgi} is with a
+section named \texttt{collections}.  This will automatically publish
+\emph{every} repository under the directories you name.  The section
+should look like this:
+\begin{codesample2}
+  [collections]
+  /my/root = /my/root
+\end{codesample2}
+Mercurial interprets this by looking at the directory name on the
+\emph{right} hand side of the ``\texttt{=}'' sign; finding
+repositories in that directory hierarchy; and using the text on the
+\emph{left} to strip off matching text from the names it will actually
+list in the web interface.  The remaining component of a path after
+this stripping has occurred is called a ``virtual path''.
+
+Given the example above, if we have a repository whose local path is
+\dirname{/my/root/this/repo}, the CGI script will strip the leading
+\dirname{/my/root} from the name, and publish the repository with a
+virtual path of \dirname{this/repo}.  If the base URL for our CGI
+script is \url{http://myhostname/~myuser/hgwebdir.cgi}, the complete
+URL for that repository will be
+\url{http://myhostname/~myuser/hgwebdir.cgi/this/repo}.
+
+If we replace \dirname{/my/root} on the left hand side of this example
+with \dirname{/my}, then \sfilename{hgwebdir.cgi} will only strip off
+\dirname{/my} from the repository name, and will give us a virtual
+path of \dirname{root/this/repo} instead of \dirname{this/repo}.
+
+The \sfilename{hgwebdir.cgi} script will recursively search each
+directory listed in the \texttt{collections} section of its
+configuration file, but it will \texttt{not} recurse into the
+repositories it finds.
+
+The \texttt{collections} mechanism makes it easy to publish many
+repositories in a ``fire and forget'' manner.  You only need to set up
+the CGI script and configuration file one time.  Afterwards, you can
+publish or unpublish a repository at any time by simply moving it
+into, or out of, the directory hierarchy in which you've configured
+\sfilename{hgwebdir.cgi} to look.
+
+\subsubsection{Explicitly specifying which repositories to publish}
+
+In addition to the \texttt{collections} mechanism, the
+\sfilename{hgwebdir.cgi} script allows you to publish a specific list
+of repositories.  To do so, create a \texttt{paths} section, with
+contents of the following form.
+\begin{codesample2}
+  [paths]
+  repo1 = /my/path/to/some/repo
+  repo2 = /some/path/to/another
+\end{codesample2}
+In this case, the virtual path (the component that will appear in a
+URL) is on the left hand side of each definition, while the path to
+the repository is on the right.  Notice that there does not need to be
+any relationship between the virtual path you choose and the location
+of a repository in your filesystem.
+
+If you wish, you can use both the \texttt{collections} and
+\texttt{paths} mechanisms simultaneously in a single configuration
+file.
+
+\begin{note}
+  If multiple repositories have the same virtual path,
+  \sfilename{hgwebdir.cgi} will not report an error.  Instead, it will
+  behave unpredictably.
+\end{note}
+
+\subsection{Downloading source archives}
+
+Mercurial's web interface lets users download an archive of any
+revision.  This archive will contain a snapshot of the working
+directory as of that revision, but it will not contain a copy of the
+repository data.
+
+By default, this feature is not enabled.  To enable it, you'll need to
+add an \rcitem{web}{allow\_archive} item to the \rcsection{web}
+section of your \hgrc.
+
+\subsection{Web configuration options}
+
+Mercurial's web interfaces (the \hgcmd{serve} command, and the
+\sfilename{hgweb.cgi} and \sfilename{hgwebdir.cgi} scripts) have a
+number of configuration options that you can set.  These belong in a
+section named \rcsection{web}.
+\begin{itemize}
+\item[\rcitem{web}{allow\_archive}] Determines which (if any) archive
+  download mechanisms Mercurial supports.  If you enable this
+  feature, users of the web interface will be able to download an
+  archive of whatever revision of a repository they are viewing.
+  To enable the archive feature, this item must take the form of a
+  sequence of words drawn from the list below.
+  \begin{itemize}
+  \item[\texttt{bz2}] A \command{tar} archive, compressed using
+    \texttt{bzip2} compression.  This has the best compression ratio,
+    but uses the most CPU time on the server.
+  \item[\texttt{gz}] A \command{tar} archive, compressed using
+    \texttt{gzip} compression.
+  \item[\texttt{zip}] A \command{zip} archive, compressed using LZW
+    compression.  This format has the worst compression ratio, but is
+    widely used in the Windows world.
+  \end{itemize}
+  If you provide an empty list, or don't have an
+  \rcitem{web}{allow\_archive} entry at all, this feature will be
+  disabled.  Here is an example of how to enable all three supported
+  formats.
+  \begin{codesample4}
+    [web]
+    allow_archive = bz2 gz zip
+  \end{codesample4}
+\item[\rcitem{web}{allowpull}] Boolean.  Determines whether the web
+  interface allows remote users to \hgcmd{pull} and \hgcmd{clone} this
+  repository over~HTTP.  If set to \texttt{no} or \texttt{false}, only
+  the ``human-oriented'' portion of the web interface is available.
+\item[\rcitem{web}{contact}] String.  A free-form (but preferably
+  brief) string identifying the person or group in charge of the
+  repository.  This often contains the name and email address of a
+  person or mailing list.  It often makes sense to place this entry in
+  a repository's own \sfilename{.hg/hgrc} file, but it can make sense
+  to use in a global \hgrc\ if every repository has a single
+  maintainer.
+\item[\rcitem{web}{maxchanges}] Integer.  The default maximum number
+  of changesets to display in a single page of output.
+\item[\rcitem{web}{maxfiles}] Integer.  The default maximum number
+  of modified files to display in a single page of output.
+\item[\rcitem{web}{stripes}] Integer.  If the web interface displays
+  alternating ``stripes'' to make it easier to visually align rows
+  when you are looking at a table, this number controls the number of
+  rows in each stripe.
+\item[\rcitem{web}{style}] Controls the template Mercurial uses to
+  display the web interface.  Mercurial ships with two web templates,
+  named \texttt{default} and \texttt{gitweb} (the latter is much more
+  visually attractive).  You can also specify a custom template of
+  your own; see chapter~\ref{chap:template} for details.  Here, you
+  can see how to enable the \texttt{gitweb} style.
+  \begin{codesample4}
+    [web]
+    style = gitweb
+  \end{codesample4}
+\item[\rcitem{web}{templates}] Path.  The directory in which to search
+  for template files.  By default, Mercurial searches in the directory
+  in which it was installed.
+\end{itemize}
+If you are using \sfilename{hgwebdir.cgi}, you can place a few
+configuration items in a \rcsection{web} section of the
+\sfilename{hgweb.config} file instead of a \hgrc\ file, for
+convenience.  These items are \rcitem{web}{motd} and
+\rcitem{web}{style}.
+
+\subsubsection{Options specific to an individual repository}
+
+A few \rcsection{web} configuration items ought to be placed in a
+repository's local \sfilename{.hg/hgrc}, rather than a user's or
+global \hgrc.
+\begin{itemize}
+\item[\rcitem{web}{description}] String.  A free-form (but preferably
+  brief) string that describes the contents or purpose of the
+  repository.
+\item[\rcitem{web}{name}] String.  The name to use for the repository
+  in the web interface.  This overrides the default name, which is the
+  last component of the repository's path.
+\end{itemize}
+
+\subsubsection{Options specific to the \hgcmd{serve} command}
+
+Some of the items in the \rcsection{web} section of a \hgrc\ file are
+only for use with the \hgcmd{serve} command.
+\begin{itemize}
+\item[\rcitem{web}{accesslog}] Path.  The name of a file into which to
+  write an access log.  By default, the \hgcmd{serve} command writes
+  this information to standard output, not to a file.  Log entries are
+  written in the standard ``combined'' file format used by almost all
+  web servers.
+\item[\rcitem{web}{address}] String.  The local address on which the
+  server should listen for incoming connections.  By default, the
+  server listens on all addresses.
+\item[\rcitem{web}{errorlog}] Path.  The name of a file into which to
+  write an error log.  By default, the \hgcmd{serve} command writes this
+  information to standard error, not to a file.
+\item[\rcitem{web}{ipv6}] Boolean.  Whether to use the IPv6 protocol.
+  By default, IPv6 is not used. 
+\item[\rcitem{web}{port}] Integer.  The TCP~port number on which the
+  server should listen.  The default port number used is~8000.
+\end{itemize}
+
+\subsubsection{Choosing the right \hgrc\ file to add \rcsection{web}
+  items to}
+
+It is important to remember that a web server like Apache or
+\texttt{lighttpd} will run under a user~ID that is different to yours.
+CGI scripts run by your server, such as \sfilename{hgweb.cgi}, will
+usually also run under that user~ID.
+
+If you add \rcsection{web} items to your own personal \hgrc\ file, CGI
+scripts won't read that \hgrc\ file.  Those settings will thus only
+affect the behaviour of the \hgcmd{serve} command when you run it.  To
+cause CGI scripts to see your settings, either create a \hgrc\ file in
+the home directory of the user ID that runs your web server, or add
+those settings to a system-wide \hgrc\ file.
+
+
+%%% Local Variables: 
+%%% mode: latex
+%%% TeX-master: "00book"
+%%% End: