view en/collab.tex @ 196:4237e45506ee
Add early material describing tags.
author | Bryan O'Sullivan <bos@serpentine.com> |
---|---|
date | Mon, 16 Apr 2007 16:11:24 -0700 |
\chapter{Collaborating with other people}
\label{cha:collab}

As a completely decentralised tool, Mercurial doesn't impose any policy on how people ought to work with each other. However, if you're new to distributed revision control, it helps to have some tools and examples in mind when you're thinking about possible workflow models.

\section{Collaboration models}

With a suitably flexible tool, making decisions about workflow is much more of a social engineering challenge than a technical one. Mercurial imposes few limitations on how you can structure the flow of work in a project, so it's up to you and your group to set up and live with a model that matches your own particular needs.

\subsection{Factors to keep in mind}

The most important aspect of any model that you must keep in mind is how well it matches the needs and capabilities of the people who will be using it. This might seem self-evident; even so, you still can't afford to forget it for a moment.

I once put together a workflow model that seemed to make perfect sense to me, but that caused a considerable amount of consternation and strife within my development team. In spite of my attempts to explain why we needed a complex set of branches, and how changes ought to flow between them, a few team members revolted. Even though they were smart people, they didn't want to pay attention to the constraints we were operating under, or face the consequences of those constraints in the details of the model that I was advocating.

Don't sweep foreseeable social or technical problems under the rug. Whatever scheme you put into effect, you should plan for mistakes and problem scenarios. Consider adding automated machinery to prevent, or quickly recover from, trouble that you can anticipate. As an example, if you intend to have a branch with not-for-release changes in it, you'd do well to think early about the possibility that someone might accidentally merge those changes into a release branch.
You could avoid this particular problem by writing a hook that prevents changes from being merged from an inappropriate branch.

\subsection{Informal anarchy}

I wouldn't suggest an ``anything goes'' approach as something sustainable, but it's a model that's easy to grasp, and it works perfectly well in a few unusual situations.

As one example, many projects have a loose-knit group of collaborators who rarely physically meet each other. Some groups like to overcome the isolation of working at a distance by organising occasional ``sprints''. In a sprint, a number of people get together in a single location (a company's conference room, a hotel meeting room, that kind of place) and spend several days more or less locked in there, hacking intensely on a handful of projects.

A sprint is the perfect place to use the \hgcmd{serve} command, since \hgcmd{serve} does not require any fancy server infrastructure. You can get started with \hgcmd{serve} in moments, by reading section~\ref{sec:collab:serve} below. Then simply tell the person next to you that you're running a server, send the URL to them in an instant message, and you immediately have a quick-turnaround way to work together. They can type your URL into their web browser and quickly review your changes; or they can pull a bugfix from you and verify it; or they can clone a branch containing a new feature and try it out.

The charm, and the problem, with doing things in an ad hoc fashion like this is that only people who know about your changes, and where they are, can see them. Such an informal approach simply doesn't scale beyond a handful of people, because each individual needs to know about $n$ different repositories to pull from.

\subsection{A single central repository}

For smaller projects migrating from a centralised revision control tool, perhaps the easiest way to get started is to have changes flow through a single shared central repository.
This is also the most common ``building block'' for more ambitious workflow schemes.

Contributors start by cloning a copy of this repository. They can pull changes from it whenever they need to, and some (perhaps all) developers have permission to push a change back when they're ready for other people to see it.

Under this model, it can still often make sense for people to pull changes directly from each other, without going through the central repository. Consider a case in which I have a tentative bug fix, but I am worried that if I were to publish it to the central repository, it might subsequently break everyone else's trees as they pull it. To reduce the potential for damage, I can ask you to clone my repository into a temporary repository of your own and test it. This lets us put off publishing the potentially unsafe change until it has had a little testing.

In this kind of scenario, people usually use the \command{ssh} protocol to securely push changes to the central repository, as documented in section~\ref{sec:collab:ssh}. It's also usual to publish a read-only copy of the repository over HTTP using CGI, as in section~\ref{sec:collab:cgi}. Publishing over HTTP satisfies the needs of people who don't have push access, and those who want to use web browsers to browse the repository's history.

\subsection{Working with multiple branches}

Projects of any significant size naturally tend to make progress on several fronts simultaneously. In the case of software, it's common for a project to go through periodic official releases. A release might then go into ``maintenance mode'' for a while after its first publication; maintenance releases tend to contain only bug fixes, not new features. In parallel with these maintenance releases, one or more future releases may be under development. People normally use the word ``branch'' to refer to one of these many slightly different directions in which development is proceeding.
Mercurial is particularly well suited to managing a number of simultaneous, but not identical, branches. Each ``development direction'' can live in its own central repository, and you can merge changes from one to another as the need arises. Because repositories are independent of each other, unstable changes in a development branch will never affect a stable branch unless someone explicitly merges those changes in.

Here's an example of how this can work in practice. Let's say you have one ``main branch'' on a central server.
\interaction{branching.init}
People clone it, make changes locally, test them, and push them back.

Once the main branch reaches a release milestone, you can use the \hgcmd{tag} command to give a permanent name to the milestone revision.
\interaction{branching.tag}
Let's say some ongoing development occurs on the main branch.
\interaction{branching.main}
Using the tag that was recorded at the milestone, people who clone that repository at any time in the future can use \hgcmd{update} to get a copy of the working directory exactly as it was when that tagged revision was committed.
\interaction{branching.update}

In addition, immediately after the main branch is tagged, someone can then clone the main branch on the server to a new ``stable'' branch, also on the server.
\interaction{branching.clone}

Someone who needs to make a change to the stable branch can then clone \emph{that} repository, make their changes, commit, and push their changes back there.
\interaction{branching.stable}
Because Mercurial repositories are independent, and Mercurial doesn't move changes around automatically, the stable and main branches are \emph{isolated} from each other. The changes that you made on the main branch don't ``leak'' to the stable branch, and vice versa.

You'll often want all of your bugfixes on the stable branch to show up on the main branch, too.
Rather than rewrite a bugfix on the main branch, you can simply pull and merge changes from the stable to the main branch, and Mercurial will bring those bugfixes in for you.
\interaction{branching.merge}
The main branch will still contain changes that are not on the stable branch, but it will also contain all of the bugfixes from the stable branch. The stable branch remains unaffected by these changes.

\subsection{Feature branches}

For larger projects, an effective way to manage change is to break up a team into smaller groups. Each group has a shared branch of its own, cloned from a single ``master'' branch used by the entire project. People working on an individual branch are typically quite isolated from developments on other branches.

\begin{figure}[ht]
  \centering
  \grafix{feature-branches}
  \caption{Feature branches}
  \label{fig:collab:feature-branches}
\end{figure}

When a particular feature is deemed to be in suitable shape, someone on that feature team pulls and merges from the master branch into the feature branch, then pushes back up to the master branch.

\subsection{The release train}

Some projects are organised on a ``train'' basis: a release is scheduled to happen every few months, and whatever features are ready when the ``train'' is ready to leave are allowed in.

This model resembles working with feature branches. The difference is that when a feature branch misses a train, someone on the feature team pulls and merges the changes that went out on that train release into the feature branch, and the team continues its work on top of that release so that their feature can make the next release.

\subsection{The Linux kernel model}

The development of the Linux kernel has a shallow hierarchical structure, surrounded by a cloud of apparent chaos.
Because most Linux developers use \command{git}, a distributed revision control tool with capabilities similar to Mercurial, it's useful to describe the way work flows in that environment; if you like the ideas, the approach translates well across tools.

At the center of the community sits Linus Torvalds, the creator of Linux. He publishes a single source repository that is considered the ``authoritative'' current tree by the entire developer community. Anyone can clone Linus's tree, but he is very choosy about whose trees he pulls from.

Linus has a number of ``trusted lieutenants''. As a general rule, he pulls whatever changes they publish, in most cases without even reviewing those changes. Some of those lieutenants are generally agreed to be ``maintainers'', responsible for specific subsystems within the kernel. If a random kernel hacker wants to make a change to a subsystem that they want to end up in Linus's tree, they must find out who the subsystem's maintainer is, and ask that maintainer to take their change. If the maintainer reviews their changes and agrees to take them, they'll pass them along to Linus in due course.

Individual lieutenants have their own approaches to reviewing, accepting, and publishing changes; and for deciding when to feed them to Linus. In addition, there are several well known branches that people use for different purposes. For example, a few people maintain ``stable'' repositories of older versions of the kernel, to which they apply critical fixes as needed. Some maintainers publish multiple trees: one for experimental changes; one for changes that they are about to feed upstream; and so on. Others just publish a single tree.

This model has two notable features. The first is that it's ``pull only''. You have to ask, convince, or beg another developer to take a change from you, because there are almost no trees to which more than one person can push, and there's no way to push changes into a tree that someone else controls.
The second is that it's based on reputation and acclaim. If you're an unknown, Linus will probably ignore changes from you without even responding. But a subsystem maintainer will probably review them, and will likely take them if they pass their criteria for suitability. The more ``good'' changes you contribute to a maintainer, the more likely they are to trust your judgment and accept your changes. If you're well-known and maintain a long-lived branch for something Linus hasn't yet accepted, people with similar interests may pull your changes regularly to keep up with your work.

Reputation and acclaim don't necessarily cross subsystem or ``people'' boundaries. If you're a respected but specialised storage hacker, and you try to fix a networking bug, that change will receive a level of scrutiny from a network maintainer comparable to a change from a complete stranger.

To people who come from more orderly project backgrounds, the comparatively chaotic Linux kernel development process often seems completely insane. It's subject to the whims of individuals; people make sweeping changes whenever they deem it appropriate; and the pace of development is astounding. And yet Linux is a highly successful, well-regarded piece of software.

\subsection{Pull-only versus shared-push collaboration}

A perpetual source of heat in the open source community is whether a development model in which people only ever pull changes from others is ``better than'' one in which multiple people can push changes to a shared repository.

Typically, the backers of the shared-push model use tools that actively enforce this approach. If you're using a centralised revision control tool such as Subversion, there's no way to make a choice over which model you'll use: the tool gives you shared-push, and if you want to do anything else, you'll have to roll your own approach on top (such as applying a patch by hand).

A good distributed revision control tool, such as Mercurial, will support both models.
You and your collaborators can then structure how you work together based on your own needs and preferences, not on what contortions your tools force you into.

\subsection{Where collaboration meets branch management}

Once you and your team set up some shared repositories and start propagating changes back and forth between local and shared repos, you begin to face a related, but slightly different challenge: that of managing the multiple directions in which your team may be moving at once. Even though this subject is intimately related to how your team collaborates, it's dense enough to merit treatment of its own, in chapter~\ref{chap:branch}.

\section{The technical side of sharing}

\subsection{Informal sharing with \hgcmd{serve}}
\label{sec:collab:serve}

Mercurial's \hgcmd{serve} command is wonderfully suited to small, tight-knit, and fast-paced group environments. It also provides a great way to get a feel for using Mercurial commands over a network.

Run \hgcmd{serve} inside a repository, and in under a second it will bring up a specialised HTTP server; this will accept connections from any client, and serve up data for that repository until you terminate it. Anyone who knows the URL of the server you just started, and can talk to your computer over the network, can then use a web browser or Mercurial to read data from that repository. A URL for a \hgcmd{serve} instance running on a laptop is likely to look something like \Verb|http://my-laptop.local:8000/|.

The \hgcmd{serve} command is \emph{not} a general-purpose web server. It can do only two things:
\begin{itemize}
\item Allow people to browse the history of the repository it's serving, from their normal web browsers.
\item Speak Mercurial's wire protocol, so that people can \hgcmd{clone} or \hgcmd{pull} changes from that repository.
\end{itemize}
In particular, \hgcmd{serve} won't allow remote users to \emph{modify} your repository. It's intended for read-only use.
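
As a rough sketch of what this looks like in practice, suppose the repository you want to share lives in a directory called \dirname{myproject} (a made-up name, used here purely for illustration). On the machine that holds the repository, you might run the following.
\begin{codesample2}
  cd myproject
  hg serve
\end{codesample2}
A collaborator could then fetch a copy and check for newer changes from their own machine. This assumes that your server is reachable at the example laptop URL given above; the name \dirname{myproject-copy} is likewise only an illustration.
\begin{codesample2}
  hg clone http://my-laptop.local:8000/ myproject-copy
  cd myproject-copy
  hg incoming
\end{codesample2}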
If you're getting started with Mercurial, there's nothing to prevent you from using \hgcmd{serve} to serve up a repository on your own computer, then use commands like \hgcmd{clone}, \hgcmd{incoming}, and so on to talk to that server as if the repository was hosted remotely. This can help you to quickly get acquainted with using commands on network-hosted repositories.

\subsubsection{A few things to keep in mind}

Because it provides unauthenticated read access to all clients, you should only use \hgcmd{serve} in an environment where you either don't care, or have complete control over, who can access your network and pull data from your repository.

The \hgcmd{serve} command knows nothing about any firewall software you might have installed on your system or network. It cannot detect or control your firewall software. If other people are unable to talk to a running \hgcmd{serve} instance, the second thing you should do (\emph{after} you make sure that they're using the correct URL) is check your firewall configuration.

By default, \hgcmd{serve} listens for incoming connections on port~8000. If another process is already listening on the port you want to use, you can specify a different port to listen on using the \hgopt{serve}{-p} option.

Normally, when \hgcmd{serve} starts, it prints no output, which can be a bit unnerving. If you'd like to confirm that it is indeed running correctly, and find out what URL you should send to your collaborators, start it with the \hggopt{-v} option.

\subsection{Using the Secure Shell (ssh) protocol}
\label{sec:collab:ssh}

You can pull and push changes securely over a network connection using the Secure Shell (\texttt{ssh}) protocol. To use this successfully, you may have to do a little bit of configuration on the client or server sides.

If you're not familiar with ssh, it's a network protocol that lets you securely communicate with another computer.
To use it with Mercurial, you'll be setting up one or more user accounts on a server so that remote users can log in and execute commands.

(If you \emph{are} familiar with ssh, you'll probably find some of the material that follows to be elementary in nature.)

\subsubsection{How to read and write ssh URLs}

An ssh URL tends to look like this:
\begin{codesample2}
  ssh://bos@hg.serpentine.com:22/hg/hgbook
\end{codesample2}
\begin{enumerate}
\item The ``\texttt{ssh://}'' part tells Mercurial to use the ssh protocol.
\item The ``\texttt{bos@}'' component indicates what username to log into the server as. You can leave this out if the remote username is the same as your local username.
\item The ``\texttt{hg.serpentine.com}'' gives the hostname of the server to log into.
\item The ``:22'' identifies the port number to connect to the server on. The default port is~22, so you only need to specify this part if you're \emph{not} using port~22.
\item The remainder of the URL is the local path to the repository on the server.
\end{enumerate}

There's plenty of scope for confusion with the path component of ssh URLs, as there is no standard way for tools to interpret it. Some programs behave differently than others when dealing with these paths. This isn't an ideal situation, but it's unlikely to change. Please read the following paragraphs carefully.

Mercurial treats the path to a repository on the server as relative to the remote user's home directory. For example, if user \texttt{foo} on the server has a home directory of \dirname{/home/foo}, then an ssh URL that contains a path component of \dirname{bar} \emph{really} refers to the directory \dirname{/home/foo/bar}.

If you want to specify a path relative to another user's home directory, you can use a path that starts with a tilde character followed by the user's name (let's call them \texttt{otheruser}), like this.
\begin{codesample2}
  ssh://server/~otheruser/hg/repo
\end{codesample2}

And if you really want to specify an \emph{absolute} path on the server, begin the path component with two slashes, as in this example.
\begin{codesample2}
  ssh://server//absolute/path
\end{codesample2}

\subsubsection{Finding an ssh client for your system}

Almost every Unix-like system comes with OpenSSH preinstalled. If you're using such a system, run \Verb|which ssh| to find out if the \command{ssh} command is installed (it's usually in \dirname{/usr/bin}). In the unlikely event that it isn't present, take a look at your system documentation to figure out how to install it.

On Windows, you'll first need to download a suitable ssh client. There are two alternatives.
\begin{itemize}
\item Simon Tatham's excellent PuTTY package~\cite{web:putty} provides a complete suite of ssh client commands.
\item If you have a high tolerance for pain, you can use the Cygwin port of OpenSSH.
\end{itemize}

In either case, you'll need to edit your \hgini\ file to tell Mercurial where to find the actual client command. For example, if you're using PuTTY, you'll need to use the \command{plink} command as a command-line ssh client.
\begin{codesample2}
  [ui]
  ssh = C:/path/to/plink.exe -ssh -i "C:/path/to/my/private/key"
\end{codesample2}

\begin{note}
  The path to \command{plink} shouldn't contain any whitespace characters, or Mercurial may not be able to run it correctly (so putting it in \dirname{C:\\Program Files} is probably not a good idea).
\end{note}

\subsubsection{Generating a key pair}

To avoid the need to repetitively type a password every time you need to use your ssh client, I recommend generating a key pair. On a Unix-like system, the \command{ssh-keygen} command will do the trick. On Windows, if you're using PuTTY, the \command{puttygen} command is what you'll need.

When you generate a key pair, it's usually \emph{highly} advisable to protect it with a passphrase.
(The only time that you might not want to do this is when you're using the ssh protocol for automated tasks on a secure network.)

Simply generating a key pair isn't enough, however. You'll need to add the public key to the set of authorised keys for whatever user you're logging in remotely as. For servers using OpenSSH (the vast majority), this will mean adding the public key to a list in a file called \sfilename{authorized\_keys} in their \sdirname{.ssh} directory.

On a Unix-like system, your public key will have a \filename{.pub} extension. If you're using \command{puttygen} on Windows, you can save the public key to a file of your choosing, or paste it from the window it's displayed in straight into the \sfilename{authorized\_keys} file.

\subsubsection{Using an authentication agent}

An authentication agent is a daemon that stores passphrases in memory (so it will forget passphrases if you log out and log back in again). An ssh client will notice if it's running, and query it for a passphrase. If there's no authentication agent running, or the agent doesn't store the necessary passphrase, you'll have to type your passphrase every time Mercurial tries to communicate with a server on your behalf (e.g.~whenever you pull or push changes).

The downside of storing passphrases in an agent is that it's possible for a well-prepared attacker to recover the plain text of your passphrases, in some cases even if your system has been power-cycled. You should make your own judgment as to whether this is an acceptable risk. It certainly saves a lot of repeated typing.

On Unix-like systems, the agent is called \command{ssh-agent}, and it's often run automatically for you when you log in. You'll need to use the \command{ssh-add} command to add passphrases to the agent's store. On Windows, if you're using PuTTY, the \command{pageant} command acts as the agent. It adds an icon to your system tray that will let you manage stored passphrases.
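
On a Unix-like system, a session with the agent might look something like the following sketch. It assumes that you use a Bourne-style shell and that your private key is stored under the name \filename{id\_rsa} in your \sdirname{.ssh} directory; that file name is only an assumption, so substitute whatever name you gave your key. The first command starts an agent and makes it known to the current shell (you can skip it if an agent is already running), the second prompts for your passphrase and hands the key to the agent, and the third lists the keys that the agent currently holds.
\begin{codesample2}
  eval $(ssh-agent)
  ssh-add ~/.ssh/id_rsa
  ssh-add -l
\end{codesample2}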
\subsubsection{Configuring the server side properly}

Because ssh can be fiddly to set up if you're new to it, there's a variety of things that can go wrong. Add Mercurial on top, and there's plenty more scope for head-scratching. Most of these potential problems occur on the server side, not the client side. The good news is that once you've gotten a configuration working, it will usually continue to work indefinitely.

Before you try using Mercurial to talk to an ssh server, it's best to make sure that you can use the normal \command{ssh} or \command{putty} command to talk to the server first. If you run into problems with using these commands directly, Mercurial surely won't work. Worse, it will obscure the underlying problem. Any time you want to debug ssh-related Mercurial problems, you should drop back to making sure that plain ssh client commands work first, \emph{before} you worry about whether there's a problem with Mercurial.

The first thing to be sure of on the server side is that you can actually log in from another machine at all. If you can't use \command{ssh} or \command{putty} to log in, the error message you get may give you a few hints as to what's wrong. The most common problems are as follows.
\begin{itemize}
\item If you get a ``connection refused'' error, either there isn't an SSH daemon running on the server at all, or it's inaccessible due to firewall configuration.
\item If you get a ``no route to host'' error, you either have an incorrect address for the server or a seriously locked down firewall that won't admit its existence at all.
\item If you get a ``permission denied'' error, you may have mistyped the username on the server, or you could have mistyped your key's passphrase or the remote user's password.
\end{itemize}

In summary, if you're having trouble talking to the server's ssh daemon, first make sure that one is running at all. On many systems it will be installed, but disabled, by default.
Once you're done with this step, you should then check that the server's firewall is configured to allow incoming connections on the port the ssh daemon is listening on (usually~22). Don't worry about more exotic possibilities for misconfiguration until you've checked these two first.

If you're using an authentication agent on the client side to store passphrases for your keys, you ought to be able to log into the server without being prompted for a passphrase or a password. If you're prompted for a passphrase, there are a few possible culprits.
\begin{itemize}
\item You might have forgotten to use \command{ssh-add} or \command{pageant} to store the passphrase.
\item You might have stored the passphrase for the wrong key.
\end{itemize}
If you're being prompted for the remote user's password, there are another few possible problems to check.
\begin{itemize}
\item Either the user's home directory or their \sdirname{.ssh} directory might have excessively liberal permissions. As a result, the ssh daemon will not trust or read their \sfilename{authorized\_keys} file. For example, a group-writable home or \sdirname{.ssh} directory will often cause this symptom.
\item The user's \sfilename{authorized\_keys} file may have a problem. If anyone other than the user owns or can write to that file, the ssh daemon will not trust or read it.
\end{itemize}

In the ideal world, you should be able to run the following command successfully, and it should print exactly one line of output, the current date and time.
\begin{codesample2}
  ssh myserver date
\end{codesample2}

If on your server you have login scripts that print banners or other junk even when running non-interactive commands like this, you should fix them before you continue, so that they only print output if they're run interactively. Otherwise these banners will at least clutter up Mercurial's output. Worse, they could potentially cause problems with running Mercurial commands remotely.
(The usual way to see if a login script is running in an interactive shell is to check the return code from the command \Verb|tty -s|.)

Once you've verified that plain old ssh is working with your server, the next step is to ensure that Mercurial runs on the server. The following command should run successfully:
\begin{codesample2}
  ssh myserver hg version
\end{codesample2}
If you see an error message instead of normal \hgcmd{version} output, this is usually because you haven't installed Mercurial to \dirname{/usr/bin}. Don't worry if this is the case; you don't need to do that. But you should check for a few possible problems.
\begin{itemize}
\item Is Mercurial really installed on the server at all? I know this sounds trivial, but it's worth checking!
\item Maybe your shell's search path (usually set via the \envar{PATH} environment variable) is simply misconfigured.
\item Perhaps your \envar{PATH} environment variable is only being set to point to the location of the \command{hg} executable if the login session is interactive. This can happen if you're setting the path in the wrong shell login script. See your shell's documentation for details.
\item The \envar{PYTHONPATH} environment variable may need to contain the path to the Mercurial Python modules. It might not be set at all; it could be incorrect; or it may be set only if the login is interactive.
\end{itemize}

If you can run \hgcmd{version} over an ssh connection, well done! You've got the server and client sorted out. You should now be able to use Mercurial to access repositories hosted by that username on that server. If you run into problems with Mercurial and ssh at this point, try using the \hggopt{--debug} option to get a clearer picture of what's going on.

\subsubsection{Using compression with ssh}

Mercurial does not compress data when it uses the ssh protocol, because the ssh protocol can transparently compress data.
However, the default behaviour of ssh clients is \emph{not} to request compression.

Over any network other than a fast LAN (even a wireless network), using compression is likely to significantly speed up Mercurial's network operations. For example, over a WAN, someone measured compression as reducing the amount of time required to clone a particularly large repository from~51 minutes to~17 minutes.

Both \command{ssh} and \command{plink} accept a \cmdopt{ssh}{-C} option which turns on compression. You can easily edit your \hgrc\ to enable compression for all of Mercurial's uses of the ssh protocol.
\begin{codesample2}
  [ui]
  ssh = ssh -C
\end{codesample2}

\subsection{Serving over HTTP with a CGI script}
\label{sec:collab:cgi}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "00book"
%%% End: