comparison ja/collab.tex @ 290:b0db5adf11c1 ja_root

fork Japanese translation.
author Yoshiki Yazawa <yaz@cc.rim.or.jp>
date Wed, 06 Feb 2008 17:43:11 +0900
parents en/collab.tex@f8a2fe77908d
children 3b1291f24c0d
1 \chapter{Collaborating with other people}
2 \label{cha:collab}
3
4 As a completely decentralised tool, Mercurial doesn't impose any
5 policy on how people ought to work with each other. However, if
6 you're new to distributed revision control, it helps to have some
7 tools and examples in mind when you're thinking about possible
8 workflow models.
9
10 \section{Mercurial's web interface}
11
12 Mercurial has a powerful web interface that provides several
13 useful capabilities.
14
15 For interactive use, the web interface lets you browse a single
16 repository or a collection of repositories. You can view the history
17 of a repository, examine each change (comments and diffs), and view
18 the contents of each directory and file.
19
20 Also for human consumption, the web interface provides an RSS feed of
21 the changes in a repository. This lets you ``subscribe'' to a
22 repository using your favourite feed reader, and be automatically
23 notified of activity in that repository as soon as it happens. I find
24 this capability much more convenient than the model of subscribing to
25 a mailing list to which notifications are sent, as it requires no
26 additional configuration on the part of whoever is serving the
27 repository.
28
29 The web interface also lets remote users clone a repository, pull
30 changes from it, and (when the server is configured to permit it) push
31 changes back to it. Mercurial's HTTP tunneling protocol aggressively
32 compresses data, so that it works efficiently even over low-bandwidth
33 network connections.
34
35 The easiest way to get started with the web interface is to use your
36 web browser to visit an existing repository, such as the master
37 Mercurial repository at
38 \url{http://www.selenic.com/repo/hg?style=gitweb}.
39
40 If you're interested in providing a web interface to your own
41 repositories, Mercurial provides two ways to do this. The first is
42 using the \hgcmd{serve} command, which is best suited to short-term
43 ``lightweight'' serving. See section~\ref{sec:collab:serve} below for
44 details of how to use this command. If you have a long-lived
45 repository that you'd like to make permanently available, Mercurial
46 has built-in support for the CGI (Common Gateway Interface) standard,
47 which all common web servers support. See
48 section~\ref{sec:collab:cgi} for details of CGI configuration.
49
50 \section{Collaboration models}
51
52 With a suitably flexible tool, making decisions about workflow is much
53 more of a social engineering challenge than a technical one.
54 Mercurial imposes few limitations on how you can structure the flow of
55 work in a project, so it's up to you and your group to set up and live
56 with a model that matches your own particular needs.
57
58 \subsection{Factors to keep in mind}
59
60 The most important aspect of any model that you must keep in mind is
61 how well it matches the needs and capabilities of the people who will
62 be using it. This might seem self-evident; even so, you still can't
63 afford to forget it for a moment.
64
65 I once put together a workflow model that seemed to make perfect sense
66 to me, but that caused a considerable amount of consternation and
67 strife within my development team. In spite of my attempts to explain
68 why we needed a complex set of branches, and how changes ought to flow
69 between them, a few team members revolted. Even though they were
70 smart people, they didn't want to pay attention to the constraints we
71 were operating under, or face the consequences of those constraints in
72 the details of the model that I was advocating.
73
74 Don't sweep foreseeable social or technical problems under the rug.
75 Whatever scheme you put into effect, you should plan for mistakes and
76 problem scenarios. Consider adding automated machinery to prevent, or
77 quickly recover from, trouble that you can anticipate. As an example,
78 if you intend to have a branch with not-for-release changes in it,
79 you'd do well to think early about the possibility that someone might
80 accidentally merge those changes into a release branch. You could
81 avoid this particular problem by writing a hook that prevents changes
82 from being merged from an inappropriate branch.
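
As a rough sketch of how such a guard might be wired up, the fragment
below registers a hook in the \sfilename{.hg/hgrc} file of the shared
release repository. Mercurial runs a \texttt{pretxnchangegroup} hook
after incoming changesets have been added, but before the transaction
is made permanent; if the hook exits with a non-zero status, the
incoming changes are rolled back. The script name
\texttt{check-merge-source} and its location are hypothetical: it is
not part of Mercurial, and you would have to write the actual check
yourself.
\begin{codesample2}
  [hooks]
  # check-merge-source is a hypothetical script that you supply
  pretxnchangegroup.nomerge = /path/to/check-merge-source
\end{codesample2}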
83
84 \subsection{Informal anarchy}
85
86 I wouldn't suggest an ``anything goes'' approach as something
87 sustainable, but it's a model that's easy to grasp, and it works
88 perfectly well in a few unusual situations.
89
90 As one example, many projects have a loose-knit group of collaborators
91 who rarely physically meet each other. Some groups like to overcome
92 the isolation of working at a distance by organising occasional
93 ``sprints''. In a sprint, a number of people get together in a single
94 location (a company's conference room, a hotel meeting room, that kind
95 of place) and spend several days more or less locked in there, hacking
96 intensely on a handful of projects.
97
98 A sprint is the perfect place to use the \hgcmd{serve} command, since
99 \hgcmd{serve} does not require any fancy server infrastructure. You
100 can get started with \hgcmd{serve} in moments, by reading
101 section~\ref{sec:collab:serve} below. Then simply tell the person
102 next to you that you're running a server, send the URL to them in an
103 instant message, and you immediately have a quick-turnaround way to
104 work together. They can type your URL into their web browser and
105 quickly review your changes; or they can pull a bugfix from you and
106 verify it; or they can clone a branch containing a new feature and try
107 it out.
108
109 The charm, and the problem, with doing things in an ad hoc fashion
110 like this is that only people who know about your changes, and where
111 they are, can see them. Such an informal approach simply doesn't
112 scale beyond a handful of people, because each individual needs to know
113 about $n$ different repositories to pull from.
114
115 \subsection{A single central repository}
116
117 For smaller projects migrating from a centralised revision control
118 tool, perhaps the easiest way to get started is to have changes flow
119 through a single shared central repository. This is also the
120 most common ``building block'' for more ambitious workflow schemes.
121
122 Contributors start by cloning a copy of this repository. They can
123 pull changes from it whenever they need to, and some (perhaps all)
124 developers have permission to push a change back when they're ready
125 for other people to see it.
126
127 Under this model, it can still often make sense for people to pull
128 changes directly from each other, without going through the central
129 repository. Consider a case in which I have a tentative bug fix, but
130 I am worried that if I were to publish it to the central repository,
131 it might subsequently break everyone else's trees as they pull it. To
132 reduce the potential for damage, I can ask you to clone my repository
133 into a temporary repository of your own and test it. This lets us put
134 off publishing the potentially unsafe change until it has had a little
135 testing.
136
137 In this kind of scenario, people usually use the \command{ssh}
138 protocol to securely push changes to the central repository, as
139 documented in section~\ref{sec:collab:ssh}. It's also usual to
140 publish a read-only copy of the repository over HTTP using CGI, as in
141 section~\ref{sec:collab:cgi}. Publishing over HTTP satisfies the
142 needs of people who don't have push access, and those who want to use
143 web browsers to browse the repository's history.
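
A contributor with push access might record both routes in the
\texttt{[paths]} section of the \sfilename{.hg/hgrc} file inside their
clone, as in the sketch below. The hostname and repository paths are
placeholders. With entries like these, \hgcmd{pull} reads over HTTP
by default, while \hgcmd{push} sends changes over ssh.
\begin{codesample2}
  [paths]
  default = http://hg.example.com/myproject
  default-push = ssh://hg.example.com/hg/myproject
\end{codesample2}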
144
145 \subsection{Working with multiple branches}
146
147 Projects of any significant size naturally tend to make progress on
148 several fronts simultaneously. In the case of software, it's common
149 for a project to go through periodic official releases. A release
150 might then go into ``maintenance mode'' for a while after its first
151 publication; maintenance releases tend to contain only bug fixes, not
152 new features. In parallel with these maintenance releases, one or
153 more future releases may be under development. People normally use
154 the word ``branch'' to refer to one of these many slightly different
155 directions in which development is proceeding.
156
157 Mercurial is particularly well suited to managing a number of
158 simultaneous, but not identical, branches. Each ``development
159 direction'' can live in its own central repository, and you can merge
160 changes from one to another as the need arises. Because repositories
161 are independent of each other, unstable changes in a development
162 branch will never affect a stable branch unless someone explicitly
163 merges those changes in.
164
165 Here's an example of how this can work in practice. Let's say you
166 have one ``main branch'' on a central server.
167 \interaction{branching.init}
168 People clone it, make changes locally, test them, and push them back.
169
170 Once the main branch reaches a release milestone, you can use the
171 \hgcmd{tag} command to give a permanent name to the milestone
172 revision.
173 \interaction{branching.tag}
174 Let's say some ongoing development occurs on the main branch.
175 \interaction{branching.main}
176 Using the tag that was recorded at the milestone, people who clone
177 that repository at any time in the future can use \hgcmd{update} to
178 get a copy of the working directory exactly as it was when that tagged
179 revision was committed.
180 \interaction{branching.update}
181
182 In addition, immediately after the main branch is tagged, someone can
183 then clone the main branch on the server to a new ``stable'' branch,
184 also on the server.
185 \interaction{branching.clone}
186
187 Someone who needs to make a change to the stable branch can then clone
188 \emph{that} repository, make their changes, commit, and push their
189 changes back there.
190 \interaction{branching.stable}
191 Because Mercurial repositories are independent, and Mercurial doesn't
192 move changes around automatically, the stable and main branches are
193 \emph{isolated} from each other. The changes that you made on the
194 main branch don't ``leak'' to the stable branch, and vice versa.
195
196 You'll often want all of your bugfixes on the stable branch to show up
197 on the main branch, too. Rather than rewrite a bugfix on the main
198 branch, you can simply pull and merge changes from the stable to the
199 main branch, and Mercurial will bring those bugfixes in for you.
200 \interaction{branching.merge}
201 The main branch will still contain changes that are not on the stable
202 branch, but it will also contain all of the bugfixes from the stable
203 branch. The stable branch remains unaffected by these changes.
204
205 \subsection{Feature branches}
206
207 For larger projects, an effective way to manage change is to break up
208 a team into smaller groups. Each group has a shared branch of its
209 own, cloned from a single ``master'' branch used by the entire
210 project. People working on an individual branch are typically quite
211 isolated from developments on other branches.
212
213 \begin{figure}[ht]
214 \centering
215 \grafix{feature-branches}
216 \caption{Feature branches}
217 \label{fig:collab:feature-branches}
218 \end{figure}
219
220 When a particular feature is deemed to be in suitable shape, someone
221 on that feature team pulls and merges from the master branch into the
222 feature branch, then pushes back up to the master branch.
223
224 \subsection{The release train}
225
226 Some projects are organised on a ``train'' basis: a release is
227 scheduled to happen every few months, and whatever features are ready
228 when the ``train'' is ready to leave are allowed in.
229
230 This model resembles working with feature branches. The difference is
231 that when a feature branch misses a train, someone on the feature team
232 pulls and merges the changes that went out on that train release into
233 the feature branch, and the team continues its work on top of that
234 release so that their feature can make the next release.
235
236 \subsection{The Linux kernel model}
237
238 The development of the Linux kernel has a shallow hierarchical
239 structure, surrounded by a cloud of apparent chaos. Because most
240 Linux developers use \command{git}, a distributed revision control
241 tool with capabilities similar to Mercurial, it's useful to describe
242 the way work flows in that environment; if you like the ideas, the
243 approach translates well across tools.
244
245 At the center of the community sits Linus Torvalds, the creator of
246 Linux. He publishes a single source repository that is considered the
247 ``authoritative'' current tree by the entire developer community.
248 Anyone can clone Linus's tree, but he is very choosy about whose trees
249 he pulls from.
250
251 Linus has a number of ``trusted lieutenants''. As a general rule, he
252 pulls whatever changes they publish, in most cases without even
253 reviewing those changes. Some of those lieutenants are generally
254 agreed to be ``maintainers'', responsible for specific subsystems
255 within the kernel. If a random kernel hacker wants to make a change
256 to a subsystem that they want to end up in Linus's tree, they must
257 find out who the subsystem's maintainer is, and ask that maintainer to
258 take their change. If the maintainer reviews their changes and agrees
259 to take them, they'll pass them along to Linus in due course.
260
261 Individual lieutenants have their own approaches to reviewing,
262 accepting, and publishing changes; and for deciding when to feed them
263 to Linus. In addition, there are several well known branches that
264 people use for different purposes. For example, a few people maintain
265 ``stable'' repositories of older versions of the kernel, to which they
266 apply critical fixes as needed. Some maintainers publish multiple
267 trees: one for experimental changes; one for changes that they are
268 about to feed upstream; and so on. Others just publish a single
269 tree.
270
271 This model has two notable features. The first is that it's ``pull
272 only''. You have to ask, convince, or beg another developer to take a
273 change from you, because there are almost no trees to which more than
274 one person can push, and there's no way to push changes into a tree
275 that someone else controls.
276
277 The second is that it's based on reputation and acclaim. If you're an
278 unknown, Linus will probably ignore changes from you without even
279 responding. But a subsystem maintainer will probably review them, and
280 will likely take them if they pass their criteria for suitability.
281 The more ``good'' changes you contribute to a maintainer, the more
282 likely they are to trust your judgment and accept your changes. If
283 you're well-known and maintain a long-lived branch for something Linus
284 hasn't yet accepted, people with similar interests may pull your
285 changes regularly to keep up with your work.
286
287 Reputation and acclaim don't necessarily cross subsystem or ``people''
288 boundaries. If you're a respected but specialised storage hacker, and
289 you try to fix a networking bug, that change will receive a level of
290 scrutiny from a network maintainer comparable to a change from a
291 complete stranger.
292
293 To people who come from more orderly project backgrounds, the
294 comparatively chaotic Linux kernel development process often seems
295 completely insane. It's subject to the whims of individuals; people
296 make sweeping changes whenever they deem it appropriate; and the pace
297 of development is astounding. And yet Linux is a highly successful,
298 well-regarded piece of software.
299
300 \subsection{Pull-only versus shared-push collaboration}
301
302 A perpetual source of heat in the open source community is whether a
303 development model in which people only ever pull changes from others
304 is ``better than'' one in which multiple people can push changes to a
305 shared repository.
306
307 Typically, the backers of the shared-push model use tools that
308 actively enforce this approach. If you're using a centralised
309 revision control tool such as Subversion, there's no way to make a
310 choice over which model you'll use: the tool gives you shared-push,
311 and if you want to do anything else, you'll have to roll your own
312 approach on top (such as applying a patch by hand).
313
314 A good distributed revision control tool, such as Mercurial, will
315 support both models. You and your collaborators can then structure
316 how you work together based on your own needs and preferences, not on
317 what contortions your tools force you into.
318
319 \subsection{Where collaboration meets branch management}
320
321 Once you and your team set up some shared repositories and start
322 propagating changes back and forth between local and shared repos, you
323 begin to face a related, but slightly different challenge: that of
324 managing the multiple directions in which your team may be moving at
325 once. Even though this subject is intimately related to how your team
326 collaborates, it's dense enough to merit treatment of its own, in
327 chapter~\ref{chap:branch}.
328
329 \section{The technical side of sharing}
330
331 The remainder of this chapter is devoted to the question of serving
332 data to your collaborators.
333
334 \section{Informal sharing with \hgcmd{serve}}
335 \label{sec:collab:serve}
336
337 Mercurial's \hgcmd{serve} command is wonderfully suited to small,
338 tight-knit, and fast-paced group environments. It also provides a
339 great way to get a feel for using Mercurial commands over a network.
340
341 Run \hgcmd{serve} inside a repository, and in under a second it will
342 bring up a specialised HTTP server; this will accept connections from
343 any client, and serve up data for that repository until you terminate
344 it. Anyone who knows the URL of the server you just started, and can
345 talk to your computer over the network, can then use a web browser or
346 Mercurial to read data from that repository. A URL for a
347 \hgcmd{serve} instance running on a laptop is likely to look something
348 like \Verb|http://my-laptop.local:8000/|.
349
350 The \hgcmd{serve} command is \emph{not} a general-purpose web server.
351 It can do only two things:
352 \begin{itemize}
353 \item Allow people to browse the history of the repository it's
354 serving, from their normal web browsers.
355 \item Speak Mercurial's wire protocol, so that people can
356 \hgcmd{clone} or \hgcmd{pull} changes from that repository.
357 \end{itemize}
358 In particular, \hgcmd{serve} won't allow remote users to \emph{modify}
359 your repository. It's intended for read-only use.
360
361 If you're getting started with Mercurial, there's nothing to prevent
362 you from using \hgcmd{serve} to serve up a repository on your own
363 computer, then using commands like \hgcmd{clone}, \hgcmd{incoming}, and
364 so on to talk to that server as if the repository were hosted remotely.
365 This can help you to quickly get acquainted with using commands on
366 network-hosted repositories.
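
A session along the following lines illustrates the idea. It assumes
the default port, and that you run the second group of commands in a
separate terminal while the server is still running; the directory
name \dirname{my-copy} is arbitrary.
\begin{codesample2}
  # In one terminal, inside the repository you want to share:
  hg serve

  # In a second terminal:
  hg clone http://localhost:8000/ my-copy
  cd my-copy
  hg incoming
\end{codesample2}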
367
368 \subsection{A few things to keep in mind}
369
370 Because it provides unauthenticated read access to all clients, you
371 should only use \hgcmd{serve} in an environment where you either don't
372 care, or have complete control over, who can access your network and
373 pull data from your repository.
374
375 The \hgcmd{serve} command knows nothing about any firewall software
376 you might have installed on your system or network. It cannot detect
377 or control your firewall software. If other people are unable to talk
378 to a running \hgcmd{serve} instance, the second thing you should do
379 (\emph{after} you make sure that they're using the correct URL) is
380 check your firewall configuration.
381
382 By default, \hgcmd{serve} listens for incoming connections on
383 port~8000. If another process is already listening on the port you
384 want to use, you can specify a different port to listen on using the
385 \hgopt{serve}{-p} option.
386
387 Normally, when \hgcmd{serve} starts, it prints no output, which can be
388 a bit unnerving. If you'd like to confirm that it is indeed running
389 correctly, and find out what URL you should send to your
390 collaborators, start it with the \hggopt{-v} option.
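
Putting these two options together, an invocation along the following
lines should start a server on an alternative port and report the
address it is listening on. The port number 8001 is an arbitrary
choice for illustration.
\begin{codesample2}
  hg serve -v -p 8001
\end{codesample2}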
391
392 \section{Using the Secure Shell (ssh) protocol}
393 \label{sec:collab:ssh}
394
395 You can pull and push changes securely over a network connection using
396 the Secure Shell (\texttt{ssh}) protocol. To use this successfully,
397 you may have to do a little bit of configuration on the client or
398 server sides.
399
400 If you're not familiar with ssh, it's a network protocol that lets you
401 securely communicate with another computer. To use it with Mercurial,
402 you'll be setting up one or more user accounts on a server so that
403 remote users can log in and execute commands.
404
405 (If you \emph{are} familiar with ssh, you'll probably find some of the
406 material that follows to be elementary in nature.)
407
408 \subsection{How to read and write ssh URLs}
409
410 An ssh URL tends to look like this:
411 \begin{codesample2}
412 ssh://bos@hg.serpentine.com:22/hg/hgbook
413 \end{codesample2}
414 \begin{enumerate}
415 \item The ``\texttt{ssh://}'' part tells Mercurial to use the ssh
416 protocol.
417 \item The ``\texttt{bos@}'' component indicates what username to log
418 into the server as. You can leave this out if the remote username
419 is the same as your local username.
420 \item The ``\texttt{hg.serpentine.com}'' gives the hostname of the
421 server to log into.
422 \item The ``:22'' identifies the port number to connect to the server
423 on. The default port is~22, so you only need to specify this part
424 if you're \emph{not} using port~22.
425 \item The remainder of the URL is the local path to the repository on
426 the server.
427 \end{enumerate}
428
429 There's plenty of scope for confusion with the path component of ssh
430 URLs, as there is no standard way for tools to interpret it. Some
431 programs behave differently than others when dealing with these paths.
432 This isn't an ideal situation, but it's unlikely to change. Please
433 read the following paragraphs carefully.
434
435 Mercurial treats the path to a repository on the server as relative to
436 the remote user's home directory. For example, if user \texttt{foo}
437 on the server has a home directory of \dirname{/home/foo}, then an ssh
438 URL that contains a path component of \dirname{bar}
439 \emph{really} refers to the directory \dirname{/home/foo/bar}.
440
441 If you want to specify a path relative to another user's home
442 directory, you can use a path that starts with a tilde character
443 followed by the user's name (let's call them \texttt{otheruser}), like
444 this.
445 \begin{codesample2}
446 ssh://server/~otheruser/hg/repo
447 \end{codesample2}
448
449 And if you really want to specify an \emph{absolute} path on the
450 server, begin the path component with two slashes, as in this example.
451 \begin{codesample2}
452 ssh://server//absolute/path
453 \end{codesample2}
454
455 \subsection{Finding an ssh client for your system}
456
457 Almost every Unix-like system comes with OpenSSH preinstalled. If
458 you're using such a system, run \Verb|which ssh| to find out if
459 the \command{ssh} command is installed (it's usually in
460 \dirname{/usr/bin}). In the unlikely event that it isn't present,
461 take a look at your system documentation to figure out how to install
462 it.
463
464 On Windows, you'll first need to choose download a suitable ssh
465 client. There are two alternatives.
466 \begin{itemize}
467 \item Simon Tatham's excellent PuTTY package~\cite{web:putty} provides
468 a complete suite of ssh client commands.
469 \item If you have a high tolerance for pain, you can use the Cygwin
470 port of OpenSSH.
471 \end{itemize}
472 In either case, you'll need to edit your \hgini\ file to tell
473 Mercurial where to find the actual client command. For example, if
474 you're using PuTTY, you'll need to use the \command{plink} command as
475 a command-line ssh client.
476 \begin{codesample2}
477 [ui]
478 ssh = C:/path/to/plink.exe -ssh -i "C:/path/to/my/private/key"
479 \end{codesample2}
480
481 \begin{note}
482 The path to \command{plink} shouldn't contain any whitespace
483 characters, or Mercurial may not be able to run it correctly (so
484 putting it in \dirname{C:\\Program Files} is probably not a good
485 idea).
486 \end{note}
487
488 \subsection{Generating a key pair}
489
490 To avoid the need to repetitively type a password every time you need
491 to use your ssh client, I recommend generating a key pair. On a
492 Unix-like system, the \command{ssh-keygen} command will do the trick.
493 On Windows, if you're using PuTTY, the \command{puttygen} command is
494 what you'll need.
495
496 When you generate a key pair, it's usually \emph{highly} advisable to
497 protect it with a passphrase. (The only time that you might not want
498 to do this id when you're using the ssh protocol for automated tasks
499 on a secure network.)
500
501 Simply generating a key pair isn't enough, however. You'll need to
502 add the public key to the set of authorised keys for whatever user
503 you're logging in remotely as. For servers using OpenSSH (the vast
504 majority), this will mean adding the public key to a list in a file
505 called \sfilename{authorized\_keys} in their \sdirname{.ssh}
506 directory.
507
508 On a Unix-like system, your public key will have a \filename{.pub}
509 extension. If you're using \command{puttygen} on Windows, you can
510 save the public key to a file of your choosing, or paste it from the
511 window it's displayed in straight into the
512 \sfilename{authorized\_keys} file.
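
On a Unix-like client, the whole procedure might look something like
the sketch below. The key type, the file name
\sfilename{id\_rsa.pub}, and the server name \texttt{myserver} are
assumptions for illustration; your ssh client may use different
defaults. The second command appends the public key to the remote
\sfilename{authorized\_keys} file. It assumes that the
\sdirname{.ssh} directory already exists on the server, and it will
ask for the remote user's password, since key-based login isn't set up
yet at that point.
\begin{codesample2}
  ssh-keygen -t rsa
  cat ~/.ssh/id_rsa.pub | ssh myserver 'cat >> ~/.ssh/authorized_keys'
\end{codesample2}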
513
514 \subsection{Using an authentication agent}
515
516 An authentication agent is a daemon that stores passphrases in memory
517 (so it will forget passphrases if you log out and log back in again).
518 An ssh client will notice if it's running, and query it for a
519 passphrase. If there's no authentication agent running, or the agent
520 doesn't store the necessary passphrase, you'll have to type your
521 passphrase every time Mercurial tries to communicate with a server on
522 your behalf (e.g.~whenever you pull or push changes).
523
524 The downside of storing passphrases in an agent is that it's possible
525 for a well-prepared attacker to recover the plain text of your
526 passphrases, in some cases even if your system has been power-cycled.
527 You should make your own judgment as to whether this is an acceptable
528 risk. It certainly saves a lot of repeated typing.
529
530 On Unix-like systems, the agent is called \command{ssh-agent}, and
531 it's often run automatically for you when you log in. You'll need to
532 use the \command{ssh-add} command to add passphrases to the agent's
533 store. On Windows, if you're using PuTTY, the \command{pageant}
534 command acts as the agent. It adds an icon to your system tray that
535 will let you manage stored passphrases.
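
For example, on a Unix-like system with an agent already running,
commands along these lines add a key to the agent and then list what
it currently holds. The key path is an assumed default; substitute
the location of your own private key.
\begin{codesample2}
  ssh-add ~/.ssh/id_rsa
  ssh-add -l
\end{codesample2}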
536
537 \subsection{Configuring the server side properly}
538
539 Because ssh can be fiddly to set up if you're new to it, there's a
540 variety of things that can go wrong. Add Mercurial on top, and
541 there's plenty more scope for head-scratching. Most of these
542 potential problems occur on the server side, not the client side. The
543 good news is that once you've gotten a configuration working, it will
544 usually continue to work indefinitely.
545
546 Before you try using Mercurial to talk to an ssh server, it's best to
547 make sure that you can use the normal \command{ssh} or \command{putty}
548 command to talk to the server first. If you run into problems with
549 using these commands directly, Mercurial surely won't work. Worse, it
550 will obscure the underlying problem. Any time you want to debug
551 ssh-related Mercurial problems, you should drop back to making sure
552 that plain ssh client commands work first, \emph{before} you worry
553 about whether there's a problem with Mercurial.
554
555 The first thing to be sure of on the server side is that you can
556 actually log in from another machine at all. If you can't use
557 \command{ssh} or \command{putty} to log in, the error message you get
558 may give you a few hints as to what's wrong. The most common problems
559 are as follows.
560 \begin{itemize}
561 \item If you get a ``connection refused'' error, either there isn't an
562 SSH daemon running on the server at all, or it's inaccessible due to
563 firewall configuration.
564 \item If you get a ``no route to host'' error, you either have an
565 incorrect address for the server or a seriously locked down firewall
566 that won't admit its existence at all.
567 \item If you get a ``permission denied'' error, you may have mistyped
568 the username on the server, or you could have mistyped your key's
569 passphrase or the remote user's password.
570 \end{itemize}
571 In summary, if you're having trouble talking to the server's ssh
572 daemon, first make sure that one is running at all. On many systems
573 it will be installed, but disabled, by default. Once you're done with
574 this step, you should then check that the server's firewall is
575 configured to allow incoming connections on the port the ssh daemon is
576 listening on (usually~22). Don't worry about more exotic
577 possibilities for misconfiguration until you've checked these two
578 first.
579
580 If you're using an authentication agent on the client side to store
581 passphrases for your keys, you ought to be able to log into the server
582 without being prompted for a passphrase or a password. If you're
583 prompted for a passphrase, there are a few possible culprits.
584 \begin{itemize}
585 \item You might have forgotten to use \command{ssh-add} or
586 \command{pageant} to store the passphrase.
587 \item You might have stored the passphrase for the wrong key.
588 \end{itemize}
589 If you're being prompted for the remote user's password, there are
590 another few possible problems to check.
591 \begin{itemize}
592 \item Either the user's home directory or their \sdirname{.ssh}
593 directory might have excessively liberal permissions. As a result,
594 the ssh daemon will not trust or read their
595 \sfilename{authorized\_keys} file. For example, a group-writable
596 home or \sdirname{.ssh} directory will often cause this symptom.
597 \item The user's \sfilename{authorized\_keys} file may have a problem.
598 If anyone other than the user owns or can write to that file, the
599 ssh daemon will not trust or read it.
600 \end{itemize}
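
If you suspect a permissions problem, inspecting and then tightening
the permissions on the server along the following lines is a
reasonable first attempt. Run these commands as the remote user. The
exact modes that your ssh daemon insists on may differ, so treat this
as a starting point rather than a definitive fix.
\begin{codesample2}
  ls -ld ~ ~/.ssh ~/.ssh/authorized_keys
  chmod go-w ~ ~/.ssh
  chmod 600 ~/.ssh/authorized_keys
\end{codesample2}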
601
602 In the ideal world, you should be able to run the following command
603 successfully, and it should print exactly one line of output, the
604 current date and time.
605 \begin{codesample2}
606 ssh myserver date
607 \end{codesample2}
608
609 If, on your server, you have login scripts that print banners or other
610 junk even when running non-interactive commands like this, you should
611 fix them before you continue, so that they only print output if
612 they're run interactively. Otherwise these banners will at least
613 clutter up Mercurial's output. Worse, they could potentially cause
614 problems with running Mercurial commands remotely. Mercurial makes
615 tries to detect and ignore banners in non-interactive \command{ssh}
616 sessions, but it is not foolproof. (If you're editing your login
617 scripts on your server, the usual way to see if a login script is
618 running in an interactive shell is to check the return code from the
619 command \Verb|tty -s|.)
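
As a rough sketch of that check, a fragment like the following could
guard a banner in a Bourne-style login script. The banner text is
only a placeholder for whatever your own script prints.
\begin{codesample2}
  # Only print the banner when the session is attached to a terminal.
  if tty -s; then
    echo "Welcome to myserver"
  fi
\end{codesample2}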
620
621 Once you've verified that plain old ssh is working with your server,
622 the next step is to ensure that Mercurial runs on the server. The
623 following command should run successfully:
624 \begin{codesample2}
625 ssh myserver hg version
626 \end{codesample2}
627 If you see an error message instead of normal \hgcmd{version} output,
628 this is usually because you haven't installed Mercurial to
629 \dirname{/usr/bin}. Don't worry if this is the case; you don't need
630 to do that. But you should check for a few possible problems.
631 \begin{itemize}
632 \item Is Mercurial really installed on the server at all? I know this
633 sounds trivial, but it's worth checking!
634 \item Maybe your shell's search path (usually set via the \envar{PATH}
635 environment variable) is simply misconfigured.
636 \item Perhaps your \envar{PATH} environment variable is only being set
637 to point to the location of the \command{hg} executable if the login
638 session is interactive. This can happen if you're setting the path
639 in the wrong shell login script. See your shell's documentation for
640 details.
641 \item The \envar{PYTHONPATH} environment variable may need to contain
642 the path to the Mercurial Python modules. It might not be set at
643 all; it could be incorrect; or it may be set only if the login is
644 interactive.
645 \end{itemize}
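
If the problem is only that \command{hg} is not on the search path
for non-interactive logins, one possible workaround is to tell your
local Mercurial where the remote executable lives, using the
\rcitem{ui}{remotecmd} item in your \hgrc. The path below is purely
hypothetical; substitute the location where Mercurial is actually
installed on the server. Note that this does not help if
\envar{PYTHONPATH} is the variable that is missing.
\begin{codesample2}
  [ui]
  remotecmd = /home/myuser/bin/hg
\end{codesample2}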
646
647 If you can run \hgcmd{version} over an ssh connection, well done!
648 You've got the server and client sorted out. You should now be able
649 to use Mercurial to access repositories hosted by that username on
650 that server. If you run into problems with Mercurial and ssh at this
651 point, try using the \hggopt{--debug} option to get a clearer picture
652 of what's going on.
653
654 \subsection{Using compression with ssh}
655
656 Mercurial does not compress data when it uses the ssh protocol,
657 because the ssh protocol can transparently compress data. However,
658 the default behaviour of ssh clients is \emph{not} to request
659 compression.
660
661 Over any network other than a fast LAN (even a wireless network),
662 using compression is likely to significantly speed up Mercurial's
663 network operations. For example, over a WAN, someone measured
664 compression as reducing the amount of time required to clone a
665 particularly large repository from~51 minutes to~17 minutes.
666
667 Both \command{ssh} and \command{plink} accept a \cmdopt{ssh}{-C}
668 option which turns on compression. You can easily edit your \hgrc\ to
669 enable compression for all of Mercurial's uses of the ssh protocol.
670 \begin{codesample2}
671 [ui]
672 ssh = ssh -C
673 \end{codesample2}
674
675 If you use \command{ssh}, you can configure it to always use
676 compression when talking to your server. To do this, edit your
677 \sfilename{.ssh/config} file (which may not yet exist), as follows.
678 \begin{codesample2}
679 Host hg
680 Compression yes
681 HostName hg.example.com
682 \end{codesample2}
683 This defines an alias, \texttt{hg}. When you use it on the
684 \command{ssh} command line or in a Mercurial \texttt{ssh}-protocol
685 URL, it will cause \command{ssh} to connect to \texttt{hg.example.com}
686 and use compression. This gives you both a shorter name to type and
687 compression, each of which is a good thing in its own right.
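
For instance, assuming a repository at the hypothetical path
\dirname{src/myproject} under your home directory on the server, you
could clone it through the alias like this.
\begin{codesample2}
  hg clone ssh://hg/src/myproject
\end{codesample2}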
688
689 \section{Serving over HTTP using CGI}
690 \label{sec:collab:cgi}
691
692 Depending on how ambitious you are, configuring Mercurial's CGI
693 interface can take anything from a few moments to several hours.
694
695 We'll begin with the simplest of examples, and work our way towards a
696 more complex configuration. Even for the most basic case, you're
697 almost certainly going to need to read and modify your web server's
698 configuration.
699
700 \begin{note}
701 Configuring a web server is a complex, fiddly, and highly
702 system-dependent activity. I can't possibly give you instructions
703 that will cover anything like all of the cases you will encounter.
704 Please use your discretion and judgment in following the sections
705 below. Be prepared to make plenty of mistakes, and to spend a lot
706 of time reading your server's error logs.
707 \end{note}
708
709 \subsection{Web server configuration checklist}
710
711 Before you continue, do take a few moments to check a few aspects of
712 your system's setup.
713
714 \begin{enumerate}
715 \item Do you have a web server installed at all? Mac OS X ships with
716 Apache, but many other systems may not have a web server installed.
717 \item If you have a web server installed, is it actually running? On
718 most systems, even if one is present, it will be disabled by
719 default.
720 \item Is your server configured to allow you to run CGI programs in
721 the directory where you plan to do so? Most servers default to
722 explicitly disabling the ability to run CGI programs.
723 \end{enumerate}
724
725 If you don't have a web server installed, and don't have substantial
726 experience configuring Apache, you should consider using the
727 \texttt{lighttpd} web server instead of Apache. Apache has a
728 well-deserved reputation for baroque and confusing configuration.
729 While \texttt{lighttpd} is less capable in some ways than Apache, most
730 of these capabilities are not relevant to serving Mercurial
731 repositories. And \texttt{lighttpd} is undeniably \emph{much} easier
732 to get started with than Apache.
733
734 \subsection{Basic CGI configuration}
735
736 On Unix-like systems, it's common for users to have a subdirectory
737 named something like \dirname{public\_html} in their home directory,
738 from which they can serve up web pages. A file named \filename{foo}
739 in this directory will be accessible at a URL of the form
740 \texttt{http://www.example.com/\~username/foo}.
741
742 To get started, find the \sfilename{hgweb.cgi} script that should be
743 present in your Mercurial installation. If you can't quickly find a
744 local copy on your system, simply download one from the master
745 Mercurial repository at
746 \url{http://www.selenic.com/repo/hg/raw-file/tip/hgweb.cgi}.
747
748 You'll need to copy this script into your \dirname{public\_html}
749 directory, and ensure that it's executable.
750 \begin{codesample2}
751 cp .../hgweb.cgi ~/public_html
752 chmod 755 ~/public_html/hgweb.cgi
753 \end{codesample2}
754 The \texttt{755} argument to \command{chmod} is a little more general
755 than just making the script executable: it ensures that the script is
756 executable by anyone, and that ``group'' and ``other'' write
757 permissions are \emph{not} set. If you were to leave those write
758 permissions enabled, Apache's \texttt{suexec} subsystem would likely
759 refuse to execute the script. In fact, \texttt{suexec} also insists
760 that the \emph{directory} in which the script resides must not be
761 writable by others.
762 \begin{codesample2}
763 chmod 755 ~/public_html
764 \end{codesample2}
765
766 \subsubsection{What could \emph{possibly} go wrong?}
767 \label{sec:collab:wtf}
768
769 Once you've copied the CGI script into place, go into a web browser,
770 and try to open the URL \url{http://myhostname/~myuser/hgweb.cgi},
771 \emph{but} brace yourself for instant failure. There's a high
772 probability that trying to visit this URL will fail, and there are
773 many possible reasons for this. In fact, you're likely to stumble
774 over almost every one of the possible errors below, so please read
775 carefully. The following are all of the problems I ran into on a
776 system running Fedora~7, with a fresh installation of Apache, and a
777 user account that I created specially to perform this exercise.
778
779 Your web server may have per-user directories disabled. If you're
780 using Apache, search your config file for a \texttt{UserDir}
781 directive. If there's none present, per-user directories will be
782 disabled. If one exists, but its value is \texttt{disabled}, then
783 per-user directories will be disabled. Otherwise, the string after
784 \texttt{UserDir} gives the name of the subdirectory that Apache will
785 look in under your home directory, for example \dirname{public\_html}.
786
787 Your file access permissions may be too restrictive. The web server
788 must be able to traverse your home directory and directories under
789 your \dirname{public\_html} directory, and read files under the latter
790 too. Here's a quick recipe to help you to make your permissions more
791 appropriate.
792 \begin{codesample2}
793 chmod 755 ~
794 find ~/public_html -type d -print0 | xargs -0r chmod 755
795 find ~/public_html -type f -print0 | xargs -0r chmod 644
796 \end{codesample2}
797
798 The other possibility with permissions is that you might get a
799 completely empty window when you try to load the script. In this
800 case, it's likely that your access permissions are \emph{too
801 permissive}. Apache's \texttt{suexec} subsystem won't execute a
802 script that's group-~or world-writable, for example.
803
804 Your web server may be configured to disallow execution of CGI
805 programs in your per-user web directory. Here's Apache's
806 default per-user configuration from my Fedora system.
807 \begin{codesample2}
808 <Directory /home/*/public_html>
809 AllowOverride FileInfo AuthConfig Limit
810 Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
811 <Limit GET POST OPTIONS>
812 Order allow,deny
813 Allow from all
814 </Limit>
815 <LimitExcept GET POST OPTIONS>
816 Order deny,allow
817 Deny from all
818 </LimitExcept>
819 </Directory>
820 \end{codesample2}
821 If you find a similar-looking \texttt{Directory} group in your Apache
822 configuration, the directive to look at inside it is \texttt{Options}.
823 Add \texttt{ExecCGI} to the end of this list if it's missing, and
824 restart the web server.
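
Starting from the Fedora example above, the edited line would look
something like this; your own list of options may well differ.
\begin{codesample2}
  Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec ExecCGI
\end{codesample2}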
825
826 If you find that Apache serves you the text of the CGI script instead
827 of executing it, you may need to either uncomment (if already present)
828 or add a directive like this.
829 \begin{codesample2}
830 AddHandler cgi-script .cgi
831 \end{codesample2}
832
833 The next possibility is that you might be served with a colourful
834 Python backtrace claiming that it can't import a
835 \texttt{mercurial}-related module. This is actually progress! The
836 server is now capable of executing your CGI script. This error is
837 only likely to occur if you're running a private installation of
838 Mercurial, instead of a system-wide version. Remember that the web
839 server runs the CGI program without any of the environment variables
840 that you take for granted in an interactive session. If this error
841 happens to you, edit your copy of \sfilename{hgweb.cgi} and follow the
842 directions inside it to correctly set your \envar{PYTHONPATH}
843 environment variable.
844
845 Finally, you are \emph{certain} to be served with another colourful
846 Python backtrace: this one will complain that it can't find
847 \dirname{/path/to/repository}. Edit your \sfilename{hgweb.cgi} script
848 and replace the \dirname{/path/to/repository} string with the complete
849 path to the repository you want to serve up.
850
851 At this point, when you try to reload the page, you should be
852 presented with a nice HTML view of your repository's history. Whew!
853
854 \subsubsection{Configuring lighttpd}
855
856 To be exhaustive in my experiments, I tried configuring the
857 increasingly popular \texttt{lighttpd} web server to serve the same
858 repository as I described with Apache above. I had already overcome
859 all of the problems I outlined with Apache, many of which are not
860 server-specific. As a result, I was fairly sure that my file and
861 directory permissions were good, and that my \sfilename{hgweb.cgi}
862 script was properly edited.
863
864 Once I had Apache running, getting \texttt{lighttpd} to serve the
865 repository was a snap (in other words, even if you're trying to use
866 \texttt{lighttpd}, you should read the Apache section). I first had
867 to edit the \texttt{mod\_access} section of its config file to enable
868 \texttt{mod\_cgi} and \texttt{mod\_userdir}, both of which were
869 disabled by default on my system. I then added a few lines to the end
870 of the config file, to configure these modules.
871 \begin{codesample2}
872 userdir.path = "public_html"
873 cgi.assign = ( ".cgi" => "" )
874 \end{codesample2}
875 With this done, \texttt{lighttpd} ran immediately for me. If I had
876 configured \texttt{lighttpd} before Apache, I'd almost certainly have
877 run into many of the same system-level configuration problems as I did
878 with Apache. However, I found \texttt{lighttpd} to be noticeably
879 easier to configure than Apache, even though I've used Apache for over
880 a decade, and this was my first exposure to \texttt{lighttpd}.
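
Enabling the two modules typically amounts to uncommenting or adding
their names in the \texttt{server.modules} list of the config file,
roughly as follows. Treat this as a sketch: the other modules listed
in your file will probably differ.
\begin{codesample2}
  server.modules = (
    "mod_access",
    "mod_userdir",
    "mod_cgi"
  )
\end{codesample2}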
881
882 \subsection{Sharing multiple repositories with one CGI script}
883
884 The \sfilename{hgweb.cgi} script only lets you publish a single
885 repository, which is an annoying restriction. If you want to publish
886 more than one without burdening yourself with multiple copies of the
887 same script, each with different names, a better choice is to use the
888 \sfilename{hgwebdir.cgi} script.
889
890 The procedure to configure \sfilename{hgwebdir.cgi} is only a little
891 more involved than for \sfilename{hgweb.cgi}. First, you must obtain
892 a copy of the script. If you don't have one handy, you can download a
893 copy from the master Mercurial repository at
894 \url{http://www.selenic.com/repo/hg/raw-file/tip/hgwebdir.cgi}.
895
896 You'll need to copy this script into your \dirname{public\_html}
897 directory, and ensure that it's executable.
898 \begin{codesample2}
899 cp .../hgwebdir.cgi ~/public_html
900 chmod 755 ~/public_html ~/public_html/hgwebdir.cgi
901 \end{codesample2}
902 With basic configuration out of the way, try to visit
903 \url{http://myhostname/~myuser/hgwebdir.cgi} in your browser. It
904 should display an empty list of repositories. If you get a blank
905 window or error message, try walking through the list of potential
906 problems in section~\ref{sec:collab:wtf}.
907
908 The \sfilename{hgwebdir.cgi} script relies on an external
909 configuration file. By default, it searches for a file named
910 \sfilename{hgweb.config} in the same directory as itself. You'll need
911 to create this file, and make it world-readable. The format of the
912 file is similar to a Windows ``ini'' file, as understood by Python's
913 \texttt{ConfigParser}~\cite{web:configparser} module.
914
915 The easiest way to configure \sfilename{hgwebdir.cgi} is with a
916 section named \texttt{collections}. This will automatically publish
917 \emph{every} repository under the directories you name. The section
918 should look like this:
919 \begin{codesample2}
920 [collections]
921 /my/root = /my/root
922 \end{codesample2}
923 Mercurial interprets this by looking at the directory name on the
924 \emph{right} hand side of the ``\texttt{=}'' sign; finding
925 repositories in that directory hierarchy; and using the text on the
926 \emph{left} to strip off matching text from the names it will actually
927 list in the web interface. The remaining component of a path after
928 this stripping has occurred is called a ``virtual path''.
929
930 Given the example above, if we have a repository whose local path is
931 \dirname{/my/root/this/repo}, the CGI script will strip the leading
932 \dirname{/my/root} from the name, and publish the repository with a
933 virtual path of \dirname{this/repo}. If the base URL for our CGI
934 script is \url{http://myhostname/~myuser/hgwebdir.cgi}, the complete
935 URL for that repository will be
936 \url{http://myhostname/~myuser/hgwebdir.cgi/this/repo}.
937
938 If we replace \dirname{/my/root} on the left hand side of this example
939 with \dirname{/my}, then \sfilename{hgwebdir.cgi} will only strip off
940 \dirname{/my} from the repository name, and will give us a virtual
941 path of \dirname{root/this/repo} instead of \dirname{this/repo}.
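
In other words, that variant of the configuration would read as
follows.
\begin{codesample2}
  [collections]
  /my = /my/root
\end{codesample2}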
942
943 The \sfilename{hgwebdir.cgi} script will recursively search each
944 directory listed in the \texttt{collections} section of its
945 configuration file, but it will \emph{not} recurse into the
946 repositories it finds.
947
948 The \texttt{collections} mechanism makes it easy to publish many
949 repositories in a ``fire and forget'' manner. You only need to set up
950 the CGI script and configuration file one time. Afterwards, you can
951 publish or unpublish a repository at any time by simply moving it
952 into, or out of, the directory hierarchy in which you've configured
953 \sfilename{hgwebdir.cgi} to look.
954
955 \subsubsection{Explicitly specifying which repositories to publish}
956
957 In addition to the \texttt{collections} mechanism, the
958 \sfilename{hgwebdir.cgi} script allows you to publish a specific list
959 of repositories. To do so, create a \texttt{paths} section, with
960 contents of the following form.
961 \begin{codesample2}
962 [paths]
963 repo1 = /my/path/to/some/repo
964 repo2 = /some/path/to/another
965 \end{codesample2}
966 In this case, the virtual path (the component that will appear in a
967 URL) is on the left hand side of each definition, while the path to
968 the repository is on the right. Notice that there does not need to be
969 any relationship between the virtual path you choose and the location
970 of a repository in your filesystem.
971
972 If you wish, you can use both the \texttt{collections} and
973 \texttt{paths} mechanisms simultaneously in a single configuration
974 file.
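
A combined file, reusing the example entries from above, might look
like this.
\begin{codesample2}
  [collections]
  /my/root = /my/root

  [paths]
  repo1 = /my/path/to/some/repo
\end{codesample2}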
975
976 \begin{note}
977 If multiple repositories have the same virtual path,
978 \sfilename{hgwebdir.cgi} will not report an error. Instead, it will
979 behave unpredictably.
980 \end{note}
981
982 \subsection{Downloading source archives}
983
984 Mercurial's web interface lets users download an archive of any
985 revision. This archive will contain a snapshot of the working
986 directory as of that revision, but it will not contain a copy of the
987 repository data.
988
989 By default, this feature is not enabled. To enable it, you'll need to
990 add an \rcitem{web}{allow\_archive} item to the \rcsection{web}
991 section of your \hgrc.
992
993 \subsection{Web configuration options}
994
995 Mercurial's web interfaces (the \hgcmd{serve} command, and the
996 \sfilename{hgweb.cgi} and \sfilename{hgwebdir.cgi} scripts) have a
997 number of configuration options that you can set. These belong in a
998 section named \rcsection{web}.
999 \begin{itemize}
1000 \item[\rcitem{web}{allow\_archive}] Determines which (if any) archive
1001 download mechanisms Mercurial supports. If you enable this
1002 feature, users of the web interface will be able to download an
1003 archive of whatever revision of a repository they are viewing.
1004 To enable the archive feature, this item must take the form of a
1005 sequence of words drawn from the list below.
1006 \begin{itemize}
1007 \item[\texttt{bz2}] A \command{tar} archive, compressed using
1008 \texttt{bzip2} compression. This has the best compression ratio,
1009 but uses the most CPU time on the server.
1010 \item[\texttt{gz}] A \command{tar} archive, compressed using
1011 \texttt{gzip} compression.
1012 \item[\texttt{zip}] A \command{zip} archive, compressed using DEFLATE
1013 compression. This format has the worst compression ratio, but is
1014 widely used in the Windows world.
1015 \end{itemize}
1016 If you provide an empty list, or don't have an
1017 \rcitem{web}{allow\_archive} entry at all, this feature will be
1018 disabled. Here is an example of how to enable all three supported
1019 formats.
1020 \begin{codesample4}
1021 [web]
1022 allow_archive = bz2 gz zip
1023 \end{codesample4}
1024 \item[\rcitem{web}{allowpull}] Boolean. Determines whether the web
1025 interface allows remote users to \hgcmd{pull} and \hgcmd{clone} this
1026 repository over~HTTP. If set to \texttt{no} or \texttt{false}, only
1027 the ``human-oriented'' portion of the web interface is available.
1028 \item[\rcitem{web}{contact}] String. A free-form (but preferably
1029 brief) string identifying the person or group in charge of the
1030 repository. This often contains the name and email address of a
1031 person or mailing list. It often makes sense to place this entry in
1032 a repository's own \sfilename{.hg/hgrc} file, but it can make sense
1033 to use in a global \hgrc\ if every repository has a single
1034 maintainer.
1035 \item[\rcitem{web}{maxchanges}] Integer. The default maximum number
1036 of changesets to display in a single page of output.
1037 \item[\rcitem{web}{maxfiles}] Integer. The default maximum number
1038 of modified files to display in a single page of output.
1039 \item[\rcitem{web}{stripes}] Integer. If the web interface displays
1040 alternating ``stripes'' to make it easier to visually align rows
1041 when you are looking at a table, this number controls the number of
1042 rows in each stripe.
1043 \item[\rcitem{web}{style}] Controls the template Mercurial uses to
1044 display the web interface. Mercurial ships with two web templates,
1045 named \texttt{default} and \texttt{gitweb} (the latter is much more
1046 visually attractive). You can also specify a custom template of
1047 your own; see chapter~\ref{chap:template} for details. Here, you
1048 can see how to enable the \texttt{gitweb} style.
1049 \begin{codesample4}
1050 [web]
1051 style = gitweb
1052 \end{codesample4}
1053 \item[\rcitem{web}{templates}] Path. The directory in which to search
1054 for template files. By default, Mercurial searches in the directory
1055 in which it was installed.
1056 \end{itemize}
1057 If you are using \sfilename{hgwebdir.cgi}, you can place a few
1058 configuration items in a \rcsection{web} section of the
1059 \sfilename{hgweb.config} file instead of a \hgrc\ file, for
1060 convenience. These items are \rcitem{web}{motd} and
1061 \rcitem{web}{style}.
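
As an illustration, an \sfilename{hgweb.config} file that sets both
of these items alongside a \texttt{collections} section might look
like the following. The message text is just a placeholder.
\begin{codesample2}
  [collections]
  /my/root = /my/root

  [web]
  style = gitweb
  motd = Welcome to our repositories
\end{codesample2}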
1062
1063 \subsubsection{Options specific to an individual repository}
1064
1065 A few \rcsection{web} configuration items ought to be placed in a
1066 repository's local \sfilename{.hg/hgrc}, rather than a user's or
1067 global \hgrc.
1068 \begin{itemize}
1069 \item[\rcitem{web}{description}] String. A free-form (but preferably
1070 brief) string that describes the contents or purpose of the
1071 repository.
1072 \item[\rcitem{web}{name}] String. The name to use for the repository
1073 in the web interface. This overrides the default name, which is the
1074 last component of the repository's path.
1075 \end{itemize}
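
Here is a sketch of what a repository's \sfilename{.hg/hgrc} might
contain. All of the values are invented for the sake of the example;
the \rcitem{web}{contact} item is included because, as noted earlier,
it often belongs here too.
\begin{codesample2}
  [web]
  name = myproject
  description = Build scripts for the example project
  contact = Jane Hacker <jane@example.com>
\end{codesample2}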
1076
1077 \subsubsection{Options specific to the \hgcmd{serve} command}
1078
1079 Some of the items in the \rcsection{web} section of a \hgrc\ file are
1080 only for use with the \hgcmd{serve} command.
1081 \begin{itemize}
1082 \item[\rcitem{web}{accesslog}] Path. The name of a file into which to
1083 write an access log. By default, the \hgcmd{serve} command writes
1084 this information to standard output, not to a file. Log entries are
1085 written in the standard ``combined'' file format used by almost all
1086 web servers.
1087 \item[\rcitem{web}{address}] String. The local address on which the
1088 server should listen for incoming connections. By default, the
1089 server listens on all addresses.
1090 \item[\rcitem{web}{errorlog}] Path. The name of a file into which to
1091 write an error log. By default, the \hgcmd{serve} command writes this
1092 information to standard error, not to a file.
1093 \item[\rcitem{web}{ipv6}] Boolean. Whether to use the IPv6 protocol.
1094 By default, IPv6 is not used.
1095 \item[\rcitem{web}{port}] Integer. The TCP~port number on which the
1096 server should listen. The default port number used is~8000.
1097 \end{itemize}
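
Putting several of these together, a \rcsection{web} section for use
with \hgcmd{serve} could look something like this. The port number
and log file locations are arbitrary choices, not recommendations.
\begin{codesample2}
  [web]
  address = 127.0.0.1
  port = 8080
  accesslog = /home/myuser/hg-access.log
  errorlog = /home/myuser/hg-error.log
\end{codesample2}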
1098
1099 \subsubsection{Choosing the right \hgrc\ file to add \rcsection{web}
1100 items to}
1101
1102 It is important to remember that a web server like Apache or
1103 \texttt{lighttpd} will run under a user~ID that is different to yours.
1104 CGI scripts run by your server, such as \sfilename{hgweb.cgi}, will
1105 usually also run under that user~ID.
1106
1107 If you add \rcsection{web} items to your own personal \hgrc\ file, CGI
1108 scripts won't read that \hgrc\ file. Those settings will thus only
1109 affect the behaviour of the \hgcmd{serve} command when you run it. To
1110 cause CGI scripts to see your settings, either create a \hgrc\ file in
1111 the home directory of the user ID that runs your web server, or add
1112 those settings to a system-wide \hgrc\ file.
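
On many Unix-like systems, the system-wide file is
\sfilename{/etc/mercurial/hgrc}, though the exact location depends on
how Mercurial was installed. A minimal sketch of settings placed
there so that CGI scripts pick them up:
\begin{codesample2}
  [web]
  allow_archive = gz zip
  style = gitweb
\end{codesample2}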
1113
1114
1115 %%% Local Variables:
1116 %%% mode: latex
1117 %%% TeX-master: "00book"
1118 %%% End: