diff en/collab.tex @ 184:7b812c428074

Document the ssh protocol, URL syntax, and configuration.
author Bryan O'Sullivan <bos@serpentine.com>
date Thu, 05 Apr 2007 23:28:06 -0700
parents 5fc4a45c069f
children b60e2de6dbc3
line wrap: on
line diff
--- a/en/collab.tex	Fri Mar 30 23:28:34 2007 -0700
+++ b/en/collab.tex	Thu Apr 05 23:28:06 2007 -0700
@@ -189,9 +189,9 @@
 
 This model resembles working with feature branches.  The difference is
 that when a feature branch misses a train, someone on the feature team
-pulls and merges the changes that went out on that train release, and
-the team continues its work on top of that release so that their
-feature can make the next release.
+pulls and merges the changes that went out on that train release into
+the feature branch, and the team continues its work on top of that
+release so that their feature can make the next release.
 
 \subsection{The Linux kernel model}
 
@@ -223,12 +223,16 @@
 to Linus.  In addition, there are several well known branches that
 people use for different purposes.  For example, a few people maintain
 ``stable'' repositories of older versions of the kernel, to which they
-apply critical fixes as needed.
+apply critical fixes as needed.  Some maintainers publish multiple
+trees: one for experimental changes; one for changes that they are
+about to feed upstream; and so on.  Others just publish a single
+tree.
 
 This model has two notable features.  The first is that it's ``pull
 only''.  You have to ask, convince, or beg another developer to take a
-change from you, because there are no shared trees, and there's no way
-to push changes into a tree that someone else controls.
+change from you, because there are almost no trees to which more than
+one person can push, and there's no way to push changes into a tree
+that someone else controls.
 
 The second is that it's based on reputation and acclaim.  If you're an
 unknown, Linus will probably ignore changes from you without even
@@ -313,10 +317,287 @@
 correctly, and find out what URL you should send to your
 collaborators, start it with the \hggopt{-v} option.
 
-\subsection{Using \command{ssh} as a tunnel}
+\subsection{Using the Secure Shell (ssh) protocol}
 \label{sec:collab:ssh}
 
-\subsection{Serving HTTP with a CGI script}
+You can pull and push changes securely over a network connection using
+the Secure Shell (\texttt{ssh}) protocol.  To use this successfully,
+you may have to do a little bit of configuration on the client or
+server sides.
+
+If you're not familiar with ssh, it's a network protocol that lets you
+securely communicate with another computer.  To use it with Mercurial,
+you'll be setting up one or more user accounts on a server so that
+remote users can log in and execute commands.
+
+(If you \emph{are} familiar with ssh, you'll probably find some of the
+material that follows to be elementary in nature.)
+
+\subsubsection{How to read and write ssh URLs}
+
+An ssh URL tends to look like this:
+\begin{codesample2}
+  ssh://bos@hg.serpentine.com:22/hg/hgbook
+\end{codesample2}
+\begin{enumerate}
+\item The ``\texttt{ssh://}'' part tells Mercurial to use the ssh
+  protocol.
+\item The ``\texttt{bos@}'' component indicates what username to log
+  into the server as.  You can leave this out if the remote username
+  is the same as your local username.
+\item The ``\texttt{hg.serpentine.com}'' gives the hostname of the
+  server to log into.
+\item The ``:22'' identifies the port number to connect to the server
+  on.  The default port is~22, so you only need to specify this part
+  if you're \emph{not} using port~22.
+\item The remainder of the URL is the local path to the repository on
+  the server.
+\end{enumerate}
+
+There's plenty of scope for confusion with the path component of ssh
+URLs, as there is no standard way for tools to interpret it.  Some
+programs behave differently than others when dealing with these paths.
+This isn't an ideal situation, but it's unlikely to change.  Please
+read the following paragraphs carefully.
+
+Mercurial treats the path to a repository on the server as relative to
+the remote user's home directory.  For example, if user \texttt{foo}
+on the server has a home directory of \dirname{/home/foo}, then an ssh
+URL that contains a path component of \dirname{bar}
+\emph{really} refers to the directory \dirname{/home/foo/bar}.
+
+If you want to specify a path relative to another user's home
+directory, you can use a path that starts with a tilde character
+followed by the user's name (let's call them \texttt{otheruser}), like
+this.
+\begin{codesample2}
+  ssh://server/~otheruser/hg/repo
+\end{codesample2}
+
+And if you really want to specify an \emph{absolute} path on the
+server, begin the path component with two slashes, as in this example.
+\begin{codesample2}
+  ssh://server//absolute/path
+\end{codesample2}
+
+\subsubsection{Finding an ssh client for your system}
+
+Almost every Unix-like system comes with OpenSSH preinstalled.  If
+you're using such a system, run \Verb|which ssh| to find out if
+the \command{ssh} command is installed (it's usually in
+\dirname{/usr/bin}).  In the unlikely event that it isn't present,
+take a look at your system documentation to figure out how to install
+it.
+
+On Windows, you'll first need to choose download a suitable ssh
+client.  There are two alternatives.
+\begin{itemize}
+\item Simon Tatham's excellent PuTTY package~\cite{web:putty} provides
+  a complete suite of ssh client commands.
+\item If you have a high tolerance for pain, you can use the Cygwin
+  port of OpenSSH.
+\end{itemize}
+In either case, you'll need to edit your \hgini\ file to tell
+Mercurial where to find the actual client command.  For example, if
+you're using PuTTY, you'll need to use the \command{plink} command as
+a command-line ssh client.
+\begin{codesample2}
+  [ui]
+  ssh = C:/path/to/plink.exe -ssh -i "C:/path/to/my/private/key"
+\end{codesample2}
+
+\begin{note}
+  The path to \command{plink} shouldn't contain any whitespace
+  characters, or Mercurial may not be able to run it correctly (so
+  putting it in \dirname{C:\\Program Files} is probably not be a good
+  idea).
+\end{note}
+
+\subsubsection{Generating a key pair}
+
+To avoid the need to repetitively type a password every time you need
+to use your ssh client, I recommend generating a key pair.  On a
+Unix-like system, the \command{ssh-keygen} command will do the trick.
+On Windows, if you're using PuTTY, the \command{puttygen} command is
+what you'll need.
+
+When you generate a key pair, it's usually \emph{highly} advisable to
+protect it with a passphrase.  (The only time that you might not want
+to do this id when you're using the ssh protocol for automated tasks
+on a secure network.)
+
+Simply generating a key pair isn't enough, however.  You'll need to
+add the public key to the set of authorised keys for whatever user
+you're logging in remotely as.  For servers using OpenSSH (the vast
+majority), this will mean adding the public key to a list in a file
+called \sfilename{authorized\_keys} in their \sdirname{.ssh}
+directory.
+
+On a Unix-like system, your public key will have a \filename{.pub}
+extension.  If you're using \command{puttygen} on Windows, you can
+save the public key to a file of your choosing, or paste it from the
+window it's displayed in straight into the
+\sfilename{authorized\_keys} file.
+
+\subsubsection{Using an authentication agent}
+
+An authentication agent is a daemon that stores passphrases in memory
+(so it will forget passphrases if you log out and log back in again).
+An ssh client will notice if it's running, and query it for a
+passphrase.  If there's no authentication agent running, or the agent
+doesn't store the necessary passphrase, you'll have to type your
+passphrase every time Mercurial tries to communicate with a server on
+your behalf (e.g.~whenever you pull or push changes).
+
+The downside of storing passphrases in an agent is that it's possible
+for a well-prepared attacker to recover the plain text of your
+passphrases, in some cases even if your system has been power-cycled.
+You should make your own judgment as to whether this is an acceptable
+risk.  It certainly saves a lot of repeated typing.
+
+On Unix-like systems, the agent is called \command{ssh-agent}, and
+it's often run automatically for you when you log in.  You'll need to
+use the \command{ssh-add} command to add passphrases to the agent's
+store.  On Windows, if you're using PuTTY, the \command{pageant}
+command acts as the agent.  It adds an icon to your system tray that
+will let you manage stored passphrases.
+
+\subsubsection{Configuring the server side properly}
+
+Because ssh can be fiddly to set up if you're new to it, there's a
+variety of things that can go wrong.  Add Mercurial on top, and
+there's plenty more scope for head-scratching.  Most of these
+potential problems occur on the server side, not the client side.  The
+good news is that once you've gotten a configuration working, it will
+usually continue to work indefinitely.
+
+Before you try using Mercurial to talk to an ssh server, it's best to
+make sure that you can use the normal \command{ssh} or \command{putty}
+command to talk to the server first.  If you run into problems with
+using these commands directly, Mercurial surely won't work.  Worse, it
+will obscure the underlying problem.  Any time you want to debug
+ssh-related Mercurial problems, you should drop back to making sure
+that plain ssh client commands work first, \emph{before} you worry
+about whether there's a problem with Mercurial.
+
+The first thing to be sure of on the server side is that you can
+actually log in from another machine at all.  If you can't use
+\command{ssh} or \command{putty} to log in, the error message you get
+may give you a few hints as to what's wrong.  The most common problems
+are as follows.
+\begin{itemize}
+\item If you get a ``connection refused'' error, either there isn't an
+  SSH daemon running on the server at all, or it's inaccessible due to
+  firewall configuration.
+\item If you get a ``no route to host'' error, you either have an
+  incorrect address for the server or a seriously locked down firewall
+  that won't admit its existence at all.
+\item If you get a ``permission denied'' error, you may have mistyped
+  the username on the server, or you could have mistyped your key's
+  passphrase or the remote user's password.
+\end{itemize}
+In summary, if you're having trouble talking to the server's ssh
+daemon, first make sure that one is running at all.  On many systems
+it will be installed, but disabled, by default.  Once you're done with
+this step, you should then check that the server's firewall is
+configured to allow incoming connections on the port the ssh daemon is
+listening on (usually~22).  Don't worry about more exotic
+possibilities for misconfiguration until you've checked these two
+first.
+
+If you're using an authentication agent on the client side to store
+passphrases for your keys, you ought to be able to log into the server
+without being prompted for a passphrase or a password.  If you're
+prompted for a passphrase, there are a few possible culprits.
+\begin{itemize}
+\item You might have forgotten to use \command{ssh-add} or
+  \command{pageant} to store the passphrase.
+\item You might have stored the passphrase for the wrong key.
+\end{itemize}
+If you're being prompted for the remote user's password, there are
+another few possible problems to check.
+\begin{itemize}
+\item Either the user's home directory or their \sdirname{.ssh}
+  directory might have excessively liberal permissions.  As a result,
+  the ssh daemon will not trust or read their
+  \sfilename{authorized\_keys} file.  For example, a group-writable
+  home or \sdirname{.ssh} directory will often cause this symptom.
+\item The user's \sfilename{authorized\_keys} file may have a problem.
+  If anyone other than the user owns or can write to that file, the
+  ssh daemon will not trust or read it.
+\end{itemize}
+
+In the ideal world, you should be able to run the following command
+successfully, and it should print exactly one line of output, the
+current date and time.
+\begin{codesample2}
+  ssh myserver date
+\end{codesample2}
+
+If on your server you have login scripts that print banners or other
+junk even when running non-interactive commands like this, you should
+fix them before you continue, so that they only print output if
+they're run interactively.  Otherwise these banners will at least
+clutter up Mercurial's output.  Worse, they could potentially cause
+problems with running Mercurial commands remotely.  (The usual way to
+see if a login script is running in an interactive shell is to check
+the return code from the command \Verb|tty -s|.)
+
+Once you've verified that plain old ssh is working with your server,
+the next step is to ensure that Mercurial runs on the server.  The
+following command should run successfully:
+\begin{codesample2}
+  ssh myserver hg version
+\end{codesample2}
+If you see an error message instead of normal \hgcmd{version} output,
+this is usually because you haven't installed Mercurial to
+\dirname{/usr/bin}.  Don't worry if this is the case; you don't need
+to do that.  But you should check for a few possible problems.
+\begin{itemize}
+\item Is Mercurial really installed on the server at all?  I know this
+  sounds trivial, but it's worth checking!
+\item Maybe your shell's search path (usually set via the \envar{PATH}
+  environment variable) is simply misconfigured.
+\item Perhaps your \envar{PATH} environment variable is only being set
+  to point to the location of the \command{hg} executable if the login
+  session is interactive.  This can happen if you're setting the path
+  in the wrong shell login script.  See your shell's documentation for
+  details.
+\item The \envar{PYTHONPATH} environment variable may need to contain
+  the path to the Mercurial Python modules.  It might not be set at
+  all; it could be incorrect; or it may be set only if the login is
+  interactive.
+\end{itemize}
+
+If you can run \hgcmd{version} over an ssh connection, well done!
+You've got the server and client sorted out.  You should now be able
+to use Mercurial to access repositories hosted by that username on
+that server.  If you run into problems with Mercurial and ssh at this
+point, try using the \hggopt{--debug} option to get a clearer picture
+of what's going on.
+
+\subsubsection{Using compression with ssh}
+
+Mercurial does not compress data when it uses the ssh protocol,
+because the ssh protocol can transparently compress data.  However,
+the default behaviour of ssh clients is \emph{not} to request
+compression.
+
+Over any network other than a fast LAN (even a wireless network),
+using compression is likely to significantly speed up Mercurial's
+network operations.  For example, over a WAN, someone measured
+compression as reducing the amount of time required to clone a
+particularly large repository from~51 minutes to~17 minutes.
+
+Both \command{ssh} and \command{plink} accept a \cmdopt{ssh}{-C}
+option which turns on compression.  You can easily edit your \hgrc\ to
+enable compression for all of Mercurial's uses of the ssh protocol.
+\begin{codesample2}
+  [ui]
+  ssh = ssh -C
+\end{codesample2}
+
+\subsection{Serving over HTTP with a CGI script}
 \label{sec:collab:cgi}