view es/collab.tex @ 525:e5739e8d708f
finished file "concepts.tex". Upgraded project status file
author | Javier Rojas <jerojasro@devnull.li> |
---|---|
date | Sun, 23 Nov 2008 13:22:45 -0500 |
\chapter{Collaborating with other people}
\label{cha:collab}

Because of its decentralised nature, Mercurial imposes no policy on how groups of people should work together. However, if you're new to distributed revision control, it helps to have some tools and examples at hand when you're thinking about possible workflow models.

\section{Mercurial's web interface}

Mercurial has a powerful web interface that provides several useful capabilities.

For interactive use, the web interface lets you browse a single repository or a collection of repositories. You can view the history of a repository, examine each change (comments and diffs), and view the contents of each directory and file.

The web interface also provides RSS feeds of the changes in a repository. This lets you ``subscribe'' to a repository using your favourite feed reader, and be notified automatically of activity in that repository as soon as it happens. I like this model much better than subscribing to a mailing list to which notifications are sent, as it requires no additional configuration on the part of whoever is serving the repository.

The web interface also lets remote users clone a repository, pull changes from it, and (when the server is configured to permit it) push changes back to it. Mercurial's tunneling protocol compresses data aggressively, so that it works efficiently even over low-bandwidth network connections.

The easiest way to get started with the web interface is to use your web browser to visit an existing repository, such as the main Mercurial repository at \url{http://www.selenic.com/repo/hg?style=gitweb}.

If you're interested in providing a web interface to your own repositories, Mercurial provides two ways to do this. The first is the \hgcmd{serve} command, which is best suited to short-term ``lightweight'' serving.
See section~\ref{sec:collab:serve} below for details of how to use this command. If you have a long-lived repository that you'd like to make permanently available, Mercurial has built-in support for the \command{ssh} protocol for pushing changes securely to a central repository, as documented in section~\ref{sec:collab:ssh}. It is also very common to publish a read-only copy of the repository over HTTP using CGI, as described in section~\ref{sec:collab:cgi}. Publishing over HTTP satisfies the needs of people who don't have push access, and of those who want to use web browsers to browse the repository's history.

\subsection{Working with multiple branches}

Projects of any significant size naturally tend to make progress on several fronts simultaneously. In the case of software, it's common for a project to go through periodic official releases. A release might then go into ``maintenance mode'' for a while after its first publication; maintenance releases tend to contain only bug fixes, not new features. In parallel with these maintenance releases, one or more future releases may be under development. People normally use the word ``branch'' to refer to one of these slightly different directions in which development is proceeding.

Mercurial is particularly well suited to managing a number of simultaneous, but not identical, branches. Each ``development direction'' can live in its own central repository, and you can merge changes from one to another as the need arises. Because repositories are independent of each other, unstable changes in a development branch will never affect a stable branch unless someone explicitly merges those changes in.

Here's an example of how this can work in practice. Let's say you have one ``main branch'' on a central server.
\interaction{branching.init}
People clone it, make changes locally, test them, and push them back.

Once the main branch reaches a release milestone, you can use the \hgcmd{tag} command to give a permanent name to the milestone revision.
\interaction{branching.tag}
Let's say some ongoing development occurs on the main branch.
\interaction{branching.main}
Using the tag that was recorded at the milestone, people who clone that repository at any time in the future can use \hgcmd{update} to get a copy of the working directory exactly as it was when that tagged revision was committed.
\interaction{branching.update}

In addition, immediately after the main branch is tagged, someone can clone the main branch on the server to a new ``stable'' branch, also on the server.
\interaction{branching.clone}

Someone who needs to make a change to the stable branch can then clone \emph{that} repository, make their changes, commit, and push their changes back there.
\interaction{branching.stable}
Because Mercurial repositories are independent, and Mercurial doesn't move changes around automatically, the stable and main branches are \emph{isolated} from each other. The changes that you make on the main branch don't ``leak'' to the stable branch, and vice versa.

You'll often want the bug fixes made on the stable branch to show up on the main branch, too. Rather than rewrite a bug fix on the main branch, you can simply pull and merge changes from the stable branch into the main branch, and Mercurial will bring those bug fixes in for you.
\interaction{branching.merge}
The main branch will still contain changes that are not on the stable branch, but it will also contain all of the bug fixes from the stable branch. The stable branch remains unaffected by these changes.
\subsection{Feature branches}

For larger projects, an effective way to manage change is to break up a team into smaller groups. Each group has a shared branch of its own, cloned from a single ``master'' branch used by the entire project. People working on an individual branch are typically quite isolated from developments on other branches.

\begin{figure}[ht]
  \centering
  \grafix{feature-branches}
  \caption{Feature branches}
  \label{fig:collab:feature-branches}
\end{figure}

When a particular feature is deemed to be in suitable shape, someone on that feature team pulls and merges from the master branch into the feature branch, then pushes back up to the master branch.

\subsection{The release train}

Some projects are organised on a ``train'' basis: a release is scheduled to happen at regular intervals, and whatever features are ready when the ``train'' is ready to leave are allowed in.

This model resembles working with feature branches. The difference is that when a feature branch misses a train, someone on the feature team pulls and merges the changes that went out on that train release into the feature branch, and the team continues its work on top of that release so that their feature can make the next release.

\subsection{The Linux kernel model}

Linux kernel development has a shallow hierarchical structure, surrounded by a cloud of apparent chaos. Because most Linux developers use \command{git}, a distributed revision control tool with capabilities similar to Mercurial, it's useful to describe the way work flows in that environment; if you like the ideas, the approach translates well between Git and Mercurial.

At the centre of the community sits Linus Torvalds, the creator of Linux. He publishes a single source repository that is considered the ``authoritative'' current tree by the entire developer community.
Anyone can clone Linus's tree, but he is very choosy about whose trees he pulls from.

Linus has a number of ``trusted lieutenants''. As a general rule, he pulls whatever changes they publish, in most cases without even reviewing those changes. Some of those lieutenants are generally agreed to be ``maintainers'', responsible for specific subsystems within the kernel. If a random kernel hacker wants to make a change to a subsystem and have it end up in Linus's tree, they must find out who the subsystem's maintainer is, and ask that maintainer to take their change. If the maintainer reviews the changes and agrees to take them, they'll be passed along to Linus's tree in due course.

Individual lieutenants have their own approaches to reviewing, accepting, and publishing changes, and to deciding when to feed them to Linus. In addition, there are several well known branches that people use for different purposes. For example, a few people maintain ``stable'' repositories of older versions of the kernel, to which they apply critical fixes as needed. Some maintainers publish multiple trees: one for experimental changes; one for changes that they are about to feed upstream; and so on. Others just publish a single tree.

This model has two notable features. The first is that it's ``pull only''. You have to ask, convince, or beg another developer to take a change from you, because there are almost no trees to which more than one person can push, and there's no way to push changes into a tree that someone else controls.

The second is that it's based on reputation and acclaim. If you're an unknown, Linus will probably ignore changes from you without even responding. But a subsystem maintainer will probably review them, and will likely take them if they pass their criteria for suitability.
The more ``good'' changes you contribute to a maintainer, the more likely they are to trust your judgment and accept your changes. If you're well known and maintain a long-lived branch for something Linus hasn't yet accepted, people with similar interests may pull your changes regularly to keep up with your work.

Reputation and acclaim don't necessarily cross subsystem or ``people'' boundaries. If you're a respected but specialised storage hacker, and you try to fix a networking bug, that change may receive a level of scrutiny from a network maintainer comparable to a change from a complete stranger.

To people who come from more orderly project backgrounds, the comparatively chaotic Linux kernel development process often seems completely insane. It's subject to the whims of individuals; people make sweeping changes whenever they see fit; and the pace of development is astounding. And yet Linux is a highly successful, well-regarded piece of software.

\subsection{Pull-only versus shared-push collaboration}

A perpetual source of heat in the open source community is whether a development model in which people only ever pull changes from others is ``better than'' one in which multiple people can push changes to a shared repository.

Typically, the backers of the shared-push model use tools that actively enforce this approach. If you're using a centralised revision control tool such as Subversion, there's no way to choose which model you'll use: the tool gives you shared-push, and if you want to do anything else, you'll have to roll your own approach on top (such as applying patches by hand).

A good distributed revision control tool, such as Mercurial, will support both models.
You and your collaborators can then structure how you work together based on your own needs and preferences, not on what contortions your tools force you into.

\subsection{Where collaboration meets branch management}

Once you and your team set up some shared repositories and start propagating changes back and forth between local and shared repositories, you begin to face a related, but slightly different challenge: that of managing the multiple directions in which your team may be moving at once. Even though this subject is intimately related to how your team collaborates, it's dense enough to merit treatment of its own, in chapter~\ref{chap:branch}.

\section{The technical side of sharing}

The remainder of this chapter is devoted to the question of serving data to your collaborators.

\section{Informal sharing with \hgcmd{serve}}
\label{sec:collab:serve}

Mercurial's \hgcmd{serve} command is wonderfully suited to small, tight-knit, and short-lived group environments. It also provides a good way to get a feel for using Mercurial commands over a network.

Run \hgcmd{serve} inside a repository, and within a few seconds it will bring up a specialised HTTP server; this will accept connections from any client, and serve up data for that repository for as long as you keep it running. Anyone who knows the URL of the server you just started, and can talk to your computer over the network, can then use a web browser or Mercurial to read data from that repository. A URL for a \hgcmd{serve} instance running on a laptop is likely to look something like \Verb|http://my-laptop.local:8000/|.

The \hgcmd{serve} command is \emph{not} a general-purpose web server. It can do only two things:
\begin{itemize}
\item Allow people to browse the history of the repository it's serving, from their normal web browsers.
\item Speak Mercurial's wire protocol, so that people can \hgcmd{clone} or \hgcmd{pull} changes from that repository.
\end{itemize}
In particular, \hgcmd{serve} won't allow remote users to \emph{modify} your repository. It's intended for read-only use.

If you're getting started with Mercurial, there's nothing to prevent you from using \hgcmd{serve} to serve up a repository on your own computer, then using commands like \hgcmd{clone}, \hgcmd{incoming}, and so on to talk to that server as if the repository were hosted remotely. This can help you to get acquainted quickly with using commands on network-hosted repositories.

\subsection{A few things to keep in mind}

Because it provides unauthenticated read access to all clients, you should only use \hgcmd{serve} in an environment where you either don't mind who sees your data, or have complete control over who can access your network and pull changes from your repository.

The \hgcmd{serve} command knows nothing about any firewall software you might have installed on your system or network. It cannot detect or control your firewall software. If other people are unable to talk to a running \hgcmd{serve} instance, the second thing you should do (\emph{after} you make sure that they're using the correct URL) is check your firewall configuration.

By default, \hgcmd{serve} listens for incoming connections on port~8000. If another process is already listening on that port, you can specify a different port to listen on using the \hgopt{serve}{-p} option.

Normally, when \hgcmd{serve} starts, it prints no output, which can be a bit unnerving. If you'd like to confirm that it is indeed running correctly, and find out what URL you should send to your collaborators, start it with the \hggopt{-v} option.
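The port advice above can be sketched in a few lines of Python. This is only an illustration and not part of Mercurial: the helper names \texttt{port\_is\_free} and \texttt{pick\_port} are hypothetical, the default of probing every interface is an assumption about how the server will bind, and the check is inherently racy, since another process could claim the port between the probe and the moment \hgcmd{serve} starts. The script tries port~8000 first, walks upward until a bind succeeds, and prints the command you would then run by hand.

```python
import socket


def port_is_free(port, host=""):
    """Return True if (host, port) can be bound, i.e. nothing else holds it.

    The empty host means "all interfaces"; that default is an assumption.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def pick_port(preferred=8000, attempts=20, host=""):
    """Return the first free port in [preferred, preferred + attempts)."""
    for port in range(preferred, preferred + attempts):
        if port_is_free(port, host=host):
            return port
    raise RuntimeError("no free port found near %d" % preferred)


if __name__ == "__main__":
    port = pick_port()
    # -p selects the port; -v makes hg serve report the URL it listens on.
    print("hg serve -p %d -v" % port)
```

Printing the command rather than running it keeps the sketch free of side effects; you still decide when, and in which repository, to start the server.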
\section{Using the Secure Shell (ssh) protocol} \label{sec:collab:ssh} You can pull and push changes securely over a network connection using the Secure Shell (\texttt{ssh}) protocol. To use this successfully, you may have to do a little bit of configuration on the client or server sides. If you're not familiar with ssh, it's a network protocol that lets you securely communicate with another computer. To use it with Mercurial, you'll be setting up one or more user accounts on a server so that remote users can log in and execute commands. (If you \emph{are} familiar with ssh, you'll probably find some of the material that follows to be elementary in nature.) \subsection{How to read and write ssh URLs} An ssh URL tends to look like this: \begin{codesample2} ssh://bos@hg.serpentine.com:22/hg/hgbook \end{codesample2} \begin{enumerate} \item The ``\texttt{ssh://}'' part tells Mercurial to use the ssh protocol. \item The ``\texttt{bos@}'' component indicates what username to log into the server as. You can leave this out if the remote username is the same as your local username. \item The ``\texttt{hg.serpentine.com}'' gives the hostname of the server to log into. \item The ``:22'' identifies the port number to connect to the server on. The default port is~22, so you only need to specify this part if you're \emph{not} using port~22. \item The remainder of the URL is the local path to the repository on the server. \end{enumerate} There's plenty of scope for confusion with the path component of ssh URLs, as there is no standard way for tools to interpret it. Some programs behave differently than others when dealing with these paths. This isn't an ideal situation, but it's unlikely to change. Please read the following paragraphs carefully. Mercurial treats the path to a repository on the server as relative to the remote user's home directory. 
For example, if user \texttt{foo} on the server has a home directory of \dirname{/home/foo}, then an ssh URL that contains a path component of \dirname{bar} \emph{really} refers to the directory \dirname{/home/foo/bar}.

If you want to specify a path relative to another user's home directory, you can use a path that starts with a tilde character followed by the user's name (let's call them \texttt{otheruser}), like this.
\begin{codesample2}
  ssh://server/~otheruser/hg/repo
\end{codesample2}

And if you really want to specify an \emph{absolute} path on the server, begin the path component with two slashes, as in this example.
\begin{codesample2}
  ssh://server//absolute/path
\end{codesample2}

\subsection{Finding an ssh client for your system}

Almost every Unix-like system comes with OpenSSH preinstalled. If you're using such a system, run \Verb|which ssh| to find out if the \command{ssh} command is installed (it's usually in \dirname{/usr/bin}). In the unlikely event that it isn't present, take a look at your system documentation to figure out how to install it.

On Windows, you'll first need to download a suitable ssh client. There are two alternatives.
\begin{itemize}
\item Simon Tatham's excellent PuTTY package~\cite{web:putty} provides a complete suite of ssh client commands.
\item If you have a high tolerance for pain, you can use the Cygwin port of OpenSSH.
\end{itemize}
In either case, you'll need to edit your \hgini\ file to tell Mercurial where to find the actual client command. For example, if you're using PuTTY, you'll need to use the \command{plink} command as a command-line ssh client.
\begin{codesample2}
  [ui]
  ssh = C:/path/to/plink.exe -ssh -i "C:/path/to/my/private/key"
\end{codesample2}

\begin{note}
  The path to \command{plink} shouldn't contain any whitespace characters, or Mercurial may not be able to run it correctly (so putting it in \dirname{C:\\Program Files} is probably not a good idea).
\end{note}

\subsection{Generating a key pair}

To avoid the need to type a password every time you use your ssh client, I recommend generating a key pair. On a Unix-like system, the \command{ssh-keygen} command will do the trick. On Windows, if you're using PuTTY, the \command{puttygen} command is what you'll need.

When you generate a key pair, it's usually \emph{highly} advisable to protect it with a passphrase. (The only time that you might not want to do this is when you're using the ssh protocol for automated tasks on a secure network.)

Simply generating a key pair isn't enough, however. You'll need to add the public key to the set of authorised keys for whatever user you're logging in remotely as. For servers using OpenSSH (the vast majority), this will mean adding the public key to a list in a file called \sfilename{authorized\_keys} in their \sdirname{.ssh} directory.

On a Unix-like system, your public key will have a \filename{.pub} extension. If you're using \command{puttygen} on Windows, you can save the public key to a file of your choosing, or paste it from the window it's displayed in straight into the \sfilename{authorized\_keys} file.

\subsection{Using an authentication agent}

An authentication agent is a daemon that stores passphrases in memory (so it will forget passphrases if you log out and log back in again). An ssh client will notice if it's running, and query it for a passphrase. If there's no authentication agent running, or the agent doesn't store the necessary passphrase, you'll have to type your passphrase every time Mercurial tries to communicate with a server on your behalf (e.g.~whenever you pull or push changes).

The downside of storing passphrases in an agent is that it's possible for a well-prepared attacker to recover the plain text of your passphrases, in some cases even if your system has been power-cycled. You should make your own judgment as to whether this is an acceptable risk.
It certainly saves a lot of repeated typing. On Unix-like systems, the agent is called \command{ssh-agent}, and it's often run automatically for you when you log in. You'll need to use the \command{ssh-add} command to add passphrases to the agent's store. On Windows, if you're using PuTTY, the \command{pageant} command acts as the agent. It adds an icon to your system tray that will let you manage stored passphrases. \subsection{Configuring the server side properly} Because ssh can be fiddly to set up if you're new to it, there's a variety of things that can go wrong. Add Mercurial on top, and there's plenty more scope for head-scratching. Most of these potential problems occur on the server side, not the client side. The good news is that once you've gotten a configuration working, it will usually continue to work indefinitely. Before you try using Mercurial to talk to an ssh server, it's best to make sure that you can use the normal \command{ssh} or \command{putty} command to talk to the server first. If you run into problems with using these commands directly, Mercurial surely won't work. Worse, it will obscure the underlying problem. Any time you want to debug ssh-related Mercurial problems, you should drop back to making sure that plain ssh client commands work first, \emph{before} you worry about whether there's a problem with Mercurial. The first thing to be sure of on the server side is that you can actually log in from another machine at all. If you can't use \command{ssh} or \command{putty} to log in, the error message you get may give you a few hints as to what's wrong. The most common problems are as follows. \begin{itemize} \item If you get a ``connection refused'' error, either there isn't an SSH daemon running on the server at all, or it's inaccessible due to firewall configuration. 
\item If you get a ``no route to host'' error, you either have an incorrect address for the server or a seriously locked down firewall that won't admit its existence at all. \item If you get a ``permission denied'' error, you may have mistyped the username on the server, or you could have mistyped your key's passphrase or the remote user's password. \end{itemize} In summary, if you're having trouble talking to the server's ssh daemon, first make sure that one is running at all. On many systems it will be installed, but disabled, by default. Once you're done with this step, you should then check that the server's firewall is configured to allow incoming connections on the port the ssh daemon is listening on (usually~22). Don't worry about more exotic possibilities for misconfiguration until you've checked these two first. If you're using an authentication agent on the client side to store passphrases for your keys, you ought to be able to log into the server without being prompted for a passphrase or a password. If you're prompted for a passphrase, there are a few possible culprits. \begin{itemize} \item You might have forgotten to use \command{ssh-add} or \command{pageant} to store the passphrase. \item You might have stored the passphrase for the wrong key. \end{itemize} If you're being prompted for the remote user's password, there are another few possible problems to check. \begin{itemize} \item Either the user's home directory or their \sdirname{.ssh} directory might have excessively liberal permissions. As a result, the ssh daemon will not trust or read their \sfilename{authorized\_keys} file. For example, a group-writable home or \sdirname{.ssh} directory will often cause this symptom. \item The user's \sfilename{authorized\_keys} file may have a problem. If anyone other than the user owns or can write to that file, the ssh daemon will not trust or read it. 
\end{itemize}

In the ideal world, you should be able to run the following command successfully, and it should print exactly one line of output, the current date and time.
\begin{codesample2}
  ssh myserver date
\end{codesample2}

If, on your server, you have login scripts that print banners or other junk even when running non-interactive commands like this, you should fix them before you continue, so that they only print output if they're run interactively. Otherwise these banners will at least clutter up Mercurial's output. Worse, they could potentially cause problems with running Mercurial commands remotely. Mercurial tries to detect and ignore banners in non-interactive \command{ssh} sessions, but it is not foolproof. (If you're editing your login scripts on your server, the usual way to see if a login script is running in an interactive shell is to check the return code from the command \Verb|tty -s|.)

Once you've verified that plain old ssh is working with your server, the next step is to ensure that Mercurial runs on the server. The following command should run successfully:
\begin{codesample2}
  ssh myserver hg version
\end{codesample2}
If you see an error message instead of normal \hgcmd{version} output, this is usually because you haven't installed Mercurial to \dirname{/usr/bin}. Don't worry if this is the case; you don't need to do that. But you should check for a few possible problems.
\begin{itemize}
\item Is Mercurial really installed on the server at all? I know this sounds trivial, but it's worth checking!
\item Maybe your shell's search path (usually set via the \envar{PATH} environment variable) is simply misconfigured.
\item Perhaps your \envar{PATH} environment variable is only being set to point to the location of the \command{hg} executable if the login session is interactive. This can happen if you're setting the path in the wrong shell login script. See your shell's documentation for details.
\item The \envar{PYTHONPATH} environment variable may need to contain the path to the Mercurial Python modules. It might not be set at all; it could be incorrect; or it may be set only if the login is interactive.
\end{itemize}

If you can run \hgcmd{version} over an ssh connection, well done! You've got the server and client sorted out. You should now be able to use Mercurial to access repositories hosted by that username on that server. If you run into problems with Mercurial and ssh at this point, try using the \hggopt{--debug} option to get a clearer picture of what's going on.

\subsection{Using compression with ssh}

Mercurial does not compress data when it uses the ssh protocol, because the ssh protocol can transparently compress data. However, the default behaviour of ssh clients is \emph{not} to request compression.

Over any network other than a fast LAN (even a wireless network), using compression is likely to significantly speed up Mercurial's network operations. For example, over a WAN, someone measured compression as reducing the amount of time required to clone a particularly large repository from~51 minutes to~17 minutes.

Both \command{ssh} and \command{plink} accept a \cmdopt{ssh}{-C} option which turns on compression. You can easily edit your \hgrc\ to enable compression for all of Mercurial's uses of the ssh protocol.
\begin{codesample2}
  [ui]
  ssh = ssh -C
\end{codesample2}

If you use \command{ssh}, you can configure it to always use compression when talking to your server. To do this, edit your \sfilename{.ssh/config} file (which may not yet exist), as follows.
\begin{codesample2}
  Host hg
    Compression yes
    HostName hg.example.com
\end{codesample2}
This defines an alias, \texttt{hg}. When you use it on the \command{ssh} command line or in a Mercurial \texttt{ssh}-protocol URL, it will cause \command{ssh} to connect to \texttt{hg.example.com} and use compression.
This gives you both a shorter name to type and compression, each of which is a good thing in its own right. \section{Serving over HTTP using CGI} \label{sec:collab:cgi} Depending on how ambitious you are, configuring Mercurial's CGI interface can take anything from a few moments to several hours. We'll begin with the simplest of examples, and work our way towards a more complex configuration. Even for the most basic case, you're almost certainly going to need to read and modify your web server's configuration. \begin{note} Configuring a web server is a complex, fiddly, and highly system-dependent activity. I can't possibly give you instructions that will cover anything like all of the cases you will encounter. Please use your discretion and judgment in following the sections below. Be prepared to make plenty of mistakes, and to spend a lot of time reading your server's error logs. \end{note} \subsection{Web server configuration checklist} Before you continue, do take a few moments to check a few aspects of your system's setup. \begin{enumerate} \item Do you have a web server installed at all? Mac OS X ships with Apache, but many other systems may not have a web server installed. \item If you have a web server installed, is it actually running? On most systems, even if one is present, it will be disabled by default. \item Is your server configured to allow you to run CGI programs in the directory where you plan to do so? Most servers default to explicitly disabling the ability to run CGI programs. \end{enumerate} If you don't have a web server installed, and don't have substantial experience configuring Apache, you should consider using the \texttt{lighttpd} web server instead of Apache. Apache has a well-deserved reputation for baroque and confusing configuration. While \texttt{lighttpd} is less capable in some ways than Apache, most of these capabilities are not relevant to serving Mercurial repositories. 
And \texttt{lighttpd} is undeniably \emph{much} easier to get started with than Apache.

\subsection{Basic CGI configuration}

On Unix-like systems, it's common for users to have a subdirectory named something like \dirname{public\_html} in their home directory, from which they can serve up web pages. A file named \filename{foo} in this directory will be accessible at a URL of the form \texttt{http://www.example.com/\~username/foo}.

To get started, find the \sfilename{hgweb.cgi} script that should be present in your Mercurial installation. If you can't quickly find a local copy on your system, simply download one from the master Mercurial repository at \url{http://www.selenic.com/repo/hg/raw-file/tip/hgweb.cgi}.

You'll need to copy this script into your \dirname{public\_html} directory, and ensure that it's executable.
\begin{codesample2}
  cp .../hgweb.cgi ~/public_html
  chmod 755 ~/public_html/hgweb.cgi
\end{codesample2}
The \texttt{755} argument to \command{chmod} is a little more general than just making the script executable: it ensures that the script is executable by anyone, and that ``group'' and ``other'' write permissions are \emph{not} set. If you were to leave those write permissions enabled, Apache's \texttt{suexec} subsystem would likely refuse to execute the script. In fact, \texttt{suexec} also insists that the \emph{directory} in which the script resides must not be writable by others.
\begin{codesample2}
  chmod 755 ~/public_html
\end{codesample2}

\subsubsection{What could \emph{possibly} go wrong?}
\label{sec:collab:wtf}

Once you've copied the CGI script into place, go into a web browser, and try to open the URL \url{http://myhostname/~myuser/hgweb.cgi}, \emph{but} brace yourself for instant failure. There's a high probability that trying to visit this URL will fail, and there are many possible reasons for this. In fact, you're likely to stumble over almost every one of the possible errors below, so please read carefully.
The following are all of the problems I ran into on a
system running Fedora~7, with a fresh installation of Apache, and a
user account that I created specially to perform this exercise.

Your web server may have per-user directories disabled. If you're
using Apache, search your config file for a \texttt{UserDir}
directive. If there's none present, per-user directories will be
disabled. If one exists, but its value is \texttt{disabled}, then
per-user directories will be disabled. Otherwise, the string after
\texttt{UserDir} gives the name of the subdirectory that Apache will
look in under your home directory, for example \dirname{public\_html}.

Your file access permissions may be too restrictive. The web server
must be able to traverse your home directory and directories under
your \dirname{public\_html} directory, and read files under the latter
too. Here's a quick recipe to help you to make your permissions more
appropriate.
\begin{codesample2}
  chmod 755 ~
  find ~/public_html -type d -print0 | xargs -0r chmod 755
  find ~/public_html -type f -print0 | xargs -0r chmod 644
\end{codesample2}

The other possibility with permissions is that you might get a
completely empty window when you try to load the script. In this
case, it's likely that your access permissions are \emph{too
  permissive}. Apache's \texttt{suexec} subsystem won't execute a
script that's group-~or world-writable, for example.

Your web server may be configured to disallow execution of CGI
programs in your per-user web directory. Here's Apache's default
per-user configuration from my Fedora system.
\begin{codesample2}
  <Directory /home/*/public_html>
      AllowOverride FileInfo AuthConfig Limit
      Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
      <Limit GET POST OPTIONS>
          Order allow,deny
          Allow from all
      </Limit>
      <LimitExcept GET POST OPTIONS>
          Order deny,allow
          Deny from all
      </LimitExcept>
  </Directory>
\end{codesample2}
If you find a similar-looking \texttt{Directory} group in your Apache
configuration, the directive to look at inside it is \texttt{Options}.
Add \texttt{ExecCGI} to the end of this list if it's missing, and
restart the web server.

If you find that Apache serves you the text of the CGI script instead
of executing it, you may need to either uncomment (if already present)
or add a directive like this.
\begin{codesample2}
  AddHandler cgi-script .cgi
\end{codesample2}

The next possibility is that you might be served with a colourful
Python backtrace claiming that it can't import a
\texttt{mercurial}-related module. This is actually progress! The
server is now capable of executing your CGI script. This error is
only likely to occur if you're running a private installation of
Mercurial, instead of a system-wide version. Remember that the web
server runs the CGI program without any of the environment variables
that you take for granted in an interactive session. If this error
happens to you, edit your copy of \sfilename{hgweb.cgi} and follow the
directions inside it to correctly set your \envar{PYTHONPATH}
environment variable.

Finally, you are \emph{certain} to be served with another colourful
Python backtrace: this one will complain that it can't find
\dirname{/path/to/repository}. Edit your \sfilename{hgweb.cgi} script
and replace the \dirname{/path/to/repository} string with the complete
path to the repository you want to serve up.

At this point, when you try to reload the page, you should be
presented with a nice HTML view of your repository's history. Whew!
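
As a recap of the two Apache changes described above, here is a sketch
of how the per-user \texttt{Directory} group shown earlier might look
once \texttt{ExecCGI} has been appended to \texttt{Options} and the CGI
handler has been added. Treat it as an illustration only: placing
\texttt{AddHandler} inside the \texttt{Directory} group is one
reasonable choice, not the only one, and your distribution's defaults
may well differ from the Fedora ones used here.
\begin{codesample2}
  <Directory /home/*/public_html>
      AllowOverride FileInfo AuthConfig Limit
      # ExecCGI appended so that CGI scripts may run here
      Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec ExecCGI
      # Treat files ending in .cgi as CGI programs
      AddHandler cgi-script .cgi
      <Limit GET POST OPTIONS>
          Order allow,deny
          Allow from all
      </Limit>
      <LimitExcept GET POST OPTIONS>
          Order deny,allow
          Deny from all
      </LimitExcept>
  </Directory>
\end{codesample2}
Remember to restart the web server after editing its configuration.
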
\subsubsection{Configuring lighttpd}

To be exhaustive in my experiments, I tried configuring the
increasingly popular \texttt{lighttpd} web server to serve the same
repository as I described with Apache above. I had already overcome
all of the problems I outlined with Apache, many of which are not
server-specific. As a result, I was fairly sure that my file and
directory permissions were good, and that my \sfilename{hgweb.cgi}
script was properly edited.

Once I had Apache running, getting \texttt{lighttpd} to serve the
repository was a snap (in other words, even if you're trying to use
\texttt{lighttpd}, you should read the Apache section). I first had
to edit the \texttt{mod\_access} section of its config file to enable
\texttt{mod\_cgi} and \texttt{mod\_userdir}, both of which were
disabled by default on my system. I then added a few lines to the end
of the config file, to configure these modules.
\begin{codesample2}
  userdir.path = "public_html"
  cgi.assign = ( ".cgi" => "" )
\end{codesample2}
With this done, \texttt{lighttpd} ran immediately for me. If I had
configured \texttt{lighttpd} before Apache, I'd almost certainly have
run into many of the same system-level configuration problems as I did
with Apache. However, I found \texttt{lighttpd} to be noticeably
easier to configure than Apache, even though I've used Apache for over
a decade, and this was my first exposure to \texttt{lighttpd}.

\subsection{Sharing multiple repositories with one CGI script}

The \sfilename{hgweb.cgi} script only lets you publish a single
repository, which is an annoying restriction. If you want to publish
more than one without wracking yourself with multiple copies of the
same script, each with different names, a better choice is to use the
\sfilename{hgwebdir.cgi} script.

The procedure to configure \sfilename{hgwebdir.cgi} is only a little
more involved than for \sfilename{hgweb.cgi}. First, you must obtain
a copy of the script.
If you don't have one handy, you can download a
copy from the master Mercurial repository at
\url{http://www.selenic.com/repo/hg/raw-file/tip/hgwebdir.cgi}.

You'll need to copy this script into your \dirname{public\_html}
directory, and ensure that it's executable.
\begin{codesample2}
  cp .../hgwebdir.cgi ~/public_html
  chmod 755 ~/public_html ~/public_html/hgwebdir.cgi
\end{codesample2}
With basic configuration out of the way, try to visit
\url{http://myhostname/~myuser/hgwebdir.cgi} in your browser. It
should display an empty list of repositories. If you get a blank
window or error message, try walking through the list of potential
problems in section~\ref{sec:collab:wtf}.

The \sfilename{hgwebdir.cgi} script relies on an external
configuration file. By default, it searches for a file named
\sfilename{hgweb.config} in the same directory as itself. You'll need
to create this file, and make it world-readable. The format of the
file is similar to a Windows ``ini'' file, as understood by Python's
\texttt{ConfigParser}~\cite{web:configparser} module.

The easiest way to configure \sfilename{hgwebdir.cgi} is with a
section named \texttt{collections}. This will automatically publish
\emph{every} repository under the directories you name. The section
should look like this:
\begin{codesample2}
  [collections]
  /my/root = /my/root
\end{codesample2}
Mercurial interprets this by looking at the directory name on the
\emph{right} hand side of the ``\texttt{=}'' sign; finding
repositories in that directory hierarchy; and using the text on the
\emph{left} to strip off matching text from the names it will actually
list in the web interface. The remaining component of a path after
this stripping has occurred is called a ``virtual path''.

Given the example above, if we have a repository whose local path is
\dirname{/my/root/this/repo}, the CGI script will strip the leading
\dirname{/my/root} from the name, and publish the repository with a
virtual path of \dirname{this/repo}.
If the base URL for our CGI
script is \url{http://myhostname/~myuser/hgwebdir.cgi}, the complete
URL for that repository will be
\url{http://myhostname/~myuser/hgwebdir.cgi/this/repo}.

If we replace \dirname{/my/root} on the left hand side of this example
with \dirname{/my}, then \sfilename{hgwebdir.cgi} will only strip off
\dirname{/my} from the repository name, and will give us a virtual
path of \dirname{root/this/repo} instead of \dirname{this/repo}.

The \sfilename{hgwebdir.cgi} script will recursively search each
directory listed in the \texttt{collections} section of its
configuration file, but it will \emph{not} recurse into the
repositories it finds.

The \texttt{collections} mechanism makes it easy to publish many
repositories in a ``fire and forget'' manner. You only need to set up
the CGI script and configuration file one time. Afterwards, you can
publish or unpublish a repository at any time by simply moving it
into, or out of, the directory hierarchy in which you've configured
\sfilename{hgwebdir.cgi} to look.

\subsubsection{Explicitly specifying which repositories to publish}

In addition to the \texttt{collections} mechanism, the
\sfilename{hgwebdir.cgi} script allows you to publish a specific list
of repositories. To do so, create a \texttt{paths} section, with
contents of the following form.
\begin{codesample2}
  [paths]
  repo1 = /my/path/to/some/repo
  repo2 = /some/path/to/another
\end{codesample2}
In this case, the virtual path (the component that will appear in a
URL) is on the left hand side of each definition, while the path to
the repository is on the right. Notice that there does not need to be
any relationship between the virtual path you choose and the location
of a repository in your filesystem.

If you wish, you can use both the \texttt{collections} and
\texttt{paths} mechanisms simultaneously in a single configuration
file.

\begin{note}
  If multiple repositories have the same virtual path,
  \sfilename{hgwebdir.cgi} will not report an error.
  Instead, it will behave unpredictably.
\end{note}

\subsection{Downloading source archives}

Mercurial's web interface lets users download an archive of any
revision. This archive will contain a snapshot of the working
directory as of that revision, but it will not contain a copy of the
repository data.

By default, this feature is not enabled. To enable it, you'll need to
add an \rcitem{web}{allow\_archive} item to the \rcsection{web}
section of your \hgrc.

\subsection{Web configuration options}

Mercurial's web interfaces (the \hgcmd{serve} command, and the
\sfilename{hgweb.cgi} and \sfilename{hgwebdir.cgi} scripts) have a
number of configuration options that you can set. These belong in a
section named \rcsection{web}.
\begin{itemize}
\item[\rcitem{web}{allow\_archive}] Determines which (if any) archive
  download mechanisms Mercurial supports. If you enable this
  feature, users of the web interface will be able to download an
  archive of whatever revision of a repository they are viewing.
  To enable the archive feature, this item must take the form of a
  sequence of words drawn from the list below.
  \begin{itemize}
  \item[\texttt{bz2}] A \command{tar} archive, compressed using
    \texttt{bzip2} compression. This has the best compression ratio,
    but uses the most CPU time on the server.
  \item[\texttt{gz}] A \command{tar} archive, compressed using
    \texttt{gzip} compression.
  \item[\texttt{zip}] A \command{zip} archive, compressed using
    DEFLATE compression. This format has the worst compression ratio,
    but is widely used in the Windows world.
  \end{itemize}
  If you provide an empty list, or don't have an
  \rcitem{web}{allow\_archive} entry at all, this feature will be
  disabled. Here is an example of how to enable all three supported
  formats.
  \begin{codesample4}
    [web]
    allow_archive = bz2 gz zip
  \end{codesample4}
\item[\rcitem{web}{allowpull}] Boolean. Determines whether the web
  interface allows remote users to \hgcmd{pull} and \hgcmd{clone} this
  repository over~HTTP.
  If set to \texttt{no} or \texttt{false}, only
  the ``human-oriented'' portion of the web interface is available.
\item[\rcitem{web}{contact}] String. A free-form (but preferably
  brief) string identifying the person or group in charge of the
  repository. This often contains the name and email address of a
  person or mailing list. It often makes sense to place this entry in
  a repository's own \sfilename{.hg/hgrc} file, but it can make sense
  to use in a global \hgrc\ if every repository has a single
  maintainer.
\item[\rcitem{web}{maxchanges}] Integer. The default maximum number
  of changesets to display in a single page of output.
\item[\rcitem{web}{maxfiles}] Integer. The default maximum number
  of modified files to display in a single page of output.
\item[\rcitem{web}{stripes}] Integer. If the web interface displays
  alternating ``stripes'' to make it easier to visually align rows
  when you are looking at a table, this number controls the number of
  rows in each stripe.
\item[\rcitem{web}{style}] Controls the template Mercurial uses to
  display the web interface. Mercurial ships with two web templates,
  named \texttt{default} and \texttt{gitweb} (the latter is much more
  visually attractive). You can also specify a custom template of
  your own; see chapter~\ref{chap:template} for details. Here, you
  can see how to enable the \texttt{gitweb} style.
  \begin{codesample4}
    [web]
    style = gitweb
  \end{codesample4}
\item[\rcitem{web}{templates}] Path. The directory in which to search
  for template files. By default, Mercurial searches in the directory
  in which it was installed.
\end{itemize}
If you are using \sfilename{hgwebdir.cgi}, you can place a few
configuration items in a \rcsection{web} section of the
\sfilename{hgweb.config} file instead of a \hgrc\ file, for
convenience. These items are \rcitem{web}{motd} and
\rcitem{web}{style}.
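
As a sketch of what this can look like, the following
\sfilename{hgweb.config} publishes a collection and selects a style in
the same file. The \dirname{/my/root} directory is the placeholder used
earlier in this section, not a path you should expect to exist;
substitute the hierarchy that actually holds your repositories.
\begin{codesample2}
  [collections]
  /my/root = /my/root

  [web]
  style = gitweb
\end{codesample2}
Keeping the style next to the list of published repositories means
there is one file to edit when you change how the index looks.
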
\subsubsection{Options specific to an individual repository}

A few \rcsection{web} configuration items ought to be placed in a
repository's local \sfilename{.hg/hgrc}, rather than a user's or
global \hgrc.
\begin{itemize}
\item[\rcitem{web}{description}] String. A free-form (but preferably
  brief) string that describes the contents or purpose of the
  repository.
\item[\rcitem{web}{name}] String. The name to use for the repository
  in the web interface. This overrides the default name, which is the
  last component of the repository's path.
\end{itemize}

\subsubsection{Options specific to the \hgcmd{serve} command}

Some of the items in the \rcsection{web} section of a \hgrc\ file are
only for use with the \hgcmd{serve} command.
\begin{itemize}
\item[\rcitem{web}{accesslog}] Path. The name of a file into which to
  write an access log. By default, the \hgcmd{serve} command writes
  this information to standard output, not to a file. Log entries are
  written in the standard ``combined'' file format used by almost all
  web servers.
\item[\rcitem{web}{address}] String. The local address on which the
  server should listen for incoming connections. By default, the
  server listens on all addresses.
\item[\rcitem{web}{errorlog}] Path. The name of a file into which to
  write an error log. By default, the \hgcmd{serve} command writes this
  information to standard error, not to a file.
\item[\rcitem{web}{ipv6}] Boolean. Whether to use the IPv6 protocol.
  By default, IPv6 is not used.
\item[\rcitem{web}{port}] Integer. The TCP~port number on which the
  server should listen. The default port number used is~8000.
\end{itemize}

\subsubsection{Choosing the right \hgrc\ file to add \rcsection{web}
  items to}

It is important to remember that a web server like Apache or
\texttt{lighttpd} will run under a user~ID that is different to yours.
CGI scripts run by your server, such as \sfilename{hgweb.cgi}, will
usually also run under that user~ID.
If you add \rcsection{web} items to your own personal \hgrc\ file, CGI
scripts won't read that \hgrc\ file. Those settings will thus only
affect the behaviour of the \hgcmd{serve} command when you run it. To
cause CGI scripts to see your settings, either create a \hgrc\ file in
the home directory of the user ID that runs your web server, or add
those settings to a system-wide \hgrc\ file.

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "00book"
%%% End: