Mercurial > hgbook
diff ja/intro.tex @ 290:b0db5adf11c1 ja_root
fork Japanese translation.
author | Yoshiki Yazawa <yaz@cc.rim.or.jp> |
---|---|
date | Wed, 06 Feb 2008 17:43:11 +0900 |
parents | en/intro.tex@4700dd38384c |
children | 3b1291f24c0d |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ja/intro.tex Wed Feb 06 17:43:11 2008 +0900 @@ -0,0 +1,567 @@ +\chapter{Introduction} +\label{chap:intro} + +\section{About revision control} + +Revision control is the process of managing multiple versions of a +piece of information. In its simplest form, this is something that +many people do by hand: every time you modify a file, save it under a +new name that contains a number, each one higher than the number of +the preceding version. + +Manually managing multiple versions of even a single file is an +error-prone task, though, so software tools to help automate this +process have long been available. The earliest automated revision +control tools were intended to help a single user to manage revisions +of a single file. Over the past few decades, the scope of revision +control tools has expanded greatly; they now manage multiple files, +and help multiple people to work together. The best modern revision +control tools have no problem coping with thousands of people working +together on projects that consist of hundreds of thousands of files. + +\subsection{Why use revision control?} + +There are a number of reasons why you or your team might want to use +an automated revision control tool for a project. +\begin{itemize} +\item It will track the history and evolution of your project, so you + don't have to. For every change, you'll have a log of \emph{who} + made it; \emph{why} they made it; \emph{when} they made it; and + \emph{what} the change was. +\item When you're working with other people, revision control software + makes it easier for you to collaborate. For example, when people + more or less simultaneously make potentially incompatible changes, + the software will help you to identify and resolve those conflicts. +\item It can help you to recover from mistakes. If you make a change + that later turns out to be in error, you can revert to an earlier + version of one or more files. In fact, a \emph{really} good + revision control tool will even help you to efficiently figure out + exactly when a problem was introduced (see + section~\ref{sec:undo:bisect} for details). +\item It will help you to work simultaneously on, and manage the drift + between, multiple versions of your project. +\end{itemize} +Most of these reasons are equally valid---at least in theory---whether +you're working on a project by yourself, or with a hundred other +people. + +A key question about the practicality of revision control at these two +different scales (``lone hacker'' and ``huge team'') is how its +\emph{benefits} compare to its \emph{costs}. A revision control tool +that's difficult to understand or use is going to impose a high cost. + +A five-hundred-person project is likely to collapse under its own +weight almost immediately without a revision control tool and process. +In this case, the cost of using revision control might hardly seem +worth considering, since \emph{without} it, failure is almost +guaranteed. + +On the other hand, a one-person ``quick hack'' might seem like a poor +place to use a revision control tool, because surely the cost of using +one must be close to the overall cost of the project. Right? + +Mercurial uniquely supports \emph{both} of these scales of +development. You can learn the basics in just a few minutes, and due +to its low overhead, you can apply revision control to the smallest of +projects with ease. Its simplicity means you won't have a lot of +abstruse concepts or command sequences competing for mental space with +whatever you're \emph{really} trying to do. At the same time, +Mercurial's high performance and peer-to-peer nature let you scale +painlessly to handle large projects. + +No revision control tool can rescue a poorly run project, but a good +choice of tools can make a huge difference to the fluidity with which +you can work on a project. + +\subsection{The many names of revision control} + +Revision control is a diverse field, so much so that it doesn't +actually have a single name or acronym. Here are a few of the more +common names and acronyms you'll encounter: +\begin{itemize} +\item Revision control (RCS) +\item Software configuration management (SCM), or configuration management +\item Source code management +\item Source code control, or source control +\item Version control (VCS) +\end{itemize} +Some people claim that these terms actually have different meanings, +but in practice they overlap so much that there's no agreed or even +useful way to tease them apart. + +\section{A short history of revision control} + +The best known of the old-time revision control tools is SCCS (Source +Code Control System), which Marc Rochkind wrote at Bell Labs, in the +early 1970s. SCCS operated on individual files, and required every +person working on a project to have access to a shared workspace on a +single system. Only one person could modify a file at any time; +arbitration for access to files was via locks. It was common for +people to lock files, and later forget to unlock them, preventing +anyone else from modifying those files without the help of an +administrator. + +Walter Tichy developed a free alternative to SCCS in the early 1980s; +he called his program RCS (Revison Control System). Like SCCS, RCS +required developers to work in a single shared workspace, and to lock +files to prevent multiple people from modifying them simultaneously. + +Later in the 1980s, Dick Grune used RCS as a building block for a set +of shell scripts he initially called cmt, but then renamed to CVS +(Concurrent Versions System). The big innovation of CVS was that it +let developers work simultaneously and somewhat independently in their +own personal workspaces. The personal workspaces prevented developers +from stepping on each other's toes all the time, as was common with +SCCS and RCS. Each developer had a copy of every project file, and +could modify their copies independently. They had to merge their +edits prior to committing changes to the central repository. + +Brian Berliner took Grune's original scripts and rewrote them in~C, +releasing in 1989 the code that has since developed into the modern +version of CVS. CVS subsequently acquired the ability to operate over +a network connection, giving it a client/server architecture. CVS's +architecture is centralised; only the server has a copy of the history +of the project. Client workspaces just contain copies of recent +versions of the project's files, and a little metadata to tell them +where the server is. CVS has been enormously successful; it is +probably the world's most widely used revision control system. + +In the early 1990s, Sun Microsystems developed an early distributed +revision control system, called TeamWare. A TeamWare workspace +contains a complete copy of the project's history. TeamWare has no +notion of a central repository. (CVS relied upon RCS for its history +storage; TeamWare used SCCS.) + +As the 1990s progressed, awareness grew of a number of problems with +CVS. It records simultaneous changes to multiple files individually, +instead of grouping them together as a single logically atomic +operation. It does not manage its file hierarchy well; it is easy to +make a mess of a repository by renaming files and directories. Worse, +its source code is difficult to read and maintain, which made the +``pain level'' of fixing these architectural problems prohibitive. + +In 2001, Jim Blandy and Karl Fogel, two developers who had worked on +CVS, started a project to replace it with a tool that would have a +better architecture and cleaner code. The result, Subversion, does +not stray from CVS's centralised client/server model, but it adds +multi-file atomic commits, better namespace management, and a number +of other features that make it a generally better tool than CVS. +Since its initial release, it has rapidly grown in popularity. + +More or less simultaneously, Graydon Hoare began working on an +ambitious distributed revision control system that he named Monotone. +While Monotone addresses many of CVS's design flaws and has a +peer-to-peer architecture, it goes beyond earlier (and subsequent) +revision control tools in a number of innovative ways. It uses +cryptographic hashes as identifiers, and has an integral notion of +``trust'' for code from different sources. + +Mercurial began life in 2005. While a few aspects of its design are +influenced by Monotone, Mercurial focuses on ease of use, high +performance, and scalability to very large projects. + +\section{Trends in revision control} + +There has been an unmistakable trend in the development and use of +revision control tools over the past four decades, as people have +become familiar with the capabilities of their tools and constrained +by their limitations. + +The first generation began by managing single files on individual +computers. Although these tools represented a huge advance over +ad-hoc manual revision control, their locking model and reliance on a +single computer limited them to small, tightly-knit teams. + +The second generation loosened these constraints by moving to +network-centered architectures, and managing entire projects at a +time. As projects grew larger, they ran into new problems. With +clients needing to talk to servers very frequently, server scaling +became an issue for large projects. An unreliable network connection +could prevent remote users from being able to talk to the server at +all. As open source projects started making read-only access +available anonymously to anyone, people without commit privileges +found that they could not use the tools to interact with a project in +a natural way, as they could not record their changes. + +The current generation of revision control tools is peer-to-peer in +nature. All of these systems have dropped the dependency on a single +central server, and allow people to distribute their revision control +data to where it's actually needed. Collaboration over the Internet +has moved from constrained by technology to a matter of choice and +consensus. Modern tools can operate offline indefinitely and +autonomously, with a network connection only needed when syncing +changes with another repository. + +\section{A few of the advantages of distributed revision control} + +Even though distributed revision control tools have for several years +been as robust and usable as their previous-generation counterparts, +people using older tools have not yet necessarily woken up to their +advantages. There are a number of ways in which distributed tools +shine relative to centralised ones. + +For an individual developer, distributed tools are almost always much +faster than centralised tools. This is for a simple reason: a +centralised tool needs to talk over the network for many common +operations, because most metadata is stored in a single copy on the +central server. A distributed tool stores all of its metadata +locally. All else being equal, talking over the network adds overhead +to a centralised tool. Don't underestimate the value of a snappy, +responsive tool: you're going to spend a lot of time interacting with +your revision control software. + +Distributed tools are indifferent to the vagaries of your server +infrastructure, again because they replicate metadata to so many +locations. If you use a centralised system and your server catches +fire, you'd better hope that your backup media are reliable, and that +your last backup was recent and actually worked. With a distributed +tool, you have many backups available on every contributor's computer. + +The reliability of your network will affect distributed tools far less +than it will centralised tools. You can't even use a centralised tool +without a network connection, except for a few highly constrained +commands. With a distributed tool, if your network connection goes +down while you're working, you may not even notice. The only thing +you won't be able to do is talk to repositories on other computers, +something that is relatively rare compared with local operations. If +you have a far-flung team of collaborators, this may be significant. + +\subsection{Advantages for open source projects} + +If you take a shine to an open source project and decide that you +would like to start hacking on it, and that project uses a distributed +revision control tool, you are at once a peer with the people who +consider themselves the ``core'' of that project. If they publish +their repositories, you can immediately copy their project history, +start making changes, and record your work, using the same tools in +the same ways as insiders. By contrast, with a centralised tool, you +must use the software in a ``read only'' mode unless someone grants +you permission to commit changes to their central server. Until then, +you won't be able to record changes, and your local modifications will +be at risk of corruption any time you try to update your client's view +of the repository. + +\subsubsection{The forking non-problem} + +It has been suggested that distributed revision control tools pose +some sort of risk to open source projects because they make it easy to +``fork'' the development of a project. A fork happens when there are +differences in opinion or attitude between groups of developers that +cause them to decide that they can't work together any longer. Each +side takes a more or less complete copy of the project's source code, +and goes off in its own direction. + +Sometimes the camps in a fork decide to reconcile their differences. +With a centralised revision control system, the \emph{technical} +process of reconciliation is painful, and has to be performed largely +by hand. You have to decide whose revision history is going to +``win'', and graft the other team's changes into the tree somehow. +This usually loses some or all of one side's revision history. + +What distributed tools do with respect to forking is they make forking +the \emph{only} way to develop a project. Every single change that +you make is potentially a fork point. The great strength of this +approach is that a distributed revision control tool has to be really +good at \emph{merging} forks, because forks are absolutely +fundamental: they happen all the time. + +If every piece of work that everybody does, all the time, is framed in +terms of forking and merging, then what the open source world refers +to as a ``fork'' becomes \emph{purely} a social issue. If anything, +distributed tools \emph{lower} the likelihood of a fork: +\begin{itemize} +\item They eliminate the social distinction that centralised tools + impose: that between insiders (people with commit access) and + outsiders (people without). +\item They make it easier to reconcile after a social fork, because + all that's involved from the perspective of the revision control + software is just another merge. +\end{itemize} + +Some people resist distributed tools because they want to retain tight +control over their projects, and they believe that centralised tools +give them this control. However, if you're of this belief, and you +publish your CVS or Subversion repositories publically, there are +plenty of tools available that can pull out your entire project's +history (albeit slowly) and recreate it somewhere that you don't +control. So while your control in this case is illusory, you are +forgoing the ability to fluidly collaborate with whatever people feel +compelled to mirror and fork your history. + +\subsection{Advantages for commercial projects} + +Many commercial projects are undertaken by teams that are scattered +across the globe. Contributors who are far from a central server will +see slower command execution and perhaps less reliability. Commercial +revision control systems attempt to ameliorate these problems with +remote-site replication add-ons that are typically expensive to buy +and cantankerous to administer. A distributed system doesn't suffer +from these problems in the first place. Better yet, you can easily +set up multiple authoritative servers, say one per site, so that +there's no redundant communication between repositories over expensive +long-haul network links. + +Centralised revision control systems tend to have relatively low +scalability. It's not unusual for an expensive centralised system to +fall over under the combined load of just a few dozen concurrent +users. Once again, the typical response tends to be an expensive and +clunky replication facility. Since the load on a central server---if +you have one at all---is many times lower with a distributed +tool (because all of the data is replicated everywhere), a single +cheap server can handle the needs of a much larger team, and +replication to balance load becomes a simple matter of scripting. + +If you have an employee in the field, troubleshooting a problem at a +customer's site, they'll benefit from distributed revision control. +The tool will let them generate custom builds, try different fixes in +isolation from each other, and search efficiently through history for +the sources of bugs and regressions in the customer's environment, all +without needing to connect to your company's network. + +\section{Why choose Mercurial?} + +Mercurial has a unique set of properties that make it a particularly +good choice as a revision control system. +\begin{itemize} +\item It is easy to learn and use. +\item It is lightweight. +\item It scales excellently. +\item It is easy to customise. +\end{itemize} + +If you are at all familiar with revision control systems, you should +be able to get up and running with Mercurial in less than five +minutes. Even if not, it will take no more than a few minutes +longer. Mercurial's command and feature sets are generally uniform +and consistent, so you can keep track of a few general rules instead +of a host of exceptions. + +On a small project, you can start working with Mercurial in moments. +Creating new changes and branches; transferring changes around +(whether locally or over a network); and history and status operations +are all fast. Mercurial attempts to stay nimble and largely out of +your way by combining low cognitive overhead with blazingly fast +operations. + +The usefulness of Mercurial is not limited to small projects: it is +used by projects with hundreds to thousands of contributors, each +containing tens of thousands of files and hundreds of megabytes of +source code. + +If the core functionality of Mercurial is not enough for you, it's +easy to build on. Mercurial is well suited to scripting tasks, and +its clean internals and implementation in Python make it easy to add +features in the form of extensions. There are a number of popular and +useful extensions already available, ranging from helping to identify +bugs to improving performance. + +\section{Mercurial compared with other tools} + +Before you read on, please understand that this section necessarily +reflects my own experiences, interests, and (dare I say it) biases. I +have used every one of the revision control tools listed below, in +most cases for several years at a time. + + +\subsection{Subversion} + +Subversion is a popular revision control tool, developed to replace +CVS. It has a centralised client/server architecture. + +Subversion and Mercurial have similarly named commands for performing +the same operations, so if you're familiar with one, it is easy to +learn to use the other. Both tools are portable to all popular +operating systems. + +Subversion lacks a history-aware merge capability, forcing its users +to manually track exactly which revisions have been merged between +branches. If users fail to do this, or make mistakes, they face the +prospect of manually resolving merges with unnecessary conflicts. +Subversion also fails to merge changes when files or directories are +renamed. Subversion's poor merge support is its single biggest +weakness. + +Mercurial has a substantial performance advantage over Subversion on +every revision control operation I have benchmarked. I have measured +its advantage as ranging from a factor of two to a factor of six when +compared with Subversion~1.4.3's \emph{ra\_local} file store, which is +the fastest access method available). In more realistic deployments +involving a network-based store, Subversion will be at a substantially +larger disadvantage. Because many Subversion commands must talk to +the server and Subversion does not have useful replication facilities, +server capacity and network bandwidth become bottlenecks for modestly +large projects. + +Additionally, Subversion incurs substantial storage overhead to avoid +network transactions for a few common operations, such as finding +modified files (\texttt{status}) and displaying modifications against +the current revision (\texttt{diff}). As a result, a Subversion +working copy is often the same size as, or larger than, a Mercurial +repository and working directory, even though the Mercurial repository +contains a complete history of the project. + +Subversion is widely supported by third party tools. Mercurial +currently lags considerably in this area. This gap is closing, +however, and indeed some of Mercurial's GUI tools now outshine their +Subversion equivalents. Like Mercurial, Subversion has an excellent +user manual. + +Because Subversion doesn't store revision history on the client, it is +well suited to managing projects that deal with lots of large, opaque +binary files. If you check in fifty revisions to an incompressible +10MB file, Subversion's client-side space usage stays constant The +space used by any distributed SCM will grow rapidly in proportion to +the number of revisions, because the differences between each revision +are large. + +In addition, it's often difficult or, more usually, impossible to +merge different versions of a binary file. Subversion's ability to +let a user lock a file, so that they temporarily have the exclusive +right to commit changes to it, can be a significant advantage to a +project where binary files are widely used. + +Mercurial can import revision history from a Subversion repository. +It can also export revision history to a Subversion repository. This +makes it easy to ``test the waters'' and use Mercurial and Subversion +in parallel before deciding to switch. History conversion is +incremental, so you can perform an initial conversion, then small +additional conversions afterwards to bring in new changes. + + +\subsection{Git} + +Git is a distributed revision control tool that was developed for +managing the Linux kernel source tree. Like Mercurial, its early +design was somewhat influenced by Monotone. + +Git has a very large command set, with version~1.5.0 providing~139 +individual commands. It has something of a reputation for being +difficult to learn. Compared to Git, Mercurial has a strong focus on +simplicity. + +In terms of performance, Git is extremely fast. In several cases, it +is faster than Mercurial, at least on Linux, while Mercurial performs +better on other operations. However, on Windows, the performance and +general level of support that Git provides is, at the time of writing, +far behind that of Mercurial. + +While a Mercurial repository needs no maintenance, a Git repository +requires frequent manual ``repacks'' of its metadata. Without these, +performance degrades, while space usage grows rapidly. A server that +contains many Git repositories that are not rigorously and frequently +repacked will become heavily disk-bound during backups, and there have +been instances of daily backups taking far longer than~24 hours as a +result. A freshly packed Git repository is slightly smaller than a +Mercurial repository, but an unpacked repository is several orders of +magnitude larger. + +The core of Git is written in C. Many Git commands are implemented as +shell or Perl scripts, and the quality of these scripts varies widely. +I have encountered several instances where scripts charged along +blindly in the presence of errors that should have been fatal. + +Mercurial can import revision history from a Git repository. + + +\subsection{CVS} + +CVS is probably the most widely used revision control tool in the +world. Due to its age and internal untidiness, it has been only +lightly maintained for many years. + +It has a centralised client/server architecture. It does not group +related file changes into atomic commits, making it easy for people to +``break the build'': one person can successfully commit part of a +change and then be blocked by the need for a merge, causing other +people to see only a portion of the work they intended to do. This +also affects how you work with project history. If you want to see +all of the modifications someone made as part of a task, you will need +to manually inspect the descriptions and timestamps of the changes +made to each file involved (if you even know what those files were). + +CVS has a muddled notion of tags and branches that I will not attempt +to even describe. It does not support renaming of files or +directories well, making it easy to corrupt a repository. It has +almost no internal consistency checking capabilities, so it is usually +not even possible to tell whether or how a repository is corrupt. I +would not recommend CVS for any project, existing or new. + +Mercurial can import CVS revision history. However, there are a few +caveats that apply; these are true of every other revision control +tool's CVS importer, too. Due to CVS's lack of atomic changes and +unversioned filesystem hierarchy, it is not possible to reconstruct +CVS history completely accurately; some guesswork is involved, and +renames will usually not show up. Because a lot of advanced CVS +administration has to be done by hand and is hence error-prone, it's +common for CVS importers to run into multiple problems with corrupted +repositories (completely bogus revision timestamps and files that have +remained locked for over a decade are just two of the less interesting +problems I can recall from personal experience). + +Mercurial can import revision history from a CVS repository. + + +\subsection{Commercial tools} + +Perforce has a centralised client/server architecture, with no +client-side caching of any data. Unlike modern revision control +tools, Perforce requires that a user run a command to inform the +server about every file they intend to edit. + +The performance of Perforce is quite good for small teams, but it +falls off rapidly as the number of users grows beyond a few dozen. +Modestly large Perforce installations require the deployment of +proxies to cope with the load their users generate. + + +\subsection{Choosing a revision control tool} + +With the exception of CVS, all of the tools listed above have unique +strengths that suit them to particular styles of work. There is no +single revision control tool that is best in all situations. + +As an example, Subversion is a good choice for working with frequently +edited binary files, due to its centralised nature and support for +file locking. If you're averse to the command line, it currently has +better GUI support than other free revision control tools. However, +its poor merging is a substantial liability for busy projects with +overlapping development. + +I personally find Mercurial's properties of simplicity, performance, +and good merge support to be a compelling combination that has served +me well for several years. + + +\section{Switching from another tool to Mercurial} + +Mercurial is bundled with an extension named \hgext{convert}, which +can incrementally import revision history from several other revision +control tools. By ``incremental'', I mean that you can convert all of +a project's history to date in one go, then rerun the conversion later +to obtain new changes that happened after the initial conversion. + +The revision control tools supported by \hgext{convert} are as +follows: +\begin{itemize} +\item Subversion +\item CVS +\item Git +\item Darcs +\end{itemize} + +In addition, \hgext{convert} can export changes from Mercurial to +Subversion. This makes it possible to try Subversion and Mercurial in +parallel before committing to a switchover, without risking the loss +of any work. + +The \hgxcmd{conver}{convert} command is easy to use. Simply point it +at the path or URL of the source repository, optionally give it the +name of the destination repository, and it will start working. After +the initial conversion, just run the same command again to import new +changes. + + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: "00book" +%%% End: