comparison en/ch00-preface.xml @ 749:7e7c47481e4f

Oops, this is the real merge for my hg's oddity
author Dongsheng Song <dongsheng.song@gmail.com>
date Fri, 20 Mar 2009 16:43:35 +0800
parents d0160b0b1a9e
children 751ee9bf2e8d
748:d13c7c706a58 749:7e7c47481e4f
1 <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> 1 <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
2 2
3 <preface id="chap.preface"> 3 <preface id="chap.preface">
4 <?dbhtml filename="preface.html"?>
4 <title>Preface</title> 5 <title>Preface</title>
5 6
6 <para>Distributed revision control is a relatively new territory, 7 <sect1>
7 and has thus far grown due to people's willingness to strike out 8 <title>Why revision control? Why Mercurial?</title>
8 into ill-charted territory.</para> 9
9 10 <para id="x_6d">Revision control is the process of managing multiple
10 <para>I am writing a book about distributed revision control because 11 versions of a piece of information. In its simplest form, this
11 I believe that it is an important subject that deserves a field 12 is something that many people do by hand: every time you modify
12 guide. I chose to write about Mercurial because it is the easiest 13 a file, save it under a new name that contains a number, each
13 tool to learn the terrain with, and yet it scales to the demands 14 one higher than the number of the preceding version.</para>
14 of real, challenging environments where many other revision 15
15 control tools fail.</para> 16 <para id="x_6e">Manually managing multiple versions of even a single file is
17 an error-prone task, though, so software tools to help automate
18 this process have long been available. The earliest automated
19 revision control tools were intended to help a single user to
20 manage revisions of a single file. Over the past few decades,
21 the scope of revision control tools has expanded greatly; they
22 now manage multiple files, and help multiple people to work
23 together. The best modern revision control tools have no
24 problem coping with thousands of people working together on
25 projects that consist of hundreds of thousands of files.</para>
26
27 <para id="x_6f">The arrival of distributed revision control is relatively
28 recent, and so far this new field has grown due to people's
29 willingness to explore ill-charted territory.</para>
30
31 <para id="x_70">I am writing a book about distributed revision control
32 because I believe that it is an important subject that deserves
33 a field guide. I chose to write about Mercurial because it is
34 the easiest tool to learn the terrain with, and yet it scales to
35 the demands of real, challenging environments where many other
36 revision control tools buckle.</para>
37
38 <sect2>
39 <title>Why use revision control?</title>
40
41 <para id="x_71">There are a number of reasons why you or your team might
42 want to use an automated revision control tool for a
43 project.</para>
44
45 <itemizedlist>
46 <listitem><para id="x_72">It will track the history and evolution of
47 your project, so you don't have to. For every change,
48 you'll have a log of <emphasis>who</emphasis> made it;
49 <emphasis>why</emphasis> they made it;
50 <emphasis>when</emphasis> they made it; and
51 <emphasis>what</emphasis> the change
52 was.</para></listitem>
53 <listitem><para id="x_73">When you're working with other people,
54 revision control software makes it easier for you to
55 collaborate. For example, when people more or less
56 simultaneously make potentially incompatible changes, the
57 software will help you to identify and resolve those
58 conflicts.</para></listitem>
59 <listitem><para id="x_74">It can help you to recover from mistakes. If
60 you make a change that later turns out to be in error, you
61 can revert to an earlier version of one or more files. In
62 fact, a <emphasis>really</emphasis> good revision control
63 tool will even help you to efficiently figure out exactly
64 when a problem was introduced (see section <xref
65 linkend="sec.undo.bisect"/> for details).</para></listitem>
66 <listitem><para id="x_75">It will help you to work simultaneously on,
67 and manage the drift between, multiple versions of your
68 project.</para></listitem>
69 </itemizedlist>
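The idea of efficiently finding when a problem was introduced can be sketched as a binary search over history. This is only an illustration, not Mercurial's implementation: it assumes a linear, chronologically ordered list of revisions whose first entry is known good and whose last is known bad, and the names first_bad_revision, check, and the revN labels are hypothetical.

```python
from typing import Callable, Sequence


def first_bad_revision(revisions: Sequence[str],
                       is_bad: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_bad() is true.

    Assumes revisions[0] is good, revisions[-1] is bad, and that a
    problem, once introduced, stays present in every later revision.
    """
    good, bad = 0, len(revisions) - 1
    # Invariant: revisions[good] is good and revisions[bad] is bad.
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(revisions[middle]):
            bad = middle
        else:
            good = middle
    return revisions[bad]


# Hypothetical history of 100 revisions; the bug appears in rev73.
history = [f"rev{n}" for n in range(100)]
broken_since = 73
tested = []


def check(rev: str) -> bool:
    """Stand-in for building and testing one revision."""
    tested.append(rev)
    return int(rev[3:]) >= broken_since


culprit = first_bad_revision(history, check)
print(culprit, "found after", len(tested), "tests")
```

Each test halves the range still under suspicion, so the number of revisions that must be built and tested grows only logarithmically with the length of the history.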
70
71 <para id="x_76">Most of these reasons are equally valid---at least in
72 theory---whether you're working on a project by yourself, or
73 with a hundred other people.</para>
74
75 <para id="x_77">A key question about the practicality of revision control
76 at these two different scales (<quote>lone hacker</quote> and
77 <quote>huge team</quote>) is how its
78 <emphasis>benefits</emphasis> compare to its
79 <emphasis>costs</emphasis>. A revision control tool that's
80 difficult to understand or use is going to impose a high
81 cost.</para>
82
83 <para id="x_78">A five-hundred-person project is likely to collapse under
84 its own weight almost immediately without a revision control
85 tool and process. In this case, the cost of using revision
86 control might hardly seem worth considering, since
87 <emphasis>without</emphasis> it, failure is almost
88 guaranteed.</para>
89
90 <para id="x_79">On the other hand, a one-person <quote>quick hack</quote>
91 might seem like a poor place to use a revision control tool,
92 because surely the cost of using one must be close to the
93 overall cost of the project. Right?</para>
94
95 <para id="x_7a">Mercurial uniquely supports <emphasis>both</emphasis> of
96 these scales of development. You can learn the basics in just
97 a few minutes, and due to its low overhead, you can apply
98 revision control to the smallest of projects with ease. Its
99 simplicity means you won't have a lot of abstruse concepts or
100 command sequences competing for mental space with whatever
101 you're <emphasis>really</emphasis> trying to do. At the same
102 time, Mercurial's high performance and peer-to-peer nature let
103 you scale painlessly to handle large projects.</para>
104
105 <para id="x_7b">No revision control tool can rescue a poorly run project,
106 but a good choice of tools can make a huge difference to the
107 fluidity with which you can work on a project.</para>
108
109 </sect2>
110
111 <sect2>
112 <title>The many names of revision control</title>
113
114 <para id="x_7c">Revision control is a diverse field, so much so that it is
115 referred to by many names and acronyms. Here are a few of the
116 more common variations you'll encounter:</para>
117 <itemizedlist>
118 <listitem><para id="x_7d">Revision control (RCS)</para></listitem>
119 <listitem><para id="x_7e">Software configuration management (SCM), or
120 configuration management</para></listitem>
121 <listitem><para id="x_7f">Source code management</para></listitem>
122 <listitem><para id="x_80">Source code control, or source
123 control</para></listitem>
124 <listitem><para id="x_81">Version control
125 (VCS)</para></listitem></itemizedlist>
126 <para id="x_82">Some people claim that these terms actually have different
127 meanings, but in practice they overlap so much that there's no
128 agreed or even useful way to tease them apart.</para>
129
130 </sect2>
131 </sect1>
16 132
17 <sect1> 133 <sect1>
18 <title>This book is a work in progress</title> 134 <title>This book is a work in progress</title>
19 135
20 <para>I am releasing this book while I am still writing it, in the 136 <para id="x_83">I am releasing this book while I am still writing it, in the
21 hope that it will prove useful to others. I also hope that 137 hope that it will prove useful to others. I am writing under an
22 readers will contribute as they see fit.</para> 138 open license in the hope that you, my readers, will contribute
139 feedback and perhaps content of your own.</para>
23 140
24 </sect1> 141 </sect1>
25 <sect1> 142 <sect1>
26 <title>About the examples in this book</title> 143 <title>About the examples in this book</title>
27 144
28 <para>This book takes an unusual approach to code samples. Every 145 <para id="x_84">This book takes an unusual approach to code samples. Every
29 example is <quote>live</quote>---each one is actually the result 146 example is <quote>live</quote>---each one is actually the result
30 of a shell script that executes the Mercurial commands you see. 147 of a shell script that executes the Mercurial commands you see.
31 Every time an image of the book is built from its sources, all 148 Every time an image of the book is built from its sources, all
32 the example scripts are automatically run, and their current 149 the example scripts are automatically run, and their current
33 results compared against their expected results.</para> 150 results compared against their expected results.</para>
34 151
35 <para>The advantage of this approach is that the examples are 152 <para id="x_85">The advantage of this approach is that the examples are
36 always accurate; they describe <emphasis>exactly</emphasis> the 153 always accurate; they describe <emphasis>exactly</emphasis> the
37 behaviour of the version of Mercurial that's mentioned at the 154 behaviour of the version of Mercurial that's mentioned at the
38 front of the book. If I update the version of Mercurial that 155 front of the book. If I update the version of Mercurial that
39 I'm documenting, and the output of some command changes, the 156 I'm documenting, and the output of some command changes, the
40 build fails.</para> 157 build fails.</para>
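A minimal sketch of this kind of check, assuming nothing about the book's actual build scripts: run an example command, capture what it prints, and compare that with the output recorded earlier. The function names and the stand-in command are hypothetical; a real build would run Mercurial commands instead.

```python
import subprocess
import sys


def run_example(command: list[str]) -> str:
    """Run one example command and return what it printed on stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def check_example(command: list[str], expected: str) -> bool:
    """Compare an example's current output with its recorded output."""
    return run_example(command) == expected


# Stand-in example: the Python interpreter printing a fixed string.
example = [sys.executable, "-c", "print('hello')"]
recorded_output = "hello\n"

build_ok = check_example(example, recorded_output)  # output unchanged
stale = check_example(example, "goodbye\n")         # recorded output is out of date

print("up to date:", build_ok, "| stale copy matches:", stale)
```

A build driver following this pattern would stop with an error as soon as any example's comparison came back false.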
41 158
42 <para>There is a small disadvantage to this approach, which is 159 <para id="x_86">There is a small disadvantage to this approach, which is
43 that the dates and times you'll see in examples tend to be 160 that the dates and times you'll see in examples tend to be
44 <quote>squashed</quote> together in a way that they wouldn't be 161 <quote>squashed</quote> together in a way that they wouldn't be
45 if the same commands were being typed by a human. Where a human 162 if the same commands were being typed by a human. Where a human
46 can issue no more than one command every few seconds, with any 163 can issue no more than one command every few seconds, with any
47 resulting timestamps correspondingly spread out, my automated 164 resulting timestamps correspondingly spread out, my automated
48 example scripts run many commands in one second.</para> 165 example scripts run many commands in one second.</para>
49 166
50 <para>As an instance of this, several consecutive commits in an 167 <para id="x_87">As an instance of this, several consecutive commits in an
51 example can show up as having occurred during the same second. 168 example can show up as having occurred during the same second.
52 You can see this occur in the <literal 169 You can see this occur in the <literal
53 role="hg-ext">bisect</literal> example in section <xref 170 role="hg-ext">bisect</literal> example in section <xref
54 linkend="sec.undo.bisect"/>, for instance.</para> 171 linkend="sec.undo.bisect"/>, for instance.</para>
55 172
56 <para>So when you're reading examples, don't place too much weight 173 <para id="x_88">So when you're reading examples, don't place too much weight
57 on the dates or times you see in the output of commands. But 174 on the dates or times you see in the output of commands. But
58 <emphasis>do</emphasis> be confident that the behaviour you're 175 <emphasis>do</emphasis> be confident that the behaviour you're
59 seeing is consistent and reproducible.</para> 176 seeing is consistent and reproducible.</para>
60 177
61 </sect1> 178 </sect1>
62 <sect1> 179
63 <title>Colophon---this book is Free</title> 180 <sect1>
64 181 <title>Trends in the field</title>
65 <para>This book is licensed under the Open Publication License, 182
183 <para id="x_89">There has been an unmistakable trend in the development and
184 use of revision control tools over the past four decades, as
185 people have become familiar with the capabilities of their tools
186 and constrained by their limitations.</para>
187
188 <para id="x_8a">The first generation began by managing single files on
189 individual computers. Although these tools represented a huge
190 advance over ad-hoc manual revision control, their locking model
191 and reliance on a single computer limited them to small,
192 tightly-knit teams.</para>
193
194 <para id="x_8b">The second generation loosened these constraints by moving
195 to network-centered architectures, and managing entire projects
196 at a time. As projects grew larger, they ran into new problems.
197 With clients needing to talk to servers very frequently, server
198 scaling became an issue for large projects. An unreliable
199 network connection could prevent remote users from being able to
200 talk to the server at all. As open source projects started
201 making read-only access available anonymously to anyone, people
202 without commit privileges found that they could not use the
203 tools to interact with a project in a natural way, as they could
204 not record their changes.</para>
205
206 <para id="x_8c">The current generation of revision control tools is
207 peer-to-peer in nature. All of these systems have dropped the
208 dependency on a single central server, and allow people to
209 distribute their revision control data to where it's actually
210 needed. Collaboration over the Internet has moved from being
211 constrained by technology to being a matter of choice and consensus.
212 Modern tools can operate offline indefinitely and autonomously,
213 with a network connection only needed when syncing changes with
214 another repository.</para>
215
216 </sect1>
217 <sect1>
218 <title>A few of the advantages of distributed revision
219 control</title>
220
221 <para id="x_8d">Even though distributed revision control tools have for
222 several years been as robust and usable as their
223 previous-generation counterparts, people using older tools have
224 not yet necessarily woken up to their advantages. There are a
225 number of ways in which distributed tools shine relative to
226 centralised ones.</para>
227
228 <para id="x_8e">For an individual developer, distributed tools are almost
229 always much faster than centralised tools. This is for a simple
230 reason: a centralised tool needs to talk over the network for
231 many common operations, because most metadata is stored in a
232 single copy on the central server. A distributed tool stores
233 all of its metadata locally. All else being equal, talking over
234 the network adds overhead to a centralised tool. Don't
235 underestimate the value of a snappy, responsive tool: you're
236 going to spend a lot of time interacting with your revision
237 control software.</para>
238
239 <para id="x_8f">Distributed tools are indifferent to the vagaries of your
240 server infrastructure, again because they replicate metadata to
241 so many locations. If you use a centralised system and your
242 server catches fire, you'd better hope that your backup media
243 are reliable, and that your last backup was recent and actually
244 worked. With a distributed tool, you have many backups
245 available on every contributor's computer.</para>
246
247 <para id="x_90">The reliability of your network will affect distributed
248 tools far less than it will centralised tools. You can't even
249 use a centralised tool without a network connection, except for
250 a few highly constrained commands. With a distributed tool, if
251 your network connection goes down while you're working, you may
252 not even notice. The only thing you won't be able to do is talk
253 to repositories on other computers, something that is relatively
254 rare compared with local operations. If you have a far-flung
255 team of collaborators, this may be significant.</para>
256
257 <sect2>
258 <title>Advantages for open source projects</title>
259
260 <para id="x_91">If you take a shine to an open source project and decide
261 that you would like to start hacking on it, and that project
262 uses a distributed revision control tool, you are at once a
263 peer with the people who consider themselves the
264 <quote>core</quote> of that project. If they publish their
265 repositories, you can immediately copy their project history,
266 start making changes, and record your work, using the same
267 tools in the same ways as insiders. By contrast, with a
268 centralised tool, you must use the software in a <quote>read
269 only</quote> mode unless someone grants you permission to
270 commit changes to their central server. Until then, you won't
271 be able to record changes, and your local modifications will
272 be at risk of corruption any time you try to update your
273 client's view of the repository.</para>
274
275 <sect3>
276 <title>The forking non-problem</title>
277
278 <para id="x_92">It has been suggested that distributed revision control
279 tools pose some sort of risk to open source projects because
280 they make it easy to <quote>fork</quote> the development of
281 a project. A fork happens when there are differences in
282 opinion or attitude between groups of developers that cause
283 them to decide that they can't work together any longer.
284 Each side takes a more or less complete copy of the
285 project's source code, and goes off in its own
286 direction.</para>
287
288 <para id="x_93">Sometimes the camps in a fork decide to reconcile their
289 differences. With a centralised revision control system, the
290 <emphasis>technical</emphasis> process of reconciliation is
291 painful, and has to be performed largely by hand. You have
292 to decide whose revision history is going to
293 <quote>win</quote>, and graft the other team's changes into
294 the tree somehow. This usually loses some or all of one
295 side's revision history.</para>
296
297 <para id="x_94">What distributed tools do with respect to forking is
298 they make forking the <emphasis>only</emphasis> way to
299 develop a project. Every single change that you make is
300 potentially a fork point. The great strength of this
301 approach is that a distributed revision control tool has to
302 be really good at <emphasis>merging</emphasis> forks,
303 because forks are absolutely fundamental: they happen all
304 the time.</para>
305
306 <para id="x_95">If every piece of work that everybody does, all the
307 time, is framed in terms of forking and merging, then what
308 the open source world refers to as a <quote>fork</quote>
309 becomes <emphasis>purely</emphasis> a social issue. If
310 anything, distributed tools <emphasis>lower</emphasis> the
311 likelihood of a fork:</para>
312 <itemizedlist>
313 <listitem><para id="x_96">They eliminate the social distinction that
314 centralised tools impose: that between insiders (people
315 with commit access) and outsiders (people
316 without).</para></listitem>
317 <listitem><para id="x_97">They make it easier to reconcile after a
318 social fork, because all that's involved from the
319 perspective of the revision control software is just
320 another merge.</para></listitem></itemizedlist>
321
322 <para id="x_98">Some people resist distributed tools because they want
323 to retain tight control over their projects, and they
324 believe that centralised tools give them this control.
325 However, if you're of this belief, and you publish your CVS
326 or Subversion repositories publicly, there are plenty of
327 tools available that can pull out your entire project's
328 history (albeit slowly) and recreate it somewhere that you
329 don't control. So while your control in this case is
330 illusory, you are forgoing the ability to fluidly
331 collaborate with whatever people feel compelled to mirror
332 and fork your history.</para>
333
334 </sect3>
335 </sect2>
336 <sect2>
337 <title>Advantages for commercial projects</title>
338
339 <para id="x_99">Many commercial projects are undertaken by teams that are
340 scattered across the globe. Contributors who are far from a
341 central server will see slower command execution and perhaps
342 less reliability. Commercial revision control systems attempt
343 to ameliorate these problems with remote-site replication
344 add-ons that are typically expensive to buy and cantankerous
345 to administer. A distributed system doesn't suffer from these
346 problems in the first place. Better yet, you can easily set
347 up multiple authoritative servers, say one per site, so that
348 there's no redundant communication between repositories over
349 expensive long-haul network links.</para>
350
351 <para id="x_9a">Centralised revision control systems tend to have
352 relatively low scalability. It's not unusual for an expensive
353 centralised system to fall over under the combined load of
354 just a few dozen concurrent users. Once again, the typical
355 response tends to be an expensive and clunky replication
356 facility. Since the load on a central server---if you have
357 one at all---is many times lower with a distributed tool
358 (because all of the data is replicated everywhere), a single
359 cheap server can handle the needs of a much larger team, and
360 replication to balance load becomes a simple matter of
361 scripting.</para>
362
363 <para id="x_9b">If you have an employee in the field, troubleshooting a
364 problem at a customer's site, they'll benefit from distributed
365 revision control. The tool will let them generate custom
366 builds, try different fixes in isolation from each other, and
367 search efficiently through history for the sources of bugs and
368 regressions in the customer's environment, all without needing
369 to connect to your company's network.</para>
370
371 </sect2>
372 </sect1>
373 <sect1>
374 <title>Why choose Mercurial?</title>
375
376 <para id="x_9c">Mercurial has a unique set of properties that make it a
377 particularly good choice as a revision control system.</para>
378 <itemizedlist>
379 <listitem><para id="x_9d">It is easy to learn and use.</para></listitem>
380 <listitem><para id="x_9e">It is lightweight.</para></listitem>
381 <listitem><para id="x_9f">It scales excellently.</para></listitem>
382 <listitem><para id="x_a0">It is easy to
383 customise.</para></listitem></itemizedlist>
384
385 <para id="x_a1">If you are at all familiar with revision control systems,
386 you should be able to get up and running with Mercurial in less
387 than five minutes. Even if not, it will take no more than a few
388 minutes longer. Mercurial's command and feature sets are
389 generally uniform and consistent, so you can keep track of a few
390 general rules instead of a host of exceptions.</para>
391
392 <para id="x_a2">On a small project, you can start working with Mercurial in
393 moments. Creating new changes and branches; transferring changes
394 around (whether locally or over a network); and history and
395 status operations are all fast. Mercurial attempts to stay
396 nimble and largely out of your way by combining low cognitive
397 overhead with blazingly fast operations.</para>
398
399 <para id="x_a3">The usefulness of Mercurial is not limited to small
400 projects: it is used by projects that have hundreds to thousands of
401 contributors and that each contain tens of thousands of files and
402 hundreds of megabytes of source code.</para>
403
404 <para id="x_a4">If the core functionality of Mercurial is not enough for
405 you, it's easy to build on. Mercurial is well suited to
406 scripting tasks, and its clean internals and implementation in
407 Python make it easy to add features in the form of extensions.
408 There are a number of popular and useful extensions already
409 available, ranging from helping to identify bugs to improving
410 performance.</para>
411
412 </sect1>
413 <sect1>
414 <title>Mercurial compared with other tools</title>
415
416 <para id="x_a5">Before you read on, please understand that this section
417 necessarily reflects my own experiences, interests, and (dare I
418 say it) biases. I have used every one of the revision control
419 tools listed below, in most cases for several years at a
420 time.</para>
421
422
423 <sect2>
424 <title>Subversion</title>
425
426 <para id="x_a6">Subversion is a popular revision control tool, developed
427 to replace CVS. It has a centralised client/server
428 architecture.</para>
429
430 <para id="x_a7">Subversion and Mercurial have similarly named commands for
431 performing the same operations, so if you're familiar with
432 one, it is easy to learn to use the other. Both tools are
433 portable to all popular operating systems.</para>
434
435 <para id="x_a8">Prior to version 1.5, Subversion had no useful support for
436 merges. At the time of writing, its merge tracking capability
437 is new, and known to be <ulink
438 url="http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword">complicated
439 and buggy</ulink>.</para>
440
441 <para id="x_a9">Mercurial has a substantial performance advantage over
442 Subversion on every revision control operation I have
443 benchmarked. I have measured its advantage as ranging from a
444 factor of two to a factor of six when compared with Subversion
445 1.4.3's <emphasis>ra_local</emphasis> file store, which is the
446 fastest access method available. In more realistic
447 deployments involving a network-based store, Subversion will
448 be at a substantially larger disadvantage. Because many
449 Subversion commands must talk to the server and Subversion
450 does not have useful replication facilities, server capacity
451 and network bandwidth become bottlenecks for modestly large
452 projects.</para>
453
454 <para id="x_aa">Additionally, Subversion incurs substantial storage
455 overhead to avoid network transactions for a few common
456 operations, such as finding modified files
457 (<literal>status</literal>) and displaying modifications
458 against the current revision (<literal>diff</literal>). As a
459 result, a Subversion working copy is often the same size as,
460 or larger than, a Mercurial repository and working directory,
461 even though the Mercurial repository contains a complete
462 history of the project.</para>
463
464 <para id="x_ab">Subversion is widely supported by third party tools.
465 Mercurial currently lags considerably in this area. This gap
466 is closing, however, and indeed some of Mercurial's GUI tools
467 now outshine their Subversion equivalents. Like Mercurial,
468 Subversion has an excellent user manual.</para>
469
470 <para id="x_ac">Because Subversion doesn't store revision history on the
471 client, it is well suited to managing projects that deal with
472 lots of large, opaque binary files. If you check in fifty
473 revisions to an incompressible 10MB file, Subversion's
474 client-side space usage stays constant. The space used by any
475 distributed SCM will grow rapidly in proportion to the number
476 of revisions, because the differences between each revision
477 are large.</para>
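The arithmetic behind this example can be made explicit. The sketch below assumes the worst case, in which incompressible data defeats delta storage and every revision costs roughly the full file size; real repositories may do somewhat better, and the figures ignore any bookkeeping overhead on either side.

```python
FILE_SIZE_MB = 10   # size of the incompressible file, from the example above
REVISIONS = 50      # number of revisions checked in

# Assumption: each revision is stored more or less whole, because the
# differences between incompressible revisions are as large as the file.
full_history_mb = FILE_SIZE_MB * REVISIONS

# A client that keeps only the current revision does not grow with history.
current_only_mb = FILE_SIZE_MB

print(f"full history on every clone: about {full_history_mb} MB")
print(f"current revision only:       about {current_only_mb} MB")
```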
478
479 <para id="x_ad">In addition, it's often difficult or, more usually,
480 impossible to merge different versions of a binary file.
481 Subversion's ability to let a user lock a file, so that they
482 temporarily have the exclusive right to commit changes to it,
483 can be a significant advantage to a project where binary files
484 are widely used.</para>
485
486 <para id="x_ae">Mercurial can import revision history from a Subversion
487 repository. It can also export revision history to a
488 Subversion repository. This makes it easy to <quote>test the
489 waters</quote> and use Mercurial and Subversion in parallel
490 before deciding to switch. History conversion is incremental,
491 so you can perform an initial conversion, then small
492 additional conversions afterwards to bring in new
493 changes.</para>
494
495
496 </sect2>
497 <sect2>
498 <title>Git</title>
499
500 <para id="x_af">Git is a distributed revision control tool that was
501 developed for managing the Linux kernel source tree. Like
502 Mercurial, its early design was somewhat influenced by
503 Monotone.</para>
504
505 <para id="x_b0">Git has a very large command set, with version 1.5.0
506 providing 139 individual commands. It has something of a
507 reputation for being difficult to learn. Compared to Git,
508 Mercurial has a strong focus on simplicity.</para>
509
510 <para id="x_b1">In terms of performance, Git is extremely fast. In
511 several cases, it is faster than Mercurial, at least on Linux,
512 while Mercurial performs better on other operations. However,
513 on Windows, the performance and general level of support that
514 Git provides are, at the time of writing, far behind those of
515 Mercurial.</para>
516
517 <para id="x_b2">While a Mercurial repository needs no maintenance, a Git
518 repository requires frequent manual <quote>repacks</quote> of
519 its metadata. Without these, performance degrades, while
520 space usage grows rapidly. A server that contains many Git
521 repositories that are not rigorously and frequently repacked
522 will become heavily disk-bound during backups, and there have
523 been instances of daily backups taking far longer than 24
524 hours as a result. A freshly packed Git repository is
525 slightly smaller than a Mercurial repository, but an unpacked
526 repository is several orders of magnitude larger.</para>
527
528 <para id="x_b3">The core of Git is written in C. Many Git commands are
529 implemented as shell or Perl scripts, and the quality of these
530 scripts varies widely. I have encountered several instances
531 where scripts charged along blindly in the presence of errors
532 that should have been fatal.</para>
533
534 <para id="x_b4">Mercurial can import revision history from a Git
535 repository.</para>
536
537
538 </sect2>
539 <sect2>
540 <title>CVS</title>
541
542 <para id="x_b5">CVS is probably the most widely used revision control tool
543 in the world. Due to its age and internal untidiness, it has
544 been only lightly maintained for many years.</para>
545
546 <para id="x_b6">It has a centralised client/server architecture. It does
547 not group related file changes into atomic commits, making it
548 easy for people to <quote>break the build</quote>: one person
549 can successfully commit part of a change and then be blocked
550 by the need for a merge, causing other people to see only a
551 portion of the work they intended to do. This also affects
552 how you work with project history. If you want to see all of
553 the modifications someone made as part of a task, you will
554 need to manually inspect the descriptions and timestamps of
555 the changes made to each file involved (if you even know what
556 those files were).</para>
557
558 <para id="x_b7">CVS has a muddled notion of tags and branches that I will
559 not attempt to even describe. It does not support renaming of
560 files or directories well, making it easy to corrupt a
561 repository. It has almost no internal consistency checking
562 capabilities, so it is usually not even possible to tell
563 whether or how a repository is corrupt. I would not recommend
564 CVS for any project, existing or new.</para>
565
566 <para id="x_b8">Mercurial can import CVS revision history. However, there
567 are a few caveats that apply; these are true of every other
568 revision control tool's CVS importer, too. Due to CVS's lack
569 of atomic changes and unversioned filesystem hierarchy, it is
570 not possible to reconstruct CVS history completely accurately;
571 some guesswork is involved, and renames will usually not show
572 up. Because a lot of advanced CVS administration has to be
573 done by hand and is hence error-prone, it's common for CVS
574 importers to run into multiple problems with corrupted
575 repositories (completely bogus revision timestamps and files
576 that have remained locked for over a decade are just two of
577 the less interesting problems I can recall from personal
578 experience).</para>
579
582
583
584 </sect2>
585 <sect2>
586 <title>Commercial tools</title>
587
588 <para id="x_ba">Perforce has a centralised client/server architecture,
589 with no client-side caching of any data. Unlike modern
590 revision control tools, Perforce requires that a user run a
591 command to inform the server about every file they intend to
592 edit.</para>
593
594 <para id="x_bb">The performance of Perforce is quite good for small teams,
595 but it falls off rapidly as the number of users grows beyond a
596 few dozen. Modestly large Perforce installations require the
597 deployment of proxies to cope with the load their users
598 generate.</para>
599
600
601 </sect2>
602 <sect2>
603 <title>Choosing a revision control tool</title>
604
605 <para id="x_bc">With the exception of CVS, all of the tools listed above
606 have unique strengths that suit them to particular styles of
607 work. There is no single revision control tool that is best
608 in all situations.</para>
609
610 <para id="x_bd">As an example, Subversion is a good choice for working
611 with frequently edited binary files, due to its centralised
612 nature and support for file locking.</para>
613
614 <para id="x_be">I personally find Mercurial's properties of simplicity,
615 performance, and good merge support to be a compelling
616 combination that has served me well for several years.</para>
617
618
619 </sect2>
620 </sect1>
621 <sect1>
622 <title>Switching from another tool to Mercurial</title>
623
624 <para id="x_bf">Mercurial is bundled with an extension named <literal
625 role="hg-ext">convert</literal>, which can incrementally
626 import revision history from several other revision control
627 tools. By <quote>incremental</quote>, I mean that you can
628 convert all of a project's history to date in one go, then rerun
629 the conversion later to obtain new changes that happened after
630 the initial conversion.</para>
631
632 <para id="x_c0">The revision control tools supported by <literal
633 role="hg-ext">convert</literal> are as follows:</para>
634 <itemizedlist>
635 <listitem><para id="x_c1">Subversion</para></listitem>
636 <listitem><para id="x_c2">CVS</para></listitem>
637 <listitem><para id="x_c3">Git</para></listitem>
638 <listitem><para id="x_c4">Darcs</para></listitem></itemizedlist>
639
640 <para id="x_c5">In addition, <literal role="hg-ext">convert</literal> can
641 export changes from Mercurial to Subversion. This makes it
642 possible to try Subversion and Mercurial in parallel before
643 committing to a switchover, without risking the loss of any
644 work.</para>
645
646 <para id="x_c6">The <command role="hg-ext-convert">convert</command> command
647 is easy to use. Simply point it at the path or URL of the
648 source repository, optionally give it the name of the
649 destination repository, and it will start working. After the
650 initial conversion, just run the same command again to import
651 new changes.</para>
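<para>As a rough sketch of the workflow just described, a session
might look something like the following. It assumes that the
<literal role="hg-ext">convert</literal> extension has already been
enabled in your <filename>~/.hgrc</filename>. The paths
<filename>/path/to/svn/project</filename> and
<filename>project-hg</filename> are hypothetical placeholders, not
names taken from this book.</para>

<programlisting># Assumed setup: the bundled extension is enabled in ~/.hgrc with
#   [extensions]
#   convert =

# Initial conversion. The source path is a placeholder, and the
# destination name (project-hg) is optional.
hg convert /path/to/svn/project project-hg

# Some time later: rerun the same command to import only the
# changes made since the previous conversion.
hg convert /path/to/svn/project project-hg</programlisting>

<para>If you leave out the destination, <command
role="hg-ext-convert">convert</command> should choose a name for you
by appending <literal>-hg</literal> to the name of the source.</para>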
652 </sect1>
653
654 <sect1>
655 <title>A short history of revision control</title>
656
657 <para id="x_c7">The best known of the old-time revision control tools is
658 SCCS (Source Code Control System), which Marc Rochkind wrote at
659 Bell Labs in the early 1970s. SCCS operated on individual
660 files, and required every person working on a project to have
661 access to a shared workspace on a single system. Only one
662 person could modify a file at any time; arbitration for access
663 to files was via locks. It was common for people to lock files,
664 and later forget to unlock them, preventing anyone else from
665 modifying those files without the help of an
666 administrator.</para>
667
668 <para id="x_c8">Walter Tichy developed a free alternative to SCCS in the
669 early 1980s; he called his program RCS (Revision Control System).
670 Like SCCS, RCS required developers to work in a single shared
671 workspace, and to lock files to prevent multiple people from
672 modifying them simultaneously.</para>
673
674 <para id="x_c9">Later in the 1980s, Dick Grune used RCS as a building block
675 for a set of shell scripts he initially called cmt, but then
676 renamed to CVS (Concurrent Versions System). The big innovation
677 of CVS was that it let developers work simultaneously and
678 somewhat independently in their own personal workspaces. The
679 personal workspaces prevented developers from stepping on each
680 other's toes all the time, as was common with SCCS and RCS. Each
681 developer had a copy of every project file, and could modify
682 their copies independently. They had to merge their edits prior
683 to committing changes to the central repository.</para>
684
685 <para id="x_ca">Brian Berliner took Grune's original scripts and rewrote
686 them in C, releasing in 1989 the code that has since developed
687 into the modern version of CVS. CVS subsequently acquired the
688 ability to operate over a network connection, giving it a
689 client/server architecture. CVS's architecture is centralised;
690 only the server has a copy of the history of the project. Client
691 workspaces just contain copies of recent versions of the
692 project's files, and a little metadata to tell them where the
693 server is. CVS has been enormously successful; it is probably
694 the world's most widely used revision control system.</para>
695
696 <para id="x_cb">In the early 1990s, Sun Microsystems developed an early
697 distributed revision control system, called TeamWare. A
698 TeamWare workspace contains a complete copy of the project's
699 history. TeamWare has no notion of a central repository. (CVS
700 relied upon RCS for its history storage; TeamWare used
701 SCCS.)</para>
702
703 <para id="x_cc">As the 1990s progressed, awareness grew of a number of
704 problems with CVS. It records simultaneous changes to multiple
705 files individually, instead of grouping them together as a
706 single logically atomic operation. It does not manage its file
707 hierarchy well; it is easy to make a mess of a repository by
708 renaming files and directories. Worse, its source code is
709 difficult to read and maintain, which made the <quote>pain
710 level</quote> of fixing these architectural problems
711 prohibitive.</para>
712
713 <para id="x_cd">In 2000, Jim Blandy and Karl Fogel, two developers who had
714 worked on CVS, started a project to replace it with a tool that
715 would have a better architecture and cleaner code. The result,
716 Subversion, does not stray from CVS's centralised client/server
717 model, but it adds multi-file atomic commits, better namespace
718 management, and a number of other features that make it a
719 generally better tool than CVS. Since its initial release, it
720 has rapidly grown in popularity.</para>
721
722 <para id="x_ce">More or less simultaneously, Graydon Hoare began working on
723 an ambitious distributed revision control system that he named
724 Monotone. While Monotone addresses many of CVS's design flaws
725 and has a peer-to-peer architecture, it goes beyond earlier (and
726 subsequent) revision control tools in a number of innovative
727 ways. It uses cryptographic hashes as identifiers, and has an
728 integral notion of <quote>trust</quote> for code from different
729 sources.</para>
730
731 <para id="x_cf">Mercurial began life in 2005. While a few aspects of its
732 design are influenced by Monotone, Mercurial focuses on ease of
733 use, high performance, and scalability to very large
734 projects.</para>
735
736 </sect1>
737
738 <sect1>
739 <title>Colophon&emdash;this book is Free</title>
740
741 <para id="x_d0">This book is licensed under the Open Publication License,
66 and is produced entirely using Free Software tools. It is 742 and is produced entirely using Free Software tools. It is
67 typeset with DocBook XML. Illustrations are drawn and rendered with 743 typeset with DocBook XML. Illustrations are drawn and rendered with
68 <ulink url="http://www.inkscape.org/">Inkscape</ulink>.</para> 744 <ulink url="http://www.inkscape.org/">Inkscape</ulink>.</para>
69 745
70 <para>The complete source code for this book is published as a 746 <para id="x_d1">The complete source code for this book is published as a
71 Mercurial repository, at <ulink 747 Mercurial repository, at <ulink
72 url="http://hg.serpentine.com/mercurial/book">http://hg.serpentine.com/mercurial/book</ulink>.</para> 748 url="http://hg.serpentine.com/mercurial/book">http://hg.serpentine.com/mercurial/book</ulink>.</para>
73 749
74 </sect1> 750 </sect1>
75 </preface> 751 </preface>