comparison en/ch00-preface.xml @ 831:acf9dc5f088d

Add a skeletal preface.
author Bryan O'Sullivan <bos@serpentine.com>
date Thu, 07 May 2009 21:07:35 -0700
parents b338f5490029
children d5688822c51d
830:cbdff5945f9d 831:acf9dc5f088d
3 <preface id="chap:preface"> 3 <preface id="chap:preface">
4 <?dbhtml filename="preface.html"?> 4 <?dbhtml filename="preface.html"?>
5 <title>Preface</title> 5 <title>Preface</title>
6 6
7 <sect1> 7 <sect1>
8 <title>Why revision control? Why Mercurial?</title> 8 <title>Conventions Used in This Book</title>
9 9
10 <para id="x_6d">Revision control is the process of managing multiple 10 <para>The following typographical conventions are used in this
11 versions of a piece of information. In its simplest form, this 11 book:</para>
12 is something that many people do by hand: every time you modify
13 a file, save it under a new name that contains a number, each
14 one higher than the number of the preceding version.</para>
15 12
16 <para id="x_6e">Manually managing multiple versions of even a single file is 13 <variablelist>
17 an error-prone task, though, so software tools to help automate 14 <varlistentry>
18 this process have long been available. The earliest automated 15 <term>Italic</term>
19 revision control tools were intended to help a single user to
20 manage revisions of a single file. Over the past few decades,
21 the scope of revision control tools has expanded greatly; they
22 now manage multiple files, and help multiple people to work
23 together. The best modern revision control tools have no
24 problem coping with thousands of people working together on
25 projects that consist of hundreds of thousands of files.</para>
26 16
27 <para id="x_6f">The arrival of distributed revision control is relatively 17 <listitem>
28 recent, and so far this new field has grown due to people's 18 <para>Indicates new terms, URLs, email addresses, filenames,
29 willingness to explore ill-charted territory.</para> 19 and file extensions.</para>
20 </listitem>
21 </varlistentry>
30 22
31 <para id="x_70">I am writing a book about distributed revision control 23 <varlistentry>
32 because I believe that it is an important subject that deserves 24 <term><literal>Constant width</literal></term>
33 a field guide. I chose to write about Mercurial because it is
34 the easiest tool to learn the terrain with, and yet it scales to
35 the demands of real, challenging environments where many other
36 revision control tools buckle.</para>
37 25
38 <sect2> 26 <listitem>
39 <title>Why use revision control?</title> 27 <para>Used for program listings, as well as within
28 paragraphs to refer to program elements such as variable
29 or function names, databases, data types, environment
30 variables, statements, and keywords.</para>
31 </listitem>
32 </varlistentry>
40 33
41 <para id="x_71">There are a number of reasons why you or your team might 34 <varlistentry>
42 want to use an automated revision control tool for a 35 <term><userinput>Constant width bold</userinput></term>
43 project.</para>
44 36
45 <itemizedlist> 37 <listitem>
46 <listitem><para id="x_72">It will track the history and evolution of 38 <para>Shows commands or other text that should be typed
47 your project, so you don't have to. For every change, 39 literally by the user.</para>
48 you'll have a log of <emphasis>who</emphasis> made it; 40 </listitem>
49 <emphasis>why</emphasis> they made it; 41 </varlistentry>
50 <emphasis>when</emphasis> they made it; and
51 <emphasis>what</emphasis> the change
52 was.</para></listitem>
53 <listitem><para id="x_73">When you're working with other people,
54 revision control software makes it easier for you to
55 collaborate. For example, when people more or less
56 simultaneously make potentially incompatible changes, the
57 software will help you to identify and resolve those
58 conflicts.</para></listitem>
59 <listitem><para id="x_74">It can help you to recover from mistakes. If
60 you make a change that later turns out to be in error, you
61 can revert to an earlier version of one or more files. In
62 fact, a <emphasis>really</emphasis> good revision control
63 tool will even help you to efficiently figure out exactly
64 when a problem was introduced (see <xref
65 linkend="sec:undo:bisect"/> for details).</para></listitem>
66 <listitem><para id="x_75">It will help you to work simultaneously on,
67 and manage the drift between, multiple versions of your
68 project.</para></listitem>
69 </itemizedlist>
70 42
71 <para id="x_76">Most of these reasons are equally 43 <varlistentry>
72 valid&emdash;at least in theory&emdash;whether you're working 44 <term><replaceable>Constant width italic</replaceable></term>
73 on a project by yourself, or with a hundred other
74 people.</para>
75 45
76 <para id="x_77">A key question about the practicality of revision control 46 <listitem>
77 at these two different scales (<quote>lone hacker</quote> and 47 <para>Shows text that should be replaced with user-supplied
78 <quote>huge team</quote>) is how its 48 values or by values determined by context.</para>
79 <emphasis>benefits</emphasis> compare to its 49 </listitem>
80 <emphasis>costs</emphasis>. A revision control tool that's 50 </varlistentry>
81 difficult to understand or use is going to impose a high 51 </variablelist>
82 cost.</para>
83 52
84 <para id="x_78">A five-hundred-person project is likely to collapse under 53 <tip>
85 its own weight almost immediately without a revision control 54 <para>This icon signifies a tip, suggestion, or general
86 tool and process. In this case, the cost of using revision 55 note.</para>
87 control might hardly seem worth considering, since 56 </tip>
88 <emphasis>without</emphasis> it, failure is almost
89 guaranteed.</para>
90 57
91 <para id="x_79">On the other hand, a one-person <quote>quick hack</quote> 58 <caution>
92 might seem like a poor place to use a revision control tool, 59 <para>This icon indicates a warning or caution.</para>
93 because surely the cost of using one must be close to the 60 </caution>
94 overall cost of the project. Right?</para>
95
96 <para id="x_7a">Mercurial uniquely supports <emphasis>both</emphasis> of
97 these scales of development. You can learn the basics in just
98 a few minutes, and due to its low overhead, you can apply
99 revision control to the smallest of projects with ease. Its
100 simplicity means you won't have a lot of abstruse concepts or
101 command sequences competing for mental space with whatever
102 you're <emphasis>really</emphasis> trying to do. At the same
103 time, Mercurial's high performance and peer-to-peer nature let
104 you scale painlessly to handle large projects.</para>
105
106 <para id="x_7b">No revision control tool can rescue a poorly run project,
107 but a good choice of tools can make a huge difference to the
108 fluidity with which you can work on a project.</para>
109
110 </sect2>
111
112 <sect2>
113 <title>The many names of revision control</title>
114
115 <para id="x_7c">Revision control is a diverse field, so much so that it is
116 referred to by many names and acronyms. Here are a few of the
117 more common variations you'll encounter:</para>
118 <itemizedlist>
119 <listitem><para id="x_7d">Revision control (RCS)</para></listitem>
120 <listitem><para id="x_7e">Software configuration management (SCM), or
121 configuration management</para></listitem>
122 <listitem><para id="x_7f">Source code management</para></listitem>
123 <listitem><para id="x_80">Source code control, or source
124 control</para></listitem>
125 <listitem><para id="x_81">Version control
126 (VCS)</para></listitem></itemizedlist>
127 <para id="x_82">Some people claim that these terms actually have different
128 meanings, but in practice they overlap so much that there's no
129 agreed or even useful way to tease them apart.</para>
130
131 </sect2>
132 </sect1> 61 </sect1>
133 62
134 <sect1> 63 <sect1>
135 <title>This book is a work in progress</title> 64 <title>Using Code Examples</title>
136 65
137 <para id="x_83">I am releasing this book while I am still writing it, in the 66 <para>This book is here to help you get your job done. In general,
138 hope that it will prove useful to others. I am writing under an 67 you may use the code in this book in your programs and
139 open license in the hope that you, my readers, will contribute 68 documentation. You do not need to contact us for permission
140 feedback and perhaps content of your own.</para> 69 unless you’re reproducing a significant portion of the code. For
70 example, writing a program that uses several chunks of code from
71 this book does not require permission. Selling or distributing a
72 CD-ROM of examples from O’Reilly books does require permission.
73 Answering a question by citing this book and quoting example
74 code does not require permission. Incorporating a significant
75 amount of example code from this book into your product’s
76 documentation does require permission.</para>
141 77
142 </sect1> 78 <para>We appreciate, but do not require, attribution. An
143 <sect1> 79 attribution usually includes the title, author, publisher, and
144 <title>About the examples in this book</title> 80 ISBN. For example: “<emphasis>Book Title</emphasis> by Some
81 Author. Copyright 2008 O’Reilly Media, Inc.,
82 978-0-596-xxxx-x.”</para>
145 83
146 <para id="x_84">This book takes an unusual approach to code samples. Every 84 <para>If you feel your use of code examples falls outside fair use
147 example is <quote>live</quote>&emdash;each one is actually the result 85 or the permission given above, feel free to contact us at
148 of a shell script that executes the Mercurial commands you see. 86 <email>permissions@oreilly.com</email>.</para>
149 Every time an image of the book is built from its sources, all
150 the example scripts are automatically run, and their current
151 results compared against their expected results.</para>
152
153 <para id="x_85">The advantage of this approach is that the examples are
154 always accurate; they describe <emphasis>exactly</emphasis> the
155 behavior of the version of Mercurial that's mentioned at the
156 front of the book. If I update the version of Mercurial that
157 I'm documenting, and the output of some command changes, the
158 build fails.</para>
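The mechanism described above can be sketched in a few lines of Python: run each example script and compare its current output with stored expected output. This is a minimal illustration under stated assumptions, not the book's actual build tooling. The helper names `run_example`, `check_example`, and `check_all` are hypothetical, and the sample commands stand in for the real Mercurial example scripts.

```python
import subprocess
import sys


def run_example(argv: list[str]) -> str:
    """Run one example command and capture its standard output as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def check_example(argv: list[str], expected: str) -> bool:
    """Return True if the command's current output matches the stored output."""
    return run_example(argv) == expected


def check_all(examples: dict[str, tuple[list[str], str]]) -> list[str]:
    """Run every example and return the names of those whose output changed."""
    return [name for name, (argv, expected) in examples.items()
            if not check_example(argv, expected)]


if __name__ == "__main__":
    # Placeholder commands; the real scripts would invoke Mercurial.
    examples = {
        "greeting": ([sys.executable, "-c", "print('hello')"], "hello\n"),
        "sum": ([sys.executable, "-c", "print(2 + 3)"], "5\n"),
    }
    failures = check_all(examples)
    if failures:
        # Any mismatch fails the build.
        sys.exit("example output changed: " + ", ".join(failures))
    print("all examples match")
```

Keeping the comparison strict (exact text equality) is what makes a change in Mercurial's output break the build rather than slip through unnoticed.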
159
160 <para id="x_86">There is a small disadvantage to this approach, which is
161 that the dates and times you'll see in examples tend to be
162 <quote>squashed</quote> together in a way that they wouldn't be
163 if the same commands were being typed by a human. Where a human
164 can issue no more than one command every few seconds, with any
165 resulting timestamps correspondingly spread out, my automated
166 example scripts run many commands in one second.</para>
167
168 <para id="x_87">As an instance of this, several consecutive commits in an
169 example can show up as having occurred during the same second.
170 You can see this occur in the <literal
171 role="hg-ext">bisect</literal> example in <xref
172 linkend="sec:undo:bisect"/>, for instance.</para>
173
174 <para id="x_88">So when you're reading examples, don't place too much weight
175 on the dates or times you see in the output of commands. But
176 <emphasis>do</emphasis> be confident that the behavior you're
177 seeing is consistent and reproducible.</para>
178
179 </sect1> 87 </sect1>
180 88
181 <sect1> 89 <sect1>
182 <title>Trends in the field</title> 90 <title>Safari® Books Online</title>
183 91
184 <para id="x_89">There has been an unmistakable trend in the development and 92 <note role="safarienabled">
185 use of revision control tools over the past four decades, as 93 <para>When you see a Safari® Books Online icon on the cover of
186 people have become familiar with the capabilities of their tools 94 your favorite technology book, that means the book is
187 and constrained by their limitations.</para> 95 available online through the O’Reilly Network Safari
96 Bookshelf.</para>
97 </note>
188 98
189 <para id="x_8a">The first generation began by managing single files on 99 <para>Safari offers a solution that’s better than e-books. It’s a
190 individual computers. Although these tools represented a huge 100 virtual library that lets you easily search thousands of top
191 advance over ad-hoc manual revision control, their locking model 101 tech books, cut and paste code samples, download chapters, and
192 and reliance on a single computer limited them to small, 102 find quick answers when you need the most accurate, current
193 tightly-knit teams.</para> 103 information. Try it for free at <ulink role="orm:hideurl:ital"
194 104 url="http://my.safaribooksonline.com/?portal=oreilly">http://my.safaribooksonline.com</ulink>.</para>
195 <para id="x_8b">The second generation loosened these constraints by moving
196 to network-centered architectures, and managing entire projects
197 at a time. As projects grew larger, they ran into new problems.
198 With clients needing to talk to servers very frequently, server
199 scaling became an issue for large projects. An unreliable
200 network connection could prevent remote users from being able to
201 talk to the server at all. As open source projects started
202 making read-only access available anonymously to anyone, people
203 without commit privileges found that they could not use the
204 tools to interact with a project in a natural way, as they could
205 not record their changes.</para>
206
207 <para id="x_8c">The current generation of revision control tools is
208 peer-to-peer in nature. All of these systems have dropped the
209 dependency on a single central server, and allow people to
210 distribute their revision control data to where it's actually
211 needed. Collaboration over the Internet has moved from
212 being constrained by technology to being a matter of choice and consensus.
213 Modern tools can operate offline indefinitely and autonomously,
214 with a network connection only needed when syncing changes with
215 another repository.</para>
216
217 </sect1>
218 <sect1>
219 <title>A few of the advantages of distributed revision
220 control</title>
221
222 <para id="x_8d">Even though distributed revision control tools have for
223 several years been as robust and usable as their
224 previous-generation counterparts, people using older tools have
225 not yet necessarily woken up to their advantages. There are a
226 number of ways in which distributed tools shine relative to
227 centralised ones.</para>
228
229 <para id="x_8e">For an individual developer, distributed tools are almost
230 always much faster than centralised tools. This is for a simple
231 reason: a centralised tool needs to talk over the network for
232 many common operations, because most metadata is stored in a
233 single copy on the central server. A distributed tool stores
234 all of its metadata locally. All else being equal, talking over
235 the network adds overhead to a centralised tool. Don't
236 underestimate the value of a snappy, responsive tool: you're
237 going to spend a lot of time interacting with your revision
238 control software.</para>
239
240 <para id="x_8f">Distributed tools are indifferent to the vagaries of your
241 server infrastructure, again because they replicate metadata to
242 so many locations. If you use a centralised system and your
243 server catches fire, you'd better hope that your backup media
244 are reliable, and that your last backup was recent and actually
245 worked. With a distributed tool, you have many backups
246 available on every contributor's computer.</para>
247
248 <para id="x_90">The reliability of your network will affect distributed
249 tools far less than it will centralised tools. You can't even
250 use a centralised tool without a network connection, except for
251 a few highly constrained commands. With a distributed tool, if
252 your network connection goes down while you're working, you may
253 not even notice. The only thing you won't be able to do is talk
254 to repositories on other computers, something that is relatively
255 rare compared with local operations. If you have a far-flung
256 team of collaborators, this may be significant.</para>
257
258 <sect2>
259 <title>Advantages for open source projects</title>
260
261 <para id="x_91">If you take a shine to an open source project and decide
262 that you would like to start hacking on it, and that project
263 uses a distributed revision control tool, you are at once a
264 peer with the people who consider themselves the
265 <quote>core</quote> of that project. If they publish their
266 repositories, you can immediately copy their project history,
267 start making changes, and record your work, using the same
268 tools in the same ways as insiders. By contrast, with a
269 centralised tool, you must use the software in a <quote>read
270 only</quote> mode unless someone grants you permission to
271 commit changes to their central server. Until then, you won't
272 be able to record changes, and your local modifications will
273 be at risk of corruption any time you try to update your
274 client's view of the repository.</para>
275
276 <sect3>
277 <title>The forking non-problem</title>
278
279 <para id="x_92">It has been suggested that distributed revision control
280 tools pose some sort of risk to open source projects because
281 they make it easy to <quote>fork</quote> the development of
282 a project. A fork happens when there are differences in
283 opinion or attitude between groups of developers that cause
284 them to decide that they can't work together any longer.
285 Each side takes a more or less complete copy of the
286 project's source code, and goes off in its own
287 direction.</para>
288
289 <para id="x_93">Sometimes the camps in a fork decide to reconcile their
290 differences. With a centralised revision control system, the
291 <emphasis>technical</emphasis> process of reconciliation is
292 painful, and has to be performed largely by hand. You have
293 to decide whose revision history is going to
294 <quote>win</quote>, and graft the other team's changes into
295 the tree somehow. This usually loses some or all of one
296 side's revision history.</para>
297
298 <para id="x_94">What distributed tools do with respect to forking is
299 they make forking the <emphasis>only</emphasis> way to
300 develop a project. Every single change that you make is
301 potentially a fork point. The great strength of this
302 approach is that a distributed revision control tool has to
303 be really good at <emphasis>merging</emphasis> forks,
304 because forks are absolutely fundamental: they happen all
305 the time.</para>
306
307 <para id="x_95">If every piece of work that everybody does, all the
308 time, is framed in terms of forking and merging, then what
309 the open source world refers to as a <quote>fork</quote>
310 becomes <emphasis>purely</emphasis> a social issue. If
311 anything, distributed tools <emphasis>lower</emphasis> the
312 likelihood of a fork:</para>
313 <itemizedlist>
314 <listitem><para id="x_96">They eliminate the social distinction that
315 centralised tools impose: that between insiders (people
316 with commit access) and outsiders (people
317 without).</para></listitem>
318 <listitem><para id="x_97">They make it easier to reconcile after a
319 social fork, because all that's involved from the
320 perspective of the revision control software is just
321 another merge.</para></listitem></itemizedlist>
322
323 <para id="x_98">Some people resist distributed tools because they want
324 to retain tight control over their projects, and they
325 believe that centralised tools give them this control.
326 However, if you're of this belief, and you publish your CVS
327 or Subversion repositories publicly, there are plenty of
328 tools available that can pull out your entire project's
329 history (albeit slowly) and recreate it somewhere that you
330 don't control. So while your control in this case is
331 illusory, you are forgoing the ability to fluidly
332 collaborate with whatever people feel compelled to mirror
333 and fork your history.</para>
334
335 </sect3>
336 </sect2>
337 <sect2>
338 <title>Advantages for commercial projects</title>
339
340 <para id="x_99">Many commercial projects are undertaken by teams that are
341 scattered across the globe. Contributors who are far from a
342 central server will see slower command execution and perhaps
343 less reliability. Commercial revision control systems attempt
344 to ameliorate these problems with remote-site replication
345 add-ons that are typically expensive to buy and cantankerous
346 to administer. A distributed system doesn't suffer from these
347 problems in the first place. Better yet, you can easily set
348 up multiple authoritative servers, say one per site, so that
349 there's no redundant communication between repositories over
350 expensive long-haul network links.</para>
351
352 <para id="x_9a">Centralised revision control systems tend to have
353 relatively low scalability. It's not unusual for an expensive
354 centralised system to fall over under the combined load of
355 just a few dozen concurrent users. Once again, the typical
356 response tends to be an expensive and clunky replication
357 facility. Since the load on a central server&emdash;if you have
358 one at all&emdash;is many times lower with a distributed tool
359 (because all of the data is replicated everywhere), a single
360 cheap server can handle the needs of a much larger team, and
361 replication to balance load becomes a simple matter of
362 scripting.</para>
363
364 <para id="x_9b">If you have an employee in the field, troubleshooting a
365 problem at a customer's site, they'll benefit from distributed
366 revision control. The tool will let them generate custom
367 builds, try different fixes in isolation from each other, and
368 search efficiently through history for the sources of bugs and
369 regressions in the customer's environment, all without needing
370 to connect to your company's network.</para>
371
372 </sect2>
373 </sect1>
374 <sect1>
375 <title>Why choose Mercurial?</title>
376
377 <para id="x_9c">Mercurial has a unique set of properties that make it a
378 particularly good choice as a revision control system.</para>
379 <itemizedlist>
380 <listitem><para id="x_9d">It is easy to learn and use.</para></listitem>
381 <listitem><para id="x_9e">It is lightweight.</para></listitem>
382 <listitem><para id="x_9f">It scales excellently.</para></listitem>
383 <listitem><para id="x_a0">It is easy to
384 customise.</para></listitem></itemizedlist>
385
386 <para id="x_a1">If you are at all familiar with revision control systems,
387 you should be able to get up and running with Mercurial in less
388 than five minutes. Even if not, it will take no more than a few
389 minutes longer. Mercurial's command and feature sets are
390 generally uniform and consistent, so you can keep track of a few
391 general rules instead of a host of exceptions.</para>
392
393 <para id="x_a2">On a small project, you can start working with Mercurial in
394 moments. Creating new changes and branches; transferring changes
395 around (whether locally or over a network); and history and
396 status operations are all fast. Mercurial attempts to stay
397 nimble and largely out of your way by combining low cognitive
398 overhead with blazingly fast operations.</para>
399
400 <para id="x_a3">The usefulness of Mercurial is not limited to small
401 projects: it is used by projects with hundreds to thousands of
402 contributors, each containing tens of thousands of files and
403 hundreds of megabytes of source code.</para>
404
405 <para id="x_a4">If the core functionality of Mercurial is not enough for
406 you, it's easy to build on. Mercurial is well suited to
407 scripting tasks, and its clean internals and implementation in
408 Python make it easy to add features in the form of extensions.
409 There are a number of popular and useful extensions already
410 available, ranging from helping to identify bugs to improving
411 performance.</para>
412
413 </sect1>
414 <sect1>
415 <title>Mercurial compared with other tools</title>
416
417 <para id="x_a5">Before you read on, please understand that this section
418 necessarily reflects my own experiences, interests, and (dare I
419 say it) biases. I have used every one of the revision control
420 tools listed below, in most cases for several years at a
421 time.</para>
422
423
424 <sect2>
425 <title>Subversion</title>
426
427 <para id="x_a6">Subversion is a popular revision control tool, developed
428 to replace CVS. It has a centralised client/server
429 architecture.</para>
430
431 <para id="x_a7">Subversion and Mercurial have similarly named commands for
432 performing the same operations, so if you're familiar with
433 one, it is easy to learn to use the other. Both tools are
434 portable to all popular operating systems.</para>
435
436 <para id="x_a8">Prior to version 1.5, Subversion had no useful support for
437 merges. At the time of writing, its merge tracking capability
438 is new, and known to be <ulink
439 url="http://svnbook.red-bean.com/nightly/en/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword">complicated
440 and buggy</ulink>.</para>
441
442 <para id="x_a9">Mercurial has a substantial performance advantage over
443 Subversion on every revision control operation I have
444 benchmarked. I have measured its advantage as ranging from a
445 factor of two to a factor of six when compared with Subversion
446 1.4.3's <emphasis>ra_local</emphasis> file store, which is the
447 fastest access method available. In more realistic
448 deployments involving a network-based store, Subversion will
449 be at a substantially larger disadvantage. Because many
450 Subversion commands must talk to the server and Subversion
451 does not have useful replication facilities, server capacity
452 and network bandwidth become bottlenecks for modestly large
453 projects.</para>
454
455 <para id="x_aa">Additionally, Subversion incurs substantial storage
456 overhead to avoid network transactions for a few common
457 operations, such as finding modified files
458 (<literal>status</literal>) and displaying modifications
459 against the current revision (<literal>diff</literal>). As a
460 result, a Subversion working copy is often the same size as,
461 or larger than, a Mercurial repository and working directory,
462 even though the Mercurial repository contains a complete
463 history of the project.</para>
464
465 <para id="x_ab">Subversion is widely supported by third party tools.
466 Mercurial currently lags considerably in this area. This gap
467 is closing, however, and indeed some of Mercurial's GUI tools
468 now outshine their Subversion equivalents. Like Mercurial,
469 Subversion has an excellent user manual.</para>
470
471 <para id="x_ac">Because Subversion doesn't store revision history on the
472 client, it is well suited to managing projects that deal with
473 lots of large, opaque binary files. If you check in fifty
474 revisions to an incompressible 10MB file, Subversion's
475 client-side space usage stays constant. The space used by any
476 distributed SCM will grow rapidly in proportion to the number
477 of revisions, because the differences between each revision
478 are large.</para>
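The space comparison above can be made concrete with a rough back-of-the-envelope model in Python. It assumes that a Subversion working copy keeps one pristine copy of each file next to the working file (the storage overhead mentioned above for status and diff). It also assumes that a distributed tool stores roughly one full-size delta per revision of an incompressible file. Both function names are hypothetical.

```python
MIB = 2 ** 20  # one mebibyte


def svn_client_bytes(file_size: int) -> int:
    """Working file plus one pristine copy; independent of the revision count."""
    return 2 * file_size


def dvcs_client_bytes(file_size: int, revisions: int) -> int:
    """Working file plus full history, assuming every revision of an
    incompressible file adds roughly one full-size delta."""
    return file_size + revisions * file_size


if __name__ == "__main__":
    size = 10 * MIB
    print("Subversion working copy:", svn_client_bytes(size) // MIB, "MiB")
    print("Distributed clone:", dvcs_client_bytes(size, 50) // MIB, "MiB")
```

Treating 10MB as 10 MiB, fifty revisions come to about 20 MiB for the Subversion working copy and about 510 MiB for the distributed clone. Real figures depend on each tool's storage format, so this is only an order-of-magnitude estimate.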
479
480 <para id="x_ad">In addition, it's often difficult or, more usually,
481 impossible to merge different versions of a binary file.
482 Subversion's ability to let a user lock a file, so that they
483 temporarily have the exclusive right to commit changes to it,
484 can be a significant advantage to a project where binary files
485 are widely used.</para>
486
487 <para id="x_ae">Mercurial can import revision history from a Subversion
488 repository. It can also export revision history to a
489 Subversion repository. This makes it easy to <quote>test the
490 waters</quote> and use Mercurial and Subversion in parallel
491 before deciding to switch. History conversion is incremental,
492 so you can perform an initial conversion, then small
493 additional conversions afterwards to bring in new
494 changes.</para>
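Incremental conversion works because the converter can remember which source revisions it has already handled. Here is a minimal sketch, assuming a persisted mapping from source revision ids to destination ids. The function `incremental_convert`, the revision ids, and the `convert_one` callback are hypothetical placeholders. This is not a description of how Mercurial's importer is actually structured.

```python
from typing import Callable, Dict, Iterable


def incremental_convert(
    source_revs: Iterable[str],
    converted: Dict[str, str],
    convert_one: Callable[[str], str],
) -> Dict[str, str]:
    """Convert, in source order, only the revisions not already recorded.

    `converted` maps a source revision id to the id it received in the
    destination; keeping it between runs is what makes later runs cheap.
    """
    for rev in source_revs:
        if rev not in converted:
            converted[rev] = convert_one(rev)
    return converted


if __name__ == "__main__":
    mapping: Dict[str, str] = {}
    # First run: the initial (potentially large) conversion.
    incremental_convert(["r1", "r2"], mapping, lambda r: "hg-" + r)
    # Later run: only the new upstream revision r3 is converted.
    incremental_convert(["r1", "r2", "r3"], mapping, lambda r: "hg-" + r)
    print(mapping)
```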
495
496
497 </sect2>
498 <sect2>
499 <title>Git</title>
500
501 <para id="x_af">Git is a distributed revision control tool that was
502 developed for managing the Linux kernel source tree. Like
503 Mercurial, its early design was somewhat influenced by
504 Monotone.</para>
505
506 <para id="x_b0">Git has a very large command set, with version 1.5.0
507 providing 139 individual commands. It has something of a
508 reputation for being difficult to learn. Compared to Git,
509 Mercurial has a strong focus on simplicity.</para>
510
511 <para id="x_b1">In terms of performance, Git is extremely fast. In
512 several cases, it is faster than Mercurial, at least on Linux,
513 while Mercurial performs better on other operations. However,
514 on Windows, the performance and general level of support that
515 Git provides is, at the time of writing, far behind that of
516 Mercurial.</para>
517
518 <para id="x_b2">While a Mercurial repository needs no maintenance, a Git
519 repository requires frequent manual <quote>repacks</quote> of
520 its metadata. Without these, performance degrades, while
521 space usage grows rapidly. A server that contains many Git
522 repositories that are not rigorously and frequently repacked
523 will become heavily disk-bound during backups, and there have
524 been instances of daily backups taking far longer than 24
525 hours as a result. A freshly packed Git repository is
526 slightly smaller than a Mercurial repository, but an unpacked
527 repository is several orders of magnitude larger.</para>
528
529 <para id="x_b3">The core of Git is written in C. Many Git commands are
530 implemented as shell or Perl scripts, and the quality of these
531 scripts varies widely. I have encountered several instances
532 where scripts charged along blindly in the presence of errors
533 that should have been fatal.</para>
534
535 <para id="x_b4">Mercurial can import revision history from a Git
536 repository.</para>
537
538
539 </sect2>
540 <sect2>
541 <title>CVS</title>
542
543 <para id="x_b5">CVS is probably the most widely used revision control tool
544 in the world. Due to its age and internal untidiness, it has
545 been only lightly maintained for many years.</para>
546
547 <para id="x_b6">It has a centralised client/server architecture. It does
548 not group related file changes into atomic commits, making it
549 easy for people to <quote>break the build</quote>: one person
550 can successfully commit part of a change and then be blocked
551 by the need for a merge, causing other people to see only a
552 portion of the work they intended to do. This also affects
553 how you work with project history. If you want to see all of
554 the modifications someone made as part of a task, you will
555 need to manually inspect the descriptions and timestamps of
556 the changes made to each file involved (if you even know what
557 those files were).</para>
558
559 <para id="x_b7">CVS has a muddled notion of tags and branches that I will
560 not attempt to even describe. It does not support renaming of
561 files or directories well, making it easy to corrupt a
562 repository. It has almost no internal consistency checking
563 capabilities, so it is usually not even possible to tell
564 whether or how a repository is corrupt. I would not recommend
565 CVS for any project, existing or new.</para>
566
567 <para id="x_b8">Mercurial can import CVS revision history. However, there
568 are a few caveats that apply; these are true of every other
569 revision control tool's CVS importer, too. Due to CVS's lack
570 of atomic changes and unversioned filesystem hierarchy, it is
571 not possible to reconstruct CVS history completely accurately;
572 some guesswork is involved, and renames will usually not show
573 up. Because a lot of advanced CVS administration has to be
574 done by hand and is hence error-prone, it's common for CVS
575 importers to run into multiple problems with corrupted
576 repositories (completely bogus revision timestamps and files
577 that have remained locked for over a decade are just two of
578 the less interesting problems I can recall from personal
579 experience).</para>
580
581 <para id="x_b9">Mercurial can import revision history from a CVS
582 repository.</para>
583
584
585 </sect2>
586 <sect2>
587 <title>Commercial tools</title>
588
589 <para id="x_ba">Perforce has a centralised client/server architecture,
590 with no client-side caching of any data. Unlike modern
591 revision control tools, Perforce requires that a user run a
592 command to inform the server about every file they intend to
593 edit.</para>
594
595 <para id="x_bb">The performance of Perforce is quite good for small teams,
596 but it falls off rapidly as the number of users grows beyond a
597 few dozen. Modestly large Perforce installations require the
598 deployment of proxies to cope with the load their users
599 generate.</para>
600
601
602 </sect2>
603 <sect2>
604 <title>Choosing a revision control tool</title>
605
606 <para id="x_bc">With the exception of CVS, all of the tools listed above
607 have unique strengths that suit them to particular styles of
608 work. There is no single revision control tool that is best
609 in all situations.</para>
610
611 <para id="x_bd">As an example, Subversion is a good choice for working
612 with frequently edited binary files, due to its centralised
613 nature and support for file locking.</para>
614
615 <para id="x_be">I personally find Mercurial's properties of simplicity,
616 performance, and good merge support to be a compelling
617 combination that has served me well for several years.</para>
618
619
620 </sect2>
621 </sect1>
622 <sect1>
623 <title>Switching from another tool to Mercurial</title>
624
625 <para id="x_bf">Mercurial is bundled with an extension named <literal
626 role="hg-ext">convert</literal>, which can incrementally
627 import revision history from several other revision control
628 tools. By <quote>incremental</quote>, I mean that you can
629 convert all of a project's history to date in one go, then rerun
630 the conversion later to obtain new changes that happened after
631 the initial conversion.</para>
632
633 <para id="x_c0">The revision control tools supported by <literal
634 role="hg-ext">convert</literal> are as follows:</para>
635 <itemizedlist>
636 <listitem><para id="x_c1">Subversion</para></listitem>
637 <listitem><para id="x_c2">CVS</para></listitem>
638 <listitem><para id="x_c3">Git</para></listitem>
639 <listitem><para id="x_c4">Darcs</para></listitem></itemizedlist>
640
641 <para id="x_c5">In addition, <literal role="hg-ext">convert</literal> can
642 export changes from Mercurial to Subversion. This makes it
643 possible to try Subversion and Mercurial in parallel before
644 committing to a switchover, without risking the loss of any
645 work.</para>
646
647 <para id="x_c6">The <command role="hg-ext-convert">convert</command> command
648 is easy to use. Simply point it at the path or URL of the
649 source repository, optionally give it the name of the
650 destination repository, and it will start working. After the
651 initial conversion, just run the same command again to import
652 new changes.</para>
653 </sect1> 105 </sect1>
654 106
655 <sect1> 107 <sect1>
656 <title>A short history of revision control</title> 108 <title>How to Contact Us</title>
657 109
658 <para id="x_c7">The best known of the old-time revision control tools is 110 <para>Please address comments and questions concerning this book
659 SCCS (Source Code Control System), which Marc Rochkind wrote at 111 to the publisher:</para>
660 Bell Labs, in the early 1970s. SCCS operated on individual
661 files, and required every person working on a project to have
662 access to a shared workspace on a single system. Only one
663 person could modify a file at any time; arbitration for access
664 to files was via locks. It was common for people to lock files,
665 and later forget to unlock them, preventing anyone else from
666 modifying those files without the help of an
667 administrator.</para>
668 112
669 <para id="x_c8">Walter Tichy developed a free alternative to SCCS in the 113 <simplelist type="vert">
670 early 1980s; he called his program RCS (Revision Control System). 114 <member>O’Reilly Media, Inc.</member>
671 Like SCCS, RCS required developers to work in a single shared
672 workspace, and to lock files to prevent multiple people from
673 modifying them simultaneously.</para>
674 115
675 <para id="x_c9">Later in the 1980s, Dick Grune used RCS as a building block 116 <member>1005 Gravenstein Highway North</member>
676 for a set of shell scripts he initially called cmt, but then
677 renamed to CVS (Concurrent Versions System). The big innovation
678 of CVS was that it let developers work simultaneously and
679 somewhat independently in their own personal workspaces. The
680 personal workspaces prevented developers from stepping on each
681 other's toes all the time, as was common with SCCS and RCS. Each
682 developer had a copy of every project file, and could modify
683 their copies independently. They had to merge their edits prior
684 to committing changes to the central repository.</para>
685 117
686 <para id="x_ca">Brian Berliner took Grune's original scripts and rewrote 118 <member>Sebastopol, CA 95472</member>
687 them in C, releasing in 1989 the code that has since developed
688 into the modern version of CVS. CVS subsequently acquired the
689 ability to operate over a network connection, giving it a
690 client/server architecture. CVS's architecture is centralised;
691 only the server has a copy of the history of the project. Client
692 workspaces just contain copies of recent versions of the
693 project's files, and a little metadata to tell them where the
694 server is. CVS has been enormously successful; it is probably
695 the world's most widely used revision control system.</para>
696 119
697 <para id="x_cb">In the early 1990s, Sun Microsystems developed an early 120 <member>800-998-9938 (in the United States or Canada)</member>
698 distributed revision control system, called TeamWare. A
699 TeamWare workspace contains a complete copy of the project's
700 history. TeamWare has no notion of a central repository. (CVS
701 relied upon RCS for its history storage; TeamWare used
702 SCCS.)</para>
703 121
704 <para id="x_cc">As the 1990s progressed, awareness grew of a number of 122 <member>707-829-0515 (international or local)</member>
705 problems with CVS. It records simultaneous changes to multiple
706 files individually, instead of grouping them together as a
707 single logically atomic operation. It does not manage its file
708 hierarchy well; it is easy to make a mess of a repository by
709 renaming files and directories. Worse, its source code is
710 difficult to read and maintain, which made the <quote>pain
711 level</quote> of fixing these architectural problems
712 prohibitive.</para>
713 123
714 <para id="x_cd">In 2001, Jim Blandy and Karl Fogel, two developers who had 124 <member>707 829-0104 (fax)</member>
715 worked on CVS, started a project to replace it with a tool that 125 </simplelist>
716 would have a better architecture and cleaner code. The result,
717 Subversion, does not stray from CVS's centralised client/server
718 model, but it adds multi-file atomic commits, better namespace
719 management, and a number of other features that make it a
720 generally better tool than CVS. Since its initial release, it
721 has rapidly grown in popularity.</para>
722 126
723 <para id="x_ce">More or less simultaneously, Graydon Hoare began working on 127 <para>We have a web page for this book, where we list errata,
724 an ambitious distributed revision control system that he named 128 examples, and any additional information. You can access this
725 Monotone. In addition to addressing many of CVS's design flaws 129 page at:</para>
726 and having a peer-to-peer architecture, it goes beyond earlier (and
727 subsequent) revision control tools in a number of innovative
728 ways. It uses cryptographic hashes as identifiers, and has an
729 integral notion of <quote>trust</quote> for code from different
730 sources.</para>
731 130
732 <para id="x_cf">Mercurial began life in 2005. While a few aspects of its 131 <simplelist type="vert">
733 design are influenced by Monotone, Mercurial focuses on ease of 132 <member><ulink url="http://www.oreilly.com/catalog/&lt;catalog
734 use, high performance, and scalability to very large 133 page&gt;"></ulink></member>
735 projects.</para> 134 </simplelist>
736 135
737 </sect1> 136 <remark>Don’t forget to update the &lt;url&gt; attribute,
137 too.</remark>
738 138
739 <sect1> 139 <para>To comment or ask technical questions about this book, send
740 <title>Colophon&emdash;this book is Free</title> 140 email to:</para>
741 141
742 <para id="x_d0">This book is licensed under the Open Publication License, 142 <simplelist type="vert">
743 and is produced entirely using Free Software tools. It is 143 <member><email>bookquestions@oreilly.com</email></member>
744 typeset with DocBook XML. Illustrations are drawn and rendered with 144 </simplelist>
745 <ulink url="http://www.inkscape.org/">Inkscape</ulink>.</para>
746 145
747 <para id="x_d1">The complete source code for this book is published as a 146 <para>For more information about our books, conferences, Resource
748 Mercurial repository, at <ulink 147 Centers, and the O’Reilly Network, see our web site at:</para>
749 url="http://hg.serpentine.com/mercurial/book">http://hg.serpentine.com/mercurial/book</ulink>.</para>
750 148
149 <simplelist type="vert">
150 <member><ulink url="http://www.oreilly.com"></ulink></member>
151 </simplelist>
751 </sect1> 152 </sect1>
752 </preface> 153 </preface>
154
753 <!-- 155 <!--
754 local variables: 156 local variables:
755 sgml-parent-document: ("00book.xml" "book" "preface") 157 sgml-parent-document: ("00book.xml" "book" "preface")
756 end: 158 end:
757 --> 159 -->