comparison en/ch03-concepts.xml @ 828:477d6a3e5023

Many final changes.
author Bryan O'Sullivan <bos@serpentine.com>
date Mon, 04 May 2009 23:52:38 -0700
parents 29f0f79cf614
children 18131160f7ee
827:d2aacc06e562 828:477d6a3e5023
110 </figure> 110 </figure>
111 111
112 <para id="x_2f3">As the illustration shows, there is 112 <para id="x_2f3">As the illustration shows, there is
113 <emphasis>not</emphasis> a <quote>one to one</quote> 113 <emphasis>not</emphasis> a <quote>one to one</quote>
114 relationship between revisions in the changelog, manifest, or 114 relationship between revisions in the changelog, manifest, or
115 filelog. If the manifest hasn't changed between two 115 filelog. If a file that
116 changesets, the changelog entries for those changesets will
117 point to the same revision of the manifest. If a file that
118 Mercurial tracks hasn't changed between two changesets, the 116 Mercurial tracks hasn't changed between two changesets, the
119 entry for that file in the two revisions of the manifest will 117 entry for that file in the two revisions of the manifest will
120 point to the same revision of its filelog.</para> 118 point to the same revision of its filelog<footnote>
119 <para>It is possible (though unusual) for the manifest to
120 remain the same between two changesets, in which case the
121 changelog entries for those changesets will point to the
122 same revision of the manifest.</para>
123 </footnote>.</para>
121 124
122 </sect2> 125 </sect2>
123 </sect1> 126 </sect1>
124 <sect1> 127 <sect1>
125 <title>Safe, efficient storage</title> 128 <title>Safe, efficient storage</title>
173 176
174 </sect2> 177 </sect2>
175 <sect2> 178 <sect2>
176 <title>Fast retrieval</title> 179 <title>Fast retrieval</title>
177 180
178 <para id="x_2fa">Mercurial cleverly avoids a pitfall common to all earlier 181 <para id="x_2fa">Mercurial cleverly avoids a pitfall common to
179 revision control systems: the problem of <emphasis>inefficient 182 all earlier revision control systems: the problem of
180 retrieval</emphasis>. Most revision control systems store 183 <emphasis>inefficient retrieval</emphasis>. Most revision
181 the contents of a revision as an incremental series of 184 control systems store the contents of a revision as an
182 modifications against a <quote>snapshot</quote>. To 185 incremental series of modifications against a
183 reconstruct a specific revision, you must first read the 186 <quote>snapshot</quote>. (Some base the snapshot on the
184 snapshot, and then every one of the revisions between the 187 oldest revision, others on the newest.) To reconstruct a
185 snapshot and your target revision. The more history that a 188 specific revision, you must first read the snapshot, and then
186 file accumulates, the more revisions you must read, hence the 189 every one of the revisions between the snapshot and your
187 longer it takes to reconstruct a particular revision.</para> 190 target revision. The more history that a file accumulates,
191 the more revisions you must read, hence the longer it takes to
192 reconstruct a particular revision.</para>
188 193
189 <figure id="fig:concepts:snapshot"> 194 <figure id="fig:concepts:snapshot">
190 <title>Snapshot of a revlog, with incremental deltas</title> 195 <title>Snapshot of a revlog, with incremental deltas</title>
191 <mediaobject> 196 <mediaobject>
192 <imageobject><imagedata fileref="figs/snapshot.png"/></imageobject> 197 <imageobject><imagedata fileref="figs/snapshot.png"/></imageobject>
209 read to reconstruct a particular revision.</para> 214 read to reconstruct a particular revision.</para>
210 215
211 <sect3> 216 <sect3>
212 <title>Aside: the influence of video compression</title> 217 <title>Aside: the influence of video compression</title>
213 218
214 <para id="x_2fe">If you're familiar with video compression or have ever 219 <para id="x_2fe">If you're familiar with video compression or
215 watched a TV feed through a digital cable or satellite 220 have ever watched a TV feed through a digital cable or
216 service, you may know that most video compression schemes 221 satellite service, you may know that most video compression
217 store each frame of video as a delta against its predecessor 222 schemes store each frame of video as a delta against its
218 frame. In addition, these schemes use <quote>lossy</quote> 223 predecessor frame.</para>
219 compression techniques to increase the compression ratio, so 224
220 visual errors accumulate over the course of a number of 225 <para id="x_2ff">Mercurial borrows this idea to make it
221 inter-frame deltas.</para> 226 possible to reconstruct a revision from a snapshot and a
222 227 small number of deltas.</para>
223 <para id="x_2ff">Because it's possible for a video stream to <quote>drop
224 out</quote> occasionally due to signal glitches, and to
225 limit the accumulation of artefacts introduced by the lossy
226 compression process, video encoders periodically insert a
227 complete frame (called a <quote>key frame</quote>) into the
228 video stream; the next delta is generated against that
229 frame. This means that if the video signal gets
230 interrupted, it will resume once the next key frame is
231 received. Also, the accumulation of encoding errors
232 restarts anew with each key frame.</para>
233 228
234 </sect3> 229 </sect3>
235 </sect2> 230 </sect2>
236 <sect2> 231 <sect2>
237 <title>Identification and strong integrity</title> 232 <title>Identification and strong integrity</title>
259 corrupted due to a hardware error or system bug, it's often 254 corrupted due to a hardware error or system bug, it's often
260 possible to reconstruct some or most revisions from the 255 possible to reconstruct some or most revisions from the
261 uncorrupted sections of the revlog, both before and after the 256 uncorrupted sections of the revlog, both before and after the
262 corrupted section. This would not be possible with a 257 corrupted section. This would not be possible with a
263 delta-only storage model.</para> 258 delta-only storage model.</para>
264
265 </sect2> 259 </sect2>
266 </sect1> 260 </sect1>
261
267 <sect1> 262 <sect1>
268 <title>Revision history, branching, and merging</title> 263 <title>Revision history, branching, and merging</title>
269 264
270 <para id="x_304">Every entry in a Mercurial revlog knows the identity of its 265 <para id="x_304">Every entry in a Mercurial revlog knows the identity of its
271 immediate ancestor revision, usually referred to as its 266 immediate ancestor revision, usually referred to as its
312 at the time that changeset was committed, and which revision of 307 at the time that changeset was committed, and which revision of
313 each file was then current. It then recreates a copy of each of 308 each file was then current. It then recreates a copy of each of
314 those files, with the same contents it had when the changeset 309 those files, with the same contents it had when the changeset
315 was committed.</para> 310 was committed.</para>
316 311
317 <para id="x_309">The <emphasis>dirstate</emphasis> contains Mercurial's 312 <para id="x_309">The <emphasis>dirstate</emphasis> is a special
318 knowledge of the working directory. This details which 313 structure that contains Mercurial's knowledge of the working
319 changeset the working directory is updated to, and all of the 314 directory. It is maintained as a file named
320 files that Mercurial is tracking in the working 315 <filename>.hg/dirstate</filename> inside a repository. The
321 directory.</para> 316 dirstate details which changeset the working directory is
317 updated to, and all of the files that Mercurial is tracking in
318 the working directory. It also lets Mercurial quickly notice
319 changed files, by recording their checkout times and
320 sizes.</para>
322 321
323 <para id="x_30a">Just as a revision of a revlog has room for two parents, so 322 <para id="x_30a">Just as a revision of a revlog has room for two parents, so
324 that it can represent either a normal revision (with one parent) 323 that it can represent either a normal revision (with one parent)
325 or a merge of two earlier revisions, the dirstate has slots for 324 or a merge of two earlier revisions, the dirstate has slots for
326 two parents. When you use the <command role="hg-cmd">hg 325 two parents. When you use the <command role="hg-cmd">hg
424 <textobject><phrase>XXX add text</phrase></textobject> 423 <textobject><phrase>XXX add text</phrase></textobject>
425 </mediaobject> 424 </mediaobject>
426 </figure> 425 </figure>
427 426
428 <note> 427 <note>
429 <para id="x_315"> If you're new to Mercurial, you should keep in mind a 428 <para id="x_315">If you're new to Mercurial, you should keep
430 common <quote>error</quote>, which is to use the <command 429 in mind a common <quote>error</quote>, which is to use the
431 role="hg-cmd">hg pull</command> command without any 430 <command role="hg-cmd">hg pull</command> command without any
432 options. By default, the <command role="hg-cmd">hg 431 options. By default, the <command role="hg-cmd">hg
433 pull</command> command <emphasis>does not</emphasis> 432 pull</command> command <emphasis>does not</emphasis>
434 update the working directory, so you'll bring new changesets 433 update the working directory, so you'll bring new changesets
435 into your repository, but the working directory will stay 434 into your repository, but the working directory will stay
436 synced at the same changeset as before the pull. If you 435 synced at the same changeset as before the pull. If you
437 make some changes and commit afterwards, you'll thus create 436 make some changes and commit afterwards, you'll thus create
438 a new head, because your working directory isn't synced to 437 a new head, because your working directory isn't synced to
439 whatever the current tip is.</para> 438 whatever the current tip is. To combine the operation of a
440 439 pull, followed by an update, run <command>hg pull
441 <para id="x_316"> I put the word <quote>error</quote> in 440 -u</command>.</para>
442 quotes because all that you need to do to rectify this 441
443 situation is <command role="hg-cmd">hg merge</command>, then 442 <para id="x_316">I put the word <quote>error</quote> in quotes
444 <command role="hg-cmd">hg commit</command>. In other words, 443 because all that you need to do to rectify the situation
445 this almost never has negative consequences; it's just 444 where you created a new head by accident is
446 something of a surprise for newcomers. I'll discuss other 445 <command role="hg-cmd">hg merge</command>, then <command
447 ways to avoid this behavior, and why Mercurial behaves in 446 role="hg-cmd">hg commit</command>. In other words, this
448 this initially surprising way, later on.</para> 447 almost never has negative consequences; it's just something
448 of a surprise for newcomers. I'll discuss other ways to
449 avoid this behavior, and why Mercurial behaves in this
450 initially surprising way, later on.</para>
449 </note> 451 </note>
450 452
451 </sect2> 453 </sect2>
452 <sect2> 454 <sect2>
453 <title>Merging changes</title> 455 <title>Merging changes</title>
509 changeset I'm about to commit</quote>. After the <command 511 changeset I'm about to commit</quote>. After the <command
510 role="hg-cmd">hg merge</command> command completes, the 512 role="hg-cmd">hg merge</command> command completes, the
511 working directory has two parents; these will become the 513 working directory has two parents; these will become the
512 parents of the new changeset.</para> 514 parents of the new changeset.</para>
513 515
514 <para id="x_322">Mercurial lets you perform multiple merges, but you must 516 <para id="x_322">Mercurial lets you perform multiple merges, but
515 commit the results of each individual merge as you go. This 517 you must commit the results of each individual merge as you
516 is necessary because Mercurial only tracks two parents for 518 go. This is necessary because Mercurial only tracks two
517 both revisions and the working directory. While it would be 519 parents for both revisions and the working directory. While
518 technically possible to merge multiple changesets at once, the 520 it would be technically feasible to merge multiple changesets
519 prospect of user confusion and making a terrible mess of a 521 at once, Mercurial avoids this for simplicity. With multi-way
520 merge immediately becomes overwhelming.</para> 522 merges, the risks of user confusion, nasty conflict
523 resolution, and making a terrible mess of a merge would grow
524 intolerable.</para>
521 525
522 </sect2> 526 </sect2>
523 527
524 <sect2> 528 <sect2>
525 <title>Merging and renames</title> 529 <title>Merging and renames</title>
596 compression of the entire stream (instead of a revision at a 600 compression of the entire stream (instead of a revision at a
597 time) substantially reduces the number of bytes to be 601 time) substantially reduces the number of bytes to be
598 transferred, yielding better network performance over most 602 transferred, yielding better network performance over most
599 kinds of network.</para> 603 kinds of network.</para>
600 604
601 <para id="x_329">(If the connection is over <command>ssh</command>, 605 <para id="x_329">If the connection is over
602 Mercurial <emphasis>doesn't</emphasis> recompress the 606 <command>ssh</command>, Mercurial
603 stream, because <command>ssh</command> can already do this 607 <emphasis>doesn't</emphasis> recompress the stream, because
604 itself.)</para> 608 <command>ssh</command> can already do this itself. You can
609 tell Mercurial to always use <command>ssh</command>'s
610 compression feature by editing the
611 <filename>.hgrc</filename> file in your home directory as
612 follows.</para>
613
614 <programlisting>[ui]
615 ssh = ssh -C</programlisting>
605 616
606 </sect3> 617 </sect3>
607 </sect2> 618 </sect2>
608 <sect2> 619 <sect2>
609 <title>Read/write ordering and atomicity</title> 620 <title>Read/write ordering and atomicity</title>
610 621
611 <para id="x_32a">Appending to files isn't the whole story when 622 <para id="x_32a">Appending to files isn't the whole story when
612 it comes to guaranteeing that a reader won't see a partial 623 it comes to guaranteeing that a reader won't see a partial
613 write. If you recall <xref linkend="fig:concepts:metadata"/>, 624 write. If you recall <xref linkend="fig:concepts:metadata"/>,
614 revisions in 625 revisions in the changelog point to revisions in the manifest,
615 the changelog point to revisions in the manifest, and 626 and revisions in the manifest point to revisions in filelogs.
616 revisions in the manifest point to revisions in filelogs.
617 This hierarchy is deliberate.</para> 627 This hierarchy is deliberate.</para>
618 628
619 <para id="x_32b">A writer starts a transaction by writing filelog and 629 <para id="x_32b">A writer starts a transaction by writing filelog and
620 manifest data, and doesn't write any changelog data until 630 manifest data, and doesn't write any changelog data until
621 those are finished. A reader starts by reading changelog 631 those are finished. A reader starts by reading changelog
635 Mercurial never needs to <emphasis>lock</emphasis> a 645 Mercurial never needs to <emphasis>lock</emphasis> a
636 repository when it's reading data, even if the repository is 646 repository when it's reading data, even if the repository is
637 being written to while the read is occurring. This has a big 647 being written to while the read is occurring. This has a big
638 effect on scalability; you can have an arbitrary number of 648 effect on scalability; you can have an arbitrary number of
639 Mercurial processes safely reading data from a repository 649 Mercurial processes safely reading data from a repository
640 safely all at once, no matter whether it's being written to or 650 all at once, no matter whether it's being written to or
641 not.</para> 651 not.</para>
642 652
643 <para id="x_32e">The lockless nature of reading means that if you're 653 <para id="x_32e">The lockless nature of reading means that if you're
644 sharing a repository on a multi-user system, you don't need to 654 sharing a repository on a multi-user system, you don't need to
645 grant other local users permission to 655 grant other local users permission to
707 this idea of making a complete private copy of a file is not 717 this idea of making a complete private copy of a file is not
708 very efficient in its use of storage. While this is true, 718 very efficient in its use of storage. While this is true,
709 storage is cheap, and this method gives the highest 719 storage is cheap, and this method gives the highest
710 performance while deferring most book-keeping to the operating 720 performance while deferring most book-keeping to the operating
711 system. An alternative scheme would most likely reduce 721 system. An alternative scheme would most likely reduce
712 performance and increase the complexity of the software, each 722 performance and increase the complexity of the software, but
713 of which is much more important to the <quote>feel</quote> of 723 speed and simplicity are key to the <quote>feel</quote> of
714 day-to-day use.</para> 724 day-to-day use.</para>
715 725
716 </sect2> 726 </sect2>
717 <sect2> 727 <sect2>
718 <title>Other contents of the dirstate</title> 728 <title>Other contents of the dirstate</title>
729 <command role="hg-cmd">hg rename</command> or <command 739 <command role="hg-cmd">hg rename</command> or <command
730 role="hg-cmd">hg copy</command> files, Mercurial updates the 740 role="hg-cmd">hg copy</command> files, Mercurial updates the
731 dirstate so that it knows what to do with those files when you 741 dirstate so that it knows what to do with those files when you
732 commit.</para> 742 commit.</para>
733 743
734 <para id="x_337">When Mercurial is checking the states of files in the 744 <para id="x_337">The dirstate helps Mercurial to efficiently
735 working directory, it first checks a file's modification time. 745 check the status of files in a repository.</para>
736 If that has not changed, the file must not have been modified. 746
737 If the file's size has changed, the file must have been 747 <itemizedlist>
738 modified. If the modification time has changed, but the size 748 <listitem>
739 has not, only then does Mercurial need to read the actual 749 <para>When Mercurial checks the state of a file in the
740 contents of the file to see if they've changed. Storing these 750 working directory, it first checks a file's modification
741 few extra pieces of information dramatically reduces the 751 time against the time in the dirstate that records when
742 amount of data that Mercurial needs to read, which yields 752 Mercurial last wrote the file. If the last modified time
743 large performance improvements compared to other revision 753 is the same as the time when Mercurial wrote the file, the
744 control systems.</para> 754 file must not have been modified, so Mercurial does not
745 755 need to check any further.</para>
756 </listitem>
757 <listitem>
758 <para>If the file's size has changed, the file must have
759 been modified. If the modification time has changed, but
760 the size has not, only then does Mercurial need to
761 actually read the contents of the file to see if it has
762 changed.</para>
763 </listitem>
764 </itemizedlist>
765
766 <para>Storing the modification time and size dramatically
767 reduces the number of read operations that Mercurial needs to
768 perform when we run commands like <command>hg status</command>.
769 This results in large performance improvements.</para>
746 </sect2> 770 </sect2>
747 </sect1> 771 </sect1>
748 </chapter> 772 </chapter>
749 773
750 <!-- 774 <!--