comparison en/ch03-concepts.xml @ 828:477d6a3e5023
Many final changes.
author | Bryan O'Sullivan <bos@serpentine.com> |
---|---|
date | Mon, 04 May 2009 23:52:38 -0700 |
parents | 29f0f79cf614 |
children | 18131160f7ee |
827:d2aacc06e562 | 828:477d6a3e5023 |
---|---|
110 </figure> | 110 </figure> |
111 | 111 |
112 <para id="x_2f3">As the illustration shows, there is | 112 <para id="x_2f3">As the illustration shows, there is |
113 <emphasis>not</emphasis> a <quote>one to one</quote> | 113 <emphasis>not</emphasis> a <quote>one to one</quote> |
114 relationship between revisions in the changelog, manifest, or | 114 relationship between revisions in the changelog, manifest, or |
115 filelog. If the manifest hasn't changed between two | 115 filelog. If a file that |
116 changesets, the changelog entries for those changesets will | |
117 point to the same revision of the manifest. If a file that | |
118 Mercurial tracks hasn't changed between two changesets, the | 116 Mercurial tracks hasn't changed between two changesets, the |
119 entry for that file in the two revisions of the manifest will | 117 entry for that file in the two revisions of the manifest will |
120 point to the same revision of its filelog.</para> | 118 point to the same revision of its filelog<footnote> |
119 <para>It is possible (though unusual) for the manifest to | |
120 remain the same between two changesets, in which case the | |
121 changelog entries for those changesets will point to the | |
122 same revision of the manifest.</para> | |
123 </footnote>.</para> | |
121 | 124 |
122 </sect2> | 125 </sect2> |
123 </sect1> | 126 </sect1> |
124 <sect1> | 127 <sect1> |
125 <title>Safe, efficient storage</title> | 128 <title>Safe, efficient storage</title> |
173 | 176 |
174 </sect2> | 177 </sect2> |
175 <sect2> | 178 <sect2> |
176 <title>Fast retrieval</title> | 179 <title>Fast retrieval</title> |
177 | 180 |
178 <para id="x_2fa">Mercurial cleverly avoids a pitfall common to all earlier | 181 <para id="x_2fa">Mercurial cleverly avoids a pitfall common to |
179 revision control systems: the problem of <emphasis>inefficient | 182 all earlier revision control systems: the problem of |
180 retrieval</emphasis>. Most revision control systems store | 183 <emphasis>inefficient retrieval</emphasis>. Most revision |
181 the contents of a revision as an incremental series of | 184 control systems store the contents of a revision as an |
182 modifications against a <quote>snapshot</quote>. To | 185 incremental series of modifications against a |
183 reconstruct a specific revision, you must first read the | 186 <quote>snapshot</quote>. (Some base the snapshot on the |
184 snapshot, and then every one of the revisions between the | 187 oldest revision, others on the newest.) To reconstruct a |
185 snapshot and your target revision. The more history that a | 188 specific revision, you must first read the snapshot, and then |
186 file accumulates, the more revisions you must read, hence the | 189 every one of the revisions between the snapshot and your |
187 longer it takes to reconstruct a particular revision.</para> | 190 target revision. The more history that a file accumulates, |
191 the more revisions you must read, hence the longer it takes to | |
192 reconstruct a particular revision.</para> | |
188 | 193 |
189 <figure id="fig:concepts:snapshot"> | 194 <figure id="fig:concepts:snapshot"> |
190 <title>Snapshot of a revlog, with incremental deltas</title> | 195 <title>Snapshot of a revlog, with incremental deltas</title> |
191 <mediaobject> | 196 <mediaobject> |
192 <imageobject><imagedata fileref="figs/snapshot.png"/></imageobject> | 197 <imageobject><imagedata fileref="figs/snapshot.png"/></imageobject> |
209 read to reconstruct a particular revision.</para> | 214 read to reconstruct a particular revision.</para> |
210 | 215 |
211 <sect3> | 216 <sect3> |
212 <title>Aside: the influence of video compression</title> | 217 <title>Aside: the influence of video compression</title> |
213 | 218 |
214 <para id="x_2fe">If you're familiar with video compression or have ever | 219 <para id="x_2fe">If you're familiar with video compression or |
215 watched a TV feed through a digital cable or satellite | 220 have ever watched a TV feed through a digital cable or |
216 service, you may know that most video compression schemes | 221 satellite service, you may know that most video compression |
217 store each frame of video as a delta against its predecessor | 222 schemes store each frame of video as a delta against its |
218 frame. In addition, these schemes use <quote>lossy</quote> | 223 predecessor frame.</para> |
219 compression techniques to increase the compression ratio, so | 224 |
220 visual errors accumulate over the course of a number of | 225 <para id="x_2ff">Mercurial borrows this idea to make it |
221 inter-frame deltas.</para> | 226 possible to reconstruct a revision from a snapshot and a |
222 | 227 small number of deltas.</para> |
223 <para id="x_2ff">Because it's possible for a video stream to <quote>drop | |
224 out</quote> occasionally due to signal glitches, and to | |
225 limit the accumulation of artefacts introduced by the lossy | |
226 compression process, video encoders periodically insert a | |
227 complete frame (called a <quote>key frame</quote>) into the | |
228 video stream; the next delta is generated against that | |
229 frame. This means that if the video signal gets | |
230 interrupted, it will resume once the next key frame is | |
231 received. Also, the accumulation of encoding errors | |
232 restarts anew with each key frame.</para> | |
233 | 228 |
234 </sect3> | 229 </sect3> |
235 </sect2> | 230 </sect2> |
236 <sect2> | 231 <sect2> |
237 <title>Identification and strong integrity</title> | 232 <title>Identification and strong integrity</title> |
259 corrupted due to a hardware error or system bug, it's often | 254 corrupted due to a hardware error or system bug, it's often |
260 possible to reconstruct some or most revisions from the | 255 possible to reconstruct some or most revisions from the |
261 uncorrupted sections of the revlog, both before and after the | 256 uncorrupted sections of the revlog, both before and after the |
262 corrupted section. This would not be possible with a | 257 corrupted section. This would not be possible with a |
263 delta-only storage model.</para> | 258 delta-only storage model.</para> |
264 | |
265 </sect2> | 259 </sect2> |
266 </sect1> | 260 </sect1> |
261 | |
267 <sect1> | 262 <sect1> |
268 <title>Revision history, branching, and merging</title> | 263 <title>Revision history, branching, and merging</title> |
269 | 264 |
270 <para id="x_304">Every entry in a Mercurial revlog knows the identity of its | 265 <para id="x_304">Every entry in a Mercurial revlog knows the identity of its |
271 immediate ancestor revision, usually referred to as its | 266 immediate ancestor revision, usually referred to as its |
312 at the time that changeset was committed, and which revision of | 307 at the time that changeset was committed, and which revision of |
313 each file was then current. It then recreates a copy of each of | 308 each file was then current. It then recreates a copy of each of |
314 those files, with the same contents it had when the changeset | 309 those files, with the same contents it had when the changeset |
315 was committed.</para> | 310 was committed.</para> |
316 | 311 |
317 <para id="x_309">The <emphasis>dirstate</emphasis> contains Mercurial's | 312 <para id="x_309">The <emphasis>dirstate</emphasis> is a special |
318 knowledge of the working directory. This details which | 313 structure that contains Mercurial's knowledge of the working |
319 changeset the working directory is updated to, and all of the | 314 directory. It is maintained as a file named |
320 files that Mercurial is tracking in the working | 315 <filename>.hg/dirstate</filename> inside a repository. The |
321 directory.</para> | 316 dirstate details which changeset the working directory is |
317 updated to, and all of the files that Mercurial is tracking in | |
318 the working directory. It also lets Mercurial quickly notice | |
319 changed files, by recording their checkout times and | |
320 sizes.</para> | |
322 | 321 |
323 <para id="x_30a">Just as a revision of a revlog has room for two parents, so | 322 <para id="x_30a">Just as a revision of a revlog has room for two parents, so |
324 that it can represent either a normal revision (with one parent) | 323 that it can represent either a normal revision (with one parent) |
325 or a merge of two earlier revisions, the dirstate has slots for | 324 or a merge of two earlier revisions, the dirstate has slots for |
326 two parents. When you use the <command role="hg-cmd">hg | 325 two parents. When you use the <command role="hg-cmd">hg |
424 <textobject><phrase>XXX add text</phrase></textobject> | 423 <textobject><phrase>XXX add text</phrase></textobject> |
425 </mediaobject> | 424 </mediaobject> |
426 </figure> | 425 </figure> |
427 | 426 |
428 <note> | 427 <note> |
429 <para id="x_315"> If you're new to Mercurial, you should keep in mind a | 428 <para id="x_315">If you're new to Mercurial, you should keep |
430 common <quote>error</quote>, which is to use the <command | 429 in mind a common <quote>error</quote>, which is to use the |
431 role="hg-cmd">hg pull</command> command without any | 430 <command role="hg-cmd">hg pull</command> command without any |
432 options. By default, the <command role="hg-cmd">hg | 431 options. By default, the <command role="hg-cmd">hg |
433 pull</command> command <emphasis>does not</emphasis> | 432 pull</command> command <emphasis>does not</emphasis> |
434 update the working directory, so you'll bring new changesets | 433 update the working directory, so you'll bring new changesets |
435 into your repository, but the working directory will stay | 434 into your repository, but the working directory will stay |
436 synced at the same changeset as before the pull. If you | 435 synced at the same changeset as before the pull. If you |
437 make some changes and commit afterwards, you'll thus create | 436 make some changes and commit afterwards, you'll thus create |
438 a new head, because your working directory isn't synced to | 437 a new head, because your working directory isn't synced to |
439 whatever the current tip is.</para> | 438 whatever the current tip is. To combine the operation of a |
440 | 439 pull, followed by an update, run <command>hg pull |
441 <para id="x_316"> I put the word <quote>error</quote> in | 440 -u</command>.</para> |
442 quotes because all that you need to do to rectify this | 441 |
443 situation is <command role="hg-cmd">hg merge</command>, then | 442 <para id="x_316">I put the word <quote>error</quote> in quotes |
444 <command role="hg-cmd">hg commit</command>. In other words, | 443 because all that you need to do to rectify the situation |
445 this almost never has negative consequences; it's just | 444 where you created a new head by accident is |
446 something of a surprise for newcomers. I'll discuss other | 445 <command role="hg-cmd">hg merge</command>, then <command |
447 ways to avoid this behavior, and why Mercurial behaves in | 446 role="hg-cmd">hg commit</command>. In other words, this |
448 this initially surprising way, later on.</para> | 447 almost never has negative consequences; it's just something |
448 of a surprise for newcomers. I'll discuss other ways to | |
449 avoid this behavior, and why Mercurial behaves in this | |
450 initially surprising way, later on.</para> | |
449 </note> | 451 </note> |
450 | 452 |
451 </sect2> | 453 </sect2> |
452 <sect2> | 454 <sect2> |
453 <title>Merging changes</title> | 455 <title>Merging changes</title> |
509 changeset I'm about to commit</quote>. After the <command | 511 changeset I'm about to commit</quote>. After the <command |
510 role="hg-cmd">hg merge</command> command completes, the | 512 role="hg-cmd">hg merge</command> command completes, the |
511 working directory has two parents; these will become the | 513 working directory has two parents; these will become the |
512 parents of the new changeset.</para> | 514 parents of the new changeset.</para> |
513 | 515 |
514 <para id="x_322">Mercurial lets you perform multiple merges, but you must | 516 <para id="x_322">Mercurial lets you perform multiple merges, but |
515 commit the results of each individual merge as you go. This | 517 you must commit the results of each individual merge as you |
516 is necessary because Mercurial only tracks two parents for | 518 go. This is necessary because Mercurial only tracks two |
517 both revisions and the working directory. While it would be | 519 parents for both revisions and the working directory. While |
518 technically possible to merge multiple changesets at once, the | 520 it would be technically feasible to merge multiple changesets |
519 prospect of user confusion and making a terrible mess of a | 521 at once, Mercurial avoids this for simplicity. With multi-way |
520 merge immediately becomes overwhelming.</para> | 522 merges, the risks of user confusion, nasty conflict |
523 resolution, and making a terrible mess of a merge would grow | |
524 intolerable.</para> | |
521 | 525 |
522 </sect2> | 526 </sect2> |
523 | 527 |
524 <sect2> | 528 <sect2> |
525 <title>Merging and renames</title> | 529 <title>Merging and renames</title> |
596 compression of the entire stream (instead of a revision at a | 600 compression of the entire stream (instead of a revision at a |
597 time) substantially reduces the number of bytes to be | 601 time) substantially reduces the number of bytes to be |
598 transferred, yielding better network performance over most | 602 transferred, yielding better network performance over most |
599 kinds of network.</para> | 603 kinds of network.</para> |
600 | 604 |
601 <para id="x_329">(If the connection is over <command>ssh</command>, | 605 <para id="x_329">If the connection is over |
602 Mercurial <emphasis>doesn't</emphasis> recompress the | 606 <command>ssh</command>, Mercurial |
603 stream, because <command>ssh</command> can already do this | 607 <emphasis>doesn't</emphasis> recompress the stream, because |
604 itself.)</para> | 608 <command>ssh</command> can already do this itself. You can |
609 tell Mercurial to always use <command>ssh</command>'s | |
610 compression feature by editing the | |
611 <filename>.hgrc</filename> file in your home directory as | |
612 follows.</para> | |
613 | |
614 <programlisting>[ui] | |
615 ssh = ssh -C</programlisting> | |
605 | 616 |
606 </sect3> | 617 </sect3> |
607 </sect2> | 618 </sect2> |
608 <sect2> | 619 <sect2> |
609 <title>Read/write ordering and atomicity</title> | 620 <title>Read/write ordering and atomicity</title> |
610 | 621 |
611 <para id="x_32a">Appending to files isn't the whole story when | 622 <para id="x_32a">Appending to files isn't the whole story when |
612 it comes to guaranteeing that a reader won't see a partial | 623 it comes to guaranteeing that a reader won't see a partial |
613 write. If you recall <xref linkend="fig:concepts:metadata"/>, | 624 write. If you recall <xref linkend="fig:concepts:metadata"/>, |
614 revisions in | 625 revisions in the changelog point to revisions in the manifest, |
615 the changelog point to revisions in the manifest, and | 626 and revisions in the manifest point to revisions in filelogs. |
616 revisions in the manifest point to revisions in filelogs. | |
617 This hierarchy is deliberate.</para> | 627 This hierarchy is deliberate.</para> |
618 | 628 |
619 <para id="x_32b">A writer starts a transaction by writing filelog and | 629 <para id="x_32b">A writer starts a transaction by writing filelog and |
620 manifest data, and doesn't write any changelog data until | 630 manifest data, and doesn't write any changelog data until |
621 those are finished. A reader starts by reading changelog | 631 those are finished. A reader starts by reading changelog |
635 Mercurial never needs to <emphasis>lock</emphasis> a | 645 Mercurial never needs to <emphasis>lock</emphasis> a |
636 repository when it's reading data, even if the repository is | 646 repository when it's reading data, even if the repository is |
637 being written to while the read is occurring. This has a big | 647 being written to while the read is occurring. This has a big |
638 effect on scalability; you can have an arbitrary number of | 648 effect on scalability; you can have an arbitrary number of |
639 Mercurial processes safely reading data from a repository | 649 Mercurial processes safely reading data from a repository |
640 safely all at once, no matter whether it's being written to or | 650 all at once, no matter whether it's being written to or |
641 not.</para> | 651 not.</para> |
642 | 652 |
643 <para id="x_32e">The lockless nature of reading means that if you're | 653 <para id="x_32e">The lockless nature of reading means that if you're |
644 sharing a repository on a multi-user system, you don't need to | 654 sharing a repository on a multi-user system, you don't need to |
645 grant other local users permission to | 655 grant other local users permission to |
707 this idea of making a complete private copy of a file is not | 717 this idea of making a complete private copy of a file is not |
708 very efficient in its use of storage. While this is true, | 718 very efficient in its use of storage. While this is true, |
709 storage is cheap, and this method gives the highest | 719 storage is cheap, and this method gives the highest |
710 performance while deferring most book-keeping to the operating | 720 performance while deferring most book-keeping to the operating |
711 system. An alternative scheme would most likely reduce | 721 system. An alternative scheme would most likely reduce |
712 performance and increase the complexity of the software, each | 722 performance and increase the complexity of the software, but |
713 of which is much more important to the <quote>feel</quote> of | 723 speed and simplicity are key to the <quote>feel</quote> of |
714 day-to-day use.</para> | 724 day-to-day use.</para> |
715 | 725 |
716 </sect2> | 726 </sect2> |
717 <sect2> | 727 <sect2> |
718 <title>Other contents of the dirstate</title> | 728 <title>Other contents of the dirstate</title> |
729 <command role="hg-cmd">hg rename</command> or <command | 739 <command role="hg-cmd">hg rename</command> or <command |
730 role="hg-cmd">hg copy</command> files, Mercurial updates the | 740 role="hg-cmd">hg copy</command> files, Mercurial updates the |
731 dirstate so that it knows what to do with those files when you | 741 dirstate so that it knows what to do with those files when you |
732 commit.</para> | 742 commit.</para> |
733 | 743 |
734 <para id="x_337">When Mercurial is checking the states of files in the | 744 <para id="x_337">The dirstate helps Mercurial to efficiently |
735 working directory, it first checks a file's modification time. | 745 check the status of files in a repository.</para> |
736 If that has not changed, the file must not have been modified. | 746 |
737 If the file's size has changed, the file must have been | 747 <itemizedlist> |
738 modified. If the modification time has changed, but the size | 748 <listitem> |
739 has not, only then does Mercurial need to read the actual | 749 <para>When Mercurial checks the state of a file in the |
740 contents of the file to see if they've changed. Storing these | 750 working directory, it first checks a file's modification |
741 few extra pieces of information dramatically reduces the | 751 time against the time in the dirstate that records when |
742 amount of data that Mercurial needs to read, which yields | 752 Mercurial last wrote the file. If the last modified time |
743 large performance improvements compared to other revision | 753 is the same as the time when Mercurial wrote the file, the |
744 control systems.</para> | 754 file must not have been modified, so Mercurial does not |
745 | 755 need to check any further.</para> |
756 </listitem> | |
757 <listitem> | |
758 <para>If the file's size has changed, the file must have | |
759 been modified. If the modification time has changed, but | |
760 the size has not, only then does Mercurial need to | |
761 actually read the contents of the file to see if it has | |
762 changed.</para> | |
763 </listitem> | |
764 </itemizedlist> | |
765 | |
766 <para>Storing the modification time and size dramatically | |
767 reduces the number of read operations that Mercurial needs to | |
768 perform when we run commands like <command>hg status</command>. | |
769 This results in large performance improvements.</para> | |
746 </sect2> | 770 </sect2> |
747 </sect1> | 771 </sect1> |
748 </chapter> | 772 </chapter> |
749 | 773 |
750 <!-- | 774 <!-- |
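
The revised dirstate paragraphs in this changeset describe a cheap-first status check. Mercurial first compares a file's modification time with the time recorded when it last wrote the file. If the time differs, it compares the size. It reads the contents only as a last resort. The Python sketch below illustrates that ordering. `DirstateEntry`, `file_status`, and the use of a SHA-1 digest of the last-written contents as the final comparison are all hypothetical choices made for illustration. They are not Mercurial's actual dirstate format or API.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class DirstateEntry:
    # Hypothetical record of what was known when the file was last written.
    mtime_ns: int  # modification time recorded at that point
    size: int      # size in bytes at that point
    digest: str    # hypothetical: SHA-1 hex digest of the contents written


def file_status(path: str, entry: DirstateEntry) -> str:
    """Return 'clean' or 'modified', doing the cheapest checks first."""
    st = os.stat(path)
    # 1. Modification time unchanged: treat the file as unmodified
    #    without reading it.
    if st.st_mtime_ns == entry.mtime_ns:
        return "clean"
    # 2. Time changed and size changed: the file must have been modified.
    if st.st_size != entry.size:
        return "modified"
    # 3. Time changed but size did not: only now read the contents.
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    return "clean" if digest == entry.digest else "modified"
```

A file that is merely touched takes the third path and is still reported clean. A same-length edit is caught only by reading the contents. The sketch trusts an unchanged timestamp completely, so a rewrite within the timestamp's resolution would be missed. Mercurial has its own handling for such ambiguous timestamps, which this sketch does not attempt to reproduce.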
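
The "Fast retrieval" text and the reworded video-compression aside in this diff make two points. First, reconstruction cost grows with the number of deltas stored since the last snapshot. Second, Mercurial instead reconstructs a revision from a snapshot plus a small number of deltas. The toy Python model below sketches that idea under stated assumptions. It writes a fresh full snapshot once the delta bytes accumulated since the last snapshot would exceed a fixed `threshold`. It also uses a single-hunk `(start, end, data)` delta with a nominal 8-byte overhead per delta. Both are simplifications chosen for illustration, and `ToyRevlog`, `make_delta`, and `apply_delta` are hypothetical names. This is not Mercurial's revlog code or on-disk format.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Delta = Tuple[int, int, bytes]  # replace old[start:end] with data


def make_delta(old: bytes, new: bytes) -> Delta:
    """Single-hunk delta: strip the common prefix and suffix."""
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0
    while s < limit - p and old[len(old) - 1 - s] == new[len(new) - 1 - s]:
        s += 1
    return (p, len(old) - s, new[p:len(new) - s])


def apply_delta(old: bytes, delta: Delta) -> bytes:
    start, end, data = delta
    return old[:start] + data + old[end:]


@dataclass
class Entry:
    is_snapshot: bool
    payload: Union[bytes, Delta]


class ToyRevlog:
    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # max delta bytes between snapshots
        self.entries: List[Entry] = []
        self._last_text = b""
        self._chain_bytes = 0       # delta bytes since the last snapshot

    def add(self, text: bytes) -> int:
        if self.entries:
            delta = make_delta(self._last_text, text)
            cost = len(delta[2]) + 8  # assumed cost: data plus two offsets
            if self._chain_bytes + cost <= self.threshold:
                self.entries.append(Entry(False, delta))
                self._chain_bytes += cost
                self._last_text = text
                return len(self.entries) - 1
        # First revision, or the chain would grow too long: store a snapshot.
        self.entries.append(Entry(True, text))
        self._chain_bytes = 0
        self._last_text = text
        return len(self.entries) - 1

    def get(self, rev: int) -> Tuple[bytes, int]:
        """Return (text, number of entries read to rebuild it)."""
        base = rev
        while not self.entries[base].is_snapshot:
            base -= 1
        text = self.entries[base].payload
        for r in range(base + 1, rev + 1):
            text = apply_delta(text, self.entries[r].payload)
        return text, rev - base + 1
```

Every delta costs at least 8 bytes in this model, so retrieving any revision reads at most `1 + threshold // 8` entries, no matter how long the history grows. Raising `threshold` stores fewer full snapshots, which saves space, at the price of longer delta chains, which cost more reads.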