diff en/ch03-concepts.xml @ 828:477d6a3e5023

Many final changes.
author Bryan O'Sullivan <bos@serpentine.com>
date Mon, 04 May 2009 23:52:38 -0700
parents 29f0f79cf614
children 18131160f7ee
--- a/en/ch03-concepts.xml	Sun May 03 20:27:28 2009 -0700
+++ b/en/ch03-concepts.xml	Mon May 04 23:52:38 2009 -0700
@@ -112,12 +112,15 @@
       <para id="x_2f3">As the illustration shows, there is
 	<emphasis>not</emphasis> a <quote>one to one</quote>
 	relationship between revisions in the changelog, manifest, or
-	filelog. If the manifest hasn't changed between two
-	changesets, the changelog entries for those changesets will
-	point to the same revision of the manifest.  If a file that
+	filelog. If a file that
 	Mercurial tracks hasn't changed between two changesets, the
 	entry for that file in the two revisions of the manifest will
-	point to the same revision of its filelog.</para>
+	point to the same revision of its filelog<footnote>
+	  <para>It is possible (though unusual) for the manifest to
+	    remain the same between two changesets, in which case the
+	    changelog entries for those changesets will point to the
+	    same revision of the manifest.</para>
+	</footnote>.</para>
 
     </sect2>
   </sect1>
@@ -175,16 +178,18 @@
     <sect2>
       <title>Fast retrieval</title>
 
-      <para id="x_2fa">Mercurial cleverly avoids a pitfall common to all earlier
-	revision control systems: the problem of <emphasis>inefficient
-	  retrieval</emphasis>. Most revision control systems store
-	the contents of a revision as an incremental series of
-	modifications against a <quote>snapshot</quote>.  To
-	reconstruct a specific revision, you must first read the
-	snapshot, and then every one of the revisions between the
-	snapshot and your target revision.  The more history that a
-	file accumulates, the more revisions you must read, hence the
-	longer it takes to reconstruct a particular revision.</para>
+      <para id="x_2fa">Mercurial cleverly avoids a pitfall common to
+	all earlier revision control systems: the problem of
+	<emphasis>inefficient retrieval</emphasis>. Most revision
+	control systems store the contents of a revision as an
+	incremental series of modifications against a
+	<quote>snapshot</quote>.  (Some base the snapshot on the
+	oldest revision, others on the newest.)  To reconstruct a
+	specific revision, you must first read the snapshot, and then
+	every one of the revisions between the snapshot and your
+	target revision.  The more history that a file accumulates,
+	the more revisions you must read, hence the longer it takes to
+	reconstruct a particular revision.</para>
 
       <figure id="fig:concepts:snapshot">
 	<title>Snapshot of a revlog, with incremental deltas</title>
@@ -211,25 +216,15 @@
       <sect3>
 	<title>Aside: the influence of video compression</title>
 
-	<para id="x_2fe">If you're familiar with video compression or have ever
-	  watched a TV feed through a digital cable or satellite
-	  service, you may know that most video compression schemes
-	  store each frame of video as a delta against its predecessor
-	  frame.  In addition, these schemes use <quote>lossy</quote>
-	  compression techniques to increase the compression ratio, so
-	  visual errors accumulate over the course of a number of
-	  inter-frame deltas.</para>
+	<para id="x_2fe">If you're familiar with video compression or
+	  have ever watched a TV feed through a digital cable or
+	  satellite service, you may know that most video compression
+	  schemes store each frame of video as a delta against its
+	  predecessor frame.</para>
 
-	<para id="x_2ff">Because it's possible for a video stream to <quote>drop
-	    out</quote> occasionally due to signal glitches, and to
-	  limit the accumulation of artefacts introduced by the lossy
-	  compression process, video encoders periodically insert a
-	  complete frame (called a <quote>key frame</quote>) into the
-	  video stream; the next delta is generated against that
-	  frame.  This means that if the video signal gets
-	  interrupted, it will resume once the next key frame is
-	  received.  Also, the accumulation of encoding errors
-	  restarts anew with each key frame.</para>
+	<para id="x_2ff">Mercurial borrows this idea to make it
+	  possible to reconstruct a revision from a snapshot and a
+	  small number of deltas.</para>
 
       </sect3>
     </sect2>
@@ -261,9 +256,9 @@
 	uncorrupted sections of the revlog, both before and after the
 	corrupted section.  This would not be possible with a
 	delta-only storage model.</para>
-
     </sect2>
   </sect1>
+
   <sect1>
     <title>Revision history, branching, and merging</title>
 
@@ -314,11 +309,15 @@
       those files, with the same contents it had when the changeset
       was committed.</para>
 
-    <para id="x_309">The <emphasis>dirstate</emphasis> contains Mercurial's
-      knowledge of the working directory.  This details which
-      changeset the working directory is updated to, and all of the
-      files that Mercurial is tracking in the working
-      directory.</para>
+    <para id="x_309">The <emphasis>dirstate</emphasis> is a special
+      structure that contains Mercurial's knowledge of the working
+      directory.  It is maintained as a file named
+      <filename>.hg/dirstate</filename> inside a repository.  The
+      dirstate details which changeset the working directory is
+      updated to, and all of the files that Mercurial is tracking in
+      the working directory. It also lets Mercurial quickly notice
+      changed files, by recording their checkout times and
+      sizes.</para>
 
     <para id="x_30a">Just as a revision of a revlog has room for two parents, so
       that it can represent either a normal revision (with one parent)
@@ -426,9 +425,9 @@
       </figure>
 
       <note>
-	<para id="x_315">  If you're new to Mercurial, you should keep in mind a
-	  common <quote>error</quote>, which is to use the <command
-	    role="hg-cmd">hg pull</command> command without any
+	<para id="x_315">If you're new to Mercurial, you should keep
+	  in mind a common <quote>error</quote>, which is to use the
+	  <command role="hg-cmd">hg pull</command> command without any
 	  options.  By default, the <command role="hg-cmd">hg
 	    pull</command> command <emphasis>does not</emphasis>
 	  update the working directory, so you'll bring new changesets
@@ -436,16 +435,19 @@
 	  synced at the same changeset as before the pull.  If you
 	  make some changes and commit afterwards, you'll thus create
 	  a new head, because your working directory isn't synced to
-	  whatever the current tip is.</para>
+	  whatever the current tip is.  To pull and then update in a
+	  single step, run <command>hg pull
+	    -u</command>.</para>
 
-	<para id="x_316">  I put the word <quote>error</quote> in
-	  quotes because all that you need to do to rectify this
-	  situation is <command role="hg-cmd">hg merge</command>, then
-	  <command role="hg-cmd">hg commit</command>.  In other words,
-	  this almost never has negative consequences; it's just
-	  something of a surprise for newcomers.  I'll discuss other
-	  ways to avoid this behavior, and why Mercurial behaves in
-	  this initially surprising way, later on.</para>
+	<para id="x_316">I put the word <quote>error</quote> in quotes
+	  because all that you need to do to rectify the situation
+	  where you created a new head by accident is
+	  <command role="hg-cmd">hg merge</command>, then <command
+	    role="hg-cmd">hg commit</command>.  In other words, this
+	  almost never has negative consequences; it's just something
+	  of a surprise for newcomers.  I'll discuss other ways to
+	  avoid this behavior, and why Mercurial behaves in this
+	  initially surprising way, later on.</para>
       </note>
 
     </sect2>
@@ -511,13 +513,15 @@
 	working directory has two parents; these will become the
 	parents of the new changeset.</para>
 
-      <para id="x_322">Mercurial lets you perform multiple merges, but you must
-	commit the results of each individual merge as you go.  This
-	is necessary because Mercurial only tracks two parents for
-	both revisions and the working directory.  While it would be
-	technically possible to merge multiple changesets at once, the
-	prospect of user confusion and making a terrible mess of a
-	merge immediately becomes overwhelming.</para>
+      <para id="x_322">Mercurial lets you perform multiple merges, but
+	you must commit the results of each individual merge as you
+	go.  This is necessary because Mercurial only tracks two
+	parents for both revisions and the working directory.  While
+	it would be technically feasible to merge multiple changesets
+	at once, Mercurial avoids this for simplicity.  With multi-way
+	merges, the risks of user confusion, nasty conflict
+	resolution, and making a terrible mess of a merge would grow
+	intolerable.</para>
 
     </sect2>
 
@@ -598,10 +602,17 @@
 	  transferred, yielding better network performance over most
 	  kinds of network.</para>
 
-	<para id="x_329">(If the connection is over <command>ssh</command>,
-	  Mercurial <emphasis>doesn't</emphasis> recompress the
-	  stream, because <command>ssh</command> can already do this
-	  itself.)</para>
+	<para id="x_329">If the connection is over
+	  <command>ssh</command>, Mercurial
+	  <emphasis>doesn't</emphasis> recompress the stream, because
+	  <command>ssh</command> can already do this itself.  You can
+	  tell Mercurial to always use <command>ssh</command>'s
+	  compression feature by editing the
+	  <filename>.hgrc</filename> file in your home directory as
+	  follows.</para>
+
+	<programlisting>[ui]
+ssh = ssh -C</programlisting>
 
       </sect3>
     </sect2>
@@ -611,9 +622,8 @@
       <para id="x_32a">Appending to files isn't the whole story when
 	it comes to guaranteeing that a reader won't see a partial
 	write.  If you recall <xref linkend="fig:concepts:metadata"/>,
-	revisions in
-	the changelog point to revisions in the manifest, and
-	revisions in the manifest point to revisions in filelogs.
+	revisions in the changelog point to revisions in the manifest,
+	and revisions in the manifest point to revisions in filelogs.
 	This hierarchy is deliberate.</para>
 
       <para id="x_32b">A writer starts a transaction by writing filelog and
@@ -637,7 +647,7 @@
 	being written to while the read is occurring. This has a big
 	effect on scalability; you can have an arbitrary number of
 	Mercurial processes safely reading data from a repository
-	safely all at once, no matter whether it's being written to or
+	all at once, no matter whether it's being written to or
 	not.</para>
 
       <para id="x_32e">The lockless nature of reading means that if you're
@@ -709,8 +719,8 @@
 	storage is cheap, and this method gives the highest
 	performance while deferring most book-keeping to the operating
 	system.  An alternative scheme would most likely reduce
-	performance and increase the complexity of the software, each
-	of which is much more important to the <quote>feel</quote> of
+	performance and increase the complexity of the software, but
+	speed and simplicity are key to the <quote>feel</quote> of
 	day-to-day use.</para>
 
     </sect2>
@@ -731,18 +741,32 @@
 	dirstate so that it knows what to do with those files when you
 	commit.</para>
 
-      <para id="x_337">When Mercurial is checking the states of files in the
-	working directory, it first checks a file's modification time.
-	If that has not changed, the file must not have been modified.
-	If the file's size has changed, the file must have been
-	modified.  If the modification time has changed, but the size
-	has not, only then does Mercurial need to read the actual
-	contents of the file to see if they've changed. Storing these
-	few extra pieces of information dramatically reduces the
-	amount of data that Mercurial needs to read, which yields
-	large performance improvements compared to other revision
-	control systems.</para>
+      <para id="x_337">The dirstate helps Mercurial to efficiently
+	  check the status of files in a repository.</para>
 
+      <itemizedlist>
+	<listitem>
+	  <para>When Mercurial checks the state of a file in the
+	    working directory, it first checks the file's modification
+	    time against the time in the dirstate that records when
+	    Mercurial last wrote the file. If the last modified time
+	    is the same as the time when Mercurial wrote the file, the
+	    file must not have been modified, so Mercurial does not
+	    need to check any further.</para>
+	</listitem>
+	<listitem>
+	  <para>If the file's size has changed, the file must have
+	    been modified.  If the modification time has changed, but
+	    the size has not, only then does Mercurial need to
+	    actually read the contents of the file to see if it has
+	    changed.</para>
+	</listitem>
+      </itemizedlist>
+
+      <para>Storing the modification time and size dramatically
+	reduces the number of read operations that Mercurial needs to
+	perform when we run commands like <command>hg status</command>.
+	This results in large performance improvements.</para>
     </sect2>
   </sect1>
 </chapter>