changeset 110:75c076c7a374

More concepts stuff.
author Bryan O'Sullivan <bos@serpentine.com>
date Fri, 10 Nov 2006 15:09:49 -0800
parents 1b67dc96f27a
children 34b8b7a15ea1
files en/Makefile en/concepts.tex en/metadata.svg
diffstat 3 files changed, 126 insertions(+), 47 deletions(-) [+]
line wrap: on
line diff
--- a/en/Makefile	Fri Nov 10 12:42:00 2006 -0800
+++ b/en/Makefile	Fri Nov 10 15:09:49 2006 -0800
@@ -25,6 +25,7 @@
 	kdiff3.png \
 	metadata.svg \
 	mq-stack.svg \
+	snapshot.svg \
 	tour-history.svg \
 	tour-merge-conflict.svg \
 	tour-merge-merge.svg \
--- a/en/concepts.tex	Fri Nov 10 12:42:00 2006 -0800
+++ b/en/concepts.tex	Fri Nov 10 15:09:49 2006 -0800
@@ -77,15 +77,15 @@
   \label{fig:concepts:metadata}
 \end{figure}
 
-Note that there is not a ``one to one'' relationship between revisions
-in these different metadata files.  If the manifest hasn't changed
-between two changesets, their changelog entries will point to the same
-revision of the manifest.  If a file that Mercurial tracks hasn't
-changed between two changesets, the entry for that file in the two
-revisions of the manifest will point to the same revision of its
-filelog.
+As the illustration shows, there is \emph{not} a ``one to one''
+relationship between revisions in the changelog, manifest, or filelog.
+If the manifest hasn't changed between two changesets, the changelog
+entries for those changesets will point to the same revision of the
+manifest.  If a file that Mercurial tracks hasn't changed between two
+changesets, the entry for that file in the two revisions of the
+manifest will point to the same revision of its filelog.
 
-\section{An efficient, unified, safe storage mechanism}
+\section{Safe, efficient storage}
 
 The underpinnings of changelogs, manifests, and filelogs are provided
 by a single structure called the \emph{revlog}.
@@ -136,13 +136,24 @@
 revisions you must read, hence the longer it takes to reconstruct a
 particular revision.
 
+\begin{figure}[ht]
+  \centering
+  \grafix{snapshot}
+  \caption{Snapshot of a revlog, with incremental deltas}
+  \label{fig:concepts:snapshot}
+\end{figure}
+
 The innovation that Mercurial applies to this problem is simple but
 effective.  Once the cumulative amount of delta information stored
 since the last snapshot exceeds a fixed threshold, it stores a new
 snapshot (compressed, of course), instead of another delta.  This
 makes it possible to reconstruct \emph{any} revision of a file
-quickly.  This approach works so well that it has subsequently been
-copied by several other revision control systems.
+quickly.  This approach works so well that it has since been copied by
+several other revision control systems.
+
+Figure~\ref{fig:concepts:snapshot} illustrates the idea.  In an entry
+in a revlog's index file, Mercurial stores the range of entries from
+the data file that it must read to reconstruct a particular revision.
 
 \subsubsection{Aside: the influence of video compression}
 
@@ -163,6 +174,61 @@
 the next key frame is received.  Also, the accumulation of encoding
 errors restarts anew with each key frame.
 
+\subsection{Strong integrity}
+
+Along with delta or snapshot information, a revlog entry contains a
+cryptographic hash of the data that it represents.  This makes it
+difficult to forge the contents of a revision, and easy to detect
+accidental corruption.
+
+Mercurial checks these hashes when retrieving file revisions and when
+pulling changes from a repository.  If it encounters an integrity
+problem, it will complain and stop whatever it's doing.
+
+In addition to the effect it has on retrieval efficiency, Mercurial's
+use of periodic snapshots makes it more robust against partial data
+corruption.  If a revlog becomes partly corrupted due to a hardware
+error or system bug, it's often possible to reconstruct some or most
+revisions from the uncorrupted sections of the revlog, both before and
+after the corrupted section.  This would not be possible with a
+delta-only storage model.
+
+\section{The working directory}
+
+Mercurial's good ideas are not confined to the repository; it also
+needs to manage the working directory.  The \emph{dirstate} contains
+Mercurial's knowledge of the working directory.  This details which
+revision(s) the working directory is updated to, and all files that
+Mercurial is tracking in the working directory.
+
+Because Mercurial doesn't force you to tell it when you're modifying a
+file, it uses the dirstate to store some extra information so it can
+determine efficiently whether you have modified a file.  For each file
+in the working directory, it stores the time that it last modified the
+file itself, and the size of the file at that time.  
+
+When Mercurial is checking the states of files in the working
+directory, it first checks a file's modification time.  If that has
+not changed, the file must not have been modified.  If the file's size
+has changed, the file must have been modified.  If the modification
+time has changed, but the size has not, only then does Mercurial need
+to read the actual contents of the file to see if they've changed.
+Storing these few extra pieces of information dramatically reduces the
+amount of data that Mercurial needs to read, which yields large
+performance improvements compared to other revision control systems.
+
+\section{Other interesting design features}
+
+In the sections above, I've tried to highlight some of the most
+important aspects of Mercurial's design, to illustrate that it pays
+careful attention to reliability and performance.  However, the
+attention to detail doesn't stop there.  There are a number of other
+aspects of Mercurial's construction that I personally find
+interesting.  I'll detail a few of them here, separate from the ``big
+ticket'' items above, so that if you're interested, you can gain a
+better idea of the amount of thinking that goes into a well-designed
+system.
+
 \subsection{Clever compression}
 
 When appropriate, Mercurial will store both snapshots and deltas in
@@ -183,24 +249,26 @@
 the snapshot, again saving space compared to a naive delta-only
 approach.
 
-\subsection{Strong integrity}
+\subsubsection{Network recompression}
 
-Along with delta or snapshot information, a revlog entry contains a
-cryptographic hash of the data that it represents.  This makes it
-difficult to forge the contents of a revision, and easy to detect
-accidental corruption.
+When storing revisions on disk, Mercurial uses the ``deflate''
+compression algorithm (the same one used by the popular \texttt{zip}
+archive format), which balances good speed with a respectable
+compression ratio.  However, when transmitting revision data over a
+network connection, Mercurial uncompresses the compressed revision
+data.
 
-Mercurial checks these hashes when retrieving file revisions and when
-pulling changes from a repository.  If it encounters an integrity
-problem, it will complain and stop whatever it's doing.
+If the connection is over HTTP, Mercurial recompresses the entire
+stream of data using a compression algorithm that gives a etter
+compression ratio (the Burrows-Wheeler algorithm from the widely used
+\texttt{bzip2} compression package).  This combination of algorithm
+and compression of the entire stream (instead of a revision at a time)
+substantially reduces the number of bytes to be transferred, yielding
+better network performance over almost all kinds of network.
 
-In addition to the effect it has on retrieval efficiency, Mercurial's
-use of periodic snapshots makes it more robust against partial data
-corruption.  If a revlog becomes partly corrupted due to a hardware
-error or system bug, it's often possible to reconstruct some or most
-revisions from the uncorrupted sections of the revlog, both before and
-after the corrupted section.  This would not be possible with a
-delta-only storage model.
+(If the connection is over \command{ssh}, Mercurial \emph{doesn't}
+recompress the stream, because \command{ssh} can already do this
+itself.)
 
 \subsection{Read/write ordering and atomicity}
 
@@ -241,15 +309,25 @@
 makes for all kinds of nasty and annoying security and administrative
 problems.)
 
-Mercurial uses a locking mechanism to ensure that only one process can
-write to a repository at a time.  This locking mechanism is safe even
-over filesystems that are notoriously unsafe for locking, such as NFS.
-If a repository is locked, a writer will wait for a while to retry if
-the repository becomes unlocked, but if the repository remains locked
-for too long, the process attempting to write will time out after a
-while.  This means that your daily automated scripts won't get stuck
-forever and pile up if a system crashes unnoticed, for example.  (Yes,
-the timeout is configurable, from zero to infinity.)
+Mercurial uses locks to ensure that only one process can write to a
+repository at a time (the locking mechanism is safe even over
+filesystems that are notoriously hostile to locking, such as NFS).  If
+a repository is locked, a writer will wait for a while to retry if the
+repository becomes unlocked, but if the repository remains locked for
+too long, the process attempting to write will time out after a while.
+This means that your daily automated scripts won't get stuck forever
+and pile up if a system crashes unnoticed, for example.  (Yes, the
+timeout is configurable, from zero to infinity.)
+
+\subsubsection{Safe dirstate access}
+
+As with revision data, Mercurial doesn't take a lock to read the
+dirstate file; it does acquire a lock to write it.  To avoid the
+possibility of reading a partially written copy of the dirstate file,
+Mercurial writes to a file with a unique name in the same directory as
+the dirstate file, then renames the temporary file atomically to
+\filename{dirstate}.  The file named \filename{dirstate} is thus
+guaranteed to be complete, not partially written.
 
 
 
--- a/en/metadata.svg	Fri Nov 10 12:42:00 2006 -0800
+++ b/en/metadata.svg	Fri Nov 10 15:09:49 2006 -0800
@@ -67,7 +67,7 @@
      id="layer1">
     <path
        style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#a7a7a7;stroke-width:1.5;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:4.5, 1.5;stroke-dashoffset:0;stroke-opacity:1;display:inline"
-       d="M 326.94646,471.18359 L 326.94646,510.98123"
+       d="M 326.94646,467.18359 L 326.94646,510.98123"
        id="path1910"
        inkscape:connector-type="polyline"
        inkscape:connection-end="#rect2962"
@@ -87,8 +87,8 @@
        inkscape:connection-end="#rect3038"
        inkscape:connection-start="#rect2962" />
     <path
-       style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#484848;stroke-width:1.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:4.5,1.5;stroke-dashoffset:0"
-       d="M 254.23217,471.18359 L 254.23216,510.98123"
+       style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#484848;stroke-width:1.5;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:4.5, 1.5;stroke-dashoffset:0;stroke-opacity:1"
+       d="M 254.23217,467.18359 L 254.23216,510.98123"
        id="path3088"
        inkscape:connector-type="polyline"
        inkscape:connection-start="#rect1872"
@@ -113,52 +113,52 @@
        width="51.42857"
        height="20"
        x="228.51788"
-       y="450.68359" />
+       y="446.68359" />
     <rect
        style="fill:#cacbfb;fill-opacity:1;stroke:#595959;stroke-width:1;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
        id="rect2764"
        width="51.42857"
        height="20"
        x="301.23218"
-       y="450.68359" />
+       y="446.68359" />
     <rect
        style="fill:#cacbfb;fill-opacity:1;stroke:#595959;stroke-width:1;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
        id="rect2766"
        width="51.42857"
        height="20"
        x="155.80359"
-       y="450.68359" />
+       y="446.68359" />
     <rect
        style="fill:#cacbfb;fill-opacity:1;stroke:#595959;stroke-width:1;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1"
        id="rect2768"
        width="51.42857"
        height="20"
        x="83.089294"
-       y="450.68359" />
+       y="446.68359" />
     <path
        style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-opacity:1"
-       d="M 135.01786,460.68359 L 155.30359,460.68359"
+       d="M 135.01786,456.68359 L 155.30359,456.68359"
        id="path2770"
        inkscape:connector-type="polyline"
        inkscape:connection-start="#rect2768"
        inkscape:connection-end="#rect2766" />
     <path
        style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-opacity:1"
-       d="M 207.73216,460.68359 L 228.01788,460.68359"
+       d="M 207.73216,456.68359 L 228.01788,456.68359"
        id="path2772"
        inkscape:connector-type="polyline"
        inkscape:connection-start="#rect2766"
        inkscape:connection-end="#rect1872" />
     <path
        style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-opacity:1"
-       d="M 280.44645,460.68359 L 300.73218,460.68359"
+       d="M 280.44645,456.68359 L 300.73218,456.68359"
        id="path2774"
        inkscape:connector-type="polyline"
        inkscape:connection-start="#rect1872"
        inkscape:connection-end="#rect2764" />
     <path
        style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:3, 3;stroke-dashoffset:0;stroke-opacity:1"
-       d="M 62.303571,460.68359 L 82.589293,460.68359"
+       d="M 62.303571,456.68359 L 82.589294,456.68359"
        id="path2778"
        inkscape:connector-type="polyline"
        inkscape:connection-end="#rect2768" />
@@ -298,12 +298,12 @@
        xml:space="preserve"
        style="font-size:12px;font-style:normal;font-weight:normal;fill:black;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;font-family:Times New Roman"
        x="82.072548"
-       y="436.64789"
+       y="432.64789"
        id="text3094"><tspan
          sodipodi:role="line"
          id="tspan3096"
          x="82.072548"
-         y="436.64789">Changelog</tspan></text>
+         y="432.64789">Changelog</tspan></text>
     <text
        xml:space="preserve"
        style="font-size:12px;font-style:normal;font-weight:normal;fill:black;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;font-family:Times New Roman"