diff en/concepts.tex @ 112:2fcead053b7a

More. Concept. Fun.
author Bryan O'Sullivan <bos@serpentine.com>
date Mon, 13 Nov 2006 13:21:29 -0800
parents 34b8b7a15ea1
children a0f57b3e677e
--- a/en/concepts.tex	Fri Nov 10 15:32:33 2006 -0800
+++ b/en/concepts.tex	Mon Nov 13 13:21:29 2006 -0800
@@ -12,6 +12,10 @@
 the software is doing when I perform a revision control task, I'm less
 likely to be surprised by its behaviour.
 
+In this chapter, we'll first cover the core concepts behind
+Mercurial's design, then discuss some of the interesting details of
+its implementation.
+
 \section{Mercurial's historical record}
 
 \subsection{Tracking the history of a single file}
@@ -174,19 +178,23 @@
 the next key frame is received.  Also, the accumulation of encoding
 errors restarts anew with each key frame.
 
-\subsection{Strong integrity}
+\subsection{Identification and strong integrity}
 
 Along with delta or snapshot information, a revlog entry contains a
 cryptographic hash of the data that it represents.  This makes it
 difficult to forge the contents of a revision, and easy to detect
-accidental corruption.  The hash that Mercurial uses is SHA-1, which
-is 160 bits long.  Although all revision data is hashed, the changeset
+accidental corruption.
+
+Hashes provide more than a mere check against corruption; they are
+used as the identifiers for revisions.  The changeset identification
 hashes that you see as an end user are from revisions of the
-changelog.  Manifest and file hashes are only used behind the scenes.
+changelog.  Although filelogs and the manifest also use hashes,
+Mercurial only uses these behind the scenes.
 
-Mercurial checks these hashes when retrieving file revisions and when
-pulling changes from a repository.  If it encounters an integrity
-problem, it will complain and stop whatever it's doing.
+Mercurial verifies that hashes are correct when it retrieves file
+revisions and when it pulls changes from another repository.  If it
+encounters an integrity problem, it will complain and stop whatever
+it's doing.
 
 In addition to the effect it has on retrieval efficiency, Mercurial's
 use of periodic snapshots makes it more robust against partial data
@@ -220,6 +228,35 @@
 amount of data that Mercurial needs to read, which yields large
 performance improvements compared to other revision control systems.
 
+\section{Revision history, branching,
+  and merging}
+
+Every entry in a Mercurial revlog knows the identity of its immediate
+ancestor revision, usually referred to as its \emph{parent}.  In fact,
+a revision contains room for not one parent, but two.  Mercurial uses
+a special hash, called the ``null ID'', to represent the idea ``there
+is no parent here''.  This hash is simply a string of zeroes.
+
+In figure~\ref{fig:concepts:revlog}, you can see an example of the
+conceptual structure of a revlog.  Filelogs, manifests, and changelogs
+all have this same structure; they differ only in the kind of data
+stored in each delta or snapshot.
+
+The first revision in a revlog (at the bottom of the image) has the
+null ID in both of its parent slots.  For a ``normal'' revision, its
+first parent slot contains the ID of its parent revision, and its
+second contains the null ID, indicating that the revision has only one
+real parent.  Any two revisions that have the same parent ID are
+branches.  A revision that represents a merge between branches has two
+normal revision IDs in its parent slots.
+
+\begin{figure}[ht]
+  \centering
+  \grafix{revlog}
+  \caption{The conceptual structure of a revlog}
+  \label{fig:concepts:revlog}
+\end{figure}
+
 \section{Other interesting design features}
 
 In the sections above, I've tried to highlight some of the most
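
Note on the new ``Revision history, branching, and merging'' text: the parent-slot scheme it describes can be sketched in a few lines of Python. This is only an illustration of the idea, not Mercurial's actual code. The names `Entry`, `node_id`, `kind`, and `verify` are hypothetical, and so is the exact recipe for the hash (SHA-1 over the two sorted parent IDs followed by the data); the text only says that each entry carries a cryptographic hash, two parent slots, and that the null ID is a string of zeroes.

```python
import hashlib
from dataclasses import dataclass

# Assumption: IDs are 20-byte SHA-1 digests, so the null ID is 20 zero bytes.
NULL_ID = b"\0" * 20


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified stand-in for one revlog entry."""
    data: bytes
    p1: bytes = NULL_ID  # first parent slot
    p2: bytes = NULL_ID  # second parent slot

    def node_id(self) -> bytes:
        # Assumed recipe, for illustration only: hash both parent IDs
        # (sorted, so slot order does not matter) followed by the data.
        lo, hi = sorted((self.p1, self.p2))
        return hashlib.sha1(lo + hi + self.data).digest()


def kind(entry: Entry) -> str:
    """Classify an entry by how many real (non-null) parents it has."""
    real = [p for p in (entry.p1, entry.p2) if p != NULL_ID]
    return ("root", "normal", "merge")[len(real)]


def verify(entry: Entry, expected_id: bytes) -> bool:
    """Integrity check: recompute the hash and compare with the stored ID."""
    return entry.node_id() == expected_id


# A tiny history: one root, two branches off it, and a merge of the two.
root = Entry(b"first revision")
left = Entry(b"left change", p1=root.node_id())
right = Entry(b"right change", p1=root.node_id())
merge = Entry(b"merged", p1=left.node_id(), p2=right.node_id())
```

Here `left` and `right` share the same parent ID, which is what the text calls branches, and `merge` is the only entry with two non-null parent slots. Because the parent IDs feed into the hash in this sketch, changing either the data or the recorded ancestry of an entry changes its ID, so `verify` fails for a tampered entry.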