Mercurial > hgbook
changeset 110:75c076c7a374
More concepts stuff.
author | Bryan O'Sullivan <bos@serpentine.com> |
---|---|
date | Fri, 10 Nov 2006 15:09:49 -0800 |
parents | 1b67dc96f27a |
children | 34b8b7a15ea1 |
files | en/Makefile en/concepts.tex en/metadata.svg |
diffstat | 3 files changed, 126 insertions(+), 47 deletions(-) [+] |
line wrap: on
line diff
--- a/en/Makefile Fri Nov 10 12:42:00 2006 -0800 +++ b/en/Makefile Fri Nov 10 15:09:49 2006 -0800 @@ -25,6 +25,7 @@ kdiff3.png \ metadata.svg \ mq-stack.svg \ + snapshot.svg \ tour-history.svg \ tour-merge-conflict.svg \ tour-merge-merge.svg \
--- a/en/concepts.tex Fri Nov 10 12:42:00 2006 -0800 +++ b/en/concepts.tex Fri Nov 10 15:09:49 2006 -0800 @@ -77,15 +77,15 @@ \label{fig:concepts:metadata} \end{figure} -Note that there is not a ``one to one'' relationship between revisions -in these different metadata files. If the manifest hasn't changed -between two changesets, their changelog entries will point to the same -revision of the manifest. If a file that Mercurial tracks hasn't -changed between two changesets, the entry for that file in the two -revisions of the manifest will point to the same revision of its -filelog. +As the illustration shows, there is \emph{not} a ``one to one'' +relationship between revisions in the changelog, manifest, or filelog. +If the manifest hasn't changed between two changesets, the changelog +entries for those changesets will point to the same revision of the +manifest. If a file that Mercurial tracks hasn't changed between two +changesets, the entry for that file in the two revisions of the +manifest will point to the same revision of its filelog. -\section{An efficient, unified, safe storage mechanism} +\section{Safe, efficient storage} The underpinnings of changelogs, manifests, and filelogs are provided by a single structure called the \emph{revlog}. @@ -136,13 +136,24 @@ revisions you must read, hence the longer it takes to reconstruct a particular revision. +\begin{figure}[ht] + \centering + \grafix{snapshot} + \caption{Snapshot of a revlog, with incremental deltas} + \label{fig:concepts:snapshot} +\end{figure} + The innovation that Mercurial applies to this problem is simple but effective. Once the cumulative amount of delta information stored since the last snapshot exceeds a fixed threshold, it stores a new snapshot (compressed, of course), instead of another delta. This makes it possible to reconstruct \emph{any} revision of a file -quickly. This approach works so well that it has subsequently been -copied by several other revision control systems. +quickly. This approach works so well that it has since been copied by +several other revision control systems. + +Figure~\ref{fig:concepts:snapshot} illustrates the idea. In an entry +in a revlog's index file, Mercurial stores the range of entries from +the data file that it must read to reconstruct a particular revision. \subsubsection{Aside: the influence of video compression} @@ -163,6 +174,61 @@ the next key frame is received. Also, the accumulation of encoding errors restarts anew with each key frame. +\subsection{Strong integrity} + +Along with delta or snapshot information, a revlog entry contains a +cryptographic hash of the data that it represents. This makes it +difficult to forge the contents of a revision, and easy to detect +accidental corruption. + +Mercurial checks these hashes when retrieving file revisions and when +pulling changes from a repository. If it encounters an integrity +problem, it will complain and stop whatever it's doing. + +In addition to the effect it has on retrieval efficiency, Mercurial's +use of periodic snapshots makes it more robust against partial data +corruption. If a revlog becomes partly corrupted due to a hardware +error or system bug, it's often possible to reconstruct some or most +revisions from the uncorrupted sections of the revlog, both before and +after the corrupted section. This would not be possible with a +delta-only storage model. + +\section{The working directory} + +Mercurial's good ideas are not confined to the repository; it also +needs to manage the working directory. The \emph{dirstate} contains +Mercurial's knowledge of the working directory. This details which +revision(s) the working directory is updated to, and all files that +Mercurial is tracking in the working directory. + +Because Mercurial doesn't force you to tell it when you're modifying a +file, it uses the dirstate to store some extra information so it can +determine efficiently whether you have modified a file. For each file +in the working directory, it stores the time that it last modified the +file itself, and the size of the file at that time. + +When Mercurial is checking the states of files in the working +directory, it first checks a file's modification time. If that has +not changed, the file must not have been modified. If the file's size +has changed, the file must have been modified. If the modification +time has changed, but the size has not, only then does Mercurial need +to read the actual contents of the file to see if they've changed. +Storing these few extra pieces of information dramatically reduces the +amount of data that Mercurial needs to read, which yields large +performance improvements compared to other revision control systems. + +\section{Other interesting design features} + +In the sections above, I've tried to highlight some of the most +important aspects of Mercurial's design, to illustrate that it pays +careful attention to reliability and performance. However, the +attention to detail doesn't stop there. There are a number of other +aspects of Mercurial's construction that I personally find +interesting. I'll detail a few of them here, separate from the ``big +ticket'' items above, so that if you're interested, you can gain a +better idea of the amount of thinking that goes into a well-designed +system. + \subsection{Clever compression} When appropriate, Mercurial will store both snapshots and deltas in @@ -183,24 +249,26 @@ the snapshot, again saving space compared to a naive delta-only approach. -\subsection{Strong integrity} +\subsubsection{Network recompression} -Along with delta or snapshot information, a revlog entry contains a -cryptographic hash of the data that it represents. This makes it -difficult to forge the contents of a revision, and easy to detect -accidental corruption. +When storing revisions on disk, Mercurial uses the ``deflate'' +compression algorithm (the same one used by the popular \texttt{zip} +archive format), which balances good speed with a respectable +compression ratio. However, when transmitting revision data over a +network connection, Mercurial uncompresses the compressed revision +data. -Mercurial checks these hashes when retrieving file revisions and when -pulling changes from a repository. If it encounters an integrity -problem, it will complain and stop whatever it's doing. +If the connection is over HTTP, Mercurial recompresses the entire +stream of data using a compression algorithm that gives a etter +compression ratio (the Burrows-Wheeler algorithm from the widely used +\texttt{bzip2} compression package). This combination of algorithm +and compression of the entire stream (instead of a revision at a time) +substantially reduces the number of bytes to be transferred, yielding +better network performance over almost all kinds of network. -In addition to the effect it has on retrieval efficiency, Mercurial's -use of periodic snapshots makes it more robust against partial data -corruption. If a revlog becomes partly corrupted due to a hardware -error or system bug, it's often possible to reconstruct some or most -revisions from the uncorrupted sections of the revlog, both before and -after the corrupted section. This would not be possible with a -delta-only storage model. +(If the connection is over \command{ssh}, Mercurial \emph{doesn't} +recompress the stream, because \command{ssh} can already do this +itself.) \subsection{Read/write ordering and atomicity} @@ -241,15 +309,25 @@ makes for all kinds of nasty and annoying security and administrative problems.) -Mercurial uses a locking mechanism to ensure that only one process can -write to a repository at a time. This locking mechanism is safe even -over filesystems that are notoriously unsafe for locking, such as NFS. -If a repository is locked, a writer will wait for a while to retry if -the repository becomes unlocked, but if the repository remains locked -for too long, the process attempting to write will time out after a -while. This means that your daily automated scripts won't get stuck -forever and pile up if a system crashes unnoticed, for example. (Yes, -the timeout is configurable, from zero to infinity.) +Mercurial uses locks to ensure that only one process can write to a +repository at a time (the locking mechanism is safe even over +filesystems that are notoriously hostile to locking, such as NFS). If +a repository is locked, a writer will wait for a while to retry if the +repository becomes unlocked, but if the repository remains locked for +too long, the process attempting to write will time out after a while. +This means that your daily automated scripts won't get stuck forever +and pile up if a system crashes unnoticed, for example. (Yes, the +timeout is configurable, from zero to infinity.) + +\subsubsection{Safe dirstate access} + +As with revision data, Mercurial doesn't take a lock to read the +dirstate file; it does acquire a lock to write it. To avoid the +possibility of reading a partially written copy of the dirstate file, +Mercurial writes to a file with a unique name in the same directory as +the dirstate file, then renames the temporary file atomically to +\filename{dirstate}. The file named \filename{dirstate} is thus +guaranteed to be complete, not partially written.
--- a/en/metadata.svg Fri Nov 10 12:42:00 2006 -0800 +++ b/en/metadata.svg Fri Nov 10 15:09:49 2006 -0800 @@ -67,7 +67,7 @@ id="layer1"> <path style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#a7a7a7;stroke-width:1.5;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:4.5, 1.5;stroke-dashoffset:0;stroke-opacity:1;display:inline" - d="M 326.94646,471.18359 L 326.94646,510.98123" + d="M 326.94646,467.18359 L 326.94646,510.98123" id="path1910" inkscape:connector-type="polyline" inkscape:connection-end="#rect2962" @@ -87,8 +87,8 @@ inkscape:connection-end="#rect3038" inkscape:connection-start="#rect2962" /> <path - style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#484848;stroke-width:1.5;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:4.5,1.5;stroke-dashoffset:0" - d="M 254.23217,471.18359 L 254.23216,510.98123" + style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#484848;stroke-width:1.5;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:4.5, 1.5;stroke-dashoffset:0;stroke-opacity:1" + d="M 254.23217,467.18359 L 254.23216,510.98123" id="path3088" inkscape:connector-type="polyline" inkscape:connection-start="#rect1872" @@ -113,52 +113,52 @@ width="51.42857" height="20" x="228.51788" - y="450.68359" /> + y="446.68359" /> <rect style="fill:#cacbfb;fill-opacity:1;stroke:#595959;stroke-width:1;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" id="rect2764" width="51.42857" height="20" x="301.23218" - y="450.68359" /> + y="446.68359" /> <rect style="fill:#cacbfb;fill-opacity:1;stroke:#595959;stroke-width:1;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" id="rect2766" width="51.42857" height="20" x="155.80359" - y="450.68359" /> + y="446.68359" /> <rect style="fill:#cacbfb;fill-opacity:1;stroke:#595959;stroke-width:1;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1" id="rect2768" width="51.42857" height="20" x="83.089294" - y="450.68359" /> + y="446.68359" /> <path style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-opacity:1" - d="M 135.01786,460.68359 L 155.30359,460.68359" + d="M 135.01786,456.68359 L 155.30359,456.68359" id="path2770" inkscape:connector-type="polyline" inkscape:connection-start="#rect2768" inkscape:connection-end="#rect2766" /> <path style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-opacity:1" - d="M 207.73216,460.68359 L 228.01788,460.68359" + d="M 207.73216,456.68359 L 228.01788,456.68359" id="path2772" inkscape:connector-type="polyline" inkscape:connection-start="#rect2766" inkscape:connection-end="#rect1872" /> <path style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-opacity:1" - d="M 280.44645,460.68359 L 300.73218,460.68359" + d="M 280.44645,456.68359 L 300.73218,456.68359" id="path2774" inkscape:connector-type="polyline" inkscape:connection-start="#rect1872" inkscape:connection-end="#rect2764" /> <path style="fill:none;fill-opacity:0.75;fill-rule:evenodd;stroke:#747474;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;marker-end:url(#Arrow1Mend);stroke-miterlimit:4;stroke-dasharray:3, 3;stroke-dashoffset:0;stroke-opacity:1" - d="M 62.303571,460.68359 L 82.589293,460.68359" + d="M 62.303571,456.68359 L 82.589294,456.68359" id="path2778" inkscape:connector-type="polyline" inkscape:connection-end="#rect2768" /> @@ -298,12 +298,12 @@ xml:space="preserve" style="font-size:12px;font-style:normal;font-weight:normal;fill:black;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;font-family:Times New Roman" x="82.072548" - y="436.64789" + y="432.64789" id="text3094"><tspan sodipodi:role="line" id="tspan3096" x="82.072548" - y="436.64789">Changelog</tspan></text> + y="432.64789">Changelog</tspan></text> <text xml:space="preserve" style="font-size:12px;font-style:normal;font-weight:normal;fill:black;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;font-family:Times New Roman"