diff en/concepts.tex @ 115:b74102b56df5

Wow! Lots more work detailing the working directory, merging, etc.
author Bryan O'Sullivan <bos@serpentine.com>
date Mon, 13 Nov 2006 16:19:48 -0800
parents a0f57b3e677e
children ca99f247899e
--- a/en/concepts.tex	Mon Nov 13 14:34:57 2006 -0800
+++ b/en/concepts.tex	Mon Nov 13 16:19:48 2006 -0800
@@ -204,6 +204,35 @@
 after the corrupted section.  This would not be possible with a
 delta-only storage model.
 
+\section{Revision history, branching,
+  and merging}
+
+Every entry in a Mercurial revlog knows the identity of its immediate
+ancestor revision, usually referred to as its \emph{parent}.  In fact,
+a revision contains room for not one parent, but two.  Mercurial uses
+a special hash, called the ``null ID'', to represent the idea ``there
+is no parent here''.  This hash is simply a string of zeroes.
+
+In figure~\ref{fig:concepts:revlog}, you can see an example of the
+conceptual structure of a revlog.  Filelogs, manifests, and changelogs
+all have this same structure; they differ only in the kind of data
+stored in each delta or snapshot.
+
+The first revision in a revlog (at the bottom of the image) has the
+null ID in both of its parent slots.  For a ``normal'' revision, its
+first parent slot contains the ID of its parent revision, and its
+second contains the null ID, indicating that the revision has only one
+real parent.  Any two revisions that have the same parent ID are
+branches.  A revision that represents a merge between branches has two
+normal revision IDs in its parent slots.
+
+\begin{figure}[ht]
+  \centering
+  \grafix{revlog}
+  \caption{}
+  \label{fig:concepts:revlog}
+\end{figure}
+
 \section{The working directory}
 
 In the working directory, Mercurial stores a snapshot of the files
@@ -266,59 +295,118 @@
 
 After a commit, Mercurial will update the parents of the working
 directory, so that the first parent is the ID of the new changeset,
-and the second is the null ID.  This is illustrated in
-figure~\ref{fig:concepts:wdir-after-commit}.
-
-\subsection{Other contents of the dirstate}
+and the second is the null ID.  This is shown in
+figure~\ref{fig:concepts:wdir-after-commit}.  Mercurial doesn't touch
+any of the files in the working directory when you commit; it just
+modifies the dirstate to note its new parents.
 
-Because Mercurial doesn't force you to tell it when you're modifying a
-file, it uses the dirstate to store some extra information so it can
-determine efficiently whether you have modified a file.  For each file
-in the working directory, it stores the time that it last modified the
-file itself, and the size of the file at that time.  
-
-When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or
-\hgcmd{copy} files, the dirstate is updated each time.
+\subsection{Creating a new head}
 
-When Mercurial is checking the states of files in the working
-directory, it first checks a file's modification time.  If that has
-not changed, the file must not have been modified.  If the file's size
-has changed, the file must have been modified.  If the modification
-time has changed, but the size has not, only then does Mercurial need
-to read the actual contents of the file to see if they've changed.
-Storing these few extra pieces of information dramatically reduces the
-amount of data that Mercurial needs to read, which yields large
-performance improvements compared to other revision control systems.
-
-\section{Revision history, branching,
-  and merging}
+It's perfectly normal to update the working directory to a changeset
+other than the current tip.  For example, you might want to know what
+your project looked like last Tuesday, or you could be looking through
+changesets to see which one introduced a bug.  In cases like this, the
+natural thing to do is update the working directory to the changeset
+you're interested in, and then examine the files in the working
+directory directly to see their contents as they were when you
+committed that changeset.  The effect of this is shown in
+figure~\ref{fig:concepts:wdir-pre-branch}.
 
-Every entry in a Mercurial revlog knows the identity of its immediate
-ancestor revision, usually referred to as its \emph{parent}.  In fact,
-a revision contains room for not one parent, but two.  Mercurial uses
-a special hash, called the ``null ID'', to represent the idea ``there
-is no parent here''.  This hash is simply a string of zeroes.
+\begin{figure}[ht]
+  \centering
+  \grafix{wdir-pre-branch}
+  \caption{The working directory, updated to an older changeset}
+  \label{fig:concepts:wdir-pre-branch}
+\end{figure}
 
-In figure~\ref{fig:concepts:revlog}, you can see an example of the
-conceptual structure of a revlog.  Filelogs, manifests, and changelogs
-all have this same structure; they differ only in the kind of data
-stored in each delta or snapshot.
-
-The first revision in a revlog (at the bottom of the image) has the
-null ID in both of its parent slots.  For a ``normal'' revision, its
-first parent slot contains the ID of its parent revision, and its
-second contains the null ID, indicating that the revision has only one
-real parent.  Any two revisions that have the same parent ID are
-branches.  A revision that represents a merge between branches has two
-normal revision IDs in its parent slots.
+Having updated the working directory to an older changeset, what
+happens if you make some changes, and then commit?  Mercurial behaves
+in the same way as I outlined above.  The parents of the working
+directory become the parents of the new changeset.  This new changeset
+has no children, so it becomes the new tip.  And the repository now
+contains two changesets that have no children; we call these
+\emph{heads}.  You can see the structure that this creates in
+figure~\ref{fig:concepts:wdir-branch}.
 
 \begin{figure}[ht]
   \centering
-  \grafix{revlog}
-  \caption{}
-  \label{fig:concepts:revlog}
+  \grafix{wdir-branch}
+  \caption{After a commit made while synced to an older changeset}
+  \label{fig:concepts:wdir-branch}
+\end{figure}
+
+\begin{note}
+  If you're new to Mercurial, you should keep in mind a common
+  ``error'', which is to use the \hgcmd{pull} command without any
+  options.  By default, the \hgcmd{pull} command \emph{does not}
+  update the working directory, so you'll bring new changesets into
+  your repository, but the working directory will stay synced at the
+  same changeset as before the pull.  If you make some changes and
+  commit afterwards, you'll thus create a new head, because your
+  working directory isn't synced to whatever the current tip is.
+
+  I put the word ``error'' in quotes because all that you need to do
+  to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}.  In
+  other words, this almost never has negative consequences; it just
+  surprises people.  I'll discuss other ways to avoid this behaviour,
+  and why Mercurial behaves in this initially surprising way, later
+  on.
+\end{note}
+
+\subsection{Merging heads}
+
+When you run the \hgcmd{merge} command, Mercurial leaves the first
+parent of the working directory unchanged, and sets the second parent
+to the changeset you're merging with, as shown in
+figure~\ref{fig:concepts:wdir-merge}.
+
+\begin{figure}[ht]
+  \centering
+  \grafix{wdir-merge}
+  \caption{Merging two heads}
+  \label{fig:concepts:wdir-merge}
 \end{figure}
 
+Mercurial also has to modify the working directory, to merge the files
+managed in the two changesets.  Simplified a little, the merging
+process goes like this, for every file in the manifests of both
+changesets:
+\begin{itemize}
+\item If neither changeset has modified a file, do nothing with that
+  file.
+\item If one changeset has modified a file, and the other hasn't,
+  create the modified copy of the file in the working directory.
+\item If one changeset has removed a file, and the other hasn't (or
+  has also removed it), delete the file from the working directory.
+\item If one changeset has removed a file, but the other has modified
+  the file, ask the user what to do: keep the modified file, or remove
+  it?
+\item If both changesets have modified a file, invoke an external
+  merge program to choose the new contents for the merged file.  This
+  may require input from the user.
+\item If one changeset has modified a file, and the other has renamed
+  or copied the file, make sure that the changes follow the new name
+  of the file.
+\end{itemize}
+There are more details---merging has plenty of corner cases---but
+these are the most common choices that are involved in a merge.  As
+you can see, most cases are completely automatic, and indeed most
+merges finish automatically, without requiring your input to resolve
+any conflicts.
+
+When you're thinking about what happens when you commit after a merge,
+once again the working directory is ``the changeset I'm about to
+commit''.  After the \hgcmd{merge} command completes, the working
+directory has two parents; these will become the parents of the new
+changeset.
+
+Mercurial lets you perform multiple merges, but you must commit the
+results of each individual merge as you go.  This is necessary because
+Mercurial tracks only two parents, both for revisions and for the working
+directory.  While it would be technically possible to merge multiple
+changesets at once, the prospect of user confusion and making a
+terrible mess of a merge immediately becomes overwhelming.
+
 \section{Other interesting design features}
 
 In the sections above, I've tried to highlight some of the most
@@ -460,6 +548,27 @@
 performance and increase the complexity of the software, each of which
 is much more important to the ``feel'' of day-to-day use.
 
+\subsection{Other contents of the dirstate}
+
+Because Mercurial doesn't force you to tell it when you're modifying a
+file, it uses the dirstate to store some extra information so it can
+determine efficiently whether you have modified a file.  For each file
+in the working directory, it stores the time that it last modified the
+file itself, and the size of the file at that time.  
+
+When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or
+\hgcmd{copy} files, the dirstate is updated each time.
+
+When Mercurial is checking the states of files in the working
+directory, it first checks a file's modification time.  If that has
+not changed, the file must not have been modified.  If the file's size
+has changed, the file must have been modified.  If the modification
+time has changed, but the size has not, only then does Mercurial need
+to read the actual contents of the file to see if they've changed.
+Storing these few extra pieces of information dramatically reduces the
+amount of data that Mercurial needs to read, which yields large
+performance improvements compared to other revision control systems.
+
 %%% Local Variables: 
 %%% mode: latex
 %%% TeX-master: "00book"