comparison en/concepts.tex @ 115:b74102b56df5

Wow! Lots more work detailing the working directory, merging, etc.
author Bryan O'Sullivan <bos@serpentine.com>
date Mon, 13 Nov 2006 16:19:48 -0800
parents a0f57b3e677e
children ca99f247899e
114:ccff2b25478e 115:b74102b56df5
202 error or system bug, it's often possible to reconstruct some or most 202 error or system bug, it's often possible to reconstruct some or most
203 revisions from the uncorrupted sections of the revlog, both before and 203 revisions from the uncorrupted sections of the revlog, both before and
204 after the corrupted section. This would not be possible with a 204 after the corrupted section. This would not be possible with a
205 delta-only storage model. 205 delta-only storage model.
206 206
207 \section{Revision history, branching,
208 and merging}
209
210 Every entry in a Mercurial revlog knows the identity of its immediate
211 ancestor revision, usually referred to as its \emph{parent}. In fact,
212 a revision contains room for not one parent, but two. Mercurial uses
213 a special hash, called the ``null ID'', to represent the idea ``there
214 is no parent here''. This hash is simply a string of zeroes.
215
216 In figure~\ref{fig:concepts:revlog}, you can see an example of the
217 conceptual structure of a revlog. Filelogs, manifests, and changelogs
218 all have this same structure; they differ only in the kind of data
219 stored in each delta or snapshot.
220
221 The first revision in a revlog (at the bottom of the image) has the
222 null ID in both of its parent slots. For a ``normal'' revision, its
223 first parent slot contains the ID of its parent revision, and its
224 second contains the null ID, indicating that the revision has only one
225 real parent. Any two revisions that have the same parent ID are
226 branches. A revision that represents a merge between branches has two
227 normal revision IDs in its parent slots.
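
As a rough illustration of the parent-slot rules above, here is a minimal Python sketch. It is not Mercurial's own code: the names \texttt{NULL\_ID}, \texttt{real\_parents} and \texttt{classify} are hypothetical, and the 40-digit width of the null ID is an assumption, since the text only says that it is a string of zeroes.

```python
# Hypothetical sketch, not Mercurial's implementation.
# Assumption: the null ID is written as 40 hex zeroes.
NULL_ID = "0" * 40


def real_parents(p1, p2):
    """Return the parent slots that hold something other than the null ID."""
    return [p for p in (p1, p2) if p != NULL_ID]


def classify(p1, p2):
    """Classify a revision by how many real parents it has."""
    kinds = ("root", "normal", "merge")  # 0, 1 or 2 real parents
    return kinds[len(real_parents(p1, p2))]
```

With this sketch, the first revision in a revlog classifies as \texttt{root}, a revision with one real parent as \texttt{normal}, and a revision with two real parents as \texttt{merge}.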
228
229 \begin{figure}[ht]
230 \centering
231 \grafix{revlog}
232 \caption{}
233 \label{fig:concepts:revlog}
234 \end{figure}
235
207 \section{The working directory} 236 \section{The working directory}
208 237
209 In the working directory, Mercurial stores a snapshot of the files 238 In the working directory, Mercurial stores a snapshot of the files
210 from the repository as of a particular changeset. 239 from the repository as of a particular changeset.
211 240
264 already tracking; the new changeset will have the parents of the 293 already tracking; the new changeset will have the parents of the
265 working directory as its parents. 294 working directory as its parents.
266 295
267 After a commit, Mercurial will update the parents of the working 296 After a commit, Mercurial will update the parents of the working
268 directory, so that the first parent is the ID of the new changeset, 297 directory, so that the first parent is the ID of the new changeset,
269 and the second is the null ID. This is illustrated in 298 and the second is the null ID. This is shown in
270 figure~\ref{fig:concepts:wdir-after-commit}. 299 figure~\ref{fig:concepts:wdir-after-commit}. Mercurial doesn't touch
271 300 any of the files in the working directory when you commit; it just
272 \subsection{Other contents of the dirstate} 301 modifies the dirstate to note its new parents.
273 302
274 Because Mercurial doesn't force you to tell it when you're modifying a 303 \subsection{Creating a new head}
275 file, it uses the dirstate to store some extra information so it can 304
276 determine efficiently whether you have modified a file. For each file 305 It's perfectly normal to update the working directory to a changeset
277 in the working directory, it stores the time that it last modified the 306 other than the current tip. For example, you might want to know what
278 file itself, and the size of the file at that time. 307 your project looked like last Tuesday, or you could be looking through
279 308 changesets to see which one introduced a bug. In cases like this, the
280 When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or 309 natural thing to do is update the working directory to the changeset
281 \hgcmd{copy} files, the dirstate is updated each time. 310 you're interested in, and then examine the files in the working
282 311 directory directly to see their contents as they were when you
283 When Mercurial is checking the states of files in the working 312 committed that changeset. The effect of this is shown in
284 directory, it first checks a file's modification time. If that has 313 figure~\ref{fig:concepts:wdir-pre-branch}.
285 not changed, the file must not have been modified. If the file's size 314
286 has changed, the file must have been modified. If the modification 315 \begin{figure}[ht]
287 time has changed, but the size has not, only then does Mercurial need 316 \centering
288 to read the actual contents of the file to see if they've changed. 317 \grafix{wdir-pre-branch}
289 Storing these few extra pieces of information dramatically reduces the 318 \caption{The working directory, updated to an older changeset}
290 amount of data that Mercurial needs to read, which yields large 319 \label{fig:concepts:wdir-pre-branch}
291 performance improvements compared to other revision control systems. 320 \end{figure}
292 321
293 \section{Revision history, branching, 322 Having updated the working directory to an older changeset, what
294 and merging} 323 happens if you make some changes, and then commit? Mercurial behaves
295 324 in the same way as I outlined above. The parents of the working
296 Every entry in a Mercurial revlog knows the identity of its immediate 325 directory become the parents of the new changeset. This new changeset
297 ancestor revision, usually referred to as its \emph{parent}. In fact, 326 has no children, so it becomes the new tip. And the repository now
298 a revision contains room for not one parent, but two. Mercurial uses 327 contains two changesets that have no children; we call these
299 a special hash, called the ``null ID'', to represent the idea ``there 328 \emph{heads}. You can see the structure that this creates in
300 is no parent here''. This hash is simply a string of zeroes. 329 figure~\ref{fig:concepts:wdir-branch}.
301 330
302 In figure~\ref{fig:concepts:revlog}, you can see an example of the 331 \begin{figure}[ht]
303 conceptual structure of a revlog. Filelogs, manifests, and changelogs 332 \centering
304 all have this same structure; they differ only in the kind of data 333 \grafix{wdir-branch}
305 stored in each delta or snapshot. 334 \caption{After a commit made while synced to an older changeset}
306 335 \label{fig:concepts:wdir-branch}
307 The first revision in a revlog (at the bottom of the image) has the 336 \end{figure}
308 null ID in both of its parent slots. For a ``normal'' revision, its 337
309 first parent slot contains the ID of its parent revision, and its 338 \begin{note}
310 second contains the null ID, indicating that the revision has only one 339 If you're new to Mercurial, you should keep in mind a common
311 real parent. Any two revisions that have the same parent ID are 340 ``error'', which is to use the \hgcmd{pull} command without any
312 branches. A revision that represents a merge between branches has two 341 options. By default, the \hgcmd{pull} command \emph{does not}
313 normal revision IDs in its parent slots. 342 update the working directory, so you'll bring new changesets into
314 343 your repository, but the working directory will stay synced at the
315 \begin{figure}[ht] 344 same changeset as before the pull. If you make some changes and
316 \centering 345 commit afterwards, you'll thus create a new head, because your
317 \grafix{revlog} 346 working directory isn't synced to whatever the current tip is.
318 \caption{} 347
319 \label{fig:concepts:revlog} 348 I put the word ``error'' in quotes because all that you need to do
320 \end{figure} 349 to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}. In
350 other words, this almost never has negative consequences; it just
351 surprises people. I'll discuss other ways to avoid this behaviour,
352 and why Mercurial behaves in this initially surprising way, later
353 on.
354 \end{note}
355
356 \subsection{Merging heads}
357
358 When you run the \hgcmd{merge} command, Mercurial leaves the first
359 parent of the working directory unchanged, and sets the second parent
360 to the changeset you're merging with, as shown in
361 figure~\ref{fig:concepts:wdir-merge}.
362
363 \begin{figure}[ht]
364 \centering
365 \grafix{wdir-merge}
366 \caption{Merging two heads}
367 \label{fig:concepts:wdir-merge}
368 \end{figure}
369
370 Mercurial also has to modify the working directory, to merge the files
371 managed in the two changesets. Simplified a little, the merging
372 process goes like this, for every file in the manifests of both
373 changesets.
374 \begin{itemize}
375 \item If neither changeset has modified a file, do nothing with that
376 file.
377 \item If one changeset has modified a file, and the other hasn't,
378 create the modified copy of the file in the working directory.
379 \item If one changeset has removed a file, and the other hasn't (or
380 has also deleted it), delete the file from the working directory.
381 \item If one changeset has removed a file, but the other has modified
382 the file, ask the user what to do: keep the modified file, or remove
383 it?
384 \item If both changesets have modified a file, invoke an external
385 merge program to choose the new contents for the merged file. This
386 may require input from the user.
387 \item If one changeset has modified a file, and the other has renamed
388 or copied the file, make sure that the changes follow the new name
389 of the file.
390 \end{itemize}
391 There are more details---merging has plenty of corner cases---but
392 these are the most common choices that are involved in a merge. As
393 you can see, most cases are completely automatic, and indeed most
394 merges finish automatically, without requiring your input to resolve
395 any conflicts.
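
The per-file choices listed above can be sketched as a small decision function. This is only an illustration, not Mercurial's merge code: the function name \texttt{merge\_action}, the state labels, and the returned strings are all hypothetical. I also assume that each changeset's state for a file is judged relative to some shared starting point, and I leave out the rename and copy case.

```python
# Hypothetical sketch of the simplified merge rules; not Mercurial's code.
# Assumption: each side's state for a file is one of these three labels.
VALID_STATES = {"unchanged", "modified", "removed"}


def merge_action(local, other):
    """Pick an action for one file, given its state in each changeset."""
    if local not in VALID_STATES or other not in VALID_STATES:
        raise ValueError("unknown file state")
    states = {local, other}
    if states == {"unchanged"}:
        return "do nothing"
    if states == {"modified"}:
        return "run merge program"
    if "removed" in states:
        # Removed on one side and modified on the other needs a decision.
        if "modified" in states:
            return "ask user"
        return "delete file"
    # Only remaining case: one side modified, the other unchanged.
    return "take modified copy"
```

Only two of the outcomes, \texttt{ask user} and \texttt{run merge program}, can require your input, which matches the observation that most cases are automatic.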
396
397 When you're thinking about what happens when you commit after a merge,
398 once again the working directory is ``the changeset I'm about to
399 commit''. After the \hgcmd{merge} command completes, the working
400 directory has two parents; these will become the parents of the new
401 changeset.
402
403 Mercurial lets you perform multiple merges, but you must commit the
404 results of each individual merge as you go. This is necessary because
405 Mercurial only tracks two parents for both revisions and the working
406 directory. While it would be technically possible to merge multiple
407 changesets at once, the prospect of user confusion and making a
408 terrible mess of a merge immediately becomes overwhelming.
321 409
322 \section{Other interesting design features} 410 \section{Other interesting design features}
323 411
324 In the sections above, I've tried to highlight some of the most 412 In the sections above, I've tried to highlight some of the most
325 important aspects of Mercurial's design, to illustrate that it pays 413 important aspects of Mercurial's design, to illustrate that it pays
458 gives the highest performance while deferring most book-keeping to the 546 gives the highest performance while deferring most book-keeping to the
459 operating system. An alternative scheme would most likely reduce 547 operating system. An alternative scheme would most likely reduce
460 performance and increase the complexity of the software, each of which 548 performance and increase the complexity of the software, each of which
461 is much more important to the ``feel'' of day-to-day use. 549 is much more important to the ``feel'' of day-to-day use.
462 550
551 \subsection{Other contents of the dirstate}
552
553 Because Mercurial doesn't force you to tell it when you're modifying a
554 file, it uses the dirstate to store some extra information so it can
555 determine efficiently whether you have modified a file. For each file
556 in the working directory, it stores the time that it last modified the
557 file itself, and the size of the file at that time.
558
559 When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or
560 \hgcmd{copy} files, the dirstate is updated each time.
561
562 When Mercurial is checking the states of files in the working
563 directory, it first checks a file's modification time. If that has
564 not changed, the file must not have been modified. If the file's size
565 has changed, the file must have been modified. If the modification
566 time has changed, but the size has not, only then does Mercurial need
567 to read the actual contents of the file to see if they've changed.
568 Storing these few extra pieces of information dramatically reduces the
569 amount of data that Mercurial needs to read, which yields large
570 performance improvements compared to other revision control systems.
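
The order of checks described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dirstate implementation: \texttt{file\_status} and its parameters are hypothetical, the recorded time is assumed to be in nanoseconds, and \texttt{recorded\_content} stands in for whatever reference copy of the file the contents would be compared against.

```python
import os


def file_status(path, recorded_mtime_ns, recorded_size, recorded_content):
    """Decide whether a file looks modified, reading it only if necessary.

    Hypothetical sketch: the three recorded_* values stand in for what
    was stored about the file when it was last known to be clean.
    """
    st = os.stat(path)
    # Same modification time: treat the file as unmodified.
    if st.st_mtime_ns == recorded_mtime_ns:
        return "clean"
    # Different size: the file must have been modified.
    if st.st_size != recorded_size:
        return "modified"
    # Time changed but size did not: only now read the contents.
    with open(path, "rb") as f:
        return "clean" if f.read() == recorded_content else "modified"
```

The first two checks need only a single \texttt{stat} call, so the file's contents are read in just the one ambiguous case.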
571
463 %%% Local Variables: 572 %%% Local Variables:
464 %%% mode: latex 573 %%% mode: latex
465 %%% TeX-master: "00book" 574 %%% TeX-master: "00book"
466 %%% End: 575 %%% End: