hgbook: comparison of en/concepts.tex @ 115:b74102b56df5
Wow! Lots more work detailing the working directory, merging, etc.
author | Bryan O'Sullivan <bos@serpentine.com> |
---|---|
date | Mon, 13 Nov 2006 16:19:48 -0800 |
parents | a0f57b3e677e |
children | ca99f247899e |
114:ccff2b25478e | 115:b74102b56df5 |
---|---|
202 error or system bug, it's often possible to reconstruct some or most | 202 error or system bug, it's often possible to reconstruct some or most |
203 revisions from the uncorrupted sections of the revlog, both before and | 203 revisions from the uncorrupted sections of the revlog, both before and |
204 after the corrupted section. This would not be possible with a | 204 after the corrupted section. This would not be possible with a |
205 delta-only storage model. | 205 delta-only storage model. |
206 | 206 |
207 \section{Revision history, branching, | |
208 and merging} | |
209 | |
210 Every entry in a Mercurial revlog knows the identity of its immediate | |
211 ancestor revision, usually referred to as its \emph{parent}. In fact, | |
212 a revision contains room for not one parent, but two. Mercurial uses | |
213 a special hash, called the ``null ID'', to represent the idea ``there | |
214 is no parent here''. This hash is simply a string of zeroes. | |
215 | |
216 In figure~\ref{fig:concepts:revlog}, you can see an example of the | |
217 conceptual structure of a revlog. Filelogs, manifests, and changelogs | |
218 all have this same structure; they differ only in the kind of data | |
219 stored in each delta or snapshot. | |
220 | |
221 The first revision in a revlog (at the bottom of the image) has the | |
222 null ID in both of its parent slots. For a ``normal'' revision, its | |
223 first parent slot contains the ID of its parent revision, and its | |
224 second contains the null ID, indicating that the revision has only one | |
225 real parent. Any two revisions that have the same parent ID are | |
226 branches. A revision that represents a merge between branches has two | |
227 normal revision IDs in its parent slots. | |
228 | |
229 \begin{figure}[ht] | |
230 \centering | |
231 \grafix{revlog} | |
232 \caption{The conceptual structure of a revlog} | |
233 \label{fig:concepts:revlog} | |
234 \end{figure} | |
235 | |
207 \section{The working directory} | 236 \section{The working directory} |
208 | 237 |
209 In the working directory, Mercurial stores a snapshot of the files | 238 In the working directory, Mercurial stores a snapshot of the files |
210 from the repository as of a particular changeset. | 239 from the repository as of a particular changeset. |
211 | 240 |
264 already tracking; the new changeset will have the parents of the | 293 already tracking; the new changeset will have the parents of the |
265 working directory as its parents. | 294 working directory as its parents. |
266 | 295 |
267 After a commit, Mercurial will update the parents of the working | 296 After a commit, Mercurial will update the parents of the working |
268 directory, so that the first parent is the ID of the new changeset, | 297 directory, so that the first parent is the ID of the new changeset, |
269 and the second is the null ID. This is illustrated in | 298 and the second is the null ID. This is shown in |
270 figure~\ref{fig:concepts:wdir-after-commit}. | 299 figure~\ref{fig:concepts:wdir-after-commit}. Mercurial doesn't touch |
271 | 300 any of the files in the working directory when you commit; it just |
272 \subsection{Other contents of the dirstate} | 301 modifies the dirstate to note its new parents. |
273 | 302 |
274 Because Mercurial doesn't force you to tell it when you're modifying a | 303 \subsection{Creating a new head} |
275 file, it uses the dirstate to store some extra information so it can | 304 |
276 determine efficiently whether you have modified a file. For each file | 305 It's perfectly normal to update the working directory to a changeset |
277 in the working directory, it stores the time that it last modified the | 306 other than the current tip. For example, you might want to know what |
278 file itself, and the size of the file at that time. | 307 your project looked like last Tuesday, or you could be looking through |
279 | 308 changesets to see which one introduced a bug. In cases like this, the |
280 When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or | 309 natural thing to do is update the working directory to the changeset |
281 \hgcmd{copy} files, the dirstate is updated each time. | 310 you're interested in, and then examine the files in the working |
282 | 311 directory directly to see their contents as they were when you |
283 When Mercurial is checking the states of files in the working | 312 committed that changeset. The effect of this is shown in |
284 directory, it first checks a file's modification time. If that has | 313 figure~\ref{fig:concepts:wdir-pre-branch}. |
285 not changed, the file must not have been modified. If the file's size | 314 |
286 has changed, the file must have been modified. If the modification | 315 \begin{figure}[ht] |
287 time has changed, but the size has not, only then does Mercurial need | 316 \centering |
288 to read the actual contents of the file to see if they've changed. | 317 \grafix{wdir-pre-branch} |
289 Storing these few extra pieces of information dramatically reduces the | 318 \caption{The working directory, updated to an older changeset} |
290 amount of data that Mercurial needs to read, which yields large | 319 \label{fig:concepts:wdir-pre-branch} |
291 performance improvements compared to other revision control systems. | 320 \end{figure} |
292 | 321 |
293 \section{Revision history, branching, | 322 Having updated the working directory to an older changeset, what |
294 and merging} | 323 happens if you make some changes, and then commit? Mercurial behaves |
295 | 324 in the same way as I outlined above. The parents of the working |
296 Every entry in a Mercurial revlog knows the identity of its immediate | 325 directory become the parents of the new changeset. This new changeset |
297 ancestor revision, usually referred to as its \emph{parent}. In fact, | 326 has no children, so it becomes the new tip. And the repository now |
298 a revision contains room for not one parent, but two. Mercurial uses | 327 contains two changesets that have no children; we call these |
299 a special hash, called the ``null ID'', to represent the idea ``there | 328 \emph{heads}. You can see the structure that this creates in |
300 is no parent here''. This hash is simply a string of zeroes. | 329 figure~\ref{fig:concepts:wdir-branch}. |
301 | 330 |
302 In figure~\ref{fig:concepts:revlog}, you can see an example of the | 331 \begin{figure}[ht] |
303 conceptual structure of a revlog. Filelogs, manifests, and changelogs | 332 \centering |
304 all have this same structure; they differ only in the kind of data | 333 \grafix{wdir-branch} |
305 stored in each delta or snapshot. | 334 \caption{After a commit made while synced to an older changeset} |
306 | 335 \label{fig:concepts:wdir-branch} |
307 The first revision in a revlog (at the bottom of the image) has the | 336 \end{figure} |
308 null ID in both of its parent slots. For a ``normal'' revision, its | 337 |
309 first parent slot contains the ID of its parent revision, and its | 338 \begin{note} |
310 second contains the null ID, indicating that the revision has only one | 339 If you're new to Mercurial, you should keep in mind a common |
311 real parent. Any two revisions that have the same parent ID are | 340 ``error'', which is to use the \hgcmd{pull} command without any |
312 branches. A revision that represents a merge between branches has two | 341 options. By default, the \hgcmd{pull} command \emph{does not} |
313 normal revision IDs in its parent slots. | 342 update the working directory, so you'll bring new changesets into |
314 | 343 your repository, but the working directory will stay synced at the |
315 \begin{figure}[ht] | 344 same changeset as before the pull. If you make some changes and |
316 \centering | 345 commit afterwards, you'll thus create a new head, because your |
317 \grafix{revlog} | 346 working directory isn't synced to whatever the current tip is. |
318 \caption{} | 347 |
319 \label{fig:concepts:revlog} | 348 I put the word ``error'' in quotes because all that you need to do |
320 \end{figure} | 349 to rectify this situation is \hgcmd{merge}, then \hgcmd{commit}. In |
350 other words, this almost never has negative consequences; it just | |
351 surprises people. I'll discuss other ways to avoid this behaviour, | |
352 and why Mercurial behaves in this initially surprising way, later | |
353 on. | |
354 \end{note} | |
355 | |
356 \subsection{Merging heads} | |
357 | |
358 When you run the \hgcmd{merge} command, Mercurial leaves the first | |
359 parent of the working directory unchanged, and sets the second parent | |
360 to the changeset you're merging with, as shown in | |
361 figure~\ref{fig:concepts:wdir-merge}. | |
362 | |
363 \begin{figure}[ht] | |
364 \centering | |
365 \grafix{wdir-merge} | |
366 \caption{Merging two heads} | |
367 \label{fig:concepts:wdir-merge} | |
368 \end{figure} | |
369 | |
370 Mercurial also has to modify the working directory, to merge the files | |
371 managed in the two changesets. Simplified a little, the merging | |
372 process goes like this, for every file in the manifests of both | |
373 changesets. | |
374 \begin{itemize} | |
375 \item If neither changeset has modified a file, do nothing with that | |
376 file. | |
377 \item If one changeset has modified a file, and the other hasn't, | |
378 create the modified copy of the file in the working directory. | |
379 \item If one changeset has removed a file, and the other hasn't modified | |
380 it (or has also removed it), delete the file from the working directory. | |
381 \item If one changeset has removed a file, but the other has modified | |
382 the file, ask the user what to do: keep the modified file, or remove | |
383 it? | |
384 \item If both changesets have modified a file, invoke an external | |
385 merge program to choose the new contents for the merged file. This | |
386 may require input from the user. | |
387 \item If one changeset has modified a file, and the other has renamed | |
388 or copied the file, make sure that the changes follow the new name | |
389 of the file. | |
390 \end{itemize} | |
391 There are more details---merging has plenty of corner cases---but | |
392 these are the most common choices that are involved in a merge. As | |
393 you can see, most cases are completely automatic, and indeed most | |
394 merges finish automatically, without requiring your input to resolve | |
395 any conflicts. | |
396 | |
397 When you're thinking about what happens when you commit after a merge, | |
398 once again the working directory is ``the changeset I'm about to | |
399 commit''. After the \hgcmd{merge} command completes, the working | |
400 directory has two parents; these will become the parents of the new | |
401 changeset. | |
402 | |
403 Mercurial lets you perform multiple merges, but you must commit the | |
404 results of each individual merge as you go. This is necessary because | |
405 Mercurial tracks at most two parents for each revision and for the working | |
406 directory. While it would be technically possible to merge multiple | |
407 changesets at once, the prospect of user confusion and making a | |
408 terrible mess of a merge immediately becomes overwhelming. | |
321 | 409 |
322 \section{Other interesting design features} | 410 \section{Other interesting design features} |
323 | 411 |
324 In the sections above, I've tried to highlight some of the most | 412 In the sections above, I've tried to highlight some of the most |
325 important aspects of Mercurial's design, to illustrate that it pays | 413 important aspects of Mercurial's design, to illustrate that it pays |
458 gives the highest performance while deferring most book-keeping to the | 546 gives the highest performance while deferring most book-keeping to the |
459 operating system. An alternative scheme would most likely reduce | 547 operating system. An alternative scheme would most likely reduce |
460 performance and increase the complexity of the software, each of which | 548 performance and increase the complexity of the software, each of which |
461 is much more important to the ``feel'' of day-to-day use. | 549 is much more important to the ``feel'' of day-to-day use. |
462 | 550 |
551 \subsection{Other contents of the dirstate} | |
552 | |
553 Because Mercurial doesn't force you to tell it when you're modifying a | |
554 file, it uses the dirstate to store some extra information so it can | |
555 determine efficiently whether you have modified a file. For each file | |
556 in the working directory, it stores the time that it last modified the | |
557 file itself, and the size of the file at that time. | |
558 | |
559 When you explicitly \hgcmd{add}, \hgcmd{remove}, \hgcmd{rename} or | |
560 \hgcmd{copy} files, the dirstate is updated each time. | |
561 | |
562 When Mercurial is checking the states of files in the working | |
563 directory, it first checks a file's modification time. If that has | |
564 not changed, the file must not have been modified. If the file's size | |
565 has changed, the file must have been modified. If the modification | |
566 time has changed, but the size has not, only then does Mercurial need | |
567 to read the actual contents of the file to see if they've changed. | |
568 Storing these few extra pieces of information dramatically reduces the | |
569 amount of data that Mercurial needs to read, which yields large | |
570 performance improvements compared to other revision control systems. | |
571 | |
463 %%% Local Variables: | 572 %%% Local Variables: |
464 %%% mode: latex | 573 %%% mode: latex |
465 %%% TeX-master: "00book" | 574 %%% TeX-master: "00book" |
466 %%% End: | 575 %%% End: |
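
The parent-slot rules in the "Revision history, branching, and merging" section can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `Entry` record, the short string IDs, and the helper names are hypothetical and do not reflect Mercurial's on-disk revlog format or its API. Only the convention that the null ID is a string of zeroes comes from the text (written here as 40 hex zeroes).

```python
from __future__ import annotations

from dataclasses import dataclass

# The null ID: a string of zeroes meaning "there is no parent here".
NULL_ID = "0" * 40


@dataclass(frozen=True)
class Entry:
    """Hypothetical revlog entry: its own ID plus two parent slots."""
    node: str
    p1: str = NULL_ID
    p2: str = NULL_ID

    def real_parents(self) -> list[str]:
        # Drop slots that hold the null ID.
        return [p for p in (self.p1, self.p2) if p != NULL_ID]


def kind(entry: Entry) -> str:
    """Classify an entry by how many parent slots hold a non-null ID."""
    return {0: "root", 1: "normal", 2: "merge"}[len(entry.real_parents())]


def branch_points(entries: list[Entry]) -> dict[str, list[str]]:
    """Map each parent to its children, keeping only parents with 2+ children."""
    children: dict[str, list[str]] = {}
    for e in entries:
        for p in e.real_parents():
            children.setdefault(p, []).append(e.node)
    return {p: kids for p, kids in children.items() if len(kids) > 1}


def heads(entries: list[Entry]) -> list[str]:
    """Entries that no other entry names as a parent, i.e. childless ones."""
    used = {p for e in entries for p in e.real_parents()}
    return [e.node for e in entries if e.node not in used]


history = [
    Entry("a"),                  # first revision: both slots null
    Entry("b", p1="a"),          # normal revision
    Entry("c", p1="a"),          # shares parent "a" with "b": a branch
    Entry("d", p1="b", p2="c"),  # merge of the two branches
]
print([kind(e) for e in history])  # ['root', 'normal', 'normal', 'merge']
print(branch_points(history))      # {'a': ['b', 'c']}
print(heads(history))              # ['d']
```

Dropping the merge entry `d` from `history` leaves two childless revisions, `b` and `c`, which is exactly the two-heads situation the new "Creating a new head" subsection describes.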
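
How the working directory's parents change on update, commit, and merge, as described in the new "Creating a new head" and "Merging heads" subsections, can be modelled as a small state machine. This is a hypothetical sketch: `ToyRepo`, its methods, and the generated IDs `c1`, `c2`, and so on are invented for illustration, and the model tracks only parent IDs, never file contents.

```python
from __future__ import annotations

from dataclasses import dataclass, field

NULL_ID = "0" * 40


@dataclass
class ToyRepo:
    """Hypothetical model: changeset parents plus the working directory's parents."""
    parents: dict[str, tuple[str, str]] = field(default_factory=dict)
    wdir: tuple[str, str] = (NULL_ID, NULL_ID)

    def update(self, node: str) -> None:
        # Syncing to a changeset makes it the working directory's only parent.
        self.wdir = (node, NULL_ID)

    def commit(self) -> str:
        # The new changeset takes the working directory's parents as its own.
        node = f"c{len(self.parents) + 1}"
        self.parents[node] = self.wdir
        # Afterwards the first parent is the new changeset; the second is null.
        self.wdir = (node, NULL_ID)
        return node

    def merge(self, other: str) -> None:
        # Only two parent slots exist, so an uncommitted merge blocks another.
        if self.wdir[1] != NULL_ID:
            raise RuntimeError("commit the previous merge first")
        # The first parent stays put; the second becomes the changeset merged in.
        self.wdir = (self.wdir[0], other)

    def heads(self) -> list[str]:
        # Heads are changesets that no other changeset names as a parent.
        used = {p for pair in self.parents.values() for p in pair if p != NULL_ID}
        return [n for n in self.parents if n not in used]


repo = ToyRepo()
c1 = repo.commit()
c2 = repo.commit()
repo.update(c1)        # look at an older changeset...
c3 = repo.commit()     # ...and commit: c3 becomes a second head
print(repo.heads())    # ['c2', 'c3']
repo.merge(c2)
print(repo.wdir)       # ('c3', 'c2')
c4 = repo.commit()
print(repo.parents[c4], repo.heads())  # ('c3', 'c2') ['c4']
```

The `RuntimeError` in `merge` mirrors the text's point that each merge must be committed before the next one, because there is no third parent slot to record it in.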
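
The simplified per-file merge rules in the "Merging heads" subsection can be expressed as a single decision function. The text does not say what "modified" is measured against; this sketch assumes it means "differs from the two changesets' common ancestor". The function name `merge_action`, the action labels, and the use of `None` for "file absent" are likewise hypothetical, and renames and copies are not modelled.

```python
from __future__ import annotations

from typing import Optional


def merge_action(base: Optional[str], local: Optional[str],
                 other: Optional[str]) -> str:
    """Choose what to do with one file during a merge (simplified).

    The arguments are the file's contents in the common ancestor, in the
    working directory's first parent, and in the changeset being merged;
    None means the file does not exist in that revision.
    """
    local_changed = local != base
    other_changed = other != base
    if not local_changed and not other_changed:
        return "leave alone"          # neither side modified the file
    if local_changed != other_changed:
        changed = local if local_changed else other
        # One side removed it: delete it. One side edited it: use that copy.
        return "remove" if changed is None else "use modified copy"
    if local is None and other is None:
        return "remove"               # both sides removed the file
    if local is None or other is None:
        return "ask user"             # removed on one side, modified on the other
    return "run merge program"        # both sides modified the file


print(merge_action("v1", "v2", "v1"))  # use modified copy
print(merge_action("v1", None, "v2"))  # ask user
print(merge_action("v1", "v2", "v3"))  # run merge program
```

Only the last two outcomes can need input from the user, which is why, as the text notes, most merges finish without any.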
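
The check described in "Other contents of the dirstate" orders its tests from cheapest to most expensive: modification time, then size, and only then the file's contents. A minimal Python sketch follows. `DirstateEntry`, `quick_check`, and `is_modified` are hypothetical names, and the `committed` argument is a stand-in for the file's contents as recorded in the repository, not something the dirstate itself holds. The sketch also ignores timestamp-granularity issues.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DirstateEntry:
    """Hypothetical per-file record: modification time and size when last seen."""
    mtime: float
    size: int


def quick_check(entry: DirstateEntry, mtime: float, size: int) -> str:
    """Decide from cheap metadata alone, in the order the text describes."""
    if mtime == entry.mtime:
        return "clean"      # modification time unchanged: not modified
    if size != entry.size:
        return "modified"   # size changed: must have been modified
    return "compare"        # mtime changed but size did not: read the file


def is_modified(path: str, entry: DirstateEntry, committed: bytes) -> bool:
    """`committed` stands in for the file's contents in the repository."""
    st = os.stat(path)
    verdict = quick_check(entry, st.st_mtime, st.st_size)
    if verdict != "compare":
        return verdict == "modified"
    # Only in this case are the file's contents actually read.
    with open(path, "rb") as f:
        return f.read() != committed
```

Only the `"compare"` outcome costs a read of the file; the other two are answered from a single `os.stat` call, which is where the savings described in the text come from.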