comparison en/ch03-concepts.xml @ 753:1c13ed2130a7
Merge with http://hg.serpentine.com/mercurial/book
author | Dongsheng Song <dongsheng.song@gmail.com> |
---|---|
date | Mon, 30 Mar 2009 16:23:33 +0800 |
parents | 7e7c47481e4f 0b45854f0b7b |
children | e9ef075327c1 |
comparison
752:6b1577ef5135 | 753:1c13ed2130a7 |
---|---|
1 <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> | 1 <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> |
2 | 2 |
3 <chapter id="chap.concepts"> | 3 <chapter id="chap:concepts"> |
4 <?dbhtml filename="behind-the-scenes.html"?> | 4 <?dbhtml filename="behind-the-scenes.html"?> |
5 <title>Behind the scenes</title> | 5 <title>Behind the scenes</title> |
6 | 6 |
7 <para>Unlike many revision control systems, the concepts upon which | 7 <para id="x_2e8">Unlike many revision control systems, the concepts upon which |
8 Mercurial is built are simple enough that it's easy to understand | 8 Mercurial is built are simple enough that it's easy to understand |
9 how the software really works. Knowing this certainly isn't | 9 how the software really works. Knowing this certainly isn't |
10 necessary, but I find it useful to have a <quote>mental | 10 necessary, but I find it useful to have a <quote>mental |
11 model</quote> of what's going on.</para> | 11 model</quote> of what's going on.</para> |
12 | 12 |
13 <para>This understanding gives me confidence that Mercurial has been | 13 <para id="x_2e9">This understanding gives me confidence that Mercurial has been |
14 carefully designed to be both <emphasis>safe</emphasis> and | 14 carefully designed to be both <emphasis>safe</emphasis> and |
15 <emphasis>efficient</emphasis>. And just as importantly, if it's | 15 <emphasis>efficient</emphasis>. And just as importantly, if it's |
16 easy for me to retain a good idea of what the software is doing | 16 easy for me to retain a good idea of what the software is doing |
17 when I perform a revision control task, I'm less likely to be | 17 when I perform a revision control task, I'm less likely to be |
18 surprised by its behaviour.</para> | 18 surprised by its behaviour.</para> |
19 | 19 |
20 <para>In this chapter, we'll initially cover the core concepts | 20 <para id="x_2ea">In this chapter, we'll initially cover the core concepts |
21 behind Mercurial's design, then continue to discuss some of the | 21 behind Mercurial's design, then continue to discuss some of the |
22 interesting details of its implementation.</para> | 22 interesting details of its implementation.</para> |
23 | 23 |
24 <sect1> | 24 <sect1> |
25 <title>Mercurial's historical record</title> | 25 <title>Mercurial's historical record</title> |
26 | 26 |
27 <sect2> | 27 <sect2> |
28 <title>Tracking the history of a single file</title> | 28 <title>Tracking the history of a single file</title> |
29 | 29 |
30 <para>When Mercurial tracks modifications to a file, it stores | 30 <para id="x_2eb">When Mercurial tracks modifications to a file, it stores |
31 the history of that file in a metadata object called a | 31 the history of that file in a metadata object called a |
32 <emphasis>filelog</emphasis>. Each entry in the filelog | 32 <emphasis>filelog</emphasis>. Each entry in the filelog |
33 contains enough information to reconstruct one revision of the | 33 contains enough information to reconstruct one revision of the |
34 file that is being tracked. Filelogs are stored as files in | 34 file that is being tracked. Filelogs are stored as files in |
35 the <filename role="special" | 35 the <filename role="special" |
36 class="directory">.hg/store/data</filename> directory. A | 36 class="directory">.hg/store/data</filename> directory. A |
37 filelog contains two kinds of information: revision data, and | 37 filelog contains two kinds of information: revision data, and |
38 an index to help Mercurial to find a revision | 38 an index to help Mercurial to find a revision |
39 efficiently.</para> | 39 efficiently.</para> |
40 | 40 |
41 <para>A file that is large, or has a lot of history, has its | 41 <para id="x_2ec">A file that is large, or has a lot of history, has its |
42 filelog stored in separate data | 42 filelog stored in separate data |
43 (<quote><literal>.d</literal></quote> suffix) and index | 43 (<quote><literal>.d</literal></quote> suffix) and index |
44 (<quote><literal>.i</literal></quote> suffix) files. For | 44 (<quote><literal>.i</literal></quote> suffix) files. For |
45 small files without much history, the revision data and index | 45 small files without much history, the revision data and index |
46 are combined in a single <quote><literal>.i</literal></quote> | 46 are combined in a single <quote><literal>.i</literal></quote> |
47 file. The correspondence between a file in the working | 47 file. The correspondence between a file in the working |
48 directory and the filelog that tracks its history in the | 48 directory and the filelog that tracks its history in the |
49 repository is illustrated in figure <xref | 49 repository is illustrated in <xref |
50 endterm="fig.concepts.filelog.caption" | 50 linkend="fig:concepts:filelog"/>.</para> |
51 linkend="fig.concepts.filelog"/>.</para> | 51 |
52 | 52 <figure id="fig:concepts:filelog"> |
53 <informalfigure id="fig.concepts.filelog"> | 53 <title>Relationships between files in working directory and |
54 <mediaobject> | 54 filelogs in repository</title> |
55 <imageobject><imagedata fileref="images/filelog.png"/></imageobject> | 55 <mediaobject> |
56 <textobject><phrase>XXX add text</phrase></textobject> | 56 <imageobject><imagedata fileref="figs/filelog.png"/></imageobject> |
57 <caption><para id="fig.concepts.filelog.caption">Relationships between | 57 <textobject><phrase>XXX add text</phrase></textobject> |
58 files in working directory and filelogs in repository</para> | 58 </mediaobject> |
59 </caption> | 59 </figure> |
60 </mediaobject> | |
61 </informalfigure> | |
62 | 60 |
63 </sect2> | 61 </sect2> |
64 <sect2> | 62 <sect2> |
65 <title>Managing tracked files</title> | 63 <title>Managing tracked files</title> |
66 | 64 |
67 <para>Mercurial uses a structure called a | 65 <para id="x_2ee">Mercurial uses a structure called a |
68 <emphasis>manifest</emphasis> to collect together information | 66 <emphasis>manifest</emphasis> to collect together information |
69 about the files that it tracks. Each entry in the manifest | 67 about the files that it tracks. Each entry in the manifest |
70 contains information about the files present in a single | 68 contains information about the files present in a single |
71 changeset. An entry records which files are present in the | 69 changeset. An entry records which files are present in the |
72 changeset, the revision of each file, and a few other pieces | 70 changeset, the revision of each file, and a few other pieces |
74 | 72 |
75 </sect2> | 73 </sect2> |
76 <sect2> | 74 <sect2> |
77 <title>Recording changeset information</title> | 75 <title>Recording changeset information</title> |
78 | 76 |
79 <para>The <emphasis>changelog</emphasis> contains information | 77 <para id="x_2ef">The <emphasis>changelog</emphasis> contains information |
80 about each changeset. Each revision records who committed a | 78 about each changeset. Each revision records who committed a |
81 change, the changeset comment, other pieces of | 79 change, the changeset comment, other pieces of |
82 changeset-related information, and the revision of the | 80 changeset-related information, and the revision of the |
83 manifest to use.</para> | 81 manifest to use.</para> |
84 | 82 |
85 </sect2> | 83 </sect2> |
86 <sect2> | 84 <sect2> |
87 <title>Relationships between revisions</title> | 85 <title>Relationships between revisions</title> |
88 | 86 |
89 <para>Within a changelog, a manifest, or a filelog, each | 87 <para id="x_2f0">Within a changelog, a manifest, or a filelog, each |
90 revision stores a pointer to its immediate parent (or to its | 88 revision stores a pointer to its immediate parent (or to its |
91 two parents, if it's a merge revision). As I mentioned above, | 89 two parents, if it's a merge revision). As I mentioned above, |
92 there are also relationships between revisions | 90 there are also relationships between revisions |
93 <emphasis>across</emphasis> these structures, and they are | 91 <emphasis>across</emphasis> these structures, and they are |
94 hierarchical in nature.</para> | 92 hierarchical in nature.</para> |
95 | 93 |
96 <para>For every changeset in a repository, there is exactly one | 94 <para id="x_2f1">For every changeset in a repository, there is exactly one |
97 revision stored in the changelog. Each revision of the | 95 revision stored in the changelog. Each revision of the |
98 changelog contains a pointer to a single revision of the | 96 changelog contains a pointer to a single revision of the |
99 manifest. A revision of the manifest stores a pointer to a | 97 manifest. A revision of the manifest stores a pointer to a |
100 single revision of each filelog tracked when that changeset | 98 single revision of each filelog tracked when that changeset |
101 was created. These relationships are illustrated in figure | 99 was created. These relationships are illustrated in |
102 <xref endterm="fig.concepts.metadata.caption" | 100 <xref linkend="fig:concepts:metadata"/>.</para> |
103 linkend="fig.concepts.metadata"/>.</para> | 101 |
104 | 102 <figure id="fig:concepts:metadata"> |
105 <informalfigure id="fig.concepts.metadata"> | 103 <title>Metadata relationships</title> |
106 <mediaobject> | 104 <mediaobject> |
107 <imageobject><imagedata fileref="images/metadata.png"/></imageobject> | 105 <imageobject><imagedata fileref="figs/metadata.png"/></imageobject> |
108 <textobject><phrase>XXX add text</phrase></textobject> | 106 <textobject><phrase>XXX add text</phrase></textobject> |
109 <caption><para id="fig.concepts.metadata.caption">Metadata | 107 </mediaobject> |
110 relationships</para></caption> | 108 </figure> |
111 </mediaobject> | 109 |
112 </informalfigure> | 110 <para id="x_2f3">As the illustration shows, there is |
113 | |
114 <para>As the illustration shows, there is | |
115 <emphasis>not</emphasis> a <quote>one to one</quote> | 111 <emphasis>not</emphasis> a <quote>one to one</quote> |
116 relationship between revisions in the changelog, manifest, or | 112 relationship between revisions in the changelog, manifest, or |
117 filelog. If the manifest hasn't changed between two | 113 filelog. If the manifest hasn't changed between two |
118 changesets, the changelog entries for those changesets will | 114 changesets, the changelog entries for those changesets will |
119 point to the same revision of the manifest. If a file that | 115 point to the same revision of the manifest. If a file that |
124 </sect2> | 120 </sect2> |
125 </sect1> | 121 </sect1> |
126 <sect1> | 122 <sect1> |
127 <title>Safe, efficient storage</title> | 123 <title>Safe, efficient storage</title> |
128 | 124 |
129 <para>The underpinnings of changelogs, manifests, and filelogs are | 125 <para id="x_2f4">The underpinnings of changelogs, manifests, and filelogs are |
130 provided by a single structure called the | 126 provided by a single structure called the |
131 <emphasis>revlog</emphasis>.</para> | 127 <emphasis>revlog</emphasis>.</para> |
132 | 128 |
133 <sect2> | 129 <sect2> |
134 <title>Efficient storage</title> | 130 <title>Efficient storage</title> |
135 | 131 |
136 <para>The revlog provides efficient storage of revisions using a | 132 <para id="x_2f5">The revlog provides efficient storage of revisions using a |
137 <emphasis>delta</emphasis> mechanism. Instead of storing a | 133 <emphasis>delta</emphasis> mechanism. Instead of storing a |
138 complete copy of a file for each revision, it stores the | 134 complete copy of a file for each revision, it stores the |
139 changes needed to transform an older revision into the new | 135 changes needed to transform an older revision into the new |
140 revision. For many kinds of file data, these deltas are | 136 revision. For many kinds of file data, these deltas are |
141 typically a fraction of a percent of the size of a full copy | 137 typically a fraction of a percent of the size of a full copy |
142 of a file.</para> | 138 of a file.</para> |
143 | 139 |
144 <para>Some obsolete revision control systems can only work with | 140 <para id="x_2f6">Some obsolete revision control systems can only work with |
145 deltas of text files. They must either store binary files as | 141 deltas of text files. They must either store binary files as |
146 complete snapshots or encoded into a text representation, both | 142 complete snapshots or encoded into a text representation, both |
147 of which are wasteful approaches. Mercurial can efficiently | 143 of which are wasteful approaches. Mercurial can efficiently |
148 handle deltas of files with arbitrary binary contents; it | 144 handle deltas of files with arbitrary binary contents; it |
149 doesn't need to treat text as special.</para> | 145 doesn't need to treat text as special.</para> |
150 | 146 |
151 </sect2> | 147 </sect2> |
152 <sect2 id="sec.concepts.txn"> | 148 <sect2 id="sec:concepts:txn"> |
153 <title>Safe operation</title> | 149 <title>Safe operation</title> |
154 | 150 |
155 <para>Mercurial only ever <emphasis>appends</emphasis> data to | 151 <para id="x_2f7">Mercurial only ever <emphasis>appends</emphasis> data to |
156 the end of a revlog file. It never modifies a section of a | 152 the end of a revlog file. It never modifies a section of a |
157 file after it has written it. This is both more robust and | 153 file after it has written it. This is both more robust and |
158 efficient than schemes that need to modify or rewrite | 154 efficient than schemes that need to modify or rewrite |
159 data.</para> | 155 data.</para> |
160 | 156 |
161 <para>In addition, Mercurial treats every write as part of a | 157 <para id="x_2f8">In addition, Mercurial treats every write as part of a |
162 <emphasis>transaction</emphasis> that can span a number of | 158 <emphasis>transaction</emphasis> that can span a number of |
163 files. A transaction is <emphasis>atomic</emphasis>: either | 159 files. A transaction is <emphasis>atomic</emphasis>: either |
164 the entire transaction succeeds and its effects are all | 160 the entire transaction succeeds and its effects are all |
165 visible to readers in one go, or the whole thing is undone. | 161 visible to readers in one go, or the whole thing is undone. |
166 This guarantee of atomicity means that if you're running two | 162 This guarantee of atomicity means that if you're running two |
167 copies of Mercurial, where one is reading data and one is | 163 copies of Mercurial, where one is reading data and one is |
168 writing it, the reader will never see a partially written | 164 writing it, the reader will never see a partially written |
169 result that might confuse it.</para> | 165 result that might confuse it.</para> |
170 | 166 |
171 <para>The fact that Mercurial only appends to files makes it | 167 <para id="x_2f9">The fact that Mercurial only appends to files makes it |
172 easier to provide this transactional guarantee. The easier it | 168 easier to provide this transactional guarantee. The easier it |
173 is to do stuff like this, the more confident you should be | 169 is to do stuff like this, the more confident you should be |
174 that it's done correctly.</para> | 170 that it's done correctly.</para> |
175 | 171 |
176 </sect2> | 172 </sect2> |
177 <sect2> | 173 <sect2> |
178 <title>Fast retrieval</title> | 174 <title>Fast retrieval</title> |
179 | 175 |
180 <para>Mercurial cleverly avoids a pitfall common to all earlier | 176 <para id="x_2fa">Mercurial cleverly avoids a pitfall common to all earlier |
181 revision control systems: the problem of <emphasis>inefficient | 177 revision control systems: the problem of <emphasis>inefficient |
182 retrieval</emphasis>. Most revision control systems store | 178 retrieval</emphasis>. Most revision control systems store |
183 the contents of a revision as an incremental series of | 179 the contents of a revision as an incremental series of |
184 modifications against a <quote>snapshot</quote>. To | 180 modifications against a <quote>snapshot</quote>. To |
185 reconstruct a specific revision, you must first read the | 181 reconstruct a specific revision, you must first read the |
186 snapshot, and then every one of the revisions between the | 182 snapshot, and then every one of the revisions between the |
187 snapshot and your target revision. The more history that a | 183 snapshot and your target revision. The more history that a |
188 file accumulates, the more revisions you must read, hence the | 184 file accumulates, the more revisions you must read, hence the |
189 longer it takes to reconstruct a particular revision.</para> | 185 longer it takes to reconstruct a particular revision.</para> |
190 | 186 |
191 <informalfigure id="fig.concepts.snapshot"> | 187 <figure id="fig:concepts:snapshot"> |
192 <mediaobject> | 188 <title>Snapshot of a revlog, with incremental deltas</title> |
193 <imageobject><imagedata fileref="images/snapshot.png"/></imageobject> | 189 <mediaobject> |
194 <textobject><phrase>XXX add text</phrase></textobject> | 190 <imageobject><imagedata fileref="figs/snapshot.png"/></imageobject> |
195 <caption><para id="fig.concepts.snapshot.caption">Snapshot of | 191 <textobject><phrase>XXX add text</phrase></textobject> |
196 a revlog, with incremental deltas</para></caption> | 192 </mediaobject> |
197 </mediaobject> | 193 </figure> |
198 </informalfigure> | 194 |
199 | 195 <para id="x_2fc">The innovation that Mercurial applies to this problem is |
200 <para>The innovation that Mercurial applies to this problem is | |
201 simple but effective. Once the cumulative amount of delta | 196 simple but effective. Once the cumulative amount of delta |
202 information stored since the last snapshot exceeds a fixed | 197 information stored since the last snapshot exceeds a fixed |
203 threshold, it stores a new snapshot (compressed, of course), | 198 threshold, it stores a new snapshot (compressed, of course), |
204 instead of another delta. This makes it possible to | 199 instead of another delta. This makes it possible to |
205 reconstruct <emphasis>any</emphasis> revision of a file | 200 reconstruct <emphasis>any</emphasis> revision of a file |
206 quickly. This approach works so well that it has since been | 201 quickly. This approach works so well that it has since been |
207 copied by several other revision control systems.</para> | 202 copied by several other revision control systems.</para> |
208 | 203 |
209 <para>Figure <xref endterm="fig.concepts.snapshot.caption" | 204 <para id="x_2fd"><xref linkend="fig:concepts:snapshot"/> illustrates |
210 linkend="fig.concepts.snapshot"/> illustrates | |
211 the idea. In an entry in a revlog's index file, Mercurial | 205 the idea. In an entry in a revlog's index file, Mercurial |
212 stores the range of entries from the data file that it must | 206 stores the range of entries from the data file that it must |
213 read to reconstruct a particular revision.</para> | 207 read to reconstruct a particular revision.</para> |
214 | 208 |
215 <sect3> | 209 <sect3> |
216 <title>Aside: the influence of video compression</title> | 210 <title>Aside: the influence of video compression</title> |
217 | 211 |
218 <para>If you're familiar with video compression or have ever | 212 <para id="x_2fe">If you're familiar with video compression or have ever |
219 watched a TV feed through a digital cable or satellite | 213 watched a TV feed through a digital cable or satellite |
220 service, you may know that most video compression schemes | 214 service, you may know that most video compression schemes |
221 store each frame of video as a delta against its predecessor | 215 store each frame of video as a delta against its predecessor |
222 frame. In addition, these schemes use <quote>lossy</quote> | 216 frame. In addition, these schemes use <quote>lossy</quote> |
223 compression techniques to increase the compression ratio, so | 217 compression techniques to increase the compression ratio, so |
224 visual errors accumulate over the course of a number of | 218 visual errors accumulate over the course of a number of |
225 inter-frame deltas.</para> | 219 inter-frame deltas.</para> |
226 | 220 |
227 <para>Because it's possible for a video stream to <quote>drop | 221 <para id="x_2ff">Because it's possible for a video stream to <quote>drop |
228 out</quote> occasionally due to signal glitches, and to | 222 out</quote> occasionally due to signal glitches, and to |
229 limit the accumulation of artefacts introduced by the lossy | 223 limit the accumulation of artefacts introduced by the lossy |
230 compression process, video encoders periodically insert a | 224 compression process, video encoders periodically insert a |
231 complete frame (called a <quote>key frame</quote>) into the | 225 complete frame (called a <quote>key frame</quote>) into the |
232 video stream; the next delta is generated against that | 226 video stream; the next delta is generated against that |
238 </sect3> | 232 </sect3> |
239 </sect2> | 233 </sect2> |
240 <sect2> | 234 <sect2> |
241 <title>Identification and strong integrity</title> | 235 <title>Identification and strong integrity</title> |
242 | 236 |
243 <para>Along with delta or snapshot information, a revlog entry | 237 <para id="x_300">Along with delta or snapshot information, a revlog entry |
244 contains a cryptographic hash of the data that it represents. | 238 contains a cryptographic hash of the data that it represents. |
245 This makes it difficult to forge the contents of a revision, | 239 This makes it difficult to forge the contents of a revision, |
246 and easy to detect accidental corruption.</para> | 240 and easy to detect accidental corruption.</para> |
247 | 241 |
248 <para>Hashes provide more than a mere check against corruption; | 242 <para id="x_301">Hashes provide more than a mere check against corruption; |
249 they are used as the identifiers for revisions. The changeset | 243 they are used as the identifiers for revisions. The changeset |
250 identification hashes that you see as an end user are from | 244 identification hashes that you see as an end user are from |
251 revisions of the changelog. Although filelogs and the | 245 revisions of the changelog. Although filelogs and the |
252 manifest also use hashes, Mercurial only uses these behind the | 246 manifest also use hashes, Mercurial only uses these behind the |
253 scenes.</para> | 247 scenes.</para> |
254 | 248 |
255 <para>Mercurial verifies that hashes are correct when it | 249 <para id="x_302">Mercurial verifies that hashes are correct when it |
256 retrieves file revisions and when it pulls changes from | 250 retrieves file revisions and when it pulls changes from |
257 another repository. If it encounters an integrity problem, it | 251 another repository. If it encounters an integrity problem, it |
258 will complain and stop whatever it's doing.</para> | 252 will complain and stop whatever it's doing.</para> |
259 | 253 |
260 <para>In addition to the effect it has on retrieval efficiency, | 254 <para id="x_303">In addition to the effect it has on retrieval efficiency, |
261 Mercurial's use of periodic snapshots makes it more robust | 255 Mercurial's use of periodic snapshots makes it more robust |
262 against partial data corruption. If a revlog becomes partly | 256 against partial data corruption. If a revlog becomes partly |
263 corrupted due to a hardware error or system bug, it's often | 257 corrupted due to a hardware error or system bug, it's often |
264 possible to reconstruct some or most revisions from the | 258 possible to reconstruct some or most revisions from the |
265 uncorrupted sections of the revlog, both before and after the | 259 uncorrupted sections of the revlog, both before and after the |
269 </sect2> | 263 </sect2> |
270 </sect1> | 264 </sect1> |
271 <sect1> | 265 <sect1> |
272 <title>Revision history, branching, and merging</title> | 266 <title>Revision history, branching, and merging</title> |
273 | 267 |
274 <para>Every entry in a Mercurial revlog knows the identity of its | 268 <para id="x_304">Every entry in a Mercurial revlog knows the identity of its |
275 immediate ancestor revision, usually referred to as its | 269 immediate ancestor revision, usually referred to as its |
276 <emphasis>parent</emphasis>. In fact, a revision contains room | 270 <emphasis>parent</emphasis>. In fact, a revision contains room |
277 for not one parent, but two. Mercurial uses a special hash, | 271 for not one parent, but two. Mercurial uses a special hash, |
278 called the <quote>null ID</quote>, to represent the idea | 272 called the <quote>null ID</quote>, to represent the idea |
279 <quote>there is no parent here</quote>. This hash is simply a | 273 <quote>there is no parent here</quote>. This hash is simply a |
280 string of zeroes.</para> | 274 string of zeroes.</para> |
281 | 275 |
282 <para>In figure <xref endterm="fig.concepts.revlog.caption" | 276 <para id="x_305">In <xref linkend="fig:concepts:revlog"/>, you can see |
283 linkend="fig.concepts.revlog"/>, you can see | |
284 an example of the conceptual structure of a revlog. Filelogs, | 277 an example of the conceptual structure of a revlog. Filelogs, |
285 manifests, and changelogs all have this same structure; they | 278 manifests, and changelogs all have this same structure; they |
286 differ only in the kind of data stored in each delta or | 279 differ only in the kind of data stored in each delta or |
287 snapshot.</para> | 280 snapshot.</para> |
288 | 281 |
289 <para>The first revision in a revlog (at the bottom of the image) | 282 <para id="x_306">The first revision in a revlog (at the bottom of the image) |
290 has the null ID in both of its parent slots. For a | 283 has the null ID in both of its parent slots. For a |
291 <quote>normal</quote> revision, its first parent slot contains | 284 <quote>normal</quote> revision, its first parent slot contains |
292 the ID of its parent revision, and its second contains the null | 285 the ID of its parent revision, and its second contains the null |
293 ID, indicating that the revision has only one real parent. Any | 286 ID, indicating that the revision has only one real parent. Any |
294 two revisions that have the same parent ID are branches. A | 287 two revisions that have the same parent ID are branches. A |
295 revision that represents a merge between branches has two normal | 288 revision that represents a merge between branches has two normal |
296 revision IDs in its parent slots.</para> | 289 revision IDs in its parent slots.</para> |
297 | 290 |
298 <informalfigure id="fig.concepts.revlog"> | 291 <figure id="fig:concepts:revlog"> |
292 <title>The conceptual structure of a revlog</title> | |
299 <mediaobject> | 293 <mediaobject> |
300 <imageobject><imagedata fileref="images/revlog.png"/></imageobject> | 294 <imageobject><imagedata fileref="figs/revlog.png"/></imageobject> |
301 <textobject><phrase>XXX add text</phrase></textobject> | 295 <textobject><phrase>XXX add text</phrase></textobject> |
302 <caption><para id="fig.concepts.revlog.caption">Revision in revlog</para> | |
303 </caption> | |
304 </mediaobject> | 296 </mediaobject> |
305 </informalfigure> | 297 </figure> |
306 | 298 |
307 </sect1> | 299 </sect1> |
308 <sect1> | 300 <sect1> |
309 <title>The working directory</title> | 301 <title>The working directory</title> |
310 | 302 |
311 <para>In the working directory, Mercurial stores a snapshot of the | 303 <para id="x_307">In the working directory, Mercurial stores a snapshot of the |
312 files from the repository as of a particular changeset.</para> | 304 files from the repository as of a particular changeset.</para> |
313 | 305 |
314 <para>The working directory <quote>knows</quote> which changeset | 306 <para id="x_308">The working directory <quote>knows</quote> which changeset |
315 it contains. When you update the working directory to contain a | 307 it contains. When you update the working directory to contain a |
316 particular changeset, Mercurial looks up the appropriate | 308 particular changeset, Mercurial looks up the appropriate |
317 revision of the manifest to find out which files it was tracking | 309 revision of the manifest to find out which files it was tracking |
318 at the time that changeset was committed, and which revision of | 310 at the time that changeset was committed, and which revision of |
319 each file was then current. It then recreates a copy of each of | 311 each file was then current. It then recreates a copy of each of |
320 those files, with the same contents it had when the changeset | 312 those files, with the same contents it had when the changeset |
321 was committed.</para> | 313 was committed.</para> |
322 | 314 |
323 <para>The <emphasis>dirstate</emphasis> contains Mercurial's | 315 <para id="x_309">The <emphasis>dirstate</emphasis> contains Mercurial's |
324 knowledge of the working directory. This details which | 316 knowledge of the working directory. This details which |
325 changeset the working directory is updated to, and all of the | 317 changeset the working directory is updated to, and all of the |
326 files that Mercurial is tracking in the working | 318 files that Mercurial is tracking in the working |
327 directory.</para> | 319 directory.</para> |
328 | 320 |
329 <para>Just as a revision of a revlog has room for two parents, so | 321 <para id="x_30a">Just as a revision of a revlog has room for two parents, so |
330 that it can represent either a normal revision (with one parent) | 322 that it can represent either a normal revision (with one parent) |
331 or a merge of two earlier revisions, the dirstate has slots for | 323 or a merge of two earlier revisions, the dirstate has slots for |
332 two parents. When you use the <command role="hg-cmd">hg | 324 two parents. When you use the <command role="hg-cmd">hg |
333 update</command> command, the changeset that you update to is | 325 update</command> command, the changeset that you update to is |
334 stored in the <quote>first parent</quote> slot, and the null ID | 326 stored in the <quote>first parent</quote> slot, and the null ID |
340 dirstate are.</para> | 332 dirstate are.</para> |
341 | 333 |
342 <sect2> | 334 <sect2> |
343 <title>What happens when you commit</title> | 335 <title>What happens when you commit</title> |
344 | 336 |
345 <para>The dirstate stores parent information for more than just | 337 <para id="x_30b">The dirstate stores parent information for more than just |
346 book-keeping purposes. Mercurial uses the parents of the | 338 book-keeping purposes. Mercurial uses the parents of the |
347 dirstate as <emphasis>the parents of a new | 339 dirstate as <emphasis>the parents of a new |
348 changeset</emphasis> when you perform a commit.</para> | 340 changeset</emphasis> when you perform a commit.</para> |
349 | 341 |
350 <informalfigure id="fig.concepts.wdir"> | 342 <figure id="fig:concepts:wdir"> |
351 <mediaobject> | 343 <title>The working directory can have two parents</title> |
352 <imageobject><imagedata fileref="images/wdir.png"/></imageobject> | 344 <mediaobject> |
353 <textobject><phrase>XXX add text</phrase></textobject> | 345 <imageobject><imagedata fileref="figs/wdir.png"/></imageobject> |
354 <caption><para id="fig.concepts.wdir.caption">The working | 346 <textobject><phrase>XXX add text</phrase></textobject> |
355 directory can have two parents</para></caption> | 347 </mediaobject> |
356 </mediaobject> | 348 </figure> |
357 </informalfigure> | 349 |
358 | 350 <para id="x_30d"><xref linkend="fig:concepts:wdir"/> shows the |
359 <para>Figure <xref endterm="fig.concepts.wdir.caption" | |
360 linkend="fig.concepts.wdir"/> shows the | |
361 normal state of the working directory, where it has a single | 351 normal state of the working directory, where it has a single |
362 changeset as parent. That changeset is the | 352 changeset as parent. That changeset is the |
363 <emphasis>tip</emphasis>, the newest changeset in the | 353 <emphasis>tip</emphasis>, the newest changeset in the |
364 repository that has no children.</para> | 354 repository that has no children.</para> |
365 | 355 |
366 <informalfigure id="fig.concepts.wdir-after-commit"> | 356 <figure id="fig:concepts:wdir-after-commit"> |
367 <mediaobject> | 357 <title>The working directory gains new parents after a |
368 <imageobject><imagedata fileref="images/wdir-after-commit.png"/> | 358 commit</title> |
369 </imageobject> | 359 <mediaobject> |
370 <textobject><phrase>XXX add text</phrase></textobject> | 360 <imageobject><imagedata fileref="figs/wdir-after-commit.png"/></imageobject> |
371 <caption><para id="fig.concepts.wdir-after-commit.caption">The working | 361 <textobject><phrase>XXX add text</phrase></textobject> |
372 directory gains new parents after a commit</para></caption> | 362 </mediaobject> |
373 </mediaobject> | 363 </figure> |
374 </informalfigure> | 364 |
375 | 365 <para id="x_30f">It's useful to think of the working directory as |
376 <para>It's useful to think of the working directory as | |
377 <quote>the changeset I'm about to commit</quote>. Any files | 366 <quote>the changeset I'm about to commit</quote>. Any files |
378 that you tell Mercurial that you've added, removed, renamed, | 367 that you tell Mercurial that you've added, removed, renamed, |
379 or copied will be reflected in that changeset, as will | 368 or copied will be reflected in that changeset, as will |
380 modifications to any files that Mercurial is already tracking; | 369 modifications to any files that Mercurial is already tracking; |
381 the new changeset will have the parents of the working | 370 the new changeset will have the parents of the working |
382 directory as its parents.</para> | 371 directory as its parents.</para> |
383 | 372 |
384 <para>After a commit, Mercurial will update the parents of the | 373 <para id="x_310">After a commit, Mercurial will update the |
385 working directory, so that the first parent is the ID of the | 374 parents of the working directory, so that the first parent is |
386 new changeset, and the second is the null ID. This is shown | 375 the ID of the new changeset, and the second is the null ID. |
387 in figure <xref endterm="fig.concepts.wdir-after-commit.caption" | 376 This is shown in <xref |
388 linkend="fig.concepts.wdir-after-commit"/>. | 377 linkend="fig:concepts:wdir-after-commit"/>. Mercurial |
389 Mercurial | |
390 doesn't touch any of the files in the working directory when | 378 doesn't touch any of the files in the working directory when |
391 you commit; it just modifies the dirstate to note its new | 379 you commit; it just modifies the dirstate to note its new |
392 parents.</para> | 380 parents.</para> |
393 | 381 |
394 </sect2> | 382 </sect2> |
395 <sect2> | 383 <sect2> |
396 <title>Creating a new head</title> | 384 <title>Creating a new head</title> |
397 | 385 |
398 <para>It's perfectly normal to update the working directory to a | 386 <para id="x_311">It's perfectly normal to update the working directory to a |
399 changeset other than the current tip. For example, you might | 387 changeset other than the current tip. For example, you might |
400 want to know what your project looked like last Tuesday, or | 388 want to know what your project looked like last Tuesday, or |
401 you could be looking through changesets to see which one | 389 you could be looking through changesets to see which one |
402 introduced a bug. In cases like this, the natural thing to do | 390 introduced a bug. In cases like this, the natural thing to do |
403 is update the working directory to the changeset you're | 391 is update the working directory to the changeset you're |
404 interested in, and then examine the files in the working | 392 interested in, and then examine the files in the working |
405 directory directly to see their contents as they were when you | 393 directory directly to see their contents as they were when you |
406 committed that changeset. The effect of this is shown in | 394 committed that changeset. The effect of this is shown in |
407 figure <xref endterm="fig.concepts.wdir-pre-branch.caption" | 395 <xref linkend="fig:concepts:wdir-pre-branch"/>.</para> |
408 linkend="fig.concepts.wdir-pre-branch"/>.</para> | 396 |
409 | 397 <figure id="fig:concepts:wdir-pre-branch"> |
410 <informalfigure id="fig.concepts.wdir-pre-branch"> | 398 <title>The working directory, updated to an older |
411 <mediaobject> | 399 changeset</title> |
412 <imageobject><imagedata fileref="images/wdir-pre-branch.png"/> | 400 <mediaobject> |
413 </imageobject> | 401 <imageobject><imagedata fileref="figs/wdir-pre-branch.png"/></imageobject> |
414 <textobject><phrase>XXX add text</phrase></textobject> | 402 <textobject><phrase>XXX add text</phrase></textobject> |
415 <caption><para id="fig.concepts.wdir-pre-branch.caption">The working | 403 </mediaobject> |
416 directory, updated to an older changeset</para></caption> | 404 </figure> |
417 </mediaobject> | 405 |
418 </informalfigure> | 406 <para id="x_313">Having updated the working directory to an |
419 | 407 older changeset, what happens if you make some changes, and |
420 <para>Having updated the working directory to an older | 408 then commit? Mercurial behaves in the same way as I outlined |
421 changeset, what happens if you make some changes, and then | |
422 commit? Mercurial behaves in the same way as I outlined | |
423 above. The parents of the working directory become the | 409 above. The parents of the working directory become the |
424 parents of the new changeset. This new changeset has no | 410 parents of the new changeset. This new changeset has no |
425 children, so it becomes the new tip. And the repository now | 411 children, so it becomes the new tip. And the repository now |
426 contains two changesets that have no children; we call these | 412 contains two changesets that have no children; we call these |
427 <emphasis>heads</emphasis>. You can see the structure that | 413 <emphasis>heads</emphasis>. You can see the structure that |
428 this creates in figure <xref | 414 this creates in <xref |
429 endterm="fig.concepts.wdir-branch.caption" | 415 linkend="fig:concepts:wdir-branch"/>.</para> |
430 linkend="fig.concepts.wdir-branch"/>.</para> | 416 |
431 | 417 <figure id="fig:concepts:wdir-branch"> |
432 <informalfigure id="fig.concepts.wdir-branch"> | 418 <title>After a commit made while synced to an older |
433 <mediaobject> | 419 changeset</title> |
434 <imageobject><imagedata fileref="images/wdir-branch.png"/> | 420 <mediaobject> |
435 </imageobject> | 421 <imageobject><imagedata fileref="figs/wdir-branch.png"/></imageobject> |
436 <textobject><phrase>XXX add text</phrase></textobject> | 422 <textobject><phrase>XXX add text</phrase></textobject> |
437 <caption><para id="fig.concepts.wdir-branch.caption">After a | 423 </mediaobject> |
438 commit made while synced to an older changeset</para></caption> | 424 </figure> |
439 </mediaobject> | |
440 </informalfigure> | |
441 | 425 |
442 <note> | 426 <note> |
443 <para> If you're new to Mercurial, you should keep in mind a | 427 <para id="x_315"> If you're new to Mercurial, you should keep in mind a |
444 common <quote>error</quote>, which is to use the <command | 428 common <quote>error</quote>, which is to use the <command |
445 role="hg-cmd">hg pull</command> command without any | 429 role="hg-cmd">hg pull</command> command without any |
446 options. By default, the <command role="hg-cmd">hg | 430 options. By default, the <command role="hg-cmd">hg |
447 pull</command> command <emphasis>does not</emphasis> | 431 pull</command> command <emphasis>does not</emphasis> |
448 update the working directory, so you'll bring new changesets | 432 update the working directory, so you'll bring new changesets |
450 synced at the same changeset as before the pull. If you | 434 synced at the same changeset as before the pull. If you |
451 make some changes and commit afterwards, you'll thus create | 435 make some changes and commit afterwards, you'll thus create |
452 a new head, because your working directory isn't synced to | 436 a new head, because your working directory isn't synced to |
453 whatever the current tip is.</para> | 437 whatever the current tip is.</para> |
454 | 438 |
455 <para> I put the word <quote>error</quote> in quotes because | 439 <para id="x_316"> I put the word <quote>error</quote> in quotes because |
456 all that you need to do to rectify this situation is | 440 all that you need to do to rectify this situation is |
457 <command role="hg-cmd">hg merge</command>, then <command | 441 <command role="hg-cmd">hg merge</command>, then <command |
458 role="hg-cmd">hg commit</command>. In other words, this | 442 role="hg-cmd">hg commit</command>. In other words, this |
459 almost never has negative consequences; it just surprises | 443 almost never has negative consequences; it just surprises |
460 people. I'll discuss other ways to avoid this behaviour, | 444 people. I'll discuss other ways to avoid this behaviour, |
464 | 448 |
465 </sect2> | 449 </sect2> |
466 <sect2> | 450 <sect2> |
467 <title>Merging heads</title> | 451 <title>Merging heads</title> |
468 | 452 |
469 <para>When you run the <command role="hg-cmd">hg merge</command> | 453 <para id="x_317">When you run the <command role="hg-cmd">hg |
470 command, Mercurial leaves the first parent of the working | 454 merge</command> command, Mercurial leaves the first parent |
471 directory unchanged, and sets the second parent to the | 455 of the working directory unchanged, and sets the second parent |
472 changeset you're merging with, as shown in figure <xref | 456 to the changeset you're merging with, as shown in <xref |
473 endterm="fig.concepts.wdir-merge.caption" | 457 linkend="fig:concepts:wdir-merge"/>.</para> |
474 linkend="fig.concepts.wdir-merge"/>.</para> | 458 |
475 | 459 <figure id="fig:concepts:wdir-merge"> |
476 <informalfigure id="fig.concepts.wdir-merge"> | 460 <title>Merging two heads</title> |
477 <mediaobject> | 461 <mediaobject> |
478 <imageobject><imagedata fileref="images/wdir-merge.png"/> | 462 <imageobject> |
479 </imageobject> | 463 <imagedata fileref="figs/wdir-merge.png"/> |
480 <textobject><phrase>XXX add text</phrase></textobject> | 464 </imageobject> |
481 <caption><para id="fig.concepts.wdir-merge.caption">Merging two | 465 <textobject><phrase>XXX add text</phrase></textobject> |
482 heads</para></caption> | 466 </mediaobject> |
483 </mediaobject> | 467 </figure> |
484 </informalfigure> | 468 |
485 | 469 <para id="x_319">Mercurial also has to modify the working directory, to |
486 <para>Mercurial also has to modify the working directory, to | |
487 merge the files managed in the two changesets. Simplified a | 470 merge the files managed in the two changesets. Simplified a |
488 little, the merging process goes like this, for every file in | 471 little, the merging process goes like this, for every file in |
489 the manifests of both changesets.</para> | 472 the manifests of both changesets.</para> |
490 <itemizedlist> | 473 <itemizedlist> |
491 <listitem><para>If neither changeset has modified a file, do | 474 <listitem><para id="x_31a">If neither changeset has modified a file, do |
492 nothing with that file.</para> | 475 nothing with that file.</para> |
493 </listitem> | 476 </listitem> |
494 <listitem><para>If one changeset has modified a file, and the | 477 <listitem><para id="x_31b">If one changeset has modified a file, and the |
495 other hasn't, create the modified copy of the file in the | 478 other hasn't, create the modified copy of the file in the |
496 working directory.</para> | 479 working directory.</para> |
497 </listitem> | 480 </listitem> |
498 <listitem><para>If one changeset has removed a file, and the | 481 <listitem><para id="x_31c">If one changeset has removed a file, and the |
499 other hasn't (or has also deleted it), delete the file | 482 other hasn't (or has also deleted it), delete the file |
500 from the working directory.</para> | 483 from the working directory.</para> |
501 </listitem> | 484 </listitem> |
502 <listitem><para>If one changeset has removed a file, but the | 485 <listitem><para id="x_31d">If one changeset has removed a file, but the |
503 other has modified the file, ask the user what to do: keep | 486 other has modified the file, ask the user what to do: keep |
504 the modified file, or remove it?</para> | 487 the modified file, or remove it?</para> |
505 </listitem> | 488 </listitem> |
506 <listitem><para>If both changesets have modified a file, | 489 <listitem><para id="x_31e">If both changesets have modified a file, |
507 invoke an external merge program to choose the new | 490 invoke an external merge program to choose the new |
508 contents for the merged file. This may require input from | 491 contents for the merged file. This may require input from |
509 the user.</para> | 492 the user.</para> |
510 </listitem> | 493 </listitem> |
511 <listitem><para>If one changeset has modified a file, and the | 494 <listitem><para id="x_31f">If one changeset has modified a file, and the |
512 other has renamed or copied the file, make sure that the | 495 other has renamed or copied the file, make sure that the |
513 changes follow the new name of the file.</para> | 496 changes follow the new name of the file.</para> |
514 </listitem></itemizedlist> | 497 </listitem></itemizedlist> |
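As a rough illustration of the per-file decisions listed above, here is a minimal Python sketch. It is not Mercurial's merge code: the function name `merge_action`, the three state strings, and the returned action strings are all hypothetical, and the rename/copy case is left out for brevity.

```python
def merge_action(local: str, other: str) -> str:
    """Pick a merge action for one file.

    `local` and `other` describe what each changeset did to the file
    relative to their common ancestor: 'unchanged', 'modified' or
    'removed'. These labels are illustrative, not Mercurial's own.
    """
    states = {local, other}
    if states == {"unchanged"}:
        return "keep"                 # neither side touched the file
    if states == {"modified"}:
        return "run merge tool"       # both sides changed it
    if "removed" in states and "modified" in states:
        return "ask user"             # removed on one side, changed on the other
    if "removed" in states:
        return "delete"               # removed on one or both sides
    return "take modified copy"       # exactly one side changed it
```

Only two of the outcomes ("ask user" and, sometimes, "run merge tool") can need a person, which matches the observation that most merges finish automatically.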
515 <para>There are more details&emdash;merging has plenty of corner | 498 <para id="x_320">There are more details&emdash;merging has plenty of corner |
516 cases&emdash;but these are the most common choices that are | 499 cases&emdash;but these are the most common choices that are |
517 involved in a merge. As you can see, most cases are | 500 involved in a merge. As you can see, most cases are |
518 completely automatic, and indeed most merges finish | 501 completely automatic, and indeed most merges finish |
519 automatically, without requiring your input to resolve any | 502 automatically, without requiring your input to resolve any |
520 conflicts.</para> | 503 conflicts.</para> |
521 | 504 |
522 <para>When you're thinking about what happens when you commit | 505 <para id="x_321">When you're thinking about what happens when you commit |
523 after a merge, once again the working directory is <quote>the | 506 after a merge, once again the working directory is <quote>the |
524 changeset I'm about to commit</quote>. After the <command | 507 changeset I'm about to commit</quote>. After the <command |
525 role="hg-cmd">hg merge</command> command completes, the | 508 role="hg-cmd">hg merge</command> command completes, the |
526 working directory has two parents; these will become the | 509 working directory has two parents; these will become the |
527 parents of the new changeset.</para> | 510 parents of the new changeset.</para> |
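The parent bookkeeping described in this section (update, commit, new heads, merge) can be modelled in a few lines of Python. This is a toy model under stated assumptions, not Mercurial's implementation: `ToyRepo`, its method names, and the use of `-1` for the null ID are all invented for the example.

```python
NULL = -1  # stand-in for the null ID (an assumption of this sketch)


class ToyRepo:
    """Tracks only changeset parents and the two dirstate parent slots."""

    def __init__(self):
        self.parents_of = []            # parents_of[rev] == (p1, p2)
        self.dirstate = (NULL, NULL)    # parents of the working directory

    def update(self, rev):
        # Updating stores the target in the first slot, null in the second.
        self.dirstate = (rev, NULL)

    def merge(self, other):
        # Merging leaves the first parent alone and fills the second slot.
        self.dirstate = (self.dirstate[0], other)

    def commit(self):
        # The new changeset takes the working directory's parents, then
        # the working directory is re-parented onto the new changeset.
        self.parents_of.append(self.dirstate)
        new_rev = len(self.parents_of) - 1
        self.dirstate = (new_rev, NULL)
        return new_rev

    def heads(self):
        # A head is any changeset that is nobody's parent.
        with_children = {p for pair in self.parents_of for p in pair}
        return [r for r in range(len(self.parents_of))
                if r not in with_children]
```

Committing twice, updating back to the first changeset, and committing again leaves two heads; a `merge` followed by a `commit` brings the count back to one.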
528 | 511 |
529 <para>Mercurial lets you perform multiple merges, but you must | 512 <para id="x_322">Mercurial lets you perform multiple merges, but you must |
530 commit the results of each individual merge as you go. This | 513 commit the results of each individual merge as you go. This |
531 is necessary because Mercurial only tracks two parents for | 514 is necessary because Mercurial only tracks two parents for |
532 both revisions and the working directory. While it would be | 515 both revisions and the working directory. While it would be |
533 technically possible to merge multiple changesets at once, the | 516 technically possible to merge multiple changesets at once, the |
534 prospect of user confusion and making a terrible mess of a | 517 prospect of user confusion and making a terrible mess of a |
537 </sect2> | 520 </sect2> |
538 </sect1> | 521 </sect1> |
539 <sect1> | 522 <sect1> |
540 <title>Other interesting design features</title> | 523 <title>Other interesting design features</title> |
541 | 524 |
542 <para>In the sections above, I've tried to highlight some of the | 525 <para id="x_323">In the sections above, I've tried to highlight some of the |
543 most important aspects of Mercurial's design, to illustrate that | 526 most important aspects of Mercurial's design, to illustrate that |
544 it pays careful attention to reliability and performance. | 527 it pays careful attention to reliability and performance. |
545 However, the attention to detail doesn't stop there. There are | 528 However, the attention to detail doesn't stop there. There are |
546 a number of other aspects of Mercurial's construction that I | 529 a number of other aspects of Mercurial's construction that I |
547 personally find interesting. I'll detail a few of them here, | 530 personally find interesting. I'll detail a few of them here, |
550 of thinking that goes into a well-designed system.</para> | 533 of thinking that goes into a well-designed system.</para> |
551 | 534 |
552 <sect2> | 535 <sect2> |
553 <title>Clever compression</title> | 536 <title>Clever compression</title> |
554 | 537 |
555 <para>When appropriate, Mercurial will store both snapshots and | 538 <para id="x_324">When appropriate, Mercurial will store both snapshots and |
556 deltas in compressed form. It does this by always | 539 deltas in compressed form. It does this by always |
557 <emphasis>trying to</emphasis> compress a snapshot or delta, | 540 <emphasis>trying to</emphasis> compress a snapshot or delta, |
558 but only storing the compressed version if it's smaller than | 541 but only storing the compressed version if it's smaller than |
559 the uncompressed version.</para> | 542 the uncompressed version.</para> |
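The "try to compress, keep whichever is smaller" rule is easy to sketch in Python. The helper names `store_chunk` and `load_chunk` and the boolean flag are assumptions of this example; Mercurial's actual on-disk encoding is not shown in the text above and is not reproduced here.

```python
import zlib


def store_chunk(data: bytes) -> tuple[bool, bytes]:
    """Return (is_compressed, payload), keeping the smaller form."""
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return True, packed
    return False, data              # compression did not help; store as-is


def load_chunk(is_compressed: bool, payload: bytes) -> bytes:
    """Recover the original bytes from a stored chunk."""
    return zlib.decompress(payload) if is_compressed else payload
```

Repetitive text comes back flagged as compressed, while input with no redundancy (for example `bytes(range(256))`) is stored unchanged because the deflate output would be larger than the input.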
560 | 543 |
561 <para>This means that Mercurial does <quote>the right | 544 <para id="x_325">This means that Mercurial does <quote>the right |
562 thing</quote> when storing a file whose native form is | 545 thing</quote> when storing a file whose native form is |
563 compressed, such as a <literal>zip</literal> archive or a JPEG | 546 compressed, such as a <literal>zip</literal> archive or a JPEG |
564 image. When these types of files are compressed a second | 547 image. When these types of files are compressed a second |
565 time, the resulting file is usually bigger than the | 548 time, the resulting file is usually bigger than the |
566 once-compressed form, and so Mercurial will store the plain | 549 once-compressed form, and so Mercurial will store the plain |
567 <literal>zip</literal> or JPEG.</para> | 550 <literal>zip</literal> or JPEG.</para> |
568 | 551 |
569 <para>Deltas between revisions of a compressed file are usually | 552 <para id="x_326">Deltas between revisions of a compressed file are usually |
570 larger than snapshots of the file, and Mercurial again does | 553 larger than snapshots of the file, and Mercurial again does |
571 <quote>the right thing</quote> in these cases. It finds that | 554 <quote>the right thing</quote> in these cases. It finds that |
572 such a delta exceeds the threshold at which it should store a | 555 such a delta exceeds the threshold at which it should store a |
573 complete snapshot of the file, so it stores the snapshot, | 556 complete snapshot of the file, so it stores the snapshot, |
574 again saving space compared to a naive delta-only | 557 again saving space compared to a naive delta-only |
575 approach.</para> | 558 approach.</para> |
576 | 559 |
577 <sect3> | 560 <sect3> |
578 <title>Network recompression</title> | 561 <title>Network recompression</title> |
579 | 562 |
580 <para>When storing revisions on disk, Mercurial uses the | 563 <para id="x_327">When storing revisions on disk, Mercurial uses the |
581 <quote>deflate</quote> compression algorithm (the same one | 564 <quote>deflate</quote> compression algorithm (the same one |
582 used by the popular <literal>zip</literal> archive format), | 565 used by the popular <literal>zip</literal> archive format), |
583 which balances good speed with a respectable compression | 566 which balances good speed with a respectable compression |
584 ratio. However, when transmitting revision data over a | 567 ratio. However, when transmitting revision data over a |
585 network connection, Mercurial uncompresses the compressed | 568 network connection, Mercurial uncompresses the compressed |
586 revision data.</para> | 569 revision data.</para> |
587 | 570 |
588 <para>If the connection is over HTTP, Mercurial recompresses | 571 <para id="x_328">If the connection is over HTTP, Mercurial recompresses |
589 the entire stream of data using a compression algorithm that | 572 the entire stream of data using a compression algorithm that |
590 gives a better compression ratio (the Burrows-Wheeler | 573 gives a better compression ratio (the Burrows-Wheeler |
591 algorithm from the widely used <literal>bzip2</literal> | 574 algorithm from the widely used <literal>bzip2</literal> |
592 compression package). This combination of algorithm and | 575 compression package). This combination of algorithm and |
593 compression of the entire stream (instead of a revision at a | 576 compression of the entire stream (instead of a revision at a |
594 time) substantially reduces the number of bytes to be | 577 time) substantially reduces the number of bytes to be |
595 transferred, yielding better network performance over almost | 578 transferred, yielding better network performance over almost |
596 all kinds of network.</para> | 579 all kinds of network.</para> |
597 | 580 |
598 <para>(If the connection is over <command>ssh</command>, | 581 <para id="x_329">(If the connection is over <command>ssh</command>, |
599 Mercurial <emphasis>doesn't</emphasis> recompress the | 582 Mercurial <emphasis>doesn't</emphasis> recompress the |
600 stream, because <command>ssh</command> can already do this | 583 stream, because <command>ssh</command> can already do this |
601 itself.)</para> | 584 itself.)</para> |
602 | 585 |
603 </sect3> | 586 </sect3> |
604 </sect2> | 587 </sect2> |
605 <sect2> | 588 <sect2> |
606 <title>Read/write ordering and atomicity</title> | 589 <title>Read/write ordering and atomicity</title> |
607 | 590 |
608 <para>Appending to files isn't the whole story when it comes to | 591 <para id="x_32a">Appending to files isn't the whole story when |
609 guaranteeing that a reader won't see a partial write. If you | 592 it comes to guaranteeing that a reader won't see a partial |
610 recall figure <xref endterm="fig.concepts.metadata.caption" | 593 write. If you recall <xref linkend="fig:concepts:metadata"/>, |
611 linkend="fig.concepts.metadata"/>, revisions in the | 594 revisions in |
612 changelog point to revisions in the manifest, and revisions in | 595 the changelog point to revisions in the manifest, and |
613 the manifest point to revisions in filelogs. This hierarchy | 596 revisions in the manifest point to revisions in filelogs. |
614 is deliberate.</para> | 597 This hierarchy is deliberate.</para> |
615 | 598 |
616 <para>A writer starts a transaction by writing filelog and | 599 <para id="x_32b">A writer starts a transaction by writing filelog and |
617 manifest data, and doesn't write any changelog data until | 600 manifest data, and doesn't write any changelog data until |
618 those are finished. A reader starts by reading changelog | 601 those are finished. A reader starts by reading changelog |
619 data, then manifest data, followed by filelog data.</para> | 602 data, then manifest data, followed by filelog data.</para> |
620 | 603 |
621 <para>Since the writer has always finished writing filelog and | 604 <para id="x_32c">Since the writer has always finished writing filelog and |
622 manifest data before it writes to the changelog, a reader will | 605 manifest data before it writes to the changelog, a reader will |
623 never read a pointer to a partially written manifest revision | 606 never read a pointer to a partially written manifest revision |
624 from the changelog, and it will never read a pointer to a | 607 from the changelog, and it will never read a pointer to a |
625 partially written filelog revision from the manifest.</para> | 608 partially written filelog revision from the manifest.</para> |
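The ordering argument can be checked with a toy model. The sketch below is an in-memory illustration only: `ToyStore`, its three lists, and the split into `begin_commit` and `finish_commit` are assumptions made for the example, not Mercurial's storage layer.

```python
class ToyStore:
    """Three append-only logs; each entry points 'down' the hierarchy."""

    def __init__(self):
        self.filelog = []     # file contents
        self.manifest = []    # each entry is an index into filelog
        self.changelog = []   # each entry is an index into manifest

    def begin_commit(self, content):
        # The writer appends filelog data first, then manifest data.
        self.filelog.append(content)
        self.manifest.append(len(self.filelog) - 1)
        return len(self.manifest) - 1

    def finish_commit(self, manifest_rev):
        # The changelog entry is written last, making the commit visible.
        self.changelog.append(manifest_rev)

    def read_tip(self):
        # The reader starts at the changelog and follows pointers down.
        if not self.changelog:
            return None
        manifest_rev = self.changelog[-1]
        return self.filelog[self.manifest[manifest_rev]]
```

A reader that runs between `begin_commit` and `finish_commit` still sees the previous tip, because nothing in the changelog points at the half-finished data yet.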
626 | 609 |
627 </sect2> | 610 </sect2> |
628 <sect2> | 611 <sect2> |
629 <title>Concurrent access</title> | 612 <title>Concurrent access</title> |
630 | 613 |
631 <para>The read/write ordering and atomicity guarantees mean that | 614 <para id="x_32d">The read/write ordering and atomicity guarantees mean that |
632 Mercurial never needs to <emphasis>lock</emphasis> a | 615 Mercurial never needs to <emphasis>lock</emphasis> a |
633 repository when it's reading data, even if the repository is | 616 repository when it's reading data, even if the repository is |
634 being written to while the read is occurring. This has a big | 617 being written to while the read is occurring. This has a big |
635 effect on scalability; you can have an arbitrary number of | 618 effect on scalability; you can have an arbitrary number of |
636 Mercurial processes safely reading data from a repository | 619 Mercurial processes safely reading data from a repository |
637 all at once, no matter whether it's being written to or | 620 all at once, no matter whether it's being written to or |
638 not.</para> | 621 not.</para> |
639 | 622 |
640 <para>The lockless nature of reading means that if you're | 623 <para id="x_32e">The lockless nature of reading means that if you're |
641 sharing a repository on a multi-user system, you don't need to | 624 sharing a repository on a multi-user system, you don't need to |
642 grant other local users permission to | 625 grant other local users permission to |
643 <emphasis>write</emphasis> to your repository in order for | 626 <emphasis>write</emphasis> to your repository in order for |
644 them to be able to clone it or pull changes from it; they only | 627 them to be able to clone it or pull changes from it; they only |
645 need <emphasis>read</emphasis> permission. (This is | 628 need <emphasis>read</emphasis> permission. (This is |
648 readers to be able to lock a repository to access it safely, | 631 readers to be able to lock a repository to access it safely, |
649 and this requires write permission on at least one directory, | 632 and this requires write permission on at least one directory, |
650 which of course makes for all kinds of nasty and annoying | 633 which of course makes for all kinds of nasty and annoying |
651 security and administrative problems.)</para> | 634 security and administrative problems.)</para> |
652 | 635 |
653 <para>Mercurial uses locks to ensure that only one process can | 636 <para id="x_32f">Mercurial uses locks to ensure that only one process can |
654 write to a repository at a time (the locking mechanism is safe | 637 write to a repository at a time (the locking mechanism is safe |
655 even over filesystems that are notoriously hostile to locking, | 638 even over filesystems that are notoriously hostile to locking, |
656 such as NFS). If a repository is locked, a writer will wait | 639 such as NFS). If a repository is locked, a writer will wait |
657 for a while and retry if the repository becomes unlocked, but | 640 for a while and retry if the repository becomes unlocked, but |
658 if the repository remains locked for too long, the process | 641 if the repository remains locked for too long, the process |
662 timeout is configurable, from zero to infinity.)</para> | 645 timeout is configurable, from zero to infinity.)</para> |
663 | 646 |
664 <sect3> | 647 <sect3> |
665 <title>Safe dirstate access</title> | 648 <title>Safe dirstate access</title> |
666 | 649 |
667 <para>As with revision data, Mercurial doesn't take a lock to | 650 <para id="x_330">As with revision data, Mercurial doesn't take a lock to |
668 read the dirstate file; it does acquire a lock to write it. | 651 read the dirstate file; it does acquire a lock to write it. |
669 To avoid the possibility of reading a partially written copy | 652 To avoid the possibility of reading a partially written copy |
670 of the dirstate file, Mercurial writes to a file with a | 653 of the dirstate file, Mercurial writes to a file with a |
671 unique name in the same directory as the dirstate file, then | 654 unique name in the same directory as the dirstate file, then |
672 renames the temporary file atomically to | 655 renames the temporary file atomically to |
677 </sect3> | 660 </sect3> |
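The write-to-a-temporary-file-then-rename pattern described for the dirstate can be sketched in Python as follows. This is a generic illustration, not Mercurial's code: `write_atomically` and the `.tmp-` prefix are hypothetical, and it relies on `os.replace` being an atomic rename when both names are on the same filesystem.

```python
import os
import tempfile


def write_atomically(path: str, data: bytes) -> None:
    """Replace `path` so readers see either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the final
    # rename never has to cross a filesystem boundary.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)       # atomic swap of the name
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)          # do not leave debris behind on failure
        raise
```

A reader that opens the file at any moment gets one complete version, never a partially written one.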
678 </sect2> | 661 </sect2> |
679 <sect2> | 662 <sect2> |
680 <title>Avoiding seeks</title> | 663 <title>Avoiding seeks</title> |
681 | 664 |
682 <para>Critical to Mercurial's performance is the avoidance of | 665 <para id="x_331">Critical to Mercurial's performance is the avoidance of |
683 seeks of the disk head, since any seek is far more expensive | 666 seeks of the disk head, since any seek is far more expensive |
684 than even a comparatively large read operation.</para> | 667 than even a comparatively large read operation.</para> |
685 | 668 |
686 <para>This is why, for example, the dirstate is stored in a | 669 <para id="x_332">This is why, for example, the dirstate is stored in a |
687 single file. If there were a dirstate file per directory that | 670 single file. If there were a dirstate file per directory that |
688 Mercurial tracked, the disk would seek once per directory. | 671 Mercurial tracked, the disk would seek once per directory. |
689 Instead, Mercurial reads the entire single dirstate file in | 672 Instead, Mercurial reads the entire single dirstate file in |
690 one step.</para> | 673 one step.</para> |
691 | 674 |
692 <para>Mercurial also uses a <quote>copy on write</quote> scheme | 675 <para id="x_333">Mercurial also uses a <quote>copy on write</quote> scheme |
693 when cloning a repository on local storage. Instead of | 676 when cloning a repository on local storage. Instead of |
694 copying every revlog file from the old repository into the new | 677 copying every revlog file from the old repository into the new |
695 repository, it makes a <quote>hard link</quote>, which is a | 678 repository, it makes a <quote>hard link</quote>, which is a |
696 shorthand way to say <quote>these two names point to the same | 679 shorthand way to say <quote>these two names point to the same |
697 file</quote>. When Mercurial is about to write to one of a | 680 file</quote>. When Mercurial is about to write to one of a |
698 revlog's files, it checks to see if the number of names | 681 revlog's files, it checks to see if the number of names |
699 pointing at the file is greater than one. If it is, more than | 682 pointing at the file is greater than one. If it is, more than |
700 one repository is using the file, so Mercurial makes a new | 683 one repository is using the file, so Mercurial makes a new |
701 copy of the file that is private to this repository.</para> | 684 copy of the file that is private to this repository.</para> |
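The link-count check behind this copy-on-write scheme looks roughly like the following in Python. It is a sketch under assumptions: the function names are invented, and it presumes a filesystem that supports hard links and reports link counts through `os.stat`.

```python
import os
import shutil
import tempfile


def break_link_if_shared(path: str) -> None:
    """Give `path` its own private copy if other names point at the file."""
    if os.stat(path).st_nlink > 1:
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        os.close(fd)
        shutil.copyfile(path, tmp)  # copy the data into a fresh file
        os.replace(tmp, path)       # this name now refers to the copy


def append_private(path: str, data: bytes) -> None:
    """Append to `path` without disturbing any hard-linked clone."""
    break_link_if_shared(path)
    with open(path, "ab") as f:
        f.write(data)
```

After `os.link(a, b)`, calling `append_private(b, ...)` changes only `b`; `a` keeps the original contents, and both files end up with a link count of one.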
702 | 685 |
703 <para>A few revision control developers have pointed out that | 686 <para id="x_334">A few revision control developers have pointed out that |
704 this idea of making a complete private copy of a file is not | 687 this idea of making a complete private copy of a file is not |
705 very efficient in its use of storage. While this is true, | 688 very efficient in its use of storage. While this is true, |
706 storage is cheap, and this method gives the highest | 689 storage is cheap, and this method gives the highest |
707 performance while deferring most book-keeping to the operating | 690 performance while deferring most book-keeping to the operating |
708 system. An alternative scheme would most likely reduce | 691 system. An alternative scheme would most likely reduce |
712 | 695 |
713 </sect2> | 696 </sect2> |
714 <sect2> | 697 <sect2> |
715 <title>Other contents of the dirstate</title> | 698 <title>Other contents of the dirstate</title> |
716 | 699 |
717 <para>Because Mercurial doesn't force you to tell it when you're | 700 <para id="x_335">Because Mercurial doesn't force you to tell it when you're |
718 modifying a file, it uses the dirstate to store some extra | 701 modifying a file, it uses the dirstate to store some extra |
719 information so it can determine efficiently whether you have | 702 information so it can determine efficiently whether you have |
720 modified a file. For each file in the working directory, it | 703 modified a file. For each file in the working directory, it |
721 stores the time at which Mercurial last modified the file, and the | 704 stores the time at which Mercurial last modified the file, and the |
722 size of the file at that time.</para> | 705 size of the file at that time.</para> |
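As a rough sketch of this bookkeeping, the following Python records a modification time and size for a file and later compares them against the file's current state. It is an assumption-laden illustration, not the real dirstate format: the names `Entry`, `record` and `cheap_check` are hypothetical, and the comparison order follows the description of status checking later in this section.

```python
import os
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical per-file record: modification time and size."""
    mtime_ns: int
    size: int


def record(path):
    """Remember the file's current modification time and size."""
    st = os.stat(path)
    return Entry(st.st_mtime_ns, st.st_size)


def cheap_check(path, entry):
    """Classify a file using only stat data, without reading its contents."""
    st = os.stat(path)
    if st.st_mtime_ns == entry.mtime_ns:
        return "clean"      # time unchanged: treated as unmodified
    if st.st_size != entry.size:
        return "modified"   # size changed: definitely modified
    return "unsure"         # time changed, size same: contents must be read
```

Only the "unsure" case forces the contents to be read, which is why the stat data makes the common case cheap.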
723 | 706 |
724 <para>When you explicitly <command role="hg-cmd">hg | 707 <para id="x_336">When you explicitly <command role="hg-cmd">hg |
725 add</command>, <command role="hg-cmd">hg remove</command>, | 708 add</command>, <command role="hg-cmd">hg remove</command>, |
726 <command role="hg-cmd">hg rename</command> or <command | 709 <command role="hg-cmd">hg rename</command> or <command |
727 role="hg-cmd">hg copy</command> files, Mercurial updates the | 710 role="hg-cmd">hg copy</command> files, Mercurial updates the |
728 dirstate so that it knows what to do with those files when you | 711 dirstate so that it knows what to do with those files when you |
729 commit.</para> | 712 commit.</para> |
730 | 713 |
731 <para>When Mercurial is checking the states of files in the | 714 <para id="x_337">When Mercurial is checking the states of files in the |
732 working directory, it first checks a file's modification time. | 715 working directory, it first checks a file's modification time. |
733 If that has not changed, Mercurial assumes the file is unmodified. | 716 If that has not changed, Mercurial assumes the file is unmodified. |
734 If the file's size has changed, the file must have been | 717 If the file's size has changed, the file must have been |
735 modified. If the modification time has changed, but the size | 718 modified. If the modification time has changed, but the size |
736 has not, only then does Mercurial need to read the actual | 719 has not, only then does Mercurial need to read the actual |