comparison en/ch06-filenames.xml @ 749:7e7c47481e4f
Oops, this is the real merge for my hg's oddity
author: Dongsheng Song <dongsheng.song@gmail.com>
date: Fri, 20 Mar 2009 16:43:35 +0800
parents: en/ch07-filenames.xml@cfdb601a3c8b
children: 1c13ed2130a7
comparing 748:d13c7c706a58 with 749:7e7c47481e4f

<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->

<chapter id="chap.names">
  <?dbhtml filename="file-names-and-pattern-matching.html"?>
  <title>File names and pattern matching</title>

  <para id="x_543">Mercurial provides mechanisms that let you work with file
    names in a consistent and expressive way.</para>

  <sect1>
    <title>Simple file naming</title>

    <para id="x_544">Mercurial uses a unified piece of machinery <quote>under the
      hood</quote> to handle file names. Every command behaves
      uniformly with respect to file names. The way in which commands
      work with file names is as follows.</para>

    <para id="x_545">If you explicitly name real files on the command line,
      Mercurial works with exactly those files, as you would expect.
      &interaction.filenames.files;</para>

    <para id="x_546">When you provide a directory name, Mercurial will interpret
      this as <quote>operate on every file in this directory and its
      subdirectories</quote>. Mercurial traverses the files and
      subdirectories in a directory in alphabetical order. When it
      encounters a subdirectory, it will traverse that subdirectory
      before continuing with the current directory.</para>

    &interaction.filenames.dirs;
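
    <para>The traversal order described above is a depth-first walk that
      visits the entries of each directory in sorted order. The
      following Python sketch models that order on an ordinary
      directory tree. The function name <literal>walk_sorted</literal>
      is hypothetical, and the sketch leaves out details such as
      ignored files, so treat it as an illustration rather than
      Mercurial's own code.</para>

```python
import os
from typing import Iterator


def walk_sorted(top: str) -> Iterator[str]:
    """Yield the paths of files under top, depth first.

    Entries in each directory are visited in sorted (code point) order,
    and a subdirectory is walked completely as soon as it is reached,
    before the remaining entries of its parent are visited.
    """
    for name in sorted(os.listdir(top)):
        if name == ".hg":
            continue  # never descend into the repository metadata
        path = os.path.join(top, name)
        if os.path.isdir(path):
            yield from walk_sorted(path)
        else:
            yield path
```

    <para>Note that sorting by code point places uppercase names before
      lowercase ones, which is one reasonable reading of
      <quote>alphabetical</quote>.</para>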

  </sect1>
  <sect1>
    <title>Running commands without any file names</title>

    <para id="x_547">Mercurial's commands that work with file names have useful
      default behaviours when you invoke them without providing any
      file names or patterns. What kind of behaviour you should
      expect depends on what the command does. Here are a few rules
      of thumb you can use to predict what a command is likely to do
      if you don't give it any names to work with.</para>
    <itemizedlist>
      <listitem><para id="x_548">Most commands will operate on the entire working
          directory. This is what the <command role="hg-cmd">hg
          add</command> command does, for example.</para>
      </listitem>
      <listitem><para id="x_549">If the command has effects that are difficult or
          impossible to reverse, it will force you to explicitly
          provide at least one name or pattern (see below). This
          protects you from accidentally deleting files by running
          <command role="hg-cmd">hg remove</command> with no
          arguments, for example.</para>
      </listitem></itemizedlist>
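
    <para>The two rules of thumb above can be summarised as a small
      decision function. This is a hypothetical Python sketch: the name
      <literal>choose_targets</literal> and its
      <literal>requires_names</literal> flag are inventions for
      illustration, not part of Mercurial's API.</para>

```python
from typing import Iterable, List


def choose_targets(args: List[str], working_files: Iterable[str],
                   requires_names: bool = False) -> List[str]:
    """Decide what a command should operate on.

    Explicit names or patterns always win.  With no arguments, a command
    whose effects are hard to undo refuses to run; any other command
    falls back to every file in the working directory.
    """
    if args:
        return list(args)
    if requires_names:
        raise ValueError("give at least one file name or pattern")
    return sorted(working_files)
```

    <para>In this model, <command role="hg-cmd">hg add</command> would
      behave like a call with <literal>requires_names=False</literal>,
      and <command role="hg-cmd">hg remove</command> like one with
      <literal>requires_names=True</literal>.</para>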

    <para id="x_54a">It's easy to work around these default behaviours if they
      don't suit you. If a command normally operates on the whole
      working directory, you can invoke it on just the current
      directory and its subdirectories by giving it the name
      <quote><filename class="directory">.</filename></quote>.</para>

    &interaction.filenames.wdir-subdir;
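
    <para>Restricting a command to <quote><filename
      class="directory">.</filename></quote> amounts to keeping only the
      files whose repository-relative path lies under the current
      directory. The Python sketch below shows that prefix test.
      <literal>files_under</literal> is a hypothetical helper, and the
      sketch assumes that paths use <quote><literal>/</literal></quote>
      separators and are relative to the repository root.</para>

```python
import posixpath
from typing import Iterable, List


def files_under(repo_files: Iterable[str], cwd_rel: str) -> List[str]:
    """Return the files at or below cwd_rel.

    Both repo_files and cwd_rel are relative to the repository root and
    use '/' separators; an empty string or '.' means the root itself.
    """
    prefix = posixpath.normpath(cwd_rel)
    if prefix == ".":
        return sorted(repo_files)
    # Compare against "dir/" so that "src" does not also match "srcx/...".
    return sorted(f for f in repo_files if f.startswith(prefix + "/"))
```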

    <para id="x_54b">Along the same lines, some commands normally print file
      names relative to the root of the repository, even if you're
      invoking them from a subdirectory. Such a command will print
      file names relative to your subdirectory if you give it explicit
      names. Here, we're going to run <command role="hg-cmd">hg
      status</command> from a subdirectory, and get it to operate on
      the entire working directory while printing file names relative
      to our subdirectory, by passing it the output of the <command
      role="hg-cmd">hg root</command> command.</para>

    &interaction.filenames.wdir-relname;
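
    <para>Printing a name relative to your subdirectory is a plain path
      computation: the repository-relative name is re-expressed
      relative to the current directory. Here is a minimal Python
      sketch, with <literal>display_name</literal> as a hypothetical
      helper and both arguments assumed to be repository-relative
      paths with <quote><literal>/</literal></quote> separators.</para>

```python
import posixpath


def display_name(repo_path: str, cwd_rel: str) -> str:
    """Express a repository-relative path relative to cwd_rel.

    cwd_rel is the current directory, also relative to the repository
    root; an empty string means the root.
    """
    return posixpath.relpath(repo_path, start=cwd_rel or ".")
```

    <para>From a subdirectory <filename class="directory">src</filename>,
      the top-level file <filename>README</filename> would be shown as
      <filename>../README</filename>.</para>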

  </sect1>
  <sect1>
    <title>Telling you what's going on</title>

    <para id="x_54c">The <command role="hg-cmd">hg add</command> example in the
      preceding section illustrates something else that's helpful
      about Mercurial commands. If a command operates on a file that
      you didn't name explicitly on the command line, it will usually
      print the name of the file, so that you won't be surprised by
      what's going on.</para>

    <para id="x_54d">The principle here is one of <emphasis>least
        surprise</emphasis>. If you've explicitly named a file on the
      command line, there's no point in repeating it back at you. If
      Mercurial is acting on a file <emphasis>implicitly</emphasis>
      (because you provided no names, or a directory, or a pattern;
      see below), it's safest for it to tell you what it's doing.</para>

    <para id="x_54e">For commands that behave this way, you can silence them
      using the <option role="hg-opt-global">-q</option> option. You
      can also get them to print the name of every file, even those
      you've named explicitly, using the <option
      role="hg-opt-global">-v</option> option.</para>
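
    <para>The reporting rule, together with the effect of <option
      role="hg-opt-global">-q</option> and <option
      role="hg-opt-global">-v</option>, fits in a few lines. This is a
      hypothetical Python model of the behaviour described above, not
      Mercurial's code. The function name is invented.</para>

```python
from typing import Optional


def name_to_report(path: str, explicit: bool,
                   quiet: bool = False, verbose: bool = False) -> Optional[str]:
    """Return the file name a command would echo, or None to stay silent."""
    if quiet:
        return None          # -q silences the report entirely
    if verbose or not explicit:
        return path          # implicit matches, or everything under -v
    return None              # an explicitly named file is not echoed
```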

  </sect1>
  <sect1>
    <title>Using patterns to identify files</title>

    <para id="x_54f">In addition to working with file and directory names,
      Mercurial lets you use <emphasis>patterns</emphasis> to identify
      files. Mercurial's pattern handling is expressive.</para>

    <para id="x_550">On Unix-like systems (Linux, MacOS, etc.), the job of
      matching file names to patterns normally falls to the shell. On
      these systems, you must explicitly tell Mercurial that a name is
      a pattern. On Windows, the shell does not expand patterns, so
      Mercurial will automatically identify names that are patterns,
      and expand them for you.</para>

    <para id="x_551">To provide a pattern in place of a regular name on the
      command line, the mechanism is simple:</para>
    <programlisting>syntax:patternbody</programlisting>
    <para id="x_552">That is, a pattern is identified by a short text string that
      says what kind of pattern this is, followed by a colon, followed
      by the actual pattern.</para>
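
    <para>Recognising the <literal>syntax:patternbody</literal> form only
      requires splitting at the first colon and checking the prefix.
      Below is a Python sketch that assumes only the two syntaxes this
      chapter covers. Mercurial itself recognises further prefixes
      (such as <literal>path:</literal>), and the names
      <literal>KNOWN_SYNTAXES</literal> and
      <literal>split_pattern</literal> are hypothetical.</para>

```python
from typing import Tuple

# Only the two syntaxes discussed in this chapter.
KNOWN_SYNTAXES = ("glob", "re")


def split_pattern(arg: str, default: str) -> Tuple[str, str]:
    """Split 'syntax:patternbody' into (syntax, patternbody).

    An argument without a recognised prefix is returned whole, tagged
    with the default kind, so a name such as 'C:foo' is not mistaken
    for a pattern.
    """
    syntax, sep, body = arg.partition(":")
    if sep and syntax in KNOWN_SYNTAXES:
        return syntax, body
    return default, arg
```

    <para>To mimic the Windows behaviour, where unprefixed names are
      treated as glob patterns, a caller would pass
      <literal>default="glob"</literal>.</para>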

    <para id="x_553">Mercurial supports two kinds of pattern syntax. The most
      frequently used is called <literal>glob</literal>; this is the
      same kind of pattern matching used by the Unix shell, and should
      be familiar to Windows command prompt users, too.</para>

    <para id="x_554">When Mercurial does automatic pattern matching on Windows,
      it uses <literal>glob</literal> syntax. You can thus omit the
      <quote><literal>glob:</literal></quote> prefix on Windows, but
      it's safe to use it, too.</para>

    <para id="x_555">The <literal>re</literal> syntax is more powerful; it lets
      you specify patterns using regular expressions, also known as
      regexps.</para>

    <para id="x_556">By the way, in the examples that follow, notice that I'm
      careful to wrap all of my patterns in quote characters, so that
      they won't get expanded by the shell before Mercurial sees
      them.</para>

    <sect2>
      <title>Shell-style <literal>glob</literal> patterns</title>

      <para id="x_557">This is an overview of the kinds of patterns you can use
        when you're matching on glob patterns.</para>

      <para id="x_558">The <quote><literal>*</literal></quote> character matches
        any string, within a single directory.</para>

      &interaction.filenames.glob.star;

      <para id="x_559">The <quote><literal>**</literal></quote> pattern matches
        any string, and crosses directory boundaries. It's not a
        standard Unix glob token, but it's accepted by several popular
        Unix shells, and is very useful.</para>

      &interaction.filenames.glob.starstar;

      <para id="x_55a">The <quote><literal>?</literal></quote> pattern matches
        any single character.</para>

      &interaction.filenames.glob.question;

      <para id="x_55b">The <quote><literal>[</literal></quote> character begins a
        <emphasis>character class</emphasis>. This matches any single
        character within the class. The class ends with a
        <quote><literal>]</literal></quote> character. A class may
        contain multiple <emphasis>range</emphasis>s of the form
        <quote><literal>a-f</literal></quote>, which is shorthand for
        <quote><literal>abcdef</literal></quote>.</para>

      &interaction.filenames.glob.range;

      <para id="x_55c">If the first character after the
        <quote><literal>[</literal></quote> in a character class is a
        <quote><literal>!</literal></quote>, it
        <emphasis>negates</emphasis> the class, making it match any
        single character not in the class.</para>

      <para id="x_55d">A <quote><literal>{</literal></quote> begins a group of
        subpatterns, where the whole group matches if any subpattern
        in the group matches. The <quote><literal>,</literal></quote>
        character separates subpatterns, and
        <quote><literal>}</literal></quote> ends the group.</para>

      &interaction.filenames.glob.group;
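
      <para>The glob rules above can be made concrete by translating a glob
        into a Python regular expression. What follows is a simplified
        sketch, not Mercurial's own translator.
        <literal>glob_to_regex</literal> and
        <literal>glob_match</literal> are hypothetical names, each
        pattern is matched against a whole path, and escaping and error
        handling are kept to a minimum.</para>

```python
import re


def glob_to_regex(pat: str) -> str:
    """Translate a shell-style glob into a Python regexp string.

    *      any run of characters within one directory (never '/')
    **     any run of characters, crossing directory boundaries
    ?      any single character other than '/'
    [...]  a character class; [!...] negates it
    {a,b}  a group that matches if any comma-separated subpattern does
    """
    out = []
    depth = 0  # how many {...} groups are currently open
    i, n = 0, len(pat)
    while i != n:
        c = pat[i]
        i += 1
        if c == "*":
            if pat[i:i + 1] == "*":
                out.append(".*")
                i += 1
            else:
                out.append("[^/]*")
        elif c == "?":
            out.append("[^/]")
        elif c == "[":
            # A ']' right after '[' or '[!' is a literal member of the class.
            j = i + 1 if pat[i:i + 1] == "!" else i
            if pat[j:j + 1] == "]":
                j += 1
            j = pat.find("]", j)
            if j == -1:
                out.append(re.escape(c))  # no closing ']': literal '['
                continue
            body = pat[i:j].replace("\\", "\\\\")
            i = j + 1
            if body.startswith("!"):
                body = "^" + body[1:]
            elif body.startswith("^"):
                body = "\\" + body
            out.append("[" + body + "]")
        elif c == "{":
            depth += 1
            out.append("(?:")
        elif c == "," and depth:
            out.append("|")
        elif c == "}" and depth:
            depth -= 1
            out.append(")")
        else:
            out.append(re.escape(c))
    if depth:
        raise ValueError("unbalanced '{' in glob: " + pat)
    return "".join(out)


def glob_match(pat: str, path: str) -> bool:
    """True if the whole path matches the glob."""
    return re.fullmatch(glob_to_regex(pat), path) is not None
```

      <para>With these rules, <literal>glob_match("*.py", "src/a.py")</literal>
        is false while <literal>glob_match("**.py", "src/a.py")</literal>
        is true, which is the distinction discussed under <quote>Watch
        out!</quote> below.</para>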

      <sect3>
        <title>Watch out!</title>

        <para id="x_55e">Don't forget that if you want to match a pattern in any
          directory, you should not be using the
          <quote><literal>*</literal></quote> match-any token, as this
          will only match within one directory. Instead, use the
          <quote><literal>**</literal></quote> token. This small
          example illustrates the difference between the two.</para>

        &interaction.filenames.glob.star-starstar;

      </sect3>
    </sect2>
    <sect2>
      <title>Regular expression matching with <literal>re</literal>
        patterns</title>

      <para id="x_55f">Mercurial accepts the same regular expression syntax as
        the Python programming language (it uses Python's regexp
        engine internally). This is based on the Perl language's
        regexp syntax, which is the most popular dialect in use (it's
        also used in Java, for example).</para>

      <para id="x_560">I won't discuss Mercurial's regexp dialect in any detail
        here, as regexps are not often used. Perl-style regexps are
        in any case already exhaustively documented on a multitude of
        web sites, and in many books. Instead, I will focus here on a
        few things you should know if you find yourself needing to use
        regexps with Mercurial.</para>

      <para id="x_561">A regexp is matched against an entire file name, relative
        to the root of the repository. In other words, even if you're
        already in the subdirectory <filename
        class="directory">foo</filename>, if you want to match files
        under this directory, your pattern must start with
        <quote><literal>foo/</literal></quote>.</para>

      <para id="x_562">One thing to note, if you're familiar with Perl-style
        regexps, is that Mercurial's are <emphasis>rooted</emphasis>.
        That is, a regexp starts matching against the beginning of a
        string; it doesn't look for a match anywhere within the
        string. To match anywhere in a string, start your pattern
        with <quote><literal>.*</literal></quote>.</para>
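
      <para>Python's <literal>re.match</literal> behaves in the rooted way
        just described: it anchors a pattern at the start of the string
        (though not at its end). The sketch below uses it to illustrate
        both points above, with a hypothetical helper
        <literal>re_matches</literal> and paths written relative to the
        repository root.</para>

```python
import re


def re_matches(pattern: str, repo_path: str) -> bool:
    """Rooted match: the pattern must match at the start of the path.

    re.match does not anchor at the end, so a pattern that matches a
    prefix of the path is enough; append '$' to require the whole name.
    """
    return re.match(pattern, repo_path) is not None
```

      <para>Even when working inside <filename
        class="directory">foo</filename>, a pattern such as
        <literal>bar\.c</literal> fails to match
        <filename>foo/bar.c</filename>, because it is compared with the
        root-relative name; <literal>foo/bar\.c</literal> or
        <literal>.*bar\.c</literal> succeeds.</para>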

    </sect2>
  </sect1>
  <sect1>
    <title>Filtering files</title>

    <para id="x_563">Not only does Mercurial give you a variety of ways to
      specify files; it lets you further winnow those files using
      <emphasis>filters</emphasis>. Commands that work with file
      names accept two filtering options.</para>
    <itemizedlist>
      <listitem><para id="x_564"><option role="hg-opt-global">-I</option>, or
          <option role="hg-opt-global">--include</option>, lets you
          specify a pattern that file names must match in order to be
          processed.</para>
      </listitem>
      <listitem><para id="x_565"><option role="hg-opt-global">-X</option>, or
          <option role="hg-opt-global">--exclude</option>, gives you a
          way to <emphasis>avoid</emphasis> processing files, if they
          match this pattern.</para>
      </listitem></itemizedlist>
    <para id="x_566">You can provide multiple <option
      role="hg-opt-global">-I</option> and <option
      role="hg-opt-global">-X</option> options on the command line,
      and intermix them as you please. Mercurial interprets the
      patterns you provide using glob syntax by default (but you can
      use regexps if you need to).</para>

    <para id="x_567">You can read a <option role="hg-opt-global">-I</option>
      filter as <quote>process only the files that match this
      filter</quote>.</para>

    &interaction.filenames.filter.include;

    <para id="x_568">The <option role="hg-opt-global">-X</option> filter is best
      read as <quote>process only the files that don't match this
      pattern</quote>.</para>

    &interaction.filenames.filter.exclude;
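
    <para>Combining several <option role="hg-opt-global">-I</option> and
      <option role="hg-opt-global">-X</option> options works as the two
      readings above suggest: a file survives if it matches at least
      one include pattern (when any are given) and no exclude pattern.
      Here is a Python sketch of that logic, assuming the patterns
      have already been turned into regexps.
      <literal>apply_filters</literal> is a hypothetical name.</para>

```python
import re
from typing import Iterable, List


def apply_filters(files: Iterable[str], includes: List[str],
                  excludes: List[str]) -> List[str]:
    """Keep files that match some include (if any) and no exclude."""
    inc = [re.compile(p) for p in includes]
    exc = [re.compile(p) for p in excludes]
    kept = []
    for f in files:
        if inc and not any(p.match(f) for p in inc):
            continue  # -I given, but this file matches none of them
        if any(p.match(f) for p in exc):
            continue  # excluded by some -X
        kept.append(f)
    return kept
```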

  </sect1>
  <sect1>
    <title>Ignoring unwanted files and directories</title>

    <para id="x_569">XXX.</para>

  </sect1>
  <sect1 id="sec.names.case">
    <title>Case sensitivity</title>

    <para id="x_56a">If you're working in a mixed development environment that
      contains both Linux (or other Unix) systems and Macs or Windows
      systems, you should bear in mind that they treat the case
      (<quote>N</quote> versus <quote>n</quote>) of file names in
      incompatible ways. This is not very likely to affect you, and
      it's easy to deal with if it does, but it could surprise you if
      you don't know about it.</para>

    <para id="x_56b">Operating systems and filesystems differ in the way they
      handle the <emphasis>case</emphasis> of characters in file and
      directory names. There are three common ways to handle case in
      names.</para>
    <itemizedlist>
      <listitem><para id="x_56c">Completely case insensitive. Uppercase and
          lowercase versions of a letter are treated as identical,
          both when creating a file and during subsequent accesses.
          This is common on older DOS-based systems.</para>
      </listitem>
      <listitem><para id="x_56d">Case preserving, but insensitive. When a file
          or directory is created, the case of its name is stored, and
          can be retrieved and displayed by the operating system.
          When an existing file is being looked up, its case is
          ignored. This is the standard arrangement on Windows and
          MacOS. The names <filename>foo</filename> and
          <filename>FoO</filename> identify the same file. This
          treatment of uppercase and lowercase letters as
          interchangeable is also referred to as <emphasis>case
          folding</emphasis>.</para>
      </listitem>
      <listitem><para id="x_56e">Case sensitive. The case of a name is
          significant at all times. The names <filename>foo</filename>
          and <filename>FoO</filename> identify different files. This
          is the way Linux and Unix systems normally work.</para>
      </listitem></itemizedlist>
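
    <para>The second arrangement, case preserving but insensitive, can be
      modelled in a few lines of Python. This is a toy sketch with a
      hypothetical class name. It folds case with
      <literal>str.casefold</literal>, whereas real filesystems use
      their own case-mapping tables.</para>

```python
from typing import Dict, List, Optional


class CasePreservingDir:
    """Toy directory that stores names as created but looks them up
    without regard to case."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}  # folded name -> name as created

    def create(self, name: str) -> None:
        # If a name differing only in case already exists, it is the
        # same entry, so the originally stored spelling is kept.
        self._names.setdefault(name.casefold(), name)

    def lookup(self, name: str) -> Optional[str]:
        return self._names.get(name.casefold())

    def listing(self) -> List[str]:
        return sorted(self._names.values())
```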

    <para id="x_56f">On Unix-like systems, it is possible to have any or all of
      the above ways of handling case in action at once. For example,
      if you use a USB thumb drive formatted with a FAT32 filesystem
      on a Linux system, Linux will handle names on that filesystem in
      a case preserving, but insensitive, way.</para>

    <sect2>
      <title>Safe, portable repository storage</title>

      <para id="x_570">Mercurial's repository storage mechanism is <emphasis>case
        safe</emphasis>. It translates file names so that they can
        be safely stored on both case sensitive and case insensitive
        filesystems. This means that you can use normal file copying
        tools to transfer a Mercurial repository onto, for example, a
        USB thumb drive, and safely move that drive and repository
        back and forth between a Mac, a PC running Windows, and a
        Linux box.</para>
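
      <para>One way to make stored names case safe is to encode every
        uppercase letter so that no two distinct names can collide once
        case is ignored. Mercurial's store writes an uppercase letter as
        an underscore followed by its lowercase form, and a literal
        underscore as two underscores (so <filename>README</filename> is
        stored under a name such as
        <filename>_r_e_a_d_m_e.i</filename>). The Python sketch below
        covers only that part of the scheme. The real encoding also
        escapes other characters, and the function names are
        hypothetical.</para>

```python
import string


def encode_case(name: str) -> str:
    """Encode a file name so that it contains no uppercase letters.

    'X' becomes '_x' and '_' becomes '__'; everything else is unchanged.
    """
    out = []
    for ch in name:
        if ch == "_":
            out.append("__")
        elif ch in string.ascii_uppercase:
            out.append("_" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def decode_case(stored: str) -> str:
    """Invert encode_case."""
    out = []
    chars = iter(stored)
    for ch in chars:
        if ch == "_":
            nxt = next(chars)  # '_' is always followed by one more char
            out.append("_" if nxt == "_" else nxt.upper())
        else:
            out.append(ch)
    return "".join(out)
```

      <para>Because no encoded name contains an uppercase letter,
        <filename>myfile.c</filename> and <filename>MyFile.C</filename>
        are stored as <filename>myfile.c</filename> and
        <filename>_my_file._c</filename>, which stay distinct even on a
        filesystem that folds case.</para>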

    </sect2>
    <sect2>
      <title>Detecting case conflicts</title>

      <para id="x_571">When operating in the working directory, Mercurial honours
        the naming policy of the filesystem where the working
        directory is located. If the filesystem is case preserving,
        but insensitive, Mercurial will treat names that differ only
        in case as the same.</para>

      <para id="x_572">An important aspect of this approach is that it is
        possible to commit a changeset on a case sensitive (typically
        Linux or Unix) filesystem that will cause trouble for users on
        case insensitive (usually Windows and MacOS) filesystems. If a
        Linux user commits changes to two files, one named
        <filename>myfile.c</filename> and the other named
        <filename>MyFile.C</filename>, they will be stored correctly
        in the repository. In the working directories of other Linux
        users, they will also be correctly represented as separate
        files.</para>

      <para id="x_573">If a Windows or Mac user pulls this change, they will not
        initially have a problem, because Mercurial's repository
        storage mechanism is case safe. However, once they try to
        <command role="hg-cmd">hg update</command> the working
        directory to that changeset, or <command role="hg-cmd">hg
        merge</command> with that changeset, Mercurial will spot the
        conflict between the two file names that the filesystem would
        treat as the same, and forbid the update or merge from
        occurring.</para>
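
      <para>The kind of conflict Mercurial looks for here can be
        approximated by grouping file names by their case-folded form:
        any group with more than one member would collide on a case
        insensitive filesystem. Below is a hypothetical Python sketch,
        which folds case with <literal>str.casefold</literal> and
        compares whole paths only.</para>

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def case_conflicts(paths: Iterable[str]) -> List[List[str]]:
    """Return groups of paths that differ only in case."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for p in paths:
        groups[p.casefold()].append(p)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

      <para>For the example above, the group would contain
        <filename>MyFile.C</filename> and
        <filename>myfile.c</filename>.</para>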

    </sect2>
    <sect2>
      <title>Fixing a case conflict</title>

      <para id="x_574">If you are using Windows or a Mac in a mixed environment
        where some of your collaborators are using Linux or Unix, and
        Mercurial reports a case folding conflict when you try to
        <command role="hg-cmd">hg update</command> or <command
        role="hg-cmd">hg merge</command>, the procedure to fix the
        problem is simple.</para>

      <para id="x_575">Just find a nearby Linux or Unix box, clone the problem
        repository onto it, and use Mercurial's <command
        role="hg-cmd">hg rename</command> command to change the
        names of any offending files or directories so that they will
        no longer cause case folding conflicts. Commit this change,
        <command role="hg-cmd">hg pull</command> or <command
        role="hg-cmd">hg push</command> it across to your Windows or
        MacOS system, and <command role="hg-cmd">hg update</command>
        to the revision with the non-conflicting names.</para>

      <para id="x_576">The changeset with case-conflicting names will remain in
        your project's history, and you still won't be able to
        <command role="hg-cmd">hg update</command> your working
        directory to that changeset on a Windows or MacOS system, but
        you can continue development unimpeded.</para>

      <note>
        <para id="x_577">Prior to version 0.9.3, Mercurial did not use a case
          safe repository storage mechanism, and did not detect case
          folding conflicts. If you are using an older version of
          Mercurial on Windows or MacOS, I strongly recommend that you
          upgrade.</para>
      </note>

    </sect2>
  </sect1>
</chapter>

<!--
local variables:
sgml-parent-document: ("00book.xml" "book" "chapter")
end:
-->