comparison en/ch06-filenames.xml @ 749:7e7c47481e4f

Oops, this is the real merge for my hg's oddity
author Dongsheng Song <dongsheng.song@gmail.com>
date Fri, 20 Mar 2009 16:43:35 +0800
parents en/ch07-filenames.xml@cfdb601a3c8b
children 1c13ed2130a7
748:d13c7c706a58 749:7e7c47481e4f
1 <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
2
3 <chapter id="chap.names">
4 <?dbhtml filename="file-names-and-pattern-matching.html"?>
5 <title>File names and pattern matching</title>
6
7 <para id="x_543">Mercurial provides mechanisms that let you work with file
8 names in a consistent and expressive way.</para>
9
10 <sect1>
11 <title>Simple file naming</title>
12
13 <para id="x_544">Mercurial uses a unified piece of machinery <quote>under the
14 hood</quote> to handle file names. Every command behaves
15 uniformly with respect to file names. The way in which commands
16 work with file names is as follows.</para>
17
18 <para id="x_545">If you explicitly name real files on the command line,
19 Mercurial works with exactly those files, as you would expect.
20 &interaction.filenames.files;</para>
21
22 <para id="x_546">When you provide a directory name, Mercurial will interpret
23 this as <quote>operate on every file in this directory and its
24 subdirectories</quote>. Mercurial traverses the files and
25 subdirectories in a directory in alphabetical order. When it
26 encounters a subdirectory, it will traverse that subdirectory
27 before continuing with the current directory.</para>
28
29 &interaction.filenames.dirs;
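
<para>The traversal rule described above can be sketched in a few lines of
Python. This only illustrates the stated rule and is not Mercurial's own
code. The nested dictionary <literal>tree</literal> and the function
<literal>walk</literal> are hypothetical names invented for this example,
and plain byte-wise sorting is assumed to stand in for
<quote>alphabetical order</quote>.</para>

<programlisting><![CDATA[
```python
def walk(tree, prefix=""):
    """Yield file paths from a nested dict, visiting entries in sorted order.

    A dict value stands for a subdirectory; None stands for a file.
    (Hypothetical in-memory layout, used here instead of a real disk.)
    """
    for name in sorted(tree):
        child = tree[name]
        path = prefix + name
        if isinstance(child, dict):
            # Finish the whole subdirectory before continuing with
            # the remaining entries of the current directory.
            yield from walk(child, path + "/")
        else:
            yield path


tree = {
    "src": {"main.c": None, "lib": {"util.c": None}},
    "README": None,
    "b.txt": None,
}
order = list(walk(tree))
print(order)
```
]]></programlisting>

<para>In this sketch, everything under <filename
class="directory">src/lib</filename> is listed before
<filename>src/main.c</filename>, because the subdirectory is visited in
full as soon as it is encountered.</para>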
30
31 </sect1>
32 <sect1>
33 <title>Running commands without any file names</title>
34
35 <para id="x_547">Mercurial's commands that work with file names have useful
36 default behaviours when you invoke them without providing any
37 file names or patterns. What kind of behaviour you should
38 expect depends on what the command does. Here are a few rules
39 of thumb you can use to predict what a command is likely to do
40 if you don't give it any names to work with.</para>
41 <itemizedlist>
42 <listitem><para id="x_548">Most commands will operate on the entire working
43 directory. This is what the <command role="hg-cmd">hg
44 add</command> command does, for example.</para>
45 </listitem>
46 <listitem><para id="x_549">If the command has effects that are difficult or
47 impossible to reverse, it will force you to explicitly
48 provide at least one name or pattern (see below). This
49 protects you from accidentally deleting files by running
50 <command role="hg-cmd">hg remove</command> with no
51 arguments, for example.</para>
52 </listitem></itemizedlist>
53
54 <para id="x_54a">It's easy to work around these default behaviours if they
55 don't suit you. If a command normally operates on the whole
56 working directory, you can invoke it on just the current
57 directory and its subdirectories by giving it the name
58 <quote><filename class="directory">.</filename></quote>.</para>
59
60 &interaction.filenames.wdir-subdir;
61
62 <para id="x_54b">Along the same lines, some commands normally print file
63 names relative to the root of the repository, even if you're
64 invoking them from a subdirectory. Such a command will print
65 file names relative to your subdirectory if you give it explicit
66 names. Here, we're going to run <command role="hg-cmd">hg
67 status</command> from a subdirectory, and get it to operate on
68 the entire working directory while printing file names relative
69 to our subdirectory, by passing it the output of the <command
70 role="hg-cmd">hg root</command> command.</para>
71
72 &interaction.filenames.wdir-relname;
73
74 </sect1>
75 <sect1>
76 <title>Telling you what's going on</title>
77
78 <para id="x_54c">The <command role="hg-cmd">hg add</command> example in the
79 preceding section illustrates something else that's helpful
80 about Mercurial commands. If a command operates on a file that
81 you didn't name explicitly on the command line, it will usually
82 print the name of the file, so that you will not be surprised by
83 what's going on.</para>
84
85 <para id="x_54d">The principle here is of <emphasis>least
86 surprise</emphasis>. If you've exactly named a file on the
87 command line, there's no point in repeating it back at you. If
88 Mercurial is acting on a file <emphasis>implicitly</emphasis>,
89 because you provided no names, or a directory, or a pattern (see
90 below), it's safest to tell you what it's doing.</para>
91
92 <para id="x_54e">For commands that behave this way, you can silence them
93 using the <option role="hg-opt-global">-q</option> option. You
94 can also get them to print the name of every file, even those
95 you've named explicitly, using the <option
96 role="hg-opt-global">-v</option> option.</para>
97
98 </sect1>
99 <sect1>
100 <title>Using patterns to identify files</title>
101
102 <para id="x_54f">In addition to working with file and directory names,
103 Mercurial lets you use <emphasis>patterns</emphasis> to identify
104 files. Mercurial's pattern handling is expressive.</para>
105
106 <para id="x_550">On Unix-like systems (Linux, MacOS, etc.), the job of
107 matching file names to patterns normally falls to the shell. On
108 these systems, you must explicitly tell Mercurial that a name is
109 a pattern. On Windows, the shell does not expand patterns, so
110 Mercurial will automatically identify names that are patterns,
111 and expand them for you.</para>
112
113 <para id="x_551">The mechanism for providing a pattern in place of a regular
114 name on the command line is simple:</para>
115 <programlisting>syntax:patternbody</programlisting>
116 <para id="x_552">That is, a pattern is identified by a short text string that
117 says what kind of pattern this is, followed by a colon, followed
118 by the actual pattern.</para>
119
120 <para id="x_553">Mercurial supports two kinds of pattern syntax. The most
121 frequently used is called <literal>glob</literal>; this is the
122 same kind of pattern matching used by the Unix shell, and should
123 be familiar to Windows command prompt users, too.</para>
124
125 <para id="x_554">When Mercurial does automatic pattern matching on Windows,
126 it uses <literal>glob</literal> syntax. You can thus omit the
127 <quote><literal>glob:</literal></quote> prefix on Windows, but
128 it's safe to use it, too.</para>
129
130 <para id="x_555">The <literal>re</literal> syntax is more powerful; it lets
131 you specify patterns using regular expressions, also known as
132 regexps.</para>
133
134 <para id="x_556">By the way, in the examples that follow, notice that I'm
135 careful to wrap all of my patterns in quote characters, so that
136 they won't get expanded by the shell before Mercurial sees
137 them.</para>
138
139 <sect2>
140 <title>Shell-style <literal>glob</literal> patterns</title>
141
142 <para id="x_557">This is an overview of the kinds of patterns you can use
143 when you're matching on glob patterns.</para>
144
145 <para id="x_558">The <quote><literal>*</literal></quote> character matches
146 any string, within a single directory.</para>
147
148 &interaction.filenames.glob.star;
149
150 <para id="x_559">The <quote><literal>**</literal></quote> pattern matches
151 any string, and crosses directory boundaries. It's not a
152 standard Unix glob token, but it's accepted by several popular
153 Unix shells, and is very useful.</para>
154
155 &interaction.filenames.glob.starstar;
156
157 <para id="x_55a">The <quote><literal>?</literal></quote> pattern matches
158 any single character.</para>
159
160 &interaction.filenames.glob.question;
161
162 <para id="x_55b">The <quote><literal>[</literal></quote> character begins a
163 <emphasis>character class</emphasis>. This matches any single
164 character within the class. The class ends with a
165 <quote><literal>]</literal></quote> character. A class may
166 contain multiple <emphasis>range</emphasis>s of the form
167 <quote><literal>a-f</literal></quote>, which is shorthand for
168 <quote><literal>abcdef</literal></quote>.</para>
169
170 &interaction.filenames.glob.range;
171
172 <para id="x_55c">If the first character after the
173 <quote><literal>[</literal></quote> in a character class is a
174 <quote><literal>!</literal></quote>, it
175 <emphasis>negates</emphasis> the class, making it match any
176 single character not in the class.</para>
177
178 <para id="x_55d">A <quote><literal>{</literal></quote> begins a group of
179 subpatterns, where the whole group matches if any subpattern
180 in the group matches. The <quote><literal>,</literal></quote>
181 character separates subpatterns, and
182 <quote><literal>}</literal></quote> ends the group.</para>
183
184 &interaction.filenames.glob.group;
185
186 <sect3>
187 <title>Watch out!</title>
188
189 <para id="x_55e">Don't forget that if you want to match a pattern in any
190 directory, you should not be using the
191 <quote><literal>*</literal></quote> match-any token, as this
192 will only match within one directory. Instead, use the
193 <quote><literal>**</literal></quote> token. This small
194 example illustrates the difference between the two.</para>
195
196 &interaction.filenames.glob.star-starstar;
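
<para>One way to see the difference is to translate a glob into a regular
expression by hand. The Python sketch below is a simplified illustration,
not Mercurial's implementation: it handles only
<quote><literal>*</literal></quote>, <quote><literal>**</literal></quote>
and <quote><literal>?</literal></quote>, and the function names
<literal>glob_to_regex</literal> and <literal>glob_match</literal> are
hypothetical.</para>

<programlisting><![CDATA[
```python
import re


def glob_to_regex(pattern):
    """Translate a small subset of glob syntax into a compiled regexp.

    Only '*', '**' and '?' are handled; character classes and brace
    groups are left out to keep the sketch short.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")       # may cross directory boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")    # stays within a single directory
            i += 1
        elif pattern[i] == "?":
            out.append(".")        # any single character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    # \Z requires the whole path to be consumed by the pattern.
    return re.compile("".join(out) + r"\Z")


def glob_match(pattern, path):
    """Return True if the whole path matches the glob pattern."""
    return glob_to_regex(pattern).match(path) is not None


print(glob_match("*.py", "setup.py"))       # True
print(glob_match("*.py", "src/main.py"))    # False: '*' stops at '/'
print(glob_match("**.py", "src/main.py"))   # True
```
]]></programlisting>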
197
198 </sect3>
199 </sect2>
200 <sect2>
201 <title>Regular expression matching with <literal>re</literal>
202 patterns</title>
203
204 <para id="x_55f">Mercurial accepts the same regular expression syntax as
205 the Python programming language (it uses Python's regexp
206 engine internally). This is based on the Perl language's
207 regexp syntax, which is the most popular dialect in use (it's
208 also used in Java, for example).</para>
209
210 <para id="x_560">I won't discuss Mercurial's regexp dialect in any detail
211 here, as regexps are not often used. Perl-style regexps are
212 in any case already exhaustively documented on a multitude of
213 web sites, and in many books. Instead, I will focus here on a
214 few things you should know if you find yourself needing to use
215 regexps with Mercurial.</para>
216
217 <para id="x_561">A regexp is matched against an entire file name, relative
218 to the root of the repository. In other words, even if you're
219 already in the subdirectory <filename
220 class="directory">foo</filename>, if you want to match files
221 under this directory, your pattern must start with
222 <quote><literal>foo/</literal></quote>.</para>
223
224 <para id="x_562">One thing to note, if you're familiar with Perl-style
225 regexps, is that Mercurial's are <emphasis>rooted</emphasis>.
226 That is, a regexp starts matching against the beginning of a
227 string; it doesn't look for a match anywhere within the
228 string. To match anywhere in a string, start your pattern
229 with <quote><literal>.*</literal></quote>.</para>
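
<para>Python's own <literal>re</literal> module makes the distinction easy
to try out: <literal>re.match</literal> only matches at the start of a
string, while <literal>re.search</literal> looks anywhere within it. The
sketch below assumes that rooted matching behaves like
<literal>re.match</literal>. The helper names and the file name
<filename>src/foo/bar.c</filename> are hypothetical.</para>

<programlisting><![CDATA[
```python
import re


def rooted_match(pattern, name):
    """Return True if pattern matches starting at the first character of name."""
    return re.match(pattern, name) is not None


def unrooted_match(pattern, name):
    """Return True if pattern matches anywhere in name (the Perl habit)."""
    return re.search(pattern, name) is not None


name = "src/foo/bar.c"  # hypothetical name, relative to the repository root

print(rooted_match(r"foo/", name))      # False: the name does not start with foo/
print(rooted_match(r"src/foo/", name))  # True
print(rooted_match(r".*foo/", name))    # True: the leading .* absorbs "src/"
print(unrooted_match(r"foo/", name))    # True
```
]]></programlisting>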
230
231 </sect2>
232 </sect1>
233 <sect1>
234 <title>Filtering files</title>
235
236 <para id="x_563">Not only does Mercurial give you a variety of ways to
237 specify files; it lets you further winnow those files using
238 <emphasis>filters</emphasis>. Commands that work with file
239 names accept two filtering options.</para>
240 <itemizedlist>
241 <listitem><para id="x_564"><option role="hg-opt-global">-I</option>, or
242 <option role="hg-opt-global">--include</option>, lets you
243 specify a pattern that file names must match in order to be
244 processed.</para>
245 </listitem>
246 <listitem><para id="x_565"><option role="hg-opt-global">-X</option>, or
247 <option role="hg-opt-global">--exclude</option>, gives you a
248 way to <emphasis>avoid</emphasis> processing files, if they
249 match this pattern.</para>
250 </listitem></itemizedlist>
251 <para id="x_566">You can provide multiple <option
252 role="hg-opt-global">-I</option> and <option
253 role="hg-opt-global">-X</option> options on the command line,
254 and intermix them as you please. Mercurial interprets the
255 patterns you provide using glob syntax by default (but you can
256 use regexps, with the <quote><literal>re:</literal></quote> prefix, if you need to).</para>
257
258 <para id="x_567">You can read a <option role="hg-opt-global">-I</option>
259 filter as <quote>process only the files that match this
260 filter</quote>.</para>
261
262 &interaction.filenames.filter.include;
263
264 <para id="x_568">The <option role="hg-opt-global">-X</option> filter is best
265 read as <quote>process only the files that don't match this
266 pattern</quote>.</para>
267
268 &interaction.filenames.filter.exclude;
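
<para>The two readings above can be combined into one small Python sketch.
It rests on assumptions that the text does not spell out: a name passes if
it matches <emphasis>any</emphasis> include pattern (or if there are none)
and matches <emphasis>no</emphasis> exclude pattern. The function
<literal>select</literal> is a hypothetical name, and Python's
<literal>fnmatch</literal> module stands in for Mercurial's glob matcher,
which it only approximates (its <quote><literal>*</literal></quote> also
crosses directory boundaries).</para>

<programlisting><![CDATA[
```python
from fnmatch import fnmatchcase


def select(names, include=(), exclude=()):
    """Keep names that match some include pattern and no exclude pattern.

    With no include patterns, every name passes the include test.
    fnmatchcase is only a stand-in for Mercurial's glob syntax.
    """
    kept = []
    for name in names:
        included = not include or any(fnmatchcase(name, p) for p in include)
        excluded = any(fnmatchcase(name, p) for p in exclude)
        if included and not excluded:
            kept.append(name)
    return kept


names = ["a.c", "a.h", "b.c", "notes.txt"]  # hypothetical file names

print(select(names, include=["*.c", "*.h"]))            # ['a.c', 'a.h', 'b.c']
print(select(names, exclude=["*.c"]))                   # ['a.h', 'notes.txt']
print(select(names, include=["a.*"], exclude=["*.h"]))  # ['a.c']
```
]]></programlisting>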
269
270 </sect1>
271 <sect1>
272 <title>Ignoring unwanted files and directories</title>
273
274 <para id="x_569">XXX.</para>
275
276 </sect1>
277 <sect1 id="sec.names.case">
278 <title>Case sensitivity</title>
279
280 <para id="x_56a">If you're working in a mixed development environment that
281 contains both Linux (or other Unix) systems and Macs or Windows
282 systems, you should keep in the back of your mind the knowledge
283 that they treat the case (<quote>N</quote> versus
284 <quote>n</quote>) of file names in incompatible ways. This is
285 not very likely to affect you, and it's easy to deal with if it
286 does, but it could surprise you if you don't know about
287 it.</para>
288
289 <para id="x_56b">Operating systems and filesystems differ in the way they
290 handle the <emphasis>case</emphasis> of characters in file and
291 directory names. There are three common ways to handle case in
292 names.</para>
293 <itemizedlist>
294 <listitem><para id="x_56c">Completely case insensitive. Uppercase and
295 lowercase versions of a letter are treated as identical,
296 both when creating a file and during subsequent accesses.
297 This is common on older DOS-based systems.</para>
298 </listitem>
299 <listitem><para id="x_56d">Case preserving, but insensitive. When a file
300 or directory is created, the case of its name is stored, and
301 can be retrieved and displayed by the operating system.
302 When an existing file is being looked up, its case is
303 ignored. This is the standard arrangement on Windows and
304 MacOS. The names <filename>foo</filename> and
305 <filename>FoO</filename> identify the same file. This
306 treatment of uppercase and lowercase letters as
307 interchangeable is also referred to as <emphasis>case
308 folding</emphasis>.</para>
309 </listitem>
310 <listitem><para id="x_56e">Case sensitive. The case of a name is
311 significant at all times. The names <filename>foo</filename>
312 and <filename>FoO</filename> identify different files. This is the way Linux
313 and Unix systems normally work.</para>
314 </listitem></itemizedlist>
315
316 <para id="x_56f">On Unix-like systems, it is possible to have any or all of
317 the above ways of handling case in action at once. For example,
318 if you use a USB thumb drive formatted with a FAT32 filesystem
319 on a Linux system, Linux will handle names on that filesystem in
320 a case preserving, but insensitive, way.</para>
321
322 <sect2>
323 <title>Safe, portable repository storage</title>
324
325 <para id="x_570">Mercurial's repository storage mechanism is <emphasis>case
326 safe</emphasis>. It translates file names so that they can
327 be safely stored on both case sensitive and case insensitive
328 filesystems. This means that you can use normal file copying
329 tools to transfer a Mercurial repository onto, for example, a
330 USB thumb drive, and safely move that drive and repository
331 back and forth between a Mac, a PC running Windows, and a
332 Linux box.</para>
333
334 </sect2>
335 <sect2>
336 <title>Detecting case conflicts</title>
337
338 <para id="x_571">When operating in the working directory, Mercurial honours
339 the naming policy of the filesystem where the working
340 directory is located. If the filesystem is case preserving,
341 but insensitive, Mercurial will treat names that differ only
342 in case as the same.</para>
343
344 <para id="x_572">An important aspect of this approach is that it is
345 possible to commit a changeset on a case sensitive (typically
346 Linux or Unix) filesystem that will cause trouble for users on
347 case insensitive (usually Windows and MacOS) filesystems. If a
348 Linux user commits changes to two files, one named
349 <filename>myfile.c</filename> and the other named
350 <filename>MyFile.C</filename>, they will be stored correctly
351 in the repository. And in the working directories of other
352 Linux users, they will be correctly represented as separate
353 files.</para>
354
355 <para id="x_573">If a Windows or Mac user pulls this change, they will not
356 initially have a problem, because Mercurial's repository
357 storage mechanism is case safe. However, once they try to
358 <command role="hg-cmd">hg update</command> the working
359 directory to that changeset, or <command role="hg-cmd">hg
360 merge</command> with that changeset, Mercurial will spot the
361 conflict between the two file names that the filesystem would
362 treat as the same, and forbid the update or merge from
363 occurring.</para>
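
<para>The idea behind such a check can be sketched in Python. This is an
illustration only, not Mercurial's actual detection code: the function
<literal>case_conflicts</literal> is a hypothetical name, and
<literal>str.lower()</literal> is assumed to be a good enough stand-in for
the case folding a real filesystem performs.</para>

<programlisting><![CDATA[
```python
from collections import defaultdict


def case_conflicts(names):
    """Return groups of names that differ only in case.

    str.lower() is a simplification of real filesystem case folding.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_conflicts(["myfile.c", "MyFile.C", "README"]))
# [['MyFile.C', 'myfile.c']]
```
]]></programlisting>

<para>An empty result means that the set of names can be checked out
safely on a case insensitive filesystem, at least under the simplified
folding rule assumed here.</para>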
364
365 </sect2>
366 <sect2>
367 <title>Fixing a case conflict</title>
368
369 <para id="x_574">If you are using Windows or a Mac in a mixed environment
370 where some of your collaborators are using Linux or Unix, and
371 Mercurial reports a case folding conflict when you try to
372 <command role="hg-cmd">hg update</command> or <command
373 role="hg-cmd">hg merge</command>, the procedure to fix the
374 problem is simple.</para>
375
376 <para id="x_575">Just find a nearby Linux or Unix box, clone the problem
377 repository onto it, and use Mercurial's <command
378 role="hg-cmd">hg rename</command> command to change the
379 names of any offending files or directories so that they will
380 no longer cause case folding conflicts. Commit this change,
381 <command role="hg-cmd">hg pull</command> or <command
382 role="hg-cmd">hg push</command> it across to your Windows or
383 MacOS system, and <command role="hg-cmd">hg update</command>
384 to the revision with the non-conflicting names.</para>
385
386 <para id="x_576">The changeset with case-conflicting names will remain in
387 your project's history, and you still won't be able to
388 <command role="hg-cmd">hg update</command> your working
389 directory to that changeset on a Windows or MacOS system, but
390 you can continue development unimpeded.</para>
391
392 <note>
393 <para id="x_577">Prior to version 0.9.3, Mercurial did not use a case
394 safe repository storage mechanism, and did not detect case
395 folding conflicts. If you are using an older version of
396 Mercurial on Windows or MacOS, I strongly recommend that you
397 upgrade.</para>
398 </note>
399
400 </sect2>
401 </sect1>
402 </chapter>
403
404 <!--
405 local variables:
406 sgml-parent-document: ("00book.xml" "book" "chapter")
407 end:
408 -->