comparison en/ch07-filenames.xml @ 658:b90b024729f1

WIP DocBook snapshot that all compiles. Mirabile dictu!
author Bryan O'Sullivan <bos@serpentine.com>
date Wed, 18 Feb 2009 00:22:09 -0800
parents en/ch07-filenames.tex@f72b7e6cbe90
children 21c62e09b99f
657:8631da51309b 658:b90b024729f1
1 <!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->
2
3 <chapter id="chap:names">
4 <title>File names and pattern matching</title>
5
6 <para>Mercurial provides mechanisms that let you work with file
7 names in a consistent and expressive way.</para>
8
9 <sect1>
10 <title>Simple file naming</title>
11
12 <para>Mercurial uses a unified piece of machinery <quote>under the
13 hood</quote> to handle file names. Every command behaves
14 uniformly with respect to file names. The way in which commands
15 work with file names is as follows.</para>
16
17 <para>If you explicitly name real files on the command line,
18 Mercurial works with exactly those files, as you would expect.
19 <!-- &interaction.filenames.files; --></para>
20
21 <para>When you provide a directory name, Mercurial will interpret
22 this as <quote>operate on every file in this directory and its
23 subdirectories</quote>. Mercurial traverses the files and
24 subdirectories in a directory in alphabetical order. When it
25 encounters a subdirectory, it will traverse that subdirectory
26 before continuing with the current directory. <!--
27 &interaction.filenames.dirs; --></para>
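<para>The traversal order described above can be sketched in a few
lines of Python. This is only an illustration, not Mercurial's own
code: the function name <literal>walk_sorted</literal> is made up
for this example, and plain string ordering stands in for
<quote>alphabetical order</quote>.</para>

```python
import os


def walk_sorted(root):
    """Yield the files under root, visiting entries in sorted order.

    A subdirectory is traversed completely as soon as it is
    encountered, before the remaining entries of its parent.
    """
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            # Descend first, then carry on with the current directory.
            yield from walk_sorted(path)
        else:
            yield path
```

<para>Given a directory holding <filename>a.txt</filename>,
<filename>z.txt</filename> and a subdirectory <filename
class="directory">b</filename>, this sketch yields every file under
<filename class="directory">b</filename> before it reaches
<filename>z.txt</filename>.</para>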
28
29 </sect1>
30 <sect1>
31 <title>Running commands without any file names</title>
32
33 <para>Mercurial's commands that work with file names have useful
34 default behaviours when you invoke them without providing any
35 file names or patterns. What kind of behaviour you should
36 expect depends on what the command does. Here are a few rules
37 of thumb you can use to predict what a command is likely to do
38 if you don't give it any names to work with.</para>
39 <itemizedlist>
40 <listitem><para>Most commands will operate on the entire working
41 directory. This is what the <command role="hg-cmd">hg
42 add</command> command does, for example.</para>
43 </listitem>
44 <listitem><para>If the command has effects that are difficult or
45 impossible to reverse, it will force you to explicitly
46 provide at least one name or pattern (see below). This
47 protects you from accidentally deleting files by running
48 <command role="hg-cmd">hg remove</command> with no
49 arguments, for example.</para>
50 </listitem></itemizedlist>
51
52 <para>It's easy to work around these default behaviours if they
53 don't suit you. If a command normally operates on the whole
54 working directory, you can invoke it on just the current
55 directory and its subdirectories by giving it the name
56 <quote><filename class="directory">.</filename></quote>. <!--
57 &interaction.filenames.wdir-subdir; --></para>
58
59 <para>Along the same lines, some commands normally print file
60 names relative to the root of the repository, even if you're
61 invoking them from a subdirectory. Such a command will print
62 file names relative to your subdirectory if you give it explicit
63 names. Here, we're going to run <command role="hg-cmd">hg
64 status</command> from a subdirectory, and get it to operate on
65 the entire working directory while printing file names relative
66 to our subdirectory, by passing it the output of the <command
67 role="hg-cmd">hg root</command> command. <!--
68 &interaction.filenames.wdir-relname; --></para>
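<para>To make the two naming styles concrete, here is a small Python
sketch of how a root-relative name and a subdirectory-relative name
relate to each other. The function <literal>display_name</literal>
and its parameters are hypothetical, and the sketch assumes
Unix-style paths; it does not reproduce Mercurial's
implementation.</para>

```python
import posixpath


def display_name(repo_root, cwd, tracked_path, relative_to_cwd):
    """Return the name a command might print for a tracked file.

    tracked_path is always relative to the root of the repository.
    """
    if not relative_to_cwd:
        # Name relative to the repository root.
        return tracked_path
    # Name relative to the directory the command was run from.
    absolute = posixpath.join(repo_root, tracked_path)
    return posixpath.relpath(absolute, cwd)
```

<para>With a repository at <filename
class="directory">/repo</filename> and a current directory of
<filename class="directory">/repo/src/util</filename>, the tracked
file <filename>src/main.c</filename> is displayed either as
<filename>src/main.c</filename> or as
<filename>../main.c</filename>.</para>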
69
70 </sect1>
71 <sect1>
72 <title>Telling you what's going on</title>
73
74 <para>The <command role="hg-cmd">hg add</command> example in the
75 preceding section illustrates something else that's helpful
76 about Mercurial commands. If a command operates on a file that
77 you didn't name explicitly on the command line, it will usually
78 print the name of the file, so that you will not be surprised by
79 what's going on.</para>
80
81 <para>The principle here is of <emphasis>least
82 surprise</emphasis>. If you've exactly named a file on the
83 command line, there's no point in repeating it back at you. If
84 Mercurial is acting on a file <emphasis>implicitly</emphasis>,
85 because you provided no names, or a directory, or a pattern (see
86 below), it's safest to tell you what it's doing.</para>
87
88 <para>For commands that behave this way, you can silence them
89 using the <option role="hg-opt-global">-q</option> option. You
90 can also get them to print the name of every file, even those
91 you've named explicitly, using the <option
92 role="hg-opt-global">-v</option> option.</para>
93
94 </sect1>
95 <sect1>
96 <title>Using patterns to identify files</title>
97
98 <para>In addition to working with file and directory names,
99 Mercurial lets you use <emphasis>patterns</emphasis> to identify
100 files. Mercurial's pattern handling is expressive.</para>
101
102 <para>On Unix-like systems (Linux, MacOS, etc.), the job of
103 matching file names to patterns normally falls to the shell. On
104 these systems, you must explicitly tell Mercurial that a name is
105 a pattern. On Windows, the shell does not expand patterns, so
106 Mercurial will automatically identify names that are patterns,
107 and expand them for you.</para>
108
109 <para>To provide a pattern in place of a regular name on the
110 command line, the mechanism is simple:</para>
111 <programlisting>syntax:patternbody</programlisting>
112 <para>That is, a pattern is identified by a short text string that
113 says what kind of pattern this is, followed by a colon, followed
114 by the actual pattern.</para>
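<para>As a rough illustration of the
<literal>syntax:patternbody</literal> form, the following Python
sketch splits a command line argument into its syntax name and
pattern body. The function name <literal>split_pattern</literal> is
invented for this example, and it only recognises the two syntax
names discussed in this chapter.</para>

```python
def split_pattern(text, default_kind=None):
    """Split 'syntax:patternbody' into a (syntax, body) pair.

    Text without a recognised syntax prefix is returned unchanged,
    paired with default_kind.
    """
    kind, sep, body = text.partition(":")
    if sep and kind in ("glob", "re"):
        return kind, body
    return default_kind, text
```

<para>Only the first colon is significant here, so
<literal>glob:a:b</literal> is split into the syntax
<literal>glob</literal> and the body <literal>a:b</literal>, while a
plain name such as <filename>src/main.c</filename> is left
alone.</para>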
115
116 <para>Mercurial supports two kinds of pattern syntax. The most
117 frequently used is called <literal>glob</literal>; this is the
118 same kind of pattern matching used by the Unix shell, and should
119 be familiar to Windows command prompt users, too.</para>
120
121 <para>When Mercurial does automatic pattern matching on Windows,
122 it uses <literal>glob</literal> syntax. You can thus omit the
123 <quote><literal>glob:</literal></quote> prefix on Windows, but
124 it's safe to use it, too.</para>
125
126 <para>The <literal>re</literal> syntax is more powerful; it lets
127 you specify patterns using regular expressions, also known as
128 regexps.</para>
129
130 <para>By the way, in the examples that follow, notice that I'm
131 careful to wrap all of my patterns in quote characters, so that
132 they won't get expanded by the shell before Mercurial sees
133 them.</para>
134
135 <sect2>
136 <title>Shell-style <literal>glob</literal> patterns</title>
137
138 <para>This is an overview of the kinds of patterns you can use
139 when you're matching on glob patterns.</para>
140
141 <para>The <quote><literal>*</literal></quote> character matches
142 any string, within a single directory. <!--
143 &interaction.filenames.glob.star; --></para>
144
145 <para>The <quote><literal>**</literal></quote> pattern matches
146 any string, and crosses directory boundaries. It's not a
147 standard Unix glob token, but it's accepted by several popular
148 Unix shells, and is very useful. <!--
149 &interaction.filenames.glob.starstar; --></para>
150
151 <para>The <quote><literal>?</literal></quote> pattern matches
152 any single character. <!--
153 &interaction.filenames.glob.question; --></para>
154
155 <para>The <quote><literal>[</literal></quote> character begins a
156 <emphasis>character class</emphasis>. This matches any single
157 character within the class. The class ends with a
158 <quote><literal>]</literal></quote> character. A class may
159 contain multiple <emphasis>range</emphasis>s of the form
160 <quote><literal>a-f</literal></quote>, which is shorthand for
161 <quote><literal>abcdef</literal></quote>. <!--
162 &interaction.filenames.glob.range; --> If the first character
163 after the <quote><literal>[</literal></quote> in a character
164 class is a <quote><literal>!</literal></quote>, it
165 <emphasis>negates</emphasis> the class, making it match any
166 single character not in the class.</para>
167
168 <para>A <quote><literal>{</literal></quote> begins a group of
169 subpatterns, where the whole group matches if any subpattern
170 in the group matches. The <quote><literal>,</literal></quote>
171 character separates subpatterns, and <quote><literal>}</literal></quote>
172 ends the group. <!-- &interaction.filenames.glob.group;
173 --></para>
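<para>One way to pin down the glob tokens described above is to
translate them into a regular expression. The Python sketch below
does this under some assumptions: the names
<literal>glob_to_regex</literal> and
<literal>glob_matches</literal> are made up for illustration,
translating to a regexp is simply a convenient way to express the
rules, and edge cases such as an unclosed group are not handled. It
should not be read as Mercurial's own matcher.</para>

```python
import re


def glob_to_regex(pattern):
    """Translate a glob pattern into a regular expression string."""
    out = []
    depth = 0  # how many {...} groups are currently open
    i = 0
    while i != len(pattern):
        c = pattern[i]
        i += 1
        if c == "*":
            if pattern[i:i + 1] == "*":
                i += 1
                out.append(".*")      # ** may cross directory boundaries
            else:
                out.append("[^/]*")   # * stays inside one directory
        elif c == "?":
            out.append(".")           # any single character
        elif c == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                out.append(re.escape(c))  # unterminated class: literal [
                continue
            body = pattern[i:end].replace("\\", "\\\\")
            i = end + 1
            if body[0] == "!" and len(body) != 1:
                body = "^" + body[1:]     # leading ! negates the class
            elif body[0] == "^":
                body = "\\" + body        # keep a leading ^ literal
            out.append("[" + body + "]")
        elif c == "{":
            depth += 1
            out.append("(?:")             # start a group of subpatterns
        elif c == "," and depth:
            out.append("|")               # separates subpatterns
        elif c == "}" and depth:
            depth -= 1
            out.append(")")               # ends the group
        else:
            out.append(re.escape(c))
    return "".join(out)


def glob_matches(pattern, name):
    """Return True if the whole file name matches the glob pattern."""
    return re.fullmatch(glob_to_regex(pattern), name) is not None
```

<para>With this sketch, <literal>*.c</literal> matches
<filename>main.c</filename> but not
<filename>src/main.c</filename>, whereas <literal>**.c</literal>
matches both.</para>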
174
175 <sect3>
176 <title>Watch out!</title>
177
178 <para>Don't forget that if you want to match a pattern in any
179 directory, you should not be using the
180 <quote><literal>*</literal></quote> match-any token, as this
181 will only match within one directory. Instead, use the
182 <quote><literal>**</literal></quote> token. This small
183 example illustrates the difference between the two. <!--
184 &interaction.filenames.glob.star-starstar; --></para>
185
186 </sect3>
187 </sect2>
188 <sect2>
189 <title>Regular expression matching with <literal>re</literal>
190 patterns</title>
191
192 <para>Mercurial accepts the same regular expression syntax as
193 the Python programming language (it uses Python's regexp
194 engine internally). This is based on the Perl language's
195 regexp syntax, which is the most popular dialect in use (it's
196 also used in Java, for example).</para>
197
198 <para>I won't discuss Mercurial's regexp dialect in any detail
199 here, as regexps are not often used. Perl-style regexps are
200 in any case already exhaustively documented on a multitude of
201 web sites, and in many books. Instead, I will focus here on a
202 few things you should know if you find yourself needing to use
203 regexps with Mercurial.</para>
204
205 <para>A regexp is matched against an entire file name, relative
206 to the root of the repository. In other words, even if you're
207 already in the subdirectory <filename
208 class="directory">foo</filename>, if you want to match files
209 under this directory, your pattern must start with
210 <quote><literal>foo/</literal></quote>.</para>
211
212 <para>One thing to note, if you're familiar with Perl-style
213 regexps, is that Mercurial's are <emphasis>rooted</emphasis>.
214 That is, a regexp starts matching against the beginning of a
215 string; it doesn't look for a match anywhere within the
216 string. To match anywhere in a string, start your pattern
217 with <quote><literal>.*</literal></quote>.</para>
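<para>Since these patterns use Python's regexp engine, the effect of
rooting can be shown with Python's standard
<literal>re.match</literal> function, which only looks for a match
at the beginning of a string. The helper name
<literal>re_pattern_matches</literal> is made up for this sketch, and
the file names are examples.</para>

```python
import re


def re_pattern_matches(regexp, path):
    """Match a rooted regexp against a path relative to the repo root.

    re.match anchors the attempt at the start of the string; it does
    not search for a match further along.
    """
    return re.match(regexp, path) is not None
```

<para>Under this sketch, the pattern <literal>bar\.c</literal> does
not match <filename>foo/bar.c</filename>, because the name begins
with <literal>foo/</literal>. Both <literal>foo/.*\.c$</literal> and
<literal>.*bar\.c</literal> do match it.</para>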
218
219 </sect2>
220 </sect1>
221 <sect1>
222 <title>Filtering files</title>
223
224 <para>Not only does Mercurial give you a variety of ways to
225 specify files; it lets you further winnow those files using
226 <emphasis>filters</emphasis>. Commands that work with file
227 names accept two filtering options.</para>
228 <itemizedlist>
229 <listitem><para><option role="hg-opt-global">-I</option>, or
230 <option role="hg-opt-global">--include</option>, lets you
231 specify a pattern that file names must match in order to be
232 processed.</para>
233 </listitem>
234 <listitem><para><option role="hg-opt-global">-X</option>, or
235 <option role="hg-opt-global">--exclude</option>, gives you a
236 way to <emphasis>avoid</emphasis> processing files, if they
237 match this pattern.</para>
238 </listitem></itemizedlist>
239 <para>You can provide multiple <option
240 role="hg-opt-global">-I</option> and <option
241 role="hg-opt-global">-X</option> options on the command line,
242 and intermix them as you please. Mercurial interprets the
243 patterns you provide using glob syntax by default (but you can
244 use regexps if you need to).</para>
245
246 <para>You can read a <option role="hg-opt-global">-I</option>
247 filter as <quote>process only the files that match this
248 filter</quote>. <!-- &interaction.filenames.filter.include;
249 --> The <option role="hg-opt-global">-X</option> filter is best
250 read as <quote>process only the files that don't match this
251 pattern</quote>. <!-- &interaction.filenames.filter.exclude;
252 --></para>
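<para>The way the two filters combine can be sketched in Python as
follows. Several things here are assumptions made for illustration:
the function name <literal>filter_names</literal>, the rule that a
name needs to match only one of several include patterns, and the
use of the standard library's <literal>fnmatchcase</literal> as a
stand-in matcher. Unlike the <literal>glob</literal> syntax
described earlier, the <literal>*</literal> of
<literal>fnmatchcase</literal> also matches across directory
separators.</para>

```python
from fnmatch import fnmatchcase


def filter_names(names, include=(), exclude=()):
    """Keep the names that pass the include and exclude filters."""
    kept = []
    for name in names:
        # With include patterns present, a name must match one of them.
        if include and not any(fnmatchcase(name, p) for p in include):
            continue
        # A name matching any exclude pattern is dropped.
        if any(fnmatchcase(name, p) for p in exclude):
            continue
        kept.append(name)
    return kept
```

<para>Applied to the names <filename>a.c</filename>,
<filename>a.h</filename>, <filename>doc/b.txt</filename> and
<filename>src/c.c</filename>, an include pattern of
<literal>*.c</literal> combined with an exclude pattern of
<literal>src/*</literal> keeps only
<filename>a.c</filename>.</para>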
253
254 </sect1>
255 <sect1>
256 <title>Ignoring unwanted files and directories</title>
257
258 <para>XXX.</para>
259
260 </sect1>
261 <sect1 id="sec:names:case">
262 <title>Case sensitivity</title>
263
264 <para>If you're working in a mixed development environment that
265 contains both Linux (or other Unix) systems and Macs or Windows
266 systems, bear in mind that these platforms treat the case
267 (<quote>N</quote> versus
268 <quote>n</quote>) of file names in incompatible ways. This is
269 not very likely to affect you, and it's easy to deal with if it
270 does, but it could surprise you if you don't know about
271 it.</para>
272
273 <para>Operating systems and filesystems differ in the way they
274 handle the <emphasis>case</emphasis> of characters in file and
275 directory names. There are three common ways to handle case in
276 names.</para>
277 <itemizedlist>
278 <listitem><para>Completely case insensitive. Uppercase and
279 lowercase versions of a letter are treated as identical,
280 both when creating a file and during subsequent accesses.
281 This is common on older DOS-based systems.</para>
282 </listitem>
283 <listitem><para>Case preserving, but insensitive. When a file
284 or directory is created, the case of its name is stored, and
285 can be retrieved and displayed by the operating system.
286 When an existing file is being looked up, its case is
287 ignored. This is the standard arrangement on Windows and
288 MacOS. The names <filename>foo</filename> and
289 <filename>FoO</filename> identify the same file. This
290 treatment of uppercase and lowercase letters as
291 interchangeable is also referred to as <emphasis>case
292 folding</emphasis>.</para>
293 </listitem>
294 <listitem><para>Case sensitive. The case of a name is
295 significant at all times. The names <filename>foo</filename>
296 and <filename>FoO</filename> identify different files. This is the way Linux
297 and Unix systems normally work.</para>
298 </listitem></itemizedlist>
299
300 <para>On Unix-like systems, it is possible to have any or all of
301 the above ways of handling case in action at once. For example,
302 if you use a USB thumb drive formatted with a FAT32 filesystem
303 on a Linux system, Linux will handle names on that filesystem in
304 a case preserving, but insensitive, way.</para>
305
306 <sect2>
307 <title>Safe, portable repository storage</title>
308
309 <para>Mercurial's repository storage mechanism is <emphasis>case
310 safe</emphasis>. It translates file names so that they can
311 be safely stored on both case sensitive and case insensitive
312 filesystems. This means that you can use normal file copying
313 tools to transfer a Mercurial repository onto, for example, a
314 USB thumb drive, and safely move that drive and repository
315 back and forth between a Mac, a PC running Windows, and a
316 Linux box.</para>
317
318 </sect2>
319 <sect2>
320 <title>Detecting case conflicts</title>
321
322 <para>When operating in the working directory, Mercurial honours
323 the naming policy of the filesystem where the working
324 directory is located. If the filesystem is case preserving,
325 but insensitive, Mercurial will treat names that differ only
326 in case as the same.</para>
327
328 <para>An important aspect of this approach is that it is
329 possible to commit a changeset on a case sensitive (typically
330 Linux or Unix) filesystem that will cause trouble for users on
331 case insensitive (usually Windows and MacOS) filesystems. If a
332 Linux user commits changes to two files, one named
333 <filename>myfile.c</filename> and the other named
334 <filename>MyFile.C</filename>, they will be stored correctly
335 in the repository. And in the working directories of other
336 Linux users, they will be correctly represented as separate
337 files.</para>
338
339 <para>If a Windows or Mac user pulls this change, they will not
340 initially have a problem, because Mercurial's repository
341 storage mechanism is case safe. However, once they try to
342 <command role="hg-cmd">hg update</command> the working
343 directory to that changeset, or <command role="hg-cmd">hg
344 merge</command> with that changeset, Mercurial will spot the
345 conflict between the two file names that the filesystem would
346 treat as the same, and forbid the update or merge from
347 occurring.</para>
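<para>The idea behind spotting such a conflict can be sketched in
Python by grouping names that become identical once case is
ignored. The function name
<literal>find_case_conflicts</literal> is hypothetical, and
lowercasing each name is only an approximation of how a real
filesystem folds case; this is not Mercurial's own check.</para>

```python
from collections import defaultdict


def find_case_conflicts(names):
    """Return groups of names that differ only in case."""
    groups = defaultdict(list)
    for name in names:
        # Names with the same lowercase form would collide on a
        # case insensitive filesystem.
        groups[name.lower()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

<para>For the two files in the example above,
<filename>myfile.c</filename> and <filename>MyFile.C</filename>,
this sketch reports a single conflicting group. A list of names
with no such clash produces an empty result.</para>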
348
349 </sect2>
350 <sect2>
351 <title>Fixing a case conflict</title>
352
353 <para>If you are using Windows or a Mac in a mixed environment
354 where some of your collaborators are using Linux or Unix, and
355 Mercurial reports a case folding conflict when you try to
356 <command role="hg-cmd">hg update</command> or <command
357 role="hg-cmd">hg merge</command>, the procedure to fix the
358 problem is simple.</para>
359
360 <para>Just find a nearby Linux or Unix box, clone the problem
361 repository onto it, and use Mercurial's <command
362 role="hg-cmd">hg rename</command> command to change the
363 names of any offending files or directories so that they will
364 no longer cause case folding conflicts. Commit this change,
365 <command role="hg-cmd">hg pull</command> or <command
366 role="hg-cmd">hg push</command> it across to your Windows or
367 MacOS system, and <command role="hg-cmd">hg update</command>
368 to the revision with the non-conflicting names.</para>
369
370 <para>The changeset with case-conflicting names will remain in
371 your project's history, and you still won't be able to
372 <command role="hg-cmd">hg update</command> your working
373 directory to that changeset on a Windows or MacOS system, but
374 you can continue development unimpeded.</para>
375
376 <note>
377 <para> Prior to version 0.9.3, Mercurial did not use a case
378 safe repository storage mechanism, and did not detect case
379 folding conflicts. If you are using an older version of
380 Mercurial on Windows or MacOS, I strongly recommend that you
381 upgrade.</para>
382 </note>
383
384 </sect2>
385 </sect1>
386 </chapter>
387
388 <!--
389 local variables:
390 sgml-parent-document: ("00book.xml" "book" "chapter")
391 end:
392 -->