
\chapter{File names and pattern matching}
\label{chap:names}

Mercurial provides mechanisms that let you work with file names in a
consistent and expressive way.

\section{Simple file naming}

Mercurial uses a unified piece of machinery ``under the hood'' to
handle file names.  Every command behaves uniformly with respect to
file names.  The way in which commands work with file names is as
follows.

If you explicitly name real files on the command line, Mercurial works
with exactly those files, as you would expect.
\interaction{filenames.files}

When you provide a directory name, Mercurial interprets this as
``operate on every file in this directory and its subdirectories''.
Mercurial traverses the files and subdirectories in a directory in
alphabetical order.  When it encounters a subdirectory, it traverses
that subdirectory before continuing with the current directory.
\interaction{filenames.dirs}
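
The traversal order described above can be sketched in a few lines of
Python.  This is only an illustration, not Mercurial's own code: the
function \texttt{walk} and the nested dictionary standing in for a
directory tree are names I made up for the example.

```python
def walk(tree, prefix=""):
    """Yield file paths from a nested dict in sorted, depth-first order.

    `tree` is a hypothetical stand-in for a directory: it maps a name
    to None (a file) or to another dict (a subdirectory).
    """
    for name in sorted(tree):
        entry = tree[name]
        path = prefix + name
        if isinstance(entry, dict):
            # Finish the subdirectory before moving on to the next
            # entry of the current directory.
            yield from walk(entry, path + "/")
        else:
            yield path


example = {"b.txt": None, "a": {"z.c": None, "y.c": None}, "c.txt": None}
print(list(walk(example)))  # ['a/y.c', 'a/z.c', 'b.txt', 'c.txt']
```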

\section{Running commands without any file names}

Mercurial's commands that work with file names have useful default
behaviours when you invoke them without providing any file names or
patterns.  What kind of behaviour you should expect depends on what
the command does.  Here are a few rules of thumb you can use to
predict what a command is likely to do if you don't give it any names
to work with.
\begin{itemize}
\item Many commands will operate on the entire working directory.
  This is what the \hgcmd{add} command does, for example.
\item If the command has effects that are difficult or impossible to
  reverse, it will force you to explicitly provide at least one name
  or pattern (see below).  This protects you from accidentally
  deleting files by running \hgcmd{remove} with no arguments, for
  example.
\end{itemize}


It's easy to work around these default behaviours if they don't suit
you.  If a command normally operates on the whole working directory,
you can invoke it on just the current directory and its subdirectories
by giving it the name ``\dirname{.}''.
\interaction{filenames.wdir-subdir}

Along the same lines, some commands normally print file names relative
to the root of the repository, even if you're invoking them from a
subdirectory.  Such a command will print file names relative to your
current directory if you give it explicit names.  Here, we're going to
run \hgcmd{status} from a subdirectory, and get it to operate on the
entire working directory while printing file names relative to our
subdirectory, by passing it the output of the \hgcmd{root} command.
\interaction{filenames.wdir-relname}
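
To make the difference between the two styles of output concrete, here
is a small Python sketch of how a name relative to the repository root
looks when printed relative to a subdirectory.  The helper
\texttt{display\_name} and the paths are hypothetical; the sketch only
illustrates the arithmetic on paths, not how Mercurial implements it.

```python
import posixpath


def display_name(repo_relative, cwd_relative):
    """Rewrite a root-relative file name as seen from a subdirectory.

    Both arguments are POSIX-style paths relative to the repository
    root (hypothetical inputs for this illustration).
    """
    return posixpath.relpath(repo_relative, start=cwd_relative)


# Seen from the subdirectory "src":
print(display_name("src/main.c", "src"))   # main.c
print(display_name("doc/README", "src"))   # ../doc/README
```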

\section{Telling you what's going on}

The \hgcmd{add} example in the preceding section illustrates something
else that's helpful about Mercurial commands.  If a command operates
on a file that you didn't name explicitly on the command line, it will
usually print the name of the file, so that you will not be surprised
by what's going on.

The principle here is of \emph{least surprise}.  If you've exactly
named a file on the command line, there's no point in repeating it
back at you.  If Mercurial is acting on a file \emph{implicitly},
because you provided no names, or a directory, or a pattern (see
below), it's safest to tell you what it's doing.

For commands that behave this way, you can silence them using the
\hggopt{-q} option.  You can also get them to print the name of every
file, even those you've named explicitly, using the \hggopt{-v}
option.
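
The reporting rule can be summarised as a tiny decision function.  This
is a sketch of the behaviour described above, not Mercurial's code; the
name \texttt{should\_report} is invented, and letting \hggopt{-q} win
when both options are given is my assumption.

```python
def should_report(named_explicitly, quiet=False, verbose=False):
    """Decide whether a command prints the name of a file it acts on."""
    if quiet:
        # -q silences the report (assumed to take priority over -v).
        return False
    if verbose:
        # -v prints every name, even ones given explicitly.
        return True
    # Default: only report files that were picked up implicitly.
    return not named_explicitly


print(should_report(named_explicitly=False))                # True
print(should_report(named_explicitly=True))                 # False
print(should_report(named_explicitly=True, verbose=True))   # True
```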

\section{Using patterns to identify files}

In addition to working with file and directory names, Mercurial lets
you use \emph{patterns} to identify files.  Mercurial's pattern
handling is expressive.

On Unix-like systems (Linux, MacOS, etc.), the job of matching file
names to patterns normally falls to the shell.  On these systems, you
must explicitly tell Mercurial that a name is a pattern.  On Windows,
the shell does not expand patterns, so Mercurial will automatically
identify names that are patterns, and expand them for you.

To provide a pattern in place of a regular name on the command line,
the mechanism is simple:
\begin{codesample2}
  syntax:patternbody
\end{codesample2}
That is, a pattern is identified by a short text string that says what
kind of pattern this is, followed by a colon, followed by the actual
pattern.
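
As a rough illustration of the \texttt{syntax:patternbody} form, the
following Python sketch splits an argument into its kind and its body.
The function \texttt{split\_pattern} and the tuple
\texttt{KNOWN\_KINDS} are hypothetical; Mercurial's real parsing may
differ in its details.

```python
# The two pattern syntaxes named in this chapter.
KNOWN_KINDS = ("glob", "re")


def split_pattern(arg, default_kind=None):
    """Split 'kind:body' into (kind, body).

    Arguments without a recognised prefix are returned unchanged,
    paired with `default_kind`.
    """
    kind, sep, body = arg.partition(":")
    if sep and kind in KNOWN_KINDS:
        return kind, body
    return default_kind, arg


print(split_pattern("glob:*.py"))    # ('glob', '*.py')
print(split_pattern("src/main.c"))   # (None, 'src/main.c')
```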

Mercurial supports two kinds of pattern syntax.  The most frequently
used is called \texttt{glob}; this is the same kind of pattern
matching used by the Unix shell, and should be familiar to Windows
command prompt users, too.

When Mercurial does automatic pattern matching on Windows, it uses
\texttt{glob} syntax.  You can thus omit the ``\texttt{glob:}'' prefix
on Windows, but it's safe to use it, too.

The \texttt{re} syntax is more powerful; it lets you specify patterns
using regular expressions, also known as regexps.

By the way, in the examples that follow, notice that I'm careful to
wrap all of my patterns in single quotes, so that they won't get
expanded by the shell before Mercurial sees them.

\subsection{Shell-style \texttt{glob} patterns}

This is an overview of the kinds of patterns you can use when you're
matching on glob patterns.

The ``\texttt{*}'' character matches any string, within a single
directory.
\interaction{filenames.glob.star}

The ``\texttt{**}'' pattern matches any string, and crosses directory
boundaries.  It's not a standard Unix glob token, but it's accepted by
several popular Unix shells, and is very useful.
\interaction{filenames.glob.starstar}

The ``\texttt{?}'' pattern matches any single character.
\interaction{filenames.glob.question}

The ``\texttt{[}'' character begins a \emph{character class}.  This
matches any single character within the class.  The class ends with a
``\texttt{]}'' character.  A class may contain multiple \emph{range}s
of the form ``\texttt{a-f}'', which is shorthand for
``\texttt{abcdef}''.
\interaction{filenames.glob.range}
If the first character after the ``\texttt{[}'' in a character class
is a ``\texttt{!}'', it \emph{negates} the class, making it match any
single character not in the class.
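
Character classes and their negation are easy to try out with Python's
standard \texttt{fnmatch} module, which follows the same shell
convention for ``\texttt{[...]}'' and ``\texttt{[!...]}''.  I use it
here purely as a stand-in; it is not Mercurial's matcher, and the file
names are made up.

```python
from fnmatch import fnmatchcase

# Hypothetical file names for the illustration.
names = ["a.txt", "f.txt", "g.txt", "z.txt"]

# [a-f] matches one character in the range a..f.
in_class = [n for n in names if fnmatchcase(n, "[a-f].txt")]
# [!a-f] matches one character that is NOT in the range a..f.
negated = [n for n in names if fnmatchcase(n, "[!a-f].txt")]

print(in_class)  # ['a.txt', 'f.txt']
print(negated)   # ['g.txt', 'z.txt']
```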

A ``\texttt{\{}'' begins a group of subpatterns, where the whole group
matches if any subpattern in the group matches.  The ``\texttt{,}''
character separates subpatterns, and ``\texttt{\}}'' ends the group.
\interaction{filenames.glob.group}
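
One way to think about a group is that it expands into several simpler
patterns, and a name matches if it matches any of them.  The sketch
below performs that expansion in Python.  The function
\texttt{expand\_braces} is hypothetical, handles only non-nested
groups, and is not how Mercurial itself processes groups.

```python
import re


def expand_braces(pattern):
    """Expand non-nested {a,b,...} groups into a list of patterns."""
    m = re.search(r"\{([^{}]*)\}", pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    results = []
    for alternative in m.group(1).split(","):
        # Substitute one subpattern, then expand any remaining groups.
        results.extend(expand_braces(head + alternative + tail))
    return results


print(expand_braces("*.{c,h}"))      # ['*.c', '*.h']
print(expand_braces("{a,b}{1,2}"))   # ['a1', 'a2', 'b1', 'b2']
```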

\subsubsection{Watch out!}

Don't forget that if you want to match a pattern in any directory, you
should not be using the ``\texttt{*}'' match-any token, as this will
only match within one directory.  Instead, use the ``\texttt{**}''
token.  This small example illustrates the difference between the two.
\interaction{filenames.glob.star-starstar}
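
If it helps to see the distinction in code, here is a simplified
Python translation of the ``\texttt{*}'', ``\texttt{**}'' and
``\texttt{?}'' tokens into a regular expression.  It is a sketch under
my own assumptions (the function \texttt{glob\_to\_regex} is invented,
and classes and groups are left out), not Mercurial's actual
translation.

```python
import re


def glob_to_regex(pattern):
    """Translate a tiny subset of glob syntax into a compiled regexp."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")      # any string, crossing "/" boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")   # any string within one directory
            i += 1
        elif pattern[i] == "?":
            out.append(".")       # any single character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


# Hypothetical file names, relative to the directory being searched.
files = ["main.py", "src/util.py", "src/lib/deep.py"]
star = [f for f in files if glob_to_regex("*.py").match(f)]
starstar = [f for f in files if glob_to_regex("**.py").match(f)]
print(star)      # ['main.py']
print(starstar)  # ['main.py', 'src/util.py', 'src/lib/deep.py']
```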

\subsection{Regular expression matching with \texttt{re} patterns}

Mercurial accepts the same regular expression syntax as the Python
programming language (it uses Python's regexp engine internally).
This is based on the Perl language's regexp syntax, which is the most
popular dialect in use (it's also used in Java, for example).

I won't discuss Mercurial's regexp dialect in any detail here, as
regexps are not often used.  Perl-style regexps are in any case
already exhaustively documented on a multitude of web sites, and in
many books.  Instead, I will focus here on a few things you should
know if you find yourself needing to use regexps with Mercurial.

A regexp is matched against an entire file name, relative to the root
of the repository.  In other words, even if you're already in
subdirectory \dirname{foo}, if you want to match files under this
directory, your pattern must start with ``\texttt{foo/}''.

One thing to note, if you're familiar with Perl-style regexps, is that
Mercurial's are \emph{rooted}.  That is, a regexp starts matching
against the beginning of a string; it doesn't look for a match
anywhere within the string.  To match anywhere in a string, start
your pattern with ``\texttt{.*}''.
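
Since Mercurial uses Python's regexp engine, the effect of rooting can
be reproduced directly in Python.  In this sketch I model a rooted
match with \texttt{re.match}, which only matches at the beginning of
the string, and an unrooted one with \texttt{re.search}.  The file name
is hypothetical.

```python
import re

# Hypothetical file name, relative to the root of the repository.
path = "foo/bar.c"

# Rooted: the pattern must match starting at the first character.
print(bool(re.match(r"foo/", path)))    # True
print(bool(re.match(r"bar", path)))     # False
# A leading ".*" lets a rooted pattern match anywhere in the name.
print(bool(re.match(r".*bar", path)))   # True
# For comparison, an unrooted search finds "bar" in the middle.
print(bool(re.search(r"bar", path)))    # True
```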

\section{Filtering files}

Not only does Mercurial give you a variety of ways to specify files;
it lets you further winnow those files using \emph{filters}.  Commands
that work with file names accept two filtering options.
\begin{itemize}
\item \hggopt{-I}, or \hggopt{--include}, lets you specify a pattern
  that file names must match in order to be processed.
\item \hggopt{-X}, or \hggopt{--exclude}, gives you a way to
  \emph{avoid} processing files, if they match this pattern.
\end{itemize}
You can provide multiple \hggopt{-I} and \hggopt{-X} options on the
command line, and intermix them as you please.  Mercurial interprets
the patterns you provide using glob syntax by default (but you can use
regexps if you need to).

You can read a \hggopt{-I} filter as ``process only the files that
match this filter''.
\interaction{filenames.filter.include}
The \hggopt{-X} filter is best read as ``process only the files that
don't match this pattern''.
\interaction{filenames.filter.exclude}
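
The two readings above can be combined into a small Python sketch.  I
am assuming here that a file is processed when it matches at least one
include pattern (if any are given) and no exclude pattern.  The
function \texttt{filter\_files} is hypothetical, and it uses Python's
\texttt{fnmatch} module as a stand-in, whose ``\texttt{*}'' also
matches ``\texttt{/}'', unlike the glob syntax described earlier.

```python
from fnmatch import fnmatchcase


def filter_files(names, include=(), exclude=()):
    """Keep names that match some include and no exclude pattern."""
    selected = []
    for name in names:
        if include and not any(fnmatchcase(name, p) for p in include):
            continue  # -I given, but this name matches none of them
        if any(fnmatchcase(name, p) for p in exclude):
            continue  # this name matches an -X pattern
        selected.append(name)
    return selected


names = ["a.c", "a.h", "b.c", "README"]
print(filter_files(names, include=["*.c"]))   # ['a.c', 'b.c']
print(filter_files(names, exclude=["*.c"]))   # ['a.h', 'README']
print(filter_files(names, include=["a.*"], exclude=["*.h"]))  # ['a.c']
```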

\section{Ignoring unwanted files and directories}

XXX.

\section{Case sensitivity}
\label{sec:names:case}

If you're working in a mixed development environment that contains
both Linux (or other Unix) systems and Macs or Windows systems, you
should keep in the back of your mind the knowledge that they treat the
case (``N'' versus ``n'') of file names in incompatible ways.  This is
not very likely to affect you, and it's easy to deal with if it does,
but it could surprise you if you don't know about it.

Operating systems and filesystems differ in the way they handle the
\emph{case} of characters in file and directory names.  There are
three common ways to handle case in names.
\begin{itemize}
\item Completely case insensitive.  Uppercase and lowercase versions
  of a letter are treated as identical, both when creating a file and
  during subsequent accesses.  This is common on older DOS-based
  systems.
\item Case preserving, but insensitive.  When a file or directory is
  created, the case of its name is stored, and can be retrieved and
  displayed by the operating system.  When an existing file is being
  looked up, its case is ignored.  This is the standard arrangement on
  Windows and MacOS.  The names \filename{foo} and \filename{FoO}
  identify the same file.  This treatment of uppercase and lowercase
  letters as interchangeable is also referred to as \emph{case
    folding}.
\item Case sensitive.  The case of a name is significant at all times.
  The names \filename{foo} and \filename{FoO} identify different files.  This
  is the way Linux and Unix systems normally work.
\end{itemize}

On Unix-like systems, it is possible to have any or all of the above
ways of handling case in action at once.  For example, if you use a
USB thumb drive formatted with a FAT32 filesystem on a Linux system,
Linux will handle names on that filesystem in a case preserving, but
insensitive, way.

\subsection{Safe, portable repository storage}

Mercurial's repository storage mechanism is \emph{case safe}.  It
translates file names so that they can be safely stored on both case
sensitive and case insensitive filesystems.  This means that you can
use normal file copying tools to transfer a Mercurial repository onto,
for example, a USB thumb drive, and safely move that drive and
repository back and forth between a Mac, a PC running Windows, and a
Linux box.
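
To give a feel for what ``translating file names'' can mean, here is
one hypothetical reversible scheme, sketched in Python: every uppercase
ASCII letter is stored as an underscore followed by its lowercase form,
and a literal underscore is doubled.  I am not claiming that this is
Mercurial's actual on-disk format; the functions \texttt{encode\_name}
and \texttt{decode\_name} exist only for this illustration.

```python
def encode_name(name):
    """Encode a name so that it contains no uppercase ASCII letters."""
    out = []
    for ch in name:
        if ch == "_":
            out.append("__")              # escape the escape character
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())  # mark letters that were uppercase
        else:
            out.append(ch)
    return "".join(out)


def decode_name(encoded):
    """Invert encode_name (expects a string produced by encode_name)."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == "_":
            following = encoded[i + 1]
            out.append("_" if following == "_" else following.upper())
            i += 2
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


print(encode_name("MyFile.C"))     # _my_file._c
print(encode_name("myfile.c"))     # myfile.c
print(decode_name("_my_file._c"))  # MyFile.C
```

Under this scheme, two names that differ only in case are stored under
names that differ in more than case, so they can coexist on a case
insensitive filesystem.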

\subsection{Detecting case conflicts}

When operating in the working directory, Mercurial honours the naming
policy of the filesystem where the working directory is located.  If
the filesystem is case preserving, but insensitive, Mercurial will
treat names that differ only in case as the same.

An important aspect of this approach is that it is possible to commit
a changeset on a case sensitive (typically Linux or Unix) filesystem
that will cause trouble for users on case insensitive (usually Windows
and MacOS) filesystems.  If a Linux user commits changes to two
files, one
named \filename{myfile.c} and the other named \filename{MyFile.C},
they will be stored correctly in the repository.  And in the working
directories of other Linux users, they will be correctly represented
as separate files.

If a Windows or Mac user pulls this change, they will not initially
have a problem, because Mercurial's repository storage mechanism is
case safe.  However, once they try to \hgcmd{update} the working
directory to that changeset, or \hgcmd{merge} with that changeset,
Mercurial will spot the conflict between the two file names that the
filesystem would treat as the same, and forbid the update or merge
from occurring.
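
The kind of check involved can be sketched in Python: group the names
in a changeset by their case-folded form and report any group with more
than one member.  The function \texttt{case\_conflicts} is
hypothetical, and \texttt{str.lower()} is only a simple stand-in for
whatever folding rule a real filesystem applies.

```python
from collections import defaultdict


def case_conflicts(names):
    """Return groups of names that differ only in case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]


print(case_conflicts(["myfile.c", "MyFile.C", "README"]))
# [['myfile.c', 'MyFile.C']]
```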

\subsection{Fixing a case conflict}

If you are using Windows or a Mac in a mixed environment where some of
your collaborators are using Linux or Unix, and Mercurial reports a
case folding conflict when you try to \hgcmd{update} or \hgcmd{merge},
the procedure to fix the problem is simple.

Just find a nearby Linux or Unix box, clone the problem repository
onto it, and use Mercurial's \hgcmd{rename} command to change the
names of any offending files or directories so that they will no
longer cause case folding conflicts.  Commit this change, \hgcmd{pull}
or \hgcmd{push} it across to your Windows or MacOS system, and
\hgcmd{update} to the revision with the non-conflicting names.

The changeset with case-conflicting names will remain in your
project's history, and you still won't be able to \hgcmd{update} your
working directory to that changeset on a Windows or MacOS system, but
you can continue development unimpeded.

\begin{note}
  Prior to version~0.9.3, Mercurial did not use a case safe repository
  storage mechanism, and did not detect case folding conflicts.  If
  you are using an older version of Mercurial on Windows or MacOS, I
  strongly recommend that you upgrade.
\end{note}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "00book"
%%% End:
% author: jerojasro@localhost
% date: Sun, 30 Nov 2008 18:41:51 -0500