Mercurial > hgbook

diff tools/po4a/lib/Locale/Po4a/TransTractor.pm @ 722:082bb76417f1
Add Po4a 0.37-dev(2009-03-08)
author: Dongsheng Song <dongsheng.song@gmail.com>
date: Thu, 12 Mar 2009 15:43:56 +0800
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/po4a/lib/Locale/Po4a/TransTractor.pm	Thu Mar 12 15:43:56 2009 +0800
@@ -0,0 +1,1100 @@
+#!/usr/bin/perl -w
+
+require Exporter;
+
+package Locale::Po4a::TransTractor;
+use DynaLoader;
+
+use 5.006;
+use strict;
+use warnings;
+
+use subs qw(makespace);
+use vars qw($VERSION @ISA @EXPORT);
+$VERSION="0.36";
+@ISA = qw(DynaLoader);
+@EXPORT = qw(new process translate 
+             read write readpo writepo
+             getpoout setpoout);
+
+# Try to use a C extension if present.
+eval("bootstrap Locale::Po4a::TransTractor $VERSION");
+
+use Carp qw(croak);
+use Locale::Po4a::Po;
+use Locale::Po4a::Common;
+
+use File::Path; # mkdir before write
+
+use Encode;
+use Encode::Guess;
+
+=head1 NAME
+
+Locale::Po4a::TransTractor - Generic trans(lator ex)tractor.
+
+=head1 DESCRIPTION
+
+The po4a (po for anything) project goal is to ease translations (and more
+interestingly, the maintenance of translations) using gettext tools on
+areas where they were not expected like documentation.
+
+This class is the ancestor of every po4a parsers used to parse a document to
+search translatable strings, extract them to a po file and replace them by
+their translation in the output document. 
+
+More formally, it takes the following arguments as input:
+
+=over 2
+
+=item -
+
+a document to translate ;
+
+=item -
+
+a po file containing the translations to use.
+
+=back
+
+As output, it produces:
+
+=over 2
+
+=item -
+
+another po file, resulting of the extraction of translatable strings from
+the input document ;
+
+=item -
+
+a translated document, with the same structure than the one in input, but
+with all translatable strings replaced with the translations found in the
+po file provided in input.
+
+=back
+
+Here is a graphical representation of this:
+
+   Input document --\                             /---> Output document
+                     \                           /       (translated)
+                      +-> parse() function -----+
+                     /                           \
+   Input po --------/                             \---> Output po
+                                                         (extracted)
+
+=head1 FUNCTIONS YOUR PARSER SHOULD OVERRIDE
+
+=over 4
+
+=item parse()
+
+This is where all the work takes place: the parsing of input documents, the
+generation of output, and the extraction of the translatable strings. This
+is pretty simple using the provided functions presented in the section
+"INTERNAL FUNCTIONS" below. See also the synopsis, which present an
+example.
+
+This function is called by the process() function bellow, but if you choose
+to use the new() function, and to add content manually to your document,
+you will have to call this function yourself.
+
+=item docheader()
+
+This function returns the header we should add to the produced document,
+quoted properly to be a comment in the target language.  See the section
+"Educating developers about translations", from L<po4a(7)|po4a.7>, for what
+it is good for.
+
+=back
+
+=cut
+
+sub docheader {}
+
+sub parse {}
+
+=head1 SYNOPSIS
+
+The following example parses a list of paragraphs beginning with "<p>". For the sake
+of simplicity, we assume that the document is well formatted, i.e. that '<p>'
+tags are the only tags present, and that this tag is at the very beginning
+of each paragraph.
+
+ sub parse {
+   my $self = shift;
+
+   PARAGRAPH: while (1) {
+       my ($paragraph,$pararef)=("","");
+       my $first=1;
+       my ($line,$lref)=$self->shiftline();
+       while (defined($line)) {
+	   if ($line =~ m/<p>/ && !$first--; ) {
+	       # Not the first time we see <p>. 
+	       # Reput the current line in input,
+	       #  and put the built paragraph to output
+	       $self->unshiftline($line,$lref);
+	      
+	       # Now that the document is formed, translate it:
+	       #   - Remove the leading tag
+	       $paragraph =~ s/^<p>//s;
+
+	       #   - push to output the leading tag (untranslated) and the
+	       #     rest of the paragraph (translated)
+	       $self->pushline(  "<p>"
+                               . $document->translate($paragraph,$pararef)
+                               );
+
+ 	       next PARAGRAPH;
+	   } else {
+	       # Append to the paragraph
+	       $paragraph .= $line;
+	       $pararef = $lref unless(length($pararef));
+	   }
+
+           # Reinit the loop
+           ($line,$lref)=$self->shiftline();
+       }
+       # Did not get a defined line? End of input file.
+       return;
+   }
+ } 
+
+Once you've implemented the parse function, you can use your document
+class, using the public interface presented in the next section.
+
+=head1 PUBLIC INTERFACE for scripts using your parser
+
+=head2 Constructor
+
+=over 4
+
+=item process(%)
+
+This function can do all you need to do with a po4a document in one
+invocation. Its arguments must be packed as a hash. ACTIONS:
+
+=over 3
+
+=item a.
+
+Reads all the po files specified in po_in_name
+
+=item b.
+
+Reads all original documents specified in file_in_name
+
+=item c.
+
+Parses the document
+
+=item d.
+
+Reads and applies all the addenda specified
+
+=item e.
+
+Writes the translated document to file_out_name (if given)
+
+=item f.
+
+Writes the extracted po file to po_out_name (if given)
+
+=back
+
+ARGUMENTS, beside the ones accepted by new() (with expected type):
+
+=over 4
+
+=item file_in_name (@)
+
+List of filenames where we should read the input document.
+
+=item file_in_charset ($)
+
+Charset used in the input document (if it isn't specified, it will try
+to detect it from the input document).
+
+=item file_out_name ($)
+
+Filename where we should write the output document.
+
+=item file_out_charset ($)
+
+Charset used in the output document (if it isn't specified, it will use
+the po file charset).
+
+=item po_in_name (@)
+
+List of filenames where we should read the input po files from, containing
+the translation which will be used to translate the document.
+
+=item po_out_name ($)
+
+Filename where we should write the output po file, containing the strings
+extracted from the input document.
+
+=item addendum (@)
+
+List of filenames where we should read the addenda from.
+
+=item addendum_charset ($)
+
+Charset for the addenda.
+
+=back
+
+=item new(%)
+
+Create a new Po4a document. Accepted options (but be in a hash):
+
+=over 4
+
+=item verbose ($)
+
+Sets the verbosity.
+
+=item debug ($)
+
+Sets the debugging.
+
+=back
+
+=cut
+
+sub process {
+    ## Determine if we were called via an object-ref or a classname
+    my $self = shift;
+
+    ## Any remaining arguments are treated as initial values for the
+    ## hash that is used to represent this object.
+    my %params = @_;
+    
+    # Build the args for new()
+    my %newparams = ();
+    foreach (keys %params) {
+	next if ($_ eq 'po_in_name' ||
+		 $_ eq 'po_out_name' ||
+		 $_ eq 'file_in_name' ||
+		 $_ eq 'file_in_charset' ||
+		 $_ eq 'file_out_name' ||
+		 $_ eq 'file_out_charset' ||
+		 $_ eq 'addendum' ||
+		 $_ eq 'addendum_charset');
+	$newparams{$_}=$params{$_};
+    }
+
+    $self->detected_charset($params{'file_in_charset'});
+    $self->{TT}{'file_out_charset'}=$params{'file_out_charset'};
+    if (defined($self->{TT}{'file_out_charset'}) and
+	length($self->{TT}{'file_out_charset'})) {
+	$self->{TT}{'file_out_encoder'} = find_encoding($self->{TT}{'file_out_charset'});
+    }
+    $self->{TT}{'addendum_charset'}=$params{'addendum_charset'};
+
+    foreach my $file (@{$params{'po_in_name'}}) {
+	print STDERR "readpo($file)... " if $self->debug();
+	$self->readpo($file);
+	print STDERR "done.\n" if $self->debug()
+    }
+    foreach my $file (@{$params{'file_in_name'}}) {
+	print STDERR "read($file)..." if $self->debug();
+	$self->read($file);
+	print STDERR "done.\n"  if $self->debug();
+    }
+    print STDERR "parse..." if $self->debug();
+    $self->parse();
+    print STDERR "done.\n" if $self->debug();
+    foreach my $file (@{$params{'addendum'}}) {
+	print STDERR "addendum($file)..." if $self->debug();
+	$self->addendum($file) || die "An addendum failed\n";
+	print STDERR "done.\n" if $self->debug();
+    }
+    if (defined $params{'file_out_name'}) {
+	print STDERR "write(".$params{'file_out_name'}.")... " 
+	    if $self->debug();
+	$self->write($params{'file_out_name'});
+	print STDERR "done.\n" if $self->debug();
+    }
+    if (defined $params{'po_out_name'}) {
+	print STDERR "writepo(".$params{'po_out_name'}.")... "
+	     if $self->debug();
+	$self->writepo($params{'po_out_name'});
+	print STDERR "done.\n" if $self->debug();
+    }
+    return $self;
+}
+
+sub new {
+    ## Determine if we were called via an object-ref or a classname
+    my $this = shift;
+    my $class = ref($this) || $this;
+    my $self = { };
+    my %options=@_;
+    ## Bless ourselves into the desired class and perform any initialization
+    bless $self, $class;
+    
+    ## initialize the plugin
+    # prevent the plugin from croaking on the options intended for Po.pm
+    $self->{options}{'porefs'} = '';
+    # let the plugin parse the options and such
+    $self->initialize(%options);
+
+    ## Create our private data
+    my %po_options;
+    $po_options{'porefs'} = $self->{options}{'porefs'};
+    
+    # private data
+    $self->{TT}=(); 
+    $self->{TT}{po_in}=Locale::Po4a::Po->new();
+    $self->{TT}{po_out}=Locale::Po4a::Po->new(\%po_options);
+    # Warning, this is an array of array:
+    #  The document is splited on lines, and for each
+    #  [0] is the line content, [1] is the reference [2] the type
+    $self->{TT}{doc_in}=();
+    $self->{TT}{doc_out}=();
+    if (defined $options{'verbose'}) {
+	$self->{TT}{verbose}  =  $options{'verbose'};
+    }
+    if (defined $options{'debug'}) {
+	$self->{TT}{debug}  =  $options{'debug'};
+    }
+    # Input document is in ascii until we prove the opposite (in read())
+    $self->{TT}{ascii_input}=1;
+    # We try not to use utf unless it's forced from the outside (in case the
+    # document isn't in ascii)
+    $self->{TT}{utf_mode}=0;
+
+    
+    return $self;
+}
+
+=back
+
+=head2 Manipulating document files
+
+=over 4
+
+=item read($)
+
+Add another input document at the end of the existing one. The argument is
+the filename to read. 
+
+Please note that it does not parse anything. You should use the parse()
+function when you're done with packing input files into the document. 
+
+=cut
+
+#'
+sub read() {
+    my $self=shift;
+    my $filename=shift
+	or croak wrap_msg(dgettext("po4a", "Can't read from file without having a filename"));
+    my $linenum=0;
+
+    open INPUT,"<$filename" 
+	or croak wrap_msg(dgettext("po4a", "Can't read from %s: %s"), $filename, $!);
+    while (defined (my $textline = <INPUT>)) {
+	$linenum++;
+	my $ref="$filename:$linenum";
+	my @entry=($textline,$ref);
+	push @{$self->{TT}{doc_in}}, @entry;
+
+	if (!defined($self->{TT}{'file_in_charset'})) {
+	    # Detect if this file has non-ascii characters
+	    if($self->{TT}{ascii_input}) {
+		my $decoder = guess_encoding($textline);
+		if (!ref($decoder) or $decoder !~ /Encode::XS=/) {
+		    # We have detected a non-ascii line
+		    $self->{TT}{ascii_input} = 0;
+		    # Save the reference for future error message
+		    $self->{TT}{non_ascii_ref} ||= $ref;
+		}
+	    }
+	}
+    }
+    close INPUT 
+	or croak wrap_msg(dgettext("po4a", "Can't close %s after reading: %s"), $filename, $!);
+
+}
+
+=item write($)
+
+Write the translated document to the given filename.
+
+=cut
+
+sub write {
+    my $self=shift;
+    my $filename=shift
+	or croak wrap_msg(dgettext("po4a", "Can't write to a file without filename"));
+
+    my $fh;
+    if ($filename eq '-') {
+	$fh=\*STDOUT;
+    } else {
+	# make sure the directory in which we should write the localized file exists
+	my $dir = $filename;
+	if ($dir =~ m|/|) {
+	    $dir =~ s|/[^/]*$||;
+	
+	    File::Path::mkpath($dir, 0, 0755) # Croaks on error
+	      if (length ($dir) && ! -e $dir);
+	}
+	open $fh,">$filename"
+	    or croak wrap_msg(dgettext("po4a", "Can't write to %s: %s"), $filename, $!);
+    }
+    
+    map { print $fh $_ } $self->docheader();
+    map { print $fh $_ } @{$self->{TT}{doc_out}};
+
+    if ($filename ne '-') {
+	close $fh or croak wrap_msg(dgettext("po4a", "Can't close %s after writing: %s"), $filename, $!);
+    }
+
+}
+
+=back
+
+=head2 Manipulating po files
+
+=over 4 
+
+=item readpo($)
+
+Add the content of a file (which name is passed in argument) to the
+existing input po. The old content is not discarded.
+
+=item writepo($)
+
+Write the extracted po file to the given filename.
+
+=item stats()
+
+Returns some statistics about the translation done so far. Please note that
+it's not the same statistics than the one printed by msgfmt
+--statistic. Here, it's stats about recent usage of the po file, while
+msgfmt reports the status of the file. It is a wrapper to the
+Locale::Po4a::Po::stats_get function applied to the input po file. Example
+of use:
+
+    [normal use of the po4a document...]
+
+    ($percent,$hit,$queries) = $document->stats();
+    print "We found translations for $percent\%  ($hit from $queries) of strings.\n";
+
+=back
+
+=cut
+
+sub getpoout {
+    return $_[0]->{TT}{po_out};
+}
+sub setpoout {
+    $_[0]->{TT}{po_out} = $_[1];
+}
+sub readpo  { 
+    $_[0]->{TT}{po_in}->read($_[1]);        
+}
+sub writepo { 
+    $_[0]->{TT}{po_out}->write( $_[1] );    
+}
+sub stats   { 
+    return $_[0]->{TT}{po_in}->stats_get(); 
+}
+
+=head2 Manipulating addenda
+
+=over 4
+
+=item addendum($)
+
+Please refer to L<po4a(7)|po4a.7> for more information on what addenda are,
+and how translators should write them. To apply an addendum to the translated
+document, simply pass its filename to this function and you are done ;)
+
+This function returns a non-null integer on error.
+
+=cut
+
+# Internal function to read the header.
+sub addendum_parse {
+    my ($filename,$header)=shift;
+
+    my ($errcode,$mode,$position,$boundary,$bmode,$content)=
+	(1,"","","","","");
+
+    unless (open (INS, "<$filename")) {
+	warn wrap_msg(dgettext("po4a", "Can't read from %s: %s"), $filename, $!);
+	goto END_PARSE_ADDFILE;
+    } 
+
+    unless (defined ($header=<INS>) && $header)  {
+	warn wrap_msg(dgettext("po4a", "Can't read Po4a header from %s."), $filename);
+	goto END_PARSE_ADDFILE;
+    }
+
+    unless ($header =~ s/PO4A-HEADER://i) {
+	warn wrap_msg(dgettext("po4a", "First line of %s does not look like a Po4a header."), $filename);
+	goto END_PARSE_ADDFILE;
+    }
+    foreach my $part (split(/;/,$header)) {
+	unless ($part =~ m/^\s*([^=]*)=(.*)$/) {
+	    warn wrap_msg(dgettext("po4a", "Syntax error in Po4a header of %s, near \"%s\""), $filename, $part);
+	    goto END_PARSE_ADDFILE;
+	}
+	my ($key,$value)=($1,$2);
+	$key=lc($key);
+  	     if ($key eq 'mode')     {  $mode=lc($value);
+	} elsif ($key eq 'position') {  $position=$value;
+	} elsif ($key eq 'endboundary') {  
+	    $boundary=$value;
+	    $bmode='after';
+	} elsif ($key eq 'beginboundary') {  
+	    $boundary=$value;
+	    $bmode='before';
+	} else { 
+	    warn wrap_msg(dgettext("po4a", "Invalid argument in the Po4a header of %s: %s"), $filename, $key);
+	    goto END_PARSE_ADDFILE;
+	}
+    }
+
+    unless (length($mode)) {
+	warn wrap_msg(dgettext("po4a", "The Po4a header of %s does not define the mode."), $filename);
+	goto END_PARSE_ADDFILE;
+    }
+    unless ($mode eq "before" || $mode eq "after") {
+	warn wrap_msg(dgettext("po4a", "Mode invalid in the Po4a header of %s: should be 'before' or 'after' not %s."), $filename, $mode);
+	goto END_PARSE_ADDFILE;
+    }
+
+    unless (length($position)) {
+	warn wrap_msg(dgettext("po4a", "The Po4a header of %s does not define the position."), $filename);
+	goto END_PARSE_ADDFILE;
+    }
+    unless ($mode eq "before" || length($boundary)) {
+    	warn wrap_msg(dgettext("po4a", "No ending boundary given in the Po4a header, but mode=after."));
+	goto END_PARSE_ADDFILE;
+    }
+
+    while (defined(my $line = <INS>)) {
+	$content .= $line;
+    }
+    close INS;
+
+    $errcode=0;
+  END_PARSE_ADDFILE: 
+      return ($errcode,$mode,$position,$boundary,$bmode,$content);
+}
+
+sub mychomp {
+    my ($str) = shift;
+    chomp($str);
+    return $str;
+}
+
+sub addendum {
+    my ($self,$filename) = @_;
+
+    print STDERR "Apply addendum $filename..." if $self->debug();
+    unless ($filename) {
+	warn wrap_msg(dgettext("po4a",
+	    "Can't apply addendum when not given the filename"));
+	return 0;
+    }
+    die wrap_msg(dgettext("po4a", "Addendum %s does not exist."), $filename)
+      unless -e $filename;
+  
+    my ($errcode,$mode,$position,$boundary,$bmode,$content)=
+	addendum_parse($filename);
+    return 0 if ($errcode);
+
+    print STDERR "mode=$mode;pos=$position;bound=$boundary;bmode=$bmode;ctn=$content\n"
+      if $self->debug();
+    
+    # We only recode the addendum if an origin charset is specified, else we
+    # suppose it's already in the output document's charset
+    if (defined($self->{TT}{'addendum_charset'}) &&
+        length($self->{TT}{'addendum_charset'})) {
+	Encode::from_to($content,$self->{TT}{'addendum_charset'},
+	    $self->get_out_charset);
+    }
+
+    my $found = scalar grep { /$position/ } @{$self->{TT}{doc_out}};
+    if ($found == 0) {
+	warn wrap_msg(dgettext("po4a",
+	    "No candidate position for the addendum %s."), $filename);
+	return 0;
+    }
+    if ($found > 1) {
+	warn wrap_msg(dgettext("po4a",
+	    "More than one candidate position found for the addendum %s."), $filename);
+	return 0;
+    }
+
+    if ($mode eq "before") {
+	if ($self->verbose() > 1 || $self->debug() ) {
+	    map { print STDERR wrap_msg(dgettext("po4a", "Addendum '%s' applied before this line: %s"), $filename, $_) if (/$position/);
+ 	        } @{$self->{TT}{doc_out}};
+	}
+	@{$self->{TT}{doc_out}} = map { /$position/ ? ($content,$_) : $_ 
+                                        }  @{$self->{TT}{doc_out}};
+    } else {
+	my @newres=();
+
+	do {
+	    # make sure it doesnt whine on empty document
+	    my $line = scalar @{$self->{TT}{doc_out}} ? shift @{$self->{TT}{doc_out}} : "";
+	    push @newres,$line;
+	    my $outline=mychomp($line);
+	    $outline =~ s/^[ \t]*//;
+	      
+	    if ($line =~ m/$position/) {
+		while ($line=shift @{$self->{TT}{doc_out}}) {
+		    last if ($line=~/$boundary/);
+		    push @newres,$line;
+		}
+		if (defined $line) {
+		    if ($bmode eq 'before') {
+			print wrap_msg(dgettext("po4a",
+			    "Addendum '%s' applied before this line: %s"),
+			    $filename, $outline)
+			  if ($self->verbose() > 1 || $self->debug());
+			push @newres,$content;
+			push @newres,$line;
+		    } else {
+			print wrap_msg(dgettext("po4a",
+			    "Addendum '%s' applied after the line: %s."),
+			    $filename, $outline)
+			  if ($self->verbose() > 1 || $self->debug());
+			push @newres,$line;
+			push @newres,$content;
+		    }
+		} else {
+		    print wrap_msg(dgettext("po4a", "Addendum '%s' applied at the end of the file."), $filename)
+		      if ($self->verbose() > 1 || $self->debug());
+		    push @newres,$content;
+		}
+	    }
+	} while (scalar @{$self->{TT}{doc_out}});
+	@{$self->{TT}{doc_out}} = @newres;
+    }
+    print STDERR "done.\n" if $self->debug();
+    return 1;
+}
+
+=back
+
+=head1 INTERNAL FUNCTIONS used to write derivated parsers
+
+=head2 Getting input, providing output
+
+Four functions are provided to get input and return output. They are very
+similar to shift/unshift and push/pop. The first pair is about input, while
+the second is about output. Mnemonic: in input, you are interested in the
+first line, what shift gives, and in output you want to add your result at
+the end, like push does.
+
+=over 4
+
+=item shiftline()
+
+This function returns the next line of the doc_in to be parsed and its
+reference (packed as an array).
+
+=item unshiftline($$)
+
+Unshifts a line of the input document and its reference. 
+
+=item pushline($)
+
+Push a new line to the doc_out.
+
+=item popline()
+
+Pop the last pushed line from the doc_out.
+
+=back
+
+=cut
+
+sub shiftline   {  
+    my ($line,$ref)=(shift @{$_[0]->{TT}{doc_in}},
+		     shift @{$_[0]->{TT}{doc_in}}); 
+    return ($line,$ref);
+}
+sub unshiftline {
+	my $self = shift;
+	unshift @{$self->{TT}{doc_in}},@_;
+}
+
+sub pushline    {  push @{$_[0]->{TT}{doc_out}}, $_[1] if defined $_[1]; }
+sub popline     {  return pop @{$_[0]->{TT}{doc_out}};            }
+
+=head2 Marking strings as translatable
+
+One function is provided to handle the text which should be translated. 
+
+=over 4
+
+=item translate($$$)
+
+Mandatory arguments:
+
+=over 2
+
+=item -
+
+A string to translate
+
+=item -
+
+The reference of this string (ie, position in inputfile)
+
+=item -
+
+The type of this string (ie, the textual description of its structural role
+; used in Locale::Po4a::Po::gettextization() ; see also L<po4a(7)|po4a.7>,
+section I<Gettextization: how does it work?>)
+
+=back
+
+This function can also take some extra arguments. They must be organized as
+a hash. For example:
+
+  $self->translate("string","ref","type",
+		   'wrap' => 1);
+
+=over
+
+=item wrap
+
+boolean indicating whether we can consider that whitespaces in string are
+not important. If yes, the function canonizes the string before looking for
+a translation or extracting it, and wraps the translation.
+
+=item wrapcol
+
+The column at which we should wrap (default: 76).
+
+=item comment
+
+An extra comment to add to the entry.
+
+=back
+
+Actions:
+
+=over 2
+
+=item -
+
+Pushes the string, reference and type to po_out.
+
+=item -
+
+Returns the translation of the string (as found in po_in) so that the
+parser can build the doc_out.
+
+=item -
+
+Handles the charsets to recode the strings before sending them to
+po_out and before returning the translations.
+
+=back
+
+=back
+
+=cut
+
+sub translate {
+    my $self=shift;
+    my ($string,$ref,$type)=(shift,shift,shift);
+    my (%options)=@_;
+
+    # my $validoption="wrap wrapcol";
+    # my %validoption;
+
+    return "" unless defined($string) && length($string);
+
+    # map { $validoption{$_}=1 } (split(/ /,$validoption));
+    # foreach (keys %options) {
+    #	Carp::confess "internal error: translate() called with unknown arg $_. Valid options: $validoption"
+    #	    unless $validoption{$_};
+    # }
+
+    my $in_charset;
+    if ($self->{TT}{ascii_input}) {
+	$in_charset = "ascii";
+    } else {
+	if (defined($self->{TT}{'file_in_charset'}) and
+	    length($self->{TT}{'file_in_charset'}) and
+	    $self->{TT}{'file_in_charset'} !~ m/ascii/i) {
+	    $in_charset=$self->{TT}{'file_in_charset'};
+	} else {
+	    # FYI, the document charset have to be determined *before* we see the first
+	    # string to recode.
+	    die wrap_mod("po4a", dgettext("po4a", "Couldn't determine the input document's charset. Please specify it on the command line. (non-ascii char at %s)"), $self->{TT}{non_ascii_ref})
+	}
+    }
+
+    if ($self->{TT}{po_in}->get_charset ne "CHARSET") {
+	$string = encode_from_to($string,
+	                         $self->{TT}{'file_in_encoder'},
+	                         $self->{TT}{po_in}{encoder});
+    }
+
+    if (defined $options{'wrapcol'} && $options{'wrapcol'} < 0) {
+# FIXME: should be the parameter given with --width
+        $options{'wrapcol'} = 76 + $options{'wrapcol'};
+    }
+    my $transstring = $self->{TT}{po_in}->gettext($string,
+					'wrap'      => $options{'wrap'}||0,
+					'wrapcol'   => $options{'wrapcol'});
+
+    if ($self->{TT}{po_in}->get_charset ne "CHARSET") {
+	my $out_encoder = $self->{TT}{'file_out_encoder'};
+	unless (defined $out_encoder) {
+	    $out_encoder = find_encoding($self->get_out_charset)
+	}
+	$transstring = encode_from_to($transstring,
+	                              $self->{TT}{po_in}{encoder},
+	                              $out_encoder);
+    }
+
+    # If the input document isn't completely in ascii, we should see what to
+    # do with the current string
+    unless ($self->{TT}{ascii_input}) {
+        my $out_charset = $self->{TT}{po_out}->get_charset;
+	# We set the output po charset 
+        if ($out_charset eq "CHARSET") {
+	    if ($self->{TT}{utf_mode}) {
+		$out_charset="utf-8";
+	    } else {
+		$out_charset=$in_charset;
+	    }
+	    $self->{TT}{po_out}->set_charset($out_charset);
+	}
+	if ( $in_charset !~ /^$out_charset$/i ) {
+	    Encode::from_to($string,$in_charset,$out_charset);
+	    if (defined($options{'comment'}) and length($options{'comment'})) {
+		Encode::from_to($options{'comment'},$in_charset,$out_charset);
+	    }
+	}
+    }
+
+    # the comments provided by the modules are automatic comments from the PO point of view
+    $self->{TT}{po_out}->push('msgid'     => $string,
+			      'reference' => $ref,
+			      'type'      => $type,
+	                      'automatic' => $options{'comment'},
+			      'wrap'      => $options{'wrap'}||0,
+			      'wrapcol'   => $options{'wrapcol'});
+
+#    if ($self->{TT}{po_in}->get_charset ne "CHARSET") {
+#	Encode::from_to($transstring,$self->{TT}{po_in}->get_charset,
+#	    $self->get_out_charset);
+#    }
+
+    if ($options{'wrap'}||0) {
+        $transstring =~ s/( *)$//s;
+        my $trailing_spaces = $1||"";
+        $transstring =~ s/ *$//gm;
+        $transstring .= $trailing_spaces;
+    }
+
+    return $transstring;
+}
+
+=head2 Misc functions
+
+=over 4
+
+=item verbose()
+
+Returns if the verbose option was passed during the creation of the
+TransTractor.
+
+=cut
+
+sub verbose {
+    if (defined $_[1]) {
+	$_[0]->{TT}{verbose} = $_[1];
+    } else {
+	return $_[0]->{TT}{verbose} || 0; # undef and 0 have the same meaning, but one generates warnings
+    }
+}
+
+=item debug()
+
+Returns if the debug option was passed during the creation of the
+TransTractor.
+
+=cut
+
+sub debug {
+    return $_[0]->{TT}{debug};
+}
+
+=item detected_charset($)
+
+This tells TransTractor that a new charset (the first argument) has been
+detected from the input document. It can usually be read from the document
+header. Only the first charset will remain, coming either from the
+process() arguments or detected from the document.
+
+=cut
+
+sub detected_charset {
+    my ($self,$charset)=(shift,shift);
+    unless (defined($self->{TT}{'file_in_charset'}) and
+            length($self->{TT}{'file_in_charset'}) ) {
+        $self->{TT}{'file_in_charset'}=$charset;
+        if (defined $charset) {
+            $self->{TT}{'file_in_encoder'}=find_encoding($charset);
+        }
+    }
+
+    if (defined $self->{TT}{'file_in_charset'} and
+        length $self->{TT}{'file_in_charset'} and
+        $self->{TT}{'file_in_charset'} !~ m/ascii/i) {
+	$self->{TT}{ascii_input}=0;
+    }
+}
+
+=item get_out_charset()
+
+This function will return the charset that should be used in the output
+document (usually useful to substitute the input document's detected charset
+where it has been found).
+
+It will use the output charset specified in the command line. If it wasn't
+specified, it will use the input po's charset, and if the input po has the
+default "CHARSET", it will return the input document's charset, so that no
+encoding is performed.
+
+=cut
+
+sub get_out_charset {
+    my $self=shift;
+    my $charset;
+
+    # Use the value specified at the command line
+    if (defined($self->{TT}{'file_out_charset'}) and
+	length($self->{TT}{'file_out_charset'})) {
+	$charset=$self->{TT}{'file_out_charset'};
+    } else {
+	if ($self->{TT}{utf_mode} && $self->{TT}{ascii_input}) {
+	    $charset="utf-8";
+	} else {
+	    $charset=$self->{TT}{po_in}->get_charset;
+	    $charset=$self->{TT}{'file_in_charset'}
+		if $charset eq "CHARSET" and
+		    defined($self->{TT}{'file_in_charset'}) and
+		    length($self->{TT}{'file_in_charset'});
+	    $charset="ascii"
+		if $charset eq "CHARSET";
+	}
+    }
+    return $charset;
+}
+
+=item recode_skipped_text($)
+
+This function returns the recoded text passed as argument, from the input
+document's charset to the output document's one. This isn't needed when
+translating a string (translate() recodes everything itself), but it is when
+you skip a string from the input document and you want the output document to
+be consistent with the global encoding.
+
+=cut
+
+sub recode_skipped_text {
+    my ($self,$text)=(shift,shift);
+    unless ($self->{TT}{'ascii_input'}) {
+	if(defined($self->{TT}{'file_in_charset'}) and
+	    length($self->{TT}{'file_in_charset'}) ) {
+	    $text = encode_from_to($text,
+	                           $self->{TT}{'file_in_encoder'},
+	                           find_encoding($self->get_out_charset));
+	} else {
+	    die wrap_mod("po4a", dgettext("po4a", "Couldn't determine the input document's charset. Please specify it on the command line. (non-ascii char at %s)"), $self->{TT}{non_ascii_ref})
+	}
+    }
+    return $text;
+}
+
+
+# encode_from_to($,$,$)
+#
+# Encode the given text from one encoding to another one.
+# It differs from Encode::from_to because it does not take the name of the
+# encoding in argument, but the encoders (as returned by the
+# Encode::find_encoding(<name>) method). Thus it permits to save a bunch
+# of call to find_encoding.
+#
+# If the "from" encoding is undefined, it is considered as UTF-8 (or
+# ascii).
+# If the "to" encoding is undefined, it is considered as UTF-8.
+#
+sub encode_from_to {
+    my ($text,$from,$to) = (shift,shift,shift);
+
+    if (not defined $from) {
+        # for ascii and UTF-8, no conversion needed to get an utf-8
+        # string.
+    } else {
+        $text = $from->decode($text, 0);
+    }
+
+    if (not defined $to) {
+        # Already in UTF-8, no conversion needed
+    } else {
+        $text = $to->encode($text, 0);
+    }
+
+    return $text;
+}
+
+=back
+
+=head1 FUTURE DIRECTIONS
+
+One shortcoming of the current TransTractor is that it can't handle
+translated document containing all languages, like debconf templates, or
+.desktop files.
+
+To address this problem, the only interface changes needed are:
+
+=over 2
+
+=item -
+
+take a hash as po_in_name (a list per language)
+
+=item -
+
+add an argument to translate to indicate the target language
+
+=item -
+
+make a pushline_all function, which would make pushline of its content for
+all language, using a map-like syntax:
+
+    $self->pushline_all({ "Description[".$langcode."]=".
+			  $self->translate($line,$ref,$langcode) 
+		        });
+
+=back
+
+Will see if it's enough ;)
+
+=head1 AUTHORS
+
+ Denis Barbier <barbier@linuxfr.org>
+ Martin Quinson (mquinson#debian.org)
+ Jordi Vilalta <jvprat@gmail.com>
+
+=cut
+
+1;
author	Dongsheng Song <dongsheng.song@gmail.com>
date	Thu, 12 Mar 2009 15:43:56 +0800
parents
children