If you want to open a file that has been encoded in something other than ASCII, ISO-8859-1 or UTF8, you have to use Perl's
The general idea is that internally, Perl uses a superset character encoding. That is, Perl strings can contain any possible character, and Perl 'magically' handles it. However, since you've got a file on disk that is encoded in some special way, you have to tell Perl how to read that file correctly so that it can get the characters right.
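To make the bytes-versus-characters idea concrete, here is a minimal sketch using the `decode` and `encode` functions from the core Encode module. The sample byte is my own choice for illustration: 0x80 is the euro sign in CP1252.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# One raw byte as it would sit on disk in a CP1252 file (illustrative sample).
my $cp1252_bytes = "\x80";

# decode() turns encoded bytes into a Perl character string.
my $chars = decode('cp1252', $cp1252_bytes);

# encode() turns the character string back into bytes, here as UTF-8.
my $utf8_bytes = encode('UTF-8', $chars);

printf "code point: U+%04X\n", ord($chars);
printf "UTF-8 bytes: %s\n",
    join(' ', map { sprintf('%02X', ord($_)) } split(//, $utf8_bytes));
```

This prints `code point: U+20AC` and `UTF-8 bytes: E2 82 AC`: one byte on disk, one character inside Perl, three bytes once written out as UTF-8. An `:encoding(...)` layer on a filehandle does the same kind of conversion for you on every read or write.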
Here is where
^
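use Encode;
# Read the input as CP1252, decoding it into Perl characters.
open(FILE, "<:encoding(cp1252)", "myfile.txt") or die "Cannot open myfile.txt: $!";
# Write the output using the :utf8 layer.
open(OUT, ">:utf8", "outfile.txt") or die "Cannot open outfile.txt: $!";
while(<FILE>)
{
    print OUT $_;
}
close(FILE);
close(OUT) or die "Cannot close outfile.txt: $!";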
^
The above example will open the CP1252 encoded file "myfile.txt" and write it out in
There seems to be a caveat to all this. You might imagine the following would be equivalent:
^
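use Encode;
# Read the input as CP1252, decoding it into Perl characters.
open(FILE, "<:encoding(cp1252)", "myfile.txt") or die "Cannot open myfile.txt: $!";
# Write the output using :encoding(utf8) instead of :utf8.
open(OUT, ">:encoding(utf8)", "outfile.txt") or die "Cannot open outfile.txt: $!";
while(<FILE>)
{
    print OUT $_;
}
close(FILE);
close(OUT) or die "Cannot close outfile.txt: $!";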
^
...however, you'd be wrong. This seems to do strange things which I can't quite get to the bottom of. I found the problem parsing
The truth is, I haven't fully got to the bottom of this, and I may not be quite right about it. However, I have developed a (slightly convoluted) test case, which works fine when I use "<:utf8" but not when I use "<:encoding(utf8)".
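If you want to check the two layers against your own data, something like the following may help. It is not the test case mentioned above, just a rough sketch: it covers only the output side (as in the two examples), the helper name `bytes_written_with` and the sample string are made up for illustration, and it uses a temporary file from File::Temp.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through the given output layer,
# then return the raw bytes that ended up on disk.
sub bytes_written_with {
    my ($layer, $text) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close($tmp_fh);

    open(my $out, ">$layer", $path) or die "Cannot write $path: $!";
    print {$out} $text;
    close($out) or die "Cannot close $path: $!";

    open(my $in, "<:raw", $path) or die "Cannot read $path: $!";
    local $/;                       # slurp the whole file
    my $bytes = <$in>;
    close($in);
    return $bytes;
}

# Illustrative sample text: e-acute and a euro sign.
my $sample       = "caf\x{E9} \x{20AC}\n";
my $via_utf8     = bytes_written_with(':utf8', $sample);
my $via_encoding = bytes_written_with(':encoding(utf8)', $sample);

print $via_utf8 eq $via_encoding ? "layers agree\n" : "layers differ\n";
```

Swap in a line from the file that gives you trouble as `$sample`. If the two layers produce different bytes, you have a small reproducible case to dig into.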
Caveat reader!