grs1 Character Set

This is a bit random, but does anyone know how to convert the grs1 character set into utf8?

For various reasons, I'm hacking about with Zebra and grs1 output records. My actual scenario (if it matters) is that I have utf8 SOIF records, and get them out of Zebra using Zap. To do this sensibly, I've ended up using grs1 format retrieval into Zap.

The grs1 character set is really weird. I've found that a » character appears to become Â» when piped through grs1 (very much like looking at the utf8 character with iso-8859-1 eyes). I'd like to have a proper grs1 character -> HTML entities converter, but alas, I can't find anything about the character set on t'interweb.
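The "utf8 with iso-8859-1 eyes" effect is easy to reproduce outside Zebra, which at least shows where the stray Â comes from. This is only a sketch using Perl's core Encode module; the helper names misread_utf8_as_latin1 and hex_dump are made up for the illustration, and nothing here says that this is what grs1 actually does internally.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: simulate reading UTF-8 bytes as ISO-8859-1.
sub misread_utf8_as_latin1 {
    my ($text) = @_;
    my $utf8_bytes = encode('UTF-8', $text);     # U+00BB becomes bytes C2 BB
    return decode('ISO-8859-1', $utf8_bytes);    # each byte becomes one character
}

# Hypothetical helper: list the code points of a string in hex.
sub hex_dump {
    my ($text) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $text;
}

my $original = "\x{BB}";                         # the » character
my $misread  = misread_utf8_as_latin1($original);

print hex_dump($original), "\n";   # BB
print hex_dump($misread), "\n";    # C2 BB
```

C2 is Â in iso-8859-1 and BB is », so one character going in comes out as the two-character sequence Â».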

I have found a hack which seems to work, although it will probably cause more problems than it solves. It's a Perl regular expression:

$data =~ s/..Â(\W)/$1/g;

...I dunno why it needs two characters before the Â - hex dumps of the data don't really show what's going on either. I'd really like to get this cleaned up and working properly (i.e. without some god-awful hack in place!).
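If the damage really is nothing more than utf8 bytes that got read as iso-8859-1 (a big assumption, and grs1 may well be doing something else on top), then reversing that step is less fragile than stripping characters with a regex. The sketch below uses only the core Encode module; repair_mojibake and to_html_entities are hypothetical names invented for this example, and it only produces numeric entities like &#187; rather than named ones.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: undo "UTF-8 bytes read as ISO-8859-1" damage.
# Returns the input unchanged if it does not fit that pattern.
sub repair_mojibake {
    my ($damaged) = @_;
    # Die on bad data, but do not let Encode modify the input string.
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    my $repaired = eval {
        # Turn each character back into the single byte it came from...
        my $bytes = encode('ISO-8859-1', $damaged, $check);
        # ...then decode those bytes as the UTF-8 they originally were.
        decode('UTF-8', $bytes, $check);
    };
    return defined $repaired ? $repaired : $damaged;
}

# Hypothetical helper: replace every non-ASCII character with a
# numeric HTML character reference such as &#187;.
sub to_html_entities {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

my $damaged = "Home \x{C2}\x{BB} News";          # "Home Â» News"
print to_html_entities(repair_mojibake($damaged)), "\n";   # Home &#187; News
```

Text that isn't valid utf8 after the round trip (a lone » on its own, say) is passed through untouched, so the repair shouldn't make already-correct data any worse.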

UPDATE (30th Aug, 2004):
I've gone a long way towards sorting this out. Certainly, it now works in the majority of cases, unlike the regex which really doesn't work at all.

I've written a Perl module to perform character conversions from GRS1 to Unicode (and back again). I can't guarantee it's a full implementation, or even that it'll work how you might expect, but it does seem to do what I need (at least). Do with it what you will.

Z3950::grs1 (567k file) (POD documentation)

Attachment       Size
grs1.pm          567.17 KB
grs1.html.txt    3.76 KB
Submitted by coofercat on Fri, 2004-08-27 01:51

Comments

grs1 Character Set

on the basis that google can return different results for different people on the basis of identical keywords, i've done a search and got the following back:

http://www.indexdata.dk/zap/doc/individual-input-parameters.tkl?skin=print

there is also a link that says something about a bug fix:

http://www.indexdata.dk/yaz/NEWS (search for "YAZ Iconv utility now supports MARC8 decoding")

could you do some pre-processing as data is going in? me not no nuffink about what i'm talking too but gibber schneeble triandywobbygog so good luck on your mission...

knobs...

Submitted by robert (not verified) on Fri, 2004-08-27 16:56.
grs1 Character Set

Thanks for that - been there, read those, and got nowhere though.

MARC8 is YACS (yet another character set), for which there also seems to be very little information. Even if there was, MARC format records are a pain in the rear, so I ended up with grs1 instead. That's also a pain in the rear, as it turns out. Grr!

Submitted by coofercat on Fri, 2004-08-27 17:38.