Quantcast
Viewing latest article 22
Browse Latest Browse All 25

HOWTO: Compressing text in a database while keeping it searchable with Perl Compress::AsciiFlate

I’m archiving our Sybase ASE MDA tables into a central repository but we do run into space issues from time to time with the stored queries from master…monSysSQLText and master..monProcessSQLText. The workaround was to compress the queries using Compress::Zlib and storing them in an image column if the query was over 400 bytes. The problem comes in that we aren’t able to easily search the queries.

Jimi-Carlo Bukowski-Wills’s Compress::AsciiFlate allows me to compress the queries into a searchable string I can put in a text column. It seems to work fairly well but there are a few caveats:

  1. module reads the entire text into memory before processing it. This isn’t a problem for most query texts though.
  2. if the text is mostly unique words then the compressed text may actually be larger than the original text
  3. it doesn’t handle unicode text. Expect anything in the latin iso_1 character set to work though

Of course, this should work with any database that is set up to use/accept latin iso_1 characters.

use Compress::AsciiFlate;

my $af = new Compress::AsciiFlate;
my $text = 'some words some words some words';

$af->deflate($text);
print $text; # prints: "some words _1 _2 _1 _2"
$af->inflate($text);

print $text; # now prints: "some words some words some words"
print $af->olength; # original length: 33
print $af->dlength; # deflated length: 23
print $af->difference; # 10
print $af->ratio; # 0.696969696969697
print $af->ratio(3); # 0.697
print $af->percentage; # 69.69
print $af->percentage(4); # 69.697
print $af->count; # how many different words: 2
print join(' ',$af->table); # _1 some _2 words

Viewing latest article 22
Browse Latest Browse All 25

Trending Articles