
Commit 7e8e06c

committed
20051121
- Removed the -S option's argument (now only option is space or underscore)
- Added the -Z option to translate multipart/signed attachments
- Fixed loss of original mbox From_ header in certain cases
- Added the -T option to output in raw mail format
- If antiword fails, try catdoc (for rtf pretending to be msword doc)
1 parent d7d1b0f commit 7e8e06c

2 files changed

+78
-29
lines changed


CHANGELOG

Lines changed: 8 additions & 0 deletions
@@ -1,3 +1,11 @@
+ 20051121
+
+ - Removed the -S option's argument (now only option is space or underscore)
+ - Added the -Z option to translate multipart/signed attachments
+ - Fixed loss of original mbox From_ header in certain cases
+ - Added the -T option to output in raw mail format
+ - If antiword fails, try catdoc (for rtf pretending to be msword doc)
+
  20051111

  - Extract long names for attachments inside winmail.dat attachments

textmail

Lines changed: 70 additions & 29 deletions
@@ -20,7 +20,7 @@ use strict;
  # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
  # or visit http://www.gnu.org/copyleft/gpl.html
  #
- # 20051111 raf <raf@raf.org>
+ # 20051121 raf <raf@raf.org>

  =head1 NAME

@@ -35,6 +35,7 @@ I<textmail> - mail filter to replace MS Word/HTML attachments with plain text
  -w - Print the manpage in html format then exit
  -r - Print the manpage in nroff format then exit
  -M - Output in mailbox format (mboxrd)
+ -T - Output in raw mail format (for smtp)
  -W - Don't replace MS Word attachments with text
  -E - Don't replace MS Excel attachments with csv
  -H - Don't replace HTML attachments with text
@@ -47,7 +48,8 @@ I<textmail> - mail filter to replace MS Word/HTML attachments with plain text
  -V - Don't delete video attachments
  -X - Don't delete MS Windows executable attachments
  -B - Don't recode text that was base64-encoded
- -S ' ' - Replace spaces in filenames with ' ' (default is '_')
+ -S - Don't replace spaces in filenames with underscores
+ -Z - Do translate signed content (discards signatures)
  -O - Delete all application/octet-stream attachments
  -! - Delete all application/* attachments
  -D hdrs - Delete headers (list of header prefixes and filenames)
@@ -102,13 +104,19 @@ manpage with a command like:

  =item C<-M>

- This option adds a mailbox C<From> line at the top if there isn't one
- already and ensures that there is a blank line at the bottom of the output.
- It also performs mailbox quoting on any lines in the body that look like
- mailbox C<From> headers. Only use this when the output is to be stored
- directly in a mailbox file. It is not necessary when the output is to be
- sent to an SMTP server or when I<textmail> is being used as a mail filter by
- I<procmail(1)>.
+ This option causes the output to be in mboxrd format by adding a mailbox
+ C<From> line at the top if there isn't one already and ensures that there is
+ a blank line at the bottom of the output. It also performs mailbox quoting
+ on any lines in the body that look like mailbox C<From> headers. Use this
+ when the output is to be stored directly in a mailbox file. It is not
+ necessary when I<textmail> is being used as a mail filter by I<procmail(1)>.
+
+ =item C<-T>
+
+ This option causes the output to be in raw mail format by removing any
+ mailbox C<From> line and by not performing mailbox quoting. Use this when
+ the output is to be sent directly to an SMTP server. It is not necessary
+ when I<textmail> is being used as a mail filter by I<procmail(1)>.

  =item C<-W>

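The mailbox quoting that the -M text describes can be illustrated with a minimal sketch. It assumes the usual mboxrd convention, in which any body line consisting of zero or more ">" characters followed by "From " gets one extra ">" prefixed, so that the quoting is reversible. The function names are hypothetical and this is not textmail's own mail2mbox code.

```perl
use strict;
use warnings;

# mboxrd-style quoting (assumed convention): prefix ">" to every body line
# that starts with zero or more ">" characters followed by "From ".
sub mboxrd_quote
{
	my ($body) = @_;
	$body =~ s/^(>*From )/>$1/mg;
	return $body;
}

# The reverse operation: strip exactly one leading ">" from such lines.
sub mboxrd_unquote
{
	my ($body) = @_;
	$body =~ s/^>(>*From )/$1/mg;
	return $body;
}

my $body = "Hello\nFrom here on it gets odd\n>From an earlier reply\n";
print mboxrd_quote($body);
```

Raw output, as described for -T, would skip this step altogether and drop the leading mailbox From line instead.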
@@ -191,14 +199,22 @@ appropriate. This option suppresses this recoding. Note that if the text is
  large enough and contains a high enough proportion of non-ASCII characters,
  it will remain C<base64>-encoded to minimise space.

- =item C<-S> I<' '>
+ =item C<-S>
+
+ When translating attachments, I<textmail> replaces bad filename characters
+ such as space characters with the underscore character. This option causes
+ underscore characters to subsequently be converted into space characters. In
+ other words, you can use this option to preserve space characters in
+ attachment filenames (other bad filename characters will then be converted
+ to spaces as well).
+
+ =item C<-Z>

- When translating files, I<textmail> replaces bad characters such as space
- characters with the underscore character. This option lets you specify a
- character other than underscore to which bad filename characters will be
- converted. In other words, you can use this option to preserve space
- characters in attachment filenames (other bad filename characters will then
- be converted to spaces as well).
+ By default, I<textmail> will not translate C<multipart/signed> attachments.
+ This option causes C<multipart/signed> attachments to be replaced by the
+ signed attachment contained therein, discarding the signature control data.
+ The no-longer-signed data is then translated to text as normal. Note that
+ C<multipart/encrypted> attachments are never translated.

  =item C<-O>

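The two-pass filename handling described for -S can be sketched as follows. The set of "bad" characters used here (anything other than word characters, dot and hyphen) is an assumption made for illustration, and clean_filename is a hypothetical name; textmail's real character class may differ.

```perl
use strict;
use warnings;

# Assumption: "bad" means anything outside word characters, "." and "-".
sub clean_filename
{
	my ($name, $keep_spaces) = @_;
	(my $clean = $name) =~ s/[^\w.\-]/_/g;  # first pass: bad characters become "_"
	$clean =~ tr/_/ / if $keep_spaces;      # -S behaviour: every "_" then becomes " "
	return $clean;
}

print clean_filename('my report (final).doc'), "\n";
print clean_filename('my report (final).doc', 1), "\n";
```

Because the second pass works on the underscores themselves, a filename that originally contained underscores would also come out with spaces under this reading of the description.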
@@ -278,7 +294,7 @@ doesn't translate the attachments contained therein into text and doesn't
  delete windows executables (with output in mailbox format):

  :0 fw
- | textmail -MWEHRPLIAVX
+ | textmail -MWEHRPLIAVXS

  =head1 REQUIREMENTS

@@ -307,8 +323,6 @@ to do nothing (i.e. C<-WEHRPULIAVX>), then it degenerates into I<cat(1)>.

  =head1 CAVEAT

- Mail messages that are signed or encrypted are not translated.
-
  The latest version of I<xls2csv(1)> at the time of writing (i.e.
  catdoc-0.93.3) loses data.

@@ -333,7 +347,7 @@ C<http://raf.org/minimail/>

  =head1 AUTHOR

- 20051111 raf <raf@raf.org>
+ 20051121 raf <raf@raf.org>

  =head1 URL

@@ -353,6 +367,7 @@ sub help
  " -w - Print the manpage in html format then exit\n",
  " -r - Print the manpage in nroff format then exit\n",
  " -M - Output in mailbox format\n",
+ " -T - Output in raw mail format (for smtp)\n",
  " -W - Don't replace MS Word attachments with text\n",
  " -E - Don't replace MS Excel attachments with csv\n",
  " -H - Don't replace HTML attachments with text\n",
@@ -365,7 +380,8 @@ sub help
  " -V - Don't delete video attachments\n",
  " -X - Don't delete MS Windows executable attachments\n",
  " -B - Don't recode text that was base64-encoded\n",
- " -S ' ' - Replace spaces in filenames with ' ' (default is '_')\n",
+ " -S - Don't replace spaces in filenames with underscores\n",
+ " -Z - Do translate signed content (discards signatures)\n",
  " -O - Delete all application/octet-stream attachments\n",
  " -! - Delete all application/* attachments\n",
  " -D hdrs - Delete headers (list of header prefixes and filenames)\n",
@@ -727,6 +743,7 @@ sub newmail # rfc2822, rfc2045, rfc2046, rfc2183 (also rfc3282, rfc3066, rfc2424
  ($m->{mime_type}, $m->{mime_boundary}, $m->{mime_parts}) = ($type =~ /^\s*([\w\/.-]+)/, $bound, $a{parts} || []) if $multi;
  ($m->{mime_type}, $m->{mime_message}) = ($type =~ /^\s*([\w\/.-]+)/, $a{message} || {}) if $msg;
  $m->{body} = encode($a{body} || '', $enc) unless $multi || $msg;
+ $m->{mbox} = $a{mbox} if exists $a{mbox} && defined $a{mbox} && length $a{mbox};
  return $m;
  }

@@ -918,14 +935,17 @@ sub winmail

  my %opt;
  use Getopt::Std;
- help unless getopts 'hmrwMWEHRPLUIAVXBS:O!D:K:f?', \%opt;
+ help unless getopts 'hmrwMTWEHRPLUIAVXBSZO!D:K:f?', \%opt;
  help if exists $opt{h};
  man if exists $opt{m};
  nroff if exists $opt{r};
  html if exists $opt{w};
  my $mailbox = exists $opt{M};
+ my $raw = exists $opt{T};
+ die "textmail: The -M and -T options are incompatible\n" if $mailbox && $raw;
  my $catdoc = find('catdoc');
- my $antiword = find('antiword') || $catdoc;
+ my $antiword = find('antiword');
+ $antiword = $antiword ? $catdoc ? "$antiword|$catdoc" : $antiword : $catdoc;
  my $xls2csv = find('xls2csv');
  my $lynx = find('lynx');
  my $pdftotext = find('pdftotext');
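The option handling in this hunk can be tried in isolation with a small sketch. It uses a reduced option string ('MTSZ') and a hypothetical helper, output_mode, that wraps Getopt::Std so the mutual exclusion of -M and -T is easy to exercise. Note that with "S:" changed to "S" in the option string, -S no longer consumes the following argument, so an old-style invocation such as -S ' ' would leave ' ' behind as an ordinary argument.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: parse a reduced option set and report the output mode.
sub output_mode
{
	local @ARGV = @_;   # getopts works on @ARGV, so localise it
	my %opt;
	getopts('MTSZ', \%opt) or die "usage: [-M | -T] [-S] [-Z]\n";
	my $mailbox = exists $opt{M};
	my $raw = exists $opt{T};
	die "The -M and -T options are incompatible\n" if $mailbox && $raw;
	return $mailbox ? 'mbox' : $raw ? 'raw' : 'default';
}

print output_mode('-M'), "\n";
print output_mode('-TS'), "\n";
print eval { output_mode('-MT') } // "error: $@";
```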
@@ -945,14 +965,15 @@ my $remove_audio = ! exists $opt{A};
  my $remove_video = ! exists $opt{V};
  my $remove_exe = ! exists $opt{X};
  my $recode_base64_text = ! exists $opt{B};
- my $replace_space = $opt{S} if exists $opt{S};
+ my $replace_space = ' ' if exists $opt{S};
+ my $remove_signed = exists $opt{Z};
  my $remove_octet = exists $opt{O};
  my $remove_application = exists $opt{'!'};
  my $remove_headers = exists $opt{D};
  my @headers = get_file($opt{D}) if $remove_headers;
  my $keep_attachments = exists $opt{K};
  my @keep = get_file($opt{K}) if $keep_attachments;
- my $removing = $remove_word || $remove_excel || $remove_html || $remove_rtf || $remove_pdf || $remove_tnef || $remove_apple || $remove_images || $remove_audio || $remove_video || $remove_exe || $recode_base64_text || $remove_octet || $remove_application || $remove_headers;
+ my $removing = $remove_word || $remove_excel || $remove_html || $remove_rtf || $remove_pdf || $remove_tnef || $remove_apple || $remove_images || $remove_audio || $remove_video || $remove_exe || $recode_base64_text || $remove_signed || $remove_octet || $remove_application || $remove_headers || $mailbox || $raw;
  chop(my $tmp = `$mktemp -dq /tmp/textmail.XXXXXX`) if $removing && defined $mktemp;
  if (!$removing || (($? || !defined $tmp || ! -d $tmp) && !mkdir($tmp = "/tmp/textmail.$$", 0700)))
  {
@@ -967,6 +988,7 @@ formail(sub { <> }, sub
  {
  my $m = mail2singlepart(textmail(mail2multipart(shift)));
  delete_header($m, qr/(?:content-length|lines)/i);
+ delete $m->{mbox} if $raw;
  print mail2str($mailbox ? mail2mbox($m) : $m);
  });

@@ -992,10 +1014,12 @@ sub textmail
  my $entity = shift;
  my $isapart = shift || 0;
  my @parts = @{parts($entity)};
+ my $mbox = $entity->{mbox} if exists $entity->{mbox};

- # Do nothing if this is encrypted or signed
+ # Do nothing if this is encrypted (or signed unless -Z)

- return $entity if isa($entity, qr/multipart\/(?:signed|encrypted)/i);
+ return $entity if isa($entity, qr/multipart\/encrypted/i);
+ return $entity if !$remove_signed && isa($entity, qr/multipart\/signed/i);

  # Remove headers

@@ -1011,6 +1035,7 @@ sub textmail
  my $plain = $parts[isa($parts[0], 'text/plain') ? 0 : 1];
  @{$plain->{headers}} = (grep(!/^content-/i, @{$entity->{headers}}), grep { /^content-/i } @{$plain->{headers}});
  %{$plain->{header}} = (map { ($_, $entity->{header}->{$_}) } grep { !/^content-/i } keys %{$entity->{header}}), (map { ($_, $plain->{header}->{$_}) } grep { /^content-/i } keys %{$plain->{header}});
+ $plain->{mbox} = $mbox if defined $mbox;
  return debase64($plain);
  }
  }
@@ -1024,10 +1049,25 @@ sub textmail
  my $data = $parts[1];
  @{$data->{headers}} = (grep(!/^content-/i, @{$entity->{headers}}), grep { /^content-/i } @{$data->{headers}});
  %{$data->{header}} = (map { ($_, $entity->{header}->{$_}) } grep { !/^content-/i } keys %{$entity->{header}}), (map { ($_, $data->{header}->{$_}) } grep { /^content-/i } keys %{$data->{header}});
+ $data->{mbox} = $mbox if defined $mbox;
  return mail2singlepart(textmail(mail2multipart($parts[1]), 0));
  }
  }

+ # Reduce signed attachments to just the signed data attachment
+
+ if ($remove_signed && isa($entity, 'multipart/signed') && @parts == 2)
+ {
+ if (isa($parts[1], param($entity, 'content-type', 'protocol')))
+ {
+ my $data = $parts[0];
+ @{$data->{headers}} = (grep(!/^content-/i, @{$entity->{headers}}), grep { /^content-/i } @{$data->{headers}});
+ %{$data->{header}} = (map { ($_, $entity->{header}->{$_}) } grep { !/^content-/i } keys %{$entity->{header}}), (map { ($_, $data->{header}->{$_}) } grep { /^content-/i } keys %{$data->{header}});
+ $data->{mbox} = $mbox if defined $mbox;
+ return mail2singlepart(textmail(mail2multipart($parts[0]), 0));
+ }
+ }
+
  # Process parts

  for (my $i = 0; $i < @parts; ++$i)
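The reduction of a signed entity to its signed part can be shown on a simplified data structure. In this sketch the entity is a plain hash with hypothetical keys (type, protocol, parts), which is not textmail's internal representation, and the header merging done in the hunk above is left out. It relies on the multipart/signed layout of exactly two parts, the signed content first and a signature part whose type matches the protocol parameter second.

```perl
use strict;
use warnings;

# Hypothetical entity layout: { type => ..., protocol => ..., parts => [...] }.
# Returns the signed content part, or the entity unchanged if it does not
# look like a well-formed multipart/signed entity.
sub unwrap_signed
{
	my ($entity) = @_;
	return $entity unless lc($entity->{type} // '') eq 'multipart/signed';
	my @parts = @{ $entity->{parts} || [] };
	return $entity unless @parts == 2;
	my $protocol = lc($entity->{protocol} // '');
	return $entity unless lc($parts[1]{type} // '') eq $protocol;
	return $parts[0];   # the signature part is discarded
}

my $signed = {
	type => 'multipart/signed',
	protocol => 'application/pgp-signature',
	parts => [
		{ type => 'text/plain', body => "Hello\n" },
		{ type => 'application/pgp-signature', body => "(signature data)\n" },
	],
};
print unwrap_signed($signed)->{type}, "\n";
```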
@@ -1164,7 +1204,7 @@ sub translate
  return newmail(filename => $textpath, body => '') if !defined $cmd && $force;
  my $origdata = body($part);
  open A, ">$tmp/$origpath" and do { print A $origdata; close A };
- my $failed = $origpath ne $textpath && system($cmd . ' ' . quotemeta("$tmp/$origpath") . ' > ' . quotemeta("$tmp/$textpath")) || -s "$tmp/$origpath" && -z "$tmp/$textpath";
+ my $failed; $failed = $origpath ne $textpath && system($_ . ' ' . quotemeta("$tmp/$origpath") . ' > ' . quotemeta("$tmp/$textpath")) || -s "$tmp/$origpath" && -z "$tmp/$textpath" or last for split /\|/, $cmd;
  unlink "$tmp/$origpath" unless $origpath eq $textpath;
  unlink("$tmp/$textpath"), return $part if $failed && !$force;
  $part = newmail(filename => "$tmp/$textpath"); unlink "$tmp/$textpath";
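The antiword-then-catdoc fallback packs a list of candidate commands into one "|"-separated string and tries each until one succeeds. A more spelled-out sketch of that idea follows. The helper names and the paths are hypothetical, and the command runner is passed in as a code reference so the logic can be exercised without running real converters; in the commit, each candidate is run with system() and also counts as failed if it produces empty output from non-empty input.

```perl
use strict;
use warnings;

# Build a "|"-separated chain from whichever converters were found.
sub converter_chain
{
	return join '|', grep { defined($_) && length($_) } @_;
}

# Try each command in turn; $run is a code ref that returns true on success.
# Returns the first command that succeeded, or undef if all of them failed.
sub first_success
{
	my ($chain, $run) = @_;
	for my $cmd (split /\|/, $chain)
	{
		return $cmd if $run->($cmd);
	}
	return undef;
}

# Hypothetical paths; the stub runner pretends that only catdoc succeeds.
my $chain = converter_chain('/usr/bin/antiword', '/usr/bin/catdoc');
my $winner = first_success($chain, sub { $_[0] =~ /catdoc$/ });
print "$chain -> $winner\n";
```

Unlike the nested conditional in the commit, this sketch yields an empty string rather than an undefined value when no converter is found, so a caller would have to check for that case.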
@@ -1181,7 +1221,8 @@ sub debase64
  return $entity unless $type =~ /^text\//i && encoding($entity) =~ /^base64$/i;
  my $body = body($entity); $body =~ tr/\r//d;
  my $name = filename($entity);
- return newmail(type => $type, body => $body, (defined $name ? (name => $name) : ()));
+ my $mbox = $entity->{mbox} if exists $entity->{mbox};
+ return newmail(type => $type, body => $body, (defined $name ? (name => $name) : ()), (defined $mbox ? (mbox => $mbox) : ()));
  }

  # Parse a data file
