FAQ
This is most likely because some fonts are available on one OS but not on another.
On Linux, installing additional fonts should help:
sudo apt install ttf*
See issue
PdfPig does not support all image filters out of the box.
If you have already applied the solution below and images are still missing, they may be contained in a Pattern color; see this issue for a possible solution.
Filters requiring external implementation are: DCT, JPX and JBIG2. You can either implement your own, or use the following NuGet packages:
- PdfPig.Filters.Dct.JpegLibrary
- PdfPig.Filters.Jbig2.PdfboxJbig2
- PdfPig.Filters.Jpx.OpenJpegDotNet
Once the NuGet packages are added, use the following:
// Create your filter provider
public sealed class MyFilterProvider : BaseFilterProvider
{
    /// <summary>
    /// The single instance of this provider.
    /// </summary>
    public static readonly IFilterProvider Instance = new MyFilterProvider();

    /// <inheritdoc/>
    private MyFilterProvider() : base(GetDictionary())
    {
    }

    private static Dictionary<string, IFilter> GetDictionary()
    {
        // Filters provided by the external NuGet packages
        var jbig2 = new PdfboxJbig2DecodeFilter();
        var jpx = new OpenJpegJpxDecodeFilter();
        var dct = new JpegLibraryDctDecodeFilter();

        // Default filters (the default DctDecodeFilter is replaced by the external one above)
        var ascii85 = new Ascii85Filter();
        var asciiHex = new AsciiHexDecodeFilter();
        var ccitt = new CcittFaxDecodeFilter();
        var flate = new FlateFilter();
        var runLength = new RunLengthFilter();
        var lzw = new LzwFilter();

        return new Dictionary<string, IFilter>
        {
            { NameToken.Ascii85Decode.Data, ascii85 },
            { NameToken.Ascii85DecodeAbbreviation.Data, ascii85 },
            { NameToken.AsciiHexDecode.Data, asciiHex },
            { NameToken.AsciiHexDecodeAbbreviation.Data, asciiHex },
            { NameToken.CcittfaxDecode.Data, ccitt },
            { NameToken.CcittfaxDecodeAbbreviation.Data, ccitt },
            { NameToken.DctDecode.Data, dct },
            { NameToken.DctDecodeAbbreviation.Data, dct },
            { NameToken.FlateDecode.Data, flate },
            { NameToken.FlateDecodeAbbreviation.Data, flate },
            { NameToken.Jbig2Decode.Data, jbig2 },
            { NameToken.JpxDecode.Data, jpx },
            { NameToken.RunLengthDecode.Data, runLength },
            { NameToken.RunLengthDecodeAbbreviation.Data, runLength },
            { NameToken.LzwDecode.Data, lzw },
            { NameToken.LzwDecodeAbbreviation.Data, lzw }
        };
    }
}
var parsingOption = new ParsingOptions()
{
    UseLenientParsing = true, // Optional
    SkipMissingFonts = true, // Optional
    FilterProvider = MyFilterProvider.Instance
};
using (var doc = PdfDocument.Open("my_document.pdf", parsingOption))
{
    int i = 0;
    foreach (var page in doc.GetPages())
    {
        foreach (var pdfImage in page.GetImages())
        {
            // Process your images, e.g. save each one that can be converted to PNG:
            if (pdfImage.TryGetPng(out var bytes))
            {
                File.WriteAllBytes($"image_{i++}.png", bytes);
            }
        }
    }
}
This is often the case when the document was created using "fake bold" to bold letters. With this technique, the document creator duplicates each letter that is supposed to be bold, with a slight offset, creating a thicker appearance. Letters drawn this way are not flagged as bold in the PdfPig letter object.
In order to handle duplicate letters, post-process the page's letter collection with:
letters = DuplicateOverlappingTextProcessor.Get(letters);
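To make the idea behind this kind of post-processing concrete, here is a minimal, self-contained sketch of removing "fake bold" duplicates. It is not PdfPig's implementation: the tuple shape `(Value, X, Y)`, the class `FakeBoldSketch`, the method `RemoveOverlappingDuplicates` and the `tolerance` parameter are all hypothetical names chosen for illustration, and a real processor would likely compare more properties (font, size, bounding box) than this one does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FakeBoldSketch
{
    // Hypothetical illustration only, not PdfPig's algorithm.
    // Each glyph is a character plus the position where it is drawn.
    // A glyph is treated as a duplicate when an already kept glyph has the
    // same value and lies within `tolerance` units on both axes.
    public static List<(string Value, double X, double Y)> RemoveOverlappingDuplicates(
        IEnumerable<(string Value, double X, double Y)> glyphs,
        double tolerance)
    {
        var kept = new List<(string Value, double X, double Y)>();

        foreach (var glyph in glyphs)
        {
            bool isDuplicate = kept.Any(k =>
                k.Value == glyph.Value &&
                Math.Abs(k.X - glyph.X) <= tolerance &&
                Math.Abs(k.Y - glyph.Y) <= tolerance);

            // Keep only the first glyph of each overlapping group.
            if (!isDuplicate)
            {
                kept.Add(glyph);
            }
        }

        return kept;
    }
}
```

For example, given the glyphs `("B", 10.0, 50.0)`, `("B", 10.3, 50.0)`, `("o", 18.0, 50.0)`, `("o", 18.3, 50.0)` and `("o", 26.0, 50.0)` with a tolerance of `1.0`, the sketch keeps three glyphs: one "B" and two "o". The last "o" survives because it is a genuine second letter further along the line, not an offset copy.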