Entries tagged with 'pdf'

Extending the ImageBox component to display the contents of a PDF file using C#

In this article, I'll describe how to extend the ImageBox control discussed in earlier articles to be able to display PDF files with the help of the GhostScript library and the conversion library described in the previous article.

Getting Started

You can download the source code used in this article from the links below, these are:

  • Cyotek.GhostScript - core library providing GhostScript integration support
  • Cyotek.GhostScript.PdfConversion - support library for converting a PDF document into images
  • PdfImageBoxSample - sample project containing an updated ImageBox control, and the extended PdfImageBox.

Please note that the native GhostScript DLL is not included in these downloads, you will need to obtain that from the GhostScript project page.

Extending the ImageBox

To start extending the ImageBox, create a new class and inherit the ImageBox control. I also decided to override some of the default properties, so I added a constructor which sets the new values.

    public PdfImageBox()
    {
      // override some of the original ImageBox defaults
      this.GridDisplayMode = ImageBoxGridDisplayMode.None;
      this.BackColor = SystemColors.AppWorkspace;
      this.ImageBorderStyle = ImageBoxBorderStyle.FixedSingleDropShadow;

      // new pdf conversion settings
      this.Settings = new Pdf2ImageSettings();
    }

To ensure correct designer support, override versions of the properties with new DefaultValue attributes were added. With this done, it's time to add the new properties that will support viewing PDF files. The new properties are:

  • PdfFileName - the filename of the PDF to view
  • PdfPassword - specifies the password of the PDF file if one is required to open it (note, I haven't actually tested that this works!)
  • Settings - uses the Pdf2ImageSettings class discussed earlier to control quality settings for the converted document.
  • PageCache - an internal dictionary which stores a Bitmap against a page number to cache pages after these have loaded.

With the exception of PageCache, each of these properties also has backing event for change notifications, and as Pdf2ImageSettings implements INotifyPropertyChanged we'll also bind an event detect when the individual setting properties are modified.

    [Category("Appearance"), DefaultValue(typeof(Pdf2ImageSettings), "")]
    public virtual Pdf2ImageSettings Settings
    {
      get { return _settings; }
      set
      {
        if (this.Settings != value)
        {
          if (_settings != null)
            _settings.PropertyChanged -= SettingsPropertyChangedHandler;

          _settings = value;
          _settings.PropertyChanged += SettingsPropertyChangedHandler;

          this.OnSettingsChanged(EventArgs.Empty);
        }
      }
    }
    
    private void SettingsPropertyChangedHandler(object sender, PropertyChangedEventArgs e)
    {
      this.OnSettingsChanged(e);
    }

    protected virtual void OnSettingsChanged(EventArgs e)
    {
      this.OpenPDF();

      if (this.SettingsChanged != null)
        this.SettingsChanged(this, e);
    }

Navigation support

Although the PdfImageBox doesn't supply a user interface for navigating to different pages, we want to make it easy for the hosting application to provide one. To support this, a new CurrentPage property will be added for allowing the active page to retrieved or set, and also a number of readonly CanMove* properties. These properties allow the host to query which navigation options are applicable in order to present the correct UI.

    [Browsable(false)]
    public virtual int PageCount
    { get { return _converter != null ? _converter.PageCount : 0; } }

    [Category("Appearance"), DefaultValue(1)]
    public int CurrentPage
    {
      get { return _currentPage; }
      set
      {
        if (this.CurrentPage != value)
        {
          if (value < 1 || value > this.PageCount)
            throw new ArgumentException("Page number is out of bounds");

          _currentPage = value;

          this.OnCurrentPageChanged(EventArgs.Empty);
        }
      }
    }

    [Browsable(false)]
    public bool CanMoveFirst
    { get { return this.PageCount != 0 && this.CurrentPage != 1; } }

    [Browsable(false)]
    public bool CanMoveLast
    { get { return this.PageCount != 0 && this.CurrentPage != this.PageCount; } }

    [Browsable(false)]
    public bool CanMoveNext
    { get { return this.PageCount != 0 && this.CurrentPage < this.PageCount; } }

    [Browsable(false)]
    public bool CanMovePrevious
    { get { return this.PageCount != 0 && this.CurrentPage > 1; } }

Again, to make it easier for the host to connect to the control, we also add some helper navigation methods.

    public void FirstPage()
    {
      this.CurrentPage = 1;
    }

    public void LastPage()
    {
      this.CurrentPage = this.PageCount;
    }

    public void NextPage()
    {
      this.CurrentPage++;
    }

    public void PreviousPage()
    {
      this.CurrentPage--;
    }

Finally, it can sometimes take a few seconds to convert a page in a PDF file. To allow the host to provide a busy notification, such as setting the wait cursor or displaying a status bar message, we'll add a pair of events which will be called before and after a page is converted.

public event EventHandler LoadingPage;

public event EventHandler LoadedPage;

Opening the PDF file

Each of the property changed handlers in turn call the OpenPDF method. This method first clears any existing image cache and then initializes the conversion class based on the current PDF file name and quality settings. If the specified file is a valid PDF, the first page is converted, cached, and displayed.

    public void OpenPDF()
    {
      this.CleanUp();

      if (!this.DesignMode)
      {
        _converter = new Pdf2Image()
        {
          PdfFileName = this.PdfFileName,
          PdfPassword = this.PdfPassword,
          Settings = this.Settings
        };

        this.Image = null;
        this.PageCache= new Dictionary<int, Bitmap>();
        _currentPage = 1;

        if (this.PageCount != 0)
        {
          _currentPage = 0;
          this.CurrentPage = 1;
        }
      }
    }

    private void CleanUp()
    {
      // release  bitmaps
      if (this.PageCache != null)
      {
        foreach (KeyValuePair<int, Bitmap> pair in this.PageCache)
          pair.Value.Dispose();
        this.PageCache = null;
      }
    }

Displaying the image

Each time the CurrentPage property is changed, it calls the SetPageImage method. This method first checks to ensure the specified page is present in the cache. If it is not, it will load the page in. Once the page is in the cache, it is then displayed in the ImageBox, and the user can then pan and zoom as with any other image.

    protected virtual void SetPageImage()
    {
      if (!this.DesignMode && this.PageCache != null)
      {
        lock (_lock)
        {
          if (!this.PageCache.ContainsKey(this.CurrentPage))
          {
            this.OnLoadingPage(EventArgs.Empty);
            this.PageCache.Add(this.CurrentPage, _converter.GetImage(this.CurrentPage));
            this.OnLoadedPage(EventArgs.Empty);
          }

          this.Image = this.PageCache[this.CurrentPage];
        }
      }
    }

Note that we operate a lock during the execution of this method, to ensure that you can't try and load the same page twice.

With this method in place, the control is complete and ready to be used as a basic PDF viewer. In order to keep the article down to a reasonable size, I've excluded some of the definitions, overloads and helper methods; these can all be found in the sample download below.

The sample project demonstrates all the features described above and provides an example setting up a user interface for navigating a PDF document.

Future changes

At the moment, the PdfImageBox control processes on page at a time and caches the results. This means that navigation through already viewed pages is fast, but displaying new pages can be less than ideal. A possible enhancement would be to make the control multithreaded, and continue to load pages on a background thread.

Another issue is that as the control is caching the converted images in memory, it may use a lot of memory in order to display large PDF files. Not quite sure on the best approach to resolve this one, either to "expire" older pages, or to keep only a fixed number in memory. Or even save each page to a temporary disk file.

Finally, I haven't put in any handling at all for if the converter fails to convert a given page... I'll add this to a future update, and hopefully get the code hosted on an SVN server for interested parties.

Downloads:

  • PdfImageBoxSample.zip

    (513.29 KB | 04 September 2011 )

    Sample project showing how to extend the ImageBox control in order to display convert and display PDF files in a .NET WinForms application with the help of GhostScript.

  • Cyotek.GhostScript.zip

    (11.68 KB | 04 September 2011 )

    Work in progress class library for providing GhostScript integration in a .NET application.

  • Cyotek.GhostScript.PdfConversion.zip

    (5.43 KB | 04 September 2011 )

    Class library for converting PDF files into images using GhostScript. Also requires the Cyotek.GhostScript assembly.

2 comments | | Trackback specific URL for this entry

Convert a PDF into a series of images using C# and GhostScript

An application I was recently working on received PDF files from a webservice which it then needed to store in a database. I wanted the ability to display previews of these documents within the application. While there are a number of solutions for creating PDF files from C#, options for viewing a PDF within your application is much more limited, unless you purchase expensive commercial products, or use COM interop to embed Acrobat Reader into your application.

This article describes an alternate solution, in which the pages in a PDF are converted into images using GhostScript, from where you can then display them in your application.

In order to avoid huge walls of text, this article has been split into two parts, the first dealing with the actual conversion of a PDF, and the second demonstrates how to extend the ImageBox control to display the images.

Caveat emptor

Before we start, some quick points.

  • The method I'm about to demonstrate converts into page of the PDF into an image. This means that it is very suitable for viewing, but interactive elements such as forms, hyperlinks and even good old text selection are not available.
  • GhostScript has a number of licenses associated with it but I can't find any information of the pricing of commercial licenses.
  • The GhostScript API Integration library used by this project isn't complete and I'm not going to go into the bells and whistles of how it works in this pair of articles - once I've completed the outstanding functionality I'll create a new article for it.

Getting Started

You can download the two libraries used in this article from the links below, these are:

  • Cyotek.GhostScript - core library providing GhostScript integration support
  • Cyotek.GhostScript.PdfConversion - support library for converting a PDF document into images

Please note that the native GhostScript DLL is not included in these downloads, you will need to obtain that from the GhostScript project page.

Using the GhostScriptAPI class

As mentioned above, the core GhostScript library isn't complete yet, so I'll just give a description of the basic functionality required by the conversion library.

The GhostScriptAPI class handles all communication with GhostScript. When you create an instance of the class, it automatically calls gsapi_new_instance in the native GhostScript DLL. When the class is disposed, it will automatically release any handles and calls the native gsapi_exit and gsapi_delete_instance methods.

In order to actually call GhostScript, you call the Execute method, passing in either a string array of all the arguments to pass to GhostScript, or a typed dictionary of commands and values. The GhostScriptCommand enum contains most of the commands supported by GhostScript, which may be a preferable approach rather than trying to remember the parameter names themselves.

Defining conversion settings

The Pdf2ImageSettings class allows you to customize various properties of the output image. The following properties are available:

  • AntiAliasMode - specifies the antialiasing level between Low, Medium and High. This internally will set the dTextAlphaBits and dGraphicsAlphaBits GhostScript switches to appropriate values.
  • Dpi - dots per inch. Internally sets the r switch. This property is not used if a paper size is set.
  • GridFitMode - controls the text readability mode. Internally sets the dGridFitTT switch.
  • ImageFormat - specifies the output image format. Internally sets the sDEVICE switch.
  • PaperSize - specifies a paper size from one of the standard sizes supported by GhostScript.
  • TrimMode - specifies how the image should be sized. Your milage may vary if you try and use the paper size option. Internally sets either the dFIXEDMEDIA and sPAPERSIZE or the dUseCropBox or the dUseTrimBox switches.

Typical settings could look like this:

      Pdf2ImageSettings settings;

      settings = new Pdf2ImageSettings();
      settings.AntiAliasMode = AntiAliasMode.High;
      settings.Dpi = 300;
      settings.GridFitMode = GridFitMode.Topological;
      settings.ImageFormat = ImageFormat.Png24;
      settings.TrimMode = PdfTrimMode.CropBox;

Converting the PDF

To convert a PDF file into a series of images, use the Pdf2Image class. The following properties and methods are offered:

  • ConvertPdfPageToImage - converts a given page in the PDF into an image which is saved to disk
  • GetImage - converts a page in the PDF into an image and returns the image
  • GetImages - converts a range of pages into the PDF into images and returns an image array
  • PageCount - returns the number of pages in the source PDF
  • PdfFilename - returns or sets the filename of the PDF document to convert
  • PdfPassword - returns or sets the password of the PDF document to convert
  • Settings - returns or sets the settings object described above

A typical example to convert the first image in a PDF document:

Bitmap firstPage = new Pdf2Image("sample.pdf").GetImage();

The inner workings

Most of the code in the class is taken up with the GetConversionArguments method. This method looks at the various properties of the conversion such as output format, quality, etc, and returns the appropriate commands to pass to GhostScript:

protected virtual IDictionary<GhostScriptCommand, object> GetConversionArguments(string pdfFileName, string outputImageFileName, int pageNumber, string password, Pdf2ImageSettings settings)
    {
      IDictionary<GhostScriptCommand, object> arguments;

      arguments = new Dictionary<GhostScriptCommand, object>();

      // basic GhostScript setup
      arguments.Add(GhostScriptCommand.Silent, null);
      arguments.Add(GhostScriptCommand.Safer, null);
      arguments.Add(GhostScriptCommand.Batch, null);
      arguments.Add(GhostScriptCommand.NoPause, null);

      // specify the output
      arguments.Add(GhostScriptCommand.Device, GhostScriptAPI.GetDeviceName(settings.ImageFormat));
      arguments.Add(GhostScriptCommand.OutputFile, outputImageFileName);

      // page numbers
      arguments.Add(GhostScriptCommand.FirstPage, pageNumber);
      arguments.Add(GhostScriptCommand.LastPage, pageNumber);

      // graphics options
      arguments.Add(GhostScriptCommand.UseCIEColor, null);

      if (settings.AntiAliasMode != AntiAliasMode.None)
      {
        arguments.Add(GhostScriptCommand.TextAlphaBits, settings.AntiAliasMode);
        arguments.Add(GhostScriptCommand.GraphicsAlphaBits, settings.AntiAliasMode);
      }

      arguments.Add(GhostScriptCommand.GridToFitTT, settings.GridFitMode);

      // image size
      if (settings.TrimMode != PdfTrimMode.PaperSize)
        arguments.Add(GhostScriptCommand.Resolution, settings.Dpi.ToString());

      switch (settings.TrimMode)
      {
        case PdfTrimMode.PaperSize:
          if (settings.PaperSize != PaperSize.Default)
          {
            arguments.Add(GhostScriptCommand.FixedMedia, true);
            arguments.Add(GhostScriptCommand.PaperSize, settings.PaperSize);
          }
          break;
        case PdfTrimMode.TrimBox:
          arguments.Add(GhostScriptCommand.UseTrimBox, true);
          break;
        case PdfTrimMode.CropBox:
          arguments.Add(GhostScriptCommand.UseCropBox, true);
          break;
      }

      // pdf password
      if (!string.IsNullOrEmpty(password))
        arguments.Add(GhostScriptCommand.PDFPassword, password);

      // pdf filename
      arguments.Add(GhostScriptCommand.InputFile, pdfFileName);

      return arguments;
    }
    

As you can see from the method above, the commands are being returned as a strongly typed dictionary - the GhostScriptAPI class will convert these into the correct GhostScript commands, but the enum is much easier to work with from your code! The following is an example of the typical GhostScript commands to convert a single page in a PDF document:

-q -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -sOutputFile=tmp78BC.tmp -dFirstPage=1 -dLastPage=1 -dUseCIEColor -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -dGridFitTT=2 -r150 -dUseCropBox=true sample.pdf

The next step is to call GhostScript and convert the PDF which is done using the ConvertPdfPageToImage method:

    public void ConvertPdfPageToImage(string outputFileName, int pageNumber)
    {
      if (pageNumber < 1 || pageNumber > this.PageCount)
        throw new ArgumentException("Page number is out of bounds", "pageNumber");

      using (GhostScriptAPI api = new GhostScriptAPI())
        api.Execute(this.GetConversionArguments(this._pdfFileName, outputFileName, pageNumber, this.PdfPassword, this.Settings));
    }

As you can see, this is a very simple call - create an instance of the GhostScriptAPI class and then pass in the list of parameters to execute. The GhostScriptAPI class takes care of everything else.

Once the file is saved to disk, you can then load it into a Bitmap or Image object for use in your application. Don't forget to delete the file when you are finished with it!

Alternatively, the GetImage method will convert the file and return the bitmap image for you, automatically deleting the temporary file. This saves you from having to worry about providing and deleting the output file, but it does mean you are responsible for disposing of the returned bitmap.

    public Bitmap GetImage(int pageNumber)
    {
      Bitmap result;
      string workFile;

      if (pageNumber < 1 || pageNumber > this.PageCount)
        throw new ArgumentException("Page number is out of bounds", "pageNumber");

      workFile = Path.GetTempFileName();

      try
      {
        this.ConvertPdfPageToImage(workFile, pageNumber);
        using (FileStream stream = new FileStream(workFile, FileMode.Open, FileAccess.Read))
          result = new Bitmap(stream);
      }
      finally
      {
        File.Delete(workFile);
      }

      return result;
    }

You could also convert a range of pages at once using the GetImages method:

public Bitmap[] GetImages(int startPage, int lastPage)
{
  List<Bitmap> results;

  if (startPage < 1 || startPage > this.PageCount)
    throw new ArgumentException("Start page number is out of bounds", "startPage");

  if (lastPage < 1 || lastPage > this.PageCount)
    throw new ArgumentException("Last page number is out of bounds", "lastPage");
  else if (lastPage < startPage)
    throw new ArgumentException("Last page cannot be less than start page", "lastPage");

  results = new List<Bitmap>();
  for (int i = startPage; i <= lastPage; i++)
    results.Add(this.GetImage(i));

  return results.ToArray();
}

In conclusion

The above methods provide a simple way of providing basic PDF viewing in your applications. In the next part of this series, we describe how to extend the ImageBox component to support conversion and navigation.

Downloads:

  • Cyotek.GhostScript.zip

    (11.68 KB | 04 September 2011 )

    Work in progress class library for providing GhostScript integration in a .NET application.

  • Cyotek.GhostScript.PdfConversion.zip

    (5.43 KB | 04 September 2011 )

    Class library for converting PDF files into images using GhostScript. Also requires the Cyotek.GhostScript assembly.

9 comments | | Trackback specific URL for this entry

Recent News

Recent Articles

Most Popular

Tags

Advertisments