PDF Decoder Pipeline Component for BizTalk Server

Once again, I have updated my BizTalk Pipeline Components Extensions Utility Pack project with new pipeline components. For those who aren’t familiar with it, this project is a set of custom pipeline component libraries that can be used in receive and send pipelines. These components extend BizTalk’s out-of-the-box pipeline capabilities, and the project is available on GitHub.

At the moment, the project has 19 custom pipeline components:

  • CBRIdocOperationPromotionEncode: Content-Based Routing Component to promote the IDOC Operation property.
    • This component requires one configuration setting: the part of the MessageType string to be ignored. It then takes the last string (word) from the MessageType message context property and promotes it to the Operation message context property.
  • CBROperationPromotionEncode: Content-Based Routing Component to promote Operation property.
    • This component doesn’t require any configuration. It takes the value (word) that follows the hash sign (#) in the MessageType message context property and promotes it to the Operation message context property.
  • Carry SOAPHeader To WCF-BasicHttp Pipeline Component: Content-Based Routing Component to carry forward the received SOAP Header to the outgoing message.
    • This component accesses the original SOAP Header received in the source message and adds it to the OutboundCustomHeaders property of the destination message(s), which is used by the WCF-BasicHttp adapter.
  • Multi-Part Message Attachments Zipper Pipeline Component: The BizTalk Multi-Part Message Attachments Zipper is a pipeline component for BizTalk Server that can be used in a send pipeline and is intended to replace all attachments of a multi-part message with their zipped equivalents.
    • The capabilities are similar to those available in compression software such as WinZip or 7-Zip:
      • Attachments Compression – Extracts, in a send pipeline, all message parts of a multi-part message that are not the message body (Message Body Part = False), compresses them, and attaches the compressed attachment back to the message.
  • Zip Pipeline Component: The Zip pipeline component is a pipeline component for BizTalk Server which can be used in a send pipeline (encode stage) and is intended to compress (zip/gzip) outgoing messages.
  • UnZip File Pipeline Component: This UnZip File Pipeline Component for BizTalk Server can be used in a Receive pipeline (Disassemble stage) and it allows you to receive a compressed (zip/gzip) file and extract its contents into different XML messages.
  • Remove XML Namespace Pipeline Component: The RemoveXmlNamespace is a pipeline component for BizTalk Server made by Johan Hedberg which can be used to remove XML namespaces from XML documents inside custom pipelines.
  • XML Complete Validator Pipeline Component: The XML Complete Validator Pipeline Component is intended for the Validate stage of a pipeline, as a substitute for the out-of-the-box XML validator component. The default component stops validating the XML message after the first error, so it cannot provide a complete list of all errors. If a message has several errors, you will probably have to test it several times, fixing one issue at a time. This component removes that limitation: instead of throwing an exception the moment the first error occurs, it validates the entire message and then provides a complete list of the errors found.
  • ExtractingXmlDisassembler: A demonstration exercise in which the first stage is a flat file (FF) disassembler, followed by an XML disassembler. This sample may not be very useful in real-world scenarios, but it shows the principle and avoids third-party components.
  • JSON Encoder: The JSON Encoder is a pipeline component for BizTalk Server which can be used in a Send Pipeline (Encode stage) to encode any XML message into a JSON equivalent.
    • This pipeline component is an extension of the default JSON Encoder pipeline component and is fully compatible with it.
  • ODBC File Decoder Pipeline Component: The ODBC File Decoder Pipeline Component is, as the name suggests, a decode component that you can use in a receive pipeline to process DBF or Excel files. It may also be possible to process other ODBC sources, possibly with minor adjustments. The component uses basic ADO.NET to parse the incoming DBF or Excel files into an XML document.
  • BizTalk Archive Pipeline Component: This component enables you to archive incoming/outgoing messages from any adapter either to a folder (local, shared, network) or to a SQL Server database.
  • BizTalk PDF2Xml Pipeline Component: This is, as the name suggests, a decode component that transforms the content of a PDF document into an XML message that BizTalk can understand and process.
  • Receive Location Name Property Promotion Pipeline Component: This is a simple pipeline component to promote the Receive Location Name property to the context of the message.
  • Local Folder Archive Pipeline Component: A pipeline component that can be used to archive incoming/outgoing messages from any adapter.
  • XML Namespace Management Pipeline Component: This is a pipeline component for BizTalk Server that can be used in any stage of both receive and send pipelines and allows you to add or change the namespace of inbound and outbound BizTalk messages.
  • SQL Server Polling Debatch Message By Grouping Filter Pipeline Component: This is a pipeline component for BizTalk Server able to debatch messages that are pulled from SQL Server based on a filter.
  • NamespaceStripper Pipeline Component: pipeline component to remove all namespaces and prefixes from an XML message.
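To make the NamespaceStripper idea concrete, here is a minimal sketch of the same transformation. It is written in Python with the standard library’s ElementTree purely for illustration; it is not the component’s actual .NET code. ElementTree stores qualified names as "{uri}local" and drops prefixes and xmlns declarations while parsing, so keeping only the local part of every element and attribute name is enough to serialize a namespace-free document.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(xml_text: str) -> str:
    """Return xml_text with all namespace URIs and prefixes removed (illustrative sketch)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Qualified element names look like "{http://ns}Local"; keep only "Local".
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
        # Namespaced attributes use the same "{uri}local" form.
        for key in list(elem.attrib):
            if key.startswith("{"):
                elem.attrib[key.split("}", 1)[1]] = elem.attrib.pop(key)
    # With no qualified names left, no xmlns declarations are written out.
    return ET.tostring(root, encoding="unicode")


sample = '<ns0:Order xmlns:ns0="http://example/po"><ns0:Id x:a="1" xmlns:x="urn:x">5</ns0:Id></ns0:Order>'
print(strip_namespaces(sample))  # <Order><Id a="1">5</Id></Order>
```

If two attributes share the same local name under different namespaces, this sketch keeps only one of them, a trade-off any namespace stripper has to resolve somehow.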

Today, my team and I updated this project with a brand-new component: the PDF Decoder Pipeline Component.

PDF Decoder Pipeline Component

Recently, I added a similar component to this project, the BizTalk PDF2Xml Pipeline Component, which I wrote about on my blog. However, as I mentioned there, that component works well with some PDF files, but with others it simply ignores their content. This new component solves those issues.

PDF Decoder Pipeline Component is, as the name suggests, a decode component that transforms the content of a PDF document to an XML message that BizTalk can understand and process. The component uses the iTextSharp library to extract the PDF content. 

  • This component doesn’t require any configuration.

Once you pass the PDF through this component, the outcome will be an XML message containing all the PDF content. Each line from the PDF is translated into a Line record in the XML message.
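The shape of that output can be sketched as follows. This is a minimal Python illustration, not the component’s code: the real component is a .NET library that extracts the text with iTextSharp, whereas here the text is assumed to be already extracted. The root element name "PDF" is a hypothetical placeholder, and the namespace defined by the internal.dvs.components.pdf schema is not shown in this post, so it is omitted.

```python
import xml.etree.ElementTree as ET


def lines_to_xml(pdf_text: str, root_name: str = "PDF") -> str:
    """Wrap each line of already-extracted PDF text in its own <Line> record.

    root_name is a hypothetical placeholder; the real schema's root element
    and namespace are not reproduced here.
    """
    root = ET.Element(root_name)
    for line in pdf_text.splitlines():
        # One Line record per line of text, in document order.
        ET.SubElement(root, "Line").text = line
    return ET.tostring(root, encoding="unicode")


print(lines_to_xml("Invoice 42\nTotal: 10.00"))
# <PDF><Line>Invoice 42</Line><Line>Total: 10.00</Line></PDF>
```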

How to install it

As always, you just need to copy the DLLs into the Pipeline Components folder, which in BizTalk Server 2020 is, by default:

  • C:\Program Files (x86)\Microsoft BizTalk Server\Pipeline Components

For this particular component, you need these two DLLs:

  • BizTalk.PipelineComponents.PDFDecoder.dll
  • itextsharp.dll
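Before configuring a port, it can help to confirm that both DLLs are actually in place. The following Python helper is a hypothetical convenience script, not part of the project. It assumes the default BizTalk Server 2020 folder listed above, so adjust the path if your installation differs.

```python
from pathlib import Path

# Default Pipeline Components folder for BizTalk Server 2020 (adjust if needed).
PIPELINE_COMPONENTS = Path(r"C:\Program Files (x86)\Microsoft BizTalk Server\Pipeline Components")

# The two DLLs the PDF Decoder Pipeline Component needs.
REQUIRED_DLLS = ("BizTalk.PipelineComponents.PDFDecoder.dll", "itextsharp.dll")


def missing_dlls(folder: Path = PIPELINE_COMPONENTS) -> list[str]:
    """Return the names of required DLLs that are not present in folder."""
    return [name for name in REQUIRED_DLLS if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_dlls()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All PDF Decoder DLLs are in place.")
```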

If you want to process the message as an XML message, you also need to deploy the BizTalk Server project that contains the schema used by this pipeline component:

  • internal.dvs.components.pdf 

How to use it

To use the pipeline component, I recommend creating one or more generic pipelines that can be reused by all your applications, and adding the PDF Decoder Pipeline Component to the Decode stage of a receive pipeline.

Deploy the pipeline to your environment and configure the port accordingly.

Download

THIS COMPONENT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND.

You can download the PDF Decoder Pipeline Component from GitHub: