In this blog post, I'll guide you through the process of using AI to import purchase orders into Dynamics 365 Finance and Operations (D365FO) from complex PDF files.
Although the example provided here is simplified to illustrate core integration concepts, the approach and code samples are based on real-world scenarios. This makes them highly adaptable for similar integration tasks you might encounter. We'll utilize the free and open-source External Integration Framework, which provides reusable components explicitly designed for building robust integrations in X++.
Let's start by defining the integration scenario:
Goal: The customer receives complex invoice documents from external vendors as PDF files. We want to create purchase orders in D365FO based on these documents.
Before approaching this task, I researched the available options.
There are numerous online services, such as https://studio.convertapi.com/pdf-to-excel and https://www.ilovepdf.com/pdf_to_excel, that claim to extract tables from invoice documents. The idea was that the user converts the PDF to an Excel template and then uploads it to D365FO.
However, initial testing showed many errors in the line sections: for example, empty cells when a description spans two lines, or merged cells where columns should be separate.
Invoice capture is a standard option for D365FO. It is based on the Power Platform and uses Optical Character Recognition (OCR) to parse invoices.
I researched it and found:
Because invoice capture relies on Azure services, I tried Microsoft AI Foundry Document Intelligence directly in Azure, and the results were not perfect, even on the first sample (see below).
The advantage of OCR is that it provides a direct link between the image and the parsed data, and it is quite fast. However, after the initial research, I decided not to pursue this further.
Many modern AI models accept files as input. The idea was to use such a model to read the document and output its content as a JSON structure, and then use the External Integration Framework (which can work with JSON data) to create the required D365FO documents.
I tried several models and got the best results with Gemini 2.5 Flash. It is among the fastest models available, and speed matters here: for example, an invoice with ~200 lines (4–5 pages) takes about 2 minutes to process.
After some quick prototyping with Claude Code, I extended the External Integration Framework with a “Get data from AI” option. Below are the setup and usage instructions.
Note: Microsoft recently released a Power App Sales Order Agent that implements a similar concept: it processes documents using custom prompts and then writes the results through virtual data entities. However, its customisation options are unclear (customisation is probably not even possible).
In this section, we'll discuss how to set up AI for PDF import.
You can have multiple AI providers. Define a provider using the AI Providers form. A provider is a class that extends DEVIntegAIProviderBase and defines the following:
In the current implementation, only one provider class (for Gemini) is available, but you can add others later.
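The provider setup itself is X++. To illustrate the general pattern, here is a minimal Python sketch: a base class, one concrete class per AI service, and a lookup by name. The names `AIProvider`, `EchoProvider`, `generate`, and `register` are hypothetical. They do not reflect the actual members of `DEVIntegAIProviderBase`.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class AIProvider(ABC):
    """Hypothetical base class: one subclass per AI service."""

    @abstractmethod
    def name(self) -> str:
        """Name used to select the provider in setup."""

    @abstractmethod
    def generate(self, prompt: str, file_bytes: Optional[bytes] = None) -> str:
        """Send the prompt (and an optional file) and return the raw reply text."""


class EchoProvider(AIProvider):
    """Offline stand-in provider for testing; it calls no external service."""

    def name(self) -> str:
        return "Echo"

    def generate(self, prompt: str, file_bytes: Optional[bytes] = None) -> str:
        size = len(file_bytes) if file_bytes else 0
        return f"{prompt} ({size} bytes)"


PROVIDERS: Dict[str, AIProvider] = {}


def register(provider: AIProvider) -> None:
    """Make a provider selectable by name, like a row in a providers setup form."""
    PROVIDERS[provider.name()] = provider


register(EchoProvider())
print(PROVIDERS["Echo"].generate("Hi"))  # Hi (0 bytes)
```

Adding another service means adding one subclass. Callers that look providers up by name stay unchanged.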
To store an API key, we need to create a connection type. The Google Gemini endpoint is: https://generativelanguage.googleapis.com/v1beta/models/.
To obtain the key, sign in to https://aistudio.google.com and click "Get API Key". The price is a fraction of a cent per page.
Enter these values in the Connections form.
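For reference, a Gemini call combines three things into one `generateContent` request: the endpoint, a model name, and the API key. The PDF is sent inline as base64. This Python sketch only builds the request and does not send it. It is not the framework's implementation, and it assumes:

- the `gemini-2.5-flash` model name;
- the `x-goog-api-key` header;
- JSON output requested through `responseMimeType`.

```python
import base64
import json
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models/"


def build_gemini_request(api_key: str, model: str, prompt: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build a generateContent POST request carrying a prompt plus an inline PDF."""
    url = f"{BASE_URL}{model}:generateContent"
    headers = {
        "Content-Type": "application/json",
        "x-goog-api-key": api_key,  # API key from Google AI Studio
    }
    body = {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": "application/pdf",
                    # Binary file content must be base64-encoded inside the JSON body.
                    "data": base64.b64encode(pdf_bytes).decode("ascii"),
                }},
            ],
        }],
        # Ask the model to return JSON rather than free text.
        "generationConfig": {"responseMimeType": "application/json"},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Sending needs network access and a valid key. Large invoices can take minutes,
# so use a generous timeout (300 s here is an arbitrary choice):
# with urllib.request.urlopen(request, timeout=300) as response:
#     raw = response.read().decode("utf-8")
```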
Also, we need to create a Manual connection for the inbound message.
After setting up the connection, define a prompt. The AI Prompt Definition form lets you define and validate the prompt.
First, validate that the connection works by sending a simple “Hi” prompt. The Call API button runs the prompt, displays the response, and shows related statistics.
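When inspecting the raw response outside D365FO, it helps to know where things are. A Gemini `generateContent` reply nests the text under `candidates[0].content.parts` and reports token counts in `usageMetadata`. This minimal Python sketch extracts both. The sample response is illustrative: its values are made up and it is not real API output.

```python
import json
from typing import Dict, Tuple


def parse_gemini_response(raw: str) -> Tuple[str, Dict[str, int]]:
    """Return the reply text and token statistics from a generateContent response."""
    data = json.loads(raw)
    parts = data["candidates"][0]["content"]["parts"]
    # A reply may be split across several parts; join their text.
    text = "".join(part.get("text", "") for part in parts)
    usage = data.get("usageMetadata", {})
    stats = {
        "prompt_tokens": usage.get("promptTokenCount", 0),
        "output_tokens": usage.get("candidatesTokenCount", 0),
        "total_tokens": usage.get("totalTokenCount", 0),
    }
    return text, stats


# Illustrative response shape (values are made up).
sample = json.dumps({
    "candidates": [{"content": {"parts": [{"text": "Hi! How can I help?"}]}}],
    "usageMetadata": {"promptTokenCount": 2, "candidatesTokenCount": 7, "totalTokenCount": 9},
})
text, stats = parse_gemini_response(sample)
print(text, stats["total_tokens"])  # Hi! How can I help? 9
```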
Then define a prompt for invoice parsing and test it on sample invoices.
A sample prompt used here:
You are an expert at extracting structured data from multi-page PDF invoices. Your task is to process the entire document and produce a single, valid JSON object.
The JSON object must have two top-level keys: HEADER and LINES.
HEADER: A single JSON object containing summary information from the invoice.
LINES: A JSON array of objects, where each object represents a single purchased item.
Extraction Rules for HEADER:
VendorName: This is printed in the first page header, after the "Vendor:" label (before or after the "Tax Invoice" label).
ReceiptDate: Find the "Delivery date" on the first page. Convert it to "yyyy-MM-dd" format.
PurchPoolId: Extract the value located directly below the "Delivery date" date on the first page.
ChargeValue: At the end of the document, there is a line that states "Handling fee". Get the value from it. The data type must be a number; if the line is not found, use 0.
InvoiceTotalAmount: On the final page, find the line "Grand Total". Extract the final numeric total. The data type must be a number.
InvoiceTotalQty: On the final page, find the line "Total Quantity". The data type must be a number.
Extraction Rules for LINES:
Each object in the LINES array represents a single invoice item and must contain the following fields. IMPORTANT: Do not include summary lines or "Total by" lines as items in this array.
InvoiceItem: Take from the "Description" column.
ItemId: Check the "Description" column. If it is 37' TV, use "T0004"; if it is 52' TV, use "T0005".
Quantity: Take the numeric value from the "Qty" column.
Price: Take the numeric value from the "Unit Price" column.
GST: Take the numeric value from the "Tax Amount" column.
TotalValue: Take the numeric value from the "Amount" column.
Color: This value is determined by grouping. The invoice items are separated into sections by summary lines like "Total by color Black", "Total by color Silver".
First, identify these summary lines throughout the document.
All regular item lines that appear before a specific summary line (e.g., "Total by color Black") belong to that group.
The Color for all items in that group should be the code from the description (e.g., for "Total by color Black", the Color is "Black"; for "Total by color Silver", the Color is "Silver"). Apply this logic consistently across all pages.
Output ONLY the raw JSON. DO NOT INCLUDE any other text, explanations, or markdown formatting.
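Even with the "Output ONLY the raw JSON" instruction, a model may occasionally wrap its answer in a markdown code fence. A small defensive parser keeps that from failing the whole import. The Python sketch below is one possible way to handle it, not part of the framework.

```python
import json
import re

# A markdown fence, built from single backticks so this listing contains no literal fence.
TICKS = "`" * 3
_FENCED = re.compile(
    r"^\s*" + TICKS + r"(?:json)?[ \t]*\n(.*?)\n\s*" + TICKS + r"\s*$",
    re.DOTALL,
)


def load_model_json(reply: str) -> dict:
    """Parse a model reply, tolerating an optional markdown fence around the JSON."""
    match = _FENCED.match(reply)
    payload = match.group(1) if match else reply
    return json.loads(payload)
```

Anything that is still not valid JSON raises `json.JSONDecodeError`. The import should treat that as a failed message rather than create partial data.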
As you can see, the model is capable enough to implement simple mappings as well. For example, to convert InvoiceItem to ItemId, I defined a couple of rules. To derive the Color for each line, I used relatively complex logic based on the “Total by color” groups.
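The "Total by color" grouping rule from the prompt can also be written as a deterministic function, which is handy for spot-checking the model's Color values. This Python sketch assumes a hypothetical input: the invoice rows already extracted as description strings, in document order.

```python
from typing import List, Optional, Tuple

SUMMARY_PREFIX = "Total by color "


def assign_colors(rows: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Give each item row the color named by the next 'Total by color' row."""
    result: List[Tuple[str, Optional[str]]] = []
    pending: List[str] = []  # item rows seen since the previous summary row
    for row in rows:
        if row.startswith(SUMMARY_PREFIX):
            color = row[len(SUMMARY_PREFIX):].strip()
            result.extend((item, color) for item in pending)
            pending = []
        else:
            pending.append(row)
    # Rows after the last summary line belong to no group; flag them with None.
    result.extend((item, None) for item in pending)
    return result
```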
However, if the mapping requires numeric comparisons (e.g., greater than or less than), Gemini 2.5 Flash can struggle. In that case, include all the necessary data in the output and implement the logic in X++.
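As an illustration, suppose (hypothetically) that the ItemId should depend on the screen size instead of an exact text match. The model then only returns the description, and the numeric comparison runs in code. The post does this kind of logic in X++. The Python sketch below only shows the shape of the logic, and the 45-inch threshold is invented for the example.

```python
import re
from typing import Optional

# Hypothetical business rule: screens smaller than 45 inches map to T0004,
# 45 inches and larger map to T0005.
SIZE_THRESHOLD = 45
_SIZE = re.compile(r"(\d+)\s*'")  # matches e.g. "37'" or "52 '" in the description


def item_id_for(description: str) -> Optional[str]:
    """Derive an ItemId from the screen size mentioned in the description."""
    match = _SIZE.search(description)
    if match is None:
        return None  # unresolved: let validation flag the line for manual review
    size = int(match.group(1))
    return "T0004" if size < SIZE_THRESHOLD else "T0005"
```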
Output for the invoice above looks like this:
{
  "HEADER": {
    "VendorName": "Contoso Asia",
    "ReceiptDate": "2025-09-07",
    "PurchPoolId": "01",
    "ChargeValue": 50.00,
    "InvoiceTotalAmount": 3751.50,
    "InvoiceTotalQty": 8
  },
  "LINES": [
    {
      "InvoiceItem": "37' TV Model A",
      "ItemId": "T0004",
      "Quantity": 2,
      "Price": 400.00,
      "GST": 80.00,
      "TotalValue": 880.00,
      "Color": "Black"
    },
    {
      "InvoiceItem": "52' TV Model B",
      "ItemId": "T0005",
      "Quantity": 1,
      "Price": 450.00,
      "GST": 45.00,
      "TotalValue": 495.00,
      "Color": "Black"
    },
    {
      "InvoiceItem": "37' TV Model A",
      "ItemId": "T0004",
      "Quantity": 2,
      "Price": 420.00,
      "GST": 84.00,
      "TotalValue": 924.00,
      "Color": "Silver"
    },
    ...
  ]
}
This output will be used as initial data for the processing class.
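The processing class maps this JSON onto staging records. Here is a rough Python analogue of that mapping. The record and field names are hypothetical, not the tutorial's staging table fields. Required keys raise a `KeyError` when missing. Optional keys get defaults, including ChargeValue, which the prompt allows to be absent.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class StagingLine:
    invoice_item: str
    item_id: str
    quantity: float
    price: float
    gst: float
    total_value: float
    color: str


@dataclass
class StagingHeader:
    vendor_name: str
    receipt_date: str
    purch_pool_id: str
    charge_value: float
    invoice_total_amount: float
    invoice_total_qty: float
    lines: List[StagingLine]


def to_staging(reply: str) -> StagingHeader:
    """Map the model's JSON (HEADER + LINES) onto typed staging records."""
    data = json.loads(reply)
    head = data["HEADER"]
    lines = [
        StagingLine(
            invoice_item=row["InvoiceItem"],
            item_id=row.get("ItemId", ""),  # may be empty if no mapping rule matched
            quantity=float(row["Quantity"]),
            price=float(row["Price"]),
            gst=float(row.get("GST", 0)),
            total_value=float(row["TotalValue"]),
            color=row.get("Color", ""),
        )
        for row in data["LINES"]
    ]
    return StagingHeader(
        vendor_name=head["VendorName"],
        receipt_date=head["ReceiptDate"],
        purch_pool_id=head.get("PurchPoolId", ""),
        charge_value=float(head.get("ChargeValue", 0)),  # the prompt asks for 0 if absent
        invoice_total_amount=float(head["InvoiceTotalAmount"]),
        invoice_total_qty=float(head["InvoiceTotalQty"]),
        lines=lines,
    )
```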
To set up processing, we need to create a new Inbound message type with the processing class DEVIntegTutorialPurchOrderOCRProcess, which reads the JSON from the AI call and creates a purchase order based on it.
As a custom parameter, this class requires a link to the previously defined prompt.
We also need some custom parameters. When importing documents, you’ll likely define rules for tax calculation. There are usually two options: either take the value from the document (and then use the line Tax adjustment) or calculate it directly in D365FO. The latter requires defining a Zero tax group.
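When the tax amount is taken from the document, the adjustment for a line is essentially the difference between the document's tax and what the system calculates. This minimal Python sketch shows that arithmetic. It assumes a single percentage rate per line and rounding to cents. The 10% rate in the example is hypothetical, and D365FO's own tax calculation and rounding rules may differ.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def dec(value) -> Decimal:
    """Convert a JSON number to Decimal via its string form to avoid float noise."""
    return Decimal(str(value))


def tax_adjustment(quantity, price, document_tax, rate) -> Decimal:
    """Return document tax minus calculated tax for one line, rounded to cents."""
    net = dec(quantity) * dec(price)
    calculated = (net * dec(rate)).quantize(CENT, rounding=ROUND_HALF_UP)
    return (dec(document_tax) - calculated).quantize(CENT, rounding=ROUND_HALF_UP)


# Hypothetical 10% rate: 2 x 400.00 at 10% = 80.00, matching the document's GST.
print(tax_adjustment(2, 400.00, 80.00, 0.10))  # 0.00
```

A non-zero result is the amount a line tax adjustment would need to cover.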
After specifying all settings, we can test the import. Open Tutorial Purchase orders OCR Staging and select New order import.
The system analyses the provided PDF and creates staging data.
The next step is validation. AI models may hallucinate, so it’s crucial to establish proper validation.
In our case, we have three validation points:
You can also edit any field in the staging tables if needed.
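The HEADER totals requested in the prompt (InvoiceTotalQty and InvoiceTotalAmount) make natural cross-checks against the extracted lines. The Python sketch below shows one possible set of checks, not necessarily the three validation points used in the tutorial. It rests on two assumptions:

- TotalValue = Quantity × Price + GST, which holds for the sample lines above.
- The Grand Total equals the sum of line amounts plus the handling fee.

```python
from decimal import Decimal
from typing import Dict, List

TOLERANCE = Decimal("0.01")


def dec(value) -> Decimal:
    """Convert a JSON number to Decimal via its string form to avoid float noise."""
    return Decimal(str(value))


def validate(header: Dict, lines: List[Dict]) -> List[str]:
    """Return validation errors; an empty list means all checks passed."""
    errors: List[str] = []

    total_qty = sum((dec(line["Quantity"]) for line in lines), Decimal(0))
    if total_qty != dec(header["InvoiceTotalQty"]):
        errors.append(f"Quantity mismatch: lines {total_qty}, invoice {header['InvoiceTotalQty']}")

    # Assumption: Grand Total = sum of line amounts + handling fee.
    total_amount = sum((dec(line["TotalValue"]) for line in lines), Decimal(0))
    total_amount += dec(header.get("ChargeValue", 0))
    if abs(total_amount - dec(header["InvoiceTotalAmount"])) > TOLERANCE:
        errors.append(f"Amount mismatch: lines {total_amount}, invoice {header['InvoiceTotalAmount']}")

    # Assumption: each line amount = Quantity * Price + GST.
    for number, line in enumerate(lines, start=1):
        expected = dec(line["Quantity"]) * dec(line["Price"]) + dec(line["GST"])
        if abs(expected - dec(line["TotalValue"])) > TOLERANCE:
            errors.append(f"Line {number}: expected {expected}, got {line['TotalValue']}")
    return errors
```

Any non-empty result should block processing until a user reviews and corrects the staging data.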
After validation, select Process to create a purchase order. The purchase order is created, and the staging data reference is updated (so you can always trace an individual order to the original PDF file).
From this form, you can also open the created purchase order using Open PO.
All resources mentioned in this blog post are available on GitHub. Here’s a brief overview of what’s included and how you can use these resources as a starting point for your AI integration projects.
The main components are:
Once these components are set up, the External Integration framework automatically handles the remaining integration process. The solution uses vanilla X++ code without external DLLs and works even on a local VHD.
In this post, I explained how to use AI to implement a complex PDF document import into Dynamics 365 Finance and Operations using the External Integration framework. We covered these key topics:
I hope you found this information helpful. If you have any questions, suggestions, or improvements, please feel free to reach out.