Ways to validate CSV structures

BizTalk Server Best practices, Tips, and Tricks: #8 Another way to Validate CSV structures

Published on : Jan 16, 2023

Category : BizTalk Server

Sandro

Author

Welcome once again to another BizTalk Server Best practices, Tips, and Tricks blog post! In my previous blog posts, I discussed some essential tips and tricks for BizTalk Server administrators:

And for BizTalk Server Developers:

Today, I will speak about another critical Best practice, Tips and Tricks for BizTalk Server developers: How to validate (or a different way to) a CSV structure

Ways to validate CSV structures

Another way to Validate CSV structures

I’m not completely far away from the truth when I say that 90% of the flat files we integrate – except for EDI, RosettaNet, and HL7, which are universal standards – are basic CSV files or (if you prefer) basic Delimited by Symbols flat files. Something like:

1
2
3
4
Sandro;Pereira;1978;Crestuma;4415
Fernanda;Leite;1980;Porto;4400
José;Silva;1972;Crestuma;4415
Rui;Barbosa;1975;Lever;4415

And I’m again not far away from the truth when I say that 95% of the time we:

  • Do not validate the flat files CSV structures – and trust that the source will provide a valid file/message.
  • Or we will use a custom pipeline with the XML Validator to validate the message structure.

Is that incorrect? Not really; it all depends on your solution requirements. I have several solutions implemented like this, and they have been working for ages without any problems.

Is that enough?  Not really; again, it all depends on your solution requirements.

The main problem with this “default” approach – the use of a custom pipeline with the XML Validator to validate the message structure – is that we are going to validate the message after we have already translated the CSV message to an XML message.

And again, in probably 90% of the scenarios, that will be OK – you can learn more about how to process Flat Files Delimited by Symbols here on my personal blog:  BizTalk Server: Teach me something new about Flat Files (or not) – Files Delimited by Symbols. However, the main problem with CSV is when we need to deal with optional fields or actually validate the structure of the Flat File before we translate it to XML.

If the optional field or fields are in the end, then that may be a simple task to achieve, and we won’t need to create a custom Pipeline component to achieve this goal. We can make use of the Suppress Empty Nodes property to accomplish that. Let’s say that:

  • In the above example, the Postal Code (last property) is an optional field. Then – assuming you already have the flat file schema generated – you want the Postal Code to be an optional field in the schema. To accomplish that, we need to:
    • In BizTalk Editor, right-click on the Schema node and select the Properties option
    • In the Properties window for the schema, set the “Suppress Empty Nodes” property to “true”

BizTalk schema node editor

Assuming that we will use the following message:

1
2
3
4
Sandro;Pereira;1978;Crestuma;4415
Fernanda;Leite;1980;Porto;4400
José;Silva;1972;Crestuma
Rui;Barbosa;1975;Lever;4415

We will get the following XML message:

<People xmlns="http://DelimitedBySymbols.FF_DSYMBOL_PEOPLE_Advance">   <Person xmlns="">     <Name>Sandro</Name>     <Surname>Pereira</Surname>     <Birthyear>1978</Birthyear>     <Address>Crestuma</Address>     <ZipCode>4415</ZipCode>   </Person>   <Person xmlns="">     <Name>Fernande</Name>     <Surname>Leite</     <Birthyear>1980</Birthyear>     <Address>Porto</Address>     <ZipCode>4400</ZipCode>   </Person>   <Person xmlns="">     <Name>José</Name>     <Surname>Silva</Surname>     <Birthyear>1972</Birthyear>     <Address>Crestuma</Address>   </Person>   <Person xmlns="">     <Name>Rui</Name>     <Surname>Barbosa</Surname>     <Birthyear>1975</Birthyear>     <Address>Lever</Address>     <ZipCode>4415</ZipCode>   </Person> </People>

which is actually excellent. The problem we face is when we have one, or several properties set as optional fields in the middle of the message, for example, the Birthyear. Take this example:

1
2
3
4
Sandro;Pereira;1978;Crestuma;4415
Fernanda;Leite;1980;Porto;4400
José;Silva;Crestuma;4415
Rui;Barbosa;1975;Lever;4415

The end result will be: 

<Person xmlns="">     <Name>José</Name>     <Surname>Silva</Surname>     <Birthyear>Crestuma</Birthyear>     <Address>       4415       Rui     </Address>     <ZipCode>Barbosa;1975;Lever;4415</ZipCode>   </Person>

which is incorrect. The correct way of the file structure will always need to be like this: 

1
2
3
4
Sandro;Pereira;1978;Crestuma;4415
Fernanda;Leite;1980;Porto;4400
José;Silva;;Crestuma;4415
Rui;Barbosa;1975;Lever;4415
 

So what are my other options?

Create a custom pipeline component

In my personal opinion, if you want to validate this process, then we need to validate the message before BizTalk Server applies the flat file translation rules.

That means that we need to create a custom CSV Flat-File structure validation.

In this sample that I created on my personal blog – CSV Structure Validation Pipeline Component – we are accepting two fields: 

  • DelimiterChar: that accepts a delimiter character to work as a property splitter action.
  • NumberFieldsPerLine: Number of fields expected per line.

CSV validation with BizTalk

Note that this component takes as granted that the line delimiter is the CRLF (Carriage Return\Line Feed).

This is going to validate if the parsing that we will implement on our flat-file schema will be processed correctly:

for (int i = 0; i < totalLines; i++)
{
if (lines[i].Split(Convert.ToChar(DelimiterChar)).Length != this.NumberFieldsPerLine)
throw new Exception("Invalid format data in the document. Line number " + (i + 1) + " has " + lines[i].Split(Convert.ToChar(DelimiterChar)).Length + " fields and it should be expect " + this.NumberFieldsPerLine + " fields");
}

return pInMsg;

If we receive an invalid file, the component will raise an error suspending the message in the BizTalk Server Administration Console. For example, with the following error message:

  • Invalid format data in the document. Line number 3 has 2 fields, and it should be expected 3 fields.

Results of CSV validate in BizTalk

This strategy will cover 95% of the flat file scenarios we cover in our integration project. But you can always use the same strategy to validate the flat-file structure before BizTalk Server translate them to XML in more complex scenarios.

Stay tuned for the following BizTalk Server Best practices, Tips, and Tricks.