Welcome once again to another BizTalk Server Best practices, Tips, and Tricks blog post! In my previous blog posts, I discussed some essential tips and tricks for BizTalk Server administrators:
And for BizTalk Server Developers:
Today, I will speak about another critical Best practice, Tips and Tricks for BizTalk Server developers: How to validate (or a different way to) a CSV structure.
I’m not completely far away from the truth when I say that 90% of the flat files we integrate – except for EDI, RosettaNet, and HL7, which are universal standards – are basic CSV files or (if you prefer) basic Delimited by Symbols flat files. Something like:
1
2
3
4
|
Sandro;Pereira;1978;Crestuma;4415 Fernanda;Leite;1980;Porto;4400 José;Silva;1972;Crestuma;4415 Rui;Barbosa;1975;Lever;4415 |
And I’m again not far away from the truth when I say that 95% of the time we:
Is that incorrect? Not really; it all depends on your solution requirements. I have several solutions implemented like this, and they have been working for ages without any problems.
Is that enough? Not really; again, it all depends on your solution requirements.
The main problem with this “default” approach – the use of a custom pipeline with the XML Validator to validate the message structure – is that we are going to validate the message after we have already translated the CSV message to an XML message.
And again, in probably 90% of the scenarios, that will be OK – you can learn more about how to process Flat Files Delimited by Symbols here on my personal blog: BizTalk Server: Teach me something new about Flat Files (or not) – Files Delimited by Symbols. However, the main problem with CSV is when we need to deal with optional fields or actually validate the structure of the Flat File before we translate it to XML.
If the optional field or fields are in the end, then that may be a simple task to achieve, and we won’t need to create a custom Pipeline component to achieve this goal. We can make use of the Suppress Empty Nodes property to accomplish that. Let’s say that:
Assuming that we will use the following message:
1
2
3
4
|
Sandro;Pereira;1978;Crestuma;4415 Fernanda;Leite;1980;Porto;4400 José;Silva;1972;Crestuma Rui;Barbosa;1975;Lever;4415 |
We will get the following XML message:
<
People
xmlns
=
"http://DelimitedBySymbols.FF_DSYMBOL_PEOPLE_Advance"
>
<
Person
xmlns
=
""
>
<
Name
>Sandro</
Name
>
<
Surname
>Pereira</
Surname
>
<
Birthyear
>1978</
Birthyear
>
<
Address
>Crestuma</
Address
>
<
ZipCode
>4415</
ZipCode
>
</
Person
>
<
Person
xmlns
=
""
>
<
Name
>Fernande</
Name
>
<
Surname
>Leite</
<
Birthyear
>1980</
Birthyear
>
<
Address
>Porto</
Address
>
<
ZipCode
>4400</
ZipCode
>
</
Person
>
<
Person
xmlns
=
""
>
<
Name
>José</
Name
>
<
Surname
>Silva</
Surname
>
<
Birthyear
>1972</
Birthyear
>
<
Address
>Crestuma</
Address
>
</
Person
>
<
Person
xmlns
=
""
>
<
Name
>Rui</
Name
>
<
Surname
>Barbosa</
Surname
>
<
Birthyear
>1975</
Birthyear
>
<
Address
>Lever</
Address
>
<
ZipCode
>4415</
ZipCode
>
</
Person
>
</
People
>
which is actually excellent. The problem we face is when we have one, or several properties set as optional fields in the middle of the message, for example, the Birthyear. Take this example:
1
2
3
4
|
Sandro;Pereira;1978;Crestuma;4415 Fernanda;Leite;1980;Porto;4400 José;Silva;Crestuma;4415 Rui;Barbosa;1975;Lever;4415 |
The end result will be:
<
Person
xmlns
=
""
>
<
Name
>José</
Name
>
<
Surname
>Silva</
Surname
>
<
Birthyear
>Crestuma</
Birthyear
>
<
Address
>
4415
Rui
</
Address
>
<
ZipCode
>Barbosa;1975;Lever;4415</
ZipCode
>
</
Person
>
which is incorrect. The correct way of the file structure will always need to be like this:
1
2
3
4
|
Sandro;Pereira;1978;Crestuma;4415 Fernanda;Leite;1980;Porto;4400 José;Silva;;Crestuma;4415 Rui;Barbosa;1975;Lever;4415 |
So what are my other options?
In my personal opinion, if you want to validate this process, then we need to validate the message before BizTalk Server applies the flat file translation rules.
That means that we need to create a custom CSV Flat-File structure validation.
In this sample that I created on my personal blog – CSV Structure Validation Pipeline Component – we are accepting two fields:
Note that this component takes as granted that the line delimiter is the CRLF (Carriage Return\Line Feed).
This is going to validate if the parsing that we will implement on our flat-file schema will be processed correctly:
for (int i = 0; i < totalLines; i++) { if (lines[i].Split(Convert.ToChar(DelimiterChar)).Length != this.NumberFieldsPerLine) throw new Exception("Invalid format data in the document. Line number " + (i + 1) + " has " + lines[i].Split(Convert.ToChar(DelimiterChar)).Length + " fields and it should be expect " + this.NumberFieldsPerLine + " fields"); } return pInMsg;
If we receive an invalid file, the component will raise an error suspending the message in the BizTalk Server Administration Console. For example, with the following error message:
This strategy will cover 95% of the flat file scenarios we cover in our integration project. But you can always use the same strategy to validate the flat-file structure before BizTalk Server translate them to XML in more complex scenarios.
Stay tuned for the following BizTalk Server Best practices, Tips, and Tricks.