
Parsing EDI with LINQ

 Reading EDI files is simpler than it first looks. For a start, the file is broken up by two kinds of delimiters: ~ usually separates each segment, and * separates each element within a segment. Here's a sample EDI file for the 850 message (Purchase Order):

ISA*00*          *00*          *01*000123456      *ZZ*PARTNERID      *090827*0936*U*00401*000000055*0*T*>~
N1*ST*John Doe~
N3*126 Any St*~

 This sample is "unwrapped", meaning I've put a carriage-return/linefeed at the end of each segment so it's easier to read. A "wrapped" file would omit the CR/LF and just look like one long string. Each segment begins with a segment ID (a two- or three-character code such as N1 or ISA), followed by one or more elements that contain data.

 The first segment, called the Interchange Control Header (the line beginning with "ISA"), is always fixed-width. This is so you can discover what the delimiters are supposed to be for each file. Counting from one, character 104 is always the element delimiter, 105 is the sub-element delimiter, and 106 is the segment delimiter.
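
 As a rough sketch of that idea, the helper below reads all three delimiters out of the header and refuses input that is too short to contain a full ISA segment. The EdiDelimiters class and its Read method are names I made up for this illustration; they are not part of any library.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class EdiDelimiters
{
    // Reads the element, sub-element and segment delimiters from the fixed-width ISA header.
    public static (char Element, char SubElement, char Segment) Read(string message)
    {
        // The ISA segment is 106 characters long, including its terminator.
        if (message == null || message.Length < 106 ||
            !message.StartsWith("ISA", StringComparison.Ordinal))
        {
            throw new FormatException("Message does not start with a complete ISA segment.");
        }

        // Characters 104, 105 and 106 are at zero-based indexes 103, 104 and 105.
        return (message[103], message[104], message[105]);
    }
}
```

 Calling EdiDelimiters.Read on the sample above should give back *, > and ~ in that order.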

 The EDI guide given to you by your trading partner will tell you what each segment is used for and what each element means to them. So already we can write a short piece of LINQ code to read an EDI file into a collection of segments and elements:

// Needs: using System; using System.IO; using System.Linq;
StreamReader reader = new StreamReader(InputStream);
String message = reader.ReadToEnd();

// Discover the delimiters used. They're always in the same positions
char SegDelimiter = message[105];
char ElemDelimiter = message[103];

// Split into segments, then split each segment into its ID and elements
var segments = from seg in message.Split(SegDelimiter).Select(x => x.Trim())
               where !String.IsNullOrEmpty(seg)
               select new {
                    SegID = seg.Substring(0,seg.IndexOf(ElemDelimiter)),
                    Elements = seg.Split(ElemDelimiter).Skip(1).ToArray()
               };

 The above assumes that InputStream is an IO.Stream with the file you want to read. First we load that into a StreamReader so that we can fill a string with the entire contents of the file. Then we discover what the segment and element delimiters are (positions 105 and 103 in a zero-based array). The next step is to use LINQ to first break it up by segment (using Split(SegDelimiter)), and then create an anonymous type with the segment ID and an array of elements. 

 We call .ToArray() at the end of splitting the line into elements so that we can address them by index position later. Split() already returns an array, but Skip(1) turns it back into a plain IEnumerable, so ToArray() is what makes it indexable again. The only reason for the Skip(1) is that I wanted to be clean and tidy and drop the first item, which identifies the segment. We'd already put that into "SegID".

 Now you have an IEnumerable full of anonymous types with the segment ID and its addressable elements. This might already be enough for your needs, but we'll assume that the data will be more useful if you could translate it into a hierarchy. Many shops use XML as their intermediate format, so we'll convert into that. 
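
 If the flat list really is all you need, pulling values out of it is just another query. The sketch below mirrors the query above but returns named tuples instead of an anonymous type, so the result can be passed between methods. SegmentLookup, Parse and Values are hypothetical names chosen for this example, and the 1-based position argument is my own convention to match how EDI guides number elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class SegmentLookup
{
    // Same splitting logic as the query above, returning named tuples.
    public static List<(string SegID, string[] Elements)> Parse(
        string message, char segDelimiter, char elemDelimiter)
    {
        return (from seg in message.Split(segDelimiter).Select(x => x.Trim())
                where !String.IsNullOrEmpty(seg)
                let parts = seg.Split(elemDelimiter)
                select (SegID: parts[0], Elements: parts.Skip(1).ToArray())).ToList();
    }

    // Returns the element at a 1-based position for every segment with the given ID.
    public static IEnumerable<string> Values(
        IEnumerable<(string SegID, string[] Elements)> segments, string segId, int position)
    {
        return from s in segments
               where s.SegID == segId && s.Elements.Length >= position
               select s.Elements[position - 1];
    }
}
```

 With the sample file, SegmentLookup.Values(segments, "N1", 2) would yield the name from the N1 segment, "John Doe".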

 However, we don't want to merely wrap each segment and element up with angle brackets; we'd like to have some meaningful structure. EDI is a hierarchical format, much like XML, but the structure isn't explicit the way it is with XML. In this tutorial I'm going to model the implicit structure of EDI in XML and use it to drive a conversion algorithm. The configuration below defines the structure of a common EDI message type, the 850 (Purchase Order), and will serve as a reusable template that we can tweak for other messages and trading partners:

  <Rank Name="Envelope">
    <Segment ID="ISA" Name="Interchange Control Header">
      <Element Position="1" ID="I01" Name="Authorization Information Qualifier"/>
      <Element Position="2" ID="I02" Name="Authorization Information"/>
      <Element Position="3" ID="I03" Name="Security Information Qualifier"/>
      <Element Position="4" ID="I04" Name="Security Information"/>
      <Element Position="5" ID="I05" Name="Interchange ID Qualifier"/>
      <Element Position="6" ID="I06" Name="Interchange Sender ID"/>
      <Element Position="7" ID="I05" Name="Interchange ID Qualifier"/>
      <Element Position="8" ID="I07" Name="Interchange Receiver ID"/>
      <Element Position="9" ID="I08" Name="Interchange Date"/>
      <Element Position="10" ID="I09" Name="Interchange Time"/>
      <Element Position="11" ID="I10" Name="Interchange Control Standards ID"/>
      <Element Position="12" ID="I11" Name="Interchange Control Version Num"/>
      <Element Position="13" ID="I12" Name="Interchange Control Number"/>
      <Element Position="14" ID="I13" Name="Acknowledgement Requested"/>
      <Element Position="15" ID="I14" Name="Usage Indicator"/>
      <Element Position="16" ID="I15" Name="Component Element Separator"/>
    </Segment>
    <Segment ID="GS" Name="Functional Group Header">
      <Element Position="1" ID="479" Name="Functional Identifier Code"/>
      <Element Position="2" ID="142" Name="Application Senders Code"/>
      <Element Position="3" ID="124" Name="Application Receivers Code"/>
      <Element Position="4" ID="373" Name="Date"/>
      <Element Position="5" ID="337" Name="Time"/>
      <Element Position="6" ID="28" Name="Group Control Number"/>
      <Element Position="7" ID="455" Name="Responsible Agency Code"/>
      <Element Position="8" ID="480" Name="Version ID"/>
    </Segment>
    <Rank Name="Heading">
      <Segment ID="ST" Name="Transaction Set Header">
        <Element Position="1" ID="143" Name="Transaction Set Identifier Code"/>
        <Element Position="2" ID="329" Name="Transaction Set Control Number"/>
      </Segment>
      <Segment ID="BEG" Name="Beginning of PO">
        <Element Position="1" ID="353" Name="Transaction Set Purpose Code"/>
        <Element Position="2" ID="92" Name="Purchase Order Type Code"/>
        <Element Position="3" ID="324" Name="Purchase Order Number"/>
        <Element Position="5" ID="373" Name="Date"/>
      </Segment>
      <Segment ID="REF" Name="Reference Identification">
        <Element Position="1" ID="128" Name="Reference Identification Qualifier"/>
        <Element Position="2" ID="127" Name="Reference Identification"/>
      </Segment>
      <Segment ID="TD5" Name="Carrier Details">
        <Element Position="1" ID="133" Name="Routing Sequence Code"/>
        <Element Position="2" ID="66" Name="Identification Code Qualifier"/>
        <Element Position="3" ID="67" Name="Identification Code"/>
      </Segment>
      <Segment ID="N1" Name="Name">
        <Element Position="1" ID="98" Name="Entity Identifier Code"/>
        <Element Position="2" ID="93" Name="Company Name"/>
      </Segment>
      <Segment ID="N2" Name="Additional Name Information">
        <Element Position="1" ID="93" Name="Name"/>
        <Element Position="2" ID="93" Name="Address"/>
      </Segment>
      <Segment ID="N3" Name="Address">
        <Element Position="1" ID="166" Name="Street Address"/>
        <Element Position="2" ID="166" Name="Addl Address"/>
      </Segment>
      <Segment ID="N4" Name="Location">
        <Element Position="1" ID="19" Name="City"/>
        <Element Position="2" ID="156" Name="State"/>
        <Element Position="3" ID="116" Name="Postal Code"/>
        <Element Position="4" Name="Country"/>
      </Segment>
      <Rank Name="Detail">
        <Segment ID="PO1" Name="Baseline Item Data">
          <Element Position="1" ID="350" Name="Assigned Identification"/>
          <Element Position="2" ID="330" Name="Quantity"/>
          <Element Position="3" ID="355" Name="Unit"/>
          <Element Position="4" ID="212" Name="Unit Price"/>
          <Element Position="6" ID="235" Name="Product ID Qualifier"/>
          <Element Position="7" ID="234" Name="Product ID"/>
        </Segment>
      </Rank>
      <Segment ID="CTT" Name="Transaction Totals">
        <Element Position="1" ID="354" Name="Number of Line Items"/>
      </Segment>
      <Segment ID="SE" Name="Transaction Set Trailer">
        <Element Position="1" ID="96" Name="Number of Included Segments"/>
        <Element Position="2" ID="329" Name="Transaction Set Control Number"/>
      </Segment>
    </Rank>
    <Segment ID="GE" Name="Functional Group Trailer">
      <Element Position="1" ID="97" Name="Number of Transaction Sets Incl"/>
      <Element Position="2" ID="28" Name="Group Control Number"/>
    </Segment>
    <Segment ID="IEA" Name="Interchange Control Trailer">
      <Element Position="1" ID="I16" Name="Num of Included Functional Grps"/>
      <Element Position="2" ID="I12" Name="Interchange Control Number"/>
    </Segment>
  </Rank>

 The above divides the message into ranks and gives names for each segment and element. We'll use the ranks to control how the data is nested in the translated message, and the names will be used for the XML element names.
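
 One way to read the ranks, and the one the conversion function further down relies on, is that a rank opens at the first segment listed directly inside it and closes at the last. The sketch below simply reports those boundaries for a configuration like the one above. RankInspector and Boundaries are hypothetical names used only for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only.
public static class RankInspector
{
    // For the given Rank and every Rank nested inside it, yields
    // "Name: first..last" using the IDs of its direct Segment children.
    public static IEnumerable<string> Boundaries(XElement rank)
    {
        return from r in rank.DescendantsAndSelf("Rank")
               let ids = r.Elements("Segment")
                          .Select(s => (string)s.Attribute("ID"))
                          .ToList()
               where ids.Count > 0
               select (string)r.Attribute("Name") + ": " + ids.First() + ".." + ids.Last();
    }
}
```

 For the 850 configuration this should report that Envelope runs from ISA to IEA, Heading from ST to SE, and Detail from PO1 to PO1, so every PO1 segment becomes its own Detail group.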

 A natural way to traverse a hierarchy is with a recursive function, and before we write that we'll take a moment to create a formal class to store our segments in.

        class Segment
        {
            public string SegID { get; set; }
            public string[] Elements { get; set; }
        }

 Then slightly modify our LINQ query to use this instead of an anonymous type:

            var segments = from seg in message.Split(SegDelimiter).Select(x => x.Trim())
                           where !String.IsNullOrEmpty(seg)
                           select new Segment {
                               SegID = seg.Substring(0,seg.IndexOf(ElemDelimiter)),
                               Elements = seg.Split(ElemDelimiter).Skip(1).ToArray()
                           };

 Our recursive function for translating the collection of parsed segments into XML would then look like this:

private IEnumerable<XStreamingElement> Ranks(XElement RankDefinition, IEnumerable<Segment> Segments)
{
    if (RankDefinition.Name.LocalName == "Rank")
    {
        // A rank opens at its first listed segment and closes at its last
        String BeginningSegment = RankDefinition.Elements("Segment").First().Attribute("ID").Value;
        String EndingSegment = RankDefinition.Elements("Segment").Last().Attribute("ID").Value;
        List<IEnumerable<Segment>> SegmentGroups = new List<IEnumerable<Segment>>();
        List<Segment> CurrentGroup = null;
        foreach (Segment seg in Segments)
        {
            if (seg.SegID == BeginningSegment)
            {
                // Start a new occurrence of this rank
                CurrentGroup = new List<Segment>();
                SegmentGroups.Add(CurrentGroup);
            }

            if (CurrentGroup != null)
                CurrentGroup.Add(seg);

            if (seg.SegID == EndingSegment)
                CurrentGroup = null;
        }

        // One XML element per occurrence; recurse into the child definitions
        return from g in SegmentGroups
               select new XStreamingElement(RankDefinition.Attribute("Name").Value.Replace(' ', '_'),
                                    from e in RankDefinition.Elements()
                                    select Ranks(e, g));
    }

    if (RankDefinition.Name.LocalName == "Segment")
    {
        var Matching = from s in Segments
                       where s.SegID == RankDefinition.Attribute("ID").Value
                       select s;
        return new XStreamingElement[] {
            new XStreamingElement(RankDefinition.Attribute("Name").Value.Replace(' ', '_'),
                                     from s in Matching
                                     from e in RankDefinition.Elements("Element")
                                     where s.Elements.Length >= int.Parse(e.Attribute("Position").Value)
                                     select new XElement(e.Attribute("Name").Value.Replace(' ', '_'),
                                                s.Elements[int.Parse(e.Attribute("Position").Value) - 1]))
        };
    }

    return null;
}

 It uses XStreamingElement to return an XML tree that's built dynamically from more LINQ queries. What you pass to it is an XElement containing our mapping configuration plus the collection of Segments we parsed from the EDI file.
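
 To show what "built dynamically" means in practice, here is a small, separate sketch of XStreamingElement's deferred behaviour. It is not part of the converter; StreamingDemo and Render are hypothetical names, and the element names are borrowed from the configuration above purely for flavour.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo for illustration only.
public static class StreamingDemo
{
    public static string Render(List<string> names)
    {
        // The query is stored, not executed, when the element is constructed.
        var tree = new XStreamingElement("Names",
            from n in names
            select new XElement("Company_Name", n));

        // Added after construction, yet it still shows up in the output,
        // because the query only runs when the tree is serialized.
        names.Add("Jane Roe");

        return tree.ToString(SaveOptions.DisableFormatting);
    }
}
```

 The practical upshot is that the tree returned by Ranks does no work until you save it or call ToString() on it, so the parsed segments need to still be available at that point.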

 The function will convert the segments into something that might look a bit like this:

      <Authorization_Information>          </Authorization_Information>
      <Security_Information>          </Security_Information>
      <Interchange_Sender_ID>000123456      </Interchange_Sender_ID>
      <Interchange_Receiver_ID>PARTNERID      </Interchange_Receiver_ID>
        <Company_Name>John Doe</Company_Name>
      <Additional_Name_Information />
        <Street_Address>126 Any St</Street_Address>

 Although quite noisy with tags, it may now be more suitable for consumption by your fulfillment system, or for transforming with XSLT.

Further reading