Import variable fixed length file

I have a requirement to import a file of rows containing fixed length data. The problem is that each row can be one of 5 different formats (i.e. different columns) -- where the "type" of row is indicated by the first two characters of the row. Each row gets inserted into its own table.

Could I use a simple Conditional Split to route the rows Or is the split for routing similiar rows Anyways, problems are never this simple...

In addition, each "grouping" of rows is related. The "first" row is considered the "primary" row (and gets a row id via IDENTITY, whereas the remaining rows in the group are "secondary" rows and have foreign key references back the the primary rows id.

Given (using spaces to separate columns and CrLf to show "grouping"):

01 MSFT blah blah
02 blahblah blahblahblah
03 boring boringblah

01 AAPL blah blah
02 blahblah blahblahblah
03 boring boringblah

01 CSCO blah blah
02 blahblah blahblahblah
02 blahblah blahblahblah
03 boring boringblah

So, the first 3 lines are all related to a MSFT record which needs to be spread across multiple tables. The next three lines are all related to AAPL, And the next FOUR lines (yes, each record can have zero, one, or more secondary rows) are related to CSCO.

(If this is still not clear, all the "01" rows will be written to [Table1] with each row having an IDENTITY value. All the "02" rows will be written to [Table2] the a FK pointing to the correct [Table1] row. All the "03" rows will be written to... and so on.

Any ideas would be appreciated.



Answer this question

Import variable fixed length file

  • AdKhan

    For an incrementing integer key, how about the Row Number transform on http://www.sqlis.com. Rather than relying on IDENTITY, used an Exec SQL Task prior to the Data Flow to capture the current max value in use, and set to a variable. This can be supplied as the seed to the transform.

    In this case it will not be much use as you want to assign the key values in the script since you would have multiple outputs from the source and each output would need that key value assigned to the primary as I think of it.



  • thejez

    You could read the data into a script destination rather than a recordset. Then store the data in a form that you can then use in the other script component to lookup the values

  • Pmps

    Well. Everyone has given me some place to start. I appreciate it. I am sure I will be back with more questions as this particular project advances.
  • Phil Thornley

    Whilst I think the derived colum has its place, I prefer the script component, it provides more control. It will also allow you to include the row number. You can include the conditional split in the script task as well.

    This is done by redirecting the input row to an output row. However if the outputs are very different I suggest that you would be better of defining the outputs you want on the script component.



  • ArunPersad

    I think the eaiest way is to pass your flow via an OLEDB command, this inserts the 01 record and returns the identity into the flow. Then pass the flow into a script component that puts the Id of the 01 record on all subsequent records until another 01 record is found and you repeat the excercise.

    Then store the results in table2



  • hellosmithy

    There is no precedence inside a data flow. The way to solve this is -

    1 Change the package such that the "ID" is assigned once in advanced, before the separate paths are broken up.

    2 Partition the load across several Data Flows tasks. The easiest and most efficient way to do this is to use the raw file source and destination. Split the flow by ending and starting with raw files at the stages you need to enforce the order of execution.



  • gaxtell

    Doing well but hit a roadblock. First off, the Row Numbers transformation is great. Works like a charm.

    But the block is the Lookup transformation. Seems that it will only perform lookups against a connection not the in memory recordset variable I created. I am guessing that I could use the Script transformation but have not figured out how to get access to the Recordset transformation object from there.... Will keep looking.


  • cuy

    The first issue I see is how to parse the file. A Script Component source, or a full custom component source would do this. Each row type would be directed to a separate output. These outputs could then be directed to separate destinations for insertion. The issue is how to link them togather with the correct ID. I would actually avoid an identity, and track and generate the ID in the source. Perhaps use an Exec SQL Task pior to the Data Flow to get the current MAX ID and store it in a variable. This would then be used as the seed for the source when it ws generating increment ID values. It all makes perfect sense in my mind, but I may not have explained it very well.

    Simon, I think this is along the lines of what you are suggesting but with some differences, avoiding a two stage read and intermediate DB trip.



  • Haitham.ElGhaareeb

    Why do I think this will get ugly...

    My current line of thought is the following:

    1) Ragged right with two columns defined. RowType and RowContents. Needed ragged right since each row type has its own format. So RowType has the values A1, B1, C1, D1, E1, or F1. RowContent is the remainder of the row.

    2) Into a Flat File Source.

    3) Into a Conditional Split transformation.

    4) Into..... Derived Column transformation ( )

    I need to take the RowContents column (one column) and break it into the correct columns (one or more) for that particular row type. Would a derived column transformation using the SUBSTRING function to grab each value be the correct approach

    Thinking a cool idea for a custom transformation object would be something like the fixed width or ragged right editor of the flat file connection as the input with individual columns as the output...

    5a) (A1) Into..... ( )

    Need to "determine" row number or what the identity will be...

    Use precedence to make sure this is run before any other steps.

    5b) (B1->F1) Into.... Lookup transformation ( )

    Need to find its "parent" row number for FK reference.

    6) Into.... Severl OLE DB destinations

    --------------------

    My idea is to insert a transformation (not sure which) between steps 2 and 3 which gives each row a sequential number (1, 2, 3,...., n).

    Then, when in step 5b (6b now ) I do my lookup, my lookup would be to get the max identity from the A1 rows where the sequencial number is less than the current row. (Example: row B1/100, C1/101, D1/102 would look for the identity for A1/99)


  • DWBI_Doug

    The conditionla split will defintly work for spliting the incoming data, not sure what to do about relating rows though
  • Daniel_CL

    Jason Callas wrote:

    Doing well but hit a roadblock. First off, the Row Numbers transformation is great. Works like a charm.

    But the block is the Lookup transformation. Seems that it will only perform lookups against a connection not the in memory recordset variable I created. I am guessing that I could use the Script transformation but have not figured out how to get access to the Recordset transformation object from there.... Will keep looking.

    Jason,

    This'll help in accessing the recordset from script: http://blogs.conchango.com/jamiethomson/archive/2006/01/04/2540.aspx

    As for accessing something other than a relational source in the LOOKUP - well I have asked for that here: http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx feedbackid=58d0100b-aa34-4191-ae03-7655543614a1 Feel free to add your scenario as this can only help to get this feature included in the next version.

    -Jamie



  • Bob Schmidt

    This is great so far. Thanks for the help.

    Is there the idea of precendence (the ability for one task to not start until another is done) within the dataflows I need all the A1 rows to be processed first so I can get them into a form in which they can have IDENTITY values assigned before the other rows perform lookups against them.

    Or do I need to have separate dataflows (Which would mean processing the imput file more than once...)


  • Sean Whitton

    If only SSIS had an easy way of generating surrogate keys :)



  • Import variable fixed length file