Importing a file with variable columns

Question

We have been tasked with importing a file that has a large number of variable columns. For ease of explanation, let's say the first five columns are standard (time, entity, ud1, ud2, ud3), but the file could have anywhere from 50 to 150 additional columns, one for each account. If there is no data for an account, there is no column for it. New accounts could appear in the future without warning.
No, we don't have the ability to change the format of the report. (Oh, how I wish.)
I have thought up several ways of making this work but each is fraught with its own type of peril.

1. Create a new custom table dynamically to stage the data. Parse the column names from the file, then use that column name list to build and run a SQL query that unpivots the data.
2. Parse the file in memory and unpivot it manually: read each data column, add a row per value to a DataTable, then return the full DataTable.
3. Maintain a list of the columns we care about most, parse the file in advance, and save the column name/position maps to parameters or a lookup table. Use up every possible attribute/value field in a data source to stage to BI Blend and try to unpivot from there. Hope they never need more "important" columns than OS can handle. (This is similar to option 1, but we're not stuck dropping/creating a custom table ourselves, and we get more consistent column names.)
4. Write a manual file parser that creates a new, sane text file and then import that instead. (Seems wasteful. If I can get it this far, I can probably just do it in memory, i.e., option 2.)
5. Some other, better idea that I haven't thought of yet.
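For the dynamic-staging-table approach (parse the header, then unpivot in SQL), the only part that changes from file to file is the list of account columns. A minimal sketch of generating that statement, assuming a SQL Server staging table whose columns match the file header; the module, function, staging table name, and the `Account`/`Amount` output column names are all hypothetical:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Public Module UnpivotSqlSketch

    ' Quote a SQL Server identifier: wrap it in brackets and double any ']' inside it.
    ' Header names come from an external file, so never splice them in unquoted.
    Public Function QuoteIdent(name As String) As String
        Return "[" & name.Replace("]", "]]") & "]"
    End Function

    ' Build an UNPIVOT query for a staging table whose columns mirror the file header.
    ' The first five header names are treated as the fixed dimension columns;
    ' every column after them is treated as an account.
    ' Hypothetical: stagingTable is a bare table name (no schema prefix), and the
    ' output columns are called Account and Amount.
    Public Function BuildUnpivotSql(stagingTable As String, headerColumns As IList(Of String)) As String
        Const fixedColCount As Integer = 5
        If headerColumns.Count <= fixedColCount Then
            Throw New ArgumentException("Header must contain at least one account column.")
        End If

        Dim fixedList As String =
            String.Join(", ", headerColumns.Take(fixedColCount).Select(Function(c) QuoteIdent(c)))
        Dim accountList As String =
            String.Join(", ", headerColumns.Skip(fixedColCount).Select(Function(c) QuoteIdent(c)))

        Return "SELECT " & fixedList & ", [Account], [Amount]" & Environment.NewLine &
               "FROM " & QuoteIdent(stagingTable) & Environment.NewLine &
               "UNPIVOT ([Amount] FOR [Account] IN (" & accountList & ")) AS u;"
    End Function

End Module
```

Two things worth knowing about SQL Server's `UNPIVOT`: it silently drops NULL values (which suits a file where missing accounts simply have no data), and all the unpivoted columns must share the same data type, so the staging table would need every account column created with one common type.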

RobbSalzmann · Accepted Answer

I would use approach 1 or 2. The deciding factor is the size of the file, i.e. the number of records.

Since you know the first n columns map to specific dimensions and the remaining columns are specific to accounts, and assuming each column header is the account name, this is fairly easy to unpivot, something like this:

Function PivotAccounts(inputTable As DataTable) As DataTable
    ' Number of fixed dimension columns (time, entity, ud1, ud2, ud3) before the account columns
    Const fixedColCount As Integer = 5

    ' Validate column count: the fixed columns plus at least one account column
    If inputTable.Columns.Count < fixedColCount + 1 Then
        Throw New InvalidOperationException("Input table must have at least " & (fixedColCount + 1) & " columns.")
    End If

    ' Prepare output table: one row per (fixed dimensions, account) combination
    Dim outputTable As New DataTable()
    outputTable.Columns.Add("time", GetType(String))
    outputTable.Columns.Add("entity", GetType(String))
    outputTable.Columns.Add("ud1", GetType(String))
    outputTable.Columns.Add("ud2", GetType(String))
    outputTable.Columns.Add("ud3", GetType(String))
    outputTable.Columns.Add("account", GetType(String))
    outputTable.Columns.Add("amount", GetType(Decimal))

    Dim lastColIndex As Integer = inputTable.Columns.Count - 1

    ' Iterate through each row
    For Each row As DataRow In inputTable.Rows
        ' Extract fixed values once per row
        Dim timeVal As String = row(0).ToString()
        Dim entityVal As String = row(1).ToString()
        Dim ud1Val As String = row(2).ToString()
        Dim ud2Val As String = row(3).ToString()
        Dim ud3Val As String = row(4).ToString()

        ' Iterate over every account column in the current row, including the last one
        For i As Integer = fixedColCount To lastColIndex
            Dim accountName As String = inputTable.Columns(i).ColumnName
            Dim rawValue As Object = row(i)
            Dim amountVal As Decimal = 0D

            ' Skip empty cells and non-numeric text; parse once and reuse the result.
            ' Assumes amounts use invariant-culture formatting (e.g. 1234.56).
            If Not IsDBNull(rawValue) AndAlso
               Decimal.TryParse(rawValue.ToString(), System.Globalization.NumberStyles.Any,
                                System.Globalization.CultureInfo.InvariantCulture, amountVal) Then

                ' Create a record only for non-zero amounts
                If amountVal <> 0D Then
                    Dim outputRow As DataRow = outputTable.NewRow()
                    outputRow("time") = timeVal
                    outputRow("entity") = entityVal
                    outputRow("ud1") = ud1Val
                    outputRow("ud2") = ud2Val
                    outputRow("ud3") = ud3Val
                    outputRow("account") = accountName
                    outputRow("amount") = amountVal
                    outputTable.Rows.Add(outputRow)
                End If
            End If
        Next
    Next

    Return outputTable
End Function
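The function above expects a DataTable whose column names are the file's header names. One way to get there without hard-coding any account columns is to build the DataTable straight from the header row. A minimal sketch using `TextFieldParser` from `Microsoft.VisualBasic.FileIO`, assuming a comma-delimited file whose first line is the header (the module and `LoadDelimitedFile` names are hypothetical):

```vbnet
Imports System
Imports System.Data
Imports Microsoft.VisualBasic.FileIO

Public Module DelimitedLoaderSketch

    ' Read a delimited file whose first line is a header into a DataTable.
    ' Every column is created as String and named from the header, so the set of
    ' account columns can differ from file to file without code changes.
    ' Assumptions (hypothetical): first row is the header, fields may be quoted.
    Public Function LoadDelimitedFile(path As String, Optional delimiter As String = ",") As DataTable
        Dim table As New DataTable()

        Using parser As New TextFieldParser(path)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(delimiter)
            parser.HasFieldsEnclosedInQuotes = True

            If parser.EndOfData Then Return table

            ' Header row: one DataTable column per file column.
            ' Duplicate header names would make Columns.Add throw.
            For Each header As String In parser.ReadFields()
                table.Columns.Add(header.Trim(), GetType(String))
            Next

            ' Data rows: any missing trailing fields are left as DBNull.
            While Not parser.EndOfData
                Dim fields As String() = parser.ReadFields()
                Dim row As DataRow = table.NewRow()
                For i As Integer = 0 To Math.Min(fields.Length, table.Columns.Count) - 1
                    row(i) = fields(i)
                Next
                table.Rows.Add(row)
            End While
        End Using

        Return table
    End Function

End Module
```

Usage would then be something like `Dim unpivoted As DataTable = PivotAccounts(LoadDelimitedFile(filePath))`. Because this loads the whole file into memory before unpivoting, it fits the smaller-file case; for very large files the SQL route may be the safer choice.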
