Order cheap dedicated servers in USA.

Parsing CSV Data, the Easy Way

September 17th, 2009

I am always surprised how there is no direct support for parsing CSV data in .NET even though Microsoft Excel can export data to CSV and it is infinitely easier to parse a CSV file than an Excel document. (correct me if I am wrong) I rarely deal with files anymore and when I do they are typically XML formatted, so when it came time to parse some CSV files I browsed CodePlex for a project which would handle this task for me. There are several options there but nothing looked right for me so I searched Code Project and found CsvReader which is designed for reading large CSV documents quickly. Great, it had unit tests and all but 1 of the many tests passed so I got to using right away.

I finished my work on this particular project and today my client sends me an updated input file which does not include header lines and actually starts out with a few blank lines. I also found that the last field in some rows were blank and these differences were enough to break all of my code which was using CsvReader. I tried to dig through the code and recompile an updated version that would safely handle my new requirements but it was too much work. This particular document has 3 types of rows: batch, header, and detail. Each of these line types has a different number of fields and CsvReader seems to assume all lines with have the same number of fields and throws exceptions if it does not like what it reads. It is very inflexible and unforgiving. And if the last column has no value it also throws an exception. It was not helpful in the slightest.

Now the only reason to use a complex CSV parser is the condition that a field contains a comma and is then wrapped with quotes. I created a spreadsheet with Microsoft Excel 2007 with values with commas and exported the document to CSV format. It was simple enough with double quotes around the fields which had commas in the value. With a little coding I could do this. And with a little .NET 3.5 I could do it as a set of extension methods with the proper amount of unit tests and safely start using it in this project, so that is what I did. In about a half hour I had it all working with unit tests and was able to quickly integrate it with the client application and get it to pass all of the unit tests for business logic.

Below you can see the extension class which is just a little over 100 lines of code. I hope that you find it easy to read and understand in case you need to extend it to account for more complexities in your CSV files. You can download the project and see how to make use of it with the unit tests. It only accounts for double quoted fields, so if you do have single quoted fields you will need to make an update to this class. If I find this parser is not sufficient I will probably create more unit tests and add more functionality as needed. For now it is doing the job and I am happy to share it with anyone who would like to use freely.

5 Responses to “Parsing CSV Data, the Easy Way”

  1. Geoff Lane Says:

    Check out the Wikipedia entry about CSV, you're missing some cases like having embedded newlines and embedded quotes. Some of those things are ultimately what makes CSV a bit more difficult.

  2. Nate Says:

    Comma.ToString().ToCharArray()

    tsk,tsk,tsk... performant version is

    new []{Comma}

  3. Brennan Stehling Says:

    @Geoff,

    I know it does not handle all possible complications, but for now it is working. I could handle the embedded quotes, but I really do not want to deal with embedded newline characters. That instantly makes this whole thing 100 times more complex. And at about 100 lines of code it is doing a pretty good job. It does appear the data I will be getting does not have embedded quotes or newlines so for my purposes it is good to go. If I could get full CSV support for all complexities with under 200 lines of code, while still keeping the code readable, I would be happy to use it.

    And if I do get these complexities in the CSV I think I will just put limits on the output file format to require a simplified CSV format, meaning no field values can be more than the single line and should not have quotes. Still, there should be something built into the .NET framework to for importing/exporting CSV data.

    @Nate,

    I did not like that either but I did not know a better way to do it without digging deeper. I remember at one point I could initialize an array with just brackets and values inside them, but that must have been something before I got into C#. Thanks for the tip. I have adjusted the code to use a variable for the separator now.

    public static readonly char[] Separator = new char[] { Comma };

    I would like to be a constant but using "new" is not allowed in that context. I have more to add to this Extensions project for database and other common uses that I will incorporate into a common project once I get into plans for a new OOS project from SmallSharpTools. I am considering merging some of the projects into a main project which is very modular and lightweight so it is easy to use in your own project with limited dependencies.

  4. Interesting Finds: 2009 09.15 ~ 09.20 - gOODiDEA.NET Says:

    [...] Parsing CSV Data, the Easy Way Other [...]

  5. dave Says:

    Thanks for the work and posting