I am always surprised how there is no direct support for parsing CSV data in .NET even though Microsoft Excel can export data to CSV and it is infinitely easier to parse a CSV file than an Excel document. (correct me if I am wrong) I rarely deal with files anymore and when I do they are typically XML formatted, so when it came time to parse some CSV files I browsed CodePlex for a project which would handle this task for me. There are several options there but nothing looked right for me so I searched Code Project and found CsvReader which is designed for reading large CSV documents quickly. Great, it had unit tests and all but 1 of the many tests passed so I got to using right away.
I finished my work on this particular project and today my client sends me an updated input file which does not include header lines and actually starts out with a few blank lines. I also found that the last field in some rows were blank and these differences were enough to break all of my code which was using CsvReader. I tried to dig through the code and recompile an updated version that would safely handle my new requirements but it was too much work. This particular document has 3 types of rows: batch, header, and detail. Each of these line types has a different number of fields and CsvReader seems to assume all lines with have the same number of fields and throws exceptions if it does not like what it reads. It is very inflexible and unforgiving. And if the last column has no value it also throws an exception. It was not helpful in the slightest.
Now the only reason to use a complex CSV parser is the condition that a field contains a comma and is then wrapped with quotes. I created a spreadsheet with Microsoft Excel 2007 with values with commas and exported the document to CSV format. It was simple enough with double quotes around the fields which had commas in the value. With a little coding I could do this. And with a little .NET 3.5 I could do it as a set of extension methods with the proper amount of unit tests and safely start using it in this project, so that is what I did. In about a half hour I had it all working with unit tests and was able to quickly integrate it with the client application and get it to pass all of the unit tests for business logic.
Below you can see the extension class which is just a little over 100 lines of code. I hope that you find it easy to read and understand in case you need to extend it to account for more complexities in your CSV files. You can download the project and see how to make use of it with the unit tests. It only accounts for double quoted fields, so if you do have single quoted fields you will need to make an update to this class. If I find this parser is not sufficient I will probably create more unit tests and add more functionality as needed. For now it is doing the job and I am happy to share it with anyone who would like to use freely.