I have a lot of experience with um, "manhandling" flat file data into usable formats. Usually this "data" comes to me in the form of excel spreadsheets, crafted with presentation in mind. They look great but, have little flexibility when it comes to importing the data into relational databases. There are hundreds of solutions for doing this sort of manipulation in bulk and if that's your lot in life by all means beg the IT department to buy you a good data transformation tool!
For some reason my situations are frequently one offs of user data that is begging to get into a database but, as of yet is still running in the wild. My solution has come down to quick Groovy scripts and splicing spreed sheets into multiple CSV files Groovy to chew through. This can be done with Python and many other script languages but, I'm fully fluent in Java so Groovy is my fastest path to a database at this point in time. This is a great place to learn a new scripting language if you have the time but, I usually want this over in a flash.
On that note, if you would like to just grab the code and run jump to the bottom to grab My CsvUtils code which takes care of this whole process in a more formal utility object.
Reading a file and working with it is pretty darn simple in Groovy, the following code will open up a CSV and move all the data into nicely structured arrays. (This provides the same results as calling CSVUtils.csvToList())
//load room types from file
def f= new File('roomtypemap-min.csv');
//set up a List of 'Rows' which will contain a list of 'columns' or you can think of it as array[][]
roomTypes = [];
def lines = 0;
f.splitEachLine(',') {
def row = [];
row = it;
roomTypes << row;
lines++
}
println "Processed $lines Lines"
The above chewed through a CSV file with name,value pairs of data and placed them in a multi-dimensional array types as a Java list. If your not up on Java collections, it's a whole lot like an array but, better.
Now lets say we did the above code with a few files then process all that data into a new CSV file ready for your code. The code below shows the building of the final results. You'll notice that along with the
roomTypes list imported from a CSV file there are now a few others
resortList and
amenities. All these were pulled from random table dumps and such and I turned them in to usable data.
def aId = 0;
roomTypes.each{
def indx = roomTypes.indexOf(it);
def resortList = roomResortMap[indx];
def roomType = it[0];
resortList.each() {
def facIdx = it.toInteger();
def facilityId = resorts[it.toInteger()];
amenities.each() {
def ai = amenities.indexOf(it);
def avail = values[ai];
if(avail[facIdx] == true) {
def description = it;
println "\"$facilityId\",$roomType,$aId,\"$description\"";
aId++;
}
}
}
There are a couple things to point out if your new to Groovy. While there is a whole lot of Java in Groovy it also looks a lot like JavaScript making Groovy pretty easy for JavaScript developers to pick up despite deep Java knowledge.
The first thing to note is the iteration I am performing with the .each callback. Personally, I think it looks a good bit like a JavaScript closure. Your simply passing in code to the .each(){} iterator to execute on each item in the List. OH the () is optional when creating this closure so add if you wish.
Second, in the version of Groovy I am using, 1.7.10, you have to do a little bit of work to get the current index position of the for loop. To do this simply call yourListName.indexOf(it). It looks like the upcoming 1.8 is going to have an additional closure called eachWithIndex() which will provide you
it and
i variables where the
i will be the current index. I'm looking forward to that feature!
Finally I just want to point out one of my favorite things for keeping life simple, string replacement. Notice on the next to list line before the braces cascade I'm calling
println but, the cool part is the replacements in the string using $fieldName. When your trying to print out complicated strings nothing is worse for readability and bugs then concatenating or calling tons of append() methods. It just leads to errors. You can do simple string replacement with variables using the $fieldName inside your string or for more complicated inserts ${} will work for things like ${Object.field}.
Hope this will help you to simplify the tedious task of data mining text files!
My CSVUtils Object:
/**
* CSVUtils is released into the public domain, do what ya want with it.
* By Lee Clarke
*/
public class CSVUtils {
/**
* Removed quotes from around imported csv values if present.
*/
public static String stripQuotes(String strIn) {
def valOut = strIn;
if(valOut==null)
valOut == "";
if(valOut.startsWith("\""))
valOut = valOut.substring(1);
if(valOut.endsWith("\""))
valOut = valOut.substring(0,valOut.length()-1);
return valOut;
}
/**
* Load the csv into a Map using indexed position of values for key and value.
* @param keyPos - col index of the key value in a row
* @param valPos - pull value data from index/col position in row
* @param csvFilePath - full path to file
*/
public static Map csvToMap(int keyPos, int valPos, String csvFilePath) {
def fp= new File(csvFilePath);
def rtnMap = [:];
def palines = 0;
fp.splitEachLine(',') {
if(palines > 0)//skip col header line
{
def row = [];
row = it;
def key = CSVUtils.stripQuotes( row[keyPos]);
def val = CSVUtils.stripQuotes( (valPos < 0)?row.last():row[valPos]);
rtnMap.putAt(key, val);
}
palines++
}
return rtnMap;
}
/**
* loads values in csv file into multi-dimentional like List of rows and cols.
* @param csvFilePath - full file path.
* @param skipFirstRow - skip first row if it contains column names.
*/
public static List csvToList( String csvFilePath, boolean skipFirstRow) {
def fr= new File(csvFilePath);
def rtnList = [];
def rlines = 0;
fr.splitEachLine(',') {
if(!(skipFirstRow && rlines == 0)) {
println "row= $it"
def row = [];
row = it;
rtnList << row;
}
rlines++
}
return rtnList;
}
/**
* loads values in csv file into multi-dimentional like List of rows and cols.
* @param csvFilePath - full file path.
*/
public static List csvToList( String csvFilePath) {
return csvToList(csvFilePath,true);
}
}