Sunday, June 5, 2011

Convert a simple CSV File with Groovy, Java, Scala, awk, Ruby, Python, PHP or Bash?

Change Log:
  • 05.06.2011 1:30 pm - Initial post with Groovy, Java, Scala, awk, Ruby and Python implementations
  • 05.06.2011 4:00 pm - Added PHP implementation and updated the poll (now you can vote for PHP).
  • 06.06.2011 4:00 pm - Added Bash implementation and updated the poll (now you can also vote for Bash).

Which is the best programming language for converting a simple CSV into another format?

First I blogged three Java VM based solutions, written in Groovy, Java and Scala, that convert a simple CSV file into another format. Rainer sent me the Java-based solution. Yesterday Axel Knauf sent an awk-based solution, Niko a Ruby-based solution, Hendrik a Python-based solution, Sebastian a PHP implementation and Julien a Bash version. So now there are Groovy, Java, Scala, awk, Ruby, Python, PHP and Bash implementations.

Here, once again, is a complete overview of the different implementations:

The Groovy Implementation:
new File("output.csv").withPrintWriter { out ->
    // splitEachLine splits every line of input.csv on ';'
    new File("input.csv").splitEachLine(';') { fields ->
        def name = fields[2]
        def firstname = fields[1]
        def kto = fields[3]
        def blz = fields[4]
        def amount = fields[5]
        out.println "${name};${firstname} ${name};${kto};${blz};${amount}"
    }
}

The Java Implementation:
import java.io.*;

public class CsvConvertor {
    public static void main(String[] args) throws Exception {
        FileWriter out = new FileWriter("output.csv");
        BufferedReader in = new BufferedReader(new FileReader("input.csv"));
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = line.split(";");
            String name = fields[2];
            String firstname = fields[1];
            String kto = fields[3];
            String blz = fields[4];
            String amount = fields[5];
            // %1$s reuses the first argument (name) after the first name
            out.append(
                String.format("%s;%s %1$s;%s;%s;%s%n",
                    name, firstname, kto, blz, amount));
        }
        // close both the reader and the writer
        in.close();
        out.close();
    }
}
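Both JVM versions rely on the explicit argument index of java.util.Formatter: %1$s refers to the first argument (name) again, while the plain %s specifiers keep their own sequential numbering. For comparison only, here is a small Python sketch of the same per-line mapping with str.format, where {0} is reused in the same way. The helper name convert_line and the sample values are hypothetical; like the Groovy/Java/Scala code, it simply ignores column 0.

```python
def convert_line(line: str) -> str:
    """Hypothetical helper: map 'x;firstname;name;kto;blz;amount' to
    'name;firstname name;kto;blz;amount' (column 0 is ignored, as in the
    Groovy/Java/Scala versions)."""
    fields = line.rstrip("\n").split(";")
    firstname, name, kto, blz, amount = fields[1:6]
    # {0} appears twice, just like %s ... %1$s in the Java format string
    return "{0};{1} {0};{2};{3};{4}".format(name, firstname, kto, blz, amount)


print(convert_line("1;Max;Mustermann;123456;10020030;99.50"))
# -> Mustermann;Max Mustermann;123456;10020030;99.50
```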

The Scala Implementation:
import io.Source._
import java.io._

// App replaces the deprecated Application trait
object CsvConvertor extends App {
  val outputCsv = new FileWriter("output.csv")
  val accounts = fromFile("input.csv").getLines() map (line => Account(line))
  accounts foreach (account => outputCsv.append(account.toCsv()))
  outputCsv.close()
}

case class Account(line: String) {
  val data = line split (';')
  val firstname = data(1)
  val lastname = data(2)
  val kto = data(3)
  val blz = data(4)
  val amount = data(5)
  // %1$s reuses lastname; %n appends the platform line separator
  def toCsv() =
    "%s;%s %1$s;%s;%s;%s%n".format(lastname, firstname, kto, blz, amount)
}
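The Scala version models each row as a case class with a toCsv method. As a rough Python analogue (a sketch, not a port of the original), a typing.NamedTuple gives a similar immutable record. Unlike the Scala class, which takes the raw line as its constructor argument, this sketch moves parsing into a hypothetical from_line factory so the record holds only the named fields. The column layout, with column 0 skipped, is copied from the Scala code and is an assumption about the input.

```python
from typing import NamedTuple


class Account(NamedTuple):
    """Immutable record, loosely mirroring the Scala case class Account."""
    firstname: str
    lastname: str
    kto: str
    blz: str
    amount: str

    @classmethod
    def from_line(cls, line: str) -> "Account":
        # Same indices as the Scala version: column 0 is skipped
        data = line.rstrip("\n").split(";")
        return cls(*data[1:6])

    def to_csv(self) -> str:
        # Like "%s;%s %1$s;%s;%s;%s%n", but always ending in "\n"
        return (f"{self.lastname};{self.firstname} {self.lastname};"
                f"{self.kto};{self.blz};{self.amount}\n")


account = Account.from_line("1;Max;Mustermann;123456;10020030;99.50")
print(account.to_csv(), end="")
```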

Here is the shell command with the awk script:
awk 'BEGIN { FS=";"; OFS=";" } { print $2,$1" "$2,$3,$4,$5 }' input.csv > output.csv

The pure Ruby Implementation:

The Python Implementation:
import csv

with open('input.csv', 'rb') as input, open('output.csv', 'wb') as output:
    reader = csv.reader(input, delimiter=';')
    writer = csv.writer(output, delimiter=';')
    for input_row in reader:
        firstname, name, accno, bsc, amount = input_row
        output_row = [name, '%s %s' % (firstname, name), accno, bsc, amount]
        writer.writerow(output_row)

# For Jython 2.5, the context manager usage ('with') has to be replaced with a classic try..finally.
# For Python 3.x, the files have to be opened in text mode ('r', 'w') instead of binary mode ('rb', 'wb'), with newline=''.
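Following the Python 3 remark in the comments above, here is a minimal Python 3 sketch of the same converter. It assumes the same five-column input as the Python 2 version and opens both files in text mode with newline='', as the csv module documentation recommends. The function name convert and its parameters are my own, chosen for illustration.

```python
import csv


def convert(src_path: str, dst_path: str) -> None:
    """Python 3 sketch: read 'firstname;name;accno;bsc;amount' rows and
    write 'name;firstname name;accno;bsc;amount' rows."""
    # newline='' lets the csv module handle line endings itself
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=";")
        writer = csv.writer(dst, delimiter=";")
        for firstname, name, accno, bsc, amount in reader:
            writer.writerow([name, f"{firstname} {name}", accno, bsc, amount])
```

Call it as convert("input.csv", "output.csv"). Note that csv.writer ends rows with "\r\n" by default. Passing lineterminator="\n" to csv.writer gives Unix line endings like the other implementations.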

The pure PHP Implementation:
<?php
$start = microtime(true);
$fpIn = fopen('input.csv', 'r');
$fpOut = fopen('output-pure.csv', 'w');
while (($row = fgets($fpIn)) !== false)
{
    // $row keeps its trailing newline, so the last field ends the output line
    $fields = explode(";", $row);
    fwrite($fpOut,
        implode(
            ";",
            array(
                "name" => $fields[0],
                "firstname name" => '"'.$fields[1] ." ". $fields[0].'"',
                "kto" => $fields[2],
                "blz" => $fields[3],
                "amount" => $fields[4]
            )
        )
    );
}
fclose($fpIn);
fclose($fpOut);
$end = microtime(true);
$duration = $end - $start;
echo "Duration: ".round($duration, 2) . "s".PHP_EOL;
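The PHP version wraps the combined name column in double quotes, while the other versions write it unquoted. The split-based versions (Groovy, Java, awk, PHP, Bash) would also break on a quoted field that contains a ';'. For comparison, here is a small sketch of how Python's csv module handles quoting. By default (csv.QUOTE_MINIMAL) it quotes only fields that need it, csv.QUOTE_ALL quotes every field (not just the name column as in the PHP code), and csv.reader splits quoted fields correctly. The helper write_rows and the sample rows are hypothetical.

```python
import csv
import io


def write_rows(rows, quoting=csv.QUOTE_MINIMAL) -> str:
    """Hypothetical helper: serialize rows with ';' and return the text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", quoting=quoting, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


row = ["Mustermann", "Max Mustermann", "123456", "10020030", "99.50"]

# Default: only fields that need quoting are quoted (none here)
print(write_rows([row]), end="")
# Quote every field
print(write_rows([row], quoting=csv.QUOTE_ALL), end="")

# A field containing the delimiter is quoted automatically...
print(write_rows([["Doe; Jr.", "John"]]), end="")
# ...and csv.reader splits it back correctly, unlike a plain split(';')
line = '"Doe; Jr.";John'
print(next(csv.reader([line], delimiter=";")))  # ['Doe; Jr.', 'John']
print(line.split(";"))                           # ['"Doe', ' Jr."', 'John']
```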

The Bash Implementation:
#!/bin/bash
# Redirecting the whole loop opens output.csv once and overwrites it on each run
while IFS=$';' read -r name vorname kto blz amount
do
    echo "$name;$vorname $name;$kto;$blz;$amount"
done < input.csv > output.csv


I'm curious whether there are other implementation proposals (Clojure, Perl, …). If you have one, you can send me the script via Twitter or leave a comment here.

I am also curious which implementation you like best (Groovy, Java, Scala, awk, Ruby, Python, PHP or Bash) and why. I have created a poll here:


Thanks to Rainer, Axel Knauf, Niko Dittmann, Hendrik Heimbuerger, S. Barthenheier and Julien Guitton for the Java, awk, Ruby, Python, PHP and Bash implementations.

Links: