Ignore but recall malformed data: iterate over and process a folder with a bash script + .jar
There is a folder full of files, each of which contains data that I need to convert into a single output file.
I've built a conversion script, which can be run like so:
java -jar tablegenerator.jar -inputfile more-adzuna-jobs-type-9.rdf -skillnames skillnames.ttl -countries countries_europe.rdf -outputcsv out.csv
The problem is that some of the files contain characters that are regarded as invalid by the .jar file. Is there a way to create a bash script that runs this command simultaneously on a folder full of these files (many hundreds) and, for each one that generates an error, is able to:
- ignore it, i.e. not let it halt the process
- remember it, so that it can later be dealt with appropriately
It seems possible, but my bash-fu is quite weak. What is the most logical way to execute this task?
If the Java program does in fact exit with an error status, then it should be easy to write a bash script that processes all the files in a folder and tracks which ones had errors. I emphasize that the Java program must exit with an error (non-zero) status for this to be easy. For example, it should terminate execution by invoking System.exit(1).
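To illustrate how bash reacts to an exit status, here is a minimal sketch. The function fake_converter is a hypothetical stand-in for the Java program (it is not part of the original setup): it returns a non-zero status for any file name containing "bad", much as a program calling System.exit(1) would.

```shell
#!/bin/bash
# fake_converter is a hypothetical stand-in for the java command:
# it fails (status 1) when the file name contains "bad", else succeeds.
fake_converter() {
  case "$1" in
    *bad*) return 1 ;;
    *)     return 0 ;;
  esac
}

# check reports whether the converter succeeded for one file.
# "if" takes the then-branch on exit status 0 and the else-branch otherwise.
check() {
  if fake_converter "$1"; then
    echo "ok: $1"
  else
    echo "failed: $1"
  fi
}

check good.rdf
check bad.rdf
```

The same if/else test works unchanged with a real command in place of the function, provided that command reports failure through its exit status.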
If the program does report success or failure to the system via its exit status, then the script might look like this:
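#!/bin/bash

# The name of the directory to process is expected as the first argument.
if [ $# -lt 1 ]; then
  echo "usage: $0 directory"
  exit 1
fi

# Start with an empty failures.txt in the current working directory.
if [ -e failures.txt ]; then
  rm failures.txt
fi
touch failures.txt

# The first argument to the script is $1; quoting keeps paths with spaces intact.
for f in "$1"/*; do
  if ! java -jar /path/to/tablegenerator.jar \
      -inputfile "$f" \
      -skillnames /path/to/skillnames.ttl \
      -countries /path/to/countries_europe.rdf \
      -outputcsv "$f.out.csv"; then
    # Record the failing file and carry on with the next one.
    echo "$f" >> failures.txt
  fi
done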
That iterates over the files in the directory specified by the first script argument, assigning each path in turn to the shell variable $f, and runs the Java program for each one, passing the path as the argument following -inputfile. In the event that the program exits with a non-zero status, the script writes the name of the failing file to the file failures.txt in the script's current working directory (which is unrelated to the data directory designated to it), and continues.
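Once failures.txt exists, the "deal with it later" part of the question can be handled by reading the list back. The following is only a sketch: retry_failures is a hypothetical helper name, and the command passed to it would be whatever repair-and-convert step you eventually write.

```shell
#!/bin/bash
# retry_failures FAILURES_FILE COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND once per path listed in FAILURES_FILE,
# with the path appended as the last argument.
# Paths that still fail are printed on standard output.
retry_failures() {
  local list=$1
  shift
  local f
  while IFS= read -r f; do
    [ -n "$f" ] || continue            # skip blank lines
    # </dev/null stops the command from swallowing the rest of the list
    if ! "$@" "$f" </dev/null; then
      echo "$f"
    fi
  done < "$list"
}
```

For example, `retry_failures failures.txt ./fix-and-convert.sh > still-failing.txt` would retry each recorded file with a (hypothetical) script fix-and-convert.sh. Write the result to a different file rather than back to failures.txt, since that file is still being read.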
Note that this does not run the command simultaneously on all the files, but instead iteratively. I am uncertain whether that is a key component of the request. Inasmuch as the system this runs on is unlikely to have a separate core that it can dedicate to each of hundreds of instances of the program, and inasmuch as the storage medium on which the files reside probably has only one data channel, you cannot truly run the command hundreds of times simultaneously anyway.
If you want to run multiple jobs in parallel then bash has ways to do that, but I recommend getting the serial script working first. If processing the files serially is not fast enough then you can explore ways to achieve parallelism. However, to the extent that Java VM startup time may present an issue when starting hundreds of JVMs, you might be better off building multiple-file handling directly into the Java program, so that it can process all the files in the same VM.
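If the serial script does turn out to be too slow, one possible approach is to cap the number of concurrent jobs with xargs. This is a sketch under assumptions, not a tested replacement for the script above: run_parallel is a hypothetical helper name, and the -print0, -maxdepth, -0 and -P options are extensions found in GNU and BSD find/xargs rather than in POSIX.

```shell
#!/bin/bash
# run_parallel DIR JOBS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND once per regular file in DIR (the path is
# appended as the last argument), at most JOBS at a time.
# Prints the paths for which the command failed.
run_parallel() {
  local dir=$1 jobs=$2
  shift 2
  find "$dir" -maxdepth 1 -type f -print0 |
    xargs -0 -n 1 -P "$jobs" bash -c '
      f=${!#}                          # last argument is the file path
      "$@" </dev/null || echo "$f"     # report the path if the command fails
    ' _ "$@"
}
```

A call such as `run_parallel /path/to/data 4 ./convert-one.sh > failures.txt` would keep four jobs running at a time, where convert-one.sh is a hypothetical wrapper that runs the java command for a single input file. A job count near the number of CPU cores is a reasonable starting point; each job still pays the JVM startup cost mentioned above.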