Ignore but recall malformed data: iterate & process folder with bash script + .jar


There is a folder full of files, each of which contains some data that I need to convert into a single output file.

I've built a conversion script, which can be run like so:

java -jar tablegenerator.jar -inputfile more-adzuna-jobs-type-9.rdf -skillnames skillnames.ttl -countries countries_europe.rdf -outputcsv out.csv 

The problem is that some of the files contain characters that are regarded as invalid by my .jar file. Is there a way to create a bash script that runs this command simultaneously on a folder full of these files (many hundreds) and, for each one that generates an error, to:

  • ignore it, i.e. not let it halt the process
  • remember it, so that it can be dealt with appropriately later

It seems possible, but my bash-fu is quite weak. What would be the most logical way to execute this task?

If the Java program does in fact exit with an error status, then it should be easy to write a bash script that processes all the files in a folder and tracks which ones had errors. I emphasize that the Java program must exit with an error (non-zero) status for this to be easy. For example, it should terminate execution by invoking System.exit(1).
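One way to check which case applies is to run the program by hand on one good file and one known-bad file and look at the shell's `$?` after each run. The sketch below only illustrates that check: `fake_converter` is a hypothetical stand-in for the real `java -jar` command (it fails on any name containing "bad") so that the snippet can run anywhere.

```shell
#!/bin/bash
# Hypothetical stand-in for `java -jar tablegenerator.jar ...`.
# It "fails" (returns 1) for any input whose name contains "bad".
fake_converter() {
  case "$1" in
    *bad*) return 1 ;;
    *)     return 0 ;;
  esac
}

# $? holds the exit status of the most recently run command,
# so it must be read immediately after each call.
fake_converter good.rdf
good_status=$?

fake_converter bad.rdf
bad_status=$?

echo "good.rdf exit status: $good_status"
echo "bad.rdf exit status: $bad_status"
```

With the real program, replace the `fake_converter` calls with the `java -jar` command line from the question. If both runs report 0, the Java code would need to be changed before its exit status can be relied on.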

If the program does report success or failure to the system via its exit status, then you might do this:

#!/bin/bash

# The name of the directory to process is expected as the first argument.
if [ $# -lt 1 ]; then
  echo "usage: $0 directory"
  exit 1
fi

# The first argument to the script is $1.

# Start each run with an empty failures.txt.
if [ -e failures.txt ]; then
  rm failures.txt
fi

touch failures.txt

for f in "$1"/*; do
  # Quote "$f" so that file names containing spaces are passed intact.
  if ! java -jar /path/to/tablegenerator.jar \
      -inputfile "$f" \
      -skillnames /path/to/skillnames.ttl \
      -countries /path/to/countries_europe.rdf \
      -outputcsv "$f.out.csv"; then
    echo "$f" >> failures.txt
  fi
done

That iterates over the files in the directory specified by the first script argument, assigning each path in turn to the shell variable $f, and runs the Java program for each one, passing the path as the argument following -inputfile. In the event that the program exits with a non-zero status, the script writes the name of the failing file to the file failures.txt in the script's current working directory (which may be unrelated to the data directory designated to it) and continues.
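Once a run has finished, failures.txt can drive whatever follow-up is appropriate. As one possibility, the sketch below reads the list back one path per line and moves each failing input into a separate directory for later inspection. The `quarantine` directory and the demo files are assumptions made for illustration, and everything is created under a temporary directory so the snippet is self-contained.

```shell
#!/bin/bash
# Demo setup: a temporary data directory and a failures.txt listing
# two of its three files. All names here are hypothetical.
workdir=$(mktemp -d)
mkdir "$workdir/data" "$workdir/quarantine"
touch "$workdir/data/a.rdf" "$workdir/data/b.rdf" "$workdir/data/c.rdf"
printf '%s\n' "$workdir/data/a.rdf" "$workdir/data/c.rdf" > "$workdir/failures.txt"

# Read failures.txt line by line. IFS= and -r keep each path exactly
# as written, including spaces and backslashes.
moved=0
while IFS= read -r f; do
  [ -e "$f" ] || continue            # skip entries that no longer exist
  mv -- "$f" "$workdir/quarantine/"
  moved=$((moved + 1))
done < "$workdir/failures.txt"

echo "moved $moved file(s) to $workdir/quarantine"
```

Moving the files is only one choice; the same loop could just as well re-run the converter on each path after the inputs have been repaired.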

Note that this does not run the command simultaneously on all the files, but instead iteratively. I am uncertain whether that is a key component of the request. Inasmuch as the system you run this on is unlikely to have a separate core that it can dedicate to each of hundreds of instances of the program, and inasmuch as the storage medium on which the files reside probably has only one data channel, you cannot really run the command hundreds of times simultaneously anyway.

If you want to run multiple jobs in parallel, then bash has ways to do that, but I recommend getting the serial script working first. If processing the files serially is not fast enough, then you can explore ways to achieve parallelism. However, to the extent that Java VM startup time may present an issue when starting hundreds of JVMs, you might be better off building the multiple-file handling directly into the Java program, so that it can process all the files in the same VM.
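If the serial version does turn out to be too slow, one common way to get bounded parallelism is `xargs -P`, which is a GNU/BSD extension rather than part of POSIX. The sketch below is a minimal illustration under stated assumptions: `convert_one.sh` is a hypothetical stand-in for the `java -jar` command (it fails on names containing "bad"), the job limit of 4 is arbitrary, and the demo files live in a temporary directory.

```shell
#!/bin/bash
# Demo setup with hypothetical file names.
workdir=$(mktemp -d)
mkdir "$workdir/data"
touch "$workdir/data/good1.rdf" "$workdir/data/bad1.rdf" "$workdir/data/good2.rdf"
: > "$workdir/failures.txt"          # start with an empty failure list

# Hypothetical stand-in for the real converter: exits 1 for "bad" names.
cat > "$workdir/convert_one.sh" <<'EOF'
case "$1" in
  *bad*) exit 1 ;;
esac
exit 0
EOF

# find emits NUL-separated paths; xargs runs at most 4 jobs at a time,
# one file per job. Inside each job, $1 is the converter script and $2
# is the input file. A failing file's path is printed, and all such
# lines are appended to failures.txt.
find "$workdir/data" -type f -print0 |
  xargs -0 -n 1 -P 4 sh -c 'sh "$1" "$2" || printf "%s\n" "$2"' _ "$workdir/convert_one.sh" \
  >> "$workdir/failures.txt"

cat "$workdir/failures.txt"
```

The order of lines in failures.txt is not deterministic when jobs run concurrently. A limit near the number of CPU cores is a reasonable starting point, since each job would start its own JVM.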

