Integrate descriptions from one file into another XML file in Perl


I'm new here, and I apologize for my bad English. I have two files (file 1: the main XML file, file 2: a description file), and I want to insert the descriptions, line by line, at a specific position in the XML file (replacing the xx in each <hit_def> element).

File 1: here is the XML tree:

    <blastoutput>
        <blastoutput_iterations>
            <iteration> (gene 1)
                <iteration_hits>
                    <hit> (1-10)
                        <hit_def>
            <iteration> (gene 2)
                <iteration_hits>
                    <hit> (1-10)
                        <hit_def>

And here are the first and last lines, because the file is 5 GB in size:

<?xmlversion="1.0"?> <blastoutput> <blastoutput_program>rapsearch</blastoutput_program> <blastoutput_version>rapsearch2</blastoutput_version> <blastoutput_reference>yonganzhao,haixutangandyuzhenye.rapsearch2:afastandmemory-efficientproteinsimilaritysearchtoolfornextgenerationsequencingdata.bioinformatics2012,28(1):125-126</blastoutput_reference> <blastoutput_db>/mreferate/dwolff/rapsearch2.23/db/ncbi_nr_dec15</blastoutput_db> <blastoutput_param> <parameters> </parameters> </blastoutput_param> <blastoutput_iterations> <iteration> <iteration_iter-num>1</iteration_iter-num> <iteration_query-def>gene_id_1</iteration_query-def> <iteration_query-len>37</iteration_query-len> <iteration_hits> <hit> <hit_num>1</hit_num> <hit_id>gi|939543432|gb|kpv42113.1|</hit_id> <hit_def>xx</hit_def> <hit_accession>kpv42113.1</hit_accession> <hit_len>162</hit_len> <hit_hsps> <hsp> <hsp_num>1</hsp_num> <hsp_bit-score>58.151</hsp_bit-score> <hsp_score>139</hsp_score> <hsp_evalue>-5.6061</hsp_evalue> <hsp_query-from>1</hsp_query-from> <hsp_query-to>37</hsp_query-to> <hsp_hit-from>54</hsp_hit-from> <hsp_hit-to>90</hsp_hit-to> <hsp_query-frame>0</hsp_query-frame> <hsp_identity>28</hsp_identity> <hsp_positive>33</hsp_positive> <hsp_align-len>37</hsp_align-len> <hsp_qseq>mvvcdepvsaldvsvqaavltllveiqqqhetamili</hsp_qseq> <hsp_hseq>lvlcdepvsaldvsvqaavlnllleiqrehgttmifi</hsp_hseq> <hsp_midline>+v+cdepvsaldvsvqaavlll+eiq++htmii</hsp_midline> </hsp> </hit_hsps> </hit> <hit> <hit_num>2</hit_num> <hit_id>gi|385280362|gb|eif44286.1|</hit_id> <hit_def>xx</hit_def> <hit_accession>eif44286.1</hit_accession> <hit_len>327</hit_len> <hit_hsps> <hsp> <hsp_num>1</hsp_num> <hsp_bit-score>54.6842</hsp_bit-score> <hsp_score>130</hsp_score> <hsp_evalue>-4.56249</hsp_evalue> <hsp_query-from>1</hsp_query-from> <hsp_query-to>37</hsp_query-to> <hsp_hit-from>169</hsp_hit-from> <hsp_hit-to>205</hsp_hit-to> <hsp_query-frame>0</hsp_query-frame> <hsp_identity>24</hsp_identity> <hsp_positive>31</hsp_positive> 
<hsp_align-len>37</hsp_align-len> <hsp_qseq>mvvcdepvsaldvsvqaavltllveiqqqhetamili</hsp_qseq> <hsp_hseq>lvicdepvsaldvsvqaqiinllqelqtehntamlfi</hsp_hseq> <hsp_midline>+v+cdepvsaldvsvqa++lle+q+htam+i</hsp_midline> </hsp> </hit_hsps> </hit> <hit> <hit_num>3</hit_num> <hit_id>gi|550913550|ref|wp_022666548.1|</hit_id> <hit_def>xx</hit_def> <hit_accession>wp_022666548.1</hit_accession> <hit_len>721</hit_len> <hit_hsps> <hsp> <hsp_num>1</hsp_num> <hsp_bit-score>53.5286</hsp_bit-score> <hsp_score>127</hsp_score> <hsp_evalue>-4.21462</hsp_evalue> <hsp_query-from>1</hsp_query-from> <hsp_query-to>37</hsp_query-to> <hsp_hit-from>549</hsp_hit-from> <hsp_hit-to>585</hsp_hit-to> <hsp_query-frame>0</hsp_query-frame> <hsp_identity>27</hsp_identity> <hsp_positive>31</hsp_positive> <hsp_align-len>37</hsp_align-len> <hsp_qseq>mvvcdepvsaldvsvqaavltllveiqqqhetamili</hsp_qseq> <hsp_hseq>mvicdepvsaldvsvqaavlnllneikeemgttmifi</hsp_hseq> <hsp_midline>mv+cdepvsaldvsvqaavlllei+++tmii</hsp_midline> </hsp> </hit_hsps> </hit> ... </iteration_hits> <iteration_stat> <statistics> <statistics_db-num>77704984</statistics_db-num> <statistics_db-len>28292933896</statistics_db-len> <statistics_hsp-len>0</statistics_hsp-len> <statistics_eff-space>0</statistics_eff-space> <statistics_kappa>0.041</statistics_kappa> <statistics_lambda>0.267</statistics_lambda> <statistics_entropy>0.14</statistics_entropy> </statistics> </iteration_stat> </iteration> </blastoutput_iterations> </blastoutput> 

File 2:

    peptide abc transporter atpase, partial [kouleothrix aurantiaca]
    oligopeptide abc transporter [gamma proteobacterium bdw918]
    abc transporter atp-binding protein [desulfospira joergensenii]

The output should be:

<?xmlversion="1.0"?> <blastoutput> <blastoutput_program>rapsearch</blastoutput_program> <blastoutput_version>rapsearch2</blastoutput_version> <blastoutput_reference>yonganzhao,haixutangandyuzhenye.rapsearch2:afastandmemory-efficientproteinsimilaritysearchtoolfornextgenerationsequencingdata.bioinformatics2012,28(1):125-126</blastoutput_reference> <blastoutput_db>/mreferate/dwolff/rapsearch2.23/db/ncbi_nr_dec15</blastoutput_db> <blastoutput_param> <parameters> </parameters> </blastoutput_param> <blastoutput_iterations> <iteration> <iteration_iter-num>1</iteration_iter-num> <iteration_query-def>gene_id_1</iteration_query-def> <iteration_query-len>37</iteration_query-len> <iteration_hits> <hit> <hit_num>1</hit_num> <hit_id>gi|939543432|gb|kpv42113.1|</hit_id> <hit_def>peptide abc transporter atpase, partial [kouleothrix aurantiaca]</hit_def> <hit_accession>kpv42113.1</hit_accession> <hit_len>162</hit_len> <hit_hsps> <hsp> <hsp_num>1</hsp_num> <hsp_bit-score>58.151</hsp_bit-score> <hsp_score>139</hsp_score> <hsp_evalue>-5.6061</hsp_evalue> <hsp_query-from>1</hsp_query-from> <hsp_query-to>37</hsp_query-to> <hsp_hit-from>54</hsp_hit-from> <hsp_hit-to>90</hsp_hit-to> <hsp_query-frame>0</hsp_query-frame> <hsp_identity>28</hsp_identity> <hsp_positive>33</hsp_positive> <hsp_align-len>37</hsp_align-len> <hsp_qseq>mvvcdepvsaldvsvqaavltllveiqqqhetamili</hsp_qseq> <hsp_hseq>lvlcdepvsaldvsvqaavlnllleiqrehgttmifi</hsp_hseq> <hsp_midline>+v+cdepvsaldvsvqaavlll+eiq++htmii</hsp_midline> </hsp> </hit_hsps> </hit> <hit> <hit_num>2</hit_num> <hit_id>gi|385280362|gb|eif44286.1|</hit_id> <hit_def>oligopeptide abc transporter [gamma proteobacterium bdw918]</hit_def> <hit_accession>eif44286.1</hit_accession> <hit_len>327</hit_len> <hit_hsps> <hsp> <hsp_num>1</hsp_num> <hsp_bit-score>54.6842</hsp_bit-score> <hsp_score>130</hsp_score> <hsp_evalue>-4.56249</hsp_evalue> <hsp_query-from>1</hsp_query-from> <hsp_query-to>37</hsp_query-to> <hsp_hit-from>169</hsp_hit-from> 
<hsp_hit-to>205</hsp_hit-to> <hsp_query-frame>0</hsp_query-frame> <hsp_identity>24</hsp_identity> <hsp_positive>31</hsp_positive> <hsp_align-len>37</hsp_align-len> <hsp_qseq>mvvcdepvsaldvsvqaavltllveiqqqhetamili</hsp_qseq> <hsp_hseq>lvicdepvsaldvsvqaqiinllqelqtehntamlfi</hsp_hseq> <hsp_midline>+v+cdepvsaldvsvqa++lle+q+htam+i</hsp_midline> </hsp> </hit_hsps> </hit> <hit> <hit_num>3</hit_num> <hit_id>gi|550913550|ref|wp_022666548.1|</hit_id> <hit_def>abc transporter atp-binding protein [desulfospira joergensenii]</hit_def> <hit_accession>wp_022666548.1</hit_accession> <hit_len>721</hit_len> <hit_hsps> <hsp> <hsp_num>1</hsp_num> <hsp_bit-score>53.5286</hsp_bit-score> <hsp_score>127</hsp_score> <hsp_evalue>-4.21462</hsp_evalue> <hsp_query-from>1</hsp_query-from> <hsp_query-to>37</hsp_query-to> <hsp_hit-from>549</hsp_hit-from> <hsp_hit-to>585</hsp_hit-to> <hsp_query-frame>0</hsp_query-frame> <hsp_identity>27</hsp_identity> <hsp_positive>31</hsp_positive> <hsp_align-len>37</hsp_align-len> <hsp_qseq>mvvcdepvsaldvsvqaavltllveiqqqhetamili</hsp_qseq> <hsp_hseq>mvicdepvsaldvsvqaavlnllneikeemgttmifi</hsp_hseq> <hsp_midline>mv+cdepvsaldvsvqaavlllei+++tmii</hsp_midline> </hsp> </hit_hsps> </hit> ... </iteration_hits> <iteration_stat> <statistics> <statistics_db-num>77704984</statistics_db-num> <statistics_db-len>28292933896</statistics_db-len> <statistics_hsp-len>0</statistics_hsp-len> <statistics_eff-space>0</statistics_eff-space> <statistics_kappa>0.041</statistics_kappa> <statistics_lambda>0.267</statistics_lambda> <statistics_entropy>0.14</statistics_entropy> </statistics> </iteration_stat> </iteration> </blastoutput_iterations> </blastoutput> 

My first attempts at writing a script gave no results; they were disastrous. I hope you can help me.

I updated the script to match the new XML structure shown above.

Check the comments in the code below:

    use strict;
    use warnings;
    use XML::Simple;

    # First, parse the XML into a hash. ForceArray keeps 'iteration' and 'hit'
    # as array references even when only one of them is present.
    my $xml = XMLin( 'my_xml.xml', ForceArray => [ 'iteration', 'hit' ], KeyAttr => [] );

    =com

    $xml sample (dumped without ForceArray, so the single 'iteration' is a hash)

    $VAR1 = {
        'blastoutput_db' => '/mreferate/dwolff/rapsearch2.23/db/ncbi_nr_dec15',
        'blastoutput_program' => 'rapsearch',
        'blastoutput_param' => {
            'parameters' => {}
        },
        'blastoutput_reference' => 'yonganzhao,haixutangandyuzhenye.rapsearch2:afastandmemory-efficientproteinsimilaritysearchtoolfornextgenerationsequencingdata.bioinformatics2012,28(1):125-126',
        'blastoutput_version' => 'rapsearch2',
        'blastoutput_iterations' => {
            'iteration' => {
                'iteration_hits' => {
                    'hit' => [
                        {
                            'hit_accession' => 'kpv42113.1',
                            'hit_id' => 'gi|939543432|gb|kpv42113.1|',
                            'hit_hsps' => {
                                'hsp' => {
                                    'hsp_hseq' => 'lvlcdepvsaldvsvqaavlnllleiqrehgttmifi',
                                    'hsp_bit-score' => '58.151',
                                    'hsp_identity' => '28',
                                    'hsp_align-len' => '37',
                                    'hsp_query-frame' => '0',
                                    'hsp_query-from' => '1',
                                    'hsp_qseq' => 'mvvcdepvsaldvsvqaavltllveiqqqhetamili',
                                    'hsp_evalue' => '-5.6061',
                                    'hsp_midline' => '+v+cdepvsaldvsvqaavlll+eiq++htmii',
                                    'hsp_num' => '1',
                                    'hsp_positive' => '33',
                                    'hsp_hit-from' => '54',
                                    'hsp_score' => '139',
                                    'hsp_hit-to' => '90',
                                    'hsp_query-to' => '37'
                                }
                            },
                            'hit_len' => '162',
                            'hit_num' => '1',
                            'hit_def' => 'xx'
                        },
                        {
                            'hit_accession' => 'eif44286.1',
                            'hit_id' => 'gi|385280362|gb|eif44286.1|',
                            'hit_hsps' => {
                                'hsp' => {
                                    'hsp_hit-from' => '169',
                                    'hsp_positive' => '31',
                                    'hsp_score' => '130',
                                    'hsp_query-to' => '37',
                                    'hsp_hit-to' => '205',
                                    'hsp_num' => '1',
                                    'hsp_midline' => '+v+cdepvsaldvsvqa++lle+q+htam+i',
                                    'hsp_align-len' => '37',
                                    'hsp_query-frame' => '0',
                                    'hsp_qseq' => 'mvvcdepvsaldvsvqaavltllveiqqqhetamili',
                                    'hsp_evalue' => '-4.56249',
                                    'hsp_query-from' => '1',
                                    'hsp_bit-score' => '54.6842',
                                    'hsp_identity' => '24',
                                    'hsp_hseq' => 'lvicdepvsaldvsvqaqiinllqelqtehntamlfi'
                                }
                            },
                            'hit_def' => 'xx',
                            'hit_len' => '327',
                            'hit_num' => '2'
                        },

    =cut

    # Read the second file (one description per line) into an array.
    open my $mf2, '<', 'file2' or die "Cannot open file2: $!";
    chomp( my @defs = <$mf2> );
    close $mf2;

    # Update the XML hash: hit number N receives line N of the description file.
    foreach my $iteration ( @{ $xml->{'blastoutput_iterations'}{'iteration'} } ) {
        foreach my $hit ( @{ $iteration->{'iteration_hits'}{'hit'} } ) {
            $hit->{'hit_def'} = $defs[ $hit->{'hit_num'} - 1 ];
        }
    }

    # Write the updated XML back to file 1.
    XMLout( $xml, OutputFile => 'my_xml.xml', NoAttr => 1, RootName => 'blastoutput' );
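
Because file 1 is about 5 GB, building the whole document as a hash with XML::Simple may need more memory than is available. As a possible alternative, here is a minimal streaming sketch that copies the XML line by line and fills each <hit_def>xx</hit_def> placeholder with the next line of the description file. It rests on assumptions that the post does not confirm: the descriptions are listed in the same order as the hits appear in the XML, every placeholder is literally xx, and the real file is split into lines of reasonable length. The helper names xml_escape and fill_hit_defs and the three command-line arguments are made up for illustration.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Copy $in to $out, replacing every <hit_def>xx</hit_def> placeholder with the
# next line read from $defs (assumption: descriptions are in document order).
# Returns the number of placeholders that were filled.
sub fill_hit_defs {
    my ($in, $defs, $out) = @_;
    my $filled = 0;
    while ( my $line = <$in> ) {
        $line =~ s{<hit_def>xx</hit_def>}{
            my $def = <$defs>;
            die "Description file ran out after $filled hits\n" unless defined $def;
            $def =~ s/\r?\n\z//;
            $filled++;
            '<hit_def>' . xml_escape($def) . '</hit_def>';
        }ge;
        print {$out} $line;
    }
    return $filled;
}

# Hypothetical command line: input XML, description file, output XML.
if ( @ARGV == 3 ) {
    my ( $xml_path, $defs_path, $out_path ) = @ARGV;
    open my $in,   '<', $xml_path  or die "Cannot open $xml_path: $!";
    open my $defs, '<', $defs_path or die "Cannot open $defs_path: $!";
    open my $out,  '>', $out_path  or die "Cannot open $out_path: $!";
    my $count = fill_hit_defs( $in, $defs, $out );
    close $out or die "Cannot close $out_path: $!";
    print STDERR "Filled $count hit_def elements\n";
}
```

Saved under a hypothetical name such as fill_defs.pl, it could be run as: perl fill_defs.pl my_xml.xml file2 my_xml_new.xml. Writing to a separate output file keeps the original intact, and since the text is copied through unchanged apart from the placeholders, the element order of the input is preserved.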
