Good day, explorer.
I have written a script to extract some sequences from a FASTA file.
With the previous code you gave me, I only get the header but not the sequence.
The data look like this:
>ENST00000415118 havana_ig_gene:known chromosome:GRCh38:14:22438547:22438554:1 gene:ENSG00000223997 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene
GAAATAGT
>ENST00000448914 havana_ig_gene:known chromosome:GRCh38:12:22449113:22449125:1 gene:ENSG00000228985 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene
ACTGGGGGATACG
>ENST00000431870 havana_ig_gene:known chromosome:GRCh38:7:105894508:105894523:-1 gene:ENSG00000227800 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene
TGACTACGGTGACTAC
>ENST00000414852 havana_ig_gene:known chromosome:GRCh38:14:105913222:105913237:-1 gene:ENSG00000233655 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene
TGACTACAGTAACTAC
>ENST00000390578 havana_ig_gene:known chromosome:GRCh38:7:105897957:105897987:-1 gene:ENSG00000211918 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene
AGGATATTGTAGTGGTGGTAGCTGCTACTCC
>ENST00000390571 havana_ig_gene:known chromosome:GRCh38:14:105886031:105886061:-1 gene:ENSG00000211911 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene
GTATTACTATGATAGTAGTGGTTATTACTAC
>ENST00000390577 havana_ig_gene:known chromosome:GRCh38:12:105895634:105895670:-1 gene:ENSG00000211917 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene
GTATTATGATTACGTTTGGGGGAGTTATCGTTATACC
>ENST00000390575 havana_ig_gene:known chromosome:GRCh38:14:105893542:105893561:-1 gene:ENSG00000211915 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene
GTGGATACAGCTATGGTTAC
>ENST00000452198 havana_ig_gene:known chromosome:GRCh38:14:105881539:105881556:-1 gene:ENSG00000225825 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene
GGGTATAGCAGCGGCTAC
>ENST00000604446 havana_ig_gene:known chromosome:GRCh38:15:21010494:21010516:-1 gene:ENSG00000270824 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene
GTGGATATAGTGTCTACGATTAC
The result I expect in my output file is:
>ENST00000431870 havana_ig_gene:known chromosome:GRCh38:7:105894508:105894523:-1 gene:ENSG00000227800 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene
TGACTACGGTGACTAC
>ENST00000390578 havana_ig_gene:known chromosome:GRCh38:7:105897957:105897987:-1 gene:ENSG00000211918 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene
AGGATATTGTAGTGGTGGTAGCTGCTACTCC
The script is:
use strict;
use Data::Dumper;
my @list_of_lines;
my @list_out;
my $r=0;
open my $INPUT, "< $ARGV[0]";
open my $OUT, " > $ARGV[0].out";
### Puts every line of the file in an array called @list_of_lines
while (<($INPUT)>) {
    chomp;
    push @list_of_lines, $_;
}
CONTINUE:
for (my $i = $r ; $i <= $#list_of_lines; $i++ ) {
    ### Read until the header is found
    if ($list_of_lines[$i] =~ /^>:GRCh38:7:/) {
        ### $r counts the number of lines before the next header
        for ($r = $i; $r <= $#list_of_lines; $r++) {
            push @list_out, $list_of_lines[$r];
            if ($list_of_lines[$r+1] =~ s/\r\n/n/) {
                goto CONTINUE;
            }
        }
    }
}
for (my $d <= $#list_out) {
    if ($list_out[$d] =~ /^>/) {
        $list_out[$d] =~ s/\s.+//;
    }
}
warn Dumper\@list_out;
print $OUT join ("\n",@list_out);
close $INPUT;
close $OUT;
exit;
but it does not work. Thank you for your help.
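
For reference, the selection implied by the expected output (keep every record whose header names chromosome 7, together with its sequence lines) could be sketched roughly as follows. This is only a minimal sketch, not the code from the earlier reply: the subroutine name filter_fasta and the optional second command-line argument are assumptions made for illustration, and it assumes every header carries a "chromosome:GRCh38:N:" field as in the sample data. Note that the pattern /^>:GRCh38:7:/ in the script above requires ":GRCh38" to come immediately after ">", which never happens in these headers, so the sketch matches the field anywhere in the header line instead. It also reads the file line by line with a "keep" flag rather than loading everything into an array.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy to $out_fh every FASTA record whose header
# names the requested chromosome. Returns the number of records written.
sub filter_fasta {
    my ($in_fh, $out_fh, $chromosome) = @_;
    my $keep  = 0;    # true while we are inside a wanted record
    my $count = 0;

    while (my $line = <$in_fh>) {
        $line =~ s/\r?\n\z//;    # strip Unix or Windows line endings
        next if $line eq '';     # skip blank lines
        if ($line =~ /^>/) {
            # A header starts a new record: decide whether to keep it.
            # Assumption: the assembly name GRCh38 is fixed, as in the sample.
            $keep = $line =~ /\bchromosome:GRCh38:\Q$chromosome\E:/ ? 1 : 0;
            $count++ if $keep;
        }
        # Headers and sequence lines of a wanted record are both printed.
        print {$out_fh} "$line\n" if $keep;
    }
    return $count;
}

# Only process a file when one is given on the command line.
if (@ARGV) {
    my $file       = shift @ARGV;
    my $chromosome = @ARGV ? shift @ARGV : 7;    # default taken from the question
    open my $in,  '<', $file       or die "Cannot read $file: $!";
    open my $out, '>', "$file.out" or die "Cannot write $file.out: $!";
    my $n = filter_fasta($in, $out, $chromosome);
    close $in;
    close $out or die "Cannot close $file.out: $!";
    warn "Wrote $n record(s) to $file.out\n";
}
```

Saved under a hypothetical name such as filter_fasta.pl, it would be run as "perl filter_fasta.pl sequences.fasta" and would write the selected records to sequences.fasta.out. Because the colon after the chromosome number is part of the pattern, asking for 7 does not also select 17. Headers are written unchanged, as in the expected output shown above.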