- <?xml version="1.0" encoding="UTF-8"?>
- <!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD esummary v1 20060131//EN" "http://eutils.ncbi.nlm.nih.gov/eutils/dtd/20060131/esummary-v1.dtd">
- <eSummaryResult>
- <DocSum>
- <Id>1811</Id>
- <Item Name="AssayName" Type="String">Experimentally measured binding affinity data derived from PDB</Item>
- <Item Name="AssayDescription" Type="String">This data entry provides a collection of experimentally measured binding affinity data (Kd, Ki, and IC50), which are exclusively for the protein-ligand complexes available in the Protein Data Bank (PDB). All of the binding affinity data compiled in this data entry are cited from original references. This work is contributed by the PDBbind database.</Item>
- <Item Name="ReadoutCount" Type="Integer">0</Item>
- <Item Name="SourceNameList" Type="List">
- <Item Name="string" Type="String">Shanghai Institute of Organic Chemistry</Item>
- </Item>
- <Item Name="ActiveSidCount" Type="Integer">3073</Item>
- <Item Name="ActivityOutcomeMethod" Type="String">other</Item>
- <Item Name="InactiveSidCount" Type="Integer">0</Item>
- <Item Name="InconclusiveSidCount" Type="Integer">0</Item>
- <Item Name="TotalSidCount" Type="Integer">3073</Item>
- <Item Name="XRefDburlList" Type="List">
- <Item Name="string" Type="String">http://www.sioc.ac.cn/esioc/1.htm</Item>
- </Item>
- <Item Name="XRefAsurlList" Type="List">
- <Item Name="string" Type="String">http://www.pdbbind.org.cn</Item>
- <Item Name="string" Type="String">http://www.pdbbind.org.cn</Item>
- </Item>
- <Item Name="ModifyDate" Type="Date">2010/07/01 00:00</Item>
- <Item Name="DepositDate" Type="Date">2009/06/08 00:00</Item>
- <Item Name="HoldUntilDate" Type="Date">1/01/01 00:00</Item>
- <Item Name="AID" Type="Integer">1811</Item>
- <Item Name="TotalCidCount" Type="Integer">2438</Item>
- <Item Name="ActiveCidCount" Type="Integer">2438</Item>
- <Item Name="ProteinTargetList" Type="List">
- <Item Name="ProteinTarget" Type="Structure">
- <Item Name="Name" Type="String">Chain 1, Crystal Structure Of Gamma-Chymotrypsin In Complex With 7- Hydroxycoumarin</Item>
- <Item Name="GI" Type="Integer">17943055</Item>
- </Item>
- </Item>
- </DocSum>
- <DocSum>
- <Id>648328</Id>
- <Item Name="AssayName" Type="String">Cytotoxicity against human A549 cells at 10 to 100 uM after 48 hrs by MTT assay</Item>
- <Item Name="AssayDescription" Type="String">Title: New antitumor compounds from Carya cathayensis. Abstract: A new lignan (7R,8S,8'R)-4,4',9-trihydroxy-7,9'-epoxy-8,8'-lignan, and three new phenolics, carayensin-A, carayensin-B, and carayensin-C, together with 13 known compounds were isolated from the shells of Carya cathayensis. Their chemical structures were established mainly by 1D and 2D NMR techniques and mass spectrometry. All the compounds were evaluated for cytotoxicity against several human tumor types including human colorectal cancer cell lines (HCT-116, HT-29), human lung cancer cell line (A549), and human breast cancer cell line (MCF-7). The compounds 1, 5, 6, and 16 are considered to be potential as antitumor agents, which could significantly inhibit the cancer cell growth in a dose-dependent manner.</Item>
- <Item Name="ReadoutCount" Type="Integer">5</Item>
- <Item Name="SourceNameList" Type="List">
- <Item Name="string" Type="String">ChEMBL</Item>
- </Item>
- <Item Name="ActiveSidCount" Type="Integer">8</Item>
- <Item Name="ActivityOutcomeMethod" Type="String"></Item>
- <Item Name="InactiveSidCount" Type="Integer">0</Item>
- <Item Name="InconclusiveSidCount" Type="Integer">0</Item>
- <Item Name="TotalSidCount" Type="Integer">8</Item>
- <Item Name="XRefDburlList" Type="List">
- <Item Name="string" Type="String">http://www.ebi.ac.uk/chembldb/</Item>
- </Item>
- <Item Name="XRefAsurlList" Type="List">
- <Item Name="string" Type="String">http://www.ebi.ac.uk/chembldb/index.php/assay/inspect/805776</Item>
- <Item Name="string" Type="String">http://www.ebi.ac.uk/chembldb/index.php/assay/inspect/805776</Item>
- <Item Name="string" Type="String">http://www.ebi.ac.uk/chembldb/index.php/assay/inspect/805776</Item>
- </Item>
- <Item Name="ModifyDate" Type="Date">2013/07/07 00:00</Item>
- <Item Name="DepositDate" Type="Date">2012/09/09 00:00</Item>
- <Item Name="HoldUntilDate" Type="Date">1/01/01 00:00</Item>
- <Item Name="AID" Type="Integer">648328</Item>
- <Item Name="TotalCidCount" Type="Integer">8</Item>
- <Item Name="ActiveCidCount" Type="Integer">8</Item>
- <Item Name="ProteinTargetList" Type="List"></Item>
- </DocSum>
- <DocSum>
- <Id>399372</Id>
- <Item Name="AssayName" Type="String">Deterrent activity against Perknaster fuscus assessed as induction of sustained retractions of tube-feet by sea-star deterrent assay</Item>
- <Item Name="AssayDescription" Type="String">Title: Purine and nucleoside metabolites from the Antarctic sponge Isodictya erinacea. Abstract: The bright yellow sponge Isodictya erinacea is one of several chemically defended sponges found on the benthos of McMurdo Sound, Antarctica. An investigation of the metabolites from this sponge has resulted in the isolation of purine and nucleoside metabolites, including the previously unreported erinacean (1) and p-hydroxybenzaldehyde. The latter metabolite has been demonstrated to cause a feeding deterrence behavior in Perknaster fuscus, the major predator of antarctic sponges.</Item>
- <Item Name="ReadoutCount" Type="Integer">5</Item>
- <Item Name="SourceNameList" Type="List">
- <Item Name="string" Type="String">ChEMBL</Item>
- </Item>
- <Item Name="ActiveSidCount" Type="Integer">1</Item>
- <Item Name="ActivityOutcomeMethod" Type="String"></Item>
- <Item Name="InactiveSidCount" Type="Integer">6</Item>
- <Item Name="InconclusiveSidCount" Type="Integer">0</Item>
- <Item Name="TotalSidCount" Type="Integer">7</Item>
- <Item Name="XRefDburlList" Type="List">
- <Item Name="string" Type="String">http://www.ebi.ac.uk/chembldb/</Item>
- </Item>
- <Item Name="XRefAsurlList" Type="List">
- <Item Name="string" Type="String">http://www.ebi.ac.uk/chembldb/index.php/assay/inspect/547401</Item>
- <Item Name="string" Type="String">http://www.ebi.ac.uk/chembldb/index.php/assay/inspect/547401</Item>
- </Item>
- <Item Name="ModifyDate" Type="Date">2013/05/08 00:00</Item>
- <Item Name="DepositDate" Type="Date">2010/05/26 00:00</Item>
- <Item Name="HoldUntilDate" Type="Date">1/01/01 00:00</Item>
- <Item Name="AID" Type="Integer">399372</Item>
- <Item Name="TotalCidCount" Type="Integer">7</Item>
- <Item Name="ActiveCidCount" Type="Integer">1</Item>
- <Item Name="ProteinTargetList" Type="List"></Item>
- </DocSum>
- </eSummaryResult>
Everything between <DocSum> and </DocSum> corresponds to one entry in the database. I want to extract the lines that contain:
<Item Name="GI" Type="Integer">
That part is easy. The problem is that, before collecting those lines, I want to skip every entry that contains the following line:
<Item Name="AssayName" Type="String">Experimentally measured binding affinity data derived from PDB</Item>
To do that, I wrote the following code:
#!/usr/bin/perl
use strict;
use warnings;

# Read the whole file into memory, one array element per line.
my $file = shift @ARGV or die "Usage: $0 file.xml\n";
open my $fh, '<', $file or die "Cannot open $file: $!";
chomp(my @array = <$fh>);
close $fh;

my $skip_pdb = '<Item Name="AssayName" Type="String">Experimentally measured binding affinity data derived from PDB</Item>';
my $gi_id    = '<Item Name="GI" Type="Integer">';

for my $i (0 .. $#array) {
    next unless $array[$i] =~ /<DocSum>/;
    my $j = $i + 2;     # line where the AssayName is expected
    my $k = $i + 29;    # line where the GI is expected (fixed offset)
    # Skip the entry if it is the PDB-derived assay.
    next if $j > $#array || $array[$j] =~ /\Q$skip_pdb\E/;
    print "$array[$k]\n" if $k <= $#array;
}
The code works: it extracts the desired lines (the ones at index $k). The problem appears when I try to use it on other files, because the structure is not fixed; that is, the desired line is not always at "$i + 29". Any suggestions?
Regards, and thanks!
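One possible way around the fixed offset, offered only as a sketch: instead of counting lines, treat each <DocSum> ... </DocSum> block as a unit, remember whether the unwanted AssayName line appeared anywhere inside it, and collect every GI line by matching its text rather than its position. The helper name gi_lines_outside_pdb is hypothetical (it does not come from the original script), and the sketch assumes that each tag sits on its own line and that every entry to be kept is closed by </DocSum>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns the GI lines
# of every <DocSum> block that does NOT contain the unwanted AssayName line.
sub gi_lines_outside_pdb {
    my ($lines) = @_;
    my $skip_pdb = '<Item Name="AssayName" Type="String">Experimentally measured binding affinity data derived from PDB</Item>';
    my $gi_id    = '<Item Name="GI" Type="Integer">';

    my (@found, @block_gi);
    my $skip = 0;
    for my $line (@$lines) {
        if (index($line, '<DocSum>') >= 0) {
            # A new entry starts: reset the per-entry state.
            @block_gi = ();
            $skip     = 0;
        }
        # Literal substring tests, so no regex escaping is needed.
        $skip = 1 if index($line, $skip_pdb) >= 0;
        push @block_gi, $line if index($line, $gi_id) >= 0;
        if (index($line, '</DocSum>') >= 0) {
            # The entry is complete: keep its GI lines unless it was flagged.
            push @found, @block_gi unless $skip;
            @block_gi = ();
            $skip     = 0;
        }
    }
    return @found;
}

# When a file name is given, print the selected GI lines.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    print "$_\n" for gi_lines_outside_pdb(\@lines);
}
```

Because nothing depends on line offsets, the same logic should apply to files where the GI line appears at a different position, or where an entry has several GI lines or none. For input that is not laid out one tag per line, a real XML parser would be a safer choice than line matching.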