Evolved cellular automata for protein secondary structure prediction imitate the determinants for folding observed in nature
We demonstrate the first application of cellular automata to protein secondary structure prediction. Cellular automata use localized interactions to simulate global phenomena, which resembles the protein folding problem, where individual residues interact locally to define the global protein conformation. The protein's amino acid sequence was used as input to the cellular automaton, and the rules for updating states were evolved using a genetic algorithm. The optimized accuracy (Q3) was 58.21% on the RS126 dataset and 56.51% on the CB513 dataset. The work thus demonstrates that a rather simple algorithm can be applied to a problem as complex as protein secondary structure prediction.
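The idea summarized above can be illustrated with a toy sketch. It is not the code shipped in the archive: the update rule (a weighted sum over a symmetric neighbourhood), the per-residue propensity table, and the names `ca_step`, `predict`, and `q3` are assumptions made purely for illustration. Only the definition of Q3 is standard: the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly.

```python
# Toy sketch only: the rule, weights, and propensities are hypothetical,
# not the ones used in framework.py.
STATES = "HEC"  # helix, strand, coil


def ca_step(cells, weights):
    """One synchronous update: each cell becomes a weighted sum of its neighbourhood."""
    radius = len(weights) // 2
    n = len(cells)
    updated = []
    for i in range(n):
        acc = [0.0, 0.0, 0.0]
        for offset, w in zip(range(-radius, radius + 1), weights):
            j = i + offset
            if 0 <= j < n:  # neighbours beyond the chain ends contribute nothing
                for k in range(3):
                    acc[k] += w * cells[j][k]
        updated.append(acc)
    return updated


def predict(sequence, propensity, weights, steps=3):
    """Seed each cell from its residue, run the automaton, read off the top state."""
    default = [0.0, 0.0, 1.0]  # unknown residues start as coil (assumption)
    cells = [list(propensity.get(aa, default)) for aa in sequence]
    for _ in range(steps):
        cells = ca_step(cells, weights)
    return "".join(STATES[max(range(3), key=lambda k: c[k])] for c in cells)


def q3(predicted, observed):
    """Percentage of positions where the three-state labels agree."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Made-up propensities (H, E, C) for three residue types.
    toy_propensity = {"A": [1, 0, 0], "V": [0, 1, 0], "G": [0, 0, 1]}
    prediction = predict("AAAVVVGGG", toy_propensity, [0.25, 0.5, 0.25], steps=1)
    print(prediction)
    print(q3(prediction, "HHHHEECCC"))
```

In a setup like this, the entries of `weights` (and possibly the propensity table) would be the parameters a genetic algorithm searches over, with `q3` on a training set serving as the fitness function.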
The zipped file, which can be downloaded below, contains the full text of the paper and the accompanying code:
Paper
evoca_prot.pdf -- The full text of the paper
Main Executable File
framework.py
Classes
* Chain_SS -- relates a chain's secondary structure (DSSP notation) and
solvent accessibility to its amino acid sequence
Modules Used
* Random -- Used for random number generation
* cPickle -- Used to dump and load objects (Chain_SS, etc.) to and from files
* GA -- Genetic algorithm used to evolve the parameters
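As a rough illustration of how objects can be dumped to and loaded from files with cPickle, here is a minimal sketch. `cPickle` exists only in Python 2; in Python 3 the same functionality is provided by `pickle`, so the import below falls back accordingly. The helper names `dump_record` and `load_record`, the file name, and the plain dictionary standing in for a Chain_SS object are hypothetical and not taken from the archive.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3


def dump_record(record, path):
    """Serialize one object to a file (binary mode is required for pickles)."""
    with open(path, "wb") as handle:
        pickle.dump(record, handle, protocol=2)


def load_record(path):
    """Read back an object written by dump_record."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def round_trip_demo():
    """Dump a stand-in record to a temporary file and load it again."""
    # A plain dict is used here instead of the real Chain_SS class.
    record = {"sequence": "MKVLA", "secondary_structure": "CEEHH"}
    path = os.path.join(tempfile.mkdtemp(), "toy_chain.dump")
    dump_record(record, path)
    return record, load_record(path)


if __name__ == "__main__":
    original, restored = round_trip_demo()
    print(original == restored)
```

Unpickling an instance of a custom class requires that class to be importable, so loading the dumps under data/ would presumably need chain_ss.py on the import path.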
File Details
* framework.py -- Main executable
* chain_ss.py -- Contains the class definition for Chain_SS
* global_info.py -- Contains global variables and definitions
Folder Details
* data/RS126_dump -- Contains 126 Chain_SS dumps from the RS126 dataset
* data/RS126 -- Contains the RS126 dataset in FASTA format
* data/CB513_dump -- Contains 513 Chain_SS dumps from the CB513 dataset
* data/CB513 -- Contains the CB513 dataset in FASTA format




