Efficient Data Mining Algorithms for Screening Potential Proteins of Drug Targets

The past decades have witnessed both the boom of pharmacology and the dilemma of drug development. Retrieving potential drug target proteins from open-access databases plays a crucial role in drug design; it is a challenging but significant task that has received much attention from academia and medical engineering. Human proteins with several measured physicochemical properties can be mined to screen reliable candidates, which would accelerate drug development and reduce its cost. In this paper, the retrieval of potential drug target proteins (DTPs) from a carefully collected dataset is studied from the perspective of data mining. Because the set of non-drug target proteins (NDTPs) is uncertain, we propose two strategies for constructing a dataset of reliable NDTPs and combine each with a bagging method for the final prediction. Results on the test set show that these two-stage algorithms are effective and achieve superior performance. Both algorithms maintain high recall ratios for DTPs, indicating that the paradigm works well for the task. In one round of experiments, the strategy1-based bagging algorithm retrieved about 467 possible DTPs, while the strategy2-based bagging algorithm predicted 1771 potential DTPs. The two algorithms were also consistent in their predictions, with about 375 likely DTPs in common. These likely DTPs provide reliable candidates for further verification in biomedical experiments.
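The two-stage idea summarized above (first build a set of reliable NDTPs, then predict with bagging) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the paper's strategy1 or strategy2: the distance-from-centroid rule for picking reliable negatives, the nearest-centroid base learner, the toy two-feature data, and all function names are hypothetical.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_reliable_negatives(positives, unlabeled, keep_fraction=0.5):
    """Stage 1 (assumed heuristic): keep the unlabeled rows farthest from the DTP centroid."""
    center = centroid(positives)
    ranked = sorted(unlabeled, key=lambda row: sq_dist(row, center), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def train_bagged_models(positives, negatives, n_models=11, seed=0):
    """Stage 2: fit one nearest-centroid model per bootstrap resample of each class."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        pos_sample = rng.choices(positives, k=len(positives))
        neg_sample = rng.choices(negatives, k=len(negatives))
        models.append((centroid(pos_sample), centroid(neg_sample)))
    return models


def predict_dtp(models, row):
    """Majority vote: True if most base models place the row nearer the DTP centroid."""
    votes = sum(sq_dist(row, pos_c) < sq_dist(row, neg_c) for pos_c, neg_c in models)
    return votes * 2 > len(models)


# Toy data (hypothetical): two features per protein.
positives = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2)]          # known DTPs
unlabeled = [(1.05, 1.0), (0.95, 1.05),                               # may hide DTPs
             (5.0, 5.1), (4.8, 5.2), (5.2, 4.9), (5.1, 5.0)]

reliable = select_reliable_negatives(positives, unlabeled, keep_fraction=0.5)
models = train_bagged_models(positives, reliable)
candidates = [row for row in unlabeled if predict_dtp(models, row)]
```

Here `candidates` plays the role of the likely DTPs retrieved from the unlabeled pool. Running the sketch with two different stage-1 rules and intersecting the two candidate lists would mirror the consensus comparison described in the abstract.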




National University of Defense Technology
Changsha, Hunan 410073