Feature Engineering for Lung Cancer Classification Using Next Generation Sequencing Data

  • Syed Naseer Ahmad Shah, Department of Computer Science, Jamia Millia Islamia, New Delhi-110025, India
  • Rafat Parveen, Department of Computer Science, Jamia Millia Islamia, New Delhi-110025, India
Keywords: Next-generation sequencing, lung cancer, feature engineering, machine learning, dimensionality reduction, classification, Support Vector Machine.

Abstract

Next-generation sequencing (NGS) has transformed genomics by enabling large-scale detection of molecular alterations, particularly in the somatic genome. Research on complex diseases such as lung cancer has shifted significantly, as NGS provides an efficient way to characterise the genetic fingerprint of this extensively studied disease. This advance has opened new pathways for understanding the molecular underpinnings of lung cancer and supports more targeted approaches to diagnosis, treatment, and research. However, NGS data are high-dimensional and complex, which poses significant challenges for data analysis and classification. In this paper, we investigate feature engineering as a means of improving the accuracy of lung cancer classification from NGS data. The methods considered, namely dimensionality reduction, feature selection, and feature transformation, aim to improve the predictive power of machine learning models. Principal Component Analysis (PCA) is used for dimensionality reduction, projecting the data onto a smaller set of components. Transformation techniques such as normalisation and scaling are applied to prepare the data for better model performance. The efficacy of these techniques is evaluated by comparing several machine learning classifiers, with a focus on the Support Vector Machine (SVM). The results indicate that effective feature engineering, particularly PCA, improves the classification accuracy and robustness of lung cancer prediction models, providing insights for the development of precision medicine approaches in oncology.
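
The workflow summarised in the abstract (scaling, then PCA, then an SVM classifier) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, a synthetic feature matrix from `make_classification` in place of real NGS data, and arbitrary illustrative settings (10 principal components, an RBF kernel, 5-fold cross-validation). The helper name `build_pipeline` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in for an NGS-derived matrix (samples x features).
# Real data would be loaded and preprocessed separately.
X, y = make_classification(
    n_samples=200, n_features=500, n_informative=20, n_redundant=0,
    random_state=0,
)


def build_pipeline(n_components=10):
    """Hypothetical helper: scale features, reduce with PCA, classify with an SVM."""
    return make_pipeline(
        StandardScaler(),                                # zero mean, unit variance per feature
        PCA(n_components=n_components, random_state=0),  # dimensionality reduction
        SVC(kernel="rbf", C=1.0, gamma="scale"),         # illustrative SVM settings
    )


# Scaling and PCA sit inside the pipeline, so they are refit on each
# training fold and never see the corresponding held-out fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(build_pipeline(), X, y, cv=cv, scoring="accuracy")
print(f"mean CV accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Keeping the scaler and PCA inside the pipeline, rather than fitting them once on the full dataset, is a design choice that avoids leaking information from test folds into the learned transformation.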

Published
2025-04-21
How to Cite
Syed Naseer Ahmad Shah, & Rafat Parveen. (2025). Feature Engineering for Lung Cancer Classification Using Next Generation Sequencing Data. IJRDO - Journal of Computer Science Engineering, 11(1), 31-34. https://doi.org/10.53555/cse.v11i1.6378