The human genome contains ~20,500 annotated protein-coding genes, which together constitute nearly 2% of the entire genome. This gene set serves as a reference dataset in biological studies. However, transcriptome profiling studies have revealed that a large fraction of the human genome is transcribed, and most of these transcripts are considered non-coding. Over the past decade, several ribosome profiling studies have identified significant ribosome occupancy on long non-coding RNAs (lncRNAs), and the translational efficiency of these transcripts is often comparable to that of mRNAs. Moreover, mass spectrometry-based studies have provided direct protein-level evidence for a subset of proteins encoded by annotated non-coding RNAs (ncRNAs). In the last 5 years, small proteins encoded by these annotated non-coding regions have been shown to play important roles in various biological processes, including development, muscle performance and DNA repair. This suggests that genome annotation pipelines have probably misannotated some protein-coding regions as non-coding. We have systematically analyzed publicly available transcriptome, Ribo-Seq and mass spectrometry datasets using comparative genomics and bioinformatics approaches to evaluate the protein-coding potential of annotated ncRNAs. We are developing a machine-learning algorithm that captures various aspects of the RNA-to-protein translation paradigm to identify potential protein-coding candidates among annotated ncRNAs. Identifying novel proteins encoded by ncRNAs will enable researchers to further explore the cellular functions regulated by these proteins and their roles in various human diseases.
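
To illustrate the translational efficiency comparison mentioned above, the following is a minimal sketch, assuming the commonly used definition of translational efficiency as the ratio of length- and depth-normalized Ribo-Seq footprint density to RNA-Seq abundance for the same region. The function names, the RPKM normalization, and the example counts are hypothetical choices made for illustration; they are not taken from the pipeline described in this abstract.

```python
def rpkm(read_count, length_nt, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    if length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length_nt and total_mapped_reads must be positive")
    return read_count * 1e9 / (length_nt * total_mapped_reads)


def translational_efficiency(ribo_reads, rna_reads, length_nt,
                             ribo_total, rna_total):
    """Ratio of Ribo-Seq density to RNA-Seq density for one region.

    Returns None when the region has no RNA-Seq support, because the
    ratio is undefined in that case.
    """
    rna_density = rpkm(rna_reads, length_nt, rna_total)
    if rna_density == 0:
        return None
    ribo_density = rpkm(ribo_reads, length_nt, ribo_total)
    return ribo_density / rna_density


# Hypothetical counts for one lncRNA ORF and one mRNA CDS (illustrative only).
lncrna_te = translational_efficiency(
    ribo_reads=50, rna_reads=100, length_nt=1000,
    ribo_total=1_000_000, rna_total=1_000_000)
mrna_te = translational_efficiency(
    ribo_reads=400, rna_reads=800, length_nt=2000,
    ribo_total=1_000_000, rna_total=1_000_000)
print(lncrna_te, mrna_te)
```

With these made-up counts, both regions give the same ratio, which is the kind of comparability between lncRNAs and mRNAs that the ribosome profiling studies report. In practice the ratio would be computed over many regions and replicates before drawing any conclusion.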