The course introduces students to the wide and competitive world of the genomics business and its growth opportunities as an industry. The lectures were prepared by the Molsys team after rigorous market research and a survey of the literature and web resources.
Course Content Overview
Total Duration: 52 hr
Units
Unit Name Duration
Unit 1: Introduction to Genomics 10 hr
Unit 2: Exploratory data analysis 18 hr
Unit 3: Next Generation Sequencing 08 hr
Unit 4: Genome Informatics 08 hr
Unit 5: Sequence Alignments and Phylogenetics 08 hr


Course Content Overview
Total Duration: 52 hr
Units
Unit Name Duration
Unit 1: Introduction to Statistical Genomics 10 hr
Unit 2: Data Collection & Sampling 11 hr
Unit 3: Probability & Theoretical Distributions 13 hr
Unit 4: Data Clustering 11 hr
Unit 5: Statistical Genomics Case studies 08 hr


Course Content Overview
Total Duration: 52 hr
Units
Unit Name Duration
Unit 1: Introduction to Bioprogramming 10 hr
Unit 2: Biopython 10 hr
Unit 3: Big Data and Data Management 08 hr
Unit 4: Introduction to Node.js and bio-node package 08 hr
Unit 5: Introduction to Hadoop and Spark 16 hr


Course Content Overview
Total Duration: 52 hr
Units
Unit Name Duration
Unit 1: Introduction to Cloud Computing 10 hr
Unit 2: Introduction to Job Schedulers and Node Management 10 hr
Unit 3: Introduction to Genomics on Cloud 10 hr
Unit 4: Introduction to NGS analysis with Hadoop frameworks 12 hr
Unit 5: Introduction to NGS-analysis using Spark 10 hr


  1. Sequence alignment – BLAST and FASTA.
  2. Multiple sequence alignment and protein and DNA motif searches.
  3. Evolutionary studies / Phylogenetic analysis – Analysis of parameters affecting trees.
  4. Gene prediction for prokaryotic and eukaryotic genomes.
  5. Working with NGS databases and NGS file formats.
  6. Quality checking and trimming using free and commercial software.
  7. Bacterial genome assembly using Velvet and SOAPdenovo.
  8. Reference genome assembly using BWA and CLC Genomics Workbench.
  9. Genome Annotation using Gene Ontology (GO).
  10. Identification of SNPs using Cancer genome datasets (GATK Pipeline).
  11. Genome browsers – UCSC and Ensembl genome browsers, and comparative genomics.
  12. Whole genome (WGS), Transcriptome (RNA, Exome) and ChIP-Seq analysis using a cloud-based server.
  13. Unix/Linux Command Line mode, file and directory handling, Vi Editor.
  14. Unix shell scripts – conditional operators, looping, string handling.
  15. Basic R commands, Normalization and Gene expression studies on GEO datasets.
  1. Overview of Python – Working with nucleic acid and protein sequences.
  2. Epigenetics – ChIP-Seq, genotyping arrays and GWAS analysis
  3. Multivariate Analysis
  4. Multi-level Statistical Tests and ANOVA Test
  5. Basic operations in Excel sheets and calculations on large datasets.
  6. Python Programming Advanced
  7. Python file handling and data manipulation
  8. Bash Scripting
  9. Programming on Hadoop and Spark Basics
  10. Programming on DBMS and Node.js
  11. Programming in Hadoop and Spark Advanced level
  12. Exercises on AWS platform: Data storage and access
  13. Data analysis pipeline & deployment in AWS