This demonstration walks through 5 different types of analysis applied to 5 different datasets.
First, please download the following input files:
- Breast tumor Affymetrix dataset 1 subsequently called X (right-click, save-as). The expression of 22,215 genes in 89 early-stage breast tumors (stage I and II) aggressively treated with either tamoxifen or adjuvant chemotherapy.
- Breast tumor Affymetrix dataset 2 subsequently called Xprim (right-click, save-as). The expression of 20,681 genes on 89 early stage breast tumors (stage I and II). Xprim is a subset of X that does not contain genes located on chromosome 8.
- Breast tumor CGH dataset subsequently called Yprim (right-click, save-as). The data set contains 138 unique probes on chromosome 8, measured on the same tumors.
- Breast tumor class dataset (right-click, save-as). A vector containing the estrogen receptor status of the 89 patients.
Sparse PLS analysis: the aim of sPLS is to select correlated or jointly expressed variables from the two data sets.
In this analysis we would like to identify genes that contribute to breast cancer pathophysiologies when deregulated by recurrent aberrations.
A full illustration of the results obtained on these data sets is available in this .pdf file. To go directly to the results page for this analysis, go here.
You can now start the wizard by clicking on the "next" button at the bottom of the page.
The following pages present several options to be chosen by the user. The recommended parameters are listed below
in the following format:
Name of the option | Recommended parameter
(Other options can be chosen but will not be covered in this demonstration).
- Please choose your methodology | (s)PLS
- Please choose your parameters
- Approach | Sparse PLS
- Number of components | 2
- Please choose the mode of computation | Canonical
- How many variables to keep on dimension 1 for X: | 100
- How many variables to keep on dimension 2 for X: | 50
- How many variables to keep on dimension 1-2 for Y: | 138
- Please choose your graphical outputs parameters
- What should be the Correlation Threshold? | 0.7
- Display the X variables label? | Unchecked
- Display the samples label? | Checked
- Add colors to the graphical output? | Checked
- Display two-dimensional visualizations of the correlation matrices within and between two data sets? | Unchecked
- Please choose your export parameters
- Save a Cytoscape-compatible file of the networks representation? | Checked
- Save a GSEA-compatible file for the genes selected with the analysis from the X dataset? | Checked
- Save a GSEA-compatible file of the class dataset? | Checked
- Retrieve the gene information from the iHOP database for the genes selected with the analysis from the X dataset? | Checked
- Source of accession numbers: | Online Mendelian Inheritance in Man
- Type of information desired: | Genes interaction
- Please upload your datasets
- X dataset: | X_affy.csv
- Xprim dataset: | Xprim_affy.csv
- Y dataset: | Yprim_cgh.csv
- Class vector: | ER_status.csv
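The "How many variables to keep" options set the sparsity of each sPLS dimension. In sparse PLS this is commonly achieved by soft-thresholding the loading vector of a component, so that only the requested number of coefficients stays non-zero. The following Python sketch illustrates that single step only; it is not the wizard's implementation, and the function name `soft_threshold_keep` and the toy loading values are assumptions made for the illustration.

```python
import math


def soft_threshold_keep(loadings, keep):
    """Shrink a loading vector so that at most `keep` entries stay non-zero.

    Hypothetical helper: the threshold is set to the (keep + 1)-th largest
    absolute loading, every coefficient is shrunk towards zero by that amount,
    and coefficients at or below the threshold become exactly zero.
    """
    if keep >= len(loadings):
        return list(loadings)  # nothing to discard
    magnitudes = sorted((abs(v) for v in loadings), reverse=True)
    threshold = magnitudes[keep]
    return [
        math.copysign(abs(v) - threshold, v) if abs(v) > threshold else 0.0
        for v in loadings
    ]


# Toy loading vector for five variables (made-up values), keeping two of them.
toy_loadings = [0.9, -0.5, 0.1, 0.3, -0.05]
sparse = soft_threshold_keep(toy_loadings, keep=2)
selected = [i for i, v in enumerate(sparse) if v != 0.0]
print("indices of selected variables:", selected)
```

With the recommended settings, the same idea would keep 100 X variables on dimension 1 and 50 on dimension 2, while all 138 Y variables are retained.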
First, please download the following input files:
- Multidrug ABC transporter dataset (right-click, save-as). The expression of the 48 human ABC transporters measured by real-time quantitative RT-PCR for each cell line. There is one missing value in this data set.
- Multidrug class dataset (right-click, save-as). A vector containing the phenotypes of the 60 cell lines.
PCA analysis: the aim of PCA is to uncover the main structure in the data and to reduce its dimension
to a smaller number of variables (the "principal components") that summarize most of the information in the data set.
It also helps to identify possible artifacts. In this analysis we would like to understand the relationships between
the expression levels of the ABC transporters and how they relate to the different cell line types.
You can now start the wizard by clicking on the "next" button at the bottom of the page.
- Please choose your methodology | (s)PCA
- Please choose your parameters
- Approach | PCA
- Number of components | 2
- Center the data | Checked
- Scale the data | Checked
- Please choose your graphical outputs parameters
- Display the X variables label? | Checked
- Display the samples label? | Unchecked
- Add colors to the graphical output? | Checked
- Please choose your export parameters
- Please upload your datasets
- X dataset: | multi_abc_trans.csv
- Class vector: | multi_colour.csv
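As an illustration of what the "Center the data" and "Scale the data" options do before the components are extracted, here is a minimal Python sketch using only the standard library. It standardizes each column and then finds the first principal axis by power iteration. The helper names, the toy values, and the choice to replace a missing value by its column mean are all assumptions made for this sketch; the wizard may handle the missing value in the ABC transporter data differently.

```python
import math


def standardize(rows):
    """Center each column to mean 0 and scale it to unit variance.

    Missing values are given as None. Assumption of this sketch: a missing
    value is replaced by the column mean, which is 0 after centering.
    Columns with zero variance are not handled.
    """
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        mean = sum(observed) / len(observed)
        sd = math.sqrt(sum((v - mean) ** 2 for v in observed) / (len(observed) - 1))
        for r in out:
            r[j] = 0.0 if r[j] is None else (r[j] - mean) / sd
    return out


def first_component(rows, n_iter=200):
    """Leading principal axis and sample scores of centered data (power iteration)."""
    p = len(rows[0])
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero starting vector
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(r, v)) for r in rows]            # X v
        v = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(w * w for w in v))
        if norm == 0.0:
            raise ValueError("degenerate data or starting vector")
        v = [w / norm for w in v]
    scores = [sum(x * w for x, w in zip(r, v)) for r in rows]
    return v, scores


# Toy data: 5 samples x 3 variables, one missing value (made-up numbers).
toy = [
    [1.0, 2.0, 5.0],
    [2.0, 4.1, 3.9],
    [3.0, None, 3.1],
    [4.0, 8.2, 2.0],
    [5.0, 9.9, 1.2],
]
scaled = standardize(toy)
loadings, scores = first_component(scaled)
print("loadings:", [round(w, 3) for w in loadings])
```

Scaling matters when variables are measured on different ranges: without it, the variable with the largest variance would dominate the first component.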
First, please download the following input files:
- Nutrimouse gene dataset (right-click, save-as). Expression of 120 genes measured in liver cells for 40 mice, selected (among about 30,000) as potentially relevant in the context of the nutrition study.
- Nutrimouse lipid dataset (right-click, save-as). Concentrations (in percentages) of 21 hepatic fatty acids measured by gas chromatography in the same mice.
- Nutrimouse class dataset (right-click, save-as). A 5-level factor for the diet. Oils used for experimental diets preparation were corn and colza oils (50/50) for a reference diet (REF), hydrogenated coconut oil for a saturated fatty acid diet (COC), sunflower oil for an Omega6 fatty acid-rich diet (SUN), linseed oil for an Omega3-rich diet (LIN) and corn/colza/enriched fish oils for the FISH diet (43/43/14).
rCCA analysis: the aim of rCCA is to highlight the correlations between the fatty acids and the genes and see how these
variables illustrate the effects of the genotypes and the diets of the mice.
You can now start the wizard by clicking on the "next" button at the bottom of the page.
- Please choose your methodology | (r)CCA
- Please choose your parameters
- Number of components | 3
- Lambda 1 value: | 0.064
- Lambda 2 value: | 0.008096
- Please choose your graphical outputs parameters
- What should be the Correlation Threshold? | 0.6
- Display the X variables label? | Checked
- Display the samples label? | Unchecked
- Add colours to the graphical output? | Checked
- Display two-dimensional visualizations of the correlation matrices within and between two data sets? | Unchecked
- Display a color-coded Clustered Image Maps (CIMs) ("heat maps")? | Checked
- Please choose your export parameters
- Save a Cytoscape-compatible file of the networks representation? | Checked
- Please upload your datasets
- X dataset: | nutri_gene.csv
- Y dataset: | nutri_clinic.csv
- Class vector: | nutri_colour.csv
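The two lambda values are the regularization parameters of rCCA. In the usual ridge-type formulation, which is assumed here, each one is added to the diagonal of a within-set covariance matrix so that the matrix stays invertible when a data set has more variables than samples (as with the 120 genes measured on 40 mice). Writing S_XX and S_YY for the within-set covariance matrices, S_XY for the cross-covariance matrix, and p and q for the numbers of X and Y variables (notation introduced for this sketch), the first pair of canonical vectors (a, b) maximises:

```latex
\rho_1 \;=\; \max_{a,\,b}\;
\frac{a^{\top} S_{XY}\, b}
     {\sqrt{a^{\top}\left(S_{XX} + \lambda_1 I_p\right) a}\;
      \sqrt{b^{\top}\left(S_{YY} + \lambda_2 I_q\right) b}}
```

Setting both lambdas to 0 gives back classical CCA. Under this formulation, Lambda 1 (0.064) applies to the X (gene) data set and Lambda 2 (0.008096) to the Y (lipid) data set.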
First, please download the following input files:
sPLS analysis: the aim of sPLS is to select correlated or jointly expressed variables from the two data sets.
In this specific example, we want to know whether there is a subset of clinical variables and a subset of transcripts
that can give more insight into paracetamol toxicity in the liver.
You can now start the wizard by clicking on the "next" button at the bottom of the page.
- Please choose your methodology | (s)PLS
- Please choose your parameters
- Approach | Sparse PLS
- Number of components | 3
- Please choose the mode of computation | Regression
- How many variables to keep on dimension 1-3 for X: | 50
- How many variables to keep on dimension 1-3 for Y: | 5
- Please choose your graphical outputs parameters
- What should be the Correlation Threshold? | 0.7
- Display the X variables label? | Checked
- Display the samples label? | Unchecked
- Add colors to the graphical output? | Checked
- Display two-dimensional visualizations of the correlation matrices within and between two data sets? | Unchecked
- Display a color-coded Clustered Image Maps (CIMs) (heatmap of the variables)? | Unchecked
- Please choose your export parameters
- Save a Cytoscape-compatible file of the networks representation? | Checked
- Save a GSEA-compatible file for the genes selected with the analysis from the X dataset? | Checked
- Save a GSEA-compatible file of the class dataset? | Checked
- Retrieve the gene information from the iHOP database for the genes selected with the analysis from the X dataset? | Checked
- Source of accession numbers: | NCBI Gene ID
- Type of information desired: | Genes interaction
- Please upload your datasets
- X dataset: | liver_gene.csv
- Y dataset: | liver_clinic.csv
- Class vector: | liver_colour.csv
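The "Correlation Threshold" option decides which X-Y pairs appear in the network representation: only pairs whose association is strong enough in absolute value are kept as edges. The Python sketch below shows that filtering step under a simplifying assumption: it uses the plain Pearson correlation between raw variables as the association measure, whereas the wizard is expected to derive associations from the fitted sPLS components. The variable names and values are made up.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def network_edges(x_vars, y_vars, threshold=0.7):
    """Return (x_name, y_name, r) for every pair with |r| >= threshold."""
    edges = []
    for x_name, x_values in x_vars.items():
        for y_name, y_values in y_vars.items():
            r = pearson(x_values, y_values)
            if abs(r) >= threshold:
                edges.append((x_name, y_name, r))
    return edges


# Hypothetical transcripts (X) and clinical variables (Y) on five samples.
x_vars = {"gene_a": [1, 2, 3, 4, 5], "gene_b": [3, 5, 1, 4, 2]}
y_vars = {"clin_1": [2, 4, 6, 8, 10], "clin_2": [5, 3, 4, 1, 2]}

for x_name, y_name, r in network_edges(x_vars, y_vars, threshold=0.7):
    print(x_name, y_name, round(r, 2))
```

Negative associations pass the filter as well, since the threshold is applied to the absolute value. Raising the threshold produces a smaller, easier-to-read network.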
First, please download the following input files:
sIPCA analysis: IPCA can be used as an alternative to PCA and sometimes yields more "meaningful" components. The aim is to discover interesting aspects
in the data and to reduce the dimension of the data to a smaller number of variables (the "independent principal components")
which summarize most of the information in the data set. In this analysis we would like to see if the gene expression can help cluster the samples
according to the biological conditions. The sparse version makes it possible to select the relevant genes.
You can now start the wizard by clicking on the "next" button at the bottom of the page.
- Please choose your methodology | (s)IPCA
- Please choose your parameters
- Approach | Sparse IPCA
- Number of components | 2
- Please choose the mode of computation | Deflation
- How many variables to keep on dimension 1-2 for X: | 50
- Please choose your graphical outputs parameters
- Display the X variables label? | Checked
- Display the samples label? | Unchecked
- Add colors to the graphical output? | Checked
- Please choose your export parameters
- Save a GSEA-compatible file for the genes selected with the analysis from the X dataset? | Unchecked
- Save a GSEA-compatible file of the classes dataset? | Unchecked
- Retrieve the gene information from the iHOP database for the genes selected with the analysis from the X dataset? | Checked
- Source of accession numbers: | NCBI Gene ID
- Type of information desired: | Genes interaction
- Please upload your datasets
- X dataset: | liver_gene_id.csv
- Class vector: | liver_toxicity_timeanddose.csv
First, please download the following input files:
sPLS-DA analysis: the aim of the analysis is to select the discriminative genes that help separate the different
classes of tumours. Based on this subset of potential biomarkers extracted from our training data, we can then assess
whether the classes of a new test set of samples are correctly predicted.
You can now start the wizard by clicking on the "next" button at the bottom of the page.
- Please choose your methodology | (s)PLS-DA
- Please choose your parameters
- Approach | Sparse PLS-DA
- Predict the class of samples from a new test set? | Checked
- Number of components | 3
- How many variables to keep on dimension 1-3: | 50
- Please choose your graphical outputs parameters
- Display the X variables label? | Checked
- Display the samples label? | Unchecked
- Please choose your export parameters
- Save a Cytoscape-compatible file of the networks representation? | Checked
- Save a GSEA-compatible file for the genes selected with the analysis from the X dataset? | Checked
- Save a GSEA-compatible file of the class dataset? | Checked
- Retrieve the gene information from the iHOP database for the genes selected with the analysis from the X dataset? | Unchecked
- Please upload your datasets
- Train dataset X (size n x p): | srbct_gene_train.csv
- Train class vector Y (length n): | srbc_class_train.csv
- Test dataset X' (size n' x p): | srbct_gene_test.csv
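The sizes given above (n x p, length n, n' x p) mean that the training and test datasets must share the same p variables, and that the class vector must have one entry per training sample. PLS-DA is usually described as a PLS regression on a dummy-coded (one column per class) version of the class vector. The Python sketch below shows these two preparatory steps only; the function names and the class labels "A", "B", "C" are placeholders, not the labels used in the demonstration files.

```python
def check_dimensions(x_train, y_train, x_test):
    """Check that X is n x p, Y has length n and X' is n' x p; return (n, p, n')."""
    n, p = len(x_train), len(x_train[0])
    if len(y_train) != n:
        raise ValueError("class vector length must equal the number of training samples")
    if any(len(row) != p for row in x_train + x_test):
        raise ValueError("every training and test sample must have the same p variables")
    return n, p, len(x_test)


def dummy_matrix(classes):
    """One-hot code a class vector: one row per sample, one column per class level."""
    levels = sorted(set(classes))
    return levels, [[1 if c == level else 0 for level in levels] for c in classes]


# Toy example: 4 training samples, 2 variables, 1 test sample (made-up values).
x_train = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
y_train = ["B", "A", "C", "A"]
x_test = [[9.0, 10.0]]

n, p, n_test = check_dimensions(x_train, y_train, x_test)
levels, y_dummy = dummy_matrix(y_train)
print("n =", n, "p =", p, "n' =", n_test)
print("class levels:", levels)
```

Running such a check before uploading can catch a test file whose columns do not match the training file.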