2. Quick Start

2.1. ANOVA analysis

This quick start assumes you already know what GDSCTools is for and that you have a well-formatted IC50 matrix and a binary genomic features matrix. You can then run the entire ANOVA analysis as follows:

from gdsctools import ANOVA
# For example, use these test files
# from gdsctools import ic50_test as IC50_filename
# from gdsctools import gf_v17 as genomic_feature_filename
gdsc = ANOVA(IC50_filename, genomic_feature_filename)
results = gdsc.anova_all()

You can then create an HTML report as follows:

from gdsctools import ANOVAReport
report = ANOVAReport(gdsc, results)
report.create_html_pages()

The results variable contains all tested associations within a single dataframe. The report will focus on significant associations and create boxplots or volcano plots accordingly.

More details about the ANOVA analysis itself can be found in the ANOVA analysis (introduction) and The ANOVA analysis in details sections. The input data formats are described in the next section (Data Format and Readers).
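
As a rough orientation before reading that section, the sketch below writes two tiny CSV files of the kind the ANOVA class might accept. The column names (COSMIC_ID, Drug_1_IC50, TISSUE_FACTOR, MSI_FACTOR, MEDIA_FACTOR, GENE_A_mut) and every value are assumptions made up for illustration; the Data Format and Readers section remains the reference for the exact layout.

```python
import csv
import os
import tempfile

# ASSUMPTION: all column names and values below are illustrative placeholders,
# not an authoritative description of the GDSCTools input format.


def write_csv(path, header, rows):
    """Write a header line followed by the data rows to a CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


workdir = tempfile.mkdtemp()
IC50_filename = os.path.join(workdir, "ic50_example.csv")
genomic_feature_filename = os.path.join(workdir, "gf_example.csv")

# IC50 matrix: one row per cell line, one column per drug.
# An empty cell stands for a missing measurement.
write_csv(
    IC50_filename,
    ["COSMIC_ID", "Drug_1_IC50", "Drug_2_IC50"],
    [[1001, 2.31, ""], [1002, 0.87, 4.10], [1003, "", 3.55]],
)

# Genomic features: one row per cell line, a few factor columns,
# then one binary (0/1) column per genomic feature.
write_csv(
    genomic_feature_filename,
    ["COSMIC_ID", "TISSUE_FACTOR", "MSI_FACTOR", "MEDIA_FACTOR",
     "GENE_A_mut", "GENE_B_mut"],
    [
        [1001, "lung", 0, "R", 1, 0],
        [1002, "breast", 0, "D/F12", 0, 0],
        [1003, "lung", 1, "R", 1, 1],
    ],
)

# Sanity check: every feature column holds only 0 or 1.
with open(genomic_feature_filename, newline="") as handle:
    rows = list(csv.DictReader(handle))
feature_columns = [name for name in rows[0] if name.endswith("_mut")]
is_binary = all(row[col] in {"0", "1"} for row in rows for col in feature_columns)
```

Three cell lines are far too few for a meaningful analysis; the files only illustrate the shape of the inputs, with one row per cell line and one column per drug or feature.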

2.2. Regression analysis

Similarly, for the regression analysis, you can write a script as follows:

from gdsctools import GDSCLasso
lasso = GDSCLasso(IC50_filename, genomic_feature_filename)
for drugid in lasso.drugIds:
    res = lasso.runCV(drugid, kfolds=8)
    best_model = lasso.get_model(alpha=res.alpha)
    weights = lasso.plot_weight(drugid, best_model)
    boxplots = lasso.boxplot(drugid, model=best_model, n=10, bx_vert=False)

However, we recommend using a workflow designed for this analysis. In a shell, type:

gdsctools_regression -I IC50_filename -F genomic_feature_filename \
    --method lasso -o analysis
cd analysis

Edit the config.yaml file to change any parameters. Then, execute the pipeline:

snakemake -s regression.rules

See the Regression analysis section for details.
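
The kfolds=8 argument in the script at the start of this section is understood here as requesting 8-fold cross-validation. As a minimal sketch of that idea in plain Python (this is not how GDSCTools implements it; kfold_indices and its parameters are hypothetical names used only here), the samples are split into 8 groups and each group is held out once while a model would be fitted on the rest:

```python
import random


def kfold_indices(n_samples, kfolds, seed=0):
    """Yield (train, test) index lists, one pair per fold.

    Hypothetical helper for illustration; not part of GDSCTools.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every kfolds-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    folds = [indices[i::kfolds] for i in range(kfolds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 20 cell lines split into 8 folds.
splits = list(kfold_indices(n_samples=20, kfolds=8))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} training, {len(test)} held out")
```

Each sample is held out exactly once, so the error averaged over the folds gives an estimate of how a given regularisation strength generalises to unseen cell lines.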