By signatures of perturbagen response (which we call pharmaco-response signatures) we mean a set of reduced-dimensionality descriptors of the underlying data that provide insight into mechanism and serve as predictors. To develop meaningful signatures, data from diverse assays must be scaled and normalized. Normalizing and scaling heterogeneous, multi-parameter dose-response data is a non-trivial task involving both conceptual and practical hurdles, because a very diverse set of measurements must be translated into comparable units. Initially, we will most likely reduce dimensionality by combining data into simple sets of biologically informative parameters, such as the fraction of cells in G0, in mitosis, or in a mesenchymal state. We will also explore more complex methods that explicitly take cellular heterogeneity into account.
Normalization by z-scores
A z-score, which can take positive or negative values, is assigned based on how many standard deviations the mean response in a given well lies from the mean of control wells (typically on the same plate, to minimize experimental noise). Z-scores reduce diverse measurements to a common unit that has real meaning, albeit statistical rather than mechanistic; they carry no mechanistic hypothesis. They also allow scoring of cellular heterogeneity through the use of non-parametric statistics. We took this approach to normalization in a previous large-scale microscopy analysis of dose-response data (Perlman et al. 2004 PMID 15539606). Individual measurements made before and during drug treatment are not explicitly connected; they are scored separately, such that every measurement at every drug concentration and time is compared to the same measurement in all other cell lines. Z-scores can be used to derive dose-response curves, calculate EC50s, and perform unsupervised clustering of perturbagens or cell lines into statistically significant subsets in which, for example, targets differ or networks are wired differently.
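The plate-level calculation described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the cited study: the function name `plate_z_scores` and the example well values are hypothetical, and the use of the sample standard deviation of the control wells is an assumption.

```python
from statistics import mean, stdev

def plate_z_scores(treated, controls):
    """Z-score each treated-well mean against control wells from the same plate.

    `treated` and `controls` are lists of per-well mean responses.
    Assumption: the sample standard deviation of the controls is the scale.
    """
    mu = mean(controls)
    sigma = stdev(controls)  # needs at least two control wells
    if sigma == 0:
        raise ValueError("control wells show no variation; z-score is undefined")
    return [(x - mu) / sigma for x in treated]

# Hypothetical plate: three treated wells, four control wells.
z = plate_z_scores([0.50, 0.20, 0.35], [0.30, 0.40, 0.35, 0.35])
print(z)  # positive above the control mean, negative below, near zero at the mean
```

A rank-based or median-based variant would be the natural substitute where the single-cell distributions are far from normal.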
Fractional response units
In this approach, data are normalized and scaled such that each measurement is represented as the fractional change in activity after drug treatment relative to the pre-treatment condition. This inevitably imposes a cause-effect structure on the data, namely that measured values change as a consequence of drug treatment. A signature based on fractional responses makes the response itself the key parameter, rather than the absolute pre- or post-treatment values. Data of this type are well suited to partial least squares regression (PLSR), in which the minimum set of measurements predictive of drug response is identified. Unit-variance scaling can be applied to further normalize fractional response datasets, so that each measurement contributes appropriately to mechanistic or statistical models based on the data. The Sorger lab has considerable experience with this type of modeling.
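As a rough sketch of the two steps named above, the fractional response can be computed as (post - pre) / pre and each measurement then scaled to unit variance across samples. The function names and example values are hypothetical. Mean-centering before scaling is an assumption (it is common before PLSR but is not stated in the text), as is the requirement that pre-treatment values be non-zero.

```python
from statistics import mean, stdev

def fractional_response(pre, post):
    """Fractional change of each measurement relative to its pre-treatment value.

    Assumption: every pre-treatment value is non-zero.
    """
    return [(after - before) / before for before, after in zip(pre, post)]

def unit_variance_scale(values):
    """Center one measurement across samples and scale it to unit variance.

    Assumption: mean-centering is applied together with the scaling.
    """
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # a constant measurement carries no signal
    return [(v - mu) / sd for v in values]

# Hypothetical example: one measurement in three cell lines, before and after drug.
pre = [10.0, 20.0, 40.0]
post = [5.0, 20.0, 20.0]
frac = fractional_response(pre, post)  # [-0.5, 0.0, -0.5]
scaled = unit_variance_scale(frac)
print(frac, scaled)
```

After scaling, every measurement has the same variance, so no single assay dominates a regression simply because of its numerical range.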
IC50, EC50 and Emax units
A conventional and powerful approach in pharmacology is to assume a monotonic, S-shaped relationship between drug concentration and biological effect, and to report the drug concentration causing a half-maximal effect (EC50), the level of the high-concentration plateau (Emax), and (sometimes) a slope parameter. IC50 and Emax parameters for pure protein reactions have simple meanings, related to the Michaelis-Menten Km and Vmax parameters. They are more difficult to interpret for cellular responses; a cellular EC50 is not the same as the pure-protein EC50, for many reasons. These parameters have nevertheless been extremely useful, and they are a natural reporting unit for small-molecule perturbagen responses. We will compute them for our dose-response experiments by fitting response data to Hill functions. We know from Mitchison’s previous multi-parameter dose-response study that different measured parameters show different IC50 values, either because (1) a larger or smaller fraction of the target must be inhibited to perturb different aspects of the downstream pathway or (2) poly-pharmacology becomes important at high concentration. Reducing dose-response data to EC50 and Emax values has two advantages: (1) it reduces data complexity and (2) the values have direct physical meaning that will resonate with medicinal chemists. How well the data can be fit to a series of S-shaped curves, one for each parameter, and how well a pharmaco-response signature (PRS) based on this kind of model-based data reduction captures mechanistic or genomic variation, are questions that need to be tested by running the experiment.
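To make the fitting step concrete, the sketch below fits a three-parameter Hill function with a zero baseline, E(c) = Emax * c^n / (EC50^n + c^n), to one dose-response curve. It uses a plain grid search over EC50 and slope, with Emax solved in closed form at each grid point, purely to keep the example dependency-free; a production analysis would more likely use a nonlinear least-squares routine. The function names, grid ranges, candidate slopes and synthetic data are all illustrative assumptions.

```python
def hill(conc, emax, ec50, n):
    """Hill function with zero baseline: emax * c^n / (ec50^n + c^n)."""
    return emax * conc ** n / (ec50 ** n + conc ** n)

def fit_hill(concs, responses, log_ec50_range=(-3.0, 3.0), steps=121,
             slopes=(0.5, 1.0, 1.5, 2.0, 3.0, 4.0)):
    """Least-squares Hill fit by grid search (a simplification, see lead-in).

    For each candidate (EC50, slope) pair the best Emax is the linear
    least-squares solution sum(y*f) / sum(f*f), where f is the unit-Emax curve.
    """
    lo, hi = log_ec50_range
    best = None
    for i in range(steps):
        ec50 = 10 ** (lo + (hi - lo) * i / (steps - 1))
        for n in slopes:
            f = [hill(c, 1.0, ec50, n) for c in concs]
            denom = sum(v * v for v in f)
            if denom == 0:
                continue
            emax = sum(y * v for y, v in zip(responses, f)) / denom
            sse = sum((y - emax * v) ** 2 for y, v in zip(responses, f))
            if best is None or sse < best[0]:
                best = (sse, emax, ec50, n)
    sse, emax, ec50, n = best
    return {"emax": emax, "ec50": ec50, "hill_slope": n, "sse": sse}

# Synthetic, noise-free curve (arbitrary concentration units).
concs = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0]
responses = [hill(c, 0.8, 1.0, 2.0) for c in concs]
fit = fit_hill(concs, responses)
print(fit)
```

The residual sum of squares returned alongside the parameters gives one simple handle on the question raised above, namely how well a given measurement is described by an S-shaped curve at all.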
Based on the metrics described above, we will classify and cluster cell lines, compare dose-response surfaces, and generate similarity metrics for all pair-wise comparisons. We will do this in two ways: concentration-dependent and concentration-independent. The first will be a standard similarity metric that directly compares the dose-response surfaces. The second will compute similarity as a function of all possible left and right shifts along the concentration axis, so that we can score maximal similarity in response independent of the precise potency of the drug on each cell line. This will tend to mask differences between cell lines that are due to, for example, drug permeability or drug metabolism, and to highlight differences in downstream response pathways. Both approaches will be used to identify genomic signatures, and we anticipate that they may reveal different aspects of similarity.
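The difference between the two comparisons can be illustrated on a simplified case: two response curves sampled on the same log-spaced concentration grid. The sketch below uses root-mean-square deviation (RMSD) as a stand-in distance and searches over integer grid shifts for the alignment that minimizes it. The choice of RMSD, the restriction to one-dimensional curves rather than full surfaces, the `min_overlap` parameter and all names are assumptions made for illustration only.

```python
import math

def rmsd(a, b):
    """Concentration-dependent distance: direct point-by-point comparison."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def shifted_rmsd(a, b, shift):
    """RMSD over the overlap after sliding b by `shift` grid steps.

    A positive shift aligns b[i] with a[i + shift], i.e. b is moved
    toward higher concentrations.
    """
    if shift >= 0:
        a_part, b_part = a[shift:], b[:len(b) - shift]
    else:
        a_part, b_part = a[:len(a) + shift], b[-shift:]
    return rmsd(a_part, b_part)

def best_shift_distance(a, b, min_overlap=4):
    """Concentration-independent distance: the minimum RMSD over all shifts.

    Assumption: both curves have equal length and at least `min_overlap`
    points must overlap, so very short overlaps cannot win by chance.
    """
    if len(a) != len(b):
        raise ValueError("curves must share the same concentration grid")
    max_shift = len(a) - min_overlap
    candidates = [(shifted_rmsd(a, b, s), s)
                  for s in range(-max_shift, max_shift + 1)]
    # Prefer the smallest distance, then the smallest absolute shift.
    return min(candidates, key=lambda t: (t[0], abs(t[1])))

# Hypothetical curves: same shape, but the drug is more potent on line b.
line_a = [0.0, 0.0, 0.1, 0.5, 0.9, 1.0, 1.0, 1.0]
line_b = [0.1, 0.5, 0.9, 1.0, 1.0, 1.0, 1.0, 1.0]
print(rmsd(line_a, line_b))                 # clearly non-zero
print(best_shift_distance(line_a, line_b))  # distance and the shift that achieves it
```

In this toy case the direct comparison reports a substantial difference, while the shift-tolerant comparison finds an alignment at which the two curves coincide, which is the behavior intended for potency differences arising upstream of the response pathway.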