Title: | Differential Gene Expression (DGE) Analysis Results Data Object |
---|---|
Description: | Provides a flexible container to manage and annotate Differential Gene Expression (DGE) analysis results (Ritchie et al. (2015) <doi:10.1093/nar/gkv007>). The DGEobj has data slots for row (gene), col (samples), assays (matrix n-rows by m-samples dimensions) and metadata (not keyed to row, col, or assays). A set of accessory functions to deposit, query and retrieve subsets of a data workflow has been provided. Attributes are used to capture metadata such as species and gene model, including reproducibility information such that a 3rd party can access a DGEobj history to see how each data object was created or modified. Since the DGEobj is customizable and extensible, it is not limited to RNA-seq analysis types of workflows -- it can accommodate nearly any data analysis workflow that starts from a matrix of assays (rows) by samples (columns). |
Authors: | John Thompson [aut], Connie Brett [aut, cre], Isaac Neuhaus [aut], Ryan Thompson [aut] |
Maintainer: | Connie Brett <[email protected]> |
License: | GPL-3 |
Version: | 1.1.2 |
Built: | 2024-11-04 04:04:23 UTC |
Source: | https://github.com/cb4ds/dgeobj |
DGEobj is an S3 data class that provides a flexible container for Differential Gene Expression (DGE) analysis results. The DGEobj class is designed to be extensible allowing definition of new data types as needed. A set of accessory functions to deposit, query and retrieve subsets of a data workflow has been provided. Attributes are used to capture metadata such as species and gene model, including reproducibility information such that a 3rd party can access a DGEobj history to see how each data object was created or modified.
Operationally, the DGEobj is influenced by the RangedSummarizedExperiment (RSE). The DGEobj has data slots for row (gene), col (samples), assays (anything with n-rows by m-samples dimensions) and metadata (anything that can't be keyed to row, col or assay). The key motivation for creating the DGEobj data structure is that the RSE only allows one data item each in the row and col slots and thus is unsuitable for capturing the plethora of data objects created during a typical DGE workflow. The DGEobj data structure can hold any number of row and col data objects and thus is suitable for capturing the multiple steps of a downstream analysis.
Certain object types, primarily the count matrix and its associated row and column info, are defined as unique, which means only one instance of that type may be added to a DGEobj.
When multiple objects of one type are included in a DGEobj (e.g. two different fits), the concept of parent attributes is used to associate downstream data objects (e.g. contrasts) with the appropriate data object they are derived from.
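The parent concept can be illustrated with plain base R. The sketch below is not the package's internal layout: it uses an ordinary named list as a stand-in for a DGEobj, and the helpers `tag()` and `children_of()` are hypothetical names introduced only for this illustration.

```r
# Hypothetical stand-in for a DGEobj: a plain named list whose items carry
# "type" and "parent" attributes. This is NOT the package's internal layout.
tag <- function(item, type, parent = "") {
  attr(item, "type") <- type
  attr(item, "parent") <- parent
  item
}

obj <- list(
  fitA      = tag(list(model = "~ treatment"), type = "fit"),
  fitB      = tag(list(model = "~ treatment + batch"), type = "fit"),
  contrastA = tag(data.frame(logFC = c(1.2, -0.4)), type = "topTable", parent = "fitA"),
  contrastB = tag(data.frame(logFC = c(0.9, -0.1)), type = "topTable", parent = "fitB")
)

# Return the names of the items derived from one particular parent
children_of <- function(x, parent) {
  keep <- vapply(x, function(item) identical(attr(item, "parent"), parent), logical(1))
  names(x)[keep]
}

children_of(obj, "fitB")
```

With two fits stored side by side, the parent attribute is what lets each contrast table be traced back to the fit it came from.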
browseVignettes(package = 'DGEobj')
Subset with square brackets
## S3 method for class 'DGEobj'
x[...]
x |
A DGEobj |
... |
Additional parameters |
A DGEobj
Add a data item
addItem( dgeObj, item, itemName, itemType, funArgs = match.call(), itemAttr, parent = "", overwrite = FALSE, init = FALSE )
dgeObj |
A class DGEobj created by function initDGEobj() |
item |
The data item to be deposited in the DGEobj |
itemName |
The user-assigned name for this data item |
itemType |
The type attribute. See showTypes() to see the predefined types – types are extensible with the newType() function. |
funArgs |
(optional) A text field to annotate how the data object was created. If the result of match.call() is passed as this argument, the name and arguments used in the current function are captured |
itemAttr |
(optional) A named list of attributes to add directly to the item |
parent |
(optional) itemName of the parent of this item |
overwrite |
Whether to overwrite a matching data object stored in the itemName slot (default = FALSE) |
init |
Internal Use (default = FALSE) |
A DGEobj
## Not run:
myFunArgs <- match.call() # Capture calling function and arguments
myDGEobj <- addItem(myDGEobj,
                    item = MyCounts,
                    itemName = "counts",
                    itemType = "counts",
                    funArgs = myFunArgs)
## End(Not run)
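The funArgs argument builds on base R's match.call(), which only returns something useful when evaluated inside a function. The sketch below shows what it captures; `normalizeCounts()` is a hypothetical workflow step, and storing the call text in a "funArgs" attribute is an assumption made for illustration rather than a description of how addItem() stores it.

```r
# Hypothetical workflow step; function and argument names are illustrative only.
normalizeCounts <- function(counts, method = "TMM", prior = 1) {
  callInfo <- match.call()  # the call as written by the user, with full argument names
  result <- log2(counts + prior)
  # Keep the call as text alongside the result (assumed storage, for illustration)
  attr(result, "funArgs") <- paste(deparse(callInfo), collapse = " ")
  result
}

m <- matrix(1:4, nrow = 2)
out <- normalizeCounts(m, prior = 0.5)
attr(out, "funArgs")
```

Arguments left at their defaults (here `method`) do not appear in the captured call, so defaults that matter for reproducibility are best passed explicitly.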
Add multiple data items
addItems(dgeObj, itemList, itemTypes, parents, itemAttr, overwrite = FALSE)
dgeObj |
A class DGEobj created by function initDGEobj() |
itemList |
A named list of data items to add to DGEobj |
itemTypes |
A list of type values for each item in itemList |
parents |
(optional, but highly recommended) A list of parent values for each item in itemList |
itemAttr |
(optional) A named list of attributes to add to each item. These attributes will be attached to all items in the call. |
overwrite |
Whether to overwrite a matching data object stored in the itemName slot (default = FALSE) |
A DGEobj
## Not run:
# NOTE: Requires the edgeR package

# Add normalized counts and log2CPM as additional "assay" items in the DGEobj
dgeObj <- readRDS(system.file("exampleObj.RDS", package = "DGEobj"))
dgeList <- edgeR::calcNormFactors(edgeR::DGEList(dgeObj$counts), method = "TMM")
log2cpm <- edgeR::cpm(dgeList, log = TRUE)
dgeObj <- addItems(dgeObj,
                   itemList = list(newDgelist = dgeList, Log2CPM = log2cpm),
                   itemTypes = list("assay", "assay"),
                   parents = list("counts", "newDgelist"))
inventory(dgeObj)
## End(Not run)
Reads an annotation file containing key/value pairs, or a named list, and attaches the pairs as attributes to a DGEobj. If a file is used, it should be a text file containing key/value pairs separated by an equals sign. The keys argument specifies which keys to capture as attributes on the DGEobj.
annotateDGEobj(dgeObj, annotations, keys = NULL)
dgeObj |
An object of class DGEobj created by function initDGEobj() |
annotations |
Either a character string path to a file with annotations given as key/value pairs separated by an equals sign, or a named list of key/value pairs |
keys |
By default (value = NULL), all keys are read in and applied as DGEobj attributes. Use the keys argument to specify a specific list of keys to read from the file. |
A DGEobj
# Load the example object (system.file() alone only returns its path)
MyDgeObj <- readRDS(system.file("exampleObj.RDS", package = "DGEobj"))

## Not run:
# using a text file of key=value pairs
annotationFile <- "/location/to/myAnnotations.txt"
MyDgeObj <- annotateDGEobj(MyDgeObj, annotationFile)

# using a named list of key/values
annotations <- list(Title = "Rat Liver Slices from Bile Duct Ligation animals",
                    Organism = "Rat",
                    GeneModel = "Ensembl.R89")
MyDgeObj <- annotateDGEobj(MyDgeObj, annotations)
## End(Not run)
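For readers unsure what such an annotation file looks like, the following base R sketch writes a small key=value file and reads it back into a named list, optionally restricted to selected keys. `readKeyValues()` is a hypothetical helper written for this illustration and is not how annotateDGEobj() is implemented; the file contents are made up.

```r
# Write a small example annotation file (contents are made up for illustration)
annotationFile <- tempfile(fileext = ".txt")
writeLines(c("Title=Rat Liver Slices",
             "Organism=Rat",
             "GeneModel=Ensembl.R89"),
           annotationFile)

# Hypothetical helper: read key=value lines into a named list
readKeyValues <- function(path, keys = NULL) {
  lines <- trimws(readLines(path))
  lines <- lines[grepl("=", lines, fixed = TRUE)]  # ignore lines without "="
  # Split on the first "=" only, so values may themselves contain "="
  pos <- as.integer(regexpr("=", lines, fixed = TRUE))
  k <- trimws(substr(lines, 1, pos - 1))
  v <- trimws(substr(lines, pos + 1, nchar(lines)))
  out <- as.list(stats::setNames(v, k))
  if (!is.null(keys)) out <- out[names(out) %in% keys]
  out
}

annotations <- readKeyValues(annotationFile, keys = c("Organism", "GeneModel"))
names(annotations)
```

A named list produced this way has the same shape as the list form accepted by the annotations argument.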
Cast as a simple list
## S3 method for class 'DGEobj'
as.list(x, ...)
x |
A DGEobj |
... |
Additional parameters |
A named list
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

mylist <- as.list(exObj)
Get the baseType of an internal data item
baseType(dgeObj, type)
dgeObj |
A class DGEobj created by function initDGEobj() |
type |
An item type for which you want the baseType |
character string
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

baseType(exObj, type = "DGEList")
Get a list of the available baseTypes
baseTypes(dgeObj)
dgeObj |
(optional) A class DGEobj created by function initDGEobj() |
A character vector of baseTypes
# Global definition of baseTypes
baseTypes()

# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

# Basetypes from a specific DGEobj
baseTypes(exObj)
Returns the dimensions of the assay data (baseType)
## S3 method for class 'DGEobj'
dim(x)
x |
A class DGEobj created by function initDGEobj() |
An integer vector [r,c] with a length of 2.
Returns a list of length 2 containing the assay data names (baseType)
## S3 method for class 'DGEobj'
dimnames(x)
x |
A class DGEobj created by function initDGEobj() |
A list of length 2 containing rownames and colnames of the DGEobj
Get all user-defined attributes from a DGEobj except for any listed in the excludeList argument.
getAttributes( dgeObj, excludeList = list("dim", "dimnames", "names", "row.names", "class") )
dgeObj |
A DGEobj |
excludeList |
A list of attribute names to exclude from the output (default = list("dim", "dimnames", "names", "row.names", "class")) |
A named list
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

getAttributes(exObj)

# Get the formula attribute from the design (if set)
attr(exObj$design, "formula")
Retrieve data items by baseType
getBaseType(dgeObj, baseType)
dgeObj |
A class DGEobj created by function initDGEobj() |
baseType |
One or more of: ["row", "col", "assay", "meta"] |
A list of data items
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

Assays <- getBaseType(exObj, baseType = "assay")
AssaysAndMeta <- getBaseType(exObj, c("assay", "meta"))
Retrieve a data item by name
getItem(dgeObj, itemName)
dgeObj |
A class DGEobj created by function initDGEobj() |
itemName |
Name of item to retrieve |
The requested data item
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

MyCounts <- getItem(exObj, "counts")
Retrieve multiple data items by name
getItems(dgeObj, itemNames)
dgeObj |
A class DGEobj created by function initDGEobj() |
itemNames |
A character string, character vector, or list of item names to retrieve |
A list
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

myList <- getItems(exObj, list("counts", "geneData"))
names(myList)
Retrieve data items by type
getType(dgeObj, type, parent)
dgeObj |
A class DGEobj created by function initDGEobj() |
type |
A type or list of types to retrieve |
parent |
(optional) Filter return list for common parent (e.g. useful to select one set of contrast results when multiple fits have been performed) |
A list of data items
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

MyRawData <- getType(exObj, type = list("counts", "design", "geneData"))
Initialize with base data (primaryAssayData, row annotations, col annotations)
initDGEobj( primaryAssayData, rowData, colData, level, customAttr, allowShortSampleIDs = FALSE, DGEobjDef = initDGEobjDef() )
primaryAssayData |
A numeric matrix or dataframe with rownames and colnames. Each column represents a sample. Each row represents an assay. This is typically the counts matrix in a DGE RNA-Seq experiment. |
rowData |
Gene, exon, isoform or protein level annotation. Rownames must match the rownames in primaryAssayData |
colData |
A dataframe describing the experiment design. Rownames must match colnames(primaryAssayData) |
level |
One of "gene", "exon", "isoform" or "protein" |
customAttr |
(optional) Named list of attributes |
allowShortSampleIDs |
Using sequential integer rownames (even if typed as character) is discouraged and by default will abort the DGEobj creation. If you have a legitimate need to have short sample names composed of numeric characters, you can set this argument to TRUE (default = FALSE) |
DGEobjDef |
An object definition. Defaults to the global DGEobj definition (initDGEobjDef()) and you usually shouldn't change this unless you're customizing the object for new data types. |
A DGEobj
dgeObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))
MyCounts <- dgeObj$counts
geneinfo <- dgeObj$geneData
sampinfo <- dgeObj$design

myDgeObj <- initDGEobj(primaryAssayData = MyCounts,
                       rowData = geneinfo,
                       colData = sampinfo,
                       level = "gene",
                       customAttr = list(Genome = "Rat.B6.0",
                                         GeneModel = "Ensembl.R89"))
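Because rowData and colData are keyed to the primary assay data by name, it can help to check the inputs before calling initDGEobj(). The base R sketch below is one way to do that, assuming "match" means the same names in the same order; `checkInputs()` is a hypothetical helper, not a function from the package, and the toy data are invented.

```r
# Toy inputs (invented for illustration)
counts <- matrix(1:6, nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("sample01", "sample02")))
rowData <- data.frame(symbol = c("A", "B", "C"), row.names = rownames(counts))
colData <- data.frame(group = c("ctrl", "trt"), row.names = colnames(counts))

# Hypothetical pre-flight check; assumes names must agree in both content and order
checkInputs <- function(counts, rowData, colData) {
  stopifnot(
    !is.null(rownames(counts)),
    !is.null(colnames(counts)),
    identical(rownames(rowData), rownames(counts)),
    identical(rownames(colData), colnames(counts))
  )
  invisible(TRUE)
}

checkInputs(counts, rowData, colData)
```

If the package accepts annotations in a different order, a check based on `setequal()` followed by reordering would be the gentler alternative.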
Instantiate a class DGEobjDef object.
initDGEobjDef(levels, primaryAssayNames, types, uniqueTypes)
levels |
A character string or vector providing names for new levels |
primaryAssayNames |
A character string or vector that must be the same length as levels. This argument supplies the primaryAssayNames for the corresponding levels. |
types |
A named character vector of new types where the values indicate the basetype for each named type (optional) |
uniqueTypes |
A name or vector of names to add to the uniqueType list (optional) |
A class DGEobjDef object suitable for use with initDGEobj
# return the default DGEobj definition
myDGEobjDef <- initDGEobjDef()

# Optionally add some new types and levels for metabolomics data
myDGEobjDef <- initDGEobjDef(levels = "metabolomics",
                             primaryAssayNames = "intensity",
                             types = c(normalizedIntensity = "assay"))

# When a new level is defined, the itemNames and types for the
# rowData and colData are automatically established. The
# types argument is only needed to define downstream workflow objects.
Retrieve the object inventory
inventory(dgeObj, verbose = FALSE)
dgeObj |
A class DGEobj created by function initDGEobj() |
verbose |
Include funArgs column in the output (default = FALSE) |
A data.frame summarizing the data contained in the DGEobj
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

inventory(exObj)
Add a new type definition to a DGEobj
newType(dgeObj, itemType, baseType, uniqueItem = FALSE)
dgeObj |
A class DGEobj created by function initDGEobj() |
itemType |
The name of the new type to create |
baseType |
The baseType of the new item. One of [row, col, assay, meta] |
uniqueItem |
If set to TRUE, only one instance of the new type is allowed in a DGEobj |
A DGEobj
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

exObj <- newType(exObj,
                 itemType = "AffyRMA",
                 baseType = "assay",
                 uniqueItem = TRUE)
Print the Inventory
## S3 method for class 'DGEobj'
print(x, ..., verbose = FALSE)
x |
A class DGEobj created by function initDGEobj() |
... |
Additional parameters |
verbose |
Include funArgs column in the output (default = FALSE) |
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

print(exObj)
During a workflow, a DGEobj typically gets filtered down to remove samples that fail QC or non-expressed genes. The resetDGEobj() function produces a new DGEobj with the original unfiltered data. Resetting an object does not restore changes to attributes or class, but does revert changes made with addItem() and rmItem(). Reset requires that *_orig data is still in the DGEobj.
resetDGEobj(dgeObj)
dgeObj |
A class DGEobj created by function initDGEobj() |
A DGEobj
# example object
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

# subset to first 10 rows to show reset functionality
exObj <- exObj[c(1:10), ]

exObj <- resetDGEobj(exObj)
dim(exObj)
Removes a named data item
rmItem(dgeObj, itemName)
dgeObj |
A class DGEobj created by function initDGEobj() |
itemName |
Name of the item to remove |
A DGEobj
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

exObj <- rmItem(exObj, "design")
Set one or more attributes on a DGEobj or on a specific item within a DGEobj.
setAttributes(dgeObj, attribs)
dgeObj |
A DGEobj |
attribs |
A named list of attribute/value pairs |
This function adds attributes without deleting the attributes that are already present. Any named attribute that already exists in the object will be updated. To remove an attribute from an object pass NULL as the attribute value.
A DGEobj
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

# Assign attributes to a DGEobj
MyAttributes <- list(Platform = "RNA-Seq",
                     Instrument = "HiSeq",
                     Vendor = "Unknown",
                     readType = "PE",
                     readLength = 75,
                     strandSpecific = TRUE)
exObj <- setAttributes(exObj, MyAttributes)

# Set attributes on an item inside a DGEobj
MyAttributes <- list(normalized = FALSE,
                     LowIntFilter = "FPK >5 in >= 1 group")
exObj[["counts"]] <- setAttributes(exObj[["counts"]], MyAttributes)
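The add, update and remove behavior described for setAttributes() mirrors how base R's `attr<-` works. The sketch below reproduces those semantics on an ordinary vector; `mergeAttributes()` is a hypothetical helper used only for illustration and makes no claim about the package's implementation.

```r
# Hypothetical helper mimicking the documented add/update/remove semantics
mergeAttributes <- function(x, attribs) {
  for (nm in names(attribs)) {
    attr(x, nm) <- attribs[[nm]]  # assigning NULL removes the attribute
  }
  x
}

x <- structure(1:3, Platform = "RNA-Seq", Vendor = "Unknown")

x <- mergeAttributes(x, list(Vendor = NULL,              # removed
                             readLength = 75,            # added
                             Platform = "Microarray"))   # updated
names(attributes(x))
```

Attributes not named in the list passed in are left untouched, which is what makes repeated calls safe during a long workflow.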
This function prints all attributes regardless of the class of the attribute value.
showAttributes( dgeObj, skipList = c("dim", "dimnames", "rownames", "colnames", "listData", "objDef") )
dgeObj |
A DGEobj |
skipList |
A character vector of attributes to skip. Use this to avoid printing certain lengthy attributes like rownames. Defaults to c("dim", "dimnames", "rownames", "colnames", "listData", "objDef") |
*Note* Use showMeta() to only retrieve attributes that are key/value pairs.
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

showAttributes(exObj)
Retrieve the Key/Value metadata attributes that have a character value and length of 1
showMeta(dgeObj)
dgeObj |
A DGEobj with attributes |
A data.frame with "Attribute" and "Value" columns
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

showMeta(exObj)
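The selection rule stated for showMeta() (character value, length 1) can be sketched in base R as follows. `metaTable()` is a hypothetical helper and the attribute names are invented; the sketch only illustrates the filter and the two-column output shape described above.

```r
# An object with a mix of attribute types (names invented for illustration)
x <- structure(list(),
               Organism = "Rat",
               GeneModel = "Ensembl.R89",
               readLength = 75,               # numeric: skipped
               groups = c("ctrl", "trt"))     # length 2: skipped

# Hypothetical helper: keep attributes that are single character strings
metaTable <- function(obj) {
  a <- attributes(obj)
  keep <- vapply(a, function(v) is.character(v) && length(v) == 1, logical(1))
  data.frame(Attribute = names(a)[keep],
             Value = as.character(unlist(a[keep], use.names = FALSE)),
             stringsAsFactors = FALSE)
}

meta <- metaTable(x)
meta
```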
Returns and prints the list of all defined types
showTypes(dgeObj)
dgeObj |
A class DGEobj created by function initDGEobj() |
data.frame
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

showTypes(exObj)
Subset internal row or column data
## S3 method for class 'DGEobj'
subset(x, ..., row, col, drop = FALSE, debug = FALSE)
x |
A class DGEobj created by function initDGEobj() |
... |
Additional parameters |
row |
Row index for the subset |
col |
Col index for the subset |
drop |
Included for compatibility only |
debug |
(default = FALSE) Set to TRUE to get additional information on the console if subsetting a DGEobj fails with a dimension error. |
A DGEobj
# example DGEobj
exObj <- readRDS(system.file("miniObj.RDS", package = "DGEobj"))

exObj <- subset(exObj, 1:10, 5:50)