Microarray data processing
MicroarrayProcessing automates microarray data processing for data from multiple public projects.
I first download the raw data files from GEO and save the files of each project into a separate folder, here GSE15059_RAW and GSE28320_RAW. (You might be able to do this with the R package GEOquery in most cases, but please note that for some projects the raw data cannot be downloaded with GEOquery.)
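For projects where GEOquery does not work, the download step can also be scripted directly. The following is only a minimal sketch, assuming the project publishes a `<accession>_RAW.tar` archive under the usual GEO FTP layout (not every series does); the helper names `geo_raw_tar_url` and `download_raw` are hypothetical and not part of MicroarrayProcessing.

```python
import tarfile
import urllib.request
from pathlib import Path


def geo_raw_tar_url(accession):
    """Build the URL of the <accession>_RAW.tar archive on the GEO FTP site.

    Assumption: the series follows the usual GEO layout, where the last
    three digits of the accession are replaced by 'nnn' in the parent
    folder (for example GSE15059 lives under GSE15nnn).
    """
    stub = accession[:-3] + "nnn"
    return (
        "https://ftp.ncbi.nlm.nih.gov/geo/series/"
        f"{stub}/{accession}/suppl/{accession}_RAW.tar"
    )


def download_raw(accession, dest_root="."):
    """Download and unpack the raw archive into <dest_root>/<accession>_RAW."""
    out_dir = Path(dest_root) / f"{accession}_RAW"
    out_dir.mkdir(parents=True, exist_ok=True)
    tar_path = out_dir / f"{accession}_RAW.tar"
    # This fails with an HTTP error if the series has no RAW archive.
    urllib.request.urlretrieve(geo_raw_tar_url(accession), tar_path)
    with tarfile.open(tar_path) as tar:
        tar.extractall(out_dir)
    return out_dir


if __name__ == "__main__":
    for gse in ["GSE15059", "GSE28320"]:
        print(download_raw(gse))
```

Running the script should leave one folder per project, named as in the text above. The per-sample files inside the archive are usually gzip-compressed and may need to be decompressed before processing.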
Then I write a metadata file that describes how to process the data of the two projects. The content of this file should be saved in metadata.txt. It has 9 columns and 19 rows (one header row and 18 samples). The descriptions of the columns are:
There can be a variety of column names. In the example of GSE28320, “F635 Median”, “F532 Median”, “B635 Median”, “B532 Median” indicate R, G, Rb, Gb, respectively. In this example, F denotes foreground and B denotes background; 635 and 532 are fluorescence wavelengths, where 635 indicates the red signal and 532 indicates the green signal. Different projects might name the columns differently. I can give you more examples of column names (in the order of R, G, Rb, Gb) below. The general idea is that Median is preferred over Mean; we only use Mean when Median is not available. Channel 2, Cy5, and a wavelength of 6XX all correspond to the red channel. Likewise, Channel 1, Cy3, and a wavelength of 5XX all correspond to the green channel. You can tell pretty easily whether a column holds the foreground signal (SIG, F, Signal, etc.) or the background signal (BKD, B, Bkg, etc.).
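The naming rules above can be sketched as a small helper that guesses which column plays each role. This is only an illustration of the heuristics in Python, not the logic used by RunDataProcessing.R; the function names `classify_column` and `pick_columns` and the exact regular expressions are assumptions, and unusual column names will still need to be checked by hand.

```python
import re


def classify_column(name):
    """Return (role, stat) for a column name, or None if it does not fit.

    role is one of 'R', 'G', 'Rb', 'Gb'; stat is 'Median' or 'Mean'.
    """
    lower = name.strip().lower()

    # Statistic: only Median and Mean columns are of interest.
    if "median" in lower:
        stat = "Median"
    elif "mean" in lower:
        stat = "Mean"
    else:
        return None

    # Channel: Cy5 / Channel 2 / 6XX -> red, Cy3 / Channel 1 / 5XX -> green.
    if ("cy5" in lower
            or re.search(r"ch(annel)?[ _]?2(?!\d)", lower)
            or re.search(r"(?<!\d)6\d\d(?!\d)", lower)):
        channel = "R"
    elif ("cy3" in lower
            or re.search(r"ch(annel)?[ _]?1(?!\d)", lower)
            or re.search(r"(?<!\d)5\d\d(?!\d)", lower)):
        channel = "G"
    else:
        return None

    # Foreground vs. background; background is checked first.
    if re.search(r"^b\d|bkd|bkg|background", lower):
        return channel + "b", stat
    if re.search(r"^f\d|sig|foreground", lower):
        return channel, stat
    return None


def pick_columns(names):
    """Choose one column per role, preferring Median over Mean."""
    best = {}
    for name in names:
        parsed = classify_column(name)
        if parsed is None:
            continue
        role, stat = parsed
        current = best.get(role)
        # Keep the first match, but let a Median column replace a Mean one.
        if current is None or (stat == "Median" and current[1] == "Mean"):
            best[role] = (name, stat)
    return {role: name for role, (name, _) in best.items()}


if __name__ == "__main__":
    header = ["F635 Median", "F635 Mean", "B635 Median",
              "F532 Median", "B532 Median", "Flags"]
    print(pick_columns(header))
```

For the GSE28320 header names quoted above, the sketch maps R, G, Rb, Gb to the four Median columns and ignores the Mean and unrelated columns. A role that is missing from the result means no column matched, which is a cue to inspect the file manually.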
I run the R script (RunDataProcessing.R) with:
Rscript RunDataProcessing.R metadata.txt
This script will produce 18 gene expression files, one per sample, in which the first column is the ID and the second column is the gene expression level.