- Methodological
- Statistical analysis of massive datasets
- Multiphase analyses / preprocessed data
- Dependent data (time series, networks, spatial, etc.)
- Computer experiments
- Application Areas
- Systems biology
- Astronomy
- Information networks
- Advisors
- Xiao-Li Meng, Harvard University Department of Statistics
- Edo Airoldi, Harvard University Department of Statistics
- Selected publications
- Deconvolution of mixing time series on a graph (2011).
Blocker, A. W. & Airoldi, E.A. - Published in Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011)
- arXiv:1105.2526 [stat.ME]
- Winner of IBM Thomas J. Watson Research Center Student Research Award, April 2011
- A Bayesian approach to the analysis of time symmetry in light curves: Reconsidering Scorpius X-1 occultations (2009).
Alexander W Blocker, Pavlos Protopapas, & Charles Alcock. - Published in The Astrophysical Journal (doi: 10.1088/0004-637X/701/2/1742)
- arXiv:0904.0645v1 [astro-ph.IM]
- Winner of NESS Student Paper Award (Sponsored by Microsoft and Google), April 2010
- Selected talks
- Semi-parametric Robust Event Detection for Massive Time-Series Datasets
- The detection and analysis of events within massive collections
of time-series has become an extremely important task for
time-domain astronomy. In particular, many scientific
investigations (e.g. the analysis of microlensing and other
transients) begin with the detection of isolated events in
irregularly-sampled series with both non-linear trends and
non-Gaussian noise. I will discuss a semi-parametric, robust,
parallel method for identifying variability and isolated events
at multiple scales in the presence of the above complications.
This approach harnesses the power of Bayesian modeling while
maintaining much of the speed and scalability of more ad-hoc
machine learning approaches. I will also contrast this work
with event detection methods from other fields, highlighting
the unique challenges posed by astronomical surveys. Finally, I
will present initial results from the application of this
method to 87.2 million EROS sources, where we have obtained a
greater than 100-fold reduction in candidates for certain types
of phenomena.
Presented August 25, 2010 at the Workshop on Computational Astrostatistics (http://hea-www.harvard.edu/AstroStat/CAS2010/)
- Discussion: The Promise & Peril of Synthetic & Integrated Data
- A discussion of the potential gains and problems with
integrated, synthetic data of the type produced by Longitudinal
Employer-Household Dynamics at the US Census Bureau.
Presented June 21, 2010 at the ICSA 2010 Applied Statistics Symposium as part of the "Informative But Not Invasive" panel, organized by Xiao-Li Meng.
Followed presentations by Jeremy Wu (US Census Bureau), John Abowd (Cornell University), and Jerry Reiter (Duke University).
- Doing Right By Massive Data: Using Probability Modeling To Advance The Analysis Of Huge Astronomical Datasets
- The analysis of extremely large, complex datasets is
becoming an increasingly important task in the analysis of
scientific data. This trend is especially prevalent in
astronomy, as large-scale surveys such as SDSS, Pan-STARRS, and
the LSST deliver (or promise to deliver) terabytes of data per
night. While both the statistics and machine-learning
communities have offered approaches to these problems, neither
has produced a completely satisfactory approach. Working in the
context of event detection for the MACHO LMC data, I will
present an approach that combines much of the power of Bayesian
probability modeling with the efficiency and scalability
typically associated with more ad-hoc machine learning
approaches. This provides both rigorous assessments of
uncertainty and improved statistical efficiency on a dataset
containing approximately 20 million sources and 40 million
individual time series. I will also discuss how this framework
could be extended to related problems.
Presented at NESS 2010
- Two Problems in X-ray Astronomy
- Discussion of my work on two projects in X-ray astronomy: the development of a hierarchical Bayesian replacement for "stacking" and the analysis of events in X-ray light curves. For each problem, I outlined the development of an improved model for the data and the computational methods employed. I also discussed the unique challenges that each case has presented from a cultural perspective.
- Selected posters
- Deconvolution of Mixing Time Series on a Graph (UAI 2011)
- In many applications we are interested in making inference on latent time series from indirect measurements, which are often low-dimensional projections resulting from mixing or aggregation. Positron emission tomography, super-resolution, and network traffic monitoring are some examples. Inference in such settings requires solving a sequence of ill-posed inverse problems, y_t = A x_t, where the projection mechanism provides information on A. We consider problems in which A specifies mixing on a graph of time series that are bursty and sparse. We develop a multilevel state-space model for mixing time series and an efficient approach to inference. A simple model is used to calibrate regularization parameters that lead to efficient inference in the multilevel state-space model. We apply this method to the problem of estimating point-to-point traffic flows on a network from aggregate measurements. Our solution outperforms existing methods for this problem, and our two-stage approach suggests an efficient inference strategy for multilevel models of multivariate time series.
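To make the phrase "ill-posed inverse problem" in the abstract above concrete, here is a toy sketch in Python. It is not the multilevel state-space model from the poster: the 2-by-3 matrix `A`, the flow vectors, and the helper `matvec` are all made up for illustration. It only shows that when A has fewer rows (aggregate measurements) than columns (point-to-point flows), different flow vectors x_t can produce exactly the same observation y_t = A x_t.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Hypothetical mixing matrix: 2 aggregate measurements of 3 latent flows.
A = [[1, 1, 0],
     [0, 1, 1]]

# Two different candidate flow vectors (illustrative values only).
x_a = [5, 0, 2]
x_b = [4, 1, 1]

# The difference x_b - x_a lies in the null space of A,
# so the aggregate measurements cannot tell the two apart.
null_dir = [b - a for a, b in zip(x_a, x_b)]

y_a = matvec(A, x_a)
y_b = matvec(A, x_b)
same_observation = (y_a == y_b)
```

Because the measurements alone cannot distinguish `x_a` from `x_b`, some additional structure (a prior, a model, or a regularization term) is needed to pick an estimate, which is the role the abstract assigns to its state-space model and calibrated regularization parameters.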
- Software
- fastGHQuad : Fast, numerically-stable evaluation of Gauss-Hermite quadrature rules in R/C++
- Available via CRAN
- bayesstack: Bayesian x-ray stacking analysis
- Part of the ChaMP software packages for the analysis of multiwavelength surveys
- Kalman tools for Matlab
- Kalman filter & smoother
- Allow for control inputs in state equation & affine term in measurement equation
- Maximum likelihood estimation of linear state-space systems
- Implementation of the expectation maximization algorithm
- Can estimate input matrix and/or affine term in measurement equation
- Optional diagonal restrictions on state & observation noise covariance matrices
- 12/06/2007: Updated with moderate efficiency improvements for M-step routines & major change in EM convergence criterion (relative instead of absolute change)
- 12/13/2007: Significant efficiency improvements and further tweaking of EM convergence criterion
- Licensed under LGPL v3.0
- A technical note on the EM algorithm for affine state-space systems & its usage
- Some useful scripts for R
- bagginglm.R: The beginning of a set of functions for bagging LMs and GLMs. Very preliminary. Licensed under GPL v2.0
- AICc.R: A function to calculate corrected AIC (AIC with an adjustment term for small-sample bias). This is written in the same way as the base AIC function, and will work for any model with a logLik method.
- split.data.R: A simple function to break apart a data frame or multivariate time series; it is particularly useful for dealing with the latter. Includes an option to omit missing values while splitting.
- exif2kmz: a Python script to convert geotagged images to a KMZ file
- Requires pyexiv2 and Python Imaging libraries.
- Creates a KMZ file with a placemark for each image and the images themselves.
- Licensed under GPL v2.0
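As a small illustration of the correction that AICc.R (listed above) is described as computing, the standard corrected AIC adds the term 2k(k+1)/(n-k-1) to the usual AIC, where k is the number of estimated parameters and n is the number of observations. The sketch below is written in Python and is not the R script itself; the function name `aicc` and its arguments are hypothetical.

```python
def aicc(log_lik: float, k: int, n: int) -> float:
    """Corrected AIC for a model with log-likelihood `log_lik`,
    `k` estimated parameters, and `n` observations.

    Uses the standard formula AICc = AIC + 2k(k+1)/(n-k-1),
    with AIC = -2*log_lik + 2k.
    """
    if n - k - 1 <= 0:
        # The correction term is undefined (or negative) in this case.
        raise ValueError("AICc requires n > k + 1")
    aic = -2.0 * log_lik + 2.0 * k
    return aic + 2.0 * k * (k + 1) / (n - k - 1)


# Example with made-up numbers: log-likelihood -10, 2 parameters, 10 observations.
example = aicc(-10.0, k=2, n=10)
```

As n grows with k fixed, the correction term shrinks toward zero and AICc approaches the ordinary AIC.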
- Current Affiliations
- PhD Student with Harvard University Department of Statistics
- Collaborating with the O'Shea lab at the FAS Center for Systems Biology on methods for estimating chromatin structure from high-throughput sequencing data
- Statistical researcher with LEHD group at United States Census Bureau, developing methods for inference and disclosure limitation with imputed and synthetic data. (STEP intern for Summer 2010; continuing research)
- Working with the Time Series Center (part of the IIC) on computationally-intensive time series analysis (focus on event detection).
- Currently working with astrostatistics group (CHASC) on X-ray stacking for ChaMP.
- Teaching assistant with Harvard University Department of Statistics
- TF for Statistics 244 (Linear and Generalized Linear Models) with Alan Agresti, Fall 2011
- Head TF for Statistics 111 (Introduction to Theoretical Statistics) with Edo Airoldi, Spring 2011
- TF for Statistics 221 (Statistical Computing and Learning) with Edo Airoldi, Fall 2010
- Head TF for Statistics 104 with Kenneth Stanley, Spring 2009
- TF for Statistics 104 with Kenneth Stanley, Fall 2008
- Background
- Boston University Alumnus, Class of 2008
- Bachelor's in Mathematics & Economics
- Master's in Economics
- PhD-level coursework in statistics & econometrics
- Formerly:
- Teaching assistant for two sections of Stat 104 (Harvard, Fall 2008)
- Intern with Weiss Asset Management (June 2008 - June 2009)
- Research Assistant with Boston University Department of Economics
- Intern with UBS Fixed Income Research
- US Rates & Govt. Bonds Group
- Senior Research & IT Advisor, Matté & Company
- Research Assistant, Boston University School of Management