Alexander W Blocker


Blog (Randomized Blocker)
Research interests
  • Methodological
    • Statistical analysis of massive datasets
    • Multiphase analyses / preprocessed data
    • Dependent data (time series, networks, spatial, etc.)
    • Computer experiments
  • Application Areas
    • Systems biology
    • Astronomy
    • Information networks
Advisors
  • Xiao-Li Meng, Harvard University Department of Statistics
  • Edo Airoldi, Harvard University Department of Statistics
Selected publications
Selected talks
  • Semi-parametric Robust Event Detection for Massive Time-Series Datasets
  • The detection and analysis of events within massive collections of time series has become a critical task for time-domain astronomy. In particular, many scientific investigations (e.g. the analysis of microlensing and other transients) begin with the detection of isolated events in irregularly sampled series with both non-linear trends and non-Gaussian noise. I will discuss a semi-parametric, robust, parallel method for identifying variability and isolated events at multiple scales in the presence of these complications. This approach harnesses the power of Bayesian modeling while retaining much of the speed and scalability of more ad hoc machine learning approaches. I will also contrast this work with event detection methods from other fields, highlighting the unique challenges posed by astronomical surveys. Finally, I will present initial results from applying this method to 87.2 million EROS sources, where we have obtained a more than 100-fold reduction in candidates for certain types of phenomena.

    Presented August 25, 2010 at the Workshop on Computational Astrostatistics (http://hea-www.harvard.edu/AstroStat/CAS2010/)
  • Discussion: The Promise & Peril of Synthetic & Integrated Data
  • A discussion of the potential gains and problems of integrated, synthetic data of the type produced by the Longitudinal Employer-Household Dynamics (LEHD) program at the US Census Bureau.

    Presented June 21, 2010 at the ICSA 2010 Applied Statistics Symposium as part of the "Informative But Not Invasive" panel, organized by Xiao-Li Meng.
    Followed presentations by Jeremy Wu (US Census Bureau), John Abowd (Cornell University), and Jerry Reiter (Duke University).
  • Doing Right By Massive Data: Using Probability Modeling To Advance The Analysis Of Huge Astronomical Datasets
  • Extremely large, complex datasets are becoming increasingly important in scientific data analysis. This trend is especially pronounced in astronomy, as large-scale surveys such as SDSS, Pan-STARRS, and the LSST deliver (or promise to deliver) terabytes of data per night. While both the statistics and machine-learning communities have offered approaches to these problems, neither has produced a completely satisfactory solution. Working in the context of event detection for the MACHO LMC data, I will present an approach that combines much of the power of Bayesian probability modeling with the efficiency and scalability typically associated with more ad hoc machine learning approaches. The approach provides both rigorous assessments of uncertainty and improved statistical efficiency on a dataset containing approximately 20 million sources and 40 million individual time series. I will also discuss how this framework could be extended to related problems.

    Presented at NESS 2010
  • Two Problems in X-ray Astronomy
  • Discussion of my work on two projects in X-ray astronomy: the development of a hierarchical Bayesian replacement for "stacking" and the analysis of events in X-ray light curves. For each problem, I outlined the development of an improved model for the data and the computational methods employed. I also discussed the unique challenges that each case has presented from a cultural perspective.
Selected posters
  • Deconvolution of Mixing Time Series on a Graph (UAI 2011)
  • In many applications we are interested in making inference on latent time series from indirect measurements, which are often low-dimensional projections resulting from mixing or aggregation. Positron emission tomography, super-resolution, and network traffic monitoring are some examples. Inference in such settings requires solving a sequence of ill-posed inverse problems, y_t = A x_t, where the projection mechanism provides information on A. We consider problems in which A specifies mixing on a graph of time series that are bursty and sparse. We develop a multilevel state-space model for mixing time series and an efficient approach to inference. A simple model is used to calibrate regularization parameters that lead to efficient inference in the multilevel state-space model. We apply this method to the problem of estimating point-to-point traffic flows on a network from aggregate measurements. Our solution outperforms existing methods for this problem, and our two-stage approach suggests an efficient inference strategy for multilevel models of multivariate time series.
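The ill-posedness of y_t = A x_t is easy to see in a toy version of the network traffic problem. The sketch below uses a hypothetical two-node network (not data, code, or the model from the poster): the four origin-destination flows are observed only through the totals leaving and entering each node, so A has fewer independent rows than unknowns and different flow vectors yield identical measurements. This is why extra structure, such as the regularization and multilevel state-space model described above, is needed to pin down x_t.

```python
import numpy as np

# Toy example (hypothetical, for illustration only): a two-node network
# with nodes a and b. The latent vector x holds the four
# origin-destination flows [a->a, a->b, b->a, b->b], but we only
# observe aggregate traffic leaving and entering each node: y = A x.
A = np.array([
    [1, 1, 0, 0],  # total leaving a:  a->a + a->b
    [0, 0, 1, 1],  # total leaving b:  b->a + b->b
    [1, 0, 1, 0],  # total entering a: a->a + b->a
    [0, 1, 0, 1],  # total entering b: a->b + b->b
])

# Total traffic leaving equals total traffic entering, so the rows of A
# are linearly dependent: rank(A) = 3 < 4 unknowns.
print("rank(A) =", np.linalg.matrix_rank(A))

# Any multiple of this null-space direction can be added to x
# without changing the observations.
null_dir = np.array([1, -1, -1, 1])
assert not (A @ null_dir).any()

x1 = np.array([5, 2, 1, 4])
x2 = x1 + null_dir  # [6, 1, 0, 5], still a valid (nonnegative) set of flows

# Two different flow patterns, identical aggregate measurements.
print("A @ x1 =", A @ x1)
print("A @ x2 =", A @ x2)
```

Nothing in the aggregate counts distinguishes x1 from x2; only prior structure (for example, sparsity or burstiness over time) can favor one over the other.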
Software
  • fastGHQuad: Fast, numerically stable evaluation of Gauss-Hermite quadrature rules in R/C++
  • bayesstack: Bayesian X-ray stacking analysis
  • Kalman tools for Matlab
    • Kalman filter & smoother
      • Supports control inputs in the state equation & an affine term in the measurement equation
    • Maximum likelihood estimation of linear state-space systems
      • Implementation of the expectation-maximization (EM) algorithm
      • Can estimate input matrix and/or affine term in measurement equation
      • Optional diagonal restrictions on state & observation noise covariance matrices
    • 12/06/2007: Updated with moderate efficiency improvements for M-step routines & major change in EM convergence criterion (relative instead of absolute change)
    • 12/13/2007: Significant efficiency improvements and further tweaking of EM convergence criterion
    • Licensed under LGPL v3.0
  • A technical note on the EM algorithm for affine state-space systems & its usage
  • Some useful scripts for R
    • bagginglm.R: The beginning of a set of functions for bagging LMs and GLMs. Very preliminary. Licensed under GPL v2.0
    • AICc.R: A function to calculate the corrected AIC (AIC with an adjustment term for small-sample bias). It follows the interface of the base AIC function and works with any model that has a logLik method.
    • split.data.R: A simple function to break apart a data frame or multivariate time series; it is particularly useful for dealing with the latter. Includes an option to omit missing values while splitting.
  • exif2kmz: a Python script to convert geotagged images to a KMZ file
    • Requires the pyexiv2 and Python Imaging Library (PIL) packages.
    • Creates a KMZ file containing the images themselves and a placemark for each image.
    • Licensed under GPL v2.0
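For readers who want to see the recursion behind the Kalman tools, here is a minimal NumPy sketch of the filtering step only (not the smoother or the EM estimation) for a linear state-space model with control inputs in the state equation and an affine term in the measurement equation. It was written for illustration and is not a port of the Matlab code; the function name kalman_filter and its argument order are hypothetical.

```python
import numpy as np


def kalman_filter(y, F, H, Q, R, m0, P0, B=None, u=None, d=None):
    """Kalman filter for the affine state-space model (illustrative sketch):

        x_t = F x_{t-1} + B u_t + w_t,   w_t ~ N(0, Q)
        y_t = H x_t + d + v_t,           v_t ~ N(0, R)

    y: (T,) or (T, p) observations; F, H, Q, R, P0, B: NumPy arrays;
    u: (T, k) control inputs (optional); d: (p,) affine term (optional).
    Returns filtered means (T, n) and covariances (T, n, n).
    """
    y = np.asarray(y, dtype=float).reshape(len(y), -1)  # always (T, p)
    T, p = y.shape
    n = len(m0)
    d = np.zeros(p) if d is None else np.asarray(d, dtype=float)
    m = np.asarray(m0, dtype=float)
    P = np.asarray(P0, dtype=float)
    means, covs = np.zeros((T, n)), np.zeros((T, n, n))
    for t in range(T):
        # Predict: push the previous estimate through the state equation.
        m_pred = F @ m
        if B is not None and u is not None:
            m_pred = m_pred + B @ u[t]
        P_pred = F @ P @ F.T + Q
        # Update: correct using the innovation y_t - (H m_pred + d).
        S = H @ P_pred @ H.T + R
        K = np.linalg.solve(S, H @ P_pred).T  # gain K = P_pred H' S^{-1}
        m = m_pred + K @ (y[t] - H @ m_pred - d)
        P = P_pred - K @ H @ P_pred
        means[t], covs[t] = m, P
    return means, covs


# Example: scalar random walk (F = H = Q = R = 1), one observation y_1 = 3.
I1 = np.eye(1)
means, covs = kalman_filter(np.array([3.0]), I1, I1, I1, I1, m0=[0.0], P0=I1)
print(means[0, 0], covs[0, 0, 0])  # approximately 2.0 and 0.667
```

The gain is computed with a linear solve rather than by inverting S explicitly, which is the usual choice for numerical stability.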
Current Affiliations
  • PhD Student with Harvard University Department of Statistics
    • Collaborating with the O'Shea lab at the FAS Center for Systems Biology on methods for estimating chromatin structure from high-throughput sequencing data
    • Statistical researcher with the LEHD group at the United States Census Bureau, developing methods for inference and disclosure limitation with imputed and synthetic data (STEP intern for Summer 2010; continuing research)
    • Working with the Time Series Center (part of the IIC) on computationally intensive time series analysis, with a focus on event detection.
    • Currently working with the astrostatistics group (CHASC) on X-ray stacking for ChaMP.
  • Teaching assistant with Harvard University Department of Statistics
    • TF for Statistics 244 (Linear and Generalized Linear Models) with Alan Agresti, Fall 2011
    • Head TF for Statistics 111 (Introduction to Theoretical Statistics) with Edo Airoldi, Spring 2011
    • TF for Statistics 221 (Statistical Computing and Learning) with Edo Airoldi, Fall 2010
    • Head TF for Statistics 104 with Kenneth Stanley, Spring 2009
    • TF for Statistics 104 with Kenneth Stanley, Fall 2008
Background
  • Boston University Alumnus, Class of 2008
    • Bachelor's in Mathematics & Economics
    • Master's in Economics
    • PhD-level coursework in statistics & econometrics
  • Formerly:
    • Teaching assistant for two sections of Stat 104 (Harvard, Fall 2008)
    • Intern with Weiss Asset Management (June 2008 - June 2009)
    • Research Assistant with Boston University Department of Economics
    • Intern with UBS Fixed Income Research
      • US Rates & Govt. Bonds Group
    • Senior Research & IT Advisor, Matté & Company
    • Research Assistant, Boston University School of Management

Github profile

LinkedIn Profile

CV