Alexander W Blocker


Research interests
  • Statistical analysis of massive datasets
  • Dependent data (time series, networks, spatial, etc.)
  • Analysis of data under non-sampling variability
  • Privacy-protected analysis and data release
  • Semi-parametric methods
  • Efficient computational and Monte Carlo methods
Advisors
  • Xiao-Li Meng, Harvard University Department of Statistics (Primary)
  • Edo Airoldi, Harvard University Department of Statistics
Selected publications
  • A Bayesian approach to the analysis of time symmetry in light curves: Reconsidering Scorpius X-1 occultations.
    Alexander Blocker, Pavlos Protopapas, & Charles Alcock.
    arXiv:0904.0645v1 [astro-ph.IM]
    Published in ApJ (doi: 10.1088/0004-637X/701/2/1742)
    Winner of NESS Student Paper Award, April 2010
Selected talks
  • Discussion: The Promise & Peril of Synthetic & Integrated Data
    A discussion of the potential gains and problems of integrated, synthetic data of the type produced by the Longitudinal Employer-Household Dynamics (LEHD) group at the US Census Bureau.

    Presented June 21, 2010 at the ICSA 2010 Applied Statistics Symposium as part of the "Informative But Not Invasive" panel, organized by Xiao-Li Meng.
    Followed presentations by Jeremy Wu (US Census Bureau), John Abowd (Cornell University), and Jerry Reiter (Duke University).
  • Doing Right By Massive Data: Using Probability Modeling To Advance The Analysis Of Huge Astronomical Datasets
    The analysis of extremely large, complex datasets is becoming an increasingly important task in science. This trend is especially pronounced in astronomy, as large-scale surveys such as SDSS, Pan-STARRS, and the LSST deliver (or promise to deliver) terabytes of data per night. While both the statistics and machine-learning communities have offered approaches to these problems, neither has produced a completely satisfactory solution. Working in the context of event detection for the MACHO LMC data, I will present an approach that combines much of the power of Bayesian probability modeling with the efficiency and scalability typically associated with more ad hoc machine-learning methods. This provides both rigorous assessments of uncertainty and improved statistical efficiency on a dataset containing approximately 20 million sources and 40 million individual time series. I will also discuss how this framework could be extended to related problems.

    Presented at NESS 2010
  • Two Problems in X-ray Astronomy
    A discussion of my work on two projects in X-ray astronomy: the development of a hierarchical Bayesian replacement for "stacking" and the analysis of events in X-ray light curves. For each problem, I outlined the development of an improved model for the data and the computational methods employed. I also discussed the unique challenges that each case has presented from a cultural perspective.
Software
  • bayesstack: Bayesian X-ray stacking analysis
  • Kalman tools for Matlab
    • Kalman filter & smoother
      • Allows for control inputs in the state equation & an affine term in the measurement equation
    • Maximum likelihood estimation of linear state-space systems
      • Implementation of the expectation maximization algorithm
      • Can estimate input matrix and/or affine term in measurement equation
      • Optional diagonal restrictions on state & observation noise covariance matrices
    • 12/06/2007: Updated with moderate efficiency improvements for M-step routines & major change in EM convergence criterion (relative instead of absolute change)
    • 12/13/2007: Significant efficiency improvements and further tweaking of EM convergence criterion
    • Licensed under LGPL v3.0
  • A technical note on the EM algorithm for affine state-space systems & its usage
  • Some useful scripts for R
    • bagginglm.R: The beginning of a set of functions for bagging LMs and GLMs. Very preliminary. Licensed under GPL v2.0
    • AICc.R: A function to calculate corrected AIC (AIC with an adjustment term for small-sample bias). This is written in the same way as the base AIC function, and will work for any model with a logLik method.
    • split.data.R: A simple function to break apart a data frame or multivariate time series; it is particularly useful for dealing with the latter. Includes an option to omit missing values while splitting.
  • exif2kmz: a Python script to convert geotagged images to a KMZ file
    • Requires pyexiv2 and Python Imaging libraries.
    • Creates a KMZ file with a placemark for each image and the images themselves.
    • Licensed under GPL v2.0
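
As an illustration of the small-sample adjustment that AICc.R computes, here is a minimal Python sketch. It is not the R script itself: it assumes the standard correction term 2k(k+1)/(n-k-1), and the function name `aicc` and its arguments (log-likelihood, number of estimated parameters `k`, number of observations `n`) are hypothetical names chosen for this example.

```python
def aicc(log_lik: float, k: int, n: int) -> float:
    """Corrected AIC, assuming the standard small-sample adjustment.

    log_lik: maximized log-likelihood of the fitted model
    k:       number of estimated parameters
    n:       number of observations
    """
    if n - k - 1 <= 0:
        # The correction term is undefined unless n > k + 1.
        raise ValueError("AICc requires n > k + 1")
    aic = -2.0 * log_lik + 2.0 * k               # ordinary AIC
    correction = 2.0 * k * (k + 1) / (n - k - 1)  # small-sample adjustment
    return aic + correction
```

The adjustment shrinks toward zero as `n` grows relative to `k`, so the value approaches ordinary AIC for large samples.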
Current Affiliations
  • PhD Student with Harvard University Department of Statistics
    • Statistical researcher with LEHD group at United States Census Bureau, developing methods for inference and disclosure limitation with imputed and synthetic data.
    • Working with the Time Series Center (part of the IIC) on computationally-intensive time series analysis (focus on event detection).
    • Currently working with astrostatistics group (CHASC) on X-ray stacking for ChaMP.
  • Teaching assistant with Harvard University Department of Statistics
    • Head TF for Statistics 104 with Professor Stanley for Spring 2009
Background
  • Boston University Alumnus, Class of 2008
    • Bachelor's in Mathematics & Economics
    • Master's in Economics
    • PhD-level coursework in statistics & econometrics
  • Formerly:
    • Teaching assistant for two sections of Stat 104 (Harvard, Fall 2008)
    • Intern with Weiss Asset Management (June 2008 - June 2009)
    • Research Assistant with Boston University Department of Economics
    • Intern with UBS Fixed Income Research
      • US Rates & Govt. Bonds Group
    • Senior Research & IT Advisor, Matté & Company
    • Research Assistant, Boston University School of Management

Academia.edu Profile

LinkedIn Profile

CV