ASTR 600
STATISTICAL ANALYSIS IN PHYSICS AND ASTRONOMY

Spring Semester 2018 Room HBH 254; TTh 10:50am - 12:05pm

Instructor: Dr. Patrick Hartigan, Hermann Brown Rm. 350, Phone: X2245

Office Hrs: After class and by appt.

Texts:

• Modern Statistical Methods for Astronomy (with R applications) by E. Feigelson and G. Babu
This will be our main text, and students should own a copy, and bring it to class.
• REA's Statistics Problem Solver
A quite nice set of worked-out examples for various statistical tests.

• Research Journals in Physics and Astronomy
http://adsabs.harvard.edu/abstract/_service.html is a good source for the astronomy journals

• There are many statistics reference books listed at the end of each chapter in the Feigelson and Babu book

Grading: Based on short presentations (30%), homework creation, grading, and leading the discussion for the problems (25%), long presentations (25%), completing homework problems (10%), and preparation (10%). The preparation portion is based on whether or not the student has read the material beforehand and has otherwise prepared properly for class. The class format only functions well if everyone completes their responsibilities on time. If a homework set is not created or a long presentation is not done, the student will receive a 0 for that grade. The problem-creation and long-presentation grades together are 50% of the total, and there are only one or two such events for each student during the semester, so it is important to have these ready on time. A student who otherwise gets straight A's on all other work but does not have a long presentation or homework creation ready when it is scheduled will drop a full letter grade to a B.

Prereqs: There are no statistical prerequisites. We will cover material as we go.

OUTLINE OF COURSE
Every discipline of science has data, and a researcher generally wants to address a scientific question with the data. The path from the data to the conclusion usually involves some sort of statistical analysis. Even if the problem is not inherently a statistical one, when writing up the results one must always address the question of how certain the conclusions are given that there are uncertainties in the measurements. Often one looks for correlations between different variables, or tries to infer what the underlying structural variables are that define the system, and how they relate to one another. The statistical literature is vast and different disciplines apply different techniques that are suitable to their data. But in most cases the techniques are quite broad, and in principle may be used in many different contexts.

In this class we will identify the statistical methods commonly used by research papers in astrophysics, and the students will choose papers to study that use these methods. By learning how statistics is used in different areas, we will broaden our toolset for use in our own studies. We will have short discussions on many such papers to get an overview of the usage, and then more in-depth discussions that will be led by students on their favorite methods. To solidify this knowledge, students will sometimes construct problems that will be distributed to the remainder of the class, and after a period of time we will discuss the solutions.

The topics will be guided by the professor, but the papers are chosen by the students. Some examples may include hypothesis testing and confidence intervals, extracting periods and signals from time-series data, using principal component analysis, maximum likelihood methods, finding groups and clusters in multidimensional data, maximum entropy reconstructions, non-parametric ranks, optimized profile extractions, Bayesian analysis with priors, and so on. Most statistical methods used in astronomy will fall under one of the broad topics in this course, but if you have seen a technique and wondered how it works, find a paper that uses it; if there is interest, we can substitute it for one of the other topics and study it.

In class we will (*) find a broad range of papers chosen by the students that use a particular method, (*) study the mathematics of the method, (*) go in-depth into two papers chosen by the students, and (*) work through some problems to understand how the method works in practice. The students will pick the papers, and will be expected to lead the class through an overview of the science objectives. The main focus, however, will be on the method, which we will then all work together to try to understand. The problems may apply the method to some other case, perhaps with contrived data, so we are all sure we could use it if the need arose in another context. There will be a few lectures at the beginning and then interspersed as needed throughout the semester to provide some mathematical structure, and we will follow the overall format of the textbook. But the choice of specific applications is student-driven. The papers need not necessarily be astronomical, though that will likely be the focus of much of our studies.

Typical Work Load, Absence and Late Policies:

• Every 1.5 weeks: 10-minute presentation to class on an article that uses the current technique we are studying
• Every 1.5 weeks: Complete a homework problem devised in part by a classmate
• Once or twice during the semester: A 30-minute presentation to the class that goes through the details of a paper, re-analyzing data or some subset of it if possible.
• Once or twice during the semester: Work with prof to create a homework problem (and solutions!) for the technique we are using. Check answers for classmates (grades actually assigned by the prof)
• Absences: The class relies heavily on presentations and on homework problems that the students create and grade, so these must be ready when needed. This is especially true for long presentations. Makeups may not be possible given the schedule. Homeworks must be turned in on time to receive credit, because we discuss them on the date they are due. Refer to the grading policy section for more information.

Outcomes/Assessments
(This section is now required by the Rice Administration for all Syllabi)

Students completing this class should be able to do the following:
• Understand the fundamentals of statistics, including probability distributions, means, variances, the Central Limit Theorem, hypothesis testing, error propagation, Bayesian analysis, jackknife, and bootstrap
• Understand modern statistical methods that relate to curve fitting, hypothesis testing, cluster analysis, principal component analysis, and time-series data
• Assess the veracity of statistical conclusions drawn from any observations or data in the natural sciences, social sciences, or humanities (subject areas chosen by the students)
• Evaluate whether or not the best statistical methods were used to analyze data in papers published in recent refereed journals
• Apply the statistical concepts covered in class to their own sets of data using, in part, the statistics software package R
• Create homework problems that relate to one of the mathematical techniques studied in class
• Improve speaking and presentation skills while leading class discussions
• Learn to summarize complex papers in a fixed amount of time
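
As a small illustration of two of the resampling ideas named in the first outcome (jackknife and bootstrap), the sketch below estimates the standard error of a sample mean both ways and compares the results with the usual s/sqrt(n) formula. It is only a minimal sketch under stated assumptions: it is written in Python rather than the R package used in the course, the function names are hypothetical, and the measurements are made up for the example.

```python
import random
import statistics


def bootstrap_se_mean(data, n_resamples=2000, seed=0):
    """Bootstrap estimate of the standard error of the sample mean.

    Draws n_resamples samples of the same size as `data`, with
    replacement, and returns the scatter of their means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    means = []
    for _ in range(n_resamples):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    return statistics.stdev(means)


def jackknife_se_mean(data):
    """Jackknife (leave-one-out) estimate of the standard error of the mean."""
    n = len(data)
    total = sum(data)
    # Mean of the sample with each point removed in turn.
    loo_means = [(total - x) / (n - 1) for x in data]
    center = statistics.fmean(loo_means)
    variance = (n - 1) / n * sum((m - center) ** 2 for m in loo_means)
    return variance ** 0.5


# Made-up measurements, for illustration only.
sample = [9.8, 10.1, 10.4, 9.6, 10.0, 10.3, 9.9, 10.2]

analytic_se = statistics.stdev(sample) / len(sample) ** 0.5
jackknife_se = jackknife_se_mean(sample)
bootstrap_se = bootstrap_se_mean(sample)

print("analytic  s/sqrt(n):", analytic_se)
print("jackknife estimate :", jackknife_se)
print("bootstrap estimate :", bootstrap_se)
```

For the mean, the jackknife estimate reproduces s/sqrt(n) exactly, while the bootstrap estimate is close to it but slightly smaller for small samples. The value of both methods is that the same recipe carries over to statistics that have no simple error formula.
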
SPRING 2018 Schedule

DATE | Topic | Class Type | Subject and Items Due
T Jan 9 | Mathematical Foundation | Lecture | Probability Concepts CHAPTER 2
Th Jan 11 | " | Lecture | Probability Distributions CHAPTER 4
T Jan 16 | " | Lecture | Confidence Intervals and Hypothesis Testing CHAPTER 3
Th Jan 18 | Nonparametrics | Paper Discussions | Short Presentations [Everyone]; Math HMWK handout [pmh]
T Jan 23 | " | Lecture | CHAPTER 5
Th Jan 25 | " | Lecture/Long Paper | Long presentation [Laura]
T Jan 30 | Regression | Paper Discussions | Short Presentations [Everyone]; Nonparametrics HMWK handout [Adolfo]; Math HMWK due [Everyone]
Th Feb 1 | " | Lecture | CHAPTER 7
T Feb 6 | " | Lecture/Long Paper | Long presentation [Rae]
Th Feb 8 | Spring Recess | --- | ---
T Feb 13 | Data Smoothing | Paper Discussions | Short Presentations [Everyone]; Regression HMWK handout [Laura]; Nonparametric HMWK due [Everyone but Adolfo]
Th Feb 15 | " | Lecture | CHAPTER 6
T Feb 20 | " | Lecture/Long Paper | Long presentation [Adolfo]
Th Feb 22 | Multivariate Analysis/PCA | Paper Discussions | Short Presentations [Everyone]; Data Smoothing HMWK handout [Asa]; Regression HMWK due [Everyone but Laura]
T Feb 27 | " | Lecture | CHAPTER 8
Th Mar 1 | " | Lecture/Long Paper | Long presentation [Laura]
T Mar 6 | Time Series Analysis | Paper Discussions | Short Presentations [Everyone]; Multivariate/PCA HMWK handout [Adolfo]; Data Smoothing HMWK due [Everyone but Asa]
Th Mar 8 | " | Lecture | CHAPTER 11
T Mar 13 | Spring Break | --- | ---
Th Mar 15 | Spring Break | --- | ---
T Mar 20 | Time Series Analysis | Lecture/Long Paper | Long presentation [Asa]
Th Mar 22 | Clustering | Paper Discussions | Short Presentations [Everyone]; Time Series Analysis HMWK handout [Alison]; Multivariate/PCA HMWK due [Everyone but Adolfo]
T Mar 27 | " | Lecture | CHAPTER 9
Th Mar 29 | " | Lecture/Long Paper | Long presentation [Alison]
T Apr 3 | Truncated Data | Paper Discussions | Short Presentations [Everyone]; Clustering HMWK handout [Asa]; Time Series HMWK due [Everyone but Alison]
Th Apr 5 | " | Lecture | CHAPTER 10
T Apr 10 | " | Lecture/Long Paper | Long presentation [Rae]
Th Apr 12 | Spatial Point Processes | Paper Discussions | Short Presentations [Everyone]; Truncated Data HMWK handout [Rae]; Clustering HMWK due [Everyone but Asa]
T Apr 17 | " | Lecture | CHAPTER 12
Th Apr 19 | " | Lecture/Long Paper | Long presentation [Alison]; Truncated Data HMWK due [Everyone but Rae]
T May 1 | | | Course Summary due [Everyone]

Honor Code: A general description of the honor code is available online. Students should turn in their own work and analysis on the homework sets, but may discuss the general nature of the problems with one another.

Disability Accommodation: If you have a documented disability that will impact your work in this class, please contact me to discuss your needs. Additionally, you will need to register with the Disability Support Services Office in the Ley Student Center.

Short summary of techniques