ASTR 408

Spring Semester 2022 Room HBH 254; 1:00pm - 1:50pm


Instructor: Dr. Patrick Hartigan, Hermann Brown Rm. 350, Phone: X2245

Office Hrs: After class and by appt.


Scheduling: Several Wednesday classes conflict with Rice Faculty Senate meetings, which I should attend as a Senator. We will discuss options for these classes on the first day of class.

Grading: Based on short presentations (30%); homework creation, grading, and leading the discussion of the problems (25%); long presentations (25%); completing homework problems (10%); and preparation (10%). The preparation portion is based on whether the student has read the material beforehand and has otherwise prepared properly for class. The class format only functions well if everyone completes their responsibilities on time. If a homework set is not created or a long presentation is not done, the student will receive a 0 for that grade. The problem-creation and long-presentation grades together are 50% of the total, and there are only one or two such events for each student during the semester, so it is important to have these ready on time. A student who earns straight A's on all other work but does not have a long presentation or homework set ready when it is scheduled will drop a full letter grade, to a B.
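As a rough illustration of the grading arithmetic above, the weights can be combined as a weighted average. This is only a sketch, not the official grade calculation: the 0-100 scale, the score of 95 standing in for "straight A's", the assumption of two equally weighted long presentations, and all names in the code are hypothetical and do not come from the syllabus.

```python
# Weights taken from the grading paragraph (fractions of the final grade).
WEIGHTS = {
    "short_presentations": 0.30,
    "homework_creation": 0.25,
    "long_presentations": 0.25,
    "homework_problems": 0.10,
    "preparation": 0.10,
}


def final_score(scores):
    """Weighted average of component scores, each on an assumed 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student with 95 on every component.
all_a = {name: 95.0 for name in WEIGHTS}

# Same student, but one of two (assumed) long presentations is missed and
# scored 0, so the long-presentation component averages (95 + 0) / 2.
missed_one = dict(all_a, long_presentations=(95.0 + 0.0) / 2)

print(round(final_score(all_a), 1))
print(round(final_score(missed_one), 1))
```

Under these assumptions the missed presentation costs about 12 points, moving the hypothetical student from the mid-90s to roughly 83, which is consistent with the full-letter-grade drop described above if a conventional 90/80 cutoff is assumed.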

Course Version: The course is also offered for graduate credit as ASTR 508. The course content is the same, but a larger fraction of the presentations are required of the graduate students.

Curriculum Fit: This class is one of the options for 4xx courses required for the B.S. Astrophysics degree, and also will count towards a minor in ASTR. ASTR 408 is not currently within the data sciences rubric (major or minor). However, the DSCI curriculum is under development, so interested students should meet with faculty in charge of those degrees to determine whether or not ASTR 408 may substitute for a DSCI requirement.

Prereqs: There are no statistical prerequisites. We will cover material as we go.

Every discipline of science has data, and a researcher generally wants to address a scientific question with the data. The path from the data to the conclusion usually involves some sort of statistical analysis. Even if the problem is not inherently a statistical one, when writing up the results one must always address the question of how certain the conclusions are given that there are uncertainties in the measurements. Often one looks for correlations between different variables, or tries to infer what the underlying structural variables are that define the system, and how they relate to one another. The statistical literature is vast and different disciplines tend to have their favorite techniques they apply to their data. But in most cases the techniques are perfectly valid over a wide range of applications.

In this class we will identify the statistical methods commonly used in research papers in physics and astrophysics. We will begin the study of each method by having each student present a short summary of a paper they have found that uses the method. The subject area of the paper is completely up to the student. In the following class there will be a lecture on the mathematics of the method, and in the next class we will hear from students who have chosen this method for deeper study as they present their paper in depth. To solidify this knowledge, students will construct problems that will be distributed to the rest of the class, and after a period of time we will discuss the solutions. A final lecture on the topic follows, in which we go over homework problems for the previous topic and discuss any outstanding questions about the material covered in the book and the methods we saw in the papers. Then we will move on to a new topic. By learning this way, we will see how statistics is used in different areas, which broadens our toolset for use in our own studies.

The topics will be guided by the professor, but the papers are chosen by the students. Topic examples include hypothesis testing and confidence intervals, extracting periods and signals from time-series data, using principal component analysis, maximum likelihood methods, finding groups and clusters in multidimensional data, maximum entropy reconstructions, non-parametric ranks, optimized profile extractions, Bayesian analysis with priors, and so on. Most statistical methods used in astronomy and physics will fall under one of the broad topics in this course, but if you have seen a technique and wondered how it works, we can substitute it for one of the other topics if there is interest. Find a paper that uses it and we will study it.

As described above, in class we will (*) find a broad range of papers chosen by the students that use a particular method, (*) study the mathematics of the method, (*) go in-depth into two papers chosen by the students, and (*) work through some problems to understand how the method works in practice. The students will pick the papers, and will be expected to lead the class through an overview of the science objectives. The main focus, however, will be on the method, which we will all study together. The problem sets may apply the method to some other case, perhaps with contrived data, so that we are all sure we could use it if the need arose in another context. There will be a few lectures at the beginning and then interspersed as needed throughout the semester to provide some mathematical structure, and we will follow the overall format of the textbook. While the textbook examples tend to come from astronomy, the student-chosen papers need not. In the past students have chosen papers from the fields of biology, geology, economics, linguistics, social sciences, oceanography, medicine, and political science. All of these are acceptable. In this class the application is less important than the method.

Analysis for this class is done in the R programming language. The book contains many examples of R code. R is free software and has a convenient structure for loading and manipulating large data sets.

Typical Work Load, Absence and Late Policies:

(This section is now required by the Rice Administration for all Syllabi)

Students completing this class should be able to do the following:

SPRING 2022 Schedule

Students will be assigned to the specific sections marked '[ ]' during the first week of class.

DATE              Topic                      Class Type        Subject and Items Due
M Jan 10          Mathematical Foundation    Lecture           Probability Concepts; CHAPTER 2
W Jan 12          "                          Lecture           Probability Concepts; CHAPTER 2
F Jan 14          "                          Lecture           Probability Distributions; CHAPTER 4
M Jan 17          Holiday                    ---               ---
W Jan 19          Mathematical Foundation    Lecture           Confidence Intervals and Hypothesis Testing; CHAPTER 3
F Jan 21          Nonparametrics             Paper Discussions Short Presentations [Everyone]
                                                               Math HMWK handout [pmh]
M Jan 24          "                          Lecture           Nonparametrics; CHAPTER 5
W Jan 26 [5:15pm] "                          Lecture           Nonparametrics; CHAPTER 5
F Jan 28          "                          Long Paper        Long Presentation [Mingjian]
M Jan 31          Regression                 Paper Discussions Short Presentations [Everyone]
                                                               Nonparametrics HMWK handout [Kyle]
                                                               Math HMWK due [Everyone]
W Feb 2           "                          Lecture           Regression; CHAPTER 7
F Feb 4           "                          Lecture           Regression; CHAPTER 7
M Feb 7           "                          Long Paper        Long presentation [Brandon]
W Feb 9           Data Smoothing             Paper Discussions Short Presentations [Everyone]
                                                               Regression HMWK handout [Mingjian]
                                                               Nonparametrics HMWK due [Everyone but Kyle]
F Feb 11          Recess, NO CLASS           ---               ---
M Feb 14          Data Smoothing             Lecture           Smoothing; CHAPTER 6
W Feb 16          "                          Lecture           Smoothing; CHAPTER 6
F Feb 18          "                          Long Paper        Long presentation [Kyle]
M Feb 21          Multivariate Analysis/PCA  Paper Discussions Short Presentations [Everyone]
                                                               Data Smoothing HMWK handout [Yucheng]
                                                               Regression HMWK due [Everyone but Mingjian]
W Feb 23 [5:15pm] "                          Lecture           Multivariate Analysis; CHAPTER 8
F Feb 25          "                          Lecture           Multivariate Analysis; CHAPTER 8
M Feb 28          "                          Long Paper        Long presentation [Yucheng]
W Mar 2           Time Series Analysis       Paper Discussions Short Presentations [Everyone]
F Mar 4           "                          Lecture           Time Series; CHAPTER 11
                                                               Data Smoothing HMWK due [Everyone but Yucheng]
                                                               Multivariate/PCA HMWK handout [Brandon]
M Mar 7           "                          Lecture           Time Series; CHAPTER 11
W Mar 9           "                          Long Paper        Long presentation [Preston]
F Mar 11          Clustering                 Lecture           Clustering; CHAPTER 9
M Mar 14          Spring Break               ---               ---
W Mar 16          "                          ---               ---
F Mar 18          "                          ---               ---
M Mar 21          Clustering                 Paper Discussions Short Presentations [Everyone]
                                                               Time Series Analysis HMWK handout [Brandon]
                                                               Multivariate/PCA HMWK due [Everyone but Brandon]
W Mar 23 [5:15pm] "                          Lecture           Clustering; CHAPTER 9
F Mar 25          "                          Long Paper        Long presentation [Mingjian]
M Mar 28          Truncated Data             Paper Discussions Short Presentations [Everyone]
                                                               Clustering HMWK handout [Kyle]
                                                               Time Series HMWK due [Everyone but Brandon]
W Mar 30          "                          Lecture           Truncated Data; CHAPTER 10
F Apr 1           "                          Lecture           Truncated Data; CHAPTER 10
M Apr 4           "                          Long Paper        Long presentation [Preston]
W Apr 6           Spatial Point Processes    Paper Discussions Short Presentations [Everyone]
                                                               Truncated Data HMWK handout [Yucheng]
                                                               Clustering HMWK due [Everyone but Kyle]
F Apr 8           "                          Lecture           Spatial Processes; CHAPTER 12
M Apr 11          "                          Lecture           Spatial Processes; CHAPTER 12
W Apr 13          "                          Long Paper        Long presentation [Yucheng]
F Apr 15          TBD... Gaussian Processes? Paper Discussions Short Presentations [Everyone]
                                                               Truncated Data HMWK due [Everyone but Yucheng]
                                                               Spatial Processes HMWK handout [Preston]
M Apr 18          "                          Lecture           Gaussian Processes; CHAPTER 12
W Apr 20 [5:15pm] "                          Lecture           Gaussian Processes; CHAPTER 12
F Apr 22          "                          Long Paper        Long presentation [Kyle]
                                                               Spatial Processes HMWK due [Everyone but Preston]
W May 4                                                        Course Summary due [Everyone]

Honor Code: A general description of the honor code is available online. Students should turn in their own work and analysis on the homework sets, but may discuss the general nature of the problems with one another.

Disability Accommodation: If you have a documented disability that will impact your work in this class, please contact me to discuss your needs. Additionally, you will need to register with the Disability Support Services Office in the Ley Student Center.