CSI 763 Statistical Methods in the Space Sciences, Spring, 2014
Instructor: Bob Weigel | <rweigel@gmu.edu> | 703-993-1361 | Planetary Hall 259
Meets on Tuesdays in Robinson Hall A243 from 4:30 to 7:10 pm.
1. Lecture Notes
- Overview | HW #1
- Probability and Counting | HW #2
- Exploratory Data Analysis | HW #3
- Hypothesis Testing | HW #4
- Regression | HW #5
- Time Series | HW #6
- Fourier Methods | HW #7 | HW #8
- Wavelets | HW #9
- Principal Component Analysis | HW #10
- Kalman Filter | HW #11
Project presentations will be given on the final exam day, Tues. 5/13, from 4:30 to 6:30 pm. The project write-up is due on 5/14 by 9:00 am (grades are due at the end of that day). See the Project specification.
2. Syllabus
2.1. Goals for this course
At the end of this course,
- You should have a deep understanding of 80% of the statistical methods used in the space science literature;
- You should have an intuitive feel for probability and statistics and be able to spot common fallacies and errors in data analysis;
- You should be proficient using a statistical analysis software package; and
- You should understand how to approach a research-grade problem that involves statistical methods.
2.2. General format of this class
This class will require active participation.
For standard material, I will direct you to readings in the references. Instead of presenting the concepts in a lecture-style format, I will try to structure homework problems so that you learn the concepts as you work through them. In many cases, a new concept appears in a homework problem before it is covered in class.
As I prepare each lecture, I will add notes to its web page.
Besides covering topics in the space sciences, I will introduce you to important contemporary research tools including software and literature search tools.
2.3. Office Hour
Formally, Tuesdays from 3:00 to 4:00 pm, but in practice by appointment. I will generally be in my office the entire afternoon on the day of class.
2.4. Textbook
The textbook for this course is "Statistical Methods in the Atmospheric Sciences (second edition)" [1]. Although the examples are mostly related to problems in the Atmospheric Sciences, all of the techniques covered are relevant to the Space Sciences. Most of the examples that I give and the assigned problems will be related to problems in the Space Sciences.
I will also hand out photocopies of material as needed. The following are some of the books that I reference in the notes.
- The Analysis of Time Series: An Introduction, C. Chatfield: [2]
- An Introduction to the Bootstrap, B. Efron and R.J. Tibshirani. Amazon: [3]
- Principles of Statistics, M.G. Bulmer. An excellent book to use for review. Scanned copy: pdf; Amazon: [4]
- Introduction to Probability Theory and Statistical Inference, H.J. Larson. Upper-level undergraduate book on statistics with many details on derivations. Amazon: [5]
- Neural Networks for Pattern Recognition, Bishop. Contains much material useful for understanding statistical methods. Amazon: [6]
- Statistical Distributions in Engineering, Bury. Contains many distributions along with a list of their properties. Amazon: [7]
- Time Series Analysis: Forecasting and Control, Box, Jenkins, and Reinsel. A classic time series analysis book. Amazon: [8]
- Fifty Challenging Problems in Probability, Mosteller. Full pdf is on Google Books: [9]
- A course that covers much of the same material as CSI 763: [10]
- Online book "A First Course on Time Series Analysis" that uses examples with SAS: [11]
- Online book on machine learning: [12]
- Online book "Fourier Methods for Beginners": [13]
2.5. Reference material
2.5.1. General
- Predictability of Weather and Climate course [14]
- A Computational Approach to Statistics [15]
- Think Stats: Probability and Statistics for Programmers [16]
- Singular Spectrum Analysis [17]
- Statistical Methods in the Climate Sciences [18]
- Reference list [19]
- Matlab Statistics Toolbox [20]
- Statsoft page [21]
- A course on the Bootstrap [22]
- Efron Reflections [23]
- Time Series and Astrophysics [24]
- Introduction to digital filters [25]
- The Intuitive Inadequacy of Classical Statistics [26]
- Modern Science and the Bayesian-Frequentist Controversy [27]
2.5.2. Popular Press
- "Thou shalt not report ratios" [28]
- Gapminder "Gapminder is a non-profit venture for development and provision of free software that visualise human development." [29]
- Odds of dying [30]
- Bayes' Theorem [31]
- Odds are, it's wrong [32]
- Common statistical fallacies [33]
2.6. Grading
- 65% Homework.
- Homework will be assigned approximately once per week.
- Most homework assignments should take between 4 and 10 hours.
- Homework is always due at the start of the lecture one week after the lecture in which it was assigned.
- The most important part of this course is the homework!
- Homework assignments will usually have two portions:
- The first portion will be on material that has not yet been discussed in class and is intended to prepare you for the material to be discussed on the day that it is due. The grading on the first portion will be binary, where a 1 is given if there is evidence that an attempt was made to answer the question and a thought process was documented. You may be required to give a short presentation in class on this portion of the homework.
- The second portion is on material that has already been discussed in class. You may turn in the second portion of up to two homework assignments one week late without penalty. Otherwise, late homework receives a 25% deduction.
- 25% Final Exam.
- Closed book, in class. The exam may be canceled if I decide it is not needed; in that case, the project report grade is used in place of the final exam grade.
- 10% Project Report.
- To be assigned around Lecture 7.
- Much of the work for the project will be a part of homework assignments.
- Due date is TBD.
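To make the weighting and late policy above concrete, here is a minimal sketch of how the pieces could combine. It is not the instructor's grading script: the function names are hypothetical, and it assumes every component is scored 0-100, each homework is a single combined score, the homework component is a plain average, the two penalty-free late submissions are used up by the first late assignments (each at most one week late), and the 25% deduction multiplies the score.

```python
# Hypothetical sketch of the CSI 763 grade weighting; not the official grading script.
# Assumptions: components are scored 0-100, each homework is one combined score,
# the homework component is a plain average, and the 25% late deduction is multiplicative.

WEIGHTS = {"homework": 0.65, "final_exam": 0.25, "project": 0.10}
LATE_PENALTY = 0.25
FREE_LATE_ALLOWED = 2  # up to two (second-portion) submissions may be one week late


def apply_late_policy(scores, late_flags, free_late_allowed=FREE_LATE_ALLOWED):
    """Return adjusted homework scores. The first `free_late_allowed` late
    submissions keep full credit; every later late submission loses 25%."""
    adjusted = []
    free_left = free_late_allowed
    for score, late in zip(scores, late_flags):
        if late and free_left > 0:
            free_left -= 1  # use one of the penalty-free late submissions
            adjusted.append(score)
        elif late:
            adjusted.append(score * (1 - LATE_PENALTY))
        else:
            adjusted.append(score)
    return adjusted


def course_grade(homework_scores, late_flags, final_exam, project):
    """Weighted course grade. Pass final_exam=None if the exam is canceled;
    the project report grade then replaces the final exam grade."""
    adjusted = apply_late_policy(homework_scores, late_flags)
    homework_avg = sum(adjusted) / len(adjusted)
    if final_exam is None:
        final_exam = project
    return (WEIGHTS["homework"] * homework_avg
            + WEIGHTS["final_exam"] * final_exam
            + WEIGHTS["project"] * project)


if __name__ == "__main__":
    hw = [90, 100, 80, 100]
    late = [False, True, True, True]  # the third late submission is penalized
    print(apply_late_policy(hw, late))  # [90, 100, 80, 75.0]
    print(course_grade(hw, late, final_exam=80, project=70))
    print(course_grade(hw, late, final_exam=None, project=70))
```

Note that when the final exam is canceled, substituting the project grade means the project effectively carries 35% of the course grade (25% + 10%).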
2.7. Project
- Your project will be completed in several steps.
- Various steps will be completed as a part of homework assignments.
- The final report will be based on components of your homework assignments.
- Each week you will give a 5-minute presentation about the analysis done on your homework. As part of the next assignment, you must address questions and suggestions that came up during your presentation.
- A template for your report is given on the Project page.
3. Honor Code
Your instructor enforces the honor code [34]:
Student members of the George Mason University community pledge not to cheat, plagiarize, steal, or lie in matters related to academic work.
For more information see http://honorcode.gmu.edu/
The following paragraph was extracted from the document 2011_SYLLABUS_LANGUAGE.pdf attached to an email sent by the Provost.
Mason is an Honor Code university; please see the University Catalog for a full description of the code and the honor committee process. The principle of academic integrity is taken very seriously and violations are treated gravely. What does academic integrity mean in this course? Essentially this: when you are responsible for a task, you will perform that task. When you rely on someone else’s work in an aspect of the performance of that task, you will give full credit in the proper, accepted form. Another aspect of academic integrity is the free play of ideas. Vigorous discussion and debate are encouraged in this course, with the firm expectation that all aspects of the class will be conducted with civility and respect for differing ideas, perspectives, and traditions. When in doubt (of any kind) please ask for guidance and clarification. In addition, you may not copy any text, computer code, image, data or any other material from the Internet or any other source and represent it as your own. Any material that is taken in whole or in part from any other source (including web-pages) that is not properly cited will be treated as a violation of Mason's academic honor code and will be submitted to the honor committee for adjudication, as will other violations of the honor code.
4. Software
4.1. Overview
Students are welcome to use any software for their homework problems. In the past, most students have used MATLAB, Octave, or IDL. Other programs with similar basic libraries (statistical functions and plotting routines) may also be used, for example, R, SAS, S, and NumPy.
If you can solve these problems without much difficulty using any software package, you know enough to solve the homework problems.
When deciding what software to use, you should consider
- the language/plotting program you are most familiar with,
- the fact that the instructor is most familiar with MATLAB/Octave, and
- the instructor's observation that in the past, students who used MATLAB/Octave tended to have an easier time completing their homework (and making presentation-grade plots).
You may do your programming assignments in any language. What is important is that you become proficient enough that data exploration is easy for you. In the same way that you are expected to be able to derive certain relationships in a physics or math course, in this course you should be able to easily arrive at a solution to a certain class of statistical problems.
There are many software packages out there that can be used for this course. Below I mention a few of the packages that I am familiar with and comment on some of the advantages and disadvantages from my perspective. You may use any package that you prefer. I have created a page that outlines some of the basic operations for MATLAB and IDL: [35].
A list of software that is used for statistical analysis along with opinions of the instructor:
- IDL: Primary user base: medical imaging and space sciences. Advantages: Used by many space scientists. Disadvantages: I think this package is dying a slow death and is not gaining many new users.
- GDL: "A free and open-source IDL/PV-WAVE compiler." I don't see this getting much traction from IDL users. Most former IDL users I know have switched to NumPy.
- R: Primary user base: academic statisticians. Advantages: Developed by many academic statisticians.
- MATLAB: Primary user base: engineers and physical scientists. Advantages: Widespread use; many open-source alternatives have 50-95% of its functionality. Disadvantages: Pricing.
- Octave: Primary user base: same as MATLAB. Advantages: Widespread use; no cost. Disadvantages: Not one-to-one compatible with MATLAB; may require more time to install and configure.
- SAS
- Mathematica
- Sage
- And many more [36]
4.2. Software Access
4.2.1. Octave
Octave may be installed on your personal computer for free, but is not available in any of Mason's computer labs (see [37] for installation instructions).
4.2.2. IDL
A link to a copy of IDL 8.1 was sent via email; see below for information about ISO files. Mason has licenses that allow you to run IDL 8.1 provided that you have an internet connection (for the license server). I will email the license information.
4.2.3. MATLAB
- MATLAB is available on the computers in most of Mason's computer labs (for example, the 3rd-floor Johnson Center computer lab and the 3rd-floor computer lab in Innovation Hall).
- A second option is to run MATLAB "virtually" (and for free) on your personal laptop or desktop computer using Mason's Virtual Computer Lab (VCL) by following the instructions at [38]. Warning: My experience with the VCL from 2012-2014 is that it is unreliable, so I recommend it only as a last resort.
- A third option is to purchase a student version of MATLAB (on a DVD) at Patriot Computers [39] for $109 and install MATLAB permanently on your personal laptop or desktop machine.
- A fourth option is to download the ISO file of MATLAB 2009a (a link and license file were sent via email); see below for information about ISO files. Mason has licenses for MATLAB 2009a and all of its toolboxes, provided that you have an internet connection (for the license server).
ISO
An ISO file is an image of the contents of a CD or DVD [40]. If you don't have a CD/DVD writer to burn the ISO file to a disc (or a drive to read the disc), you can instead mount the ISO file, and it will appear as a drive on your system.
- On Windows, mount the file with a program such as PowerISO.
- On Linux, create a mount point and mount the file (mounting typically requires root privileges):
mkdir /tmp/cd
sudo mount -t iso9660 -o loop file.iso /tmp/cd
ls /tmp/cd/
- On OS X, mount the file using
hdiutil mount file.iso
Crash Course Pages
A few documents that cover the basics of what you need to know if you choose to use MATLAB: