Course Outline: This is a short course in chemometrics, the application of mathematical and statistical techniques to the analysis of chemical data sets. With the tremendous increase in data collection and processing capabilities, the rate of data generation using modern analytical instruments can be overwhelming. For example, a GC/FT-IR system can collect ten to twenty 4096-point interferograms each second, and a single GC run may require 30 to 45 minutes. The resulting large set of interferograms must be mathematically processed to identify which ones correspond to eluting GC peaks, to determine the reference and sample power spectra, and then to determine the infrared absorbance spectra so that each peak can be identified and quantified. The goal of chemometrics is to provide knowledge and understanding from large data sets. For GC/FT-IR analysis, chemometrics provides the tools that take the raw set of unrecognizable interferograms and transform them into a printout listing all substances found in the sample and the amount of each that is present.
Chemometrics rescues us from the situation in which we are drowning in information but starving for knowledge.

Demonstrate an ability to quickly calculate, understand, and use a wide range of descriptive statistics for a complex data set
Know how to conduct propagation of error calculations to derive uncertainties in calculated parameters from measurements
Determine confidence intervals; be able to understand, use, and explain their meaning
Demonstrate an understanding of inferential statistical comparisons, including tests for outliers, normal distribution, and differences between means and variances
Quickly develop a spreadsheet to analyze and graphically illustrate a given data set
Understand and be able to use analysis of variance (ANOVA) methodology for a given data set
Understand and apply sampling strategies and statistics to a given problem
Know how to develop and set up Shewhart control charts
Understand the roles of proficiency testing and collaborative trials
Understand and be able to calculate and use linear and curvilinear regression techniques
Develop calibration curves for a given set of data
Develop standard addition curves and apply them to calculate unknown concentrations
Understand internal standard methodology and demonstrate an ability to quickly apply it to a given analytical problem
Understand multidimensional data sets and how to complete an initial analysis of them
Conduct multiple regression of a given data set
Understand principal component analysis and apply it to a given data set
Demonstrate familiarity with techniques involving object classification
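One of the objectives above is to develop standard addition curves and use them to calculate unknown concentrations. As a rough illustration only, the sketch below fits a straight line to signal versus added standard concentration and extrapolates to the x-intercept. The function names and the data are hypothetical, not taken from the course materials, and the sketch assumes a linear response, a zero blank signal, and no dilution from the additions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def standard_addition_concentration(added, signal):
    """Estimate the analyte concentration in the measured solution.

    The fitted line crosses zero signal at x = -intercept / slope, so the
    magnitude of that x-intercept is the estimated concentration.
    Assumes linear response, zero blank, and negligible dilution.
    """
    slope, intercept = fit_line(added, signal)
    return intercept / slope


# Hypothetical data: added standard (mg/L) and instrument response.
added = [0.0, 1.0, 2.0, 3.0]
signal = [0.20, 0.30, 0.40, 0.50]

estimate = standard_addition_concentration(added, signal)
print(f"Estimated concentration: {estimate:.2f} mg/L")
```

With these made-up numbers the fitted slope is 0.1 and the intercept is 0.2, so the estimate comes out at about 2.0 mg/L. A real analysis would also report the uncertainty of the extrapolated value, which the sketch omits.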
The goal of many chemometric techniques is to use measurements to build a model that serves one of many possible purposes, including defining a complex system, predicting properties, optimizing a signal, designing an experiment, immediately assessing the quality of a product from an industrial process, or testing an important hypothesis. Most research projects require the understanding and judicious use of the statistical and mathematical tools we will be learning in this course. While the technologies that generate data will continue to evolve, the mathematical and statistical tools themselves will remain current. Understanding and using them is an increasingly important part of a science education.
These mathematical and statistical tools are useful for a broad range of applications, particularly those that involve working with large data sets. Applications include solving problems such as apportioning the hydrocarbon air pollutants in a region to specific sources, controlling a major industrial chemical process, evaluating the impurities present in a pharmaceutical product, and determining the amount of moisture in wheat from satellite measurements. One can even apply these tools to determine the most powerful counting system to use in the game of 21!
The course begins with a block on descriptive and inferential statistics along with propagation of error and analysis of variance (ANOVA). We will then cover the important issues of sampling and quality control methods, followed by the three types of analytical models used for quantification: calibration, internal standard, and standard addition. The course will then focus on multivariate data sets and will examine some of the tools (such as principal component analysis) being employed to gain useful information from large volumes of data. The course will assume that you know how to use Excel (if you are weak or inexperienced, this course will be even more valuable for you); you will also gain familiarity with some advanced mathematical software tools. Here is the course outline:
Class Preparation: Homework assignments from the previous lesson are to be turned in at the beginning of each class. You are responsible for all assigned material and for all material discussed in lecture. You are expected to complete each reading assignment and begin working on the assigned problems prior to the date listed in the syllabus. For each class I recommend that you do the following:
Attendance: You are expected to attend all class meetings for the full scheduled time.