Wednesday, February 24, 2010

Panel Data

This is Betsy at the RSC. I recently studied panel data analysis in a little more depth. Panel data consist of many entities (people, firms, states) observed over multiple time periods. This contrasts with time series data, which follow one entity over multiple time periods, and cross-sectional data, which cover multiple entities in a single time period. A panel is considered balanced if there is an observation for each entity at each time. However, balanced does not mean there are no missing data: if there is more than one variable, some variables may still have missing values. There are many methods for analyzing panel data; this post deals specifically with fixed effects models in Stata.
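
To make the definition of a balanced panel concrete, here is a minimal Python sketch (Python rather than Stata, purely for illustration). The function name is_balanced and the small dataset are hypothetical and not part of any package. It checks only that every entity has exactly one row in every time period, so a missing value inside a row does not make the panel unbalanced.

```python
from itertools import product


def is_balanced(rows):
    """Return True if every entity appears exactly once in every time period."""
    seen = [(entity, time) for entity, time, *_ in rows]
    entities = {e for e, _ in seen}
    times = {t for _, t in seen}
    # Balanced: no duplicate (entity, time) pairs, and every combination is present.
    return len(seen) == len(set(seen)) and set(seen) == set(product(entities, times))


# Hypothetical rows: (entity, year, income, education); None marks a missing value.
balanced = [
    ("A", 2008, 10.0, 12), ("A", 2009, 11.0, None),
    ("B", 2008, 20.0, 16), ("B", 2009, 21.0, 16),
]
unbalanced = balanced[:3]  # entity B has no 2009 row

print(is_balanced(balanced))    # True, even though one education value is missing
print(is_balanced(unbalanced))  # False
```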

Panel data allow you to control for unobserved or unmeasurable characteristics, like cultural differences across entities, as long as they are constant over time. You can also control for unobserved factors that change over time but are the same for every entity. Fixed effects models explore the relationship between the independent and dependent variables within each entity, allowing for differences between entities.

There are two basic approaches that can be used for panel data. One is to include dummy variables for each entity and each time period. This controls for all time-invariant differences between entities and for time trends common to all entities. The command to do this in Stata is xi: regress dependent independent i.time i.entity. Stata then creates the dummy variables for each entity and time period and includes them in the regression. areg is another command that uses dummy variables. It allows one categorical variable, either entity or time, to be absorbed in the model. The results are the same, but the dummy variables are created only internally: their individual coefficients are not reported, only a test of their joint significance. Example syntax for areg is areg dependent independent, absorb(time). If you want to include both time and entity, absorb whichever has more categories and include dummies for the other.

You can also use xtreg in Stata. Before using xtreg, you must declare the data to be a panel dataset with the xtset command (xtset entity year). The syntax is then xtreg dependent independent, fe. This method produces the same coefficient estimates, but rather than creating a dummy variable for each entity, it relaxes the assumption of a single intercept term and allows each entity its own. To control for time as well, include time dummies among the independent variables.
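
To show what giving each entity its own intercept does, here is a minimal Python sketch (an illustration, not Stata's implementation). It uses the within transformation, in which each variable is demeaned by entity before running an ordinary regression, the standard way of estimating an entity fixed effects model with one independent variable. The data and the function names pooled_slope and within_slope are made up for this example, and time effects are left out to keep it short.

```python
from statistics import mean


def pooled_slope(x, y):
    """OLS slope from a regression of y on x with a single intercept."""
    xb, yb = mean(x), mean(y)
    num = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    den = sum((xi - xb) ** 2 for xi in x)
    return num / den


def within_slope(entity, x, y):
    """Fixed effects slope: subtract each entity's means, then run OLS."""
    groups = {}
    for e, xi, yi in zip(entity, x, y):
        groups.setdefault(e, []).append((xi, yi))
    xd, yd = [], []
    for obs in groups.values():
        xb = mean(v[0] for v in obs)
        yb = mean(v[1] for v in obs)
        xd.extend(v[0] - xb for v in obs)
        yd.extend(v[1] - yb for v in obs)
    return pooled_slope(xd, yd)


# Hypothetical data: y = intercept + 2 * x, with intercept 10 for A and 0 for B.
entity = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12.0, 14.0, 16.0, 8.0, 10.0, 12.0]

print(round(pooled_slope(x, y), 3))  # -0.571: one intercept hides the true slope
print(within_slope(entity, x, y))    # 2.0: entity-specific intercepts recover it
```

Because the entity intercepts here are related to x, the single-intercept regression even gets the sign of the slope wrong, while the within estimate matches the slope used to build the data.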

Hopefully this helps you get started in analyzing panel data sets!

Monday, February 8, 2010

New SPSS Tutorials

The RSC recently received a number of statistics-related tutorials for SPSS which we have added to our website. These topics include:

Descriptive Statistics
Correlation and Bivariate Graphs
Bivariate Regression
Multiple Regression
Oneway ANOVA
Creating ANOVA Interactions
Creating ANOVA Contrasts
Post-hoc Tests in ANOVA
Factorial ANOVA
Planned Comparisons in ANOVA
Repeated Measures ANOVA

A big thanks to Dr. Baldwin and RSC employee Arjan for making these available in HTML and PDF.

Sampling Weights

Hey. This is Brian from the RSC. I recently had the opportunity to do some work with the NHANES (National Health and Nutrition Examination Survey) data provided through the Centers for Disease Control and Prevention (CDC). One of the big issues with large-scale surveys like the NHANES is understanding sampling weights. This post reviews the basic concept of a sampling weight. Following posts will discuss weighting in NHANES.

Sampling weights are defined as the reciprocal of the probability of selection (Lohr, 1999). In a simple random sample, that probability is n/N, where N is the total population size and n is the sample size, so the sampling weight is N/n.

Let's consider a simple example of this. If we have a county with 5000 people (N = 5000) and we administer a survey to a random sample of 100 people (n = 100), the probability of selection is 100/5000, or .02. If I lived in this county, I would have a 2% chance of being randomly selected for the survey. My sampling weight is the reciprocal of that probability (1/.02), which is 50. This essentially means that my answer to the survey is "worth 50 people" within the county. The weight shows how many people I represent.

This weight is important because the sum of the weights equals the total population size. Returning to our example, 100 respondents, each with a weight of 50, represent 100 x 50 = 5000 people. So in the NHANES, the sum of the weights should equal the size of the population of interest (e.g., the adult population of the United States).
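
The arithmetic in the county example can be checked with a short Python sketch. It assumes a simple random sample, so every respondent has the same weight. The yes/no responses are invented purely to show how the weights scale a sample count up to a population estimate.

```python
N = 5000  # population size of the county
n = 100   # sample size

prob_selection = n / N  # 0.02, a 2% chance of being selected
weight = N / n          # reciprocal of the selection probability: 50.0

# Every respondent in a simple random sample carries the same weight.
weights = [weight] * n
print(sum(weights))     # 5000.0, the population size

# Hypothetical responses: 30 of the 100 respondents answered "yes" (1).
responses = [1] * 30 + [0] * 70
estimated_total = sum(w * r for w, r in zip(weights, responses))
print(estimated_total)  # 1500.0 people in the county estimated to answer "yes"
```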

Stay tuned for more on sampling weights in NHANES.


* Lohr, Sharon (1999). Sampling: Design and Analysis, 1st edition.

Friday, January 29, 2010

Reaching Out

Greetings!

In order to help reach out to the social science community at BYU, we have started a blog to answer your questions about statistics and research. We hope that this will be a helpful resource for you in the future.