
Thursday, March 19, 2009

MANOVA in SPSS



Multivariate Analysis of Variance (MANOVA) in SPSS is similar to ANOVA, except that instead of one metric dependent variable there are two or more. MANOVA in SPSS is concerned with examining differences between groups, and it examines those group differences across multiple dependent variables simultaneously.

MANOVA in SPSS is appropriate when there are two or more dependent variables that are correlated. If the dependent variables are uncorrelated or orthogonal, a separate ANOVA on each dependent variable is more appropriate than MANOVA.

Let us take an example of MANOVA in SPSS. Suppose that four groups, each consisting of 100 randomly selected individuals, are exposed to four different commercials for a detergent. After watching the commercial, each individual rates his or her preference for the product, preference for the manufacturing company, and preference for the commercial itself. Since these three variables are correlated, MANOVA should be conducted to determine which commercial received the highest preference across the three preference variables.

MANOVA in SPSS is done by selecting “Analyze,” “General Linear Model” and “Multivariate” from the menus.
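
The same analysis can also be run from a syntax window. The sketch below is a minimal example for the detergent scenario; the variable names (product_pref, company_pref, commercial_pref, commercial) are hypothetical and would need to match your own data file.

* Minimal sketch of a one-factor MANOVA via GLM; variable names are hypothetical.
GLM product_pref company_pref commercial_pref BY commercial
  /METHOD=SSTYPE(3)
  /PRINT=DESCRIPTIVE ETASQ
  /DESIGN=commercial.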

As in ANOVA, the first step is to identify the dependent and independent variables. MANOVA in SPSS involves two or more metric dependent variables. Metric variables are those measured using an interval or ratio scale. The dependent variables are generally denoted by Y (or Y1, Y2 and so on) and the independent variable is denoted by X.

In MANOVA in SPSS, the null hypothesis is that the vectors of means on multiple dependent variables are equal across groups.

As in ANOVA, MANOVA in SPSS involves the decomposition of the total variation observed in all the dependent variables simultaneously. The total variation in Y is denoted by SSy, which can be broken down into two components:

SSy = SSbetween + SSwithin

Here the subscripts ‘between’ and ‘within’ refer to the categories of X in MANOVA in SPSS. SSbetween is the portion of the sum of squares in Y which is related to the independent variable or factor X. Thus, it is generally referred to as the sum of squares of X. SSwithin is the variation in Y which is related to the variation within each category of X. It is generally referred to as the sum of squares for errors in MANOVA in SPSS.

Thus, in MANOVA in SPSS the decomposition of the total variation is carried out simultaneously for all the dependent variables Y1, Y2 (and so on).

The next task in MANOVA in SPSS is to measure the effect of X on Y1, Y2 (and so on), which is generally done through the sum of squares of X. The relative magnitude of the sum of squares of X increases as the differences among the means of Y1, Y2 (and so on) across the categories of X increase, and also as the variation in Y1, Y2 (and so on) within the categories of X decreases.

The strength of the effect of X on Y1, Y2 (and so on) is measured by η2 in MANOVA in SPSS. The value of η2 varies between 0 and 1. It takes the value 0 when all the category means are equal, indicating that X has no effect on Y1, Y2 (and so on), and the value 1 when there is no variability within each category of X but there is some variability between the categories.

The final step in MANOVA in SPSS is to calculate the mean squares, obtained by dividing each sum of squares by its corresponding degrees of freedom. The null hypothesis of equal mean vectors is tested with an F statistic, which is the ratio of the mean square related to the independent variable to the mean square related to error.
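
As an illustration, assume the detergent example above with c = 4 groups of 100 individuals each, so that N = 400. The degrees of freedom are then c - 1 = 3 between groups and N - c = 396 within groups. For each dependent variable taken on its own, MSbetween = SSbetween/3, MSwithin = SSwithin/396, and F = MSbetween/MSwithin. SPSS reports these univariate F tests in the Tests of Between-Subjects Effects table, alongside the multivariate tests (such as Wilks' lambda and Pillai's trace) that evaluate all the dependent variables jointly.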

For further assistance with SPSS click here.


Wednesday, March 18, 2009

ANOVA in SPSS

Analysis of Variance (ANOVA) in SPSS is used for examining the differences in the mean values of the dependent variable associated with the effect of the controlled independent variables, after taking into account the influence of the uncontrolled independent variables. Essentially, ANOVA in SPSS is used as a test of means for two or more populations.

ANOVA in SPSS must have a dependent variable which should be metric (measured using an interval or ratio scale). ANOVA in SPSS must also have one or more independent variables, which should be categorical in nature. In ANOVA in SPSS, categorical independent variables are called factors. A particular combination of factor levels, or categories, is called a treatment.

In ANOVA in SPSS, one-way ANOVA involves only one categorical variable, or a single factor. For example, if a researcher wants to examine whether heavy, medium, light and nonusers of cereals differ in their preference for Total cereal, the differences can be examined with a one-way ANOVA in SPSS. In one-way ANOVA, a treatment is the same as a factor level.

If two or more factors are involved, the analysis is termed n-way ANOVA. For example, if the researcher also wants to examine the preference for Total cereal among customers who are loyal to it and those who are not, an n-way ANOVA in SPSS can be used.
In ANOVA in SPSS, from the menu we choose:

“Analyze” then go to “Compare Means” and click on the “One-Way ANOVA.”
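
Equivalently, a one-way ANOVA can be requested from a syntax window. The sketch below uses hypothetical variable names: preference as the metric dependent variable and usage_group as the factor coding heavy, medium, light and nonusers.

* Minimal one-way ANOVA sketch; variable names are hypothetical.
ONEWAY preference BY usage_group
  /STATISTICS DESCRIPTIVES
  /MISSING ANALYSIS.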

Now, let us discuss in detail how the software operates ANOVA:

The first step is to identify the dependent and independent variables. The dependent variable is generally denoted by Y and the independent variable by X. X is a categorical variable with c categories. The sample size in each category of X is generally denoted by n, so the total sample size is N = n × c.

The next step in ANOVA in SPSS is to examine the differences among means. This involves decomposition of the total variation observed in the dependent variable, which is measured in terms of sums of squared deviations from the mean.

The total variation in Y in ANOVA in SPSS is denoted by SSy, which can be decomposed into two components:

SSy=SSbetween+SSwithin

where the subscripts ‘between’ and ‘within’ refer to the categories of X in ANOVA in SPSS. SSbetween is the portion of the sum of squares in Y related to the independent variable or factor X; thus, it is generally referred to as the sum of squares of X. SSwithin is the variation in Y related to the variation within each category of X. It is generally referred to as the sum of squares for errors in ANOVA in SPSS.

The logic behind decomposing SSy is to examine the differences in group means.

The next task in ANOVA in SPSS is to measure the effect of X on Y, which is generally done through the sum of squares of X, because it is related to the variation in the means of the categories of X. The relative magnitude of the sum of squares of X increases as the differences among the means of Y across the categories of X increase, and also as the variation in Y within the categories of X decreases.

The strength of the effect of X on Y is measured by η2 in ANOVA in SPSS. The value of η2 varies between 0 and 1. It takes the value 0 when all the category means are equal, indicating that X has no effect on Y, and the value 1 when there is no variability within each category of X but there is still some variability between the categories.
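
Formally, η2 is the proportion of the total variation accounted for by the factor: η2 = SSbetween / SSy. As a purely hypothetical illustration, if SSbetween = 80 and SSy = 200, then η2 = 80 / 200 = 0.40, meaning that the categories of X account for 40% of the variation in Y.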

The final step in ANOVA in SPSS is to calculate the mean squares, obtained by dividing each sum of squares by its corresponding degrees of freedom. The null hypothesis of equal means is tested with an F statistic, which is the ratio of the mean square related to the independent variable to the mean square related to error.

N-way ANOVA in SPSS involves the simultaneous examination of two or more categorical independent variables and is computed in a similar manner.

A major advantage of ANOVA in SPSS is that the interactions between the independent variables can be examined.
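
For an n-way design such as the cereal example above, a sketch of the corresponding syntax is shown below; the variable names (preference, usage_group, loyalty) are hypothetical, and the DESIGN subcommand requests the two main effects together with their interaction.

* Minimal two-way ANOVA sketch with an interaction term; names are hypothetical.
UNIANOVA preference BY usage_group loyalty
  /METHOD=SSTYPE(3)
  /PRINT=DESCRIPTIVE ETASQ
  /DESIGN=usage_group loyalty usage_group*loyalty.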

For further assistance with SPSS click here.

Monday, March 16, 2009

Correlation in SPSS

Correlation is a statistical technique that shows how strongly two variables are related to each other, or the degree of association between the two. For example, if we have data on the height and weight of a group of people, the correlation between the two variables tells us how they are related; we might find that weight is positively related to height. Correlation is measured by the correlation coefficient, and it is very easy to calculate the correlation coefficient in SPSS. Before calculating correlation in SPSS, we should have some basic knowledge about correlation. The correlation coefficient always lies in the range of -1 to 1. Correlation can be classified in three ways:

1. Positive and negative correlation: When the two variables move in the same direction, the correlation is positive. When one variable moves in one direction and the second variable moves in the opposite direction, the correlation is negative.

2. Linear and non-linear (curvilinear) correlation: When the two variables change in a constant ratio, the correlation is linear. When they do not change in a constant ratio, the correlation is curvilinear. For example, if sales and expenditure move in the same ratio, they are in linear correlation; if they do not, they are in curvilinear correlation.

3. Simple, partial and multiple correlation: When only two variables are studied, it is simple correlation. When the correlation between two variables is examined while controlling for one or more additional variables, it is partial correlation. When the relationship between one variable and a combination of two or more other variables is considered, it is multiple correlation.

Degree of correlation

1. Perfect correlation: When the two variables move together exactly, so that the correlation coefficient is +1 or -1, the correlation is perfect.

2. High degree of correlation: When the correlation coefficient is above .75 in absolute value, it is called a high degree of correlation.

3. Moderate correlation: When the correlation coefficient is between .50 and .75 in absolute value, it is called a moderate degree of correlation.

4. Low degree of correlation: When the correlation coefficient is between .25 and .50 in absolute value, it is called a low degree of correlation.

5. Absence of correlation: When the correlation coefficient is between 0 and .25 in absolute value, there is little or no correlation.

There are many ways to calculate a correlation coefficient, and correlation in SPSS offers several of them. For continuous variables, the Analyze menu provides bivariate correlation with the Pearson coefficient. If the data are in rank order, we can use the Spearman rank correlation, which is available in the same bivariate correlation dialog. If the data are nominal, then phi, the contingency coefficient and Cramer’s V are suitable measures of association, and these can be requested in SPSS through crosstabulation. The phi coefficient is suitable for a 2×2 table, while the contingency coefficient C is suitable for a table of any size.
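
As a sketch, the commands below show how each of these coefficients might be requested from a syntax window; all variable names used here (height, weight, height_rank, weight_rank, gender, purchase) are hypothetical.

* Pearson correlation for two continuous variables.
CORRELATIONS
  /VARIABLES=height weight
  /PRINT=TWOTAIL NOSIG.

* Spearman rank correlation.
NONPAR CORR
  /VARIABLES=height_rank weight_rank
  /PRINT=SPEARMAN TWOTAIL NOSIG.

* Phi, Cramer's V and the contingency coefficient from a crosstabulation.
CROSSTABS
  /TABLES=gender BY purchase
  /STATISTICS=PHI CC.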

Testing the Significance of a Correlation:

Once we compute the correlation coefficient, we want to determine the probability that the observed correlation occurred by chance. For that, we conduct a significance test: we are interested in whether the correlation reflects a real relationship rather than a chance occurrence. For this we set up two hypotheses.

Null hypothesis: The null hypothesis assumes that there is no correlation between the two variables.

Alternative hypothesis: The alternative hypothesis assumes that there is a correlation between the variables.

Before testing the hypotheses, we have to choose the significance level; in most cases it is set at .05 or .01. A 5% level of significance means that we are willing to accept no more than a 5-in-100 chance of declaring a correlation real when it is actually a chance occurrence. After choosing the significance level, we calculate the correlation coefficient, which is denoted by the symbol r.
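
For the Pearson coefficient, the significance value SPSS reports is based on a t statistic of this form, with n denoting the sample size: t = r√(n - 2) / √(1 - r²), with n - 2 degrees of freedom.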

Coefficient of determination:

With the help of the correlation coefficient, we can determine the coefficient of determination, which is simply the proportion of the variance in Y that can be explained by X. It is obtained by squaring the correlation coefficient.
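
For example, using a purely hypothetical value, if r = .80 then the coefficient of determination is r² = .64, meaning that X accounts for 64% of the variance in Y.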

For further assistance with Correlations or SPSS Click Here.

Data Entry in SPSS

Data entry in SPSS is the most important task involved in any analysis. Data may exist in any form: it may be written on a piece of paper, or it may be typed into a computer in raw form. Before doing data entry in SPSS, one should start SPSS, which is easily done from the Start menu by clicking on the SPSS icon. As soon as SPSS opens, the Data Editor window appears; in its Data View, each column records a measure (a variable) and each row identifies a case (or subject). If the data set is small, data entry in SPSS can be done manually in the Data View window. When the data set is large, manual data entry is not practical, but there are a number of other options for getting data into SPSS. Most data are available in Excel, CSV (comma-separated values) or text format, and data may also be available in other software formats such as SAS or STATA.


Reading data using the import wizard: Whenever data are outside SPSS, or in any other format, we can bring them into SPSS using the import wizard. To use it, we click the “Open” option in the SPSS File menu and then the “Data” option. The window that appears is the open data dialog, which shows where the data are located. For example, if our data are on the D drive, we click on the “My Computer” icon, find the D drive, and open the folder in which the data are stored; the name of the file will appear below the file name option. If the data are not in SPSS file format, the file will not appear there at first, so from the “file type” option we select the format in which the data were saved, and the file will then appear in the list. We select that file, open it, and the data appear in the Data View window. This is an easy way to do data entry in SPSS when the data set is large. If the data are in Excel format and of moderate size, the copy and paste option available in SPSS can also be used. But just entering the data in SPSS is not sufficient for analysis.
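
Imports can also be scripted. The sketch below reads an Excel worksheet through syntax; the file path, sheet name and the /TYPE keyword are hypothetical examples and may need adjusting for the file format and SPSS version in use.

* Minimal sketch of importing an Excel file; path and sheet name are hypothetical.
GET DATA
  /TYPE=XLS
  /FILE='D:\survey\responses.xls'
  /SHEET=name 'Sheet1'
  /READNAMES=on.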


Variable properties: One limitation of data entry in SPSS is that analyses cannot be performed directly on a character (string) variable; to analyze it, we have to convert it into a numeric variable. SPSS has a second window called Variable View, which shows the properties of each variable, and the options available there help to clean up the data and prepare it for analysis. In older versions of SPSS, a variable name could be only 8 characters long, but a variable name is often longer than that; to get around this, we can assign the variable a label of whatever length we want in Variable View. To convert a string variable into a numeric one, we recode its old values into new numeric codes and then attach value labels to those codes through the “Values” column. Sometimes there are extreme values in the data, and another limitation of data entry in SPSS is that a code such as zero is not automatically treated as missing. SPSS does, however, have a missing values option: through it, we can declare that a particular code represents a missing or extreme value, and SPSS will then treat it as missing. Sometimes data are numeric but SPSS reads the variable as a string; in that case we convert it to numeric through the “Type” column. Other columns let us set the number of decimal places and the alignment (left, right or center) of a variable. During analysis we often have to perform mathematical operations on variables, for example computing one variable as the average of five others; the “Compute” option in the Transform menu handles such operations. Sometimes we have to split the data by category; the “Split File” option in the Data menu lets us analyze the categories separately.
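
The same clean-up steps can be written as syntax. Below is a minimal sketch covering recoding a string variable, declaring a user-missing code, computing an average, and splitting the file; every variable name and code used here is hypothetical.

* Recode a string variable into a numeric one and label the codes.
RECODE gender ('M'=1) ('F'=2) INTO gender_num.
VARIABLE LABELS gender_num 'Gender, coded numerically'.
VALUE LABELS gender_num 1 'Male' 2 'Female'.

* Declare 999 as a user-missing code for income.
MISSING VALUES income (999).

* Compute one variable as the average of five items.
COMPUTE avg_score = MEAN(q1, q2, q3, q4, q5).
EXECUTE.

* Analyze each region separately.
SORT CASES BY region.
SPLIT FILE SEPARATE BY region.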

Syntax option: SPSS functions can be performed in two different ways: through the point-and-click menus or through command syntax. If someone is familiar with SPSS commands, then the functions mentioned above can also be performed from a syntax window.
For more assistance with SPSS click here.

Thursday, March 12, 2009

SPSS Tutorial

Statistical Package for the Social Sciences (SPSS) is a software program that performs statistical analysis of data. This package contains an icon called “Tutorial” in the Help Menu, which explains a step-by-step account of how to work in SPSS in very detailed format, with the help of case studies. It can be regarded as a Statistical Analysis guide.

SPSS Tutorial consists of the following topics:

· Introduction

This part of the SPSS Tutorial mainly explains the usage of sample files. Sample files are the files which contain fictitious survey data. Through the Introduction, we generally get familiar with SPSS as it gives a step-by-step account of how to open a data file, conduct analysis and see the output.

· Reading Data

Data can be imported into SPSS from various sources, including spreadsheet applications (such as MS Excel), database applications (such as MS Access) and text files. The Reading Data topic in the SPSS Tutorial explains, step by step, how data from these sources is read and stored in SPSS-format data files.

· Using the Data Editor

The Data Editor topic in the SPSS Tutorial explains how to enter string and numeric data in an SPSS data file using the “variable view” and “data view” options. It also covers how to handle missing data, how to add variable labels, how to add value labels for numeric and string variables, and how to use value labels for data entry. It also discusses defining variable properties for categorical variables in a graphical manner.

· Working with Multiple Data Sources

This part of the SPSS Tutorial explains how to switch back and forth between data sources, how to compare the contents of different data sources, how to copy and paste between data sources, and how to create multiple subsets of cases and/or variables for analysis. It also shows how to merge multiple data sources in various data formats without first saving each one. In general, it covers the basic handling of multiple data sources, including copying and pasting information between datasets.

· Examining Summary Statistics for Individual Variables

This part of the SPSS Tutorial explains how the level of measurement of a variable influences the type of statistics that should be used, with emphasis on summary measures for categorical data and scale variables.

· Cross tabulation tables

This topic in the SPSS Tutorial is presented through case studies drawn from the sample files. It covers different aspects of crosstabulation, including simple crosstabulation, counts vs. percentages, and significance testing for crosstabulations.

· Creating and Editing charts

This section of the SPSS Tutorial details how to create a chart using the Chart Builder gallery and how to edit charts, illustrated with examples of a pie chart and a grouped scatter plot.

· Working with Output

This section of the SPSS Tutorial explains the output that is displayed after every analysis. It mainly covers how to examine the output and how to edit it as required with the help of the Pivot Table Editor.

· Working with Syntax

Instead of clicking on the menus to perform operations, one can write programs in SPSS. This topic in the SPSS Tutorial details how to edit syntax while programming in SPSS, and how to open and run a syntax file.
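
As a purely illustrative sketch of what a small syntax file might contain (the variable names here are hypothetical), a few lines like the following can be typed into a syntax window and executed from the Run menu:

* A minimal syntax file; variable names are hypothetical.
FREQUENCIES VARIABLES=gender region
  /ORDER=ANALYSIS.
DESCRIPTIVES VARIABLES=age income
  /STATISTICS=MEAN STDDEV MIN MAX.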

· Time Saving Features

This topic in the SPSS Tutorial generally gives information about features which can save time while performing the same type of analysis on similar sets of data.

· Customizing SPSS

This topic in the SPSS Tutorial gives guidance on how to make the frequently used menus and toolbars more accessible in order to save time.

· Getting started with tables

Many analyses produce results in the form of tables. This topic in the SPSS Tutorial explains the Tables add-on module, which is especially useful in survey analysis and market research.

· Index Feature

This feature in the SPSS Tutorial generally enables the user to be comfortable with all types of statistical analyses from A to Z. It details all the tests and techniques in a step-by-step graphical manner.

For assistance with using SPSS for your thesis or dissertation click here.