sas handbook

Upload: lmontes93

Post on 04-Jun-2018


  • 8/13/2019 SAS Handbook


    Math 338 - Introduction to SAS

    Fall 2013

    SAS HANDBOOK

    By: Luis Montes


    Table of Contents

    1. DATA MANAGEMENT

    1.1 Data Step
        A. DATA Statement Options
        B. Defining Variables
        C. Input Statement
        D. Datalines Statement
        E. Set Statement
        F. Merge Statement
        G. Length Statement
        H. Label Statement
        I. If-Else Statement
        J. Infile Statement
        K. Do Statement
        L. Keep-Drop Statements
        M. Output Statement
        N. Generating Random Numbers
        O. Internal Values
        P. Format and Informat Statements
        Q. File Statement
        R. Put Statement
        S. Array Statement

    1.2 Proc Import Step
        A. Proc Import Statement Options
        B. Getnames Statement

    1.3 Statements Outside of Data and Procedure Steps
        A. Libname Statement
        B. Quit Statement

    2. SORTING, PRINTING, AND SUMMARIZING DATA

    2.1 Proc Print Step
        A. Proc Print Statement Options
        B. ID Statement
        C. By Statement
        D. Sum Statement
        E. Title & Footnote Statements
        F. Var Statement
        G. Sumby Statement

    2.2 Proc Frequency Step
        A. Proc Frequency Statement Options
        B. Weight Statement
        C. Tables Statement
        D. Where Statement

    2.3 Proc Contents Step
        A. Proc Contents Statement Options

    2.4 Proc Tabulate Step
        A. Proc Tabulate Statement Options
        B. Class Statement
        C. Var Statement
        D. Table Statement

    2.5 Proc Sort Step
        A. Proc Sort Statement Options

    2.6 Proc GChart Step
        A. Proc GChart Statement Options
        B. HBar, VBar, and VBar3D
        C. Block Statement

    2.7 Proc GPlot Step
        A. Proc GPlot Statement Options
        B. Plot Statement
        C. Symbol Statement


    2.8 Proc Format Step
        A. Proc Format Statement Options
        B. Value Statement
        C. Picture Statement

    3. STATISTICAL ANALYSIS IN SAS

    3.1 Proc Univariate Step
        A. Proc Univariate Statement Options
        B. Var Statement
        C. Histogram Statement

    3.2 Proc Means Step
        A. Proc Means Statement Options
        B. Var Statement

    3.3 Proc ttest Step
        A. Proc ttest Statement Options
        B. Class Statement
        C. Var Statement
        D. Paired Statement

    3.4 Proc Corr Step
        A. Proc Corr Statement Options
        B. Var Statement

    3.5 Proc Reg Step
        A. Proc Reg Statement Options
        B. Model Statement
        C. Plot Statement

    3.6 Proc GLM Step
        A. Proc GLM Statement Options
        B. LSMeans Statement

    3.7 Proc Logistic Step
        A. Proc Logistic Statement
        B. Class Statement
        C. Model Statement


    1 DATA MANAGEMENT

    1.1 DATA STEP

    A. Data Statement Options

    DATA DATA-SET-NAME-1


    This is a trailing @. It must be the last item in the input statement; otherwise it is treated as a pointer control. It holds the input reader at its final location, and the next input statement in the same iteration of the DATA step continues from that spot.

    -#n
    This is a line pointer. It moves the input reader to row n.

    -/
    Advances the input reader to the first column of the next line.

    -@n
    This is a column pointer. It moves the input reader to column n. n must be an integer.
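    The pointer controls above can be sketched as follows. This is only an illustration under assumed data: the data set name scores, the variable names, and the column layout are hypothetical and are not taken from the handbook.

```sas
/* Hypothetical layout: id in columns 1-3, score in columns 5-6, grade in column 8 */
data scores;
    input @1 id 3. @5 score 2. @;              /* trailing @ holds the current record */
    if score >= 50 then input @8 grade $1.;    /* second INPUT reads from the same record */
    datalines;
101 75 A
102 40
;
run;
```

    In this sketch the second input statement only runs for records with a score of at least 50, so grade is left missing for the second record.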

    D. Datalines Statement
    SYNTAX: DATALINES;

    -With no options, the datalines statement is followed by raw data entered by the user. The SAS editor displays the raw data highlighted in yellow.

    -Delimiter= option
    Specifies the character that separates values in the raw data. By default SAS uses a blank as the delimiter, but commas, tabs (dlm='09'x), and many other characters can be used. For in-stream data, the option is given on an INFILE DATALINES statement placed before the input statement.
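    A minimal sketch of reading comma-delimited in-stream data; the data set pets and its variables are made up for illustration.

```sas
data pets;
    infile datalines dlm=',';    /* values are separated by commas instead of blanks */
    input name $ age weight;
    datalines;
Rex,4,31.5
Milo,2,9.8
;
run;
```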

    E. Set Statement
    SYNTAX: SET DATA-SET(S);

    -Recall that the DATA step is itself a loop applied to a data set. Each time the Set statement executes, it reads one observation (including all variables) into the program data vector, where it can be manipulated in the data step and output if desired.

    -IN= option
    This option creates a new temporary variable (which we name) that takes the value 1 if the data set contributes to the current observation and 0 otherwise.
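    As a sketch of stacking two data sets and flagging where each observation came from (the data sets class_a and class_b and the variable source are hypothetical):

```sas
data class_a;
    input name $ score;
    datalines;
Ann 90
Bob 82
;
run;

data class_b;
    input name $ score;
    datalines;
Cy 77
;
run;

data all_classes;
    set class_a(in=from_a) class_b;     /* from_a is 1 for observations read from class_a */
    length source $ 7;
    if from_a then source = 'class_a';
    else source = 'class_b';
run;
```

    The IN= variable from_a is available during the data step but is not written to all_classes, which is why its value is copied into source.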

    F. Merge Statement
    SYNTAX: MERGE DATA-SET(S);


    -The Merge statement differs from the Set statement: instead of combining data sets by stacking observations vertically, it combines observations of data sets horizontally, adding variables. A BY statement following the Merge statement is very helpful: it matches observations on the values of the BY variable(s), and each data set must already be sorted by those variables.
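    A short sketch of a match-merge, assuming two hypothetical data sets (heights and weights) that share an id variable:

```sas
data heights;
    input id height;
    datalines;
1 160
2 172
;
run;

data weights;
    input id weight;
    datalines;
1 55
2 70
;
run;

/* Both inputs must be sorted by the BY variable before merging */
proc sort data=heights; by id; run;
proc sort data=weights; by id; run;

data body;
    merge heights weights;
    by id;                   /* one output observation per id, with both variables */
run;
```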

    G. Length Statement
    SYNTAX: LENGTH VARIABLE-1 VARIABLE-1-LENGTH;

    -The length statement sets the number of bytes used to store a variable: 2-8 or 3-8 for numeric variables (depending on operating environment) and 1-32767 for alphanumeric variables. Variables can also be defined in the length statement; placing a $ after the variable name (before the length) specifies it as an alphanumeric variable.

    H. Label Statement
    SYNTAX: LABEL=;

    -The label statement changes the displayed name of the variable it is applied to. If it is applied in a data step, the label is permanently associated with the variable. It can also be applied in a procedure step, but then the label is used only within that procedure step.

    I. If-Else Statement
    SYNTAX: IF (LOGICAL EXPRESSION) THEN (STATEMENT);;

    -SAS evaluates the logical expression after IF and, if it is true, executes the statement after THEN. An ELSE statement is optional, but when used it must immediately follow the IF-THEN statement; its statement is executed when the logical expression after IF is false.
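    For instance, a pass/fail flag could be assigned as in this sketch (the data set graded, the cutoff of 60, and the variable names are assumptions for illustration):

```sas
data graded;
    input name $ score;
    if score >= 60 then result = 'pass';
    else result = 'fail';    /* runs only when the IF condition is false */
    datalines;
Ann 90
Bob 45
;
run;
```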

    J. Infile Statement
    SYNTAX: INFILE FILE-PATH;

    -The file-path is the path to an external file we want to read into SAS, such as a .txt file. Just as with the datalines statement, DLM= can be used as an option here.

    -FLOWOVER option
    The default behavior of infile. When the input reader reaches the end of a line before every variable has a value, it continues reading values from the next line.


    -MISSOVER option
    When the input reader reaches the end of an input line before every variable has a value, it does not move on to the next line; the remaining variables are set to missing.

    -STOPOVER option
    When an input line does not contain a value for every variable, SAS stops the data step and reports an error.

    The figure to the right is a screenshot of examples for MISSOVER, FLOWOVER, and STOPOVER options for the infile statement. They are applied to the data set:

    1, 2, 3
    1, , 3
    , 2, 3
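    As a separate, simpler sketch of MISSOVER (using blank-delimited short lines rather than the comma-delimited data above; the data set abc is hypothetical):

```sas
data abc;
    infile datalines missover;   /* short lines leave the remaining variables missing */
    input a b c;
    datalines;
1 2 3
4 5
6
;
run;
```

    With MISSOVER, the second observation has c missing and the third has both b and c missing. Under the default FLOWOVER, SAS would instead go to the next line to look for the remaining values.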

    K. Do Statement
    SYNTAX: DO INDEX-VAR=SPECIFICATION; SAS STATEMENT(S)

    -Conditional Do Loops (While)
    We can have SAS execute statements while a logical expression is true. The expression is checked before the statements are executed on each pass, so the statements may not run at all.

    -Conditional Do Loops (Until)
    We can have SAS execute statements until a logical expression becomes true. The expression is checked after the statements are executed on each pass, so the statements always run at least once.

    -Iterative Do Loops (Ex. i=1 to 100 by 5)
    We can also have SAS execute statements a finite number of times while creating an index variable. The by option designates the increment.
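    The three kinds of loop can be sketched as follows; the data set and variable names are illustrative only.

```sas
data loops;
    /* Iterative loop: i takes the values 1, 6, 11, ..., 96 */
    do i = 1 to 100 by 5;
        output;              /* writes one observation per pass */
    end;
run;

data doubling;
    x = 1;
    do while (x < 100);      /* DO WHILE: condition is evaluated before each pass */
        x = x * 2;
    end;
    y = 1;
    do until (y >= 100);     /* DO UNTIL: condition is evaluated after each pass */
        y = y * 2;
    end;
run;
```

    Each DO block is closed by an END statement.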

    L. Keep-Drop Statement


    SYNTAX: DROP VARIABLE-1 ... VARIABLE-N;
            KEEP VARIABLE-1 ... VARIABLE-N;

    -The Drop statement drops all listed variables from the data set. Variables not listed remain.

    -The Keep statement keeps all listed variables in the data set. Variables not listed are dropped.

    -Variable names in either list are separated by blanks.

    -Keep and Drop can also be used as options in a set statement, in the form: SET DATA (KEEP=VARIABLE);

    M. Output Statement
    SYNTAX: OUTPUT;

    -When no data sets are listed after OUTPUT, the OUTPUT statement writes the current observation to all data sets named in the data statement. Otherwise, only the listed data sets receive the current observation.

    N. Generating Random Numbers
    SYNTAX: VARIABLE=RAND(DISTRIBUTION);

    -The RAND function generates a random number from a given distribution. The distribution name is given as a quoted string.

    RAND('BINOMIAL',p,n) ~ Bin(n,p)
    RAND('GEOMETRIC',p) ~ Geom(p)
    RAND('POISSON',m) ~ Pois(m)
    RAND('UNIFORM') ~ U(0,1)
    RAND('BERNOULLI',p) ~ Bern(p)
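    A minimal sketch of drawing a few random values; the seed, the parameter values, and the data set name draws are arbitrary choices for illustration.

```sas
data draws;
    call streaminit(2013);               /* fixes the seed so the run is repeatable */
    do i = 1 to 5;
        u = rand('UNIFORM');             /* U(0,1) */
        b = rand('BERNOULLI', 0.3);      /* 1 with probability 0.3, otherwise 0 */
        k = rand('BINOMIAL', 0.3, 10);   /* p = 0.3, n = 10 trials */
        output;
    end;
run;
```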

    O. Internal Values
    _N_ : The number of times the DATA step has iterated so far. When each iteration reads one observation, this equals the current observation number.

    P. Format and Informat Statements
    SYNTAX: FORMAT VARIABLE-1 FORMAT-1
            INFORMAT VARIABLE-1 INFORMAT-1


    -The format statement changes the appearance of a variable without changing the stored value. A list of formats can be found at:
    http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001263753.htm

    -The informat statement tells SAS how to read the raw data form of a variable and convert it into the stored value. Informats can also be applied in the input statement. Informats for SAS 9.2 can be found at:
    http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001239776.htm

    Q. File Statement
    SYNTAX: FILE FILE-PATH

    -The file statement names an external file that will be written by the put statements in the data step. We can also use the print device (FILE PRINT) so that the text is displayed in the output window.

    R. Put Statement
    SYNTAX: PUT VARIABLE;

    -The put statement works similarly to the input statement, except that it writes to the external file given by the file statement.

    S. Array Statement
    SYNTAX: ARRAY ARRAY-NAME{SUBSCRIPT};

    -SAS creates an array with the ARRAY statement. The name, the subscript, whether or not it is alphanumeric (by placing the $ symbol), the length, and the elements are specified by the user.
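    As a sketch of using arrays to apply the same calculation to several variables (the Fahrenheit-to-Celsius conversion and all names here are illustrative, not from the handbook):

```sas
data celsius;
    input f1 f2 f3;
    array f{3} f1-f3;        /* groups the existing variables f1, f2, f3 */
    array c{3} c1-c3;        /* creates the new variables c1, c2, c3 */
    do i = 1 to 3;
        c{i} = (f{i} - 32) * 5 / 9;
    end;
    drop i;                  /* the loop index is not needed in the output */
    datalines;
32 50 212
;
run;
```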

    1.2 PROC IMPORT STEP

    A. Proc Import Statement Options
    SYNTAX: PROC IMPORT DATAFILE=FILE-PATH OUT=DATA-SET;

    -The proc import step is helpful for importing large files (given by the file-path) into SAS, such as Excel (.xls) files and export (.xpt) files. The proc import statement includes an out argument, which names the data set to produce. The replace option will overwrite any existing data set with the same name.


    B. GetNames Statement
    SYNTAX: GETNAMES=(YES-OR-NO)

    -This statement specifies whether or not proc import should take the first row of the input data file as the list of variable names.

    1.3 STATEMENTS OUTSIDE OF DATA AND PROCEDURE STEPS

    A. Libname Statement
    SYNTAX: LIBNAME NAME FOLDER-PATH;

    -The libname statement defines a library in which data steps can create permanent SAS data sets. A permanent SAS data set is created in a data step if it is named NAME.dataset, where NAME is the name of the library.
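    A sketch of creating a permanent data set. The library name mylib and the folder path are hypothetical; the folder is assumed to exist already.

```sas
libname mylib 'C:\sasdata';   /* hypothetical folder path */

data mylib.scores;            /* two-level name: library.dataset */
    input name $ score;
    datalines;
Ann 90
;
run;
```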

    B. Options Statement
    SYNTAX: OPTIONS;

    -The options statement changes system settings such as the line size and page orientation.


    2 SORTING, PRINTING, AND SUMMARIZING DATA

    2.1 PROC PRINT STEP

    A. Proc Print Statement Options
    SYNTAX: PROC PRINT;

    -The proc print step is usually used to list the observations of a data set, while giving the user several options. The proc print statement itself has a few options:

    Data=data-set
    Specifies which data set to print.

    Label
    Prompts SAS to use user-defined labels, whether they were created in the data set's data step or in this proc print step.

    noobs
    Removes the observation numbers from the print output.

    B. ID Statement
    SYNTAX: ID VARIABLE(S);

    -Designates that SAS identify observations by a particular variable or set of variables in the printout instead of observation numbers. If more than one variable is in the ID statement, more than one group is printed.

    C. By Statement
    SYNTAX: BY VARIABLE(S);

    -The by statement prints the observations in groups defined by the listed variable(s); the data set must already be sorted by those variables. If the data are sorted in descending order of a variable, we add the Descending option before the variable name. If more than one variable is listed, the printed output is arranged in nested groups.

    D. Sum Statement
    SYNTAX: SUM VARIABLE(S);


    -The sum statement totals the values of the given variable(s) and prints the totals in the output window.

    E. Title & Footnote Statement
    SYNTAX: TITLE TEXT MESSAGE;
            FOOTNOTE TEXT MESSAGE;

    -The title and footnote statements work the same way. A number appended to the keyword (for example TITLE2) specifies the placement; the smallest numbers indicate the main titles/footnotes. These statements can also be used in many other procedures with the same effect.
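    Several of the proc print statements described in this section can be combined as in the sketch below. The data set sales and its contents are invented for illustration.

```sas
data sales;
    input region $ rep $ amount;
    label amount = 'Sale amount';
    datalines;
East Ann 100
East Bob 150
West Cy 200
;
run;

proc sort data=sales;        /* BY-group printing needs sorted data */
    by region;
run;

proc print data=sales label noobs;
    by region;               /* one block of output per region */
    sum amount;              /* totals of amount */
    title 'Sales by region';
    footnote 'Hypothetical data';
run;
```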

    F. Var Statement
    SYNTAX: VAR VARIABLE(S);

    -The var statement specifies which variables to print and their order. It is used in many other procedures.

    G. Sumby Statement
    SYNTAX: SUMBY VARIABLE(S);

    -The sumby statement works together with the by and sum statements: subtotals are printed each time the value of the variable listed in the sumby statement changes.

    2.2 PROC FREQUENCY STEP

    A. Proc Frequency Statement Options
    SYNTAX: PROC FREQ;

    -The frequency procedure (invoked as PROC FREQ) is effective for analyzing categorical data: it provides frequency counts and proportions, and can be used to perform chi-square tests. The order option can take the values data, formatted, freq, or internal:

    DATA: Order of appearance in the input data set
    FORMATTED: Sorted by formatted values
    FREQ: Sorted by descending frequency count
    INTERNAL: Sorted by unformatted values

    B. Weight Statement


    SYNTAX: WEIGHT VARIABLE;

    -Specifies which numeric variable gives the count for each observation in the input data set.

    C. Tables Statement
    SYNTAX: TABLES;

    -The tables statement generates tables, from one-way to n-way.

    -ALPHA= option
    Sets the level alpha for 100(1-alpha)% confidence intervals.

    -Binomial option
    Requests the binomial proportion, confidence limits, and tests for one-way tables.

    -Chisq option
    Requests chi-square tests and statistics.

    D. Where Statement
    SYNTAX: WHERE EXPRESSION-1;

    -Restricts the proc frequency output to observations for which the expression(s) are true. Can be used in many other procedures.

    2.3 PROC CONTENTS STEP

    A. Proc Contents Statement Options
    SYNTAX: PROC CONTENTS;

    -The contents procedure produces a detailed description of a given data set, such as a listing of its variables with attributes like length and type, and the number of observations in the data set.

    2.4 PROC TABULATE STEP

    A. Proc Tabulate Statement Options
    SYNTAX: PROC TABULATE;


    -The tabulate procedure provides statistics that can be produced in other procedures, but places them in a compact table or set of tables.

    B. Class Statement
    SYNTAX: CLASS VARIABLE(S);

    -The class statement is used in many procedures; it specifies one or more variables whose values define groups.

    C. Var Statement
    SYNTAX: VAR VARIABLE(S);

    -The var statement is used in many procedures; it specifies one or more variables to be analyzed, with the method depending on the procedure.

    D. Table Statement
    SYNTAX: TABLE VARIABLE(S);

    -The table statement describes the table to be produced, using variables named in the class and var statements. Commas separate the dimensions of the table (page, row, and column).

    2.5 PROC SORT STEP

    A. Proc Sort Statement Options
    SYNTAX: PROC SORT;

    -The sort procedure sorts a data set by the variable(s) given in a by statement inside the step. It is usually used before a data step that merges data sets by a particular variable, since those data sets must be sorted first.
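    A brief sketch, assuming the sample data set sashelp.class that ships with standard SAS installations is available; the output name class_by_age is arbitrary.

```sas
proc sort data=sashelp.class out=class_by_age;   /* OUT= leaves the original untouched */
    by descending age name;                      /* oldest first, then alphabetical by name */
run;
```

    Without OUT=, proc sort replaces the input data set with the sorted version.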

    2.6 PROC GCHART STEP

    A. Proc GChart Statement Options
    SYNTAX: PROC GCHART;

    -The GChart procedure produces visual summaries of data in the form of charts. We can

    produce block charts, horizontal and vertical bar charts, pie and donut charts, and star

    charts.


    B. HBar, VBar and Vbar3d Statements
    SYNTAX: HBAR VARIABLE-1;
            VBAR VARIABLE-1;
            VBAR3D VARIABLE-1;

    -The HBar statement creates a horizontal bar chart for frequencies (default), sums, or means. VBar is similar, only the bar charts are vertical.

    -The HBar3D statement creates a 3-d horizontal bar chart for frequencies (default), sums, or means. VBar3d is similar.

    C. Block Statement

    -The block statement is very similar to the bar statements, except that it produces visual summaries in the form of blocks instead of bars.

    2.7 PROC GPLOT STEP

    A. Proc GPlot Statement Options
    SYNTAX: PROC GPLOT;

    -The GPLOT procedure produces visual summaries for data, this time on a set of axes.

    B. Plot Statement
    SYNTAX: PLOT Y-VARIABLE*X-VARIABLE;

    -We can plot a y-variable against an x-variable very easily with the plot statement.

    C. Symbol Statement
    SYNTAX: SYMBOL;

    -The symbol statement controls the appearance of the gplot output, such as the plotting symbols and lines.

    2.8 PROC FORMAT STEP

    A. Proc Format Statement Options
    SYNTAX: PROC FORMAT;

    -The format procedure lets us define our own formats to change the appearance of output.

    B. Value Statement
    SYNTAX: VALUE NAME;

    -The value statement defines a format that displays labels we specify in place of the original values. A set of values can be mapped to a single label, whether to form a category or simply to rename a value.
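    A sketch of a user-defined format that groups ages into categories. The format name agegrp and the cut points are made up, and the sample data set sashelp.class is assumed to be available.

```sas
proc format;
    value agegrp
        low -< 13 = 'Child'      /* everything below 13 */
        13 - 19   = 'Teen'
        20 - high = 'Adult';
run;

proc freq data=sashelp.class;
    tables age;
    format age agegrp.;          /* counts are reported per formatted category */
run;
```

    The stored values of age are unchanged; only their displayed form and grouping in the output differ.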


    C. Picture Statement
    SYNTAX: PICTURE NAME;

    The picture and value statements work similarly. Only the picture statement has the option of

    retaining the original value of a variable in addition to adding a character or formatting. For

    example, we can say 0.88 -


    3 STATISTICAL ANALYSIS IN SAS

    3.1 PROC UNIVARIATE STEP

    A. Proc Univariate Statement Options
    SYNTAX: PROC UNIVARIATE;

    The univariate procedure is effective in producing univariate statistical analysis on one or more variables. Options include

    Alpha=
    This option specifies a significance level for the provided 100(1-alpha)% intervals.

    CIBASIC
    This option requests confidence intervals for the mean, standard deviation and variance of the specified variable(s), under the assumption that they are normally distributed.

    Mu0=
    This option changes the hypothesized value from the default of 0 to a specified value.

    B. Var Statement
    SYNTAX: VAR;

    This statement specifies the variable(s) for univariate analysis.

    C. Histogram Statement
    SYNTAX: HISTOGRAM;

    The histogram statement produces a histogram for the specified variable(s). In the options field, we can specify a continuous distribution (ex. Normal, Exponential, etc.) and the procedure will superimpose its estimate of the appropriate probability density curve and provide goodness-of-fit tests.

    3.2 PROC MEANS STEP


    A. Proc Means Statement Options
    SYNTAX: PROC MEANS;

    The means procedure is a more compact version of the univariate procedure. The options field is similar to that of the univariate procedure, but we can limit which statistics are displayed by listing them in the desired-statistics field (ex. N = number of observations, MEAN, SUM, etc.).

    B. Var Statement
    SYNTAX: VAR VARIABLE;

    The Var statement works the same way here as it does in the univariate procedure.

    3.3 PROC TTEST STEP

    A. Proc ttest Statement Options
    SYNTAX: PROC TTEST;

    The ttest procedure produces t-tests for single samples, paired observation sets, and two independent samples. The options are similar to those of the Means and Univariate procedures.

    B. Class Statement
    SYNTAX: CLASS VARIABLE;

    Just like in the tabulate procedure, the class statement specifies a grouping variable for the ttest procedure. This is required if we analyze two independent samples.

    C. Var Statement
    SYNTAX: VAR VARIABLE;

    Again, the var statement works just as it does in many other procedures.

    D. Paired Statement
    SYNTAX: PAIRED VARIABLE-A*VARIABLE-B;

    If we want to analyze a paired sample, we use the paired statement.
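    The two-sample and paired cases might look like the sketch below. It assumes the sample data set sashelp.class is available; the data set bp and its before/after values are invented for illustration.

```sas
/* Two independent samples: compare mean height between the two levels of sex */
proc ttest data=sashelp.class alpha=0.05;
    class sex;
    var height;
run;

/* Paired sample: hypothetical before/after measurements on the same subjects */
data bp;
    input before after;
    datalines;
120 115
130 128
125 119
140 133
;
run;

proc ttest data=bp;
    paired before*after;
run;
```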


    The GLM procedure is similar to the regression procedure, only it uses the method of least squares to fit general linear models.

    B. LSMeans Statement
    SYNTAX: LSMEANS VARIABLE;

    The LSMEANS statement calculates least squares means for each listed variable. It performs analysis on them as well.

    3.7 PROC LOGISTIC STEP

    A. Proc Logistic Statement
    SYNTAX: PROC LOGISTIC;

    The logistic procedure is useful in creating logistic models (a model to predict probabilities given explanatory variables) and producing analysis for them.

    B. Class Statement
    SYNTAX: CLASS VARIABLE(S);

    The class statement works in the logistic procedure similarly to how it does in previously mentioned procedures.

    C. Model Statement
    SYNTAX: MODEL DEPENDENT-BINARY-VARIABLE=EFFECT(S);

    The model statement works similarly to the model statement in the regression procedure, only the dependent variable must be binary in this case.
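    A rough sketch of a logistic model, assuming the sample data set sashelp.class is available. The binary outcome tall and its cutoff of 62 are made up for illustration, and with so few observations SAS may issue warnings about the fit.

```sas
data class2;
    set sashelp.class;
    tall = (height > 62);    /* hypothetical binary outcome: 1 if true, 0 otherwise */
run;

proc logistic data=class2;
    class sex;                           /* categorical explanatory variable */
    model tall(event='1') = sex age;     /* models the probability that tall = 1 */
run;
```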