A previous article shows that you can use the Intercept parameter to control the ratio of events to nonevents in a simulation of data from a logistic regression model. If you decrease the intercept parameter, the probability of the event decreases; if you increase the intercept parameter, the probability of the event increases.
The probability of Y=1 also depends on the slope parameters (coefficients of the regressor effects) and on the distribution of the explanatory variables. This article shows how to visualize the probability of an event as a function of the regression parameters. I show the visualization for two one-variable logistic models. For the first model, the distribution of the explanatory variable is standard normal. In the second model, the distribution is exponential.
Visualize the probability as the Intercept varies
In a previous article, I simulated data from a one-variable binary logistic model. I simulated the explanatory variable, X, from a standard normal distribution. The linear predictor is given by
η = β0 + β1 X.
The probability of the event Y=1 is given by
μ = logistic(η) = 1 / (1 + exp(-η)).
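Because μ is the probability of Y=1 for one value of X, the overall proportion of events is the average of μ over the distribution of X. As a sketch in the article's notation (f for the density of X, x_1, ..., x_N for the simulated values, and p(β0, β1) for the overall probability are labels introduced here), the quantity that the following simulations estimate is:

```latex
p(\beta_0, \beta_1) = \Pr(Y = 1)
  = E\left[\operatorname{logistic}(\beta_0 + \beta_1 X)\right]
  = \int_{-\infty}^{\infty} \frac{f(x)}{1 + \exp(-\beta_0 - \beta_1 x)}\,dx
  \approx \frac{1}{N} \sum_{i=1}^{N} \operatorname{logistic}(\beta_0 + \beta_1 x_i)
```

The last expression is the average of μ over the N simulated values of X. The simulated proportion of Y=1 fluctuates around it because each Y is a Bernoulli draw.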
In the previous article, I looked at various combinations of Intercept (β0) and Slope (β1) parameters. Now, let's look systematically at how the probability of Y=1 depends on the Intercept for a fixed value of Slope > 0. Most of the following program is explained in my previous post; for your convenience, it is repeated here. First, the program simulates data (N=1000) for a variable, X, from a standard normal distribution. Then it simulates a binary variable for a series of binary logistic models as the intercept parameter varies on the interval [-3, 3].
(To get better estimates of the probability of Y=1, the program simulates 100 data sets for each value of Intercept.)
Finally, the program calls PROC FREQ to compute the proportion of simulated events (Y=1), and it plots the proportion versus the Intercept value.
/* simulate X ~ N(0,1) */
data Explanatory;
call streaminit(12345);
do i = 1 to 1000;
   x = rand("Normal", 0, 1);   /* ~ N(0,1) */
   output;
end;
drop i;
run;

/* as Intercept changes, how does P(Y=1) change when Slope=2? */
%let Slope = 2;
data SimLogistic;
call streaminit(54321);
set Explanatory;
do Intercept = -3 to 3 by 0.2;
   do nSim = 1 to 100;              /* Optional: param estimate is better for large samples */
      eta = Intercept + &Slope*x;   /* eta = linear predictor */
      mu = logistic(eta);           /* mu = Prob(Y=1) */
      Y = rand("Bernoulli", mu);    /* simulate binary response */
      output;
   end;
end;
run;

proc sort data=SimLogistic; by Intercept; run;
ods select none;
proc freq data=SimLogistic;
   by Intercept;
   tables Y / nocum;
   ods output OneWayFreqs=FreqOut;
run;
ods select all;

title "Percent of Y=1 in Logistic Model";
title2 "Slope=&Slope";
footnote J=L "X ~ N(0,1)";
proc sgplot data=FreqOut(where=(Y=1));
   series x=Intercept y=Percent;
   xaxis grid;
   yaxis grid label="Percent of Y=1";
run;
In the simulation, the slope parameter is set to Slope=2, and the Intercept parameter varies systematically between -3 and 3.
The graph visualizes the probability (as a percentage) that Y=1, given various values for the Intercept parameter. This graph depends on the value of the slope parameter and on the distribution of the data. It also has random variation, because Y is generated as a random Bernoulli variate.
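If you want a curve without the Bernoulli noise, one option is to average μ itself instead of simulating Y. The following is a sketch, not part of the original program: it assumes that the Explanatory data set from the previous program exists, and the names MuGrid, ExpectedProb, Prob, and k are chosen for this illustration. An integer loop index builds the Intercept grid so that the value 0 is stored exactly.

```sas
/* Noise-free sketch: for each Intercept, average mu = logistic(eta)
   over the values of X instead of simulating Bernoulli responses.
   Assumes the Explanatory data set from the previous program. */
%let Slope = 2;
data MuGrid;
set Explanatory;
do k = -15 to 15;                          /* integer index avoids round-off */
   Intercept = k / 5;                      /* Intercept = -3, -2.8, ..., 3 */
   mu = logistic(Intercept + &Slope*x);    /* Prob(Y=1 | x) */
   output;
end;
drop k;
run;

/* mean of mu for each Intercept = expected proportion of events */
proc means data=MuGrid noprint nway;
   class Intercept;
   var mu;
   output out=ExpectedProb(drop=_TYPE_ _FREQ_) mean=Prob;
run;

data ExpectedProb;
   set ExpectedProb;
   Percent = 100*Prob;                     /* same scale as the PROC FREQ output */
run;

title "Expected Percent of Y=1 (No Bernoulli Noise)";
title2 "Slope=&Slope";
proc sgplot data=ExpectedProb;
   series x=Intercept y=Percent;
   xaxis grid;
   yaxis grid label="Percent of Y=1";
run;
```

Because each simulated proportion has this expected value, the Monte Carlo curve should scatter closely around the curve from this sketch.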
For these data, the graph shows the estimated probability of the event as a function of the Intercept parameter. When Intercept = -2, Pr(Y=1) = 22%; when Intercept = 0, Pr(Y=1) = 50%; and when Intercept = 2, Pr(Y=1) = 77%. The statement that Pr(Y=1) = 50% when Intercept=0 should be approximately true when the data are from a symmetric distribution with zero mean.
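A short derivation explains the 50% value. Write p(β0, β1) = E[logistic(β0 + β1 X)] for the probability that Y=1 (notation introduced here), use the identity logistic(-η) = 1 - logistic(η), and assume that X has the same distribution as -X, which holds for the standard normal distribution:

```latex
\begin{aligned}
p(-\beta_0, \beta_1)
  &= E\left[\operatorname{logistic}(-\beta_0 + \beta_1 X)\right]
   = E\left[\operatorname{logistic}(-\beta_0 - \beta_1 X)\right]
   \quad (X \text{ and } -X \text{ have the same distribution}) \\
  &= 1 - E\left[\operatorname{logistic}(\beta_0 + \beta_1 X)\right]
   = 1 - p(\beta_0, \beta_1)
\end{aligned}
```

Setting β0 = 0 gives p(0, β1) = 1/2 for every slope. The identity also explains why the estimates at Intercept = -2 and Intercept = 2 (22% and 77%) sum to approximately 100%. For a finite sample the symmetry holds only approximately, so the simulated values are close to, but not exactly, these predictions.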
Vary the slope and intercept together
In the previous section, the slope parameter is fixed at Slope=2. What if we allow the slope to vary over a range of values? We can restrict our attention to positive slopes because logistic(η) = 1 - logistic(-η), which implies that the probability for a negative slope equals one minus the probability for the corresponding positive slope with the sign of the Intercept reversed.
The following program varies the Intercept parameter in the range [-3, 3] and the Slope parameter in the range [0, 4].
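The reduction to positive slopes follows from one line of algebra. With p(β0, β1) = E[logistic(β0 + β1 X)] denoting Pr(Y=1) (notation introduced here), and with no assumption about the distribution of X:

```latex
\begin{aligned}
p(\beta_0, -\beta_1)
  &= E\left[\operatorname{logistic}(\beta_0 - \beta_1 X)\right] \\
  &= 1 - E\left[\operatorname{logistic}(-\beta_0 + \beta_1 X)\right] \\
  &= 1 - p(-\beta_0, \beta_1)
\end{aligned}
```

Because the Intercept range [-3, 3] is symmetric about 0, every negative-slope probability with a slope in [-4, 0] can be read from the positive-slope grid.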
/* now do a two-parameter simulation study where slope and intercept are varied */
data SimLogistic2;
call streaminit(54321);
set Explanatory;
do Slope = 0 to 4 by 0.2;
   do Intercept = -3 to 3 by 0.2;
      do nSim = 1 to 50;               /* optional: the parameter estimates are better for larger samples */
         eta = Intercept + Slope*x;    /* eta = linear predictor */
         mu = logistic(eta);           /* transform by inverse logit */
         Y = rand("Bernoulli", mu);    /* simulate binary response */
         output;
      end;
   end;
end;
run;

/* Monte Carlo estimate of Pr(Y=1) for each (Int,Slope) pair */
proc sort data=SimLogistic2; by Intercept Slope; run;
ods select none;
proc freq data=SimLogistic2;
   by Intercept Slope;
   tables Y / nocum;
   ods output OneWayFreqs=FreqOut2;
run;
ods select all;
The output data set (FreqOut2) contains Monte Carlo estimates of the probability that Y=1 for each pair of (Intercept, Slope) parameters, given the distribution of the explanatory variable.
You can use a contour plot to visualize the probability of the event for each combination of slope and intercept:
/* Create template for a contour plot
   https://blogs.sas.com/content/iml/2012/07/02/create-a-contour-plot-in-sas.html */
proc template;
define statgraph ContourPlotParm;
dynamic _X _Y _Z _TITLE _FOOTNOTE;
begingraph;
   entrytitle _TITLE;
   entryfootnote halign=left _FOOTNOTE;
   layout overlay;
      contourplotparm x=_X y=_Y z=_Z /
         contourtype=fill nhint=12 colormodel=twocolorramp name="Contour";
      continuouslegend "Contour" / title=_Z;   /* refers to the plot by its NAME= */
   endlayout;
endgraph;
end;
run;

/* render the Monte Carlo estimates as a contour plot */
proc sgrender data=FreqOut2 template=ContourPlotParm;
where Y=1;
dynamic _TITLE="Percent of Y=1 in a One-Variable Logistic Model"
        _FOOTNOTE="X ~ N(0, 1)"
        _X="Intercept" _Y="Slope" _Z="Percent";
run;
As mentioned earlier, because the distribution of X is approximately symmetric about 0, the contours of the graph have a reflective symmetry. Notice that the probability that Y=1 is 50% whenever Intercept=0 for these data. Furthermore, if a point (β0, β1) is on the contour for Pr(Y=1)=α, then the contour for Pr(Y=1)=1-α contains points close to (-β0, β1).
The previous line plot is equivalent to slicing the contour plot along the horizontal line Slope=2.
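To check this correspondence, you can plot the Slope=2 row of the two-parameter results. The following sketch assumes the FreqOut2 data set from the two-parameter simulation. Because the DO loop builds Slope by repeatedly adding 0.2, the stored value might differ from 2 by round-off, so the WHERE clause compares a rounded value instead of testing Slope=2 directly.

```sas
/* Slice of the contour plot along Slope=2 (sketch).
   ROUND() guards against round-off in the DO-loop values of Slope. */
title "Percent of Y=1 in Logistic Model";
title2 "Slice of FreqOut2 at Slope=2";
proc sgplot data=FreqOut2(where=(Y=1 and round(Slope, 0.01) = 2));
   series x=Intercept y=Percent;
   xaxis grid;
   yaxis grid label="Percent of Y=1";
run;
```

The curve should resemble the earlier line plot. It will not match it exactly, because the two simulations use different random draws and different numbers of samples per parameter value (50 versus 100).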
You can use a graph like this to simulate data that have a specified probability of Y=1. For example, if you want approximately 70% of the cases to be Y=1, you can choose any pair of (Intercept, Slope) values along that contour, such as (0.8, 0), (1, 1), (2, 3.4), and (2.4, 4).
If you want to see all parameter values for which Pr(Y=1) is close to a desired value, you can use the WHERE statement in PROC PRINT. For example, the following call to PROC PRINT displays all parameter values for which Pr(Y=1) is approximately 0.7, or 70%:
%let Target = 70;
proc print data=FreqOut2;
   where Y=1 and %sysevalf(&Target-1) <= Percent <= %sysevalf(&Target+1);
   var Intercept Slope Y Percent;
run;
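Instead of reading values off the contour plot, you can solve for the Intercept directly when the Slope is fixed. The following SAS/IML program is an alternative technique, not the method used in this article: it applies bisection to the noise-free expected proportion, the mean of logistic(Intercept + Slope*x) over the Explanatory data, which increases with the Intercept for any fixed slope. The module name ExpProb, the bracket [-10, 10], and the output data set Target70 are assumptions for this illustration.

```sas
/* Sketch: for each fixed slope, find the Intercept for which the
   expected proportion of events equals a target value, by bisection.
   Assumes the Explanatory data set; ExpProb and Target70 are illustrative names. */
proc iml;
use Explanatory;  read all var {x};  close Explanatory;

/* expected Pr(Y=1) = mean of logistic(b0 + b1*x) over the data */
start ExpProb(b0, b1, x);
   return( mean( 1 / (1 + exp(-(b0 + b1*x))) ) );
finish;

target = 0.70;                          /* desired Pr(Y=1) */
Slope = {0, 1, 2, 3, 4};
Intercept = j(nrow(Slope), 1, .);
do k = 1 to nrow(Slope);
   lo = -10;  hi = 10;                  /* assumes the root lies in [-10, 10] */
   do iter = 1 to 60;                   /* halve the bracket 60 times */
      mid = (lo + hi) / 2;
      if ExpProb(mid, Slope[k], x) < target then lo = mid;
      else hi = mid;
   end;
   Intercept[k] = (lo + hi) / 2;
end;
print Slope Intercept[format=6.3];

create Target70 var {"Slope" "Intercept"};
append;
close Target70;
quit;
```

For Slope=0 the result is log(0.7/0.3) ≈ 0.847 regardless of the data, which agrees with the (0.8, 0) point read from the contour plot at the resolution of the grid.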
Probabilities for nonsymmetric data
The symmetries in the previous graphs are a consequence of the symmetry in the data for the explanatory variable. To demonstrate how the graphs change for a nonsymmetric data distribution, you can run the same simulation study but use data that are exponentially distributed. To eliminate possible effects due to a different mean and variance in the data, the following program standardizes the explanatory variable so that it has zero mean and unit variance.
data Expo;
call streaminit(12345);
do i = 1 to 1000;
   x = rand("Expon", 1.5);   /* ~ Exp(1.5) */
   output;
end;
drop i;
run;

proc stdize data=Expo method=Std out=Explanatory;
   var x;
run;
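Standardizing changes the location and scale of X but not its shape, so the asymmetry that drives the new results remains. A quick check, assuming the standardized Explanatory data set created above, is to display the mean, standard deviation, and skewness:

```sas
/* Check (sketch): after PROC STDIZE, x should have mean 0 and standard
   deviation 1, but the positive skewness of the exponential data remains. */
proc means data=Explanatory mean std skewness min max;
   var x;
run;
```

The skewness of an exponential distribution is 2, so the sample skewness should be clearly positive, and the minimum should be much closer to 0 than the maximum.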
If you rerun the simulation study by using the new distribution of the explanatory variable, you obtain the following contour plot of the probability as a function of the Intercept and Slope parameters:
If you compare this new contour plot to the previous one, you will see that they are very similar for small values of the slope parameter. However, they are different for larger values such as Slope=2.
The contours in the new plot do not show reflective symmetry about the vertical line Intercept=0.
Pairs of parameters for which Pr(Y=1) is approximately 0.7 include (0.8, 0) and (1, 1), which are the same as for the previous plot, and (2.2, 3.4) and (2.6, 4), which are different from the previous example.
Summary
This article presents a Monte Carlo simulation study in SAS to compute and visualize how the Intercept and Slope parameters of a one-variable logistic model affect the probability of the event. A previous article notes that decreasing the Intercept decreases the probability. This article shows that the probability depends on both the Intercept and Slope parameters. Furthermore, the probability depends on the distribution of the explanatory variables. You can use the results of the simulation to control the proportion of events to nonevents in the simulated data.
You can download the complete SAS program that generates the results in this article.
The ideas in this article generalize to logistic regression models that contain multiple explanatory variables. For multivariate models, the effect of the Intercept parameter is similar. However, the effect of the slope parameters is more complicated, especially when the variables are correlated with each other.