SSN & STARS - Frequently Asked Questions | Water and Watersheds (W&W) Program - USDA Forest Service Science

FREQUENTLY ASKED QUESTIONS

STARS FAQs

Which versions of ArcGIS is the STARS toolset compatible with?
Do I need to use the FLoWS and STARS toolsets?
When I run the Check Network Topology tool, I get the error ‘ImportError: No module named win32com.client’.
How would the anisotropy differ based on the orientation of the close points in the systematic/clustered design?
How does the STARS toolset address edges and DEM valley floors that don't match? Is there a stream drop or burning step?
Is the ratio value for site location on a reach calculated automatically or manually?
Can you export polygons into the .ssn directory for mapping in R?
How are multiple streams draining to a waterbody or lake handled in the Landscape Network?
If I'm starting with NHD Plus data (which already has contributing areas and many other attributes available), could I skip many of the pre-processing steps in the STARS tutorial? Which would I still have to do?
If you use watershed area to generate the Segment PI values, does this alleviate the need to normalize covariates by watershed area (i.e. % urban in the watershed)?
What is the best way to create prediction sites in your network? For example, if I want to set a prediction site every 1 kilometer in the network, how do I do that in GIS?

SSN Package FAQs

Does the SpatialStreamNetwork object inherit from the ESRI Network object?
For each monitoring site, where do the observed values, such as temperature, reside? In the observed sites shapefile?
Why would you see spatial pattern in residuals after fitting a spatial statistical model? Does a spatial pattern in the residuals suggest that more modelling is needed?
Why are the R-squared values for the models that included spatial autocorrelation substantially lower compared with the basic linear independence models?
What is block kriging?
What is the difference between prediction and estimation?
Why don't I get an AIC value for logistic and Poisson regression models in the SSN package?
When I generate and plot a Torgegram, symbols are missing for some or all distance bins.
When I use the varcomp function, the proportions for the variance components (covariates and covariance models) sum to 1. Does this mean that the model explains 100% of the variability in the response?

STARS

Which versions of ArcGIS is the STARS toolset compatible with?

The STARS toolset is available for ArcGIS versions 10.1.x and 10.2.x (STARS version 2.0.0) and for ArcGIS version 9.3.1 (STARS version 1.0.2).

Do I need to use the FLoWS and STARS toolsets?

If you are working with a STARS version earlier than 2.0.0, you will need to use both the FLoWS and STARS toolsets for ArcGIS version 9.3.1. In STARS version 2.0.0 for ArcGIS versions 10.1.x and 10.2.x, the relevant FLoWS tools were incorporated into the STARS toolset, which now provides all of the functionality needed to calculate the spatial data required to fit spatial stream-network models.

When I run the Check Network Topology tool, I get the error ‘ImportError: No module named win32com.client’.

This error usually occurs when 1) PythonWin is not installed, 2) the wrong version of PythonWin is installed, or 3) PythonWin is installed, but ArcGIS can't find the win32api that comes with it. The first thing I would do is go to this website: http://sourceforge.net/projects/pywin32/files/pywin32/Build%20218/ and install this file: pywin32-218.win32-py2.7.exe (the 32-bit build for Python 2.7).
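To confirm whether the interpreter can actually see the module, you can try the import directly. The following is only a diagnostic sketch: the helper name module_available is hypothetical, and the script needs to be run with the same Python installation that ArcGIS uses for the result to be meaningful.

```python
import importlib
import sys


def module_available(name):
    """Return True if `name` can be imported by this interpreter."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # Print the interpreter path; it should be the one ArcGIS uses.
    print(sys.executable)
    for name in ("win32api", "win32com.client"):
        status = "found" if module_available(name) else "MISSING"
        print("%s: %s" % (name, status))
```

If both modules are reported as missing, the installer was probably run against a different Python installation than the one ArcGIS uses.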

How would the anisotropy differ based on the orientation of the close points in the systematic/clustered design?

Anisotropy is a concept that doesn’t translate well to branching, linear stream networks. Obviously, you don’t have directional correlation along east-west or north-south axes. It seems natural to think that upstream sites are correlated with downstream sites, while downstream sites are uncorrelated with upstream sites; however, this is not the case. Even though water doesn’t flow from the downstream site to the upstream site, the measurement at the downstream site still provides information about the values upstream. I suppose that anisotropy might exist in the Euclidean component, but it’s not something that we considered in the survey design work we’ve done. You could also try to account for those anisotropic patterns using covariates, such as geology type or elevation.

How does the STARS toolset address edges and DEM valley floors that don't match? Is there a stream drop or burning step?

We don’t provide any specific tools to deal with these issues. However, we do walk people through the steps of burning streams into a DEM in the STARS tutorial.

Is the ratio value for site location on a reach calculated automatically or manually?

The ratio value for each site is calculated automatically when you use the Snap Sites to Landscape Network tool. That’s why you have to snap sites even if they already lie directly on the line segment.
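To illustrate what the ratio represents, the sketch below computes it for a single site. It assumes the ratio is the distance along the segment from its downstream end to the site, divided by the total segment length; please check the STARS tutorial for the exact definition. The function name and the example numbers are hypothetical, and in practice the Snap Sites to Landscape Network tool calculates the value for you.

```python
def site_ratio(dist_from_downstream_end, segment_length):
    """Proportional position of a site along a line segment (0 to 1).

    Assumed definition: distance from the downstream end of the segment
    to the site, divided by the total segment length.
    """
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    if not 0 <= dist_from_downstream_end <= segment_length:
        raise ValueError("site must lie on the segment")
    return dist_from_downstream_end / float(segment_length)


# Hypothetical site located 250 m upstream of the downstream end
# of a 1000 m segment.
print(site_ratio(250.0, 1000.0))
```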

Can you export polygons into the .ssn directory for mapping in R?

Yes – you could export polygons into the .ssn directory and then map them in R. You’ll have to do it manually because polygons aren’t exported in the Create SSN Object tool. Store them in shapefile format in the .ssn directory. The polygons won’t be part of the SpatialStreamNetwork object in R, but you should be able to plot them in R – have a look at the sp package if you haven’t worked with spatial data in R before.

How are multiple streams draining to a waterbody or lake handled in the Landscape Network?

Stream network datasets often contain artificial connections, which are used to route flow through waterbodies, and multiple streams draining to a waterbody are typically joined by them. We don’t include lakes in the LSN and so we rely on these artificial connections to keep everything ‘connected’ up- and down-stream of the waterbody. If there were no outlet from the lake (i.e. all segments flow into the lake and none flow out), each segment that flows in would have to be considered an outlet segment.

If I'm starting with NHD Plus data (which already has contributing areas and many other attributes available), could I skip many of the pre-processing steps in the STARS tutorial? Which would I still have to do?

Yes, you can skip some of the pre-processing steps if you’re using the NHDPlus. Specifically, you can skip creating the cost RCAs (the Create Cost RCAs tool) and accumulating the watershed area using the Accumulate Attributes Downstream tool. You’ll have to run every other STARS tool in order to format the .ssn object properly.

If you use watershed area to generate the Segment PI values, does this alleviate the need to normalize covariates by watershed area (i.e. % urban in the watershed)?

No. Using drainage area as the segment PI doesn’t alleviate the need to normalize covariates such as % urban in the watershed. You’ll still need to do that yourself. The Segment PI values are used to generate the spatial weights, while the covariates are used in the mean model.
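The distinction can be illustrated with a little arithmetic. The sketch below assumes two segments meeting at a confluence and takes the segment PI to be each segment's share of the combined watershed area; the covariate is normalized separately. The function names and all of the numbers are hypothetical and are not part of STARS.

```python
def segment_pi(watershed_areas):
    """Share of the combined watershed area contributed by each segment."""
    total = float(sum(watershed_areas))
    return [area / total for area in watershed_areas]


def percent_urban(urban_area, watershed_area):
    """Urban area expressed as a percentage of the watershed area."""
    return 100.0 * urban_area / watershed_area


# Two hypothetical tributaries meeting at a confluence (areas in km^2).
watershed_areas = [30.0, 10.0]
urban_areas = [3.0, 4.0]

# Segment PI values feed the spatial weights.
print(segment_pi(watershed_areas))

# The normalized covariate goes into the mean model.
print([percent_urban(u, a) for u, a in zip(urban_areas, watershed_areas)])
```

Here the larger tributary gets the larger PI, yet the smaller tributary has the higher percentage of urban land, which is why the covariate still has to be normalized on its own.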

What is the best way to create prediction sites in your network? For example, if I want to set a prediction site every 1 kilometer in the network, how do I do that in GIS?

There are lots of different ways to do things in GIS. We used third party software called ET Geowizards (http://www.ian-ko.com/) to generate prediction sites at 1 km intervals. Another option is to put prediction points at the center of each line segment. This is easy to do in ArcGIS: ArcToolbox > Data Management Tools > Features > Feature to Point. Be sure to check the “Inside” box.
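If you would rather script the regular spacing yourself, the idea can be sketched in a few lines. This is a minimal illustration, not the method used by ET Geowizards or ArcGIS: it assumes a single polyline given as a list of (x, y) vertices in a projected coordinate system with units of meters, and the function name is hypothetical. A real network would need this applied reach by reach, and the resulting points would still have to be snapped to the Landscape Network.

```python
import math


def points_along_line(vertices, spacing):
    """Return (x, y) points every `spacing` map units along a polyline."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    points = []
    next_dist = spacing  # distance along the line of the next point
    walked = 0.0         # length of the line covered so far
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        # Place every point that falls on this piece of the line.
        while seg_len > 0 and next_dist <= walked + seg_len:
            t = (next_dist - walked) / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_dist += spacing
        walked += seg_len
    return points


# Hypothetical 3 km line with one bend; a point is placed every 1000 m.
line = [(0.0, 0.0), (1500.0, 0.0), (1500.0, 1500.0)]
for point in points_along_line(line, 1000.0):
    print(point)
```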

SSN Package

Does the SpatialStreamNetwork object inherit from the ESRI Network object?

No. The SpatialStreamNetwork object is an S4 class defined by the SSN package in R, and it builds on the spatial classes provided by the sp package rather than on the ESRI Network object. Please see the help for the SpatialStreamNetwork class in R for the class definition (just type help("SpatialStreamNetwork-class")).

For each monitoring site, where do the observed values, such as temperature, reside? In the observed sites shapefile?

The observed temperature values are stored in the sites shapefile attribute table. When the .ssn object is imported into R, these values reside in the SpatialStreamNetwork object. More specifically, they are stored in the observed sites point.data data.frame. Please see the help for the SpatialStreamNetwork class object in R for more information (just type help("SpatialStreamNetwork-class")). Also, the getSSNdata.frame() function can be used to easily extract this data.frame from the SpatialStreamNetwork object.

Why would you see spatial pattern in residuals after fitting a spatial statistical model? Does a spatial pattern in the residuals suggest that more modelling is needed?

Not necessarily. The residuals represent the variability left unexplained after accounting for the covariates, and the spatial pattern that you observe in them is essentially what the spatial component of the model describes. So a spatial pattern in the residuals doesn’t, by itself, mean that more modelling is required.

Why are the R-squared values for the models that included spatial autocorrelation substantially lower compared with the basic linear independence models?

The R-squared reported as part of the model is the proportion of variation explained by the fixed effects. It is not a measure of predictive performance. Because the error structures are different between autocorrelation and independence models, there is no relationship between the R-squared of the two. If you want to compare the predictive ability of autocorrelation and independence models, you can look at the correlation between observed values and predictions using cross-validation. Note that this cross-validation measure is unrelated to the R-squared described above.
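As a rough sketch of that comparison, suppose you already have leave-one-out cross-validation predictions from each model (in the SSN package, the CrossValidationSSN function produces these). The helper names and the numbers below are hypothetical; the sketch only shows how the correlation and the root-mean-square prediction error would be computed from observed values and cross-validation predictions.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed values and predictions."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / float(n)
    mean_p = sum(predicted) / float(n)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / math.sqrt(var_o * var_p)


def rmspe(observed, predicted):
    """Root-mean-square prediction error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical observed temperatures and cross-validation predictions.
observed = [10.2, 11.5, 9.8, 12.1, 10.9]
cv_spatial = [10.0, 11.2, 10.1, 11.9, 11.0]
cv_independent = [10.8, 10.9, 10.7, 11.0, 10.9]

for label, preds in (("spatial", cv_spatial), ("independent", cv_independent)):
    print(label, pearson_r(observed, preds) ** 2, rmspe(observed, preds))
```

The squared correlation here describes how well each model predicts held-out sites, which is a different quantity from the fixed-effects R-squared reported with the fitted model.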

What is block kriging?

Block kriging is the prediction of an average value over an area, rather than the particular value at a single point in space.

What is the difference between prediction and estimation?

When you set up a statistical model, you generally put the observable quantity on the left of the equal sign, and the model on the right side of the equal sign. Prediction is used to refer to an inference on an unobserved, but potentially observable value to the left of the equal sign. Estimation is used to refer to an inference about a model parameter, which is not directly observable.

Why don’t I get an AIC value for logistic and Poisson regression models in the SSN package?

In the SSN package, the logistic and Poisson regression models use a model fitting method that has been called quasi-pseudo-likelihood (which is described in the SSN documentation). The real likelihood, including the spatial part of the model, is too difficult to compute. Because there is no true likelihood, AIC cannot be computed. Some have proposed a method called QAIC, but it has a questionable reputation and has not been tested for these types of models. Hence, we do not include AIC, except for models based on a normal distribution. You can use cross-validation, P-values, or some other method to compare, or select, among models. This approach, and the information that we give, is essentially equivalent to PROC GLIMMIX in SAS. You can read up on that to help you make modelling decisions and understand it better.

When I generate and plot a Torgegram, symbols are missing for some or all distance bins.

This often happens when there are a relatively small number of sites on one or more networks. The Torgegram function has an argument, nlagcutoff, which sets the minimum number of pairs needed to estimate the semivariance for a bin; it applies to both the flow-connected and flow-unconnected relationships. The default is 15, but a general rule of thumb is to use at least 30 pairs of measurements in each bin. An alternative is to reduce the number of bins in the Torgegram using the nlag argument, thus increasing the number of pairs in each bin.
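The effect of the two arguments can be sketched with a simple pair count. This is not the SSN implementation: the binning rule is assumed to use equal-width bins up to a maximum lag, the function names are hypothetical, and the distances are made up. It only illustrates why fewer, wider bins are more likely to reach the cutoff.

```python
def pairs_per_bin(distances, maxlag, nlag):
    """Count site pairs falling in each of `nlag` equal-width distance bins."""
    width = maxlag / float(nlag)
    counts = [0] * nlag
    for d in distances:
        if 0 < d <= maxlag:
            # A pair exactly at maxlag goes in the last bin.
            idx = min(int(d / width), nlag - 1)
            counts[idx] += 1
    return counts


def bins_with_enough_pairs(counts, nlagcutoff=15):
    """True for bins that reach the minimum number of pairs."""
    return [count >= nlagcutoff for count in counts]


# Hypothetical separation distances (m) for 40 flow-connected pairs.
distances = [250.0 * k for k in range(1, 41)]

for nlag in (6, 3):
    counts = pairs_per_bin(distances, maxlag=10000.0, nlag=nlag)
    print(nlag, counts, bins_with_enough_pairs(counts, nlagcutoff=10))
```

With six bins, every bin falls short of the cutoff in this example, while with three bins all of them reach it.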

When I use the varcomp function, the proportions for the variance components (covariates and covariance models) sum to 1. Does this mean that the model explains 100% of the variability in the response?

No. The proportion explained by the covariates (fixed effects) is the R^2 value, and is generally considered the part explained by the model. The rest is random variation. The part of the random variation assigned to each variance component (spatial and independent components) makes up the rest of the proportions, such that they sum to one.
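A small worked example may make the bookkeeping clearer. The sketch below assumes that the random variation (1 minus R^2) is divided among the components in proportion to their estimated variance parameters; the function name, the component labels, and the numbers are hypothetical, so check the varcomp help for the exact calculation.

```python
def variance_proportions(r_squared, variance_parameters):
    """Split total variation into covariate and random-component shares.

    r_squared: proportion of variation explained by the fixed effects.
    variance_parameters: dict of component name -> estimated variance.
    """
    total = float(sum(variance_parameters.values()))
    remaining = 1.0 - r_squared
    proportions = {"Covariates (R-sq)": r_squared}
    for name, value in variance_parameters.items():
        proportions[name] = remaining * value / total
    return proportions


# Hypothetical fit: R^2 of 0.40 and three random components.
props = variance_proportions(
    0.40, {"tail-up": 3.0, "Euclidean": 2.0, "nugget": 1.0}
)
for name, share in props.items():
    print(name, round(share, 3))
print("total", round(sum(props.values()), 3))
```

In this example only 40% of the variability is explained by the covariates; the remaining 60% is random variation that has simply been apportioned among the spatial and independent components.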
