Minnesota Department of Transportation
 

Mn/Model

Final Report Phases 1-3 (2002)


Chapter 8

Model Results and Interpretations

 

By Elizabeth Hobbs, Craig M. Johnson, Guy E. Gibbon, Carol Sersland, Mark Ellis, and Tatiana Nawrocki

Statewide Survey Implementation Model Map

 

 

Chapter 8 Table of Contents
8.1   Introduction
8.2   Model Description
         8.2.1 Environmental Context
         8.2.2 Types of Models
8.3   Model Evaluation
8.4   Model Interpretation
         8.4.1 Presentation of Interpretations
         8.4.2 Previously Identified Variables
8.5   Model Comparison
         8.5.1 Subsection Group Approach
         8.5.2 Site Catchment Analysis
         8.5.3 Cultural Context and Site Location
         8.5.4 SHPO Intuitive Model
8.6   Model Results
         8.6.1 Phase 1 Results
         8.6.2 Phase 2 Results
         8.6.3 Phase 3 Results
8.7   Agassiz Lowlands
8.8   Anoka Sand Plain
8.9   Aspen Parklands
8.10 Big Woods
8.11 Blufflands
8.12 Border Lakes
8.13 Chippewa Plains
8.14 Coteau Moraines / Inner Coteau
8.15 Glacial Lake Superior Plain / North Shore Highlands / Nashwauk Uplands
8.16 Hardwood Hills
8.17 Laurentian Highlands
8.18 Littlefork-Vermilion Uplands
8.19 Mille Lacs Uplands
8.20 Minnesota River Prairie
8.21 Oak Savanna
8.22 Pine Moraines & Outwash Plains
8.23 Red River Prairie
8.24 Rochester Plateau
8.25 St. Croix Moraines and Outwash Plains (Twin Cities Highlands)
8.26 St. Louis Moraines/ Tamarack Lowlands
8.27 Conclusion
        References

 

 

8.6 MODEL RESULTS

8.6.1 Phase 1 Results

In Phase 1 of the project, models for sites excluding single artifacts were developed only for the 29 counties for which "probabilistic" surveys were available (Figure 4.5). These were grouped into five archaeological resource regions (Section 4.6.1) for modeling. Since it was not possible to acquire and convert data for all of these counties in time to meet modeling deadlines, some of the counties were not modeled or were modeled only for one region when they otherwise would have been modeled for two regions. The regions necessarily had incomplete and sometimes discontinuous coverage.

 

Models were built using only sites from "probabilistic" surveys, excluding single artifacts. All other sites in the counties modeled were used to test the models. Negative survey locations represented non-sites. Modeling methods for this initial phase of the project are discussed in Section 7.3.

 

Some of these regions were too large and contained too much environmental variability to model well using such small site numbers and environmental data from distant and disjunct areas. For example, when models developed for Nicollet County were applied to all counties in the Prairie Lakes Region, they performed well for counties near Nicollet County but not as well for distant counties. In the Central Lakes Coniferous Region, models performed better for centrally located counties that had more data and less well for counties on the margins of the region, which had fewer sites.

 

Because of bias in the locations of "probabilistic" surveys (see Chapter 5), there was not a wide range of environmental difference between sites and non-sites. This weakens the models in two ways. First, it reduces the ability of the statistical analysis to distinguish between sites and non-sites. Second, it fails to represent all possible environmental settings, providing potentially fallacious predictions for unsampled landscapes.

 

In general, site numbers used to build the models were extremely low - only sites from "probabilistic" surveys and then only from some of the counties in each region. Within the modeled areas, only 576 sites were available that met the criteria. However, there is no apparent relationship between the performance of the models and the number of sites used to build them (Table 8.6.1). Site numbers were not, in fact, significantly lower than those used to build individual Phase 2 models excluding single artifacts (Table 8.6.2).

 

Table 8.6.1. Evaluation of Basic Phase 1 Models, Percent Known Sites (excluding single artifacts) in Each Site Potential Class.

Modeling Region | # Sites Modeled | % Low | % Medium | % High | % High/Medium | Gain
Nicollet County (pilot) | 31 | 0 | 14 | 86 | 100 | 0.34
Prairie Lakes | 190 | 11 | 18 | 71 | 89 | 0.26
Southeast Riverine | 87 | 8 | 13 | 79 | 92 | 0.29
Southwest Riverine | 41 | 8 | 17 | 75 | 92 | 0.29
Central Lakes Deciduous | 227 | 14 | 27 | 58 | 85 | 0.22
Central Lakes Coniferous | 147 | 2 | 13 | 85 | 98 | 0.33
Average | 96 | 7 | 17 | 76 | 93 | 0.29

 

 

Phase 1 models were run using logistic regression in GRID, not S-Plus. This undoubtedly produced weaker models than Phases 2 and 3 for two reasons. First, because there is no stepwise function in GRID for selecting the best model variables, only a limited number of variable combinations could be tried. There is no guarantee that any of these models represents the absolute best set of variables from those available. Second, variable coefficients are rounded in GRID, sometimes to zero for small coefficients. As we found out in Phase 2, small differences in coefficients and attributing zero values to variables with small coefficients can make a considerable difference in model performance.
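The effect of rounded coefficients can be illustrated with a small sketch. The intercept, coefficients, cell values, and rounding precision below are hypothetical and chosen only for illustration; they are not taken from GRID or from any Mn/Model run.

```python
import math

def site_probability(intercept, coefficients, values):
    """Logistic regression: convert a linear score into a probability (0 to 1)."""
    score = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients for two distance variables measured in meters.
# Small coefficients still matter because the distances themselves are large.
intercept = 1.0
coefficients = [-0.0004, -0.0012]
cell_values = [2500.0, 800.0]  # hypothetical values for one raster cell

full = site_probability(intercept, coefficients, cell_values)

# Rounding to three decimal places turns -0.0004 into zero,
# which effectively drops the first variable from the model.
rounded = [round(b, 3) for b in coefficients]
coarse = site_probability(intercept, rounded, cell_values)

print(f"full precision: {full:.3f}")
print(f"rounded:        {coarse:.3f}")
```

In this made-up case the rounded model assigns the cell a noticeably higher probability than the full-precision model, which is the kind of shift that could move a cell from one site potential class to another.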

 

All Phase 1 models had approximately 33 percent of the landscape classified in each of the low, medium, and high site potential zones (Figure 8.4a). Regional models varied primarily in how many sites were predicted to be in the high site potential zone. These models had gain statistics ranging from 0.22 to 0.34 (Table 8.6.1). Gain statistic values are low because the high and medium probability areas were, by definition, 66 percent of the landscape. The strongest model was developed for a small, homogeneous area (Nicollet County). The weakest model was for the Central Lakes Deciduous Region (Figure 2.2), where the area modeled was large and the data discontinuous.

 


 

8.6.2 Phase 2 Results

Phase 2 models were developed for all 87 counties, divided into 15 modeling regions based on archaeological resource regions (Section 4.6.2). Modeling methods are discussed in Section 7.3. Several variations of site probability models were developed. Basic models refer to those developed using variables that were available statewide. Enhanced models also included variables that were available for only limited parts of the state. Basic and enhanced models were developed for two populations of known sites, one excluding only single artifacts and the other excluding both single artifacts and lithic scatters. A total of 1,815 sites were available statewide that met all modeling criteria and excluded single artifacts. When lithic scatters were removed, this reduced the statewide dataset by 43 percent to 1,048.

 

The best Phase 2 basic models are summarized in Tables 8.6.2 and 8.6.3. The composite of models excluding only single artifacts is illustrated in Figure 8.4b. A goal set at the beginning of this phase of the project was to develop models with 85 percent of the known sites predicted in 33 percent of the area modeled. That would be the equivalent of a gain statistic of 0.61. Seventy-three percent of the models developed with only single artifacts excluded and 36 percent of those with lithic scatters also excluded met or exceeded this goal. Models excluding only single artifacts produced gain statistics ranging from 0.28 to 0.89, with an average gain of 0.68 (Table 8.6.2). Models excluding both single artifacts and lithic scatters had gains from 0.12 to 0.94, with an average gain of 0.61 (Table 8.6.3).
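The arithmetic behind the 0.61 target can be checked with a short sketch. It assumes the gain statistic is computed as one minus the ratio of percent area to percent sites, which is consistent with the figures quoted above; the function name is illustrative only.

```python
def gain(percent_area, percent_sites):
    """Gain = 1 - (percent of area / percent of sites captured in that area)."""
    if percent_sites <= 0:
        raise ValueError("percent_sites must be positive")
    return 1.0 - percent_area / percent_sites

# Phase 2 goal: 85 percent of known sites in 33 percent of the area.
print(round(gain(33, 85), 2))   # 0.61

# Phase 1 pilot: all sites in the high and medium zones (66 percent of the area).
print(round(gain(66, 100), 2))  # 0.34
```

Under this formula, a model gains nothing when the share of sites captured equals the share of area flagged, and approaches 1 as a small area captures most sites.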

 

These results indicate that models excluding only single artifacts performed, on average, much better than models excluding both single artifacts and lithic scatters. However, results varied between regions. In six regions the gain statistic declined when lithic scatters were included in the database, but this loss is less than 0.10 in all but two of them (Table 8.6.4). By contrast, where including lithic scatters increased the gain, the increase is less than 0.10 in only one region. On average, the gain statistic increases by 0.11 when lithic scatters are included in the database.

Whether the improvement is attributable to the particular information contained in the lithic scatter locations or simply to the increase in the number of sites modeled could be debated. The Lake Superior model would not run at all when removing lithic scatters reduced the database from eight sites to three. Certainly the inclination is to have more confidence in models built with large numbers of sites. However, all of the site populations modeled in Phase 2 are smaller than ideal for multivariate analysis. The inclusion of lithic scatters could improve models in cases where lithic scatters have environmental settings similar to those of other modeled sites by enhancing the detectable pattern. On the other hand, if lithic scatters are found in different environmental settings, or if there is no pattern to where they are found, they could degrade model performance by muddling the "pattern." Consequently, it is important to remember that the detectable pattern from such small site populations, no matter how distinct, may not be representative of the entire universe of sites, most of which have not yet been found. Even relatively modest changes in the number and characteristics of the site population may exert a strong influence on model results. Only very large site populations from a random sample of all landscapes in a region will produce a representative database for modeling.

 

Table 8.6.2. Best Phase 2 Basic Models (excluding single artifacts).

Modeling Region | # Sites Modeled | % Area High/Medium | % Sites Predicted | Gain
Southwest Riverine | 41 | 35 | 86 | 0.59
Prairie Lakes East | 77 | 25 | 64 | 0.61
Prairie Lakes North | 120 | 20 | 71 | 0.71
Prairie Lakes South | 209 | 20 | 77 | 0.74
Southeast Riverine East | 58 | 34 | 72 | 0.53
Southeast Riverine West | 63 | 63 | 87 | 0.28
Central Lakes Deciduous East | 278 | 18 | 82 | 0.78
Central Lakes Deciduous South | 199 | 24 | 81 | 0.70
Central Lakes Deciduous West | 165 | 18 | 81 | 0.78
Central Lakes Coniferous South and East | 119 | 15 | 78 | 0.81
Central Lakes Coniferous Central, North, and West | 236 | 19 | 77 | 0.75
Red River Valley | 80 | 19 | 79 | 0.76
Northern Bog | 8 | 33 | 59 | 0.44
Border Lakes | 154 | 10 | 81 | 0.88
Lake Superior | 8 | 8 | 74 | 0.89
Average | 121 | 24 | 77 | 0.68

 

 

Table 8.6.3. Best Phase 2 Basic Models (excluding single artifacts and lithic scatters).

Modeling Region | # Sites Modeled | % Area High/Medium | % Sites Predicted | Gain
Southwest Riverine | 11 | 80 | 91 | 0.12
Prairie Lakes East | 29 | 30 | 56 | 0.46
Prairie Lakes North | 74 | 49 | 66 | 0.26
Prairie Lakes South | 91 | 40 | 82 | 0.51
Southeast Riverine East | 38 | 17 | 62 | 0.73
Southeast Riverine West | 54 | 48 | 81 | 0.41
Central Lakes Deciduous East | 196 | 10 | 76 | 0.87
Central Lakes Deciduous South | 105 | 34 | 80 | 0.58
Central Lakes Deciduous West | 103 | 19 | 81 | 0.77
Central Lakes Coniferous South and East | 5 | 22 | 43 | 0.49
Central Lakes Coniferous Central, North, and West | 171 | 43 | 84 | 0.49
Red River Valley | 44 | 15 | 78 | 0.81
Northern Bog | 6 | 44 | 81 | 0.46
Border Lakes | 118 | 5 | 78 | 0.94
Lake Superior | 3 | NA | NA | NA
Average | 70 | 35 | 80 | 0.61

 

 

Table 8.6.4. Comparison of Phase 2 Models With and Without Lithic Scatters.

Modeled Region | Difference in number of sites modeled (number of lithic scatters) | Lithic scatters as percentage of all sites excluding single artifacts | Gain excluding single artifacts minus gain also excluding lithic scatters
Southwest Riverine | 30 | 73 | 0.47
Prairie Lakes East | 48 | 62 | 0.15
Prairie Lakes North | 46 | 38 | 0.45
Prairie Lakes South | 118 | 56 | 0.23
Southeast Riverine East | 20 | 34 | -0.20
Southeast Riverine West | 113 | 68 | -0.13
Central Lakes Deciduous East | 82 | 29 | -0.09
Central Lakes Deciduous South | 94 | 47 | 0.12
Central Lakes Deciduous West | 62 | 38 | 0.01
Central Lakes Coniferous South and East | 114 | 96 | 0.32
Central Lakes Coniferous Central, North, and West | 65 | 28 | 0.26
Red River Valley | 36 | 45 | -0.05
Northern Bog | 2 | 25 | -0.02
Border Lakes | 36 | 23 | -0.06
Lake Superior | 5 | 63 | N.A.
Mean | 58 | 48 | 0.11

 

 

8.6.2.1 Improvement over Phase 1 models

To measure the improvement of Phase 2 models over Phase 1 models, the Phase 1 models were extended to other counties in the Phase 2 model subregions of which they are a part (Table 8.6.5). For this analysis, Phase 1 models were classified into three probability classes following the same procedures used in the Phase 2 modeling (Section 7.5.1.2).

 

In these fourteen subregions, the gain statistic improved an average of 0.29 from Phase 1 to Phase 2. This improvement is attributable to several factors. Most important, the variable selection procedure in S-Plus allows consideration of all the variables in the dataset at once. Most Phase 1 models were developed using only logistic regression in GRID, which takes only a small number of variables at once. For those models, variables had to be grouped subjectively for evaluation. The only Phase 1 model that performed better for a subregion than the Phase 2 model was the one for Central Lakes Coniferous South. This is the only Phase 1 model that was developed using variable selection in S-Plus. Additional factors contributing to model improvement are the increase in the number of sites available for modeling, the addition of vegetation data from Marschner, and the modeling of subregions, which in some regions reduces the environmental diversity being considered.

 

Table 8.6.5. Evaluation of Best Phase 1 Model vs. Best Phase 2 Basic Model (excluding single artifacts).

Modeling Region | Phase 1: % Area H/M | Phase 1: % Sites Predicted | Phase 1: Gain | Phase 2: % Area H/M | Phase 2: % Sites Predicted | Phase 2: Gain
Southwest Riverine | 45 | 83 | 0.46 | 35 | 86 | 0.59
Prairie Lakes East | 60 | 82 | 0.27 | 25 | 64 | 0.61
Prairie Lakes North | 63 | 84 | 0.25 | 20 | 71 | 0.71
Prairie Lakes South | 59 | 85 | 0.31 | 20 | 77 | 0.74
Southeast Riverine East | 52 | 83 | 0.38 | 34 | 72 | 0.53
Southeast Riverine West | 67 | 86 | 0.22 | 63 | 87 | 0.28
Central Lakes Deciduous East | 67 | 86 | 0.22 | 18 | 82 | 0.78
Central Lakes Deciduous South | 47 | 73 | 0.36 | 24 | 81 | 0.70
Central Lakes Deciduous West | 76 | 87 | 0.13 | 18 | 81 | 0.78
Central Lakes Coniferous West | 41 | 83 | 0.51 | 18 | 76 | 0.76
Central Lakes Coniferous East | 77 | 89 | 0.51 | 15 | 82 | 0.82
Central Lakes Coniferous North | 39 | 85 | 0.54 | 19 | 84 | 0.77
Central Lakes Coniferous South | 35 | 91 | 0.62 | 15 | 25 | 0.40
Central Lakes Coniferous West | 49 | 85 | 0.42 | 19 | 63 | 0.70
Average | 55.5 | 84 | 0.37 | 24.5 | 74 | 0.66

H/M = High/Medium

 

 

8.6.2.2 Contributions of Individual Variables

With square root, sine, and cosine transformations included, a total of 120 basic variables were evaluated in each Phase 2 model run (Table 8.6.6). Two additional variables (distance to nearest bedrock outcrop and the square root of the same) were evaluated in the two Southeast Riverine subregions. Considerable redundancy was introduced into the data by slightly different versions of some environmental characteristics (e.g., distance to lakes, distance to large lakes, distance to permanent lakes) and by transformations (square root, sine, cosine) of most variables.

 

The variable probabilities reported by S-Plus are the best measure for comparison of the performance of individual variables (Section 8.3). Of the 122 variables evaluated in 30 models, all but 13 were assigned a probability greater than zero in at least one model (Table 8.6.6). Fifty-six variables had a maximum probability of 100 in at least one of the thirty runs. Thirteen more had maximum probabilities greater than 80. The cumulative probability is provided in Table 8.6.6 as a measure for ranking variables on a statewide basis. This value was obtained by multiplying the number of model runs in which each variable is not zero by the mean probability for that variable. The results were then analyzed to determine which variables performed best.
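The cumulative probability described above can be sketched in a few lines. This sketch assumes the mean is taken over only those runs in which the variable's probability is greater than zero, so that the product reduces to the sum of the nonzero probabilities; that reading, the function name, and the sample values are assumptions rather than details given in the report.

```python
def cumulative_probability(probabilities):
    """Return (models with nonzero probability, maximum, cumulative probability).

    Assumes the mean is taken over the runs in which the variable's
    probability is nonzero (an interpretation, not stated in the source).
    """
    nonzero = [p for p in probabilities if p > 0]
    if not nonzero:
        return 0, None, 0.0
    mean = sum(nonzero) / len(nonzero)
    return len(nonzero), max(nonzero), len(nonzero) * mean

# Hypothetical probabilities for one variable across five model runs.
runs = [0.0, 100.0, 0.0, 60.0, 0.0]
count, maximum, cumulative = cumulative_probability(runs)
print(count, maximum, cumulative)
```

A variable that never enters a model returns a count of zero and no maximum, matching the blank maximum entries in the table for such variables.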

 

Several things are apparent from this analysis. First, transformed variables are better predictors than their untransformed counterparts. Second, distances to large, permanent, or perennial water bodies are more consistently useful measures than many of the other distance to water variables. Third, any kind of water or wetland may provide protection from fire as well as related resources, such as wood for fuel.

 

The prominent role of topographic variables suggests they are more universal, applying to a broad range of archaeological regions. It could be that since there are fewer of these landscape variables, there is less repetition or redundancy in what they are measuring versus a much larger suite of water-related variables considered for model construction. Moreover, subtle variations in topography may serve as surrogates for other factors, such as landscape scale vegetation patterns, soil drainage, or visibility, which are not adequately represented in the database. The importance and meaning of these and other variables can only be evaluated with further analysis.

 

Table 8.6.6. Performance of All Phase 2 Basic Variables in 30 Model Runs.

Columns indicate the number of models in which each variable had probability greater than zero, the maximum probability recorded, and the cumulative probability.

BASIC VARIABLE | Number of Models | Maximum Probability | Cumulative Probability
Elevation | 8 | 100 | 676.16
On alluvium | 1 | 0.8 | 0.8
Prevailing orientation | 2 | 87.5 | 186.2
On colluvium | 0 |  | 0
Distance to well-drained soils | 1 | 60.9 | 60.9
Square root of distance to well-drained soils | 2 | 4.7 | 13.8
Distance to edge of nearest large lake | 5 | 100 | 492.5
Square root of distance to edge of nearest large lake | 10 | 100 | 859
Distance to edge of nearest large area of organic soils | 3 | 97.9 | 153
Square root of distance to edge of nearest large area of organic soils | 4 | 100 | 257.6
Distance to edge of nearest large wetland | 5 | 96.6 | 213
Square root of distance to edge of nearest large wetland | 5 | 100 | 311.5
Distance to edge of nearest lake, wetland, or area of organic soils | 3 | 22.8 | 34.2
Square root of distance to edge of nearest lake, wetland, or area of organic soils | 4 | 59.2 | 98
Distance to edge of nearest lake, wetland, area of organic soils, or stream | 4 | 100 | 146
Square root of distance to edge of nearest lake, wetland, area of organic soils, or stream | 1 | 18.1 | 18.1
Distance to edge of nearest lake | 5 | 78.6 | 132.5
Square root of distance to edge of nearest lake | 5 | 93.1 | 233.5
Distance to edge of nearest marsh | 3 | 20.1 | 34.8
Square root of distance to edge of nearest marsh | 2 | 9.3 | 13.2
Distance to edge of nearest large river | 3 | 38.7 | 56.7
Square root of distance to edge of nearest large river | 2 | 100 | 200
Distance to edge of nearest area of organic soils | 5 | 100 | 426
Square root of distance to edge of nearest area of organic soils | 5 | 100 | 297.5
Distance to edge of nearest permanent lake | 9 | 100 | 594
Square root of distance to edge of nearest permanent lake | 9 | 100 | 603.9
Distance to edge of nearest perennial river or stream | 5 | 100 | 317.5
Square root of distance to edge of nearest perennial river or stream | 9 | 100 | 856.8
Distance to edge of nearest river or stream | 3 | 100 | 135.9
Square root of distance to edge of nearest river or stream | 0 |  | 0
Distance to edge of nearest swamp | 1 | 2.9 | 2.9
Square root of distance to edge of nearest swamp | 4 | 100 | 233.6
Distance to edge of nearest wetland | 4 | 79.7 | 95.2
Square root of distance to edge of nearest wetland | 3 | 9.4 | 21.6
Depth to bedrock | 3 | 94.5 | 184.5
Distance to nearest intermittent stream | 3 | 100 | 138.6
Square root of distance to nearest intermittent stream | 4 | 100 | 244.8
Direction to nearest permanent water | 2 | 66.6 | 69
Sine of direction to nearest permanent water | 1 | 39.3 | 39.3
Cosine of direction to nearest permanent water | 1 | 28 | 28
Direction to nearest water | 1 | 33.2 | 33.2
Sine of direction to nearest water | 1 | 48.2 | 48.2
Cosine of direction to nearest water | 2 | 24.8 | 27.8
Direction to nearest water or wetland | 6 | 100 | 532.8
Sine of direction to nearest water or wetland | 8 | 100 | 703.2
Cosine of direction to nearest water or wetland | 2 | 100 | 200
Distance to bedrock outcrops | 1 | 13.7 | 13.7
Square root of distance to bedrock outcrops | 1 | 3.5 | 3.5
Distance to aspen-birch | 2 | 100 | 200
Square root of distance to aspen-birch | 3 | 100 | 293.4
Distance to birch | 2 | 1.5 | 1.9
Square root of distance to birch | 1 | 100 | 100
Distance to brushland | 1 | 100 | 100
Square root of distance to brushland | 0 |  | 0
Distance to Big Woods | 2 | 91.8 | 103.6
Square root of distance to Big Woods | 2 | 11.8 | 23.6
Distance to conifers | 0 |  | 0
Square root of distance to conifers | 3 | 100 | 215.4
Distance to cranberry | 2 | 100 | 101.6
Square root of distance to cranberry | 2 | 41.5 | 45
Distance to hardwoods | 6 | 100 | 361.8
Square root of distance to hardwoods | 5 | 100 | 149.5
Distance to Kentucky coffee tree | 2 | 43.7 | 53.8
Square root of distance to Kentucky coffee tree | 1 | 9.1 | 9.1
Distance to glacial lake sediments | 8 | 100 | 286.4
Square root of distance to glacial lake sediments | 5 | 100 | 356
Distance to sugar maple | 5 | 100 | 324.5
Square root of distance to sugar maple | 5 | 100 | 185
Distance to mixed hardwoods and conifers | 3 | 99.4 | 116.1
Square root of distance to mixed hardwoods and conifers | 3 | 100 | 204.3
Distance to oak woodland | 0 |  | 0
Square root of distance to oak woodland | 1 | 100 | 100
Distance to pine barrens or flats | 4 | 100 | 258.4
Square root of distance to pine barrens or flats | 1 | 4.7 | 4.7
Distance to pine groves | 1 | 4.3 | 4.3
Square root of distance to pine groves | 2 | 7.9 | 10.6
Distance to prairie | 2 | 11.2 | 20.4
Square root of distance to prairie | 4 | 100 | 296
Distance to river bottom forest | 7 | 100 | 510.3
Square root of distance to river bottom forest | 4 | 100 | 293.6
Distance to woodland | 0 |  | 0
Square root of distance to woodland | 1 | 2.8 | 2.8
Distance to nearest perennial stream | 2 | 100 | 103.4
Square root of distance to nearest perennial stream | 4 | 100 | 335.6
Soil drainage | 0 |  | 0
Height above surroundings | 7 | 100 | 511
Square root of height above surroundings | 10 | 100 | 812
Distance to nearest lake or wetland inlet/outlet | 1 | 3.5 | 3.5
Square root of distance to nearest lake or wetland inlet/outlet | 2 | 11.6 | 13.6
Solar insolation | 0 |  | 0
Distance to nearest lake inlet/outlet | 2 | 11.8 | 17.6
Square root of distance to nearest lake inlet/outlet | 1 | 81.9 | 81.9
On glacial lake sediment | 1 | 37.6 | 37.6
Size of nearest lake | 2 | 95.8 | 101.4
Square root of size of nearest lake | 2 | 94.5 | 131.8
Distance to nearest permanent lake inlet/outlet | 3 | 100 | 237.3
Square root of distance to nearest permanent lake inlet/outlet | 6 | 100 | 531.6
On mine pits or dumps | 0 |  | 0
Vegetation diversity within 0.5 km | 2 | 82.5 | 93.4
Vegetation diversity within 1 km | 3 | 100 | 223.8
On peat | 0 |  | 0
Distance to nearest confluence between perennial streams and large rivers | 2 | 100 | 104.4
Size of nearest permanent lake | 4 | 100 | 244
Square root of size of nearest permanent lake | 4 | 64.1 | 217.6
Relative elevation | 7 | 100 | 403.9
Square root of relative elevation | 12 | 100 | 538.8
Surface roughness | 6 | 100 | 532.8
Distance to nearest confluence between perennial or intermittent streams and large rivers | 3 | 94.9 | 122.7
Square root of distance to nearest confluence between perennial or intermittent streams and large rivers | 3 | 100 | 294.9
Susceptibility to sedimentation | 0 |  | 0
Slope | 5 | 100 | 444.5
Square root of slope | 3 | 95.4 | 157.2
Distance to nearest confluence between streams of different classes | 2 | 6.6 | 12.6
Square root of distance to nearest confluence between streams of different classes | 0 |  | 0
On a river terrace | 3 | 100 | 186.9
Vertical distance to water | 2 | 100 | 105.6
Vertical distance to permanent water | 4 | 100 | 144.4
Susceptibility to erosion by water | 0 |  | 0
Distance to nearest wetland inlet/outlet | 3 | 41.5 | 45.6
Square root of distance to nearest wetland inlet/outlet | 1 | 5.7 | 5.7
Distance to nearest permanent wetland inlet/outlet | 5 | 100 | 252
Square root of distance to nearest permanent wetland inlet/outlet | 3 | 10.7 | 23.4

 

 

Contributions of Trygg Variables

Variables derived from Trygg maps were not strong contributors to the 18 Trygg enhanced models run in Phase 2 (Table 8.6.7). Of those that did make significant contributions to models (probabilities greater than 50), four are vegetation variables redundant with information derived from Marschner, although from a higher resolution source scale. Only two significant variables, distance to Native American cultural features and square root of distance to junctures of roads and trails with water and wetlands, can be derived only from Trygg map data. Results may be different, however, when Trygg maps can be made available in digital format for the entire state. Both continuous coverage and the opportunity to test all of the variables throughout the state could produce better results.

 

Table 8.6.7. Performance of All Trygg Variables in 18 Phase 2 Model Runs.

Columns indicate the number of models in which each variable had probability greater than zero, the maximum probability recorded, and the cumulative probability.

BASIC VARIABLE | Number of Models | Maximum Probability | Cumulative Probability
Distance from grassland (prairie or meadow) | 1 | 88 | 88
Square root of distance from grassland | 1 | 2.6 | 2.6
Distance to Native American cultural features | 2 | 100 | 106.4
Square root of distance to Native American cultural features | 0 |  | 
Distance to roads and trails | 0 |  | 
Square root of distance to roads and trails | 0 |  | 
Distance from wooded land (except swamp) | 2 | 100 | 117.4
Square root of distance from wooded land | 3 | 100 | 111.4
Distance to junctures of roads and trails with water and wetland resources | 0 |  | 
Square root of distance to junctures of roads and trails with water and wetlands | 1 | 100 | 100
Vegetation diversity within 510 meters | 1 | 51.9 | 51.9
Vegetation diversity within 990 meters | 0 |  | 
Distance to wild rice sites | 0 |  | 
Square root of distance to wild rice sites | 0 |  | 
Distance to beaver sites | 0 |  | 
Square root of distance to beaver sites | 0 |  | 

 

 

Contributions of High Resolution Soils Variables

Variables derived from high resolution soils data (digital county soil surveys) were used to enhance 14 Phase 2 model runs. Only three of these variables (mean soil reaction [pH] for the surface layer, square root of distance to edge of nearest hydric soils, square root of distance to edge of nearest large area of hydric soils) showed promise (Table 8.6.8). Like Trygg variables, soil variables may perform better when more extensive and continuous coverage is available. Improvements in the spatial accuracy of many of the digital soils surveys may also help.

 

Table 8.6.8. Performance of All High Resolution Soils Variables in 14 Phase 2 Model Runs.

Columns indicate the number of models in which each variable had probability greater than zero, the maximum probability recorded, and the cumulative probability.

SOILS VARIABLE | Number of Models | Maximum Probability | Cumulative Probability
Suitability of soil for archaeological sites, based on soil texture classes | 0 |  | 
Mean depth to the lower boundary of the surface layer | 0 |  | 
Mean value for clay content of the surface layer | 1 | 2.4 | 2.4
Mean value for the available water capacity for the surface layer | 2 | 6.5 | 7.6
Mean value for organic matter content of the surface layer | 1 | 1.6 | 1.6
Mean soil reaction (pH) for the surface layer | 2 | 100 | 200
Mean permeability rate of the surface layer | 2 | 2.9 | 5.7
Distance to edge of nearest hydric soils | 0 |  | 
Square root of distance to edge of nearest hydric soils | 1 | 69.5 | 69.5
Distance to edge of nearest large area of hydric soils | 3 | 42.9 | 49.6
Square root of distance to edge of nearest large area of hydric soils | 4 | 100 | 114.6
Distance to edge of nearest water (lakes, wetlands, or hydric soils) | 0 |  | 
Square root of distance to edge of nearest water (lakes, wetlands, or hydric soils) | 0 |  | 
Distance to edge of nearest water (lakes, wetlands, hydric soils, or streams) | 0 |  | 
Square root of distance to edge of nearest water (lakes, wetlands, hydric soils, or streams) | 0 |  | 

 

 

The Sparse Population Problem

Once the models have been classified into 20, then three, probability classes, the raw model values are disguised. When the regression equation is applied to a region, it produces a value for each cell, which is the estimated probability of a site occurring in that cell. This value can range from zero to one. The ranges and means of these values vary from region to region (Table 8.6.9).
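How a regression equation yields one raw value per cell, and how those values are summarized for a region, can be sketched as follows. The intercept, coefficients, variable choices, and cell values are hypothetical; the sketch uses the standard logistic function and is not the project's GRID or S-Plus implementation.

```python
import math
import statistics

def cell_probability(intercept, coefficients, values):
    """Apply a logistic regression equation to one cell's variable values."""
    score = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical model with two variables, for example the square root of
# distance to a lake and the square root of height above surroundings.
intercept = -1.5
coefficients = [-0.05, 0.30]

# Hypothetical variable values for four raster cells in a region.
cells = [
    [4.0, 1.0],
    [20.0, 0.5],
    [45.0, 0.0],
    [10.0, 3.0],
]

raw = [cell_probability(intercept, coefficients, c) for c in cells]

# Summary statistics of the kind reported per subregion.
summary = {
    "min": min(raw),
    "mean": statistics.mean(raw),
    "max": max(raw),
    "std": statistics.pstdev(raw),
}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Every raw value falls strictly between zero and one, and the regional mean depends on both the coefficients and the mix of landscape settings in the cells being scored.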

 

Table 8.6.9. Raw Model Values for Models Excluding Single Artifacts for Phase 2 Models.

Subregion | Minimum | Mean | Maximum | Std. Dev.
1 Southwest Riverine | 0.000 | 0.116 | 0.999 | 0.144
2e Prairie Lakes East | 0.000 | 0.064 | 0.985 | 0.111
2n Prairie Lakes North | 0.000 | 0.283 | 1.000 | 0.258
2s Prairie Lakes South | 0.000 | 0.445 | 1.000 | 0.276
3e Southeast Riverine East | 0.000 | 0.480 | 0.945 | 0.253
3w Southeast Riverine West | 0.003 | 0.812 | 1.000 | 0.122
4e Central Lakes Deciduous East | 0.000 | 0.032 | 0.999 | 0.085
4s Central Lakes Deciduous South | 0.001 | 0.216 | 0.999 | 0.221
4w Central Lakes Deciduous West | 0.000 | 0.023 | 0.998 | 0.060
5e Central Lakes Coniferous East | 0.000 | 0.216 | 1.000 | 0.251
5s Central Lakes Coniferous South | 0.000 | 0.154 | 0.996 | 0.212
5c Central Lakes Coniferous Central | 0.000 | 0.167 | 1.000 | 0.240
5n Central Lakes Coniferous North | 0.000 | 0.065 | 0.993 | 0.144
5w Central Lakes Coniferous West | 0.000 | 0.111 | 0.999 | 0.168
6n Red River Valley North | 0.000 | 0.160 | 0.999 | 0.176
6s Red River Valley South | 0.002 | 0.237 | 1.000 | 0.249
7e Northern Bog East | 0.009 | 0.448 | 0.745 | 0.226
7w Northern Bog West | 0.001 | 0.392 | 0.745 | 0.241
8 Border Lakes | 0.000 | 0.089 | 1.000 | 0.185
9n Lake Superior North | 0.001 | 0.258 | 0.937 | 0.280
9s Lake Superior South | 0.000 | 0.138 | 0.937 | 0.223

 

 

Comparisons of these values provide a rough indication of the relative probability of finding sites within subregions. Mean values, for models excluding single artifacts, range from 0.023 in Central Lakes Deciduous West to 0.812 in Southeast Riverine West. This high value is an outlier and may indicate problems with the model. For models excluding both single artifacts and lithic scatters, means ranged from 0.051 in Central Lakes Coniferous South to 0.599 in Central Lakes Deciduous South (Table 8.6.10). These values should be interpreted with caution. They may reflect the amount of survey that has occurred in these regions, as well as the potential for sites. For less surveyed regions, values may be lower than would be the case based on true site potential. On the other hand, the biased nature of survey locations may result in values higher than true site potential.

 

Table 8.6.10. Raw Model Values for Models Excluding Single Artifacts and Lithic Scatters for Phase 2 Models.

Subregion | Minimum | Mean | Maximum | Std. Dev.
1 Southwest Riverine | 0.003 | 0.076 | 0.487 | 0.069
2e Prairie Lakes East | 0.000 | 0.061 | 0.508 | 0.109
2n Prairie Lakes North | 0.000 | 0.243 | 1.000 | 0.356
3e Southeast Riverine East | 0.014 | 0.322 | 0.913 | 0.258
3w Southeast Riverine West | 0.006 | 0.284 | 0.982 | 0.245
4e Central Lakes Deciduous East | 0.000 | 0.081 | 1.000 | 0.167
4s Central Lakes Deciduous South | 0.000 | 0.599 | 1.000 | 0.309
4w Central Lakes Deciduous West | 0.000 | 0.081 | 1.000 | 0.156
5e Central Lakes Coniferous East | 0.001 | 0.074 | 1.000 | 0.220
5s Central Lakes Coniferous South | 0.001 | 0.051 | 1.000 | 0.184
5c Central Lakes Coniferous Central | 0.000 | 0.333 | 1.000 | 0.292
5n Central Lakes Coniferous North | 0.000 | 0.224 | 1.000 | 0.247
5w Central Lakes Coniferous West | 0.000 | 0.242 | 1.000 | 0.285
6n Red River Valley North | 0.000 | 0.222 | 0.996 | 0.245
6s Red River Valley South | 0.000 | 0.396 | 0.999 | 0.286
7e Northern Bog East | 0.153 | 0.352 | 0.997 | 0.231
7w Northern Bog West | 0.177 | 0.371 | 0.998 | 0.237
8 Big Lake | 0.000 | 0.113 | 1.000 | 0.239

 

Comparing model probabilities with data from random surveys, it would appear that the models overestimate site potential. Three random surveys were conducted for this project in the summer of 1996. The Wright County survey found sites at seven percent of the locations surveyed, whereas the model for Central Lakes Deciduous South estimates a mean probability of 0.216, or 22 percent. The Cass County survey had a two percent success rate, whereas the model for Central Lakes Coniferous Central predicts 0.167, or 17 percent. The Wabasha County survey had the greatest success, with sites at 22 percent of the locations surveyed. However, the models for the Southeast Riverine Region predict 0.480 (48 percent) and 0.812 (81 percent). Aside from the possible effect of survey bias, this apparent discrepancy can be at least partially explained by unmet potential. In other words, there are more suitable habitats for sites than there are sites. The population density of Minnesota was quite low in the precontact period, so not all suitable locations were occupied. Consequently, not all places that are equally well-suited for sites contain archaeological properties.

 


 

8.6.3 Phase 3 Results – Statewide Models

This section summarizes the Phase 3 models on a statewide basis. Except for the last model discussed (Section 8.6.3.4), all of the models discussed here are simply composites of the regional models. Section 8.6.4 provides detailed evaluations of the models for each Phase 3 modeling region.

 

8.6.3.1 Site Probability Model

The site probability models developed in Phase 3 are the counterparts of models developed in Phases 1 and 2. They predict the potential for finding precontact archaeological resources across the state. Figure 8.5 provides a composite map of these 20 models, showing the statewide pattern of site potential from the best Phase 3 models.

 

The average model predicts 86.8 percent of modeled sites in 25 percent of the region's land area. The average gain statistic is 0.68, and the average Kappa (stability value) is 0.54 (Table 8.6.11). These values reflect averages from 20 regions of different sizes, however. When the composite model is evaluated, 86 percent of all modeled sites in the state, excluding single artifacts, are predicted in high and medium probability zones that constitute only 22.82 percent of the state's area. This produces an overall gain statistic of 0.73. This composite model well exceeds the project's goals of predicting 85 percent of known sites in 33 percent of the land area (for a gain statistic of 0.61). However, because composites of the preliminary models were not developed, no Kappa statistic was calculated statewide. The composite model performed well in 2001, when it was tested with 977 sites that were not available when the models were developed. The statewide model predicted 76.6 percent of the new sites, producing a gain statistic of 0.72 for the test population. For the combined training and testing populations, the model predicted 85.5 percent of known sites and produced a gain statistic of 0.73.
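The gain values quoted above are consistent with gain being one minus the ratio of the percentage of land area in the high and medium probability zones to the percentage of sites predicted there. The following is a minimal sketch of that arithmetic, not the project's own software; the function name `gain` is an assumption made for illustration, and the inputs are the percentages reported in the text and in Table 8.6.11.

```python
def gain(pct_area: float, pct_sites: float) -> float:
    """Gain = 1 - (percent of area in high/medium zones / percent of sites predicted)."""
    if pct_sites <= 0:
        raise ValueError("pct_sites must be positive")
    return 1.0 - pct_area / pct_sites


# Project goal: 85 percent of sites in 33 percent of the land area.
project_goal = gain(33.0, 85.0)

# Statewide composite model: 86.00 percent of modeled sites in 22.82 percent of the area.
statewide_modeled = gain(22.82, 86.00)

# Combined training and testing populations: 85.56 percent of sites in the same area.
statewide_combined = gain(22.82, 85.56)

print(f"goal: {project_goal:.2f}")
print(f"modeled sites: {statewide_modeled:.5f}")
print(f"all sites: {statewide_combined:.5f}")
```

A gain near zero means the model does no better than chance, while values approaching one mean that most sites fall within a small share of the landscape.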

 

To evaluate the degree of confidence in these models, one must consider a suite of factors (Table 8.6.11). The Gain Statistic may be the least reliable of these. A reliable model should both predict an adequate number of test sites (85 percent) in a relatively small area (33 percent of the landscape) and be at least somewhat stable (Kappa > 0.5). Models for only four of the 20 regions meet these criteria:


However, two of these regions, Agassiz Lowlands and Aspen Parklands, have very few surveyed places. In these cases, the site models may not adequately reflect the distribution of archaeological resources within the subsection.

 

There is no discernable pattern between the number of sites available for modeling and the overall quality of a model, as measured here. However, any statistical analysis should be improved by a larger sample size. Marginal results achieved by regions with the highest site numbers (Minnesota River Prairie, Big Woods) may have more to do with the nature of the data than with its quantity. Site location errors may be part of the problem, with enough sites erroneously located in unlikely places that it confuses the detectable pattern. For the same reason, model tests may be inaccurate, as site location errors are known to be present in the test data as well. Another possibility is that site function or temporal use may be confusing the analysis. With large samples, the likelihood of having a sizeable number of sites that are different in some way increases. An analysis of the characteristics of sites not predicted by the models may shed some light on this question.

 

Table 8.6.11. Site Probability Model Performance for All 20 Regions Modeled in Phase 3.

Site Probability Models

| Modeled Region | Number of Sites | Site Frequency¹ | % Area High/Medium | % Sites Predicted | Gain | Kappa | Number of Test Sites | % Test Sites Predicted | Gain (Test Sites) | % All Sites Predicted | Gain (All Sites) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Agassiz Lowlands | 53 | 0.00295 | 19.49 | 86.79 | 0.77543 | 0.55809 | 7 | 85.71 | 0.77258 | 86.67 | 0.77512 |
| Anoka Sand Plain | 337 | 0.06599 | 24.56 | 83.98 | 0.70755 | 0.49125 | 38 | 81.58 | 0.69891 | 83.73 | 0.70669 |
| Aspen Parklands | 59 | 0.00561 | 22.84 | 83.03 | 0.72492 | 0.68100 | 28 | 96.43 | 0.76316 | 87.36 | 0.73854 |
| Big Woods | 637 | 0.07934 | 33.93 | 86.18 | 0.60629 | 0.47874 | 69 | 76.81 | 0.55826 | 85.27 | 0.60209 |
| Blufflands | 554 | 0.11336 | 34.35 | 87.01 | 0.60522 | 0.52492 | 51 | 54.90 | 0.37432 | 84.30 | 0.59253 |
| Border Lakes | 960 | 0.10278 | 18.88 | 88.54 | 0.78676 | 0.67215 | 154 | 76.62 | 0.75358 | 86.89 | 0.78271 |
| Chippewa Plains | 513 | 0.06081 | 26.79 | 85.96 | 0.68834 | 0.62861 | 158 | 78.48 | 0.65864 | 84.20 | 0.68183 |
| Coteau Moraines / Inner Coteau | 350 | 0.03152 | 34.76 | 86.00 | 0.59577 | 0.49626 | 41 | 70.73 | 0.50855 | 84.40 | 0.58815 |
| Glacial Lake Superior Plain / North Shore Highlands / Nashwauk Uplands | 86 | 0.00825 | 22.65 | 84.9 | 0.73322 | 0.31592 | 26 | 80.76 | 0.71984 | 83.94 | 0.73016 |
| Hardwood Hills | 470 | 0.02394 | 19.2 | 85.53 | 0.77552 | 0.61453 | 54 | 81.48 | 0.76436 | 85.11 | 0.77441 |
| Laurentian Highlands | 120 | 0.06315 | 9.94 | 95 | 0.89537 | 0.15717 | 25 | 84 | 0.86921 | 93.10 | 0.89323 |
| Littlefork-Vermilion Uplands | 25 | 0.00438 | 18.32 | 92 | 0.80087 | 0.35445 | 22 | 63.63 | 0.71208 | 78.72 | 0.76728 |
| Mille Lacs Uplands | 437 | 0.02786 | 14.37 | 86.72 | 0.83429 | 0.57116 | 63 | 76.19 | 0.81139 | 85.40 | 0.8317 |
| Minnesota River Prairie | 969 | 0.03088 | 19.84 | 82.98 | 0.76091 | 0.58827 | 57 | 80.70 | 0.75421 | 82.85 | 0.76052 |
| Oak Savanna | 121 | 0.01761 | 20.55 | 91.73 | 0.77597 | 0.51064 | 12 | 50.00 | 0.58900 | 87.97 | 0.76640 |
| Pine Moraines & Outwash Plains | 474 | 0.03261 | 19.21 | 86.07 | 0.77681 | 0.78248 | 64 | 67.19 | 0.71409 | 83.83 | 0.77085 |
| Red River Prairie | 270 | 0.01469 | 29.9 | 84.82 | 0.64741 | 0.47199 | 58 | 87.93 | 0.65996 | 85.33 | 0.64959 |
| Rochester Plateau | 81 | 0.01524 | 51.47 | 85.18 | 0.39575 | 0.44729 | 1 | 100 | 0.48523 | 85.37 | 0.39709 |
| St. Croix Moraines and Outwash Plains (Twin Cities Highlands) | 126 | 0.05112 | 48.68 | 86.51 | 0.43729 | 0.99927 | 12 | 100 | 0.51318 | 87.68 | 0.44480 |
| St. Louis Moraines / Tamarack Lowlands | 186 | 0.0165 | 9.76 | 87.64 | 0.88864 | 0.50751 | 33 | 66.67 | 0.85361 | 84.47 | 0.88445 |
| AVERAGE BY REGION | 341.45 | 0.03843 | 24.97 | 86.83 | 0.67678 | 0.542585 | 48.65 | 77.99 | 0.67733 | 85.33 | 0.70691 |
| STATEWIDE EVALUATION | 6828 | 0.03124 | 22.82 | 86.00 | 0.73465 | NA | 977 | 76.66 | 0.72304 | 85.56 | 0.73329 |

  1 Site frequency is the number of sites per square kilometer within a region.

 

 

Improvement over Phase 2

Because different regionalization schemes were used, Phase 2 and 3 models cannot be compared on a regional basis. This discussion is based on the average values for the models in each phase. Phase 3 models performed better than Phase 2 models by every measure except gain (Table 8.6.12). The emphasis in model reclassification in Phase 2 was to maximize gain (Section 7.5.1.2.4), while the emphasis in Phase 3 was to predict as close to 85 percent of known sites as possible. With this methodological change, it was expected that Phase 3 models would classify more land area as high/medium potential. Despite this, Phase 3 models reduced the area slightly while increasing the percentage of sites predicted significantly. At the same time, Phase 3 models did not, on average, reduce the gain statistics.

 

Improvements in model performance can be attributed to much larger populations of known sites for deriving models (Sections 7.3.5.2 and 7.5.1.3), refinements in the classification procedures (Section 7.5.1.3), and reduced environmental variability within regions as a result of using a different regionalization scheme (Section 4.6.3).

 

Phase 3 models (Figure 8.5) did not completely eliminate the edge effect between regions observed in the Phase 2 models (Section 4.6.2 and Figure 8.4). However, this effect is absent between some regions and inconspicuous between others, particularly when region boundaries follow hydrologic features. There seems to be a relationship to site numbers, as the effect is most conspicuous where regions with very low site numbers abut regions with higher site numbers (e.g. Aspen Parklands and Red River Prairie).

 

Table 8.6.12. Comparison of Phase 1, Phase 2 and Phase 3 Site Probability Models. Values are averages of those for the separate regional model evaluations based on combined training and test data.

| Model statistic | Phase 1 | Phase 2 | Phase 3 |
|---|---|---|---|
| Percent area in high/medium probability class | 55.5 | 24 | 23 |
| Percent sites predicted | 84 | 77 | 85 |
| Gain | 0.37 | 0.68 | 0.71 |

 

 

Contributions of Individual Variables

Only 44 variables were used to build models in Phase 3, a considerable reduction from Phase 2 (Section 8.6.2.2). It should be noted again that, in Phase 3, all horizontal distance and size variables were transformed to square roots for modeling and should be compared to the results of the square root equivalents in Phase 2. Likewise, direction was converted to sine.
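The two transformations mentioned above can be illustrated with a short sketch. It is not the project's own processing code; the function names are hypothetical, and the units (meters for distance, degrees for direction) are assumptions made only so that the example runs.

```python
import math


def sqrt_transform(value: float) -> float:
    """Square-root transform applied to horizontal distance and size variables."""
    if value < 0:
        raise ValueError("distance and size values must be non-negative")
    return math.sqrt(value)


def sine_transform(direction_degrees: float) -> float:
    """Sine transform applied to a direction (assumed here to be in degrees)."""
    return math.sin(math.radians(direction_degrees))


# Hypothetical cell: 400 m from the nearest lake, water lying at a bearing of 90 degrees.
print(sqrt_transform(400.0))   # distance value as it would enter the model
print(sine_transform(90.0))    # direction value as it would enter the model
```

The square root compresses large distances, so a given difference in distance matters more close to a feature than far away from it.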

 

Of the 44 Phase 3 variables, only one (distance to bedrock used for tools) failed to contribute to any site probability model (Table 8.6.13). Clearly, these variables were a more effective set than that used in Phase 2, when 13 variables (11 percent) failed to contribute to any model. All remaining Phase 3 variables had a maximum probability of 100 in at least one model. On the average, each variable figured into six models. Prevailing orientation, relative elevation, and size of nearest permanent lake each contributed to fewer than three models. Distance to edge of nearest large lake, distance to edge of nearest perennial river or stream, distance to nearest lake, wetland, organic soil, or stream, and height above surroundings each contributed to more than ten models. These four variables also had the highest cumulative probabilities.

 

Cumulative probabilities are perhaps the best measure of the contribution of the variables to the overall modeling effort. The average cumulative probability for the Phase 3 model variables is 561. The variables with above average cumulative probabilities are evenly spread between three aspects of the environment:

 

These results stress the importance of several components of the environment that are significant to hunter/gatherers:

 

Table 8.6.13. Performance of All Phase 3 Variables in the Best Site Probability Models.

Number of models in which each variable occurred with probability greater than zero, maximum probability recorded, mean probability recorded, and cumulative probability.

| Variable | # of Models | Max Prob | Mean Prob | Cumulative Prob |
|---|---|---|---|---|
| Direction to nearest water or wetland | 8 | 100 | 95.9 | 767.4 |
| Distance to aspen-birch | 4 | 100 | 92.9 | 371.4 |
| Distance to bedrock used for tools | 0 | 0 | 0.0 | 0.0 |
| Distance to Big Woods | 4 | 100 | 98.0 | 391.8 |
| Distance to brushlands | 5 | 100 | 85.3 | 426.3 |
| Distance to conifers | 6 | 100 | 79.0 | 474.1 |
| Distance to edge of nearest large wetland | 6 | 100 | 91.9 | 551.6 |
| Distance to edge of nearest area of organic soils | 4 | 100 | 97.6 | 390.3 |
| Distance to edge of nearest large lake | 13 | 100 | 97.1 | 1262.4 |
| Distance to edge of nearest perennial river or stream | 13 | 100 | 98.6 | 1281.9 |
| Distance to edge of nearest swamp | 8 | 100 | 100.0 | 800.0 |
| Distance to glacial lake sediment | 4 | 100 | 96.7 | 386.7 |
| Distance to hardwoods | 7 | 100 | 93.1 | 651.4 |
| Distance to mixed hardwoods and pine | 4 | 100 | 98.0 | 391.8 |
| Distance to nearest confluence between perennial or intermittent streams and large rivers | 4 | 100 | 87.4 | 349.7 |
| Distance to nearest intermittent stream | 5 | 100 | 89.0 | 445.2 |
| Distance to nearest lake inlet/outlet | 6 | 100 | 93.0 | 557.8 |
| Distance to nearest lake, wetland, organic soil, or stream | 14 | 100 | 98.0 | 1371.9 |
| Distance to nearest major ridge or divide | 4 | 100 | 100.0 | 400.0 |
| Distance to nearest minor ridge or divide | 6 | 100 | 97.8 | 586.6 |
| Distance to nearest permanent lake inlet/outlet | 4 | 100 | 85.4 | 341.6 |
| Distance to nearest permanent wetland inlet/outlet | 4 | 100 | 91.0 | 363.8 |
| Distance to oak woodland | 6 | 100 | 93.3 | 559.8 |
| Distance to paper birch | 3 | 100 | 99.9 | 299.7 |
| Distance to pine barrens or flats | 3 | 100 | 93.2 | 279.7 |
| Distance to prairie | 9 | 100 | 89.7 | 807.3 |
| Distance to river bottom forest | 3 | 100 | 88.5 | 265.6 |
| Distance to sugar maple | 7 | 100 | 89.2 | 624.1 |
| Distance to well-drained soils | 3 | 100 | 100.0 | 300.0 |
| Elevation | 10 | 100 | 98.9 | 989.0 |
| Height above surroundings | 17 | 100 | 98.4 | 1673.6 |
| On alluvium | 3 | 100 | 74.7 | 224.1 |
| On river terraces | 4 | 100 | 93.9 | 375.6 |
| Prevailing orientation | 1 | 100 | 100.0 | 100.0 |
| Relative elevation | 2 | 100 | 88.0 | 176.0 |
| Size of major watershed | 6 | 100 | 86.7 | 520.1 |
| Size of minor watershed | 4 | 100 | 66.8 | 267.0 |
| Size of nearest lake | 5 | 100 | 90.9 | 454.6 |
| Size of nearest permanent lake | 2 | 100 | 99.1 | 198.2 |
| Slope | 6 | 100 | 92.4 | 554.2 |
| Surface roughness | 8 | 100 | 86.5 | 692.3 |
| Vegetation diversity within 1 km | 10 | 100 | 89.7 | 897.1 |
| Vertical distance to permanent water | 9 | 100 | 100.0 | 900.0 |
| Vertical distance to water | 4 | 100 | 97.4 | 389.7 |
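The columns of Table 8.6.13 appear to be related in a simple way: a variable's mean probability is its cumulative probability divided by the number of models in which it occurred. That relationship is inferred from the table values rather than stated in the text, so the sketch below should be read as a consistency check under that assumption. The helper name `mean_from_cumulative` is hypothetical.

```python
# (variable, number of models, mean probability, cumulative probability)
# Rows copied from Table 8.6.13.
ROWS = [
    ("Height above surroundings", 17, 98.4, 1673.6),
    ("Elevation", 10, 98.9, 989.0),
    ("Distance to prairie", 9, 89.7, 807.3),
    ("Direction to nearest water or wetland", 8, 95.9, 767.4),
    ("Distance to bedrock used for tools", 0, 0.0, 0.0),
]


def mean_from_cumulative(n_models: int, cumulative: float) -> float:
    """Mean probability implied by a cumulative probability over n_models models."""
    return cumulative / n_models if n_models else 0.0


for name, n_models, reported_mean, cumulative in ROWS:
    implied = mean_from_cumulative(n_models, cumulative)
    print(f"{name}: reported {reported_mean:.1f}, implied {implied:.3f}")
```

Under this reading, the cumulative probability rewards variables that are both used in many models and weighted heavily within them.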

 

 

8.6.3.2 Survey Probability Model

The survey probability models developed in Mn/Model Phase 3 have no precedent. Their development was prompted by the realization that past surveys in Minnesota have been strongly biased in favor of locations near water and that known sites may be absent from other locations simply because no surveys have been conducted there. The survey probability models were developed as a CRM tool to identify the kinds of landscapes that have not been adequately surveyed in the past. Survey potential (Figure 8.6) should be interpreted as the probability for each cell that its environment is similar to places where surveys have occurred. The gain statistic for these models can be seen as a measure of the degree of survey bias, with more biased survey patterns producing higher gain statistics.

 

The average survey probability model predicts 85 percent of surveyed places in 43 percent of the region's land area, producing a gain statistic of 0.562463 and a Kappa coefficient of stability of 0.56. This implies that, overall, the models perform significantly better than by chance, leading to the conclusion that surveys in the state exhibit a significant amount of locational bias. This bias is apparent in the composite map of the regional models (Figure 8.6).

 

Since the majority of past surveys have occurred near water, the majority of the cells near water have been categorized as high survey potential (Figure 8.6). This is particularly apparent in the four regions with the greatest survey bias, as evidenced by strong to very strong gain statistics (0.7 or greater, Table 8.6.14). These models occur in:

Border Lakes
Laurentian Highlands
Littlefork-Vermilion Uplands
St. Louis Moraines/Tamarack Lowlands

 

All of these, except the Laurentian Highlands, also have strong Kappa values (>0.5), another indicator of the strength of the survey bias. Bias is not a function of the number of places surveyed, though Littlefork-Vermilion Uplands has had the second lowest number of surveys recorded. Border Lakes has more than ten times as many sites as this region and also shows a very high degree of survey bias, probably because of the kinds of places that are accessible in that region. Whether these regions have lower or higher than average numbers of surveys, their surveys have been confined to a limited set of environmental situations, primarily near lakes and rivers. These regions will require additional surveys in the low survey potential zone to provide a more balanced picture of the distribution of their archaeological resources.

 

Table 8.6.14. Survey Probability Model Performance for All 20 Regions Modeled in Phase 3.

Survey Probability Models

| Modeled Region | Number of Surveyed Places | Site Frequency¹ | % Area High/Medium | % Surveyed Places Predicted | Gain | Kappa |
|---|---|---|---|---|---|---|
| Agassiz Lowlands | 195 | 0.01087 | 36.38 | 87.17 | 0.58265 | 0.63303 |
| Anoka Sand Plain | 1016 | 0.19894 | 54.04 | 84.95 | 0.36386 | 0.55217 |
| Aspen Parklands | 722 | 0.06863 | 57.68 | 93.07 | 0.38025 | 0.59768 |
| Big Woods | 1993 | 0.24824 | 53.32 | 84.7 | 0.37048 | 0.42355 |
| Blufflands | 1363 | 0.27889 | 38.65 | 85.03 | 0.54545 | 0.64309 |
| Border Lakes | 2230 | 0.23876 | 18.89 | 82.73 | 0.77167 | 0.72217 |
| Chippewa Plains | 1685 | 0.19973 | 40.19 | 84.28 | 0.52314 | 0.52964 |
| Coteau Moraines / Inner Coteau | 1002 | 0.09024 | 54.62 | 84.03 | 0.34994 | 0.58797 |
| Glacial Lake Superior Plain / North Shore Highlands / Nashwauk Uplands | 1100 | 0.10435 | 44 | 83.9 | 0.47557 | 0.63578 |
| Hardwood Hills | 1509 | 0.07685 | 57.66 | 86.42 | 0.33279 | 0.31642 |
| Laurentian Highlands | 746 | 0.3926 | 14.90 | 84.58 | 0.82384 | 0.40028 |
| Littlefork-Vermilion Uplands | 207 | 0.0365 | 20.63 | 88.88 | 0.76788 | 0.70685 |
| Mille Lacs Uplands | 1543 | 0.09835 | 33.53 | 84.58 | 0.60357 | 0.62430 |
| Minnesota River Prairie | 2667 | 0.08498 | 44.63 | 83.24 | 0.46384 | 0.65697 |
| Oak Savanna | 733 | 0.1067 | 39.93 | 87.45 | 0.5434 | 0.49516 |
| Pine Moraines & Outwash Plains | 1593 | 0.1096 | 43.21 | 83.99 | 0.48553 | 0.59663 |
| Red River Prairie | 1051 | 0.05716 | 44.94 | 84.87 | 0.47048 | 0.53891 |
| Rochester Plateau | 615 | 0.11574 | 72.77 | 84.72 | 0.14105 | 0.56311 |
| St. Croix Moraines and Outwash Plains (Twin Cities Highlands) | 641 | 0.26005 | 61.78 | 87.05 | 0.29029 | 0.44529 |
| St. Louis Moraines / Tamarack Lowlands | 870 | 0.07718 | 24.37 | 82.18 | 0.70846 | 0.54660 |
| AVERAGE BY REGION | 1174.05 | 0.142718 | 42.81 | 85.39 | 0.562463 | 0.56078 |
| STATEWIDE EVALUATION | 23443 | - | 43.29 | 84.67 | 0.488721 | - |

 

1 Site frequency is the number of surveyed places per square kilometer within a region.

 

 

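Eight regions represent the least biased surveys, evidenced by low gain statistics (<0.5) and higher Kappa coefficients (>0.5):

Anoka Sand Plain
Aspen Parklands
Coteau Moraines/Inner Coteau
Glacial Lake Superior Plain/North Shore Highlands/Nashwauk Uplands
Minnesota River Prairie
Pine Moraines & Outwash Plains
Red River Prairie
Rochester Plateau

Even in these regions, however, the proclivity to survey environments near water is evident in the model (Figure 8.6).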
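The two groupings discussed in this section (strongly biased surveys and least biased surveys) follow directly from the gain and Kappa columns of Table 8.6.14. The sketch below applies the thresholds given in the text to those columns; the function names and the data layout are assumptions made for illustration.

```python
# (region, gain, kappa) copied from Table 8.6.14.
SURVEY_MODELS = [
    ("Agassiz Lowlands", 0.58265, 0.63303),
    ("Anoka Sand Plain", 0.36386, 0.55217),
    ("Aspen Parklands", 0.38025, 0.59768),
    ("Big Woods", 0.37048, 0.42355),
    ("Blufflands", 0.54545, 0.64309),
    ("Border Lakes", 0.77167, 0.72217),
    ("Chippewa Plains", 0.52314, 0.52964),
    ("Coteau Moraines/Inner Coteau", 0.34994, 0.58797),
    ("Glacial Lake Superior Plain/North Shore Highlands/Nashwauk Uplands", 0.47557, 0.63578),
    ("Hardwood Hills", 0.33279, 0.31642),
    ("Laurentian Highlands", 0.82384, 0.40028),
    ("Littlefork-Vermilion Uplands", 0.76788, 0.70685),
    ("Mille Lacs Uplands", 0.60357, 0.62430),
    ("Minnesota River Prairie", 0.46384, 0.65697),
    ("Oak Savanna", 0.5434, 0.49516),
    ("Pine Moraines & Outwash Plains", 0.48553, 0.59663),
    ("Red River Prairie", 0.47048, 0.53891),
    ("Rochester Plateau", 0.14105, 0.56311),
    ("St. Croix Moraines and Outwash Plains", 0.29029, 0.44529),
    ("St. Louis Moraines/Tamarack Lowlands", 0.70846, 0.54660),
]


def strongly_biased(rows, gain_min=0.7):
    """Regions whose survey model gain is 0.7 or greater."""
    return [name for name, gain, _ in rows if gain >= gain_min]


def least_biased(rows, gain_max=0.5, kappa_min=0.5):
    """Regions with gain below 0.5 and Kappa above 0.5."""
    return [name for name, gain, kappa in rows if gain < gain_max and kappa > kappa_min]


print("Strong survey bias:", strongly_biased(SURVEY_MODELS))
print("Least biased:", least_biased(SURVEY_MODELS))
```

Regions that fall into neither list either have intermediate gain or a Kappa value too low for the gain to be considered stable.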

 

Contributions of Individual Variables

All 44 Phase 3 variables contributed to the survey probability models (Table 8.6.15). Moreover, each variable contributed, on average, to eight survey probability models compared to an average of six site probability models per variable. Relative elevation, distance to bedrock used for tools, prevailing orientation, size of minor watershed, and vertical distance to water each contributed to fewer than four models. Distance to edge of nearest large lake, height above surroundings, and distance to nearest lake, wetland, organic soil, or stream each contributed to more than 12 models and had the highest cumulative probabilities.

 

The higher number of models per variable may be attributable to the larger numbers of variables in the survey probability models. Since the number of variables in an individual model seems to be a function of the number of data points, and there are far more surveyed places than known sites, this result should be expected.

 

Only two variables failed to achieve a maximum probability of 100 in any model. These were relative elevation and distance to bedrock used for tools. Their maximum probabilities were 76.8 and 88.6 respectively. The average cumulative probability for variables in the survey probability models is 747. This high value is also a function of having more variables in each model. The variables with higher than average cumulative probabilities include:

 

These variables are more heavily weighted towards water features than are those for the site probability models. Presumably, these variables represent Minnesota archaeologists' mental models of where sites are likely to be found. These models appear to emphasize:

 

Table 8.6.15. Performance of all Phase 3 Variables in the Best Survey Probability Models.

Number of models where each variable had probability greater than zero, the maximum probability recorded, mean probability, and cumulative probability.

| Variable | # of Models | Max Prob | Mean Prob | Cumulative Prob |
|---|---|---|---|---|
| Direction to nearest water or wetland | 11 | 100 | 99.9 | 1098.6 |
| Distance to aspen-birch | 10 | 100 | 100.0 | 1000.0 |
| Distance to bedrock used for tools | 2 | 88.6 | 84.8 | 169.6 |
| Distance to Big Woods | 6 | 100 | 97.0 | 581.8 |
| Distance to brushlands | 4 | 100 | 88.1 | 352.4 |
| Distance to conifers | 7 | 100 | 82.1 | 574.5 |
| Distance to edge of nearest large wetland | 8 | 100 | 90.7 | 725.8 |
| Distance to edge of nearest area of organic soils | 11 | 100 | 98.4 | 1082.9 |
| Distance to edge of nearest large lake | 14 | 100 | 97.5 | 1365.6 |
| Distance to edge of nearest perennial river or stream | 12 | 100 | 86.7 | 1039.8 |
| Distance to edge of nearest swamp | 6 | 100 | 100.0 | 600.0 |
| Distance to glacial lake sediment | 11 | 100 | 99.9 | 1098.4 |
| Distance to hardwoods | 12 | 100 | 94.2 | 1130.3 |
| Distance to mixed hardwoods and pine | 6 | 100 | 98.8 | 592.7 |
| Distance to nearest confluence between perennial or intermittent streams and large rivers | 10 | 100 | 99.6 | 995.8 |
| Distance to nearest intermittent stream | 9 | 100 | 78.3 | 704.7 |
| Distance to nearest lake inlet/outlet | 10 | 100 | 99.5 | 994.6 |
| Distance to nearest lake, wetland, organic soil, or stream | 17 | 100 | 95.7 | 1626.2 |
| Distance to nearest major ridge or divide | 10 | 100 | 98.5 | 985.2 |
| Distance to nearest minor ridge or divide | 10 | 100 | 80.1 | 800.7 |
| Distance to nearest permanent lake inlet/outlet | 7 | 100 | 99.8 | 698.5 |
| Distance to nearest permanent wetland inlet/outlet | 10 | 100 | 98.2 | 982.4 |
| Distance to oak woodland | 9 | 100 | 100.0 | 900.0 |
| Distance to paper birch | 4 | 100 | 94.9 | 379.5 |
| Distance to pine barrens or flats | 8 | 100 | 94.2 | 753.5 |
| Distance to prairie | 6 | 100 | 93.6 | 561.8 |
| Distance to river bottom forest | 12 | 100 | 98.2 | 1178.2 |
| Distance to sugar maple | 5 | 100 | 98.2 | 491.0 |
| Distance to well-drained soils | 5 | 100 | 90.3 | 451.7 |
| Elevation | 9 | 100 | 97.6 | 878.3 |
| Height above surroundings | 14 | 100 | 98.2 | 1375.0 |
| On alluvium | 4 | 100 | 89.1 | 356.5 |
| On river terraces | 8 | 100 | 89.7 | 717.7 |
| Prevailing orientation | 3 | 100 | 89.6 | 268.9 |
| Relative elevation | 2 | 76.8 | 71.2 | 142.4 |
| Size of major watershed | 12 | 100 | 99.3 | 1191.8 |
| Size of minor watershed | 3 | 100 | 99.6 | 298.9 |
| Size of nearest lake | 5 | 100 | 75.2 | 375.8 |
| Size of nearest permanent lake | 8 | 100 | 95.3 | 762.7 |
| Slope | 4 | 100 | 98.7 | 394.8 |
| Surface roughness | 4 | 100 | 100.0 | 400.0 |
| Vegetation diversity within 1 km | 9 | 100 | 92.1 | 829.2 |
| Vertical distance to permanent water | 7 | 100 | 99.9 | 699.5 |
| Vertical distance to water | 3 | 100 | 80.8 | 242.4 |

 

 

8.6.3.3 Survey Implementation Model

The survey implementation model is an overlay and reclassification of the site probability and survey probability models (Section 7.5.1.3 and Table 7.9). Its primary feature is that it shows areas that have both a low site potential and a low survey potential as being unknown. In this zone, it is likely that site potential is low primarily because these environmental settings have not been adequately surveyed. This zone occupies 49.54 percent of the state's land area and contains five percent of modeled sites and 13 percent of single artifacts (Figure 8.7 and Table 8.6.16). The deficiency of surveys in this region is further highlighted by the low proportions of negative survey points (13 percent) found here. Nearly fifteen percent of test sites were found in the unknown zone, compared to eight percent in the low and possibly low probability zones. This may indicate a somewhat greater emphasis for surveying the unknown areas, as suggested by the Mn/Model implementation plan (Chapter 11).

 

In regions where the survey probability models are unstable (Table 8.6.14), the unknown area may not be well-identified and may in reality include portions of areas classified as low and possibly low potential. Likewise, unstable site probability models (Table 8.6.11) may have similar effects. Consequently, Kappa coefficients for the site and survey probability models for each region should be considered when using the implementation models to guide future surveys.

 

The low site potential areas that coincide with medium and high survey potential are assigned site potential values possibly low and low respectively. These are depicted on the map in two shades of yellow. These zones occupy 24 percent of the state's area and contain seven percent of the modeled sites and 15 percent of the single artifacts. However, 34.5 percent of negative survey points are in these zones.

 

The patterns for the medium and high site potential zones follow those on the site probability models, with the only difference being that these zones are weighted by survey potential. Sites in the medium probability zones are more likely to be found in areas that have higher survey potential (10 percent of modeled sites in 6.5 percent of the state's land area) than in areas with lower potential for surveys (2 percent of modeled sites in 2.5 percent of the state's land area). For comparative purposes, this can be reduced to 1.54 vs. 0.8 percent of sites for each one percent of land area. In the high site potential zones, this discrepancy is even greater. By far the largest group of modeled sites (65 percent) is found in the high site potential zone that also has high survey potential (eight percent of the state's area). Only 7.5 percent of sites in the high site potential zone are found in low and medium survey potential locations, which together occupy 2.25 percent of the state's area. For comparison, consider this to be 8.13 vs. 3.33 percent of sites for each one percent of land area. It is apparent that the models are rather conservative about extending the high probability zone much beyond the well-surveyed parts of the landscape.
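The "percent of sites per one percent of land area" figures above are simple ratios of the rounded percentages quoted in the paragraph. A minimal sketch of that calculation follows; the function name `site_density_ratio` is hypothetical.

```python
def site_density_ratio(pct_sites: float, pct_area: float) -> float:
    """Percent of modeled sites per one percent of the state's land area."""
    if pct_area <= 0:
        raise ValueError("pct_area must be positive")
    return pct_sites / pct_area


# Medium site potential: higher vs. lower survey potential.
medium_high_survey = site_density_ratio(10.0, 6.5)
medium_low_survey = site_density_ratio(2.0, 2.5)

# High site potential: high survey potential vs. low and medium survey potential.
high_high_survey = site_density_ratio(65.0, 8.0)
high_other_survey = site_density_ratio(7.5, 2.25)

for label, value in [
    ("medium zone, higher survey potential", medium_high_survey),
    ("medium zone, lower survey potential", medium_low_survey),
    ("high zone, high survey potential", high_high_survey),
    ("high zone, low/medium survey potential", high_other_survey),
]:
    print(f"{label}: {value:.3f}")
```

A ratio of 1.0 would mean sites occur in proportion to area, so values well above 1.0 indicate zones where sites are concentrated.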

 

The majority of sites (77.34 percent) are found in areas where survey potential was rated as high. These are the low, medium, and high site potential classes in the survey implementation model. Another 10.75 percent of sites occur where survey potential is medium (possibly low, medium, and high site potential classes) and 4.92 percent occur where survey potential is low, but where site potential is medium or high (suspected medium and high site potential classes).
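The three totals in this paragraph can be reproduced by grouping the modeled-site percentages in Table 8.6.16 by survey potential. The grouping below follows the class names given in the text; the dictionary layout is an assumption made for illustration.

```python
# Percent of modeled sites in each site potential class (Table 8.6.16),
# grouped by the survey potential that each class name implies.
MODELED_SITE_PCT = {
    "high survey potential": {"Low": 3.63, "Medium": 8.87, "High": 64.84},
    "medium survey potential": {
        "Possibly Low": 3.19,
        "Possibly Medium": 2.60,
        "Possibly High": 4.96,
    },
    "low survey potential": {"Suspected Medium": 2.28, "Suspected High": 2.64},
}

# Sum the class percentages within each survey potential group.
totals = {
    group: round(sum(classes.values()), 2)
    for group, classes in MODELED_SITE_PCT.items()
}

for group, total in totals.items():
    print(f"{group}: {total:.2f} percent of modeled sites")
```

The remainder of the modeled sites fall in the unknown zone or in the water, steep slope, and mine classes.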

 

Table 8.6.16. Evaluation of Survey Implementation Model for All 20 Regions Modeled in Phase 3.

| Site Potential | Region (30 meter cells) | % | Random Points | % | Negative Survey Points | % | Single Artifacts | % | Modeled Sites | % |
|---|---|---|---|---|---|---|---|---|---|---|
| Unknown | 120,328,861 | 49.54 | 23,020 | 49.20 | 2335 | 14.67 | 92 | 12.76 | 370 | 5.43 |
| Possibly Low | 31,063,043 | 12.79 | 6037 | 12.90 | 1771 | 11.13 | 36 | 4.99 | 217 | 3.19 |
| Low | 27,046,150 | 11.14 | 5164 | 11.04 | 3567 | 22.41 | 75 | 10.40 | 247 | 3.63 |
| Suspected Medium | 6,206,642 | 2.56 | 1230 | 2.63 | 142 | 0.89 | 23 | 3.19 | 155 | 2.28 |
| Possibly Medium | 7,935,094 | 3.27 | 1484 | 3.17 | 385 | 2.42 | 23 | 3.19 | 177 | 2.60 |
| Medium | 15,917,082 | 6.55 | 2909 | 6.22 | 2280 | 14.33 | 74 | 10.26 | 604 | 8.87 |
| Suspected High | 2,208,561 | 0.91 | 478 | 1.02 | 71 | 0.45 | 16 | 2.22 | 180 | 2.64 |
| Possibly High | 3,326,216 | 1.34 | 6037 | 12.90 | 204 | 1.28 | 32 | 4.44 | 338 | 4.96 |
| High | 19,917,051 | 8.20 | 3859 | 8.25 | 5057 | 31.78 | 348 | 48.27 | 4414 | 64.84 |
| Water | 7,722,839 | 3.18 | 1647 | 3.52 | 39 | 0.25 | 0 | 0.00 | 61 | 0.90 |
| Steep Slopes | 850,789 | 0.35 | 195 | 0.42 | 49 | 0.31 | 2 | 0.28 | 44 | 0.65 |
| Mines | 425,116 | 0.18 | 98 | 0.21 | 14 | 0.09 | 0 | 0.00 | 1 | 0.01 |
| Total | 242,887,444 | 100 | 46,792 | 100 | 15,914 | 100 | 721 | 100 | 6808 | 100 |

 

 

By excluding the unknown zone from consideration, it is possible to evaluate the performance of the site probability model within the kinds of ecological settings that are most likely to have been adequately surveyed. Table 8.6.17 provides recalculations of the percentages of each category of sample points within these site potential classes. Of the 6438 modeled sites found within this area statewide, 91.15 percent are within the six medium and high site potential classes, which constitute 45.28 percent of the adequately surveyed area. This produces a respectable gain statistic of 0.50324, indicating that the model performs significantly better than by chance alone. However, its poorer performance compared to the site probability model (Section 8.6.3.1) may be attributable to survey bias, which results in a low level of distinction between places where sites are found and places where surveys have occurred. When future surveys extend the area that can be modeled as adequately surveyed, presumably a stronger model can be produced.
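The recalculation described here amounts to dropping the unknown zone and re-expressing each remaining class as a share of the reduced total. The sketch below does this for the modeled-site counts in Table 8.6.16; the function name `percentages_excluding` is hypothetical, and the same approach would apply to the other columns.

```python
# Modeled-site counts by site potential class (Table 8.6.16).
MODELED_SITES = {
    "Unknown": 370,
    "Possibly Low": 217,
    "Low": 247,
    "Suspected Medium": 155,
    "Possibly Medium": 177,
    "Medium": 604,
    "Suspected High": 180,
    "Possibly High": 338,
    "High": 4414,
    "Water": 61,
    "Steep Slopes": 44,
    "Mines": 1,
}

MEDIUM_AND_HIGH = [
    "Suspected Medium", "Possibly Medium", "Medium",
    "Suspected High", "Possibly High", "High",
]


def percentages_excluding(counts, excluded="Unknown"):
    """Return (percent by class, total) after removing the excluded class."""
    kept = {name: n for name, n in counts.items() if name != excluded}
    total = sum(kept.values())
    return {name: 100.0 * n / total for name, n in kept.items()}, total


pct, total = percentages_excluding(MODELED_SITES)
medium_high_share = sum(pct[name] for name in MEDIUM_AND_HIGH)

print(f"modeled sites outside the unknown zone: {total}")
print(f"share in medium and high classes: {medium_high_share:.2f} percent")
```

With the 45.28 percent area share reported in the text, this site share gives 1 - 45.28/91.15, or about 0.503, which agrees with the gain statistic quoted above.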

 

Table 8.6.17. Evaluation of Site Potential for Survey Implementation Model Outside the Unknown Zone.

| Site Potential | Region (30 meter cells) | % | Random Points | % | Negative Survey Points | % | Single Artifacts | % | Modeled Sites | % |
|---|---|---|---|---|---|---|---|---|---|---|
| Possibly Low | 31,063,043 | 25.33 | 6037 | 20.72 | 1771 | 13.04 | 36 | 5.72 | 217 | 3.37 |
| Low | 27,046,150 | 22.06 | 5164 | 17.72 | 3567 | 26.27 | 75 | 11.92 | 247 | 3.83 |
| Suspected Medium | 6,206,642 | 5.06 | 1230 | 4.22 | 142 | 1.05 | 23 | 3.66 | 155 | 2.41 |
| Possibly Medium | 7,935,094 | 6.47 | 1484 | 5.09 | 385 | 2.84 | 23 | 3.66 | 177 | 2.75 |
| Medium | 15,917,082 | 12.98 | 2909 | 9.98 | 2280 | 16.79 | 74 | 11.76 | 604 | 9.38 |
| Suspected High | 2,208,561 | 1.80 | 478 | 1.64 | 71 | 0.52 | 16 | 2.54 | 180 | 2.80 |
| Possibly High | 3,326,216 | 2.71 | 6037 | 20.72 | 204 | 1.50 | 32 | 5.09 | 338 | 5.25 |
| High | 19,917,051 | 16.24 | 3859 | 13.24 | 5057 | 37.24 | 348 | 55.33 | 4414 | 68.56 |
| Water | 7,722,839 | 6.30 | 1647 | 5.65 | 39 | 0.29 | 0 | 0.00 | 61 | 0.94 |
| Steep Slopes | 850,789 | 0.69 | 195 | 0.67 | 49 | 0.36 | 2 | 0.32 | 44 | 0.68 |
| Mines | 425,116 | 0.35 | 98 | 0.34 | 14 | 0.10 | 0 | 0.00 | 1 | 0.02 |
| Total | 122,618,583 | 100 | 29,138 | 100 | 13,579 | 100 | 629 | 100 | 6438 | 100 |

 

 

8.6.3.4 Site Probability Model Developed from Statewide Database

Because of difficulties encountered when trying to produce comparable raw model scores for the 20 regional models, a single site probability model was developed from the entire statewide database (Section 7.7). To develop this model, a number of the Phase 3 variables had to be excluded because they were not present statewide. These included distance to paper birch, distance to Big Woods, distance to oak woodland, distance to mixed hardwoods and pine, distance to pine barrens or flats, distance to aspen-birch, distance to bedrock used for tools, distance to conifers, distance to nearest permanent wetland inlet/outlet, and distance to prairie. This left a total of 34 variables for modeling, of which 27 contributed to the model (Table 8.6.18). The only variables with less than 100 percent probability were size of nearest lake and size of nearest permanent lake. The large number of model variables is undoubtedly a function of the very large number of sites in the database.

 

The resulting model emphasizes the gross patterns in known site distribution (Figure 8.8) and does not articulate the local patterns within the landscape as finely as does the statewide model derived using regionalization (Figure 8.4). Consequently, sites that cannot be explained by large-scale environmental patterns or that depend on local variables are not as well predicted as in the regionalized model. However, this model does provide a more accurate representation of relative probabilities statewide (Figure 8.8). The consequence is that some regions (for instance the Twin Cities metro area and the Arrowhead Region) show very high concentrations of high site potential, while others (like the northern tier of counties west of the Arrowhead) show only limited occurrences of high site potential. Whether these results are interpreted as indicators of ecological settings preferred by hunter-gatherers or as artifacts of past survey efforts, this model is a useful point of comparison with the regionalized site probability model.

 

Table 8.6.18. Site Probability Model from Statewide Database.

Variable | S-Plus Regression Coefficient | Probability
---------|-------------------------------|------------
Direction to nearest water or wetland (sine) | -0.2325406 | 100.0
Distance to edge of nearest large wetland | 0.006508734 | 100.0
Distance to edge of nearest area of organic soils | 0.004957566 | 100.0
Distance to edge of nearest large lake | -0.01192841 | 100.0
Distance to edge of nearest perennial river or stream | -0.01827615 | 100.0
Distance to edge of nearest swamp | 0.01325694 | 100.0
Distance to hardwoods | -0.004220171 | 100.0
Distance to nearest confluence between perennial or intermittent streams and large rivers | -0.003289315 | 100.0
Distance to nearest intermittent stream | 0.006548222 | 100.0
Distance to nearest lake inlet/outlet | -0.01227073 | 100.0
Distance to nearest lake, wetland, organic soil, or stream | -0.04851173 | 100.0
Distance to nearest permanent lake inlet/outlet | 0.005728610 | 100.0
Distance to river bottom forest | 0.002051763 | 100.0
Distance to sugar maple | 0.003129357 | 100.0
Distance to well-drained soils | -0.005608923 | 100.0
Elevation | 0.8425730 | 100.0
Height above surroundings | 0.02045504 | 100.0
On river terraces | 0.7578046 | 100.0
Prevailing orientation | -0.001215266 | 100.0
Relative elevation | 0.02036699 | 100.0
Size of major watershed | -0.00001451567 | 100.0
Size of minor watershed | 0.00004488181 | 100.0
Size of nearest lake | -0.00005703235 | 48.5
Size of nearest permanent lake | 0.00006991912 | 69.9
Surface roughness | -0.01668705 | 100.0
Vegetation diversity within 1 km | 0.3938607 | 100.0
Vertical distance to permanent water | -0.006548323 | 100.0

 

 

This model predicts 84.87 percent of all modeled sites within the high and medium probability zones, which constitute 33.65 percent of the state's area (Table 8.6.19). This produces a good gain statistic (0.60351), indicating that the model performs well and comes very close to meeting project goals. The site probability model developed as a composite of the regional models (Section 8.6.3.1) performs somewhat better because regionalization makes it possible to discern patterns within individual regions, not just statewide. Consequently, regionalization gives more weight to sites in regions with low site numbers.

 

This model tested well, predicting 82.81 percent of new sites. The gain statistic for the test population is 0.59365.
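The gain values quoted above are consistent with the gain statistic defined as 1 minus the ratio of percent area to percent sites in the high and medium probability zones. The following is a minimal Python sketch under that assumption; the function name `gain` and the constant names are illustrative and are not taken from the project's software.

```python
def gain(pct_area: float, pct_sites: float) -> float:
    """Gain = 1 - (percent of area / percent of sites) for the high/medium zones."""
    if pct_sites <= 0:
        raise ValueError("pct_sites must be positive")
    return 1.0 - pct_area / pct_sites


# Percentages quoted in the text for the statewide-database model.
AREA_HIGH_MEDIUM = 33.65   # percent of the state's area in high + medium zones
MODELED_SITES = 84.87      # percent of modeled sites in those zones
TEST_SITES = 82.81         # percent of test sites in those zones

model_gain = gain(AREA_HIGH_MEDIUM, MODELED_SITES)
test_gain = gain(AREA_HIGH_MEDIUM, TEST_SITES)
print(f"modeled sites: {model_gain:.5f}, test sites: {test_gain:.5f}")
```

Rounded to five places, these inputs give 0.60351 for the modeled sites and 0.59365 for the test sites, matching the reported figures.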

 

Because of time constraints, no preliminary models were run, so no Kappa coefficients could be calculated. Performing this analysis would be very useful, especially as a comparison to the stability patterns within the regional models. Also because of time constraints, no survey probability model was developed statewide. The development of survey probability and survey implementation models from the statewide database would also provide useful information for comparison with the regional models.

 

Table 8.6.19. Evaluation of Site Probability Model from Statewide Database.

  |   | Low | Medium | High | Water | Steep slopes | Mines | Total
--|---|-----|--------|------|-------|--------------|-------|------
Region (30 meter cells) | # | 152,146,814 | 23,271,112 | 58,469,737 | 7,721,980 | 850,883 | 425,122 | 242,885,648
  | % | 62.64 | 9.58 | 24.07 | 3.18 | 0.35 | 0.18 | 100.0
All random points | # | 28,804 | 4,555 | 11,501 | 1,641 | 191 | 98 | 46,790
  | % | 61.56 | 9.73 | 24.58 | 3.51 | 0.41 | 0.21 | 100.0
All negative survey points | # | 6,356 | 1,698 | 7,758 | 40 | 46 | 14 | 15,912
  | % | 39.94 | 10.67 | 48.76 | 0.25 | 0.29 | 0.09 | 100.0
All modeled sites | # | 926 | 500 | 5,283 | 61 | 43 | 1 | 6,814
  | % | 13.59 | 7.34 | 77.53 | 0.90 | 0.63 | 0.01 | 100.0
All test sites | # | 155 | 63 | 746 | 10 | 3 | 0 | 977
  | % | 15.86 | 6.45 | 76.36 | 1.02 | 0.31 | 0.00 | 100.0
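Each percentage row in Table 8.6.19 is the zone count divided by the row total. As a rough consistency check, the Python sketch below recomputes the shares for the modeled-site and test-site rows from the counts in the table; the helper name `zone_percentages` is hypothetical.

```python
ZONES = ["Low", "Medium", "High", "Water", "Steep slopes", "Mines"]


def zone_percentages(counts):
    """Return each zone's share of the row total, in percent, rounded to 2 places."""
    total = sum(counts)
    return [round(100.0 * c / total, 2) for c in counts]


# Counts copied from Table 8.6.19.
modeled_sites = [926, 500, 5283, 61, 43, 1]   # "All modeled sites" row
test_sites = [155, 63, 746, 10, 3, 0]         # "All test sites" row

modeled_pct = dict(zip(ZONES, zone_percentages(modeled_sites)))
test_pct = dict(zip(ZONES, zone_percentages(test_sites)))

# Share of sites falling in the high and medium probability zones.
modeled_hm = round(modeled_pct["High"] + modeled_pct["Medium"], 2)
test_hm = round(test_pct["High"] + test_pct["Medium"], 2)
print(modeled_hm, test_hm)
```

The combined high and medium shares come out to 84.87 percent for modeled sites and 82.81 percent for test sites, the same values cited in the text.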

 

 

8.6.4 Phase 3 Results – Regional Models

The following sections report on the regionalized models. Reports are presented for 17 individual ECS subsections and for three sets of combined subsections (Figure 8.1). For six of the individual subsections reported, models are taken from modeling combinations of adjacent subsections, as described in Section 7.5.1.3. Only when two or more subsections share the same best site probability and survey probability models are they reported on in combination.

 

The regional model reports contain descriptions of the environmental context of the region, descriptions and evaluations of the site probability, survey probability, and survey implementation models for that region, and interpretations of the site probability and survey probability models. The order in which the models are presented is as follows:

 

8.27 Conclusion

Site probability models developed in Phase 3 of this project performed very well and met or exceeded project goals. However, several factors limit their overall quality. First and foremost, there are simply too few known sites in some parts of the state to impart a high level of confidence to the models for those areas. In some cases (for example, Laurentian Highlands, St. Louis Moraines and Tamarack Lowlands), the site probability models performed deceptively well. This can be explained by the very limited range of variation in the known sites' environments. This limited range of variation, in turn, results from survey bias, low survey numbers, and low site numbers. In such cases, model stability may be rather high. Again, this is deceptive, as the range of variability in the site environment data is too narrow to introduce sufficient uncertainty into the model. Thus, interpretation of all model results presented here should be tempered by the number of sites available for analysis within each subsection.

 

Similarly, when a large number of sites are taken from biased surveys, as in the Border Lakes subsection, the environmental variability of the site locations may still be low. The survey probability and survey implementation models were developed to address this kind of bias. However, survey bias may have been overestimated in this phase of the project, as surveys are not yet adequately mapped. A more complete mapping of surveys should reduce the amount of land classified as "unknown" in the survey implementation models.

 

These models may also include errors or inaccuracies attributable to site mapping problems. As only site centroids were mapped for this project, a more limited range of environments may have been analyzed than would have been included in the database if sites were mapped as polygons. Moreover, inaccurate or imprecise centroid locations may have resulted in the inclusion of atypical environments in the analysis. Improved site mapping will be an important component of future models.

 

Although the edge effect that was so conspicuous in the Phase 2 models (Figure 8.4b) is not as apparent, it can still be detected in the Phase 3 models (Figure 8.5), particularly along the margins of the Aspen Parklands, Oak Savanna, Rochester Plateau, Red River Prairie, and Nashwauk Uplands. These subsections all have low site numbers and low site frequencies. However, this is almost certainly not the entire explanation for the phenomenon. To some extent it may be an artifact of how the raw model values are classified into probability classes, which permits adjacent regions to have larger or smaller proportions of their areas classified as high and medium site probability. Other, sometimes more conspicuous, edge effects are apparent where elevation data of different qualities adjoin.

 

These models best predict more recent archaeological sites (i.e., those formed within the last 3500 years), as these make up a majority of the available data and will be the most closely associated with modern environmental variables. Combined with the landscape sediment assemblage interpretations (Chapter 12) and hydrologic modeling to identify locations of drained lakes, the site/environment relationships identified by these models may help develop models for earlier archaeological sites via reconstructions of pre-Woodland suitable habitats.

 

All of these limitations can be addressed in the next full modeling phase, which is scheduled to take place in about 2006-2007. Plans for those future enhancements are discussed in Chapter 10.

 


 

 

REFERENCES

Bonham-Carter, G.F.
   1994 Geographic Information Systems for Geoscientists: Modeling with GIS. Pergamon
       (Elsevier Science Ltd.), Tarrytown, NY.

 

Brandt, R., B.J. Groenewoudt, and K.L. Kvamme
   1992 An Experiment in Archaeological Site Location: Modeling in the Netherlands using GIS
       Techniques. World Archaeology 24:268-282.

 

Carmichael, D.L.
   1990 GIS Predictive Modeling of Prehistoric Site Distributions in Central Montana. In Interpreting
       Space: GIS and Archaeology, edited by K.M.S. Allen, S.W. Green, and E.B.W. Zubrow, pp.
       216-225. Taylor and Francis, London.

 

Cassell, M.S., H.D. Mooers, C.A. Dobbs, T. Madigan, M. Covill, J. Berry, and D.A. Birk
   1997 An Archaeological Sensitivity Model of Prehistoric and Contact Period Settlement at
       Camp Ripley, Morrison County, Minnesota. Reports of Investigation No. 397. Institute for
       Minnesota Archaeology, Minneapolis.

 

Craig, J.
   1989 Predictive Modeling of Prehistoric Settlement Patterns in the Chicago Lake Plain. The
       Wisconsin Archaeologist 70(3):347-361.

 

Dalla Bona, L.
   1994 Methodological Considerations. Cultural Heritage Resource Predictive Modeling Project
       Vol. 4. Centre for Archaeological Resource Prediction, Lakehead University, Thunder Bay,
       Ontario.

 

Dalla Bona, L. and L. Larcombe
   1996 Modeling Prehistoric Land Use in Northern Ontario. In New Methods, Old Problems:
       Geographic Information Systems in Modern Archaeological Research, edited by H.D.G.
       Maschner, pp. 252-271. Center for Archaeological Investigations Occasional Paper No. 23,
       Southern Illinois University, Carbondale.

 

Dobbs, C.A., K.C. Breakey, and H. Mooers
   1994 A Model of Archaeological Sensitivity for Landforms along the Lakehead Pipeline Corridor
       from Neche, North Dakota to Clearbrook, Minnesota. Reports of Investigation No. 282.
       Institute for Minnesota Archaeology, Minneapolis.

 

Dobbs, C.A. and H. Mooers
   1990 A Preliminary Model of Archaeological Sensitivity for Landforms along the Great Lakes Gas
       Transmission Company Natural Gas Pipeline Corridor from St. Vincent, Minnesota to Rapid
       River, Michigan. Reports of Investigation No. 96. Institute for Minnesota Archaeology,
       Minneapolis.

 

Grimm, E.C.
   1984 Fire and Other Factors Controlling the Big Woods Vegetation of Minnesota in the Mid-Nineteenth Century. Ecological Monographs 54:291-311.

 

 

Hasenstab, R.J.
   1991 Wetlands as a Critical Variable in Predictive Modeling of Prehistoric Site Locations: A Case
       Study from the Passaic River Basin. Man in the Northeast 42:39-61.

 

Howes, D.
   1982 A Predictive Model for Site Location in the Alberta Foothills. Plains Anthropologist 27:97-108.

 

Jochim, M.A.
   1976 Hunter-Gatherer Subsistence and Settlement: A Predictive Model. Academic Press, New
       York.

 

Kellogg, D.C.
   1987 Statistical Relevance and Site Locational Data. American Antiquity 52:143-150.

 

Kohler, T.A.
   1988 Predictive Locational Modeling: History and Current Practice. In Quantifying the Present
       and Predicting the Past: Theory, Method, and Application of Archaeological Predictive
       Modeling, edited by W.J. Judge and L. Sebastian, pp. 19-59. U.S. Government Printing Office,
       Washington, DC.

 

Kvamme, K.L.
   1985 Determining Empirical Relationships Between the Natural Environment and Prehistoric Site
       Locations: A Hunter-Gatherer Example. In For Concordance in Archaeological Analysis:
       Bridging Data Structure, Quantitative Technique, and Theory, edited by C. Carr, pp. 208-238.
       Westport Press, Kansas City.

   1988 Development and Testing of Quantitative Models. In Quantifying the Present and
       Predicting the Past: Theory, Method, and Application of Archaeological Predictive
       Modeling, edited by W.J. Judge and L. Sebastian, pp. 325-428. U.S. Government Printing
       Office, Washington, DC.

   1992 A Predictive Site Location Model on the High Plains: An Example with an Independent Test.
       Plains Anthropologist 37:19-40.

   1994 Ranter’s Corner: GIS graphics vs. spatial statistics: how do they fit together? Archaeological
       Computing Newsletter 38:1-2.

 

Larson, T.K., R.G. Hilman, and J.D. Benko
   1991 Site Patterning Analysis of the Granite Falls Locality. In The 1990 Archaeological
       Investigations at the Seim/Livingood Site, 21 CP 29, Chippewa County, Minnesota, edited by
       T.K. Larson, D.M. Penny, and A.R. Woolworth, pp. 10.1-10.17. Submitted to the Minnesota
       Department of Transportation, St. Paul.

 

Lafferty, R.H., III, S. Parker, and W.F. Limp
   1981 Testing the Sparta Hypotheses. In Model Validation in Sparta, edited by R.H. Lafferty, III
       and J.H. House, pp. 206-229. Research Report No. 25, Arkansas Archaeological Survey,
       Fayetteville.

 

Larralde, S.L. and S.M. Chandler
   1981 Archaeological Inventory in the Seep Ridge Cultural Study Tract, Uintah County,
       Northeastern Utah: With a Regional Predictive Model for Site Location. Cultural Resources
       Series 11, Bureau of Land Management, Salt Lake City.

 

Limp, W.F., R.H. Lafferty, III, and S.C. Scholtz
   1985 Toward a Model of Location Choice in Sparta. In Settlement Predictions in Sparta: A
       Location Analysis and Cultural Resource Assessment in the Uplands of Calhoun County, edited
       by W.F. Limp, pp. 59-99. Research Series No. 14, Arkansas Archaeological Survey,
       Fayetteville.

 

Marschner, F.J.
   1974 The Original Vegetation of Minnesota. Compiled from U.S. General Land Office Survey
       notes. U.S. Department of Agriculture, Forest Service, North Central Forest Experiment Station,
      St. Paul, Minnesota.

 

Minnesota DNR (Department of Natural Resources)
   1998 Ecological Classification System, URL: http://www.dnr.state.mn.us/ecs/index.html

 

Neumann, T.W.
   1992 The Physiographic Variables Associated with Prehistoric Site Location in the Upper Potomac
       River basin, West Virginia. Archaeology of Eastern North America 20:81-124.

 

Pilgrim, T.
   1987 Predicting Archaeological Sites from Environmental Variables: A Mathematical Model for the
       Sierra Nevada Foothills, California. BAR International Series 320. Oxford, England.

 

Shermer, S.J. and J.A. Tiffany
   1985 Environmental Variables as Factors in Site Location: An Example from the Upper Midwest.
       Midcontinental Journal of Archaeology 10:215-240.

 

Warren, R.E.
   1990 Predictive Modeling of Archaeological Site Location: A Case Study in the Midwest. In
       Interpreting Space: GIS and Archaeology, edited by K.M.S. Allen, S.W. Green, and E.B.W.
       Zubrow, pp. 201-215. Taylor and Francis, London.

 

Warren, R.E. and D.L. Asch
   1996 A Predictive Model of Archaeological Site Location in the Eastern Prairie Peninsula, Illinois.
       Unpublished manuscript, Illinois State Museum, Springfield.

 

Williams, L., D.H. Thomas, and R. Bettinger
   1973 Notions to Numbers: Great Basin Settlements as Polythetic Sets. In Research and Theory in
       Current Archeology, edited by C.L. Redman, pp. 215-237. John Wiley and Sons, New York.

 

Young, P.M., M.R. Horne, C.D. Varley, P.J. Racher, and A.J. Clish
   1995 A Biophysical Model for Prehistoric Archaeological Sites in Southern Ontario. Research and
       Development Branch, Ministry of Transportation, Ontario.

 


 

 

 

The Mn/Model Final Report (Phases 1-3) is available on CD-ROM. Copies may be requested by e-mail: mnmodel@state.mn.us

 


Acknowledgements

Mn/Model was financed by the Minnesota Department of Transportation using funds set aside by the Federal Highway Administration's Intermodal Surface Transportation Efficiency Act.

 

Copyright Notice

The Mn/Model process and the predictive models it produced are copyrighted by the Minnesota Department of Transportation (MnDOT), 2000. They may not be used without MnDOT's Consent.