Emily A. Knapp * / Usama Bilal / Bridget T. Burke / Geoff B. Dougherty / Thomas A. Glass
Citation Information: Connections. Volume 38, Issue 1, Pages 1-11, DOI: https://doi.org/10.21307/connections-2018-001
License: CC BY-NC-ND 4.0
Published Online: 18-January-2019
Network methods have been applied to obesity to map connections between obesity-related genes, model biological feedback mechanisms and potential interventions, and to understand the spread of obesity through social networks. However, network methods have not been applied to understanding the obesogenic environment. Here, we created a network of 32 features of communities hypothesized to be related to obesity. Data from an existing study of determinants of obesity among 1,288 communities in Pennsylvania were used. Spearman correlation coefficients were used to describe the bivariate association between each pair of features. These correlations were used to create a network in which the nodes are community features and weighted edges are the strength of the correlations among those nodes. Modules of clustered features were identified using the walktrap method. This network was plotted, and then examined separately for communities stratified by quartiles of child obesity prevalence. We also examined the relationship between measures of network centrality and child obesity prevalence. The overall structure of the network suggests that environmental features geographically co-occur, and features of the environment that were more highly correlated with body mass index were more central to the network. Three clusters were identified: a crime-related cluster, a food-environment and land use-related cluster, and a physical activity-related cluster. The structure of connections between features of the environment differed between communities with the highest and lowest burden of childhood obesity, and a higher degree of average correlation was observed in the heaviest communities. Network methods may help to explicate the concept of the obesogenic environment, and ultimately to illuminate features of the environment that may serve as levers of community-level intervention.
Networks are everywhere (Barabasi, 2007, 2009, 2012, 2013). However, in public health, network science has only recently begun to make significant inroads. To date, network science has made contributions in diverse areas of biomedical research including cellular communication in cancer (Stites et al., 2007; Berger et al., 2012; Gill et al., 2014; Mutation Consequences and Pathway Analysis working group of the International Cancer Genome Consortium, 2015), protein–protein interactions (Jeong et al., 2001), and complex disease interactions (Barabasi, 2007; Goh et al., 2007; Hidalgo et al., 2009; Zhou et al., 2014). Common features link these diverse applications, including high-dimensional data and emergent patterns not easily visible in bivariate space. Networks depict relationships among objects in a system, and network methods help identify structures that influence system behavior.
Obesity is a challenge for traditional public health research because we do not at present have a robust explanation for the temporal and spatial patterns of the obesity epidemic (Galea et al., 2010). This has led obesity researchers to seek alternative, systems science-oriented methods and approaches (Burke and Heiland, 2007; Huang et al., 2009; Finegood, 2011). Network science has made important contributions in obesity research along several dimensions. First, network methods have been used to identify complex linkages among obesity-related genes in animal models (Chen and Zhang, 2013). Second, researchers have conceptualized the ‘stress-response network’ to understand how feedback within biological systems leads to exacerbation and habituation that results in obesogenic growth (Dallman et al., 2003, 2006). Network approaches have been used to study interactions among organizations and components of obesity interventions (Leroux et al., 2013; Marks et al., 2013), and applied to causal loop diagrams to identify leverage points for intervention (McGlashan et al., 2016). Several studies have focused on how obesity and physical activity spread through populations like infection (Crandall, 1988; Christakis and Fowler, 2007; Blanchflower et al., 2009; Hammond, 2010; Hill et al., 2010; Ali et al., 2012; El-Sayed et al., 2012; Gesell et al., 2012; Simpkins et al., 2013; Hammond and Ornstein, 2014). Others have examined how obesity impacts social relationships (Brewis et al., 2011; de la Haye et al., 2011; Ali et al., 2012). Despite these advances, most network studies in obesity have focused on the structure of linkages between individuals connected through social ties. We are aware of no studies to date that focus on the structure of linkages between features of the environment, thought to be the main drivers of the obesity epidemic.
The concept of the ‘obesogenic environment’ was first proposed in the 1990s as a model for evaluating the contribution of environmental factors to obesity (Hill and Peters, 1998; Swinburn et al., 1999). The obesogenic environment assumes a pattern of spatially co-occurring features that jointly influence obesity risk. There is little doubt that aspects of the food and physical activity environment are important, but the question of how to identify patterns of features within the obesity environment remains unanswered. Tools to examine the connections between features of the obesogenic environment are needed. Network analysis can be used to describe relationships (edges) between objects (nodes), allowing for the characterization of network-level features that are otherwise hidden. Network methods also allow us to visualize these connections, facilitating understanding of a very complex epidemic and potentially prioritizing areas for intervention. In this study, we characterize the obesogenic environment with community features as nodes, and correlations between those features as edges. A version of this methodology has been used in neurologic and genetic research and is commonly referred to as ‘Weighted Correlation Network Analysis’ (Fox et al., 2005; Zhang and Horvath, 2005). Our approach examines the structure of the relationships among multiple community features, instead of examining each community feature as an independent cause of obesity.
The literature demonstrates a strong relationship between environmental features that impact diet and physical activity. However, existing studies have focused on single obesity-related features in isolation, most often evaluated for their linear associations with obesity. There has been scant attention to the interdependence of these environmental features and how relationships between obesogenic features of the environment are structured and may create qualitatively different risk environments for obesity. We draw from network analysis tools to study these interrelations between features of the environment and explore how they relate to spatial patterns of obesity prevalence. We are guided by the view that transportation systems, cultural variation, markets, and other system dynamics create clusters of obesity-related features that may have synergistic and aggregative effects on population behavior. Market forces lead to clusters of restaurants, stores and activity spaces in the built environment (Hidalgo and Castañer, 2015). This clustering may potentiate the effect of any one facility by increasing the joint effect of a built and social environment designed to deliver excess calories with maximal efficiency. Therefore, the clustering of features and the existence of central bridging nodes that link disparate clusters may point toward novel targets for research and intervention.
Our primary aim is to explore the utility of network analysis methods to characterize the linkages among a set of 32 spatially patterned features of the obesogenic environment. We created a weighted network of community features from 1,288 communities in Pennsylvania, and examined the relationship between centrality and clustering measures and a commonly used metric of childhood overweight and obesity (percentage of children with body mass index (BMI) percentile ≥ 85th).
Our goal was to model the network of hypothesized obesity-related features of local environments to better understand how network and node centrality and clustering provide insights about the role of environments in child and adolescent obesity.
Our study was based on data from a study of children from 1,288 communities in central and northeastern Pennsylvania served by Geisinger Health System. From the system’s electronic medical records system (EMR), we previously received data on all patients between 3 and 18 years old who visited a Geisinger primary care physician from 2001 to 2012. The sample included 163,473 children and 523,674 visits. The sample is representative of the youth population in the region (Schwartz et al., 2011). This study was approved by institutional review boards at Geisinger Health System and Johns Hopkins School of Public Health.
Children were previously assigned to one of 1,288 communities based on their geocoded home address. Communities consisted of census tracts within cities, and minor civil divisions (townships and boroughs) outside of cities (Schwartz et al., 2011). From the Geisinger EMR we obtained longitudinal height and weight measurements for children. Implausible BMI values, defined as those more than five standard deviations above or below the median, were assumed to reflect mismeasurement or data entry errors and were deleted using the standard CDC SAS Program (Schwartz et al., 2011). We calculated z-scores for individual BMI, estimated community-level mean BMI z-score, and percent of children who are overweight or obese (BMI greater than or equal to the 85th percentile for age and sex). We then categorized communities according to quartiles of the percent of children with BMI at or above the 85th percentile.
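The community-level outcome and its quartile grouping can be illustrated with a minimal sketch. It assumes each record is already reduced to a (community, BMI percentile) pair; the function names are hypothetical, and the rank-based split with ties broken by community identifier is an assumption, since the exact cut-point rule is not described here.

```python
from collections import defaultdict

def percent_at_or_above(records, threshold=85.0):
    """records: iterable of (community_id, bmi_percentile) pairs.
    Returns {community_id: percent of children at or above the threshold}."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for community, percentile in records:
        totals[community] += 1
        if percentile >= threshold:
            flagged[community] += 1
    return {c: 100.0 * flagged[c] / totals[c] for c in totals}

def assign_quartiles(values):
    """values: {community_id: number}. Returns {community_id: quartile 1..4}.
    Assumption: communities are split into four rank-ordered groups of
    near-equal size, with ties broken by community identifier."""
    ordered = sorted(values, key=lambda c: (values[c], c))
    n = len(ordered)
    return {c: (4 * i) // n + 1 for i, c in enumerate(ordered)}
```

With this sketch, quartile 1 holds the communities with the lowest prevalence and quartile 4 those with the highest.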
To characterize obesity-related features of the environment we assembled a corpus of 32 variables hypothesized to be related to obesity based on existing research. These variables include demographic, economic, and geographic information from publicly available datasets, including those published by the U.S. Census, the Federal Bureau of Investigation, and two commercial data vendors, Info USA and Dun & Bradstreet, that provided registries of commercial food and physical activity establishments categorized using standard industry codes. Table 1 describes the community features we studied. This list was selected based on attributes that are well accepted in the literature, have acceptable measurement properties, and span a range of content domains and relations with some being related to physical activity and some related to diet. This set of variables has been used in our previous research to characterize diverse aspects of the obesity-related environment in communities (Nau et al., 2015). We categorized all variables in quintile or quartile rank scores to preserve the rank position of variables that are often poorly distributed. After reviewing variable distributions and Spearman, Pearson, and Phi correlation matrices, we chose Spearman correlations as the best representation of variable distributions and the strength of connections.
Given the complex nature of obesogenic environments, we looked for a way to best characterize the relationships between all 32 community features. We needed to honor both the pairwise (bivariate) correlations between variables and the structures that emerge from these pairwise correlations. We used a method analogous to Weighted Correlation Network Analysis (Zhang and Horvath, 2005). We generated a data array of covariates (32 obesity-related community features) that we treat as nodes in a network of interconnected environmental attributes. Edges were operationalized as the strength of the bivariate correlation between each pair of attributes. Bivariate correlations were estimated using pairwise Spearman correlation coefficients between the community variables. Because we were primarily interested in the strength of linkages among nodes and there is controversy over the direction of the relationships between some of these variables and obesity, we chose to use the absolute value of the correlation between variables.
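As an illustration of the edge definition, the following sketch computes the absolute Spearman correlation between two community variables in plain Python. It is not the code used in the analysis (which was carried out in R); the function names are hypothetical, tied values receive the mean of their ranks, and constant variables (zero variance) are not handled.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def abs_spearman(x, y):
    """Edge weight: absolute value of the Spearman rank correlation."""
    return abs(pearson(average_ranks(x), average_ranks(y)))
```

Taking the absolute value means that a strong negative correlation and a strong positive correlation produce equally heavy edges, which matches the stated focus on the strength rather than the direction of linkages.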
The pairwise correlations were then arranged in a weighted undirected adjacency matrix with 992 off-diagonal cells (32 × 31, or 496 unique pairs because the matrix is symmetric), where each cell was the correlation between two variables. We created a weighted graph from this adjacency matrix using the R package igraph (version 1.0.1) (Csardi and Nepusz, 2006), specifying pairwise correlations as the edge weights. From this graph, we obtained five sets of results.
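A minimal sketch of this step, in Python rather than the R workflow used in the study, is given below. The callable `corr` is a hypothetical stand-in for whatever function returns the signed correlation between two named variables. The helper `count_cells` shows that 32 nodes give 32 × 31 = 992 off-diagonal cells, which correspond to 496 unique undirected pairs.

```python
def build_adjacency(names, corr):
    """Build a symmetric weighted adjacency matrix (list of lists).
    names: list of variable names (the nodes).
    corr:  hypothetical callable (name_a, name_b) -> signed correlation.
    Weights are absolute correlations; the diagonal is left at zero."""
    n = len(names)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weight = abs(corr(names[i], names[j]))
            adj[i][j] = weight
            adj[j][i] = weight  # undirected: mirror across the diagonal
    return adj

def count_cells(n):
    """Return (off-diagonal cells, unique undirected pairs) for n nodes."""
    return n * (n - 1), n * (n - 1) // 2
```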
First, we plotted an overall network graph using all 1,288 communities. The coordinates of each node were computed using a force-based algorithm, the Fruchterman-Reingold algorithm (Fruchterman and Reingold, 1991), where the attraction between nodes is proportional to the strength of the correlation between environmental features (nodes). We implemented the version of the algorithm in the R package qgraph (version 1.3.2) (Epskamp et al., 2012). The second set of results represents the same graph stratified by community obesity burden (quartiles). For ease of interpretation we show plots corresponding to quartiles 1 and 4 (the thinnest and heaviest communities, respectively); see Fig. 1 (overall network) and Fig. 2 (stratified network).
Third, in order to better understand the relationships between the components of the obesogenic environment, we sought to obtain a measure of clustering and community structure that allowed us to evaluate if network structure was different across communities classified by prevalence of childhood obesity. We conducted a module detection analysis using the walktrap method (Pons and Latapy, 2005) that performs a series of ‘random walks’ between nodes. The probability of ‘walking’ from one node to the next is proportional to the weight of the edge between the nodes, meaning that a walk is more likely to occur between two highly correlated nodes. Each node is restricted to membership in one module. This creates modules of variables that are highly connected to each other. We then calculated the normalized network modularity score (Newman, 2006), which quantifies the strength of the connections within and between modules. A higher modularity score indicates a network with high within-module connectivity and low between-module connectivity. We calculated the modularity score for the overall network graph and each of the four graphs based on community strata of burden of childhood obesity (see Table 2). We used the pairwise correlation between variables (nodes) as a weight in the computation of modularity.
Fourth, we calculated an overall measure of network centrality by calculating the average network degree (Barrat et al., 2004). In a weighted undirected network like ours, average network degree is the mean of all pairwise correlations (Barrat et al., 2004). A high average network degree represents a network that has an overall tighter correlation between all nodes. We calculated average network degree for the overall network graph and each of the four network graphs based on obesity prevalence (see Table 2).
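The two summary measures can be sketched as follows, assuming a symmetric adjacency matrix with a zero diagonal and a module label for each node (for example, from a walktrap run). The modularity function implements the standard weighted form, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), where k_i is the summed edge weight of node i and 2m is the total of all k_i. Any further normalization applied in the analysis is not specified here, so this sketch is not assumed to reproduce the values in Table 2.

```python
def weighted_modularity(adj, membership):
    """Weighted modularity for a symmetric adjacency matrix.
    adj:        list of lists of edge weights, zero diagonal.
    membership: module label for each node, same order as adj."""
    n = len(adj)
    strength = [sum(row) for row in adj]  # k_i: summed edge weight per node
    two_m = sum(strength)                 # 2m: each edge counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if membership[i] == membership[j]:
                q += adj[i][j] - strength[i] * strength[j] / two_m
    return q / two_m

def mean_edge_weight(adj):
    """Mean of all unique pairwise weights (the 'average network degree'
    as described in the text: the mean of all pairwise correlations)."""
    n = len(adj)
    total = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)
```

As a check on intuition, a network of two equally sized modules with no edges between them has a modularity of 0.5, whereas placing every node in a single module gives 0.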
Fifth, we examined the association between the centrality of a node and its correlation with the prevalence of childhood obesity. For this, we plotted the degree centrality of each node against that node’s correlation with the prevalence of childhood obesity (Fig. 3).
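A minimal sketch of this comparison is shown below. It assumes that degree centrality in a weighted network is taken as node strength (the sum of a node's edge weights) and that the two series are compared with a Pearson correlation. Neither choice is stated explicitly in the text, so both are assumptions, and the function names are hypothetical.

```python
from math import sqrt

def node_strength(adj):
    """Weighted degree of each node: sum of its edge weights (no self-loops)."""
    return [sum(w for j, w in enumerate(row) if j != i)
            for i, row in enumerate(adj)]

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def centrality_outcome_correlation(adj, outcome_corr):
    """Correlate each node's strength with that node's correlation with
    the outcome. outcome_corr is supplied by the caller, one value per
    node, in the same order as the rows of adj."""
    return pearson(node_strength(adj), outcome_corr)
```

A positive result indicates that more central features tend to be more strongly associated with the outcome.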
The purpose of this analysis was to apply network methodology to characterize patterns of linkages and interactions among obesity-related environmental features among communities in Pennsylvania. Figure 1 is a graph of the network of connections (pairwise correlations) between nodes (obesity-related features) in 1,288 Pennsylvania communities.
The graph illustrates three important network characteristics. First, three clusters of tightly connected variables were identified using the walktrap method. A cluster of the three crime-related variables (rates per 100,000 persons of crimes against property, crimes against persons, and all Part I offenses) can be seen (green shading), and is weakly linked to the main network. This suggests that communities with high rates of violent crime (e.g., assault) also have high rates of crime against property (e.g., arson). Crime rates appear to be moderately correlated with obesity rates as indicated by the dark color of the crime-related nodes. A second cluster is identified consisting of features representing land use patterns, transportation, and food establishment density (yellow shading). We believe this represents the spatial clustering that occurs in the context of suburban sprawl with co-location of establishments in high-volume transportation corridors. The nodes at the heart of this cluster include household density (per square mile) and all food establishments per square mile. This second cluster appears to be the tightest. Eleven of the 14 nodes have an above-average absolute correlation with obesity. The model identified a third cluster (pale blue shading) consisting mostly of features that describe the physical activity environment. These include diversity of physical activity establishments, outdoor recreational facilities per square mile, snack stores (e.g., donuts, pretzels, ice cream) per square mile, indoor recreation centers per square mile, all physical activity establishments per square mile, indoor fitness and recreational facilities per street mile, and indoor recreational clubs and facilities per square mile.
In both the second and third clusters, the nodes that are most highly correlated with obesity (indicated by darker node color), are more central in the network overall, as well as within each cluster. Not all of the food or physical activity nodes are clustered. At the edge of the graph we see several food or physical activity nodes that are not as tightly coupled (including parks and big box stores). Finally, the overall structure of the network suggests that elements of these communities are geographically clustered and are not randomly dispersed across communities, especially features of the physical, food, and land use environments.
Next, we were interested in whether the structure of this network of features varied across strata of community obesity burden. Figure 2 shows the result of running a similar network model separately by quartile of percent of children at or above the 85th percentile on BMI-z, a threshold widely considered to be indicative of overweight and obesity burden. Among communities in the lowest quartile of obesity prevalence (Fig. 2A), community features appear to be less tightly connected than in communities in the highest quartile of obesity prevalence (Fig. 2B). This is consistent with the higher modularity observed in the lower-prevalence quartiles (Table 2). For example, among lower obesity prevalence communities, crime is weakly linked to the land use, food, and physical activity cluster; but in higher obesity prevalence communities, crime is more tightly linked to this cluster. It is not just that quantities of these features are larger in heavier communities, but that the connections between features are also altered: communities that give rise to higher rates of childhood obesity are structured differently than those with less child obesity.
Table 2 shows the results of the network structure analysis, overall, and stratified by quartile of obesity. The overall network has a positive modularity of 0.15, indicating that the nodes (environmental features) show a degree of clustering (as compared to a random distribution of nodes with no clustering). In the analysis stratified by prevalence of childhood obesity, communities in the 1st and 2nd quartile (thinnest communities) show a higher modularity compared to communities in the 3rd and 4th quartile (heaviest communities) (modularity of 0.19 and 0.27 vs. 0.12 and 0.09, respectively). This means that the modules of variables in thinner communities are either more clustered within each module or have weaker connections to variables in other modules, and that in the heaviest communities variables (nodes) exhibit a lower degree of clustering in modules (as can be seen in Fig. 2). For example, a comparison of the two panels in Figure 2 demonstrates that the crime-related cluster shown in green has fewer strong ties (shown by darker lines) to the center of the network in the thinner communities on the left panel compared to the heavier communities in the right panel. Similarly, the average network degree is higher in the heaviest communities (degree = 0.362) compared with the thinnest communities (degree = 0.332), representing higher average correlation (i.e., stronger connections), between variables in communities with higher prevalence of childhood obesity.
Figure 3 shows the relationship between the degree centrality of each community feature (node) and the bivariate correlation of that feature with childhood overweight and obesity prevalence (percent of children at or above the 85th BMI percentile). Each dot represents one of the 32 community features. The correlation between the degree of each feature and its correlation with the prevalence of childhood obesity is positive (r = 0.51), indicating that more ‘central’ variables have a stronger association with the outcome. For example, fresh fruit and vegetable stands per square mile has a low correlation with community obesity, and can be seen in Figure 1 as a variable far from the center of the network and with only a few weak ties into the rest of the network.
We applied network methodology in order to describe linkages between community features associated with obesity. We used network analysis to characterize the obesogenic environment: instead of treating individual features of communities in isolation, this method honors the interactions and spatial co-occurrence that make up this landscape of obesity risk.
This work suggests that (i) there are identifiable clusters of environmental features; (ii) the level of connectivity and structure of features in the network may be informative; and (iii) features more highly associated with obesity are more likely to be central in the network of community features. Three clusters were identified in the overall network: a cluster of crime-related variables that was weakly linked into the main network, a food and land use cluster, and a physical activity cluster. In communities stratified by prevalence of childhood obesity, the structure and overall connectivity of the network appeared to differ by level of obesity. Not only are the values of these attributes different in the heaviest and thinnest communities, but also the patterns of connections are different. We also found that centrality alone, as measured by degree, is correlated with obesity. Obesity-related features are therefore more tightly geographically clustered. This may be evidence of synergy between features of the obesogenic environment, of non-independent features of communities that join forces to shape obesity risk.
Understanding and intervening on the drivers of the obesity epidemic is a challenge for obesity researchers and policy makers. Obesity is complex and has multiple drivers at the individual, community, and state and national levels (Huang et al., 2009). Traditional methods such as regression models fail to account for interaction between multiple factors at multiple scales, the complexity and importance of contextual factors, and feedback loops and other dynamic processes (Hammond, 2009). Although our work is preliminary, it suggests that systems approaches to obesity may be useful for characterizing linkages among features of the environment. Despite the recognition that environmental features of communities play a strong role in the obesity epidemic, network methods to characterize linkages between attributes of communities have been underutilized. The structure and strength of these linkages may provide evidence for geographic areas or types of clusters of features that would be most efficient for intervention.
Network methods, especially graphical methods, could be used to help set priorities for obesity-related interventions in communities. For example, food establishments exhibited both high centrality (as measured by degree) in our network and high correlation with childhood obesity (Fig. 3). Using these network graphs (e.g., Fig. 2), we can narrow in on features such as these that may have far reaching effects, if intervened upon. This is consistent with the literature on ‘food swamps’ and ‘food deserts’ – but helps to prioritize interventions in this area because these features are more central. This could point to the effectiveness of intervening on such variables that are highly central in the network, and thus may have more far-reaching effects than intervention on less central variables. Network methods may help identify such synergistic actors that could have large effects on obesity due to their connections to other variables.
In particular, our work points toward possible interventions regarding community-zoning policies. Our network graphs show tight clusters of food-related (e.g., grocery and convenience stores, fast food and full-service restaurants) and land use (e.g., road block length, household density) features that are strongly correlated with obesity. Restructuring the community environment may be a promising avenue for obesity prevention. Considering communities to be complex systems where multiple interrelated phenomena act together to create an obesogenic environment, these methods also push us to consider intervening on not just the environmental features themselves, but also on the linkages between features. This is a new way to approach the obesity epidemic – by looking for factors that may be linking features, or that can be manipulated to disrupt harmful connections. For example, the crime-related cluster is more tightly linked to the network among communities with more childhood obesity. Further research into the underlying causes of this linkage (and why it differs in communities stratified by childhood obesity prevalence) may illuminate important drivers of the obesity epidemic.
This work also has methodological implications for obesity research. Future work should explore the mechanisms for how these clusters are associated with increased obesity prevalence, and whether interventions on features in this network change the network structure itself. This future research should consider the relationships, or clustering, of these features. Evaluating independent associations between any single feature and obesity rates would ignore the complex inter-relations this work has highlighted. Other methods that acknowledge these clusters of features, such as latent variable methods (Nau et al., 2015), may be more appropriate to honor the way that environmental features cluster together and to uncover unobserved sources of the correlation observed in this network.
We have data from a large and diverse geographic area that includes urban, rural, and suburban communities. However, this analysis is exploratory. We are not able to rule out the possibility that population density and development may be a common cause of many of the variables we selected. This is potentially a source of bias or a possible explanation for the clustering of features of the environment on which our study is based. It is widely recognized that obesity-related characteristics of communities are geographically correlated. The reasons for those correlations are not well understood. Our results, we believe, support the utility of network methods for the study of environments that are not formed randomly, but which are shaped by diverse market and demographic forces that may be important in driving spatial variation in obesity rates.
Network analysis may be a useful tool for evaluating obesogenic environments and other questions of interest in epidemiology. This preliminary analysis suggests that patterns of clustering and connections between features of the environment are important. Land use and food features are strongly linked (especially in ‘heavier’ communities), and features are more strongly interconnected, as shown by a higher average correlation, in communities with a higher burden of childhood obesity. Network methods can illuminate patterns of linkages and key factors in obesogenic environments. Network position (centrality) is correlated with a feature's association with childhood obesity prevalence. Ultimately, the goal of this type of analysis would be to identify highly connected community features that can be used as levers of intervention to reduce population rates of obesity.
Emily Knapp was supported by the Clinical Research and Epidemiology in Diabetes and Endocrinology Training Grant (T32DK062707). Usama Bilal was supported by a fellowship from the Obra Social La Caixa and by a Johns Hopkins Center for a Livable Future-Lerner Fellowship. Bridget Teevan Burke was supported by the Epidemiology and Biostatistics of Aging Training Grant (T32AG000247).
Data for this manuscript were collected as part of a project supported by grant number U54HD070725 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). The project is co-funded by the NICHD and the Office of Behavioral and Social Sciences Research (OBSSR). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NICHD or OBSSR.