Geographically weighted regression for spatial analysis in BigQuery

Although leading data warehouses already offer some support for spatial data, they usually cover only core spatial functionality and lack the advanced analytical capabilities required for many geospatial use cases. The CARTO Analytics Toolbox extends the geospatial capabilities of the most popular cloud data warehouses using spatial SQL, which means both easier integration and greater accessibility, given SQL’s widespread adoption within the spatial community. Our Analytics Toolbox already unlocks more than 60 advanced spatial functions through a set of User Defined Functions (UDFs) and procedures covering a broad range of spatial use cases, including data transformations, spatial indexing, and advanced functions for geocoding, clustering, route calculations, and more.

As part of the Advanced modules for Google BigQuery, we have now added support for Geographically Weighted Regression (GWR), a statistical regression method that models the local (e.g. regional or sub-regional) relationships between a set of predictor variables and an outcome of interest.

Suppose we have data across the whole UK on the number of crimes per area, together with associated variables (such as the unemployed active population or the level of urbanity of the area), and we want to model the number of crimes as a function of those variables. The output of a standard regression model would be a set of parameter estimates, each reflecting the relationship between the number of crimes and a particular attribute. The parameter estimates in this scenario are global statistics: they describe the average relationship for the whole of the UK, which is assumed to be a good representation of any local relationship across the territory.

However, the global relationship might hide local differences, including contrasting relationships in different parts of the study area that tend to cancel each other out at the global level. In other words, the relationship between the variables may have particularities in specific areas of the country (e.g. Central London, the Manchester area, rural areas) that are lost if we only study the behaviour in aggregate.
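To make the cancellation effect concrete, here is a minimal Python sketch on synthetic data. It is not the Analytics Toolbox implementation: the `gwr_coefficients` helper, the Gaussian kernel, the bandwidth value, and the data are all assumptions chosen for illustration. The predictor has a positive effect in the "west" of the study area and a negative effect in the "east", so a single global fit reports a slope near zero, while location-specific weighted fits recover both local relationships.

```python
import numpy as np


def gwr_coefficients(coords, X, y, bandwidth):
    """Fit one weighted least-squares model per location.

    Illustrative helper (not a CARTO function). Observations are weighted
    by a Gaussian kernel of their distance to the target location.
    Returns an array of shape (n_locations, 1 + n_features): intercept first.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    betas = np.empty((n, A.shape[1]))
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        weights = np.exp(-0.5 * (dist / bandwidth) ** 2)
        sw = np.sqrt(weights)
        # Weighted least squares via ordinary least squares on scaled rows.
        betas[i], *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return betas


# Synthetic study area: 200 locations along a west-east line.
rng = np.random.default_rng(0)
n = 200
coords = np.column_stack([np.linspace(-10, 10, n), np.zeros(n)])
x = rng.normal(size=n)

# Assumed ground truth: slope +2 in the west, -2 in the east.
true_slope = np.where(coords[:, 0] < 0, 2.0, -2.0)
y = 1.0 + true_slope * x + rng.normal(scale=0.1, size=n)

# Global model: one set of coefficients for the whole area.
A = np.column_stack([np.ones(n), x])
global_beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# Local models: one set of coefficients per location.
local_betas = gwr_coefficients(coords, x[:, None], y, bandwidth=1.5)

print("global slope:", global_beta[1])
print("local slope, far west:", local_betas[0, 1])
print("local slope, far east:", local_betas[-1, 1])
```

The bandwidth controls how local each fit is: a small value uses only nearby observations, while a very large one makes every local fit converge to the global model.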

                  
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 960401.