Geographic Correlation and Oceanic Tools

Gaussian process covariance captures spatial correlation between island societies.

bayesian

regression

gaussian-processes

models

We extend the oceanic tools Poisson regression with a custom GP covariance matrix that models geographic correlation between societies. The inverse-square distance kernel reveals how nearby islands share tool-making knowledge.

Published

May 28, 2025

We read back the Oceanic tools data with geographic coordinates and a distance matrix to model spatial correlation between island societies using a Gaussian Process covariance structure.

%matplotlib inline
import numpy as np
import scipy as sp
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
import seaborn as sns
sns.set_style("whitegrid")
sns.set_context("poster")
import pymc as pm
import arviz as az

Reading in our data

We read back the Oceanic tools data

df = pd.read_csv("data/Kline2.csv", sep=';')
df.head()

	culture	population	contact	total_tools	mean_TU	lat	lon	lon2	logpop
0	Malekula	1100	low	13	3.2	-16.3	167.5	-12.5	7.003065
1	Tikopia	1500	low	22	4.7	-12.3	168.8	-11.2	7.313220
2	Santa Cruz	3600	low	24	4.0	-10.7	166.0	-14.0	8.188689
3	Yap	4791	high	43	5.0	9.5	138.1	-41.9	8.474494
4	Lau Fiji	7400	high	33	5.0	-17.7	178.1	-1.9	8.909235

And center it

df['logpop_c'] = df.logpop - df.logpop.mean()

df.head()

	culture	population	contact	total_tools	mean_TU	lat	lon	lon2	logpop	logpop_c
0	Malekula	1100	low	13	3.2	-16.3	167.5	-12.5	7.003065	-1.973939
1	Tikopia	1500	low	22	4.7	-12.3	168.8	-11.2	7.313220	-1.663784
2	Santa Cruz	3600	low	24	4.0	-10.7	166.0	-14.0	8.188689	-0.788316
3	Yap	4791	high	43	5.0	9.5	138.1	-41.9	8.474494	-0.502510
4	Lau Fiji	7400	high	33	5.0	-17.7	178.1	-1.9	8.909235	-0.067769

And read in the distance matrix

dfd = pd.read_csv("data/distmatrix.csv", header=None)
dij=dfd.values
dij

array([[0.   , 0.475, 0.631, 4.363, 1.234, 2.036, 3.178, 2.794, 1.86 ,
        5.678],
       [0.475, 0.   , 0.315, 4.173, 1.236, 2.007, 2.877, 2.67 , 1.965,
        5.283],
       [0.631, 0.315, 0.   , 3.859, 1.55 , 1.708, 2.588, 2.356, 2.279,
        5.401],
       [4.363, 4.173, 3.859, 0.   , 5.391, 2.462, 1.555, 1.616, 6.136,
        7.178],
       [1.234, 1.236, 1.55 , 5.391, 0.   , 3.219, 4.027, 3.906, 0.763,
        4.884],
       [2.036, 2.007, 1.708, 2.462, 3.219, 0.   , 1.801, 0.85 , 3.893,
        6.653],
       [3.178, 2.877, 2.588, 1.555, 4.027, 1.801, 0.   , 1.213, 4.789,
        5.787],
       [2.794, 2.67 , 2.356, 1.616, 3.906, 0.85 , 1.213, 0.   , 4.622,
        6.722],
       [1.86 , 1.965, 2.279, 6.136, 0.763, 3.893, 4.789, 4.622, 0.   ,
        5.037],
       [5.678, 5.283, 5.401, 7.178, 4.884, 6.653, 5.787, 6.722, 5.037,
        0.   ]])

Implementing the simple tools:logpop model and varying intercepts models

import pytensor.tensor as pt

with pm.Model() as m2c_onlyp:
    betap = pm.Normal("betap", 0, 1)
    alpha = pm.Normal("alpha", 0, 10)
    loglam = alpha + betap*df.logpop_c
    y = pm.Poisson("ntools", mu=pt.exp(loglam), observed=df.total_tools)
    trace2c_onlyp = pm.sample(6000, tune=1000)

Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [betap, alpha]

/Users/rahul/Library/Caches/uv/archive-v0/aTiHGxSE8gD8G3bEQyxJO/lib/python3.14/site-packages/rich/live.py:260: 
UserWarning: install "ipywidgets" for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')

Sampling 4 chains for 1_000 tune and 6_000 draw iterations (4_000 + 24_000 draws total) took 2 seconds.

az.summary(trace2c_onlyp)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
betap	0.239	0.031	0.179	0.297	0.0	0.0	20521.0	16397.0	1.0
alpha	3.478	0.058	3.372	3.588	0.0	0.0	20922.0	16520.0	1.0

Notice that \(\beta_P\) has a value around 0.24

We also implement the varying intercepts per society model from before

with pm.Model() as m3c:
    betap = pm.Normal("betap", 0, 1)
    alpha = pm.Normal("alpha", 0, 10)
    sigmasoc = pm.HalfCauchy("sigmasoc", 1)
    alphasoc = pm.Normal("alphasoc", 0, sigmasoc, shape=df.shape[0])
    loglam = alpha + alphasoc + betap*df.logpop_c 
    y = pm.Poisson("ntools", mu=pt.exp(loglam), observed=df.total_tools)
with m3c:
    trace3 = pm.sample(6000, tune=1000, target_accept=0.95)

Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [betap, alpha, sigmasoc, alphasoc]

/Users/rahul/Library/Caches/uv/archive-v0/aTiHGxSE8gD8G3bEQyxJO/lib/python3.14/site-packages/rich/live.py:260: 
UserWarning: install "ipywidgets" for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')

Sampling 4 chains for 1_000 tune and 6_000 draw iterations (4_000 + 24_000 draws total) took 5 seconds.
There were 10 divergences after tuning. Increase `target_accept` or reparameterize.

az.summary(trace3)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
betap	0.261	0.081	0.107	0.416	0.001	0.001	9947.0	9891.0	1.0
alpha	3.443	0.120	3.211	3.667	0.001	0.001	7646.0	8098.0	1.0
alphasoc[0]	-0.201	0.241	-0.665	0.243	0.002	0.002	12562.0	11942.0	1.0
alphasoc[1]	0.042	0.221	-0.367	0.478	0.002	0.002	11734.0	10870.0	1.0
alphasoc[2]	-0.048	0.195	-0.418	0.331	0.002	0.002	12818.0	12931.0	1.0
alphasoc[3]	0.325	0.189	-0.025	0.678	0.002	0.002	7467.0	9482.0	1.0
alphasoc[4]	0.044	0.176	-0.281	0.391	0.002	0.001	12480.0	11816.0	1.0
alphasoc[5]	-0.317	0.208	-0.713	0.049	0.002	0.002	11430.0	12974.0	1.0
alphasoc[6]	0.145	0.171	-0.169	0.478	0.002	0.001	11496.0	11783.0	1.0
alphasoc[7]	-0.171	0.182	-0.533	0.151	0.002	0.001	12357.0	13019.0	1.0
alphasoc[8]	0.274	0.175	-0.042	0.613	0.002	0.001	9749.0	10910.0	1.0
alphasoc[9]	-0.093	0.288	-0.651	0.458	0.003	0.003	10202.0	9662.0	1.0
sigmasoc	0.308	0.127	0.087	0.545	0.002	0.001	4838.0	3874.0	1.0

A model with a custom covariance matrix

The assumption here now is that the intercepts for these various societies are correlated…

We use a custom covariance matrix which inverse-square weights distance

GP covariance matrix: squared-exponential kernel weighted by geographic distance, plus a noise term

You have seen this before! This is an example of a Gaussian Process Covariance Matrix.

Here is the complete model:

Full model specification: Poisson regression with GP-correlated society intercepts and distance-based covariance

with pm.Model() as mgc:
    betap = pm.Normal("betap", 0, 1)
    alpha = pm.Normal("alpha", 0, 10)
    etasq = pm.HalfCauchy("etasq", 1)
    rhosq = pm.HalfCauchy("rhosq", 1)
    means = pt.stack([0.0]*10)
    sigma_matrix = pt.diag(pt.as_tensor_variable([0.01]*10))
    cov = pt.exp(-rhosq*dij*dij)*etasq + sigma_matrix
    gammasoc = pm.MvNormal("gammasoc", means, cov=cov, shape=df.shape[0])
    loglam = alpha + gammasoc + betap*df.logpop_c 
    y = pm.Poisson("ntools", mu=pt.exp(loglam), observed=df.total_tools)

with mgc:
    mgctrace = pm.sample(10000, tune=2000, target_accept=0.95)

Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [betap, alpha, etasq, rhosq, gammasoc]

/Users/rahul/Library/Caches/uv/archive-v0/aTiHGxSE8gD8G3bEQyxJO/lib/python3.14/site-packages/rich/live.py:260: 
UserWarning: install "ipywidgets" for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')

Sampling 4 chains for 2_000 tune and 10_000 draw iterations (8_000 + 40_000 draws total) took 42 seconds.

az.summary(mgctrace)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
betap	0.246	0.115	0.028	0.469	0.001	0.001	10228.0	11850.0	1.0
alpha	3.509	0.354	2.861	4.149	0.006	0.013	4779.0	3911.0	1.0
gammasoc[0]	-0.268	0.455	-1.121	0.549	0.007	0.012	5484.0	4869.0	1.0
gammasoc[1]	-0.120	0.443	-0.968	0.670	0.007	0.012	5099.0	4180.0	1.0
gammasoc[2]	-0.164	0.428	-0.982	0.587	0.007	0.012	5138.0	4151.0	1.0
gammasoc[3]	0.300	0.385	-0.414	0.991	0.006	0.012	5388.0	4369.0	1.0
gammasoc[4]	0.032	0.379	-0.660	0.720	0.006	0.013	5340.0	4222.0	1.0
gammasoc[5]	-0.454	0.388	-1.163	0.214	0.006	0.012	5582.0	4762.0	1.0
gammasoc[6]	0.102	0.374	-0.585	0.773	0.006	0.013	5442.0	4361.0	1.0
gammasoc[7]	-0.258	0.376	-0.952	0.402	0.006	0.013	5448.0	4625.0	1.0
gammasoc[8]	0.239	0.357	-0.391	0.908	0.005	0.013	5642.0	4511.0	1.0
gammasoc[9]	-0.115	0.460	-0.978	0.770	0.005	0.009	8816.0	8235.0	1.0
etasq	0.352	0.590	0.000	1.007	0.007	0.058	6572.0	9485.0	1.0
rhosq	2.028	61.589	0.002	3.745	0.361	28.788	7949.0	11599.0	1.0

az.plot_trace(mgctrace);

az.plot_autocorr(mgctrace);

/Users/rahul/Library/Caches/uv/archive-v0/aTiHGxSE8gD8G3bEQyxJO/lib/python3.14/site-packages/arviz/plots/plot_utils.py:270: UserWarning: rcParams['plot.max_subplots'] (40) is smaller than the number of variables to plot (56) in plot_autocorr, generating only 40 plots
  warnings.warn(

d={}
gammasoc_samples = mgctrace.posterior['gammasoc'].values.reshape(-1, 10)
for i, v in enumerate(df.culture.values):
    d[v] = gammasoc_samples[:,i]
dfsamps=pd.DataFrame.from_dict(d)
dfsamps.head()

	Malekula	Tikopia	Santa Cruz	Yap	Lau Fiji	Trobriand	Chuuk	Manus	Tonga	Hawaii
0	-0.130590	0.089298	-0.181058	0.615596	0.044638	0.310527	0.318529	0.011540	0.522527	0.314769
1	-0.086934	0.260856	-0.154601	0.322845	0.349256	-0.795153	0.188721	-0.294910	0.368426	-0.112205
2	-0.032733	-0.008312	0.203633	0.255771	0.592531	-0.736068	0.244687	-0.314360	0.449234	-0.123681
3	-0.103811	-0.273942	0.047349	0.704714	0.157025	-0.358446	0.295830	0.189327	0.385959	0.165699
4	-0.138728	0.030216	-0.301498	0.259886	0.256591	-0.179903	0.237708	-0.223339	0.380631	0.238701

dfsamps.describe()

	Malekula	Tikopia	Santa Cruz	Yap	Lau Fiji	Trobriand	Chuuk	Manus	Tonga	Hawaii
count	40000.000000	40000.000000	40000.000000	40000.000000	40000.000000	40000.000000	40000.000000	40000.000000	40000.000000	40000.000000
mean	-0.267976	-0.119667	-0.164049	0.300261	0.031830	-0.453667	0.102345	-0.257916	0.238663	-0.115166
std	0.455163	0.443377	0.427740	0.384622	0.378590	0.388457	0.373908	0.376287	0.356762	0.459638
min	-5.289634	-4.423674	-4.465862	-3.658895	-3.881839	-4.573565	-3.968056	-4.391581	-3.535199	-4.035076
25%	-0.494601	-0.337898	-0.363096	0.115944	-0.143074	-0.637728	-0.066815	-0.431598	0.076133	-0.363911
50%	-0.236810	-0.093558	-0.132354	0.309491	0.046415	-0.418354	0.116564	-0.231101	0.248910	-0.108310
75%	-0.008959	0.128550	0.076217	0.503830	0.229287	-0.231597	0.297074	-0.055373	0.422726	0.138756
max	3.642086	3.495336	3.480449	3.841792	3.590994	2.914655	3.495762	3.064075	3.666528	3.068279

Plotting posteriors and predictives

Lets plot the covariance posteriors for the 100 random samples in the trace.

smalleta=np.random.choice(mgctrace.posterior['etasq'].values.flatten(), replace=False, size=100)
smallrho=np.random.choice(mgctrace.posterior['rhosq'].values.flatten(), replace=False, size=100)

with sns.plotting_context('poster'):
    d=np.linspace(0,10,100)
    for i in range(100):
        covarod = lambda d: smalleta[i]*np.exp(-smallrho[i]*d*d)
        plt.plot(d, covarod(d),alpha=0.1, color='k')
    medetasq=np.median(mgctrace.posterior['etasq'].values.flatten())
    medrhosq=np.median(mgctrace.posterior['rhosq'].values.flatten())
    covarodmed = lambda d: medetasq*np.exp(-medrhosq*d*d)
    plt.plot(d, covarodmed(d),alpha=1.0, color='k', lw=3)
    plt.ylim([0,1]);

The x-axis is thousands of kilometers. Notice how almost everything damps out by 4000 kms. Lets calculate the median correlation matrix:

medkij = np.diag([0.01]*10)+medetasq*(np.exp(-medrhosq*dij*dij))

#from statsmodels
def cov2corr(cov, return_std=False):
    '''convert covariance matrix to correlation matrix

    Parameters
    ----------
    cov : array_like, 2d
        covariance matrix, see Notes

    Returns
    -------
    corr : ndarray (subclass)
        correlation matrix
    return_std : bool
        If this is true then the standard deviation is also returned.
        By default only the correlation matrix is returned.

    Notes
    -----
    This function does not convert subclasses of ndarrays. This requires
    that division is defined elementwise. np.ma.array and np.matrix are allowed.

    '''
    cov = np.asanyarray(cov)
    std_ = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std_, std_)
    if return_std:
        return corr, std_
    else:
        return corr

medcorrij=cov2corr(medkij)
medcorrij

array([[1.00000000e+00, 8.69835262e-01, 8.11247323e-01, 4.34367364e-04,
        5.14947515e-01, 1.78423268e-01, 1.60816245e-02, 4.06297035e-02,
        2.35399591e-01, 2.09068998e-06],
       [8.69835262e-01, 1.00000000e+00, 9.15425265e-01, 8.36612901e-04,
        5.13920440e-01, 1.87081494e-01, 3.35913172e-02, 5.34275935e-02,
        2.00129903e-01, 1.20288892e-05],
       [8.11247323e-01, 9.15425265e-01, 1.00000000e+00, 2.31844903e-03,
        3.60868694e-01, 2.93087429e-01, 6.35978815e-02, 1.01103669e-01,
        1.16791052e-01, 7.22676264e-06],
       [4.34367364e-04, 8.36612901e-04, 2.31844903e-03, 1.00000000e+00,
        7.54894121e-06, 8.22482782e-02, 3.58611523e-01, 3.31644646e-01,
        2.34734649e-07, 8.62061048e-10],
       [5.14947515e-01, 5.13920440e-01, 3.60868694e-01, 7.54894121e-06,
        1.00000000e+00, 1.44642105e-02, 1.35724605e-03, 2.00050756e-03,
        7.53105751e-01, 6.19798877e-05],
       [1.78423268e-01, 1.87081494e-01, 2.93087429e-01, 8.22482782e-02,
        1.44642105e-02, 1.00000000e+00, 2.56872798e-01, 7.11581900e-01,
        2.08418262e-03, 1.62198197e-08],
       [1.60816245e-02, 3.35913172e-02, 6.35978815e-02, 3.58611523e-01,
        1.35724605e-03, 2.56872798e-01, 1.00000000e+00, 5.25753868e-01,
        8.98552376e-05, 1.26166302e-06],
       [4.06297035e-02, 5.34275935e-02, 1.01103669e-01, 3.31644646e-01,
        2.00050756e-03, 7.11581900e-01, 5.25753868e-01, 1.00000000e+00,
        1.69589545e-04, 1.11702234e-08],
       [2.35399591e-01, 2.00129903e-01, 1.16791052e-01, 2.34734649e-07,
        7.53105751e-01, 2.08418262e-03, 8.98552376e-05, 1.69589545e-04,
        1.00000000e+00, 3.35602210e-05],
       [2.09068998e-06, 1.20288892e-05, 7.22676264e-06, 8.62061048e-10,
        6.19798877e-05, 1.62198197e-08, 1.26166302e-06, 1.11702234e-08,
        3.35602210e-05, 1.00000000e+00]])

We’ll data frame it to see clearly

dfcorr = pd.DataFrame(medcorrij*100)
dfcorr.index = df.culture.tolist()
dfcorr.columns = df.culture.tolist()
dfcorr

	Malekula	Tikopia	Santa Cruz	Yap	Lau Fiji	Trobriand	Chuuk	Manus	Tonga	Hawaii
Malekula	100.000000	86.983526	81.124732	4.343674e-02	51.494751	17.842327	1.608162	4.062970	23.539959	2.090690e-04
Tikopia	86.983526	100.000000	91.542526	8.366129e-02	51.392044	18.708149	3.359132	5.342759	20.012990	1.202889e-03
Santa Cruz	81.124732	91.542526	100.000000	2.318449e-01	36.086869	29.308743	6.359788	10.110367	11.679105	7.226763e-04
Yap	0.043437	0.083661	0.231845	1.000000e+02	0.000755	8.224828	35.861152	33.164465	0.000023	8.620610e-08
Lau Fiji	51.494751	51.392044	36.086869	7.548941e-04	100.000000	1.446421	0.135725	0.200051	75.310575	6.197989e-03
Trobriand	17.842327	18.708149	29.308743	8.224828e+00	1.446421	100.000000	25.687280	71.158190	0.208418	1.621982e-06
Chuuk	1.608162	3.359132	6.359788	3.586115e+01	0.135725	25.687280	100.000000	52.575387	0.008986	1.261663e-04
Manus	4.062970	5.342759	10.110367	3.316446e+01	0.200051	71.158190	52.575387	100.000000	0.016959	1.117022e-06
Tonga	23.539959	20.012990	11.679105	2.347346e-05	75.310575	0.208418	0.008986	0.016959	100.000000	3.356022e-03
Hawaii	0.000209	0.001203	0.000723	8.620610e-08	0.006198	0.000002	0.000126	0.000001	0.003356	1.000000e+02

Notice how there is correlation in the upper left and with Manus and Trobriand. Mcelreath has a distance plot i reproduce below:

Pacific island societies: geographic locations (left) and total tools vs log population with regression fits (right)

To produce a plot like the one on the right, we calculate the posterior predictives with the correlation free part of the model and then overlay the correlations

from scipy.stats import poisson
def compute_pp_no_corr(lpgrid, idata, contact=0):
    alphatrace = idata.posterior['alpha'].values.flatten()
    betaptrace = idata.posterior['betap'].values.flatten()
    tl=len(alphatrace)
    gl=lpgrid.shape[0]
    lam = np.empty((gl, tl))
    lpgrid = lpgrid - lpgrid.mean()
    for i, v in enumerate(lpgrid):
        temp = alphatrace + betaptrace*lpgrid[i]
        lam[i,:] = poisson.rvs(np.exp(temp))
    return lam

lpgrid = np.linspace(6,13,30)
pp = compute_pp_no_corr(lpgrid, mgctrace)
ppmed = np.median(pp, axis=1)
pphpd = az.hdi(pp.T)

/var/folders/wq/mr3zj9r14dzgjnq9rjx_vqbc0000gn/T/ipykernel_99733/1140317622.py:4: FutureWarning: hdi currently interprets 2d data as (draw, shape) but this will change in a future release to (chain, draw) for coherence with other functions
  pphpd = az.hdi(pp.T)

import itertools
corrs={}
for i, j in itertools.product(range(10), range(10)):
    if i <j:
        corrs[(i,j)]=medcorrij[i,j]
corrs

{(0, 1): np.float64(0.8698352620543791),
 (0, 2): np.float64(0.8112473227705732),
 (0, 3): np.float64(0.00043436736396140004),
 (0, 4): np.float64(0.5149475145540022),
 (0, 5): np.float64(0.1784232683108405),
 (0, 6): np.float64(0.016081624526447357),
 (0, 7): np.float64(0.04062970353128279),
 (0, 8): np.float64(0.23539959071681077),
 (0, 9): np.float64(2.0906899790788574e-06),
 (1, 2): np.float64(0.9154252648017558),
 (1, 3): np.float64(0.0008366129005128546),
 (1, 4): np.float64(0.5139204403538437),
 (1, 5): np.float64(0.18708149396439477),
 (1, 6): np.float64(0.03359131718569693),
 (1, 7): np.float64(0.053427593503310396),
 (1, 8): np.float64(0.20012990301592104),
 (1, 9): np.float64(1.202888923895069e-05),
 (2, 3): np.float64(0.0023184490292404336),
 (2, 4): np.float64(0.3608686936824079),
 (2, 5): np.float64(0.29308742903236046),
 (2, 6): np.float64(0.06359788152183848),
 (2, 7): np.float64(0.10110366922210785),
 (2, 8): np.float64(0.11679105204171875),
 (2, 9): np.float64(7.2267626429193134e-06),
 (3, 4): np.float64(7.5489412116086325e-06),
 (3, 5): np.float64(0.08224827819593294),
 (3, 6): np.float64(0.35861152292486287),
 (3, 7): np.float64(0.33164464636786223),
 (3, 8): np.float64(2.347346485820119e-07),
 (3, 9): np.float64(8.620610481644896e-10),
 (4, 5): np.float64(0.014464210496029285),
 (4, 6): np.float64(0.0013572460520712762),
 (4, 7): np.float64(0.002000507564221195),
 (4, 8): np.float64(0.7531057509569615),
 (4, 9): np.float64(6.197988773284955e-05),
 (5, 6): np.float64(0.25687279780700056),
 (5, 7): np.float64(0.7115819000189696),
 (5, 8): np.float64(0.0020841826197110433),
 (5, 9): np.float64(1.621981972350849e-08),
 (6, 7): np.float64(0.5257538678112668),
 (6, 8): np.float64(8.985523762623615e-05),
 (6, 9): np.float64(1.2616630242151445e-06),
 (7, 8): np.float64(0.00016958954461546093),
 (7, 9): np.float64(1.1170223361457242e-08),
 (8, 9): np.float64(3.3560220950963584e-05)}

with sns.plotting_context('poster'):
    plt.plot(df.logpop, df.total_tools,'o', color="g")
    lpv = df.logpop.values
    ttv = df.total_tools.values
    for a,x,y in zip(df.culture.values, lpv, ttv):
        plt.annotate(a, xy=(x,y))
    for i in range(10):
        for j in range(10):
            if i < j:
                plt.plot([lpv[i],lpv[j]],[ttv[i], ttv[j]],'k', alpha=corrs[(i,j)]/1.)
    plt.plot(lpgrid, ppmed, color="g")
    plt.fill_between(lpgrid, pphpd[:,0], pphpd[:,1], color="g", alpha=0.2, lw=1)
    plt.ylim([10, 75])
    plt.xlim([6.5, 13])

Notice how distance probably pulls Fiji up from the median, and how Manus and Trobriand are below the median but highly correlated. A smaller effect can be seen with the triangle on the left. Of-course, causality is uncertain