Our study focused on the factors and their indirect effects on the utilization of provincial health services using the data set of 31 provinces in China from 2010 to 2020. The data in our study was derived from the China Statistical Yearbook 2010-2021. . Referring to the influencing factors of Anderson’s model, we construct the proxy framework for health service utilization associated with predisposing, enabling, and need factors (Fig. 1). The analysis software used Arc GIS 10.2 and STATA 16.0.

### Dependent variable

The dependent variables of this study were the provincial health services, including the annual rate of hospitalization of residents and the average number of outpatient visits per year.

### Independent variable

The independent variable was mainly taken into account from the Anderson model, the literature, and the data available in statistical yearbooks.

#### Predisposing factors

For predisposing factors, we suggest that age, gender, race/ethnicity, education, and the composition of the family structure may have an impact on the utilization of health services. [14, 15, 24,25,26]. Specifically, it was divided into 5 proxies: (i) Sex ratio (SR, %), which reflects the composition by sex. (ii) Rate of minority ethnic groups (EMGR, %), which reflects the composition of minority ethnic groups. (iii) Proportion aged 65 (P65, %), which reflects the composition by age. (iv) Illiteracy rate (IR, %), which reflects the composition of education. (v) Average Household Size (AHS), which reflects the size of the family.

#### favorable factors

In this component, we include 4 proxies: (i) Gross Domestic Product per capita (GDP per capita), which reflects the economic contribution or value creation of each resident to their country or region. (ii) Urbanization rate (UR, %), which reflects the proportion of the urban population in the total population. (iii) Percentage of affiliates to health insurance (MIP, %), which reflects the level of the population that enjoys health insurance. (iv) Health resources (HR), which includes a) Number of medical technical personnel in health institutions for every 10,000 people. b) Number of doctors in health institutions per 10,000 inhabitants. c) Number of registered nurses in health institutions per 10,000 people. d) Number of beds in health institutions for every 10,000 people. For the above four variables, we use principal component analysis (PCA) to convert them to an index of health resources.

#### Need factors

We use two indicators: (i) Life expectancy (LE), which is the total number of years that a group of people at birth can expect to live given an existing age-specific mortality rate [31]. (ii) Perinatal mortality (PM), which is the level of perinatal infant mortality.

### Data analysis

#### Descriptive and exploratory analysis of spatial data

This paper first described the spatial distribution of the annual rate of resident hospitalization and the average number of outpatient visits per year and calculated the growth rate from 2010 to 2020. We used the local Moran I statistic to examine the spatial autocorrelation between the rate annual hospitalization of residents and the average number of outpatient visits per year. We then computed descriptive statistics (observations, means, standard deviations, minimum, maximum, and VIF) to describe the predisposing factors, enabling factors, need factors, and utilization variables for health services in 2010-2020.

#### Dashboard Data Models

Based on the framework for the utilization of health services, we first established a general panel regression model to analyze the impact of various factors influencing the use of provincial health services, the model is represented as follows :

$${UHS}_{\mathrm{it}}=\mathrm{c}+{a}_{1}{\mathrm{PF}}_{\mathrm{it}}+{a}_{2} {EF}_{\mathrm{it}}+{a}_{3}{NF}_{\mathrm{it}}+{\mu }_{i}+{\mu }_{t}+{ \phi }_{it}$$

(1)

where i denotes the province, t represents the year, UHS represents health service utilization as measured by the hospitalization rate and the average number of outpatient visits per year. In each region, PF represents the predisposing factor, EF represents the enabling factors, NF represents the need factors, \(\mu_i\) denotes the individual (spatial) effect, and \(\mu_t\) denotes the effect of time respectively, which could be fixed or random effects. \(\varfi\) is the error term.

#### spatial models

Based on the Anderson model, we use the spatial panel data model to study the factors influencing the utilization of provincial health services in China from 2010 to 2020. First, we examine the normality of the annual hospitalization rate of residents and the average number of outpatients. visits per year using the Shapiro-Wilk (SW) test. Subsequently, based on the panel regression model, we established three spatial regression models: i) the spatial panel autoregressive regression model (SPAR: adding the spatially lagged dependent variable); ii) the spatial panel error model (SPEM: adding the spatially delayed error term); iii) the spatial panel Durbin model (SPDM: adding both the spatially lagged dependent variable and the spatially lagged error term).

The SPAR takes the following form:

$${UHS}_{\mathrm{it}}=\mathrm{c}+\rho \sum {W}_{\mathrm{ij}}\times {UHS}_{\mathrm{it}}+{ a}_{1}{\mathrm{PF}}_{\mathrm{it}}+{a}_{2}{EF}_{\mathrm{it}}+{a}_{3}{NF }_{\mathrm{it}}+{\mu }_{i}+{\mu }_{t}+{\varphi }_{it}$$

(2)

where W denotes the spatial weight matrix, we consider the binary geographic unit matrix = 1 if i and j are neighbors and 0 otherwise. Under the SPAR model framework, the use of health services in a province will be affected by the independent variables of the surrounding provinces.

The SPEM fully considers the spatial correlation of the error terms. The SPEM is established as follows:

$${UHS}_{\mathrm{it}}=\mathrm{c}+{a}_{1}{PF}_{\mathrm{it}}+{a}_{2}{EF}_ {\mathrm{it}}+{a}_{3}{NF}_{\mathrm{it}}+{\mu }_{i}+{\mu }_{t}+{\varphi }_ {it} {\varphi }_{it}=\sum {W}_{ij}{\varphi }_{it}+{\varepsilon }_{it}$$

(3)

\(\varepsilon_{it}\) denotes the error term.

$${UHS}_{\mathrm{it}}=\mathrm{c}+{UHS}_{\mathrm{i}(\mathrm{t}-1)}+\rho \sum {W}_{ \mathrm{ij}}\times {UHS}_{\mathrm{it}}+{aX}_{\mathrm{it}}+\sum {W}_{\mathrm{ij}}\times {X} \mathrm{it}}\theta +{\mu }_{i}+{\mu }_{t}+{\phi }_{it}$$

(4)

\(\theta\) quantifies the spatial spillover effects of spatially lagged independent variables. X denotes predisposing factors, enabling factors and need factors, \(\sum{W_{ij}}\times\text{X}_\text{it}\) which represents the spatial lag term of the independent variables.

SPDM considers the influence of both spatial delay dependent variables and spatial delay independent variables. Therefore, we performed the Lagrange multiplier (LM) test to test the rationality of the spatial panel model construction of the SPAR, SPEM, and SPDM models. and subsequently performed the Wald test and the likelihood ratio (LR) test to evaluate the fit of the model. Then, we first used the LM test (Lagrange multiplier test), and two dependent variables found it appropriate to choose both the SPEM model and the SPAR model, so we initially selected the SPDM model that combines the two (Table 1).

Hausman’s test for the dependent variable of resident hospitalization rate and outpatient visits demonstrated the rationality of using spatial (individual) and time fixed effects models. Table 2 concludes that SPDM could not be simplified to SPAR or SPEM for the dependent variable resident hospitalization rate. Meanwhile, for the dependent variable of the average number of outpatient visits per year, SPDM could also not be simplified to SPAR or SPEM.