The Alcohol-Income Puzzle: An Econometric Analysis

The following paper is a collaborative research project involving data from the National Longitudinal Survey of Youth 1979. The statistical analysis and write-up were done in collaboration with 8 classmates.

Abstract

The “alcohol/income puzzle” refers to the positive correlation between drinking and wages reported in a number of studies. While most existing research indicates a positive correlation between the two, some researchers have found evidence against it. In this paper, we sought to contribute to that literature. Taking an industry-specific approach, we examined whether the number of days an individual drinks per month is associated with their income. To examine this relationship, we used the National Longitudinal Survey of Youth 1979 dataset from the U.S. Bureau of Labor Statistics. We performed a multivariate OLS regression of log wages on days drinking per month, with interactions between drinking and three selected industries (construction, retail, and professional services), controlling for both demographic and geographic variables. The relationship between drinking, wages, and industry did not produce any statistically significant results, so we were unable to reject our null hypothesis that days drinking does not have an effect on wages. The point estimates do, however, differ across industries. Specifically, the estimated return to each additional day of drinking is 0.13% higher for construction workers than for workers outside the three selected industries. The return is 0.44% lower for professional service employees than for that same comparison group. Finally, the return is 0.07% higher for retail workers than for that comparison group.

Introduction

The “alcohol/income puzzle” describes the possibly surprising result, found in several studies, that drinking is correlated with higher wages. Auld found that, all else equal, moderate drinking is associated with 10% higher wages, and heavy drinking is associated with 12% higher wages, compared to those who do not drink at all. Berger and Leigh found that nondrinkers earn far less than those who do drink: 12.8% less for males and 25.5% less for females. Some evidence suggests that problem drinking (alcoholism) is detrimental to labor force participation and therefore results in lower incomes, but this finding is disputed.

Using the National Longitudinal Survey of Youth (NLSY) 1979 data set, we explored this “alcohol/income puzzle”: specifically, we examined whether the number of days an individual drinks per month is associated with their income. Our analysis focused on three industries: construction, professional and related services, and retail. We hypothesized that drinking would benefit workers’ wages in the professional and related services industry but have no significant effect in construction and retail. To test our hypothesis, we performed a multivariate OLS regression with interactions between a measure of alcohol consumption and each industry. To obtain sound estimates (necessary for causal inference), we controlled for confounding variables and dropped incomplete observations: our intention was to isolate the relationship between wages, drinking, and industry as cleanly as possible.

Data

Using the NLSY data set, we constructed a series of dummy variables covering drinking quantity, gender, employment, race, religion, region, industry, and state of residence. Notably, high_drink = 1 if an individual consumed alcohol on more than 10 of the past 30 days, emp = 1 if an individual was employed, and female = 1 if the individual was female. Our race dummy variables were racedum1, racedum2, and racedum3 for Hispanic, Black, and other_race (anyone not captured by the Hispanic or Black variables) respectively. Our dummy variable rel identified individuals who were religious at the time of data collection, with rel = 1. Our region dummy variables were regdum1, regdum2, regdum3, and regdum4 for northeast, northcentral, south, and west respectively. The industry dummies construct, prof, and retail represented the construction, professional and related services, and retail trade industries, respectively. We used lnwg, the natural log of an individual’s wage, as our dependent (left-hand-side) variable. As our base independent (right-hand-side) variable, we used days, the number of days an individual drank per month. Finally, we created three interaction variables: days_const, days_prof, and days_retail. These represented the interactions between days drinking and each industry of interest, and served as our main independent variables. The means and standard deviations of all variables used in our analysis are displayed in Table 1 (Table of Means).
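The variable construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the raw field names (sex, industry), and the helper add_indicators are hypothetical and are not taken from the NLSY files or from the code used for the paper. Only the derived names (high_drink, female, construct, prof, retail, days_const, days_prof, days_retail) follow the text.

```python
# Hypothetical respondent records; raw field names and values are illustrative only.
respondents = [
    {"days": 12, "industry": "construction", "sex": "male"},
    {"days": 3, "industry": "professional", "sex": "female"},
    {"days": 0, "industry": "retail", "sex": "female"},
    {"days": 20, "industry": "other", "sex": "male"},
]


def add_indicators(row):
    """Return a copy of row with the dummy and interaction variables added."""
    out = dict(row)
    # high_drink = 1 if the person drank on more than 10 of the past 30 days.
    out["high_drink"] = 1 if row["days"] > 10 else 0
    out["female"] = 1 if row["sex"] == "female" else 0
    # Industry dummies; anyone outside the three industries has all three equal to 0.
    out["construct"] = 1 if row["industry"] == "construction" else 0
    out["prof"] = 1 if row["industry"] == "professional" else 0
    out["retail"] = 1 if row["industry"] == "retail" else 0
    # Interactions: days drinking multiplied by each industry dummy.
    out["days_const"] = out["days"] * out["construct"]
    out["days_prof"] = out["days"] * out["prof"]
    out["days_retail"] = out["days"] * out["retail"]
    return out


coded = [add_indicators(r) for r in respondents]
for row in coded:
    print(row["days"], row["high_drink"], row["days_const"], row["days_prof"], row["days_retail"])
```

Because each interaction is zero outside its own industry, a worker in none of the three industries contributes only through days itself, which is what makes that group the comparison category.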

Methods

Our multivariate regression uses lnwg as the dependent (left-hand-side) variable and days_const, days_prof, and days_retail as the independent variables of interest. The econometric model, which also contains controls, is shown below:

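\ln wg_{ig} = b_0 + b_1\, days_{ig} + \delta\, D^{ind}_{ig} + \gamma\, (D^{ind}_{ig} \times days_{ig}) + z\, Controls_{ig} + u_{ig}

where D^{ind}_{ig} represents the dummy variables for the selected industries (construct, prof, and retail), D^{ind}_{ig} \times days_{ig} are the interaction variables of the industry dummies and the drinking variable (days_const, days_prof, and days_retail), i indexes individuals, and g indexes the cluster. Because there are three selected industries, \delta and \gamma each stand for three coefficients, one per industry.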
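One way to read the interaction terms, assuming the specification above with one interaction coefficient per selected industry (written here as \gamma_k, which is a notational assumption and not the paper's own notation), is through the implied slope of log wages with respect to days:

```latex
\frac{\partial \ln wg}{\partial\, days} =
\begin{cases}
b_1 + \gamma_k & \text{if the worker is in selected industry } k \\
b_1            & \text{if the worker is in none of the selected industries}
\end{cases}
```

Under this reading, \gamma_k is the difference in slope between industry k and the comparison group, not the slope for industry k itself.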

A balance test allows us to check the independence assumption, E[u|x] = 0 (that the error term is unrelated to the independent variable), by examining relationships between potential control variables and our independent variable, days. To identify the appropriate controls for isolating the relationship between drinking and wages, we constructed a balance test by regressing days on potential controls using the model shown below:

days_i = \theta_0 + \gamma\, Controls_i + \nu_i

We used this model instead of regressing each potential control on days because it allowed us to test our potential control variables jointly, and thus to analyze the relationship between each control variable and days while holding the other potential controls constant. A statistically significant coefficient on any of these control variables would indicate a significant relationship between that control variable and our days variable. We included such variables as controls in our multivariate regression of lnwg on days in order to avoid omitted variable bias.
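The balance-test logic can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the data are simulated, the candidate control called unrelated is hypothetical, the 1.96 cutoff is a normal approximation, and the ols helper is a plain normal-equations solver, not the software used for the paper.

```python
import random


def ols(X, y):
    """OLS via the normal equations; returns (coefficients, standard errors)."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | I], reduced by Gauss-Jordan to [I | (X'X)^-1].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    inv = [row[k:] for row in A]
    Xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    beta = [sum(inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][j] * beta[j] for j in range(k)) for r in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # homoskedastic error variance
    se = [(sigma2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se


# Simulated data: days depends on female and rel, but not on "unrelated".
rng = random.Random(0)
names = ["const", "female", "rel", "unrelated"]
X, days = [], []
for _ in range(400):
    female, rel, unrelated = rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1)
    X.append([1.0, float(female), float(rel), float(unrelated)])
    days.append(8.0 - 3.0 * female - 0.5 * rel + rng.gauss(0, 1))

beta, se = ols(X, days)
# Keep as controls the candidates whose coefficient is significant at roughly 5%.
flagged = [nm for nm, b, s in zip(names[1:], beta[1:], se[1:]) if abs(b / s) > 1.96]
print(flagged)
```

Because all candidates enter one regression, each coefficient is estimated holding the other candidates constant, which is the reason given above for regressing days on the controls jointly.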

Table 2 (Balance Test) includes only the variables with statistically significant coefficients. We found significant negative relationships for female (gender), religion, weight, and high school graduate, among others. We also found significant positive relationships, including height and afqtrev (Armed Forces Qualification Test score). Some of these relationships are intuitive: for example, religiousness is associated with fewer days drinking per month, and several religions discourage alcohol consumption. Other relationships, including the negative correlation between weight and days and the positive correlation between afqtrev and days, had less intuitive explanations. Nonetheless, all of the variables that were significantly correlated with the number of days drinking were used as controls in our multivariate regression to satisfy the independence assumption critical for causal estimates.

To correct for biased standard errors due to correlated errors, we used a cluster correction, which also applies robust standard errors to correct for bias from heteroskedasticity. Errors are correlated when observations in a data set are related to one another. For instance, observations from one household would likely share strong similarities in socioeconomic background. Accordingly, these observations should not be treated as completely independent from one another. We therefore performed three separate regressions: first without clusters, second clustering on household ID, and finally clustering on state. In our analysis of the results, we compared the effects of the cluster corrections on our standard errors.
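To make the cluster correction concrete, the sketch below computes conventional, heteroskedasticity-robust, and cluster-robust standard errors for the slope of a simple two-variable regression. It is a simplified illustration, not the procedure used for Table 3: the data and household identifiers are hypothetical, there are no controls, and the robust and clustered formulas are the basic versions without small-sample adjustments (statistical packages usually apply such adjustments, so their numbers would differ slightly).

```python
from collections import defaultdict


def slope_with_ses(x, y, cluster):
    """Bivariate OLS slope with conventional, robust, and cluster-robust SEs."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    xd = [xi - xbar for xi in x]  # demeaned regressor
    sxx = sum(d * d for d in xd)
    slope = sum(d * (yi - ybar) for d, yi in zip(xd, y)) / sxx
    resid = [(yi - ybar) - slope * d for d, yi in zip(xd, y)]
    # Conventional SE: assumes independent, equal-variance errors.
    conventional = (sum(e * e for e in resid) / (n - 2) / sxx) ** 0.5
    # Robust SE: lets each observation have its own error variance.
    robust = (sum((d * e) ** 2 for d, e in zip(xd, resid)) / sxx ** 2) ** 0.5
    # Cluster SE: sums the scores within each cluster before squaring,
    # so correlated errors inside a cluster are not treated as independent.
    scores = defaultdict(float)
    for d, e, g in zip(xd, resid, cluster):
        scores[g] += d * e
    clustered = (sum(s * s for s in scores.values()) / sxx ** 2) ** 0.5
    return slope, conventional, robust, clustered


# Hypothetical data: days drinking, log wage, and a household identifier.
days = [0, 2, 5, 10, 12, 20, 25, 30]
lnwg = [9.1, 9.0, 9.4, 9.3, 9.8, 9.7, 10.1, 10.0]
household = [1, 1, 2, 2, 3, 3, 4, 4]

slope, se_plain, se_robust, se_cluster = slope_with_ses(days, lnwg, household)
print(f"slope={slope:.4f} plain={se_plain:.4f} robust={se_robust:.4f} cluster={se_cluster:.4f}")
```

The slope does not depend on the clustering choice; only the standard error changes. When every observation is its own cluster, the clustered formula reduces to the robust one.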

Additionally, for each industry we graphed a regression of lnwg on days drinking per 30 days, with the number of days drinking per month on the x-axis and the natural log of wages (lnwg) on the y-axis. The data were collapsed by number of days drinking to give the mean lnwg for each number of days, and each point was weighted by frequency (the number of observations it represented). Table 1 (Table of Means) features the average values of the dependent and independent variables. The construction industry graphs, Graphs 1 and 2, display a positive slope of 0.005, which is small relative to the average value of about 9.8. The professional and related services industry graphs, Graphs 3 and 4, display a positive slope of 0.016, which is small relative to the average value of about 10. The retail industry graphs, Graphs 5 and 6, display a positive slope of 0.015, which is small relative to the average value of about 9.4. No controls were included in these regressions.
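The collapse-and-weight step can be sketched as follows. The data are hypothetical and the helper names collapse and weighted_line are introduced only for this illustration; the sketch covers the fitted line, not the plotting.

```python
from collections import defaultdict


def collapse(days, lnwg):
    """Collapse to one row per value of days: (days, mean lnwg, count)."""
    groups = defaultdict(list)
    for d, w in zip(days, lnwg):
        groups[d].append(w)
    return [(d, sum(ws) / len(ws), len(ws)) for d, ws in sorted(groups.items())]


def weighted_line(points):
    """Frequency-weighted least-squares line through (x, y, weight) points."""
    wsum = sum(w for _, _, w in points)
    xbar = sum(w * x for x, _, w in points) / wsum
    ybar = sum(w * y for _, y, w in points) / wsum
    slope = (sum(w * (x - xbar) * (y - ybar) for x, y, w in points)
             / sum(w * (x - xbar) ** 2 for x, _, w in points))
    return ybar - slope * xbar, slope  # (intercept, slope)


# Hypothetical individual-level data for one industry.
days = [0, 0, 0, 5, 5, 10, 10, 10, 30]
lnwg = [9.0, 9.2, 9.1, 9.4, 9.6, 9.5, 9.7, 9.6, 10.0]

collapsed = collapse(days, lnwg)
intercept, slope = weighted_line(collapsed)
print(collapsed)
print(f"intercept={intercept:.4f} slope={slope:.4f}")
```

Weighting each collapsed point by its frequency gives the same fitted line as running the regression on the individual observations, so collapsing changes what is plotted without changing the slope.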

Results

Across the three regressions (non-clustered, state-clustered, and household-clustered), the coefficient estimates were identical and only the standard errors varied, as shown in Table 3 (Table of Results). The coefficients on the interaction variables days_const, days_prof, and days_retail were 0.00133, -0.00444, and 0.000673 respectively. Each coefficient represents the difference in the wage return to an additional day of drinking in a 30-day period between that industry and the comparison group (workers outside the three selected industries), holding all else constant. The return to each additional day of drinking was 0.13% higher for construction workers than for the comparison group. The return was 0.44% lower for professional service employees. Finally, the return was 0.07% higher for retail workers, consistent with the positive coefficient of 0.000673.
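The percentages quoted here follow from the Table 3 coefficients. Because the dependent variable is in logs, a coefficient b corresponds to a change of 100 × (exp(b) − 1) percent, which for values this small is very close to 100 × b. The short sketch below applies that conversion. The coefficients are copied from Table 3; the implied within-industry slope (the days coefficient plus the interaction) is an additional calculation that the paper itself does not report, and the names pct and results are illustrative.

```python
import math

# Coefficients copied from Table 3.
b_days = 0.00296
interactions = {"construction": 0.00133, "professional": -0.00444, "retail": 0.000673}


def pct(log_points):
    """Convert a log-wage coefficient into a percent change in wages."""
    return 100 * (math.exp(log_points) - 1)


results = {}
for industry, gamma in interactions.items():
    results[industry] = {
        # Difference in the return to a day of drinking vs. the comparison group.
        "differential_pct": pct(gamma),
        # Implied total return within the industry: base slope plus interaction.
        "implied_slope_pct": pct(b_days + gamma),
    }

for industry, r in results.items():
    print(f"{industry}: differential {r['differential_pct']:+.2f}%, "
          f"implied slope {r['implied_slope_pct']:+.2f}%")
```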

None of these interaction coefficients is statistically significant at conventional thresholds. Therefore, we cannot reject our null hypothesis that days drinking across different industries does not have an impact on individual wages.
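The lack of significance can be checked directly from Table 3 by dividing each interaction coefficient by its standard error. The sketch below uses the state-clustered standard errors and a two-sided normal approximation for the p-values; the normal approximation and the variable names are assumptions of this illustration and are not a description of how the significance stars in Table 3 were computed.

```python
import math

# (coefficient, state-clustered standard error) copied from Table 3.
estimates = {
    "days_const": (0.00133, 0.00493),
    "days_prof": (-0.00444, 0.00462),
    "days_retail": (0.000673, 0.00483),
}

t_stats = {name: coef / se for name, (coef, se) in estimates.items()}
# Two-sided p-value under a standard normal approximation (an assumption).
p_values = {name: math.erfc(abs(t) / math.sqrt(2)) for name, t in t_stats.items()}
significant_10pct = {name: p < 0.10 for name, p in p_values.items()}

for name in estimates:
    print(f"{name}: t = {t_stats[name]:+.2f}, p = {p_values[name]:.2f}")
```

All three ratios are below 1 in absolute value, well short of the roughly 1.64 needed for significance at the 10% level.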

Conclusion

This paper, written to explore the relationship between wages and drinking in different industries, did not return results of large magnitude or statistical significance. Many factors may have contributed to this outcome. For example, we cannot control for many of the variables that lead people to become heavy drinkers, such as divorce or trauma, or for the effect those confounding variables have on future wages. These data points, among others, are not as widely reported as employment, wage, and other common statistics, which makes controlling for their effect more difficult. Additionally, this study does not control for many other factors that could lead to heterogeneity within wages besides performance on a standardized test at age 14. A future study taking advantage of panel data within the NLSY data set could explore how difficult-to-measure factors such as ability or likeability (which could affect promotions and, therefore, wages) may affect these results. We also chose to focus on only three industries. Including all industries or looking at specific roles within industries could give a clearer picture of the impact of alcohol consumption on wages.

We found a statistically insignificant positive correlation between the number of days drinking and wages. Because we could not reject the null hypothesis, we cannot claim a causal relationship between wages, drinking, and industry. Our research does not strongly defend the findings of the “alcohol/income puzzle,” but it does weakly support the current belief that drinking is positively correlated with wages. A future study could assess the same relationship (that between wages, drinking, and industry) using an instrumental variable such as state taxes or the cost of alcohol, and might find a significant outcome.

Appendix

Graph 1&2: Construction Industry

OLS Regression and Distribution of Days Drinking for Workers in the Construction Industry

Graph 3&4: Professional and Related Services Industry

OLS Regression and Distribution of Days Drinking for Workers in the Professional and Related Services Industry

Graph 5&6: Retail Industry

OLS Regression and Distribution of Days Drinking for Workers in the Retail Industry

Table 1: Table of Means

Columns, left to right: Construction (Overall, High, Low), Professional Services (Overall, High, Low), Retail (Overall, High, Low)
VARIABLES Overall High Low Overall High Low Overall High Low
Drank 10+ days in last month 0.3842 0.1803 0.2237
[.487] [.3846] [.417]
Days drinking in the last month 8.9326 18.7484 2.808 4.8302 15.6716 2.445 5.8088 18.2613 2.2204
[9.4499] [8.0632] [2.5991] [6.2032] [6.3644] [2.5499] [7.8826] [7.68] [2.3431]
Religion 0.8435 0.8668 0.8289 0.9074 0.8779 0.9139 0.8993 0.8494 0.9137
[.3638] [.3408] [.3773] [.29] [.3283] [.2807] [.3012] [.3588] [.2811]
Employed 0.8981 0.9098 0.9087 0.891 0.8795 0.9378
[.3029] [.2875] [.2882] [.3125] [.3258] [.2423]
Salary 23392.85 24357.81 22790.76 25900.45 33323.38 24267.39 18989.72 22187.99 18068.08
[16641.29] [17054.11] [16383.97] [21655.25] [27886.41] [19674.32] [16932.68] [15168.77] [17311.2]
Log of salary 9.8571 9.8683 9.8499 9.8825 10.0804 9.8386 9.5638 9.8215 9.4886
[.8366] [.8746] [.8128] [.9466] [1.0644] [.9134] [.9541] [.8014] [.9822]
Female 0.083 0.035 0.1129 0.7059 0.4897 0.7535 0.5143 0.277 0.5827
[.2762] [.1845] [.3171] [.4558] [.5013] [.4312] [.5001] [.4489] [.4935]
Black 0.078 0.0901 0.0704 0.1297 0.1152 0.1329 0.0968 0.1017 0.0954
[.2684] [.2872] [.2563] [.3361] [.3201] [.3396] [.2959] [.3033] [.294]
High School Graduate 12.0487 11.8338 12.1828 14.9159 15.5711 14.7718 12.7361 12.9274 12.6809
[1.8085] [1.6362] [1.8989] [2.6111] [2.8209] [2.5419] [1.9272] [1.7915] [1.9626]
Performance on AFQT exam 40.0843 39.9536 40.1658 58.2585 66.8169 56.3757 45.52 48.0291 44.797
[25.3371] [23.4415] [26.4965] [27.4359] [26.9017] [27.2082] [25.9614] [24.7021] [26.2904]
Health 0.059 0.039 0.0714 0.0483 0.055 0.0468 0.0592 0.0239 0.0694
[.2358] [.1942] [.258] [.2144] [.2285] [.2113] [.2361] [.1532] [.2543]
Age 33.1749 33.0584 33.2476 33.1807 33.5196 33.1062 33.0553 32.8978 33.1007
[2.3203] [2.3037] [2.3322] [2.319] [2.3592] [2.3048] [2.3372] [2.2415] [2.3641]
Lived with father at age 14 0.7808 0.7763 0.7836 0.8185 0.8263 0.8167 0.7663 0.8032 0.7556
[.4142] [.4181] [.4126] [.3856] [.3799] [.3871] [.4235] [.3989] [.4301]
Lived with mother at age 14 0.9331 0.9409 0.9282 0.9642 0.9736 0.9621 0.9349 0.9244 0.938
[.2502] [.2366] [.2586] [.1859] [.1609] [.191] [.2468] [.2653] [.2414]
Mother worked 0.5049 0.4766 0.5227 0.537 0.552 0.5337 0.5253 0.5278 0.5245
[.5006] [.5011] [.5005] [.4988] [.4987] [.4991] [.4997] [.5008] [.4998]
Father worked 0.8596 0.889 0.8413 0.8399 0.8921 0.8284 0.8222 0.8436 0.8161
[.3478] [.3152] [.3661] [.3669] [.3111] [.3772] [.3826] [.3644] [.3878]
Mother high school graduate 10.797 11.1822 10.5567 11.7623 12.5561 11.5876 10.938 11.5235 10.7693
[3.3414] [2.8647] [3.5916] [3.5109] [2.7607] [3.6335] [3.4199] [3.1325] [3.4828]
Father high school graduate 10.3371 10.2951 10.3632 11.7588 13.0109 11.4834 10.9473 11.9809 10.6494
[4.3219] [4.175] [4.4191] [4.6122] [4.3758] [4.6196] [4.7865] [4.7811] [4.7508]
Observations 409 156 253 1174 184 990 727 156 571
Standard deviations in brackets

 

Table 2: Balance Test Table

  (1)  
VARIABLES Days drinking in the last month
Female -3.141*** (0.273)
Religion -0.576* (0.303)
Lived with mother at age 14 -0.881** (0.391)
Weight -0.0211*** (0.00280)
Height 0.139*** (0.0351)
Performance on AFQT exam 0.0195*** (0.00485)
High School Graduate -0.312*** (0.0529)
Number of kids in household -0.213* (0.118)
Hispanic -0.909*** (0.322)
Father high school graduate 0.0459** (0.0230)
Mother high school graduate 0.0561* (0.0293)
Constant -7.324 (10.28)
Observations 6,050
R-squared 0.111
Standard errors in parentheses    
*** p<0.01, ** p<0.05, * p<0.1

Table 3: Results Table

  (1) (2) (3)
VARIABLES State Cluster Household Cluster No Cluster
       
Days drinking in the last month 0.00296 0.00296 0.00296
(0.00206) (0.00240) (0.00228)
Construction industry -0.108 -0.108* -0.108
(0.0792) (0.0644) (0.0683)
Professional and Related Services -0.0308 -0.0308 -0.0308
(0.0379) (0.0405) (0.0413)
Retail Industry -0.217*** -0.217*** -0.217***
(0.0512) (0.0493) (0.0479)
days_const interaction 0.00133 0.00133 0.00133
(0.00493) (0.00478) (0.00546)
days_prof interaction -0.00444 -0.00444 -0.00444
(0.00462) (0.00484) (0.00500)
days_retail interaction 0.000673 0.000673 0.000673
(0.00483) (0.00523) (0.00503)
Constant 7.090*** 7.090*** 7.090***
(0.297) (0.347) (0.331)
Observations 5,247 5,247 5,247
R-squared 0.200 0.200 0.200
Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1

The regression results above come from three multivariate regressions examining the relationship between drinking and wages in three industries, using the equation given in the Methods section. The regressions control for sex, religion, weight, height, performance on the AFQT exam, education, number of children in the household growing up, ethnicity, parents’ education, and whether or not individuals lived with their parents at age 14.