Projects Funded for Meredith Fowlie
2020-2021
Predicting Demand for Plant-Based Meat
Meredith Fowlie and Hal Giuliani Gordon
Abstract
Specifics of the Project:
This project aimed to better understand consumer demand for plant-based meat. In contrast to other meat substitutes (tofu, veggie burgers), plant-based meat products are being marketed as indistinguishable in taste and appearance from meat. Producers of plant-based meat (PBM) aim to compete more directly with, and eventually replace, meat.
We received a panel of purchases from well over 200,000 grocery store shoppers who have bought plant-based meat at least once, as well as a similarly sized control panel, in a proprietary dataset from a nationwide grocery chain. We found that buying and, more importantly, rebuying PBM is associated with having previously bought less meat and more meat substitutes. In addition, the people entering the PBM market later are no more likely to have bought meat than those who first started buying it, suggesting PBM is struggling to expand its reach to those who could most easily switch away from real meat. Finally, because of how promotional pricing is determined at this nationwide chain, we were able to run event study regressions to test the theory that PBM is a robust substitute for beef in grocery stores. In these regressions, we find little evidence of switching between meat and PBM.
Summary of Results:
After receiving the large sample of purchases, we worked to create covariates from the purchases. While PBM was first introduced in limited stores only in mid-2017, all purchases since the beginning of 2016 are included in the data. This allowed us to characterize households by their purchases in prior periods, and to flexibly examine what types of purchases could predict who will buy PBM and who will rebuy it, with the ultimate aim of better understanding whether PBM is attracting the types of customers who are likely to substitute it for real meat.
Covariates created from this pre-period, as well as those matched from a credit company that provides estimates of the head of household’s age and race and the size and income of the household, were used to see how well we could predict three things: buying plant-based meat, rebuying plant-based meat, and the date of first buying plant-based meat.
Those who bought PBM were more likely to be younger, to have a higher income, and to be shopping in more liberal voting areas. While households who bought PBM were more likely to be "low meat" households that spent less than 5% of their pre-period budget on meat, the difference was only half a percentage point. This supports the often-repeated fact that PBM customers also buy meat. However, they still bought less meat, with a larger gap in the raw percentage of their average market basket devoted to meat in the pre-period. In addition, buyers of PBM bought about three times the amount of veggie burgers and tofu.
Rebuyers were more likely to be "low meat" purchasers and had spent more on traditional replacements and a little less on meat in the pre-period. This seems to imply that PBM was more likely to catch on with customers who were already interested in replacing meat in their diets. Early buyers were younger and less wealthy (although still wealthier than those who never bought at all). Earlier buyers were more likely to be "low meat" households, bought less meat and ground beef, and bought more meat replacements.
In order to examine whether the large number of covariates coming from pre-period purchases could more accurately predict buying PBM, rebuying it, and buying it later or earlier, we implemented regressions using the least absolute shrinkage and selection operator (Lasso), a common machine learning method that penalizes the absolute size of the coefficients so that many of them are set exactly to zero. While not causal, these models are used to better understand who the buyers were, and who was most likely to rebuy or buy later.
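To make the shrinkage concrete, the following is a minimal sketch of a Lasso fit by coordinate descent. It is not the authors' code or specification: the function names, the toy data, and the penalty value are all assumptions chosen for illustration, and the covariates are assumed to be standardized with no intercept.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(rows, outcomes, penalty, n_passes=100):
    """Minimize (1/2n) * sum((y - Xb)^2) + penalty * sum(|b|), one coefficient at a time."""
    n, p = len(rows), len(rows[0])
    coefs = [0.0] * p
    for _ in range(n_passes):
        for j in range(p):
            rho = 0.0    # covariance of covariate j with the residual that excludes j
            scale = 0.0  # sum of squares of covariate j
            for i in range(n):
                pred_without_j = sum(rows[i][k] * coefs[k] for k in range(p) if k != j)
                rho += rows[i][j] * (outcomes[i] - pred_without_j)
                scale += rows[i][j] ** 2
            if scale > 0:
                coefs[j] = soft_threshold(rho / n, penalty) / (scale / n)
            else:
                coefs[j] = 0.0
    return coefs


# Hypothetical toy data (NOT from the study): three mutually orthogonal
# covariates; the outcome loads heavily on the first and weakly on the second.
ROWS = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
OUTCOMES = [2.1, 1.9, -1.9, -2.1]

fitted = lasso_coordinate_descent(ROWS, OUTCOMES, penalty=0.5)
print(fitted)
```

In this toy case the penalty removes the weak and the irrelevant covariate entirely and shrinks the strong one. When many covariates each carry a little signal, as the results suggest was the case here, far fewer coefficients are driven to zero.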
The OLS and Lasso results have less predictive power than hoped. The highest R^2s are for the buy/don’t buy decision. There, the OLS models (with nearly 500 variables) have an R^2 of just 0.16. The Lassos have slightly lower R^2s. Unfortunately, the Lassos shrink to zero the coefficients of only about half the categorical variables. This means we are not able to single out a handful of traits that are much more important in predicting the buy/don’t buy decision.
The models on early and late adoption and rebuying within 3 months have even lower R^2s, but are at least able to reduce the number of non-zero coefficients a bit better. When the sample is reduced even further to just California-based stores, the Lasso was able to collapse many more coefficients to zero. Tables 1.8 and 1.9 from our paper are reproduced below, showing the covariates related to rebuying and related to the week of first buy.
From a climate and animal welfare perspective, we would have liked to see rebuying popular with meat eaters, but instead, our Lasso regressions found the opposite (Table 1.8). Milk, bacon, beef, meat, pork, and ground beef are all predictive of not rebuying PBM, while meat and dairy alternatives, seafood, tofu, kombucha and expensive produce are related to rebuying. This table tells a clear story that customers who are more likely to incorporate plant-based meat into their diets are much less likely to have had a diet high in the ground beef that PBM cheerleaders hope to be replacing.
While those households who bought more meat in the pre-period were less likely to buy PBM, we were hoping that those types of households would be more likely to have bought PBM in the later period (as information about PBM started reaching more households due to increased media stories), but in Table 1.9, we did not find that to be the case. Early adopters were more likely to have bought meat and dairy alternatives, tofu, and organic vegetables. This fits the story that early adopters of PBM were already interested in meat alternatives. However, none of the coefficients positively related to week of first purchase (meaning they bought later) were related to meat purchases, which suggests PBM was not spreading to meat eaters during the end of 2019. From these data, we conclude that PBM has much further to go in making itself more attractive to the kinds of customers who are most likely to be substituting away from beef.
To study whether PBM was crowding out beef, we used the as-good-as-random variation in when PBM went on promotion (sale price). After interviewing officials with the corporate office of the grocery chain and running our own tests, we found that the temporary price cuts of PBM were imposed at the region level, and were not coordinated with each other, nor were they correlated with prior sales. We then used these events in event study difference-in-differences regressions.
While these regressions did show that PBM promotions greatly increase sales of PBM (figure 1.3 from our paper reproduced below), they have little to no measurable effect on beef (figure 1.4). There are a number of reasons to think these regression results are not perfect. First, the number of clusters, 12, was quite low, but that was the level at which prices were set. Second, the predicted demand change for PBM in the first week of the promotion (0.003 lbs) was probably not large enough to overcome the normal noise in the much, much larger ground beef markets. Yet, the main finding remains that we are unable to detect any effect of PBM on the demand for meat.
2018-2019
The Effect of Immigration Enforcement on Agricultural Workers’ Movement
Meredith Fowlie
Abstract
Objectives of the Project:
This research project aimed to see how recent changes in immigration enforcement affected the movement and employment of Hispanic immigrant farm labor in rural California.
As stated in the grant proposal, this project relied on a few important facts. First, California agricultural communities are heavily dependent on the employment of undocumented farm workers. Second, after the new administration took office at the end of January 2017, there was a large increase in arrests made by Immigration and Customs Enforcement (ICE). This increase in enforcement activity likely has had an effect on the employment and movement of these Hispanic immigrants, potentially causing significant economic problems for the communities that rely on their labor.
Measuring the impacts on agricultural output is likely to be a difficult research problem, but a simpler first-order question is whether the change in policy had a noticeable effect on Hispanic communities. This project sought to use a proprietary dataset of movement from 25 million cell phones in the United States. The dataset records the location of each phone every 15 minutes, and from this, we planned to measure a number of important things. First, the distance traveled by each phone each day is a rough proxy for the employment or other economic activity of the user. Second, measuring how frequently cell phones visit agricultural areas may be a leading indicator of agricultural employment, especially among communities that are likely to be underrepresented in government databases. Third, measuring how often likely agricultural workers move from one community to another is a potentially important sign of those workers’ economic prospects and the health of an industry that depends on migrant labor.
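As an illustration of the first planned measure, the sketch below sums great-circle (haversine) distances between consecutive location pings for one phone on one day. This is only one plausible way to compute daily distance traveled. The function names and the input layout (a time-ordered list of latitude/longitude pairs) are assumptions, not the vendor's data format.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def daily_distance_km(pings):
    """Total distance along a time-ordered list of (lat, lon) pings for one phone-day."""
    return sum(haversine_km(*start, *end) for start, end in zip(pings, pings[1:]))


# Hypothetical pings (NOT real data): out one degree of latitude and back.
example_day = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]
print(round(daily_distance_km(example_day), 1))
```

With pings every 15 minutes, this straight-line sum understates the true path length whenever the phone moves along curved or winding routes between pings, which is one reason the measure is only a rough proxy.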
Progress of the Project:
Soon after receiving this grant, we worked to complete the data use agreement (DUA) with the data vendor, Safegraph. The data use agreement required University approval, and we worked closely with the Berkeley Office of Intellectual Property & Industry Alliances (IPIRA). IPIRA required us to work with the Berkeley Information Security and Policy office to develop a security plan under the Minimum Security Standards for Electronic Information. To that end, we created a Protection Level 1 security plan that relied upon storage and processing of the data on secure Amazon Web Services (AWS) servers. The security plan, developed in consultation with AWS, was eventually approved by the Information Security and Policy office and submitted to IPIRA.
In the meantime, however, the relationship between Safegraph and IPIRA became strained after a few rounds of negotiation on amending Safegraph’s standard DUA and other unforeseen issues. In late 2018, we were told that the data Safegraph were willing to share with their academic partners had changed. In early 2019, we were able to amend our DUA to accept the new form of data and were finally given access to it.
The new data that was shared by Safegraph was far less rich and restricted our ability to answer the questions we hoped to address. Instead of sharing the geolocation of every cell phone in their sample every 15 minutes, Safegraph reported the number of unique visits to Safegraph-defined places of interest (POIs) each month. Safegraph also reported the count of visits by likely home census block group (CBG) of the visitors to each POI. This was censored, however, so that CBG visit counts were only reported if at least 5 visitors were detected at a POI in a certain month. POIs are for the most part commercial locations (restaurants, grocery stores, big box stores, hotels), plus medical offices and some schools. Farms were notably omitted from POIs, and visits by employees were specifically not included.
The restrictions on the type of data provided drastically changed the scope of the questions we could answer. Without the panel of daily locations, we could no longer tally the length of daily trips of cell phone owners who were likely Hispanic and/or immigrants. We also could no longer track agricultural workers as they traveled with harvesting seasons. In addition, the limits of POIs to mostly commercial locations (and not farms) prevented us from tracking agricultural employment at all.
As a result, we narrowed our research questions to investigate whether the inauguration of Donald Trump and the subsequent large increase in enforcement activity impacted Hispanic communities by reducing the number of visits made to commercial locations and trips for medical treatment. In order to do this, we matched the CBGs listed in Safegraph data to census data on the percent of respondents identifying as Hispanic from the 5-year American Community Survey (ACS).
We tested this hypothesis by looking at the reported number of trips made in three geographies: the entire state of California, the Central Valley of California, and two bordering counties of California and Arizona. The identification strategy employed was a difference-in-differences design, where trips made to POIs in the two months before the inauguration were compared to trips made after, with the treatment variable (after inauguration) being interacted with percent Hispanic from the 5-year ACS. In one specification, Yuma County in Arizona and Imperial County in California are both used in a triple difference. These counties were chosen because they border each other, and January and February are the harvest season for lettuce. This triple difference was conducted to measure whether the more conservative immigration policies in Arizona would lead to a difference in the movement of potential lettuce workers and their families under more aggressive immigration enforcement.
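The logic of the difference-in-differences and triple-difference comparisons can be sketched with simple cell means. This is a simplified illustration, not the regressions that were run: it replaces the continuous percent-Hispanic interaction with a binary high/low indicator, and every field name and number below is made up for the example.

```python
from statistics import mean


def did(records, group_key):
    """2x2 difference-in-differences on mean visits.

    (post - pre) change for records with group_key == 1,
    minus the same change for records with group_key == 0.
    """
    def cell(group, post):
        return mean(r["visits"] for r in records
                    if r[group_key] == group and r["post"] == post)
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


def triple_difference(records, group_key, region_key):
    """DiD inside the region flagged 1 minus DiD inside the region flagged 0."""
    flagged = [r for r in records if r[region_key] == 1]
    unflagged = [r for r in records if r[region_key] == 0]
    return did(flagged, group_key) - did(unflagged, group_key)


# Hypothetical monthly visit counts (NOT the study's data or results).
records = [
    # California-side county (arizona = 0)
    {"arizona": 0, "high_hispanic": 1, "post": 0, "visits": 100},
    {"arizona": 0, "high_hispanic": 1, "post": 1, "visits": 90},
    {"arizona": 0, "high_hispanic": 0, "post": 0, "visits": 100},
    {"arizona": 0, "high_hispanic": 0, "post": 1, "visits": 95},
    # Arizona-side county (arizona = 1)
    {"arizona": 1, "high_hispanic": 1, "post": 0, "visits": 80},
    {"arizona": 1, "high_hispanic": 1, "post": 1, "visits": 60},
    {"arizona": 1, "high_hispanic": 0, "post": 0, "visits": 80},
    {"arizona": 1, "high_hispanic": 0, "post": 1, "visits": 75},
]

california_only = [r for r in records if r["arizona"] == 0]
print(did(california_only, "high_hispanic"))
print(triple_difference(records, "high_hispanic", "arizona"))
```

The triple difference nets out anything that changed for both groups in both counties after the inauguration, leaving only the extra change for high-Hispanic areas on the Arizona side.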
Findings:
The ideal data (which had been promised but not delivered) would have included the home census block group of each visitor to every POI. The data provided, however, strictly censored the home CBGs of visitors if there were under 5 visits a month from that census block group. This meant that we ran two imperfect specifications: (1) using all visits, but assigning the demographics of the census block group the POI was located in to all visitors, and (2) using only visits from visitors whose census block group was identified.
Using the first strategy, we found no statistical difference between counts of visits from CBGs with large proportions of Hispanic residents and those without in any of the geographies where we expected to see one: California at large, the Central Valley specifically, and California border counties compared to Arizona counties. While the original data as promised may have given us a clearer picture, with the available data, we were not able to reject the null hypothesis that increased immigration enforcement had no effects on the movement of targeted communities in California. We had also hypothesized that the effect would be more pronounced in important types of POIs (medical offices or schools) where immigrants were more likely to come into contact with government agencies, but we found no statistical difference in visits to these locations.
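The trade-off between the two specifications can be illustrated with a small sketch. The threshold of 5 comes from the text. Everything else (function names, CBG labels, counts, Hispanic shares, and the visit-weighted average used for the second specification) is an assumption made for illustration and may differ from how the regressions were actually constructed.

```python
CENSOR_THRESHOLD = 5  # per the text: home CBG counts under 5 were not reported


def censor_home_counts(visits_by_home_cbg, threshold=CENSOR_THRESHOLD):
    """Keep only the home-CBG visit counts that would have been reported."""
    return {cbg: n for cbg, n in visits_by_home_cbg.items() if n >= threshold}


def hispanic_share_spec1(poi_cbg, pct_hispanic):
    """Specification 1: all visitors get the demographics of the POI's own CBG."""
    return pct_hispanic[poi_cbg]


def hispanic_share_spec2(visits_by_home_cbg, pct_hispanic):
    """Specification 2: visit-weighted share using only identified home CBGs."""
    kept = censor_home_counts(visits_by_home_cbg)
    total = sum(kept.values())
    if total == 0:
        return None  # every home CBG was censored for this POI-month
    return sum(n * pct_hispanic[cbg] for cbg, n in kept.items()) / total


# Hypothetical POI-month (NOT real data): the POI sits in CBG "C".
pct_hispanic = {"A": 0.8, "B": 0.2, "C": 0.5}
true_visits = {"A": 12, "B": 3, "C": 5}

reported = censor_home_counts(true_visits)
print(reported)
print(hispanic_share_spec1("C", pct_hispanic))
print(hispanic_share_spec2(true_visits, pct_hispanic))
```

In this made-up case the first specification mislabels most visitors, while the second silently drops the three visits from CBG "B". These are the two kinds of error that the specifications trade against each other.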
We conducted the same regressions on the sample of visits that had home location identified. These regressions traded one type of error (misspecifying the home location of the visitors) for another type (censoring of the data). Our results were not materially different, however, and we could not reject the null hypothesis of no effect on visits.
We believe we could have conducted a better study with more conclusive results with the original data we were promised, but with the data we were provided, we were unable to reject the null that the inauguration of Donald Trump had no effects on the number of visits to stores, schools, and hospitals made by people living in Hispanic neighborhoods in California.
2016-2017
Land Use and Transportation: Technology and the Spatial Structure of Cities
Meredith Fowlie and Jonathan Kadish
Abstract
Specific Objectives of the Project
This project investigated the effect of transportation technology shocks on land development within and around cities. Reductions in transportation costs over time have resulted in land use conversion (generally agricultural to urban, as cities expand), leading to urban sprawl and traffic congestion. Motivated by a rapidly transforming transportation sector (e.g. high-speed rail, Uber, driverless cars), we estimated how historical changes in transportation costs have affected urban growth in order to inform policies that promote optimal regulation of new expansion and development.
Summary of Results
We created a novel dataset to better understand the dynamics of city growth and transportation: national data on housing (38 million residential real estate transactions and housing attributes from Public Records), employment (locations and wages from the US Census County Business Patterns), and transportation (travel times from Google, rideshare coverage from Uber). We examined how three changes in transportation costs - (1) the National Maximum Speed Law, (2) gasoline prices, and (3) the rollout of Uber and Lyft - have affected the structure of cities and land price gradients within cities. We find that real estate prices respond quickly and significantly to transportation cost changes. Housing price changes vary with distance from the city center. While changes in marginal costs (time and fuel) have increased prices in the suburbs and encouraged suburbanization, ridesharing services have led to larger price increases near city centers. Future research will explore whether the price changes from services like Uber are attributable to substitution away from car ownership.
2013-2014
Water Conservation through Informed Residential Electricity Consumption
Meredith Fowlie
Abstract
Specific Objectives of the Project
The issue that this project addresses is the heavy consumption of water in the Californian electricity system. The specific objectives related to that topic are:
1. Research: develop an algorithm to calculate, in real time, the current marginal rate of indirect water use from California electricity consumption per kWh, weighted by the social value of that water (e.g. higher during droughts and in stressed watersheds).
2. Policy: provide this information in real time to interested consumers in California, thus directly creating a tool to permit environmental-based choice in electricity timing.
3. Research: through randomized variation in the presentation of information, test the efficacy of this technique in reducing water consumption from electricity in California.
Project Report/Summary of Results
The production of electricity uses a considerable amount of water in California, with consequences for the availability of water for agricultural use in the state. Further, the marginal indirect water consumption of electricity use varies over time; thus, consumers could in theory reduce their indirect water consumption by shifting their consumption of electricity to less water-intensive times. This project aims to construct and measure the efficacy of a tool to enable that kind of inter-temporal shifting of electricity consumption.
This research has proceeded in three separate steps: (1) developing an algorithm to calculate the marginal rate of indirect water use from California electricity consumption per kilowatt-hour in real time; (2) providing the results of that algorithm to consumers in California, directly enabling a new form of water conservation; and (3) measuring the impact of this information on consumer electricity timing and therefore indirect water consumption. An unanticipated additional goal met by this project was the development of a variety of software tools to reduce the effort “cost” of such shifting.
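The report does not describe the algorithm itself, so the following is only a hypothetical sketch of the general idea: multiply an hourly marginal water intensity by a social-value weight and surface the hours with the lowest weighted rate. The function names, units, weights, and numbers are all invented for illustration.

```python
def weighted_water_rate(water_l_per_kwh, social_weight):
    """Marginal indirect water use per kWh, scaled by the social value of that water."""
    return water_l_per_kwh * social_weight


def least_water_intensive_hours(hourly, k=1):
    """Return the k hours with the lowest weighted marginal water rate.

    hourly: list of (hour_of_day, water_l_per_kwh, social_weight) tuples.
    """
    ranked = sorted(hourly, key=lambda h: weighted_water_rate(h[1], h[2]))
    return [hour for hour, _, _ in ranked[:k]]


# Hypothetical values (NOT measurements): the weight is higher when the
# marginal generator draws on a stressed watershed.
hourly = [
    (2, 1.0, 1.0),   # 2 a.m.
    (14, 2.0, 1.5),  # 2 p.m.
    (19, 1.5, 1.0),  # 7 p.m.
]

print(least_water_intensive_hours(hourly, k=2))
```

A consumer-facing tool built on this idea would suggest shifting flexible electricity use, such as laundry or dishwashing, toward the hours returned.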
2011-2012
The Impact of Food Borne Disease Outbreaks on Consumer Purchases and Preferences: The Case of the 2010 Salmonella Outbreak
Meredith Fowlie and Sofia Villas-Boas
Abstract
Objectives of the Project
1. Determine whether egg consumption decreased due to the news of the Salmonella outbreak.
2. Determine whether consumers are substituting away from conventional eggs towards other types of eggs (organic, cage free, free range) or egg substitutes.
3. Determine how long this substitution effect (if any) lasts.
4. Examine whether there are any heterogeneous effects based on income and demographics.
Summary of Results
We examine how consumers in California reacted to three consecutive egg recalls during the 2010 Salmonella outbreak. Eggs infected with Salmonella were recalled through codes clearly labeled on egg boxes, leaving no infected eggs in stores. Using a large product-level scanner data set from a national grocery chain, we test whether consumers reduced egg purchases. Using a difference-in-differences approach, we find a 9 percent reduction in egg sales in California following the three egg recalls. Given an overall price elasticity for eggs in U.S. households of -0.1, this sales reduction is comparable to an almost 100% increase in price. We find no evidence of substitution toward other “greener” types of eggs, such as organic or cage-free eggs. We also find no correlation with demographics such as income, but we do find that areas that had a larger than average household size decreased egg purchases more. We also find differentiated effects among Northern and Southern Californian stores. Although the national grocery chain had infected eggs only in Northern California, we find that Southern Californian stores had lower egg sales as well. The sales reduction in Southern California was half as large as the reduction in Northern California, and is consistent with media and reputation effects being significant determinants of demand, even in the absence of an actual food infection occurring in a region.
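The price-equivalence statement follows from the definition of price elasticity (percent change in quantity divided by percent change in price). A minimal sketch of that arithmetic, using only the two figures quoted above and a hypothetical function name:

```python
def equivalent_price_change(pct_quantity_change, price_elasticity):
    """Percent price change that would produce the given percent quantity change."""
    return pct_quantity_change / price_elasticity


# Figures from the summary: a 9 percent drop in sales and an elasticity of -0.1.
implied = equivalent_price_change(-9.0, -0.1)
print(round(implied))
```

With these rounded inputs the implied price increase is about 90 percent, in line with the "almost 100%" described in the summary.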