Is the average of betas from Y ~ X…
I am interested in the relationship between two time series variables: $Y$ and $X$. The two variables are related to each other, and it’s not clear from theory which one causes the other.
Given this, I have no good reason to prefer the linear regression $Y = \alpha + \beta X$ over $X = \kappa + \gamma Y$.
Clearly there is some relationship between $\beta$ and $\gamma$, though I recall enough statistics to understand that $\beta = 1/\gamma$ is not true. Or perhaps it’s not even close? I’m a bit hazy.
The problem is to decide how much of $X$ one ought to hold against $Y$.
I’m considering taking the average of $\beta$ and $1/\gamma$ and using that as the hedge ratio.
Is the average of $\beta$ and $1/\gamma$ a meaningful concept?
And as a secondary question (perhaps this should be another post), what is the appropriate way to deal with the fact that the two variables are related to each other, meaning that there really isn’t an independent and dependent variable?


1
The problem is not causality but instead the errors of measurement (it is just that often the dependent variable Y is the one with large measurement error, making “Y = a + B x + error” the common expression). Do you have an idea about the errors in the measurement of X and Y?
– Martijn Weterings
Jan 6 at 12:04

1
The exact values of $\beta$ and $\gamma$ can be found in this answer of mine to Effect of switching responses and explanatory variables…, and, as you suspect, $\beta$ is not the reciprocal of $\gamma$, and averaging $\beta$ and $1/\gamma$ is not the right way to go. A pictorial view of what $\beta$ and $\gamma$ are minimizing is given in Elvis’s answer to the same question, and he introduces a “least rectangles” regression that you might want …..
– Dilip Sarwate
Jan 6 at 15:43

3
You are in the ideal scenario where the choice of technique has a direct, physically measurable impact; you can simply measure the out-of-sample hedging error for each estimate, and compare them. Also, typically optimal hedging is better handled by using a VECM model (see for example Gatarek & Johansen, 2014, Optimal hedging with the cointegrated vector autoregressive model), which does not require choosing to model Y as a function of X or vice versa.
– Chris Haug
Jan 6 at 16:32 
1
You might want to look at the geometric mean $\sqrt{\dfrac{\beta}{\gamma}}$ as a possibility (if they are both negative you might take the negative square root). Then look at $\dfrac{s_y}{s_x}$, which should be very similar.
– Henry
Jan 6 at 18:37

1
@ricardo Note that I specified out-of-sample error, so not the (in-sample) fit of the model. And it is entirely possible for the optimal hedge ratio to change over time (especially if the relationship is not actually linear); that doesn’t change the fact that figuring out the best hedging strategy can be most directly done by backtesting the model and observing the results.
– Chris Haug
Jan 6 at 23:53

4 Answers
To see the connection between both representations, take a bivariate Normal vector:
$$
\begin{pmatrix}
X_1 \\
X_2
\end{pmatrix} \sim \mathcal{N} \left( \begin{pmatrix}
\mu_1 \\
\mu_2
\end{pmatrix} , \begin{pmatrix}
\sigma^2_1 & \rho \sigma_1 \sigma_2 \\
\rho \sigma_1 \sigma_2 & \sigma^2_2
\end{pmatrix} \right)
$$
with conditionals
$$X_1 \mid X_2=x_2 \sim \mathcal{N} \left( \mu_1 + \rho \frac{\sigma_1}{\sigma_2}(x_2 - \mu_2),(1-\rho^2)\sigma^2_1 \right)$$
and
$$X_2 \mid X_1=x_1 \sim \mathcal{N} \left( \mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x_1 - \mu_1),(1-\rho^2)\sigma^2_2 \right)$$
This means that
$$X_1=\underbrace{\left(\mu_1-\rho \frac{\sigma_1}{\sigma_2}\mu_2\right)}_\alpha+\underbrace{\rho \frac{\sigma_1}{\sigma_2}}_\beta X_2+\sqrt{1-\rho^2}\sigma_1\epsilon_1$$
and
$$X_2=\underbrace{\left(\mu_2-\rho \frac{\sigma_2}{\sigma_1}\mu_1\right)}_\kappa+\underbrace{\rho \frac{\sigma_2}{\sigma_1}}_\gamma X_1+\sqrt{1-\rho^2}\sigma_2\epsilon_2$$
which means (a) $\gamma$ is not $1/\beta$ and (b) the connection between the two regressions depends on the joint distribution of $(X_1,X_2)$.
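A quick simulation can make the two slope formulas concrete. The sketch below is not part of the original answer: it assumes zero means, uses illustrative values ($\rho = 0.6$, $\sigma_1 = 2$, $\sigma_2 = 0.5$), and the helper names `ols_slope`, `sample_corr` and `simulate` are made up for the example.

```python
import random
from statistics import fmean


def ols_slope(xs, ys):
    """Least-squares slope when ys is regressed on xs."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def sample_corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=20000, rho=0.6, sigma1=2.0, sigma2=0.5, seed=0):
    """Draw n pairs (X1, X2) from a zero-mean bivariate normal."""
    rng = random.Random(seed)
    x1, x2 = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x2.append(sigma2 * z2)
        x1.append(sigma1 * (rho * z2 + (1 - rho ** 2) ** 0.5 * z1))
    return x1, x2


x1, x2 = simulate()
beta = ols_slope(x2, x1)    # X1 on X2, theory: rho * sigma1 / sigma2 = 2.4
gamma = ols_slope(x1, x2)   # X2 on X1, theory: rho * sigma2 / sigma1 = 0.15
r = sample_corr(x1, x2)
print(f"beta={beta:.3f}  gamma={gamma:.3f}  1/gamma={1 / gamma:.3f}")
print(f"beta*gamma={beta * gamma:.4f}  r^2={r ** 2:.4f}")
```

The product of the two fitted slopes equals the squared sample correlation, so the slopes are reciprocals of each other only when the correlation is $\pm 1$.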

How would I decide if the average of the two betas is a better measure of the hedge ratio than one or the other?
– ricardo
Jan 6 at 9:08 
4
I have no idea.
– Xi’an
Jan 6 at 10:20 
@ricardo By measuring the out-of-sample hedging error under each estimate, which is ultimately what you are trying to minimize.
– Chris Haug
Jan 6 at 16:35
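A comparison of this kind could be set up roughly as follows. This is only a sketch under loud assumptions: "hedging error" is taken to mean the variance of the hedged position $Y - hX$, the data are synthetic and independent over time, and the names `candidate_ratios` and `hedge_error` are hypothetical. With real series one would more likely work with changes or returns and a rolling estimation window.

```python
import math
import random
from statistics import fmean, pvariance


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def candidate_ratios(xs, ys):
    """Hedge ratios h (units of X per unit of Y) estimated from one sample."""
    sxx, syy, sxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    beta = sxy / sxx            # slope of Y on X
    inv_gamma = syy / sxy       # reciprocal of the slope of X on Y
    return {
        "beta": beta,
        "1/gamma": inv_gamma,
        "average": 0.5 * (beta + inv_gamma),
        "geometric": math.copysign(math.sqrt(syy / sxx), sxy),
    }


def hedge_error(h, xs, ys):
    """Variance of the hedged position Y - h*X (assumed error measure)."""
    return pvariance([y - h * x for x, y in zip(xs, ys)])


# Synthetic data: an assumption for illustration only.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [1.5 * x + rng.gauss(0, 1) for x in xs]
train_x, test_x = xs[:1000], xs[1000:]
train_y, test_y = ys[:1000], ys[1000:]

ratios = candidate_ratios(train_x, train_y)
in_sample = {k: hedge_error(h, train_x, train_y) for k, h in ratios.items()}
out_of_sample = {k: hedge_error(h, test_x, test_y) for k, h in ratios.items()}
for name in ratios:
    print(f"{name:>9}: h={ratios[name]:.3f}  "
          f"in={in_sample[name]:.3f}  out={out_of_sample[name]:.3f}")
```

Under this particular error measure the slope of $Y$ on $X$ is the in-sample minimizer by construction, since least squares minimizes exactly that residual variance. The out-of-sample column is the one that matters for choosing between estimates.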
Converted from a comment…..
The exact values of $\beta$ and $\gamma$ can be found in this answer of mine to Effect of switching responses and explanatory variables in simple linear regression, and, as you suspect, $\beta$ is not the reciprocal of $\gamma$, and averaging $\beta$ and $\gamma$ (or averaging $\beta$ and $1/\gamma$) is not the right way to go. A pictorial view of what $\beta$ and $\gamma$ are minimizing is given in Elvis’s answer to the same question, and in the answer, he introduces a “least rectangles” regression that might be what you are looking for. The comments following Elvis’s answer should not be neglected; they relate this “least rectangles” regression to other, previously studied, techniques. In particular, note that Moderator chl points out that this method is of interest when it is not clear which is the predictor variable and which the response variable.
$\beta$ and $\gamma$
As Xi’an noted in his answer, $\beta$ and $\gamma$ are related to each other through the conditional means of $Y \mid X$ and $X \mid Y$ (which in turn derive from a single joint distribution). They are not symmetric in the sense that $\beta \neq 1/\gamma$, and this remains the case even if you ‘know’ the true $\sigma$ and $\rho$ instead of using estimates. You have $$\beta = \rho_{XY} \frac{\sigma_Y}{\sigma_X}$$ and $$\gamma = \rho_{XY} \frac{\sigma_X}{\sigma_Y}$$
or you could say
$$\beta \gamma = \rho_{XY}^2 \leq 1$$
See also simple linear regression on Wikipedia for the computation of $\beta$ and $\gamma$.
It is this correlation term that disturbs the symmetry. If $\beta$ and $\gamma$ were simply the ratios of the standard deviations, $\sigma_Y/\sigma_X$ and $\sigma_X/\sigma_Y$, then they would indeed be each other's inverse. The $\rho_{XY}$ term can be seen as modifying this as a sort of regression to the mean.
 With perfect correlation, $\rho_{XY} = 1$, you can fully predict $X$ based on $Y$ or vice versa. The two regression lines coincide and $$\beta \gamma = 1$$
 But with less than perfect correlation, $\rho_{XY} < 1$, you cannot make those perfect predictions and the conditional mean will be somewhat closer to the unconditional mean, in comparison to a simple scaling by $\sigma_Y/\sigma_X$ or $\sigma_X/\sigma_Y$. The slopes of the regression lines will be less steep. The slopes will not be each other's reciprocal and their product will be smaller than one $$\beta \gamma < 1$$
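A small worked example may help here. It is a sketch with made-up numbers ($\rho_{XY} = 0.8$, $\sigma_X = 1$, $\sigma_Y = 2$) and a hypothetical helper `slopes`. The geometric mean $\sqrt{\beta/\gamma}$ is not used in the text above; it is included only because it follows directly from the two slope formulas and equals $\sigma_Y/\sigma_X$.

```python
import math


def slopes(rho, sigma_x, sigma_y):
    """Population regression slopes for Y on X (beta) and X on Y (gamma)."""
    beta = rho * sigma_y / sigma_x
    gamma = rho * sigma_x / sigma_y
    return beta, gamma


rho, sigma_x, sigma_y = 0.8, 1.0, 2.0          # illustrative values
beta, gamma = slopes(rho, sigma_x, sigma_y)

arithmetic = 0.5 * (beta + 1.0 / gamma)        # the average asked about
geometric = math.copysign(math.sqrt(beta / gamma), rho)

print(f"beta        = {beta:.3f}")
print(f"1/gamma     = {1.0 / gamma:.3f}")
print(f"beta*gamma  = {beta * gamma:.3f}  (rho^2 = {rho ** 2:.3f})")
print(f"arithmetic  = {arithmetic:.3f}")
print(f"geometric   = {geometric:.3f}  (sigma_y/sigma_x = {sigma_y / sigma_x:.3f})")
```

With these numbers $\beta$ comes out below $\sigma_Y/\sigma_X$ and $1/\gamma$ above it, so the arithmetic average lands between the two slopes but does not coincide with the ratio of standard deviations.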
Is a regression line the right method?
You may wonder whether these conditional distributions and regression lines are what you need to determine your ratios of $X$ and $Y$. It is unclear to me how you would wish to use a regression line in the computation of an optimal ratio.
Below is an alternative way to compute the ratio. This method does have symmetry (i.e. if you switch $X$ and $Y$ then you will get the same ratio).
Alternative
Say the yields of bonds $X$ and $Y$ are distributed according to a multivariate normal distribution$^\dagger$ with correlation $\rho_{XY}$ and standard deviations $\sigma_X$ and $\sigma_Y$. Then the yield of a hedge that is a weighted sum of $X$ and $Y$ will be normally distributed:
$$H = \alpha X + (1-\alpha) Y \sim N(\mu_H,\sigma_H^2)$$
where $0 \leq \alpha \leq 1$ and with
$$\begin{array}{rcl}
\mu_H &=& \alpha \mu_X+(1-\alpha) \mu_Y \\
\sigma_H^2 &=& \alpha^2 \sigma_X^2 + (1-\alpha)^2 \sigma_Y^2 + 2 \alpha (1-\alpha) \rho_{XY} \sigma_X \sigma_Y \\
& =& \alpha^2(\sigma_X^2+\sigma_Y^2 -2 \rho_{XY} \sigma_X\sigma_Y) + \alpha (-2 \sigma_Y^2+2\rho_{XY}\sigma_X\sigma_Y) +\sigma_Y^2
\end{array} $$
The maximum of the mean $\mu_H$ will be at $$\alpha = 0 \text{ or } \alpha=1$$ or will not be unique when $\mu_X=\mu_Y$.
The minimum of the variance $\sigma_H^2$ will be at $$\alpha = 1 - \frac{\sigma_X^2 -\rho_{XY}\sigma_X\sigma_Y}{\sigma_X^2 +\sigma_Y^2 -2 \rho_{XY} \sigma_X\sigma_Y} = \frac{\sigma_Y^2-\rho_{XY}\sigma_X\sigma_Y}{\sigma_X^2+\sigma_Y^2 -2 \rho_{XY} \sigma_X\sigma_Y} $$
The optimum will be somewhere in between those two extremes and depends on how you wish to compare losses and gains.
Note that now there is a symmetry between $\alpha$ and $1-\alpha$. It does not matter whether you use the hedge $H=\alpha_1 X+(1-\alpha_1)Y$ or the hedge $H=\alpha_2 Y + (1-\alpha_2) X$. You will get the same ratios in terms of $\alpha_1 = 1-\alpha_2$.
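The minimum-variance weight and its symmetry can be checked numerically. The following is a minimal sketch with illustrative inputs ($\sigma_X = 1$, $\sigma_Y = 2$, $\rho_{XY} = 0.3$); the function names `hedge_variance` and `min_variance_alpha` are hypothetical.

```python
def hedge_variance(alpha, sigma_x, sigma_y, rho):
    """Variance of H = alpha*X + (1 - alpha)*Y."""
    return (alpha ** 2 * sigma_x ** 2
            + (1 - alpha) ** 2 * sigma_y ** 2
            + 2 * alpha * (1 - alpha) * rho * sigma_x * sigma_y)


def min_variance_alpha(sigma_x, sigma_y, rho):
    """Stationary point of the variance in alpha (not clipped to [0, 1])."""
    cov = rho * sigma_x * sigma_y
    denom = sigma_x ** 2 + sigma_y ** 2 - 2 * cov
    if denom == 0:
        raise ValueError("X and Y have identical variance and rho = 1")
    return (sigma_y ** 2 - cov) / denom


sigma_x, sigma_y, rho = 1.0, 2.0, 0.3           # illustrative values
a_star = min_variance_alpha(sigma_x, sigma_y, rho)

# Swapping the roles of X and Y should give the complementary weight.
a_swapped = min_variance_alpha(sigma_y, sigma_x, rho)

# Brute-force check on a grid over [0, 1].
grid = [i / 1000 for i in range(1001)]
a_grid = min(grid, key=lambda a: hedge_variance(a, sigma_x, sigma_y, rho))

print(f"closed form: {a_star:.4f}   grid search: {a_grid:.3f}")
print(f"swapped roles: {a_swapped:.4f}   sum: {a_star + a_swapped:.4f}")
```

The closed form is the unconstrained stationary point. For strongly correlated series it can fall outside $[0, 1]$ (for example $\sigma_X = 1$, $\sigma_Y = 2$, $\rho_{XY} = 0.8$ gives a value above one), in which case the constraint $0 \leq \alpha \leq 1$ would bind or a short position would be needed.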
Minimal variance case and relation with principal components
In the minimal variance case (here you actually do not need to assume a multivariate normal distribution) you get the following hedge ratio as the optimum: $$\frac{\alpha}{1-\alpha} = \frac{\operatorname{var}(Y) - \operatorname{cov}(X,Y)}{\operatorname{var}(X)-\operatorname{cov}(X,Y)}$$ This can be expressed in terms of the regression coefficients $\beta = \operatorname{cov}(X,Y)/\operatorname{var}(X)$ and $\gamma = \operatorname{cov}(X,Y)/\operatorname{var}(Y)$ as $$\frac{\alpha}{1-\alpha} = \frac{\beta(1-\gamma)}{\gamma(1-\beta)}$$
In a situation with more than two variables/stocks/bonds you might generalize this to the last (smallest-eigenvalue) principal component.
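One way to read the principal-component suggestion is sketched below with NumPy. The covariance matrix is invented for illustration and `smallest_pc` is a hypothetical helper. Note the assumption that changes relative to the two-asset case: the last principal component minimizes variance over weight vectors of unit length, not over weights that sum to one, so the two are related but not identical.

```python
import numpy as np


def smallest_pc(cov):
    """Return (variance, weights) of the smallest-eigenvalue principal component."""
    cov = np.asarray(cov, dtype=float)
    # eigh is for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvals[0], eigvecs[:, 0]


# Illustrative covariance matrix for three assets (an assumption).
cov = np.array([
    [1.0, 0.6, 0.3],
    [0.6, 4.0, 0.5],
    [0.3, 0.5, 2.0],
])

variance, weights = smallest_pc(cov)
print("weights:", np.round(weights, 4))
print("portfolio variance:", round(float(weights @ cov @ weights), 4))
print("smallest eigenvalue:", round(float(variance), 4))
```

The sign of an eigenvector is arbitrary, so the weights may come out with all signs flipped; the variance is unaffected.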
Variants
Improvements of the model can be made by using different distributions than multivariate normal. Also you could incorporate the time in a more sophisticated model to make better predictions of future values/distributions for the pair $X,Y$.
$^\dagger$ This is a simplification, but it suits the purpose of explaining how one can, and should, perform the analysis to find an optimal ratio without a regression line.

1
I am sorry, but as a physicist, I know too little about the language (long, short, holdings, etc.) related to stocks, bonds and finance. If you could cast it in simpler language I might be able to understand it and work with it. My answer is just a very simple expression that is unaware of the details and possibilities how to express hedging and stocks, but it shows the basic principle how you can get away from the use of a regression line (go back to first principles, express the model for profit which is at the core instead of using regression lines whose relevance is not directly clear).
– Martijn Weterings
Jan 7 at 11:41

I think i understand. The problem is that $1/\rho_{XY} \ne p_{XY}$. Indeed, $p_{XY}$ often changes quite a bit when we take the inverse. Your alternative is close to the case I am thinking about, but i do want to check one thing: does this allow non-negative holdings? Adopting your terminology, i’d have a unit holding of bond X, and a negative holding of Y. Say long one unit of bond X and short (say) 1.2 units of bond Y … but it could be 0.2 units or 5 units, depending on the math.
– ricardo
Jan 7 at 11:42 
long means that i make 1% on a bond if the price increases by ~1%; short means that i lose ~1% on a bond if the price increases by ~1%. So the idea is that i am long one unit of one bond (so i benefit from an appreciation) and am short some amount of the other bond (so i lose from an appreciation).
– ricardo
Jan 7 at 11:46 
“The problem is to decide how much of X one ought to hold against Y.” My problem with this is that there is no explanation/model/expression how you decide about this. How do you define losses and gains and how much do you value them?
– Martijn Weterings
Jan 7 at 11:46

Are there costs associated with being short and long? I imagine that you have a given amount to invest and this limits how much you can be short/long in those bonds. Then based on your previous knowledge you can estimate/determine the distribution of losses/gains for whatever combination on that limit. Finally, based on some function that determines how you value losses and gains (this expresses why/how you hedge) you can decide which combination to choose.
– Martijn Weterings
Jan 7 at 12:04

Perhaps the approach of “Granger causality” might help. This would help you to assess whether X is a good predictor of Y or whether Y is a better predictor of X. In other words, it tells you whether beta or gamma is the thing to take more seriously. Also, considering that you are dealing with time series data, it tells you how much of the history of X counts towards the prediction of Y (or vice versa).
Wikipedia gives a simple explanation:
A time series X is said to Granger-cause Y if it can be shown, usually through a series of t-tests and F-tests on lagged values of X (and with lagged values of Y also included), that those X values provide statistically significant information about future values of Y.
What you do is the following:
 regress Y(t) on X(t-1) and Y(t-1)
 regress Y(t) on X(t-1), X(t-2), Y(t-1), Y(t-2)
 regress Y(t) on X(t-1), X(t-2), X(t-3), Y(t-1), Y(t-2), Y(t-3)
Continue for whatever history length might be reasonable. Check the significance of the F-statistics for each regression.
Then do the same in reverse (so, now regress X(t) on the past values of X and Y) and see which regressions have significant F-values.
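The steps above can be sketched in Python with NumPy. This is not the R example the answer refers to; it is an illustration on synthetic data in which X leads Y by construction, and `lagged` and `granger_f` are hypothetical helper names. The F statistic here compares a model of the target on its own lags against one that also includes lags of the other series, which is one common way to set up the test.

```python
import numpy as np


def lagged(series, lags):
    """Columns series[t-1], ..., series[t-lags] for t = lags, ..., n-1."""
    n = len(series)
    return np.column_stack([series[lags - k: n - k] for k in range(1, lags + 1)])


def granger_f(cause, effect, lags):
    """F statistic for H0: lags of `cause` add nothing beyond lags of `effect`."""
    cause = np.asarray(cause, dtype=float)
    effect = np.asarray(effect, dtype=float)
    target = effect[lags:]
    const = np.ones((len(target), 1))
    restricted = np.hstack([const, lagged(effect, lags)])
    full = np.hstack([restricted, lagged(cause, lags)])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_restricted, rss_full = rss(restricted), rss(full)
    df_num = lags                               # number of excluded regressors
    df_den = len(target) - full.shape[1]        # residual degrees of freedom
    return ((rss_restricted - rss_full) / df_num) / (rss_full / df_den)


# Synthetic series (an assumption): Y depends on its own past and on past X.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

f_xy = granger_f(x, y, lags=2)   # does X help predict Y?
f_yx = granger_f(y, x, lags=2)   # does Y help predict X?
print(f"F(X -> Y) = {f_xy:.2f}   F(Y -> X) = {f_yx:.2f}")
```

To turn each statistic into a p-value one would compare it with an F distribution with `df_num` and `df_den` degrees of freedom, for instance via `scipy.stats.f.sf` if SciPy is available.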
A very straightforward example, with R code, is found here.
Granger causality has been critiqued for not actually establishing causality (in some cases). But it seems that your application is really about “predictive causality,” which is exactly what the Granger causality approach is meant for.
The point is that the approach will tell you whether X predicts Y or whether Y predicts X (so you no longer would be tempted to artificially–and incorrectly–compound the two regression coefficients) and it gives you a better prediction (as you will know how much history of X and Y you need to know to predict Y), which is useful for hedging purposes, right?

I have a strong theoretical reason to believe that neither is truly a cause, and that even if one became a cause that it would not remain true over time. So i don’t think that Granger Causality is the answer in this case. I’ve upvoted the answer in any case, as it is useful, esp. the R code.
– ricardo
Jan 7 at 3:04 
That is why I explicitly mention that “Granger causality has been critiqued for not actually establishing causality (in some cases).” It seems to me that your question is more about establishing “predictive causality,” which is what Granger causality is meant for. In addition, Granger’s approach uses the information in your time series data, which are a waste not to use if you have them. Of course, you can (should?) re-estimate the effects over time. I expect that the Granger effects are more stable than cross-sectional OLS (you can test this beforehand, using historical data). HTH
– Steve G. Jones
Jan 7 at 7:04
add a comment 
Your Answer
StackExchange.ifUsing(“editor”, function () {
return StackExchange.using(“mathjaxEditing”, function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [[“$”, “$”], [“\\(“,”\\)”]]);
});
});
}, “mathjaxediting”);
StackExchange.ready(function() {
var channelOptions = {
tags: “”.split(” “),
id: “65”
};
initTagRenderer(“”.split(” “), “”.split(” “), channelOptions);
StackExchange.using(“externalEditor”, function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using(“snippets”, function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: ‘answer’,
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: “”,
imageUploader: {
brandingHtml: “Powered by u003ca class=”iconimgurwhite” href=”https://imgur.com/”u003eu003c/au003e”,
contentPolicyHtml: “User contributions licensed under u003ca href=”https://creativecommons.org/licenses/bysa/3.0/”u003ecc bysa 3.0 with attribution requiredu003c/au003e u003ca href=”https://stackoverflow.com/legal/contentpolicy”u003e(content policy)u003c/au003e”,
allowUrls: true
},
onDemand: true,
discardSelector: “.discardanswer”
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave(‘#loginlink’);
});
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin(‘.newpostlogin’, ‘https%3a%2f%2fstats.stackexchange.com%2fquestions%2f385812%2fistheaverageofbetasfromyxandxyvalid%23newanswer’, ‘question_page’);
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
To see the connection between both representations, take a bivariate Normal vector:
$$
begin{pmatrix}
X_1 \
X_2
end{pmatrix} sim mathcal{N} left( begin{pmatrix}
mu_1 \
mu_2
end{pmatrix} , begin{pmatrix}
sigma^2_1 & rho sigma_1 sigma_2 \
rho sigma_1 sigma_2 & sigma^2_2
end{pmatrix} right)
$$
with conditionals
$$X_1 mid X_2=x_2 sim mathcal{N} left( mu_1 + rho frac{sigma_1}{sigma_2}(x_2 – mu_2),(1rho^2)sigma^2_1 right)$$
and
$$X_2 mid X_1=x_1 sim mathcal{N} left( mu_2 + rho frac{sigma_2}{sigma_1}(x_1 – mu_1),(1rho^2)sigma^2_2 right)$$
This means that
$$X_1=underbrace{left(mu_1rho frac{sigma_1}{sigma_2}mu_2right)}_alpha+underbrace{rho frac{sigma_1}{sigma_2}}_beta X_2+sqrt{1rho^2}sigma_1epsilon_1$$
and
$$X_2=underbrace{left(mu_2rho frac{sigma_2}{sigma_1}mu_1right)}_kappa+underbrace{rho frac{sigma_2}{sigma_1}}_gamma X_1+sqrt{1rho^2}sigma_2epsilon_2$$
which means (a) $gamma$ is not $1/beta$ and (b) the connection between the two regressions depends on the joint distribution of $(X_1,X_2)$.

How would I decide if the average of the two betas is a better measure of the hedge ratio than one or the other?
– ricardo
Jan 6 at 9:08 
4
I have no idea.
– Xi’an
Jan 6 at 10:20 
@ricardo By measuring the outofsample hedging error under each estimate, which is ultimately what you are trying to minimize.
– Chris Haug
Jan 6 at 16:35
add a comment 
To see the connection between both representations, take a bivariate Normal vector:
$$
begin{pmatrix}
X_1 \
X_2
end{pmatrix} sim mathcal{N} left( begin{pmatrix}
mu_1 \
mu_2
end{pmatrix} , begin{pmatrix}
sigma^2_1 & rho sigma_1 sigma_2 \
rho sigma_1 sigma_2 & sigma^2_2
end{pmatrix} right)
$$
with conditionals
$$X_1 mid X_2=x_2 sim mathcal{N} left( mu_1 + rho frac{sigma_1}{sigma_2}(x_2 – mu_2),(1rho^2)sigma^2_1 right)$$
and
$$X_2 mid X_1=x_1 sim mathcal{N} left( mu_2 + rho frac{sigma_2}{sigma_1}(x_1 – mu_1),(1rho^2)sigma^2_2 right)$$
This means that
$$X_1=underbrace{left(mu_1rho frac{sigma_1}{sigma_2}mu_2right)}_alpha+underbrace{rho frac{sigma_1}{sigma_2}}_beta X_2+sqrt{1rho^2}sigma_1epsilon_1$$
and
$$X_2=underbrace{left(mu_2rho frac{sigma_2}{sigma_1}mu_1right)}_kappa+underbrace{rho frac{sigma_2}{sigma_1}}_gamma X_1+sqrt{1rho^2}sigma_2epsilon_2$$
which means (a) $gamma$ is not $1/beta$ and (b) the connection between the two regressions depends on the joint distribution of $(X_1,X_2)$.

How would I decide if the average of the two betas is a better measure of the hedge ratio than one or the other?
– ricardo
Jan 6 at 9:08 
4
I have no idea.
– Xi’an
Jan 6 at 10:20 
@ricardo By measuring the outofsample hedging error under each estimate, which is ultimately what you are trying to minimize.
– Chris Haug
Jan 6 at 16:35
add a comment 
To see the connection between both representations, take a bivariate Normal vector:
$$
begin{pmatrix}
X_1 \
X_2
end{pmatrix} sim mathcal{N} left( begin{pmatrix}
mu_1 \
mu_2
end{pmatrix} , begin{pmatrix}
sigma^2_1 & rho sigma_1 sigma_2 \
rho sigma_1 sigma_2 & sigma^2_2
end{pmatrix} right)
$$
with conditionals
$$X_1 mid X_2=x_2 sim mathcal{N} left( mu_1 + rho frac{sigma_1}{sigma_2}(x_2 – mu_2),(1rho^2)sigma^2_1 right)$$
and
$$X_2 mid X_1=x_1 sim mathcal{N} left( mu_2 + rho frac{sigma_2}{sigma_1}(x_1 – mu_1),(1rho^2)sigma^2_2 right)$$
This means that
$$X_1=underbrace{left(mu_1rho frac{sigma_1}{sigma_2}mu_2right)}_alpha+underbrace{rho frac{sigma_1}{sigma_2}}_beta X_2+sqrt{1rho^2}sigma_1epsilon_1$$
and
$$X_2=underbrace{left(mu_2rho frac{sigma_2}{sigma_1}mu_1right)}_kappa+underbrace{rho frac{sigma_2}{sigma_1}}_gamma X_1+sqrt{1rho^2}sigma_2epsilon_2$$
which means (a) $gamma$ is not $1/beta$ and (b) the connection between the two regressions depends on the joint distribution of $(X_1,X_2)$.
To see the connection between both representations, take a bivariate Normal vector:
$$
begin{pmatrix}
X_1 \
X_2
end{pmatrix} sim mathcal{N} left( begin{pmatrix}
mu_1 \
mu_2
end{pmatrix} , begin{pmatrix}
sigma^2_1 & rho sigma_1 sigma_2 \
rho sigma_1 sigma_2 & sigma^2_2
end{pmatrix} right)
$$
with conditionals
$$X_1 mid X_2=x_2 sim mathcal{N} left( mu_1 + rho frac{sigma_1}{sigma_2}(x_2 – mu_2),(1rho^2)sigma^2_1 right)$$
and
$$X_2 mid X_1=x_1 sim mathcal{N} left( mu_2 + rho frac{sigma_2}{sigma_1}(x_1 – mu_1),(1rho^2)sigma^2_2 right)$$
This means that
$$X_1=underbrace{left(mu_1rho frac{sigma_1}{sigma_2}mu_2right)}_alpha+underbrace{rho frac{sigma_1}{sigma_2}}_beta X_2+sqrt{1rho^2}sigma_1epsilon_1$$
and
$$X_2=underbrace{left(mu_2rho frac{sigma_2}{sigma_1}mu_1right)}_kappa+underbrace{rho frac{sigma_2}{sigma_1}}_gamma X_1+sqrt{1rho^2}sigma_2epsilon_2$$
which means (a) $gamma$ is not $1/beta$ and (b) the connection between the two regressions depends on the joint distribution of $(X_1,X_2)$.

How would I decide if the average of the two betas is a better measure of the hedge ratio than one or the other?
– ricardo
Jan 6 at 9:08 
I have no idea.
– Xi’an
Jan 6 at 10:20 
@ricardo By measuring the out-of-sample hedging error under each estimate, which is ultimately what you are trying to minimize.
– Chris Haug
Jan 6 at 16:35
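A rough sketch of what such an out-of-sample comparison could look like, in Python with NumPy. Everything here is an assumption for illustration: the data are simulated, the split is a simple first-half/second-half split, and the hedging error is taken to be the variance of $Y - hX$ (one unit of $Y$ hedged with $h$ units of $X$). With real series you would substitute your own data and your own loss.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in data (assumed): correlated changes in X and Y.
n, rho, sd_x, sd_y = 4000, 0.8, 1.0, 1.5
cov = [[sd_x**2, rho * sd_x * sd_y], [rho * sd_x * sd_y, sd_y**2]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Estimate on the first half, evaluate on the second half.
split = n // 2
x_in, y_in, x_out, y_out = x[:split], y[:split], x[split:], y[split:]

c = np.cov(x_in, y_in)     # 2x2 sample covariance matrix
beta = c[0, 1] / c[0, 0]   # slope of Y on X
gamma = c[0, 1] / c[1, 1]  # slope of X on Y

candidates = {
    "beta": beta,
    "1/gamma": 1 / gamma,
    "average": 0.5 * (beta + 1 / gamma),
}

# Out-of-sample hedging error: variance of Y hedged with h units of X.
errors = {name: np.var(y_out - h * x_out, ddof=1)
          for name, h in candidates.items()}

for name, h in candidates.items():
    print(f"{name:8s} h = {h:.3f}  out-of-sample variance = {errors[name]:.3f}")
```

In this stationary toy setup the plain $\beta$ should come out ahead, since it is the variance-minimizing ratio when $Y$ is the position being hedged; if the roles were reversed, $\gamma$ would play that part. Real data with changing correlations may rank the candidates differently, which is the point of testing out of sample.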
Converted from a comment…..
The exact values of $\beta$ and $\gamma$
can be found in this answer of mine to Effect of switching responses and explanatory variables in simple linear regression, and, as you suspect,
$\beta$ is not the reciprocal of $\gamma$, and averaging $\beta$ and $\gamma$
(or averaging $\beta$ and $1/\gamma$) is not the right way to go. A pictorial view of what $\beta$ and $\gamma$
are minimizing is given in Elvis’s answer to the same question, and in the answer, he introduces a “least rectangles” regression that might be what you are looking for. The comments following Elvis’s answer should not be neglected; they relate this “least rectangles” regression to other, previously studied, techniques. In particular, note that Moderator chl points out that this method is of interest when it is not clear which is the predictor variable and which the response variable.
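For a concrete feel, here is a small Python/NumPy sketch of a symmetric slope of this kind. It assumes that the “least rectangles” fit coincides with what is elsewhere called reduced major axis (or geometric mean) regression, whose slope is $\operatorname{sign}(r)\, s_Y / s_X$; the function name and the simulated data are made up for illustration.

```python
import numpy as np


def least_rectangles_slope(x, y):
    """Slope sign(r) * sd(y) / sd(x) of the symmetric fit of y against x."""
    r = np.corrcoef(x, y)[0, 1]
    return np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)


rng = np.random.default_rng(2)
x = rng.normal(size=5000)
y = 1.5 * x + rng.normal(scale=2.0, size=5000)  # made-up data

c = np.cov(x, y)
beta = c[0, 1] / c[0, 0]   # ordinary slope of y on x
gamma = c[0, 1] / c[1, 1]  # ordinary slope of x on y

s_yx = least_rectangles_slope(x, y)
s_xy = least_rectangles_slope(y, x)

print("beta:", beta, " 1/gamma:", 1 / gamma)
print("symmetric slope y~x:", s_yx)
print("symmetric slope x~y:", s_xy, " product:", s_yx * s_xy)
print("geometric mean of beta and 1/gamma:", np.sqrt(beta / gamma))
```

Unlike the two ordinary slopes, swapping the variables gives exactly the reciprocal here, and for positively correlated data the slope equals the geometric (not arithmetic) mean of $\beta$ and $1/\gamma$.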
$\beta$ and $\gamma$
As Xi’an noted in his answer, $\beta$ and $\gamma$ are related to each other through the conditional means $X \mid Y$ and $Y \mid X$ (which in turn derive from a single joint distribution). They are not symmetric, in the sense that $\beta \neq 1/\gamma$. This remains the case even if you ‘know’ the true $\sigma$ and $\rho$ instead of using estimates. You have $$\beta = \rho_{XY} \frac{\sigma_Y}{\sigma_X}$$ and $$\gamma = \rho_{XY} \frac{\sigma_X}{\sigma_Y}$$
or you could say
$$\beta \gamma = \rho_{XY}^2 \leq 1$$
See also simple linear regression on Wikipedia for the computation of $\beta$ and $\gamma$.
It is this correlation term that disturbs the symmetry. If $\beta$ and $\gamma$ were simply the ratios of the standard deviations, $\sigma_Y/\sigma_X$ and $\sigma_X/\sigma_Y$, then they would indeed be each other’s inverse. The $\rho_{XY}$ term can be seen as modifying this as a sort of regression to the mean.
 With perfect correlation, $|\rho_{XY}| = 1$, you can fully predict $X$ based on $Y$ or vice versa. The two regression lines coincide and the slopes are each other’s reciprocal: $$\beta \gamma = 1$$
 But with less than perfect correlation, $|\rho_{XY}| < 1$, you cannot make those perfect predictions and the conditional mean will be somewhat closer to the unconditional mean, in comparison to a simple scaling by $\sigma_Y/\sigma_X$ or $\sigma_X/\sigma_Y$. The slopes of the regression lines will be less steep. The slopes will not be each other’s reciprocal and their product will be smaller than one: $$\beta \gamma < 1$$
Is a regression line the right method?
You may wonder whether these conditional probabilities and regression lines are what you need to determine your ratios of $X$ and $Y$. It is unclear to me how you would wish to use a regression line in the computation of an optimal ratio.
Below is an alternative way to compute the ratio. This method does have symmetry (i.e. if you switch $X$ and $Y$ then you will get the same ratio).
Alternative
Say the yields of bonds $X$ and $Y$ are distributed according to a multivariate normal distribution$^\dagger$ with correlation $\rho_{XY}$ and standard deviations $\sigma_X$ and $\sigma_Y$. Then the yield of a hedge that is a weighted sum of $X$ and $Y$ will be normally distributed:
$$H = \alpha X + (1-\alpha) Y \sim N(\mu_H,\sigma_H^2)$$
where $0 \leq \alpha \leq 1$ and with
$$\begin{array}{rcl}
\mu_H &=& \alpha \mu_X+(1-\alpha) \mu_Y \\
\sigma_H^2 &=& \alpha^2 \sigma_X^2 + (1-\alpha)^2 \sigma_Y^2 + 2 \alpha (1-\alpha) \rho_{XY} \sigma_X \sigma_Y \\
& =& \alpha^2(\sigma_X^2+\sigma_Y^2 -2 \rho_{XY} \sigma_X\sigma_Y) + \alpha (-2 \sigma_Y^2+2\rho_{XY}\sigma_X\sigma_Y) +\sigma_Y^2
\end{array} $$
The maximum of the mean $\mu_H$ will be at $$\alpha = 0 \text{ or } \alpha=1$$ or will not exist when $\mu_X=\mu_Y$.
The minimum of the variance $\sigma_H^2$ will be at $$\alpha = 1 - \frac{\sigma_X^2 -\rho_{XY}\sigma_X\sigma_Y}{\sigma_X^2 +\sigma_Y^2 -2 \rho_{XY} \sigma_X\sigma_Y} = \frac{\sigma_Y^2-\rho_{XY}\sigma_X\sigma_Y}{\sigma_X^2+\sigma_Y^2 -2 \rho_{XY} \sigma_X\sigma_Y} $$
The optimum will be somewhere in between those two extremes and depends on how you wish to compare losses and gains.
Note that now there is a symmetry between $\alpha$ and $1-\alpha$. It does not matter whether you use the hedge $H=\alpha_1 X+(1-\alpha_1)Y$ or the hedge $H=\alpha_2 Y + (1-\alpha_2) X$. You will get the same ratios in terms of $\alpha_1 = 1-\alpha_2$.
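The minimum-variance weight and its symmetry can be checked numerically. The following is only a sketch in Python with NumPy; the function names and the input values are assumptions for illustration. It evaluates the closed form for $\alpha$, compares it with a brute-force grid search over the variance of $H$, and swaps the roles of $X$ and $Y$.

```python
import numpy as np


def min_variance_weight(sd_x, sd_y, rho):
    """Weight alpha on X that minimises Var(alpha*X + (1 - alpha)*Y)."""
    cov_xy = rho * sd_x * sd_y
    return (sd_y**2 - cov_xy) / (sd_x**2 + sd_y**2 - 2 * cov_xy)


def hedge_variance(alpha, sd_x, sd_y, rho):
    """Variance of H = alpha*X + (1 - alpha)*Y."""
    return (alpha**2 * sd_x**2 + (1 - alpha)**2 * sd_y**2
            + 2 * alpha * (1 - alpha) * rho * sd_x * sd_y)


sd_x, sd_y, rho = 1.0, 2.0, 0.3  # assumed values

alpha_x = min_variance_weight(sd_x, sd_y, rho)  # weight on X
alpha_y = min_variance_weight(sd_y, sd_x, rho)  # roles swapped: weight on Y

# Brute-force check of the closed form on a fine grid over [0, 1].
grid = np.linspace(0.0, 1.0, 10001)
alpha_grid = grid[np.argmin(hedge_variance(grid, sd_x, sd_y, rho))]

print("closed form:", alpha_x, " grid search:", alpha_grid)
print("swapped roles:", alpha_y, " sum:", alpha_x + alpha_y)
```

Note that the closed form is not restricted to $[0, 1]$: when $\rho_{XY}\sigma_X\sigma_Y$ exceeds $\sigma_X^2$ or $\sigma_Y^2$ it returns a weight above one or below zero, which would correspond to a negative holding of one of the two bonds.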
Minimal variance case and relation with principal components
In the minimal variance case (here you do not actually need to assume a multivariate Normal distribution) you get the following hedge ratio as the optimum $$\frac{\alpha}{1-\alpha} = \frac{\text{var}(Y) - \text{cov}(X,Y)}{\text{var}(X)-\text{cov}(X,Y)}$$ which can be expressed in terms of the regression coefficients $\beta = \text{cov}(X,Y)/\text{var}(X)$ and $\gamma = \text{cov}(X,Y)/\text{var}(Y)$ by dividing numerator and denominator by $\text{cov}(X,Y)$: $$\frac{\alpha}{1-\alpha} = \frac{1/\gamma-1}{1/\beta-1} = \frac{\beta(1-\gamma)}{\gamma(1-\beta)}$$
In a situation with more than two variables/stocks/bonds you might generalize this to the last (smallest eigenvalue) principal component.
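As a sketch of that generalization, the snippet below (Python with NumPy) takes the eigenvector belonging to the smallest eigenvalue of a sample covariance matrix as the portfolio weights. The three-asset covariance matrix is made up, and note one assumption that differs from the two-asset formula: a principal component is normalized to unit length, $\lVert w \rVert = 1$, rather than to weights that sum to one, so this is a related but not identical optimization.

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up covariance matrix for three assets (illustration only).
cov_true = np.array([[1.0, 0.8, 0.6],
                     [0.8, 1.5, 0.7],
                     [0.6, 0.7, 2.0]])
returns = rng.multivariate_normal(np.zeros(3), cov_true, size=10_000)

cov_hat = np.cov(returns, rowvar=False)

# eigh returns eigenvalues in ascending order for symmetric matrices,
# so column 0 is the last (smallest-variance) principal component.
eigvals, eigvecs = np.linalg.eigh(cov_hat)
w = eigvecs[:, 0]

port_var = w @ cov_hat @ w
print("weights (unit length):", w)
print("portfolio variance:", port_var, " smallest eigenvalue:", eigvals[0])
print("individual variances:", np.diag(cov_hat))
```

The resulting combination has a variance equal to the smallest eigenvalue, which is never larger than the variance of any single asset; the weights may be negative, i.e. short positions.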
Variants
The model can be improved by using distributions other than the multivariate normal. You could also incorporate time in a more sophisticated model to make better predictions of future values/distributions for the pair $X,Y$.
$^\dagger$ This is a simplification but it suits the purpose of explaining how one can, and should, perform the analysis to find an optimal ratio without a regression line.

I am sorry, but as a physicist, I know too little about the language (long, short, holdings, etc.) related to stocks, bonds and finance. If you could cast it in simpler language I might be able to understand it and work with it. My answer is just a very simple expression that is unaware of the details and possibilities how to express hedging and stocks, but it shows the basic principle how you can get away from the use of a regression line (go back to first principles, express the model for profit which is at the core instead of using regression lines whose relevance is not directly clear).
– Martijn Weterings
Jan 7 at 11:41

I think i understand. The problem is that $1/\rho_{XY} \ne p_{XY}$. Indeed, $p_{XY}$ often changes quite a bit when we take the inverse. Your alternative is close to the case I am thinking about, but i do want to check one thing: does this allow non-negative holdings? Adopting your terminology, i’d have a unit holding of bond X, and a negative holding of Y. Say long one unit of bond X and short (say) 1.2 units of bond Y … but it could be 0.2 units or 5 units, depending on the math.
– ricardo
Jan 7 at 11:42 
long means that i make 1% on a bond if the price increases by ~1%; short means that i lose ~1% on a bond if the price increases by ~1%. So the idea is that i am long one unit of one bond (so i benefit from an appreciation) and am short some amount of the other bond (so i lose from an appreciation).
– ricardo
Jan 7 at 11:46 
“The problem is to decide how much of X one ought to hold against Y.” My problem with this is that there is no explanation/model/expression how you decide about this. How do you define losses and gains and how much do you value them?
– Martijn Weterings
Jan 7 at 11:46

Are there costs associated with being short and long? I imagine that you have a given amount to invest and this limits how much you can be short/long in those bonds. Then based on your previous knowledge you can estimate/determine the distribution of losses/gains for whatever combination on that limit. Finally, based on some function that determines how you value losses and gains (this expresses why/how you hedge) you can decide which combination to choose.
– Martijn Weterings
Jan 7 at 12:04

$beta$ and $gamma$
As Xi’an noted in his answer the $beta$ and $gamma$ are related to each other by relating to the conditional means $XY$ and $YX$ (which in their turn relate to a single joint distribution) these are not symmetric in the sense that $beta neq 1/gamma$. This is neither the case if you would ‘know’ the true $sigma$ and $rho$ instead of using estimates. You have $$beta = rho_{XY} frac{sigma_Y}{sigma_X}$$ and $$gamma = rho_{XY} frac{sigma_X}{sigma_Y}$$
or you could say
$$beta gamma = rho_{XY}^2 leq 1$$
See also simple linear regression on wikipedia for computation of the $beta$ and $gamma$.
It is this correlation term which sort of disturbs the symmetry. When the $beta$ and $gamma$ would be simply the ratio of the standard deviation $sigma_Y/sigma_X$ and $sigma_X/sigma_Y$ then they would indeed be each others inverse. The $rho_{XY}$ term can be seen as modifying this as a sort of regression to the mean.
 With perfect correlation $rho_{XY} = 1$ then you can fully predict $X$ based on $Y$ or vice versa. The slopes will be equal $$beta gamma = 1$$
 But with less than perfect correlation, $rho_{XY} < 1$, you can not make those perfect predictions and the conditional mean will be somewhat closer to the unconditional mean, in comparison to a simple scaling by $sigma_Y/sigma_X$ or $sigma_X/sigma_Y$. The slopes of the regression lines will be less steep. The slopes will be not related as each others reciprocal and their product will be smaller than one $$beta gamma < 1$$
Is a regression line the right method?
You may wonder whether these conditional probabilities and regression lines is what you need to determine your ratios of $X$ and $Y$. It is unclear to me how you would wish to use a regression line in the computation of an optimal ratio.
Below is an alternative way to compute the ratio. This method does have symmetry (ie if you switch X and Y then you will get the same ratio).
Alternative
Say, the yields of bonds $X$ and $Y$ are distributed according to a multivariate normal distribution$^dagger$ with correlation $rho_{XY}$ and standard deviations $sigma_X$ and $sigma_Y$ then the yield of a hedge that is sum of $X$ and $Y$ will be normal distributed:
$$H = alpha X + (1alpha) Y sim N(mu_H,sigma_H^2)$$
were $0 leq alpha leq 1$ and with
$$begin{array}{rcl}
mu_H &=& alpha mu_X+(1alpha) mu_Y \
sigma_H^2 &=& alpha^2 sigma_X^2 + (1alpha)^2 sigma_Y^2 + 2 alpha (1alpha) rho_{XY} sigma_X sigma_Y \
& =& alpha^2(sigma_X^2+sigma_Y^2 2 rho_{XY} sigma_Xsigma_Y) + alpha (2 sigma_Y^2+2rho_{XY}sigma_Xsigma_Y) +sigma_Y^2
end{array} $$
The maximum of the mean $mu_H$ will be at $$alpha = 0 text{ or } alpha=1$$ or not existing when $mu_X=mu_Y$.
The minimum of the variance $sigma_H^2$ will be at $$alpha = 1 – frac{sigma_X^2 rho_{XY}sigma_Xsigma_Y}{sigma_X^2 +sigma_Y^2 2 rho_{XY} sigma_Xsigma_Y} = frac{sigma_Y^2rho_{XY}sigma_Xsigma_Y}{sigma_X^2+sigma_Y^2 2 rho_{XY} sigma_Xsigma_Y} $$
The optimum will be somewhere in between those two extremes and depends on how you wish to compare losses and gains
Note that now there is a symmetry between $alpha$ and $1alpha$. It does not matter whether you use the hedge $H=alpha_1 X+(1alpha_1)Y$ or the hedge $H=alpha_2 Y + (1alpha_2) X$. You will get the same ratios in terms of $alpha_1 = 1alpha_2$.
Minimal variance case and relation with principle components
In the minimal variance case (here you actually do not need to assume a multivariate Normal distribution) you get the following hedge ratio as optimum $$frac{alpha}{1alpha} = frac{var(Y) – cov(X,Y)}{var(X)cov(X,Y)}$$ which can be expressed in terms of the regression coefficients $beta = cov(X,Y)/var(X)$ and $gamma = cov(X,Y)/var(Y)$ and is as following $$frac{alpha}{1alpha} = frac{1beta}{1gamma}$$
In a situation with more than two variables/stocks/bonds you might generalize this to the last (smallest eigenvalue) principle component.
Variants
Improvements of the model can be made by using different distributions than multivariate normal. Also you could incorporate the time in a more sophisticated model to make better predictions of future values/distributions for the pair $X,Y$.
^{$dagger$ This is a simplification but it suits the purpose of explaining how one can, and should, perform the analysis to find an optimal ratio without a regression line.}

1
I am sorry, but as a physicist, I know too little about the language (long, short, holdings, etc.) related to stocks, bonds and finance. If you could cast it in simpler language I might be able to understand it and work with it. My answer is just a very simple expression that is unaware of the details and possibilities how to express hedging and stocks, but it shows the basic principle how you can get away from the use of a regression line (go back to first principles, express the model for profit which is at the core instead of using regression lines whose relevance is not directly clear).
– Martijn Weterings
Jan 7 at 11:41

I think i understand. The problem is that 1/ρ_{XY} ne p_{XY}$. indeed, $p_{XY}$ often changes quite and bit when we take the inverse. Your alternative is close to the case I am thinking about, but i do want to check one thing: does this allow nonnegative holdings? Adopting your terminology, i’d have a unit holding of bond X, and a negative holding of Y. Say long one unit of bond X and short (say) 1.2 units of bond Y … but it could be 0.2 units or 5 units, depending on the math.
– ricardo
Jan 7 at 11:42 
long means that i make 1% on a bond if the price increases by ~1%; short means that i lose ~1% on a bond if the price increases by ~1%. So the idea is that i am long one unit of one bond (so i benefit from an appreciation) and am short some amount of the other bond (so i lose from an appreciation).
– ricardo
Jan 7 at 11:46 
“The problem is to decide how much of X one ought to hold against Y.” My problem with this is that there is no explanation/model/expression how you decide about this. How do you define losses and gains and how much do you value them?
– Martijn Weterings
Jan 7 at 11:46

Are there costs associated with being short and long? I imagine that you have a given amount to invest and this limits how much you can be short/long in those bonds. Then based on your previous knowledge you can estimate/determine the distribution of losses/gains for whatever combination on that limit. Finally, based on some function that determines how you value losses and gains (this expresses why/how you hedge) you can decide which combination to choose.
– Martijn Weterings
Jan 7 at 12:04

show 5 more comments

1
I am sorry, but as a physicist, I know too little about the language (long, short, holdings, etc.) related to stocks, bonds and finance. If you could cast it in simpler language I might be able to understand it and work with it. My answer is just a very simple expression that is unaware of the details and possibilities how to express hedging and stocks, but it shows the basic principle how you can get away from the use of a regression line (go back to first principles, express the model for profit which is at the core instead of using regression lines whose relevance is not directly clear).
– Martijn Weterings
Jan 7 at 11:41

I think i understand. The problem is that 1/ρ_{XY} ne p_{XY}$. indeed, $p_{XY}$ often changes quite and bit when we take the inverse. Your alternative is close to the case I am thinking about, but i do want to check one thing: does this allow nonnegative holdings? Adopting your terminology, i’d have a unit holding of bond X, and a negative holding of Y. Say long one unit of bond X and short (say) 1.2 units of bond Y … but it could be 0.2 units or 5 units, depending on the math.
– ricardo
Jan 7 at 11:42 
long means that i make 1% on a bond if the price increases by ~1%; short means that i lose ~1% on a bond if the price increases by ~1%. So the idea is that i am long one unit of one bond (so i benefit from an appreciation) and am short some amount of the other bond (so i lose from an appreciation).
– ricardo
Jan 7 at 11:46 
“The problem is to decide how much of X one ought to hold against Y.” My problem with this is that there is no explanation/model/expression how you decide about this. How do you define losses and gains and how much do you value them?
– Martijn Weterings
Jan 7 at 11:46

Are there costs associated with being short and long? I imagine that you have a given amount to invest and this limits how much you can be short/long in those bonds. Then based on your previous knowledge you can estimate/determine the distribution of losses/gains for whatever combination on that limit. Finally, based on some function that determines how you value losses and gains (this expresses why/how you hedge) you can decide which combination to choose.
– Martijn Weterings
Jan 7 at 12:04
I am sorry, but as a physicist, I know too little about the language (long, short, holdings, etc.) related to stocks, bonds and finance. If you could cast it in simpler language I might be able to understand it and work with it. My answer is just a very simple expression that is unaware of the details and possibilities how to express hedging and stocks, but it shows the basic principle how you can get away from the use of a regression line (go back to first principles, express the model for profit which is at the core instead of using regression lines whose relevance is not directly clear).
– Martijn Weterings
Jan 7 at 11:41
I think I understand. The problem is that $1/\rho_{XY} \ne \rho_{XY}$. Indeed, $\rho_{XY}$ often changes quite a bit when we take the inverse. Your alternative is close to the case I am thinking about, but I do want to check one thing: does this allow nonnegative holdings? Adopting your terminology, I’d have a unit holding of bond X, and a negative holding of Y. Say long one unit of bond X and short (say) 1.2 units of bond Y … but it could be 0.2 units or 5 units, depending on the math.
– ricardo
Jan 7 at 11:42
Long means that I make ~1% on a bond if the price increases by ~1%; short means that I lose ~1% on a bond if the price increases by ~1%. So the idea is that I am long one unit of one bond (so I benefit from an appreciation) and short some amount of the other bond (so I lose from an appreciation).
– ricardo
Jan 7 at 11:46
Perhaps the approach of “Granger causality” might help. It would help you to assess whether X is a good predictor of Y or whether Y is a better predictor of X. In other words, it tells you whether $\beta$ or $\gamma$ is the thing to take more seriously. Also, considering that you are dealing with time series data, it tells you how much of the history of X counts towards the prediction of Y (or vice versa).
Wikipedia gives a simple explanation:
A time series X is said to Granger-cause Y if it can be shown, usually through a series of t-tests and F-tests on lagged values of X (and with lagged values of Y also included), that those X values provide statistically significant information about future values of Y.
What you do is the following:
 regress Y(t) on X(t-1) and Y(t-1)
 regress Y(t) on X(t-1), X(t-2), Y(t-1), Y(t-2)
 regress Y(t) on X(t-1), X(t-2), X(t-3), Y(t-1), Y(t-2), Y(t-3)
Continue for whatever history length might be reasonable. Check the significance of the F-statistics for each regression.
Then do the same in reverse (so, now regress X(t) on the past values of X and Y) and see which regressions have significant F-values.
A very straightforward example, with R code, is found here.
Granger causality has been critiqued for not actually establishing causality (in some cases). But it seems that your application is really about “predictive causality,” which is exactly what the Granger causality approach is meant for.
The point is that the approach will tell you whether X predicts Y or whether Y predicts X (so you no longer would be tempted to artificially–and incorrectly–compound the two regression coefficients) and it gives you a better prediction (as you will know how much history of X and Y you need to know to predict Y), which is useful for hedging purposes, right?
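The lagged regressions described above can be sketched in Python. This is only a minimal illustration, assuming NumPy is available; the helper names (`lagged`, `rss`, `granger_f`) and the synthetic series are made up for the example and are not taken from the linked R code. It compares a restricted regression (own lags only) with an unrestricted one (own lags plus lags of the other series) through the usual F-statistic.

```python
import numpy as np

def lagged(series, p):
    """Columns are the series lagged by 1..p, aligned with series[p:]."""
    n = len(series)
    return np.column_stack([series[p - k : n - k] for k in range(1, p + 1)])

def rss(design, target):
    """Residual sum of squares of the least-squares fit of target on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def granger_f(y, x, p):
    """F-statistic for H0: lags 1..p of x add nothing beyond y's own lags."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[p:]
    const = np.ones((len(target), 1))
    restricted = np.hstack([const, lagged(y, p)])        # Y(t-1..t-p) only
    unrestricted = np.hstack([restricted, lagged(x, p)])  # plus X(t-1..t-p)
    rss_r = rss(restricted, target)
    rss_u = rss(unrestricted, target)
    df_den = len(target) - unrestricted.shape[1]
    f_stat = ((rss_r - rss_u) / p) / (rss_u / df_den)
    return f_stat, p, df_den

# Synthetic data (an assumption for the demo): x leads y by one step.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
noise = rng.normal(scale=0.1, size=n)
y = np.zeros(n)
y[1:] = 0.8 * x[:-1] + noise[1:]

f_x_predicts_y, df1, df2 = granger_f(y, x, p=2)
f_y_predicts_x, _, _ = granger_f(x, y, p=2)
print("F (X -> Y):", f_x_predicts_y, "with df", (df1, df2))
print("F (Y -> X):", f_y_predicts_x)
```

On real data you would repeat this for several lag lengths `p`, compare each F-statistic with the F(p, df) critical value, and re-estimate over time if the relationship is expected to drift.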

I have a strong theoretical reason to believe that neither is truly a cause, and that even if one became a cause it would not remain so over time. So I don’t think that Granger causality is the answer in this case. I’ve upvoted the answer in any case, as it is useful, esp. the R code.
– ricardo
Jan 7 at 3:04 
That is why I explicitly mention that “Granger causality has been critiqued for not actually establishing causality (in some cases).” It seems to me that your question is more about establishing “predictive causality,” which is what Granger causality is meant for. In addition, Granger’s approach uses the information in your time series data, which it would be a waste not to use if you have it. Of course, you can (should?) re-estimate the effects over time. I expect that the Granger effects are more stable than cross-sectional OLS (you can test this beforehand, using historical data). HTH
– Steve G. Jones
Jan 7 at 7:04
The problem is not causality but the errors of measurement (it is just that the dependent variable Y is often the one with the large measurement error, which makes “Y = a + B x + error” the common expression). Do you have an idea about the errors in the measurement of X and Y?
– Martijn Weterings
Jan 6 at 12:04
The exact values of $\beta$ and $\gamma$ can be found in this answer of mine to Effect of switching responses and explanatory variables…, and, as you suspect, $\beta$ is not the reciprocal of $\gamma$, and averaging $\beta$ and $1/\gamma$ is not the right way to go. A pictorial view of what $\beta$ and $\gamma$ are minimizing is given in Elvis’s answer to the same question, and he introduces a “least rectangles” regression that you might want …..
– Dilip Sarwate
Jan 6 at 15:43
You are in the ideal scenario where the choice of technique has a direct, physically measurable impact; you can simply measure the out-of-sample hedging error for each estimate, and compare them. Also, typically optimal hedging is better handled by using a VECM model (see for example Gatarek & Johansen, 2014, Optimal hedging with the cointegrated vector autoregressive model), which does not require choosing to model Y as a function of X or vice versa.
– Chris Haug
Jan 6 at 16:32
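The out-of-sample comparison suggested in this comment can be sketched as follows. It is a rough illustration only: the data are simulated, the position is assumed to be long one unit of Y and short h units of X, hedging error is measured as the variance of the hedged changes, and all names (`ols_slope`, `hedged_variance`, `candidates`) are hypothetical. Each candidate ratio is estimated on the first half of the sample and evaluated on the second half.

```python
import random

def mean(u):
    return sum(u) / len(u)

def ols_slope(u, v):
    """Slope of the least-squares regression of v on u."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    return suv / suu

def hedged_variance(dx, dy, h):
    """Sample variance of the P&L from long one unit of Y, short h units of X."""
    pnl = [b - h * a for a, b in zip(dx, dy)]
    m = mean(pnl)
    return sum((p - m) ** 2 for p in pnl) / (len(pnl) - 1)

# Simulated changes in X and Y driven by a common factor (an assumption).
random.seed(1)
n = 400
common = [random.gauss(0, 1) for _ in range(n)]
dx = [c + random.gauss(0, 0.5) for c in common]
dy = [1.5 * c + random.gauss(0, 0.5) for c in common]

half = n // 2
train_x, train_y = dx[:half], dy[:half]
test_x, test_y = dx[half:], dy[half:]

beta = ols_slope(train_x, train_y)    # from Y ~ X
gamma = ols_slope(train_y, train_x)   # from X ~ Y
candidates = {
    "beta": beta,
    "1/gamma": 1 / gamma,
    "average": 0.5 * (beta + 1 / gamma),
    "geometric mean": (beta / gamma) ** 0.5,
}

in_sample = {k: hedged_variance(train_x, train_y, h) for k, h in candidates.items()}
out_of_sample = {k: hedged_variance(test_x, test_y, h) for k, h in candidates.items()}
for name, h in candidates.items():
    print(f"{name:15s} h={h:.3f}  in-sample={in_sample[name]:.4f}  "
          f"out-of-sample={out_of_sample[name]:.4f}")
```

In-sample, the slope from regressing Y on X minimises the variance of `dy - h*dx` by construction, so only the out-of-sample column says anything about which ratio hedges best; with real data a rolling window would be the natural extension.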
You might want to look at the geometric mean $\sqrt{\dfrac{\beta}{\gamma}}$ as a possibility (if they are both negative you might take the negative square root). Then look at $\dfrac{s_y}{s_x}$, which should be very similar.
– Henry
Jan 6 at 18:37
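A quick numerical check of this suggestion, as a sketch on made-up numbers (the data and function names are illustrative only). For ordinary least squares slopes, $\beta = r\,s_y/s_x$ and $\gamma = r\,s_x/s_y$, so $\beta\gamma = r^2$ and $\sqrt{\beta/\gamma}$ reduces algebraically to $s_y/s_x$; the code confirms this up to floating-point error.

```python
from math import sqrt

def ols_slope(u, v):
    """Slope of the least-squares regression of v on u."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    return suv / suu

def sample_sd(u):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = sum(u) / len(u)
    return sqrt(sum((a - m) ** 2 for a in u) / (len(u) - 1))

# Illustrative data, positively correlated but not perfectly linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9]

beta = ols_slope(x, y)    # slope of Y ~ X
gamma = ols_slope(y, x)   # slope of X ~ Y

geometric_mean = sqrt(beta / gamma)
sd_ratio = sample_sd(y) / sample_sd(x)
r_squared = beta * gamma

print("beta          :", beta)
print("1/gamma       :", 1 / gamma)
print("sqrt(beta/gamma):", geometric_mean)
print("s_y / s_x     :", sd_ratio)
print("beta * gamma  :", r_squared)
```

With positive correlation the three slopes are ordered as $\beta \le s_y/s_x \le 1/\gamma$, and the gap between $\beta$ and $1/\gamma$ widens as the correlation weakens.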
@ricardo Note that I specified out-of-sample error, so not the (in-sample) fit of the model. And it is entirely possible for the optimal hedge ratio to change over time (especially if the relationship is not actually linear); that doesn’t change the fact that figuring out the best hedging strategy can be most directly done by backtesting the model and observing the results.
– Chris Haug
Jan 6 at 23:53