Site Meter

Thursday, March 07, 2024

Avatars of the Tortoise III

In "Avatars of the Tortoise" Jorge Luis Borges wrote "There is a concept which corrupts and upsets all others. I refer not to Evil, whose limited realm is that of ethics; I refer to the infinite."

He concluded ""We (the indivisible divinity that works in us) have dreamed the world. We have dreamed it resistant, mysterious, visible, ubiquitous in space and firm in time, but we have allowed slight, and eternal, bits of the irrational to form part of its architecture so as to know that it is false."

I think I might have something interesting to say about that and I tried to write it here. If you must waste your time reading this blog, read that one not this one. But going from the sublime to the ridiculous I have been a twit (but honestly not a troll) on Twitter. I said I thought we don't need the concept of a derivative (in the simple case of a scalar function of a scalar the limit as delta x goes to zero of ther ratio delta Y over delta x - I insult you with the definition just to be able to write that my tweet got very very ratiod).

In avatars of the Tortoise II I argued that we can consider space time to be a finite set of points with each point in space the same little distance from its nearest neighbors and each unit of time the same discrete jump littleT from the most recent past. If I am right we don't need derivatives to analyse functions of space, there are just slopes, or time, or position as a function of time (velocity and acceleration and such). In such a model, there are only slopes as any series which goes to zero, gets to zero after a finite number of steps and the formula for a derivative must include 0/0.

I will make more arguments against derivatives. First I will say that we learn nothing useful if we know the first, second, ... nth ... derivative of a function at X. Second I will argue that we can do what we do with derivatives using slopes. Third I will argue that current actual applied math consists almost entirely of numerical simulations on computers which are finite state autometa and which do not, in fact, handle continuums when doing the simulating. They take tiny little steps (just as I propose).

I am going to make things simple (because I don't type so good and plane ascii formulas are a pain). I will consider scalar functions of scalars (so the derivative will be AB ap calculus). I will also consider only derivatives at zero.

f'(0) = limit as x goes to zero of (f(x)-f(0))/x

that is, for any positive epsilon there is a positive delta so small that if |x| This is useful if we know we are interested in x with absolute value less than delta, but we can't know that because the definition of a derivative gives us no hint as to how small delta must be.

To go back to Avatars of the tortoise I, another equally valid definition of a derivative at zero is, consider the infinite series X_t = (-0.5)^t.

f'(0) = the limit as t goes to infinity of (f(x_t)-f(0))/x_t that is, for any positive epsilon there is a positive N so big that, if t>N then |f'(0) - (f(x_t)-f(0))/x_t|< epsilon.

so we have again the limit as t goes to infinity and the large enough N with know way of knowing if the n which interests us (say 10^1000) is large enough. Knowing the limit tells us nothing about the billionth element. The exact same number is the billionth element of a large infinity of series some of which converge to A for any real number A, so any number is as valid an asymptotic approximation as any other, so none is valid.

Now very often the second to last step of the derivation of a derivative includes an explicit formula for f'(0) - (f(x)-f(0))/x and then the last step consists of proving it goes to zero by finding a small enough delta as a function of epsilon. That formula right near the end is useful. Te derivative is not. Knowing that there is a delta is not useful if we have no idea how small it must be.

In general for any delta no matter how small for any epsilon no matter how small, there is a function f such that |f'(0) - (f(delta)-f(0))/delta|>1/epsilon (I will give an example soon). for any function there is a delta does not imply that there is a delta which works for any function. The second would be useful. The first is not always useful.

One might consider the first, second, ... nth derivatives and an nth order Taylor series approximation which I will call TaylorN(x)

for any N no matter how big, for any delta no matter how small for any epsilon no matter how small, there is a function f such that |TaylorN(delta) - f(delta)|>1/epsilon

for example consider the function f such that

f(0) = 0, if x is not zero f(x) = (2e/epsilon)e^(-(delta^2/x^2))

f(delta) = 2/epsilon > 1/epsilon.

f'(0) is the limit as x goes to zero of

-(2e/epsilon)(2delta^2/x^3)e^(-(delta^2/x^2)) = 0.

the nth derivative the limit as x goes to zero of an n+2 order polynomial times e^(-(delta^2/x^2)) and so equals zero.

The Nth order taylor series approximation of f(x) equals zero for every x. for x = delta it is off by 2/epsilon > 1/epsilon.

There is no distance from zero so small and no error so big that there is no example in which the Nth order Taylor series approximation is definitely not off by a larger error at that distance.

Knowing all the derivatives at zero, we know nothing about f at any particular x other than zero. Again for any function, for any epsilon, there is a delta, but there isn't a delta for any function. Knowing all the derivatives tells us nothing about how small that delta must be, so nothing we can use.

So if things are so bad, why does ordinary caluclus work so well ? It works for a (large) subset of problems. People have learned about them and how to recognise them either numerically or with actual experiments or empirical observtions. But that succesful effort involved numerical calculations (that is arithmetic not calculus) or experiments or observations. It is definitely Not a mathematical result that the math we use works. Indeed there are counterexamples (of which I presented just one).

part 2 of 3 (not infinite even if it seems that way but 3. If the world is observationally equivalent to a word with a finite set of times and places, then everything in physics is a slope. More generally, we can do what we do with derivatives and such stuff with discrete steps and slopes. We know this because that is what we do when faced with hard problems without closed form solutions. We hand them over to computers which consider a finite set of numbers with a smallest step. and that quickly gets me to part 3 of 3 (finally). One person on Twitter says we need to use derivatives etc to figure out how to write the numerical programs we actually use in applications. This is an odd claim. I can read (some) source code (OK barely source code literate as I am old but some). I can write (some) higher higher language source code. I can force myself to think in some (simple higher higher language) source code (although in practice I use derivatives and such like). Unpleasant but not impossible.

Someone else says we use derivatives to know if the simulation converges or, say, if a dynamical system has a steady state which is a sink or stuff like that. We do, but tehre is no theorem that this is a valid approach and there are counterexamples (basically based on the super simple one I presented). All that about dynamics is about *local* dynamics and is valid if you start out close enough and there is no general way to know how close is close enough. In practice people have found cases where linear and Taylor series (and numerical) approximations work and other cases where they don't (consider chaotic dynamical systems with positive Lyaponoff exponents and no I will not define any of those terms).

Always the invalid pretend pure math is tested with numerical simulations or experiments or observations. People learn when it works and tell other people about the (many) cases where it works and those other people forget the history and pour contempt on me on Twitter.

Avatars of the Tortoise II

In "Avatars of the Tortoise" Jorge Luis Borges wrote "There is a concept which corrupts and upsets all others. I refer not to Evil, whose limited realm is that of ethics; I refer to the infinite."

He concluded ""We (the indivisible divinity that works in us) have dreamed the world. We have dreamed it resistant, mysterious, visible, ubiquitous in space and firm in time, but we have allowed slight, and eternal, bits of the irrational to form part of its architecture so as to know that it is false."

I think rather that we have dreamed of infinity which has no necessary role in describing the objective universe which is "resistant, mysterious, visible, ubiquitous in space and firm in time*".

First the currently favored theory is that space is not, at the moment, infinite but rather is a finite hypersphere. There was a possibility that time might end in a singularity as it began, but the current view is that the universe will expand forever. Bummer. I note however that the 2nd law of thermodynamics implies that life will not last forever (asymptotically we will "all" be dead, "we" referring to living things not currently living people). So I claim that there is a T so large that predictions of what happens after T can never be tested (as there will be nothing left that can test predictions.

However it is still arguable (by Blake) that we can find infinity in a grain of sand and eternity in an hour. Indeed when Blake wrote, that was the general view of phyicists (philosophy makes the oddest bedfellows) as time was assumed to be a continuum with infinitely many distinct instants in an hour.

Since then physicists have changed their mind -- the key word above was "distinct" which I will also call "distinguishable" (and I dare the most pedantice pedant (who knows who he is) to challenge my interchanging the two words which consist of different letters).

The current view is that (delta T)(delta E) = h/(4 pi) where delta T is the uncertainty in time of an event, delta E is the uncertainty in energy involved, h is Planck's constant, pi is the ratio of the circumpherance of a circle to it's diameter and damnit you know what 4 means.

delta E must be less that Mc^2 where M is the (believed to be finite) mass of the observable universe. So there is a minimum delta T which I will call littleT. A universe in which time is continuous (and an hour contains an infinity of instants) is observationally equivalent to a universe in which time (from the big bang) is a natural number times littleT. The time from the big bang to T can be modeled as a finite number of discrete steps just as well as it can be modeled as a continuum of real numbers. This means that the question of which if these hypothetical possibilities time really is is a metaphysical question not a scientific question.

Now about that grain of sand. there is another formula

(delta X)(delta P) = h/(4 pi)

X is the location of something, P is its momentum. |P| and therefore delta P is less than or equal to Mc/2 where M is the mass of the observable universe. The 2 appears because total momentum is zero. This means that there is a minimum delta X and a model in which space is a latice consisting of a dimension zero, countable set of separated points is observationally equivalent to the standard model in which space is a 3 dimensional real manifold. Again the question of what space really is is metaphysical not scientific.

Recall that space is generally believed to be finite (currently a finite hypersphere). It is expanding. At T it will be really really big, but still finite. That means the countable subset of the 3 dimensional manifold model implies a finite number of different places. No infinity in the observablee universe let alone in a grain of sand

There are other things than energy, time, space and momentum. I am pretty sure they can be modeled as finite sets too (boy am I leading with my chin there).

I think there is a model with a finite set of times and of places which is observationally equivalent to the standard model and, therefore, just as scientifically valid. except for metaphysics and theology, I think we have no need for infinity. I think it is not avatars of the tortoise all the way down.

*note not ubiquitous in time as there wass a singularity some time ago.

Wednesday, March 06, 2024

Asymtotically we'll all be dead II

alternative title "avatars of the tortoise I"

Asymptotically we'll all be dead didn't get much of a response, so I am writing a simpler post about infinite series (which is the second in a series of posts which will not be infinite. First some literature "Avatars of the Tortoise" is a brilliant essay by Jorge Luis Borges on paradoxes and infinity. Looking at an idea, or metaphor (I dare not type meme) over centuries was one of his favorite activities. In this case, it was alleged paradoxess based on infinity. He wrote "There is a concept which corrupts and upsets all others. I refer not to Evil, whose limited realm is that of ethics; I refer to the infinite."

When I first read "Aavatars of the Tortoise" I was shocked that the brilliant Borges took Zeno's non paradox seriously. The alleged paradox is based on the incorrect assumpton that a sum of an infitite numbr of intervals of time adds up to forever. In fact, infite sums can be finite numbers, but Zeno didn't understand that.

Zeno's story is (roughly translated and with updated units of measurement)

consider the fleet footed Achilles on the start line and a slow tortoise 100 meters ahead of him. Achilles can run 100 meters in 10 seconds. The tortoise crawls forward one tenth as fast. The start gun goes off. In 10 seconds Achilles reaches the point where the tortoise started by the tortoise has crawled 10 meters (this would only happen if the tortoise were a male chasing a female or a female testing the male's fitness by running away - they can go pretty fast when they are horny).

So the race continues to step 2. Achilles reaches the point where the tortoise was after 10 seconds in one more second, but the tortoise has crawled a meter.

Step 3, Achilles runs another meter in 0.1 seconds, but the tortoise has crawled 10 cm.

The time until Achilles passes the tortoise is an infinite sum. Silly Zeno decided that this means that Achilles never passes the tortoise, that the time until he passes him is infinite. In fact a sum of infinitely many numbers can be finite -- in theis case 10/(1-0.1) = 100/9 < infinity.

Now infinite sums can play nasty tricks. Consider a series x_t t going from 1 to infinity. If the series converges to x, but does not converge absolutely (so sum |x_t| goes to infinity) then one can make the series converge to any number at all by changing the order in which the terms are added. How can this be given the axiom that addition is commutative. Now that's a bit of a paradox.

the proof is simple, let's make it converget to A. DIrst note that the positive terms must add to infinity and the negative terms add to - infinity (so that they cancel enough for the series to converge).

now add positive terms until Sumsofar >A (if A is negative this requires 0 terms). Now add negative terms until sumsofar The partial sums will cross A again and again. The distance from the partial sum to A is less than the last term as it just crossed A. The last term must go to zero as we get t going to infinity (so the original series can converge) so the new series of partial sums converged to A.

That's one of the weird things infinity does. I think that everything which is strongly counterintuitive in math has infinity hiding somewhere (no counterexamples have come to my mind and I have looked for one for decades).

Now I say that the limit of a series (original series of sum t = 1 1 to T) as T goes to infinity is not, in general, of any practical use, because in the long run we will all be dead. I quote from "asymptotically we'll all be dead"

Consider a simple "problem of a series of numbers X_t (not stochastic just determistic numbers). Let's say we are interested in X_1000. What does knowing that the limit of X_t as t goes to infinity is 0 tell us about X_1000 ? Obvioiusly nothing. I can take a series and replace X_1000 with any number at all without changing the limit as t goes to infinity.

Also not only does the advice "use an asymptotic approximation" often lead one astray, it also doesn't actually lead one. The approach is to imaging a series of numbers such that X_1000 is the desired number and then look at the limit as t goes to infinity. The problem is that the same number X_1000 is the 1000th element of a of an uh large infinity of different series. one can make up a series such that the limit is 0 or 10 or pi or anything. the advice "think of the limit as t goes to infinity of an imaginary series with a limit that you just made up" is as valid an argument that X_1000 is approximately zero as it is that X_1000 is pi, that is it is an obviously totally invalid argument.

An example is a series whose first google (10^100) elements aree one google so x_1000000 = 10^100, and the laters elements are zero. The series converges to zero. If one usees the limit as t goes to infinity as an approximation when thinking of X_999 then one concludes that 10^100 is approximately zero.

The point is that the claim that a series goes to x s the claim that (for that particular series) for any positive epsilon, there is an N so large that if t>N then |x_t-x} This tells us nothing about how large that N is and whether it is much larget than any t which interests us. Importantly it is just not true that there is a N so large that (for any series and the same N) if t > n then |x_t- the limit| < 10^10^(100)

Knowing only the limit as t goes to infinity, we have no idea how large an N is needed for any epsilon, so we have no idea if the limit is a useful approximation to anything we will see if we read the series for a billion years.

Now often the proof of the limit contains a useful assertion towards the end of the proof. For example one might prove that |x_t-X| < A/t for some A. The next step is to note that the limit as to goes to infinity of x_t is X. This last step is a step in a very bad direction going from something useful to a useless implication of the useful statement.

Knowing A we know that N = floor(A/epsilon). That's a result we can use. It isn't as elegant as saying something about limits (because it includes the messy A and often includes a formula much messier than A/t). However, unlike knowing the limit as t goes to infinity it might be useful some time in the next trillion years.

In practice limits are used when it seems clear from a (finite) lot of calculations that they are good approximations. But that means one can just do the many but finite number of calculations and not bother with limits or infinity at all.

In this non-infinite series of posts, I will argue that the concept of infinity causes all sorts of fun puzzles,but is not actually needed to describe the universe in which we find ourselves.

Monday, February 19, 2024

Asymptotically We'll all be Dead

This will be a long boring post amplifying on my slogan.

I assert that asymptotic theory and asymptotic approximations have nothing useful to contribute to the study of statistics. I therefore reject that vast bul of mathematical statistics as absolutely worthless.

To stress the positive, I think useful work is done with numerical simulations -- monte carlos in which pseudo data are generated with psudo random number generators and extremely specific assumptions about data generating processes, statistics are calculated, then the process is repeated at least 10,000 times and the pseudo experimental distribution is examined. A problem is that computers only understand simple precise instructions. This means that the Monte Carlo results hold only for very specific (clearly false) assumptions about the data generating process. The approach used to deal with this is to make a variety of extremely specific assumptions and consider the set of distributions of the statistic which result.

I think this approach is useful and I think that mathematical statisticians all agree. They all do this. Often there is a long (often difficult) analysis of asymptotics, then the question of whether the results are at all relevant to the data sets which are actually used, then an answer to that question based on monte carlo simulations. This is the approach of the top experts on asymptotics (eg Hal White and PCB Phillips).

I see no point for the section of asymptotic analysis which no one trusts and note that the simulations often show that the asymptotic approximations are not useful at all. I think they are there for show. Simulating is easy (many many people can program a computer to do a simulation). Asymptotic theory is hard. One shows one is smart by doing asymptotic theory which one does not trust and which is not trustworthy. This reminds me of economic theory (most of which I consider totally pointless).

OK so now against asymptotics. I will attempt to explain what is done -- I will use the two simplest examples. In each case, there is an assumed data generating process and a sample size of data (hence N) which one imagines is generated. Then a statistic is estimated (often this is a function of the sample size and an estimate of a parameter of a parametric class of possible data generating processes). The statistic is modified by a function of the sample size (N). The result is a series of random variables (or one could say a series of distributions of random variables). The function of the sample size N is chosen so that the series of random variables converges in distribution to a random variable (convergence in distribution is convergence of the cumulative distribution function at all points where there are no atoms so it is continuous).

One set of examples (usually described differently) are laws of large numbers. A very simple law of large numbers assumes that the data generating process is a series of independent random numbers with identical distributions (iid). It is assumed (in the simplest case) that this distribution has a finite mean and a finite variance. The statisitc is the sample average. as N goes to invinity it converges to a degenerate distribution with all weight on the population average. It is also tru that the sample average converges to a distrubiton whith mean equal to the population mean and variance going to zero - that is for any positive epsilon there is an N1 so large that the variance of the sample mean is less than epsilon (conergence in quadratic rule). Also for any positive epsilon there is an N1 so large that if the sample size N>N1 then the probability that the sample mean is more than epsilon from the population mean is itself less than epsilon (convergence in probability). The problem is that there is no way to know what N! is. In particular, it depends on the underlying distribution. The population variance can be estimated using the sample variance. This is a consistent estimate so that the difference is less than epsilon if N>N2. The problem is that there is no way of knowing what N2 is.

Another very commonly used asymptotic approximation is the central limit theorem. Again I consider a very simple case of an iid random variable with a mean M and a finite variance V.

In that case (sample mean -M)N^0.5 will converge in distribution to a normal with mean zero and variance V. Again there is no way to know what the required N1 is. for some iid sitributions (say binary 1 or zero with probability 0.5 each or uniform from 0 to 1) N1 is quite low and the distribution looks just like a normal distribution for a sample size around 30. For others the distribution is not approximately normal for a sample size of 1,000,000,000.

I have criticisms of asymptotic analysis as such. The main one is that N has not gone to infinity. Also we are not imortal and my not live long enough to collect N1 observations.

Consider an even simpler problem of a series of numbers X_t (not stochastic just determistic numbers). Let's say we are interested in X_1000. What does knowing that the limit of X_t as t goes to infinity is 0 tell us about X_1000 ? Obvioiusly nothing. I can take a series and replace X_1000 with any number at all without changing the limit as t goes to infinity.

Also not only does the advice "use an asymptotic approximation" often lead one astray, it also doesn't actually lead one. The approach is to imaging a series of numbers such that X_1000 is the desired number and then look at the limit as t goes to infinity. The problem is that the same number X_1000 is the 1000th element of a of an uh large infinity of different series. one can make up a series such that the limit is 0 or 10 or pi or anything. the advice "think of the limit as t goes to infinity of an imaginary series with a limit that you just made up" is as valid an argument that X_1000 is approximately zero as it is that X_1000 is pi, that is it is an obviously totally invalide argument.

This is a very simple example, however there is the exact same problem with actual published asymptotic approximations. The distribution of the statistic for the actual sample size is one element of a very large infinity of possible series of distributions. Equally valid asymptotic analysis can imply completely different assertions about the distribution of the statistic for the actual sample size. As they can't both be valid and they are equally valid, they both have zero validity.

An example. Consider a random walk where x_t = (rho)x_(t-1) + epsilon_t where epsilon is n iid random variable with mean zero and finite variance. There is a standard result that if rho is less than 1 then (rhohat-rho)N^0.5 has a normal distribution. There is a not so standard result that if rho = 1 then (rhohat-rho)N^0.5 goes to a degenerate distribution equal to zero with probability one and (rhohat-rho)N goes to a strange distribution called a unit root distribution (with the expected value of (rhohat-rho)N less than 0.

Once I came late to a lecture on this and casually wrote converges in distribution to a normal with mean zero and variance before noticing that the usual answer was not correct in this case. The professor was the very brilliant Chris Vavanaugh who was one of the first two people to prove the result described below (and was not the first to publish).

Doeg Elmendorf who wasn't even in the class and is very very smart (and later head of the CBO) asked how there can be such a discontinuity at rho +1 when, for a sample of a thousand observations, there is almost no difference in the joint probability distribution or any possible statistic between rho = 1 and rho = 1 - 10^(-100). Prof Cavanaugh said that was his next topic.

The problem is misuse of asymptotics (or according to me use of asymptotics). Note that the question explicity referred to a sample size of 1000 not a sample size going to infinity.

So if rho = 0.999999 = 1 - 1/(a million) then rho^1000 is about 1 but rho^10000000000 iis about zero. taking N to infinity implies that, for a rho very slightly less than one, almost all of the regression coefficients of X_t2 on X_t1 (with t1. Now the same distribution of rhohat for rho = 0.999999 and N = 1000 is the thousandth element of many many series of random variables.

One of them is a series where Rho varies with the sample size N so Rho_N = 1-0.001/N

for N = 1000 rho = 0.999999 so the distribution of rhohat for the sample of 1000 is just the same as before. However the series of random variables (rhohat-rho)N^0.5 does not converge to a normal distribution -- it converges to a degenerate distribution which is 0 with probability 1.

In contrast (rho-rhohat)N converges to a unit root distribution for this completely different series of random variables which has the exact same distribution for the sample size of 1000.

There are two completely different equally valid asymptotic approximations.

So Cavanough decided which to trust by running simulations and said his new asymptotic aproximation worked well for large finite samples and the standard one was totally wrong.

See what happened ? Asymptotic theory did not answer the question at all. The conclusion (which is universally accepted) was based on simulations.

This is standard practice. I promise that mathematical statisticians will go on and on about asymptotics then check whether the approximation is valid using a simulation.

I see no reason not to cut out the asymptotic middle man.