sillyinseattle's comments | Hacker News

> Anyone with a retirement account that holds passive etfs

This is incorrect. Not all ETFs hold it. The funds in question are sector-focused (vehicles). A regular S&P 500 ETF will not hold NKLA.


I was trying to be brief, but you are correct. NKLA was not in the S&P 500, but it was in the Russell 2000. Also, target retirement date funds will hold total stock market index funds, which will contain NKLA.

Look: https://money.cnn.com/quote/shareholders/shareholders.html?s... Top ten holders are Vanguard, BlackRock, and the other usual providers of index ETFs.


WSJ.com is now reporting that a top secret US Navy system did detect the implosion; the on-site commander was informed.


How cool! Free of jargon and loads of intuition


"Manufacturers may intentionally damage a portion of their goods ... "[0]

The Intel chip example has already been mentioned in the comments. The general idea of "Damaged Goods" (and how this can benefit consumers) appears here (at Preston McAfee's site).

[0] https://mc4f.ee/Papers/PDF/DamagedGoods.pdf


> but I feel like this is just a password you can't change

Not quite. IBM has (had?) a research program on "cancelable" biometrics. I don't recall the details perfectly, but I think they were tweaking the encoded biometric sensor data before committing it to DBs. If there is a leak, one can re-enroll with a new tweak (like a new salt or nonce).
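A minimal sketch of the revocation idea (my own toy example, not IBM's actual scheme; real cancelable-biometrics systems use distance-preserving transforms so fuzzy matching still works, whereas an exact hash like this only shows the re-enrollment part):

```python
import hmac
import hashlib

def cancelable_template(biometric_encoding: bytes, tweak: bytes) -> bytes:
    """Derive a storable template from encoded biometric features.

    If the database leaks, re-enroll the user with a fresh tweak;
    the old stored template becomes useless, much like rotating a salt.
    """
    return hmac.new(tweak, biometric_encoding, hashlib.sha256).digest()

# Enrollment with tweak v1 (the feature bytes are a stand-in for real sensor data)
features = b"encoded-fingerprint-minutiae"
stored = cancelable_template(features, b"tweak-v1")

# After a leak: revoke by re-enrolling the same finger with a new tweak
stored_v2 = cancelable_template(features, b"tweak-v2")
assert stored != stored_v2  # the leaked template no longer matches anything
```

The point is that the stored artifact, not the biometric itself, is what gets rotated.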


How does that help if someone has a detailed picture of your fingerprint?


Oops. You're right.


I despise how they operated from the start through 2017. But do I wish Uber had never happened? Nope. Also, when you say "letting them get away with ..", are you including Macron, Biden, etc. in "them"?


Haha... pegged to 1 + e^(iπ), i.e. 1 + cos(π) + i·sin(π), i.e. 0. Nice of them to make the satire obvious in the Twitter post image itself.


Question about terminology (no background in AI). In econometrics, estimation is model fitting (training, I guess), and inference refers to hypothesis testing (e.g. t or F tests). What does inference mean here?


In machine learning (especially deep learning / neural networks), training is done using stochastic gradient descent. The gradients are computed using backpropagation, which requires a backward pass through your model (typically many layers of neural weights) and thus requires keeping a lot of intermediate values (called activations) in memory.

However, if you are doing "inference", that is, if the goal is only to get the result and not to improve the model, then you don't have to do backpropagation, so you don't need to store those intermediate values. As the layers and number of parameters in deep learning grow, this difference in computation between training and inference becomes significant. In most modern applications of ML, you train once but infer many times, so it makes sense to have specialized hardware optimized for inference at the cost of being unable to train.
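A toy numpy sketch of the memory difference (the network and names are made up; real frameworks handle the caching automatically via autodiff):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 1))

def forward_train(x):
    # Training: intermediate activations must be cached for the backward pass.
    h = np.maximum(x @ W1, 0.0)        # ReLU activation, kept in memory
    y = h @ W2
    cache = (x, h)                     # this cost grows with network depth
    return y, cache

def backward(cache, dy):
    # Backpropagation reads the cached activations to compute gradients.
    x, h = cache
    dW2 = h.T @ dy
    dh = dy @ W2.T
    dh[h <= 0] = 0.0                   # ReLU gradient needs the cached h
    dW1 = x.T @ dh
    return dW1, dW2

def forward_infer(x):
    # Inference: intermediates can be discarded as soon as they're used.
    return np.maximum(x @ W1, 0.0) @ W2

x = rng.normal(size=(2, 4))
y, cache = forward_train(x)
dW1, dW2 = backward(cache, np.ones_like(y))
assert np.allclose(y, forward_infer(x))  # same outputs, far less bookkeeping
```

Same forward math either way; inference just skips the caching and the backward pass entirely.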


Just to add to this: the reason these inference accelerators have become big recently (see also the "neural core" in Pixel phones) is that they help do inference tasks in real time (lower model latency) with better power usage than a GPU.

As a concrete example, on a camera you might want to run a facial detector so the camera can automatically adjust its focus when it sees a human face. Or you might want a person detector that can detect the outline of the person in the shot, so that you can blur/change their background in something like a Zoom call. All of these applications are going to work better if you can run your model at, say, 60 Hz instead of 20 Hz. Optimizing hardware to do inference tasks like this as fast as possible with the least possible power usage is pretty different from optimizing for all the things a GPU needs to do, so you might end up with hardware that has both and uses them for different tasks.


Thank you @iamaaditya and @eklitzke . Very informative


It took me 20 years to learn this body of knowledge, and now it can just sort of be summed up in a paragraph.

When I learned and used gradient descent, you had to analytically determine your own gradients (https://web.archive.org/web/20161028022707/https://genomics....). I went to grad school to learn how to determine my own gradients. Unfortunately, in my realm, loss landscapes have multiple minima, and gradient descent just gets trapped in local minima.
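A tiny sketch of what that workflow looked like, with a hand-derived gradient and a toy one-dimensional landscape that has two minima (the function is my own example, not from any real application):

```python
import numpy as np

def f(x):
    # A double-well-like quartic: one global minimum, one worse local minimum.
    return x**4 - 3 * x**2 + x

def grad_f(x):
    # Gradient derived analytically by hand, as one did before autodiff.
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

# Two starting points, two different answers: the basin you start in decides.
left, right = descend(-2.0), descend(2.0)
print(left, f(left))    # converges near the global minimum
print(right, f(right))  # trapped in the worse local minimum
```

Both runs reach a point of zero gradient; only one of them is the minimum you actually wanted.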


This is the case for most contemporary neural networks as well. It turns out that for many domains, a "good" local minimum generalizes well across many tasks.


Huh. I talked to some experts and they told me NN loss functions are bowl-shaped with a single minimum, but that minimum takes a very long time to navigate to in high-dimensional spaces.


For higher feature counts the real concern is saddle points rather than local minima: points where the gradient is so small that you barely move at all each iteration and get "stuck".


To add here: for a local minimum to occur, the loss needs to curve upward along every one of those dimensions (or features). This is highly unlikely for modern NNs, where you have millions of dimensions. If the loss is going down along one dimension but up along the rest, you have a saddle point. Since you go down along only one (or a few) dimensions, escaping takes longer.
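A toy illustration with the classic saddle f(x, y) = x² - y² (my own example): started exactly on the x-axis, gradient descent crawls into the saddle and stalls, while an infinitesimal nudge along the one downhill direction lets it escape.

```python
import numpy as np

def grad(p):
    # f(x, y) = x**2 - y**2 has a saddle point at the origin:
    # curving up along x, down along y.
    x, y = p
    return np.array([2 * x, -2 * y])

def descend(p, lr=0.1, steps=100):
    p = np.array(p, dtype=float)
    for _ in range(steps):
        p -= lr * grad(p)
    return p

on_axis = descend([1.0, 0.0])       # converges to the saddle and stalls there
perturbed = descend([1.0, 1e-6])    # the downhill y-direction lets it escape
print(on_axis, perturbed)
```

In millions of dimensions the escape direction almost always exists, which is why saddle points slow training down rather than stopping it for good.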


What's your realm?


Protein folding and structure prediction. Protein simulations typically define an energy function, similar to a loss function, over all the atoms in the protein. There are many terms: at least one per bonded atom pair, at least one per bonded atom triple, at least one per bonded atom quadruple, and one per non-bonded pair (although distant atoms can be excluded, sometimes making this a sparse matrix). If you start with a proposed model (say, random coordinates for all the atoms) and apply gradient descent, you'll end up with a mess. All those energy terms create a high-dimensional surface that is absurdly spiky in the details and extremely wavy, with many local minima, at coarse grain.

Instead of using gradient descent, we used molecular dynamics (I'm unaware if this has a direct equivalent in ML) to sample the space by moving along various isocontours (constant energy, constant temperature, or usually constant pressure). Even so, you have to do a lot of sampling (in my day, years of computer time; now, months) to get a good approximation of the total landscape, and to measure transition frequencies between areas of the landscape separated by energy barriers (local maxima) smaller than the thermal energy available to the system.
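Very roughly, the contrast with gradient descent looks like this toy 1D sketch (a double well standing in for a real force field; the integrator and noise terms are my own crude choices, nothing here resembles an actual MD code):

```python
import numpy as np

# Toy 1D "landscape" with two basins, standing in for a protein energy surface.
def energy(x):
    return (x**2 - 1.0) ** 2          # double well: minima at x = ±1, barrier at 0

def force(x):
    return -4.0 * x * (x**2 - 1.0)    # -dE/dx

rng = np.random.default_rng(1)
x, v, dt = 1.0, 0.0, 0.01
visited = []
for step in range(20000):
    # Velocity Verlet with a crude thermostat: the trajectory samples the
    # landscape instead of sliding into the nearest minimum and stopping,
    # which is what plain gradient descent would do.
    v += 0.5 * dt * force(x)
    x += dt * v
    v += 0.5 * dt * force(x)
    v += 0.05 * rng.normal()          # thermal noise supplies barrier-crossing energy
    v *= 0.999                        # weak friction drains excess energy
    visited.append(x)

visited = np.array(visited)
# Fraction of time spent in each basin approximates their relative weights.
print((visited > 0).mean(), (visited < 0).mean())
```

Given enough thermal energy relative to the barrier, the trajectory hops between basins, and the hop statistics are what you're paying all that computer time for.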

It's complicated. Also, DeepMind obviated all my work by proving that sequence data (which is cheap to obtain) can be used to predict very accurate structures with little or no simulation.


Worth noting that inference in "traditional" statistics and in ML/AI/DL isn't really that different at some level. In both cases you have an inverse problem; in one case the parameters are about a group or population (e.g., something about all cats in existence), and in the other about an individual case (something about a particular cat).


This sounds really fascinating. Are there any resources that you'd recommend for someone who's starting out in learning all this? I'm a complete beginner when it comes to Machine Learning.


Deep Learning with Python (2nd ed), by Francois Chollet.

Even if you skip the programming parts, it has a lot of beginner/intermediate concepts clearly explained. If you do dive into the programming examples, you get to play around with a few architectures and ideas, and you're left ready to move on to more advanced material knowing what you're doing.


Thanks for the explanation, really succinct. Do you recommend any good back propagation tutorials for an EE undergrad?


It is confusing that the ML community has come to use "inference" to mean prediction, whereas statisticians have long used it to refer to training/fitting, or to hypothesis testing.

I'm not sure when or why this started.


Prediction.

The model is literally "inferring" something about its inputs: e.g., these pixels denote a hot dog, those don't.


I have background in both and it's very confusing to me. Inference in DL is running a trained model to predict/classify. Inference in stats and econometrics is totally different as you noted.


Inference here means "running" the model. So maybe it has a similar meaning as in econometrics?

Training is learning the weights (millions or billions of parameters) that control the model's behavior, vs inference is "running" the trained model on user data.


I’m surprised nobody has provided the basic explanation: inference, here, means matrix-matrix or matrix-scalar multiplication.
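For example (a made-up two-layer model in numpy; the shapes and weights are arbitrary, the point is just that the whole forward pass is a couple of matrix products plus a cheap nonlinearity):

```python
import numpy as np

# A "trained model" is just stored matrices; inference is matrix products.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(784, 128)), np.zeros(128)
W2, b2 = rng.normal(size=(128, 10)), np.zeros(10)

def infer(x):
    h = np.maximum(x @ W1 + b1, 0.0)  # matmul + bias + ReLU
    return h @ W2 + b2                # another matmul: that's the forward pass

logits = infer(rng.normal(size=(1, 784)))
print(logits.shape)  # (1, 10)
```

Which is exactly why inference accelerators are, at heart, fast low-power matrix-multiply engines.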


Commenters are clearly trying to help. They can do better if you provide more context. What kind of math do you want to do? This does not have to be related to your dissertation area. And since you are asking on HN: what kind of hacking/development do you want to do? Coming from academia (I have a PhD in game theory, kind of useless in the real world other than ad auctions), it gets easier to find an industry job and develop an idea of a good fit once you already have one! Don't be too picky to start with. Good luck!


You're right! Here is some more context:

* Bachelor's in CS, and I've kept an interest in computing/programming languages, although I don't have much to show for it. I do have some scientific Julia code, but it's closer to a hackish academic-type dump of code than a well-architected endeavour.

* I'm coming from pure maths, and I have very little experience with computational aspects of analysis/geometry. I also have pretty much no knowledge of probability/statistics.

* I _do_ feel like it wouldn't be too hard to learn the prerequisites for the above, given a few months and a good textbook.

* Re what kind of maths/development I'd like to do: I guess pretty much anything where there is a relatively strong "research" component. Not necessarily meaning pure research, but where it's not just about applying methods, but also about developing them and understanding the problem and solution space before "rote application", if that makes sense. If I was super sold on the end goal of the company, then I guess I'd naturally put less weight on those "requirements".

* I've been interested in compilers/PLT for quite a long time, but that's typically the kind of thing relatively far from my field. There would probably be quite a steep learning curve, and I'd be competing with people who actually studied it, hence little chance of success there.

* My university was a small European local university, with a small maths department (third tier, as some might say). I don't feel like I was a particularly bright student, and I also figured that PhD students weren't necessarily as intelligent/productive/creative as one might believe from outside.

Re game theory: I would have guessed it was far from useless! Doesn't it at least provide you with a good perspective on/insight into plenty of real-life problems?


Quite unusual. It's also harder to purchase devices with two (physical) SIM slots (a popular feature in Asia).

