The variables are pointless. Your lib and include directories are already defined somewhere else: your project's directory structure.
On the compiler command line, you just want -iquote include; -iquote $(INC_DIR) doesn't help anything.
If someone renames or moves directories in the future, they can search and replace the Makefile.
Variables hide things. If I see $(INC_DIR), is that something external? I have to look it up.
By default I assume that a variable exists for a reason, namely that something is being made configurable. Usually, configurable things are external uncertainties: where other stuff is, and whatnot.
The article further recommends nonstandard practices.
- LFLAGS: what is that? The standard variables are LDFLAGS and LDLIBS. LDFLAGS are options for linking, and LDLIBS are just the libraries like -lwhatever. They are separated because they go into different parts of the command line. For Pete's sake, do not invent your own variables. Learn the standards. Distro maintainers will thank you when your program is packaged.
- Believe it or not, you should not touch CFLAGS. CFLAGS belongs to the user who is invoking your Makefile. Put any required options, debug options and whatnot into a different variable, and combine that with CFLAGS. The same goes for the aforementioned LDFLAGS and LDLIBS.
This does not always do what the author thinks. It looks like it wants to add some options to CFLAGS, if CFLAGS is already defined. But this is not what GNU Make will do if CFLAGS is coming from the command line:
make CFLAGS=-O2
then the := assignment will be entirely suppressed: that -O2 becomes the whole of CFLAGS. The combination does happen if CFLAGS is coming from the environment.
Thus if you have essential code generation options that your program needs, the above pattern will break in some situations. Don't use CFLAGS as the container where you accumulate all your options. Assemble the options in your own variable, where you interpolate CFLAGS:
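A minimal sketch of that pattern (variable and file names here are illustrative, not taken from the article; recipe lines need real tabs as usual):

    # Required options live in our own variables; the user's CFLAGS/LDFLAGS/LDLIBS
    # are interpolated, never assigned to.
    ALL_CFLAGS  = -iquote include -std=c11 $(CFLAGS)
    ALL_LDFLAGS = $(LDFLAGS)
    ALL_LDLIBS  = -lm $(LDLIBS)

    prog: main.o util.o
            $(CC) $(ALL_LDFLAGS) -o $@ main.o util.o $(ALL_LDLIBS)

    %.o: %.c
            $(CC) $(ALL_CFLAGS) -c -o $@ $<

Note how LDFLAGS lands before the object files and LDLIBS after them, which matches the placement in GNU Make's own built-in link rule.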
I see the article is using a double star operator in a $(wildcard ...) call. I don't see that documented in GNU Make, and it's not obvious from looking at all the glob-related code that any such thing is implemented.
The following has a subtle problem:
# Build object files and third-party libraries
$(OBJS): dir
    @mkdir -p $(BUILD_DIR)/$(@D)
    @$(CC) $(CFLAGS) -o $(BUILD_DIR)/$@ -c $*.c
The problem is that a directory timestamp changes often. This rule will cause the $(OBJS) to be considered out of date whenever the timestamp of dir is touched.
For this you need to use the GNU Make "order-only prerequisite" mechanism, where order-only prerequisites are separated by a bar:
# Build object files and third-party libraries
$(OBJS): | dir
    @mkdir -p $(BUILD_DIR)/$(@D)
    @$(CC) $(CFLAGS) -o $(BUILD_DIR)/$@ -c $*.c
The GNU Make manual has exactly this kind of example in the section on order-only prerequisites.
$(NAME): format lint dir $(OBJS)
This runs a format step that touches all the sources as a prerequisite of linking the program. That is just silly and far removed from the purported goal of crafting a clean Makefile. Not only should build steps not be touching the sources, but the dependencies this cruft brings in are unpalatable.
Can't believe I never saw this set. I was a bit of a collector of these "how browsers work" resources for a while. I can remember these off the top of my head; what am I forgetting?
First, I was a big fan of your articles even before I joined IPinfo, where we provide an IP geolocation data service.
Our geolocation methodology expands on the methodology you described. We utilize some of the publicly available datasets that you are using. However, the core geolocation data comes from our ping-based operation.
We ping an IP address from multiple servers across the world and identify the location of the IP address through a process called multilateration. Pinging an IP address from one server gives us one dimension of location information: based on certain parameters, the IP address could be anywhere within a certain radius on the globe. Then, as we ping that IP from our other servers, the location estimate becomes more precise. After enough pings, we have very precise IP location information that almost reaches zip-code-level precision with a high degree of accuracy. Currently we have more than 600 probe servers across the world, and the network is expanding.
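For intuition, here is a toy multilateration sketch in Python on a flat plane with made-up probe coordinates and noisy distances (not our actual pipeline, which works on the globe and treats RTT-derived distances as upper bounds):

    import numpy as np
    from scipy.optimize import least_squares

    # Four probes at known positions and a target whose position we pretend not to know.
    probes = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [80.0, 90.0]])
    true_target = np.array([42.0, 31.0])

    # Distances "measured" from ping RTTs, with some noise.
    rng = np.random.default_rng(0)
    measured = np.linalg.norm(probes - true_target, axis=1) + rng.normal(0, 1.0, len(probes))

    def residuals(p):
        return np.linalg.norm(probes - p, axis=1) - measured

    estimate = least_squares(residuals, x0=np.array([50.0, 50.0])).x
    print(estimate)  # close to (42, 31); more probes -> a tighter estimate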
The publicly available information that you are referring to is sometimes not very reliable for IP location data because:
- They are often stale and not frequently updated.
- They are not precise enough to be generally useful.
- They provide location context at a large IP range level, or even at an organization-wide scale.
And last but not least, there is no verification process with these public datasets. With IPv4 trading and VPN services becoming more and more popular, we have seen evidence that in some instances inaccurate information is being injected into these datasets. We are happy and grateful to anyone who submits IP location corrections to us, but for that reason we do verify these correction submissions.
From my experience with our probe network, I can definitely say that it is far easier and cheaper to buy a server in New York than in any country in the middle of Africa. Location of an IP address greatly influences the value it can provide.
We have a free IP to Country ASN database that you can use in your project if you like.
I've had similar issues - so much so that I created Resgen[0], just because it was tough getting callbacks. It turns out I really needed to tailor my resume to each job, and doing so resulted in a lot more callbacks.
Yes. 6 GHz spectrum has 7 channels that are 160 MHz wide.
Contrast that to the 5 GHz spectrum which only has 2 channels that are 80 MHz wide (excluding DFS). DFS is not an option in many places, and even the US only gets 1 channel that is 160 MHz wide (also DFS).
5 GHz spectrum is almost always congested and gets half the spectrum width.
I agree. They have a breathless tone to them that's quite annoying to me (I work in data compression as an academic, and I found this article uninspiring.)
By the way, there was an old Soviet magazine called "Kvant" (Russian for Quantum, I think). I do not know Russian, but I have 2 collected volumes of selected articles from them. [1] [2] Their quality is astonishingly good, and high-level. The difference is this:
The Kvant articles were written by professional research mathematicians, trying to present their ideas to an audience that were willing to follow them with pencil and paper in hand.
Quanta magazine articles are written by journalists trying to present advanced science to a lay audience. The articles are very stilted: they oversimplify the problem in a tone that gives no idea of the actual solution, and they lean on hackneyed tropes like "oh look, the solvers were just some random unknown guys" (in a recent case, the random unknown guy is tenured faculty at UCLA in theoretical computer science, apparently "a world away from mathematics" [3]).
The Pleroma instance linked in the OP is hosted on a very tiny VPS with no CDN, and I fear it may fall over - if it does, consider swapping to the Twitter URL.
You can use Gradio (online), or download the weights manually from https://huggingface.co/lmsys/vicuna-13b-delta-v1.1/tree/main (git will not download them - they're too big - so do it by hand), and then load the model in PyTorch and try inference (text generation). But you'll need either a lot of RAM (16 GB, 32 GB+) or VRAM (a GPU).
> How might I go about using these models for doing things like say summarizing news articles or video transcriptions
Again, you might try it online, or set up a Python/Bash/PowerShell script to load the model for you so you can use it. If you can pay, I would recommend RunPod for the shared GPUs.
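If you go the script route, a rough sketch with Hugging Face transformers looks something like this (the model path is a placeholder for whatever checkpoint you actually end up with):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "path/to/your-local-model"  # placeholder, not a real repo id
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

    article = open("article.txt").read()
    prompt = f"Summarize the following article in three sentences:\n\n{article}\n\nSummary:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=200)
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))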
> When someone tunes a model for a task, what exactly are they doing and how does this ‘change’ the model?
From my view ... not much. "Fine-tuning" means training (tuning) on a specific dataset (fine, as in fine-grained). As I believe (I'm not sure), they just run more epochs on the model with the new data you have provided until they reach a good loss (i.e., the model works); that's why quality data is important.
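As a very rough sketch of what that looks like in code - gpt2 and the text file here are just stand-ins, and a real fine-tune of something Vicuna-sized needs far more care and hardware:

    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    model_name = "gpt2"  # stand-in for a small base model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # "The new data you have provided": any plain-text file with domain/task text.
    data = load_dataset("text", data_files={"train": "my_domain_data.txt"})
    data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                    batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=3,
                               per_device_train_batch_size=2),
        train_dataset=data["train"],
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()  # "more epochs on the new data" until the loss looks good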
So the naive MCTS implementation in Python is ridiculously inefficient. Of course, you could reimplement it in C++, but that then requires you to use the C wrappers of TensorFlow/JAX to do the MCTS/neural-network interop.
MCTX comes up with an even niftier implementation in JAX that runs the entire MCTS algorithm on the TPU. This is quite a feat because tree search is typically a heavily pointer based algorithm. It uses the object pool pattern described in https://gameprogrammingpatterns.com/object-pool.html to serialize all of the nodes of the search tree into one flat array (which is how it manages to fit into JAX formalisms). I suspect it's not a particularly efficient use of the TPU, but it does cut out all of the CPU-TPU round trip latency, which I'm sure more than compensates.
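A tiny sketch of the storage idea (not mctx's actual code): every node is just an index into preallocated arrays, and "allocating" a node means bumping a counter - the object pool pattern flattened into JAX-friendly arrays.

    import jax
    import jax.numpy as jnp

    MAX_NODES, NUM_ACTIONS = 1024, 8
    UNVISITED = -1

    def empty_tree():
        return {
            "children": jnp.full((MAX_NODES, NUM_ACTIONS), UNVISITED, dtype=jnp.int32),
            "visit_count": jnp.zeros(MAX_NODES, dtype=jnp.int32),
            "value_sum": jnp.zeros(MAX_NODES, dtype=jnp.float32),
            "next_free": jnp.array(1, dtype=jnp.int32),  # node 0 is the root
        }

    @jax.jit
    def expand(tree, parent, action, value):
        """'Allocate' a child node by taking the next free slot in the pool."""
        child = tree["next_free"]
        return {
            "children": tree["children"].at[parent, action].set(child),
            "visit_count": tree["visit_count"].at[child].set(1),
            "value_sum": tree["value_sum"].at[child].set(value),
            "next_free": child + 1,
        }

    tree = empty_tree()
    tree = expand(tree, parent=0, action=3, value=0.5)
    print(tree["children"][0, 3], tree["next_free"])  # 1 2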
There is also a nice new data structure called Zip Trees [1], which is isomorphic to skip lists in the sense that it performs exactly the same comparisons on insertion/deletion (though it can skip some redundant ones). I would expect zip trees to be faster than skip lists in practice.
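If you want to play with them, the recursive insertion from the paper is short enough to sketch in Python (following the paper's pseudocode; deletion/zipping-out is omitted):

    import random

    class Node:
        def __init__(self, key):
            self.key = key
            self.rank = random_rank()
            self.left = None
            self.right = None

    def random_rank():
        """Geometric(1/2) rank: count heads until the first tails."""
        rank = 0
        while random.getrandbits(1):
            rank += 1
        return rank

    def insert(root, x):
        """Insert node x into the zip tree rooted at root; returns the new root."""
        if root is None:
            return x
        if x.key < root.key:
            if insert(root.left, x) is x:      # x bubbled to the top of the left subtree
                if x.rank < root.rank:
                    root.left = x              # rank too small: x stays below root
                else:
                    root.left = x.right        # unzip step: x takes root's place
                    x.right = root
                    return x
        else:
            if insert(root.right, x) is x:
                if x.rank <= root.rank:        # ties keep the smaller key on top
                    root.right = x
                else:
                    root.right = x.left
                    x.left = root
                    return x
        return root

    root = None
    for key in [5, 3, 8, 1, 4]:
        root = insert(root, Node(key))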
I read many sections of a few volumes when trying to pick a textbook for a seminar I was teaching in Oxford. They are really nice, which is not surprising as they have, e.g., Jeremy Avigad on the editorial board. Furthermore, they are beautifully typeset. I required a more computational approach, so my favorite texts remain:
The last one is freely available, but also quite advanced as it develops all theory from the Curry-Howard isomorphism angle. I think it is ideal for advanced CS students though, and an amazing textbook due to the breadth of material it manages to cover.
Back when I took the author's course on computer graphics in Utrecht (it took me a few tries to pass, through no fault of his), I thought it was very strange that the course started out with ray tracing rather than traditional GPU rendering. After all, when you think graphics, you think OpenGL/Vulkan/DirectX, right?
Only after having to implement both types of renderer do you really get an appreciation of how elegant ray tracing is in comparison. The basic ray tracer from this tutorial clocks in at less than 200 lines of C++ excluding the headers! Then there are optimisations like BVHs/BLAS/TLAS, which are all so simple to think and reason about compared to the inner workings of a GPU rendering pipeline.
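As an illustration of that elegance: the core of a toy ray tracer is little more than a ray-sphere intersection test. A quick Python sketch (assuming a normalized ray direction):

    import math

    def hit_sphere(origin, direction, center, radius):
        """Distance t along the ray to the nearest sphere hit, or None if it misses."""
        oc = [o - c for o, c in zip(origin, center)]
        b = 2.0 * sum(d * x for d, x in zip(direction, oc))
        c = sum(x * x for x in oc) - radius * radius
        disc = b * b - 4.0 * c
        if disc < 0:
            return None
        t = (-b - math.sqrt(disc)) / 2.0
        return t if t > 0 else None

    print(hit_sphere((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))  # 4.0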
I should find the time to go through this guide again and find out how I can get more performance out of my old ray tracer now that I've grown a few years older and wiser.
This tutorial is more about optimizing a ray tracer than writing one from scratch. If you're looking to learn the basics, I recommend reading through the tutorial the same author wrote eighteen years ago [1]. It covers the more basic concepts of a ray tracer without telling you exactly what to copy paste unless you "cheat" and download the code archive, which is a great way of teaching concepts to programmers in my opinion, as it gives you the opportunity to think for yourself.
With modern C++ you'd probably want to write your code a bit differently (VC++ 6 wasn't the best C++ even in its time), and the compute limitations of the era are dwarfed by even your average integrated GPU, but the core concepts haven't changed.
After this gained prominence, I took a hard read of Koopman's Stack Computers: The New Wave, intent on writing an interpreter. I sought optimization properties that would apply to the modern requirements of ILP, out-of-order execution, and parallelization at low core frequencies -- none are offered. This manifests in WASM, and the criticism was elaborated very well: http://troubles.md/posts/wasm-is-not-a-stack-machine/
> This essentially makes WebAssembly a register machine without liveness analysis, but not only that, it’s a register machine that isn’t even in SSA form - both of the tools at our disposal to do optimisation are unavailable. In a true, optimising compiler we can recreate that information, but WebAssembly was already emitted by a compiler that generated that information once.
>The Abelian sandpile model (ASM) is the more popular name of the original Bak–Tang–Wiesenfeld model (BTW). BTW model was the first discovered example of a dynamical system displaying self-organized criticality. It was introduced by Per Bak, Chao Tang and Kurt Wiesenfeld in a 1987 paper.
>Three years later Deepak Dhar showed that the BTW sandpile model indeed follows abelian dynamics, and therefore referred to this model as the Abelian sandpile model.
>The model is a cellular automaton. In its original formulation, each site on a finite grid has an associated value that corresponds to the slope of the pile. This slope builds up as "grains of sand" (or "chips") are randomly placed onto the pile, until the slope exceeds a specific threshold value, at which point that site collapses, transferring sand to the adjacent sites and increasing their slope. Bak, Tang, and Wiesenfeld considered the process of successive random placement of sand grains on the grid; each such placement of sand at a particular site may have no effect, or it may cause a cascading reaction that will affect many sites.
>Dhar has shown that the final stable sandpile configuration after the avalanche is terminated is independent of the precise sequence of topplings that is followed during the avalanche. As a direct consequence of this fact, if two sand grains are added to the stable configuration in two different orders, e.g., first at site A and then at site B, or first at B and then at A, the final stable configuration of sand grains turns out to be exactly the same. When a sand grain is added to a stable sandpile configuration, it results in an avalanche which finally stops, leading to another stable configuration. Dhar proposed that the addition of a sand grain can be looked upon as an operator: when it acts on one stable configuration, it produces another stable configuration. Dhar showed that all such addition operators form an abelian group, hence the name Abelian sandpile model.
>The model has since been studied on the infinite lattice, on other (non-square) lattices, and on arbitrary graphs (including directed multigraphs). It is closely related to the dollar game, a variant of the chip-firing game introduced by Biggs.
>Luis David Garcia-Puente discusses sandpiles, and how they produce amazing "fractal zeroes". Dr Garcia-Puente is an associate professor at Sam Houston State University and was interviewed while attending an MSRI-UP summer program.
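The toppling rule quoted above fits in a few lines of code. A minimal sketch (square grid, threshold of 4, grains that fall off the edge of the finite grid are lost):

    import numpy as np

    def stabilize(grid, threshold=4):
        """Topple every site holding >= threshold grains until nothing is unstable."""
        grid = grid.copy()
        while True:
            topples = (grid >= threshold).astype(int)
            if not topples.any():
                return grid
            grid -= threshold * topples
            # each toppling site sends one grain to each of its four neighbours
            grid[1:, :] += topples[:-1, :]
            grid[:-1, :] += topples[1:, :]
            grid[:, 1:] += topples[:, :-1]
            grid[:, :-1] += topples[:, 1:]

    # drop a big pile of grains in the centre of an empty grid and let it relax
    pile = np.zeros((101, 101), dtype=int)
    pile[50, 50] = 4096
    print(stabilize(pile).max())  # every site ends up below the threshold

The abelian property quoted above is what makes this safe: it does not matter in which order (or, as here, in parallel) the unstable sites are toppled; the stable configuration you end up with is the same.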
The implementation of the __cos kernel in Musl is actually quite elegant. After reducing the input to the range [-pi/4, pi/4], it just applies the best degree-14 polynomial for approximating the cosine on this interval. It turns out that this suffices for having an error that is less than the machine precision. The coefficients of this polynomial can be computed with the Remez algorithm, but even truncating the Chebyshev expansion is going to yield much better results than any of the methods proposed by the author.
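To see how forgiving this range is, here is a small numpy sketch that builds a degree-14 interpolant at Chebyshev nodes (a close stand-in for the truncated Chebyshev expansion; Remez would shave the error down a little further):

    import numpy as np
    from numpy.polynomial import chebyshev as C

    a, b = -np.pi / 4, np.pi / 4
    to_x = lambda t: 0.5 * (b - a) * t + 0.5 * (b + a)  # map [-1, 1] onto [a, b]

    # Interpolate cos at 15 Chebyshev nodes -> a degree-14 polynomial.
    t_nodes = np.cos((2 * np.arange(15) + 1) * np.pi / 30)
    coeffs = C.chebfit(t_nodes, np.cos(to_x(t_nodes)), deg=14)

    # Worst-case error over the reduced range.
    t = np.linspace(-1, 1, 100_001)
    print(np.max(np.abs(C.chebval(t, coeffs) - np.cos(to_x(t)))))  # ~ double-precision round-off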
This is an old article from the early 90s and I believe it may have been the first public mention of this fact about the x86 encoding, although no doubt many have independently "discovered" it before --- especially in the times when microcomputer programming consisted largely of writing down Asm on paper and then hand-assembling the bytes into memory using something like a hex keypad.
All of these are features inherited from the 8080/8085/Z80.
Here are the corresponding opcode tables in octal:
Does anybody else dislike the socket interface abstraction? To me, an individual TCP port + IP is, effectively, giving me a 'virtual UART'. Instead of saying COM4 or whatever, you say TCP port 12345 at IP address 6.7.8.9.
It's a similar thing for ranges of IP addresses, except in that case the 'listener' has to figure out which 'virtual UART' to act upon. There, the OS splits the 'virtual UART' assignment with the server listening on a range of addresses.
Sockets seem unnaturally messy to me. Maybe I'm ignorant.
So, how do people come up with these things? I assume every aspect of the design is carefully considered to defend it against various attacks. For example, why "right rotate 7 XOR right rotate 18 XOR right shift 3" and not "right rotate 2 XOR right rotate 3 XOR right shift 4"?
Here are some fun GPS projects I've found in the past, maybe others can add to this list.
GPS/Galileo/Beidou/Glonass status and error monitoring, an open-source community-run project: https://galmon.eu/
DIY GPS receiver using minimal signal frontend, FPGA Forth CPU for real-time processing and RPi running position solvers: http://www.aholme.co.uk/GPS/Main.htm
Transformer-based architectures and unsupervised pre-training are achieving state-of-the-art results across multiple modalities including NLP, CV, speech recognition, genomics, physics, etc. - so here's my must-read list of recent papers on these topics (along with some of my notes). Happy holidays!
An “annotated” version of [1] in the form of a line-by-line PyTorch implementation. Super helpful for learning how to implement Transformers in practice!
One of the most highly cited papers in machine learning!
Proposed an unsupervised pre-training objective called masked language modeling; learned bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.
See the above slideshow from the primary author, noting the remarkably prescient conclusion: "With [unsupervised] pre-training, bigger == better, without clear limits (so far)"
Arguably one of the most important papers published in the last 5 years!
Studies empirical scaling laws for (Transformer) language models; performance scales as a power-law with model size, dataset size, and amount of compute used for training; trends span more than seven orders of magnitude.
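For reference, the headline single-variable fits in the paper all take the same power-law shape (each holding when the other two quantities are not the bottleneck; constants omitted here):

    L(N) = (N_c / N)^{\alpha_N},   L(D) = (D_c / D)^{\alpha_D},   L(C_min) = (C_c / C_min)^{\alpha_C}

where N is the number of parameters, D the dataset size in tokens, and C_min the compute budget.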
Introduced GPT-3, a Transformer model with 175 billion parameters, 10x more than any previous non-sparse language model.
Trained on Azure's AI supercomputer, training costs rumored to be over 12 million USD.
Presented evidence that the average person cannot distinguish between real or GPT-3 generated news articles that are ~500 words long.
Introduced the Convolutional vision Transformer (CvT) which has alternating layers of convolution and attention; used supervised pre-training on ImageNet-22k.
Scaled up the Conformer architecture to 1B parameters; used both unsupervised pre-training and iterative self-training.
Observed through ablative analysis that unsupervised pre-training is the key to enabling growth in model size to transfer to model performance.
Introduced the Switch Transformer architecture, a sparse Mixture of Experts model advancing the scale of language models by pre-training up to 1 trillion parameter models.
The sparsely-activated model has an outrageous number of parameters, but a constant computational cost. 1T parameter model was distilled (shrunk) by 99% while retaining 30% of the performance benefit of the larger model. Findings were consistent with [5].
Applied Transformer based NLP models to classify & predict properties of protein structure for a given amino acid sequence, using supercomputers at Oak Ridge National Laboratory.
Proved that unsupervised pre-training captured useful features; used learned representation as input to small CNN/FNN models, yielding results challenging state of the art methods, notably without using multiple sequence alignment (MSA) and evolutionary information (EI) as input.
Highlighted a remarkable trend across an immense diversity of protein LMs and corpus: performance on downstream supervised tasks increased with the number of samples presented during unsupervised pre-training.
Genuine question: if I am starting a Python project NOW, which one do I use? I have been using pipenv for quite some time and it works great, but locking speed has been problematic, especially after your project grows large enough (minutes waiting for it to lock, without any progress indication at all).
Should I just upgrade to Poetry or should I just dive headfirst into PDM? Keep myself at Pipenv? I'm at a loss.
It's a bit dated (covers DAGISel rather than GlobalISel) but it gives a thorough introduction.
2. LLVM Developer Meeting tutorials
These are really good although you'll have to put them in order yourself. They will be out of date, a little. LLVM is a moving target. Also, you don't have to go through every tutorial. For example, MLIR is not for me.
3. LLVM documentation
I spent less time reading this than going through the Developer Meeting tutorials. I generally use it as a reference.
If you're doing a backend, you will need a place to start. The LLVM documentation points you to the horribly out-of-date SPARC backend. Don't even touch that. AArch64 and x86 are very full-featured and thus very complex (100+ kloc). Don't use those either. RISC-V is OK, but it concerns itself mostly with supporting new RISC-V features rather than keeping up to date with LLVM compiler services. Don't use that either, although definitely work through Alex Bradbury's RISC-V backend tutorials. Read the Mips backend. It is actively maintained. It has good GlobalISel support, almost on par with the flagship AArch64 and x86 backends.
That's right. Starting today as a PhD student in deep learning is career suicide, even if it may look like the thing to do. The number of papers put on arXiv each month must be in the thousands, and the top machine learning conferences are so awfully crowded it's impossible to get a paper through.
From my point of view, and much like you say, the interesting, groundbreaking work has moved outside strict deep learning research. I mean, I sure would think so, but here's the website of the International Joint Conference on Learning and Reasoning, which brings together a bunch of disparate neurosymbolic and symbolic machine learning communities for the first time:
This is an active field of research with plenty of space for new entrants, full of interesting problems to solve and virgin territory to be the first to explore. I'm hoping we'll soon see an influx of eager and knowledgeable new graduates disappointed with the state of machine learning research and willing to do the real hard work that needs to be done for progress to begin again.
What are some ways I can increase my knowledge in the domain the OP is clearly very skilled at, namely low-level OS development?
I've taken an intro to OS class and am currently going through Linux From Scratch [1], which is interesting and is teaching me a lot, but it's more about how to set up a Linux distro using existing packages and not really about reading/writing the code involved.
Great article - I was nodding my head at almost every point! Although I think you also have to consider that the author has a strong math background, which both improves and warps your view of programming. If you don't know any math, then this post is probably less relevant to you (e.g. learning a bit of Haskell or ML is probably a good idea in that case).
------
As a nitpick about the theory of automata, I would draw a big red line between regular languages and context-free grammars.
Regular languages have very useful engineering properties: they have predictable (linear) performance and give you "free" lookahead. (That is, nondeterminism is a math-y idea that many programmers are not comfortable with, but it's useful for engineering.)
They are better for building reliable and understandable systems than Perl-style regexes, which "blow up" on a regular basis.
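The "blow up" is easy to demonstrate with a backtracking engine (Python's re here; a linear-time engine such as RE2 or Go's regexp handles the same input instantly):

    import re
    import time

    # Nested quantifiers plus a guaranteed failure force the engine to try every
    # way of splitting the run of 'a's between the two + operators.
    pattern = re.compile(r'^(a+)+b$')

    for n in (20, 22, 24, 26):
        text = 'a' * n + 'c'
        start = time.perf_counter()
        pattern.match(text)
        print(n, round(time.perf_counter() - start, 3))  # time grows exponentially with n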
In contrast, context-free grammars have almost no useful engineering properties by themselves. (There are subsets like LALR(1) that do, but they come with a bunch of tradeoffs.) As the article mentions, learning about recursive descent parsing first is probably more practical.
Also, you're MUCH more likely to encounter a regular expression in real code than a context-free grammar. In 15 years of professional programming I probably dealt with regexes on a weekly or monthly basis, but had to write or modify a grammar exactly zero times. It does help to be more familiar with regexes and regular languages.
This is a specific reference on how constraints model contact between rigid bodies https://box2d.org/files/ErinCatto_UnderstandingConstraints_G...
Most games since Half-Life 2 use constraint forces like this to solve collisions. Springs/penalty forces are still used sometimes in commercial physics solvers, since they're easier to couple with other simulations, but they require many small timesteps to ensure convergence.