> Use reservoir sampling to get 100 random lines from a dictionary that doesn’t fit into memory. Give me an implementation in Python.
> Reservoir sampling is a great choice for randomly selecting a fixed number of items from a stream of unknown size, especially when the entire dataset cannot fit into memory. Here's a Python implementation for selecting 100 random lines from a large file, like a dictionary, using reservoir sampling:
```python
import random

def reservoir_sampling(file_name, k):
    reservoir = []
    with open(file_name, 'r') as file:
        for i, line in enumerate(file):
            if i < k:
                # Fill the reservoir array
                reservoir.append(line.strip())
            else:
                # Randomly replace elements in the reservoir
                # with a decreasing probability.
                # Choose an integer between 0 and i (inclusive)
                j = random.randint(0, i)
                if j < k:
                    reservoir[j] = line.strip()
    return reservoir

# Example usage
file_path = 'path_to_your_large_file.txt'  # Replace with your file path
sampled_lines = reservoir_sampling(file_path, 100)
print(sampled_lines)
```
> This script opens a file and reads through it line by line. For the first 100 lines, it adds them directly to the reservoir. For each subsequent line, it randomly decides whether to replace an existing line in the reservoir with the new line. The probability of replacing decreases as it progresses through the file, ensuring each line has an equal chance of being included in the final sample.
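To convince yourself the sample really is uniform, you can run the same algorithm over an in-memory stream and count how often each item ends up selected. This is a quick empirical sanity check, not a proof; `reservoir_sample_stream` is a hypothetical adaptation of the function above to an arbitrary iterable:

```python
import random
from collections import Counter

def reservoir_sample_stream(stream, k):
    # Same algorithm as above, but over any iterable instead of a file.
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# Empirical check: sampling 10 of 200 items should include each item
# with probability 10/200 = 5%.
trials = 2000
counts = Counter()
for _ in range(trials):
    counts.update(reservoir_sample_stream(range(200), 10))

# Both extremes should hover near 0.05.
print(min(counts.values()) / trials, max(counts.values()) / trials)
```

Over a few thousand trials, every item's inclusion frequency should cluster tightly around 5%, which is exactly the "equal chance" property the prose describes.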
Macro expansion is done in the host environment, not the target environment. You can compile the macro definition to a function taking a form and an environment. This would be something you arrange for within your compiler.
CLTL2 specifies a function `parse-macro`, which is available in most implementations. Portably, you can call it via the `trivial-cltl2` package.
Either you are describing something that doesn’t work for cross-compilation or you are missing the point. Common Lisp is a hard language to compile because of macros. Consider the following code you might see in bootstrapping:
```lisp
(defmacro defun ...)
(defun ...)
```
To macroexpand the second line you must have an environment with the defun macro definition from the previous line, which means that the code to modify the environment with that definition must have been evaluated too. This must be evaluated in the (emulated) target environment because (1) the host environment would otherwise clash with it, and (2) you need to write down the target environment as a build artefact. Evaluating these environment-modifying forms requires an interpreter you can run on the host. If you want a good programming experience for your standard library then this needs to be a reasonably capable interpreter.
But you also need an interpreter to evaluate code during macroexpansion. Three reasons:
1. You must emulate the target environment for correctness, e.g. JavaScript only has one floating-point type.
2. You cannot sufficiently interact with the host environment to use its own macroexpander (if you want to be portable). This is because compiling a `macrolet` requires extending the lexical environment with the macro definition so that calls to `macroexpand` (from, e.g., the expander for `setf`) can use the definition. The only portable thing on the host that can evaluate a `macrolet` is the compiler/interpreter.
3. You probably can't use native types for things like symbols, so the built-in macro expansion is not sufficient.
Unfortunately, none of the FOSS implementations have this ability (to my knowledge). There is nothing inherent in Common Lisp that mandates the "core dump" delivery model.
A characteristic of intrusive data structures is generally good cache friendliness. In this context, "intrusive" means that the data structure is embedded in the data that's being stored.
In C terms, as in this case, that means the pointers required by individual nodes aren't allocated in separate structs but are instead embedded in the one struct that also holds the payload.
This improves cache behavior, since a given node is stored in a single location: once you access a node, the associated data is already in the cache rather than having to be fetched through a separate pointer dereference.
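The cache-layout benefit doesn't carry over to Python, where every attribute is a reference, but the structural idea (the link field living inside the payload object itself rather than in a separate wrapper node) can be sketched. `Task` and `push` here are illustrative names, not from any particular library:

```python
# Non-intrusive: a separate node struct holds a pointer to the payload,
# so traversing the list touches two allocations per element.
class ListNode:
    def __init__(self, payload):
        self.payload = payload
        self.next = None

# Intrusive: the link field is embedded in the payload struct itself,
# so node and data occupy a single allocation.
class Task:
    def __init__(self, name):
        self.name = name
        self.next = None   # embedded link; no separate node struct

def push(head, task):
    # Prepend a task to an intrusive singly linked list.
    task.next = head
    return task

head = push(push(None, Task("a")), Task("b"))
names = []
node = head
while node is not None:
    names.append(node.name)
    node = node.next
print(names)  # → ['b', 'a']
```

In C, the intrusive version is what you get from patterns like the Linux kernel's `list_head`: the links sit inside the same struct as the payload, so following a link lands you directly on the data.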
So now we know who is really responsible for the whole MySQL utf8mb4 fiasco -- these 2 guys sitting in a diner, conjuring up a brilliant scheme to cover 4 billion characters, which turned out to exceed the actual requirement by more than 2000x.
September 1992: 2 guys scribbling on a placemat.
January 1998: RFC 2279 defines UTF-8 as 1 to 6 bytes per character.
March 2001: A batch of CJK characters was added in Unicode 3.1, pushing the total to 94,140 and exceeding the 16-bit limit of 3-byte UTF-8.
November 2003: RFC 3629 redefines UTF-8 as 1 to 4 bytes per character.
Arguably, if the placemat had been smaller and the guys had stopped at 4 bytes after running out of space, perhaps MySQL would have done the right thing? Ah, who am I kidding. The same commit would likely still have happened.
EDIT: Just noticed this in the footnotes, and the plot thickens...
> The 4, 5, and 6 byte sequences are only there for political reasons. I would prefer to delete these.
This is also a very simple application of the idea of a "prefix-free code" from information theory and coding (the set of codes {0, 10, 110, 1110, 11110, ..., 111111} is prefix-free).
I think there's also the idea that the code can "sync up" when it, say, starts in the middle of a character.
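That "sync up" property follows directly from the bit patterns: continuation bytes always start with `10`, and lead bytes never do, so a decoder dropped into the middle of a character can simply skip forward to the next lead byte. A small sketch (the `resync` helper is illustrative, not a standard-library function):

```python
def resync(data: bytes) -> bytes:
    # Continuation bytes match 0b10xxxxxx; skip them until a lead byte
    # (0b0xxxxxxx or 0b11xxxxxx) appears.
    i = 0
    while i < len(data) and (data[i] & 0b1100_0000) == 0b1000_0000:
        i += 1
    return data[i:]

encoded = "héllo, wörld".encode("utf-8")
# Start mid-character: byte 2 is the continuation byte of the 2-byte 'é'.
print(resync(encoded[2:]).decode("utf-8"))  # → "llo, wörld"
```

At worst you lose the one character you landed inside; everything after it decodes cleanly, which is exactly what a code that can "sync up" should do.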
> A two-phase exception-handling model is not strictly necessary to implement C++ language semantics, but it does provide some benefits. For example, the first phase allows an exception-handling mechanism to dismiss an exception before stack unwinding begins, which allows resumptive exception handling (correcting the exceptional condition and resuming execution at the point where it was raised). While C++ does not support resumptive exception handling, other languages do, and the two-phase model allows C++ to coexist with those languages on the stack.
I am aware that there have historically been other languages besides Lisp that allow resumptive exception handling (PL/I is a historical example), but I'm unaware of any modern language besides Lisp that does it.
What would C++ be coexisting with on non-mainframe hardware?
When I was at Qt I spent a decent amount of time porting the native rendering engine from Qt4 to Qt5. It should be available and work relatively well. I have retired from the Qt project, but it looks like the code is still there in the repo. [1] You might need to tell the configure script to build it. To use it, set the QT_XCB_NATIVE_PAINTING environment variable.
Because this way no one has a written form of their mother tongue. They have to learn a second language and its written form. As others are pointing out, the situation is much like Europe 1,000 years ago, when Latin had evolved into separate languages but all writing was still done in Latin. Which would you prefer: being able to speak and write in French, or speaking French but having to learn Latin in order to write?