johnt15's comments

johnt15 · on Feb 7, 2019

It's interesting that the roadmap of volt has been saying v1.0 is just around the corner for the past half year. The other roadmap items also don't change much.

https://web.archive.org/web/20180615121501/https://volt.ws/

https://web.archive.org/web/20181023093131/https://volt.ws/

It would be great if the roadmap contained realistic items. Once a user is burned by an unmet expectation he won't believe anything else on the website.

amedvednikov · on Feb 7, 2019

Yes, I've done a terrible job with estimations and for 9 months I lived in "release tomorrow" mode.

I made lots of mistakes that caused the delay, I'll post a detailed blog about it.

Should have stuck to "it's ready when it's ready".

Ironically, this time it really is going to be released tomorrow (Feb 7).

adamrezich · on Feb 7, 2019

please don't take this the wrong way but I'm almost more excited to read blog posts about your process and what you've learned than I am for the eventual product, or language used to create the product (though I am excited for both of those!). reading experiences people had trying crazy new stuff is more interesting than results of trying crazy new stuff imo

User23 · on Feb 7, 2019

When you figure out how to do a good job with estimations that can be another post, because I still haven't figured that out. It's way easier to reason about programming language semantics than to guess how long a reference implementation will take.

amedvednikov · on Feb 7, 2019

macOS version has just been released. Windows version will be released later today.

thecupisblue · on Feb 7, 2019

Yesterday, I was in my Applications folder and deleted an old version with an "ahh too bad this never lived". Now, a new story. Thanks, can't wait to read more about it!!

johnt15 · on Feb 2, 2019

Are there any routers supported by OpenWrt that can handle 1Gbps WAN to LAN? My current router only pulls 150Mbps and I want to keep OpenWrt due to the amount of custom network configuration that has been done.

tyingq · on Feb 2, 2019

EdgeRouter gets pretty close running OpenWrt. And hits your mark if you keep the stock firmware. No wifi though. https://an.undulating.space/post/180927-er_alternate_firmwar...

Maybe the Turris Omnia if you need WiFi and aren't worried about cost: https://omnia.turris.cz/en/

The NetGear R7800 would be somewhere between the two in cost, and reportedly does GB for wired, somewhat less over WiFi. Pretty easy to find used ones for $100USD too.

johnt15 · on Feb 3, 2019

Looks like EdgeRouter X fits my requirements and is pretty cheap (50Eur). Thank you!

theandrewbailey · on Feb 3, 2019

Try looking at the fast path module.[0] Your router might have some kind of Qualcomm NAT accelerator chip[1] that isn't natively supported/distributed by OpenWrt.[2] You'll have to compile your own build of OpenWrt, so it might be a bit difficult to get going. I haven't needed to do it, because my WAN connection is only 50 Mbps.

[0] https://github.com/gwlim/openwrt-sfe-flowoffload

[1] https://forum.openwrt.org/t/qualcomm-fast-path-for-lede/4582

[2] like mine https://openwrt.org/toh/wd/n750

bubblethink · on Feb 2, 2019

What custom config ? If you need sqm with gigabit, x86 would be a safe bet. If you want just raw throughput without additional processing, others will do that fine.

tapper82 · on Feb 2, 2019

WRT3200ACM, or build an x86 box.

paulcarroty · on Feb 2, 2019

Check Netgear devices.

johnt15 · on Aug 15, 2018

It works fine on OSX 10.12.6. You need the following customizations (not sure what's in OSX-KVM already):

- explicitly declare a "Penryn" CPU (<model fallback='allow'>Penryn</model>)

- force AES instructions in order to use encryption effectively (<feature policy='require' name='aes'/>)

- explicitly define topology (<topology sockets='1' cores='8' threads='2'/>)

- use usb-tablet (<input type='tablet' bus='usb'/>) for much more convenient mouse input that does not lock to window. Initial setup may need to be done with usb mouse (<input type='mouse' bus='usb'/>)

All of the above need to be reflected in QEMU command line.

I've been using this setup for last half year without issues (mostly heavy compiling).

I'm looking forward to porting this setup to a 32-core Threadripper. It would be a hell of a beast that outperforms Apple HW that costs several times more.

johnt15 · on June 20, 2018

I can second that!

johnt15 · on June 1, 2018

> or LGPL (also requires project to be LGPL)

This is not correct. LGPL only requires that Qt is linked as shared libraries and that the sources of any changes to Qt itself are provided.

johnt15 · on Sept 30, 2017

Some anecdotal data:

For various reasons I couldn't use contact lenses in one eye for a couple of months. I switched to wearing a contact in a single eye for that period, as I can't stand glasses for more than one hour due to very high myopia. It turns out a single contact has enough benefits that I didn't switch back to wearing two contact lenses after the issues with the eye went away. Instead, I have used a single contact lens, alternating between eyes, for the last four years. Some observations:

- The issues with contact lenses became almost nonexistent. I almost never feel dryness in the eyes.

- Perhaps surprisingly, whenever I feel dryness in the eyes, it's always the eye without contact lens.

- It seems that the eyes have much, much higher stamina now. They never feel tired regardless of how much I abuse them. I could look at a screen all day without breaks and I wouldn't feel any issues as far as the eyes are concerned.

- Since I have high myopia, using single contact gives my vision very high dynamic range. The eye without a contact is almost like a microscope.

- The potential wear of the cornea is halved.

I can also attest that the brand of the contacts matters a lot. I remember several brands of contacts being really uncomfortable to use. Currently I wear 'Biofinity XR' if anyone cares. There were several other brands that I liked, but I don't remember them.

johnt15 · on Aug 18, 2017

Any large project needs a build system with complex data structures to express all the dependencies, simply due to the sheer number of them. Add various types of source generation, tooling, and non-trivial build steps, and simple build systems quickly become infeasible.

johnt15 · on Dec 19, 2016

It's possible to work around the restriction easily. This tool (https://github.com/p12tic/eagle-brd-merge) can move a single Eagle board around, so whenever you want to move a component outside the allowed window you simply move the board. You can panelize the boards without any board size restriction using that tool too.

johnt15 · on May 13, 2016

It's possible to do this automatically by using a SIMD library such as libsimdpp (https://github.com/p12tic/libsimdpp). Everything is mostly abstracted away and you just write the SIMDified code once and add a CMake rule that builds the same file for several architectures, sets up dynamic dispatch and links everything together.
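
For illustration, here is a rough sketch of the dispatch idea done by hand. It does not use libsimdpp's API. It assumes GCC or Clang on x86, where the `target` function attribute and `__builtin_cpu_supports` are available; on other platforms it just uses the scalar path. The names `sum`, `sum_scalar` and `sum_avx` are made up for the example.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

// Portable fallback: plain scalar loop.
static float sum_scalar(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

#ifdef HAVE_X86_DISPATCH
// Same algorithm built for AVX. The target attribute lets this one function
// use AVX instructions even if the file is compiled without -mavx.
static __attribute__((target("avx"))) float sum_avx(const float* data,
                                                    std::size_t n) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float total = 0.0f;
    for (float v : lanes) total += v;
    for (; i < n; ++i) total += data[i];  // leftover elements
    return total;
}
#endif

// Picks an implementation at run time based on what the CPU reports.
float sum(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx")) return sum_avx(data, n);
#endif
    return sum_scalar(data, n);
}
```

Callers only ever see `sum`; every additional instruction set means another hand-written variant and another branch in the dispatcher, which is the part a wrapper plus build rule is meant to take off your hands.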

johnt15 · on April 3, 2016

It's much better to use any of the numerous SIMD wrappers such as libsimdpp or Vc and get various benefits for free. It's possible to target everything from SSE and NEON to AVX512 with what is essentially a single code path.

clevernickname · on April 4, 2016

Realistically the vast majority of C and C++ codebases today will never touch anything more than x86 and ARM, and I wouldn't be surprised if most never even get past x86, so I don't buy the portability argument. Portability between SSE and AVX is a better argument.

But in any case, if you're using SIMD in anger, chances are you have hard performance requirements that you really care about, and a one size fits all approach is going to leave valuable performance on the table. Whether you just have to target your own servers, or any x86 CPU made in the past 6 years, or that plus NEON-equipped ARMs, it will probably be worth the effort to duplicate the code paths, especially in comparison to the initial effort of figuring out how to vectorize your problem in the first place.

And while it's nowhere near "leftpad", if you really want an SIMD wrapper and know what you're doing, it should be well within your capabilities to write your own. Maybe not quite as spiffy as the one on github, but when I get anywhere close to assembly I find that I get more value out of doing everything from scratch and truly understanding what I'm dealing with, rather than leaving anything in someone else's hands.

cm3 · on April 4, 2016

> Realistically the vast majority of C and C++ codebases today will never touch anything more than x86 and ARM, and I wouldn't be surprised if most never even get past x86, so I don't buy the portability argument.

Just recently a Gentoo developer ported GHC to m68k and found some portability issues, which were fixed in the process and benefit all architectures. This is also why OpenBSD devs are still on gcc3.

RISC and POWER are just two very modern ISAs to mention and not something you can ignore easily. We need more ISAs like in the past, not just two. It's very dangerous to limit ourselves to just ARM/x86 and diversity is a plus for writing more correct code and having more options. lowRISC is a nice fit for many things as is POWER, while of course ARM and x86 are here to stay. I'd count Nvidia's and AMD's GPUs as the other major architectures, but we don't usually deal directly at that level with GPUs. You choose the right chip for the job, just as phones select different SoCs for different use cases.

clevernickname · on April 4, 2016

The idea that compiling your code for 68000 or MIPS can reveal bugs in your code does not change the fact that x86 and ARM are pretty much the only relevant CPU architectures that all but the most entrenched of government contractors could ship a product on today or in the foreseeable future that would have any use for SIMD. If you actually have a need to do extensive SIMD optimizations (say, it could shave 5ms off your frame time in a game, or save you $XXXXXX/year in your data center), PowerPC does not enter your mind at any moment.

You see it as weeding out bugs and future proofing your code in case x86 or ARM disappears tomorrow, I see it as a load of completely wasted work and optimization opportunities.

Also lowRISC learned nearly nothing from the past 20 years of CPU architecture advancement. It is not modern, it is a naive copy of a very outdated design.

johnt15 · on April 4, 2016

By saying single code path, I don't mean a single instruction stream. libsimdpp, for example, supports building the same code for different instruction sets, linking it into the same executable and then dispatching dynamically. Doing this by hand would mean that either:

- lots of time is wasted creating slightly different versions of code. I'm talking about e.g. AVX vs. AVX2 for floating-point code not SSE2 vs. AVX.

- micro-optimization opportunities are wasted by only coding for major revisions of the instruction set

Even when optimal performance may only be achieved via completely different approaches, the SIMD wrappers are easier to use, because they present consistent interface. Any specialized instructions may be used by simply falling back to native intrinsics.

Thus I don't see much benefit in writing SIMD code without a wrapper. The only advantage is that it's harder to shoot oneself in the foot than with naive use of these wrappers, e.g. if one doesn't actually look at the generated assembly code.

clevernickname · on April 4, 2016

Yeah, I understood what you meant, I've used wrappers like that before. My contention was with your original comment,

>It's possible to target everything from SSE and NEON to AVX512 with what is essentially a single code path.

the practice of which does not generally make the best usage of any particular instruction set, emulating certain operations that aren't available on a platform with multiple instructions, etc. It might be good enough for many light optimization jobs, in which case I'd say go for it, you're doing so much better than the vast majority of programmers writing Python or whatever. But what I was trying to argue was that if you really need to crunch the hell out of some numbers, then you probably have a small set of target platforms that you can justify directly using intrinsics (or even assembly) for.

This claim, however:

>I'm talking about e.g. AVX vs. AVX2 for floating-point code not SSE2 vs. AVX.

is a lot more reasonable, but you could do the same with some strategically placed #ifdefs with native intrinsics or assembly.
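
A rough sketch of what that #ifdef approach could look like. It assumes the predefined `__AVX__` and `__FMA__` macros that GCC and Clang set for flags such as -mavx and -mfma; the function name `multiply_add` is made up, and FMA is only a stand-in for the kind of minor-revision difference being discussed.

```cpp
#include <cstddef>

#if defined(__AVX__)
#include <immintrin.h>
#endif

// Computes out[i] = a[i] * b[i] + c[i] for i in [0, n).
void multiply_add(const float* a, const float* b, const float* c,
                  float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX__)
    // Vector body: 8 floats per iteration when built with AVX enabled.
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vc = _mm256_loadu_ps(c + i);
#if defined(__FMA__)
        // Newer CPUs: one fused instruction.
        __m256 r = _mm256_fmadd_ps(va, vb, vc);
#else
        // Plain AVX: separate multiply and add.
        __m256 r = _mm256_add_ps(_mm256_mul_ps(va, vb), vc);
#endif
        _mm256_storeu_ps(out + i, r);
    }
#endif
    // Scalar tail, and the whole loop when AVX is not enabled at build time.
    for (; i < n; ++i) out[i] = a[i] * b[i] + c[i];
}
```

Note that the choice here is made at compile time, so covering several CPUs still means building the file more than once and dispatching between the results.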

wmu · on April 4, 2016

Not sure about "single code path". The differences among SIMD flavors are significant, and there are cases where a one-to-one translation is either impossible or impractical. A prime example is the AVX2 instructions that operate on 128-bit lanes rather than whole 256-bit registers.

And wrappers exist in the C++ ecosystem; C programmers are stuck with intrinsics.

exDM69 · on April 4, 2016

> And wrappers exist in the C++ ecosystem; C programmers are stuck with intrinsics.

If you can accept working with GNU extensions that are available in recent-ish GCC and Clang (but not MSVC, not sure about Intel ICC), there are pretty nice vector extensions [0].

With them you can get standard binary operators working for arithmetic (+,-,*,/ etc) and shuffling with __builtin_shuffle. These are CPU independent, the same code compiles neatly to ARM NEON as well as x86 SSE+AVX+FMA. All you need is a typedef with an __attribute__.

The vector extension functions don't cover the whole instruction sets, but the vector types are compatible with the __m128 and NEON native formats, so you can resort to intrinsics when necessary.

However, for a lot of SIMD tasks I encounter, just basic arithmetic + shuffles is more than 80% of what I need.

If you want to see some examples, take a look at my collection of 3d graphics and physics related SIMD routines [1]. (note: this project could use some help, let me know if you're interested in doing something with it or porting some of the hand optimized routines to more used math libs like glm)

[0] https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html#Ve... [1] https://github.com/rikusalminen/threedee-simd
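
To give a flavor of the extension, here is a minimal sketch assuming GCC or Clang. The names `vec4`, `madd` and `dot` are invented for the example and are not taken from threedee-simd.

```cpp
// Four packed floats. vector_size is given in bytes (GCC/Clang extension).
typedef float vec4 __attribute__((vector_size(16)));

// Element-wise a * b + c written with ordinary operators. The compiler picks
// SSE, AVX, NEON, ... instructions according to the target flags.
inline vec4 madd(vec4 a, vec4 b, vec4 c) {
    return a * b + c;
}

// Dot product: element-wise multiply, then add up the four lanes,
// which can be read with plain subscripts.
inline float dot(vec4 a, vec4 b) {
    vec4 p = a * b;
    return p[0] + p[1] + p[2] + p[3];
}
```

Vectors can be initialized like arrays, e.g. `vec4 a = {1.0f, 2.0f, 3.0f, 4.0f};`, and nothing in the code names a particular CPU.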

wmu · on April 5, 2016

> If you can accept working with GNU extensions that are available in recent-ish GCC and Clang

I do my private project in C++, so that's not the case for me, but at my current company we also use MSVC. I wish we could abandon that compiler and work with GCC or clang only.

> However, for a lot of SIMD tasks I encounter, just basic arithmetic + shuffles is more than 80% of what I need.

Your remaining 20% is my 80%. :)

exDM69 · on April 5, 2016

> ... but at my current company we also use MSVC. I wish we could abandon that compiler and work with GCC or clang only.

Good news! These days you can produce MSVC compatible binaries with Clang or even use Clang as a compiler from the C++ IDE.

Whether or not you can do this in practice is another matter, but it can be done.

> Your remaining 20% is my 80%. :)

Yeah, if you look at my examples, they're rather straightforward arithmetic with 4 dimensional vectors. There's very little need for any integer arithmetic or more exotic combinations of operations. A little fused multiply-and-add here and there.

But I haven't seen a better method for this, most of the code is CPU-agnostic and will compile to x86 or ARM code using all the available instruction sets (depending on compiler arguments, e.g. -mavx2 or -march=native). I really haven't seen a SIMD math lib with so little duplication for different CPUs elsewhere.

johnt15 · on April 4, 2016

The property of AVX and AVX2 you mentioned actually helps having single code path. If the SIMD wrapper allows parameterization on vector width (most do that), you can simply increase vector width when compiling for AVX and that's it.
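
A minimal sketch of what width parameterization means, using a made-up `Vec<W>` type rather than any real wrapper's API. Plain loops stand in for the SSE/AVX/NEON registers a real wrapper would map onto, so the sketch compiles anywhere.

```cpp
#include <cstddef>

// Hypothetical stand-in for a wrapper's vector type: W floats handled as one
// unit. A real wrapper would back this with native SIMD registers.
template <std::size_t W>
struct Vec {
    float lane[W];

    static Vec load(const float* p) {
        Vec v;
        for (std::size_t i = 0; i < W; ++i) v.lane[i] = p[i];
        return v;
    }
    void store(float* p) const {
        for (std::size_t i = 0; i < W; ++i) p[i] = lane[i];
    }
    friend Vec operator+(const Vec& a, const Vec& b) {
        Vec r;
        for (std::size_t i = 0; i < W; ++i) r.lane[i] = a.lane[i] + b.lane[i];
        return r;
    }
};

// The kernel is written once; only W changes between targets
// (e.g. 4 floats for SSE, 8 floats for AVX).
template <std::size_t W>
void add_arrays(const float* x, const float* y, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + W <= n; i += W)
        (Vec<W>::load(x + i) + Vec<W>::load(y + i)).store(out + i);
    for (; i < n; ++i) out[i] = x[i] + y[i];  // scalar tail
}
```

`add_arrays<4>` and `add_arrays<8>` come from the same source and differ only in the width, which is the sense in which element-wise code stays a single code path.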

wmu · on April 4, 2016

I understand your point; however, it's not as simple as it seems. Of course, for trivial code the transition between different SIMD flavors could be seamless. But the world is cruel. :)

Think about shuffling instructions (pshufb): the lookup vectors for the instruction are different in AVX2 and SSE. Even if an AVX2 vector could be created by cloning the SSE vector twice, this must be the programmer's decision.

Another example is an algorithm that uses the video-encoding instruction mpsadbw to locate substrings (http://0x80.pl/articles/sse4_substring_locate.html#introduct...). The AVX2 instruction vmpsadbw operates on 128-bit lanes, and parts of the algorithm have to be rewritten to accommodate this limitation.
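
The lane restriction of the byte shuffle can be illustrated without any special hardware. Below is a scalar model (a sketch with the made-up name `shuffle_bytes`; it mimics the documented per-lane behavior of pshufb/vpshufb rather than calling the intrinsics).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of a byte shuffle that works independently in each 16-byte
// lane. Lanes = 1 models SSSE3 pshufb, Lanes = 2 models AVX2 vpshufb.
template <std::size_t Lanes>
std::array<std::uint8_t, 16 * Lanes> shuffle_bytes(
        const std::array<std::uint8_t, 16 * Lanes>& src,
        const std::array<std::uint8_t, 16 * Lanes>& idx) {
    std::array<std::uint8_t, 16 * Lanes> out{};  // starts zeroed
    for (std::size_t i = 0; i < 16 * Lanes; ++i) {
        const std::size_t lane_base = (i / 16) * 16;
        // A set high bit yields zero. Otherwise only the low 4 bits select
        // a byte, so an index can never reach outside its own lane.
        if ((idx[i] & 0x80) == 0)
            out[i] = src[lane_base + (idx[i] & 0x0F)];
    }
    return out;
}
```

With src holding 0..31 and the 16-byte reversal table cloned twice as the index, `shuffle_bytes<2>` reverses each 16-byte half separately instead of reversing the whole 32-byte vector, so widening the vector changes the result and the table has to be reconsidered by the programmer.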

andrewf · on April 4, 2016

Would you be able to point me towards a shipping product/library that does this? It's easy to find examples of people hardcoding x64 assembly (x264, zlib, libyuv) but I haven't stumbled across anybody making good use of a high level wrapper.

johnt15 · on April 4, 2016

There is an entire high-level scientific computing framework built using a SIMD wrapper: https://github.com/jfalcou/nt2.

Though I must note that in this case the SIMD wrapper has significant problems. Due to certain design decisions, the wrapper performs suboptimally on mixed float-integer code on AVX, for example.

speps · on April 4, 2016

Mentioned just in the parent, here is the link : https://github.com/p12tic/libsimdpp

Reaching 2.0 very soon (in RC phase right now), with support for VS which was lacking before.

jmfisch · on April 4, 2016

Although it's way more than an SSE wrapper, the Eigen library is excellent in my experience and targets multiple platforms.

http://eigen.tuxfamily.org/index.php?title=Main_Page

Ono-Sendai · on April 4, 2016

I had a look at the matrix*vector multiplication code for Eigen once and it was rubbish.

drmpeg · on April 4, 2016

https://github.com/gnuradio/volk

nivertech · on April 6, 2016

Why use wrapper libraries when you can use OpenCL with the CPU as the compute device?