Background: I work at Block/Square, on the team that owns (but didn't build) our internal Feature Flag system, and I also have a lot of experience using LaunchDarkly.

I like the idea of caching locally, although k8s makes that a bit more difficult since containers are typically ephemeral. People will use feature flags for things that they shouldn't, so eventually "falling back to default values" will cause production problems. One thing you can do to help with this is run proxies closer to your services. For example, LaunchDarkly has an open source "Relay".
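
To make that failure mode concrete, here's a minimal sketch (all names hypothetical) of the client-side pattern: evaluate from a locally cached snapshot, and fall back to the hardcoded default only when a flag has never been seen at all.

    # Hypothetical sketch: local-cache evaluation with explicit defaults.
    # If the flag service (or a relay) is unreachable, keep serving the
    # last snapshot we saw; the hardcoded default is a last resort.
    class FlagClient:
        def __init__(self, fetch_snapshot):
            self._fetch_snapshot = fetch_snapshot  # e.g. polls a relay
            self._snapshot = {}  # flag name -> value

        def refresh(self):
            try:
                self._snapshot = self._fetch_snapshot()
            except ConnectionError:
                pass  # keep the stale-but-real snapshot; don't wipe it

        def variation(self, flag_name, default):
            # Fall back to `default` only if we've never seen the flag.
            return self._snapshot.get(flag_name, default)

    client = FlagClient(fetch_snapshot=lambda: {"new-checkout": True})
    client.refresh()
    print(client.variation("new-checkout", default=False))   # True
    print(client.variation("unknown-flag", default=False))   # False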

Local evaluation seems to be pretty standard at this point, although I'd argue that delivering flag definitions is (relatively) easy. One of the real value-adds of a product like LaunchDarkly is all the things they can do when your applications send evaluation data upstream: unused flags, only-ever-evaluated-to-the-default flags, only-ever-evaluated-to-one-outcome flags, etc.

One best practice that I'd love to see spread (in our codebases too) is always naming the full feature flag directly in code, as a string literal (not a constant). I'd argue the same practice should apply to metric names.
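
For instance (flag name and client API invented for illustration), the literal string means a plain grep across every repo lands directly on the point of use:

    # `flags` stands in for whatever client SDK you use (hypothetical).
    class _Flags:
        def __init__(self, values):
            self._values = values

        def variation(self, name, default):
            return self._values.get(name, default)

    flags = _Flags({"checkout.new-payment-flow": True})

    # Greppable: searching "checkout.new-payment-flow" finds this line.
    if flags.variation("checkout.new-payment-flow", default=False):
        print("new payment flow")

    # One layer of indirection: the same grep only hits the constant's
    # definition, and you still have to chase every reference to it.
    NEW_PAYMENT_FLOW = "checkout.new-payment-flow"
    if flags.variation(NEW_PAYMENT_FLOW, default=False):
        print("new payment flow")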

One of the most useful things to know (but seldom communicated clearly anywhere near the landing page) is a basic sketch of the architecture. It's necessary to know how things will behave if there is trouble. For instance: our internal system uses ZooKeeper to store (protobuf) flag definitions, and applications set watches to be notified of changes. LaunchDarkly clients download all flags[1] in the project on connection, then stream changes.

If I were going to build a feature flag system, I would ensure that there is a global, incrementing counter that is updated every time any change is made, and make it a fundamental aspect of the design. That way, clients can cache what they've seen, and easily fetch only necessary updates. You could also imagine annotating that generation ID into W3C Baggage, and passing it through the microservices call graph to ensure evaluation at a consistent point in time (clients would need to cache history for a minute or two, of course).
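
A minimal sketch of the idea (all names hypothetical): every write bumps one global generation, and a client syncs by asking for everything after the last generation it saw.

    import itertools

    # Hypothetical store: one global, monotonically increasing counter.
    # `latest` is also the generation you'd stamp into W3C Baggage to
    # pin a whole request tree to one consistent point in time.
    class FlagStore:
        def __init__(self):
            self._gen = itertools.count(1)
            self._flags = {}  # flag name -> (generation, definition)
            self.latest = 0

        def put(self, name, definition):
            self.latest = next(self._gen)
            self._flags[name] = (self.latest, definition)

        def changes_since(self, seen_gen):
            # Clients send the last generation they saw and get back
            # only the definitions that changed after it, plus the new
            # cursor to cache for the next sync.
            delta = {n: d for n, (g, d) in self._flags.items()
                     if g > seen_gen}
            return self.latest, delta

    store = FlagStore()
    store.put("new-checkout", {"enabled": True})
    gen, delta = store.changes_since(0)    # first sync: full snapshot
    gen, delta = store.changes_since(gen)  # later syncs: usually empty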

One other dimension in which feature flag services vary is the complexity of the rules they let you evaluate. Our internal system has a mini expression language (probably overkill). LaunchDarkly's arguably better system gives you an ordered set of rules within which conditions are ANDed together. Both allow you to pass in arbitrary contexts of key/value pairs. Many open source solutions (Unleash, last I checked, some time ago) are more limited: some don't let you vary on inputs at all, and some allow only a small set of prescribed attributes.
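
For concreteness, here's a toy evaluator in that LaunchDarkly-ish shape (simplified to equality-only conditions; real systems also support operators, segments, and percentage rollouts): rules are ordered, conditions within a rule are ANDed, and the first match wins.

    # Toy rule evaluator: ordered rules, ANDed conditions, first match
    # wins, else the fallthrough value. Contexts are arbitrary
    # key/value pairs.
    def evaluate(rules, fallthrough, context):
        for rule in rules:
            if all(context.get(attr) == want
                   for attr, want in rule["conditions"].items()):
                return rule["value"]
        return fallthrough

    rules = [
        {"conditions": {"country": "DE", "plan": "enterprise"},
         "value": True},
        {"conditions": {"beta_tester": True}, "value": True},
    ]
    print(evaluate(rules, False,
                   {"country": "DE", "plan": "enterprise"}))  # True
    print(evaluate(rules, False, {"country": "US"}))          # False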

I think the time is ripe for an open standard client API for feature flags. Standardizing the communication mechanisms would be constricting, but there's no reason we couldn't create something analogous to (or even part of) the OpenTelemetry client SDK for feature flags. If you are seriously interested in collaborating on that, please get in touch. (I'm "zellyn" just about everywhere.)

[1] Yes, this causes problems if you have too many flags in one project. They have a pretty nice filtering solution that's almost fully ready.




One more update. I spent a little time the other day trying to find all the feature flag products I could. I'm sure I missed a ton. Let me know in the comments!

LaunchDarkly, Split, Apptimize, CloudBees, ConfigCat, DevCycle, FeatBit, FeatureHub, Flagsmith, Flipper, Flipt, GrowthBook, Harness, Molasses, OpenFeature, Posthog, Rollout, Unleash

Here's my first draft of the questions you'd want to ask about any given solution:

    Questionnaire
    
    - Does it seem to be primarily proprietary, primarily open-source, or “open core” (parts open source, enterprise features proprietary)?
      - If it’s open core or open source with a service offering, can you run it completely on your own for free?
    - Does it look “serious/mature”?
      - Lots of language SDKs
      - High-profile, high-scale users
    - Can it do complex rules?
      - Can you do rules with arbitrary attributes, or is it just on/off, or on/off with overrides?
    - How many language SDKs (one, a few, lots)?
    - Do feature flags appear to be the primary purpose of this company/project?
      - If not, does it look like feature flags are a first-class offering, or an afterthought / checkbox-filler? (e.g. split.io started out in experimentation, and then later introduced free feature flag functionality. I think it’s a first-class feature now.)
    - Does it allow approval workflows?
    - What is the basic architecture?
      - Are flags evaluated in-memory, locally? (Hopefully!)
      - Is there a relay/proxy you can run in your own environment?
      - How are changes propagated?
        - Polling?
        - Streaming?
      - Does each app retrieve/stream all the flags in a project, or just the ones it uses?
      - What happens if their website goes down?
    - Do they do experiments too?
      - As a first-class offering?
    - Are there ACLs and groups/roles?
      - Can they be synced from your own source of truth?
    - Do they have a solution for mobile and web apps?
      - If so, what is the pricing model?
      - Do they have a mobile relay type product you can run yourself?
    - What is the pricing model?
      - Per developer?
      - Per end-user? MAU?


A few more: https://featurevisor.com/ https://configcat.com/

I will toss our hat in the ring, but we are early in this space! https://lekko.com


Togglz is another option: https://www.togglz.org/



> Are flags evaluated in-memory, locally? (Hopefully!)

This seems like a MUST rather than a SHOULD, right?


I would have thought so. But Flagsmith apparently does primarily server-side eval. And even OpenFeature has `flagd`, which I guess is a sidecar, so a sort of hybrid approach.

And LaunchDarkly's Big Segments fetch segment inclusion data live from Redis (although I believe they then cache it for a while).


If this is the case, then flag evaluation can't possibly be part of any kind of hot loop, right?


¯\_(ツ)_/¯

I guess see if dabeeeenster is still monitoring this thread, and ask them?


Do you have the answers to that questionnaire for the services you mention?


I more or less know all the answers for LaunchDarkly (except pricing details), and for the internal feature flag service we're deprecating, but I haven't gone through and answered it for all the other offerings. It would be time-consuming, but very useful.

Also, undoubtedly contentious. If you want an amusing read, go check out LaunchDarkly's "comparison with Split" page and Split's "comparison with LaunchDarkly" page. It's especially funny when they make the exact same evaluations, but in reverse.


Could you add Statsig to your research?


> One best practice that I'd love to see spread (in our codebases too) is always naming the full feature flag directly in code, as a string (not a constant).

Can you elaborate on this? As a programmer, I would think that using something like a constant would help us find references and ensure all usage of the flag is removed when the constant is removed.


One of the most common things you want to do for a feature flag or metric name is ask, "Where is this used in code?" (LaunchDarkly even has a product feature that does this, called "Code References".) I suppose one layer of indirection (into a constant) doesn't hurt too much, although it certainly makes things a little trickier.

The bigger problem is when the code constructs metric and flag names programmatically:

    prefix = "framework.client.requests.http.{status%100}s"
    recordHistogram(prefix + ".latency", latency)
    recordCount(prefix + ".count", 1)

    flagName = appName + "/loadshed-percent"

    # etc...
That kind of thing makes it very hard to find references to metrics or flags. Sometimes it's impossible, or close to impossible, to remove, but it's worth trying hard.

Of course, this is just, like, my opinion, man!


Agreed. Flags are a type of technical debt. Keeping them as full strings in the code encourages and facilitates cleanup.

This sort of programmatic naming is a dangerous step down a slippery slope.


Not OP, but multiple code bases may refer to the same flag by a different constant. Having a single string that can be searched across all repos in an organization is quite handy for finding all the places where it's referenced.


Especially when you have different languages with different naming rules: `MY_FEATURE_FLAG`, `kMyFeatureFlag`, and `@MyFeatureFlag` might all be reasonable names for what is defined as `"my_feature_flag"` in the configuration.

Using just the string-recognizable name everywhere is...better.


IME, searching for the flag name and getting 1 result is less helpful than getting 15 results that directly show the points of use.


After typing that, and realizing I have a lot more to say, I guess I should write a blog post on the subject.


You definitely should! These questions are great, and could use some appropriate context for evaluation.


Yes please. Blog would be awesome.


Yes, please!


Oh, and one last(?) update.

If you create your own service to evaluate a bunch of feature flags for a given user/client/device/location/whatever and return the results, for use in mobile clients (everyone does this), PLEASE *make sure the client enumerates the list of flags it wants*. It's very tempting to just keep that list server-side, and send all the flags (much simpler requests, right?), but you will have to keep serving all those flags for all eternity because you'll never know which deployed versions of your app require which flags, and which can be removed.
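
A sketch of the contract I mean (endpoint and names invented for illustration): the client enumerates the flags it needs, and the server evaluates exactly those and nothing more.

    # Hypothetical proxy handler: the mobile client names the flags it
    # wants, so the server (and its usage metrics) see exactly which
    # flags each deployed app version still depends on.
    def mobile_flags(params, evaluate):
        requested = params["flags"].split(",")  # e.g. "flag1,flag4"
        context = {k: v for k, v in params.items() if k != "flags"}
        return {name: evaluate(name, context) for name in requested}

    # Example: /mobile-flags?flags=flag1,flag4&os=ios&app_version=6.1
    resp = mobile_flags(
        {"flags": "flag1,flag4", "os": "ios", "app_version": "6.1"},
        evaluate=lambda name, ctx: name == "flag1",  # stand-in
    )
    print(resp)  # {'flag1': True, 'flag4': False}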



You should be collecting metrics on used flags and their values if you’re rolling your own. A SaaS offering will do that for you.


Well, it seems to be a common theme to build a server that uses the flag eval _server_ SDK to evaluate a bunch of flags and then pass them back to the client.

For example, a client may call:

    myserver.com/mobile-flags?merchant=abcdef&device=123456&os=ios&os_version=15.2&app_version=6.1

and the server will pass back:

    flag1: true
    flag2: 39
    flag3: false
    flag4: green

For example, LaunchDarkly has a mobile client SDK, but they charge by MAU, which would be untenable. So folks tend to write a proxy for the mobile apps to call. If the client (as in my example above) doesn't specify which flags it wants, then the usage metrics are missing, whether you're using a commercial product or your own: they'll simply tell you that all the flags got used. (Of course, you could be collecting metrics from the client apps.)

But based on our experience, you'd be better off having the mobile client pass in an explicit list of desired flags, which will give accurate metrics.

Hope that clarifies what I meant.


> I'd argue that delivering flag definitions is (relatively) easy.

I'd argue that coming up with a good UI that nudges developers towards safe behavior, along with useful and appropriate guard rails -- in other words, using the feature flag UI to reduce the likelihood of breakage -- is difficult, and one of the major value propositions of feature flag services.



