Background: I work at Block/Square, on the team that owns (but didn't build) our internal Feature Flag system, and I also have a lot of experience using LaunchDarkly.
I like the idea of caching locally, although k8s makes that a bit more difficult since containers are typically ephemeral. People will use feature flags for things that they shouldn't, so eventually "falling back to default values" will cause production problems. One thing you can do to help with this is run proxies closer to your services. For example, LaunchDarkly has an open source "Relay".
Local evaluation seems to be pretty standard at this point, although I'd argue that delivering flag definitions is (relatively) easy. One of the real value-adds of a product like LaunchDarkly is all the things they can do when your applications send evaluation data upstream: unused flags, only-ever-evaluated-to-the-default flags, only-ever-evaluated-to-one-outcome flags, etc.
One best practice that I'd love to see spread (in our codebases too) is always naming the full feature flag directly in code, as a string (not a constant). I'd argue the same practice should be taken with metrics names.
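To make that concrete, here's a tiny Go sketch (the `Flags` interface is hypothetical, not any particular SDK). The point is that the full flag key sits at the call site as a string literal, so a plain grep across every repo finds every usage:

    // Hypothetical evaluation interface; the interesting part is the call site.
    type Flags interface {
        Bool(key string, defaultValue bool) bool
    }

    func renderCheckout(f Flags) {
        // Preferred: the full flag key appears inline, as a string literal.
        if f.Bool("checkout.new-payment-flow", false) {
            // ... new flow
        }
        // Harder to trace: f.Bool(newPaymentFlowKey, false) hides the key
        // behind a constant that may be named differently in each codebase.
    }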
One of the most useful things to know (but seldom communicated clearly near landing pages) is a basic sketch of the architecture. It's necessary to know how things will behave if there is trouble. For instance: our internal system uses ZK to store (protobuf) flag definitions, and applications set watches to be notified of changes. LaunchDarkly clients download all flags[1] in the project on connection, then stream changes.
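To give a flavor of the "applications set watches" part, here's roughly what that loop looks like with the go-zookeeper client (the path and the cache-update function are made up; the real layout is more involved):

    import (
        "time"

        "github.com/go-zookeeper/zk"
    )

    // updateLocalFlagCache is a placeholder for "parse the protobuf definitions
    // and swap them into the in-memory cache".
    func updateLocalFlagCache(data []byte) { _ = data }

    func watchFlagDefinitions(servers []string, path string) error {
        conn, _, err := zk.Connect(servers, 10*time.Second)
        if err != nil {
            return err
        }
        defer conn.Close()
        for {
            // GetW returns the current data plus a one-shot channel that fires
            // when the znode changes; re-arm the watch on every iteration.
            data, _, events, err := conn.GetW(path)
            if err != nil {
                return err
            }
            updateLocalFlagCache(data)
            <-events // block until the next change
        }
    }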
If I were going to build a feature flag system, I would ensure that there is a global, incrementing counter that is updated every time any change is made, and make it a fundamental aspect of the design. That way, clients can cache what they've seen, and easily fetch only necessary updates. You could also imagine annotating that generation ID into W3C Baggage, and passing it through the microservices call graph to ensure evaluation at a consistent point in time (clients would need to cache history for a minute or two, of course).
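A rough sketch of the client side of that design (every name here is hypothetical): track the last generation applied, ask the server only for changes after it, and pin the generation into the W3C `baggage` header on outgoing calls so downstream services can evaluate against the same point in time.

    import (
        "fmt"
        "net/http"
    )

    type FlagDefinition struct{ /* rules, variations, ... */ }

    type Change struct {
        Key        string
        Definition FlagDefinition
    }

    // FlagStore is the imagined server API: "everything after generation N".
    type FlagStore interface {
        ChangesSince(gen int64) (changes []Change, latest int64, err error)
    }

    type FlagCache struct {
        generation int64
        flags      map[string]FlagDefinition
    }

    func NewFlagCache() *FlagCache {
        return &FlagCache{flags: make(map[string]FlagDefinition)}
    }

    // Sync fetches and applies only the changes made after c.generation.
    func (c *FlagCache) Sync(store FlagStore) error {
        changes, latest, err := store.ChangesSince(c.generation)
        if err != nil {
            return err
        }
        for _, ch := range changes {
            c.flags[ch.Key] = ch.Definition
        }
        c.generation = latest
        return nil
    }

    // Pin the evaluation point in time for downstream services via W3C Baggage.
    func pinGeneration(req *http.Request, gen int64) {
        req.Header.Set("baggage", fmt.Sprintf("flag-generation=%d", gen))
    }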
One other dimension in which feature flag services vary is the complexity of the rules they let you evaluate. Our internal system has a mini expression language (probably overkill). LaunchDarkly's arguably better system gives you an ordered set of rules within which conditions are ANDed together. Both allow you to pass in arbitrary contexts of key/value pairs. Many open source solutions (Unleash, last I checked some time ago) are more limited: some don't let you vary on inputs at all, and some only support a small set of prescribed attributes.
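To illustrate the "ordered rules, with conditions ANDed inside each rule" shape (a generic sketch, not LaunchDarkly's actual data model): evaluation walks the rules in order and serves the first rule whose clauses all match the supplied context.

    // Generic sketch of ordered-rule evaluation over an arbitrary key/value context.
    type Clause struct {
        Attribute string   // e.g. "country"
        Op        string   // e.g. "in"
        Values    []string // e.g. ["CA", "US"]
    }

    type Rule struct {
        Clauses   []Clause // ANDed together
        Variation string   // value served if every clause matches
    }

    func clauseMatches(c Clause, ctx map[string]string) bool {
        v, ok := ctx[c.Attribute]
        if !ok {
            return false
        }
        // Only "in" is sketched here; real systems support many operators.
        for _, want := range c.Values {
            if v == want {
                return true
            }
        }
        return false
    }

    func evaluate(rules []Rule, ctx map[string]string, fallback string) string {
        for _, r := range rules {
            matched := true
            for _, c := range r.Clauses {
                if !clauseMatches(c, ctx) {
                    matched = false
                    break
                }
            }
            if matched {
                return r.Variation // first matching rule wins
            }
        }
        return fallback
    }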
I think the time is ripe for an open standard client API for feature flags. Standardizing the communication mechanisms would be constricting, but there's no reason we couldn't create something analogous to (or even part of) the OpenTelemetry client SDK for feature flags. If you are seriously interested in collaborating on that, please get in touch. (I'm "zellyn" just about everywhere)
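To be concrete about what I mean by "standard client API": something shaped roughly like this (entirely hypothetical, just the shape), which any vendor's SDK or an in-house backend could sit behind, the way OpenTelemetry separates its API from its exporters.

    // Hypothetical vendor-neutral evaluation API. Concrete providers
    // (LaunchDarkly, an internal service, a static file, ...) plug in behind it.
    type EvalContext map[string]interface{}

    type Client interface {
        BoolFlag(key string, ctx EvalContext, defaultValue bool) bool
        StringFlag(key string, ctx EvalContext, defaultValue string) string
        IntFlag(key string, ctx EvalContext, defaultValue int) int
        // Close flushes any buffered evaluation telemetry (the upstream data
        // mentioned above: which flags were evaluated, to which values).
        Close() error
    }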
[1] Yes, this causes problems if you have too many flags in one project. They have a pretty nice filtering solution that's almost fully ready.
One more update. I spent a little time the other day trying to find all the feature flag products I could. I'm sure I missed a ton. Let me know in the comments!
Here's my first draft of the questions you'd want to ask about any given solution:
Questionnaire
- Does it seem to be primarily proprietary, primarily open-source, or “open core” (parts open source, enterprise features proprietary)?
- If it’s open core or open source with a service offering, can you run it completely on your own for free?
- Does it look “serious/mature”?
  - Lots of language SDKs?
  - High-profile, high-scale users?
- Can you do rules with arbitrary attributes, or is it just on/off or on/off with overrides?
  - Can it do complex rules?
- How many language SDKs (one, a few, lots)?
- Do feature flags appear to be the primary purpose of this company/project?
  - If not, does it look like feature flags are a first-class offering, or an afterthought / checkbox-filler? (e.g. split.io started out in experimentation, and then later introduced free feature flag functionality. I think it’s a first-class feature now.)
- Does it allow approval workflows?
- What is the basic architecture?
  - Are flags evaluated in-memory, locally? (Hopefully!)
  - Is there a relay/proxy you can run in your own environment?
  - How are changes propagated?
    - Polling?
    - Streaming?
  - Does each app retrieve/stream all the flags in a project, or just the ones it uses?
  - What happens if their website goes down?
- Do they do experiments too?
  - As a first-class offering?
- Are there ACLs and groups/roles?
  - Can they be synced from your own source of truth?
- Do they have a solution for mobile and web apps?
  - If so, what is the pricing model?
  - Do they have a mobile relay type product you can run yourself?
- What is the pricing model?
  - Per developer?
  - Per end-user? MAU?
I would have thought so. But Flagsmith apparently does primarily server-side eval. And even OpenFeature has `flagd`, which I guess is a sidecar, so a sort of hybrid approach.
And LaunchDarkly's Big Segments fetch segment inclusion data live from Redis (although I believe they then cache it for a while).
I more or less know all the answers for LaunchDarkly (except pricing details), and for the internal feature flag service we're deprecating, but I haven't gone through and answered it for all the other offerings. It would be time-consuming, but very useful.
Also, undoubtedly contentious. If you want an amusing read, go check out LaunchDarkly's "comparison with Split" page and Split's "comparison with LaunchDarkly" page. It's especially funny when they make the exact same evaluations, but in reverse.
> One best practice that I'd love to see spread (in our codebases too) is always naming the full feature flag directly in code, as a string (not a constant).
Can you elaborate on this? As a programmer, I would think that using something like a constant would help us find references and ensure all usage of the flag is removed when the constant is removed.
One of the most common things you want to do for a feature flag or metric name is ask, "Where is this used in code?". (LaunchDarkly even has a product feature that does this, called "Code References".) I suppose one layer of indirection (into a constant) doesn't hurt too much, although it certainly makes things a little trickier.
The bigger problem is when the code constructs metric and flag names programmatically:
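Something like this (an illustrative Go sketch; the helpers and variables are made up, not from any real codebase):

    // The full metric/flag name never appears anywhere in the source.
    metricName := fmt.Sprintf("payments.%s.%s.latency", region, endpoint)
    metrics.Timing(metricName, elapsed)

    flagKey := "rollout-" + strings.ToLower(teamName) + "-" + featureName
    enabled := flagClient.Bool(flagKey, false)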
That kind of thing makes it very hard to find references to metrics or flags. Sometimes it's impossible, or close to impossible, to remove, but it's worth trying hard.
Not OP, but multiple code bases may refer to the same flag by a different constant. Having a single string that can be searched across all repos in an organization is quite handy to find all places where it's referenced.
Especially when you have different languages with different rules: `MY_FEATURE_FLAG` and `kMyFeatureFlag` and `@MyFeatureFlag` might all be reasonable names for what is defined as `"my_feature_flag"` in the configuration.
Using just the string-recognizable name everywhere is...better.
If you create your own service to evaluate a bunch of feature flags for a given user/client/device/location/whatever and return the results, for use in mobile clients (everyone does this), PLEASE *make sure the client enumerates the list of flags it wants*. It's very tempting to just keep that list server-side, and send all the flags (much simpler requests, right?), but you will have to keep serving all those flags for all eternity because you'll never know which deployed versions of your app require which flags, and which can be removed.
Well, it seems to be a common theme to build a server that uses the flag eval _server_ SDK to evaluate a bunch of flags and then pass them back to the client.
For example, a client may call myserver.com/mobile-flags?merchant=abcdef&device=123456&os=ios&os_version=15.2&app_version=6.1 and the server will pass back:
    flag1: true
    flag2: 39
    flag3: false
    flag4: green
There's a reason this pattern is so common: LaunchDarkly has a mobile client SDK, for example, but they charge by MAU, which would be untenable, so folks tend to write a proxy for the mobile apps to call. If the client (as in my example above) doesn't specify which flags it wants, then the metrics are missing, whether you're using a commercial product or your own: it'll simply tell you that all the flags got used. (Of course, you could be collecting metrics from the client apps.)
But based on our experience, you'd be better off having the mobile client pass in an explicit list of desired flags, which will give accurate metrics.
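Concretely, the only change from the example above is that the client owns its flag list (a Go sketch; the URL, parameter names, and identifiers are still made up):

    // The mobile client asks for exactly the flags this app version uses.
    wanted := []string{"checkout.new-payment-flow", "search.ranker-v2", "ui.dark-mode"}
    u := fmt.Sprintf(
        "https://myserver.com/mobile-flags?device=%s&app_version=%s&flags=%s",
        deviceID, appVersion, strings.Join(wanted, ","),
    )
    resp, err := http.Get(u)
    // ... decode the {flag: value} response. Flags absent from `wanted` are never
    // served, so server-side metrics reflect what shipped clients actually use.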
> I'd argue that delivering flag definitions is (relatively) easy.
I'd argue that coming up with good UI that nudges developers towards safe behavior, as well as useful and appropriate guard rails -- in other words, using the feature flag UI to reduce likelihood of breakage -- is difficult, and one of the major value propositions of feature flag services.