
Just thinking out loud here...

It seems to me that if you wanted to root out sentiment bias in this type of algorithm, you would need to adjust your baseline word-embedding dataset until the sentiment scores for words like "Italian", "British", "Chinese", "Mexican", "African", etc. are roughly equal, without changing the sentiment scores for all other words. That being said, I have no idea how you'd approach such a task...
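
At its crudest, one option might be to post-process the scores rather than the embeddings: pin every term on a chosen identity list to the group's mean sentiment and leave everything else alone. A toy sketch (the scorer and the numbers here are made up, just to show the shape of the adjustment):

    # Toy sketch: pin a list of identity terms to their mean sentiment score.
    # "base_sentiment" stands in for whatever embedding-based scorer you
    # already have; the numbers are invented for illustration.
    base_sentiment = {
        "italian": 0.8, "british": 0.9, "chinese": -0.2,
        "mexican": -0.5, "african": -0.4, "delicious": 1.2,
    }
    identity_terms = {"italian", "british", "chinese", "mexican", "african"}

    # Every identity term gets the group's mean score; other words are untouched.
    target = sum(base_sentiment[w] for w in identity_terms) / len(identity_terms)

    def adjusted_sentiment(word):
        return target if word in identity_terms else base_sentiment.get(word, 0.0)

    print(adjusted_sentiment("mexican") == adjusted_sentiment("italian"))  # True
    print(adjusted_sentiment("delicious"))  # unchanged: 1.2

Of course, this only patches the terms you thought to list, which is exactly the "fixed the metric, not the underlying issue" worry raised in the reply below.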

I don't think you could ever get equal sentiment scores for "black" and "white" without biasing the dataset in a way that renders it invalid for other scenarios (e.g., giving "a dark black alley" a higher sentiment than it would otherwise have). "Black" and "white" are a harder case because those words have so many meanings outside of race/ethnicity.



I think I would agree. You otherwise run the risk of having fixed the metric ("Italian" vs. "Mexican", "Chad" vs. "Shaniqua", etc.) without actually fixing the underlying issue.

Also, regarding black/white etc., there might legitimately be words which have so many different meanings (whether race-related or not) that you should just exclude them from sentiment analysis. "Right" can mean a human right, the right thing to do, or the opposite of left. There are probably plenty of other words like that. You might do better to have a list of 100-200 words that are simply excluded because of issues like that.
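
Mechanically, bolting an exclusion list onto a word-level scorer is trivial; a toy sketch (the lexicon scores and the list contents are placeholders):

    # Toy sketch: skip ambiguous words when averaging word-level sentiment.
    # The lexicon scores and the exclusion list are placeholders.
    lexicon = {"dark": -0.6, "black": -0.3, "alley": -0.4, "right": 0.5, "lovely": 1.0}
    excluded = {"black", "white", "right", "left"}  # too many unrelated senses

    def sentence_sentiment(tokens):
        scored = [lexicon[t] for t in tokens if t in lexicon and t not in excluded]
        return sum(scored) / len(scored) if scored else 0.0

    print(sentence_sentiment("a dark black alley".split()))  # "black" is ignored
    print(sentence_sentiment("the right to vote".split()))   # "right" is ignored

The hard part is deciding what goes on the list, not applying it.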


> there might legitimately be words which have so many different meanings

I haven't studied word embeddings past the pop-sci level but wouldn't such words form multiple clusters in the embedding space? I would have thought it would be relatively easy to get different 'words' for 'right (entitlement)', 'right (direction)', etc?

Edit: Nibling post answers this question.


Would it be worth trying to think of words with different meanings as entirely new words? So, "white" in one sentence may be a different word than "white" in another?


There's a long list of papers on that: 'multi-sense word embeddings'. But more recently we've found that passing character-derived token representations through a two-layer BiLSTM resolves the ambiguity of meaning from context: ELMo.

https://arxiv.org/abs/1802.05365 (state of the art)
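
The multi-sense recipe is roughly: represent each occurrence of a word by an embedding of its surrounding context, cluster those context vectors, and treat each cluster as a separate sense (which also answers the "entirely new words" question above). A toy sketch with made-up 2-d vectors standing in for real pretrained embeddings:

    # Toy sketch of multi-sense embeddings: cluster the contexts in which
    # "right" occurs and tag each occurrence with its cluster id.
    # The 2-d vectors are invented; a real system would average pretrained
    # embeddings of the surrounding words.
    import numpy as np
    from sklearn.cluster import KMeans

    context_vecs = {
        "human ___ to vote":     np.array([0.9, 0.1]),
        "the ___ thing to do":   np.array([0.8, 0.2]),
        "turn ___ at the light": np.array([0.1, 0.9]),
        "on your ___ hand side": np.array([0.2, 0.8]),
    }

    X = np.stack(list(context_vecs.values()))
    senses = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

    for ctx, sense in zip(context_vecs, senses):
        print(f"right_{sense}: {ctx}")  # each occurrence becomes a sense-tagged token

ELMo skips the explicit clustering step: the BiLSTM produces a different vector for every occurrence directly from its context.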


Does “a dark black alley” have a sentiment at all?

I would argue that it’s pragmatically associated with bad things (e.g., being mugged, overcrowded areas) but it’s not intrinsically bad (or good) itself.


> associated with bad things

Is that not what's meant by sentiment?


My intuition is that word-level sentiment is rather pointless. “The Disaster Artist was not bad” has a positive sentiment overall, but each of the individual words, except possibly ‘artist’, is usually thought to be negative. Moreover, you can totally flip the overall sentiment by adding another neutral-ish word: “The Disaster Artist was not even bad.”

Similarly, my guess is that alley is rarely found in a positive context, but the actual sentiment comes from elsewhere in the utterance.
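
To make that concrete, here's what a naive word-averaging scorer does with those two sentences (the lexicon values are made up):

    # Toy sketch: averaging word-level sentiment ignores negation entirely.
    # Lexicon values are invented for illustration; unknown words score 0.
    lexicon = {"disaster": -1.0, "artist": 0.2, "not": -0.3, "bad": -0.8}

    def word_average_sentiment(text):
        scores = [lexicon.get(t.lower(), 0.0) for t in text.split()]
        return sum(scores) / len(scores)

    print(word_average_sentiment("The Disaster Artist was not bad"))
    # about -0.32: negative, though a human reads the sentence as mildly positive
    print(word_average_sentiment("The Disaster Artist was not even bad"))
    # about -0.27: adding "even" barely moves the score but flips the meaning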


Word-level sentiment is like the spherical cow in a vacuum from physics. Everyone knows it's an extremely flawed model, but it produces good results in a lot of scenarios and has the enormous benefit of simplicity, so it will inevitably get used.


This article is about a simple model. Within that model, it absolutely makes sense for “dark black alley” to get a negative score.


It certainly gets a sentiment score, but whether that score is in any way meaningful, i.e. corresponds to actual human sentiment, is the important question. Otherwise you’re just playing stupid games and winning stupid prizes... though I suppose merely stupid is a step up from stupid and racist.



