This is a naive answer. It neglects that (1) models for housing loans, insurance, etc. are trained on historical data, and (2) policy in the USA at the time that data was collected was racist (this is not subjective), so models trained on it carry that bias into the present.
Historically, at least in the United States, loans were unavailable to PoC, especially black Americans. Districts in many American cities were "redlined"[1], that is, certain districts were deemed "unprofitable" for banks. Redlining was official policy and was designed to discriminate against black Americans, who were often victims of predatory loans with unserviceable interest rates, which commonly resulted in defaults. The defaults worsened credit scores for people in those neighborhoods, creating a positive feedback loop. Historical housing-loan data (and insurance data) reflects all of this. People today are still affected by it, and developers of these models absolutely must take that into account.
Because of the entangled nature of real data, dropping "race" as a feature for training wouldn't solve the problem. Factors like zip code (think of the redlined districts) act as proxies for race and would still influence the outcome[2], as the sketch below illustrates.
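To make this concrete, here is a minimal sketch on synthetic data (all feature names and numbers are hypothetical, not drawn from the cited articles): a classifier trained with the "race" column removed still reproduces the historical disparity, because zip code carries the same signal.

```python
# Minimal sketch: dropping the protected attribute does not remove bias
# when another feature (zip code) is a near-perfect proxy for it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Synthetic world: group membership strongly determines zip code
# (redlining), and historical approvals were biased against group 1.
group = rng.integers(0, 2, n)                               # protected attribute
zipcode = np.where(rng.random(n) < 0.9, group, 1 - group)   # 90% correlated proxy
income = rng.normal(50 + 10 * (1 - group), 10, n)           # also entangled
approved = (rng.random(n) < np.where(group == 0, 0.7, 0.3)).astype(int)

# Train WITHOUT the race column -- only zip code and income.
X = np.column_stack([zipcode, income])
model = LogisticRegression().fit(X, approved)

# The model still reproduces the historical disparity via the proxies.
pred = model.predict(X)
print("approval rate, group 0:", pred[group == 0].mean())
print("approval rate, group 1:", pred[group == 1].mean())
```

Run it and the two approval rates diverge sharply even though "group" was never a training feature; the model simply rediscovers it through zip code.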
Creating models whose output can impact people so profoundly (e.g., whether Jane can get health insurance) calls for more reflection than just "numbers correlate!".
There isn't an "easy" solution, but step 0 is recognizing that there is a historical problem being dragged with us into the future by the way our current systems work.
Political solutions are necessary: maybe something like subsidized loans for people from formerly redlined communities to purchase and restore homes, or to start businesses. Urban planning projects, like increasing mixed-use zoning, pedestrian traffic, and good public transportation, would help keep money in the neighborhood. Then there is the question of how to deal with gentrification and raise the quality of living in a community without displacing people from it. It takes a team of experts from various fields, the community itself, and goal-oriented cooperation.
[1]: https://www.smithsonianmag.com/history/how-federal-governmen...
[2]: https://www.educative.io/blog/racial-bias-machine-learning-a...