Description
The discussion thread in an old issue (#3148) mentioned the possibility of introducing a transform object to filter for outliers. I have not fully grasped (yet) the inner workings of the existing transform objects but would like to start out very simple, e.g. with a Filter() object that takes pairs of absolute or quantile values as input. Ideally, this filter can then be toggled to include or exclude data that matches.
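As a rough sketch of the filtering logic only (plain pandas, not wired into seaborn's transform machinery), the behaviour described above might look like the following. The name `filter_values` and its `quantile` and `exclude` parameters are hypothetical and do not exist in seaborn.

```python
import pandas as pd


def filter_values(values: pd.Series, low: float, high: float,
                  quantile: bool = False, exclude: bool = False) -> pd.Series:
    """Keep (or drop) the entries of `values` that fall inside [low, high].

    Hypothetical helper for illustration; not part of seaborn.
    """
    if quantile:
        # Interpret the bounds as quantiles (0..1) of the data itself.
        low, high = values.quantile(low), values.quantile(high)
    inside = values.between(low, high)  # inclusive on both ends
    # `exclude` toggles between keeping and dropping the matching data.
    return values[~inside] if exclude else values[inside]


s = pd.Series([1, 5, 10, 50, 100])
kept = filter_values(s, 5, 50)                         # absolute bounds
dropped = filter_values(s, 5, 50, exclude=True)        # inverted selection
middle = filter_values(s, 0.25, 0.75, quantile=True)   # quantile bounds
```

A single `(low, high)` pair is used here for brevity; accepting several pairs would amount to combining the boolean masks with a logical OR.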
Does this fit into the current seaborn landscape at all? Maybe it is possible to extend the functionality of Perc or Stat?
Background: In my quest to go full so.Plot, I am currently trying to recreate sns.boxplot() with seaborn objects. My current attempt has a few very rough edges (feedback always welcome) but does what I want it to, apart from plotting the outliers:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import seaborn.objects as so
rng = np.random.default_rng(seed=2)
data = pd.DataFrame(list(zip(np.concatenate((rng.integers(5, 20, size=10),
                                             rng.integers(50, 100, size=100),
                                             rng.integers(130, 145, size=10))),
                             rng.integers(0, 2, size=120))), columns=list('xy'))
data["y"] = data["y"].astype("category")
fig, axs = plt.subplots(2, 1, sharex=True)
sns.boxplot(x="x",
            y="y",
            hue="y",
            data=data,
            ax=axs[0],
            showcaps=False,
            saturation=1,
            legend=False)
(
    so.Plot(
        data=data,
        x="x",
        y="y",
        color="y")
    # Whiskers: most extreme data points within 1.5 * IQR of the quartiles
    .add(
        so.Range(color="black"),
        so.Est(errorbar=lambda x: [
            min([e for e in x if e >= x.quantile(.25) - 1.5*(x.quantile(.75) - x.quantile(.25))]),
            max([e for e in x if e <= x.quantile(.75) + 1.5*(x.quantile(.75) - x.quantile(.25))])]),
        legend=False)
    # Box: thick range between the 25th and 75th percentiles
    .add(
        so.Range(artist_kws={"capstyle": "butt"}, linewidth=40),
        so.Perc([25, 75]),
        legend=False)
    # Median line
    .add(
        so.Dash(width=0.55, color="black"),
        so.Perc([50]),
        legend=False)
    .scale(
        color=sns.color_palette(n_colors=2))
    .on(axs[1])
    .layout(engine="tight")
    .plot()
)
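For reference, the piece still missing from the plot above (the outliers) can be computed outside of seaborn. The following is a minimal pandas sketch of the same 1.5 * IQR rule used for the whiskers; the helper name `iqr_outliers` and the small `demo` frame are hypothetical and only serve as illustration.

```python
import pandas as pd


def iqr_outliers(df: pd.DataFrame, value: str, group: str,
                 whis: float = 1.5) -> pd.DataFrame:
    """Return the rows of `df` whose `value` lies outside its group's whisker range.

    Hypothetical helper for illustration; not part of seaborn.
    """
    grouped = df.groupby(group, observed=True)[value]
    # Per-group quartiles, broadcast back to the shape of the original column.
    q1 = grouped.transform(lambda v: v.quantile(0.25))
    q3 = grouped.transform(lambda v: v.quantile(0.75))
    iqr = q3 - q1
    outside = (df[value] < q1 - whis * iqr) | (df[value] > q3 + whis * iqr)
    return df[outside]


demo = pd.DataFrame({
    "x": [1, 50, 51, 52, 53, 54, 100, 10, 11, 12, 13],
    "y": ["a"] * 7 + ["b"] * 4,
})
outliers = iqr_outliers(demo, value="x", group="y")
```

In principle the returned frame could feed a separate layer through the `data` argument of `Plot.add`, but that is a workaround that precomputes the outliers rather than the transform object proposed here.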