Feature request: stat transform to filter for outliers #3855

Open
@stewmehr

The discussion in an older issue (#3148) mentioned the possibility of introducing a transform object to filter for outliers. I have not yet fully grasped the inner workings of the existing transform objects, but I would like to start out very simple, e.g. with a Filter() object that takes pairs of absolute or quantile values as input. Ideally, the filter could then be toggled to either include or exclude the data that falls within those bounds.

Does this fit into the current seaborn landscape at all? Maybe it is possible to extend the functionality of Perc or Stat?
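
To make the idea concrete, here is a rough sketch of the filtering logic in plain pandas. It is not a seaborn transform and uses no seaborn API: the name `filter_values` and the parameters `bounds`, `quantile`, and `exclude` are hypothetical, chosen only to illustrate the "pairs of absolute or quantile values" and "include or exclude" behaviour described above.

```python
import pandas as pd


def filter_values(values, bounds, quantile=False, exclude=False):
    """Hypothetical helper (not a seaborn API): keep or drop values in ranges.

    bounds   -- list of (low, high) pairs, inclusive on both ends
    quantile -- if True, interpret each pair as quantiles of `values`
    exclude  -- if True, drop the matching values instead of keeping them
    """
    values = pd.Series(values)
    mask = pd.Series(False, index=values.index)
    for low, high in bounds:
        if quantile:
            # Translate the quantile pair into absolute bounds
            low, high = values.quantile(low), values.quantile(high)
        mask |= values.between(low, high)
    return values[~mask] if exclude else values[mask]


x = pd.Series([1, 50, 55, 60, 65, 70, 200])
inside = filter_values(x, [(10, 100)])                 # keep values in [10, 100]
outside = filter_values(x, [(10, 100)], exclude=True)  # keep everything else
middle = filter_values(x, [(0.25, 0.75)], quantile=True)
```

Whether this logic would live in a new transform or in an extension of an existing one is exactly the open question; the sketch only pins down the intended semantics.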

Background: In my quest to go full so.Plot, I am currently trying to recreate sns.boxplot() with seaborn objects. My current attempt has a few very rough edges (feedback always welcome) but does what I want, apart from plotting the outliers:

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns
import seaborn.objects as so

rng = np.random.default_rng(seed=2)
data = pd.DataFrame(list(zip(np.concatenate((rng.integers(5, 20, size=10),
                                             rng.integers(50, 100, size=100),
                                             rng.integers(130, 145, size=10))),
                             rng.integers(0, 2, size=120))), columns=list('xy'))

data["y"] = data["y"].astype("category")

fig, axs = plt.subplots(2,1,sharex=True)

sns.boxplot(x="x",
            y="y",
            hue="y",
            data=data,
            ax=axs[0],
            showcaps=False,
            saturation=1,
            legend=False)

(
    so.Plot(
        data=data,
        x="x",
        y="y",
        color="y")
    .add(
        so.Range(color="black"),
        so.Est(errorbar=lambda x: [
            min([e for e in x if e >= x.quantile(.25) - 1.5*(x.quantile(.75) - x.quantile(.25))]),
            max([e for e in x if e <= x.quantile(.75) + 1.5*(x.quantile(.75) - x.quantile(.25))])]),
        legend=False)
    .add(
        so.Range(artist_kws={"capstyle": "butt"},linewidth=40),
        so.Perc([25,75]),
        legend=False)
    .add(
        so.Dash(width=0.55, color="black"),
        so.Perc([50]),
        legend=False)
    .scale(
        color=sns.color_palette(n_colors=2))
    .on(axs[1])
    .layout(engine="tight")
    .plot()
)
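
For reference, the missing piece is the set of points outside the whiskers. A minimal sketch of how those rows could be selected per group with pandas, using the same 1.5 × IQR rule as the whisker lambda above, might look like this. The helper `iqr_outliers`, its `whis` parameter, and the small `demo` frame are hypothetical and only serve as illustration.

```python
import pandas as pd


def iqr_outliers(df, value, group, whis=1.5):
    """Hypothetical helper: rows whose `value` lies outside the whisker
    range (quartiles -/+ whis * IQR) of their own `group`."""
    grouped = df.groupby(group, observed=True)[value]
    # Per-row quartiles of the group each row belongs to
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    is_outlier = (df[value] < q1 - whis * iqr) | (df[value] > q3 + whis * iqr)
    return df[is_outlier]


demo = pd.DataFrame({
    "x": [1, 50, 52, 54, 56, 58, 120, 10, 11, 12, 13, 14],
    "y": ["a"] * 7 + ["b"] * 5,
})
outliers = iqr_outliers(demo, "x", "y")
```

A frame like this could presumably be passed to an extra layer through the `data` parameter of `Plot.add`, but that requires pre-computing the outliers outside the plot specification, which is what a filter transform would avoid.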

[Image: sns.boxplot() output (top) and the so.Plot recreation (bottom)]
