"xbins" on Histogram not working Correctly #1978

Closed
nyan314sn opened this issue Sep 1, 2017 · 1 comment
Labels
bug something broken

Comments

@nyan314sn

Consider these three data series, each of which I want to plot as a histogram.
"2017M06" : 0.041601, 0.041601, 0.041601, 0.041601
"2017M07" : 0.032993, 0.037393, -0.00078 , 0.0289
"2017M08" : 0.035036, 0.00589 , 0.021899, 0.047374

Please see the following Python code.

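# Assumption: iplot comes from plotly.offline (the import is not shown in the report).
from plotly.offline import iplot

step_in_between = 0.01  # bin size
starting_value = -0.2   # requested first bin edge
ending_value = 0.1      # requested last bin edge
iplot(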

{'data': [
            
{'histfunc': 'count',
   'histnorm': 'probability',
   'marker': {'color': 'rgba(255, 153, 51, 1.0)',
    'line': {'color': '#4D5663', 'width': 3}},
   'name': '2017M06',

   'opacity': 0.8,
   'orientation': 'v',
   'type': 'histogram',
   'x': [0.041601,  0.041601,  0.041601,  0.041601],
    'xbins': {'end':ending_value, 'size':step_in_between, 'start': starting_value}
    },
  {'histfunc': 'count',
   'histnorm': 'probability',
   'marker': {'color': 'rgba(55, 128, 191, 1.0)',
    'line': {'color': '#4D5663', 'width': 3}},
   'name': '2017M07',
   'opacity': 0.8,
   'orientation': 'v',
   'type': 'histogram',
   'x': [0.032993,  0.037393, -0.00078 ,  0.0289  ],

    'xbins': {'end':ending_value, 'size':step_in_between, 'start': starting_value}

  },
  {'histfunc': 'count',
   'histnorm': 'probability',
   'marker': {'color': 'rgba(50, 171, 96, 1.0)',
    'line': {'color': '#4D5663', 'width': 3}},
   'name': '2017M08',
   'opacity': 0.8,
   'orientation': 'v',
   'type': 'histogram',
   'x': [0.035036,  0.00589 ,  0.021899,  0.047374],
    'xbins': {'end':ending_value, 'size':step_in_between, 'start': starting_value}
  }
         
         ],
 'layout': {'barmode': 'overlay',
  }}
)

This is the result.
[image: newplot]

For whatever reason, the starting value "-0.2" is not respected, while the ending value "0.1" is. That is problem 1.

Another interesting issue appears when I comment out the "xbins" parameter.

step_in_between=0.01
starting_value=-0.2
ending_value=0.1
iplot(

{'data': [
            
{'histfunc': 'count',
   'histnorm': 'probability',
   'marker': {'color': 'rgba(255, 153, 51, 1.0)',
    'line': {'color': '#4D5663', 'width': 3}},
   'name': '2017M06',

   'opacity': 0.8,
   'orientation': 'v',
   'type': 'histogram',
   'x': [0.041601,  0.041601,  0.041601,  0.041601],
    #'xbins': {'end':ending_value, 'size':step_in_between, 'start': starting_value}
    },
  {'histfunc': 'count',
   'histnorm': 'probability',
   'marker': {'color': 'rgba(55, 128, 191, 1.0)',
    'line': {'color': '#4D5663', 'width': 3}},
   'name': '2017M07',
   'opacity': 0.8,
   'orientation': 'v',
   'type': 'histogram',
   'x': [0.032993,  0.037393, -0.00078 ,  0.0289  ],

    #'xbins': {'end':ending_value, 'size':step_in_between, 'start': starting_value}

  },
  {'histfunc': 'count',
   'histnorm': 'probability',
   'marker': {'color': 'rgba(50, 171, 96, 1.0)',
    'line': {'color': '#4D5663', 'width': 3}},
   'name': '2017M08',
   'opacity': 0.8,
   'orientation': 'v',
   'type': 'histogram',
   'x': [0.035036,  0.00589 ,  0.021899,  0.047374],
   # 'xbins': {'end':ending_value, 'size':step_in_between, 'start': starting_value}
  }
         
         ],
 'layout': {'barmode': 'overlay',
  }}
)

This is the result.
[image: newplot 1]

This is clearly very misleading. The default binning breaks down when there is no variation in the data. In this particular example, the bins for "2017M06" are literally 20 times wider than those of the other two series. The plot can easily be misread as saying that any value between 0 and 1 is equally likely for 2017M06. This is problem 2.

I attempted to fix it using the "nbinsx" parameter. That does not work either.

@alexcjohnson
Collaborator

Thanks for the detailed report @flyingBurman

Problem 1 does look like a bug - it seems to also be related to the uniform data in 2017M06, and I can confirm it in pure javascript. Note that what I would consider the "correct" behavior here is for all zeros to be trimmed from the edges, so the automatic x axis range would be [-0.01, 0.05] in order to give the largest possible view of the data you provided. If you want the x axis range to be the full allowed bin range instead, just specify it explicitly:

'xaxis': {'range': [starting_value, ending_value]}
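
To make the two ranges concrete, here is a rough sketch using the three series from the report. The helper `trimmed_bin_range` is hypothetical: it only mimics the idea of trimming empty bins from the edges and is not how plotly.js computes its autorange. For this data it gives the [-0.01, 0.05] range mentioned above, and the `layout` dict shows the explicit-range alternative.

```python
import math

# Data and bin settings copied from the report above.
series = {
    "2017M06": [0.041601, 0.041601, 0.041601, 0.041601],
    "2017M07": [0.032993, 0.037393, -0.00078, 0.0289],
    "2017M08": [0.035036, 0.00589, 0.021899, 0.047374],
}
starting_value, ending_value, step_in_between = -0.2, 0.1, 0.01


def trimmed_bin_range(values, start, size):
    """Hypothetical helper: outer edges of the bins that contain data."""
    indices = [math.floor((v - start) / size) for v in values]
    left = start + min(indices) * size
    right = start + (max(indices) + 1) * size
    # Round away floating-point noise from the edge arithmetic.
    return [round(left, 10), round(right, 10)]


all_values = [v for values in series.values() for v in values]
auto_range = trimmed_bin_range(all_values, starting_value, step_in_between)
print(auto_range)  # edges of the occupied bins only

# Alternative from the comment: pin the axis to the full allowed bin range.
layout = {
    "barmode": "overlay",
    "xaxis": {"range": [starting_value, ending_value]},
}
```

Passing this `layout` in place of the one in the original `iplot` call should show the full -0.2 to 0.1 span regardless of which bins are occupied.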

Problem 2 is a bit trickier. For other modes we recently fixed this issue (#1944) but when you use barmode: 'overlay' we don't want the separate traces to interact with each other, so for each trace we decide on automatic bin width & offset purely based on the data in that trace. This is based on the use case of very different distribution widths, where forcing bin widths to match could result in unduly small or sparse bars in the wider distribution, or if we went with the wider bins, a loss of detail in the narrower distribution.
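
The trade-off described above can be seen with a small made-up example. The data and the `occupied_bins` helper are purely illustrative assumptions and do not reflect plotly.js internals; the point is only what happens when one trace's bin size is forced onto a trace with a very different spread.

```python
import math
from collections import Counter


def occupied_bins(values, size, start=0.0):
    """Count values per bin index for a fixed bin size (illustrative only)."""
    return Counter(math.floor((v - start) / size) for v in values)


# Made-up data: two traces whose spreads differ by a factor of 1000.
narrow = [0.005, 0.015, 0.025, 0.035, 0.045]
wide = [5, 15, 25, 35, 45]

# Each trace binned with a size suited to its own spread: five bars each.
own_narrow = occupied_bins(narrow, size=0.01)
own_wide = occupied_bins(wide, size=10)
print(len(own_narrow), len(own_wide))

# Forcing the wide bin size onto the narrow trace collapses it into one bar,
# so all detail in the narrow distribution is lost.
collapsed = occupied_bins(narrow, size=10)
print(len(collapsed))

# Forcing the narrow bin size onto the wide trace leaves five very thin bars
# separated by a large number of empty bins.
sparse = occupied_bins(wide, size=0.01)
empty_bins = max(sparse) - min(sparse) + 1 - len(sparse)
print(len(sparse), empty_bins)
```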

So with that in mind, what do we do when we see a distribution with no width at all? There's nothing to base our bin width estimate on (which is also why nbinsx can't help you), so we just fall back on a bin width of 1. I suppose when there are multiple traces we could have the fallback be the smallest value chosen by any of the other overlaying histograms... it's a bit complicated for what's presumably a fairly rare case, but I suspect the machinery to do it is already present from #1944 so I'm happy to give it a shot.
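
A minimal sketch of the fallback idea floated above, under loud assumptions: `estimate_bin_size` is a stand-in estimator invented for this example (it is not plotly.js's auto-binning algorithm), and `bin_sizes_with_fallback` is a hypothetical name. A zero-width trace borrows the smallest bin size chosen for any other overlaid trace, and only falls back to 1 when no other trace can supply one.

```python
import math


def estimate_bin_size(values):
    """Stand-in estimator (NOT plotly.js's algorithm).

    Returns None when the data has no spread, i.e. nothing to base a width on.
    """
    span = max(values) - min(values)
    if span == 0:
        return None
    return span / math.ceil(math.sqrt(len(values)))


def bin_sizes_with_fallback(traces, default=1.0):
    """Give zero-width traces the smallest size chosen by the other traces."""
    estimates = {name: estimate_bin_size(v) for name, v in traces.items()}
    known = [size for size in estimates.values() if size is not None]
    fallback = min(known) if known else default
    return {
        name: size if size is not None else fallback
        for name, size in estimates.items()
    }


# Data copied from the report above.
traces = {
    "2017M06": [0.041601, 0.041601, 0.041601, 0.041601],
    "2017M07": [0.032993, 0.037393, -0.00078, 0.0289],
    "2017M08": [0.035036, 0.00589, 0.021899, 0.047374],
}
sizes = bin_sizes_with_fallback(traces)
print(sizes)

# With no other trace to borrow from, the default of 1 still applies.
alone = bin_sizes_with_fallback({"2017M06": traces["2017M06"]})
print(alone)
```

With this approach the uniform 2017M06 trace would be drawn with a bar as narrow as the narrowest of its neighbours instead of a bar one unit wide.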
