My learners have been using Plot.ly for a week, and they have asked me a ton of questions about how to do certain things with their data. I wanted to add details to my last post on Plot.ly v. JMP and tell you the decision I made. Every item below is an actual question or issue my learners ran into while using Plot.ly.
Issue 1: How to add % totals to the columns of data in a graph?
One group of learners had a beautiful graph made in Plot.ly. It was nice, communicated well, but had lots of information in it. They wanted to put the % of each column in the graph to make it more informative.
In other words, they had this ... and wanted this. (I will explain the arrow in a second.)
Yes, these are JMP graphs. Why? Because after an hour of looking, I could not find a way to have Plot.ly do it. Their help is silent on this issue, and I looked through a whole bunch of graphs shared on their website and found not a single one to do that.
As for JMP, it took two clicks. I can’t show the menu because it is a drop-down, and it disappeared when I tried to take a screen capture. You click the red triangle I pointed to, hover over “Histogram Options,” and click “Show percents.” If you want to “Show counts,” you can do that too. One or both! Two clicks. This was incredibly simple to do in JMP, incredibly difficult in Plot.ly.
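For readers comfortable with a little scripting, the percent labels themselves are easy to compute outside either tool. Here is a minimal sketch in plain Python. The list of responses is made up for illustration, and the function name `percent_labels` is hypothetical; it is not part of Plot.ly or JMP.

```python
from collections import Counter


def percent_labels(responses):
    """Return {category: (count, percent)} for a list of categorical responses."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}


# Made-up responses, for illustration only (not the learners' data).
responses = ["Y", "N", "Y", "Y", "N", "Y", "N", "Y"]
labels = percent_labels(responses)

# Print one label per column, in the form a bar annotation might use.
for category, (count, pct) in sorted(labels.items()):
    print(f"{category}: {count} ({pct:.1f}%)")
```

The counts and percents could then be typed in as text labels on each column, which is roughly what JMP's “Show percents” and “Show counts” options do automatically.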
Issue 2: Chi-Square test
In my last post, I already dealt with the fact that Plot.ly calls graphs of categorical data “histograms.” This has caused so. much. confusion.
But now my learners are trying to do the statistics for their data and see if there are significant differences in their samples. They are trying to DO statistical inferences. If their data is quantitative, they can do a t-test easily. Well, they can do a two sample t-test easily. They cannot do a one sample t-test or a matched pair t-test. They cannot do a z-test in Plot.ly, and as it turns out, you cannot do a Chi-Square test in Plot.ly unless you already have the summary counts.
Really? I can do the “histogram” to get the counts, but I cannot import those counts into the table to do the Chi-Square? It won’t tally the instances of each word for the test?
For example, if the learners’ data looks like this:
Plot.ly will do a histogram for it and tell me what percent or what counts there are for Gender and AP/Honors.
If I want a Chi-Square test for these two columns, the only way I could make it work was to look at the graph of counts, write down the information into a two-way table, and enter the counts as a matrix in the graphing calculator.
To do the same thing in JMP, we do the following steps:
1. Go to Analyze, Fit Y by X
2. Click on OK. That’s it. The output contains the following:
A mosaic plot, which is nothing more than a stacked bar chart, except that the width of each column is proportional to the total count in that column.
Next, we get the contingency table. If I click the red triangle, I can choose other values to include or exclude from the table.
Finally, the Chi-Square test p-value.
That was around 6 clicks, instead of making the graph, counting from the graph and writing a table, and then inputting the table to the calculator.
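The tally-then-test routine the learners did by hand (count each combination, build the two-way table, compute Chi-Square) can also be sketched in a few lines of plain Python. This is only a sketch under stated assumptions: the (gender, AP/Honors) pairs are made up, the function name `chi_square_from_raw` is hypothetical, it computes the Pearson statistic with no continuity correction, and the closed-form p-value it returns applies only to a 2x2 table (1 degree of freedom). For larger tables it returns the statistic and degrees of freedom and leaves the p-value to a table or calculator.

```python
import math
from collections import Counter


def chi_square_from_raw(pairs):
    """Chi-square test of independence from raw (row_label, col_label) pairs."""
    observed = Counter(pairs)                    # the two-way table of counts
    row_totals = Counter(r for r, _ in pairs)
    col_totals = Counter(c for _, c in pairs)
    n = len(pairs)

    stat = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    # With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2)) if df == 1 else None
    return observed, stat, df, p_value


# Made-up survey rows, for illustration only: (gender, takes AP/Honors).
pairs = ([("F", "Y")] * 10 + [("F", "N")] * 5 +
         [("M", "Y")] * 5 + [("M", "N")] * 10)

observed, stat, df, p_value = chi_square_from_raw(pairs)
print(dict(observed))
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.4f}")
```

The point of the sketch is the first line of the function: counting the word pairs straight from the raw columns is the step that otherwise had to be done by reading counts off a graph.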
Issue 3: separating data by a response
The group working on AP/Honors and work hours in Issue 2 had another problem. Their survey asked for GPA and the number of hours worked. But they needed the mean GPA, and the mean number of hours worked, separately for those in AP/Honors and those not in AP/Honors.
Plot.ly will give us the one-variable statistics for the whole column of hours worked, but it will not split them into two groups (Y/N) based on the type of classes taken. It will not do it.
Enter JMP. 6 clicks. Analyze, Distribution, put the variables where you want them, OK.
That’s it. You get one-variable statistics for those who are in AP/Honors, and separate one-variable statistics for those not in AP/Honors. Doing a two sample t-test is simple and easy once this information is obtained. This is not information Plot.ly can give us.
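For anyone curious what “split by a response” amounts to, here is a minimal sketch in plain Python using only the standard library. The rows, the column names (`ap_honors`, `gpa`, `hours`), and the function name `summarize_by_group` are all assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_group(rows, group_key, value_key):
    """Return n, mean, and sample SD of value_key for each level of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        level: {"n": len(values), "mean": mean(values), "sd": stdev(values)}
        for level, values in groups.items()
    }


# Made-up survey rows, for illustration only.
rows = [
    {"ap_honors": "Y", "gpa": 3.5, "hours": 5},
    {"ap_honors": "Y", "gpa": 3.7, "hours": 10},
    {"ap_honors": "Y", "gpa": 3.9, "hours": 15},
    {"ap_honors": "N", "gpa": 2.8, "hours": 12},
    {"ap_honors": "N", "gpa": 3.0, "hours": 18},
    {"ap_honors": "N", "gpa": 3.2, "hours": 24},
]

gpa_stats = summarize_by_group(rows, "ap_honors", "gpa")
hours_stats = summarize_by_group(rows, "ap_honors", "hours")
for level in ("Y", "N"):
    print(level, gpa_stats[level], hours_stats[level])
```

The n, mean, and standard deviation for each group are exactly the inputs a two sample t-test needs.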
Issue 4: Linear Regression t-test
Last issue, and then I will stop. I have several learners doing quantitative projects that lend themselves to linear regressions and linear regression t-tests.
Plot.ly makes beautiful scatterplots. You can adjust the axis, overlay the regression line, insert the equation into the graph, etc. They are pretty.
But if you want a residual plot? No go. If you want to reinforce the statistics form, y = a + bx? No go.
This is what it looks like in Plot.ly.
You have y=mx + b from algebra, you cannot do residuals, and you CANNOT do a linreg t-test.
In JMP, it looks like this:
5 clicks, Analyze, Fit Y by X, put the variables in the correct spots, and hit OK. Notice this is the exact same dialogue box you use for categorical data. JMP uses the same path for different types of data, but tells you in the bottom left corner HOW it will act on your data.
You get output that looks like this:
If you want the residual plot, hit the red triangle next to “Linear Fit” and show residual plot. That easy.
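To make concrete what the linear regression output contains, here is a minimal sketch in plain Python of the least-squares line y = a + bx, its residuals, and the t statistic for the slope. The data points and the function name `linreg_t_test` are made up for illustration. The sketch stops at the t statistic and its degrees of freedom (n - 2); turning that into a p-value needs a t table or a calculator, which the standard library does not provide.

```python
import math


def linreg_t_test(xs, ys):
    """Least-squares fit y = a + b*x, residuals, and t statistic for H0: slope = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = sxy / sxx                 # slope
    a = y_bar - b * x_bar         # intercept
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # residual standard error
    se_b = s / math.sqrt(sxx)                                # standard error of slope
    t = b / se_b
    return {"a": a, "b": b, "residuals": residuals, "t": t, "df": n - 2}


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

fit = linreg_t_test(xs, ys)
print(f"y = {fit['a']:.2f} + {fit['b']:.2f}x")
print("residuals:", [round(e, 2) for e in fit["residuals"]])
print(f"t = {fit['t']:.2f} with {fit['df']} degrees of freedom")
```

Plotting the residuals against x gives the residual plot, and a pattern in that plot is the usual warning sign that a line is the wrong model.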
Although I fully understand that every single complaint I have had with Plot.ly can be solved by learning the programming language and learning to program the software, I don’t think I can ask high school learners, in the last 4 weeks of class, to learn it so they can do a project on statistics. Honestly, I don’t want to take the time to learn the programming language of Plot.ly so that I can do it for them, either.
Plot.ly makes BEAUTIFUL graphs. It is a powerful platform to show connections between quantitative data sets. But it does a so-so to bad job on statistics.
JMP makes graphs that may not be beautiful, but statistics is primary to the operation of the program, and that makes doing the statistics easy. Unless Plot.ly makes some major changes toward the statistics side instead of the data representation side, I think I will go back to using JMP next year.
It was just too difficult to teach the way Plot.ly handles or mishandles the stats.