Proposal draft: Enhance Amy workshop management tool #36

Merged: 3 commits, Mar 28, 2015

Conversation

pbanaszkiewicz
Collaborator

Ref #6

@rgaiacs
Contributor

rgaiacs commented Mar 24, 2015

@pbanaszkiewicz looks good to me.


## Abstract

The number of Software Carpentry's workshops run weekly dynamically grows. All
Member

Maybe “is growing rapidly” instead of “dynamically grows”? I know Greg publishes charts of students-taught in the blog. I don't know if he publishes charts of workshops-held, but linking to either of those from here would be nice.

Collaborator Author

I was unable to find the numbers, so I asked @gvwilson about them.

Member

On Tue, Mar 24, 2015 at 11:38:20AM -0700, Piotr Banaszkiewicz wrote:

> +The number of Software Carpentry's workshops run weekly
> dynamically grows. All

> I was unable to find the numbers, so I asked @gvwilson about them.

So I'm not sure about workshops, but I think this is the last graph
showing the increase in student numbers [1].

Collaborator Author

Oh nice, thank you.

I've just got data from Greg, and I'm producing an updated chart. I'm not sure where to publish it, though.

Collaborator Author

[Image: index]

Nice :-)

Member

On Tue, Mar 24, 2015 at 01:32:00PM -0700, Piotr Banaszkiewicz wrote:

> [Image: index]

You need to sort your data to avoid the downward jiggle around April
2014.

Collaborator Author

I'm not able to get rid of the jiggling entirely. This may be because my dataset has a non-unique index (the date). Very slight jiggling is visible even after sorting.

[Image: cum_enrolment_workshops]

[Image: cum_workshops]

[Image: cum_instructors]
(To fix the horrible jiggling in this one, I removed entries with duplicated dates, leaving the latest entry with the biggest number of instructors.)

Anyway, I'll include the two nicest-looking plots I could get in this proposal and later change them to links if they get published. Otherwise I'll link to the plot you pointed out earlier (#36 (comment)).

Member

On Tue, Mar 24, 2015 at 02:21:07PM -0700, Piotr Banaszkiewicz wrote:

> I'm not able to get rid of the jiggling entirely. This may be because
> my dataset has a non-unique index (the date).

Right, you want to sort by increasing count (which will automatically
sort by increasing date). It looks like you currently have multiple
entries for one date and the higher-count entry is currently landing
before the lower-count entry.

> To fix the horrible jiggling in this one, I removed entries with
> duplicated dates, leaving the latest entry with the biggest number of
> instructors

That works. I'm not sure how you're creating the plots, but most
plotting packages have something that lets you draw stepped lines.
You should use something like gnuplot's 'steps' [1]. With that
plotting style, it won't matter whether you de-dup the dates (as long
as you've sorted by increasing count).

Collaborator Author

I'm using Pandas with Seaborn for nicer colors.

Pandas is very straightforward when it comes to reading and plotting CSV date-indexed files:

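```python
import pandas
# Column 0 holds the dates; parse_dates=True turns it into a date index.
df = pandas.read_csv("enrolment_workshops_data.csv", index_col=0, parse_dates=True)
df.plot()
```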

However, if I understand correctly, Pandas has support for multi-index (multi-indices?) data frames. It would take me 5 minutes to remove duplicate data points from the CSV files, but way longer to figure out how to do that in Pandas.
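
For reference, the de-duplication discussed in this thread does not need a multi-index. Below is a minimal sketch, assuming the data is a cumulative count indexed by date; the sample values and the `instructors` column name are made up for illustration and are not taken from the real CSV files. Because the counts are cumulative, keeping the largest value per date leaves one monotonically increasing row per date.

```python
import io

import pandas

# Hypothetical sample data: two rows share the date 2014-04-01.
SAMPLE_CSV = """date,instructors
2014-03-01,10
2014-04-01,14
2014-04-01,12
2014-05-01,15
"""


def load_deduplicated(source):
    """Read a date-indexed CSV and keep the largest count for each date."""
    df = pandas.read_csv(source, index_col=0, parse_dates=True)
    # Group rows by the index (level 0) and keep the maximum per date.
    # groupby sorts the group keys, so the result is also ordered by date.
    return df.groupby(level=0).max()


df = load_deduplicated(io.StringIO(SAMPLE_CSV))
print(df)

# With matplotlib installed, a stepped line can then be drawn with:
# df.plot(drawstyle="steps-post")
```

The `drawstyle` keyword is passed through to matplotlib, which is one way to get the stepped lines suggested above.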

Member

On Tue, Mar 24, 2015 at 02:55:38PM -0700, Piotr Banaszkiewicz wrote:

> However, if I understand correctly, Pandas has support for
> multi-index (multi-indices?) data frames. It would take me 5 minutes
> to remove duplicate data points from the CSV files, but way longer
> to figure out how to do that in Pandas.

Or you could load the CSV with the Python stdlib, sort and dedup, and
then pass that to Pandas ;). But still, it may not be worth the
trouble.
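
The stdlib route suggested above could look roughly like the following. This is only a sketch under assumptions: the `date` and `count` column names and the sample rows are hypothetical, and dates are assumed to be in ISO format so that sorting them as strings matches chronological order.

```python
import csv
import io

# Hypothetical sample data, deliberately out of order and with a duplicate date.
SAMPLE_CSV = """date,count
2014-04-01,14
2014-03-01,10
2014-04-01,12
2014-05-01,15
"""


def sorted_dedup(lines):
    """Return (date, count) pairs sorted by date, one pair per date."""
    rows = [(row["date"], int(row["count"])) for row in csv.DictReader(lines)]
    # Tuples sort by date first, then by count, so for a repeated date
    # the highest count comes last.
    rows.sort()
    latest = {}
    for date, count in rows:
        # A later (higher) count for the same date overwrites the earlier one.
        latest[date] = count
    return list(latest.items())


rows = sorted_dedup(io.StringIO(SAMPLE_CSV))
print(rows)
```

The resulting list of pairs can then be handed to Pandas, for example through the `pandas.DataFrame` constructor with a `columns` argument.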

@wking
Member

wking commented Mar 24, 2015 via email

@pbanaszkiewicz
Collaborator Author

Thanks @wking for your comments and suggestions :) Now I only need to find some source to back up my "SwC is growing rapidly" claim.

@pbanaszkiewicz
Collaborator Author

I added some plots to the proposal - you can view the rendered proposal with the plots here.

@wking
Member

wking commented Mar 24, 2015

On Tue, Mar 24, 2015 at 02:26:16PM -0700, Piotr Banaszkiewicz wrote:

> https://github.com/pbanaszkiewicz/gsoc/blob/pbanaszkiewicz-proposal2/2015/proposals/banaszkiewicz-piotr-amy.md

Probably use a right-side y axis for workshops if you're plotting both
enrollment and workshop counts in the same figure. Otherwise the
workshop line is squashed way down at the bottom ;). And it looks
like this is currently showing “enrollment/workshops” and then
“workshops”. Maybe you want “enrollment/workshops” and then
“instructors”?
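
A right-side y axis of the kind suggested here can be sketched with the `secondary_y` option of Pandas plotting. The numbers and the `enrolment` and `workshops` column names below are invented for illustration; they are not the real Software Carpentry data. The example assumes matplotlib is installed and selects the off-screen `Agg` backend so it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no window is opened

import pandas

# Hypothetical cumulative totals on very different scales.
df = pandas.DataFrame(
    {
        "enrolment": [400, 1500, 4200, 9000],
        "workshops": [10, 40, 110, 230],
    },
    index=pandas.to_datetime(
        ["2012-01-01", "2013-01-01", "2014-01-01", "2015-01-01"]
    ),
)

# Plot "workshops" against a second y axis on the right-hand side,
# so its line is not squashed by the much larger enrolment values.
ax = df.plot(secondary_y=["workshops"])
ax.set_ylabel("cumulative enrolment")
ax.right_ax.set_ylabel("cumulative workshops")
```

Compared with a log scale, this keeps both axes linear, at the cost of the reader having to check which line belongs to which axis.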

@pbanaszkiewicz
Collaborator Author

Hey @wking,

> Probably use a right-side y axis for workshops if you're plotting both enrollment and workshop counts in the same figure. Otherwise the workshop line is squashed way down at the bottom ;)

Indeed, the workshops line is squashed. I was thinking about a log scale, but then the plot would not be intuitive.

> And it looks like this is currently showing “enrollment/workshops” and then “workshops”.

Yes, and that was intended. I wanted to first show how few workshops we run compared to the number of people reached.

Then I wanted to show how the workshops line really looks.

@wking
Member

wking commented Mar 24, 2015

On Tue, Mar 24, 2015 at 02:59:42PM -0700, Piotr Banaszkiewicz wrote:

> > And it looks like this is currently showing “enrollment/workshops” and then “workshops”.
>
> Yes, and that was intended. I wanted to first show how few workshops
> we run compared to the number of people reached.
>
> Then I wanted to show how the workshops line really looks.

I'd spell that out in the surrounding text then.

@rgaiacs
Contributor

rgaiacs commented Mar 26, 2015

@pbanaszkiewicz Thanks for your proposal.

You need to submit your proposal to https://www.google-melange.com/gsoc/homepage/google/gsoc2015 as soon as possible. The deadline is March 27th 19:00 UTC.

@gvwilson
Contributor

Ship it!

@rgaiacs rgaiacs added the 2015 label Mar 28, 2015
@rgaiacs
Contributor

rgaiacs commented Mar 28, 2015

I'm merging this issue since the student application period is over.

rgaiacs pushed a commit that referenced this pull request Mar 28, 2015
Proposal draft: Enhance Amy workshop management tool
@rgaiacs rgaiacs merged commit b41a5ae into numfocus:master Mar 28, 2015
4 participants