how not to do data analytics…😎

image by google search

Roughly two years ago I did a little exercise where I used the Mandrill package to parse Dynamo Forum statistics and pull out a few interesting snippets of information. Recently Havard asked me if I would be willing to refresh that post and create updated charts to see what’s been going on since then. Here it is:

So I started off by creating the Power-Law Distribution chart. It is basically a bar chart with Dynamo users on the x-axis and the number of likes they received on the y-axis. The shape of the curve that it produces is a fairly typical phenomenon, not just in Open Source communities, as I have stated before, but also in other aspects of our lives, like economics. The most popular example that comes to mind is the so-called Pareto Distribution, which was originally used to explain wealth distribution in a society. It’s often called the “80-20” rule or the “Matthew principle”. Basically it states that the majority of wealth in a society is held by roughly 20% of its people. The same can be applied to the Dynamo Forum, where the majority of posts are read and answered by a small minority of power users.

We can assume that it happens because they are the most sophisticated users of the software, so any question that comes their way is probably something they already know the answer to, or can figure out really quickly. I also noticed that once you earn the label of an “expert”, people will actually toss questions at you directly, just because they have realized that you are the most likely to answer them anyway. This funnels even more opportunities to earn likes in your direction. I guess it is “winner takes all”. Anyway, here’s the chart, and the most recent winners of the Dynamo Forum “likes” contest:

This year’s top 5:

  1. Kulkul
  2. JacobSmall
  3. Vikram_Subbaiah
  4. Dimitar_Venkov
  5. Konrad_K_Sobon
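The 80-20 claim above is easy to check against any table of likes. Here is a minimal Python sketch that computes what share of all likes is held by the top 20% of users. The function name `top_share` and the sample numbers are made up for illustration; they are not the real forum data.

```python
def top_share(likes, top_fraction=0.2):
    """Return the fraction of all likes held by the top `top_fraction` of users."""
    if not likes:
        return 0.0
    ranked = sorted(likes, reverse=True)
    # Always count at least one user as "the top", even for tiny samples.
    top_n = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0


# Hypothetical likes-received counts for ten users (NOT real forum data).
sample_likes = [900, 400, 150, 60, 40, 20, 15, 10, 3, 2]
share = top_share(sample_likes)
print(f"Top 20% of users hold {share:.0%} of all likes")
```

If the forum really follows a power law, the share for the actual data should land somewhere in the neighborhood of 80%, though the exact number depends on how many inactive users are included in the count.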

Another chart that I looked at two years ago was a scatter-plot-style visualization showing the relationship between people’s overall activity on the forum and the number of likes that they receive. I consider likes received a sort of “forum currency”: if you have given an answer that people like, they can either approve it (which can only be done once per post), or they can like it (it’s a gift that keeps on giving if more people think that your answer was good). So with this chart, I was trying to figure out who’s out there reading a lot of posts and leaving comments (potentially “noisy” activity), and how that activity translates into likes received.

In this chart the x-axis is the number of posts, comments etc. that a user created, while the y-axis is the number of likes received. The size of the dot reflects the number of likes that the user has given out to other users. Let’s talk about that last bit, and why I put it in there.

Unlike two years ago, this year I will dig a little deeper. I don’t like taking results at face value, and I am afraid that they do not accurately represent the true behavior of users on the Forum. Let’s have a look.

It’s my understanding that there are ways to game the system. What?! One can “cheat” the forum? Yeah, we might not even be doing this consciously, but there are things that we do that can potentially affect other people’s behavior, and if we play our hand right, we get a better chance at a favorable outcome. In this case, I had a hunch that people who give out a lot of likes are much more likely to receive a lot of them in return. As it turns out, humans are social creatures, and it’s no secret that if you “scratch my back, I will scratch yours”. There was a cool, non-scientific experiment done with Instagram likes here: Lovematically. Rameet Chawla created a little Instagram bot that automatically liked every image that showed up in his feed. The basic conclusion was that, yes, if we like other people’s images, they are likely to follow us and like ours in return. Who knew? Here’s a little write up about it. Anyway, I am pointing this out not to throw shade at anyone, but to highlight a potential “like me back bias” that might be occurring on the forum.

Here are the top 5 people creating comments and posts (aka the most active users):

  1. Kulkul
  2. JacobSmall
  3. Vikram_Subbaiah
  4. Dimitar_Venkov
  5. Yna_Db

…and the top 5 people who give out the most likes to others:

  1. JacobSmall
  2. Vikram_Subbaiah
  3. john_pierson
  4. Nico_Stageman
  5. Kulkul

So let’s say that we adjust the likes received for the number of likes given out. I am not saying that each like given out should cancel a like received, but let’s say that there is roughly a 50/50 chance that if I liked your posts, you are going to like mine back. Let’s set the weight at 0.4. I am being completely unscientific about it, but for the sake of argument let’s do that and see what happens. First, a theoretical example so that we all know what I am doing:

User 1:

  • likes received: 2000
  • likes given: 1000

User 2:

  • likes received: 2500
  • likes given: 3000

So I am creating a theoretical User 2 who gives out lots of likes to others and seems to be clearly outperforming User 1, who doesn’t like to give out so much love. Let’s weight their likes given by 0.4 and see what happens:

User 1: likes_received - (likes_given*0.4) = 2000 - (1000*0.4) = 1600

User 2: likes_received - (likes_given*0.4) = 2500 - (3000*0.4) = 1300
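For anyone who wants to replay the arithmetic above, here is a minimal Python sketch of the same adjustment. The function name `adjusted_likes` is my own placeholder, and the 0.4 weight is the same unscientific guess as in the text.

```python
def adjusted_likes(likes_received, likes_given, weight=0.4):
    """Discount likes received by a fraction of the likes given out."""
    return likes_received - likes_given * weight


# The two theoretical users from the example: (likes received, likes given).
users = {"User 1": (2000, 1000), "User 2": (2500, 3000)}

# Rank users by their adjusted score, highest first.
ranking = sorted(users, key=lambda name: adjusted_likes(*users[name]), reverse=True)
for name in ranking:
    received, given = users[name]
    print(name, round(adjusted_likes(received, given)))
```

User 2 leads on raw likes received, but after the adjustment User 1 comes out on top, which is exactly the reversal the example is meant to show.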

So if we were to redo the first chart with the new formula we would get something like this:

Adjusted Top 5:

  1. Kulkul
  2. Dimitar_Venkov
  3. Vikram_Subbaiah
  4. Nick_Boyts
  5. Konrad_K_Sobon

It’s pretty much the same, with one person dropping out and the rest remaining intact. I don’t think that this is actually a good adjustment yet. I think I would want to take into account the number of posts/comments created by each user as well. It’s no secret that people who comment a lot will inevitably get more opportunities to have their comments/posts liked. It can be seen pretty clearly in this chart, where the people all the way to the right are also the people that get the most likes:

I drew that line across to showcase a 2:1 ratio, which I think is a pretty good ratio to be at. It’s the ratio of comments/posts created to likes received. Basically, if you imagine that someone posted a question on the forum, you don’t want to post an answer that doesn’t really move the needle for anyone. If you think that the question is inappropriate, doesn’t have enough info, and in general is garbage, in my opinion it should be flagged, and moderators should talk to that user about upgrading their question. What I have seen a lot is that people will engage with such a user, and a 10-page-long conversation ensues. That’s all fine if the conversation is actually meaningful, but if it’s someone saying, “I think this was answered before”, and giving no link to the answer and no actual answer, then it’s a waste of time. So a ratio of 2:1 seems appropriate to me, because it gives you a chance to answer a question, and then maybe answer a follow-up question, without incurring a penalty (in my ranking, that is). Basically, you want to stay above the red line, not below. The idea is to ANSWER questions, NOT COMMENT on them.

Let’s weight the results again, and this time include likes given and the number of posts/comments created.

Here’s the formula that I will use:

likes_received = Math.Max(0, likes_received + (likes_received*(2-(posts_created/likes_received))))
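Here is a rough Python sketch of that formula exactly as written, which uses only likes received and posts created. The function name `efficiency_adjusted`, the `baseline` parameter, and the zero-likes guard are my own assumptions; the original formula does not say what happens when a user has no likes at all.

```python
def efficiency_adjusted(likes_received, posts_created, baseline=2.0):
    """Reward or penalize likes based on the posts-per-like ratio vs. a baseline."""
    if likes_received <= 0:
        # Assumption: a user with no likes scores zero (avoids dividing by zero).
        return 0.0
    ratio = posts_created / likes_received  # posts needed to earn one like
    bonus = baseline - ratio                # positive = above the 2:1 line
    return max(0.0, likes_received + likes_received * bonus)


# Hypothetical users, both with 1000 likes but different posting volume.
print(efficiency_adjusted(1000, 1500))  # ratio 1.5, rewarded
print(efficiency_adjusted(1000, 2400))  # ratio 2.4, penalized
```

One thing worth noticing: if you multiply the expression out, it reduces to 3*likes_received - posts_created, floored at zero. So in effect each like is worth three points and each post or comment costs one.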

So basically I am taking posts created and dividing it by likes received. That gives me a ratio. Then I compare it against my 2:1 baseline. So if you are at 1.5 then it’s 2-1.5 = +0.5, and if you are at 2.4 then it’s 2-2.4 = -0.4. If you are in the positive range, that means that you get more than one like for every two posts. That’s good. Remember, stay above the red line. We then use that above/below-baseline number to calculate a reward/penalty for the likes received. If you are in the positive, your likes are multiplied by that number and the result is added to your total. You gained extra likes for being “efficient”. If you are below the threshold, you are penalized with likes subtracted. Sorry. Here’s a new chart:

New Top 5:

  1. Dimitar_Venkov
  2. Thomas_Mahon
  3. awilliams
  4. Konrad_K_Sobon
  5. erfajo

Notable dropouts are of course Kulkul and Vikram_Subbaiah, although they are still in the top 10. The biggest drop from the overall Top 5 that I noticed was JacobSmall (2 => 32). Another notable drop was Yna_Db, who went from 14 => 76. Notable jumps? Thomas_Mahon cracked the Top 5, going from 9 => 2.

All right, shout out time. Here’s a quick glance at my Bad Monkeys friends and their overall activity on the forum:

It looks like not much has changed. Dimitar is still the boss of Dynamo Forum, with Andreas and me at a distant second place. Get a life Dimitar! 😂

This post is really getting long. Let’s wrap it up. Hope you like my line of thinking, and if you don’t, post comments and we can talk about it.

Cheers!

Ps. I am using people’s usernames that are publicly available on the Dynamo Forum. If you don’t like everyone knowing your name, just don’t use it in the username. :-)

Ps2. Special thanks to Zach Kron for making that data available to me even though I am not an admin on the forum. Much appreciated.

Poll:

You guys probably know by now that I quit my job and wanted to do consulting. By consulting I mean mostly writing software, but I used to teach a little back in the day, so I was wondering whether anyone would want to see a workshop/tutorial series that demonstrates these techniques, and how much interest there is. I don’t know if people prefer to hire someone to do their coding, or would rather spend money to train their staff to code themselves. Anyway, I figured I can ask to see what’s up. Let’s assume that I will have some time to teach workshops or make video tutorials. Anyone?

Would you be interested in learning about AEC data analysis?

If yes, do you prefer learning from an in-person workshop or online video tutorial?

If an in-person workshop is preferred, what location works best for you?

If I offered to teach a workshop in New York, would you attend if the cost was:

Thanks for answering these questions. It will help me figure out what you all need, and what I can do to help.

3 Comments

  1. Daniel Woodcock says:

    This is fantastic and a great read! It also shows me I really need to do way more on the forums! Haha!

  2. Kulkul says:

    Thanks Konrad! I was planning to do forum analytics on Dec 15 you where quicker than me. Have a great day.
    Best Regards,
