Tag Archives: Mastodon

On Scraping Mastodon

Mastodon was scraped, again. It was not the first time, and it probably won’t be the last. This time it was for research, not just for archiving, which we had encountered in the past. The actual scraping happened in 2018, but the research was only recently published, which is why we’re talking about it now.

Background:

The research article, “Mastodon Content Warnings: Inappropriate Contents in a Microblogging Platform”, was written by authors from the Computer Science Department, University of Milan. The same group has previously published another research article related to Mastodon, “The Footprints of a “Mastodon”: How a Decentralized Architecture Influences Online Social Relationships”. That previous paper also showed a lot of misunderstandings of both the technology and the culture of Mastodon.

While it is tempting to do a complete analysis of the research, in this post I will point out a few issues with it, both from a technical perspective and an ethical one. In doing so I will reference and quote a few sections. However, it will not be a full analysis of all of the paper.

They wrote that they hashed the usernames, but included the URI of the posts in their database, which has the username in it.
Screenshot from Mastodon

The research papers both came with datasets: the first one focused on metadata, and the dataset for this latest one could be matched with the previous one, even though it was “anonymized”. However, it was brought to my attention that their anonymization was pointless, because the username was still in the URI.
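To illustrate why hashing the username alone does not anonymize a record, here is a minimal sketch. It is not the researchers' actual pipeline: the function names, the account `alice`, the server `mastodon.example`, the choice of SHA-256, and the URI shape `/users/<name>/statuses/<id>` are all assumptions made for the example.

```python
import hashlib
from urllib.parse import urlparse


def pseudonymize(record):
    """Hash the username field but leave the post URI untouched (the flaw)."""
    hashed = hashlib.sha256(record["username"].encode("utf-8")).hexdigest()
    return {"username": hashed, "uri": record["uri"]}


def username_from_uri(uri):
    """Read the account name back out of a URI shaped like /users/<name>/statuses/<id>."""
    parts = urlparse(uri).path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "users":
        return parts[1]
    return None


# Hypothetical record; the account and server are placeholders.
record = {
    "username": "alice",
    "uri": "https://mastodon.example/users/alice/statuses/1",
}

released = pseudonymize(record)
recovered = username_from_uri(released["uri"])
print(recovered)
```

The hashed field no longer shows the name, but anyone holding the released record can read it straight from the URI, so the hash protects nothing.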

The 2nd dataset, for the latest research paper, has been removed from online access with the comment:

“Deaccessioned Reason: Legal issue or Data Usage Agreement Many entries in the datasets do not fulfill the law about personal data release since they allow identification of personal information.”

Does this mean that they did not take any of these things into account when they wrote the paper to begin with? If we look at their ethical and legal considerations we can see that they half-considered it, and I would argue missed the mark. The way most people were talking about it, it did not actually seem like they had even made any ethical or legal considerations in their research. Reading them, I realized that they probably would’ve been better off if they had written the legal consideration first, and then had that inform the ethical consideration.

Legal and Ethical Considerations

In the legal consideration, they said that from what they had gathered they had not found anything in the ToS (Terms of Service) of the standard agreement, bundled in with a Mastodon installation, indicating that they were breaking it by doing this gathering of data. I would like to argue that there may be ethical considerations about not technically breaking any legal barriers. What do I mean when I say this? I’m trying to convey that the legal considerations could have also had ethical concerns. As the saying goes: just because you can do something doesn’t mean you should.

In the legal section they also write:

 “In the terms of service and privacy policy the gathering and the usage of public available data is never explicitly mentioned, consequently our data collection seems to be complaint with the policy of the instance.” 

I can understand that if a legal document does not explicitly mention something you may feel like you have free rein. However, stating that nothing is explicitly mentioned may indicate that there’s something implicit that they chose to ignore, and they do not elaborate. If they had followed the legal considerations up with the ethical considerations, maybe they could have discussed the ethical implications of the decision they made there.

Further, they do recognize that each instance has the ability to adopt its own Terms of Service (ToS), but then seem not to have followed through and actually checked whether any of these 300-something servers had added their own ToS. I feel like there’s a clear disregard for the possibility of there being other ToS. There is no indication that they checked even a certain percentage (say 10%) of the listed servers and their ToS, which could have shown whether a clear “majority” used the standard ToS, and which would have let them recognize what differences do exist. I feel like there was simply an assumption rather than actual research done for this part.
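As a rough idea of how small such a check could be, here is a sketch of drawing a 10% sample of servers to read through by hand and then tallying the result. The domain names are placeholders, the function names are made up for this example, and the reading of each ToS would still have to be done manually.

```python
import random


def sample_servers(domains, fraction=0.10, seed=0):
    """Pick a reproducible random sample of server domains to review by hand."""
    rng = random.Random(seed)
    size = max(1, round(len(domains) * fraction))
    return rng.sample(domains, size)


def share_with_default_tos(review):
    """`review` maps a domain to True if that server uses the bundled ToS unchanged."""
    if not review:
        return 0.0
    return sum(1 for uses_default in review.values() if uses_default) / len(review)


# Placeholder list standing in for the roughly 300 servers in the study.
domains = [f"server-{n}.example" for n in range(300)]
to_review = sample_servers(domains)
print(len(to_review), "servers to read through")
```

Once each sampled ToS has been read, passing the recorded answers to `share_with_default_tos` gives the fraction that actually used the standard text, which is the number the paper would have needed to back up its assumption.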

Did they make any ethical considerations? It seems to mostly reflect the collection methodology, rather than answering any ethical questions, such as:

  • Would the users of Mastodon want to / expect to have their data scraped?
  • Would it be better to ask servers/users if they would want to participate in the research? 
  • Is this actually Computer Science research, or should it be a Social Studies research paper, taking into consideration such ETHICAL questions?
  • Should Computer Science have mandatory ethics courses?

Credit where credit is due: The last question is lifted from several people on the fediverse who’ve asked it before this research paper was published, and continued to ask after it was published.

I think the biggest issue here is that these researchers do not seem to understand some of the culture on Mastodon (no, there’s not only one culture, but there are some which come to mind for me) and have some basic misconceptions about the community and software, which made it hard for them to come to any useful ethical considerations. Would they have allowed themselves to come to the conclusion that they should not publish their paper? Probably not.

Technically the Content Warning

While there are two research papers available to me, I only want to focus on the misconceptions in this research paper: “Mastodon Content Warnings: Inappropriate Contents in a Microblogging Platform”. I believe that their entire conclusion is way off because they simply misinterpreted how a feature is used on the servers.

In their methodology they described how they interpreted the technical “sensitive” field in the metadata:

“each toot provides the fields related to the inappropriate-ness of its content, namely the entries “sensitive”, “content”,“spoiler-text” and “language”. The boolean field “’sensitive” indicates whether or not the author of the toot thinks that the content is appropriate. If the toot is inappropriate, the field is set up to “True” and the field “spoiler-text” would contain a brief and publicly available description of the content.” (Sic)

Correction: The sensitive tag is set when someone adds a Content Warning to their post. The sensitive tag says nothing about the actual content, or what the person thought about it when they did so (I’ll elaborate on what Content Warnings mean culturally on Mastodon further down).

However, they had interpreted the technical function of content warnings correctly, in these first two sentences:

“By clicking on the “CW” button, a user can enter a short summary of what the ”body” of her post contains, namely a spoiler-text, and the full content of her toot. Automatically, the system marks this toot as “sensitive” and only shows the spoiler-text in all the timelines. (…)”
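The technical behaviour described in that quote can be modelled in a few lines. This is a simplified sketch and not Mastodon’s own code: `compose_toot` and `timeline_preview` are names made up for the example, while `content`, `spoiler_text`, `sensitive` and `language` mirror the fields mentioned in the paper (the paper writes “spoiler-text”).

```python
def compose_toot(content, cw=None, language="en"):
    """Build a toot; adding a CW is the only thing that sets `sensitive` here."""
    has_cw = bool(cw)
    return {
        "content": content,
        "spoiler_text": cw or "",
        "sensitive": has_cw,
        "language": language,
    }


def timeline_preview(toot):
    """Timelines show only the CW text until the reader chooses to expand the toot."""
    return toot["spoiler_text"] if toot["sensitive"] else toot["content"]


plain = compose_toot("Good morning, fediverse!")
spoiler = compose_toot("I did not see that ending coming.", cw="TV series spoilers")

print(timeline_preview(plain))
print(timeline_preview(spoiler))
```

Note what the flag records in this model: that a CW was added, nothing more. A spoiler about a TV series ends up with `sensitive` set to true, even though nobody would call it inappropriate.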

The next part was unfortunately where one of the misinterpretations of the data happened:

“(…) We exploit this latter feature to build our released dataset. This way the toots are labelled by the users, and we assume that they are aware of the policy of the instance and aware of what is appropriate or not for their community.”

This section emphasizes that they believe that the Content Warning is only used to mark content as sensitive if it’s inappropriate, and if it does not belong on the server. Correction: If the content does not belong on the server, the user is most likely going to be banned.

This point was a reiteration of the previous statement in the methodology:

“Here we describe the collection methodology of the two main elements of our dataset: i) the instance meta-data and ii) the local timelines of all the instances which allow toots written in English.

Specifically, we are interested in the full description of each instance and the list of allowed topics. From our viewpoint, these two fields contain the information related to the context which makes a post inappropriate or not.”

The misinterpretations seem to stem from assumptions, rather than research, about how the technology is used, what the “sensitive” tag actually means, and how it’s used across the more than 300 servers they studied. This leads me to the cultural and social misinterpretation.

The Social Construct of the Content Warning

I believe that the biggest issue is that this research was in computer science, without any social science involved, with no consideration to the social part of social media. I’ve already noted that their assumption and interpretation is incorrect, so how are the Content Warnings used?

While I only have the empirical evidence from the servers I’m connected with, I’m still going to go out and say that Content Warnings are in fact not used for content we do not believe belongs in our communities.

Rather, Content Warnings can be used in many ways. One way to describe it is simply as a subject line, similar to email. In some cases we will talk about more sensitive subjects, like addictions, drugs, war, news, politics. This is not to hide the content, but rather to offer the people reading it a chance to decide if they want to open it or not. If today is a day where reading about US Politics would just drain all my energy, I can choose to not open it. 

We can also use it for other things that may be slightly sensitive to some, like food, meat, sex, nudity, private matters, or venting (of emotions). It’s also common to use it for posts about money, house-hunting, mental and physical health, very positive emotions and very negative emotions. In some cases it offers us a chance to unburden ourselves, without dumping those emotions onto someone who is not given a fair chance to prepare themselves for it.

There are other fantastic uses for Content Warnings. One which is especially dear to the community’s heart is as a setup for a joke. Sometimes the same CW will circulate in a meme-like fashion, and contain things that make us giggle. Another common one is as spoiler warnings for movies or TV series, or even books or other reading. You can then use the headline to tell everyone which TV series you’re about to talk about, and also denote which episode. This was great during the last year of Game of Thrones, for example, when a lot of people would be talking about it on the day of the new episode.

So, to emphasize, we do not post Content Warnings because we believe the subject is inappropriate; we just want to offer the reader of the post the chance to give informed consent. And informed consent is something which I believe the authors of the research could take a lesson from.


This article was supported by my patrons. If you enjoyed it and would like me to be able to write more of them, feel free to head over to my patreon page and pledge your support! 
Alternatively, check out my support page for more info.

On Mastodon and Nazis


For the past 2 years Mastodon has been promoted as a place without Nazis. Anyone familiar with social media technology knows that it’s not necessarily possible to entirely make such a promise, especially with a network which allows users to set up their own village to invite their friends.

The Fediverse is the interconnected villages of decentralized alternatives to popular social networks such as YouTube (PeerTube), Twitter (Mastodon), Facebook (Hubzilla), SoundCloud (FunkWhale), and Instagram (Pixelfed), to mention a few.
It isn’t immune to Nazis, but it offers tools to everyday users and local leaders (administrators and moderators) to protect their village from them. On Twitter you can report and block, but then you have to sit around and wait for that content to maybe get removed or maybe not. On Mastodon you get the chance to join a village where you know that the admin has made a promise to you that Nazis, racists, homophobes etc. aren’t welcome there. If your admin doesn’t fulfill this promise you have the power to move to a different village. With Twitter you simply can’t do that.

Nazis on the Fediverse: Gab

On the 4th of July, a big group of Nazis migrated into their own little village: Gab.com. They used Mastodon’s software to run the village. Gab has been a home to Nazis for a very long time, and anyone who’s been keeping an eye on the social networks that keep popping up knew that their policies would welcome a lot of dangerous people. Gab the Social Network actively encourages people to harm other people, and lets people run loose with harassment, all in the name of Free Speech. They have also been directly linked to a mass shooting. Yes, we could argue that mass shooters have been on Twitter and Facebook too, because duh, they’re social networks. The major difference is that this place has become a breeding ground for these kinds of ideas, and they are actively encouraged.

The Vice Article

This migration into the Fediverse by these racists and Nazis caught the interest of VICE, who wrote an article now proclaiming that Mastodon “the nazi-free alternative to Twitter, is now home to the biggest far right social network”. 

This is incorrect. While Gab has made their home in the Fediverse, they are not the biggest instance. The Vice article used the user counts displayed on fediverse.network to decide that Gab was the largest instance on the fediverse.

[Image: list of the top 5 instances on the fediverse, sorted by user count]

The marked instance in 3rd place, is the Mastodon Flagship instance. The instance in 2nd place is pawoo.net which is a Japanese equivalent to DeviantArt. 

How can an instance so new have so many users? 

995391 users. Here’s the tricky part: they don’t. Not really. Basically, what they did was migrate all the existing accounts from Gab, simply importing every existing account, including suspended and inactive ones and all the old beta accounts from 2016 (because as far as I know they have not actually cleared any of those old accounts). So this number, while it sounds incredibly big, doesn’t translate to much in activity:

List of instances sorted by activity

Comparatively they are not nearly as high up, but still fairly big. There are a few ways to spoof and fake numbers that show up for these stats. The below screenshot was taken just a few moments ago (and less than an hour after the above ones), here banana.dog is on the top of this list:

Eugen (creator of Mastodon) points out himself that:  

[Image: toot by Eugen (Gargron) commenting on the Active User count being removed from Gab]

“Gab already removed the Monthly Active User counter from their frontpage (a default Mastodon feature). That’s easier than faking active user numbers I suppose” — Eugen

Their public timeline is also filled with spam posts from accounts which haven’t been suspended, and even if those accounts were suspended they would still count as a body for the user count.
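The difference between the two rankings discussed in this section can be shown with a small sketch. The three instances and all of their numbers are invented purely for illustration and are not taken from fediverse.network or from any real server.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    domain: str
    registered: int          # every account ever created or imported
    active_last_month: int   # accounts that actually did something recently


def rank(instances, key):
    """Return the domains ordered from highest to lowest by the given key."""
    return [i.domain for i in sorted(instances, key=key, reverse=True)]


# Made-up figures: one instance bulk-imported old accounts, the others grew slowly.
instances = [
    Instance("imported.example", registered=1_000_000, active_last_month=20_000),
    Instance("steady.example", registered=400_000, active_last_month=60_000),
    Instance("small.example", registered=50_000, active_last_month=30_000),
]

by_user_count = rank(instances, key=lambda i: i.registered)
by_activity = rank(instances, key=lambda i: i.active_last_month)

print("by user count:", by_user_count)
print("by activity:  ", by_activity)
```

An instance that imports a large pile of dormant or suspended accounts tops the first list without moving up the second, which is why the raw user count is a poor basis for calling any server “the biggest”.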

Is the Fediverse riddled with Nazis now?

No, it’s not, unless you join a village which actively wants to communicate with them. First, let me cover how Gab migrated to the fediverse, and what that means for communication. Simply put, Gab installed a radio station (ActivityPub) by making a copy of the Mastodon software and making it their own. This means that they can now call all the other villages if they so please, or at least attempt to. A major part of the Fediverse and Mastodon servers prepared by preemptively blocking gab.com before they officially joined on the 4th of July. By blocking them, we’re effectively not listening to their radio station.

Unfortunately, because the radio waves are publicly available, they are still able to listen in on us and “interact” with our radio shows (Public Posts) on their side of the fediverse, even if we refuse to listen to them (by blocking them). This is a flaw in the current design of the Mastodon software, and to some degree of ActivityPub (the radio waves). A lot of people on here who work on the software, or are at least interested in it, are working on different ways to deal with this issue, and hopefully we’ll be coming up with even more creative solutions in the future.
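The one-way nature of a block can be sketched with a toy model of a village. This is not how Mastodon or ActivityPub is implemented; the `Server` class, its methods, and the example domains are all hypothetical and only mirror the radio analogy above.

```python
class Server:
    """A toy 'village': it holds public posts and an inbox for incoming messages."""

    def __init__(self, domain):
        self.domain = domain
        self.blocked_domains = set()
        self.public_posts = []
        self.inbox = []

    def block(self, domain):
        self.blocked_domains.add(domain)

    def post_publicly(self, text):
        self.public_posts.append(text)

    def receive(self, sender_domain, message):
        """Accept a message unless the sender's domain is blocked."""
        if sender_domain in self.blocked_domains:
            return False
        self.inbox.append((sender_domain, message))
        return True

    def read_public_posts(self):
        """Public posts are handed to any reader; this model never asks who is reading."""
        return list(self.public_posts)


village = Server("village.example")
village.block("blocked.example")
village.post_publicly("Hello, fediverse!")

delivered = village.receive("blocked.example", "a reply")
still_readable = village.read_public_posts()

print(delivered)
print(still_readable)
```

The block stops the village from hearing the blocked domain, but in this model nothing stops anyone from reading what the village broadcasts publicly, which is the gap described above.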

To use Eugen’s own words, Mastodon has still hard-lined against Nazis, and their fairly new covenant enforces that by deciding which servers JoinMastodon.org will advertise. If you don’t follow the covenant you won’t be featured; if you’re a racist / Nazi instance you won’t be featured.

On top of that there have been massive efforts between instance (village) admins to organize against this influx of racist and Nazi users. There are even app developers who have decided to block gab.com users from connecting through their apps (e.g. Tusky and Sengi; full disclosure, I merged the feature to block Gab via Tusky, as I work for that app). And users are actively sharing lists of Fascist-harbouring instances that they have blocked.
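An app-level block of this kind can be as simple as checking the instance a user types in at login. The sketch below is an assumption about the general idea only; it is not the code used by Tusky or Sengi, and the function names and the subdomain rule are made up for the example.

```python
from urllib.parse import urlparse

# Only gab.com comes from the text above; a real app might ship a longer list.
BLOCKED_DOMAINS = {"gab.com"}


def normalize_domain(user_input):
    """Reduce whatever the user typed (URL or bare domain) to a lowercase hostname."""
    text = user_input.strip()
    if "://" not in text:
        text = "https://" + text
    return urlparse(text).hostname or ""


def may_log_in(instance_input, blocked=BLOCKED_DOMAINS):
    """Refuse a blocked domain and, as an assumption of this sketch, its subdomains."""
    domain = normalize_domain(instance_input)
    return not any(domain == b or domain.endswith("." + b) for b in blocked)


print(may_log_in("mastodon.example"))
print(may_log_in("https://Gab.com/about"))
```

Matching on the full domain or a dot-prefixed suffix avoids accidentally refusing unrelated servers whose names merely end in the same letters.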

We are still here, and we’re still fighting Nazis and by no means welcoming them into our midst.



Mastodon, compassion vs Facebook, in your face.

After having spent a good 1.5 years on Mastodon, I feel like I just get bombarded with crap on Facebook that I don’t want to see or am not comfortable with seeing.

Why? It’s not because my friends are bad people, it’s because Facebook doesn’t offer a way for my friends to add content warnings which protect the images.

On Mastodon, while it has its flaws, you can choose to put up a warning for what your content contains.

You can use this for Trigger Warnings, Sensitive Subjects, Food, and even SPOILERS for movies/series. Or just put your nerdy discussions behind it, and let people opt in to see it.

People will only see it if they click through, and it’s such a different experience. Even though I almost always click through, I find that when I’m prepared it’s a lot easier to deal with.

It allows the people posting to be cognizant of what they put out there, and how it presents to other people. It makes a lot of the people on the platform a lot more compassionate than I’ll ever see here on Facebook, unfortunately.

The first #ForkTogether meeting, and what went wrong

On the 30th of June, we had the first Fork Off Together meeting, the goal of which is to fork off from the Mastodon project. The idea had been simmering for a while, and the required logistics were a lot bigger than one person could handle on their own. Yet I tried to do it on my own.

Let me explain. I was not doing it on my own per se; rather, I was doing a lot of the preparations for this one meeting alone, even though I had two people that I worked fairly closely with. At one point my head just got too tired to properly communicate with others about what help I needed, so it more or less got easier to “just do it myself”, or to ask my live-in boyfriend for help, as I could point and grunt at things when words wouldn’t come out properly.

So, what went wrong with the first meeting?

To start off, overall it was a good experience, but we definitely had some teaching moments which we seemed to, as a group, react well to.

However, I want to start by pointing out what went wrong from my side, i.e. what I could have done better or differently. This isn’t about placing blame, but rather a reflection on why I did it the way I did.

So, my initial idea was based on something that I had experienced and learnt when I was active in the Pirate Party here in Sweden between 2009 and 2014. The organization had a kind of startup meeting, which is common (from my understanding) for certain types of organizations, namely the type that has a lot of smaller organizations under the same umbrella, e.g. political or youth organizations that want to try and get funding for their work locally.

Having this kind of meeting is a way to make it easy to start up one of those new small orgs, only requiring 3-5 people, and for youth orgs it meant that they could get a little money from the government. This also doubles as a means to encourage youth to get engaged in activities which will in the long run keep them busy and away from crime (but don’t quote me on this, this is just my general understanding of the concept).

What I tried to do was leverage that knowledge I had, to have our own startup meeting, and jeebus I had to try really hard to not accidentally use that term.

In the political org case, it was easy to adopt the same bylaws, CoC, operational plan etc. because we were all part of the same organization. This was definitely my first mistake. Unfortunately I didn’t mentally connect the dots until the actual meeting, and I couldn’t have done it differently at the time.

I need to highlight here that accepting the bylaws and so on during the first meeting of this kind was based on it being a sub-association of a bigger org. There wasn’t supposed to be a need to do too much with the bylaws, and if there was, it would’ve been done before the meeting.

In my foggy mind I didn’t get this out in time and worded correctly. Heck, I had even said “no we won’t draft bylaws” before I realized the translation in full. I could have, and probably should have, checked myself when I realized that that document translated to bylaws. But I didn’t.

Another member of the meeting wrote some good reflections about this type of meeting too.

If that was my first mistake, what was my second one?

I thought that I could distance myself from responsibility and active choices by leveraging that I was just inviting people to a meeting. I didn’t want to make decisions for us, and this was the only way I knew how.

This isn’t as much a mistake as it is paradoxical. Mostly because either way I make decisions and it becomes a really weird situation. Especially if I couldn’t get all the info out of my head as fast as the questions came my way.

About 14 days in I fell apart entirely and made a public spectacle, which didn’t reflect well on me and also ended up possibly harming the project: I pushed away some people I really wanted in on the ground floor.

I could make excuses, and I could try to explain myself, but it won’t change anything. However, what I can do is recognize that I did screw up and that I can do better in the future. I understand my why, and that means that I can take preventive measures.

So, what preventive measures can I take in the future?

A big one: delegate. While we were 3 people working together in the early days (the same people who have also rejected any direct involvement in management, the interim committee, or the committee / board for the first year of this project), I did the bulk of the work and had trouble getting stuff out of my head.

When I felt like I was about to entirely break, the incident referenced above, I should’ve let go right there and just set up a Discord server and invited everyone, and continued to contribute to the group in their process of preparing a meeting together etc.

But at the same time, if I hadn’t done the meeting the way I did, I would not have learnt the lessons I did, so this is a double-edged sword, imo.

So, what went right?

I took my time, and worked through it slowly. I built small road maps for myself to guide me along the way and asked for help when I felt stuck.

I need to remind you all that the survey blew up way bigger than I had ever expected. I think by the end we had almost 200 responses to the survey, and over 120 saying “let’s do this”. [link to the shared data on June 11th]

I couldn’t have planned for that, but when it happened I tried to baby-step my way through it.

The meeting, even though it was long and had its issues, was also pretty damn fantastic. The way I had translated Swedish meeting formalities to a Discord server turned out to work pretty well, and once people got the hang of it they seemed to appreciate the somewhat rigid structure.

I hope that, using this experience, I can create a template for hosting a first meeting when a group of people want to start an org together, and maybe I can help someone else avoid some of the problems that we encountered, because there’s some solid structure here that definitely can be reused. That said, I will be publishing a separate post about the actual meeting structure and how to set it up.