Thanks to the collective effort of all the people who participated in the #GenigmaChallenge, in just 20 weeks 181 regions of the genome with chromosomal anomalies were identified in the T47D breast cancer cell line. We have named these regions Eureka!

The games provided during the #GenigmaChallenge were prepared at 100k resolution, meaning that each game piece (the ones you move and place to increase the score) corresponded to a fragment of 100,000 DNA base pairs. This gave us a first snapshot of the genomic map of these cells. Now, instead of going in “blind” or analysing the entire genome, the research team is focusing on these regions to look for genes of interest and advance the research.

At this stage, we are still asking for citizen collaboration to analyse in more detail the chromosomal reorganisation of these areas of high interest.

To do this, we have added games from the Eureka regions to the app at a resolution of 10k, which means that each piece you move from now on represents 10,000 base pairs of DNA. It is as if we had been looking at continents and are now looking at countries: we are moving in for a closer look.

During the #GenigmaEureka, players will be analysing some of these regions with 15-piece puzzles, which are more difficult but can provide a lot of information to the scientific team.

16 June marked the conclusion of the #GenigmaChallenge, the first participatory experiment to analyse the human genome through a mobile phone app, GENIGMA, in which more than 39,000 people from 154 countries took part.

–> 600,287 solutions have been gathered in a matter of 20 weeks and 181 Eureka! regions, areas of the genome of interest in breast cancer research, have been identified.

–> The player community dedicated 19,485 hours to scanning, on a step-by-step basis, the 23 chromosomes loaded into the app by the scientific team.

–> As of now, it will be possible to play with fragments of the genome of the Eureka! zones identified during the weeks the #GenigmaChallenge ran. In this way, the scientific team will try to test whether it is possible to progress in the analysis of the arrangement of these fragments to a higher level of resolution.

Over the course of 139 days, the GENIGMA video game for iOS and Android threw down the #GenigmaChallenge gauntlet to players from all over the world: solve puzzles to help advance cancer research. The objective was to work together to identify the chromosomal regions affected by anomalies in T-47D breast cancer cell culture. These are the cells most commonly used by the worldwide scientific community in research into this type of cancer.

Cell cultures are a cornerstone of modern biology and have been used to develop vaccines, cancer chemotherapies and in vitro fertilisation. Nevertheless, we still lack detailed knowledge of how the genome of each of these cultures is organised, and this continues to limit scientific progress. This experiment was designed to test whether it is possible to identify areas of interest in the genome of these cells quickly and collaboratively.

20 weeks with the #GenigmaChallenge

Based on the real data obtained in the laboratory with the T-47D breast-cancer cell culture, the scientific team divided all the DNA into small fragments and created 5,442 puzzles to be analysed by the community of players. Every week, puzzles of 8, 10, 12 and up to 15 pieces were uploaded to GENIGMA for analysis using human logic and with the aid of virtual tools integrated into the game.

Genigma has more than 5,000 puzzles, each made up of pieces that represent 100,000 of the 3,000,000,000 base pairs that make up our genome. The challenge of putting them in order could only be accomplished by dividing the “big puzzle” into small parts that each player could solve.

In 20 weeks, more than 44,000 people from 154 countries downloaded the game and provided more than 600,000 solutions. With the help of social media and traditional media, the team gradually assembled a huge group of volunteers from all over the world who contributed to this research in their free time. Throughout the experiment, the team kept the community posted every week on the progress made through this website, the app’s internal messaging system, Twitter, Facebook and Instagram. It also interacted with the community, tweaked certain features of the app, and prepared extra content to answer queries and maintain interest in the project. Over the four and a half months that the experiment lasted, the app was updated five times with improvements suggested by citizens (to the tutorial, the scoring system and the viewing of individual achievements, as well as new scientific dissemination cards).

When CNAG-CRG launched the game, players were asked to play a minimum of 50 games to participate in the experiment. Nevertheless, this minimum was more than surpassed in many cases. Moreover, some people have been playing and supporting the project right to the end and have also participated in disseminating and sharing their playing strategies in order to get the highest number of people possible to opt into this collective effort. Their involvement was also decisive in the success of the project.

Chromosomal regions of great research interest

Over the 139 days that the #GenigmaChallenge lasted, the puzzles in which at least 40 different players managed to beat the record score indicated in the game were tagged as Eureka! regions. All in all, in the entire genome of these cells, 181 of these regions regarded as being of high scientific interest were identified, since they show that the genome sequence could be affected by chromosomal rearrangements if we compare it to the sequences of the non-cancerous cells. The regions in which the proposed record was not beaten, also solved by consensus, are probably regions with a similar arrangement to that of their homologous regions in healthy cells, which implies that they would not be affected by any chromosomal alteration.

Now that the analysis of the entire genome has been completed, the Eureka! regions identified will be analysed in detail by our scientific team. Once these results have been analysed, we will consider the possibility of using GENIGMA to analyse other types of cancer.

Some data

20 weeks of play, 139 days
People from 154 different countries took part in the experiment
More than 44,000 downloads (Android and iOS)
39,543 active players, of whom 1,900 are “super” players (they played more than 50 games)
A total of 19,485 hours logged by all players
600,287 solutions gathered
181 regions of high scientific interest (Eureka!) identified
5,472 puzzles analysed by the community, representing more than 300,000 pieces overall (equivalent to 3,000,000,000 DNA base pairs)
1 h 46 min – mean interaction time per player
6 – average number of games submitted per session and by player
Mean resolution time per game: 10-piece puzzles: 18 min; 12-piece puzzles: 25 min; 15-piece puzzles: 33 min; 20-piece puzzles: 1 h

Top 5 players who played most: Player 1: 6,139 solutions submitted. 240 hours played; Player 2: 5,342 solutions submitted. 180 hours played; Player 3: 4,518 solutions submitted. 220 hours played; Player 4: 4,502 solutions submitted. 320 hours played; Player 5: 3,920 solutions submitted. 299 hours played

Top 5 contributing countries: 60% Spain (>23,000 people); 7.6% United States (>2,800 people); 4.7% United Kingdom (>1,700 people); 3.2% Russia (>1,200 people); 3% Italy (>1,000 people)

What the future holds for GENIGMA

In this period of data analysis by the scientific team, the GENIGMA app will continue to run so that anyone who wants to can still play and keep honing their skills.

As of now, it will be possible to play with fragments of the genome of the Eureka! zones identified during the weeks the #GenigmaChallenge ran. In this way, the scientific team will try to test whether it is possible to progress in the analysis of the arrangement of these fragments to a higher level of resolution.

Although we reached 100% of games in the breast cancer cell line, we will keep GENIGMA running to explore the Eureka! regions found by the players even further. In this way, when we analyse them, we will have more detailed information about the possible alterations in these zones.

The results of the analysis of the data provided by the GENIGMA players will be made public in the coming months.

By Juan Rodríguez and Marco di Stefano (Members of the GENIGMA scientific team)

Have you ever wondered how scientists study cancer in the lab? How is it possible to reproduce a biological system in miniature to understand how the body works? What if you wanted to test the effectiveness of new drugs?

Laboratories around the world carry out their research using cell lines, also known as cell cultures. They have been around for a while, enabling scientists to make discoveries that would otherwise have been inconceivable. Thanks to cell lines, researchers made important advances in the development of the polio vaccine, tested new chemotherapies, made important inroads in cell cloning, and developed in vitro fertilization. Some of this work went on to win Nobel prizes. The Covid-19 vaccines we use today were first developed and tested thanks to the cell cultures researchers had available at the time, which allowed them to safely reproduce infections in a realistic environment.

As a research tool, cell cultures are unique in the life sciences and have evolved by leaps and bounds since the early 20th century. But you might be wondering… what does a video game like Genigma have to do with all this? To answer this question, we have to travel back in time.

The concept and origin of cell cultures

The concept of cell culture refers to the process by which cells are grown in a favourable artificial environment. After the cells have been isolated from living tissue, they are maintained under carefully controlled conditions. The advantages of using cell lines are manifold. For example, cell lines:

  • Provide a cost-effective, biologically relevant environment for research
  • Offer an alternative to animal experimentation in the lab
  • Can be maintained for years (even frozen!)
  • Respond to drugs and treatments similarly to body cells

The basic principles of in vitro plant and animal cell culture were developed in the early 20th century. The formulation of cell theory in the 19th century had laid the basis for this research, with three main tenets:

  1. All living organisms are composed of one or more cells
  2. The cell is the basic unit of structure and organization in organisms
  3. Cells arise from pre-existing cells

German embryologist Wilhelm Roux demonstrated in 1885 that it is possible to maintain living cells outside the body in saline solution for a few days. He observed that neural cells from chicken embryos continued to “work” outside the body.

Since then, the process of generating cell cultures has continued to develop. The first cell line, “L929”, was established by Earle in 1948 and is still around today! It was derived from subcutaneous mouse tissue and displayed a morphology quite different from that of its tissue of origin.

How cell lines are grown in a lab

To establish a new cell line, researchers first generate a primary culture, obtained directly from the tissues or organs. Cells grow until they occupy the full surface of a culture plate, and then stop growing. Then, they have to be subcultured (passaged) to a new plate. After this step, the process starts all over again and from now on it can be considered a cell culture or cell line.

The subculture technique allowed researchers to obtain cell lines from primary cultures, analogous to what we do with sourdough for bread or kefir for yoghurt. Primary cell cultures are mainly initiated from normal or malignant tissues, whether adult or embryonic. For example, cancer cells from a patient’s biopsy can be grown on a plate and a cell line can be established from them.

The cell lines established from normal tissues display finite growth. In contrast, cell lines obtained from cancerous tissues proliferate indefinitely (after all, that is the natural trait of cancer). However, normal cells can also be immortalized using routine laboratory procedures, meaning that under particular optimal conditions, they can reproduce forever.

The limitations of cell lines

Different cell lines are commonly used in different studies, but the use of cell lines also has some disadvantages and limitations.

One disadvantage is that the number of genetic aberrations in cell lines increases over time. Another is that a cell line’s response to a drug might not be identical to an actual patient’s response. After all, the cell culture environment in a plastic plate is obviously different from that of the original tumour in the body. Due to culture conditions, natural properties of the tumour or tissue can be lost, altering the tumour’s potential responses to treatments.

Cross‐contamination of cell cultures with another cell line is another particularly important issue. In a laboratory working with several cell lines, cross-contamination of cells can happen. When you think you are observing a certain phenomenon in a cell line, it could be that it has been contaminated with a different one.

Bacterial infections are another problem that can change a culture’s properties and ruin the line. A further problem is that not all types of tumours can be turned into cell lines, which biases some cancer research towards the use of easy-to-grow cell lines.

Unfortunately, we cannot solve all of the problems above. Still, the advantages outweigh the disadvantages.

The importance of the Genigma challenge

Genetics is what makes a cell behave and respond the way it does to treatments and stimuli. Everything is coded into a cell’s DNA: when to replicate, when to die, when to activate gene programs and when to react to an external cue.

Every cell comes from another cell. As cells are immortalized in culture, they have to fully copy their own genetic material once per division, a process repeated so many times that it is likely to give rise to copying errors.

Imagine that you have to manually copy the same page of a book every day for years and years. Eventually, small typos are made, pass unnoticed, and are then propagated and amplified with every new copy. In a few years your page may no longer resemble the original at all.

You can easily imagine what may happen to the genetic material of a cell after decades in culture. We know that these errors are everywhere in the cell lines! In some cases, even full chromosomes have been duplicated (see Figure 1). For cancer cell lines this is particularly pronounced, as the disease is characterised by processes in which the cell loses control and parts of the genome are rearranged, duplicated or deleted, either as a cause or as a consequence of the disease.

Figure 1. Left: Normal human karyotype (set of chromosomes in a cell). Right: Cell line karyotype. Note how the normal karyotype has the canonical 23 pairs of chromosomes, while the cell line has undergone full chromosome duplications and deletions and no longer resembles the original one.

When researchers work with cancer cell lines, they interpret their discoveries using a genomic map, which tells them, among other things, where the genes they have studied are located in the cell’s DNA. This map is what is known as a genome reference sequence.

For most studies, researchers use a canonical human genome reference sequence to navigate through the genomic landscapes of the cells and interpret their results. These maps, in principle, should be free of these aforementioned errors, but in reality, using these maps would be like navigating the streets of a city using a 150-year-old map.

Using Barcelona as an example, we’d recognise the sea and the mountain that border the city, but many neighborhoods or streets would have been created, expanded, or even bulldozed in that period of time. The map is no longer useful.

This is exactly the problem we want to solve in Genigma: how can we build a tailored, precise genomic map of the most commonly used cancer cell lines, reflecting all these changes in their genetic landscape?

And how exactly do we want to do this? With the help of three-dimensional genomics (you can read more about it in our dedicated blog entry) and the help of powerful citizen science, in which we will use your brain computing power to solve some of the most complex and exciting problems in biology!

Figure 2: A dynamically changing map of Barcelona across the last ~150 years. Photos by Museu d’Història de Barcelona (MUHBA).

Humans vs. machines: ‘herd intelligence’ vs. ‘artificial intelligence’

Machines and computer algorithms can be very helpful in solving biological problems like the one we are tackling in Genigma. However, they have their limitations and cannot deliver all the accuracy that such a daunting task requires.

For example, take a look at the image below. How quickly can you spot dogs?

Figure 3: Some pictures that may confound machines, but not brains…

It should be immediately obvious to most people! But what are the unique patterns in a dog’s face that your brain recognises as a dog’s face? Expressing that in words is no simple task. For example, muffins have blueberries or chocolate chunks that could be mistaken for a dog’s dark eyes and nose, but something in your brain tells you that the chunks in muffins are not in an anatomical position. Similarly, dog faces are light and brown, but so is the dough in muffins.

If you had to build an automatic computer program to perform this task, you would have to introduce the concept of “dog” to the algorithm. It is extremely costly to accurately teach this concept to a computer program, with all that it implies. There are way too many concepts, both tangible and abstract, consciously or subconsciously, that our brain processes before reaching an answer.

So, how can we describe a set of complex concepts to a machine so that it returns the correct answer in the correct context? This is one of the challenges of using current algorithms to generate reference genomic sequences. They can perform well, quickly and accurately at first, but at a more fine-grained level they may easily get confused and return imprecise results. Our brains, by contrast, can process only a limited amount of information per unit of time, while computers work much faster. However, the brain tends to perform better when faced with fine-scale problems.

For Genigma, we asked: what if we could split a huge problem into a thousand small pieces and then use the computing power of a thousand brains to get the solution for each of them?

Brains can also be prone to error, but collectively they are powerful. While algorithms will always make the same systematic errors when facing the same challenge, different human brains are less likely to make the same mistakes in the same places. This herd intelligence is exactly what we want to exploit with Genigma, in order to reconstruct these valuable cancer genome sequences and help researchers develop their therapeutic strategies guided by a more precise genomic map.

The game uses data from experiments conducted on breast cancer cells in the CRG lab. The experiments produced interaction maps between all the genomic regions of the breast cancer cells analysed. Each element of an interaction map is an experimental measure of the number of interactions that two chromosomal regions make within the nucleus.

Yasmina Cuartero and François Le Dily have been the people responsible at the CRG for performing the experiments in the laboratory applying the Hi-C technique with the T47D breast cancer cell line.

François, who works on the response of breast cells to steroids and is an expert in chromatin conformation, explains the advantages of this technique: “The Hi-C technique allows us to obtain a 3D “picture” of how the chromatin (DNA and its associated proteins) is arranged in the cells. This method quantifies the number of interactions between regions of DNA sequence in a chromosome, which are close together in 3D space, but may be separated by many nucleotides in the linear genome.

Scientists discovered a few years ago that to understand how a cell functions, it is not enough to know what the linear structure of the genes is (i.e. how they are distributed one behind the other). This first approximation provides only part of the information, since DNA is packaged in such a way that some genes may be far apart linearly, but in reality they are very close to each other, and from this position, they interact. That’s about 2 metres of DNA that folds up to fit into a nucleus of just a few micrometres!

Cancer cells are characterised by alterations of the genome (set of genes) that affect their functioning: this is because their DNA may contain many copies of the same gene (due to an error in the DNA copying phase), genes that are no longer in their original position (marked by the structure of a healthy cell), genes that have been “turned 180°” (therefore their reading is wrong and they do not fulfil their function) or even chromosomes that are fused. In all these cases, the consequences on the proper functioning of the cell are obvious and can be serious”. The Hi-C technique makes it possible to find these errors and can be used to reconstruct the genome map of abnormal cells.

Yasmina, an expert in genome structure study techniques, gives us some details of how it is applied in the lab. “With the Hi-C technique, we quantify the possible interactions between all pairs of DNA fragments simultaneously. This is done with many cells at the same time, because in each cell, depending on its life cycle, the DNA may be in a different position. The first step is to fix the chromatin with a reagent. Then, pieces of 400 base pairs are cut with enzymes at defined locations (based on knowledge of the reference genome of healthy cells) and finally, the remaining pieces are ligated with another enzyme at the 3D level. Once this process is completed, this DNA is extracted and a series of chimeric fragments (which do not exist in reality in linear form, but are the result of the 3D picture we have taken in the lab) are obtained, which will be sequenced”.

Once out of the lab, the sequencing results are passed to a computer programme. This generates a contact matrix, which contains the information that is given to the players, in the form of a game, to analyse.
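To make the idea of a contact matrix more concrete, here is a minimal sketch in Python. It is not the programme used by the Genigma team: the function name `build_contact_matrix`, the input format (pairs of base-pair positions within one region) and the example data are all assumptions made for illustration. It simply counts how many sequenced interactions fall between each pair of fixed-size fragments, such as the 100,000-base-pair pieces used in the game.

```python
def build_contact_matrix(contacts, region_length, bin_size=100_000):
    """Count interactions between fixed-size fragments (bins) of one region.

    contacts      -- iterable of (position_a, position_b) pairs in base pairs
                     (hypothetical input format, chosen for this sketch)
    region_length -- length of the region in base pairs
    bin_size      -- fragment size in base pairs (100,000 = "100k resolution")
    """
    n_bins = -(-region_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in contacts:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            # Interactions have no direction, so keep the matrix symmetric.
            matrix[j][i] += 1
    return matrix


# Made-up example: three interactions in a 300,000-base-pair region.
example_contacts = [(50_000, 150_000), (120_000, 30_000), (250_000, 260_000)]
for row in build_contact_matrix(example_contacts, 300_000):
    print(row)
```

Each cell of the resulting matrix corresponds to one pair of fragments, which matches the description of an interaction map given above. A real Hi-C pipeline includes many more steps (read mapping, filtering, normalisation) that this sketch leaves out.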

These experiments have been carried out thanks to the help of ARIMA Genomics, which has given Genigma the kits for these analyses.

What do I have to do?
Rearrange the game pieces and try to get a score as high as possible.

What do the symbols inside each piece mean?
Nothing. They just help you to tell each piece apart from the others.

What do the colours of the pieces mean?
The colour of the pieces goes from red to green. More green pieces means a better score.

What do the numbers inside each piece mean?
They show how strongly each piece is related to all the others. High numbers mean that you are getting a good score.

Are there any guides to improve the order of the pieces?
Yes, you can use the tools in the box at the bottom of the puzzle. Virtual tools allow you to get information about the different pieces, move them or return to your best score. You can consult the information button to the right of the toolbox.

What is the top bar telling us?
In the top bar you will find information on how many points you’ve scored, what your record is and what is the record provided by all the players.

Why are the tools not activated?
You have to hire the mechanic to be able to use the tools.

What’s the point of hiring professionals?
It allows you to get more coins or more science cards, or to use the power-up. See what each one does by clicking on their image.

What does the level indicate?
As you score more points, you will level up. The games will also increase in complexity.

How does the scoring change?
Each piece has a score relative to the other pieces. When you change one, the total score changes because it changes the order between all of the pieces.

What is the difference between zones and areas on the map and what does the colour say?
Each zone has 3 areas with 3 different levels (0, 1, 2). The zones are used to count the number of moves. The colour of the area belongs to the group that has achieved the most stars playing in it so far. At the end of the week, the current challenge is closed, winning clans are proclaimed, and new clan pairings and new maps to explore are opened.

What are the costs to move around the map?
The cost to get to an area is 5 coins for each zone you have to enter to get to your destination and there is an “extra” cost of coins for the level of the area (marked on the map with 0, 1 or 2).
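As a rough sketch, the travel cost described above can be written as a small Python function. The 5 coins per zone come from the answer above; the size of the “extra” cost per level is not stated here, so the function takes it as a parameter, and the `example_surcharge` values below are placeholders invented purely for illustration.

```python
COINS_PER_ZONE = 5  # stated in the answer above


def travel_cost(zones_entered, area_level, level_surcharge):
    """Coins needed to reach an area.

    zones_entered   -- number of zones you have to enter to reach it
    area_level      -- level of the destination area (0, 1 or 2)
    level_surcharge -- mapping from area level to its extra cost; the real
                       values are not given in this FAQ (assumption)
    """
    if zones_entered < 0:
        raise ValueError("zones_entered cannot be negative")
    if area_level not in level_surcharge:
        raise ValueError(f"no surcharge defined for level {area_level}")
    return COINS_PER_ZONE * zones_entered + level_surcharge[area_level]


# Placeholder surcharges, NOT the game's real values.
example_surcharge = {0: 0, 1: 2, 2: 4}
print(travel_cost(3, 2, example_surcharge))  # 5 * 3 + 4 = 19
```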

If I don’t have any coins, can I continue exploring the map?
If you run out of coins you can continue to play in the level 0 area of the zone you are in.
If you want to gamble on earning more coins you can always move to more distant higher level areas.

How does the reward system work?
The reward is the result of adding 1 to the level of the area where you play and multiplying it by the stars earned. The areas give higher rewards depending on their level (marked with a number on the map). If you had an explorer on your team when you chose where to move, the result is multiplied by 2.
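Read literally, the rule above amounts to (area level + 1) × stars, doubled with an explorer. The short Python sketch below follows that reading; the function name `reward` and its parameters are our own, and the game's actual code may differ.

```python
def reward(area_level, stars, has_explorer=False):
    """Reward for one game, following our reading of the rule above."""
    if area_level not in (0, 1, 2):
        raise ValueError("area_level must be 0, 1 or 2")
    if not 0 <= stars <= 3:
        raise ValueError("stars must be between 0 and 3")
    base = (area_level + 1) * stars
    # An explorer on the team doubles the result.
    return base * 2 if has_explorer else base


print(reward(0, 3))                      # level 0, three stars -> 3
print(reward(2, 3, has_explorer=True))   # level 2, three stars, explorer -> 18
```

Under this reading, three stars in a level 2 area are worth three times as much as three stars in a level 0 area, which is why moving to higher-level areas can pay off.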

Can I see how many areas I have conquered?
It is not possible to know how many areas have been conquered on an individual level. Zones and areas are not conquered individually: conquering them is the result of the collective effort of the clan. They change hands during the week, and the final tally decides the winner of the weekly battle.

Is there a right answer?
Not a known one yet. That’s what we’re looking for. When many people get the same record, we will be close to the right answer.

What is the score for?
We don’t know the correct answer to each set of pieces. We are going to determine the solution using data provided by the players as a whole. When at least 40 players have provided the same solution with the highest score, this will be considered the best solution. The outcome will be the result of a collective solution that reflects the consensus among the players.

What are the three stars at the bottom?
As your game score rises you will see how many stars you have already achieved: your challenge is to achieve three!

Why can people be better than the machine?
The algorithm is not totally accurate. People can do better because the order of the pieces is a visual matter, and humans are better at this than machines.

Why is it important to play a lot?
If we get lots of different people playing and solving the same games, we can explore better where the algorithm fails. This means we can obtain more scientific information, which allows us to advance research much more quickly.

How can I repeat the tutorial?
On the little wheel on the main page you will find a button to access it.

What is GENIGMA? 

Genigma is both a game and a citizen science project.  

The game enlists players to solve puzzles and help find the correct order of gene sequences in cancer. Cancers have a huge number of mutations and chromosomal abnormalities that make them difficult to study. Players help us deduce the correct order of the sequence by reordering small blocks that represent genome fragments. While individually the puzzles don’t provide direct solutions, collectively they provide important clues on how the cancer genome is arranged.

Genigma is also a citizen science project: it has been co-created with citizens over two and a half years, and it keeps listening to player feedback in order to improve. The project aims to use the universal appeal of videogames to get people involved in a real-world breast cancer research experiment. If the experiment is successful, we could use this tool and citizen science methodology to investigate other types of cancer.

Why did you create the game? 

Knowing the correct order of sequence in a human genome is essential for studying cancer. However, worldwide cancer research efforts depend on using a human genome sequence that is based on healthy individuals. Because of cancer’s mutations and abnormalities, this is like navigating modern cities using maps from the past. Things aren’t in the right place, or have been built, demolished or moved.

The game was created to find the correct order of sequence in the cancers we study in the laboratory, also known as a genome reference map. Having this would help us better pinpoint the location of genes of therapeutic interest or potential mutation sites. 

Genome reference maps can be created using artificial intelligence, but these require significant time and resources to train, as well as vast computational power. AI also results in just one potential solution, with little space for nuance. Players on the other hand have a collective ‘herd intelligence’ that can provide creative solutions in ways that AI might not be able to. Player solutions can help us identify ‘Eureka!’ regions – areas of the genome that are flagged as being of interest because they are different to the healthy human genome reference. Analyzing these regions in the lab could reveal the correct order of the cancer sequence. So far, we have found more than 100 Eureka regions! 

How does the game work? 

Players move pieces, each of which represents a genome fragment, along a thin line, which represents the filament of a chromosome. The aim of each game is to try to get the highest score possible. The higher the score, the more likely it is that players have identified a Eureka region. Players improve the score by assessing the changing colours or numbers associated with each piece, which tell them how “happy” the piece is to be placed in a particular position. When players move the pieces around, they are assessing the mutual distance between pairs of pieces along the sequence. This information is essential for assessing the correct order of the sequence.

What is a Eureka region? 
A Eureka region is found when players provide a record score that is higher than what has been previously calculated. It indicates that players have identified a region affected by chromosomal rearrangements when compared to non-cancerous genome sequences. Once those regions have been identified, we can study them in depth to map the genes hosted in them and investigate their implication in cancer. This would help us better pinpoint the location of genes of therapeutic interest or potential mutation sites.

How does the game know I’ve found a Eureka region? 
A lot of work has occurred behind the scenes to create the score used in the game. Before creating Genigma, the researchers carried out experiments using cancer cells in the laboratory, producing maps that predict how different genomic regions interact with each other. One of the fundamental biological phenomena is that the higher the number of interactions between genomic regions, the more likely they are to be next to each other. Researchers use this to help create genome reference maps. 

The starting score is based on the number of interactions that occur naturally within healthy human cells. When players move the pieces around, they are assessing the mutual distance between pairs of pieces along the sequence. This information is essential for assessing the correct order of the sequence.
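The exact scoring formula is not given here, so the Python sketch below uses a toy score of our own invention to show the principle: pieces that interact a lot should end up close together, so each pair contributes its number of interactions divided by its distance along the line. The function `order_score` and the interaction counts are illustrative assumptions, not the game's real scoring.

```python
def order_score(order, interactions):
    """Toy score for an ordering of pieces (NOT the real Genigma score).

    order        -- list of piece indices, in the order they sit on the line
    interactions -- interactions[i][j] is the number of interactions measured
                    between pieces i and j
    """
    total = 0.0
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            distance = b - a  # how far apart the two pieces are on the line
            total += interactions[order[a]][order[b]] / distance
    return total


# Made-up data: piece 1 interacts strongly with both 0 and 2,
# while pieces 0 and 2 barely interact with each other.
toy_interactions = [
    [0, 10, 1],
    [10, 0, 10],
    [1, 10, 0],
]
print(order_score([0, 1, 2], toy_interactions))  # 20.5
print(order_score([0, 2, 1], toy_interactions))  # 16.0
```

Moving a single piece changes its distance to every other piece, which is why the total score shifts each time you rearrange the puzzle.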

What happens when I reach a “record” score? How does the game know it’s a “record”?
The record score is the highest score provided by the players for that particular individual game. The game knows it because it has access to all the scores provided by all the players (in an anonymous form) for that particular game. When a player reaches a new absolute record, it is passed on to the games of all the other players in the world! When a consensus of at least 40 players giving that record is reached, we consider that individual game resolved.

I can’t get a record. Am I still useful if I equal the reference score?
Yes! Not all the regions of a cancer cell genome are affected by rearrangements. What we expect is that in regions of the sequence where there are no rearrangements, the order of the fragments corresponding to the configuration with the maximum score is the same as in the reference genome of a healthy cell. If at least 40 players provide a record for a particular game that is equal to the reference score (which we calculated based on non-cancer cells), the region under analysis will also be marked as solved. This result would indicate that this specific region may not be affected by chromosomal rearrangements.
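
The consensus rule described in the last two answers can be sketched in a few lines of Python. This is only an illustration, not the code that runs in the app: the function name, the status labels and the assumption that each player contributes one integer score per game are all hypothetical. Only the threshold of 40 players comes from the text.

```python
CONSENSUS_THRESHOLD = 40  # "at least 40 players", as stated in the text


def game_status(player_scores, reference_score):
    """Classify one game from the scores submitted by its players.

    Hypothetical helper: assumes one integer score per player and that
    the reference score was computed beforehand from non-cancer cells.
    """
    if not player_scores:
        return "open"
    record = max(player_scores)
    supporters = sum(1 for s in player_scores if s == record)
    if supporters < CONSENSUS_THRESHOLD:
        return "open"  # not enough players agree on the record yet
    if record > reference_score:
        return "solved: Eureka candidate"
    if record == reference_score:
        return "solved: matches reference"
    return "open"  # assumption: a record below the reference settles nothing


# 40 players reach 120 while the reference score is 100.
print(game_status([120] * 40 + [100] * 10, reference_score=100))
```

Under these assumptions, a game stays open until 40 players agree on the highest score, and only then is it labelled either as a candidate Eureka region or as matching the healthy reference.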

How long will GENIGMA run for? 
At launch, we started the #GenigmaChallenge, which aims to completely analyse the genome of breast cancer cells in a ninety-day period. If the experiment is successful, we will launch games for other types of cancer.

How does the GENIGMA team use the data about Eureka regions?
Eureka regions will be studied in depth by the scientific teams. They will map the genes hosted in them and investigate their implication in cancer. Identification of Eureka regions will help in building a new, more accurate reference map for the T47D breast cancer cell line.

How many DNA configurations can we explore with the game? 
It depends on the difficulty of the level being played, because the game can provide puzzles of different sizes. Let’s give an example with 3 sizes: 8 pieces, 16 pieces, 35 pieces. The more games you play, the faster you will move from one level to the next. In the easy level, the player has to rearrange 8 DNA fragments, which corresponds to about 4 × 10^4 possible configurations; in the intermediate level, the player has to order 16 fragments, which corresponds to about 2 × 10^13 configurations; in the third, difficult level, the player has to rearrange 35 fragments, which corresponds to about 10^40 configurations. The number of configurations that we can explore grows rapidly as the number of fragments increases: it becomes clear that we need a strategy that allows an efficient search for the best solution.
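
The figures above match the number of ways of ordering n distinct pieces, n! (n factorial). The short Python check below assumes that each configuration is simply one ordering of the pieces; the function name is ours and is not part of the game.

```python
import math


def count_orderings(pieces):
    """Number of distinct orderings of `pieces` fragments (n factorial)."""
    return math.factorial(pieces)


# Puzzle sizes used as examples in the text.
for pieces in (8, 16, 35):
    print(f"{pieces} pieces: {count_orderings(pieces):.1e} orderings")

# 8 pieces: 4.0e+04 orderings
# 16 pieces: 2.1e+13 orderings
# 35 pieces: 1.0e+40 orderings
```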

Why can people do it better than the AI? 
In the easier levels, an algorithm can explore all the possible configurations and choose the one with the highest score. The game aims to make the search more efficient by providing tools that guide the player’s eye, so that a forced exploration of all the possible combinations is not needed.
This is done, first of all, through colours. The fragments can take on 4 colours, ranging from red (worst positioned fragments) to green (best positioned fragments). However, we may reach configurations where all the fragments have the same colour. In this case we can use the score associated with each piece: the higher the number, the better the position of the fragment within the sequence.

A game of just 15 pieces has more than 1 trillion possible combinations. A computer, not having the same visual and logical ability as a human, would have to go through all the sequences, calculate the score for each one and, at the end of the process, select the one with the highest score. This is an enormous calculation and takes a very long time. Genigma has games with 35 fragments! To be efficient, the algorithm changes strategy and drastically reduces the number of configurations to be explored, carrying out a local search for the best movement, step by step, without taking into account the entire hierarchy of contacts among the fragments.

As a result, in most cases the algorithm cannot reach the best piece arrangement, leaving aspects of the genome unexplored. Players, however, can reach these configurations and find the best one. Since we do not know a priori the maximum score that can be reached, we use a consensus criterion to define the best configuration, requiring that at least 40 players identify the same solution with the highest score. In this way, the genome organization provided by the game is the result of a collective solution that reflects the consensus of the players.
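
To make the difference between the two strategies concrete, here is a toy Python sketch. It is not the algorithm used by Genigma: the scoring function, the random contact matrix and all names are assumptions made for illustration. The exhaustive search tries every ordering, which is only feasible for a handful of pieces, while the local search accepts any single swap that raises the score and stops when no swap helps, so it may settle on an arrangement that is not the best one.

```python
import itertools
import random


def score(order, contacts):
    """Toy score: contacts between two pieces count more when they sit closer."""
    total = 0.0
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            total += contacts[order[a]][order[b]] / (b - a)
    return total


def exhaustive_search(contacts):
    """Score every possible ordering and return the best one."""
    n = len(contacts)
    return max(itertools.permutations(range(n)), key=lambda o: score(o, contacts))


def local_search(start, contacts):
    """Hill climbing: accept any swap that raises the score, stop when none does."""
    order = list(start)
    best = score(order, contacts)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(order)), 2):
            order[i], order[j] = order[j], order[i]
            candidate = score(order, contacts)
            if candidate > best:
                best, improved = candidate, True
            else:
                order[i], order[j] = order[j], order[i]  # undo the swap
    return tuple(order)


def random_contacts(n, seed=0):
    """Hypothetical symmetric contact matrix with random values."""
    rng = random.Random(seed)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rng.random()
    return matrix


contacts = random_contacts(7)  # 7 pieces: 5,040 orderings, still easy to enumerate
best = exhaustive_search(contacts)
local = local_search(range(7), contacts)
print("exhaustive:", round(score(best, contacts), 3))
print("local     :", round(score(local, contacts), 3))
```

The local search never beats the exhaustive one, and depending on the starting order it can stop short of it. With 35 pieces the exhaustive version is out of reach, which is where the players come in.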

How do you interact with players in Genigma?  
Genigma has been sustained by a transparent and open dialogue with the community from the beginning. We use social networks and in-game messaging to get in contact with players. The idea is to build a live community around the challenge. We use the players’ feedback to update the app and create new content for the web.

We are also collecting players’ game strategies and tactics that have led to high scores in the game: this information is very valuable since, in the future, we could use it to train artificial intelligence to create new algorithms to investigate the genome.

Reference genomes have great benefits for medicine: they allow the discovery of the molecular mechanisms responsible for many diseases and facilitate the diagnosis and development of more specific therapies. Knowing the genomic map of cancer cells can provide us with useful information to understand how they work. In some types of cancer, we do not know if mutations in the sequence compared to the normal genome are the cause or the effect of the cancer itself.

Through the game we will analyze the genome of the cancer cells in parts and in a collaborative way: first we will do a chromosome by chromosome analysis and then we will compare pairs of chromosomes with pairs of chromosomes. Always taking into account the known genome of cells without cancer, we will look in the genome of the cancer cells for the presence of modified or moved fragments or the absence of known sequences.

In analysing breast cancer, we will pay particular attention to certain genes that science knows are especially relevant in this type of cancer.

These are the chromosomes we are analyzing, in the order in which we have released them to the players:

Chromosome 17. (Launched on January 27th) This chromosome contains a high number of breast cancer related genes. Some of them are tumor suppressor genes like TP53 or BRCA1. Tumor suppressor genes are genes that regulate cell growth and division. Thus, if they are mutated, cancer might develop. For example, a reduction of the function of BRCA1 has been associated with about 40% of inherited breast cancers. Other associated genes are oncogenes like MAP2K4 or BCAS3. Oncogenes are genes that, when mutated or overexpressed, can cause cells to survive and proliferate instead of undergoing programmed cell death (apoptosis). ERBB2 is an important oncogene, since its overexpression is associated with 20% of invasive breast carcinomas. Finally, an interesting gene located on chr17 is RAD51C. It lies in a region where amplification occurs frequently in breast tumors, suggesting a role in tumor progression.

Chromosome 10. (Launched on February 1st) This chromosome contains interesting internal rearrangements, like translocations (a portion of a chromosome that breaks and jumps to a different location) or duplications (production of one or more copies of a gene or region of a chromosome). It also contains breast cancer related genes. Some of them (PTEN or FGFR2) are linked with the PI3K/AKT pathway, an important pathway that regulates the cell cycle and plays a role in functions like metabolism, growth, proliferation or cell survival. Therefore, an aberrant activation of this pathway will lead to the survival and proliferation of tumour cells. PTEN is a gene that functions as a tumour suppressor, acting as a negative regulator of the PI3K/AKT pathway. Mutations in this gene will lead to overactivation of the pathway, increasing the risk of cancer. FGFR2 can activate the PI3K/AKT pathway, and participates in cell maturation and bone maintenance. An abnormal increase in the number of copies of this gene has been observed in breast cancer. Other interesting genes located on chromosome 10 are KIF5B and SUFU. The first one encodes a motor protein, and has been observed to be highly expressed in breast cancer. SUFU plays a role in the hedgehog pathway, a signalling pathway that participates in human development and is mostly inactive in the adult organism. Aberrant hedgehog signalling has been linked with various cancer types.

Chromosome 7. (Launched on February 4th) This chromosome contains some internal rearrangements, including an interesting translocation. It contains breast cancer related genes. We can find BRAF and KMT2C, two of the most commonly mutated genes in this type of cancer. BRAF is an oncogene that participates in cell division and differentiation. KMT2C participates in the modification of histones (proteins that package and protect DNA). This gene has a mutation frequency of 8% in breast cancer. We can find other important genes like EGFR, which encodes an epidermal growth factor receptor that drives cell proliferation. Amplifications and mutations of this gene have been shown to be triggering factors in many cancer types.

Chromosome 16. (Launched on February 4th) This chromosome contains small rearrangements and some breast cancer related genes. We can find tumour suppressor genes like PALB2, BRD7 or CTCF. PALB2 helps to repair DNA breaks, and BRD7 has an important role in interacting with the tumour suppressor p53 and preventing tumour growth. CTCF regulates gene expression and is involved in the 3D structure of the genome. It is commonly mutated in breast cancer cell lines and breast tumours. Moreover, chromosome 16 contains CDH1, a gene that encodes a protein involved in adhesion between cells. Mutations in this gene are related to a variety of cancers, since its loss of function is thought to contribute to cancer progression.

Chromosome 1. (Launched on February 11th) This chromosome is the largest human chromosome. It contains some important breast cancer related genes such as MTOR, involved in the PI3K/AKT pathway, with an essential role in cell growth, proliferation, apoptosis and angiogenesis. Deregulation of MTOR has been observed in many cancer types. BCAS2, also on chr1, has been associated with breast cancer since it increases the activity of the oestrogen receptor (ER), and might promote carcinogenic processes in breast cancer cells. Moreover, tumour suppressor genes like SPEN are located on chr1.

Chromosome 13. (Launched on February 11th) This chromosome contains an important breast cancer gene, BRCA2. This gene is involved in DNA double strand break repair. Mutations in this gene have become a hallmark of hereditary breast and ovarian cancers. Moreover, it contains RB1, a gene that negatively regulates the cell cycle and was the first tumor suppressor gene found.

Chromosome 2. (Launched on February 18th) This chromosome contains some breast cancer related genes like BARD1, DNMT3A or SF3B1. The first gene, BARD1, encodes a protein that interacts with another breast cancer gene, BRCA1 (chr17). Their interaction promotes tumour suppression functions, since they participate in double strand break repair and apoptosis. DNMT3A participates in methylation processes, and its downregulation has been related to breast cancer. SF3B1, on the other hand, is involved in RNA splicing. Mutations in this gene have been described in breast cancer.

Chromosome 3. (Launched on February 18th) This chromosome contains an important oncogene, PIK3CA. This gene has the highest mutation frequency in breast cancer, and has been an important focus of cancer studies over the last decade. It has a role in diverse cell functions, including proliferation and survival. Another gene located on this chromosome is SETD2, involved in histone modifications. Mutations in SETD2 are commonly found in cancers, and the gene has a high mutation frequency in phyllodes tumours of the breast (PT), a type of tumour that can be very aggressive.

Chromosome 4. (Launched on February 25th) This chromosome contains some breast cancer related genes like REST, FGF2 or FBXW7. REST acts as a transcriptional repressor of neuronal genes in non-neuronal tissues. Interestingly, it can act as an oncogene or a tumour suppressor depending on the context. FGF2 is a member of the fibroblast growth factor (FGF) family. It participates in wound healing and tumour growth, among other functions. FBXW7 participates in the ubiquitination of proteins. It is a critical tumour suppressor gene, and mutations in this gene have been detected in ovarian and breast cancer cell lines.

Chromosome 5. (Launched on February 25th) This chromosome contains tumour suppressor genes like APC or IRF1. APC encodes a tumour suppressor protein that acts as an antagonist of cell surface signaling pathways. It is involved in cell migration and apoptosis, among other functions. IRF1 acts as a transcriptional regulator and tumour suppressor. It activates the transcription of genes involved in the immune response. Defects in this gene have been associated with some cancer types. Another interesting gene located on chromosome 5 is TERT, which encodes a protein that maintains the telomere ends of the chromosomes. Deregulation of this protein may be involved in oncogenesis.

Chromosome 21. (Launched on March 4th) This chromosome contains an interesting breast cancer related gene, RUNX1. This gene controls the expression of genes essential for cell development. Dysregulation of this gene is associated with many cancers, including breast cancer.

Chromosome 22. (Launched on March 4th) This chromosome is one of the smaller chromosomes. It contains three breast cancer related genes. One of them is PRODH, which encodes a mitochondrial protein involved in processes that produce ATP or reactive oxygen species, and so has a role in cell survival or death. CHEK2 is another gene involved in cell cycle checkpoints and is a tumour suppressor. It stabilizes the tumour suppressor protein p53, leading to cell cycle arrest. Moreover, it interacts with BRCA1 (located on chromosome 17), and is thus involved in cell survival after DNA damage. Mutations in CHEK2 confer predisposition to breast cancer, sarcomas and brain tumours, among others. Finally, APOBEC3A is a gene that encodes a protein involved in immunity. Mutagenesis driven by this gene is one of the major sources of mutations in breast cancer.

Chromosome 6. (Launched on March 11th) This chromosome contains a wide variety of breast cancer related genes. One of them is TRIM27, a transcription repressor involved in cell senescence. It has a role in the development of cancer, since it is highly expressed in cancer cells, leading to cell dysregulation, tumour cell proliferation and migration. It has the potential to serve as a biomarker for cancer patients. MAPK14 is a member of the MAP kinase family. MAP kinases act as an integration point for multiple biochemical signals and are involved in a wide variety of cellular processes. This gene has an essential role in cell migration in breast cancer cells. HSP90AB1 encodes a protein that belongs to the family of heat shock proteins (HSP), which are involved in cell survival, signal transduction and protein folding, among other processes. They have been related to tumour formation and cancer cell proliferation, and are being studied as new therapeutic targets in cancer treatment. Finally, FOXO3 functions as a trigger for apoptosis through the expression of genes necessary for cell death. It is an important tumour suppressor gene in a variety of human cancers.

Chromosome 20. (Launched on March 11th) This chromosome contains BCAS4 and BCAS1, two genes located in the region 20q13.2, a region that undergoes amplification, overexpression and fusion in breast cancer. Amplification of this region is associated with more aggressive tumour phenotypes. Another breast cancer related gene located on this chromosome is CD40. This gene encodes a receptor on antigen-presenting cells of the immune system, which mediates a broad variety of immune and inflammatory responses. It is a member of the tumour necrosis factor receptor (TNFR) family, whose proteins mount antitumour responses against cancer cells. CD40 is broadly expressed on the surface of immune cells and in diverse cancer types, including breast cancer.

Chromosome 19. (Launched on March 25th) This chromosome contains the tumour suppressor gene STK11, which regulates cell polarity, is involved in the cell cycle, and is altered in almost 3% of cancers, including lung adenocarcinoma and breast invasive ductal carcinoma. Moreover, we can find other genes like CCNE1 or KCNN4. Overexpression of CCNE1 has been observed in many tumors; it results in chromosome instability and may contribute to tumorigenesis. KCNN4 encodes a protein that participates in the formation of potassium channels in the cell membrane. This gene has been shown to be a modulator of progression and drug resistance in breast cancer. Targeting this gene might serve as a therapeutic strategy.

Chromosome 8. (Launched on March 25th) This chromosome contains some cancer related genes like LOXL2, MYC or NDRG1. LOXL2 encodes a protein essential for the biogenesis of connective tissue. It also allows the cross-linking of collagen and elastin in the extracellular matrix of tumors, facilitating the process of metastasis. It is of particular interest in cancer biology since it is highly expressed in some tumors and affects the proliferation of breast cancer cells. Amplification of MYC is observed in numerous human cancers. Moreover, MYC is highly expressed in triple-negative breast cancer, the most aggressive breast cancer subtype. Finally, NDRG1 encodes a cytoplasmic protein involved in stress responses, cell growth and differentiation. It drives tumor progression and brain metastasis in aggressive breast cancers, so it may serve as a therapeutic target and prognostic biomarker.

Chromosome 9. (Launched on April 1st) This chromosome contains the NOTCH1 gene, an important gene as it is part of a signalling pathway involved in processes related to cell fate specification, differentiation and proliferation. Increased levels of Notch receptors have been observed in several types of cancer, including breast cancer. In addition, chromosome 9 contains the SMC5 gene, which is involved in DNA recombination, cellular senescence and the repair of DNA double-strand breaks. Changes in the expression of this gene have been observed in breast cancer patients.

Chromosome 11. (Launched on April 8th). This chromosome contains the gene ATM, which encodes a protein that acts as an important cell cycle checkpoint. It regulates a wide variety of proteins, including p53 and BRCA1, two important tumor suppressors. Mutations in the ATM gene are associated with an increased risk of breast cancer development and a worse prognosis.

Chromosome 12. (Launched on April 8th). This chromosome contains three interesting breast cancer genes. CD9 encodes a cell surface glycoprotein. It participates in differentiation, adhesion and signal transduction. Expression of this gene plays a critical role in the suppression of cancer cell motility and metastasis. The ETV6 gene is involved in protein-protein interactions. Rearrangements of this gene have been seen in patients with secretory breast carcinoma, an uncommon type of breast cancer that usually has a favorable outcome. The third gene is MDM2, which encodes a ubiquitin ligase. It can promote tumor formation by targeting tumor suppressor proteins, such as p53. Overexpression of this gene is detected in a variety of different cancers.

Chromosome 14: (Launched on April 22nd). This chromosome contains the DICER1 gene. It encodes a miRNA processing protein that regulates gene expression. The processing of miRNA has been related to a broad range of cancer types, and mutations in DICER1 have accordingly been related to cancer development. It is also known as a strong antiviral agent with activity against RNA viruses like SARS-CoV-2. Another gene on this chromosome is TEP1. It encodes a protein that catalyzes the addition of new telomeres on the chromosome ends. Since telomere length has been related to breast cancer, it is a particularly interesting gene under study. The gene AKT1 is also present on this chromosome: it is a known oncogene that plays a role in cell survival, angiogenesis and tumor formation.

Chromosome 15: (Launched on April 22nd). This chromosome contains some cancer related genes like SMAD3, a tumour suppressor gene. It transmits signals from the cell surface to the nucleus, regulating gene activity and cell proliferation. NTRK3 encodes a protein that acts as a membrane-bound receptor. Mutations in this gene have been associated with medulloblastomas, secretory breast carcinomas and other cancers. This chromosome also contains the IGF1R gene, whose protein binds insulin-like growth factor. It is highly overexpressed in malignant tissues, where it enhances cell survival.

Chromosome 18: (Launched on April 22nd). This chromosome contains the gene SMAD4, which encodes a signal transduction protein. Low expression of this gene has been related to the development of breast cancer. This is because SMAD4 is activated by the growth factor TGF-beta, which is often altered in various tumor types. Another gene present on this chromosome is BCL2. This gene encodes an integral outer mitochondrial membrane protein that blocks the apoptotic death of some cells such as lymphocytes. Constitutive expression of BCL2 is thought to be the cause of follicular lymphoma, a cancer type that affects lymphocytes, a kind of white blood cell.

Chromosome X: (Launched on April 22nd). This chromosome contains the gene KDM6A, which encodes UTX, a histone demethylase involved in embryonic development. Mutations in this gene have been described in a wide range of cancers, including breast cancer. Another gene on this chromosome is XIST, which participates in the X chromosome silencing process in mammalian females. This process provides dosage equivalence between males and females, since males have only one X chromosome. XIST is expressed exclusively from the X chromosome that is inactivated. Expression of this gene has been found to be dysregulated in a variety of human cancers when compared to normal cells.