In this post I want to outline how I prepare my related work for a new or ongoing project, how I keep up to date with the literature for it, and how I convert all that reading into a related work section. I’ve experimented with my approach over the last 2 years and tried different variations and strategies. My goal is to minimize time spent searching and reading, while maximizing sensitivity and depth.
I realize that there is no one-size-fits-all approach to this, and you have to see what works for you. My intention here is to show what I’m currently doing, and I hope that it might spark some ideas on what you can try. You’re of course free to simply copy my approach and see how well it works for you (Spoiler: It’s amazing and the best approach ever!), but you will probably want to tweak things to suit your needs. If you do, let me know what you do differently, so I can improve my approach as well 🙂
The method I present here is based around a spreadsheet. I like spreadsheets because they are easy to modify, sort, and filter. The main selling point for me is that I can easily add another column to track some aspect of my collection of papers that I find important. It also gives a good sense of progress, because I add papers even if I don’t want to include them in the final paper, and simply mark them as excluded with a reason. It gives me peace of mind that I’ve done something during the day, even if I didn’t find any new relevant papers. Here’s a step-by-step breakdown:
- Identify “seed” papers – Usually, when I start a project it’s because I got an idea from reading a paper, or a small collection of papers. I set these aside as the first entries of my spreadsheet. If you don’t have any seed papers, try Google Scholar with search terms that seem related / relevant. Google Scholar is quite forgiving when it comes to search terms, so you’ll likely find something relevant this way. The idea isn’t to be exhaustive, but just to get a starting point.
- Add the seed papers to the spreadsheet. I usually start with 5 columns: (1) Title, (2) Authors, (3) CitationKey, (4) Included, (5) Reason. I also create a new folder in Mendeley with the name of my project. The CitationKey in the spreadsheet is the key I give to the paper in Mendeley; this has proved helpful when cross-referencing papers. In Included, I put yes if I want to cite the paper in my project, or no otherwise. In the Reason column, I write a short bullet point explaining why I decided to include or exclude the paper.
- Expand the pool – For each seed paper I look through the related work, and the titles of citations from the last 3 years (via Google Scholar) to find anything that stands out. All the interesting titles get added to the spreadsheet for later reading. I also scroll through older citations, see if any of those papers received a significant amount of attention (high citation count), and add those papers as well if the title looks interesting.
- Filter the expanded pool – Next I will go through the identified papers and check the abstract, again sorting out anything that I can definitely say isn’t relevant. At this point I usually add another column to my spreadsheet called “where”. In it I note where I made the decision to discard the paper: at the title, the abstract, or the full text. This helps, because if I discard a paper somewhere in the full text and look at it 3 months later, the abstract will look relevant again. While working through the expanded pool, I will also look for any jargon and buzzwords that I see frequently or that stand out to me. If I find a paper relevant, I will add it to my Mendeley library and the project folder. I will then create a citation alert for it in Google Scholar, informing me about anything new that’s coming up; I remove all alerts once the project is done.
- Get important keywords – This is a fun step I’ve been experimenting with recently. I will take all the papers that have been relevant so far and extract their keywords into a new sheet. I will then count them and rank them by number of occurrences. I add the top 10 or so to the list of buzzwords I identified earlier.
- Create a list – Equipped with this list of keywords, jargon, and buzzwords I head over to Web of Science and Scopus and construct a search query. Using the already identified papers from steps 1 – 4, I can measure how inclusive my query is by checking how many of them it returns. The idea is to err on the side of being too broad and overinclusive – this list will shrink quickly. I will then export the lists from each platform, merge them, remove the duplicates that show up in both engines, and add all these papers to my spreadsheet. Don’t get discouraged if it’s several hundred or even thousands; I read maybe 5% of them.
- Exclude by Title – I will then read through the titles of each paper and exclude all those that I think don’t fit the project. This usually removes around 70% of the list. The reasons why something isn’t relevant tend to cluster, so I sometimes add another column to my spreadsheet where I write down if the excluded paper falls into any of these clusters.
- Exclude by Abstract – Same as step 7 (Exclude by Title), but this time with the abstract. Roughly 10% of the papers survive this stage in my experience.
- Exclude by Full Text – If a paper makes it this far, it goes onto my reading list. I will put it into my Mendeley library, and create a note there summarizing the key takeaway (I try to stick to 180 characters; it’s a fun exercise I got from one of the links below). I will do this, even if I decide to exclude a paper at this stage. I spent effort reading the paper, so I want to make the most of it, and when I later re-encounter the paper I have a summary already.
- Add a relevant paper – If a paper survived until now, it is relevant. I will add a reason why it is relevant, the citation key, and any other aspect that I might find important to my spreadsheet. Again I will create a citation alert for this paper on Google Scholar, and, if I feel like it, I will also look at the last 3 years of citations in Google Scholar, adding any relevant-looking titles to my spreadsheet; for these I will repeat steps 7 – 10.
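The keyword counting, the merging of exports, and the check of how inclusive a query is can all be done by hand in the spreadsheet. As a rough illustration only, here is a minimal Python sketch of those three operations. It is not tooling from this post: the record layout (dicts with `title` and `keywords` fields), the function names, and matching duplicates by normalized title are all assumptions made for the example, and the paper titles are placeholders.

```python
import re
from collections import Counter


def normalize_title(title):
    """Lowercase a title and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def rank_keywords(papers, top_n=10):
    """Count keywords across the relevant papers and return the top_n most frequent."""
    counts = Counter(
        kw.strip().lower()
        for paper in papers
        for kw in paper.get("keywords", [])
        if kw.strip()
    )
    return counts.most_common(top_n)


def merge_exports(*exports):
    """Merge several export lists, keeping the first record seen for each normalized title."""
    merged = {}
    for export in exports:
        for paper in export:
            merged.setdefault(normalize_title(paper["title"]), paper)
    return list(merged.values())


def query_recall(known_relevant_titles, results):
    """Fraction of the already-known relevant papers that the query results also contain."""
    known = {normalize_title(t) for t in known_relevant_titles}
    found = {normalize_title(p["title"]) for p in results}
    return len(known & found) / len(known) if known else 0.0


# Placeholder data standing in for the spreadsheet and the two database exports.
relevant = [
    {"title": "Paper A", "keywords": ["Gaze", "human-robot interaction"]},
    {"title": "Paper B", "keywords": ["gaze", "Attention"]},
]
wos_export = [{"title": "Paper A"}, {"title": "Paper C"}]
scopus_export = [{"title": "paper a."}, {"title": "Paper D"}]

merged = merge_exports(wos_export, scopus_export)
print(rank_keywords(relevant, top_n=1))
print(len(merged))
print(query_recall([p["title"] for p in relevant], merged))
```

A low recall value suggests the query is too narrow and should be broadened before exporting. Title matching is a deliberately crude duplicate check; if the exports contain DOIs, matching on those would likely be more reliable.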
At this point I feel confident to claim that I have a pretty solid overview of the literature surrounding my project. All the citation alerts will also keep me up to date. They are likely over-inclusive and will spam me a little, but I just pile them up and go through everything once a week, or bi-weekly depending on how busy I am. Again, I add interesting titles to the spreadsheet and filter them as laid out above. This is also a good time to look through paper suggestions from my supervisors (mine are typically amazing).
The only step that remains is something I do when writing up the full paper. Once the venue is decided, I will run a relaxed version of my search query over all the papers published at this venue. I will also manually scroll through the last 3 – 5 years of the venue. I really do not want to miss anything from the venue that could be even remotely relevant. I am also a lot less strict for those papers regarding inclusion. The reason is two-fold:
(1) It makes it easy for journal editors to see why my contribution is relevant to the venue. From a Machiavellian perspective I boost their citation count and h-index. Editors like their journal to be important, and this helps them, which is attractive. From a scientific perspective, I clearly work on stuff that the community is working on, too. This makes it harder to argue that my work is a poor fit.
(2) It makes it easier for the editors to find reviewers. The chances of the editor being an expert in my field are slim. They may have a Ph.D. in robotics or computer science, but will likely work in a different niche. Off the top of your head, could you suggest 3+ good reviewers for the papers all your in-office colleagues are working on? I couldn’t, and chances are your editor / meta reviewer can’t either. Help them help you, by showing which previous contributions are relevant to your work (by citing them). Here is a question on this topic I asked on StackExchange a while back that might be insightful.
With all the literature reviewed, it’s time to put together a related work section. Probably the most hated section amongst computer science PhDs I’ve talked to. It’s a lot of work, frustrating, unrewarding, and rarely relevant to what we have worked on for the past few months, right? Well, not quite. Working in a cross-disciplinary field, and working with an amazing colleague who is hands down the best when it comes to knowing related work (she seems to know every paper ever written; it’s uncanny), has really broadened my horizon in this area; I am very thankful for that.
What I learned is that in CS our view on related work is dominated by the idea “show how your work differs from other, previous work, and how your work addresses shortcomings, issues, and problems of other people’s work”. In psychology the approach is “show how your work builds on top of previous work, how previous authors have done a great job getting this far, and how you pick up their torch and carry it another mile”. I will let you be the judge of which mindset is more positive, productive, and leads to a better paper. Also, what would you as a reviewer like to read? A text that tells you how your paper sucks, and how someone else claims to have done something better (even though they totally misunderstood your intention, and look at a domain you didn’t even target), or a text that tells you how your work is important and how it can be used in other areas than what you originally thought of if you just tweak things slightly?
Going back to the spreadsheet idea I laid out above, for each paper that is included, I already have a bullet point on why this particular paper is relevant. It’s a straightforward exercise to reorganize the sheet and create loose clusters of papers that talk about similar topics based on these bullet points, or based on their content (which I like to summarize in a Mendeley note). From here, I can literally copy the bullet point, which essentially is one sentence, and the 180-ish characters from my Mendeley library into an empty Google Doc. I can then create paragraphs based on the clustering I did beforehand, fix some spelling, and add the citation keys from my spreadsheet to each sentence / paragraph. Et voilà! First draft of the related work section completed.
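For anyone who would rather script the copy-and-paste part, here is a minimal sketch of how that assembly could look in Python. It is an illustration under stated assumptions, not what I actually run: the row fields (`cluster`, `reason`, `note`, `citation_key`, `included`), the function name, and the `[key]` citation placeholder are all hypothetical, and the rows are dummy data.

```python
from collections import defaultdict


def draft_related_work(rows):
    """Group included rows by cluster and join reason + note + key into one paragraph per cluster."""
    clusters = defaultdict(list)
    for row in rows:
        if row["included"].strip().lower() != "yes":
            continue  # excluded papers stay in the sheet but not in the draft
        reason = row["reason"].rstrip(".")
        note = row["note"].rstrip(".")
        clusters[row["cluster"]].append(f"{reason}. {note} [{row['citation_key']}].")
    # One paragraph per cluster, in the order the clusters first appear in the sheet.
    return "\n\n".join(" ".join(sentences) for sentences in clusters.values())


# Dummy rows standing in for the spreadsheet (hypothetical column names).
rows = [
    {"cluster": "x", "reason": "Reason A", "note": "Note A", "citation_key": "keyA", "included": "yes"},
    {"cluster": "y", "reason": "Reason B", "note": "Note B", "citation_key": "keyB", "included": "yes"},
    {"cluster": "x", "reason": "Reason C", "note": "Note C", "citation_key": "keyC", "included": "yes"},
    {"cluster": "x", "reason": "Reason D", "note": "Note D", "citation_key": "keyD", "included": "no"},
]

print(draft_related_work(rows))
```

The output is only raw material: the sentences still need to be smoothed into readable paragraphs, and the `[key]` placeholders swapped for whatever citation syntax the paper uses.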
A limitation of this approach is that it currently ignores arXiv and other pre-print sites. I’m experimenting with those, but haven’t really found a good way to navigate them yet that has a good result-to-time-spent ratio. If you have any ideas or tips, please let me know in the comments below!
Thanks for reading and happy researching!