weix.us

Filtering the Firehose

To keep up with publications and preprints in my research, I find it useful to get everything delivered in one place. By generating RSS feeds from a set of specific search queries, one can filter for relevant work and new publications, as opposed to sifting through five or ten whole journals’ worth of content.

This post serves as a companion to the slides from my literature workshop with the UW-Madison ASBMB.

1. Install an RSS reader.

Zotero, a popular free and open-source citation software, can read RSS feeds. If you don’t want to install anything new and want to do all of your literature work in one app, I’d recommend this one. Otherwise, there are lots of great specialty readers to choose from: NetNewsWire (free, macOS/iOS), Outlook (Windows), Inoreader (Web), Feeder (Android), and others. Once you’ve got your feed reader of choice, you can start thinking about what to put into it. This will be your one-stop shop for keeping up with new publications from different labs and areas of research.

2. Assemble mini-journals.

While the big journals have their own RSS feeds, each one mirroring the latest table of contents, skimming each one can become tedious. I bypass most of these journals, following only a few smaller journals in the areas I’m interested in. To replace the large journals, I make a set of “mini-journals” about specific topics (or following specific authors/labs) using PubMed. Using a search query, you can tell PubMed to generate an RSS feed. This will pull in papers even from obscure journals that you may not otherwise follow, as well as nabbing relevant papers from the big journals.

For each “mini-journal,” we want the search to return the perfect results for a specific topic. Envision the ideal paper: what keywords appear in it? What topics does it cover? Even four or five specific words can be the difference between a wide search and a narrow one. Adding more terms makes your search more specific, and this is usually better. In his Power Reporting Tutorial, Bill Dedman suggests that we think of searches as a sort of “zooming lens” on the internet. An initial search should be specific enough that you get very few (or even zero) results! This way, you can “zoom back out” by removing restrictions, rather than adding them.

To achieve this, we turn to Boolean logic: the language of true/false statements. George Boole’s framework gives us a set of operators that can make our searches more powerful and specific. In fact, you’ve probably used them at some point! “AND,” “OR,” and “NOT” are the staple operations in Boolean algebra, as well as standard modifiers in search engines and databases. Here’s what they mean in plainer language:

  • “X” AND “Y”: Return all results that meet both condition X and condition Y. For example, “apples” AND “pies” would show us a list of pages that mention both apples and pies.
  • “X” OR “Y”: Return all results that meet condition X or condition Y. For example, “apples” OR “pies” would show us a list of pages that mention apples or pies, or even both.
  • “X” NOT “Y”: Return all results that meet condition X, but do not meet condition Y. For example, “apples” NOT “pies” would show us a list of pages that mention apples, but do not mention pies.
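One way I like to picture these three operators is as set operations on the pages that match each term. The sketch below is only a toy model, not how any real search engine is implemented: the page names and their contents are made up, and `matching` is a hypothetical helper.

```python
# Toy "index": each page is represented by the set of words it mentions.
# The page names and contents are invented for illustration.
pages = {
    "page_a": {"apples", "pies"},
    "page_b": {"apples", "cider"},
    "page_c": {"pies", "crust"},
}

def matching(term):
    """Return the names of all pages that mention `term`."""
    return {name for name, words in pages.items() if term in words}

apples = matching("apples")
pies = matching("pies")

both = apples & pies         # "apples" AND "pies": set intersection
either = apples | pies       # "apples" OR "pies": set union
only_apples = apples - pies  # "apples" NOT "pies": set difference

print(sorted(both), sorted(either), sorted(only_apples))
```

Notice that AND can only shrink a result set, while OR can only grow it. That is why AND is the tool for narrowing a search and OR is the tool for zooming back out.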

The PubMed Advanced Search tool helps with structuring these. Once I got the hang of it, I found it easier to write my own searches rather than use the tool.

Example Feed: “Killer Yeast”

As an example, I’ll walk through a feed that I generated to find papers related to my interests in toxin/antitoxin systems in yeast (colloquially called “killer yeast”). An initial search for just “killer yeast” yields 97 results, which is actually quite good—though it may miss some papers that are just about the mechanisms of the system.

To get everything related to the killer yeast systems, I looked for other names associated with them: “k1 toxin,” “k2 toxin,” “killer toxin,” and “killer virus.” We can then search for either “killer yeast” itself or papers that mention “yeast” along with any one of these terms.

"killer yeast" OR ("yeast" AND ("killer toxin" OR "killer virus" OR "k1 toxin"
OR "k2 toxin"))
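If you end up maintaining several of these searches, it can help to assemble the query string from a list of terms instead of typing the parentheses by hand. Here is a minimal sketch that rebuilds the query above; `quoted` and `any_of` are hypothetical helper names, and the `?term=` URL pattern is my assumption about how PubMed search links are formed, so check it against a search you have run in your browser. The RSS feed itself is still generated from PubMed, not by this code.

```python
from urllib.parse import urlencode

def quoted(term):
    """Wrap a term in double quotes so it is searched as an exact phrase."""
    return f'"{term}"'

def any_of(terms):
    """Combine terms into a parenthesized OR group."""
    return "(" + " OR ".join(quoted(t) for t in terms) + ")"

# Other names for the killer yeast systems, taken from the text above.
aliases = ["killer toxin", "killer virus", "k1 toxin", "k2 toxin"]

# Either the phrase itself, or "yeast" together with any one alias.
query = f'{quoted("killer yeast")} OR ({quoted("yeast")} AND {any_of(aliases)})'

# Assumed URL pattern for a PubMed search; urlencode escapes quotes and spaces.
search_url = "https://pubmed.ncbi.nlm.nih.gov/?" + urlencode({"term": query})

print(query)
print(search_url)
```

Adding a new alias then only means appending to the list, which keeps the parentheses balanced as the search grows.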

Now we have 582 results, which is much better! While that might look like a lot to comb through at first, the publication history shows a steady rate of around 7-15 papers per year since the 1970s. That’s exactly what I want for this mini-journal: specific enough to usually be relevant and in an area where papers are still being published.

When making these journals, play around with synonyms, pluralizations, and other little tweaks to words that can change how a search works. Once you have one that you’re satisfied with, generate an RSS feed and load it into your reader program of choice.
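To keep track of the tweaks you have tried, one option is to write the variants down explicitly and let a few lines of code turn them into an OR group. This is only a sketch: the plural forms below are my own illustrative picks, and `or_group` is a hypothetical helper. I list the variants by hand because automatic pluralization gets words like “virus” wrong.

```python
# Hand-picked variants for each base term (illustrative choices only).
variants = {
    "killer toxin": ["killer toxins"],
    "killer virus": ["killer viruses"],
}

def or_group(variant_map):
    """Join every base term and its variants into one quoted OR group."""
    terms = []
    for base, extras in variant_map.items():
        for term in [base, *extras]:
            if term not in terms:  # skip duplicates, keep first-seen order
                terms.append(term)
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

group = or_group(variants)
print(group)
```

Paste the resulting group into a search, compare the result count with and without each variant, and drop the ones that add nothing.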

Further Reading

  • The Fraser Lab Method of Following the Scientific Literature (James Fraser, 2019) offers a method similar to this article. Fraser advocates adding in feeds for the biggest journals, and includes URLs for many of them. I prefer a more heavily curated feed, since it means less to sort through. Try out both and see which you like!
  • The Power Reporting Tutorial (Bill Dedman, 1997) explains Boolean search operators. Although AltaVista Advanced has long since passed to the great beyond, Boolean logic remains very popular in modern search engines and databases.
  • Automating Academic Literature Searches With RSS Feeds and Google Reader [PubMed] (Erick M. Dubuque, 2011) outlines the steps for doing something like this with the now-defunct Google Reader. However, the tutorial remains thorough and usable with other software. It also offers a great explanation of PubMed search syntax!
  • arXiv, bioRxiv, and other preprint servers also have RSS feeds. These can be a great way to stay up-to-date with the newest work in an area, though this research hasn’t yet been peer-reviewed.

Updated by Elliott Weix.