Since we recently announced our $10,001 Binary Battle to promote applications built on the Mendeley API (now including PLoS as well), I decided to take a look at the data to see what people have to work with. My analysis focused on our second largest discipline, Computer Science. Biological Sciences (my discipline) is the largest, but I started with this one so that I could look at the data with fresh eyes, and also because it’s got some really cool papers to talk about. Here’s what I found:
The result was a fascinating list of topics, with many of the expected fundamental papers like Shannon’s Theory of Information and the Google paper, a strong showing from MapReduce and machine learning, but also some interesting hints that augmented reality may soon become more of an actual reality.
The top graph summarizes the overall results of the analysis. This graph shows the Top 10 papers among those who have listed computer science as their discipline and chosen a subdiscipline. The bars are colored according to subdiscipline and the number of readers is shown on the x-axis. The bar graphs for each paper show the distribution of readership levels among subdisciplines. 17 of the 21 CS subdisciplines are represented and the axis scales and color schemes remain constant throughout. Click on any graph to explore it in more detail or to grab the raw data. (NB: A minority of Computer Scientists have listed a subdiscipline. I would encourage everyone to do so.)
1. Latent Dirichlet Allocation (available full-text)
LDA is a means of classifying objects, such as documents, based on their underlying topics. I was surprised to see this paper as number one instead of Shannon’s information theory paper (#7) or the paper describing the concept that became Google (#3). It turns out that interest in this paper is very strong among those who list artificial intelligence as their subdiscipline. In fact, AI researchers contributed the majority of readership to 6 out of the top 10 papers. Presumably, those interested in popular topics such as machine learning list themselves under AI, which explains the strength of this subdiscipline, whereas papers like the MapReduce one or the Google paper appeal to a broad range of subdisciplines, giving those papers smaller numbers spread across more subdisciplines. Professor Blei is also a bit of a superstar, so that didn’t hurt. (The irony of a manually-categorized list with an LDA paper at the top has not escaped us.)
2. MapReduce : Simplified Data Processing on Large Clusters (available full-text)
It’s no surprise to see this in the Top 10 either, given the huge appeal of this parallelization technique for breaking down huge computations into easily executable and recombinable chunks. The importance of the monolithic “Big Iron” supercomputer has been on the wane for decades. The interesting thing about this paper is that it had some of the lowest readership scores of the top papers within a subdiscipline, but folks from across the entire spectrum of computer science are reading it. This is perhaps expected for such a general purpose technique, but given the above it’s strange that there are no AI readers of this paper at all.
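To make "easily executable and recombinable chunks" a bit more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, counting words in Python. The function names and the sample documents are made up for illustration; nothing here is taken from the paper or from a real framework, and nothing is actually distributed.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # Each document could be mapped on a different machine; here we just loop.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical input chunks.
counts = word_count(["big data big clusters", "big iron"])
```

Because each map call only sees its own chunk and each reduce call only sees one key, both steps can in principle be spread over many machines, which is where the appeal comes from.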
3. The Anatomy of a large-scale hypertextual search engine (available full-text)
In this paper, Google founders Sergey Brin and Larry Page discuss how Google was created and how it initially worked. This is another paper that has high readership across a broad swath of disciplines, including AI, but wasn’t dominated by any one discipline. I would expect that the largest share of readers have it in their library mostly out of curiosity rather than direct relevance to their research. It’s a fascinating piece of history related to something that has now become part of our everyday lives.
4. Distinctive Image Features from Scale-Invariant Keypoints
This paper was new to me, although I’m sure it’s not new to many of you. It describes how to identify objects in a video stream without regard to how near or far away they are or how they’re oriented with respect to the camera. AI again drove the popularity of this paper in large part and to understand why, think “Augmented Reality”. AR is the futuristic idea most familiar to the average sci-fi enthusiast as Terminator-vision. Given the strong interest in the topic, AR could be closer than we think, but we’ll probably use it to layer Groupon deals over shops we pass by instead of building unstoppable fighting machines.
5. Reinforcement Learning: An Introduction (available full-text)
This is another machine learning paper and its presence in the top 10 is primarily due to AI, with a small contribution from folks listing neural networks as their discipline, most likely due to the paper being published in IEEE Transactions on Neural Networks. Reinforcement learning is essentially a technique that borrows from biology, where the behavior of an intelligent agent is controlled by the amount of positive stimuli, or reinforcement, it receives in an environment where there are many different interacting positive and negative stimuli. This is how we’ll teach the robots behaviors in a human fashion, before they rise up and destroy us.
6. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions (available full-text)
Popular among AI and information retrieval researchers, this paper discusses recommendation algorithms and classifies them into collaborative, content-based, or hybrid. While I wouldn’t call this paper a groundbreaking event of the caliber of the Shannon paper (#7), I can certainly understand why it makes such a strong showing here. If you’re using Mendeley, you’re using both collaborative and content-based discovery methods!
7. A Mathematical Theory of Communication (available full-text)
Now we’re back to more fundamental papers. I would really have expected this to be at least number 3 or 4, but the strong showing by the AI discipline for the machine learning papers in spots 1, 4, and 5 pushed it down. This paper discusses the theory of sending communications down a noisy channel and demonstrates a few key engineering parameters, such as entropy, a measure of the average information carried by a message. It’s one of the more fundamental papers of computer science, founding the field of information theory and enabling the development of the very tubes through which you received this web page you’re reading now. It’s also the first place the word “bit”, short for binary digit, is found in the published literature.
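For the curious, entropy is easy to compute for a short message. The sketch below estimates it in bits per symbol from the symbol frequencies observed in the message itself, using the standard formula H = -Σ p·log2(p). The function name and sample strings are illustrative assumptions, and this is only a toy estimate, not Shannon's full channel model.

```python
import math
from collections import Counter

def shannon_entropy(message):
    """Entropy in bits per symbol, using observed symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    # Sum p * log2(p) over every distinct symbol, then negate.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Two equally likely symbols need one bit each; four need two bits each.
two_symbols = shannon_entropy("abab")
four_symbols = shannon_entropy("abcd")
```

A message that repeats a single symbol carries no surprise at all, so its entropy is zero.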
8. The Semantic Web (available full-text)
In The Semantic Web, Tim Berners-Lee, Sir Tim, the inventor of the World Wide Web, describes his vision for the web of the future. Now, 10 years later, it’s fascinating to look back through it and see on which points the web has delivered on its promise and how far away we still remain in so many others. This is different from the other papers above in that it’s a descriptive piece, not primary research, but it still deserves its place in the list, and readership will only grow as we get ever closer to his vision.
9. Convex Optimization (available full-text)
This is a very popular book on a widely used optimization technique in signal processing. Convex optimization tries to find the provably optimal solution to an optimization problem, as opposed to a nearby maximum or minimum. While this seems like a highly specialized niche area, it’s of importance to machine learning and AI researchers, so it was able to pull in a nice readership on Mendeley. Professor Boyd has a very popular set of video classes at Stanford on the subject, which probably gave this a little boost, as well. The point here is that print publications aren’t the only way of communicating your ideas. Videos of techniques at SciVee or JoVE or recorded lectures can really help spread awareness of your research.
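To see why convexity matters, consider the toy sketch below. It runs plain gradient descent on the convex function f(x) = (x - 3)^2 from several starting points, and every run ends up at the same global minimum. The function, step size, and starting points are my own illustrative choices, not material from the book, which covers far more capable methods.

```python
def gradient_descent(grad, start, step=0.1, iterations=200):
    """Repeatedly step against the gradient from a given starting point."""
    x = start
    for _ in range(iterations):
        x -= step * grad(x)
    return x

def grad_f(x):
    # Derivative of the convex function f(x) = (x - 3)**2
    return 2 * (x - 3)

# Very different starting points, same destination.
results = [gradient_descent(grad_f, s) for s in (-10.0, 0.0, 25.0)]
```

On a non-convex function the same loop could settle into different local minima depending on where it starts, which is exactly the "nearby minimum" problem.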
10. Object recognition from local scale-invariant features (available in full-text)
This is another paper on the same topic as paper #4, and it’s by the same author. Looking across subdisciplines as we did here, it’s not surprising to see two related papers, of interest to the main driving discipline, appear twice. Adding the readers from this paper to the #4 paper would be enough to put it in the #2 spot, just below the LDA paper.
So what’s the moral of the story? Well, there are a few things to note. First of all, it shows that Mendeley readership data is good enough to reveal both papers of long-standing importance as well as interesting upcoming trends. Fun stuff can be done with this! How about a Mendeley leaderboard? You could grab the number of readers for each paper published by members of your group, and have some friendly competition to see who can get the most readers, month-over-month. Comparing yourself against others in terms of readers per paper could put a big smile on your face, or it could be a gentle nudge to get out to more conferences or maybe record a video of your technique for JoVE or Khan Academy or just YouTube.
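A leaderboard like that takes only a few lines once you have the reader counts. Here is a rough sketch in Python that ranks group members by readers per paper. The names and numbers are invented for illustration, and how you fetch the counts is left out entirely.

```python
def leaderboard(reader_counts):
    """Rank authors by mean readers per paper, highest first."""
    rows = []
    for author, counts in reader_counts.items():
        if counts:  # skip authors with no papers to avoid dividing by zero
            rows.append((author, sum(counts) / len(counts)))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Made-up reader counts, one number per paper.
group = {"alice": [120, 30], "bob": [45], "carol": [10, 20, 30]}
ranking = leaderboard(group)
```

Storing one snapshot of `group` per month and comparing the rankings would give you the month-over-month competition.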
Another thing to note is that these results don’t necessarily mean that AI researchers are the most influential researchers or the most numerous, just the best at being accounted for. To make sure you’re counted properly, be sure you list your subdiscipline on your profile, or if you can’t find your exact one, pick the closest one, like the machine learning folks did with the AI subdiscipline. We recognize that almost everyone does interdisciplinary work these days. We’re working on a more flexible discipline assignment system, but for now, just pick your favorite one.
These stats were derived from the entire readership history, so they do reflect a founder effect to some degree. Limiting the analysis to the past 3 months would probably reveal different trends and comparing month-to-month changes could reveal rising stars.
To do this analysis I queried the Mendeley database, analyzed the data using R, and prepared the figures with Tableau Public. A similar analysis can be done dynamically using the Mendeley API. The API returns JSON, which can be imported into R using the fine RJSONIO package from Duncan Temple Lang, and Carl Boettiger is implementing the Mendeley API in R. You could also interface with the Google Visualization API to make motion charts showing a dynamic representation of this multi-dimensional data. There’s all kinds of stuff you could do, so go have some fun with it. I know I did.
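If R isn't your thing, the same kind of aggregation is straightforward in Python. The sketch below parses a JSON string and totals readers per subdiscipline. Note that the payload shape, the `readers_by_subdiscipline` field, and the numbers are all hypothetical stand-ins; check the API documentation for the real response format before relying on any of it.

```python
import json
from collections import Counter

# Hypothetical payload; the real API response will look different.
SAMPLE_RESPONSE = """
[
  {"title": "Paper A",
   "readers_by_subdiscipline": {"Artificial Intelligence": 40,
                                "Information Retrieval": 10}},
  {"title": "Paper B",
   "readers_by_subdiscipline": {"Artificial Intelligence": 5,
                                "Neural Networks": 25}}
]
"""

def readers_per_subdiscipline(raw_json):
    """Sum reader counts across papers, keyed by subdiscipline."""
    totals = Counter()
    for paper in json.loads(raw_json):
        # Counter.update adds the counts from each paper's mapping.
        totals.update(paper.get("readers_by_subdiscipline", {}))
    return totals

totals = readers_per_subdiscipline(SAMPLE_RESPONSE)
top = totals.most_common(1)
```

From here, `totals` can be handed to whatever charting tool you prefer.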
PhD candidates: You are welcome and encouraged to deposit your dissertation here, but be aware that
1) it is optional, not required (the ProQuest deposit is required); and
2) it will be available to everyone on the Internet; there is no embargo for dissertations in the UNL DigitalCommons.
Master's candidates: Deposit of your thesis or project is required. (If an embargo [restricted access] is necessary, you may deposit the thesis at http://digitalcommons.unl.edu/embargotheses/, but only after getting the prior approval of your department and the Graduate Office; contact Terri Eastin.)
All depositors: We try to observe a 24-hour "cooling off" period to give you opportunity to correct those "oops" issues that seem to emerge just after deposit.
Upon deposit, you will immediately receive an email that your submission has been received (and this is what you need to show the Graduate Office).
However, you can still log back in and select Revise and upload a new version with your advisor's name spelled right, or your mother thanked in the Acknowledgments, or whatever you're stressing about.
After about a day, your submission will be "published" or "posted", making it available to the Internet; you will get another email to that effect, and your submission can no longer be changed--by you.
If further changes are needed, these can be made by sending a revised file to the administrator < firstname.lastname@example.org > requesting replacement of the current online version. DO NOT RESUBMIT YOUR THESIS / DISSERTATION. That creates duplicate records, confusion, wasted effort, frustration, sadness, tears, and causes kittens to get sick.
Finally: Congratulations; you are almost there. Click the "Submit your paper or article" link at the bottom of the gray box at left. Follow the instructions. You should be able to copy (Ctrl-C) and paste (Ctrl-V) most fields.
You are the sole author; your advisor is not considered a co-author.
Your institution is "University of Nebraska-Lincoln" (not "at Lincoln" or ", Lincoln"). Do not leave it blank; then the administrator has to fill it in, and he is tempted to make it something silly.
You do not need to repeat your name and title in the Abstract field; just the body of the abstract.
When you reach the question "Was this submission previously published in a journal?", just skip that part.
Be sure to click the "Submit" button at the bottom. Files upload at the rate of about 5 Mb per minute, so if you have an ungodly large file, it may take a bit of time. If your file exceeds 40 Mb, think about reducing its size--there are many ways; Google "reduce pdf file size" to find some.
Okay, get started. That thesis is not going to submit itself.
Detection of Plant Emergence Based on Spatio Temporal Image Sequence Analysis, Bhushit Agarwal
Investigating Diversity in Open Multiagent Team Formation, Pooja Ahuja
An Unmanned Aerial System for Prescribed Fires, Evan M. Beachly
PLANT IMAGE PROCESSING: 3D VOLUME RECONSTRUCTION, HYPERSPECTRAL INFORMATION MINING AND VISUALIZATION, Shi Cao
INVESTIGATING AGENT AND TASK OPENNESS IN ADHOC TEAM FORMATION, Bin Chen
Hierarchical Active Learning Application to Mitochondrial Disease Protein Dataset, James D. Duin
Design and Implementation of a Stand-Alone Tool for Metabolic Simulations, Milad Ghiasi Rad
Exploring the Telecommunications Properties of the Human Nervous System: Analytical Modeling and Experimental Validation of Information Flow through the Somatosensory System, Natalie Hanisch
Analytical Modeling of a Communication Channel Based on Subthreshold Stimulation of Neurobiological Networks, Alireza Khodaei
Rate based Impact Analysis, Nishant Sharma
Deep Learning and Transfer Learning in the Classification of EEG Signals, Jacob M. Williams
Querying and Visualization of Moving Objects Using Constraint Databases, Semere M. Woldemariam
Study of comparison of OCS and Hybrid switching in FSO data centers, Suraj Yadav
Feature Extraction and Parallel Visualization for Large-Scale Scientific Data, Lina Yu
Autonomous UAVs for Near Earth Environmental Sensing, David J. Anthony
TOWARDS BUILDING AN INTELLIGENT INTEGRATED MULTI-MODE TIME DIARY SURVEY FRAMEWORK, Hariharan Arunachalam
Determination of Plant Architecture and Component Phenotyping Based on Time-lapse Image Analysis, Srinidhi Bashyam
Using Software Testing Techniques to Infer Biological Models, Mikaela Cashman
A ROADMAP TO SAFE AND RELIABLE ENGINEERED BIOLOGICAL NANO-COMMUNICATION NETWORKS, Justin W. Firestone
Why Do Record/Replay Tests of Web Applications Break?, Mouna Hammoudi
A New System for Human MicroRNA functional Evaluation and Network, Jiachun Han
SEMEO: A SEMANTIC EQUIVALENCE ANALYSIS FRAMEWORK FOR OBFUSCATED ANDROID APPLICATIONS, Zhen Hu
Towards building a review recommendation system that trains novices by leveraging the actions of experts, Shilpa Khanal
Exploring Dynamic Memory Allocations for Bioinformatics Applications, Nitya Kovur
Finding DNA Motifs: A Probabilistic Suffix Tree Approach, Abhishek Majumdar
Sonifying Git History, Kevin J. North
OPTIMIZATION OF IRRIGATION DECISION IN CORNSOYWATER, Dharmic Payyala
TESTING THE INDEPENDENCE HYPOTHESIS OF ACCEPTED MUTATIONS FOR PAIRS OF ADJACENT AMINO ACIDS IN PROTEIN SEQUENCES, Jyotsna Ramanan
On Path Consistency for Binary Constraint Satisfaction Problems, Christopher G. Reeson
USE OF CLUSTERING TECHNIQUES FOR PROTEIN DOMAIN ANALYSIS, Eric Rodene
The Effect of Frequency Resolution on Intelligibility Sentence and its Relevance to Cochlear Implant Design, Seth H. Roy
EventFlowSlicer: A Goal-based Test Case Generation Strategy for Graphical User Interfaces, Jonathan Saddler
Characterization of Molecular Communication Based on Cell Metabolism Through Mutual Information and Flux Balance Analysis, Zahmeeth Sayed Sakkaff
Significant Permission Identification for Android Malware Detection, Lichao Sun
Power Management in Heterogeneous MapReduce Cluster, Rojee Sunuwar
ACTIVITY ANALYSIS OF SPECTATOR PERFORMER VIDEOS USING MOTION TRAJECTORIES, Anish Timsina
Improving the Efficiency of CI with Uber-commits, Matias Waterloo
ON OPTIMIZATIONS OF VIRTUAL MACHINE LIVE STORAGE MIGRATION FOR THE CLOUD, Yaodong Yang
Joint Resource Provisioning in Optical Cloud Networks, Pan Yi
SECURE AND LIGHTWEIGHT HARDWARE AUTHENTICATION USING ISOLATED PHYSICAL UNCLONABLE FUNCTION, Mehrdad Zaker Shahrak
Rectilinear Steiner Tree Construction, Zhiliu Zhang
AN EXTENDABLE VISUALIZATION AND USER INTERFACE DESIGN FOR TIME-VARYING MULTIVARIATE GEOSCIENCE DATA, Yanfu Zhou
Dynamic Data Management In A Data Grid Environment, Björn Barrefors
A Visual Analysis of Articulated Motion Complexity Based on Optical Flow and Spatial-Temporal Features, Beau Michael Christ
Reflective, Deliberative Agent-Based Information Gathering, Adam D. Eck
Routing Optimization in Interplanetary Networks, Sara El Alaoui
REMOTE MOBILE SCREEN (RMS): AN APPROACH FOR SECURE BYOD ENVIRONMENTS, Santiago Manuel Gimenez Ocano
Model-Based Condition Monitoring and Power Management for Rechargeable Electrochemical Batteries, Taesic Kim
Using Software-Defined Networking to Improve Campus, Transport and Future Internet Architectures, Adrian Lara
Discovery Over Application: A Case Study of Misaligned Incentives in Software Engineering, Eric F. Rizzi
Transforming C OpenMP Programs for Verification in CIVL, Michael Rogers
On Problematic Robotic Thresholds, Adam K. Taylor
Enabling Distributed Scientific Computing on the Campus, Derek J. Weitzel
Visual Analytics for Large Communication Trace Data, Jieting Wu
Bandwidth Estimation for Virtual Networks, Ertong Zhang
A COMPARATIVE STUDY OF GENERALIZED ARC-CONSISTENCY ALGORITHMS, Olufikayo S. Adetunji
A COMPARATIVE STUDY OF UNDERWATER ROBOT PATH PLANNING ALGORITHMS FOR ADAPTIVE SAMPLING IN A NETWORK OF SENSORS, Sreeja Banerjee
DECAF: A New Event Detection Logic For The Purpose Of Fusing Delineated-Continuous Spatial Information, Kerry Q. Hart
DNN: A Distributed NameNode Filesystem for Hadoop, Ziling Huang
INVARIANT INFERRING AND MONITORING IN ROBOTIC SYSTEMS, Hengle Jiang
IMPROVING PREFERENCE RECOMMENDATION AND CUSTOMIZATION IN REAL WORLD HIGHLY CONFIGURABLE SOFTWARE SYSTEMS, Dongpu Jin
Using a UAV to Effectively Prolong Wireless Sensor Network Lifetime with Wireless Power Transfer, Jinfu Leng
ANALYSIS, OPTIMIZATION, AND IMPLEMENTATION OF A UAV-BASED WIRELESS POWER TRANSFER SYSTEM, Andrew Mittleider
A New Spatio-Temporal Data Mining Method and its Application to Reservoir System Operation, Abhinaya Mohan
Autonomous Aerial Water Sampling, John-Paul W. Ore
Measuring Autonomy And Solving General Stabilization Problems With Multi-Agent Systems, Rasheed A. Rajabzadeh
A Self-Adaptive Framework for Failure Avoidance in Configurable Software, Jacob Swanson
POWER MANAGEMENT IN THE CLUSTER SYSTEM, Leping Wang
SimExplorer: A Testing Framework to Detect Elusive Software Faults, Tingting Yu
A Methodology and Tool for Concurrent Fault Injection, ZhongYin Zhang
Understanding Human Learning Using a Multiagent Based Unified Learning Model Simulation, Vlad T. Chiriacescu
Decentralized Collision Avoidance, Jayasri K. Janardanan
Online Ecosystems in Software Development, Corey J. Jergensen
Practical Tractability of CSPS by Higher Level Consistency and Tree Decomposition, Shant Karakashian
Improving Virtual Collaboration: Modeling for Recommendation Systems in a Classroom Wiki Environment, Derrick A. Lam
Discovering Divergence: A Framework for Finding Unexpected Behavior Using Directed Exploration, Heath G. Roehr
FastLane: Flow-Based Channel Assignment in Dense Wireless Networks, Dane N. Seaberg
Clustering and Classification of Multi-domain Proteins, Neethu Shah
Algorithms for Grid Graphs in the MapReduce Model, Taylor P. Spangler
Solving the Search for Source Code, Kathryn T. Stolee
User Modeling via Machine Learning and Rule-Based Reasoning to Understand and Predict Errors in Survey Systems, Leonard Cleve Stuart
Test Advising Framework, Yurong Wang
Energy-efficient Failure Recovery in Hadoop Cluster, Weiyue Xu
Directed Test Suite Augmentation, Zhihong Xu
Automated Test Case Generation to Validate Non-functional Software Requirements, Pingyu Zhang
Data Mining of Protein Databases, Christopher Assi
SIMULATION, DEVELOPMENT AND DEPLOYMENT OF MOBILE WIRELESS SENSOR NETWORKS FOR MIGRATORY BIRD TRACKING, William P. Bennett Jr
Statistical Software Properties: Definition, Inference and Monitoring, Javier A. Darsie
Improving Performance of Solid State Drives in Enterprise Environment, Jian Hu
A WLAN Fingerprinting Based Indoor Localization Technique, Landu Jiang
Dynamic Data Race Detection and Healing, Du Li
Modeling of Yeast Pheromone Pathway using Petri Nets, Abhishek Majumdar
Improving Backup and Restore Performance for Deduplication-based Cloud Backup Services, Stephen Mkandawire
Routing over the Interplanetary Internet, Joyeeta Mukherjee
On heterogeneous user demands in peer-to-peer video streaming systems, Zhipeng Ouyang
AUTOMATION OF LANDMARK SELECTION FOR RODENT BRAIN MRI-HISTOLOGY REGISTRATION USING THIN-PLATE SPLINES, Ayan Sengupta
Identification of TCP Protocols, Juan Shao
A Unifying Approach to Behavioral Coverage, Elena Sherman
AN ENHANCED SELF-ADAPTIVE MAPREDUCE SCHEDULING ALGORITHM, Xiaoyu Sun
Supporting developer-onboarding with enhanced resource finding and visual exploration, Jianguo Wang