Behind the numbers: The many practical ways data science is proving its worth

By Sam Brake Guia September 21, 2018

Data has become a prevalent and increasingly useful tool, used in almost every corner of society, as the popular subreddit Data Is Beautiful shows. But it is not just Redditors who are putting this information to use. As a recent post on Tech World pointed out, the growing field of data science has a strong future ahead, with uses ranging from building stronger businesses and diversifying career opportunities to creating tailored algorithms for commercial success.

Scientists are now using the power of data science to tackle some of the world's most pressing issues. El Niño, the warm phase of the El Niño Southern Oscillation, is characterized by periodic warming of the eastern Pacific Ocean, which can bring drought to some regions and heavy rain to others. With recent headlines warning of a 70% chance that an El Niño will develop this year, mitigating its damage is clearly a high priority for scientists. But there is hope, thanks to data science.

A collaborative effort among the University of Chicago, the University of Wisconsin-Madison and the University of California-Irvine has produced the new TRIPODS+Climate project, which will develop novel data science tools to discover hidden patterns in climate data, improving weather forecasts and scientific understanding of the global climate.

“There are fundamental challenges pervasive in data science that are epitomized in the climate science setting, making this collaboration a nice opportunity for advances on a number of fronts,” said Rebecca Willett, professor of computer science and statistics at UChicago, according to the university's website. “The question really is: Can we find some middle ground that’s going to allow us to harness climate data as fully as possible without ignoring existing physical models of climate?”

To that end, the researchers will apply data science methods such as machine learning, network analysis and predictive modeling to the growing flood of climate data.

Beyond academia, many startups are using data science themselves or relying on the services of data science companies such as John Snow Labs, an award-winning global data operations and AI company. Since its public release, the John Snow Labs NLP library for Apache Spark, Spark NLP, has seen widespread adoption, having set a new bar for production-grade natural language understanding.

In terms of performance, detailed benchmarks published by O’Reilly Media show the library to be 38 to 80 times faster than spaCy (the top-performing library to date) on a single machine. Spark NLP is also the only open-source library that can natively scale on a distributed cluster, often delivering near-linear scalability.

The rising data science startup will be speaking about its industry-leading technology at the Strata Data Conference, taking place September 11-13 in New York. The team will convert its half-day tutorial on “Natural Language Understanding at Scale” to cover only Spark NLP and some of its advanced use cases.

“We are super excited to see how quickly the industry’s most selective thought leaders have picked up Spark NLP and increasingly recommend it as the default choice to data scientists worldwide. We are committed as ever to keep pushing the state of the art and provide the community with the best performing and most accurate NLP software ever built,” said Saif Addin Ellafi, lead NLP engineer at John Snow Labs, in a previous press release.

John Snow Labs maintains a full-time development team to keep improving the open source library, which delivered 10 new releases during the first six months of 2018. The library is monetized by licensing pre-trained models and data sets for the healthcare vertical. The models and data sets are continuously updated and optimized for accuracy on top of the high-performing Spark NLP core. The global market for natural language pro.