Recently, I was tasked with building a method for performing data analysis as a service, as well as building a data science team that could perform those services for various industries. So, I quickly kicked off a deep exploration of the field and the technologies available in the space. I found what I learned to be very engaging, and it got me thinking not only about the technical challenges of performing analysis, but also about managing the process of doing data science in a business context.
Now that Serial, Season 1, is over, I’ve been thinking about some stories that would be better told in the Serial format, slowly examining one angle at a time, with the soul of the story building over time.
I work very close to a golf course, so when weather permits, I get out there to practice on the range during the occasional lunch break. It’s been about a year since I played my very first round of golf, and last week, I broke 80. Since the beginning, I’ve gotten my share of punishment and reward from the game, and now that I’m a decent enough golfer that it’s possible to play at that level, I’ll tell you what I’ve discovered about my own game, and perhaps anyone’s game at that level.
I occasionally get invited to attend performances at a Resort / Casino in the Palm Springs area by a family member who has hookups there. Usually what happens is we’ll get seats to a performance, and often before the show, my wife and I will get the opportunity to meet and greet with the performer. Usually, the performer is kinda in a trance, stuck in their own thoughts, presumably preparing mentally for their performance, and the meet and greet is more like a quiet handshake and a picture, and that’s that. But one time, Joan Rivers did a show there, and in the meet and greet, she was clearly a very different kind of person from many of the other performers we’d been able to meet, and I also got some interesting insight into what kind of businesswoman she was. It was illuminating.
DBPedia, in general, is a linked-data extraction of Wikipedia. If you’ve been living under a rock and don’t know what Wikipedia is, it’s a crowdsourced encyclopedia hosted on the internet. In terms of infrastructure, Wikipedia reports on its own wiki page that it is powered by clusters of Linux servers and MySQL databases, and uses Squid caching servers to handle the 25,000 to 60,000 page requests per second that it receives on average. In terms of the product, it is culturally significant in that it is one of the most referenced sources of general information on earth, if not the outright leader. Again, DBPedia, for all intents and purposes, is a linked-data version of that dataset.
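To make “linked data” a bit more concrete, here’s a minimal sketch in plain Python of how DBPedia-style facts are represented as subject–predicate–object triples. The prefixed names below are hypothetical examples written in the style of DBPedia identifiers, not actual entries pulled from the dataset:

```python
# A toy illustration of the linked-data triple model, using plain
# Python tuples. The resource names are hypothetical examples in the
# style of DBPedia URIs (dbr: = resource, dbo: = ontology).
triples = [
    ("dbr:Wikipedia", "rdf:type", "dbo:Website"),
    ("dbr:Wikipedia", "dbo:owner", "dbr:Wikimedia_Foundation"),
    ("dbr:Wikimedia_Foundation", "rdf:type", "dbo:Organisation"),
]

# Every fact is one (subject, predicate, object) triple, and the same
# resource can appear as the subject of one fact and the object of
# another -- that sharing of identifiers is what "links" the data.
for s, p, o in triples:
    print(s, p, o)
```

The key point is that there is no table schema here: the graph grows by adding more triples, which is what makes an extraction of something as heterogeneous as Wikipedia feasible.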
So, you’ve heard of a triplestore; that’s an important first step. Now, you’re wondering why you’d need one? That’s a good question. I believe the best way to answer it is to talk a little bit about what we know about triples as a data model, what SPARQL is good for, and where the industry has gone in the last few years that has caused us to need triples and SPARQL in the first place. Let’s get started.
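To give a feel for what SPARQL is good for before we get into it, here’s a toy sketch in plain Python (with made-up data) of the kind of graph-pattern matching a SPARQL SELECT performs over a triplestore:

```python
# Made-up (subject, predicate, object) triples standing in for a
# triplestore's contents.
graph = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("acme", "locatedIn", "Palm_Springs"),
]

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None plays the role of a
    SPARQL variable and matches anything."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Roughly analogous to:  SELECT ?who WHERE { ?who worksFor acme }
employees = [s for s, _, _ in match(graph, p="worksFor", o="acme")]
print(employees)  # -> ['alice', 'bob']
```

A real triplestore does this with indexes and a full query planner, of course, but the mental model of “fill in the variables so the pattern matches the graph” carries over directly.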
The act of computationally creating an answer via cognitive computing or conceptual reasoning, rather than searching for it in text, curiously gets described in many different ways, but nobody ever seems to talk about it directly; it’s always discussed in terms of how it is done. I propose we call it “answer synthesis”. Let’s dig deeper.
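One way to make the distinction concrete: a search system retrieves an answer that was already written down, while a reasoner derives one that was never stored. Here’s a minimal sketch in plain Python, with made-up facts, of synthesizing an answer by chaining inferences:

```python
# Made-up facts. "is_a" is transitive, and the answer to the final
# question below is never stored directly -- it must be derived.
facts = {("penguin", "bird"), ("bird", "animal")}

def is_a(x, y, facts):
    """Derive whether x is_a y by chaining stored facts together."""
    if (x, y) in facts:
        return True
    return any(is_a(mid, y, facts)
               for (subj, mid) in facts if subj == x)

# No fact says ("penguin", "animal"), yet we can answer the question:
print(is_a("penguin", "animal", facts))  # -> True
```

That derived `True` is, in miniature, what I mean by answer synthesis: the answer exists nowhere in the data, only in the reasoning over it.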
Ubiquitous Computing, as a term, has been around for quite some time now. It refers to a state of computing in which there is a presence of data, interfaces, computing, etc., that is essentially omnipresent and available for interaction in a wide variety of forms for a wide array of purposes. In essence, when people talk about the Internet of Things, they are usually describing what others refer to as ubiquitous computing. One of the aspects of this paradigm that makes it ubiquitous is a near-universal interoperability between all connected things.
Also, separate from that, there should be a sense of ambient intelligence that persists around all of these interacting agents. Obviously, interoperability, intelligence, high availability, access, security, communication, data interoperability, data analysis, prediction, etc., all fall under the umbrella of the term. But does all of this really need to be solved in order to deliver the user experience of interoperability and ambient intelligence? I think not. Either way, there is a lot to think about when it comes to putting your finger on the real problems left to solve in this space.
The semantic web is alive, and I will tell you why. But first, let me tell you how I arrived at this conclusion.
When I first came to my current job, I was tasked with writing an automated implementation of Schema.org as a service, which multi-site owners could use to shortcut the tagging and structuring of their site data, with the goal of acquiring rich snippets and, ultimately, better search engine performance.
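For context, the output of that kind of service is structured markup, most commonly Schema.org JSON-LD. Here’s a minimal sketch in Python of generating the kind of snippet a site page might embed; the headline, author, and date are hypothetical example values, not data from any real site:

```python
import json

# Hypothetical example values -- a real service would pull these from
# the site's own content.
markup = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Is the Semantic Web Dead?",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "datePublished": "2015-01-15",
}

# A page embeds this inside a <script type="application/ld+json">
# element so search engines can read it when building rich snippets.
print(json.dumps(markup, indent=2))
```

The appeal of doing this as a service is that the mapping from site data to Schema.org types lives in one place, instead of being hand-tagged into every page of every site.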
During that time, I learned a lot about schema.org, semantic web technologies, linked data, and Google. So, with that said, if you’re here wanting to know if you should care about the semantic web, let me drop some knowledge.