The following is republished from ES Online Blog, a data journalism blog that covers new media.

The original post can be found here.

Stephen Larson, founder and CEO of, talked about his company's online tools and how they can be useful to journalists.

Could you explain your current position and activities briefly?

I’m founder and CEO of a software development firm specializing in web based news media. Our legacy business is content management systems, primarily in the US. Our new products include map and sidebar widgets that enhance content using natural language processing and machine learning. In addition, we provide research tools for journalists and readers.

How do you find data for your work?

In our case, we create our own data from our clients' news stories, using natural language processing to identify proper names (aka named entities) in the stories and storing information about them in our database. The three primary tables are:

1.  a named entity table whose key is the named entity;

2.  a story table whose key is the URL of the story; and

3.  a table whose key is the combination of both named entity and URL, capturing both all the stories a given named entity appears in and all the named entities in a given story.

In the named entity table we store the type of entity it is: a person, organization, place, event, award, holiday or other named period of time. For places, we store the longitude and latitude. The story table has the date and time the story was published.
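
As a rough illustration of the three tables described above, here is a minimal sketch using SQLite through Python's `sqlite3` module. The table names, column names, and sample rows are assumptions made up for illustration; the interview does not give the actual schema or database engine.

```python
import sqlite3

# Hypothetical schema: only the keys and a few stored attributes are
# described in the interview; every name below is an assumption.
SCHEMA = """
CREATE TABLE named_entity (
    name        TEXT PRIMARY KEY,   -- the named entity itself is the key
    entity_type TEXT NOT NULL,      -- person, organization, place, ...
    latitude    REAL,               -- filled in for places only
    longitude   REAL
);
CREATE TABLE story (
    url          TEXT PRIMARY KEY,  -- the story URL is the key
    published_at TEXT NOT NULL      -- ISO 8601 publication date and time
);
CREATE TABLE entity_story (
    name TEXT NOT NULL REFERENCES named_entity(name),
    url  TEXT NOT NULL REFERENCES story(url),
    PRIMARY KEY (name, url)         -- combination of entity and URL
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO named_entity VALUES (?, ?, ?, ?)",
        [
            ("Springfield", "place", 39.80, -89.64),
            ("Jane Doe", "person", None, None),
        ],
    )
    conn.executemany(
        "INSERT INTO story VALUES (?, ?)",
        [
            ("https://example.com/a", "2015-03-01T09:00:00"),
            ("https://example.com/b", "2015-03-02T17:30:00"),
        ],
    )
    conn.executemany(
        "INSERT INTO entity_story VALUES (?, ?)",
        [
            ("Springfield", "https://example.com/a"),
            ("Springfield", "https://example.com/b"),
            ("Jane Doe", "https://example.com/a"),
        ],
    )
    conn.commit()
    return conn


def stories_for_entity(conn, name):
    """All story URLs a given named entity appears in."""
    rows = conn.execute(
        "SELECT url FROM entity_story WHERE name = ? ORDER BY url", (name,)
    )
    return [row[0] for row in rows]


def entities_in_story(conn, url):
    """All named entities that appear in a given story."""
    rows = conn.execute(
        "SELECT name FROM entity_story WHERE url = ? ORDER BY name", (url,)
    )
    return [row[0] for row in rows]
```

The third table answers both questions with one index: filter on the entity to list its stories, or filter on the URL to list that story's entities.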

How do you find stories from data?

We find stories for any given location with our interface. In the default mode, both location and publication date/time are used to order the stories, but there are also buttons for two other views: a map where all stories are listed regardless of publication date, and a list of more recent stories from a wider area.
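
The interview does not say how location and publication time are combined in the default ordering. One plausible approach, sketched here purely as an assumption, is to score each story by its great-circle distance from the chosen point plus a penalty for its age, then sort by that score. The function names, the dictionary fields, and the `km_per_day` weight are all hypothetical.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_stories(stories, lat, lon, now, km_per_day=1.0):
    """Order stories so that nearer and newer ones come first.

    Assumed scoring rule (not from the interview): each day of age
    counts the same as `km_per_day` kilometers of distance.
    """

    def score(story):
        distance = haversine_km(lat, lon, story["lat"], story["lon"])
        age_days = (now - story["published_at"]).total_seconds() / 86400
        return distance + km_per_day * age_days

    return sorted(stories, key=score)
```

Raising `km_per_day` favors recent stories from a wider area, while setting it to zero ignores publication date entirely, which loosely mirrors the two alternative views described above.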

Could you tell me about your best work?

We created a map widget that automatically displays the locations mentioned in any given story. We have quite a few news websites with these maps embedded in every story. It’s described further here:

We also have a soon-to-be-released feature that uses the database to add a sidebar to any given story for any named entity, typically one mentioned in the story. The sidebar feature that excites me the most is a kind of word cloud, but instead of words, it’s made of named entities. Other features are a list of other stories that mention the named entity and a clickable timeline of stories that mention the named entity.
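
How the named entity cloud is weighted is not described. A simple way to picture it, offered only as an assumption, is to count how many stories each other entity shares with the chosen one and use that count as its size in the cloud. The function name and the `(entity, url)` pair format are hypothetical.

```python
from collections import Counter, defaultdict


def entity_cloud(pairs, target):
    """Count how many stories each other entity shares with `target`.

    `pairs` is an iterable of (entity_name, story_url) tuples, i.e. the
    rows of an entity-to-story table.
    """
    stories_by_entity = defaultdict(set)
    entities_by_story = defaultdict(set)
    for name, url in pairs:
        stories_by_entity[name].add(url)
        entities_by_story[url].add(name)

    counts = Counter()
    for url in stories_by_entity.get(target, set()):
        for other in entities_by_story[url]:
            if other != target:
                counts[other] += 1
    return counts
```

Calling `entity_cloud(pairs, "Jane Doe").most_common(20)` would then give the twenty entities to draw largest.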

Future releases will list related stories based on how frequently the named entities occur in the given story and in other stories.
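
The exact measure of relatedness is not given. As one possible reading, treated here strictly as an assumption, each story can be represented by the number of times each named entity occurs in it, and two stories compared by the cosine similarity of those counts. The function names and the `entity_counts` mapping are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two entity-frequency counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_stories(target_url, entity_counts, top_n=3):
    """Return up to `top_n` story URLs most similar to `target_url`.

    `entity_counts` maps each story URL to a Counter of how often each
    named entity occurs in that story. Stories that share no entity
    with the target are left out.
    """
    target = entity_counts[target_url]
    scored = [
        (cosine(target, counts), url)
        for url, counts in entity_counts.items()
        if url != target_url
    ]
    scored = [(score, url) for score, url in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored[:top_n]]
```

Cosine similarity is used here because it compares the mix of entities rather than story length, so a long story does not dominate simply by mentioning everything more often.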

What is the most notable feature of the work?

As far as I can tell, it’s the largest operating real-time database of stories indexed as described above. We index more than 3 million stories and over 1 million unique named entities. Thousands of stories are added daily.

What do you think are the most important skills, knowledge, and attitudes for data journalists who want to create good pieces?

Beyond technical skills, it is most important to understand the true nature of the data sources used. Realize that all databases have varying degrees of accuracy, and none is truly perfect.