Model Changes and Extensions

Posted on Mon 08 January 2018 in articles

Models change...

Whenever you integrate data into a database, you have to choose a data model. The data model defines the available elements of data and how they relate to each other. Sometimes, you explicitly prescribe the data model, e.g. in an object-relational mapping (ORM). At other times, you …

Continue reading

Testing Times for Data Analytics

Posted on Thu 28 December 2017 in articles

TL;DR: You should do automated testing of your data analytics solution, using data consistency checks.
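As a rough illustration of what such data consistency checks might look like, here is a minimal sketch in plain Python. The records, field names, and check functions are all hypothetical and not taken from any particular library. The idea is only that each check returns the offending values, so an automated test can assert that the result is empty.

```python
from collections import Counter

# Hypothetical records standing in for rows loaded by an analytics pipeline.
customers = [{"customer_id": 10}, {"customer_id": 11}]
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 11, "amount": -5.0},
    {"order_id": 2, "customer_id": 12, "amount": 40.0},
]


def duplicate_values(rows, key):
    """Return the values of `key` that occur more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def orphaned_values(rows, key, parent_rows, parent_key):
    """Return values of `key` that have no match in the parent rows."""
    known = {row[parent_key] for row in parent_rows}
    return sorted({row[key] for row in rows} - known)


def negative_rows(rows, key):
    """Return the rows whose `key` value is below zero."""
    return [row for row in rows if row[key] < 0]


def run_checks():
    """Run all checks and return the findings per check name."""
    return {
        "duplicate_order_ids": duplicate_values(orders, "order_id"),
        "unknown_customers": orphaned_values(
            orders, "customer_id", customers, "customer_id"
        ),
        "negative_amounts": negative_rows(orders, "amount"),
    }


if __name__ == "__main__":
    for name, findings in run_checks().items():
        status = "OK" if not findings else "FAILED"
        print(f"{name}: {status} {findings}")
```

In a test suite, each of these would typically become one assertion that the list of findings is empty. The sample data above is deliberately inconsistent, so all three checks report something.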

The best do test

If you write software that is being developed and used over an extended period of time, you will know about the value of automated testing. This may include unit tests …

Continue reading

Ozelot 0.2.2 and Recursive Task Clearing

Posted on Fri 01 December 2017 in articles

Ozelot version 0.2.2 was released yesterday. This release contains a few bug fixes and improvements. Most notably, it fixes the order in which tasks are cleared when using ozelot.etl.tasks.check_completion(...) with the clear=True flag.

As always, you can find the latest code on GitHub …

Continue reading

I Like Pandas (More than SQL)!

Posted on Sat 21 October 2017 in articles

I recently felt the urge to brush up on my SQL skills. Doing most of my data analytics work in Python, and talking to databases via SQLAlchemy and an ORM layer, I hadn't had any need to write 'advanced SQL'. My interest was sparked by platforms like Mode and Periscope …

Continue reading

My Evolution of Data Integration

Posted on Fri 13 October 2017 in articles

If you have a cool idea for a visualization or a machine learning project, preparing data is probably not what you want to spend your time on. You want to get it over with and move on. While trying to get to results quickly, I have repeatedly undergone an evolution in …

Continue reading

538 Ideas for Data Projects

Posted on Fri 06 October 2017 in articles

Sometimes you need inspiration for data science projects. Maybe you want to try out some new technology, build a demo project, or learn by studying and building on interesting ideas.

One of my favourite sources is the data repository of the FiveThirtyEight data journalism site. The repository contains a selection …

Continue reading

98% of Excel Power Users Get This Question Wrong!

Posted on Thu 28 September 2017 in articles

And the question is...

... should I be doing mission-critical data analytics in Excel?

Excel (like its siblings from other vendors) has lots of powerful features and lots of valid use cases. Serious analytics work, on the basis of which you want to make important decisions, is not one of them …

Continue reading

Opens its Doors

Posted on Sun 27 August 2017 in articles

opens its doors today. Welcome!

Continue reading

'Ozelot' Now Publicly Available

Posted on Fri 18 August 2017 in articles

Today the 'Ozelot' library, version 0.2, is being made available to the public. It is hosted on GitHub, and the documentation is on readthedocs.

Ozelot is a Python library for building maintainable data integration pipelines. The library is based on Luigi for pipeline management and SQLAlchemy for the ORM layer …

Continue reading