Internship at Oracle

2013-09-02

University of Waterloo - Internship


My first co-op job experience was incredible: I moved down to Boston to work with a big-name company right across the street from MIT. Basically, I was in Heaven.

The product I worked on was Oracle's Endeca Information Discovery (EID) system. It pushed user data (read: Excel spreadsheets) into Oracle databases and then ran information discovery on them to surface data trends, groupings, et cetera.

My first major project was to expand the data input system to include JSON as a data source. Since JSON data and standard databases are fairly different in structure, a large part of this task was determining exactly how the data should be converted. In the end, I wrote a streaming parser (one that could, for example, be connected to a continuously updating URL like Twitter's firehose). It made heavy use of determined flattening (figuring out which key was most important and flattening the data around it) and multi-selection (basically, data structures nested within other data structures in the Oracle DB).
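To give a rough idea of what flattening around a key looks like, here is a minimal Python sketch. It is an illustration under my own assumptions, not the EID implementation: the input is assumed to be newline-delimited JSON, the function names (flatten, pick_pivot, rows_from_record, stream_rows) are made up for this example, and the "most important key" is chosen by a hypothetical heuristic (the key holding the longest list of objects). Lists of plain values are kept as multi-valued cells, which stands in loosely for multi-selection.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested objects into dotted column names.

    Lists are left untouched, so a list of scalars becomes one
    multi-valued cell rather than several columns.
    """
    row = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            row.update(flatten(value, name + "."))
        else:
            row[name] = value
    return row


def pick_pivot(record):
    """Hypothetical heuristic: pivot on the key with the longest list of objects."""
    candidates = {
        key: value
        for key, value in record.items()
        if isinstance(value, list)
        and value
        and all(isinstance(item, dict) for item in value)
    }
    if not candidates:
        return None
    return max(candidates, key=lambda key: len(candidates[key]))


def rows_from_record(record):
    """Yield one flat row per element of the pivot list, repeating parent fields."""
    pivot = pick_pivot(record)
    if pivot is None:
        yield flatten(record)
        return
    parent = flatten({k: v for k, v in record.items() if k != pivot})
    for child in record[pivot]:
        row = dict(parent)
        row.update(flatten(child, pivot + "."))
        yield row


def stream_rows(lines):
    """Consume an iterable of JSON lines lazily, so the source can be unbounded."""
    for line in lines:
        line = line.strip()
        if line:
            yield from rows_from_record(json.loads(line))
```

Because stream_rows is a generator, it can be fed anything that yields lines (an open file, a socket wrapper, a streaming HTTP response) and rows come out as soon as each record is parsed. A record with a customer object, a list of tags, and a list of two orders would come out as two rows, one per order, each repeating the customer fields and carrying the tags as a single multi-valued cell.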

I also worked on increasing the number of usable back-end databases. EID was initially only able to interface with a few pre-selected databases (MySQL 5+, Oracle 11, and SQLite 3, IIRC). By reconfiguring the interface between EID and the database connection, I was able to expand this list to include any database with a JDBC driver.

I finished the term by writing an installation script for the full EID stack. This was previously a notoriously difficult product to install, but it can now be installed and configured for any environment with one click.

As always, this article is available on GitHub. Comments (i.e., issues) are welcome!