picalo


Upload: irene-kelly

Post on 02-Jun-2018


Page 1: PICALO

8/11/2019 PICALO

http://slidepdf.com/reader/full/picalo 1/3

Picalo is a data analysis application with a focus on fraud detection and on data retrieved from corporate databases. It is also the foundation for an automated fraud detection system (see below). It started in 2000 as a pet project of Conan C. Albrecht, a professor at BYU.

Picalo is currently focused on data analysis for fraud and corruption detection. However, it is an open framework that could actually be used for many different types of data analysis: network logs, scientific data, any type of database-oriented data, and data mining.

Picalo is built upon a three-level architecture, including open source and potentially closed source parts. The following diagram describes the different parts of the Picalo platform.

Level 1 Routines: This open source level includes all the basic data structures in Picalo. It was finished in the 2001-2002 time frame, and while we continue to add some routines to it, it is quite stable and finished.

Level 2 Routines: This level allows non-technical people to run advanced analyses without having to script or write programs. Picalo's pluggable architecture allows new detectlets to be installed in the existing program. We hope that detectlet libraries will be made available by individuals and/or companies; the license and price are up to the developers.

Level 3 Routines: When developed, this level will apply expert system rules to intelligently run Level 2 routines (detectlets) to discover many different types of fraud on its own. The license for the expert system has not been decided at this point.

Cross Platform Graphical User Interface: This open source layer provides a GUI on top of the Level 1 routines to allow non-technical users to access the heart of Picalo. It provides dialog access to most Picalo features.

Detectlet Wizard: This open source wizard provides access to all detectlets installed at your location. It guides users through the use and application of detectlets in fraud analysis.

Detectlets are one of the most exciting parts of the Picalo architecture. They allow non-programmers to run analysis routines created by others. See the detectlets page for more information.
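This document does not show how detectlets are actually registered or run, so the following is only a hypothetical sketch of a pluggable design: installed routines go into a registry by name, and a wizard-like front end runs whichever one the user picks. The names `DETECTLETS`, `detectlet`, `run_detectlet`, and `round_dollar_payments` are invented for illustration and are not Picalo's API.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a detectlet's display name to its function.
# Picalo's real plug-in mechanism may work differently.
DETECTLETS: Dict[str, Callable[[List[dict]], List[dict]]] = {}


def detectlet(name: str):
    """Decorator that registers a function as an installed detectlet."""
    def register(func):
        DETECTLETS[name] = func
        return func
    return register


@detectlet("Round-dollar payments")
def round_dollar_payments(records: List[dict]) -> List[dict]:
    # Flag payments that are exact multiples of 1000, often treated as a red flag.
    return [r for r in records if r["amount"] % 1000 == 0]


def run_detectlet(name: str, records: List[dict]) -> List[dict]:
    """Roughly what a wizard might do once the user chooses a detectlet by name."""
    return DETECTLETS[name](records)


payments = [
    {"vendor": "Acme", "amount": 5000},
    {"vendor": "Bolt", "amount": 1234.56},
]
print(sorted(DETECTLETS))  # names a wizard could list for the user
print(run_detectlet("Round-dollar payments", payments))
```

Because registration happens when a module is imported, a new detectlet library could in principle be "installed" just by importing it, without changing the program that runs it.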

Picalo is built upon the shoulders of many great projects. Thousands of individuals have contributed time and energy to these projects, and the Picalo effort is grateful for their work. These are listed as follows:

Python

What about quality control? Another way to phrase this question is, "Can I trust Picalo's results?" The short answer is that you can trust Picalo as much as any analysis application. We take quality control very seriously.

The long answer is that you should never fully trust any analysis application. You should always double-check each step of the process, print control totals, and manually ensure that your routines are doing what you think they are. It can be very embarrassing (and dangerous) to make decisions based on faulty analysis routines.
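As a minimal sketch of the "print control totals" advice, the snippet below records a count and an amount total before a step and checks that the outputs of that step add back to the same figures. The data, the split rule, and the helper `control_totals` are hypothetical; `Decimal` is used so that money totals compare exactly.

```python
from decimal import Decimal


def control_totals(records, amount_field):
    """Return (record count, sum of amount_field) for a list of dict records."""
    return len(records), sum(r[amount_field] for r in records)


source = [
    {"invoice": "A-1", "amount": Decimal("120.00")},
    {"invoice": "A-2", "amount": Decimal("75.50")},
    {"invoice": "A-3", "amount": Decimal("300.25")},
]

# Example analysis step: split the records into two groups.
large = [r for r in source if r["amount"] >= 100]
small = [r for r in source if r["amount"] < 100]

before = control_totals(source, "amount")
large_totals = control_totals(large, "amount")
small_totals = control_totals(small, "amount")
after = (large_totals[0] + small_totals[0], large_totals[1] + small_totals[1])

print("before:", before)
print("after: ", after)

# The split should neither lose nor double-count any record or amount.
assert before == after, "control totals do not reconcile"
```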


Users unfamiliar with the open source world may implicitly trust "corporate" software and be wary of "open" software. We hope you will re-evaluate this common misconception as you use Picalo. Certainly, closed-source software applications are often more user friendly than community-built applications. But "good looking" and easy-to-use programs are not necessarily trustworthy.

As more users test Picalo and more developers help program it, we'll have a lot of eyes looking through the code and testing the routines. Open source software often finds and fixes bugs much faster than closed-source software because of the number of individuals looking at its code. "Corporate" software is often written by small development teams who are driven by marketing calendars and new features.

The open source world has many examples of incredibly well-written software, including Linux (widely known for crashing very rarely), PostgreSQL (a highly respected database), Apache (which runs most of the web), BIND (which runs the domain names on the Internet), KDE and Gnome (excellent user interfaces that look similar to Windows), wxWidgets (the GUI toolkit Picalo is built upon), Perl, Python, and GCC (programming languages many "real" programs are written with), LaTeX (a great word processing platform), and Firefox (an incredible web browser). This list could go on with thousands of successful open source products that are in production use today.

In summary, is Picalo perfect? Of course not. There may even be one or two bugs left in the Level 1 routines of Picalo. But we're working to make it a world-class analysis application that encodes information about thousands of fraud schemes used worldwide. Consider helping out to be part of the team that makes Picalo better and better.

Most of Picalo is released under the GNU General Public License (GPL). The GPL is a restrictive, open source license. We've released this package under the GPL to protect it. The source code comes with Picalo, and you are encouraged to improve and add to its functionality.

Detectlet libraries can be released under licenses other than the GPL. Since detectlets will be built by companies, organizations, and individuals, it is up to the developers to decide whether to sell their routines, open source them, or even release them into the public domain.

The restriction is that you cannot use any Picalo code in your own products unless those products are also released under the GPL. If you are using a closed-source license or even one of many incompatible open source licenses, you cannot use Picalo code. The license protects the code from being "stolen" by any individual or company.


Package picalo

Picalo is a Python library to help anyone who works with data files, especially those who work with data in relational/spreadsheet format. It is primarily created for investigators and auditors to search through data sets for anomalies, trends, and other information, but it is generally useful for any type of data or text files.

Picalo is different from NumPy/Numarray in that it is meant for heterogeneous data rather than homogeneous data. In NumPy, you have an array (table) of the same type (all ints, for example). In Picalo, you have a table made up of different column types, very similar to a database.
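To make the homogeneous/heterogeneous distinction concrete without depending on NumPy, the sketch below uses the standard library's `array` module as the homogeneous case and a hypothetical list of typed columns as the heterogeneous case. The `columns`/`rows` layout and the `check_types` helper are illustrative assumptions, not Picalo's table implementation.

```python
from array import array

# Homogeneous: an array.array('i') accepts only integers.
amounts = array("i", [10, 20, 30])
try:
    amounts.append("forty")
except TypeError:
    print("array rejects a non-int value")

# Heterogeneous: each column declares its own type, much like a database table.
columns = [("id", int), ("vendor", str), ("amount", float)]
rows = [
    [1, "Acme", 5000.0],
    [2, "Bolt", 1234.56],
]


def check_types(columns, rows):
    """Return True if every cell matches the type declared for its column."""
    return all(
        isinstance(cell, col_type)
        for row in rows
        for cell, (_, col_type) in zip(row, columns)
    )


print(check_types(columns, rows))
```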

One of Picalo's primary purposes is making relational databases easier to work with. Once you have a Picalo table, you can add, move, or delete columns; work with records (horizontal slices of the data); select and group records in various ways; and run analyses on tables. Picalo includes adapters for popular databases, and it provides a Query object that makes queries seem just like regular Tables (except they are live from the database).
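One way to get a "live" table-like query is to re-run the SQL whenever the object is read. The sketch below does that with Python's built-in `sqlite3` module; the `LiveQuery` class is a hypothetical stand-in and does not reproduce the interface of Picalo's Query object or its database adapters.

```python
import sqlite3


class LiveQuery:
    """Table-like view whose rows are fetched from the database on every access.

    Hypothetical illustration only; not Picalo's Query API.
    """

    def __init__(self, conn, sql, params=()):
        self.conn, self.sql, self.params = conn, sql, params

    def _rows(self):
        # Re-run the SQL each time, so results reflect the current data.
        return self.conn.execute(self.sql, self.params).fetchall()

    def __iter__(self):
        return iter(self._rows())

    def __len__(self):
        return len(self._rows())

    def __getitem__(self, index):
        return self._rows()[index]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (vendor TEXT, amount REAL)")
conn.execute("INSERT INTO payments VALUES ('Acme', 5000)")

big = LiveQuery(
    conn,
    "SELECT vendor, amount FROM payments WHERE amount > ? ORDER BY vendor",
    (1000,),
)
print(len(big))  # 1
conn.execute("INSERT INTO payments VALUES ('Bolt', 2500)")
print(len(big))  # 2: the new row appears without rebuilding the query
```

The trade-off of this design is that every read hits the database; a real implementation would likely cache results or fetch rows lazily.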

If you work with relational databases, delimited (CSV/TSV) files, EBCDIC files, MS Excel files, log files, text files, or other heterogeneous datasets, Picalo might make your life easier.
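As a rough illustration of loading a few of these formats into records, the sketch below uses only the standard library: the `csv` module for CSV and TSV, and Python's `cp037` codec, which handles one common EBCDIC code page. The `read_delimited` helper is hypothetical; Picalo's own import functions are not shown here.

```python
import csv
import io


def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of records (lists of cells)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


csv_records = read_delimited("id,vendor\n1,Acme\n2,Bolt\n")
tsv_records = read_delimited("id\tvendor\n1\tAcme\n", delimiter="\t")

# Mainframe extracts are often EBCDIC; these bytes spell "1,Acme" in code page 037.
ebcdic_bytes = b"\xf1\x6b\xc1\x83\x94\x85"
ebcdic_records = read_delimited(ebcdic_bytes.decode("cp037"))

print(csv_records)     # [['id', 'vendor'], ['1', 'Acme'], ['2', 'Bolt']]
print(tsv_records)     # [['id', 'vendor'], ['1', 'Acme']]
print(ebcdic_records)  # [['1', 'Acme']]
```

Every cell comes back as a string, so a typing step like the one a database would do is still needed before numeric analysis.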

Picalo is programmed to be as Pythonic as possible. Its core objects (tables, columns, and records) act like lists. A column is a list of cells. A record is a list of cells. A table is a list of records. Tables can be sorted via the sort function, just as the Sorting HowTo shows. The return values of almost all functions are new tables, so functions can be chained together like pipes in Unix.
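The list-like model and the pipe-style chaining can be sketched with plain Python lists. Here `select` and `sort_by` are hypothetical helpers, and the built-in `sorted` stands in for Picalo's own sort function; the point is only that each step returns a new table, so calls nest like a Unix pipeline.

```python
# A table as a list of records; each record is a list of cells.
table = [
    ["Bolt", 1234.56],
    ["Acme", 5000.00],
    ["Cole", 80.00],
]

# A column is just a list of cells, one taken from each record.
vendors = [record[0] for record in table]


def select(table, predicate):
    """Return a new table holding only the records that match."""
    return [record for record in table if predicate(record)]


def sort_by(table, index, reverse=False):
    """Return a new table sorted on one column (the input table is unchanged)."""
    return sorted(table, key=lambda record: record[index], reverse=reverse)


# Each call returns a new table, so the output of one feeds the next.
result = sort_by(select(table, lambda r: r[1] > 100), index=1, reverse=True)
print(vendors)  # ['Bolt', 'Acme', 'Cole']
print(result)   # [['Acme', 5000.0], ['Bolt', 1234.56]]
```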

Picalo includes an optional Project object that stores tables in Zope Object DB files. When Projects are used, Picalo automatically swaps records in and out of memory as needed to ensure efficient use of resources. Projects allow Picalo to work with essentially an unlimited amount of data.
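The idea of keeping records on disk and loading them only when needed can be sketched with the standard library's `shelve` module. This swaps in `shelve` for ZODB purely for illustration; the `DiskTable` class is hypothetical and much simpler than a real Project, which would also handle caching and transactions.

```python
import os
import shelve
import tempfile


class DiskTable:
    """Table whose records are pickled to a shelve file rather than kept in a list.

    Hypothetical stand-in for a ZODB-backed Project, using stdlib shelve.
    """

    def __init__(self, path):
        self.db = shelve.open(path)  # writeback=False, so records are not cached

    def append(self, record):
        self.db[str(len(self.db))] = record  # shelve keys must be strings

    def __len__(self):
        return len(self.db)

    def __iter__(self):
        # Unpickle one record at a time instead of loading the whole table.
        for i in range(len(self.db)):
            yield self.db[str(i)]

    def close(self):
        self.db.close()


with tempfile.TemporaryDirectory() as tmp:
    table = DiskTable(os.path.join(tmp, "payments"))
    for i in range(1000):
        table.append(["vendor%d" % i, i * 10])
    count = len(table)
    total = sum(amount for _, amount in table)
    table.close()

print(count, total)
```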

The project was started in 2003 by Conan C. Albrecht, a professor in Information Systems at Brigham Young University. Conan remains the primary developer of Picalo.

Example uses of Picalo:

- Analyzing financial data, employee records, and purchasing systems for errors and fraud

- Interactively analyzing network events, web server logs, and system login records

- Importing email into relational or text-based databases

- Embedding controls and fraud testing routines into production systems
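As one hypothetical example of a fraud test that could be embedded in a production system, the sketch below flags payments that share the same vendor and invoice number, a common sign of duplicate payment. The field names and the `duplicate_payments` function are assumptions for illustration, not a Picalo routine.

```python
from collections import defaultdict


def duplicate_payments(payments):
    """Group payments by (vendor, invoice) and return groups with more than one entry."""
    groups = defaultdict(list)
    for p in payments:
        groups[(p["vendor"], p["invoice"])].append(p)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


payments = [
    {"vendor": "Acme", "invoice": "1001", "amount": 500},
    {"vendor": "Acme", "invoice": "1001", "amount": 500},
    {"vendor": "Bolt", "invoice": "1001", "amount": 75},
]

for (vendor, invoice), rows in duplicate_payments(payments).items():
    print(f"{vendor} invoice {invoice} paid {len(rows)} times")
```

Keying on the vendor as well as the invoice number avoids flagging different vendors who happen to reuse the same invoice numbering.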