picalo
8/11/2019 PICALO
http://slidepdf.com/reader/full/picalo 1/3
Picalo is a data analysis application with a focus on fraud detection and on data retrieved from corporate databases. It is also the foundation for an automated fraud detection system (see below). It started in 2000 as a pet project of Conan C. Albrecht, a professor at BYU.
Picalo is currently focused on data analysis for fraud and corruption detection. However, it is an open framework that could actually be used for many different types of data analysis: network logs, scientific data, any type of database-oriented data, and data mining.
Picalo is built upon a three-level architecture that includes open source and potentially closed source parts. The parts of the Picalo platform are described below.
Level 1 Routines: This open source level includes all the basic data structures in Picalo. It was finished in the 2001-2002 time frame, and while we continue to add some routines to it, it is quite stable and complete.
Level 2 Routines: This level allows non-technical people to run advanced analyses without having to script or write programs. Picalo's pluggable architecture allows new detectlets to be installed in the existing program. We hope that detectlet libraries will be made available by individuals and/or companies; the license and price are up to the developers.
Level 3 Routines: When developed, this level will apply expert system rules to intelligently run Level 2 routines (detectlets) to discover many different types of fraud on its own. The license for the expert system has not been decided yet.
Cross-Platform Graphical User Interface: This open source layer provides a GUI on top of the Level 1 routines to allow non-technical users to access the heart of Picalo. It provides dialog-based access to most Picalo features.
Detectlet Wizard: This open source wizard provides access to all detectlets installed at your location. It guides users through the use and application of detectlets in fraud analysis.
Detectlets are one of the most exciting parts of the Picalo architecture. They allow non-programmers to run analysis routines created by others. See the detectlets page for more information.
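A detectlet is an installable analysis routine that a wizard can list and run for non-programmers, which makes the idea a natural fit for a small plugin registry. The following is a minimal sketch in plain Python, assuming a decorator-based registry. The names `detectlet`, `list_detectlets`, `run_detectlet`, and the sample `round_amounts` test are all hypothetical and are not Picalo's actual detectlet API.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types: a record is a dict, a table is a list of records.
Record = Dict[str, Any]
Table = List[Record]
Detectlet = Callable[[Table], Table]

# Installed detectlets: name -> (description shown to users, routine).
_REGISTRY: Dict[str, Tuple[str, Detectlet]] = {}


def detectlet(name: str, description: str) -> Callable[[Detectlet], Detectlet]:
    """Decorator that installs a function as a named detectlet."""
    def install(func: Detectlet) -> Detectlet:
        _REGISTRY[name] = (description, func)
        return func
    return install


def list_detectlets() -> List[Tuple[str, str]]:
    """What a wizard might show a non-programmer: (name, description) pairs."""
    return sorted((name, desc) for name, (desc, _func) in _REGISTRY.items())


def run_detectlet(name: str, table: Table) -> Table:
    """Run an installed detectlet by name and return the records it flags."""
    _desc, func = _REGISTRY[name]
    return func(table)


# A third party could ship this in a separate module; importing it installs it.
@detectlet("round_amounts", "Flag payments that are exact multiples of 1,000")
def round_amounts(table: Table) -> Table:
    return [r for r in table if r["amount"] % 1000 == 0]


payments = [{"id": 1, "amount": 5000}, {"id": 2, "amount": 1234.5}]
print(list_detectlets())
print(run_detectlet("round_amounts", payments))
```

Because the wizard side only needs `list_detectlets` and `run_detectlet`, new routines can be added without changing the program that runs them.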
Picalo is built upon the shoulders of many great projects. Thousands of individuals have contributed time and energy to these projects, and the Picalo effort is grateful for their work. These are listed as follows:
Python
What about quality control?
Another way to phrase this question is, "Can I trust Picalo's results?" The short answer is that you can trust Picalo as much as any analysis application. We take quality control very seriously.
The long answer is that you should never fully trust any analysis application. You should always double-check each step of the process, print control totals, and manually verify that your routines are doing what you think they are. It can be very embarrassing (and dangerous) to make decisions based on faulty analysis routines.
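To print control totals, you record a record count and the sum of a key amount column before a processing step, then check that the step's outputs reconcile to those numbers. The sketch below assumes records are plain Python dicts. It uses a hypothetical `control_totals` helper rather than any Picalo function, and it uses `Decimal` so the sums do not drift from floating-point rounding.

```python
from decimal import Decimal
from typing import Any, Dict, List

Record = Dict[str, Any]


def control_totals(records: List[Record], amount_field: str) -> Dict[str, Any]:
    """Return the record count and the exact sum of one amount column."""
    total = sum((Decimal(str(r[amount_field])) for r in records), Decimal("0"))
    return {"count": len(records), "total": total}


invoices = [
    {"vendor": "A", "amount": "120.50"},
    {"vendor": "B", "amount": "99.99"},
    {"vendor": "A", "amount": "15.00"},
]

# Control totals before the step...
before = control_totals(invoices, "amount")

# ...the step itself: split the invoices into two subsets...
vendor_a = [r for r in invoices if r["vendor"] == "A"]
others = [r for r in invoices if r["vendor"] != "A"]
parts = [control_totals(p, "amount") for p in (vendor_a, others)]

# ...and a check that the pieces reconcile to the whole. A mismatch means the
# step dropped or duplicated records.
assert sum(p["count"] for p in parts) == before["count"]
assert sum(p["total"] for p in parts) == before["total"]
print("before:", before, "parts:", parts)
```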
Users unfamiliar with the open source world may implicitly trust "corporate" software and be wary of "open" software. We hope you will re-evaluate this common misconception as you use Picalo. Certainly, closed-source software applications are often more user-friendly than community-built applications. But "good-looking" and easy-to-use programs are not necessarily trustworthy.
As more users test Picalo and more developers help program it, we'll have a lot of eyes looking through the code and testing the routines. Open source software often finds and fixes bugs much faster than closed-source software because of the number of individuals looking at its code. "Corporate" software is often written by small development teams who are driven by marketing calendars and new features.
The open source world has many examples of incredibly well-written software, including Linux (widely known for crashing very rarely), PostgreSQL (a highly respected database), Apache (which runs most of the web), BIND (which runs much of the Internet's domain name system), KDE and GNOME (excellent user interfaces that look similar to Windows), wxWidgets (the GUI toolkit Picalo is built upon), Perl and Python (programming languages many "real" programs are written with), GCC (the compiler many of those programs are built with), LaTeX (a great document typesetting platform), and Firefox (an incredible web browser). This list could go on with thousands of successful open source products that are in production use today.
In summary, is Picalo perfect? Of course not. There may even be one or two bugs left in the Level 1 routines of Picalo. But we're working to make it a world-class analysis application that encodes information about thousands of fraud schemes used worldwide. Consider helping out to be part of the team that makes Picalo better and better.
Most of Picalo is released under the GNU General Public License (GPL). The GPL is a restrictive (copyleft) open source license. We've released this package under the GPL to protect it. The source code comes with Picalo, and you are encouraged to improve and add to its functionality.
Detectlet libraries can be released under licenses other than the GPL. Since detectlets will be built by companies, organizations, and individuals, it is up to the developers to decide whether to sell their routines, release them as open source, or even place them in the public domain.
The restriction is that you cannot use any Picalo code in your own products unless those products are also released under the GPL. If you are using a closed-source license, or even one of the many open source licenses that are incompatible with the GPL, you cannot use Picalo code. The license protects the code from being "stolen" by any individual or company.
Package picalo
Picalo is a Python library to help anyone who works with data files, especially those who work with data in relational/spreadsheet format. It is primarily created for investigators and auditors searching through data sets for anomalies, trends, and other information, but it is generally useful for any type of data or text files.
Picalo is different from NumPy/Numarray in that it is meant for heterogeneous data rather than homogeneous data. In NumPy, you have an array (table) of a single type: all ints, for example. In Picalo, you have a table made up of different column types, very similar to a database.
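The sketch below makes the homogeneous/heterogeneous distinction concrete without depending on NumPy. It uses the standard library's `array` module as a stand-in for a single-type array. For a database-style table it uses a list of tuples checked against a per-column type schema. The schema and the `type_errors` helper are hypothetical illustrations, not Picalo's own table class.

```python
from array import array
from typing import Any, Dict, List, Tuple

# Homogeneous storage: every element must be the same C type ("i" = signed int).
amounts = array("i", [100, 250, 75])
try:
    amounts.append("not a number")  # rejected: a str is not a C int
except TypeError as exc:
    print("homogeneous array refused a string:", exc)

# Heterogeneous table: each column has its own type, as in a database table.
# The schema below is a hypothetical example, not a Picalo structure.
schema: Dict[str, type] = {"id": int, "vendor": str, "amount": float}
table: List[Tuple[Any, ...]] = [
    (1, "Acme Supply", 1200.00),
    (2, "Bolt & Nut", 87.25),
]


def type_errors(table: List[Tuple[Any, ...]], schema: Dict[str, type]) -> List[Tuple[int, str]]:
    """Return (row index, column name) for every cell whose type breaks the schema."""
    problems = []
    for row_index, record in enumerate(table):
        for (name, expected), value in zip(schema.items(), record):
            if not isinstance(value, expected):
                problems.append((row_index, name))
    return problems


print(type_errors(table, schema))
print(type_errors([(3, "Cord Co", "n/a")], schema))
```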
One of Picalo's primary purposes is making relational databases easier to work with. Once you have a Picalo table, you can add, move, or delete columns; work with records (horizontal slices of the data); select and group records in various ways; and run analyses on tables. Picalo includes adapters for popular databases, and it provides a Query object that makes queries look just like regular Tables (except that they are live from the database).
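You can approximate a "live" query with an object that stores the SQL text and re-runs it whenever it is read, so its contents always reflect the current database. The following is a minimal sketch, assuming Python's built-in `sqlite3` module. The `LiveQuery` class is hypothetical and is not Picalo's Query object.

```python
import sqlite3
from typing import Any, Iterator, List, Tuple


class LiveQuery:
    """Hypothetical sketch: a read-only "table" backed by a SQL query.

    The SQL is re-executed on every access, so results always reflect the
    current contents of the database. This is not Picalo's Query class.
    """

    def __init__(self, conn: sqlite3.Connection, sql: str, params: Tuple[Any, ...] = ()):
        self.conn = conn
        self.sql = sql
        self.params = params

    def _rows(self) -> List[Tuple[Any, ...]]:
        return self.conn.execute(self.sql, self.params).fetchall()

    def __iter__(self) -> Iterator[Tuple[Any, ...]]:
        return iter(self._rows())

    def __len__(self) -> int:
        return len(self._rows())

    def __getitem__(self, index: int) -> Tuple[Any, ...]:
        return self._rows()[index]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, vendor TEXT, amount REAL)")
conn.execute("INSERT INTO payments VALUES (1, 'Acme', 500.0)")

large = LiveQuery(conn, "SELECT id, vendor FROM payments WHERE amount > ? ORDER BY id", (100,))
print(len(large), list(large))

# A row added later shows up without rebuilding the query object.
conn.execute("INSERT INTO payments VALUES (2, 'Bolt', 900.0)")
print(len(large), large[1])
```

Re-executing the SQL on every access keeps results current at the cost of extra database round trips. A fuller implementation might cache rows and refresh them on demand.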
If you work with relational databases, delimited (CSV/TSV) files, EBCDIC files, MS Excel files, log files, text files, or other heterogeneous datasets, Picalo might make your life easier.
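The sketch below illustrates two of these file types. It reads CSV or TSV text into records with the standard `csv` module, and it decodes EBCDIC bytes with Python's built-in `cp037` codec. `cp037` is one common EBCDIC code page; real mainframe files may use another. The `read_delimited` helper is hypothetical and not part of Picalo.

```python
import csv
import io
from typing import Dict, List


def read_delimited(text: str, delimiter: str = ",") -> List[Dict[str, str]]:
    """Parse CSV or TSV (tab-delimited) text into a list of records.

    The first line is treated as the header row; every value stays a string.
    """
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


csv_records = read_delimited("id,vendor,amount\n1,Acme,500.00\n")

# Simulate a mainframe extract: tab-delimited text encoded as EBCDIC (cp037).
ebcdic_bytes = "id\tvendor\n1\tAcme\n".encode("cp037")
tsv_records = read_delimited(ebcdic_bytes.decode("cp037"), delimiter="\t")

print(csv_records)
print(tsv_records)
```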
Picalo is programmed to be as Pythonic as possible. Its core objects (tables, columns, and records) act like lists. A column is a list of cells. A record is a list of cells. A table is a list of records. Tables can be sorted via the sort function, just as the Python Sorting HowTo shows. The return values of almost all functions are new tables, so functions can be chained together like pipes in Unix.
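The design where everything is a list and every function returns a new table can be sketched in plain Python. The helpers below (`select`, `sort_by`, `column`, `group_totals`) are hypothetical stand-ins, not Picalo's function names. Records are assumed to be (id, vendor, amount) tuples, and sorting uses `operator.itemgetter` as in the Python Sorting HowTo.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, List, Tuple

Record = Tuple[Any, ...]  # one row: a sequence of cells
Table = List[Record]      # a table: a list of records


def select(table: Table, predicate: Callable[[Record], bool]) -> Table:
    """Return a new table holding only the records that satisfy predicate."""
    return [r for r in table if predicate(r)]


def sort_by(table: Table, col: int, reverse: bool = False) -> Table:
    """Return a new table sorted on one column; the input is not modified."""
    return sorted(table, key=itemgetter(col), reverse=reverse)


def column(table: Table, col: int) -> List[Any]:
    """A column is just the list of cells at one position in every record."""
    return [r[col] for r in table]


def group_totals(table: Table, key_col: int, value_col: int) -> Table:
    """Return a new table of (key, total) rows; groupby needs sorted input."""
    ordered = sort_by(table, key_col)
    return [
        (key, sum(r[value_col] for r in rows))
        for key, rows in groupby(ordered, key=itemgetter(key_col))
    ]


payments: Table = [  # (id, vendor, amount)
    (1, "Acme", 500.0),
    (2, "Bolt", 75.0),
    (3, "Acme", 1200.0),
    (4, "Cord", 980.0),
]

# Each step returns a new table, so the calls chain like a Unix pipeline:
# select amounts >= 100 | total per vendor | sort by total, largest first.
result = sort_by(group_totals(select(payments, lambda r: r[2] >= 100), 1, 2), 1, reverse=True)
print(result)
```

Because no step mutates its input, the original `payments` table is still available for control totals or a different analysis afterward.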
Picalo includes an optional Project object that stores tables in Zope Object DB files. When Projects are used, Picalo automatically swaps records in and out of memory as needed to ensure efficient use of resources. Projects allow Picalo to work with an essentially unlimited amount of data.
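The swapping idea can be sketched as a table split into fixed-size pages. Only a few pages are kept in memory, and when that limit is exceeded the least recently used page is written to disk. This sketch uses the standard-library `shelve` module as a stand-in for Zope Object DB files. The `PagedTable` class and its `page_size`/`max_pages` parameters are hypothetical, not Picalo's Project API.

```python
import os
import shelve
import tempfile
from collections import OrderedDict
from typing import Any, List


class PagedTable:
    """Hypothetical sketch of a table that swaps pages of records to disk.

    Stand-in for the idea only: it uses the standard-library shelve module
    instead of Zope Object DB files, with a least-recently-used eviction policy.
    """

    def __init__(self, path: str, page_size: int = 1000, max_pages: int = 2):
        if max_pages < 1:
            raise ValueError("max_pages must be at least 1")
        self.store = shelve.open(path)  # on-disk copy of swapped-out pages
        self.page_size = page_size
        self.max_pages = max_pages
        self.cache: "OrderedDict[int, List[Any]]" = OrderedDict()  # pages in memory
        self.length = 0

    def _page(self, number: int) -> List[Any]:
        """Return a page, loading it from disk and evicting the LRU page if needed."""
        if number in self.cache:
            self.cache.move_to_end(number)  # mark as most recently used
        else:
            self.cache[number] = self.store.get(str(number), [])
            if len(self.cache) > self.max_pages:
                old_number, old_page = self.cache.popitem(last=False)
                self.store[str(old_number)] = old_page  # swap out to disk
        return self.cache[number]

    def append(self, record: Any) -> None:
        self._page(self.length // self.page_size).append(record)
        self.length += 1

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, index: int) -> Any:
        if not 0 <= index < self.length:
            raise IndexError(index)
        return self._page(index // self.page_size)[index % self.page_size]

    def close(self) -> None:
        """Write in-memory pages back to disk and close the store."""
        for number, page in self.cache.items():
            self.store[str(number)] = page
        self.store.close()


with tempfile.TemporaryDirectory() as tmp:
    table = PagedTable(os.path.join(tmp, "project"), page_size=3, max_pages=2)
    for i in range(10):
        table.append((i, i * 10))
    # At most two pages of three records are held in memory at once.
    print(len(table), table[0], table[9], len(table.cache))
    table.close()
```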
The project was started in 2003 by Conan C. Albrecht, a professor in Information Systems at Brigham Young University. Conan remains the primary developer of Picalo.
Example uses of Picalo:
- Analyzing financial data, employee records, and purchasing systems for errors and fraud
- Interactively analyzing network events, web server logs, and system login records
- Importing email into relational or text-based databases
- Embedding controls and fraud testing routines into production systems
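The sketch below is one example of a fraud-testing routine that could be embedded in a production system. It flags possible duplicate payments, meaning payments that share the same vendor, invoice number, and amount. The function names and the choice of key fields are assumptions for illustration, not a Picalo detectlet.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Payment = Dict[str, Any]
KEY_FIELDS = ("vendor", "invoice", "amount")  # assumed definition of "same payment"


def find_duplicate_payments(payments: List[Payment]) -> Dict[Tuple[Any, ...], List[Payment]]:
    """Group payments on the key fields and keep only groups seen more than once."""
    groups: Dict[Tuple[Any, ...], List[Payment]] = defaultdict(list)
    for p in payments:
        groups[tuple(p[f] for f in KEY_FIELDS)].append(p)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


def check_batch_before_posting(batch: List[Payment]) -> None:
    """An embedded control: refuse to post a batch that contains likely duplicates."""
    duplicates = find_duplicate_payments(batch)
    if duplicates:
        raise ValueError(f"possible duplicate payments: {list(duplicates)}")


batch = [
    {"id": 1, "vendor": "Acme", "invoice": "INV-17", "amount": 500.0},
    {"id": 2, "vendor": "Bolt", "invoice": "INV-03", "amount": 75.0},
    {"id": 3, "vendor": "Acme", "invoice": "INV-17", "amount": 500.0},
]
print(find_duplicate_payments(batch))
try:
    check_batch_before_posting(batch)
except ValueError as exc:
    print("batch held for review:", exc)
```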