Posts tagged ciência da computação
Wrapping up the slide series, this post presents the (fictional) WCM case study and some PDI automation solutions. As a reminder, the material with the exercises was published in the post for the first class.
The second class of the tutorial covers the main techniques for data manipulation and control flow, along with some data transformation techniques. At the end, it shows how to validate and handle errors with PDI. The script and the exercise material were published in the first class of this series.
This semester I cut the number of classes I teach at the colleges down to zero. My activities at MCT and in the PhD program are taking up all my working time, and it became unfeasible to keep lecturing. In any case, I will start publishing here the material from the last courses and subjects I taught. My original intention was always to produce a set of handouts to support my students, but for now I will publish only the lecture slides.
The first set of slides is a tutorial on Pentaho Data Integration. This material was presented in the Databases for Decision Support course, taught by Prof. Robson Fidalgo. Feel free to criticize and suggest changes to the material.
The exercise script is available in the slides below:
The databases for the exercises can be downloaded here.
It is worth taking a look at Glenn Vanderburg's presentation at the last Lone Star Ruby Conference. Despite the provocative title, the talk offers a very practical view of how agile methods can be used to produce quality software without blowing the budget. The video of the presentation (in English) can be accessed by clicking here.
After almost a month of PhD classes at CIN, I realized how rusty I am on some basic principles of calculus. During my undergraduate studies I took four calculus courses, three algebra courses, and one statistics course. Even so, I had enormous difficulty keeping up with the first classes, in large part because of the time that passed between graduation and my renewed contact with the world of numbers. But the main reason was simply that I never needed to use this knowledge in my career as a systems analyst. The subject came up when I read Alan Skorkin's post about exactly the problems I am facing now. It is worth a read.
You Don’t Need Math Skills To Be A Good Developer But You Do Need Them To Be A Great One
A little while ago I started thinking about math. You see, I’ve been writing software for quite a few years now and to be totally honest, I haven’t yet found a need for math in my work. There has been plenty of new stuff I’ve had to learn/master, languages, frameworks, tools, processes, communication skills and library upon library of stuff to do just about anything you can think of; math hasn’t been useful for any of it. Of course this is not surprising: the vast majority of the work I’ve been doing has been CRUD in one form or another, and that’s the vast majority of the work most developers do in these interweb times of ours. You do consulting – you mostly build websites; you work for a large corporate – you mostly build websites; you freelance – you mostly build websites. I am well aware that I am generalising quite a bit, but do bear with me, I am going somewhere.
Eventually you get a little tired of it, as I did. Don’t get me wrong, it can be fun and challenging work, providing opportunities to solve problems and interact with interesting people – I am happy to do it during work hours. But the thought of building yet more websites in my personal time has somewhat lost its luster – you begin to look for something more interesting/cool/fun, as – once again – I did. Some people gravitate to front-end technologies and graphical things – visual feedback is seductive – I was not one of them (I love a nice front-end as much as the next guy, but it doesn’t really excite me), which is why, when I was confronted with some search-related problems, I decided to dig a little further. And this brings me back to the start of this story, because as soon as I grabbed the first metaphorical shovel-full of search, I ran smack-bang into some math and realised just how far my skills had deteriorated. Unlike riding a bike – you certainly do forget (although I haven’t ridden a bike in years, so maybe you forget that too).
Learning a little bit about search exposed me to all sorts of interesting software-y and computer science-y related things/problems (machine learning, natural language processing, algorithm analysis etc.) and now everywhere I turn I see math and so feel my lack of skills all the more keenly. I’ve come to the realization that you need a decent level of math skill if you want to do cool and interesting things with computers. Here are some more in addition to the ones I already mentioned – cryptography, games AI, compression, genetic algorithms, 3d graphics etc. You need math to understand the theory behind these fields which you can then apply if you want to write those libraries and tools that I was talking about – rather than just use them (be a producer rather than just a consumer – to borrow an OS metaphor). And even if you don’t want to write any libraries, it makes for a much more satisfying time building software, when you really understand what makes things tick, rather than just plugging them in and hoping they do whatever the hell they’re supposed to.
The majority of developers will tell you that they’ve never needed math for their work (like I did a couple of paragraphs above), but after musing on it for a while, I had a thought. What we might have here is a reverse Maslow’s hammer problem. You know the one – when you have a hammer, everything looks like a nail. It is a metaphor for using a favourite tool even when it may not be best for the job at hand. Math is our hammer in reverse. We know the hammer exists but don’t quite know how to use it, so even when we meet a problem where our hammer would be the perfect tool, we never give it serious consideration. The screwdriver was good enough for my granddaddy, it was good enough for my daddy and it is good enough for me, who needs a hammer anyway? The trick with math is – people are afraid of it – even most programmers, you’d think we wouldn’t be, but we are. So, we turn our words into a self-fulfilling prophecy. It’s not that I don’t need math for my work; it’s just that I don’t really know it, and even if I do, I don’t know how to apply it. So I get by without it, and when you make do without something for long enough, after a while you don’t even notice it’s missing and so need it even less – a self-fulfilling prophecy.
Here is some food for thought about something close to all our hearts – learning new skills. As a developer in the corporate world, you strive to be a generalizing specialist (read this book if you don’t know what I am talking about). You try to be decent at most things and really good at some. But what do you specialize in? Normally people choose a framework or two and a programming language and go with that, which is fine and worthwhile. But consider the fact that frameworks and, to a lesser extent, languages have a limited shelf life. If you’re building a career on being a Hibernate, Rails or Struts expert (the Struts guys should really be getting worried now), you will have to rinse and repeat all over again in a few years when new frameworks come along to supersede the current flavour of the month. So is it really the best investment of your time – maybe, but then again maybe not. Math, on the other hand, is not going away any time soon. Everything we do in our field is built upon solid mathematical principles at its root (algorithms and data structures being a case in point), so time spent keeping up your math skills is arguably never wasted. And it, once again, comes down to really understanding something rather than just using it by rote – math can help you understand everything you do more deeply, when it comes to computers. In fact, as Steve Yegge said, what we do as programmers is so much like math we don’t even realise it.
What/Who Makes A Difference
You don’t believe me? Then consider this. Most of the people who are almost universally respected in our field as great programmers are also great mathematicians. I am talking people like Donald Knuth, Edsger W. Dijkstra, Noam Chomsky, Peter Norvig. But then again these guys weren’t really developers, they were computer scientists, so it doesn’t really count right? I guess, but then again, maybe we shouldn’t really talk until our output in pure lines of code even begins to approach 10% of what these people have produced. Of course, you can be successful and famous without being a boffin, everyone has heard of Gavin King or DHH. That’s kinda true (although it’s an arguable point whether or not many people have heard of Gavin or DHH outside their respective niches), but “heard of” and universally respected are different things, about as different as creating a framework and significantly advancing the sum-total of human knowledge in your field (don’t get me wrong, I respect Gavin and David, they’ve done a hell of a lot more than I have, but that doesn’t make what I said any less of a fact). How is all of this relevant? I dunno, it probably isn’t, but I thought I’d throw it in there anyway since we’re being introspective and all.
The world is getting filled up with data, there is more and more of it every day and whereas before we had the luxury of working with relatively small sets of it, these days the software we write must operate efficiently with enormous data sets. This is increasingly true even in the corporate world. What this means is that you will be less and less likely to be able to just “kick things off” to see how they run, because with the amount of data you’ll be dealing with it will just grind to a halt unless you’re smart about it. My prediction is that algorithm analysis will become increasingly important for the lay-programmer, not that it wasn’t before, but it will become more so. And what do you need to be a decent algorist – you guessed it, some math skills.
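To make the algorithm-analysis point concrete, here is a small illustration (mine, not part of the original post): a linear scan and a binary search return the same answer, but over a million sorted records the scan may inspect every element while the binary search needs about twenty comparisons.

```python
import bisect

def linear_search(items, target):
    """O(n): cost grows with the data set -- fine for 100 rows, painful for 100 million."""
    for i, x in enumerate(items):
        if x == target:
            return i
    return -1

def binary_search(sorted_items, target):
    """O(log n): doubling the data adds only one extra comparison."""
    i = bisect.bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

data = list(range(1_000_000))
# Same answer; the linear scan walks a million elements, the binary search ~20.
assert linear_search(data, 999_999) == binary_search(data, 999_999) == 999_999
```

This is exactly the kind of reasoning that lets you predict, before running anything, which approach will grind to a halt on a large data set.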
So, what about me? Well, I’ve decided to build up/revive my math skills a little bit at a time, there are still plenty of books to read and code to write, but I will try to devote a little bit of my time to math at least once in a while, because, like exercise, a little bit once in a while is better than nothing (to quote Steve Yegge yet again). Of course I have a bit of an ace up my sleeve when it comes to math, which is good for me, but luckily with this blog, we might all benefit (I know you’re curious, I’ll tell you about it soon).
Where Do You See Yourself In 5 Years
So, is all this math gonna be good for anything? It’s hard to say in advance, I am pretty happy with where I am at right now and so might you be, but it’s all about potential. End of the day, if you’re a developer in the corporate world you don’t really need any math. If you’re happy to go your entire career doing enterprise CRUD apps during work hours and paragliding or wakeboarding (or whatever trendy ‘sport’ the geeky in-crowd is into these days) during your off time, then by all means, invest some more time into Spring or Hibernate or Visual Studio or whatever. It will not really limit your potential in that particular niche; you can become extremely valuable – even sought after. But if you strive for diversity in your career and want the ability to try your hand at almost any activity that involves code, from information retrieval to Linux kernel hacking – in short, if you want to be a perfect mix of developer, programmer and computer scientist – you have to make sure your math skills are up to scratch (and hell, you can still go wakeboarding if you really want). Long story short, if you grok math, there are no doors that are closed to you in the software development field; if you don’t – it’s going to be all CRUD (pun intended)!
Digg is yet another big Web 2.0 name that has just migrated its (gigantic) data sets from the relational world to the "post-relational" model, the latter better known as NoSQL. They join companies such as Google, Amazon, eBay, LinkedIn, Twitter, and Facebook, aiming to provide performance levels better suited to queries over the monstrous databases typical of Web 2.0 applications. To get a sense of the problem, imagine a query over eBay's 2 PB (two petabytes) of data being run online by hundreds or thousands of users simultaneously. Now imagine a join over that many rows in the tables of a relational database.
Digg's strategy is described in two posts on the service's blog. The first discusses the scalability difficulties of their database infrastructure, based on a partitioned master-slave MySQL setup. The text shows an example of a join query that took 14 seconds to complete. After migrating to a non-relational datastore based on Cassandra's distributed model, the same query could be answered in under a second. The second post covers the full migration of Digg's main services to Cassandra and lists the development team's main contributions to the project.
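The speedup Digg reports comes essentially from trading read-time joins for write-time denormalization, which is the core data-modeling idea behind stores like Cassandra. A toy sketch of the difference (plain Python dicts stand in for the tables and the datastore; the schema is invented for illustration):

```python
# Relational style: normalized tables, joined at read time.
users = {1: "alice", 2: "bob"}
diggs = [(1, "story-42"), (2, "story-42"), (1, "story-7")]

def who_dugg_join(story_id):
    # Scans the whole diggs "table" on every read -- the part that
    # stops scaling once the table has billions of rows.
    return [users[uid] for uid, sid in diggs if sid == story_id]

# NoSQL style: denormalize at write time, so each read is one key lookup.
diggs_by_story = {}
for uid, sid in diggs:
    diggs_by_story.setdefault(sid, []).append(users[uid])

def who_dugg_lookup(story_id):
    return diggs_by_story.get(story_id, [])

assert who_dugg_join("story-42") == who_dugg_lookup("story-42") == ["alice", "bob"]
```

The trade-off is the usual one: reads become cheap and predictable, while writes must update every denormalized view and the data is duplicated.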
An excellent post by Chuck Connell at Dr. Dobb's, defending the thesis that Software Engineering (SE) is not part of Computer Science (!). I found his point of view very interesting: SE does not need mathematical rigor in its activities, basically because software is made with "creativity, vision, multi-disciplinary thinking, and humanity." It reminds me of some coworkers in the recent past trying to prove by sheer force that Function Point Analysis is reliable. The passage where he says these estimation techniques are merely subjective, owing to the human factors in their formulation, is an excellent argument in defense of more agile methodologies for anticipating the many aspects of building software.
See Connell's post below.
“A few years ago, I studied algorithms and complexity. The field is wonderfully clean, with each concept clearly defined, and each result building on earlier proofs. When you learn a fact in this area, you can take it to the bank, since mathematics would have to be inconsistent to overturn what you just learned. Even the imperfect results, such as approximation and probabilistic algorithms, have rigorous analyses about their imperfections. Other disciplines of computer science, such as network topology and cryptography also enjoy similar satisfying status.
Now I work on software engineering, and this area is maddeningly slippery. No concept is precisely defined. Results are qualified with “usually” or “in general”. Today’s research may, or may not, help tomorrow’s work. New approaches often overturn earlier methods, with the new approaches burning brightly for a while and then falling out of fashion as their limitations emerge. We believed that structured programming was the answer. Then we put faith in fourth-generation languages, then object-oriented methods, then extreme programming, and now maybe open source.
But software engineering is where the rubber meets the road. Few people care whether P equals NP just for the beauty of the question. The computer field is about doing things with computers. This means writing software to solve human problems, and running that software on real machines. By the Church-Turing Thesis, all computer hardware is essentially equivalent. So while new machine architectures are cool, the real limiting challenge in computer science is the problem of creating software. We need software that can be put together in a reasonable amount of time, for a reasonable cost, that works something like its designers hoped for, and runs with few errors.
With this goal in mind, something has always bothered me (and many other researchers): Why can’t software engineering have more rigorous results, like the other parts of computer science? To state the question another way, “How much of software design and construction can be made formal and provable?” The answer to that question lies in Figure 1.
The topics above the line constitute software engineering. The areas of study below the line are the core subjects of computer science. These latter topics have clear, formal results. For open questions in these fields, we expect that new results will also be formally stated. These topics build on each other — cryptography on complexity, and compilers on algorithms, for example. Moreover, we believe that proven results in these fields will still be true 100 years from now.
So what is that bright line, and why are none of the software engineering topics below it? The line is the property “directly involves human activity”. Software engineering has this property, while traditional computer science does not. The results from disciplines below the line might be used by people, but their results are not directly affected by people.
Software engineering has an essential human component. Software maintainability, for example, is the ability of people to understand, find, and repair defects in a software system. The maintainability of software may be influenced by some formal notions of computer science — perhaps the cyclomatic complexity of the software’s control graph. But maintainability crucially involves humans, and their ability to grasp the meaning and intention of source code. The question of whether a particular software system is highly maintainable cannot be answered just by mechanically examining the software.
The same is true for safety. Researchers have used some formal methods to learn about a software system’s impact on people’s health and property. But no discussion of software safety is complete without appeal to the human component of the system under examination. Likewise for requirements engineering. We can devise all sorts of interview techniques to elicit accurate requirements from software stakeholders, and we can create various systems of notation to write down what we learn. But no amount of research in this area will change the fact that requirement gathering often involves talking to or observing people. Sometimes these people tell us the right information, and sometimes they don’t. Sometimes people lie, perhaps for good reasons. Sometimes people are honestly trying to convey correct information but are unable to do so.
This observation leads to Connell’s Thesis:
Software engineering will never be a rigorous discipline with proven results, because it involves human activity.
This is an extra-mathematical statement, about the limits of formal systems. I offer no proof for the statement, and no proof that there is no proof. But the fact remains that the central questions of software engineering are human concerns:
- What should this software do? (requirements, usability, safety)
- What should the software look like inside, so it is easy to fix and modify? (architecture, design, scalability, portability, extensibility)
- How long will it take to create? (estimation)
- How should we build it? (coding, testing, measurement, configuration)
- How should we organize the team to work efficiently? (management, process, documentation)
All of these problems revolve around people.
My thesis explains why software engineering is so hard and so slippery. Tried-and-true methods that work for one team of programmers do not work for other teams. Exhaustive analysis of past programming projects may not produce a good estimation for the next. Revolutionary software development tools each help incrementally and then fail to live up to their grand promise. The reason is that humans are squishy and frustrating and unpredictable.
Before turning to the implications of my assertion, I address three likely objections:
The thesis is self-fulfilling. If some area of software engineering is solved rigorously, you can just redefine software engineering not to include that problem.
This objection is somewhat true, but of limited scope. I am asserting that the range of disciplines commonly referred to as software engineering will substantially continue to defy rigorous solution. Narrow aspects of some of the problems might succumb to a formal approach, but I claim this success will be just at the fringes of the central software engineering issues.
Statistical results in software engineering already disprove the thesis.
These methods generally address the estimation problem and include Function Point Counting, COCOMO II, PROBE, and others. Despite their mathematical appearance, these methods are not proofs or formal results. The statistics are an attempt to quantify subjective human experience on past software projects, and then extrapolate from that data to future projects. This works sometimes. But the seemingly rigorous formulas in these schemes are, in effect, putting lipstick on a pig, to use a contemporary idiom. For example, one of the formulas in COCOMO II is PersonMonths = 2.94 × Size^B, where B = 0.91 + 0.01 × Σ SF_i, and SF is a set of five subjective scale factors such as “development flexibility” and “team cohesion”. The formula looks rigorous, but is dominated by an exponent made up of human factors.
Formal software engineering processes, such as cleanroom engineering, are gradually finding rigorous, provable methods for software development. They are raising the bright line to subsume previously squishy software engineering topics.
It is true that researchers of formal processes are making headway on various problems. But they are guilty of the converse of the first objection: they define software development in such a narrow way that it becomes amenable to rigorous solutions. Formal methods simply gloss over any problem centered on human beings. For example, a key to formal software development methods is the creation of a rigorous, unambiguous software specification. The specification is then used to drive (and prove) the later phases of the development process. A formal method may indeed contain an unambiguous semantic notation scheme. But no formal method contains an exact recipe for getting people to unambiguously state their vague notions of what software ought to do.
To the contrary of these objections, it is my claim that software engineering is essentially different from traditional, formal computer science. The former depends on people and the latter does not. This leads to Connell’s Corollary:
We should stop trying to prove fundamental results in software engineering and accept that the significant advances in this domain will be general guidelines.
As an example, David Parnas wrote a wonderful paper in 1972, On The Criteria To Be Used in Decomposing Systems into Modules. The paper describes a simple experiment Parnas performed about alternative software design strategies, one utilizing information hiding, and the other with global data visibility. He then drew some conclusions and made recommendations based on this small experiment. Nothing in the paper is proven, and Parnas does not claim that anyone following his recommendations is guaranteed to get similar results. But the paper contains wise counsel and has been highly influential in the popularity of object-oriented language design.
Another example is the vast body of work known as CMMI from the Software Engineering Institute at Carnegie Mellon. CMMI began as a software process model and has now grown to encompass other kinds of projects as well. CMMI is about 1000 pages long — not counting primers, interpretations, and training materials — and represents more than 1000 person-years of work. It is used by many large organizations and has been credited with significant improvement in their software process and products. But CMMI contains not a single iron-clad proven result. It is really just a set of (highly developed) suggestions for how to organize a software project, based on methods that have worked for other organizations on past projects. In fact, the SEI states that CMMI is not even a process, but rather a meta-process, with details to be filled in by each organization.
Other areas of research in this spirit include design patterns, architectural styles, refactoring based on bad smells, agile development, and data visualization. In these disciplines, parts of the work may include proven results, but the overall aims are systems that foundationally include a human component. To be clear: Core computer science topics (below the bright line) are vital tools to any software engineer. A background in algorithms is important when designing high-performance application software. Queuing theory helps with the design of operating system kernels. Cleanroom engineering contains some methods useful in some situations. Statistical history can be helpful when planning similar projects with a similar team of people. But formalism is just a necessary, not sufficient, condition for good software engineering. To illustrate this point, consider the fields of structural engineering and physical architecture (houses and buildings).
Imagine a brilliant structural engineer who is the world’s expert on building materials, stress and strain, load distributions, wind shear, earthquake forces, etc. Architects in every country keep this person on their speed-dial for every design and construction project. Would this mythical structural engineer necessarily be good at designing the buildings he or she is analyzing? Not at all. Our structural engineer might be lousy at talking to clients, unable to design spaces that people like to inhabit, dull at imagining solutions to new problems, and boring aesthetically. Structural engineering is useful to physical architects, but is not enough for good design. Successful architecture includes creativity, vision, multi-disciplinary thinking, and humanity.
In the same way, classical computer science is helpful to software engineering, but will never be the whole story. Good software engineering also includes creativity, vision, multi-disciplinary thinking, and humanity. This observation frees software engineering researchers to spend time on what does succeed — building up a body of collected wisdom for future practitioners. We should not try to make software engineering into an extension of mathematically-based computer science. It won’t work, and can distract us from useful advances waiting to be discovered.”
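Connell's COCOMO II example is easy to check numerically. The sketch below implements the formula as quoted in his post; the scale-factor ratings are invented for illustration, purely to show how much the subjective exponent swings the estimate:

```python
def cocomo2_person_months(size_ksloc, scale_factors):
    """COCOMO II effort equation as quoted by Connell:
    PM = 2.94 * Size^B, where B = 0.91 + 0.01 * sum(SF_i)."""
    b = 0.91 + 0.01 * sum(scale_factors)
    return 2.94 * size_ksloc ** b

# Hypothetical ratings for the five subjective scale factors.
optimistic  = [1, 1, 1, 1, 1]  # cohesive team, flexible process...
pessimistic = [5, 5, 5, 5, 5]  # ...or the same project, judged harshly

low = cocomo2_person_months(100, optimistic)    # ~245 person-months
high = cocomo2_person_months(100, pessimistic)  # ~614 person-months

# The same 100 KSLOC project: the human-factor exponent alone
# more than doubles the estimate.
assert high / low > 2
```

Which is exactly Connell's point: the formula looks rigorous, but the number it produces is dominated by a handful of human judgments.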
Taking advantage of all the buzz around Google's Chrome OS announcement, I transcribe below Kent Beck's excellent post on the subject.
“I’ve been reading reactions to the Chrome OS announcement today and so far everything I’ve read has missed the point, or at least the point I see. Here’s my take on Chrome OS. It’s a story, though, not a soundbite.
Walks like a duck…
When Chrome came out I took a look at the architecture and thought, “Hmm… separate address spaces for each tab. That looks like an operating system.” So I decided to try it, to simulate using Chrome as my operating system. I made it my default browser (in spite of Microsoft’s periodic attempts to change my preference) and expanded it to full screen. From then on I did everything I could on the web.
The best part of a year later I can say Chrome is a little clunky as a desktop. Multiple windows are more work than they should be (I wish they’d automatically size to their contents). Some of the web GUIs aren’t as polished as native interfaces. However, there is a whole lot I don’t miss about Windows apps. Overall the advantages outweigh the disadvantages.
I still use a handful of native apps regularly: Skype, Eclipse, iTunes, and Outlook. I’d be happy to have web-based replacements (actually I only use Outlook out of inertia, not because Gmail wouldn’t do a just fine job). If I had them, I wouldn’t miss my old desktop at all.
Negative reactions to Chrome OS seem to be based on how much worse it is than its competitors. Christensen provides an alternative perspective on the situation. The existing desktops are overkill for most users’ needs. The more features added to the desktops, the smaller percentage most people find useful. An alternative that is better at stuff users care about would be welcome.
This is a story that has played out thousands of times: digital photography was worse than chemical photography, wireless LANs were worse than wired LANs, microcomputers were worse than minicomputers were worse than mainframes, Java was worse than C++. Now Chrome OS is worse than Windows and the Mac and Linux desktops.
Innovations that start out worse need to be better at something new that matters. Imagine never having to install an application again. Never having to back up. Never having to reinstall the OS because it’s just gotten way too weird. I’d give up a lot to gain that. That was the point of telling you about my experiment: I’ve seen the future and it’s not so bad.
The next step in the innovator’s dilemma script is predictable. The existing participants will ignore Chrome (they may fuss, but they aren’t going to introduce something even simpler, even better, even cheaper–that’s just not how they think). Chrome OS will grow better and better, and be attractive on bigger and bigger hardware. More and more of the necessary apps will migrate to the browser or be replaced by inferior-but-good-enough entrants (do you hear that, Skype?) Since Chrome OS is genuinely better along some dimensions, the motivation is there for users, for application developers, and for Google to continue the march.
After a decade of nibble, nibble, nibble, Apple and Microsoft will occupy highly-profitable but minuscule markets. If I had to guess I would say that Apple will have the very best high end desktops and Microsoft will be strong on servers. By that time, though, Chrome OS will have grown bloated with seemingly-indispensable features and will be ripe for a little nibble, nibble, nibble of its own.
Disruption isn’t inevitable. Relational databases successfully fended off clearly-superior object databases (although the stupidification of data poses a fresh disruptive threat). To remain strong in desktop operating systems, though, Apple or Microsoft or the Linux desktops would have to abandon their current profit model, find a fresh ultra-simplification twist, and run the new business far from rational-but-doomed headquarters (Merlin, Oregon has a lovely abandoned sawmill site ready for development, in case you’re interested). They aren’t likely to do so, though, because it makes no sense.
The current desktops are dead, even though they will linger for a decade or more. Welcome, Chrome OS. Here’s to a worse future.”
Last month I had the honor of giving a talk at the Semana Universitária de Caruaru, on the topic "Software Reuse: an Overview." The event took place at the UPE campus on October 10. The talk is an adaptation of the material used by RiSE in the reuse courses of the CIN master's program.
I would like to thank Prof. Humberto Rocha for the invitation and the students for attending the talk.
This past September, the 34th Euromicro Conference, one of the most traditional software engineering events in the world, took place in Parma, Italy. In this edition, the RiSE Group had two papers published, one of them authored by me. The first paper was "A Case Study in Software Product Lines – The Case of the Mobile Game Domain," by Leandro Nascimento, Eduardo Almeida, and Sílvio Meira. Leandro, incidentally, was the presenter (see photo) of a paper of mine, co-authored with Eduardo Almeida and Sílvio Meira, entitled "InCoME: Integrated Cost Model for Product Line Engineering." My thanks to Kid for his effort and partnership in presenting the paper to an audience highly specialized in software reuse.