Iscte

Master's degree

Informática e Gestão (Computer Science and Management)

Title

Triagem de pedidos de assistência médica (Triage of Medical Assistance Requests)

Author

Gerardo, Ália

Abstract

This dissertation evaluates whether Data Mining techniques can perform the triage of medical assistance requests. Based on the literature review, the methodology of Cios et al. (2000) was followed and several approaches were explored; one of the main reasons for choosing it is that it was found to be the most widely used methodology in healthcare studies. The data consist of 2,070,227 medical assistance requests described by the variables Year, Month, Day, Day of the Week, Hour, District, Municipality, Priority, Type of Occurrence, Age Group and Sex. Priority is the triage level assigned to the request and takes one of four values: Emergent, Urgent, Less Urgent and Non-urgent. Handling medical data demands care beyond the usual requirements of this kind of work. Besides the difficulty of obtaining data because of confidentiality, it is important that the results be transparent, understandable and carefully evaluated. To that end, decision trees (J48), Naïve Bayes and Support Vector Machines (SMO and LibSVM) were applied to the real four-level scale. A two-level scale, derived from the real scale, was also considered. The evaluation measures were accuracy, sensitivity and specificity. The results show that Data Mining techniques are more effective at triage when only two levels are considered. Across the different approaches, Support Vector Machines were also more effective than the other techniques used.
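The three evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's code: the grouping of the four priority levels into two (`TWO_LEVEL`), the function names and the toy labels are all assumptions made for illustration, since the abstract does not state how the two-level scale was derived.

```python
# Minimal sketch of accuracy, sensitivity and specificity for triage labels.
# Everything here is illustrative; no values come from the dissertation.

# ASSUMPTION: a hypothetical grouping of the four levels into two.
# The abstract does not say how the two-level scale was actually derived.
TWO_LEVEL = {
    "Emergent": "High",
    "Urgent": "High",
    "Less Urgent": "Low",
    "Non-urgent": "Low",
}


def collapse(labels):
    """Map four-level priorities onto the hypothetical two-level scale."""
    return [TWO_LEVEL[label] for label in labels]


def binary_metrics(actual, predicted, positive):
    """Treat `positive` as the class of interest and all others as negative."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # true negative rate
    }


# Toy labels, invented purely to exercise the functions.
actual = ["Emergent", "Urgent", "Less Urgent", "Non-urgent", "Urgent", "Non-urgent"]
predicted = ["Urgent", "Urgent", "Non-urgent", "Non-urgent", "Less Urgent", "Non-urgent"]

four_level_urgent = binary_metrics(actual, predicted, positive="Urgent")
two_level_high = binary_metrics(collapse(actual), collapse(predicted), positive="High")
```

On the four-level scale the measures are computed one class against the rest, so each priority level gets its own sensitivity and specificity. On the two-level scale a single pair of values suffices, and confusions between neighbouring levels that fall in the same group no longer count as errors.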