Extracting data from a website to display the latest news in chat

Post 1, 25.04.2024 17:18

Author: nexuxirc
Interested
Registered: 28.09.2021
Posts: 13

I modified a script that extracts data from a web page, and this is how it behaves now:
it looks for all <li id="nacionalU"> elements,
it does fetch them, but it also saves the HTTP headers, and when the result is shown in the channel it is read line by line. It also reads more than just the content inside the <li id="nacionalU"> tags.

Code:

on *:TEXT:!nacional:#lps: nacional

alias nacional {
  if ($hget(ns,stop)) { 
    .notice $nick Una sola vez por favor, espera por lo menos 10 segundos para enviar de vuelta. 
    | halt 
  } 
  .hadd -mu10 ns stop 1
  .hadd -m ns nick $nick
  .hadd -m ns chan $chan
  .hadd -m ns port 443
  .hadd -m ns domen multicuadros.000webhostapp.com
  .hadd -m ns webpage https://multicuadros.000webhostapp.com/index.php
  .hadd -m ns file $scriptdir $+ nacional.txt

  ; Delete the existing file so that we start clean
  if ($exists($hget(ns,file))) .remove $hget(ns,file)
  .sockclose nacional | if ($hget(ns,inc)) .hdel -sw ns inc

  if (https: isin $hget(ns,webpage)) var %flag -e
  .sockopen %flag nacional $hget(ns,domen) $hget(ns,port)
}

on *:SOCKOPEN:nacional:{
  if ($sockerr) { 
    echo -s $hget(ns,domen) : Server is not available. 
    | halt 
  }
  .sockwrite -nt $sockname GET $hget(ns,webpage) HTTP/1.0
  .sockwrite -nt $sockname Host: $hget(ns,domen)
  .sockwrite -nt $sockname User-Agent: *
  .sockwrite -nt $sockname Content-Type: text/html; charset=utf-8
  .sockwrite -nt $sockname $str($crlf,2)
  .sockwrite -nt $sockname
}

;===

on *:SOCKREAD:nacional:{
  if ($sockerr > 0) { 
    echo -s $hget(ns,domen) : $error 
    | halt 
  }

  :readmore
  .sockread %temp_ns
  if ($sockbr == 0) {
    ; End of reading: check whether we found any content
    if (%temp_ns) {
      ; Filter the content to find the desired tags
      var %ns_find = id="nacionalU"
      if (%temp_ns && $regex(%temp_ns, /%ns_find/)) {
        var %content = $strip($regml(%temp_ns,/<li id="nacionalU">(.*)<\/li>/s))
        if (%content) {
          .write -c $hget(ns,file) %content
        }
      }
    }
    return
  }

  ; If we have not reached the end, keep reading
  ; Ignore all lines until we find <li id="nacionalU">
  if (!$hget(ns,html_started)) {
    if ($regex(%temp_ns,/<li id="nacionalU">/i)) {
      .hadd ns html_started 1
      ; Write the current line to the file
      .write -i $hget(ns,file) %temp_ns
    }
    goto readmore
  }

  ; If we have already found <li id="nacionalU">, write to the file
  .write -i $hget(ns,file) %temp_ns
  goto readmore
}


;===

on *:SOCKCLOSE:nacional:{
  .echo -st $+(12,$hget(ns,domen),) - Lectura completada!
  if ($exists($hget(ns,file)) && $lines($hget(ns,file)) > 0) { 
    msg $hget(ns,chan) 12Contenido de nacional:14 $read($hget(ns,file)) 
  }
}

The saved txt file looks like this:

[flist=black]HTTP/1.1 200 OK
Date: Thu, 25 Apr 2024 14:17:52 GMT
Content-Type: text/html; charset=UTF-8
Connection: close
Server: awex
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Request-ID: 3485e3a9d7dab53b9782012d1f6b3be1

<ul><li id="nacionalU">
Intendente de Luque permanece en silencio y ausente tras muerte de mujeres arrastradas por raudal
</li><li id="nacionalU">
Video: Senad captura a supuesto traficante argentino en aparatoso procedimiento
</li><li id="nacionalU">
Fiscalía acusa y pide juicio oral para Joaquín Roa, ex ministro de la SEN
</li><li id="nacionalU">
Accidentado despegue de avión de la Fuerza Aérea en el Silvio Pettirossi
</li><li id="nacionalU">
Fiscal aún no tiene elementos penales contra municipio tras muerte de mujeres en Luque
[/flist]
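
The saved file starts with the HTTP response headers because the socket delivers headers and body as one stream, and the two are separated by the first empty line. As a minimal sketch of that split, here is the idea in Python rather than mIRC script. The function name `split_http_response` and the sample response are made up for illustration.

```python
from __future__ import annotations


def split_http_response(raw: bytes) -> tuple[dict[str, str], str]:
    """Split a raw HTTP response into (headers, body) at the first blank line."""
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line found: headers are incomplete")
    header_lines = head.decode("iso-8859-1").split("\r\n")
    headers: dict[str, str] = {}
    # header_lines[0] is the status line ("HTTP/1.1 200 OK"), so skip it.
    for line in header_lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body.decode("utf-8", errors="replace")


# Illustrative response, shaped like the saved file shown above.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b'<ul><li id="nacionalU">\nTitular de ejemplo\n</li></ul>'
)
headers, body = split_http_response(sample)
print(headers["content-type"])
print(body)
```

In the mIRC script the equivalent would probably be to ignore every line until the first empty one has been read, rather than matching header names one by one.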


Post 2, 27.04.2024 15:00

Author: Epic
Administrator
Registered: 30.09.2010
From: Russia ★ Moscow
Topics created: 293
Posts: 1094

nexuxirc wrote:

I modified a script that extracts data from a web page, and this is how it behaves now:
it looks for all <li id="nacionalU"> elements,
it does fetch them, but it also saves the HTTP headers, and when the result is shown in the channel it is read line by line. It also reads more than just the content inside the <li id="nacionalU"> tags.

It seems that, because your text was poorly translated from Spanish into Russian, the point of your question is not entirely clear and is hard to grasp.

Could you explain/describe in more detail the problem you run into when using this script?
What exactly do you need help with? Is something in the script not working, or is it not doing quite what you want?
Does something need to be added/changed/fixed, or do you need to change how the saved text file is read?

Please send a text log from the chat showing what you see in the #lps channel window after using the !nacional command.
Also give examples of how you want this script to work and what it should display in the channel window.

Author's signature: ★ EpicNet.Ru ★ - IRC Chat © 2008 (https://forum.epicnet.ru/viewtopic.php?id=234) • Webgate login: http://irc.epicnet.ru • Server: irc.epicnet.ru, ports: 6667, 6668 (ssl)


Post 3, 27.04.2024 17:40

Author: nexuxirc
Interested
Registered: 28.09.2021
Posts: 13

Through trial and error I fixed the code, and it now works.

Previously it included the HTTP headers; now I have managed to exclude them so that they are not written to the text file.

Code:


on *:TEXT:!nacional:#code_: nacional

alias nacional {
  if ($hget(ns,stop)) { 
    .notice $nick Una sola vez por favor, espera por lo menos 10 segundos para enviar de vuelta. 
    | halt 
  } 
  .hadd -mu10 ns stop 1
  .hadd -m ns nick $nick
  .hadd -m ns chan $chan
  .hadd -m ns port 443
  .hadd -m ns domen multicuadros.000webhostapp.com
  .hadd -m ns webpage https://multicuadros.000webhostapp.com/index.php
  .hadd -m ns file $scriptdir $+ nacional.txt

  ; Delete the existing file so that we start clean
  if ($exists($hget(ns,file))) .remove $hget(ns,file)
  .sockclose nacional | if ($hget(ns,inc)) .hdel -sw ns inc

  if (https: isin $hget(ns,webpage)) var %flag -e
  .sockopen %flag nacional $hget(ns,domen) $hget(ns,port)
}

on *:SOCKOPEN:nacional:{
  if ($sockerr) { 
    echo -s $hget(ns,domen) : Server is not available. 
    | halt 
  }
  .sockwrite -nt $sockname GET $hget(ns,webpage) HTTP/1.0
  .sockwrite -nt $sockname Host: $hget(ns,domen)
  .sockwrite -nt $sockname User-Agent: *
  .sockwrite -nt $sockname Content-Type: text/html; charset=utf-8
  .sockwrite -nt $sockname $str($crlf,2)
  .sockwrite -nt $sockname
}

;===

on *:SOCKREAD:nacional:{
  if ($sockerr > 0) { 
    echo -s $hget(ns,domen) : $error 
    | halt 
  }

  :readmore
  .sockread %temp_ns

  ;========[ from here


  echo -s Checking line: %temp_ns
  if ($left(%temp_ns,5) == HTTP/ || $left(%temp_ns,4) == Date || $left(%temp_ns,12) == Content-Type || $left(%temp_ns,10) == Connection || $left(%temp_ns,6) == Server || $left(%temp_ns,16) == X-Xss-Protection || $left(%temp_ns,22) == X-Content-Type-Options || $left(%temp_ns,12) == X-Request-ID) {
    echo -s Skipping header line: %temp_ns
    goto readmore
  }


  ;if ($left(%temp_ns,5) == HTTP/ || $left(%temp_ns,6) == Date || $left(%temp_ns,7) == Content-Type || $left(%temp_ns,8) == Connection || $left(%temp_ns,9) == Server || $left(%temp_ns,10) == X-Xss-Protection || $left(%temp_ns,11) == X-Content-Type-Options || $left(%temp_ns,12) == X-Request-ID) {
  ; This line contains header information, skip reading it
  ; goto readmore
  ;}

  ; Check if the line is empty, indicating the end of the headers
  if (%temp_ns == $crlf ) {
    ; Indicate that we've reached the end of the headers
    .hadd ns headers_done 1
    goto readmore
  }

  ; Check if we've reached the marker indicating the start of the content
  if (%temp_ns == StartOfWeb) {
    ; Indicate that we've found the start of the content
    .hadd ns start_content 1
    goto readmore
  }

  ; Check if we've found the start of the content
  if ($hget(ns,start_content)) {
    ; Write the content to the file
    .write -i $hget(ns,file) %temp_ns
  }
  ;========[ from here

  if ($sockbr == 0) {
    ; End of reading: check whether we found any content
    if (%temp_ns) {
      ; Filter the content to find the desired tags
      var %ns_find = id="nacionalU"
      if (%temp_ns && $regex(%temp_ns, /%ns_find/)) {
        var %content = $strip($regml(%temp_ns,/<br>(.*)<\/li>/s))
        if (%content) {
          .write -c $hget(ns,file) %content
        }
      }
    }
    return
  }
  ;============
  if ($hget(ns,start_content) == 3) {
    inc -u3 %temp_ns
    .hadd -mu3 ns start_content 1
    goto readmore
  }
  ;============
  ; If we have not reached the end, keep reading
  ; Ignore all lines until we find <li id="nacionalU">
  ; If we find <br>, extract the text after <br> and write it to the file
  if ($regex(%temp_ns,/.*?<br>(.*)/i)) {
    var %textAfterBR = $regml(1)
    if (%textAfterBR) {
      .write -i $hget(ns,file) %textAfterBR
    }
  }

  ; If we have already found <br>, write to the file
  ; If we have already found <br>, write to the file without HTML tags
  var %textWithoutTags = $regsubex(%temp_ns, /(<([^>]+)>)/g, $chr(32))
  .write -i $hget(ns,file) %textWithoutTags

  goto readmore
}


;===

on *:SOCKCLOSE:nacional:{
  .echo -st $+(12,$hget(ns,domen),) - Lectura completada!
  if ($exists($hget(ns,file)) && $lines($hget(ns,file)) > 0) { 
    msg $hget(ns,chan) 12Contenido de nacional:14 $read($hget(ns,file)) 
  }
  .hinc -mu1 ns start_content
}

But now the text file has 3 empty lines before the extracted content, and it also has an empty line between each line of text. When I run !nacional, it reads the empty lines as well as the lines containing text.

It is odd that it reads the empty lines. What I am looking for is a way to remove those empty lines before writing to the file, or to read the file while skipping them.

[flist=black]- Line empty -
- Line empty -
- Line empty -
Luque: A tres días de muertes por raudal, Fiscalía recién inicia pericias
- Line empty -
Sábado caluroso y con posibles lluvias hacia el sur del país
- Line empty -
Hombre mata a su tío político en sector rural de Concepción
- Line empty -
“Paraguay es una isla conservadora, un país absolutamente temeroso de Dios”, dice Leite en cumbre mundial
- Line empty -
Sigue sin novedades el acuerdo de la tarifa luego de la reunión del Consejo de Administración de Itaipú
- Line empty -
Cae sospechoso de asaltar con arma blanca a dos mujeres en el Mercado 4
- Line empty -
Candente e incidentada aprobación de acta en Junta Municipal de Arroyito
- Line empty -
Gobernación de Central asiste a 800 familias de Limpio
- Line empty -
Familias de Villa Florida fueron desplazadas por las lluvias
- Line empty -[/flist]
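
A likely cause of the empty lines: the script replaces every HTML tag with a space before writing, so a line that holds only <br> turns into blank text and is still written to the file. The Python sketch below is illustrative only (it is not mIRC script, and `strip_tags` and `clean_lines` are hypothetical helper names). It reproduces that effect and then shows the same data with blank results dropped.

```python
from __future__ import annotations

import re

# Same idea as the script's tag-removal regex: any <...> sequence.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(line: str) -> str:
    """Replace every HTML tag with a single space."""
    return TAG_PATTERN.sub(" ", line)


def clean_lines(lines: list[str]) -> list[str]:
    """Strip tags and keep only lines that still contain visible text."""
    cleaned = []
    for line in lines:
        text = strip_tags(line).strip()
        if text:  # skips "" and whitespace-only leftovers such as " "
            cleaned.append(text)
    return cleaned


page = [
    "",
    "<br>",
    "Sábado caluroso y con posibles lluvias hacia el sur del país",
    "<br>",
]
print([strip_tags(line) for line in page])  # what gets written without a check
print(clean_lines(page))                    # what gets written with the check
```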

And the HTML source of the web page is:

<br>
Luque: A tres días de muertes por raudal, Fiscalía recién inicia pericias
<br>
Sábado caluroso y con posibles lluvias hacia el sur del país
<br>
Hombre mata a su tío político en sector rural de Concepción
<br>
“Paraguay es una isla conservadora, un país absolutamente temeroso de Dios”, dice Leite en cumbre mundial
<br>
Sigue sin novedades el acuerdo de la tarifa luego de la reunión del Consejo de Administración de Itaipú
<br>
Cae sospechoso de asaltar con arma blanca a dos mujeres en el Mercado 4
<br>
Candente e incidentada aprobación de acta en Junta Municipal de Arroyito
<br>
Gobernación de Central asiste a 800 familias de Limpio
<br>
Familias de Villa Florida fueron desplazadas por las lluvias

In this case I simplified the HTML by separating each news item with <br>.

The idea is that eventually I can replace <br> with <li id="noticiasU"> or any other element.

Why? Because I am able to edit the HTML file.
If I want to change the HTML tags in the future, I can just modify the script so that it looks for the tags I want.
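
The idea of keeping the marker tag configurable can be sketched as follows. This is Python, not mIRC script, and both function names are invented for the example. It assumes the page uses either a bare separator such as <br> or a container element such as <li id="noticiasU">.

```python
from __future__ import annotations

import re


def items_by_separator(html: str, separator: str = "<br>") -> list[str]:
    """Split the page on a separator tag and return the non-empty pieces."""
    parts = re.split(re.escape(separator), html, flags=re.IGNORECASE)
    return [part.strip() for part in parts if part.strip()]


def items_by_element(html: str, tag: str = "li", element_id: str = "nacionalU") -> list[str]:
    """Return the text inside every <tag id="element_id">...</tag> element."""
    pattern = re.compile(
        r'<{tag}\b[^>]*\bid="{eid}"[^>]*>(.*?)</{tag}>'.format(
            tag=re.escape(tag), eid=re.escape(element_id)
        ),
        re.IGNORECASE | re.DOTALL,
    )
    return [match.strip() for match in pattern.findall(html) if match.strip()]


br_page = "<br>\nGobernación de Central asiste a 800 familias de Limpio\n<br>\nFamilias de Villa Florida fueron desplazadas por las lluvias\n"
li_page = '<ul><li id="nacionalU">\nUno\n</li><li id="nacionalU">\nDos\n</li></ul>'

print(items_by_separator(br_page))
print(items_by_element(li_page))
```

Switching the page markup then only means changing the arguments, not the extraction logic.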

Edited by nexuxirc (27.04.2024 17:46)


Post 4, 27.04.2024 19:05

Author: Epic
Administrator
Registered: 30.09.2010
From: Russia ★ Moscow
Topics created: 293
Posts: 1094

Hmm... if you do not need these empty lines in the text file and they get in the way, then why do you write and save them there in the first place?

When extracting data from the web page, you need a condition under which the code writes everything except empty lines or lines containing certain tags, so that those are ignored and never written to the file.

Something like this:

Code:

; All three checks are joined with && so that empty lines and tag lines are skipped
if ((%temp_ns != $null) && (<br> !isin %temp_ns) && (id="nacionalU" !isin %temp_ns)) {
  .write -i $hget(ns,file) %temp_ns
}
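
For a line to be skipped when it is empty or contains one of the tags, the write condition has to require all three checks at once (logical AND). A Python sketch of that logic, with an OR-joined variant for comparison; the function names are hypothetical and the code only illustrates the boolean logic, not mIRC syntax.

```python
def should_write(line: str, skip_markers=("<br>", 'id="nacionalU"')) -> bool:
    """True only if the line has text AND contains none of the markers."""
    if not line.strip():
        return False
    return not any(marker in line for marker in skip_markers)


def joined_with_or(line: str) -> bool:
    """The same three checks joined with OR: at least one is always true."""
    return line != "" or "<br>" not in line or 'id="nacionalU"' not in line


for candidate in ["", "<br>", '<li id="nacionalU">', "Hombre mata a su tío político"]:
    print(repr(candidate), should_write(candidate), joined_with_or(candidate))
```

With OR, an empty line passes because it does not contain <br>, and a <br> line passes because it is not empty, so nothing is ever filtered out.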

However, if for some reason you do need these empty lines in the file, but they interfere with correctly extracting/reading the text for output to the channel, then once the file has been created and contains the text you need, you can try an alias like the one below. It picks a random line from the file on the condition that the line is not empty ($null); otherwise it keeps picking random lines until it gets one with text:

Code:

alias -l find_not_empty_line {
  if ($exists($hget(ns,file)) && $lines($hget(ns,file)) > 0) {
    :new_rand_line | var %str $read($hget(ns,file),nt) | if (%str == $null) goto new_rand_line
    msg $hget(ns,chan) 12Contenido de nacional:14 %str
  }
}

This alias can be called from anywhere in your code.
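
One caveat with retrying until a non-empty line turns up: if the file contained nothing but empty lines, the retry loop would never end. A Python sketch of the same selection (not mIRC script; `pick_non_empty` is a hypothetical name) that filters first and returns nothing in that case:

```python
import random
from typing import Optional, Sequence


def pick_non_empty(lines: Sequence[str], rng: Optional[random.Random] = None) -> Optional[str]:
    """Return a random line that contains text, or None if there is none."""
    if rng is None:
        rng = random.Random()
    candidates = [line.strip() for line in lines if line.strip()]
    if not candidates:
        return None  # nothing to announce; avoids looping forever
    return rng.choice(candidates)


file_lines = ["", "", "", "Gobernación de Central asiste a 800 familias de Limpio", ""]
print(pick_non_empty(file_lines))
print(pick_non_empty(["", "   "]))
```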

Alternatively, you can add this fragment directly to the final part of your code:

Code:

on *:SOCKCLOSE:nacional:{
  .echo -st $+(12,$hget(ns,domen),) - Lectura completada!
  if ($exists($hget(ns,file)) && $lines($hget(ns,file)) > 0) {
    :new_rand_line | var %str $read($hget(ns,file),nt) | if (%str == $null) goto new_rand_line
    msg $hget(ns,chan) 12Contenido de nacional:14 %str
  }
  .hinc -mu1 ns start_content
}


Post 5, 29.04.2024 14:30

Author: nexuxirc
Interested
Registered: 28.09.2021
Posts: 13

Thank you very much. It works very well. I have now set it to pick lines more randomly, and it works fine. Thank you.
