Fernando Guillen, a Freelance Web Developer » Blog Archive » Ruby: sanitizing your titles in 2 lines.

Ruby: sanitizing your titles in 2 lines.

Last night I went to bed really happy: I had managed to write a function to sanitize strings or, as people call it, to create a SLUG (a term for which I still haven't found an exact definition).

In case you don't know what I'm talking about, it means turning a "Hola mundo!, qué tal?" into a "hola-mundo-que-tal" for pretty URLs and all that.

It hadn't been easy, because Ruby gets along badly with non-ASCII characters and Spanish has plenty of them, so it took a bit of juggling with the Unicode gem.

In the end my function looked like this:

require 'unicode'
def to_slug( sentence, length = 64 )
  return if sentence.blank? # blank? comes from Rails (ActiveSupport)
 
  wrong = ['á','é','í','ó','ú','ä','ë','ï','ö','ü','à','è','ì','ò','ù','ñ','ç','º','ª','_']
  right = ['a','e','i','o','u','a','e','i','o','u','a','e','i','o','u','n','s','o','a','-']
 
  sentence = sentence[0..length-1]
  sentence = Unicode.downcase( sentence )
 
  for i in 0..wrong.size-1
    sentence.gsub!( wrong[i], right[i] )
  end
 
  sentence.gsub!( /[^a-z0-9-]/, '-' ) # anything that is not a letter or a number
  sentence.gsub!( /-{2,}/, '-' )      # 2 or more '-' together become 1 '-'
  sentence.gsub!( /^-|-$/, '' ) unless sentence.size == 1 # '-' at the beginning or at the end
  sentence
end
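
For comparison, here is a minimal sketch of the same table-driven idea using `String#tr`, which replaces the explicit loop with a single character-mapping call. It assumes Ruby 2.4 or later (where `String#downcase` handles non-ASCII letters, so the Unicode gem is not needed) and UTF-8 strings. The names `ACCENTED`, `PLAIN`, and `table_slug` are hypothetical and do not come from the original function.

```ruby
# Hypothetical names; the character table mirrors the wrong/right arrays above.
ACCENTED = 'áéíóúäëïöüàèìòùñçºª_'
PLAIN    = 'aeiouaeiouaeiounsoa-'

def table_slug(sentence, length = 64)
  return if sentence.nil? || sentence.strip.empty?

  slug = sentence[0, length].downcase.tr(ACCENTED, PLAIN)
  slug = slug.gsub(/[^a-z0-9]+/, '-')   # each run of other characters becomes one '-'
  slug.gsub(/\A-+|-+\z/, '')            # trim '-' at the beginning and at the end
end

puts table_slug('Hola mundo!, qué tal?')
```

Like the original, this only knows about the characters listed in the table; anything else that is not a plain letter or digit ends up as a separator.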

I was really proud of it until I woke up this morning, rephrased my Google query, and ran into an... 'unnameable one' who does this:

require 'unicode'
def to_slug
  str = Unicode.normalize_KD(self).gsub(/[^\x00-\x7F]/n,'')
  str = str.gsub(/\W+/, '-').gsub(/^-+/,'').gsub(/-+$/,'').downcase
end
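
The trick in those two lines is the decomposition step: KD normalization splits an accented letter into its base letter plus a combining mark, and the first `gsub` then discards the non-ASCII mark. Below is a minimal sketch of that step, assuming Ruby 2.2 or later and using the built-in `String#unicode_normalize` as a stand-in for the gem's `Unicode.normalize_KD`. The variable names are only illustrative.

```ruby
word = "qu\u00E9"                            # "qué" with a precomposed é (U+00E9)
decomposed = word.unicode_normalize(:nfkd)   # stand-in for Unicode.normalize_KD

p word.codepoints        # three code points: q, u, é
p decomposed.codepoints  # four code points: q, u, e, U+0301 (combining acute)

ascii_only = decomposed.gsub(/[^\x00-\x7F]/, '')  # drop the combining mark
puts ascii_only
```

One consequence of this approach is that characters with no decomposition into an ASCII base letter (ø, for example) are removed entirely rather than transliterated.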

Exactly (almost) what I had, but in 2 lines.

That's Ruby for you!

In the end I made a few changes, took the best of both versions, and added support for STOPWORDS:

STOPWORDS = [
  'de','a','que','no','tiene','en','para',
  'por','le','la','lo','las','los','el',
  'una','un'
]
 
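# Meant to be defined inside class String, since it operates on self.
def to_slug( length = 64, drop_stopwords = false )
  return "" if self.length == 0
 
  # decompose accented characters, then drop everything outside ASCII
  str = Unicode.normalize_KD(self).gsub(/[^\x00-\x7F]/n,'').downcase
 
  # stopwords
  if drop_stopwords
    STOPWORDS.each do |stopword|
      str.gsub!( /\s#{stopword}\s|^#{stopword}\s/, ' ' )
    end
  end
 
  # each run of non-alphanumerics becomes a single '-', then trim both ends
  str = str.gsub(/[^A-Za-z0-9]+/, '-').gsub(/^-+/,'').gsub(/-+$/,'').downcase
  str = str[0..length-1]
end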

Of course, you are encouraged to extend the STOPWORDS list with whatever words you like.
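
One detail of the regex-based stopword removal is that it requires whitespace after the stopword, so a stopword at the very end of the string is kept. A minimal sketch of an alternative that filters whole words instead, assuming Ruby 2.2 or later and its built-in `String#unicode_normalize` in place of the Unicode gem; `SKETCH_STOPWORDS` and `slug_sketch` are hypothetical names, not part of the function above.

```ruby
# Hypothetical names; the word list is the STOPWORDS list from the post.
SKETCH_STOPWORDS = %w[de a que no tiene en para por le la lo las los el una un].freeze

def slug_sketch(text, length: 64, drop_stopwords: false)
  # Decompose accents and drop whatever is still non-ASCII
  ascii = text.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]/, '').downcase
  words = ascii.scan(/[a-z0-9]+/)                  # keep only alphanumeric runs
  words -= SKETCH_STOPWORDS if drop_stopwords      # remove every stopword occurrence
  words.join('-')[0, length].sub(/-+\z/, '')       # truncate, then trim a dangling '-'
end

puts slug_sketch('La casa de papel', drop_stopwords: true)
puts slug_sketch('La casa de papel')
```

Working on a list of words also avoids having to clean up leading, trailing, or doubled separators afterwards, since `join` only places a '-' between words.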

This entry was posted on Saturday, August 30th, 2008 at 7:18 pm and is filed under how to, programando, ruby.

David Calavera Says:
August 30th, 2008 at 7:56 pm

Hahaha, you can find quite a few of those on refactormycode.com. One of the best I remember was how to parse the parameters of a URL and put them into a hash; after many comments of every kind, someone finally arrived at this:

CGI.parse(URI.parse(url).query)

fguillen Says:
August 30th, 2008 at 8:08 pm

Good link... very funny :)

Juanjo Bazán Says:
August 30th, 2008 at 9:32 pm

Also take a look at the normalize method that Rails gives you:
http://api.rubyonrails.org/classes/ActiveSupport/Multibyte/Handlers/UTF8Handler.html#M000155

YoNoSoyTu Says:
August 30th, 2008 at 9:58 pm

You don't need the Unicode gem for this if you use Rails, which comes with Unicode support built in:

'àéïôúñ'.chars.normalize(:kd).gsub(/[^\x00-\x7F]/n, '') # => "aeioun"

(It's one less dependency)

Andrés Panitsch Says:
August 30th, 2008 at 10:11 pm

You could argue that your function, although longer, is clear to you, so you can maintain and modify it if it ever gives you trouble... if that two-line code fails in some situation, you are going to have a hard time.

I think the right path is to understand those two lines and use them to improve your own code rather than adopting them blindly.

All of this is very theoretical, because although I understand regular expressions (so I can imagine what it does), Ruby syntax escapes me completely :-)

Great post!

fguillen Says:
August 30th, 2008 at 10:11 pm

Juanjo: thx... let's see if api.rubyonrails.org comes back up so I can look at it

Daniel (;P): true, I've checked it.

fguillen Says:
August 30th, 2008 at 10:15 pm

Andrés: I agree with you that how self-explanatory code is should take priority over how short it is.

christos zisopoulos Says:
September 10th, 2008 at 6:41 pm

And now there is this:

http://github.com/rails/rails/commit/90366a1521659d07a3b75936b3231adeb376f1a4

fguillen Says:
September 10th, 2008 at 9:28 pm

christos, too crude:

>> "hola! qué pasa con las ñÑññs y tal y cuál?".gsub(/[^a-z0-9]+/i, "-").downcase
=> "hola-qu-pasa-con-las-s-y-tal-y-cu-l-"

The to_slug I have does this:

>> "hola! qué pasa con las ñÑññs y tal y cuál?".to_slug
=> "hola-que-pasa-con-las-nnnns-y-tal-y-cual"

But searching on GitHub I found this:
http://github.com/ludo/to_slug/tree/master

Which is pretty cool.

ceritium Says:
September 19th, 2008 at 12:13 pm

The post and the comment thread are certainly interesting, but nobody has said anything about the positive or negative effect of using stop words on a page's search ranking.

To what extent is it worthwhile to use them? Sometimes you want to rank for a specific phrase rather than for keywords.

What do you all think?

fguillen Says:
September 19th, 2008 at 12:57 pm

@ceritium: I leave the stop words in my slugs, if my opinion is of any use.

It makes more sense to me for the indexer itself (Google in this case) to ignore them, but I prefer slugs with the full phrase, as you say.
